This is our implementation of the paper:
Ning Han, Jingjing Chen, Guangyi Xiao, Yawen Zeng, Chuhao Shi, Hao Chen. 2021. Visual Spatio-Temporal Relation-Enhanced Network for Cross-Modal Text-Video Retrieval.
The code is still being prepared and will be released soon.