Cross-Modal Retrieval: Paper Collection & Code Repositories

Your University

About

This project collects representative papers and open-source code repositories in the field of cross-modal retrieval, organized by research topic. Click the Code or Toolbox button above to browse the full paper list, with links to each paper and its code repository.

Topics covered include:

  • Contrastive Learning (CLIP, ALIGN, SigLIP, EVA-CLIP, LLM2CLIP...)
  • Fine-grained Matching (SCAN, FILIP, X-VLM, Long-CLIP...)
  • Fusion-based VLP (ALBEF, BLIP, BLIP-2, ViLT, CoCa...)
  • Multimodal Large Language Models (LLaVA, MiniGPT-4, InternVL...)
  • Noisy Correspondence
  • Related Surveys
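To illustrate the contrastive-learning family above (CLIP, ALIGN, SigLIP, ...), the sketch below shows the core retrieval step these models share: embed images and texts into a shared space, L2-normalize, and rank by cosine similarity. This is a minimal illustration with random placeholder embeddings, not any particular model's architecture or weights; the function names and the temperature value are assumptions for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_retrieval(image_emb, text_emb, temperature=0.07):
    # CLIP-style scoring: cosine similarity between normalized image and
    # text embeddings, scaled by a temperature (0.07 is a common choice).
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature  # shape: (n_images, n_texts)
    # Image-to-text retrieval: for each image, rank all texts by similarity.
    ranks = np.argsort(-logits, axis=1)
    return logits, ranks

rng = np.random.default_rng(0)
images = rng.normal(size=(4, 512))                 # placeholder image embeddings
texts = images + 0.1 * rng.normal(size=(4, 512))   # paired texts near their images
logits, ranks = contrastive_retrieval(images, texts)
# With well-aligned pairs, each image's top-ranked text is its own pair.
print((ranks[:, 0] == np.arange(4)).all())
```

Training-time objectives (e.g. the symmetric InfoNCE loss in CLIP, or the sigmoid loss in SigLIP) differ across the papers listed above, but they all produce embeddings scored this way at retrieval time.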

Overview

[ Overview Figure — Coming Soon ]

BibTeX

@misc{yourname2025crossmodal,
    title={Cross-Modal Retrieval: Paper Collection},
    author={Your Name},
    year={2025},
    howpublished={\url{https://sun11711.github.io/cross-modal-retrieval-review/}},
}