Open-Retrievals Documentation
======================================

Retrievals is an easy-to-use, flexible, and scalable framework supporting state-of-the-art embeddings, retrieval, and reranking for information retrieval or RAG.

* Embedding fine-tuning with point-wise, pairwise, listwise, and contrastive learning, including LLM-based embeddings.
* Reranking fine-tuning with cross-encoder, ColBERT, and LLM models.
* Easily build modular RAG, integrated with Transformers, Langchain, and LlamaIndex.

Installation
------------------

Install the **prerequisites**

* transformers
* peft (optional, for LoRA fine-tuning)
* faiss-cpu (optional, for Faiss retrieval)

Then install the package

.. code-block:: shell

   # install the basic module
   pip install open-retrievals

   # install with evaluation support (quoted so the brackets survive shells such as zsh)
   pip install "open-retrievals[eval]"

Or install from source code

.. code-block:: shell

   python -m pip install -U git+https://github.com/LongxingTan/open-retrievals.git

Examples
------------------

Run a simple example

.. code-block:: python

   from retrievals import AutoModelForEmbedding

   sentences = ["Hello NLP", "Open-retrievals is designed for retrieval, rerank and RAG"]
   model_name_or_path = "sentence-transformers/all-MiniLM-L6-v2"

   # Load the pretrained model with mean pooling over token embeddings
   model = AutoModelForEmbedding.from_pretrained(model_name_or_path, pooling_method="mean")

   # Encode the sentences into normalized embedding vectors
   sentence_embeddings = model.encode(sentences, normalize_embeddings=True)
   print(sentence_embeddings)

Open-retrievals also supports fine-tuning embedding models, reranking models, and LLMs for custom use.

* Embedding pairwise fine-tuning
* LLM embedding pairwise fine-tuning
* ColBERT fine-tuning
* Cross-encoder reranking fine-tuning
* LLM reranking fine-tuning

Use cases
------------------

* T2Ranking dataset
* SciFact dataset
* MS MARCO dataset
* Wikipedia NQ dataset
* RAG example

Contributing
---------------------

If you want to contribute to the project, please refer to our contribution guidelines.

.. toctree::
   :maxdepth: 1
   :caption: Contents:

   quick-start
   embed
   retrieval
   rerank
   rag