Open-Retrievals Documentation
======================================
Open-Retrievals is an easy-to-use, flexible, and scalable framework that supports state-of-the-art embedding, retrieval, and reranking models for information retrieval and RAG.

* Embedding models fine-tuned with point-wise, pairwise, listwise, or contrastive learning, including LLM-based embedding models.
* Reranking models fine-tuned as cross-encoders, ColBERT, or LLM-based rerankers.
* Modular RAG that is easy to build and integrates with Transformers, LangChain, and LlamaIndex.
Installation
------------------
Install the **prerequisites**:

* transformers
* peft (for LoRA fine-tuning, if needed)
* faiss-cpu (for Faiss retrieval, if needed)
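To see which of the prerequisites listed above are already importable in the current environment, something like the following can be used. This is a minimal sketch and not part of open-retrievals: the helper name ``check_prerequisites`` is hypothetical, and it assumes the usual import names (the ``faiss-cpu`` distribution is imported as ``faiss``).

```python
from importlib.util import find_spec

# Map each distribution name to the module name it is imported under.
# Assumption: faiss-cpu is imported as "faiss"; the others share their name.
PREREQUISITES = {
    "transformers": "transformers",
    "peft": "peft",
    "faiss-cpu": "faiss",
}


def check_prerequisites(packages=PREREQUISITES):
    """Return {distribution name: True/False} depending on importability."""
    return {dist: find_spec(module) is not None for dist, module in packages.items()}


if __name__ == "__main__":
    for dist, available in check_prerequisites().items():
        print(f"{dist}: {'installed' if available else 'missing'}")
```

Only ``transformers`` is needed for the basic example; ``peft`` and ``faiss-cpu`` can be installed later if LoRA fine-tuning or Faiss retrieval is required.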
With the prerequisites in place, install the package:

.. code-block:: shell

    # install the basic module
    pip install open-retrievals

    # install with evaluation support
    pip install "open-retrievals[eval]"
Or install from source code:

.. code-block:: shell

    python -m pip install -U git+https://github.com/LongxingTan/open-retrievals.git
Examples
------------------
Run a simple example:

.. code-block:: python

    from retrievals import AutoModelForEmbedding

    sentences = ["Hello NLP", "Open-retrievals is designed for retrieval, rerank and RAG"]
    model_name_or_path = "sentence-transformers/all-MiniLM-L6-v2"

    # Load a pretrained encoder and use mean pooling to build sentence embeddings
    model = AutoModelForEmbedding.from_pretrained(model_name_or_path, pooling_method="mean")

    # Encode the sentences, requesting normalized embeddings
    sentence_embeddings = model.encode(sentences, normalize_embeddings=True)
    print(sentence_embeddings)
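The example above passes ``normalize_embeddings=True``. Assuming this L2-normalizes each vector, as the name suggests, the dot product of two embeddings equals their cosine similarity, which is what retrieval typically ranks by. The sketch below illustrates that idea in plain Python on small hypothetical vectors rather than real model output; ``l2_normalize`` and ``dot`` are illustrative helpers, not part of the open-retrievals API.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; for unit-length vectors this is the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 3-dimensional vectors standing in for model.encode(...) output.
query = l2_normalize([1.0, 0.0, 0.0])
documents = [
    l2_normalize(d)
    for d in ([2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0])
]

# Score every document against the query and sort from most to least similar.
scores = [dot(query, d) for d in documents]
ranking = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

print(scores)
print(ranking)
```

The first document points in the same direction as the query, so it ranks first regardless of its original length; the second is orthogonal to the query and ranks last.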
Open-retrievals makes it easy to fine-tune embedding models, reranking models, and LLMs for custom use cases.
* `Embedding pairwise fine-tuning `_
* `LLM embedding pairwise fine-tuning `_
* `ColBERT fine-tuning `_
* `Cross-encoder reranking fine-tuning `_
* `LLM reranking fine-tuning `_
Use cases
------------------
* `T2Ranking dataset `_
* `SciFact dataset `_
* `MS MARCO dataset `_
* `Wikipedia NQ dataset `_
* `RAG example `_
Contributing
---------------------
If you want to contribute to the project, please refer to our `contribution guidelines `_.
.. toctree::
   :maxdepth: 1
   :caption: Contents:

   quick-start
   embed
   retrieval
   rerank
   rag