RAG#

1. Build an RAG Application#

RAG could help solve the false information, out-of-date information, and data security for LLM by searching the external data. The basic RAG process is document indexing, query embedding, retrieval, optional rerank, and LLM generate.

  • Output reference for explainability

  • LLM Hallucination

from retrievals import AutoModelForEmbedding

Integrated with Langchain#

Open In Colab
from retrievals.tools.langchain import LangchainEmbedding, LangchainReranker, LangchainLLM
from retrievals import AutoModelForRanking
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.vectorstores import Chroma as Vectorstore
from langchain.prompts.prompt import PromptTemplate
from langchain.chains import RetrievalQA

persist_directory = './database/faiss.index'
embed_model_name_or_path = "sentence-transformers/all-MiniLM-L6-v2"
rerank_model_name_or_path = "BAAI/bge-reranker-base"
llm_model_name_or_path = "microsoft/Phi-3-mini-128k-instruct"

embeddings = LangchainEmbedding(model_name_or_path=embed_model_name_or_path, model_kwargs={'pooling_method': 'mean'})
vectordb = Vectorstore(
    persist_directory=persist_directory,
    embedding_function=embeddings,
)
retrieval_args = {"search_type" :"similarity", "score_threshold": 0.15, "k": 10}
retriever = vectordb.as_retriever(**retrieval_args)

ranker = AutoModelForRanking.from_pretrained(rerank_model_name_or_path)
reranker = LangchainReranker(model=ranker, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)

llm = LangchainLLM(model_name_or_path=llm_model_name_or_path)

RESPONSE_TEMPLATE = """[INST]
<>
You are a helpful AI assistant. Use the following pieces of context to answer the user's question.<>
Anything between the following `context` html blocks is retrieved from a knowledge base.

    {context}

REMEMBER:
- If you don't know the answer, just say that you don't know, don't try to make up an answer.
- Let's take a deep breath and think step-by-step.

Question: {question}[/INST]
Helpful Answer:
"""

PROMPT = PromptTemplate(template=RESPONSE_TEMPLATE, input_variables=["context", "question"])

qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type='stuff',
    retriever=compression_retriever,
    chain_type_kwargs={
        "verbose": True,
        "prompt": PROMPT,
    }
)

user_query = 'Introduce this'
response = qa_chain({"query": user_query})
print(response)

Integrated with LlamaIndex#

2. RAG enhancement tricks#

  • Multi-vector retrieval, sparse + dense

  • Rerank

  • Long contexts LLM

  • Query rewrite, or multi-queries

  • Hierarchy retrieval

  • Multi-chunks

  • Pretrain and finetune of embeddings and rerank weights

  • Meta data of documents

Agentic RAG#

Graph RAG#

Use knowledge graph

  • document processing

  • Graph extraction

  • Graph augmentation

  • Community summarization

pdf parse#

There are some tools help parse the pdf file.

  • PyPDF2
    • Good for English

    • Without bbox

  • pdfplumber
    • Good for English and Chinese

    • Good for table parse

    • With bbox

  • pdfminer

  • Camelot

  • pymupdf

  • papermage

  • llama_index parse
    • support table and figure

But if the file is a scanned pdf, we need to use the OCR.

Layout#

OCR#