RAG#
1. Build a RAG Application#
RAG (retrieval-augmented generation) helps address three LLM weaknesses: false information, out-of-date information, and data security concerns. It does so by retrieving relevant passages from external data at query time. The basic RAG pipeline is document indexing, query embedding, retrieval, optional reranking, and LLM generation.
Output references for explainability
Mitigate LLM hallucination
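The basic pipeline described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, the optional rerank step is skipped, and the final prompt would be sent to an LLM. The function names here are hypothetical and are not part of the `retrievals` library.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs):
    # Step 1, document indexing: embed every document once, up front.
    return [(doc, embed(doc)) for doc in docs]


def retrieve(index, query, k=2):
    # Steps 2 and 3, query embedding and retrieval: rank documents by similarity.
    q = embed(query)
    scored = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in scored[:k]]


def build_prompt(query, contexts):
    # Generation input: number each context so the answer can cite its source.
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return f"Context:\n{numbered}\n\nQuestion: {query}\nAnswer:"


docs = [
    "Chroma is a vector database for embeddings",
    "Rerankers reorder retrieved passages by relevance",
    "Paris is the capital of France",
]
index = build_index(docs)
query = "what is the capital of France"
contexts = retrieve(index, query, k=2)
prompt = build_prompt(query, contexts)
print(prompt)
```

Numbering the retrieved passages in the prompt is one simple way to let the model output references, which supports the explainability goal listed above.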
from retrievals import AutoModelForEmbedding
Integrated with LangChain#
from retrievals.tools.langchain import LangchainEmbedding, LangchainReranker, LangchainLLM
from retrievals import AutoModelForRanking
from langchain.retrievers import ContextualCompressionRetriever
from langchain_community.vectorstores import Chroma as Vectorstore
from langchain.prompts.prompt import PromptTemplate
from langchain.chains import RetrievalQA
persist_directory = './database/faiss.index'
embed_model_name_or_path = "sentence-transformers/all-MiniLM-L6-v2"
rerank_model_name_or_path = "BAAI/bge-reranker-base"
llm_model_name_or_path = "microsoft/Phi-3-mini-128k-instruct"
embeddings = LangchainEmbedding(model_name_or_path=embed_model_name_or_path, model_kwargs={'pooling_method': 'mean'})
vectordb = Vectorstore(
    persist_directory=persist_directory,
    embedding_function=embeddings,
)
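# "k" and "score_threshold" belong in search_kwargs; a score threshold needs the "similarity_score_threshold" search type
retrieval_args = {"search_type": "similarity_score_threshold", "search_kwargs": {"score_threshold": 0.15, "k": 10}}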
retriever = vectordb.as_retriever(**retrieval_args)
ranker = AutoModelForRanking.from_pretrained(rerank_model_name_or_path)
reranker = LangchainReranker(model=ranker, top_n=3)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranker, base_retriever=retriever
)
llm = LangchainLLM(model_name_or_path=llm_model_name_or_path)
RESPONSE_TEMPLATE = """[INST]
<<SYS>>
You are a helpful AI assistant. Use the following pieces of context to answer the user's question.<</SYS>>
Anything between the following `context` html blocks is retrieved from a knowledge base.
<context>
{context}
</context>
REMEMBER:
- If you don't know the answer, just say that you don't know, don't try to make up an answer.
- Let's take a deep breath and think step-by-step.
Question: {question}[/INST]
Helpful Answer:
"""
PROMPT = PromptTemplate(template=RESPONSE_TEMPLATE, input_variables=["context", "question"])
qa_chain = RetrievalQA.from_chain_type(
    llm,
    chain_type='stuff',
    retriever=compression_retriever,
    chain_type_kwargs={
        "verbose": True,
        "prompt": PROMPT,
    }
)
user_query = 'Introduce this'
response = qa_chain.invoke({"query": user_query})
print(response)
Integrated with LlamaIndex#
2. RAG enhancement tricks#
Multi-vector retrieval (sparse + dense)
Reranking
Long-context LLMs
Query rewriting, or multiple queries
Hierarchical retrieval
Multi-chunks
Pretraining and fine-tuning of embedding and reranker weights
Document metadata
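As an illustration of the sparse + dense item above, one common way to merge the two result lists is reciprocal rank fusion (RRF). The sketch below is a generic example, not necessarily what this library does internally; the document ids and the `k=60` constant are assumptions chosen for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


sparse_hits = ["d1", "d2", "d3"]  # hypothetical ids, e.g. from a BM25 index
dense_hits = ["d3", "d1", "d4"]   # hypothetical ids, e.g. from an embedding index
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
print(fused)
```

RRF only needs ranks, not scores, so it avoids having to calibrate sparse and dense similarity values against each other. The fused list can then be passed to a reranker.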
Agentic RAG#
Graph RAG#
Uses a knowledge graph
Document processing
Graph extraction
Graph augmentation
Community summarization
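The stages listed above can be sketched with a toy, dependency-free example. Everything here is an assumption made for illustration: entities are just capitalized tokens (a real system would use an LLM or NER model), graph augmentation is omitted, communities are plain connected components rather than a dedicated community-detection algorithm, and the "summary" is a placeholder string where an LLM call would normally go.

```python
from collections import defaultdict
from itertools import combinations


def extract_entities(chunk):
    # Toy graph extraction: treat capitalized tokens as entities.
    return sorted({w.strip(".,") for w in chunk.split() if w[:1].isupper()})


def build_graph(chunks):
    # Link entities that co-occur in the same chunk.
    graph = defaultdict(set)
    for chunk in chunks:
        entities = extract_entities(chunk)
        for entity in entities:
            graph[entity]  # make sure isolated entities still become nodes
        for a, b in combinations(entities, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def communities(graph):
    # Connected components via depth-first search, as a stand-in for community detection.
    seen, result = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, members = [node], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(graph[current] - members)
        seen |= members
        result.append(sorted(members))
    return result


def summarize(members):
    # Placeholder for community summarization; an LLM would write this text.
    return f"Community of {len(members)} entities: {', '.join(members)}"


chunks = ["Alice works with Bob.", "Bob reports to Carol.", "Dave mentors Erin."]
groups = communities(build_graph(chunks))
for group in groups:
    print(summarize(group))
```

At query time, the community summaries (rather than raw chunks) can be retrieved and passed to the LLM, which is useful for questions that span many documents.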
PDF parsing#
Several tools can help parse PDF files.
- PyPDF2
  - Good for English
  - No bounding boxes (bbox)
- pdfplumber
  - Good for English and Chinese
  - Good for table parsing
  - Provides bounding boxes (bbox)
- pdfminer
- Camelot
- pymupdf
- papermage
- llama_index parse
  - Supports tables and figures
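As a rough sketch of the pdfplumber option above, the function below collects text, tables, and word-level bounding boxes per page. It assumes `pdfplumber` is installed; `parse_pdf` and `words_to_boxes` are hypothetical helper names, and `example.pdf` is a placeholder path.

```python
def words_to_boxes(words):
    """Convert pdfplumber word dicts into (text, (x0, top, x1, bottom)) tuples."""
    return [(w["text"], (w["x0"], w["top"], w["x1"], w["bottom"])) for w in words]


def parse_pdf(path):
    """Return one dict per page with its text, tables, and word bounding boxes."""
    import pdfplumber  # third-party; imported lazily so the helper above has no dependency

    pages = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            pages.append({
                "text": page.extract_text() or "",          # plain text of the page
                "tables": page.extract_tables(),             # each table is a list of rows
                "words": words_to_boxes(page.extract_words()),
            })
    return pages


# Usage (placeholder path):
# pages = parse_pdf("example.pdf")
# print(pages[0]["text"][:200])
```

Keeping the bounding boxes alongside the text makes it possible to group words by position later, for example to separate columns or to attach captions to tables.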
If the file is a scanned PDF, OCR is required.
- fitz
  - Converts PDF pages to images
- ppocr
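For scanned files, the first step is rendering each page to an image that an OCR engine can read. A minimal sketch using `fitz` (the import name of PyMuPDF) might look like the following. It assumes a PyMuPDF version that accepts the `dpi` argument of `get_pixmap`; the helper names and the output file naming scheme are assumptions. The OCR call itself is left out because its API depends on the engine and version used.

```python
from pathlib import Path


def page_image_path(out_dir, pdf_path, page_number):
    """Build an output path such as out/report_p0003.png (naming scheme is an assumption)."""
    return Path(out_dir) / f"{Path(pdf_path).stem}_p{page_number:04d}.png"


def pdf_to_images(pdf_path, out_dir, dpi=200):
    """Render every page of a PDF to a PNG file and return the image paths."""
    import fitz  # PyMuPDF; imported lazily so the path helper has no dependency

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    paths = []
    with fitz.open(pdf_path) as doc:
        for number, page in enumerate(doc, start=1):
            pix = page.get_pixmap(dpi=dpi)  # rasterize the page at the requested resolution
            target = page_image_path(out_dir, pdf_path, number)
            pix.save(str(target))
            paths.append(target)
    return paths


# Usage (placeholder paths); each image would then be passed to an OCR engine:
# image_paths = pdf_to_images("scanned.pdf", "out")
```

A higher `dpi` generally helps OCR accuracy on small print, at the cost of larger images and slower processing.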
Layout#
Layout-parser
llama_index parse (supports tables and figures)
ppstructure
unstructured
OCR#