RAG
=========

.. _rag:

1. Build an RAG Application
-------------------------------

RAG could help solve the false information, out-of-date information, and data security for LLM by searching the external data.
The basic RAG process is document indexing, query embedding, retrieval, optional rerank, and LLM generate.

* Output reference for explainability
* LLM Hallucination

.. code-block:: python

    from retrievals import AutoModelForEmbedding


Integrated with Langchain
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: https://colab.research.google.com/assets/colab-badge.svg
    :target: https://colab.research.google.com/drive/1fJC-8er-a4NRkdJkwWr4On7lGt9rAO4P?usp=sharing
    :alt: Open In Colab


.. code-block:: python

    from retrievals.tools.langchain import LangchainEmbedding, LangchainReranker, LangchainLLM
    from retrievals import AutoModelForRanking
    from langchain.retrievers import ContextualCompressionRetriever
    from langchain_community.vectorstores import Chroma as Vectorstore
    from langchain.prompts.prompt import PromptTemplate
    from langchain.chains import RetrievalQA

    persist_directory = './database/faiss.index'
    embed_model_name_or_path = "sentence-transformers/all-MiniLM-L6-v2"
    rerank_model_name_or_path = "BAAI/bge-reranker-base"
    llm_model_name_or_path = "microsoft/Phi-3-mini-128k-instruct"

    embeddings = LangchainEmbedding(model_name_or_path=embed_model_name_or_path, model_kwargs={'pooling_method': 'mean'})
    vectordb = Vectorstore(
        persist_directory=persist_directory,
        embedding_function=embeddings,
    )
    retrieval_args = {"search_type" :"similarity", "score_threshold": 0.15, "k": 10}
    retriever = vectordb.as_retriever(**retrieval_args)

    ranker = AutoModelForRanking.from_pretrained(rerank_model_name_or_path)
    reranker = LangchainReranker(model=ranker, top_n=3)
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=reranker, base_retriever=retriever
    )

    llm = LangchainLLM(model_name_or_path=llm_model_name_or_path)

    RESPONSE_TEMPLATE = """[INST]
    <>
    You are a helpful AI assistant. Use the following pieces of context to answer the user's question.<>
    Anything between the following `context` html blocks is retrieved from a knowledge base.

        {context}

    REMEMBER:
    - If you don't know the answer, just say that you don't know, don't try to make up an answer.
    - Let's take a deep breath and think step-by-step.

    Question: {question}[/INST]
    Helpful Answer:
    """

    PROMPT = PromptTemplate(template=RESPONSE_TEMPLATE, input_variables=["context", "question"])

    qa_chain = RetrievalQA.from_chain_type(
        llm,
        chain_type='stuff',
        retriever=compression_retriever,
        chain_type_kwargs={
            "verbose": True,
            "prompt": PROMPT,
        }
    )

    user_query = 'Introduce this'
    response = qa_chain({"query": user_query})
    print(response)


Integrated with LlamaIndex
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


2. RAG enhancement tricks
----------------------------------

* Multi-vector retrieval, sparse + dense
* Rerank
* Long contexts LLM
* Query rewrite, or multi-queries
* Hierarchy retrieval
* Multi-chunks
* Pretrain and finetune of embeddings and rerank weights
* Meta data of documents


Agentic RAG
---------------------


Graph RAG
-------------------

Use knowledge graph

* document processing
* Graph extraction
* Graph augmentation
* Community summarization

- https://github.com/microsoft/graphrag


pdf parse
--------------

There are some tools help parse the pdf file.

* PyPDF2
    - Good for English
    - Without bbox
* pdfplumber
    - Good for English and Chinese
    - Good for table parse
    - With bbox
* pdfminer
* Camelot
* pymupdf
* papermage
* llama_index parse
    - support table and figure


But if the file is a scanned pdf, we need to use the OCR.

* fitz
    - transfer pdf to image
* https://github.com/mittagessen/kraken
* ppocr


Layout
~~~~~~~~~~~~~~~~~

* https://github.com/LynnHaDo/Document-Layout-Analysis
* Layout-parser
* llama_index parse (support table and figure)
* ppsturcture
* unstructured


OCR
~~~~~~~~~~~~~~

.. code-block:: python