Embeddings

One painful issue when using embeddings to retrieve relevant documents is that:

  • the returned documents might vary with small changes in the query.
    • e.g. using different wording (“what does the course say about regression?”)
    • or using a different perspective (“what is being discussed in the course regarding regression?”), etc.

MultiQueryRetriever

MultiQueryRetriever tries to address this issue with the help of language models.

  • From the user’s query, it generates multiple similar queries, trying to cover various different perspectives.
  • For each of those generated queries, it gets a list of relevant documents.
  • It then combines those lists, keeping the unique documents.

Using the Default MultiQueryRetriever

Now let’s look at an example; it’s the one LangChain uses in its documentation.
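
The example assumes you already have texts (the document splits) and embeddings (an embedding model). A minimal setup might look like this, using the same blog post LangChain’s docs index (the loader, splitter, and chunk size here are illustrative choices):

    from langchain.document_loaders import WebBaseLoader
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    
    # Load a sample blog post and split it into chunks
    loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
    data = loader.load()
    
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
    texts = splitter.split_documents(data)
    
    embeddings = OpenAIEmbeddings()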

  • Normally we create a vectorstore from the text splits and the embedding model

    from langchain.vectorstores import Chroma
    vectordb = Chroma.from_documents(texts, embeddings)
    
  • Then, using that vectorstore, we create a retriever and call its get_relevant_documents() method

    retriever = vectordb.as_retriever()
    docs = retriever.get_relevant_documents("What are the approaches to Task Decomposition?")
    

  • Now the difference is, we create the retriever using the MultiQueryRetriever.from_llm() method, passing the LLM to be used for multiple-query generation.

    from langchain.chat_models import ChatOpenAI
    from langchain.retrievers.multi_query import MultiQueryRetriever
    
    llm = ChatOpenAI(temperature=0)
    
    multi_retriever = MultiQueryRetriever.from_llm(
      retriever=vectordb.as_retriever(), llm=llm
    )
    
  • Let’s use our multi_retriever to find relevant documents now.

    question = "What are the approaches to Task Decomposition?"
    
    unique_docs = multi_retriever.get_relevant_documents(query=question)
    

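If you want to see the alternative queries the LLM generated for you, turn on logging for the retriever’s module (set this up before calling the retriever), as LangChain’s own example does:

    import logging
    
    logging.basicConfig()
    logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)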

Custom MultiQueryRetriever

We can supply our own prompt and output parser.

It allows us to customize how we want the LLM to generate the multiple queries for our specific use case.

We’ll need to create an LLMChain with our prompt and output parser (assembled after the two pieces below).

  • Prompt

    from langchain.prompts import PromptTemplate
    
    QUERY_PROMPT = PromptTemplate(
      input_variables=["question"],
      template="""You are an AI language model assistant. Your task is to generate five 
      different versions of the given user question to retrieve relevant documents from a vector 
      database. By generating multiple perspectives on the user question, your goal is to help
      the user overcome some of the limitations of the distance-based similarity search. 
      Provide these alternative questions separated by newlines.
      Original question: {question}""",
    )
    
  • Output parser - extracts queries separated by \n

    from typing import List
    from pydantic import BaseModel, Field
    from langchain.output_parsers import PydanticOutputParser
    
    class LineList(BaseModel):
        lines: List[str] = Field(description="Lines of text")
    
    class LineListOutputParser(PydanticOutputParser):
        def __init__(self) -> None:
            super().__init__(pydantic_object=LineList)
    
        def parse(self, text: str) -> LineList:
            # Each line of the LLM's output becomes one query
            lines = text.strip().split("\n")
            return LineList(lines=lines)
    
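  • LLMChain - ties the prompt and the output parser together (a short sketch, reusing the llm from the default example above)

    from langchain.chains import LLMChain
    
    output_parser = LineListOutputParser()
    llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT, output_parser=output_parser)
    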
  • Next, to create the custom MultiQueryRetriever, unlike the default one where we passed the LLM, this time we need to give:

    • the LLMChain we just built
    • and the parser_key for the output parser

    retriever = MultiQueryRetriever(
      retriever=vectordb.as_retriever(), llm_chain=llm_chain, parser_key="lines"
    )  # "lines" is the key (attribute name) of the parsed output
    
  • Finally we call the get_relevant_documents() method of the retriever as usual, for example with the same question as before.
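
    question = "What are the approaches to Task Decomposition?"
    
    unique_docs = retriever.get_relevant_documents(query=question)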


Hopefully the MultiQueryRetriever will help you get a better set of relevant documents by capturing the context from various angles and perspectives.

But there’s another dilemma.

While chunking,

  • should we make smaller chunks to precisely reflect their meaning?
  • or larger chunks to cover their entire context?

Stay tuned for more.

I tweet about these topics and anything I’m exploring on a regular basis. Follow me on Twitter.