RAG: A way to extend the reach of LLMs

May 17, 2024

RAG (Retrieval Augmented Generation) systems process, index, and search texts from a database, allowing an LLM (Large Language Model) to access information that was not part of its training data. In this way, we can expand the knowledge the LLM works with, whether to incorporate more recent information, information from a very specific domain, or private data belonging to a company or user.

RAG systems consist of several components, each tied to a stage of the overall process. Below, we describe that process broadly and break down each component.

Data Sources

First, we need to define what type of data source the RAG will use: it can be anything from a text file to a video, including PDFs, audio files, tables, or an SQL database. With few exceptions, processing the source file should always end in plain text. Ideally, in each interaction with the LLM, we would pass the entire resulting text as context. However, for now (March 2024), this is not efficient, so we need to cut the text into smaller pieces and pass only the relevant parts to the LLM to resolve the user's interaction.
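As an illustration, here is a minimal sketch of this step for a PDF source, using the pypdf library. The file name is hypothetical, and other formats (audio, video, tables) would each need their own extraction tools.

```python
from pypdf import PdfReader  # pip install pypdf

def pdf_to_text(path: str) -> str:
    """Extract plain text from every page of a PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

# "company_report.pdf" is a hypothetical file name, for illustration only.
raw_text = pdf_to_text("company_report.pdf")
```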

Chunking

The first step is to divide the text into chunks. We need to define not only the size of these chunks (how many words they include) but also an overlap, that is, a stretch of text shared between one chunk and the next, so that context is not lost at chunk boundaries. Once we have the overlapping chunks, we transform them into embeddings: numerical vectors that represent the meaning of each chunk.
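A minimal sketch of word-based chunking with overlap, followed by embedding with the sentence-transformers library. The chunk size, overlap, and model name are arbitrary example choices, not recommendations.

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    words = text.split()
    step = chunk_size - overlap  # advance fewer words than the chunk size to create overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

chunks = chunk_text(raw_text)  # raw_text comes from the previous sketch

# The model name is just a small, widely used example; any embedding model works.
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks)  # one vector per chunk, as a NumPy array
```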

Database Storage

Once we have the numerical vectors representing the chunks of text, we store them in a vector database. These databases are optimized for efficient similarity search. What remains is to define the search mechanism: which metric or algorithm we will use to find the texts most relevant to the user's query, and how many results to return.
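In production this is delegated to a vector database (Chroma, FAISS, pgvector, and the like), but the underlying idea can be sketched with plain NumPy, assuming cosine similarity as the metric:

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Return the indices of the k rows of `index` most similar to `query_vec`."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query_vec / np.linalg.norm(query_vec)
    scores = index_norm @ query_norm      # cosine similarity of each chunk with the query
    return np.argsort(scores)[::-1][:k]   # indices of the k highest scores
```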

Retrieval of Relevant Chunks

Each time a user interaction requires a database query, we execute this search mechanism to retrieve the most relevant results. These fragments are then passed as context to the LLM, which uses them to answer the query or resolve the interaction.
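Continuing the earlier sketches (model, chunks, embeddings, and cosine_top_k are defined above), the retrieval step might look like this; llm_call is a placeholder for whatever chat-completion API is actually used:

```python
import numpy as np

def llm_call(prompt: str) -> str:
    """Placeholder: replace with a call to a real chat-completion API."""
    raise NotImplementedError

def answer(question: str, k: int = 3) -> str:
    """Embed the question, retrieve the k most relevant chunks, and pass them to the LLM."""
    query_vec = model.encode([question])[0]
    top = cosine_top_k(query_vec, np.asarray(embeddings), k=k)
    context = "\n\n".join(chunks[i] for i in top)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm_call(prompt)
```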

RAG in production

So far, we have seen how a RAG system works in outline, but it does not end there. Like any system, when we bring it into production we must consider additional elements: how often new files need to be processed, whether old files need to be deleted, and what evaluation mechanisms and metrics will tell us whether we need to adjust the system's parameters (chunk size, overlap, number of fragments returned per search, similarity metric) or differentiate them by conversation topic. All this without even considering the specifics of putting the LLM itself into production: limitations, restrictions, security, privacy, etc.
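One pragmatic way to keep these parameters visible and adjustable is to group them in a single configuration object. The following sketch uses hypothetical values and topic names, purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class RagConfig:
    """Tunable parameters worth tracking once a RAG system is in production."""
    chunk_size: int = 200        # words per chunk
    overlap: int = 40            # words shared between consecutive chunks
    top_k: int = 3               # fragments returned per search
    metric: str = "cosine"       # similarity metric used by the vector store
    reindex_every_days: int = 7  # how often new files are (re)processed

# Hypothetical per-topic configurations, as suggested in the text.
configs = {
    "legal": RagConfig(chunk_size=400, top_k=5),
    "faq": RagConfig(chunk_size=100),
}
```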

If you are interested in learning more about the different steps of a RAG system, you can continue reading in this note.
