AI: Intro to RAG
RAG (Retrieval-Augmented Generation) is an important component and enabler of efficient Gen AI solutions today. Foundation Models are essentially pre-trained Deep Neural Networks. They can act as the thinking brain, but they have some significant disadvantages. One is "hallucinations", whereby they generate confident but false information; the second is "knowledge cutoffs", a lack of awareness of events or data after their training period. In this article, we will look into:
What is RAG? Why do we need it? What advantages does it offer? What does a typical RAG process look like?
What is RAG?
RAG addresses both of these issues by allowing the FM (the brain) to ground whatever it retrieves and generates in organisational documents. For example, if we are building a chatbot for a hotel today based on our hotel policies, the model itself would not know about the hotel or its policies. Building a RAG layer on top of it, with documents describing our latest hotel policies, ensures that the answers the model generates are grounded in the documents we provide.
Why do we need it? What advantages does it offer?
RAG for Foundation Models:
1. Reducing Hallucinations: By grounding responses in retrieved documents, RAG ensures the model "looks at the book" before answering, minimizing fabricated claims.
2. Extending Knowledge Scope: RAG bypasses the limitations of training data cutoffs, allowing models to discuss today’s news or recent internal reports.
3. Cost Efficiency: Fine-tuning a model is computationally expensive and time-consuming. RAG provides a cheaper alternative by simply updating the database the model queries.
RAG For Organisations
1. Proprietary Intelligence: Companies can feed their private manuals, HR policies, or codebase into a RAG system without exposing that data to the public training sets of model providers.
2. Dynamic Updates: When information changes, you don't need to retrain the model; you simply update the document in your vector store.
What does a simple RAG pipeline look like?
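At a high level, a simple pipeline embeds the user's question, retrieves the most relevant chunks from a vector store, and stuffs them into the prompt before calling the model. A minimal sketch of that flow, where `embed`, `vector_store`, and `llm` are hypothetical placeholders (not a specific library's API), with a toy store wired in so the flow is runnable:

```python
def answer(question: str, vector_store, embed, llm, k: int = 3) -> str:
    # 1. Embed the user's question into a vector.
    query_vector = embed(question)
    # 2. Retrieve the k most similar chunks from the vector store.
    chunks = vector_store.search(query_vector, k=k)
    # 3. Build a prompt that grounds the model in those chunks.
    context = "\n\n".join(chunks)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # 4. Ask the model; the retrieved context constrains its answer.
    return llm(prompt)


# Toy stand-ins so the sketch runs end to end.
class ToyStore:
    def __init__(self, chunks):
        self.chunks = chunks

    def search(self, query_vector, k):
        # A real store would rank by vector similarity; see below.
        return self.chunks[:k]


store = ToyStore(["Check-out time is 11 AM.", "Pets are allowed in garden rooms."])
result = answer(
    "When is check-out?",
    store,
    embed=lambda text: [0.0],  # placeholder embedding
    llm=lambda prompt: prompt,  # echo the prompt instead of calling a model
)
```

The echo `llm` makes the grounding visible: the returned prompt contains the retrieved policy text ahead of the question, which is exactly what a real model would condition on.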
Deep Dive: Key Components & Nuances
1. Text Splitting & Chunking
Raw documents are often too large for an LLM's context window. Chunking involves breaking text into smaller, digestible pieces.
Chunk Size: Typically ranges between 256 and 1024 tokens. While larger chunks provide more context, they can introduce "noise." Smaller chunks are more precise but may lose the broader meaning of a paragraph.
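A minimal chunker can be written as a sliding window with overlap, so that a sentence cut at a boundary still appears intact in the neighbouring chunk. This sketch counts characters for simplicity; production splitters typically count tokens and respect sentence or paragraph boundaries:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows with overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Step forward by less than a full chunk so windows overlap.
        start += chunk_size - overlap
    return chunks


doc = "word " * 100  # 500 characters of filler text
pieces = chunk_text(doc, chunk_size=200, overlap=50)
```

With a 500-character document this yields four chunks, and the last 50 characters of each chunk reappear at the start of the next one.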
2. Embedding Models
Embeddings are the "mathematical DNA" of text. An embedding model converts text into a dense numerical vector.
Semantic Meaning: Unlike keyword searches, embeddings capture intent. For example, a search for "feline" will successfully retrieve chunks containing the word "cat."
Dimensions: Models like OpenAI’s text-embedding-3-small use 1536 dimensions. Higher dimensionality allows for more nuanced understanding but requires more storage and compute.
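Once text is embedded, "relevance" becomes a geometric question: chunks whose vectors point in a similar direction to the query vector are semantically related, typically measured with cosine similarity. A toy sketch with hand-made 3-dimensional vectors (a real model produces hundreds or thousands of dimensions, and would itself place "feline" near "cat"):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hand-made illustrative vectors, not real model output.
feline = [0.9, 0.1, 0.2]
cat = [0.8, 0.2, 0.1]
car = [0.1, 0.9, 0.7]

feline_cat = cosine_similarity(feline, cat)
feline_car = cosine_similarity(feline, car)
```

Because `feline` and `cat` point in nearly the same direction, their similarity is far higher than `feline` versus `car`, which is how a search for "feline" retrieves chunks containing "cat".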
3. The Vector Store (Database)
Standard SQL databases are not optimized for searching high-dimensional vectors. Vector stores like Pinecone, Weaviate, Milvus, and ChromaDB are designed for this specific task. They use specialized indexing (like HNSW) to find the "nearest neighbors" to a query vector in milliseconds, even across millions of records.
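Under the hood, a vector store answers one question: which stored vectors are closest to the query vector? A brute-force version is only a few lines; indexes like HNSW exist precisely to replace this O(n) linear scan with a fast approximate search. A toy sketch with illustrative 2-dimensional vectors and payloads:

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


def nearest_neighbors(query, records, k=2):
    """Brute-force k-NN: score every stored vector, keep the top k.

    records is a list of (vector, payload) pairs. This scans all records;
    an HNSW index avoids that, trading exactness for speed at scale.
    """
    scored = sorted(records, key=lambda r: cosine(query, r[0]), reverse=True)
    return [payload for _, payload in scored[:k]]


records = [
    ([1.0, 0.0], "chunk about refunds"),
    ([0.9, 0.1], "chunk about cancellations"),
    ([0.0, 1.0], "chunk about parking"),
]
top = nearest_neighbors([1.0, 0.05], records, k=2)
```

The query vector leans heavily toward the first axis, so the two refund-related chunks are returned and the unrelated "parking" chunk is filtered out.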
Conclusion:
RAG represents the bridge between the creative potential of Large Language Models and the rigorous requirements of real-world data accuracy. By meticulously tuning each stage of the pipeline—from the way we split text to the way we rank search results—developers can build AI systems that are not only intelligent but also reliable, transparent, and contextually aware.