RAG / Retrieval-Augmented Generation

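Generally, RAG (Retrieval-Augmented Generation) is a technique in which an LLM (Large Language Model) is given data fetched from external sources (e.g., vector database, SQL DB, graph DB, web search engines) as part of its prompt before it generates an answer.
Purpose of RAG? To give the LLM more relevant context, so it answers from that data instead of relying only on what it learned during training.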
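What is a vector? A list of numbers that represents a piece of text mathematically. Texts with similar meaning get vectors that lie close together.
What is an Embedding Model? A neural network that converts text into vectors (e.g., OpenAI's text-embedding-3-small).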
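A vector is just a list of numbers, and retrieval compares vectors by how closely they point in the same direction. The sketch below computes cosine similarity, a common score for text embeddings, on three hand-made 3-dimensional vectors. The vectors are assumptions chosen only for illustration; a real embedding model such as text-embedding-3-small produces much longer vectors (1536 dimensions by default).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" (hand-made, NOT produced by a real model)
vpn_failure = [0.9, 0.1, 0.0]
vpn_lockout = [0.8, 0.2, 0.1]
firewall_deny = [0.0, 0.1, 0.95]

print(round(cosine_similarity(vpn_failure, vpn_lockout), 3))
print(round(cosine_similarity(vpn_failure, firewall_deny), 3))
```

The two VPN-like vectors score about 0.98, while the VPN/firewall pair scores about 0.01. This ranking by similarity is what lets a vector DB return the most relevant chunks first.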

RAG Pipeline


1. [Retrieval Phase] Documents are chunked, embedded, and stored in a vector DB
             |-------------------- A. Retrieval Phase (Offline) --------------|
             |                                                                |
Raw          |  |--- Chunker ---|                                             |
documents->  |- | break docs in |--chunks->[Embedding]-vectors--> [vectorDB]  |
logs         |  |smaller pieces |          [  Model  ]                 |      |
             |  |---------------|                                      \/     |
             |                                                       index    |
             |----------------------------------------------------------------|
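The Chunker box in the diagram above can be sketched as a fixed-size character splitter with overlap, so that text cut at a chunk boundary still appears whole in a neighboring chunk. This is an assumption for illustration: the `chunk_text` function and its `chunk_size`/`overlap` defaults are hypothetical. LlamaIndex's default splitter (`SentenceSplitter`) respects sentence boundaries and measures chunk size in tokens rather than characters.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


doc = ("2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4\n"
       "2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK")
for i, chunk in enumerate(chunk_text(doc)):
    print(i, repr(chunk))
```

Overlap trades some duplicated storage for a lower risk of splitting one fact across two chunks that are never retrieved together.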

Example vector search result (top-3 most similar nodes, with similarity scores):
             [Node1 (score 0.92), Node2 (score 0.87), Node3 (score 0.76)]
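The scored node list above is what a vector search returns: every stored chunk is compared with the query vector, and the top-k highest scores are kept. Below is a minimal in-memory sketch. It assumes hand-made 2-dimensional vectors in place of real embeddings and uses cosine similarity as the score. The `TinyVectorStore` class is hypothetical and is not the LlamaIndex API.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory vector store: keeps (text, vector) pairs, ranks them by cosine."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    @staticmethod
    def _cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Score every stored chunk against the query and return the top_k, best first."""
        scored = [(text, self._cosine(query_vector, vec)) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]


# Hand-made 2-d vectors (assumption): dimension 0 ~ "firewall", dimension 1 ~ "VPN"
store = TinyVectorStore()
store.add("FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK", [0.9, 0.2])
store.add("FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK", [0.85, 0.3])
store.add("VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4", [0.1, 0.95])

query_vector = [1.0, 0.1]  # pretend embedding of "Show firewall policies blocking outbound traffic"
results = store.search(query_vector, top_k=2)
for text, score in results:
    print(f"{score:.2f}  {text}")
```

With these toy vectors, the two firewall chunks score about 0.99 and 0.97, and the VPN chunk scores about 0.20, so `top_k=2` keeps only the firewall chunks. Real vector DBs use approximate nearest-neighbor indexes instead of scoring every item, but the ranking idea is the same.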

2. [Augmentation Phase] User asks a query; the most relevant chunks (nodes) are retrieved from the vector DB and combined with it
User's Query: Show firewall policies blocking outbound traffic

        Nodes retrieved from vectorDB (Step 1)
          |
          \/
        |------ B. Augmentation Phase -------|
User    | Nodes + User's Query = Prompt      |
query-> |                                    |
        | Combine retrieved nodes & query    |
        | into a well-crafted prompt         |-> augmented_prompt
        |------------------------------------|

augmented_prompt=
"Context: 
[Node1][Node2][Node3] 
Question: Show firewall policies blocking outbound traffic
Answer:"
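Building augmented_prompt is plain string assembly. The minimal sketch below reproduces the Context/Question/Answer layout shown above, with one bracketed chunk per line. The `build_augmented_prompt` helper is hypothetical; frameworks such as LlamaIndex use their own, longer prompt templates.

```python
def build_augmented_prompt(nodes: list[str], question: str) -> str:
    """Join retrieved chunks into a Context block, then append the user's question."""
    context = "\n".join(f"[{text}]" for text in nodes)
    return f"Context:\n{context}\nQuestion: {question}\nAnswer:"


# Chunks assumed to have been returned by the vector search
retrieved = [
    "2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK",
    "2025-05-01 FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK",
]
augmented_prompt = build_augmented_prompt(
    retrieved, "Show firewall policies blocking outbound traffic")
print(augmented_prompt)
```

Ending the prompt with "Answer:" invites the LLM to continue with the answer itself rather than restating the question.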

3. [Generation Phase] Feed augmented_prompt into LLM.
With the user query plus the retrieved context (the chunk text, not the raw vectors), the LLM hallucinates much less

                     |-- LLM --|
augmented_prompt --> | GPT-5   | --> Response (fewer hallucinations)
                     |---------|
      
1. The Retrieval Phase:
  Chunking: Raw documents are broken down into smaller, readable pieces.
  Embedding: These text chunks are converted into mathematical
representations (vectors) using an embedding model.
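  Storage: The vectors, together with their chunk text, are stored in a vector DB index.
  Vector Search: When a user asks a question, the query is embedded with the same
model, and the system searches the vector DB for the chunks whose vectors are most similar (top-k).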

2. The Augmentation Phase:
  Once the relevant information is retrieved, it isn't just displayed. It is packaged.
  The system takes the user's original query and the retrieved
text chunks, combining them into a specifically crafted prompt (augmented_prompt).

3. The Generation Phase:
  This combined prompt (the user's query + the retrieved
context) is fed into the LLM.
  This steers the model to synthesize an answer from the provided
external data rather than only from its training data, which reduces hallucinations.

RAG Pipeline Code

User queries from security logs

We have log files (e.g., VPN, firewall).
The RAG pipeline reads the log files and answers the administrator's questions.


./logs/vpn.log
2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4
2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4

./logs/firewall.log
2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK
2025-05-01 FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK
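To see why the firewall question pulls back the firewall lines, the sketch below treats each log line above as one chunk. It replaces the neural embedding model with a much cruder, hypothetical bag-of-words "embedding" that counts lowercase alphabetic words. This is a stand-in for illustration only. text-embedding-3-small captures meaning, not just shared words.

```python
import math
import re
from collections import Counter

LOG_LINES = [  # the sample lines from ./logs/vpn.log and ./logs/firewall.log
    "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
    "2025-05-01 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4",
    "2025-05-01 FIREWALL_DENY src=10.1.1.5 dst=8.8.8.8 policy=OUTBOUND_BLOCK",
    "2025-05-01 FIREWALL_DENY src=10.1.1.6 dst=1.1.1.1 policy=OUTBOUND_BLOCK",
]


def bow_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: counts of lowercase alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)  # missing words count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[tuple[str, float]]:
    """Embed the query, score every chunk, and keep the top_k highest-scoring chunks."""
    q = bow_embed(query)
    scored = [(chunk, cosine(q, bow_embed(chunk))) for chunk in chunks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


for chunk, score in retrieve("Show firewall policies blocking outbound traffic", LOG_LINES):
    print(f"{score:.2f}  {chunk}")
```

Both firewall lines score about 0.31 (they share "firewall" and "outbound" with the question), and the VPN lines score 0, so the two OUTBOUND_BLOCK lines are retrieved. Note that "policies" in the question does not match "policy" in the log. A real embedding model places such related words close together, which is why RAG uses one instead of word matching.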

$ cat rag_pipeline.py
import os
import dotenv
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI  # LLM client

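# Load GITHUB_TOKEN from .env and use it as the OpenAI API key;
# the base URL points the OpenAI clients at GitHub Models' OpenAI-compatible endpoint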
dotenv.load_dotenv()
if not os.getenv("GITHUB_TOKEN"):
    raise ValueError("GITHUB_TOKEN is not set")
os.environ["OPENAI_API_KEY"] = os.getenv("GITHUB_TOKEN")
os.environ["OPENAI_BASE_URL"] = "https://models.inference.ai.azure.com/"

############## 1. Retrieval Phase Start #################
## A. Set up the embedding model (a neural network that turns text chunks into vectors)
embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)
Settings.embed_model = embed_model

## B. Load the raw log files from ./logs as Documents (chunking happens in step C)
documents = SimpleDirectoryReader("./logs").load_data()

## C. Chunk the documents, embed the chunks with the embedding model,
# and store the vectors in the default in-memory vector store.
# Simplified pseudocode of what from_documents() does internally:
# def from_documents(documents, insert_batch_size=150):
#   embed_model = Settings.embed_model  # global embed_model set in step A
#   nodes = self._chunk_documents(documents)  # chunk the documents into Nodes
#   for batch in batches(nodes, batch_size=insert_batch_size):
#       texts = [node.text for node in batch]
#       embeddings = embed_model.get_text_embedding_batch(texts)
#       self._vector_store.add(embeddings, metadata=[node.metadata for node in batch])  # store vectors in the vector DB
index = VectorStoreIndex.from_documents(documents, insert_batch_size=150)
############## Retrieval Phase End #####################

# Create LLM
llm = OpenAI(
    model="gpt-4o-mini",
    api_key=os.getenv("OPENAI_API_KEY"),
    api_base=os.getenv("OPENAI_BASE_URL"),
)

############## 2,3. Augmentation & Generation Phase Start #################
# Simplified pseudocode of what query_engine.query() does internally:
# def query(query_string: str):
#    # ----- Augmentation Phase: create augmented_prompt -----
#    query_vector = Settings.embed_model.get_text_embedding(query_string)
#    top_k_nodes = self._vector_store.similarity_search(
#        query_vector,
#        similarity_top_k=3,
#    )
#    # top_k_nodes now contains the 3 most relevant text chunks (Nodes)
#    # e.g., Node 1: "2025-05-01 08:22:47 VPN_LOGIN_FAILED user=eve.hacker ip=203.0.113.7"
#    #       Node 2: "2025-05-01 08:30:55 VPN_LOGIN_SUCCESS user=carol.white ip=192.168.1.52"
#    #       Node 3: "2025-05-01 09:01:08 VPN_LOGIN_FAILED user=john.doe ip=185.22.11.4"
#    vectordb_text = "[Node1][Node2][Node3]"  # text of the retrieved nodes
#    augmented_prompt = (
#        f"Context:\n{vectordb_text}\n"
#        f"Question: {query_string}\n"  # e.g., failed vpn logins for 2 hours after 2025-05-01 09:01:08
#        "Answer:"
#    )
#    # ----- Generation Phase -----
#    return llm.complete(augmented_prompt)


query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,  # retrieve the 3 most similar chunks (LlamaIndex's default is 2)
)
response = query_engine.query("Show firewall policies blocking outbound traffic")
print(response)
# Output:
# The firewall policies blocking outbound traffic are as follows:
#
# 1. Policy: OUTBOUND_BLOCK
#    - Source: 10.1.1.5
#    - Destination: 8.8.8.8
#
# 2. Policy: OUTBOUND_BLOCK
#    - Source: 10.1.1.6
#    - Destination: 1.1.1.1

response = query_engine.query("Why is john.doe unable to connect to VPN?")
print(response)
# Output:
# john.doe is unable to connect to the VPN due to repeated login failures, as
# indicated by the log entries showing two instances of VPN_LOGIN_FAILED for the user.
############## Augmentation & Generation Phase End #################