What is a Vector Database?
A vector DB stores high-dimensional tensors (arrays of floats) plus metadata. It is used in AI/ML operations for data storage and retrieval.
It works on the concept of similarity search instead of exact match.
SQL vs NoSQL vs Vector Databases
How is data stored in a Vector Database?
Let's say we have 3 log entries.
1. The Raw Data
Log_A: "John uploaded financial report to Gmail."
Log_B: "Sarah downloaded malicious payload from unknown domain."
Log_C: "Service account accessed internal S3 bucket."
2. The Transformation (Neural Net Forward Pass)
You pass these through an embedding model. Let's pretend the output is a
3-dimensional tensor (in practice, embeddings commonly have 768 or 1536 dimensions).
Log_A becomes → Tensor A: [0.95, 0.20, 0.05]
Log_B becomes → Tensor B: [0.10, 0.90, 0.85]
Log_C becomes → Tensor C: [0.80, 0.15, 0.10]
3. Storage inside the Vector DB
Inside the VDB, the data is not stored as rows in a table. With an HNSW
index, each vector is stored as a node in a graph.
The VDB draws edges between nearby vectors. Tensor A [0.95, 0.20, 0.05]
and Tensor C [0.80, 0.15, 0.10] are mathematically close (both are
dominated by the first dimension), so it makes them neighbors in the
graph. Tensor B [0.10, 0.90, 0.85] is far away, on the other side of the
graph.
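To check the "mathematically close" claim, here is a minimal sketch that scores the three toy tensors against each other. Cosine similarity is assumed as the metric here; a real VDB may be configured for dot product or Euclidean distance instead.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# The toy 3-dimensional tensors from the example above
tensor_a = [0.95, 0.20, 0.05]
tensor_b = [0.10, 0.90, 0.85]
tensor_c = [0.80, 0.15, 0.10]

sim_ac = cosine_similarity(tensor_a, tensor_c)
sim_ab = cosine_similarity(tensor_a, tensor_b)
sim_bc = cosine_similarity(tensor_b, tensor_c)

print(f"A vs C: {sim_ac:.3f}")  # close to 1, so A and C become neighbors
print(f"A vs B: {sim_ab:.3f}")  # much lower
print(f"B vs C: {sim_bc:.3f}")  # much lower
```

A and C score close to 1, while B scores below 0.3 against both, which is why B ends up far from them in the graph.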
# Query to INSERT data into the vector DB
# You compute the tensor first
vector = embedding_model.encode("John uploaded...")
# Then you call the API to insert a tuple of (id, vector, metadata)
index.upsert(
    vectors=[("chunk_42", vector, {"user": "John", "timestamp": "..."})]
)
The stored record looks like this:
{
    "id": "chunk_42",
    "vector": [0.95, 0.20, 0.05],   // The actual tensor
    "metadata": {                   // The payload (this IS like NoSQL/SQL)
        "source_log": "proxy_server_01",
        "timestamp": "2026-06-19T14:03:00Z",
        "user": "John",
        "action": "BLOCKED"
    },
    "text": "John uploaded financial report to Gmail."   // Original text for the LLM
}
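To make the record shape concrete, here is a minimal in-memory sketch. `ToyVectorStore` is a hypothetical class written for illustration, not the API of any real vector DB: it keeps records in a dict and builds no graph index.

```python
class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector DB index (no HNSW graph)."""

    def __init__(self, dimension):
        self.dimension = dimension
        self.records = {}  # id -> {"vector": ..., "metadata": ..., "text": ...}

    def upsert(self, record_id, vector, metadata, text):
        """Insert a new record, or overwrite the record with the same id."""
        if len(vector) != self.dimension:
            raise ValueError(
                f"expected {self.dimension} dimensions, got {len(vector)}"
            )
        self.records[record_id] = {
            "vector": list(vector),
            "metadata": dict(metadata),
            "text": text,
        }

store = ToyVectorStore(dimension=3)
store.upsert(
    "chunk_42",
    [0.95, 0.20, 0.05],
    {"source_log": "proxy_server_01", "user": "John", "action": "BLOCKED"},
    "John uploaded financial report to Gmail.",
)
# Upserting the same id again replaces the record instead of duplicating it
store.upsert(
    "chunk_42",
    [0.95, 0.20, 0.05],
    {"source_log": "proxy_server_01", "user": "John", "action": "ALLOWED"},
    "John uploaded financial report to Gmail.",
)
print(len(store.records), store.records["chunk_42"]["metadata"]["action"])
```

The "upsert" name reflects this update-or-insert behavior: the id is the key, and the vector, metadata, and text travel together as one record.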
Why VDBs batch insertions
SQL: When a SQL database receives an INSERT, it appends the row to disk and
updates a B-Tree (cheap).
VDB: When the VDB receives an upsert, it has to update the
graph index (HNSW). It inserts the new vector into the graph,
calculates its nearest neighbors, and draws new edges between them. This
is computationally heavier, which is why VDBs often batch insertions.
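On the client side, batching can be sketched as below. `upsert_fn` is a hypothetical stand-in for whatever batch-upsert call your client exposes, and the batch size of 100 is an arbitrary assumption, not a recommended value.

```python
def batched(records, batch_size):
    """Yield consecutive slices of `records`, each with at most `batch_size` items."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def upsert_in_batches(records, upsert_fn, batch_size=100):
    """Send records in groups so each index update covers many vectors at once."""
    for batch in batched(records, batch_size):
        upsert_fn(batch)  # hypothetical: one API call per batch

# Usage: collect the batches in a list instead of calling a real DB
calls = []
records = [(f"chunk_{i}", [0.0, 0.0, 0.0], {}) for i in range(250)]
upsert_in_batches(records, calls.append, batch_size=100)
print([len(batch) for batch in calls])  # [100, 100, 50]
```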
Why is a VDB needed even with SQL/NoSQL existing?
Imagine you have 1 million security logs in your SQL database.
A user asks: "Find logs that look semantically like this new threat:
'exfiltration via encrypted tunnel'."
Using SQL:
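SELECT * FROM logs WHERE message LIKE '%exfiltration%' OR message LIKE '%encrypted%';
(the table and column names are placeholders)
-> No results. Logs that describe the same threat in other words, such as "Data Leak" or "TLS bypass", do not contain the keywords, so exact matching misses them.
OR
Pull all 1 million tensors out of storage into RAM.
Compute a dot product between the query tensor and each of the 1 million tensors.
O(n) complexity. // Too slow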
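The brute-force path can be sketched as follows, using the three toy tensors from earlier. The query vector is a made-up value for illustration. Every stored vector is scored once, which is the O(n) cost.

```python
import heapq

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def brute_force_top_k(query, vectors, k):
    """Score every stored vector against the query (O(n)) and keep the k best."""
    scored = ((dot(query, vec), vec_id) for vec_id, vec in vectors.items())
    return heapq.nlargest(k, scored)  # list of (score, id), best first

vectors = {
    "Log_A": [0.95, 0.20, 0.05],
    "Log_B": [0.10, 0.90, 0.85],
    "Log_C": [0.80, 0.15, 0.10],
}
query = [0.90, 0.10, 0.05]  # hypothetical query tensor

top_2 = brute_force_top_k(query, vectors, k=2)
print([vec_id for _, vec_id in top_2])  # ['Log_A', 'Log_C']
```

With 3 vectors this is instant. With 1 million vectors of 768 or more dimensions, the same loop runs on every query.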
Using VDB:
- It has pre-built the HNSW graph.
- It starts at the graph's entry point and greedily hops across the graph
toward the query tensor.
- It only calculates ~100 dot products, roughly O(log n) complexity. It finds
the approximate Top-5 matches in ~10 milliseconds.