🗄️ Module 06: Vector Storage & Hybrid Databases

Welcome to Module 06. In this section, you will master the architecture of Agentic Long-Term Memory. You will move beyond simple similarity search to understand the mechanics of High-Dimensional Geometry, HNSW Indexing Physics, and Hybrid Retrieval patterns that combine relational logic with semantic intuition.

🏛️ 1. Architectural Deep Dive: The Physics of Vector Search

In standard relational databases, we index data using B-Trees ($O(\log N)$ lookup). In vector space, we are searching for proximity in a high-dimensional manifold (e.g., 1536 dimensions for Gemini embeddings).

The Recall vs. Latency Tradeoff

Exact Search: Computing the distance between a query vector and every vector in the database (Linear Scan). This provides 100% recall but its latency is $O(N \cdot D)$, where $N$ is the number of records and $D$ is the dimensionality.
Approximate Nearest Neighbor (ANN): Algorithms like HNSW build a graph-based "shortcut" index. This typically brings latency down to the low-millisecond range at the cost of a small drop in recall (the fraction of true nearest neighbors that the search returns).
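To make the $O(N \cdot D)$ cost of exact search concrete, here is a minimal brute-force sketch in plain Python. The function names and the tiny 2-dimensional dataset are illustrative assumptions, not part of the labs.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors: O(D) work."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_search(query, vectors, k=2):
    """Linear scan: score all N vectors, then keep the k closest. O(N * D)."""
    scored = [(l2_distance(query, v), i) for i, v in enumerate(vectors)]
    scored.sort()
    return [i for _, i in scored[:k]]

# Hypothetical toy data; real embeddings would have hundreds of dimensions.
vectors = [[0.0, 0.0], [1.0, 1.0], [0.9, 1.1], [5.0, 5.0]]
nearest = exact_search([1.0, 1.0], vectors, k=2)
print(nearest)  # indices of the two closest vectors
```

Every query touches every stored vector, which is why recall is perfect and why the cost grows linearly with the table.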

Index RAM Pressure

Vector indices (especially HNSW) are designed to live in RAM for maximum speed.

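The physical constraint: An HNSW index requires significantly more memory than a standard B-Tree index because it stores the full vectors plus graph links. For 1M vectors of 1536 dimensions, the float32 vectors alone occupy about 6 GB (1M × 1536 × 4 bytes), and with graph links and page overhead the index can approach 10 GB of RAM.
Disk I/O: If the index "spills" from RAM to Disk, each graph hop can become a random disk read, and search latency can degrade by orders of magnitude. This is why Module 00: Workspace Engineering (tuning swappiness) is critical for database stability.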
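The sizing arithmetic can be sketched as a back-of-the-envelope estimate. The function name, the 8 bytes per link, and the "2 × m links on the base layer" model are simplifying assumptions. Real indexes add page and tuple overhead, so treat the result as a lower bound.

```python
def hnsw_memory_estimate_gb(n_vectors, dims, m=16, bytes_per_float=4, bytes_per_link=8):
    """Rough lower-bound estimate of HNSW index size in GB (10**9 bytes)."""
    vector_bytes = n_vectors * dims * bytes_per_float
    # Assumption: about 2 * m links per node on the base layer. Upper layers
    # hold few nodes, so they are ignored here.
    link_bytes = n_vectors * 2 * m * bytes_per_link
    return (vector_bytes + link_bytes) / 1e9

estimate = hnsw_memory_estimate_gb(1_000_000, 1536)
print(f"{estimate:.2f} GB")
```

Comparing this number against the RAM available to Postgres before building the index is a quick way to spot a likely spill to disk.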

📊 2. Structured Tradeoff Matrix: Vector Storage Strategies

Database	Architecture	Metadata Support	Data Persistence	Primary Production Bottleneck
pgvector (Postgres)	Integrated Relational	Highest (SQL)	Disk-backed	HNSW RAM overhead on small instances.
Pinecone	Managed / Cloud-Native	Moderate	Hosted	Network Latency (Cloud API Round-trips).
Chroma	Embedded / Local	Low	In-Memory / Disk	Global lock contention on high concurrency.
Milvus	Distributed / Cluster	Moderate	Storage Tiered	Infrastructure Complexity (cluster deployments typically run on K8s).

🛠️ 3. Step-by-Step Mechanics Breakdown

Pattern: HNSW (Hierarchical Navigable Small World)

In Lab 1, we use USING hnsw. Unlike flat indices, HNSW builds a multi-layered graph.

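Probabilistic Layering: Like a skip list, each node is assigned a maximum layer at random, with higher layers exponentially less likely. The top layers therefore contain few "expressway" nodes for broad navigation.
Greedy Routing: The search starts at the top layer and "hops" to whichever neighbor is closest to the query vector. When no neighbor is closer, it drops down one layer and repeats until it reaches the base layer.
Rationale: This allows us to search 1,000,000 records while visiting only a small fraction of them (on the order of hundreds of nodes, depending on m and ef_search), which typically keeps latency under 10ms.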
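The two ideas above can be sketched in a few lines of Python. This is a toy illustration, not pgvector's implementation: the level formula follows the original HNSW design (floor of -ln(U) × mL, with mL = 1/ln(m)), and the single-layer graph with one long-range link is a hypothetical stand-in for the layered structure.

```python
import math
import random

def sample_level(m, rng):
    """Draw a node's top layer: floor(-ln(U) * mL) with mL = 1 / ln(m)."""
    u = 1.0 - rng.random()  # in (0, 1], avoids log(0)
    return int(-math.log(u) / math.log(m))

def greedy_search(graph, points, query, entry):
    """Hop to whichever neighbor is closest to the query until none improves."""
    current = entry
    visited = [current]
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current, visited
        current = best
        visited.append(current)

rng = random.Random(0)
levels = [sample_level(16, rng) for _ in range(10_000)]
share_on_base_only = levels.count(0) / len(levels)  # expected near 1 - 1/16

# Toy graph: points on a line, each linked to its immediate neighbors.
points = {i: (float(i),) for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
# One long-range "expressway" link, standing in for an upper-layer edge.
graph[0].append(5)
graph[5].append(0)

found, path = greedy_search(graph, points, query=(6.2,), entry=0)
print(found, path)
```

With the expressway link, the search reaches node 6 in two hops instead of six. Layering and long-range edges buy the same effect at scale.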

Pattern: Hybrid Filtering (SQL + Vector)

In Lab 2, we implement WHERE agent_role = :role AND ....

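The Mechanism: Depending on the planner, PostgreSQL either applies the relational filters (the WHERE clause) first and ranks only the surviving rows by exact distance, or traverses the HNSW index and filters the candidates it returns. In the second case, a highly selective filter can leave fewer rows than the LIMIT requested; pgvector 0.8.0 and later can keep scanning via iterative index scans (hnsw.iterative_scan).
Rationale: Agents rarely need "everything similar." They need "all billing logs from last Tuesday similar to this error." Hybrid search keeps the agent from pulling in context from unrelated tenants or outdated data.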
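The difference between filtering before and after the similarity ranking can be shown with an in-memory sketch. The records, the function names, and the 2-dimensional embeddings are hypothetical. Only the agent_role field name comes from the lab query.

```python
import math

records = [
    {"id": 1, "agent_role": "billing", "embedding": [1.0, 0.0]},
    {"id": 2, "agent_role": "support", "embedding": [0.9, 0.1]},
    {"id": 3, "agent_role": "support", "embedding": [0.8, 0.2]},
    {"id": 4, "agent_role": "billing", "embedding": [0.0, 1.0]},
]

def nearest(rows, query, k):
    """Return the k rows whose embeddings are closest to the query."""
    return sorted(rows, key=lambda r: math.dist(r["embedding"], query))[:k]

def pre_filter(rows, query, role, k):
    """Filter first, then rank the survivors: up to k matching rows."""
    return nearest([r for r in rows if r["agent_role"] == role], query, k)

def post_filter(rows, query, role, k):
    """Rank first, then filter the top k: may return fewer than k rows."""
    return [r for r in nearest(rows, query, k) if r["agent_role"] == role]

query = [0.88, 0.12]
pre_ids = [r["id"] for r in pre_filter(records, query, "billing", k=2)]
post_ids = [r["id"] for r in post_filter(records, query, "billing", k=2)]
print(pre_ids, post_ids)
```

Here the two nearest rows overall both belong to the support role, so filtering after ranking returns nothing for billing, while filtering first still returns two billing rows.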

🛡️ 4. Failure Mode Analysis: Vector Breaking Points

Failure Mode	Error/Log Signature	Root Cause	Code-Level Mitigation
Dimensionality Mismatch	`ERROR: different vector dimensions 1536 and 768`	You are querying a Gemini index with a BERT or local model embedding.	Enforce embedding model versioning in the `metadata` column.
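Index Bloat	`ERROR: out of memory (HNSW)`	The graph outgrew available memory: `maintenance_work_mem` during the build, or `shared_buffers` and the OS page cache at query time.	Raise `maintenance_work_mem` for builds and `shared_buffers` for queries, or switch to `IVFFlat` (worse speed-recall tradeoff, less RAM).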
Stale Memory Drift	Agent repeats old, outdated facts.	Old records have high similarity but low temporal relevance.	Implement Temporal Decay math (Lab 3) to penalize old records.
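Index Non-Hit	Search takes 5s+ for small dataset.	The index wasn't created or the planner chose a `Sequential Scan`.	Use `EXPLAIN ANALYZE`. As a diagnostic only, run `SET LOCAL enable_seqscan = off;` inside a transaction to check whether the index is usable, and confirm the query uses `ORDER BY ... LIMIT` with the distance operator the index was built for.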
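For the Stale Memory Drift row, one common way to penalize old records is an exponential half-life applied to the similarity score. The sketch below assumes that form. The formula used in Lab 3 may differ, and the function name, the 30-day half-life, and the sample records are hypothetical.

```python
def decayed_score(similarity, age_days, half_life_days=30.0):
    """Scale a similarity score by 0.5 ** (age / half_life)."""
    return similarity * 0.5 ** (age_days / half_life_days)

# Hypothetical candidates: the older record is slightly more similar.
candidates = [
    {"fact": "old pricing", "similarity": 0.95, "age_days": 180},
    {"fact": "current pricing", "similarity": 0.90, "age_days": 2},
]
ranked = sorted(
    candidates,
    key=lambda c: decayed_score(c["similarity"], c["age_days"]),
    reverse=True,
)
print([c["fact"] for c in ranked])
```

The half-life is a tuning knob: a short one favors fresh records aggressively, while a long one behaves almost like plain similarity ranking.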

🧪 5. Runtime Verification: What to Observe

When executing the labs, monitor these signals:

The Planner Audit: In psql, run EXPLAIN ANALYZE on your query.
- Observation: Look for Index Scan using ... on agent_longterm_memories. If you see Seq Scan, your index is not being used, likely due to small table size or misconfiguration.
RAM Growth: Use watch -n 1 free -h during index creation (CREATE INDEX).
- Observation: You should see "Available" memory drop as Postgres builds the HNSW graph in memory (the build is bounded by maintenance_work_mem) and index pages enter the cache.
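Distance Scaling: Intentionally query with a vector of all zeros [0,0,0...].
- Observation: Note the distance value. A zero vector has no direction, so Cosine Distance (<=>) is undefined for it and pgvector returns NaN rather than a meaningful score. For non-zero vectors, <=> ranges from 0.0 to 2.0: values closer to 0.0 indicate high semantic overlap, 1.0 means orthogonal (unrelated), and 2.0 means opposite direction.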
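The scale of cosine distance is easy to check by hand. This is a plain-Python sketch of the formula 1 - cos(θ), not pgvector's code. Returning NaN for a zero-length vector is a choice made here because the angle is undefined in that case.

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); NaN when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return math.nan  # direction is undefined for a zero vector
    return 1.0 - dot / norm

same = cosine_distance([1.0, 0.0], [2.0, 0.0])       # same direction
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])  # unrelated
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])   # opposite direction
zero = cosine_distance([0.0, 0.0], [1.0, 0.0])        # undefined
print(same, orthogonal, opposite, zero)
```

Vector length does not affect the result, only direction, which is why the first pair scores as identical despite different magnitudes.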

Next Step: Proceed to Module 07: Advanced RAG to learn how to prepare documents for this high-performance storage layer.
