What Exactly Is a Vector Database?
This blog explains what a vector database is in simple, relational-database-friendly terms, showing how it stores embeddings (vectors) to enable semantic “find similar” search instead of exact matches. It compares vector databases with traditional relational databases, highlights when each is useful, and shows how vector search plus LLMs power modern features like “ChatGPT over your documents” without replacing your existing SQL-based systems.
6/19/2023 · 4 min read


What exactly is a vector database, and how is it different from the relational databases you already know?
If you’ve worked with relational databases like MySQL, PostgreSQL, or SQL Server, you’re used to storing data in tables and querying it with SQL:
SELECT * FROM customers WHERE email = 'alice@example.com';
That world is great when you know exactly what you’re looking for: a specific ID, an exact email, a range of dates, a status flag.
But what if your question is fuzzier?
“Show me documents similar to this one.”
“Find products that are like this product, even if the keywords are different.”
“Given this user query, find the most related help articles.”
This is where vector databases come in.
From Rows and Columns to Vectors and Similarity
A vector database stores data in the form of vectors—lists of numbers that represent meaning.
Example of a vector (shortened):
[0.12, -0.34, 0.89, 0.05, ...]
Where do these vectors come from?
From embedding models (often powered by machine learning). These models take something like:
A sentence
A document
A product description
An image
…and convert it into a vector. The key idea:
Things that are similar in meaning end up with vectors that are close together in this high-dimensional space.
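To make "close together" a bit more concrete, here is a minimal sketch in plain Python. The four-dimensional vectors are toy numbers invented for illustration; a real embedding model produces hundreds or thousands of dimensions, but the idea is the same: measure closeness, for example with cosine similarity.

```python
import math

def cosine_similarity(a, b):
    # Close to 1.0 = pointing the same way (similar meaning);
    # near 0 or negative = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (invented numbers, for illustration only).
refund_question = [0.12, -0.34, 0.89, 0.05]
refund_policy   = [0.10, -0.30, 0.85, 0.07]   # similar meaning -> nearby vector
cookie_recipe   = [-0.70, 0.60, -0.10, 0.40]  # unrelated meaning -> far away

print(cosine_similarity(refund_question, refund_policy))  # high (close to 1.0)
print(cosine_similarity(refund_question, cookie_recipe))  # much lower
```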
So instead of asking,
“Find rows where title LIKE '%refund%',”
you can ask,
“Given this question about refunds, find the most similar texts in my database, even if they use different words.”
How Vector Databases Work (At a High Level)
You can think of the workflow like this:
1. You create embeddings (vectors)
For each document, FAQ, product, etc., you run it through an embedding model.
You get a vector: a long list of numbers.
2. You store these vectors in a vector database
Each record is something like:
id
metadata (title, tags, URL, etc.)
vector (the embedding)
3. You query by similarity, not exact match
When a user asks a question, you create a vector for that query.
The vector DB finds the nearest neighbors: the stored vectors closest to the query vector.
These are your “most similar” documents.
This type of search is often called semantic search: it’s based on meaning, not just matching characters or keywords.
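To tie the three steps together, here is a deliberately simplified sketch of that workflow. The records mirror the shape described above (id, metadata, vector), the vectors are toy numbers standing in for real embeddings, and the search is a brute-force scan; a production vector database would use an approximate nearest-neighbor index (for example HNSW) to do this efficiently at scale.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Step 2: records stored in the "vector database" -- each one has an id,
# some metadata, and a vector (toy 4-dimensional numbers here).
records = [
    {"id": 1, "metadata": {"title": "Refund policy"},         "vector": [0.11, -0.32, 0.88, 0.06]},
    {"id": 2, "metadata": {"title": "Pausing a subscription"}, "vector": [0.20, -0.25, 0.70, 0.10]},
    {"id": 3, "metadata": {"title": "Team holiday party"},     "vector": [-0.65, 0.55, -0.05, 0.45]},
]

# Step 3: the user's question, already turned into a vector by the same
# embedding model (again, toy numbers standing in for that step).
query_vector = [0.12, -0.34, 0.89, 0.05]

# Brute-force nearest-neighbor search: score every record, keep the top k.
top_k = sorted(
    records,
    key=lambda r: cosine_similarity(query_vector, r["vector"]),
    reverse=True,
)[:2]

for r in top_k:
    print(r["id"], r["metadata"]["title"])
# -> the refund/subscription documents rank above the unrelated one
```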
How Is This Different From a Relational Database?
Let’s compare them in simple terms.
1. Data Model
Relational DB:
Tables, rows, columns
Strongly structured schema (e.g., customers(id, name, email, created_at))
Vector DB:
Records with vectors + metadata
Focus on the vector field for similarity search
Metadata can still be stored in a structured way, but the star of the show is the vector
You can think of a vector DB more like a specialized index for similarity, with some database features around it.
2. Query Type
Relational DB:
Exact matches: WHERE id = 123
Range filters: WHERE created_at > '2023-01-01'
Joins between tables
Aggregations: COUNT, SUM, GROUP BY
Vector DB:
Nearest neighbor search:
“Find me the top 10 records whose vectors are closest to this query vector.”
Often combined with filters on metadata:
“Find documents similar to this query AND where category = 'support'.”
You don’t usually do joins or complex aggregations in a pure vector DB. It’s mainly about “similarity search + filters”.
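A sketch of that "similarity search + filters" combination, with the same toy-vector caveats and an invented category field rather than any particular vendor's query API: apply the metadata filter first, then rank whatever survives by similarity.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

records = [
    {"id": 1, "category": "support",   "title": "How refunds work",     "vector": [0.11, -0.32, 0.88]},
    {"id": 2, "category": "marketing", "title": "Refund season promo",  "vector": [0.10, -0.30, 0.85]},
    {"id": 3, "category": "support",   "title": "Resetting a password", "vector": [-0.60, 0.50, 0.10]},
]

query_vector = [0.12, -0.34, 0.89]

# "similar to this query AND category = 'support'":
# filter on metadata first, then rank the survivors by closeness.
candidates = [r for r in records if r["category"] == "support"]
results = sorted(candidates,
                 key=lambda r: cosine_similarity(query_vector, r["vector"]),
                 reverse=True)

print([r["title"] for r in results])  # refund doc first, password doc last
```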
3. What Problems They’re Good At
Relational DB:
Transactional systems (orders, users, inventory)
Reporting and analytics on structured data
Anything where correctness and relationships between rows are critical
Vector DB:
Semantic search (“find similar content”)
Recommendation systems (“people who liked this also like…”)
Question answering over documents (when used with LLMs)
Deduplication or clustering of similar items
In many real systems, you actually use both:
Relational DB for your core app data.
Vector DB for search and “intelligent” retrieval.
Where LLMs and Vector Databases Fit Together
When people talk about RAG (Retrieval-Augmented Generation) and “ChatGPT over your data,” vector databases are doing the heavy lifting.
The pattern looks like this:
1. Store your documents (FAQs, policies, docs) and their vectors in a vector DB.
2. When a user asks a question, convert that question into a vector.
3. Use the vector DB to find the most relevant pieces of text.
4. Feed those into an LLM (like ChatGPT) as context.
5. The LLM generates an answer, grounded in your actual documents.
The relational database can’t easily do this “semantic similarity” part, because it’s not designed for high-dimensional nearest neighbor search. That’s exactly the niche vector databases are built for.
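Here is a minimal sketch of the retrieval-and-prompt-assembly part of that pattern, under the same toy-vector assumptions as the earlier examples. The call_llm function is a hypothetical placeholder for whichever chat-completion API you use, and a real system would run the documents and the question through an actual embedding model instead of hard-coding vectors.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Step 1: documents already embedded and stored (toy vectors for illustration).
documents = [
    {"text": "Refunds for paused subscriptions are prorated...", "vector": [0.11, -0.32, 0.88]},
    {"text": "Cancelled subscriptions are refunded in full...",  "vector": [0.14, -0.28, 0.80]},
    {"text": "Our office is closed on public holidays.",         "vector": [-0.60, 0.50, 0.10]},
]

# Step 2: the user's question, converted to a vector by the same embedding model.
question = "How are refunds handled when a subscription is paused?"
question_vector = [0.12, -0.34, 0.89]

# Step 3: retrieve the most relevant pieces of text from the vector DB.
top_docs = sorted(documents,
                  key=lambda d: cosine_similarity(question_vector, d["vector"]),
                  reverse=True)[:2]

# Step 4: feed them to the LLM as context.
prompt = "Answer using only this context:\n"
prompt += "\n".join(d["text"] for d in top_docs)
prompt += f"\n\nQuestion: {question}"

def call_llm(prompt):
    # Hypothetical placeholder: a real system would call your LLM provider here.
    return "(model answer grounded in the retrieved documents)"

# Step 5: the LLM generates an answer grounded in your own documents.
print(call_llm(prompt))
```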
Should You Replace Your Relational Database With a Vector Database?
Short answer: No.
They solve different problems.
Use your relational database for structured, transactional, and relational data.
Use a vector database for similarity and semantic search over text, images, or other unstructured content.
In many architectures, the two sit side by side:
Relational DB for users, orders, permissions.
Vector DB for documents, embeddings, and semantic retrieval.
Your app logic (or an API layer) decides when to query which.
A Simple Analogy
Think of it like this:
A relational database is like a very organized filing cabinet. If you know the folder name or reference number, you can grab the exact file quickly.
A vector database is like a super-smart librarian. You say,
“I’m looking for something about how to handle refunds when a subscription is paused rather than canceled,”
and the librarian says,
“Here are the three documents that most closely match what you mean,”
even if they never use those exact words.
You still need the cabinet for structure and reliability. The librarian sits on top of it to help you find things in a more human way.
Summary
A vector database stores vectors—numeric representations of meaning—from text, images, etc.
It lets you search by similarity, not just exact keywords or IDs.
It’s different from a relational database, which is built around tables, rows, and SQL queries.
In modern AI systems, vector databases are key for semantic search and LLM-powered Q&A over your own data.
You don’t replace relational DBs with vector DBs; you combine them to get both solid structure and smarter search.
As AI becomes more central to applications, understanding vector databases is becoming just as important as knowing your way around SQL—they’re simply the next piece of the data puzzle.

