This repository contains the materials for the Vector Database Practice Project, a hands-on project in which you build a knowledge base search system on real-world text data using a vector database, hybrid search, and search-quality evaluation.
The full project walkthrough and instructions live here:
https://www.dataquest.io/blog/vector-database-practice-project
This repo is meant to support that guide, not replace it.
In this project, you’ll build a complete search system end to end, including:
- Collecting and inspecting real-world text data
- Chunking documents and generating embeddings
- Storing and querying data in a production vector database
- Combining semantic search with keyword search
- Evaluating search quality and performance
- Documenting technical decisions for a portfolio-ready project
The emphasis is on system design, tradeoffs, and evaluation, not just implementation.
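To make three of the steps above concrete, here is a minimal, dependency-free sketch of overlapping word-window chunking, hybrid result merging, and a basic retrieval metric. It is an illustration under stated assumptions, not the project's implementation: the function names (`chunk_words`, `reciprocal_rank_fusion`, `recall_at_k`), the chunk sizes, and the document IDs are all hypothetical, and reciprocal rank fusion is just one common way to combine semantic and keyword rankings (the project guide may use a different approach).

```python
from collections import defaultdict


def chunk_words(text, size=200, overlap=50):
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one list.

    Each document scores 1 / (k + rank) per list it appears in (rank starts at 1).
    k=60 is a conventional default, assumed here rather than taken from the guide.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


# Hypothetical example: tiny chunk sizes and made-up document IDs.
chunks = chunk_words("a b c d e f g", size=4, overlap=2)
semantic_hits = ["d1", "d2", "d3"]  # e.g. results from a vector similarity query
keyword_hits = ["d3", "d1", "d4"]   # e.g. results from a keyword (BM25-style) query
fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
score = recall_at_k(fused, relevant={"d1", "d4"}, k=3)
```

Rank-based fusion is convenient because it needs only the order of results, so similarity scores and keyword scores never have to be put on the same scale. The tradeoff is that it ignores how confident each retriever was, which is the kind of decision the evaluation step is meant to test.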
```
vector-database-practice-project/
├── data-collection-scripts/
│   ├── Guardian/
│   ├── Hugging-Face/
│   ├── NewsAPI/
│   └── README.md
└── README.md
```
Start with the project guide on the Dataquest blog. It walks through each step in order and explains how the files in this repository fit together.
If you’re looking for instructions on a specific part of the project, check the README inside the relevant subfolder.
This project is intended for learners who already understand the basics of embeddings and vector databases and want to practice applying them in a realistic setting.
It’s designed to produce a concrete, explainable system that you can reference in interviews, technical discussions, or portfolio reviews.
For full instructions and context, refer to the project guide:
https://www.dataquest.io/blog/vector-database-practice-project