Vector Database Practice Project

This repository contains the materials for the Vector Database Practice Project, a hands-on exercise in building a working knowledge-base search system with a vector database, hybrid search, and search-quality evaluation.

The full project walkthrough and instructions live here:
https://www.dataquest.io/blog/vector-database-practice-project

This repo is meant to support that guide, not replace it.


What This Project Covers

In this project, you’ll build a complete search system end to end, including:

  • Collecting and inspecting real-world text data
  • Chunking documents and generating embeddings
  • Storing and querying data in a production vector database
  • Combining semantic search with keyword search
  • Evaluating search quality and performance
  • Documenting technical decisions for a portfolio-ready project
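
To give a feel for the "semantic plus keyword" step, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge two ranked result lists. It is an illustration only: the project guide may combine results differently, and the function name, the document IDs, and the constant `k=60` are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) for every list it appears in.
    k=60 is a commonly used default, assumed here for illustration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a vector search and a keyword search.
semantic_hits = ["doc3", "doc1", "doc7"]
keyword_hits = ["doc1", "doc9", "doc3"]

fused = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(fused)
```

Documents that appear in both lists (`doc1` and `doc3` here) rise to the top, which is the behavior hybrid search is usually after. RRF needs only ranks, not raw scores, so it avoids having to normalize similarity scores against keyword scores.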

The emphasis is on system design, tradeoffs, and evaluation, not just implementation.
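
As a rough illustration of what evaluating search quality can look like, the sketch below computes recall@k and reciprocal rank for a single query. The metric choice, the function names, and the sample document IDs are assumptions made for this example; the project guide defines the evaluation approach actually used.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top k results."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical search results and hand-labeled relevant documents for one query.
results = ["doc4", "doc2", "doc8", "doc5"]
judged_relevant = {"doc2", "doc5"}

print(recall_at_k(results, judged_relevant, k=3))
print(reciprocal_rank(results, judged_relevant))
```

Averaging these values over a set of test queries gives a single number per search configuration, which makes it easier to compare semantic-only, keyword-only, and hybrid search side by side.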


Repository Structure


vector-database-practice-project/
├── data-collection-scripts/
│   ├── Guardian/
│   ├── Hugging-Face/
│   ├── NewsAPI/
│   └── README.md
└── README.md


How to Use This Repository

Start with the project guide on the Dataquest blog. It walks through each step in order and explains how the files in this repository fit together.

If you’re looking for instructions on a specific part of the project, check the README inside the relevant subfolder.


Who This Project Is For

This project is intended for learners who already understand the basics of embeddings and vector databases and want to practice applying them in a realistic setting.

It’s designed to produce a concrete, explainable system that you can reference in interviews, technical discussions, or portfolio reviews.


For full instructions and context, refer to the project guide:
https://www.dataquest.io/blog/vector-database-practice-project