Embedding Fine-tuning with NVIDIA NeMo Microservices

Introduction

This guide shows how to fine-tune embedding models using the NVIDIA NeMo Microservices platform to improve performance on domain-specific tasks.

Figure 1: Workflow for fine-tuning embedding models using NeMo Microservices

Objectives

This tutorial shows how to use the NeMo Microservices platform to fine-tune the nvidia/llama-3.2-nv-embedqa-1b-v2 embedding model on the SPECTER dataset, then evaluate its accuracy on the (somewhat related) zero-shot BeIR SciDocs benchmark.

The tutorial covers the following steps:

  1. Download and prepare data for fine-tuning
  2. Fine-tune the embedding model with SFT
  3. Evaluate the model on a zero-shot SciDocs task

Note: A typical workflow involves creating query, positive document, and negative document triplets from a text corpus. This may include synthetic data generation (SDG) and hard-negative mining. For a quick demonstration, we use an existing open dataset from Hugging Face.
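As a minimal sketch of what such a triplet record looks like on disk, the snippet below writes one JSONL training record. The field names (query, pos_doc, neg_doc) are an assumption for illustration; verify them against the dataset schema of the Customizer version you deploy.

```python
import json

# One illustrative triplet from the scientific-paper-title domain.
# Field names are assumed -- check your Customizer dataset schema.
triplets = [
    {
        "query": "Attention Is All You Need",
        "pos_doc": "BERT: Pre-training of Deep Bidirectional Transformers",
        "neg_doc": "A Survey of Reinforcement Learning for Robotics",
    },
]

# Training files for fine-tuning are commonly JSONL: one record per line.
with open("training.jsonl", "w") as f:
    for record in triplets:
        f.write(json.dumps(record) + "\n")

with open("training.jsonl") as f:
    lines = f.readlines()
print(len(lines))  # 1
```

In a real workflow these records would come from SDG and hard-negative mining over your corpus rather than being hand-written.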

About NVIDIA NeMo Microservices

NVIDIA NeMo is a modular, enterprise-ready software suite for managing the AI agent lifecycle, enabling enterprises to build, deploy, and optimize agentic systems.

NVIDIA NeMo microservices, part of the NVIDIA NeMo software suite, are an API-first modular set of tools that you can use to customize, evaluate, and secure large language models (LLMs) and embedding models while optimizing AI applications across on-premises or cloud-based Kubernetes clusters.

Refer to the NVIDIA NeMo microservices documentation for further information.

About the SPECTER dataset

The SPECTER dataset contains approximately 684K triplets from the scientific domain (paper titles), which can be used to train embedding models. We will use the SPECTER data for fine-tuning.

Prerequisites

Deploy NeMo Microservices

To follow this tutorial, you will need at least two NVIDIA GPUs, which will be allocated as follows:

  • Fine-tuning: One GPU for fine-tuning the llama-3.2-nv-embedqa-1b-v2 model using NeMo Customizer.
  • Inference: One GPU for deploying the llama-3.2-nv-embedqa-1b-v2 NIM for inference.

Refer to the platform prerequisites and installation guide to deploy NeMo Microservices.

NOTE: Fine-tuning for embedding models is supported starting with NeMo Microservices version 25.8.0. Ensure you deploy NeMo Microservices Helm chart version 25.8.0 or later to use these notebooks.

Client-Side Requirements

Ensure you have access to:

  1. A Python-enabled machine capable of running Jupyter Lab.
  2. Network access to the NeMo Microservices IP and ports.
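A quick way to confirm network access is to probe each service URL before starting the notebooks. The sketch below uses only the standard library; the URLs mirror the defaults shown in config.py and should be adjusted for your deployment.

```python
import urllib.request
import urllib.error

# Default service URLs from config.py -- adjust for your deployment.
services = {
    "Data Store": "http://data-store.test",
    "NeMo (Customizer/Entity Store/Evaluator)": "http://nemo.test",
    "NIM": "http://nim.test",
}

def reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the host answers with any HTTP response at all."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # the server responded, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # DNS failure, connection refused, or timeout

for name, url in services.items():
    status = "reachable" if reachable(url) else "UNREACHABLE"
    print(f"{name}: {status} ({url})")
```

If a service shows as unreachable, check your DNS entries or ingress configuration before proceeding.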

Get Started

  1. Create a virtual environment using uv (recommended for better dependency management):

    # Install uv if not already installed
    curl -LsSf https://astral.sh/uv/install.sh | sh
    
    # Create and activate virtual environment
    uv venv nemo_env
    source nemo_env/bin/activate
  2. Install the required Python packages using requirements.txt with uv:

    uv pip install -r requirements.txt
  3. Update the following variables in config.py with your specific URLs and API keys.

    # (Required) NeMo Microservices URLs
    NDS_URL = "http://data-store.test" # Data Store
    NEMO_URL = "http://nemo.test" # Customizer, Entity Store, Evaluator
    NIM_URL = "http://nim.test" # NIM
    
    # (Required) Hugging Face Token
    HF_TOKEN = ""
    
    # (Optional) To observe training with WandB
    WANDB_API_KEY = ""
  4. Launch Jupyter Lab to begin working with the provided tutorials:

    uv run jupyter lab --ip 0.0.0.0 --port=8888 --allow-root
  5. Navigate to the data preparation notebook to get started.
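Once the NIM is deployed, a quick smoke test is to send a request to its OpenAI-compatible /v1/embeddings endpoint. The sketch below only builds the request payload; the input_type field ("query" or "passage") is an NVIDIA-specific extra parameter for the embedqa NIMs, and the exact parameters should be checked against the NIM documentation for your version.

```python
import json

NIM_URL = "http://nim.test"  # from config.py
MODEL = "nvidia/llama-3.2-nv-embedqa-1b-v2"

def build_embedding_request(texts, input_type="query"):
    """Build a payload for the OpenAI-compatible /v1/embeddings endpoint.

    `input_type` is an NVIDIA-specific parameter for embedqa NIMs;
    use "query" for search queries and "passage" for documents.
    """
    return {
        "model": MODEL,
        "input": texts,
        "input_type": input_type,
    }

payload = build_embedding_request(
    ["graph neural networks for citation prediction"]
)
print(json.dumps(payload, indent=2))
# POST this payload to f"{NIM_URL}/v1/embeddings" once the NIM is up.
```

A successful response contains a `data` list with one embedding vector per input string.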