All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- New dedicated notebooks showcasing usage of cloud-based Nvidia AI Playground models using Langchain connectors, as well as local model deployment using Huggingface.
- Upgraded milvus container version to enable GPU accelerated vector search.
- Added support to interact with models behind NeMo Inference Microservices using the new model engines `nemo-embed` and `nemo-infer`.
- Added support to provide an example-specific collection name for vector databases using an environment variable named `COLLECTION_NAME`.
- Added `faiss` as a generic vector database solution behind `utils.py`.
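The `COLLECTION_NAME` override mentioned above can be read with a small helper; a minimal sketch, where the function name and default collection name are assumptions for illustration (only the environment variable name comes from the changelog):

```python
import os

# Hypothetical helper illustrating the COLLECTION_NAME override; the
# function name and default value are assumptions, not project code.
def get_collection_name(default: str = "canonical_rag") -> str:
    """Return the vector-database collection name, preferring the
    COLLECTION_NAME environment variable when it is set."""
    return os.environ.get("COLLECTION_NAME", default)
```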
- Upgraded and changed base containers for all components to pytorch `23.12-py3`.
- Added a langchain specific vector database connector in `utils.py`.
- Changed speech support to use a single channel for Riva ASR and TTS.
- Changed the `get_llm` utility in `utils.py` to return a Langchain wrapper instead of LlamaIndex wrappers.
- Fixed a bug causing empty rating in evaluation notebook
- Fixed document search implementation of query decomposition example.
- New dedicated example showcasing Nvidia AI Playground based models using Langchain connectors.
- New example demonstrating query decomposition.
- Support for using PG Vector as a vector database in the developer rag canonical example.
- Support for using Speech-in Speech-out interface in the sample frontend leveraging RIVA Skills.
- New tool showcasing RAG observability support.
- Support for on-prem deployment of TRTLLM based nemotron models.
- Upgraded Langchain and llamaindex dependencies for all containers.
- Restructured README files for better readability.
- Added provision to plug in multiple examples using a common base class.
- Changed the `minio` service's port from `9000` to `9010` in docker based deployment.
- Moved the `evaluation` directory from the top level to under `tools` and created a dedicated compose file.
- Added an experimental directory for plugging in experimental features.
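The `minio` port change above would look roughly like this in a compose file; a sketch only, where the image tag and the direction of the mapping (host port vs. container port) are assumptions, since only the `9000` → `9010` change is stated in the changelog:

```yaml
# Illustrative docker compose fragment; image and mapping are assumptions.
services:
  minio:
    image: minio/minio
    ports:
      - "9010:9000"   # host port moved from 9000 to 9010
```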
- Modified notebooks to use TRTLLM and Nvidia AI foundation based connectors from langchain.
- Changed the `ai-playground` model engine name to `nv-ai-foundation` in configurations.
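In configuration terms, the rename above amounts to a one-value change; a hedged sketch, where the surrounding keys (`llm`, `model_engine`) are assumptions and only the engine-name change comes from the changelog:

```yaml
# Illustrative config fragment; key names are assumptions.
llm:
  model_engine: nv-ai-foundation   # previously: ai-playground
```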
- Support for using Nvidia AI Playground based LLM models
- Support for using Nvidia AI Playground based embedding models
- Support for deploying and using quantized LLM models
- Support for Kubernetes deployment using Helm charts
- Support for evaluating RAG pipeline
- Repository restructuring to allow better open source contributions
- Upgraded dependencies for chain server container
- Upgraded NeMo Inference Framework container version; no separate sign-up needed for access.
- Main README now provides more details.
- Documentation improvements.
- Better error handling and reporting mechanism for corner cases
- Renamed `triton-inference-server` container to `llm-inference-server`
- Fixed issue #13: pipeline unable to answer questions unrelated to the knowledge base
- Fixed issue #12: type checking while uploading PDF files