Run local LLMs like Gemma, Qwen, and LLaMA on Android for offline, private, real-time chat and question answering with LiteRT and ONNX Runtime.
Run a voice agent with <400 ms latency on just 4 GB of VRAM. Fully offline, no API keys required. Optimized for the GTX 1650 and edge robotics, with zero-copy inference. (Apache 2.0)
🚀 A powerful Flutter-based AI chat application that lets you run LLMs directly on your mobile device or connect to local model servers. Features offline model execution, Ollama/LM Studio integration, and a beautiful modern UI. Privacy-focused, cross-platform, and fully open source.
NULLA is a local-first personal AI that runs on your machine, remembers your work, helps with research and workflows, and can optionally share knowledge peer-to-peer.
Local LLM proxy, DevOps friendly
An advanced, fully local, and GPU-accelerated RAG pipeline. Features a sophisticated LLM-based preprocessing engine, state-of-the-art Parent Document Retriever with RAG Fusion, and a modular, Hydra-configurable architecture. Built with LangChain, Ollama, and ChromaDB for 100% private, high-performance document Q&A.
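As a rough illustration of the stack this pipeline builds on, here is a small local Q&A sketch with LangChain, Ollama, and ChromaDB. The model tags, sample texts, and persist directory are placeholders; the repo itself layers parent-document retrieval, RAG Fusion, and Hydra configuration on top of this basic pattern.

```python
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

# Index sample texts in a persistent local Chroma store
# (embedding model and directory are placeholder choices).
store = Chroma.from_texts(
    texts=["LiteRT runs quantized models on-device.",
           "ChromaDB persists embeddings on local disk."],
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",
)

# Retrieve the most relevant chunks and answer from them alone,
# so the whole pipeline stays on the local machine.
llm = ChatOllama(model="llama3")
question = "Where does ChromaDB keep its embeddings?"
docs = store.similarity_search(question, k=2)
context = "\n".join(d.page_content for d in docs)
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)
```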
🖼️ Python Image and 🎥 Video Generator using LLM providers and models — built with Claude Code 💻 CLI
A framework for using a local LLM (Qwen2.5-Coder 7B) fine-tuned with RL to generate, debug, and optimize code solutions through iterative refinement.
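The iterative-refinement idea reduces to a generate-test-refine loop. A sketch under the assumption that the model is served through Ollama; the model tag, file name, and test command are placeholders, not the repo's actual interface.

```python
import subprocess
import ollama

def refine(task: str, test_cmd: list[str], max_rounds: int = 3) -> str:
    """Generate code, run the tests, and feed failures back to the model."""
    prompt, code = task, ""
    for _ in range(max_rounds):
        # Real pipelines would also strip markdown fences from the reply.
        code = ollama.generate(model="qwen2.5-coder:7b", prompt=prompt)["response"]
        with open("solution.py", "w") as f:
            f.write(code)
        result = subprocess.run(test_cmd, capture_output=True, text=True)
        if result.returncode == 0:
            break  # tests pass, stop refining
        prompt = f"{task}\n\nYour last attempt failed with:\n{result.stderr}\nFix the code."
    return code

refine("Write a Python function fib(n) returning the nth Fibonacci number.",
       ["python", "-m", "pytest", "test_solution.py", "-q"])
```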
LLM Router is a service that can be deployed on-premises or in the cloud. It adds a layer between any application and the LLM provider: in real time it controls traffic, distributes load across providers of a specific LLM, and supports security analysis of outgoing requests (masking, anonymization, prohibited content).
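At its core, that layer is a dispatcher that rewrites and forwards requests. A toy sketch of the two central ideas, round-robin load distribution and masking, against hypothetical OpenAI-compatible endpoints; the real service does far more.

```python
import itertools
import re
import requests

# Hypothetical provider pool: two OpenAI-compatible endpoints serving the same LLM.
PROVIDERS = itertools.cycle([
    "http://llm-a.internal:8000/v1/chat/completions",
    "http://llm-b.internal:8000/v1/chat/completions",
])

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask(text: str) -> str:
    # Anonymize sensitive tokens before the request leaves the perimeter.
    return EMAIL.sub("[EMAIL]", text)

def route(prompt: str, model: str = "llama3") -> str:
    url = next(PROVIDERS)  # round-robin distribution across providers
    resp = requests.post(url, json={
        "model": model,
        "messages": [{"role": "user", "content": mask(prompt)}],
    }, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```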
A fully customizable, super light-weight, cross-platform GenAI based Personal Assistant that can be run locally on your private hardware!
🤖 An Intelligent Chatbot: Powered by a locally hosted Llama 3.2 LLM 🧠 (via Ollama) and ChromaDB 🗂️, this chatbot offers semantic search 🔍, session-aware responses 🗨️, and an interactive Streamlit interface 🎨 for seamless user interaction. 🚀
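Session awareness here boils down to replaying the conversation history on every turn. A bare-bones sketch with the official ollama Python client; the llama3.2 tag is an assumption, and the repo's version adds ChromaDB retrieval and the Streamlit UI on top.

```python
import ollama

history = []  # the running session: every user and assistant turn

def ask(user_input: str) -> str:
    history.append({"role": "user", "content": user_input})
    reply = ollama.chat(model="llama3.2", messages=history)
    content = reply["message"]["content"]
    history.append({"role": "assistant", "content": content})
    return content

print(ask("What is semantic search?"))
print(ask("Give a one-line example of it."))  # "it" resolves via the history
```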
An AI-powered assistant to streamline knowledge management, member discovery, and content generation across Telegram and Twitter, while ensuring privacy with local LLM deployment.
Ask CLI is a command-line tool for interacting with a local LLM (large language model) server. It allows you to send queries and receive concise command-line responses.
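A tool in this shape can be only a few lines. A hypothetical sketch targeting an OpenAI-compatible local server; the Ollama port, model tag, and system prompt below are assumptions, not Ask CLI's actual defaults.

```python
#!/usr/bin/env python3
"""Sketch of an Ask-CLI-style tool: one query in, one terse answer out."""
import argparse
import requests

parser = argparse.ArgumentParser(description="Ask a local LLM server")
parser.add_argument("query")
parser.add_argument("--url", default="http://localhost:11434/v1/chat/completions")
parser.add_argument("--model", default="llama3")
args = parser.parse_args()

resp = requests.post(args.url, json={
    "model": args.model,
    "messages": [
        {"role": "system", "content": "Reply with a single shell command, no prose."},
        {"role": "user", "content": args.query},
    ],
}, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```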
An autonomous AI agent for intelligently updating, maintaining, and curating a LightRAG knowledge base.
A lightweight frontend for LM Studio local server APIs. Built using React, Vite, and Tailwind CSS with full support for streaming responses and GitHub Flavored Markdown.
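The server side of this is LM Studio's OpenAI-compatible API, which listens on port 1234 by default. Here is the streaming pattern such a frontend consumes, sketched in Python rather than the repo's React; whatever model is loaded in LM Studio answers regardless of the model string.

```python
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; any non-empty key works.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

stream = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to whichever model is loaded
    messages=[{"role": "user", "content": "Explain streaming responses in one paragraph."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # render tokens as they arrive
```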
Local-first AI conversation archive. Import from ChatGPT, Claude, and Gemini. Browse, search, ask, distill, export.
A high-performance desktop intelligence agent built in Electron. Pairs deep native Windows OS automation with multimodal LLM cognition and offline voice processing.
UNOFFICIAL Simple LM Studio Web UI (Docker)
This repository contains code to securely run SLMs (small language models) locally, using Node.js on the server side or inside the browser.
End-to-end RAG automation built with n8n, Ollama (local LLMs), and Pinecone. Automatically ingests documents, generates embeddings, stores vectors, and enables context-aware AI chat.
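The ingestion leg of that pipeline, stripped of the n8n orchestration, is essentially embed-then-upsert. A rough sketch with the ollama and pinecone Python clients; the index name, embedding model, and chunks are placeholders.

```python
import ollama
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs")  # assumes an index sized for the embedding model

chunks = ["First document chunk.", "Second document chunk."]
vectors = []
for i, chunk in enumerate(chunks):
    # nomic-embed-text is one common local embedding model; swap as needed.
    emb = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
    vectors.append({"id": f"chunk-{i}", "values": emb, "metadata": {"text": chunk}})

index.upsert(vectors=vectors)  # context-aware chat then queries this index
```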