LM-WebUI is a unified local AI interface and LLM runtime platform built for privacy-first and sovereign AI systems. It natively supports local GGUF model inference, Ollama, and API-based models such as OpenAI and Gemini, along with multimodal RAG pipelines and persistent vector memory.
Run AI under your control
Built open source for the community, developers, system integrators, and organizations that require local inference, reproducibility, and infrastructure-level control, lm-webui bridges the power of modern cloud LLM features with the integrity of local data ownership.
Run fully offline, integrate with cloud APIs when needed, and deploy across environments without sacrificing performance, privacy, or sovereignty.
⚠️ Work in Progress (WIP): lm-webui is under active development. Features, APIs, and architecture may change as the project evolves. Contributions, feedback, and early testing are welcome, but expect breaking changes.
```bash
curl -sSL https://raw.githubusercontent.com/lm-webui/lm-webui/main/install.sh | bash
```

This will:
- Check for Docker and Docker Compose
- Clone the repository (if needed)
- Set up environment configuration
- Build and start the Docker containers
- Provide access instructions
Access the application at http://localhost:7070
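Once the containers are up, you can confirm something is listening on the UI port with a small stdlib-only check. This helper (`is_up`) is an illustrative sketch, not part of the project:

```python
import socket

def is_up(host: str = "localhost", port: int = 7070, timeout: float = 0.5) -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: is_up() checks the default LM-WebUI address, http://localhost:7070
```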
| Feature | Capabilities |
|---|---|
| Authentication | Secure JWT-based authentication with refresh tokens and persistent user sessions. Designed for multi-user deployments and role-aware environments. |
| WebSocket Streaming | Bidirectional streaming with structured events, typing indicators, cancellation support, and step-by-step reasoning visibility. |
| Hardware Acceleration | Automatic CUDA, ROCm, and Metal detection with dynamic memory and layer optimization for efficient local execution across GPUs and CPUs. |
| GGUF Runtime | Built-in GGUF model lifecycle management: download, load, quantize, and serve models locally with HuggingFace compatibility. |
| RAG Engine | Modular retrieval pipeline powered by Qdrant for vector search, reranking, semantic chunking, and context injection. |
| Multimodal Processing | Image and document processing with OCR, embedding, and structured content extraction for unified chat workflows. |
| Knowledge Graph | Triplet-based semantic memory and entity relationship tracking to enhance long-term contextual understanding. |
| Self-Hosted Ready | Effortless on-prem, private cloud, and isolated deployments with no required external telemetry. |
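To picture the chunking stage of a RAG pipeline like the one above, here is a minimal hypothetical sketch of fixed-size chunking with overlap; the real engine uses semantic chunking and Qdrant vector search, which are not modeled here:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap,
    so context spanning a chunk boundary is still retrievable."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and stored in the vector database; at query time the closest chunks are reranked and injected into the model's context.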
- Model Management: Upload/download GGUF models with progress tracking
- HuggingFace Integration: Direct download from HuggingFace repositories
- Hardware Compatibility: Automatic model validation for your system
- Local Registry: Manage and organize local GGUF models
- Seamless Integration: Use GGUF models directly in chat conversations
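A local registry like the one described can be sketched as a simple directory scan. `scan_gguf_models` below is a hypothetical stdlib-only helper, not the project's actual API:

```python
from pathlib import Path

def scan_gguf_models(model_dir: str) -> list[dict]:
    """Scan a directory for .gguf files and collect basic metadata
    (name, path, size) that a local model registry could track."""
    registry = []
    for path in sorted(Path(model_dir).glob("*.gguf")):
        registry.append({
            "name": path.stem,          # e.g. "llama-7b.Q4_K_M"
            "file": str(path),
            "size_bytes": path.stat().st_size,
        })
    return registry
```

In a real registry you would also validate each file against detected hardware (VRAM, quantization support) before offering it in chat.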
For detailed documentation, see the docs/ directory:
- Getting Started - Complete setup guide
- Features - Detailed feature documentation
- Deployment - Production deployment guides
- Contributing - How to contribute to the project
lm-webui follows a modern microservices-inspired architecture:
```
lm-webui/
├── backend/              # FastAPI backend with WebSocket streaming
│   ├── app/              # Application code
│   │   ├── routes/       # API endpoints (chat, auth, gguf, etc.)
│   │   ├── streaming/    # WebSocket streaming system
│   │   ├── rag/          # RAG pipeline with vector search
│   │   ├── services/     # Core services (GGUF, model management, etc.)
│   │   ├── hardware/     # Hardware acceleration detection
│   │   └── security/     # Authentication & encryption
│   └── tests/            # Backend tests
├── frontend/             # React/TypeScript frontend
│   ├── src/              # Source code
│   │   ├── components/   # UI components
│   │   ├── services/     # API and WebSocket services
│   │   ├── hooks/        # Custom React hooks
│   │   └── types/        # TypeScript type definitions
│   └── __tests__/        # Frontend tests
└── docs/                 # Documentation
```
- Backend: Python 3.9+, PostgreSQL/SQLite
- Frontend: Node.js 16+, npm/yarn
- Optional: Docker, CUDA/ROCm for GPU acceleration
For development work, you can use either the Docker-based setup or manual installation:
```bash
# Quick setup using the installation script
curl -sSL https://raw.githubusercontent.com/lm-webui/lm-webui/main/install.sh | bash

# Or manually with Docker Compose
git clone https://github.com/lm-webui/lm-webui.git
cd lm-webui
docker-compose up --build
```

```bash
# 1. Clone and setup
git clone https://github.com/lm-webui/lm-webui.git
cd lm-webui

# 2. Backend setup
cd backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-test.txt  # For testing

# 3. Frontend setup
cd ../frontend
npm install

# 4. Run tests
cd ../backend && pytest
cd ../frontend && npm test
```

- Some multimodal pipelines are still experimental
- Hardware acceleration behavior may vary across GPU vendors
- RAG metadata handling is functional but not yet fully standardized
- Media library under development
Near-term
- Stabilize core orchestration APIs and configuration schema
- Improve GGUF deployment automation and quantization presets
- Expand hardware detection and backend fallback logic
Mid-term
- Add stronger RAG governance (source versioning, metadata filters)
- Introduce model bundle validation and optional signature checks
- Improve workflow reproducibility and export/import support
Long-term
- Advanced scheduling for multi-GPU and multi-model workloads
- Adapter/LoRA management for task-specific fine-tuning
- Enterprise features
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Website: lmwebui.com
- GitHub: github.com/lm-webui/lm-webui
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Let's shape the future of local AI together!
