Local Inference
- Native Ollama integration
- Support for Llama, Mistral, Phi, and more
- GPU acceleration (NVIDIA, Metal)
- Fallback to CPU with performance tuning
Privacy-first desktop AI application with local inference, secure storage, and zero data transmission. Control your AI experience.
Request Early Access

Cloud AI services capture sensitive data: trade secrets, personal information, confidential research. Enterprises in regulated industries (healthcare, finance, legal, government) cannot rely on external AI due to compliance requirements. Open-source models on personal hardware are powerful but fragmented: no unified workspace, poor performance, and a lack of professional tooling.
Market context: Privacy-sensitive teams and SMBs are a key ICP. Enterprise adoption of local AI is projected to grow from 5% (2024) to 25%+ (2026). Data sovereignty and regulatory compliance are driving on-device inference adoption.
Forbidden Library combines powerful local inference with professional workspace management. Run state-of-the-art open models (Llama 2/3, Mistral, etc.) on your desktop or in your datacenter. Complete control. Zero data leakage. Enterprise-grade reliability.
Technical system flow: Privacy-first desktop AI from model installation to secure inference with zero data transmission.
User Action: Select model from catalog (Llama 3, Mistral, Phi, CodeLlama, etc.) via one-click interface.
Process:
- Model download from the Ollama registry or Hugging Face (automatic source selection for best availability)
- Local caching in content-addressable storage (deduplication of layers shared across models)
- GPU detection and driver validation (CUDA for NVIDIA, Metal for Apple Silicon, Vulkan for cross-platform)
- Automatic quantization selection (Q4_K_M for 8 GB RAM systems, Q5_K_M for 16 GB+, FP16 for high-memory systems)
- Performance benchmarking on first run (measuring tokens/second for the model + hardware combination)
Output: Optimized model ready for inference, stored locally with encryption at rest (AES-256).
Technical Stack: Ollama integration, llama.cpp backend, Rust file system abstraction (Tauri), encrypted SQLite metadata store.
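The quantization-selection step above can be sketched as a simple RAM-based lookup. This is an illustrative sketch, not the application's actual logic: the function name and the 32 GB "high-memory" cutoff are assumptions; only the Q4_K_M/Q5_K_M/FP16 tiers come from the text.

```typescript
type Quantization = "Q4_K_M" | "Q5_K_M" | "FP16";

// Pick a quantization level from available system RAM, mirroring the
// tiers described above: Q4_K_M for ~8 GB systems, Q5_K_M for 16 GB+,
// FP16 for high-memory machines (assumed here to mean 32 GB or more).
function selectQuantization(systemRamGb: number): Quantization {
  if (systemRamGb >= 32) return "FP16"; // assumed high-memory cutoff
  if (systemRamGb >= 16) return "Q5_K_M";
  return "Q4_K_M";
}
```

A real selector would also weigh GPU VRAM and the model's parameter count, but the RAM tiers capture the core trade-off: smaller quantized weights fit constrained machines at some quality cost.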
User Action: Create project workspace or open existing conversation.
Process:
- Workspace isolation: each project stored in an encrypted database partition (prevents cross-contamination)
- Session initialization: conversation history loaded from local SQLite
- Context window management: automatic pruning of old messages to fit model context limits (typically 4K-32K tokens)
- Tagging and search indexing: conversations indexed via full-text search for rapid retrieval
- Multi-user profile support: separate encrypted vaults per user (enterprise deployments)
Output: Active workspace with conversation history, tags, and metadata loaded into UI.
Technical Stack: SQLite with FTS5 (full-text search), encryption via sqlcipher, SvelteKit frontend with reactive state management.
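The context-window pruning step described above can be sketched as follows. This is a minimal illustration under assumed names and shapes (the `Message` interface and per-message token counts are not the application's actual schema): keep the system prompt, then keep the most recent turns that still fit the token budget, pruning oldest first.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  tokens: number; // precomputed token count for this message
  content: string;
}

// Keep system prompts plus the newest messages that fit within the
// model's context limit; older conversation turns are dropped first.
function pruneToContext(history: Message[], limitTokens: number): Message[] {
  const system = history.filter((m) => m.role === "system");
  let budget = limitTokens - system.reduce((sum, m) => sum + m.tokens, 0);
  const kept: Message[] = [];
  // Walk newest-to-oldest, keeping whole messages while they fit.
  for (let i = history.length - 1; i >= 0; i--) {
    const m = history[i];
    if (m.role === "system") continue;
    if (m.tokens > budget) break;
    budget -= m.tokens;
    kept.unshift(m);
  }
  return [...system, ...kept];
}
```

Pruning whole messages (rather than truncating mid-message) keeps each remaining turn coherent for the model.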
User Action: Submit prompt to model via chat interface.
Process:
- Prompt preprocessing: template application (system prompts, formatting)
- Context assembly: conversation history plus relevant workspace documents injected into the prompt
- Local inference via the Ollama API: prompt sent to the local llama.cpp server (no network transmission)
- GPU acceleration: CUDA kernels for NVIDIA, Metal Performance Shaders for Apple, CPU fallback with SIMD optimizations
- Streaming response: tokens generated incrementally and displayed in real time
- Token counting and performance tracking: latency, tokens/second, and memory usage measured per response
Output: Model response streamed to UI, stored in encrypted local database, zero data leaves device.
Technical Stack: Ollama REST API, llama.cpp inference engine, WebSocket streaming for UI updates, Rust backend for IPC (inter-process communication).
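The performance-tracking step above reduces to simple arithmetic over three timestamps. A hedged sketch follows; the function and field names are illustrative, not the application's actual API.

```typescript
interface InferenceStats {
  latencyMs: number;       // time from request to first streamed token
  tokensPerSecond: number; // overall generation throughput
}

// Compute per-response inference metrics from wall-clock timestamps
// (all in milliseconds) and the number of tokens generated.
function computeStats(
  startMs: number,
  firstTokenMs: number,
  endMs: number,
  tokenCount: number
): InferenceStats {
  const durationS = (endMs - startMs) / 1000;
  return {
    latencyMs: firstTokenMs - startMs,
    tokensPerSecond: durationS > 0 ? tokenCount / durationS : 0,
  };
}
```

Tracking time-to-first-token separately from throughput matters for streaming UIs: a model can feel responsive (low first-token latency) even when total generation is slow, and vice versa.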
User Action: Enable MCP tools (code editor integration, database connectors, API clients).
Process:
- MCP server registration: external tools registered under a permissions model (read/write access defined per tool)
- Tool invocation from the model: the LLM generates structured tool calls (validated against JSON schemas)
- Local execution: tools run on-device with sandboxing (no external network access unless explicitly authorized)
- Result injection: tool output returned to the model for the next reasoning step
- Audit logging: every tool invocation logged with timestamp, parameters, and results
Output: Enhanced AI workflows with access to local files, databases, APIs—all executed on-device.
Technical Stack: MCP protocol implementation (Rust), JSON schema validation, sandboxed subprocess execution, audit logs in encrypted SQLite.
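The schema-validation step above can be illustrated with a deliberately minimal checker. This is a sketch under assumed shapes (`ToolSchema` and the error-list return are inventions for illustration); a real implementation would validate against full JSON Schema, as the text describes.

```typescript
interface ToolSchema {
  name: string;
  required: string[]; // argument names that must be present
  types: Record<string, "string" | "number" | "boolean">;
}

// Validate a model-generated tool call before sandboxed execution.
// Returns a list of validation errors; an empty list means the call
// is safe to dispatch.
function validateToolCall(
  schema: ToolSchema,
  call: { name: string; args: Record<string, unknown> }
): string[] {
  const errors: string[] = [];
  if (call.name !== schema.name) errors.push(`unknown tool: ${call.name}`);
  for (const field of schema.required) {
    if (!(field in call.args)) errors.push(`missing required argument: ${field}`);
  }
  for (const [field, value] of Object.entries(call.args)) {
    const expected = schema.types[field];
    if (expected && typeof value !== expected) {
      errors.push(`argument ${field} must be ${expected}`);
    }
  }
  return errors;
}
```

Rejecting malformed calls before execution is what makes model-driven tool use safe to sandbox: the model proposes, but only structurally valid, permitted calls ever run.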
User Action: Export conversation/project or enable cross-device sync.
Process:
- Export: PDF/Word/Markdown generated from the conversation with formatting preserved
- Encryption: if sync is enabled, end-to-end encryption via CRDTs (Conflict-free Replicated Data Types)
- Sync mechanism: changes propagated to a self-hosted sync server or a VoidCat-managed option (user's choice)
- Decryption: only on authorized devices holding the user's encryption key (zero-knowledge architecture)
- Conflict resolution: CRDTs ensure eventual consistency without data loss
Output: Exported artifacts or synchronized workspace across devices, privacy maintained throughout.
Technical Stack: Yrs CRDT library, ChaCha20-Poly1305 encryption, optional sync via WebSocket, export via pandoc integration.
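The CRDT conflict-resolution property above can be shown with one of the simplest CRDTs, a last-writer-wins register. This is an illustrative sketch only; the app's sync layer (Yrs) uses much richer types, but the key guarantee is the same: merging in any order on any device converges to the same state.

```typescript
interface LwwRegister<T> {
  value: T;
  timestamp: number; // logical or wall-clock write time
  replicaId: string; // unique per device
}

// Merge two register states. The merge is commutative and
// associative, so all devices converge regardless of sync order.
function merge<T>(a: LwwRegister<T>, b: LwwRegister<T>): LwwRegister<T> {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  // Tie-break deterministically on replica id so every device agrees.
  return a.replicaId > b.replicaId ? a : b;
}
```

Because the merge never needs a central arbiter, it composes naturally with end-to-end encryption: the sync server only relays opaque updates and never has to resolve conflicts itself.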
Zero Cloud Dependency: All inference runs locally via Ollama/llama.cpp—no API calls to external services.
Encrypted Storage: All data encrypted at rest (AES-256) and in transit (if sync enabled, E2E encrypted).
No Telemetry: Zero usage tracking; crash reporting is optional, fully anonymized, and requires user consent.
Audit Trail: Complete logs of all operations (model invocations, tool usage, data access) stored locally for compliance review.
Open Source Core: Inference engine (llama.cpp) and runtime (Tauri) are open source and auditable.
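The audit-trail guarantee above implies a log record per operation. A possible shape is sketched below; the field names are assumptions for illustration, not the application's actual schema, and only the logged categories (model invocations, tool usage, data access) come from the text.

```typescript
interface AuditRecord {
  timestamp: string; // ISO 8601, recorded at operation time
  operation: "model_invocation" | "tool_invocation" | "data_access";
  parameters: Record<string, unknown>;
  resultSummary: string;
}

// Build an audit record for local, append-only storage. Records stay
// on-device, supporting compliance review without any telemetry.
function makeAuditRecord(
  operation: AuditRecord["operation"],
  parameters: Record<string, unknown>,
  resultSummary: string
): AuditRecord {
  return {
    timestamp: new Date().toISOString(),
    operation,
    parameters,
    resultSummary,
  };
}
```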
| Component | Technology |
|---|---|
| Desktop Runtime | Tauri (Rust backend + webview frontend) |
| Frontend | SvelteKit with TypeScript; responsive UI |
| Model Runtime | Ollama + llama.cpp; GPU support (CUDA, Metal, Vulkan) |
| Storage | SQLite (local), encrypted filesystem, content-addressable storage |
| Sync (Optional) | End-to-end encrypted sync via CRDTs; self-hosted or managed option |
| Compliance | Audit logging, no telemetry, local-only processing, SOC2-ready |
Healthcare, finance, legal professionals needing HIPAA/PCI/SOX compliance. Local inference = no data transmission.
Run models on proprietary datasets without uploading them to the cloud. Full reproducibility and version control.
Founders, consultants, creatives protecting intellectual property. No vendor lock-in, full control.
Internal deployments on secure networks. Air-gapped systems, compliance auditing, centralized management.
Forbidden Library is in private beta. Early access available for founders, researchers, and privacy-focused organizations.
Request Early Access