Forbidden Library

Privacy-first desktop AI application with local inference, secure storage, and zero data transmission. Control your AI experience.

Request Early Access

The Problem

Cloud AI services capture sensitive data: trade secrets, personal information, confidential research. Enterprises in regulated industries (healthcare, finance, legal, government) cannot rely on external AI because of compliance requirements. Open-source models on personal hardware are powerful, but the experience is fragmented: no unified workspace, inconsistent performance, and little professional tooling.

Market context: privacy-sensitive teams and SMBs are a key ideal customer profile (ICP). Enterprise adoption of local AI is projected to grow from 5% (2024) to 25%+ (2026), driven by data sovereignty and regulatory compliance requirements pushing inference on-device.

The Solution

Forbidden Library combines powerful local inference with professional workspace management. Run state-of-the-art open models (Llama 2/3, Mistral, etc.) on your desktop or datacenter. Complete control. Zero data leakage. Enterprise-grade reliability.

Local Inference

  • Native Ollama integration
  • Support for Llama, Mistral, Phi, and more
  • GPU acceleration (NVIDIA, Metal)
  • Fallback to CPU with performance tuning

Secure Workspace

  • Encrypted local storage
  • Multi-user profiles with isolation
  • Session management & audit logs
  • Zero data transmission to cloud

Professional Tools

  • MCP tool integration
  • Document management & search
  • Custom workflows & automation
  • Export to multiple formats

How It Works

Technical system flow: Privacy-first desktop AI from model installation to secure inference with zero data transmission.

Stage 1: Model Installation & Optimization

User Action: Select model from catalog (Llama 3, Mistral, Phi, CodeLlama, etc.) via one-click interface.

Process:

  • Model download from the Ollama registry or Hugging Face (automatic source selection for best availability)
  • Local caching in content-addressable storage (deduplicates shared layers across models)
  • GPU detection and driver validation (CUDA for NVIDIA, Metal for Apple Silicon, Vulkan for cross-platform)
  • Automatic quantization selection (Q4_K_M for 8 GB RAM systems, Q5_K_M for 16 GB+, FP16 for high-memory systems)
  • Performance benchmarking on first run (tokens/second measured for the model + hardware combination)

Output: Optimized model ready for inference, stored locally with encryption at rest (AES-256).

Technical Stack: Ollama integration, llama.cpp backend, Rust file system abstraction (Tauri), encrypted SQLite metadata store.
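The quantization-selection step above can be sketched as a simple tiering function. This is a hypothetical illustration, not the product's actual logic: the thresholds mirror the defaults described above (Q4_K_M for the 8 GB class, Q5_K_M for 16 GB+, FP16 for high-memory systems), and a real implementation would also weigh model size and available GPU VRAM.

```typescript
// Hypothetical sketch: pick a quantization tier from available system memory.
type Quantization = "Q4_K_M" | "Q5_K_M" | "FP16";

function selectQuantization(systemRamGb: number): Quantization {
  if (systemRamGb >= 32) return "FP16";   // high-memory: full half-precision weights
  if (systemRamGb >= 16) return "Q5_K_M"; // 16 GB+: higher-fidelity 5-bit quantization
  return "Q4_K_M";                        // 8 GB class: compact 4-bit default
}

console.log(selectQuantization(8));  // "Q4_K_M"
console.log(selectQuantization(16)); // "Q5_K_M"
console.log(selectQuantization(64)); // "FP16"
```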

Stage 2: Workspace & Session Management

User Action: Create project workspace or open existing conversation.

Process:

  • Workspace isolation: each project stored in an encrypted database partition (prevents cross-contamination)
  • Session initialization: conversation history loaded from local SQLite
  • Context window management: old messages automatically pruned to fit model context limits (typically 4K-32K tokens)
  • Tagging and search indexing: conversations indexed with full-text search for rapid retrieval
  • Multi-user profiles: separate encrypted vaults per user (enterprise deployments)

Output: Active workspace with conversation history, tags, and metadata loaded into UI.

Technical Stack: SQLite with FTS5 (full-text search), encryption via sqlcipher, SvelteKit frontend with reactive state management.
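The context-window pruning step described above can be sketched as follows. This is an illustrative assumption, not the shipped algorithm: it drops the oldest turns until the history fits the token budget while always preserving the system prompt, and it assumes per-message token counts are already computed.

```typescript
// Hypothetical sketch of context-window pruning: keep the system prompt,
// then retain as many of the most recent messages as fit the budget.
interface Message { role: "system" | "user" | "assistant"; tokens: number; }

function pruneToFit(history: Message[], maxTokens: number): Message[] {
  const system = history.filter(m => m.role === "system");
  const rest = history.filter(m => m.role !== "system");
  let budget = maxTokens - system.reduce((n, m) => n + m.tokens, 0);
  const kept: Message[] = [];
  // Walk newest-to-oldest so the most recent turns survive pruning.
  for (let i = rest.length - 1; i >= 0; i--) {
    if (rest[i].tokens > budget) break;
    budget -= rest[i].tokens;
    kept.unshift(rest[i]);
  }
  return [...system, ...kept];
}
```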

Stage 3: Local Inference Execution

User Action: Submit prompt to model via chat interface.

Process:

  • Prompt preprocessing: templates applied (system prompts, formatting)
  • Context assembly: conversation history and relevant workspace documents injected
  • Local inference via the Ollama API: the prompt goes to a local llama.cpp server (no network transmission)
  • GPU acceleration: CUDA kernels for NVIDIA, Metal Performance Shaders for Apple, CPU fallback with SIMD optimizations
  • Streaming response: tokens generated incrementally and displayed in real time
  • Performance tracking: latency, tokens/second, and memory usage measured

Output: Model response streamed to UI, stored in encrypted local database, zero data leaves device.

Technical Stack: Ollama REST API, llama.cpp inference engine, WebSocket streaming for UI updates, Rust backend for IPC (inter-process communication).
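The streaming step above can be illustrated by how a client assembles Ollama's output. The local server emits one JSON object per line; per Ollama's documented API, each object carries a `response` token fragment and a `done` flag, with the final object also carrying timing statistics. Treat the exact field names as an assumption of this sketch.

```typescript
// Sketch of assembling Ollama's newline-delimited JSON stream into text,
// as the UI would while rendering tokens incrementally.
interface StreamChunk { response?: string; done: boolean; }

function assembleStream(ndjson: string): string {
  let text = "";
  for (const line of ndjson.split("\n")) {
    if (!line.trim()) continue;
    const chunk = JSON.parse(line) as StreamChunk;
    if (chunk.response) text += chunk.response; // append each token fragment
    if (chunk.done) break;                      // final object ends the stream
  }
  return text;
}

const sample =
  '{"response":"Hello","done":false}\n' +
  '{"response":", world","done":false}\n' +
  '{"done":true}\n';
console.log(assembleStream(sample)); // "Hello, world"
```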

Stage 4: Tool Integration via MCP Protocol

User Action: Enable MCP tools (code editor integration, database connectors, API clients).

Process:

  • MCP server registration: external tools registered under a permissions model (read/write access defined)
  • Tool invocation: the LLM generates structured tool calls, validated against JSON schemas
  • Local execution: tools run on-device in a sandbox (no external network unless explicitly authorized)
  • Result injection: tool output returned to the model for the next reasoning step
  • Audit logging: every invocation logged with timestamp, parameters, and result

Output: Enhanced AI workflows with access to local files, databases, APIs—all executed on-device.

Technical Stack: MCP protocol implementation (Rust), JSON schema validation, sandboxed subprocess execution, audit logs in encrypted SQLite.
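The validation step above can be sketched as a minimal check of a model-generated tool call against its registered schema before sandboxed execution. This is an illustration under assumed names, not the MCP implementation itself; a real system would validate full JSON Schema, not just required parameters.

```typescript
// Hypothetical sketch: reject tool calls that name an unregistered tool
// or omit required parameters, before anything is executed.
interface ToolSchema {
  name: string;
  required: string[];                // parameter names that must be present
  permissions: ("read" | "write")[]; // access granted at registration time
}

interface ToolCall { name: string; params: Record<string, unknown>; }

function validateCall(call: ToolCall, registry: Map<string, ToolSchema>): string[] {
  const schema = registry.get(call.name);
  if (!schema) return [`unknown tool: ${call.name}`];
  return schema.required
    .filter(p => !(p in call.params))
    .map(p => `missing required parameter: ${p}`);
}
```

Only a call that returns zero errors would proceed to the sandbox; every attempt, valid or not, would still be written to the audit log.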

Stage 5: Export & Sync (Optional)

User Action: Export conversation/project or enable cross-device sync.

Process:

  • Export: PDF, Word, or Markdown generated from the conversation with formatting preserved
  • Encryption: if sync is enabled, all changes are end-to-end encrypted before leaving the device
  • Sync mechanism: changes propagate as CRDTs (Conflict-free Replicated Data Types) to a self-hosted sync server or a VoidCat-managed option (user's choice)
  • Decryption: only on authorized devices holding the user's encryption key (zero-knowledge architecture)
  • Conflict resolution: CRDT merges guarantee eventual consistency without data loss

Output: Exported artifacts or synchronized workspace across devices, privacy maintained throughout.

Technical Stack: Yrs CRDT library, ChaCha20-Poly1305 encryption, optional sync via WebSocket, export via pandoc integration.
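The Yrs library implements full document CRDTs; as a stand-in illustration of why CRDT merges cannot lose data, here is a G-counter (grow-only counter), the simplest CRDT. Each device increments only its own slot, and merging takes per-slot maxima, so replicas converge to the same value regardless of sync order. This is a teaching sketch, not the product's sync format.

```typescript
// G-counter CRDT sketch: per-device slots, merge by per-slot maximum.
type GCounter = Record<string, number>;

function increment(c: GCounter, device: string): GCounter {
  return { ...c, [device]: (c[device] ?? 0) + 1 };
}

function merge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [device, n] of Object.entries(b)) {
    out[device] = Math.max(out[device] ?? 0, n); // per-slot max: no lost updates
  }
  return out;
}

function value(c: GCounter): number {
  return Object.values(c).reduce((s, n) => s + n, 0);
}
```

Because merge is commutative and idempotent, a laptop and a desktop that edit offline and later sync in either order arrive at the same state, which is the property the sync layer above relies on.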

🔒 Privacy Architecture Guarantees

Zero Cloud Dependency: All inference runs locally via Ollama/llama.cpp—no API calls to external services.
Encrypted Storage: All data encrypted at rest (AES-256) and in transit (if sync enabled, E2E encrypted).
No Telemetry: Zero usage tracking; crash reporting is optional, fully anonymized, and requires user consent.
Audit Trail: Complete logs of all operations (model invocations, tool usage, data access) stored locally for compliance review.
Open Source Core: Inference engine (llama.cpp) and runtime (Tauri) are open source and auditable.

Core Features

  • Model Selection
    One-click install of 50+ open models; automatic downloading, caching, and optimization.
  • Performance Tuning
    GPU acceleration, batching, quantization; automatic fallback for hardware constraints.
  • Workspace Management
    Projects, conversations, tags, search; organize and retrieve work across sessions.
  • Security & Privacy
    Encrypted storage, local-only execution, audit trails, compliance-ready.
  • Tool Integration
    MCP protocol support; connect to code editors, APIs, databases, external services.
  • Cross-Platform
    Mac, Windows, Linux; native performance; unified experience across desktop OS.

Technical Architecture

Component Technology
Desktop Runtime Tauri (Rust backend + webview frontend)
Frontend SvelteKit with TypeScript; responsive UI
Model Runtime Ollama + llama.cpp; GPU support (CUDA, Metal, Vulkan)
Storage SQLite (local), encrypted filesystem, content-addressable storage
Sync (Optional) End-to-end encrypted sync via CRDTs; self-hosted or managed option
Compliance Audit logging, no telemetry, local-only processing, SOC2-ready

Use Cases & Customer Profiles

Regulated Industries

Healthcare, finance, legal professionals needing HIPAA/PCI/SOX compliance. Local inference = no data transmission.

Researchers & Scientists

Run models on proprietary datasets without uploading to cloud. Full reproducibility, version control.

Individual Privacy Champions

Founders, consultants, creatives protecting intellectual property. No vendor lock-in, full control.

Enterprise Security Teams

Internal deployments on secure networks. Airgapped systems, compliance auditing, centralized management.

Supported Models

Open Foundation

  • Llama 2/3 (7B, 13B, 70B variants)
  • Mistral 7B
  • Phi-2/3

Specialized

  • Code models (StarCoder, CodeLlama)
  • Math models (MetaMath, Llemma)
  • Domain-specific fine-tunes

Performance Tiers

  • Fast: 7B models (4GB+ RAM)
  • Balanced: 13B models (8GB+ RAM)
  • Powerful: 70B+ (24GB+ VRAM)
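These tiers follow a rough rule of thumb, sketched below as an assumption rather than a product spec: weight memory is approximately parameter count times bits per weight divided by 8, before KV-cache and activation overhead. Q4_K_M averages roughly 4.5 bits per weight.

```typescript
// Rough estimate of quantized weight memory in GB (excludes KV cache
// and activations, so treat results as a floor, not a requirement).
function approxWeightGb(paramsBillions: number, bitsPerWeight: number): number {
  return (paramsBillions * 1e9 * bitsPerWeight) / 8 / 1e9;
}

console.log(approxWeightGb(7, 4.5).toFixed(1));  // "3.9" — fits the 4 GB+ Fast tier
console.log(approxWeightGb(70, 4.5).toFixed(1)); // "39.4" — why 70B+ needs high-VRAM or multi-GPU setups
```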

Pricing & Distribution

  • Free Tier: Personal use, open source, community support. Model library included.
  • Pro ($49/year): Priority support, premium models, cloud sync, professional themes.
  • Team/Enterprise (Custom): Site licensing, compliance packages, white-label, managed deployment.

Key Differentiators

  • Privacy by Design: Zero data transmission; local storage encrypted; audit-ready architecture.
  • Enterprise Ready: Professional workspace management, team collaboration, compliance support.
  • Performance Optimized: GPU acceleration automatic; quantization available; benchmark-leading latency.
  • Tool Ecosystem: MCP integration enables connection to IDEs, APIs, databases, external services.
  • Cross-Platform Native: Mac, Windows, Linux with native performance; not web-wrapped.
  • Model Flexibility: Swap models instantly; no vendor lock-in; use open source or custom models.

Development Status & Roadmap

  • Q1 2026: Private beta with 500 founders; Mac/Windows native apps; model marketplace beta.
  • Q2 2026: Public beta; Linux support; team collaboration features; cloud sync (optional).
  • Q3 2026: Enterprise tier launch; compliance certifications (SOC2, HIPAA); white-label option.
  • Q4 2026: v1.0 production; ecosystem partnerships; mobile companion app.

Get Started

Forbidden Library is in private beta. Early access available for founders, researchers, and privacy-focused organizations.

Request Early Access