Tags: AI/ML, Electron, Ollama, Local-First AI, TypeScript

Building a Production-Grade LLM Orchestration Layer with Electron, React & Ollama


The Problem with Cloud-Dependent AI

Most AI tools today require API keys, internet connectivity, and monthly subscriptions. For developers working with sensitive codebases, compliance-heavy industries, or simply those who value privacy, this dependency is a non-starter. Gravity OS was built to solve exactly this — a local-first AI operating layer that runs entirely offline.

Architecture Overview

Gravity OS sits between the user and multiple local LLMs managed by Ollama. The architecture follows a layered design:

┌─────────────────────────────┐
│     Electron Shell (UI)     │
├─────────────────────────────┤
│   Agent Orchestration Core  │
├─────────────────────────────┤
│   Model Abstraction Layer   │
├─────────────────────────────┤
│         Ollama API          │
├─────────────────────────────┤
│    Local LLM Models (7B-70B)│
└─────────────────────────────┘

Key Components

Electron Shell: Provides the desktop environment, system tray integration, and native OS features like file system access and notifications.

React Frontend: Built with a component-based architecture for chat interfaces, agent management, and real-time streaming of model responses.

Agent Orchestration Core: The brain of the system. Manages agent definitions, task routing, and collaborative workflows between multiple models.

Model Abstraction Layer: A unified interface over Ollama's API that handles prompt formatting, context window management, and response parsing across different model architectures.
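As a rough illustration of what such a unified interface might look like, the sketch below maps a model-agnostic request onto the JSON body of Ollama's `POST /api/generate` endpoint. The `GenerateRequest` type and the `toOllamaBody` function are hypothetical names invented for this example, not Gravity OS's actual code; the body fields (`model`, `prompt`, `system`, `stream`, `options.num_ctx`) are a subset of what Ollama accepts.

```typescript
// Hypothetical model-agnostic request used by the rest of the app.
interface GenerateRequest {
  model: string; // e.g. "mistral:7b"
  systemPrompt: string;
  userPrompt: string;
  contextTokens: number; // desired context window size in tokens
}

// Subset of the JSON body accepted by Ollama's POST /api/generate endpoint.
interface OllamaGenerateBody {
  model: string;
  prompt: string;
  system: string;
  stream: boolean;
  options: { num_ctx: number };
}

// Translate the app-level request into an Ollama request body.
function toOllamaBody(req: GenerateRequest): OllamaGenerateBody {
  return {
    model: req.model,
    prompt: req.userPrompt,
    system: req.systemPrompt,
    stream: true, // always stream so tokens can be shown as they arrive
    options: { num_ctx: req.contextTokens },
  };
}

const body = toOllamaBody({
  model: "mistral:7b",
  systemPrompt: "You are a technical writer...",
  userPrompt: "Summarize this README.",
  contextTokens: 4096,
});
```

Keeping the Ollama-specific field names inside one function means the rest of the app only ever sees `GenerateRequest`, which is one way to keep the layer swappable.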

Agent Collaboration Patterns

One of Gravity OS's standout features is multi-agent collaboration. Rather than relying on a single monolithic model, tasks are decomposed and routed to specialized agents:

interface Agent {
  name: string;
  model: string;
  systemPrompt: string;
  capabilities: string[];
}

const agents: Agent[] = [
  {
    name: "Architect",
    model: "codellama:34b",
    systemPrompt: "You are a software architect...",
    capabilities: ["design", "code-review"],
  },
  {
    name: "Writer",
    model: "mistral:7b",
    systemPrompt: "You are a technical writer...",
    capabilities: ["documentation", "editing"],
  },
];

Tasks are routed based on capability matching, and results are aggregated through a reducer pattern that synthesizes outputs into coherent responses.
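A minimal sketch of capability matching and a reducer, assuming the `Agent` shape shown above. The `AgentResult` type and the functions `routeByCapability` and `reduceResults` are hypothetical, and the reducer here only concatenates labelled outputs; a real synthesis step would likely hand them to another model.

```typescript
interface Agent {
  name: string;
  model: string;
  systemPrompt: string;
  capabilities: string[];
}

// Hypothetical shape for one agent's contribution to a task.
interface AgentResult {
  agent: string;
  output: string;
}

// Return every agent that declares the requested capability.
function routeByCapability(agents: Agent[], capability: string): Agent[] {
  return agents.filter((a) => a.capabilities.includes(capability));
}

// Fold individual results into one response, labelling each contribution.
function reduceResults(results: AgentResult[]): string {
  return results.reduce(
    (acc, r) => (acc ? `${acc}\n\n` : "") + `[${r.agent}] ${r.output}`,
    "",
  );
}

const demoAgents: Agent[] = [
  { name: "Architect", model: "codellama:34b", systemPrompt: "...", capabilities: ["design", "code-review"] },
  { name: "Writer", model: "mistral:7b", systemPrompt: "...", capabilities: ["documentation", "editing"] },
];

// Only the Architect declares "code-review", so it is the sole match.
const reviewers = routeByCapability(demoAgents, "code-review");
```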

Streaming & Performance

Real-time token streaming was critical. Ollama's streaming API returns newline-delimited JSON chunks, and each token is pushed through Electron's IPC bridge to React's state management:

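// IPC bridge for streaming (runs in Electron's main process)
import { ipcMain } from "electron";
import ollama from "ollama"; // assumes the official "ollama" npm client

ipcMain.handle("ollama:generate", async (event, { model, prompt }) => {
  // With stream: true, generate() resolves to an async iterable of partial responses.
  const stream = await ollama.generate({ model, prompt, stream: true });
  for await (const chunk of stream) {
    // Forward each generated fragment to the renderer that made the request.
    event.sender.send("ollama:token", chunk.response);
  }
});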

On the frontend, tokens are buffered and rendered with a typewriter effect, using React's concurrent rendering to keep UI updates smooth even during heavy generation.
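The buffering step can be sketched independently of React. The `TokenBuffer` class below is a hypothetical helper, not the app's actual code: it collects incoming tokens and hands them to a callback in batches, so a state update fires once per batch rather than once per token. The batch size of 3 is arbitrary.

```typescript
// Hypothetical helper: collects streamed tokens and flushes them in batches.
class TokenBuffer {
  private pending: string[] = [];
  private readonly flushSize: number;
  private readonly onFlush: (text: string) => void;

  constructor(flushSize: number, onFlush: (text: string) => void) {
    this.flushSize = flushSize;
    this.onFlush = onFlush;
  }

  // Queue one token; flush automatically once the batch is full.
  push(token: string): void {
    this.pending.push(token);
    if (this.pending.length >= this.flushSize) this.flush();
  }

  // Emit whatever is queued (call this when the stream ends).
  flush(): void {
    if (this.pending.length === 0) return;
    this.onFlush(this.pending.join(""));
    this.pending = [];
  }
}

// Usage: in a component, onFlush would append to state instead of an array.
const rendered: string[] = [];
const buffer = new TokenBuffer(3, (text) => rendered.push(text));
for (const t of ["Hel", "lo", ",", " wor", "ld"]) buffer.push(t);
buffer.flush(); // drain the final partial batch
```

In a renderer process, `push` would be called from the `ollama:token` listener, with `flush` called once generation finishes.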

Context Window Management

LLMs have finite context windows. Gravity OS implements a sliding window with semantic compression:

1. Token counting: Estimate token usage before sending
2. Semantic chunking: Split conversation history into meaningful segments
3. Summary injection: Compress older context into summaries
4. Dynamic windowing: Adjust context size based on available VRAM
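The first three steps can be sketched as follows. Everything here is an assumption for illustration: the four-characters-per-token estimate is a rough heuristic rather than a real tokenizer, `buildContext` is a hypothetical name, and the `summarize` callback stands in for whatever produces the summary (most likely another model call). VRAM-based window sizing is not shown, and the summary's own tokens are not counted against the budget.

```typescript
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough heuristic (assumption): about 4 characters per token for English text.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the newest messages that fit the budget; replace the rest with a summary.
function buildContext(
  history: Message[],
  maxTokens: number,
  summarize: (older: Message[]) => string,
): Message[] {
  const kept: Message[] = [];
  let used = 0;
  let i = history.length - 1;
  // Walk backwards from the newest message until the budget is exhausted.
  for (; i >= 0; i--) {
    const cost = estimateTokens(history[i].content);
    if (used + cost > maxTokens) break;
    kept.unshift(history[i]);
    used += cost;
  }
  // Everything at index <= i did not fit and gets compressed.
  const older = history.slice(0, i + 1);
  if (older.length === 0) return kept;
  return [{ role: "system", content: summarize(older) }, ...kept];
}

const demoHistory: Message[] = [
  { role: "user", content: "a".repeat(40) }, // ~10 tokens
  { role: "assistant", content: "b".repeat(40) }, // ~10 tokens
  { role: "user", content: "c".repeat(40) }, // ~10 tokens
];

// With a 20-token budget, the two newest messages fit and the oldest is summarized.
const context = buildContext(
  demoHistory,
  20,
  (older) => `Summary of ${older.length} earlier message(s)`,
);
```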

Security & Privacy

Since everything runs locally, data never leaves the machine. Encryption at rest is applied to conversation history using Node.js's native crypto module. The app sandboxes model execution and prevents any network calls from the agent process.
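A minimal sketch of encryption at rest with Node's `crypto` module. The choice of AES-256-GCM, the `EncryptedRecord` shape, and the function names are assumptions made for this example, since the exact scheme used by the app is not specified; the demo key is derived from a hard-coded passphrase purely so the snippet runs, and real key storage is out of scope.

```typescript
import { createCipheriv, createDecipheriv, randomBytes, scryptSync } from "node:crypto";

// Hypothetical on-disk shape: all fields base64-encoded.
interface EncryptedRecord {
  iv: string;
  tag: string;
  data: string;
}

function encryptHistory(plaintext: string, key: Buffer): EncryptedRecord {
  const iv = randomBytes(12); // fresh 96-bit nonce per record, as recommended for GCM
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"), // detects tampering on decrypt
    data: data.toString("base64"),
  };
}

function decryptHistory(record: EncryptedRecord, key: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(record.iv, "base64"));
  decipher.setAuthTag(Buffer.from(record.tag, "base64"));
  const plain = Buffer.concat([
    decipher.update(Buffer.from(record.data, "base64")),
    decipher.final(), // throws if the auth tag does not match
  ]);
  return plain.toString("utf8");
}

// Demo only: derive a 32-byte key from a passphrase.
const key = scryptSync("demo passphrase", "demo salt", 32);
const original = JSON.stringify([{ role: "user", content: "hi" }]);
const record = encryptHistory(original, key);
const roundTrip = decryptHistory(record, key);
```

An authenticated mode such as GCM is used here so that a modified history file fails to decrypt instead of silently yielding corrupted text.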

Lessons Learned

Building Gravity OS taught me that local AI is not just feasible — it's often superior for development workflows. The latency of local inference (even on consumer hardware) is offset by zero network overhead and unlimited token budgets. The key was designing the orchestration layer to be model-agnostic, allowing users to swap between Mistral, Llama, CodeLlama, and others without changing their workflow.