Multi-model chat, pipelines, templates, and a CLI tools hub, all powered by local Ollama models. No cloud dependency. No subscriptions. No data leaving your machine.
Replace a dozen separate tools. Chat, pipelines, templates, CLI, and REST API — all in one pip install.
Stream responses from any Ollama model or cloud API. Switch models mid-conversation. Full markdown + code highlighting.
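Under the hood, token streaming from a local model is just newline-delimited JSON. Here is a minimal sketch against Ollama's `/api/generate` endpoint (not Cortex's internal code); the port is Ollama's default and the model name is an assumption:

```python
import json
import requests

def stream(prompt: str, model: str = "llama3"):
    """Yield response tokens from a local Ollama model as they arrive."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=120,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # Ollama streams one JSON object per line
        yield chunk.get("response", "")
        if chunk.get("done"):
            break

for token in stream("Explain SQLite WAL mode in one sentence."):
    print(token, end="", flush=True)
print()
```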
Chain AI calls: summarize → translate → critique. Build in the visual UI or define in code. Run with one CLI command.
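The chaining pattern itself is a few lines of Python. This sketch goes straight at Ollama's `/api/generate` endpoint to show the idea; it is not Cortex's pipeline format, and the model name and report text are placeholders:

```python
import requests

OLLAMA = "http://localhost:11434/api/generate"  # Ollama's default address

def ask(prompt: str, model: str = "llama3") -> str:
    """One non-streaming completion; the model name is an assumption."""
    r = requests.post(
        OLLAMA,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    r.raise_for_status()
    return r.json()["response"]

# summarize -> translate -> critique, each step feeding the next
report = "Q3 revenue rose 12% while support tickets doubled ..."
summary = ask(f"Summarize this in two sentences:\n\n{report}")
spanish = ask(f"Translate to Spanish:\n\n{summary}")
critique = ask(f"Critique this translation for accuracy:\n\n{spanish}")
print(critique)
```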
6 built-in templates with {{variable}} substitution. Code Review, Debug, Translate, Summarize, and more. Create your own.
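`{{variable}}` substitution is plain string templating. A minimal sketch of the idea; the `render` helper and template text below are invented for illustration, not Cortex's implementation:

```python
import re

def render(template: str, **variables: str) -> str:
    # Replace every {{name}} with its value; leave unknown names intact.
    return re.sub(
        r"\{\{(\w+)\}\}",
        lambda m: variables.get(m.group(1), m.group(0)),
        template,
    )

code_review = "Review this {{language}} code for bugs and style:\n\n{{code}}"
print(render(code_review, language="Python", code="def f(): return 1/0"))
```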
Quick queries, stdin piping, pipeline runner. Works in shell scripts: `echo "code" | cortex ask "review this"`
Track token usage, latency, and request history per model. See estimated savings vs. cloud API costs.
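The savings figure is simple arithmetic over logged token counts. A hedged sketch; the per-million-token prices below are placeholders, not Cortex's actual rate table:

```python
# Hypothetical rate table: dollars per million tokens on a metered cloud API.
CLOUD_PRICE = {"input": 3.00, "output": 15.00}  # placeholder prices

def estimated_savings(input_tokens: int, output_tokens: int) -> float:
    """What the same traffic would have cost on a cloud alternative."""
    return (input_tokens * CLOUD_PRICE["input"]
            + output_tokens * CLOUD_PRICE["output"]) / 1_000_000

print(f"${estimated_savings(2_400_000, 900_000):.2f} saved this month")
```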
Add custom providers in one Python class. Add routers with FastAPI. The codebase is clean and designed to be extended.
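A hedged sketch of what that can look like. The `EchoProvider` shape is an assumption (Cortex's real provider base class may differ); the router half uses FastAPI's standard `APIRouter`:

```python
from fastapi import APIRouter

class EchoProvider:
    """Hypothetical provider shape -- Cortex's real base class may differ."""

    name = "echo"

    def complete(self, prompt: str, model: str) -> str:
        # A real provider would call out to a local or remote API here.
        return f"[{model}] {prompt}"

router = APIRouter(prefix="/echo", tags=["echo"])

@router.post("/complete")
def complete(prompt: str, model: str = "echo-1") -> dict:
    return {"response": EchoProvider().complete(prompt, model)}

# Mounted with FastAPI's standard mechanism: app.include_router(router)
```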
21 documented endpoints. Use Cortex as a backend for your own apps. Auto-generated OpenAPI docs at /docs.
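Because the docs come from FastAPI, the raw schema is also served at `/openapi.json`, so you can enumerate every endpoint programmatically. The host and port below are assumptions; use whatever Cortex prints at startup:

```python
import requests

# FastAPI serves the raw OpenAPI schema alongside the /docs UI.
schema = requests.get("http://localhost:8000/openapi.json", timeout=10).json()

for path, methods in sorted(schema["paths"].items()):
    for verb in methods:
        print(f"{verb.upper():6} {path}")
```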
All data stored locally in SQLite. No telemetry, no accounts required. Your conversations stay on your machine.
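That also means you can inspect your own history with nothing but the standard library. The database path below is a guess; point it at wherever Cortex keeps its file:

```python
import sqlite3

# Path is an assumption -- substitute Cortex's actual database location.
db = sqlite3.connect("cortex.db")
tables = db.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
print([t[0] for t in tables])
db.close()
```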
Auto-detects Ollama. Browser opens automatically. Pre-loaded with templates and pipelines. Ready in 30 seconds.
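Detection can be as simple as probing Ollama's `/api/tags` endpoint, which lists installed models. A sketch of the idea, not necessarily Cortex's exact logic:

```python
import requests

def detect_ollama(base: str = "http://localhost:11434") -> list[str]:
    """Return installed model names, or [] if Ollama isn't running."""
    try:
        r = requests.get(f"{base}/api/tags", timeout=2)
        r.raise_for_status()
        return [m["name"] for m in r.json().get("models", [])]
    except requests.RequestException:
        return []

print(detect_ollama() or "Ollama not detected")
```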
Unified interface for ALL your CLI AI tools — Claude Code, Aider, and custom tools. Side-by-side comparison mode. Shared context across tools.
A FastAPI server proxies requests to Ollama or cloud APIs, stores history in SQLite, and serves a React UI — all in one pip install.
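In miniature, that loop looks something like this. A hedged sketch, not Cortex's actual source; the `/api/chat` path, default model, and database filename are illustrative:

```python
import sqlite3

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
db = sqlite3.connect("history.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS history (prompt TEXT, response TEXT)")

class ChatRequest(BaseModel):
    model: str = "llama3"  # assumed default model name
    prompt: str

@app.post("/api/chat")  # illustrative path, not necessarily Cortex's
async def chat(req: ChatRequest) -> dict:
    # Forward the prompt to the local Ollama server ...
    async with httpx.AsyncClient(timeout=120) as client:
        r = await client.post(
            "http://localhost:11434/api/generate",
            json={"model": req.model, "prompt": req.prompt, "stream": False},
        )
    r.raise_for_status()
    answer = r.json()["response"]
    # ... and log the exchange locally before returning it.
    db.execute("INSERT INTO history VALUES (?, ?)", (req.prompt, answer))
    db.commit()
    return {"response": answer}
```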
One pip install. Cortex auto-detects Ollama and any configured API keys.
Choose from local Ollama models or cloud APIs — all in the same dropdown.
Create pipelines and templates. Run from the UI, CLI, or REST API.
Dashboard shows usage, latency, and savings vs. cloud alternatives.
Local models via Ollama or cloud APIs — configured once, available everywhere.
Open source, self-hosted, privacy-first. No subscriptions, and your data never leaves your machine.