Cohere open-sources coding agent that runs on single H100

Jun 10, 2026

Cohere released North Mini Code, a 30-billion-parameter open-source coding agent that runs on a single H100 GPU, offering engineering teams an alternative to proprietary models like Claude Fable 5. The model is purpose-built for agentic software engineering tasks, including code review, architecture mapping, and terminal work. In testing, however, it generated roughly three times as many output tokens as comparable models, a verbosity cost that adds up in high-volume production scenarios.

TL;DR

North Mini Code is a 30B-parameter mixture-of-experts model with 3B active parameters per token, available under the Apache 2.0 license on Hugging Face
It supports a 256,000-token context window and a 64,000-token maximum generation length, and was trained specifically for agentic coding tasks across three different agent scaffolds
It achieves 2.8x higher output throughput and a 30% inter-token latency advantage over Mistral Devstral Small 2 in internal testing, but generates 75 million output tokens versus a 25 million median on Artificial Analysis benchmarks
It can run on a Mac Studio with 20GB of RAM via MLX, making it accessible for local development alongside cloud deployment options
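
To make the hardware claims above concrete, here is a rough back-of-the-envelope sketch of weight memory at different precisions. The precisions listed are assumptions for illustration; the article does not say which quantization the H100 or MLX deployments use. The sketch also assumes that all 30B parameters stay resident in memory even though only 3B are active per token, which is typical for mixture-of-experts inference, and it ignores KV cache and runtime overhead.

```python
# Rough weight-memory estimate: parameters x bytes per parameter.
# Ignores KV cache, activations, and runtime overhead (assumption).

TOTAL_PARAMS = 30e9  # total parameter count, from the article

# Hypothetical precisions; the article does not state which one is used.
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {
    name: weight_gb(TOTAL_PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:.0f} GB of weights")
```

Under these assumptions, 16-bit weights come to about 60 GB, which would fit on an 80 GB H100, and 4-bit weights come to about 15 GB, which is in line with the 20GB figure quoted for the Mac Studio.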

Why It Matters

Open-source coding agents reduce vendor lock-in and lower deployment costs for teams building agentic software engineering pipelines. Because North Mini Code runs on a single H100, or even a Mac Studio, it widens access to capable coding agents that were previously limited to managed API services. However, the verbosity tradeoff means teams must evaluate token costs carefully in production workloads.

Business Impact

Engineering teams can now deploy a capable coding agent without relying on Claude Fable 5 or other proprietary models, which can reduce per-token costs and improve data privacy. The model's ability to run on a single H100 or local hardware makes it economically viable for organizations of various sizes, though the higher output token count means actual inference costs must be calculated carefully against alternatives.
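
The cost calculation mentioned above can be sketched as a simple break-even check. The token counts are the 75 million and 25 million figures reported in the article; the price is a purely hypothetical placeholder, and the sketch counts output tokens only, ignoring input tokens, hardware amortization, and operations overhead.

```python
# Break-even sketch: how cheap must each output token be to offset
# generating more of them? All prices below are hypothetical.

VERBOSE_TOKENS_M = 75.0  # millions of output tokens, from the article
MEDIAN_TOKENS_M = 25.0   # benchmark median in millions, from the article


def run_cost(output_tokens_millions: float, price_per_million: float) -> float:
    """Cost of a workload, counting output tokens only."""
    return output_tokens_millions * price_per_million


multiplier = VERBOSE_TOKENS_M / MEDIAN_TOKENS_M

# Hypothetical price of an alternative model, in dollars per million
# output tokens. This is not a real quote from any vendor.
alt_price = 3.00

# Price per million tokens at which both workloads cost the same.
break_even_price = alt_price / multiplier

print(f"Output token multiplier: {multiplier:.1f}x")
print(f"Break-even price: ${break_even_price:.2f} per million output tokens")
```

In other words, with a 3x output multiplier the effective per-token cost of running North Mini Code would need to be below one third of the alternative's price before the total bill comes out lower.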

Key Implications

Open-source coding agents are becoming competitive with proprietary offerings on speed and capability, shifting the competitive landscape away from pure API-based models
The verbosity penalty (3x output tokens) suggests that efficiency gains from sparse MoE architecture come with a tradeoff in model conciseness, requiring teams to model total cost of ownership rather than parameter count alone
Multi-harness training across different agent scaffolds (SWE-Agent, Mini-SWE-Agent, OpenCode) appears to improve generalization, potentially becoming a standard practice for agentic model development
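
One way to see how the throughput gain and the verbosity penalty interact is to divide one by the other. This is only an illustrative sketch: the 2.8x throughput figure is measured against Mistral Devstral Small 2 in internal testing, while the 3x token figure is measured against a benchmark median, so the two baselines are not the same and the result should not be read as a measured outcome.

```python
# Illustrative only: combines two figures from different comparisons.

THROUGHPUT_GAIN = 2.8      # output throughput vs. Mistral Devstral Small 2
TOKEN_MULTIPLIER = 75 / 25  # output tokens vs. the benchmark median


def relative_generation_time(token_multiplier: float, throughput_gain: float) -> float:
    """Time to generate a full response relative to a baseline model.

    Assumes generation time scales as tokens divided by throughput.
    """
    return token_multiplier / throughput_gain


ratio = relative_generation_time(TOKEN_MULTIPLIER, THROUGHPUT_GAIN)
print(f"Relative generation time: {ratio:.2f}x")
```

Under that simplifying assumption the ratio is about 1.07, meaning the extra tokens would roughly cancel the throughput advantage in wall-clock terms.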

What to Watch

Monitor whether the verbosity penalty becomes a limiting factor in production deployments and whether other vendors respond with similarly efficient open-source alternatives. Track adoption patterns across different deployment scenarios (local, single GPU, cloud) to understand where North Mini Code gains traction. Watch for improvements in token efficiency in future versions, as the 3x output token multiplier is a significant cost driver in high-volume agentic pipelines.

