VFF - The signal in the noise

Cohere open-sources coding agent that runs on single H100


Cohere released North Mini Code, a 30 billion parameter open-source coding agent that runs on a single H100 GPU, offering engineering teams an alternative to proprietary models like Claude Fable 5. The model is purpose-built for agentic software engineering tasks including code review, architecture mapping, and terminal work, but generates roughly three times more output tokens than comparable models in testing, creating a verbosity cost in high-volume production scenarios.

  • North Mini Code is a 30B parameter mixture-of-experts model with 3B active parameters per token, available under the Apache 2.0 license on Hugging Face
  • Supports a 256,000-token context window and a 64,000-token maximum generation length, and was trained specifically for agentic coding tasks across three different agent scaffolds
  • Achieves 2.8x higher output throughput and a 30% inter-token latency advantage over Mistral Devstral Small 2 in internal testing, but generates 75 million output tokens versus a median of 25 million on Artificial Analysis benchmarks
  • Can run on a Mac Studio with 20 GB of RAM via MLX, making it accessible for local development alongside cloud deployment options
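
As a rough sanity check on the hardware claims above, the sketch below estimates memory for the model weights alone at a few precisions. The parameter counts come from the article; the bit widths are illustrative assumptions, since the article does not say how the model is quantized for the H100 or MLX deployments, and the estimate ignores the KV cache and activations.

```python
# Back-of-the-envelope weight memory for a 30B-parameter model.
# The bit widths below are illustrative assumptions, not published
# details of how North Mini Code is quantized or served.

TOTAL_PARAMS = 30e9   # total parameters (from the article)
ACTIVE_PARAMS = 3e9   # parameters active per token (from the article)


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.0f} GB")

# Fraction of the weights that take part in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per token: {active_fraction:.0%}")
```

Under these assumptions, 16-bit weights come to about 60 GB, which fits on an 80 GB H100, and 4-bit weights come to about 15 GB, which is consistent with the 20 GB Mac Studio figure. All 30B parameters still have to be resident in memory even though only about a tenth of them are active for any one token.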

Open-source coding agents reduce vendor lock-in and lower deployment costs for teams building agentic software engineering pipelines. Because North Mini Code runs on modest hardware (a single H100, or even a Mac Studio), it broadens access to capable coding agents that were previously limited to managed API services. However, the verbosity tradeoff means teams must evaluate token costs carefully in production workloads.

Engineering teams can now deploy a capable coding agent without relying on Claude Fable 5 or other proprietary models, reducing per-token costs and improving data privacy. The model's ability to run on a single H100 or local hardware makes it economically viable for organizations of varying size, though the higher output token count means actual inference costs require careful calculation against alternatives.

  • Open-source coding agents are becoming competitive with proprietary offerings on speed and capability, shifting the competitive landscape away from pure API-based models
  • The verbosity penalty (3x output tokens) suggests that efficiency gains from sparse MoE architecture come with a tradeoff in model conciseness, requiring teams to model total cost of ownership rather than parameter count alone
  • Multi-harness training across different agent scaffolds (SWE-Agent, Mini-SWE-Agent, OpenCode) appears to improve generalization, potentially becoming a standard practice for agentic model development
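
The tradeoff between verbosity and throughput can be sketched with simple ratios. This is a minimal illustration, not a benchmark: the 3x token figure and the 2.8x throughput figure come from different comparisons in the article (Artificial Analysis median versus Devstral Small 2 in internal testing), so combining them against a single hypothetical baseline is an assumption, and the function names are made up for this example.

```python
# Rough relative time and cost of a verbose-but-fast model against a
# single hypothetical baseline. Combining the two multipliers is an
# assumption: the article reports them against different reference points.

TOKEN_MULTIPLIER = 75e6 / 25e6   # output tokens relative to baseline (3.0)
THROUGHPUT_MULTIPLIER = 2.8      # output tokens per second relative to baseline


def relative_generation_time(token_mult: float, throughput_mult: float) -> float:
    """Wall-clock generation time relative to the baseline (1.0 = equal)."""
    return token_mult / throughput_mult


def relative_output_cost(token_mult: float, price_ratio: float) -> float:
    """Output-token spend relative to the baseline, given a per-token price ratio."""
    return token_mult * price_ratio


time_ratio = relative_generation_time(TOKEN_MULTIPLIER, THROUGHPUT_MULTIPLIER)
print(f"relative generation time: {time_ratio:.2f}x")

# Per-token price ratio at which output spend equals the baseline's.
break_even_price_ratio = 1 / TOKEN_MULTIPLIER
print(f"break-even per-token price: {break_even_price_ratio:.0%} of baseline")
```

Under these assumptions the throughput advantage roughly cancels the extra tokens, giving about 1.07x the baseline generation time, and the effective per-token price has to fall below about one third of the baseline's before output spend drops. Real workloads should substitute measured token counts and actual hardware or API prices.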

Monitor whether the verbosity penalty becomes a limiting factor in production deployments and whether other vendors respond with similarly efficient open-source alternatives. Track adoption patterns across different deployment scenarios (local, single GPU, cloud) to understand where North Mini Code gains traction. Watch for improvements in token efficiency in future versions, as the 3x output token multiplier is a significant cost driver in high-volume agentic pipelines.

