NewsTrending

Apple's Flash-Based Model Architecture Breaks On-Device Memory Ceiling

Jun 10, 2026 · about 2 hours ago

Apple announced AFM 3, a new foundation model family developed with Google that includes a 20-billion-parameter on-device model storing weights in NAND flash rather than DRAM. The architecture routes expert selection once per prompt instead of per token, allowing larger models to run locally while staying within consumer device memory constraints. This addresses a fundamental limitation that has kept on-device AI models significantly smaller than cloud alternatives.

TL;DR

AFM 3 Core Advanced stores 20B parameters in NAND flash, not DRAM, bypassing the memory ceiling that has limited on-device models
Expert routing happens once per prompt, not per token, because NAND-to-DRAM bandwidth cannot support continuous weight swapping
Active parameter count scales from 1B to 4B based on task complexity, drawn from the full 20B pool in flash storage
Apple developed the architecture with Google and runs server-side models on Nvidia GPUs in Google Cloud within Apple's Private Cloud Compute boundary

Why It Matters

On-device AI has been constrained by DRAM capacity, forcing developers to choose between capable cloud models and limited local ones. Apple's flash-based weight storage and per-prompt routing break this constraint, enabling substantially larger models to run locally. This shifts the practical frontier of what on-device AI agents can accomplish without cloud dependency.

Business Impact

Enterprise architects evaluating agentic workloads now have a third option beyond cloud-dependent or limited on-device models. Larger local models reduce latency, improve privacy, and lower cloud compute costs, but deployment viability depends on undisclosed metrics like energy consumption, thermal behavior, and transparent offloading policies that Apple has not yet published.

Key Implications

On-device model capacity can now scale to 20B parameters, closing the gap with server-side deployments and enabling more complex local reasoning
The per-prompt routing model trades token-level flexibility for memory efficiency, potentially affecting performance on tasks requiring dynamic expert selection across a sequence
Apple's undisclosed offloading behavior and lack of energy or thermal profiling data create uncertainty for enterprises planning production deployments
The architecture depends on NAND flash speed and DRAM bandwidth characteristics specific to Apple silicon, limiting portability to other platforms

What to Watch

Monitor whether Apple publishes energy, thermal, and bandwidth profiling data needed for production deployment decisions. Watch for third-party benchmarks on real-world agentic workloads and whether transparent offloading to cloud becomes visible to developers and users. Track adoption patterns among enterprise customers evaluating on-device versus hybrid inference strategies.

LLMs AI Hardware AI Agents Infrastructure Model Releases

Our Briefing

Weekly signal. No noise. Built for founders, operators, and AI-curious professionals.

No spam. Unsubscribe any time.

Anthropic has released Claude Fable 5, making its Mythos-class model available to the public for the first time. The model includes built-in guardrails that restrict responses in high-risk domains including cybersecurity and biology. This release marks a significant step in bringing advanced AI capabilities to broader audiences while attempting to manage safety concerns.

by Rebecca Bellanabout 1 hour ago· TechCrunch AI

LLMsTrendingNews

Open-Source Search Agent Outperforms GPT-5.4

Researchers from UIUC, UC Berkeley, and Chroma released Harness-1, a 20-billion parameter open-source search agent that scores 73% on information recall benchmarks, outperforming GPT-5.4 (70.9%) and other proprietary models. The model is available under Apache 2.0 license on Hugging Face. Harness-1 achieves its performance by offloading search session management to a structured software environment rather than relying on expanded context windows, suggesting that model efficiency matters more than raw parameter size for autonomous retrieval tasks.

by carl.franzen@venturebeat.com (Carl Franzen)1 day ago· VentureBeat AI

LLMsNews

Microsoft Breaks Free From OpenAI Dependency With Independent AI Push

Mustafa Suleyman, CEO of Microsoft AI, discussed the company's restructured approach to AI development in a podcast interview. Microsoft signed a new contract with OpenAI in October that allows the company to pursue superintelligence independently while continuing to license OpenAI models. Suleyman has assembled a dedicated superintelligence team and built infrastructure to train frontier models, announcing seven new models across multiple modalities at Microsoft Build.

by Nilay Patel2 days ago· The Verge AI

LLMsNews

Microsoft Breaks Free From OpenAI to Build Its Own Superintelligence

Microsoft AI chief Mustafa Suleyman disclosed that a contractual change with OpenAI six months ago freed the company to independently pursue superintelligence using its own researchers, data, and custom silicon. The company announced seven new in-house AI models branded under the MAI family, including a flagship reasoning model and tools for coding, image generation, transcription, and voice synthesis. This marks a strategic shift where Microsoft is building alongside OpenAI rather than relying solely on it, though the company is not abandoning the partnership.

by michael.nunez@venturebeat.com (Michael Nuñez)2 days ago· VentureBeat AI

Apple's Flash-Based Model Architecture Breaks On-Device Memory Ceiling

TL;DR

Why It Matters

Business Impact

Key Implications

What to Watch

Our Briefing

Anthropic Opens Mythos-Class AI to Public With Safety Guardrails

Open-Source Search Agent Outperforms GPT-5.4

Microsoft Breaks Free From OpenAI Dependency With Independent AI Push

Microsoft Breaks Free From OpenAI to Build Its Own Superintelligence

Related stories

Anthropic Opens Mythos-Class AI to Public With Safety Guardrails

Open-Source Search Agent Outperforms GPT-5.4

Microsoft Breaks Free From OpenAI Dependency With Independent AI Push

Microsoft Breaks Free From OpenAI to Build Its Own Superintelligence