NVIDIA Nemotron 3 Ultra Arrives on AWS SageMaker

Dan Ferguson · Jun 5, 2026

AWS has made NVIDIA's Nemotron 3 Ultra model available on Amazon SageMaker JumpStart with one-click deployment. The 550-billion-parameter model uses a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture that activates only 55 billion parameters, or 10% of the total, per forward pass. It is reported to deliver 5x faster inference and up to 30% lower costs for agentic AI workloads. The model supports a context length of up to 1 million tokens and is optimized for the NVFP4 precision format.

TL;DR

NVIDIA Nemotron 3 Ultra is available on day zero on Amazon SageMaker JumpStart with one-click deployment
550B total parameters, with 55B active per forward pass, using a hybrid Transformer-Mamba MoE architecture
Delivers 5x faster inference and up to 30% lower costs for agentic AI tasks, with a context length of up to 1M tokens
Designed for multi-step reasoning workloads including agent orchestration, coding agents, research synthesis, and complex enterprise workflows

Why It Matters

Agentic AI systems require models optimized for long-running, multi-turn interactions where every token and compute cycle compounds costs. Nemotron 3 Ultra's mixture-of-experts architecture addresses this directly by activating only a fraction of its parameters while maintaining coherence across hundreds of reasoning steps, making frontier-level reasoning economically viable for enterprise deployments.
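To make "activating only a fraction of its parameters" concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration only, not NVIDIA's routing scheme: the toy experts, the router scores, and the choice of k=2 are all assumptions made for the example. Only the 550B and 55B parameter counts come from the article.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(
        range(len(router_scores)),
        key=lambda i: router_scores[i],
        reverse=True,
    )
    top = ranked[:k]
    # Renormalize over the selected experts so their weights sum to 1.
    weights = softmax([router_scores[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_scores, k):
    """Evaluate only the selected experts; the others are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy experts: expert n simply multiplies its input by n.
experts = [lambda x, scale=scale: scale * x for scale in range(1, 9)]

# Hypothetical router scores for a single token (one score per expert).
router_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]

selected = route_top_k(router_scores, k=2)
output = moe_forward(10.0, experts, router_scores, k=2)

# Parameter counts quoted in the article.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS  # 10% of the weights per pass
```

In this toy setup, 2 of 8 experts run per token. The article's figures imply a comparable ratio at scale, with about 10% of the model's parameters taking part in any single forward pass.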

Business Impact

Organizations building autonomous agents face significant infrastructure costs due to extended context windows and multi-step reasoning loops. The combination of 5x faster inference, up to 30% lower costs, and one-click deployment on SageMaker lowers both the technical and financial barriers to deploying sophisticated agentic systems for tasks like workflow automation, code generation, and research synthesis.
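A rough back-of-the-envelope sketch shows why multi-step loops get expensive. It assumes each step re-reads the full conversation history with no prompt or KV caching. The token counts and the price per million tokens are made-up placeholders, not AWS or NVIDIA pricing. The 30% figure is the article's "up to" upper bound.

```python
def total_input_tokens(initial_context, tokens_added_per_step, steps):
    """Tokens read across all steps when every step re-reads the whole history."""
    return sum(
        initial_context + step * tokens_added_per_step
        for step in range(steps)
    )


def run_cost(tokens, price_per_million):
    """Cost in the same currency unit as price_per_million."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical placeholder: USD per million input tokens (not a real price).
HYPOTHETICAL_PRICE = 1.00

# Hypothetical agent run: 2,000-token prompt, 1,000 new tokens per step.
tokens = total_input_tokens(
    initial_context=2_000,
    tokens_added_per_step=1_000,
    steps=100,
)
baseline_cost = run_cost(tokens, HYPOTHETICAL_PRICE)

# Apply the article's best-case "up to 30%" cost reduction.
reduced_cost = baseline_cost * (1 - 0.30)
```

Because the context grows at every step, total tokens read scale roughly with the square of the step count. That is why a per-token saving matters more for long agent runs than for single-shot prompts.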

Key Implications

Mixture-of-experts architectures are becoming standard for agentic workloads, shifting the competitive advantage from raw parameter count to efficient parameter activation
AWS is positioning itself as the deployment platform for frontier reasoning models, reducing friction between model development and enterprise production
Cost and speed improvements may accelerate adoption of autonomous agents in enterprise workflows where multi-step reasoning was previously too expensive to justify

What to Watch

Monitor adoption patterns across the four highlighted use cases (agent orchestrators, coding agents, research, enterprise workflows) to understand which agentic applications drive the most value. Watch for competitive responses from other cloud providers and whether other model vendors release similar mixture-of-experts architectures optimized for long-context agentic tasks.


