News

Qwen3.7-Plus Now Available Only Through Alibaba's Proprietary API

Carl Franzen (carl.franzen@venturebeat.com) · Jun 3, 2026

Alibaba released Qwen3.7-Plus, a multimodal AI model that accepts text, image, and video inputs. It is priced at $0.40 per 1M input tokens and $1.60 per 1M output tokens, 60% cheaper than its text-only predecessor, Qwen3.7-Max. The model marks a strategic shift away from Alibaba's open-source focus: it is available only through proprietary APIs under closed commercial licensing. It includes a 1-million-token context window and a 'preserve_thinking' parameter that maintains reasoning state across multi-step tasks, and its pricing makes it cost-competitive with other major AI models.

TL;DR

Qwen3.7-Plus is priced at $0.40 per 1M input tokens and $1.60 per 1M output tokens, making it among the cheapest capable multimodal models available
Model supports text, video, and image inputs, unlike the text-only Qwen3.7-Max it replaces
Available only via proprietary API and Qwen Chat, departing from Alibaba's prior open-source strategy
Features 1-million token context window and 'preserve_thinking' parameter to maintain reasoning continuity in multi-step agent tasks
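Because input and output tokens are billed at different rates, the cost of a request depends on its mix of the two. The sketch below estimates per-request cost, assuming flat rates of $0.40 per 1M input tokens and $1.60 per 1M output tokens as quoted above, with no tiering, caching discounts, or other charges; `estimate_cost_usd` is a hypothetical helper, not part of any Alibaba SDK.

```python
# Minimal cost estimator for Qwen3.7-Plus, ASSUMING the flat rates quoted in
# this article: $0.40 per 1M input tokens and $1.60 per 1M output tokens.
# estimate_cost_usd is a hypothetical helper; actual billing may differ.
from decimal import Decimal

INPUT_USD_PER_M = Decimal("0.40")
OUTPUT_USD_PER_M = Decimal("1.60")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated USD cost of one request (Decimal avoids float rounding)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_M * INPUT_USD_PER_M
    output_cost = Decimal(output_tokens) / TOKENS_PER_M * OUTPUT_USD_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    print(estimate_cost_usd(1_000_000, 1_000_000))  # 2.00
    print(estimate_cost_usd(1_000_000, 2_000))      # 0.40320
```

The $2.00 figure is therefore the cost of 1M input tokens plus 1M output tokens, not a single per-token rate. For a long-context agent call that fills the full 1M-token window and returns 2,000 tokens, the input side accounts for almost all of the roughly $0.40 cost.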

Why It Matters

Alibaba's shift to proprietary licensing for its latest models signals a broader industry trend toward closed commercial offerings for frontier capabilities, even among companies historically committed to open-source release. The low pricing and multimodal support make this a viable option for enterprises running autonomous agents and complex workflows, but the closed model limits adoption among developers and researchers who relied on open-weight alternatives.

Business Impact

For enterprises deploying autonomous agents and multi-step workflows, Qwen3.7-Plus offers cost-effective multimodal processing, along with features such as the 'preserve_thinking' parameter that are designed to keep reasoning state from being lost between steps. However, organizations invested in open-source Qwen models, including major users like Airbnb, must now evaluate proprietary alternatives or remain on older open-weight versions.

Key Implications

Alibaba is abandoning its open-source-first strategy for newer model releases, consolidating advanced capabilities behind proprietary APIs
The 'preserve_thinking' parameter and 1M token context window address a real technical bottleneck in agentic AI systems, but this capability is now locked behind a commercial license
Pricing pressure from competitors like DeepSeek and MiniMax is driving aggressive cost positioning, with Qwen3.7-Plus among the cheapest multimodal options available

What to Watch

Monitor whether Alibaba continues to release open-source variants alongside proprietary models, or if this marks a permanent shift to closed releases for frontier capabilities. Track adoption rates among enterprises currently using open-weight Qwen models and whether the 'preserve_thinking' feature becomes a standard expectation across competing platforms. Watch for competitive pricing responses from OpenAI, Google, and Anthropic in the multimodal segment.

Multimodal LLMs · AI Agents · Generative AI · Model Releases


Google DeepMind Releases Gemma 4 12B for Laptop-Based AI

Google DeepMind introduced Gemma 4 12B, a multimodal AI model designed to run on consumer laptops with 16GB of RAM. The model uses an encoder-free architecture that feeds vision and audio inputs directly into the language model backbone, reducing latency and memory overhead. Its performance approaches that of the larger 26B model while keeping a smaller footprint, and it is released under an Apache 2.0 license.

about 19 hours ago · Google DeepMind

Multimodal · Trending · News

Google Launches Near Real-Time Voice Translation in Gemini 3.5

Google has launched Gemini 3.5 Live Translate, a near real-time speech translation feature now available in Google AI Studio, Google Translate, and Google Meet. The system delivers natural-sounding voice translation with minimal latency. The rollout represents a significant step toward breaking down language barriers in professional and consumer communication.

about 20 hours ago · Google DeepMind

Multimodal · Trending · News

Google's Gemma 4 12B Brings Multimodal AI to Offline Laptops

Google released Gemma 4 12B, an 11.95-billion-parameter open-source model that runs entirely on a standard 16GB enterprise laptop without requiring cloud connectivity. The model uses an encoder-free architecture that processes audio and video directly without secondary processing modules, reducing latency and memory overhead. It includes a 256K token context window, native tool-use capabilities, and step-by-step reasoning mode, making it suitable for enterprises with strict data privacy requirements.

by Carl Franzen (carl.franzen@venturebeat.com) · 6 days ago · VentureBeat AI

Multimodal · News
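A quick back-of-envelope check shows why the weight format matters for fitting an 11.95-billion-parameter model into 16GB of laptop RAM. The sketch below computes weights-only memory at a few common precisions; the precision list is an illustrative assumption (neither item says which format Gemma 4 12B uses on laptops), and `weights_gib` is a hypothetical helper.

```python
# Back-of-envelope, weights-only memory estimate for Gemma 4 12B
# (11.95B parameters, per the item above). The precisions listed are
# ASSUMPTIONS for illustration; the article does not name a format.
PARAMS = 11.95e9
GIB = 2**30

# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weights_gib(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GiB (excludes KV cache and runtime overhead)."""
    return params * bytes_per_param / GIB


if __name__ == "__main__":
    for name, nbytes in BYTES_PER_PARAM.items():
        print(f"{name:>9}: {weights_gib(PARAMS, nbytes):5.1f} GiB")
```

At 16-bit precision the weights alone come to about 22.3 GiB, more than 16GB (about 14.9 GiB) of RAM, so running on such a laptop presumably relies on a lower-precision format. Even then, the KV cache for a long context such as 256K tokens adds memory on top of the weights.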

AWS Adds Multimodal Evaluators to Strands Evals

AWS has announced four multimodal evaluators for Strands Evals that use large language models as judges to assess the outputs of image-to-text tasks. The evaluators (Overall Quality, Correctness, Faithfulness, and Instruction Following) score model responses directly against the source images, closing a gap where text-only evaluation cannot detect visual hallucinations or factual errors grounded in images. The launch responds to growing demand: Gartner predicts that 80% of enterprise software will be multimodal by 2030, up from under 10% today.

by Sangmin Woo · 20 days ago · AWS Machine Learning Blog