
Topic
LLMs
Large language model releases, benchmarks, and capability research
Featured


DeepSeek hits $45B valuation on efficient AI training model
All Stories

Empromptu AI launches Alchemy Models for continuous fine-tuning from production workflows
Empromptu AI launched Alchemy Models, a platform that automatically captures training data from enterprise AI…

AI IQ Launches Model Scorecard, Sparks Precision vs. Simplicity Debate
A new site called AI IQ has launched a framework for scoring frontier language models on a single intelligence…

Frontier LLMs Silently Corrupt 25% of Documents in Iterative Workflows
Microsoft researchers developed a benchmark showing that frontier LLMs silently corrupt an average of 25% of document…

Hermes Agent Becomes Most-Used Framework as Local AI Agents Go Mainstream
Hermes Agent, an open source agentic AI framework from Nous Research, has reached 140,000 GitHub stars in under three…

Sakana trains 7B model to orchestrate GPT, Claude, Gemini
Sakana AI has developed RL Conductor, a 7-billion-parameter language model trained via reinforcement learning to…

AWS Details Verifiable Rewards Method for More Reliable LLM Training
AWS published a technical guide on reinforcement learning with verifiable rewards (RLVR), a method that addresses…

Subquadratic claims 1,000x efficiency gain; researchers demand proof
Miami-based startup Subquadratic emerged from stealth claiming its SubQ 1M-Preview model achieves a 1,000x efficiency…

Faithful Reasoning Emerges from Multi-Move Training, Not Direct Prediction
Researchers studied how reasoning develops in language models across supervised fine-tuning and reinforcement learning…

Safety Routing Circuits Found Across Models, Vulnerable to Encoding Attacks
Researchers have localized the policy routing mechanism in alignment-trained language models, identifying specific…

Cursor Keeps Its Distance From xAI Despite SpaceX Tie-Up
Despite SpaceX's $60 billion conditional takeover offer for Cursor last month, the coding startup is maintaining…

The AI scaffolding layer is collapsing. Context is the new moat.
The middleware layer that once helped developers build LLM applications, including indexing frameworks, query engines,…

Warmer AI Models Trade Accuracy for Empathy
Researchers at Oxford University's Internet Institute found that large language models fine-tuned to appear warmer and…

How OpenAI's Personality Feature Unleashed the Goblins
OpenAI's GPT-5.5 model exhibited unexpected behavior where it became obsessed with discussing goblins, gremlins, and…

Alibaba cuts AI agent tool calls 49x with decoupled optimization
Alibaba researchers introduced Hierarchical Decoupled Policy Optimization (HDPO), a reinforcement learning framework…

Goodfire's Silico Brings Mechanistic Interpretability to Model Development
Goodfire, a San Francisco startup, released Silico, a tool that lets developers inspect and adjust AI model parameters…

Aggregating Zero-Shot LLMs Beats Single Models for Financial Disclosure Analysis
A new paper demonstrates that a lightweight supervised aggregator can effectively combine outputs from multiple…

NanoKnow: Mapping How LLMs Encode Knowledge
Researchers have released NanoKnow, a benchmark dataset that maps questions from Natural Questions and SQuAD to whether…

AWS Seizes OpenAI Models as Exclusive Cloud Partnerships End
AWS launched a major suite of AI capabilities on Tuesday, including OpenAI's GPT-5.4 and GPT-5.5 models on Amazon…

Scaling Multi-Anchor Embeddings to LLMs with 40x Compression
Researchers introduce Adaptive Dictionary Embeddings (ADE), a framework that scales multi-anchor word representations…

Xiaomi's Open-Source MiMo Models Challenge Proprietary AI on Agentic Tasks
Xiaomi released two open-source large language models, MiMo-V2.5 and MiMo-V2.5-Pro, under the MIT License, positioning…

OpenAI's AWS Arrival Meets Muted Response From Customers
Amazon has announced a deal to bring OpenAI's models to AWS through a new offering for AI agents, but the move comes as…

New Multilingual Medical AI Benchmark Reveals Language and Vision Gaps
Researchers have developed EuropeMedQA, a multilingual and multimodal medical examination dataset drawn from official…

Mapping Causal Reasoning in LLMs with Sparse Concept Graphs
Researchers propose Causal Concept Graphs (CCG), a method that maps how concepts interact during multi-step reasoning…
