Meituan has pulled back the curtain on LongCat-2.0 — a stealthy, open-license 1.6-trillion-parameter mixture-of-experts (MoE) model that was quietly powering OpenRouter under the alias “Owl Alpha” for two months. The reveal on June 30 confirms what many suspected: this is a heavyweight built for agentic coding and long-context workloads, but with a surprisingly competitive price tag.
What it is
- Size & activation: LongCat-2.0 is a 1.6T-parameter MoE that activates roughly 48 billion parameters per token on average, with a range of about 33B–56B depending on request complexity.
- Context window: Designed for very long contexts (Meituan touts a 1M-token capability using its LongCat Sparse Attention).
- License: Open-license model, but weights are not yet available for self-hosting.
Why the stealth mattered
- While anonymous on OpenRouter as Owl Alpha, the model climbed usage rankings: #1 on Hermes Agent workspace, #2 on Claude Code, and #3 across OpenClaw deployments by monthly call volume. That early adoption validated the model’s utility before the public reveal.
Domestic hardware milestone
- LongCat-2.0 is notable as the first trillion-parameter model Meituan says was trained and deployed end-to-end on domestically produced Chinese ASICs — not just run on them for inference. The pretraining job spanned more than 35 trillion tokens across a cluster of over 50,000 homegrown accelerators, and Meituan reports the run completed with “no rollbacks or irrecoverable loss spikes.” (By contrast, DeepSeek’s V4-Pro used Huawei chips for inference but relied on Nvidia hardware for pretraining.)
Price and access — the headline advantage
- API pricing: Standard API access is $0.75 per million input tokens and $2.95 per million output tokens. A launch promo cuts that to $0.30/$1.20. Cached context reads are free.
- Token packs: 1 billion-token bundles priced at roughly $60, which is attractive for heavy coders and repo-scale automation.
- How that stacks up: LongCat-2.0’s promo price undercuts GPT-5.5’s $5/$30 per million and Claude Sonnet 5’s intro $2/$10, and sits near DeepSeek V4-Pro’s $0.435/$0.87 and Xiaomi’s MiMo-V2.5 Pro (after May cuts).
- Availability: Reachable now through Meituan’s OpenAI- and Anthropic-compatible API endpoints and via agent integrations like Hermes, Claude Code, and OpenClaw. Weights on GitHub/Hugging Face remain “coming soon”; Meituan has not given a release date for self-hosting.
How it performs
- Benchmarks: On SWE-bench Pro (real GitHub issues), LongCat-2.0 scored 59.5 — edging past GPT-5.5’s 58.6 and Gemini 3.1 Pro’s 54.2, though still behind some Claude Opus variants. On FORTE (office-agent tasks across 15 professions, 45-minute limit), it scored 73.2 — tied with Claude Opus 4.6 and trailing GPT-5.5’s 77.8.
- Hands-on test: In a quick game-building coding trial, LongCat-2.0 produced workable output and held up through iterations. Quality trailed Claude Fable and Opus 4.8 and was roughly comparable to Sonnet 4.6, but the cost-performance ratio was compelling. The demo revealed a logic bug in enemy-targeting that caused erratic target switching at high speeds — a typical limitation for lower-cost, prompt-driven coding models that often require iterative refinement.
Architectural highlights
- Sparse attention: LongCat Sparse Attention (LSA) copies the idea of focusing compute on the most relevant parts of very long conversations, enabling faster responses on large contexts without exploding model size.
- N-gram embedding system: A new embedding strategy lets the model represent phrases and common multi-token patterns (e.g., “New York City”) as richer units, multiplying representational diversity by roughly 100x without dramatically increasing parameters.
- Specialist routing: Post-training, Meituan stitches together three specialist systems — Agent (tool use), Reasoning (problem solving), and Interaction (conversation). A routing mechanism assigns the right combo of specialists to each request for more targeted behavior.
Who should pay attention
- Teams building high-volume coding agents, repository-scale automation, or any service where free cached-context reads compound costs will find LongCat-2.0 especially interesting. Its API compatibility with OpenAI/Anthropic endpoints makes swapping it into existing agent stacks straightforward.
- Self-hosters and researchers still waiting for weights will have to be patient; Meituan hasn’t published a timeline.
Bottom line
LongCat-2.0 is a strategic package: a domestically trained, trillion-scale MoE with practical long-context features and aggressive pricing aimed at volume users. It won’t beat the very top models on all quality metrics today, but for teams optimizing cost, throughput, and agentic workflows, it’s a strong new contender — especially in markets prioritizing a China-first hardware stack.
Read more AI-generated news on: undefined/news