July 02, 2026 ChainGPT

Meituan's LongCat-2.0: 1.6T MoE Trained on Chinese ASICs, Cheap API for Agent Coding

Meituan's LongCat-2.0: 1.6T MoE Trained on Chinese ASICs, Cheap API for Agent Coding
Meituan has quietly revealed the identity behind a surprise hit on OpenRouter: LongCat-2.0, a 1.6‑trillion‑parameter mixture‑of‑experts (MoE) model that spent two months running anonymously as “Owl Alpha.” The company confirmed the model on June 30 and opened access through OpenAI- and Anthropic‑compatible API endpoints, promising a million‑token context and a design aimed squarely at agentic coding workloads. Why LongCat-2.0 matters - Scale and efficiency: LongCat-2.0 is a 1.6T‑parameter MoE that activates roughly 48 billion parameters per token on average, with active parameters varying between about 33B and 56B depending on query complexity. That sparse activation lets it offer trillion‑scale capacity without the cost of fully dense inference. - Chinese hardware milestone: Meituan says this is the first trillion‑parameter model trained and deployed end‑to‑end on domestically produced Chinese ASICs (over 50,000 accelerators). Pretraining ran across 35+ trillion tokens and, crucially, completed “with no rollbacks or irrecoverable loss spikes”—a notable claim given how frequently large runs on new hardware suffer mid‑run failures. For comparison, DeepSeek’s V4‑Pro used Huawei chips only for inference while pretraining ran on Nvidia hardware. - Stealth launch paid off: While anonymous on OpenRouter, the model climbed usage charts—#1 on Hermes Agent workspace, #2 on Claude Code, and #3 across OpenClaw—ranked by monthly call volume. Price and go‑to‑market LongCat-2.0 undercuts most top models on price, making it appealing for high‑volume use cases: - Standard API: $0.75 per million input tokens / $2.95 per million output tokens. - Launch promo: $0.30 / $1.20 per million. - Cached context reads are free. - Token packs: 1 billion tokens for about $60 — attractive for coders and heavy users. Comparisons: GPT‑5.5 charges $5 / $30 per million tokens; Claude Sonnet 5 launched at $2 / $10; DeepSeek V4‑Pro is $0.435 / $0.87; Xiaomi’s MiMo‑V2.5 Pro matched similar rates after May cuts. That pricing profile makes LongCat‑2.0 a strong value play for repository‑scale and agent-driven tasks where context cache costs stack up. Architecture highlights Meituan has stitched several efficiency and capability tricks into LongCat‑2.0: - LongCat Sparse Attention (LSA): an attention system inspired by DeepSeek that scales for very long contexts (Meituan touts 1M‑token contexts) by focusing computation on the most relevant parts of long conversations. - N‑gram embeddings: the model can represent phrases and subword groups as richer tokens (about 100× more possible representations), improving understanding of multiword concepts without dramatically increasing model size. - Modular specialists + router: post‑training, Meituan combined three specialist systems—Agent (tool use), Reasoning (problem solving), and Interaction (conversational)—and uses a routing mechanism to assign the right specialist or combination to each request. Benchmarks and a hands‑on test - SWE‑bench Pro (fixing real GitHub issues): LongCat‑2.0 scored 59.5—slightly ahead of GPT‑5.5 (58.6) and Gemini 3.1 Pro (54.2), but still behind Claude Opus 4.7/4.8. - FORTE (office‑task agents across 15 professions): scored 73.2, tying Claude Opus 4.6 but trailing GPT‑5.5’s 77.8. - Practical coding test: In a short game‑building trial, LongCat‑2.0 delivered a usable product and held up after iterative refinements. Quality was behind Claude Fable and Opus 4.8 and roughly in the same ballpark as Sonnet 4.6. Notable logical issues emerged: target‑switching logic erred as enemy counts and speeds increased, causing frustrating behavior—typical of models that prioritize producing prompt‑aligned output over foreseeing all downstream consequences. Meituan’s pricing makes iterative fixes affordable, so cheaper models can still be pragmatic for development cycles. (The test build and results are available on itch.io.) Availability and limits - Live via API: LongCat‑2.0 is accessible now through Meituan’s OpenAI/Anthropic‑compatible endpoints and is already integrated into agent harnesses like Hermes, Claude Code, and OpenClaw. - Weights not released: Meituan has not made model weights available for self‑hosting; GitHub and Hugging Face repos still say “model weights coming soon” with no shipping date. What this means for builders and markets For teams running high‑volume, repository‑scale coding agents or anyone sensitive to per‑token costs, LongCat‑2.0’s combination of price, large context, and competitive benchmark performance is an immediate draw. Strategically, the successful full‑stack training on domestic ASICs is a signal that Chinese cloud and AI providers are reducing reliance on U.S. hardware for model training—an important datapoint in the global AI infrastructure race. Bottom line: LongCat‑2.0 punches above its price point, delivering a trillion‑plus parameter capability with sparse activation, long‑context tricks, and agentic design—impressive for a model trained entirely on domestic accelerators. Builders who prioritize cost and volume should take a close look; those who need self‑hosting will have to wait for the promised weights. Read more AI-generated news on: undefined/news