StepFun's StepAudio 2.5 Realtime: Human-Like, Persona-Stable Voice AI for Web3

StepFun says its new real-time voice AI can both act and listen like a human.

Shanghai AI lab StepFun this week released StepAudio 2.5 Realtime, an end-to-end, real-time voice model that takes audio in and returns audio out, with no intermediate text. The model supports Chinese and English and, according to StepFun’s benchmarks, outperforms current live voice systems on several key measures, most notably in reading non-verbal vocal cues.

What StepAudio does differently

- End-to-end realtime audio: audio input → audio output, built for low-latency spoken interactions and longer roleplay sessions.
- Persona stability via roleplay-specific RLHF: StepFun says it trained the model with reinforcement learning from human feedback focused specifically on keeping characters “in‑character.” Training began with 10,000 human-authored persona seeds that were algorithmically expanded into a million-scale feature matrix so the model can better resist drift during long or adversarial conversations.
- Paralinguistic comprehension: the model extracts non-verbal cues (tone, speaking rate, inferred age, emotion) from raw audio before generating a reply, which the company highlights as a core differentiator.

Benchmark snapshots (StepFun’s reported scores)

- Paralinguistic comprehension (0–100): StepAudio 82.18; GPT Realtime 1.5 80.46; Gemini Live 58.05; DouBao Realtime 16.09.
- Human evaluation (real users via mobile app, 0–100): StepAudio 80.41; GPT Realtime 1.5 68.01; Gemini Live 67.16.
- General dialogue quality (API, 0–100): StepAudio 86.36; GPT Realtime 1.5 81.60.

StepFun notes these are its own benchmarks, and readers should weigh the results accordingly, but the margins on paralinguistics and live spoken Q&A are substantial enough to be notable.

Company context

- Founded: April 2023 by Jiang Daxin (16 years at Microsoft working on Bing, Cortana, Azure cognitive services).
- Notable prior work: Step 3.5 Flash, a 196-billion-parameter text model that topped four reasoning benchmarks earlier this year against much larger rivals.
- Funding / status: One of China’s “AI Tiger” startups, with roughly $1.7 billion raised so far.

Product and developer access

- Launch includes a flagship persona called Xiao Yue, billed as a “soul-level companion” that’s meant to feel like texting a friend, with opinions, catchphrases, and emotional limits all configurable.
- Developers can create and customize personas via the API. Documentation and access are at platform.stepfun.com; the model is live now.

Why crypto audiences should care

- Voice-native, persona-stable agents matter for web3 use cases: voice interfaces for trading, DAOs, immersive metaverse worlds, game NPCs, and monetized companion avatars could all benefit from lower-latency, character-faithful, emotionally aware voice AI.
- The API-first release signals potential for third-party integrations: NFT voice personas, voice-enabled dApp assistants, and in-game NPCs are plausible early use cases.

Bottom line

StepAudio 2.5 Realtime positions StepFun as a contender in live voice AI, with a particular emphasis on persona persistence and acoustic empathy. The company’s claims look strong on its own tests; developers and integrators should test directly to judge how those gains carry into real-world crypto and gaming scenarios.
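
For readers who want to see the size of each reported gap, the short Python sketch below subtracts each rival's score from StepAudio's on the three benchmarks listed in the article. The figures are StepFun's own reported numbers copied from the benchmark list; the dictionary layout and the `margins` helper are illustrative names introduced here, not part of any StepFun tooling.

```python
# Scores as reported by StepFun (0-100 scale), copied from the article.
# The structure and helper below are illustrative assumptions only.
REPORTED_SCORES = {
    "Paralinguistic comprehension": {
        "StepAudio": 82.18,
        "GPT Realtime 1.5": 80.46,
        "Gemini Live": 58.05,
        "DouBao Realtime": 16.09,
    },
    "Human evaluation": {
        "StepAudio": 80.41,
        "GPT Realtime 1.5": 68.01,
        "Gemini Live": 67.16,
    },
    "General dialogue quality": {
        "StepAudio": 86.36,
        "GPT Realtime 1.5": 81.60,
    },
}


def margins(scores, baseline="StepAudio"):
    """Return the baseline's lead over every other system, in points."""
    base = scores[baseline]
    return {
        name: round(base - value, 2)
        for name, value in scores.items()
        if name != baseline
    }


for benchmark, scores in REPORTED_SCORES.items():
    for rival, gap in margins(scores).items():
        print(f"{benchmark}: StepAudio leads {rival} by {gap:.2f} points")
```

On these numbers the paralinguistic lead over GPT Realtime 1.5 is 1.72 points, far narrower than the 24.13-point lead over Gemini Live, while the human-evaluation lead over both rivals exceeds 12 points.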
