improvement(chat-voice): modernize ElevenLabs TTS to Flash v2.5 #4943
- Switch default TTS model from eleven_turbo_v2_5 to eleven_flash_v2_5 (ElevenLabs recommends Flash over Turbo in all cases; ~75ms latency)
- Drop deprecated optimize_streaming_latency knob plus legacy use_pvc_as_ivc / enable_ssml_parsing flags
- Move output_format to the query string and raise it from mp3_22050_32 to mp3_44100_128 for higher audio quality
- Switch apply_text_normalization from off to auto for correct number/date pronunciation
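Raising `output_format` from `mp3_22050_32` to `mp3_44100_128` quadruples the compressed bitrate. As a rough illustration of what that costs in bandwidth, here is a hypothetical helper (not part of this PR) that parses ElevenLabs-style format names and converts the bitrate to bytes per second. It assumes names follow a `codec_sampleRate[_bitrateKbps]` pattern, which matches the two formats mentioned here.

```typescript
// Hypothetical helper: parses names such as "mp3_44100_128" or "pcm_16000".
// Assumes the pattern codec_sampleRate[_bitrateKbps].
interface OutputFormat {
  codec: string;
  sampleRateHz: number;
  bitrateKbps?: number; // absent for formats whose name carries no bitrate (e.g. PCM)
}

function parseOutputFormat(format: string): OutputFormat {
  const [codec, rate, kbps] = format.split("_");
  const sampleRateHz = Number(rate);
  if (!codec || !Number.isInteger(sampleRateHz) || sampleRateHz <= 0) {
    throw new Error(`Unrecognized output_format: ${format}`);
  }
  if (kbps === undefined) {
    return { codec, sampleRateHz };
  }
  const bitrateKbps = Number(kbps);
  if (!Number.isInteger(bitrateKbps) || bitrateKbps <= 0) {
    throw new Error(`Unrecognized bitrate in output_format: ${format}`);
  }
  return { codec, sampleRateHz, bitrateKbps };
}

// Compressed bytes per second of audio: kbps * 1000 bits / 8 bits per byte.
function bytesPerSecond(format: string): number | undefined {
  const { bitrateKbps } = parseOutputFormat(format);
  return bitrateKbps === undefined ? undefined : (bitrateKbps * 1000) / 8;
}

console.log(bytesPerSecond("mp3_22050_32")); // 4000
console.log(bytesPerSecond("mp3_44100_128")); // 16000
```

At 4,000 versus 16,000 bytes per second, a 10-second spoken reply grows from roughly 40 kB to 160 kB, which is usually a small price for a chat voice stream.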
Greptile Summary

This PR modernizes the deployed-chat TTS path by switching the default model from
Confidence Score: 5/5

Safe to merge: changes are additive quality improvements with no logic regressions on the audio proxy path.

The diff is small and targeted: a model name swap, a bitrate bump, and removal of deprecated fields that ElevenLabs no longer honours. The contract default and the hook default are kept in sync, output_format placement matches the ElevenLabs streaming endpoint spec, and no auth or data-flow logic is touched. No new error paths are introduced. No files require special attention.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Browser (useAudioStreaming)
    participant Proxy as /api/proxy/tts/stream
    participant EL as ElevenLabs API
    Client->>Proxy: "POST {text, voiceId, modelId="eleven_flash_v2_5", chatId}"
    Proxy->>Proxy: validateChatAuth(chatId)
    Proxy->>EL: "POST /v1/text-to-speech/{voiceId}/stream?output_format=mp3_44100_128"
    Note over Proxy,EL: body: {model_id, voice_settings, apply_text_normalization="auto"}
    EL-->>Proxy: audio/mpeg stream (128 kbps, 44100 Hz)
    Proxy-->>Client: streamed audio/mpeg via TransformStream
    Client->>Client: arrayBuffer() → decodeAudioData → play
```
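The proxy leg of that flow (authorize, open the ElevenLabs stream, relay bytes through a `TransformStream`) can be sketched roughly as follows. This is an illustrative outline, not the route's actual code: `relayTtsStream`, `ProxyDeps` and `callUpstream` are hypothetical names, the signature given to `validateChatAuth` is assumed, and the dependencies are injected so the relay logic runs without network access. It assumes a runtime with the standard `Response` and `TransformStream` globals (Node 18+, Bun, or an edge runtime).

```typescript
// Hypothetical dependency shapes; the real route's helpers may differ.
interface ProxyDeps {
  validateChatAuth: (chatId: string) => Promise<boolean>;
  callUpstream: () => Promise<Response>; // e.g. a fetch to the ElevenLabs stream endpoint
}

async function relayTtsStream(chatId: string, deps: ProxyDeps): Promise<Response> {
  // Authorize the deployed chat before any ElevenLabs request is made.
  if (!(await deps.validateChatAuth(chatId))) {
    return new Response("Unauthorized", { status: 401 });
  }

  const upstream = await deps.callUpstream();
  if (!upstream.ok || upstream.body === null) {
    return new Response("TTS upstream error", { status: 502 });
  }

  // Identity TransformStream: chunks written to `writable` come out of `readable`
  // unchanged, so the client can start receiving bytes before synthesis finishes.
  const { readable, writable } = new TransformStream();
  upstream.body.pipeTo(writable).catch(() => {
    // If the upstream stream fails mid-way, pipeTo aborts `writable`, which
    // errors `readable` and ends the client's response early.
  });

  return new Response(readable, {
    status: 200,
    headers: { "Content-Type": "audio/mpeg" },
  });
}
```

Returning the `readable` side immediately lets bytes flow to the browser as ElevenLabs produces them, rather than buffering the whole clip inside the proxy.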
Reviews (1): Last reviewed commit: "improvement(chat-voice): modernize Eleve..."
Replace the legacy Sarah default (EXAVITQu4vr4xnSDxMaL), which has no high-quality eleven_flash_v2_5 base, with Jessica (cgSgspJ2msm6clMCkdW9) — a current premade conversational voice verified against the live account and optimized for Flash v2.5.
Summary
- Switch the default TTS model from `eleven_turbo_v2_5` to `eleven_flash_v2_5`. ElevenLabs now recommends Flash over Turbo in all cases (functionally equivalent, ~75 ms latency, built for real-time/Agents).
- Replace the legacy Sarah default voice (`EXAVITQu4vr4xnSDxMaL`) with Jessica (`cgSgspJ2msm6clMCkdW9`). Sarah has no high-quality `eleven_flash_v2_5` base; Jessica is a current premade conversational voice optimized for Flash v2.5. Verified against the live account.
- Drop the deprecated `optimize_streaming_latency` knob and the legacy `use_pvc_as_ivc` / `enable_ssml_parsing` flags from the proxy request.
- Move `output_format` to the query string (where ElevenLabs reads it) and raise it from `mp3_22050_32` (32 kbps) to `mp3_44100_128` for noticeably better audio quality.
- Switch `apply_text_normalization` from `off` to `auto` so numbers and dates are pronounced correctly (the old `optimize_streaming_latency` level 4 had silently disabled the normalizer).

Scope is limited to the deployed-chat voice path (`/api/proxy/tts/stream`, its contract, the audio-streaming hook, and the chat default voice). STT (`scribe_v2_realtime`, single-use token flow) was already current and is untouched.
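To make the new upstream request shape concrete, here is a minimal sketch of how it could be assembled. `buildTtsRequest` and the constant names are hypothetical (the PR's proxy code may be structured differently); the endpoint path, query parameter, and body fields are the ones named in this PR. Authentication (ElevenLabs' `xi-api-key` header) is left out.

```typescript
// Hypothetical constants mirroring the PR's new defaults.
const DEFAULT_TTS_MODEL = "eleven_flash_v2_5";
const DEFAULT_VOICE_ID = "cgSgspJ2msm6clMCkdW9"; // Jessica
const OUTPUT_FORMAT = "mp3_44100_128";

interface TtsRequest {
  url: string;
  body: Record<string, unknown>;
}

function buildTtsRequest(
  text: string,
  voiceId: string = DEFAULT_VOICE_ID,
  modelId: string = DEFAULT_TTS_MODEL,
  voiceSettings?: Record<string, number | boolean>,
): TtsRequest {
  const url = new URL(
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}/stream`,
  );
  // output_format travels in the query string, not in the JSON body.
  url.searchParams.set("output_format", OUTPUT_FORMAT);

  const body: Record<string, unknown> = {
    text,
    model_id: modelId,
    // "auto" lets ElevenLabs decide when to expand numbers, dates, etc.
    apply_text_normalization: "auto",
    // Deliberately absent: optimize_streaming_latency, use_pvc_as_ivc, enable_ssml_parsing.
  };
  if (voiceSettings !== undefined) {
    body.voice_settings = voiceSettings;
  }
  return { url: url.toString(), body };
}

const req = buildTtsRequest("Your order ships on 3/14.");
console.log(req.url);
// https://api.elevenlabs.io/v1/text-to-speech/cgSgspJ2msm6clMCkdW9/stream?output_format=mp3_44100_128
```

Keeping the defaults as named constants is one way to hold the contract default and the hook default in sync, which is the property the review above checks for.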
Testing
Smoke-tested the exact new request (Jessica + `eleven_flash_v2_5` + `output_format=mp3_44100_128` + `apply_text_normalization=auto`) against the live ElevenLabs API → HTTP 200, valid 128 kbps / 44.1 kHz MP3, numbers/dates normalized correctly. `bun run check:api-validation` and `tsc --noEmit` pass clean.
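The "valid 128 kbps / 44.1 kHz MP3" part of that smoke test can be checked mechanically from the first frame header of the response body. The helper below is a hypothetical sketch, not the check used in this PR: it skips an optional leading ID3v2 tag (an assumption; the stream may well start directly with audio frames) and decodes the bitrate and sample-rate fields of an MPEG-1 Layer III header. It handles MPEG-1 only, so the old 22.05 kHz output (an MPEG-2 stream) is rejected rather than decoded.

```typescript
// Hypothetical smoke-test helper: decodes the first MPEG-1 Layer III frame header.
interface Mp3FrameInfo {
  bitrateKbps: number;
  sampleRateHz: number;
}

// MPEG-1 Layer III bitrate table (kbps). Index 0 is "free format" and 15 is invalid.
const MPEG1_L3_BITRATES = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0];
// MPEG-1 sample rates; index 3 is reserved.
const MPEG1_SAMPLE_RATES = [44100, 48000, 32000];

// Length of a leading ID3v2 tag, or 0 if none. The 4-byte size is "syncsafe"
// (7 bits per byte) and excludes the 10-byte header and optional 10-byte footer.
function id3v2Length(bytes: Uint8Array): number {
  if (bytes.length < 10 || bytes[0] !== 0x49 || bytes[1] !== 0x44 || bytes[2] !== 0x33) {
    return 0; // no "ID3" magic
  }
  const size =
    ((bytes[6] & 0x7f) << 21) | ((bytes[7] & 0x7f) << 14) | ((bytes[8] & 0x7f) << 7) | (bytes[9] & 0x7f);
  const footer = (bytes[5] & 0x10) !== 0 ? 10 : 0;
  return 10 + size + footer;
}

function readFirstMp3Frame(bytes: Uint8Array): Mp3FrameInfo {
  const i = id3v2Length(bytes);
  if (bytes.length < i + 4) {
    throw new Error("Too short to contain an MP3 frame header");
  }
  const b1 = bytes[i + 1];
  const b2 = bytes[i + 2];
  // 11-bit frame sync: 0xFF followed by a byte whose top three bits are set.
  if (bytes[i] !== 0xff || (b1 & 0xe0) !== 0xe0) {
    throw new Error("No MP3 frame sync at expected offset");
  }
  const version = (b1 >> 3) & 0x03; // 0b11 = MPEG-1
  const layer = (b1 >> 1) & 0x03; // 0b01 = Layer III
  if (version !== 3 || layer !== 1) {
    throw new Error("Not an MPEG-1 Layer III frame");
  }
  const bitrateKbps = MPEG1_L3_BITRATES[(b2 >> 4) & 0x0f];
  const sampleRateHz = MPEG1_SAMPLE_RATES[(b2 >> 2) & 0x03];
  if (!bitrateKbps || sampleRateHz === undefined) {
    throw new Error("Free-format, invalid, or reserved header fields");
  }
  return { bitrateKbps, sampleRateHz };
}

// 0xFF 0xFB 0x90 0x64 is a common MPEG-1 Layer III header: 128 kbps, 44.1 kHz.
console.log(readFirstMp3Frame(new Uint8Array([0xff, 0xfb, 0x90, 0x64])));
// { bitrateKbps: 128, sampleRateHz: 44100 }
```

Fed the first few bytes of a saved response, this gives a cheap regression check that the format change is actually in effect.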