feat: Add TTS slide narration powered by CAMB AI #486

Open
neilruaro-camb wants to merge 1 commit into presenton:main from neilruaro-camb:feature/camb-ai-tts-narration

Summary

Hi there! We are the team at CAMB AI, the AI localization engine trusted by brands like the Premier League, NBA, and NASCAR for high-quality voice and audio solutions. We would love to contribute a TTS (text-to-speech) narration feature to Presenton.

This PR adds the ability to convert existing speaker notes into playable audio narration using CAMB AI's MARS speech models. The feature is fully optional and gracefully hidden when no API key is configured.

What this adds

  • Backend: New TTSService using camb-sdk with content-hash caching (same text = same cached audio file), plus /tts/generate and /tts/generate-presentation API endpoints
  • Frontend: useSlideAudio hook for audio playback state, play/pause/auto-play controls in presentation mode (with P keyboard shortcut), per-slide narration button in the slide toolbar, and a "Narration" button in the header for batch generation
  • Configuration: Two new optional env vars (CAMB_API_KEY, CAMB_TTS_MODEL) added to docker-compose.yml, nginx.conf, and next.config.mjs

How it works

  1. Users set the CAMB_API_KEY environment variable (a key is available for free from CAMB AI Studio)
  2. A "Narration" button appears in the presentation header, and a speaker icon appears on each slide's toolbar
  3. Clicking either generates MP3 audio from the slide's speaker notes via CAMB AI's MARS TTS
  4. In presentation mode, play/pause controls appear next to the slide counter
  5. Auto-play mode can be toggled so narration plays automatically on slide transitions
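
The batch path in the steps above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `narration_enabled`, `generate_presentation_audio`, and the `synthesize` callable are hypothetical names, and `synthesize` stands in for whatever call the TTSService makes to CAMB AI. The only detail taken from the description is that the feature is off when CAMB_API_KEY is unset and that audio is generated from each slide's speaker notes.

```python
import os
from typing import Callable, Dict, Mapping, Optional, Sequence


def narration_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when a non-empty CAMB_API_KEY is configured."""
    env = os.environ if env is None else env
    return bool(env.get("CAMB_API_KEY", "").strip())


def generate_presentation_audio(
    notes_by_slide: Sequence[Optional[str]],
    synthesize: Callable[[str], bytes],  # hypothetical stand-in for the TTS call
    env: Optional[Mapping[str, str]] = None,
) -> Dict[int, bytes]:
    """Generate audio for every slide that has speaker notes.

    Returns a mapping of slide index to audio bytes. Slides with empty or
    missing notes are skipped, and nothing is generated without an API key.
    """
    if not narration_enabled(env):
        return {}

    audio: Dict[int, bytes] = {}
    for index, notes in enumerate(notes_by_slide):
        text = (notes or "").strip()
        if not text:
            continue  # nothing to narrate on this slide
        audio[index] = synthesize(text)
    return audio
```

Passing the environment and the synthesis function as arguments keeps the sketch testable without network access; the real endpoints presumably read configuration through the existing env var helpers.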

Design decisions

  • No database migration required -- audio files are derived cache artifacts stored in app_data/audio/, keyed by a SHA-256 hash of the text content
  • Graceful degradation -- when CAMB_API_KEY is not set, all narration UI elements are hidden and the rest of the app works exactly as before
  • Follows existing patterns -- the service, endpoints, hooks, and env var helpers all follow the conventions already established in the codebase
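
As a minimal sketch of the content-hash cache described above (assuming one `.mp3` file per hash; the function names, the file extension, and the `synthesize` callable are illustrative and not taken from the PR):

```python
import hashlib
from pathlib import Path
from typing import Callable


def audio_cache_path(text: str, audio_dir: Path) -> Path:
    """Derive a deterministic file path from the SHA-256 hash of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return audio_dir / f"{digest}.mp3"


def get_or_create_audio(
    text: str,
    audio_dir: Path,
    synthesize: Callable[[str], bytes],  # hypothetical stand-in for the TTS call
) -> Path:
    """Return the cached audio file for this text, generating it on a miss."""
    path = audio_cache_path(text, audio_dir)
    if path.exists():
        return path  # same text -> same file, no API call

    audio_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so a failed or partial write
    # never leaves a corrupt entry under the final cache name.
    tmp_path = path.with_suffix(".tmp")
    tmp_path.write_bytes(synthesize(text))
    tmp_path.replace(path)
    return path
```

Because the files are derived purely from their input, deleting `app_data/audio/` only costs regeneration time. One consideration the description does not address: if the voice or model can change between calls, including them in the hashed key would avoid serving audio generated with an earlier setting.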

Files changed

| Type | Files |
| --- | --- |
| New | services/tts_service.py, endpoints/tts.py, hooks/useSlideAudio.ts |
| Modified (backend) | pyproject.toml, get_env.py, asset_directory_utils.py, router.py, main.py |
| Modified (frontend) | presentation-generation.ts, slide.ts, PresentationMode.tsx, PresentationPage.tsx, SlideContent.tsx, PresentationHeader.tsx |
| Modified (config) | docker-compose.yml, nginx.conf, next.config.mjs |

Testing

  • TTS service tested directly against CAMB AI API (audio generation, caching, auto-voice selection all verified)
  • Generated MP3 files verified as valid audio (MPEG ADTS Layer III)
  • Tested with the mars-flash model; mars-pro and mars-instruct are also supported
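
For reviewers who want to repeat the audio validity check in code, a rough header check is sketched below. It is not how the verification above was performed (that method is not stated here); it only inspects the first MPEG audio frame header, after skipping an optional ID3v2 tag, and confirms that the layer bits indicate Layer III. The function names are illustrative.

```python
def id3v2_length(data: bytes) -> int:
    """Return the byte length of a leading ID3v2 tag, or 0 if there is none."""
    if len(data) >= 10 and data[:3] == b"ID3":
        # The tag size is a 28-bit "synchsafe" integer: 7 bits per byte.
        size = (
            (data[6] & 0x7F) << 21
            | (data[7] & 0x7F) << 14
            | (data[8] & 0x7F) << 7
            | (data[9] & 0x7F)
        )
        footer = 10 if data[5] & 0x10 else 0  # optional 10-byte footer
        return 10 + size + footer
    return 0


def looks_like_mp3(data: bytes) -> bool:
    """Heuristically check that the data starts with an MPEG Layer III frame."""
    offset = id3v2_length(data)
    header = data[offset:offset + 4]
    if len(header) < 4:
        return False

    b0, b1, b2, _ = header
    sync_ok = b0 == 0xFF and (b1 & 0xE0) == 0xE0   # 11 sync bits set
    version_ok = ((b1 >> 3) & 0x3) != 0x1          # 01 is a reserved version
    layer3 = ((b1 >> 1) & 0x3) == 0x1              # 01 means Layer III
    bitrate_ok = (b2 >> 4) != 0xF                  # 1111 is an invalid bitrate
    sample_rate_ok = ((b2 >> 2) & 0x3) != 0x3      # 11 is a reserved rate
    return sync_ok and version_ok and layer3 and bitrate_ok and sample_rate_ok
```

A check like this only looks at the first frame, so it catches obviously wrong payloads (for example an error page saved with an .mp3 name) but does not prove the whole file decodes.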

We would be happy to make any adjustments to better fit the project's direction. Thank you for considering this contribution!

Add text-to-speech narration for presentation slides using CAMB AI's
MARS models. Converts existing speaker notes to playable audio with
content-hash caching, auto-play in presentation mode, and per-slide
or batch generation.

- Backend: TTSService with camb-sdk, API endpoints, audio file serving
- Frontend: useSlideAudio hook, playback controls, narration buttons
- Config: CAMB_API_KEY and CAMB_TTS_MODEL env vars