v0.6.31: elevenlabs voice, trigger.dev fixes, cloud whitelabeling for enterprises #4053
Merged
9 commits
efb582e feat(voice): voice input migration to eleven labs (#4041) (icecrasher321)
650487c fix(kb): doc selector (#4048) (icecrasher321)
04c9057 fix(kb): disable connectors after repeated sync failures (#4046) (waleedlatif1)
579d240 fix(parallel): remove broken node-counting completion + resolver clai… (waleedlatif1)
a1173ee debug(log): Add logging on socket token error (#4051) (TheodoreSpeaks)
c21876a fix(trigger): add react-dom and react-email to additionalPackages (#4… (waleedlatif1)
621aa65 fix(webhook): throw webhook errors as 4xxs (#4050) (TheodoreSpeaks)
1189400 feat(enterprise): cloud whitelabeling for enterprise orgs (#4047) (waleedlatif1)
4700590 fix(editor): stop highlighting start.input as blue when block is not … (waleedlatif1)
feat(voice): voice input migration to eleven labs (#4041)
* feat(speech): unified voice interface
* add metering for voice input usage
* ip key
* use shared getclientip helper, fix deployed chat
* cleanup code
* prep merge
* merge staging in
* add billing check
* add voice input section
* remove skip billing
* address comments
commit efb582e96a3a8dd393c721ce45ec6c867220e677
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| import { NextResponse } from 'next/server' | ||
| import { hasSTTService } from '@/lib/speech/config' | ||
| | ||
| /** | ||
| * Returns whether server-side STT is configured. | ||
| * Unauthenticated — the response is a single boolean, | ||
| * not sensitive data, and deployed chat visitors need it. | ||
| */ | ||
| export async function GET() { | ||
| return NextResponse.json({ sttAvailable: hasSTTService() }) | ||
| } |
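
A client could consult this endpoint before rendering a microphone button. The sketch below is a minimal illustration under stated assumptions: the path `/api/speech/status` is hypothetical (the diff does not show where this route is mounted), and `isSttAvailable`, `FetchLike`, and the injected `fetchFn` are names introduced here so the helper can run outside a browser. Only the `sttAvailable` field comes from the handler above.

```typescript
// Shape returned by the GET handler above.
interface SttStatus {
  sttAvailable: boolean
}

// Minimal fetch-like signature so the helper can be exercised without a browser.
type FetchLike = (url: string) => Promise<{ ok: boolean; json(): Promise<unknown> }>

// Hypothetical path: the diff does not show where this route is mounted.
const STT_STATUS_URL = '/api/speech/status'

/**
 * Resolves to true only when the server explicitly reports sttAvailable: true.
 * Network failures, non-2xx responses, and malformed bodies all count as
 * "not available", so the UI can simply hide voice input in those cases.
 */
async function isSttAvailable(
  fetchFn: FetchLike,
  url: string = STT_STATUS_URL
): Promise<boolean> {
  try {
    const res = await fetchFn(url)
    if (!res.ok) return false
    const body = (await res.json()) as Partial<SttStatus> | null
    return body?.sttAvailable === true
  } catch {
    return false
  }
}
```

In a browser the helper could be called as `isSttAvailable((url) => fetch(url))`. Failing closed keeps the check safe to run for unauthenticated deployed-chat visitors.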
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,171 @@ | ||
| import { db } from '@sim/db' | ||
| import { chat } from '@sim/db/schema' | ||
| import { createLogger } from '@sim/logger' | ||
| import { eq } from 'drizzle-orm' | ||
| import { type NextRequest, NextResponse } from 'next/server' | ||
| import { getSession } from '@/lib/auth' | ||
| import { hasExceededCostLimit } from '@/lib/billing/core/subscription' | ||
| import { recordUsage } from '@/lib/billing/core/usage-log' | ||
| import { env } from '@/lib/core/config/env' | ||
| import { getCostMultiplier, isBillingEnabled } from '@/lib/core/config/feature-flags' | ||
| import { RateLimiter } from '@/lib/core/rate-limiter' | ||
| import { validateAuthToken } from '@/lib/core/security/deployment' | ||
| import { getClientIp } from '@/lib/core/utils/request' | ||
| | ||
| const logger = createLogger('SpeechTokenAPI') | ||
| | ||
| export const dynamic = 'force-dynamic' | ||
| | ||
| const ELEVENLABS_TOKEN_URL = 'https://api.elevenlabs.io/v1/single-use-token/realtime_scribe' | ||
| | ||
| const VOICE_SESSION_COST_PER_MIN = 0.008 | ||
| const WORKSPACE_SESSION_MAX_MINUTES = 3 | ||
| const CHAT_SESSION_MAX_MINUTES = 1 | ||
| | ||
| const STT_TOKEN_RATE_LIMIT = { | ||
| maxTokens: 30, | ||
| refillRate: 3, | ||
| refillIntervalMs: 72 * 1000, | ||
| } as const | ||
| | ||
| const rateLimiter = new RateLimiter() | ||
| | ||
| async function validateChatAuth( | ||
| request: NextRequest, | ||
| chatId: string | ||
| ): Promise<{ valid: boolean; ownerId?: string }> { | ||
| try { | ||
| const chatResult = await db | ||
| .select({ | ||
| id: chat.id, | ||
| userId: chat.userId, | ||
| isActive: chat.isActive, | ||
| authType: chat.authType, | ||
| password: chat.password, | ||
| }) | ||
| .from(chat) | ||
| .where(eq(chat.id, chatId)) | ||
| .limit(1) | ||
| | ||
| if (chatResult.length === 0 || !chatResult[0].isActive) { | ||
| return { valid: false } | ||
| } | ||
| | ||
| const chatData = chatResult[0] | ||
| | ||
| if (chatData.authType === 'public') { | ||
| return { valid: true, ownerId: chatData.userId } | ||
| } | ||
| | ||
| const cookieName = `chat_auth_${chatId}` | ||
| const authCookie = request.cookies.get(cookieName) | ||
| if (authCookie && validateAuthToken(authCookie.value, chatId, chatData.password)) { | ||
| return { valid: true, ownerId: chatData.userId } | ||
| } | ||
| | ||
| return { valid: false } | ||
| } catch (error) { | ||
| logger.error('Error validating chat auth for STT:', error) | ||
| return { valid: false } | ||
| } | ||
| } | ||
| | ||
| export async function POST(request: NextRequest) { | ||
| try { | ||
| const body = await request.json().catch(() => ({})) | ||
| const chatId = body?.chatId as string | undefined | ||
| | ||
| let billingUserId: string | undefined | ||
| | ||
| if (chatId) { | ||
| const chatAuth = await validateChatAuth(request, chatId) | ||
| if (!chatAuth.valid) { | ||
| return NextResponse.json({ error: 'Unauthorized' }, { status: 401 }) | ||
| } | ||
| billingUserId = chatAuth.ownerId | ||
| } else { | ||
| const session = await getSession() | ||
| if (!session?.user?.id) { | ||
| return NextResponse.json({ error: 'Unauthorized' }, { status: 401 }) | ||
| } | ||
| billingUserId = session.user.id | ||
| } | ||
| | ||
| if (isBillingEnabled) { | ||
| const rateLimitKey = chatId | ||
| ? `stt-token:chat:${chatId}:${getClientIp(request)}` | ||
| : `stt-token:user:${billingUserId}` | ||
| | ||
| const rateCheck = await rateLimiter.checkRateLimitDirect(rateLimitKey, STT_TOKEN_RATE_LIMIT) | ||
| if (!rateCheck.allowed) { | ||
| return NextResponse.json( | ||
| { error: 'Voice input rate limit exceeded. Please try again later.' }, | ||
| { | ||
| status: 429, | ||
| headers: { | ||
| 'Retry-After': String(Math.ceil((rateCheck.retryAfterMs ?? 60000) / 1000)), | ||
| }, | ||
| } | ||
| ) | ||
| } | ||
| } | ||
| | ||
| if (billingUserId && isBillingEnabled) { | ||
| const exceeded = await hasExceededCostLimit(billingUserId) | ||
| if (exceeded) { | ||
| return NextResponse.json( | ||
| { error: 'Usage limit exceeded. Please upgrade your plan to continue.' }, | ||
| { status: 402 } | ||
| ) | ||
| } | ||
| } | ||
| | ||
| const apiKey = env.ELEVENLABS_API_KEY | ||
| if (!apiKey?.trim()) { | ||
| return NextResponse.json( | ||
| { error: 'Speech-to-text service is not configured' }, | ||
| { status: 503 } | ||
| ) | ||
| } | ||
| | ||
| const response = await fetch(ELEVENLABS_TOKEN_URL, { | ||
| method: 'POST', | ||
| headers: { 'xi-api-key': apiKey }, | ||
| }) | ||
| | ||
| if (!response.ok) { | ||
| const errBody = await response.json().catch(() => ({})) | ||
| const message = | ||
| errBody.detail || errBody.message || `Token request failed (${response.status})` | ||
| logger.error('ElevenLabs token request failed', { status: response.status, message }) | ||
| return NextResponse.json({ error: message }, { status: 502 }) | ||
| } | ||
| | ||
| const data = await response.json() | ||
| | ||
| if (billingUserId) { | ||
| const maxMinutes = chatId ? CHAT_SESSION_MAX_MINUTES : WORKSPACE_SESSION_MAX_MINUTES | ||
| const sessionCost = VOICE_SESSION_COST_PER_MIN * maxMinutes | ||
| | ||
| await recordUsage({ | ||
| userId: billingUserId, | ||
| entries: [ | ||
| { | ||
| category: 'fixed', | ||
| source: 'voice-input', | ||
| description: `Voice input session (${maxMinutes} min)`, | ||
| cost: sessionCost * getCostMultiplier(), | ||
| }, | ||
| ], | ||
| }).catch((err) => { | ||
| logger.warn('Failed to record voice input usage, continuing:', err) | ||
| }) | ||
| } | ||
| | ||
| return NextResponse.json({ token: data.token }) | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : 'Failed to generate speech token' | ||
| logger.error('Speech token error:', error) | ||
| return NextResponse.json({ error: message }, { status: 500 }) | ||
| } | ||
| } | ||
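
`STT_TOKEN_RATE_LIMIT` reads like a token-bucket configuration. The internals of `RateLimiter` are not part of this diff, so the following is only a minimal in-memory sketch of the behaviour those three fields would imply if that assumption holds: the bucket starts with `maxTokens`, each request spends one token, and `refillRate` tokens are added every `refillIntervalMs`. `TokenBucket`, `sustainedPerHour`, and the explicit `now` argument are hypothetical names introduced for illustration.

```typescript
interface BucketConfig {
  maxTokens: number
  refillRate: number
  refillIntervalMs: number
}

interface RateCheck {
  allowed: boolean
  retryAfterMs?: number
}

// Hypothetical in-memory token bucket; NOT the project's RateLimiter.
class TokenBucket {
  private readonly config: BucketConfig
  private tokens: number
  private lastRefill: number

  constructor(config: BucketConfig, now: number) {
    this.config = config
    this.tokens = config.maxTokens
    this.lastRefill = now
  }

  check(now: number): RateCheck {
    const { maxTokens, refillRate, refillIntervalMs } = this.config
    // Credit every whole interval elapsed since the last refill, capped at the bucket size.
    const intervals = Math.floor((now - this.lastRefill) / refillIntervalMs)
    if (intervals > 0) {
      this.tokens = Math.min(maxTokens, this.tokens + intervals * refillRate)
      this.lastRefill += intervals * refillIntervalMs
    }
    if (this.tokens >= 1) {
      this.tokens -= 1
      return { allowed: true }
    }
    // Bucket is empty: report the time remaining until the next refill.
    return { allowed: false, retryAfterMs: this.lastRefill + refillIntervalMs - now }
  }
}

// Values copied from the diff above.
const STT_TOKEN_RATE_LIMIT: BucketConfig = {
  maxTokens: 30,
  refillRate: 3,
  refillIntervalMs: 72 * 1000,
}

// Long-run request rate once the initial burst has been spent.
function sustainedPerHour(config: BucketConfig): number {
  return (config.refillRate * 3600000) / config.refillIntervalMs
}
```

Under this reading, a single key (one user, or one chat and client IP pair) could request 30 tokens in a burst and then 3 more every 72 seconds, which works out to 150 per hour. The route converts `retryAfterMs` to whole seconds with `Math.ceil` for the `Retry-After` header.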
The server records usage (line 150) and issues the ElevenLabs token before the client has checked or obtained microphone access. In `use-speech-to-text.ts`, `getUserMedia` is called only after the token response is received. If the user denies microphone permission, `NotAllowedError` is thrown, cleanup runs, and the session is abandoned, but billing was already recorded and the single-use token was consumed.

Consider requesting `getUserMedia` on the client before calling `/api/speech/token`, or deferring usage recording until after the WebSocket connection is successfully established.
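
The first option suggested above, acquiring the microphone before asking the server for a token, might look like the sketch below. It is an illustration under assumptions, not the code in `use-speech-to-text.ts`: `startVoiceSession`, `VoiceSessionDeps`, `MicStream`, and the injected `requestMicrophone` and `fetchToken` callbacks are hypothetical names. In a browser, `requestMicrophone` would wrap `navigator.mediaDevices.getUserMedia({ audio: true })` and `fetchToken` would POST to `/api/speech/token`.

```typescript
// Minimal stand-in for a microphone stream so the sketch also runs outside a browser.
// A real MediaStream would be wrapped so that stop() stops each of its tracks.
interface MicStream {
  stop(): void
}

interface VoiceSessionDeps<S extends MicStream> {
  requestMicrophone: () => Promise<S>
  fetchToken: () => Promise<string>
}

type VoiceSessionResult<S> =
  | { ok: true; stream: S; token: string }
  | { ok: false; reason: 'mic-unavailable' | 'token-failed' }

async function startVoiceSession<S extends MicStream>(
  deps: VoiceSessionDeps<S>
): Promise<VoiceSessionResult<S>> {
  // Step 1: ask for the microphone first. If permission is denied (or no
  // device exists), stop here: no token is requested, so nothing is billed.
  const stream = await deps.requestMicrophone().catch(() => null)
  if (stream === null) {
    return { ok: false, reason: 'mic-unavailable' }
  }

  try {
    // Step 2: only now request the single-use token from the server.
    const token = await deps.fetchToken()
    return { ok: true, stream, token }
  } catch {
    // Release the microphone if the token could not be obtained.
    stream.stop()
    return { ok: false, reason: 'token-failed' }
  }
}
```

This ordering only addresses the permission-denied case. A session that fails later, for example while opening the WebSocket, would still have been billed, which is what the second suggestion (deferring usage recording) targets.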