fix(stt): STT compatibility fixes for Groq Whisper and AionUI web frontend #400

Open
starm2010 wants to merge 1 commit into iOfficeAI:main from starm2010:fix/stt-groq-compatibility

starm2010 commented Jun 4, 2026

Problem

When using AionUI web frontend with Groq Whisper as the STT provider, speech-to-text fails with 400/502 errors. There are four distinct issues:

  1. Preference key mismatch: The AionUI web frontend stores STT config under the key tools.speechToText in client_preferences, but the backend queries only speechToText. Since get_preferences() filters with WHERE key IN (...), the config row is never found, causing STT_DISABLED even when the user has configured STT.

  2. Multipart field name mismatch: The web frontend sends audio as FormData field audio (via formData.append('audio', blob, filename)), but the backend expects field name file. This causes a 400 Bad Request: missing 'file' field error. The frontend also embeds the filename in the blob's Content-Disposition header rather than as a separate multipart field.

  3. Double /v1 in Groq URL: When base_url includes /v1 (e.g. https://api.groq.com/openai/v1), the code appends /v1/audio/transcriptions, producing https://api.groq.com/openai/v1/v1/audio/transcriptions → 502 Bad Gateway.

  4. Language code and MIME type incompatibility: The browser sends languageHint: "en-US" but Groq Whisper only accepts ISO 639-1 base codes (e.g. en). Similarly, the browser sends audio/webm;codecs=opus as the MIME type, but reqwest's multipart Part::mime_str() requires clean MIME types without codec parameters.
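
Issue 1 comes down to a lookup on the wrong key. The sketch below reproduces it with a plain HashMap standing in for the client_preferences rows; the real get_preferences() runs a SQL query, and the Prefs alias and both function names here are hypothetical, not taken from the codebase.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for client_preferences rows: key -> stored JSON value.
type Prefs = HashMap<String, String>;

// Old behavior: only the legacy key is consulted, so a config saved by the
// web frontend under "tools.speechToText" is never found.
fn find_stt_legacy(prefs: &Prefs) -> Option<&String> {
    prefs.get("speechToText")
}

// Compatible lookup: try the web frontend's key first, then fall back to
// the legacy key.
fn find_stt_compat(prefs: &Prefs) -> Option<&String> {
    prefs
        .get("tools.speechToText")
        .or_else(|| prefs.get("speechToText"))
}
```

With only tools.speechToText present, find_stt_legacy returns None (the STT_DISABLED case), while find_stt_compat returns the stored value.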

Solution

  1. Key mismatch: Query both speechToText and tools.speechToText in get_preferences(). Use prefs.get("tools.speechToText").or_else(|| prefs.get("speechToText")) for backward compatibility.

  2. Multipart field: Accept both "file" and "audio" field names. Parse filename from the Content-Disposition header when the frontend sends it as part of the blob rather than a separate field.

  3. Double /v1: Add .trim_end_matches("/v1") after .trim_end_matches('/') when constructing the base URL, before appending /v1/audio/transcriptions.

  4. Language normalization: Strip region codes via lang.split('-').next() (e.g. en-US → en). User-configured language in settings now takes precedence over browser languageHint, so users can override the browser locale for transcription.

  5. MIME normalization: Strip codec parameters via mime_type.split(';').next().trim() (e.g. audio/webm;codecs=opus → audio/webm).
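
The string handling in items 3 to 5 can be sketched as standalone helpers. This is an illustration under assumptions rather than the code in the commit: the function names and signatures are hypothetical, and only the trim_end_matches and split calls mirror what is described above. Empty strings are not given special treatment here.

```rust
// Item 3: build the transcription URL without doubling "/v1".
// Trailing slashes are removed first so that ".../v1/" is handled too.
fn transcription_url(base_url: &str) -> String {
    let base = base_url.trim_end_matches('/').trim_end_matches("/v1");
    format!("{}/v1/audio/transcriptions", base)
}

// Item 4: reduce a tag such as "en-US" to its base language code "en".
fn base_language(lang: &str) -> &str {
    lang.split('-').next().unwrap_or(lang)
}

// Item 4 (priority): the language from settings wins over the browser hint;
// whichever is chosen is then reduced to its base code.
fn effective_language<'a>(configured: Option<&'a str>, hint: Option<&'a str>) -> Option<&'a str> {
    configured.or(hint).map(base_language)
}

// Item 5: drop parameters such as ";codecs=opus" from a MIME type.
fn clean_mime(mime_type: &str) -> &str {
    mime_type.split(';').next().unwrap_or(mime_type).trim()
}
```

For example, transcription_url("https://api.groq.com/openai/v1") yields https://api.groq.com/openai/v1/audio/transcriptions, and a base URL without /v1 gets the same suffix appended once.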

Testing

  • Tested with AionUI v2.1.10 web frontend speaking into the microphone
  • Verified Groq Whisper returns correct transcriptions in both English and Spanish
  • Verified curl tests directly against aioncore /api/stt endpoint return 200 OK
  • Verified setting language: "es" in STT settings correctly overrides browser locale
  • Confirmed no debug/temporary code in the final commit

Checklist

  • Changes are minimal and focused on the STT compatibility issues
  • No debug/temporary code included
  • Backward compatible — both old (speechToText) and new (tools.speechToText) keys work
  • Both "file" and "audio" multipart field names accepted
  • No changes to error.rs — upstream already has the correct error model

Fixes #373

…ontend

1. Key mismatch: get_preferences now queries both 'speechToText' and
   'tools.speechToText' with fallback for backward compatibility
2. Multipart field: accept both 'file' and 'audio' field names, parse
   filename from Content-Disposition header
3. Double /v1 URL: trim_end_matches('/v1') on base_url before appending
   '/v1/audio/transcriptions' to avoid double /v1/v1/
4. Language normalization: strip region codes (en-US → en) for Groq
   Whisper which only accepts ISO 639-1 base codes
5. MIME normalization: strip codec params (audio/webm;codecs=opus →
   audio/webm) before passing to reqwest mime_str()
6. User language priority: config.language now overrides browser
   languageHint so users can set transcription language in UI settings

Also removes MaskedApiKey error variant (ACP masking is by design,
not an error).