
fix: prevent VAD from driving user_state when turn_detection=sst #5582

Open
MdSadiqMd wants to merge 13 commits into livekit:main from MdSadiqMd:fix/prevent-vad-from-driving-userstate



@MdSadiqMd commented Apr 28, 2026

Closes #5580

Summary

Added a user_state_source option to TurnHandlingOptions with three modes: "vad", "stt", and "auto" (the default). Implemented a _vad_drives_user_state property in AudioRecognition that encapsulates the decision logic. With user_state_source="stt", VAD still runs for interruption detection but no longer affects user_state, which eliminates false positives from background noise. The change is fully backward compatible because the default "auto" mode keeps the existing behavior.
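The decision logic described above can be sketched as follows. This is a minimal illustration, not the code in this PR: TurnHandlingOptionsSketch and AudioRecognitionSketch are hypothetical stand-ins for the real livekit-agents classes, and two behaviors are assumptions rather than confirmed PR behavior: "auto" is treated as "VAD drives user_state whenever a VAD is configured", and requesting "vad" without a VAD is rejected.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical alias mirroring the Literal type proposed in the discussion.
UserStateSource = Literal["vad", "stt", "auto"]


@dataclass
class TurnHandlingOptionsSketch:
    """Hypothetical stand-in for TurnHandlingOptions; only the new field is modeled."""
    user_state_source: UserStateSource = "auto"


class AudioRecognitionSketch:
    """Hypothetical stand-in for AudioRecognition, modeling only the decision logic."""

    def __init__(self, opts: TurnHandlingOptionsSketch, has_vad: bool) -> None:
        if opts.user_state_source not in ("vad", "stt", "auto"):
            raise ValueError(f"unknown user_state_source: {opts.user_state_source!r}")
        # Assumption: explicitly asking for "vad" without a VAD is a configuration error.
        if opts.user_state_source == "vad" and not has_vad:
            raise ValueError('user_state_source="vad" requires a VAD')
        self._opts = opts
        self._has_vad = has_vad

    @property
    def _vad_drives_user_state(self) -> bool:
        # "stt": VAD may still run (e.g. for interruption detection) but never writes user_state.
        if self._opts.user_state_source == "stt":
            return False
        # "vad" and "auto": VAD drives user_state whenever one is configured.
        # Assumption: "auto" preserves the pre-existing behavior.
        return self._has_vad
```

Keeping the decision in a single property means every VAD code path can consult one flag instead of re-deriving the rule from several options.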


CLAassistant commented Apr 28, 2026

CLA assistant check
All committers have signed the CLA.


devin-ai-integration[bot] left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.


@miguelmoralai

I think you should implement Option B described in the issue for a more reliable solution:

Option B: explicit configuration. Add a user_state_source: Literal["vad", "stt", "auto"] field to TurnHandlingOptions (or AgentSession). "auto" keeps current behavior. "stt" makes the VAD branch skip the _speaking writes while still running VAD inference for interruption detection. "vad" is today's default.
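The "stt" behavior in Option B (VAD inference keeps running, but the VAD branch stops writing _speaking) can be sketched like this. It is a simplified model under stated assumptions: VadEventSketch and VadBranchSketch are hypothetical, every VAD start-of-speech is counted as an interruption signal (real interruption logic is more involved), and letting STT speech events be the sole writer of _speaking in "stt" mode is an assumption.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class VadEventSketch:
    """Hypothetical, simplified VAD event; real VAD events carry more data."""
    kind: Literal["start_of_speech", "end_of_speech"]


class VadBranchSketch:
    """Hypothetical model of the VAD branch under Option B."""

    def __init__(self, user_state_source: Literal["vad", "stt", "auto"] = "auto") -> None:
        self.user_state_source = user_state_source
        self._speaking = False   # stands in for the flag behind user_state
        self.interruptions = 0   # simplified: counts VAD start-of-speech signals

    def _vad_drives_user_state(self) -> bool:
        return self.user_state_source != "stt"

    def on_vad_event(self, ev: VadEventSketch) -> None:
        if ev.kind == "start_of_speech":
            # VAD output keeps feeding interruption detection in every mode.
            self.interruptions += 1
        # Only write _speaking when VAD is allowed to drive user_state.
        if self._vad_drives_user_state():
            self._speaking = ev.kind == "start_of_speech"

    def on_stt_speech(self, started: bool) -> None:
        # Assumption: in "stt" mode, STT speech events are the only writer of _speaking.
        if self.user_state_source == "stt":
            self._speaking = started
```

For example, with VadBranchSketch("stt"), a burst of background noise that makes VAD report start_of_speech still increments interruptions but leaves _speaking False until STT reports speech.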

@MdSadiqMd (Author)


I initially thought Option A would be a good fit, since it is a targeted fix for the existing system. On reflection, Option B makes more sense in the long term, so I'm making that change now.


miguelmoralai commented Apr 28, 2026


Yes, but Option A (the one you implemented) has trade-offs. Selecting vad or stt as turn_detection does not necessarily mean you also want user_state to be driven by the same source. IMO that coupling seems logical, but it is still a wrong assumption. Ideally, the user should be able to choose each one independently.
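The trade-off raised here can be made concrete by listing which (turn_detection, user_state source) pairs each option can express. This is an illustrative sketch only: it reduces both settings to "vad" and "stt", ignores "auto", and models Option A as "user_state follows turn_detection", which is how the PR title describes it.

```python
from itertools import product
from typing import Literal, Set, Tuple

Mode = Literal["vad", "stt"]
MODES: Tuple[Mode, ...] = ("vad", "stt")


def option_a_pairs() -> Set[Tuple[str, str]]:
    # Option A: the user_state source is derived from turn_detection,
    # so only matching (turn_detection, user_state_source) pairs exist.
    return {(td, td) for td in MODES}


def option_b_pairs() -> Set[Tuple[str, str]]:
    # Option B: the two settings are independent, so every combination is configurable.
    return set(product(MODES, MODES))


# Combinations only Option B can express, e.g. STT turn detection with VAD-driven user_state.
only_in_b = option_b_pairs() - option_a_pairs()
```

Here only_in_b contains ("vad", "stt") and ("stt", "vad"): exactly the mixed setups that coupling user_state to turn_detection rules out.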

devin-ai-integration[bot] left six further review comments, all marked as resolved.


MdSadiqMd commented Apr 28, 2026

@miguelmoralai, could you please review the changes?

@miguelmoralai

@claude review

@MdSadiqMd (Author)

Looks like Claude is not responding.

cc: @miguelmoralai

@MdSadiqMd (Author)

Bump @miguelmoralai


Labels: none yet

Projects: none yet

Development: successfully merging this pull request may close the issue "Allow decoupling user_state source from VAD when STT emits speech events".

3 participants