65 changes: 55 additions & 10 deletions examples/other/transcription/README.md
@@ -1,24 +1,69 @@
# Speech-to-text

These examples show realtime transcription from voice to text.

- [`transcriber.py`](./transcriber.py) transcribes one remote participant.
- [`multi-user-transcriber.py`](./multi-user-transcriber.py) starts one transcription session for each remote participant in the room.

`transcriber.py` uses OpenAI's Whisper STT API; you can swap in another STT plugin by changing this line:

```python
stt = openai.STT()
```

`multi-user-transcriber.py` uses LiveKit Inference with Deepgram:

```python
stt = inference.STT("deepgram/nova-3")
```

To render the transcriptions in your client application, refer to the [text and transcriptions documentation](https://docs.livekit.io/agents/multimodality/text/).

## Running the examples

From the repository root, install dependencies:

```bash
uv sync --all-extras --dev
```

Create an `examples/.env` file with your LiveKit credentials and the provider credentials for the example you run:

```bash
LIVEKIT_URL=wss://yourhost.livekit.cloud
LIVEKIT_API_KEY=livekit-api-key
LIVEKIT_API_SECRET=your-api-secret
OPENAI_API_KEY=your-api-key
```
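The worker reads these variables from the environment at startup. As a quick sanity check before launching, you can verify they are set (a hypothetical helper, not part of the example scripts):

```python
import os

# Names the LiveKit worker expects; OPENAI_API_KEY is additionally
# required for the transcriber.py example.
REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")


def missing_credentials(env=os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]
```

An empty return value means the core LiveKit settings are present; otherwise the list names what still needs to go into `examples/.env`.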

For single-participant transcription:

```bash
uv run examples/other/transcription/transcriber.py dev
```

For multi-user transcription:

```bash
uv run examples/other/transcription/multi-user-transcriber.py dev
```

Then connect one or more participants to a room and dispatch the agent to that same room. For an example frontend, you can use LiveKit's [Agents Playground](https://agents-playground.livekit.io/).

When `multi-user-transcriber.py` is running, the agent logs a line similar to the following for each participant that speaks:

```text
participant-identity -> hello from this participant
```

With text output enabled, the example also publishes transcripts to the room, so clients can render them from the `lk.transcription` text stream topic.
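On the client side, transcripts arrive as ordered text chunks per participant. A minimal sketch of accumulating them for display (a hypothetical helper; the LiveKit client SDKs provide real stream readers for the `lk.transcription` topic):

```python
from collections import defaultdict


# Hypothetical accumulator: collects ordered text chunks per participant
# identity, mirroring how a client might render the lk.transcription topic.
class TranscriptAccumulator:
    def __init__(self) -> None:
        self._parts: dict[str, list[str]] = defaultdict(list)

    def add_chunk(self, identity: str, chunk: str) -> None:
        """Append the next chunk of a participant's transcript."""
        self._parts[identity].append(chunk)

    def text(self, identity: str) -> str:
        """Return everything received so far for a participant."""
        return "".join(self._parts[identity])
```

This assumes chunks arrive in order per participant, which is how the text stream delivers them within a single stream.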

## Troubleshooting

If the agent starts but does not transcribe:

- Confirm the frontend participant and the agent are connected to the same room.
- Confirm at least one remote participant is publishing microphone audio.
- Confirm the participant is not the agent itself. The multi-user example transcribes remote participants only.
- Confirm `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET` are loaded from `examples/.env` or your shell.
- For `transcriber.py`, confirm `OPENAI_API_KEY` is set. For `multi-user-transcriber.py`, confirm your LiveKit project can use LiveKit Inference.
3 changes: 2 additions & 1 deletion examples/other/transcription/multi-user-transcriber.py
@@ -79,7 +79,7 @@ def on_task_done(task: asyncio.Task):
task.add_done_callback(on_task_done)

def on_participant_disconnected(self, participant: rtc.RemoteParticipant):
if (session := self._sessions.pop(participant.identity, None)) is None:
return

logger.info(f"closing session for {participant.identity}")
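The `.pop` change in this hunk matters because `dict.pop(key)` raises `KeyError` when the key is absent, while `dict.pop(key, None)` returns the default instead. A self-contained sketch of the same bookkeeping (a plain dict standing in for the real `AgentSession` map):

```python
# A participant may disconnect before a session was ever created for
# them, so the lookup must tolerate a missing key.
sessions: dict[str, object] = {"alice": object()}


def close_session(identity: str) -> bool:
    """Remove the session for identity; return True if one existed."""
    # pop with a default never raises, even for unknown identities
    session = sessions.pop(identity, None)
    if session is None:
        return False
    # ... the real handler schedules async cleanup of the session here ...
    return True
```

Without the default, a disconnect event for an untracked participant would crash the handler instead of being silently ignored.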
@@ -122,6 +122,7 @@ async def _close_session(self, sess: AgentSession) -> None:

@server.rtc_session()
async def entrypoint(ctx: JobContext):
logger.info(f"starting multi-user transcriber, room: {ctx.room.name}")
transcriber = MultiUserTranscriber(ctx)
transcriber.start()
