A tool that records web meeting audio in real time and transcribes it. It also supports translation and meeting minutes generation.
Runs on Ubuntu + PipeWire / PulseAudio environments.
| Feature | Requires | Quality | Speed | Related settings |
|---|---|---|---|---|
| Transcription (default) | faster-whisper (included) | 3 | 4 | default_model, default_language |
| Transcription (Kotoba-Whisper) | Same (auto-downloaded on first use) | 5 | 3 | japanese_asr_model: kotoba-whisper |
| Transcription (ReazonSpeech) | uv sync --extra reazonspeech | 5 | 4 | japanese_asr_model: reazonspeech-k2 |
| Interim transcription | Same | 2 | 5 | interim_transcription: true, interim_model |
| Translation (LibreTranslate) | LibreTranslate server | 2 | 4 | translation_provider: libretranslate |
| Translation (OpenAI compatible API) | OpenAI compatible API | 3-5 | 2-5 | translation_provider: api, api_endpoint, api_model |
| Translation (Claude) | Claude Code | 5 | 2 | translation_provider: claude |
| Language detection (pre-translation) | langdetect (included) | — | — | Automatically detects source language to select correct prompt |
| Summary (Claude) | Claude Code | 5 | 3 | llm_provider: claude |
| Summary (OpenAI compatible API) | OpenAI compatible API | 3-5 | 2-5 | llm_provider: api, api_endpoint, api_model |
| Voice commands (PTT) | None (built-in) | — | — | voice_command_key |
| Voice commands (LLM matching) | OpenAI compatible API | — | — | api_endpoint, api_model |
| Spell check (pre-translation) | transformers (auto-downloaded on first use) | — | — | libretranslate_spell_check: true |
Minimal setup without LLM: Transcription + LibreTranslate translation requires no external API or Claude Code. Everything runs locally.
See the Feature Tour for a visual walkthrough with screenshots.
sudo apt install libportaudio2 portaudio19-dev

git clone https://gitlab.edocode.co.jp/common/shadow-clerk.git
cd shadow-clerk

| Setup | Command |
|---|---|
| Basic | uv tool install -e . |
| + ReazonSpeech | uv tool install -e ".[reazonspeech]" --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr" |
| + Spell check | uv tool install -e ".[spell-check]" |
| + Both (ReazonSpeech + Spell check) | uv tool install -e ".[spell-check,reazonspeech]" --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr" |
| + Google Calendar | uv tool install -e ".[gcal]" |
| All | uv tool install -e ".[spell-check,gcal,reazonspeech]" --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr" |
Note: `uv tool install` maintains a single environment per tool. When reinstalling with different extras, use `--force`; without it, `uv tool install` reports "already installed" and does not add the extra. Only the extras specified in the command are included; previously installed extras are removed.
| Setup | Command |
|---|---|
| Basic | uv sync |
| + ReazonSpeech | uv sync --extra reazonspeech |
| + Spell check | uv sync --extra spell-check |
| + Both (ReazonSpeech + Spell check) | uv sync --extra spell-check --extra reazonspeech |
| + Google Calendar | uv sync --extra gcal |
| All | uv sync --extra spell-check --extra gcal --extra reazonspeech |
This is all you need for transcription. The following optional extras are available:
Kotoba-Whisper — No extra install required. The model is auto-downloaded on first use. Just set:
# config.yaml
japanese_asr_model: kotoba-whisper

ReazonSpeech k2 — Requires the reazonspeech extra:
uv tool install -e ".[reazonspeech]" \
--with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr"
# or for development:
uv sync --extra reazonspeech

# config.yaml
japanese_asr_model: reazonspeech-k2

Spell check — Requires the spell-check extra (installs transformers, torch, sentencepiece):
uv tool install -e ".[spell-check]"
# or for development:
uv sync --extra spell-check

# config.yaml
libretranslate_spell_check: true
spell_check_model: mbyhphat/t5-japanese-typo-correction # default

The spell check model is auto-downloaded on first use. It corrects Japanese speech recognition typos before sending text to LibreTranslate.
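The key point is the ordering: correction runs before the text reaches LibreTranslate. A minimal sketch with placeholder functions (`correct_typos` and `translate` are illustrative stand-ins, not functions from shadow-clerk itself):

```python
# Illustrative sketch only: correct_typos and translate are placeholder
# stand-ins, not shadow-clerk's actual API.

def correct_typos(text: str) -> str:
    """Stand-in for the T5 typo-correction model."""
    fixes = {"こんにちわ": "こんにちは"}  # one hard-coded example correction
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text

def translate(text: str, target: str) -> str:
    """Stand-in for the LibreTranslate request."""
    return f"[{target}] {text}"

def translate_segment(text: str, spell_check: bool, target: str = "en") -> str:
    # With libretranslate_spell_check: true, correction runs first,
    # so LibreTranslate receives cleaned-up text.
    if spell_check:
        text = correct_typos(text)
    return translate(text, target)

print(translate_segment("こんにちわ、会議を始めます", spell_check=True))
# [en] こんにちは、会議を始めます
```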
Automatically starts and ends meeting sessions based on your Google Calendar schedule. Requires the gcal extra:
uv tool install -e ".[gcal]"
# or for development:
uv sync --extra gcal

Then authenticate and configure:
# One-time OAuth setup (opens browser)
clerk-util gcal-auth ~/credentials.json
# Enable in config
clerk-util write-config-value gcal_integration true
clerk-util write-config-value gcal_credentials_file ~/credentials.json

When enabled, clerk-daemon polls Google Calendar every 60 seconds. Events automatically trigger start_meeting / end_meeting, creating transcript files named transcript-YYYYMMDDHHMM@EventTitle.txt.
See docs/google-calendar-setup.md for full setup instructions including how to obtain credentials.json from Google Cloud Console.
Add the following options if you need translation or summarization.
Local translation without LLM. Install via Docker or pip:
# Docker (recommended)
docker run -d -p 5000:5000 libretranslate/libretranslate
# Or pip
pip install libretranslate
libretranslate --host 0.0.0.0 --port 5000

Configuration:
# config.yaml
translation_provider: libretranslate
libretranslate_endpoint: http://localhost:5000

An OpenAI-compatible API is used for translation, summarization, and LLM voice command matching:
# config.yaml — OpenAI
llm_provider: api
api_endpoint: https://api.openai.com/v1
api_model: gpt-4o
# Add SHADOW_CLERK_API_KEY=sk-... to ~/.local/share/shadow-clerk/.env

# config.yaml — Ollama (local)
llm_provider: api
api_endpoint: http://localhost:11434/v1
api_model: llama3

For managing minutes generation, translation, and controls from Claude Code:
clerk-util claude-setup

This generates ~/.claude/skills/shadow-clerk/SKILL.md and adds permissions to ~/.claude/settings.local.json.
If you installed via uv tool install:
clerk-daemon

For development (uv sync):

uv run clerk-daemon

Note: `uv run` uses the project .venv, while `uv tool install` uses its own isolated environment. Make sure extras (e.g. spell-check, reazonspeech) are installed in the matching environment.
# Basic (record mic + system audio, auto-transcribe)
clerk-daemon
# List available devices
clerk-daemon --list-devices
# With options
clerk-daemon \
--language ja \
--model small \
--output ~/my-transcript.txt \
--verbose

Press Ctrl+C to stop recording.
Hold down the Menu key (next to Right Alt) while speaking a command — no wake word needed:
[Hold Menu key] "start translation" → Translation starts
[Hold Menu key] "start meeting" → Meeting session starts
The trigger key can be changed via voice_command_key in config.yaml (ctrl_r, ctrl_l, alt_r, alt_l, shift_r, shift_l). Set to null to disable.
During recording, say the wake word (default: "sheruku" / "シェルク") followed by a command for hands-free control:
| Voice command | Action |
|---|---|
| "sheruku, start meeting" | Start a new meeting session |
| "sheruku, end meeting" | End the meeting session |
| "sheruku, language ja" | Switch transcription language to Japanese |
| "sheruku, language en" | Switch transcription language to English |
| "sheruku, unset language" | Reset to auto-detect |
| "sheruku, start translation" | Start the translation loop |
| "sheruku, stop translation" | Stop the translation loop |
The separator (comma, space) between the wake word and command is optional. The wake word can be changed via wake_word in config.yaml.
You can register custom voice commands in config.yaml under custom_commands. They are evaluated after built-in commands:
custom_commands:
- pattern: "youtube"
action: "xdg-open https://www.youtube.com"
- pattern: "gmail|mail"
action: "xdg-open https://mail.google.com"

- pattern: Regular expression (case-insensitive)
- action: Shell command to execute
If a voice command doesn't match any built-in or custom command and api_endpoint is configured, the utterance is sent to the LLM as a query. The response is printed to stdout and saved to .clerk_response.
"sheruku, what is 1+1?" → LLM returns the answer
| Option | Description | Default |
|---|---|---|
| --output, -o | Output file path | ~/.local/share/shadow-clerk/transcript-YYYYMMDD.txt |
| --model, -m | Whisper model size (tiny, base, small, medium, large-v3) | small |
| --language, -l | Language code (ja, en, etc.). Auto-detect if omitted | Auto |
| --mic | Microphone device number | Auto-detect |
| --monitor | Monitor device number (sounddevice) | Auto-detect |
| --backend | Audio backend (auto, pipewire, pulseaudio, sounddevice) | auto |
| --list-devices | List devices and exit | - |
| --verbose, -v | Verbose logging | - |
| --dashboard / --no-dashboard | Enable/disable dashboard | Enabled |
| --dashboard-port | Dashboard port number | 8765 |
| --beam-size | Whisper beam size (1=fast, 5=accurate) | 5 |
| --compute-type | Whisper compute precision (int8, float16, float32) | int8 |
| --device | Whisper device (cpu, cuda) | cpu |
Translation and summary each support multiple providers with different operation modes:
Runs via the Claude Code Skill (/shadow-clerk). Claude performs translation and summary inline.
- Highest quality — especially for Japanese homophone correction (ja→ja)
- Requires Claude Code — must be running in a Claude Code terminal session
- How translation works: `/shadow-clerk start` launches clerk-daemon with a background subagent that handles translation and command monitoring. Dashboard-initiated translation is also processed by this subagent
- Foreground translation: `/shadow-clerk translate start` runs the translation loop directly in the terminal (polling output is visible; use `/shadow-clerk start` for background operation instead)
# config.yaml
translation_provider: claude # Translation by Claude
llm_provider: claude # Summary by Claude (default)

clerk-daemon calls an external API (OpenAI-compatible) internally. Claude Code is not required.
- Works without Claude Code — clerk-daemon handles translation and summary on its own
- Quality depends on model — high-end models (GPT-4o) produce good results; smaller models may struggle with Japanese correction
- How translation works: An internal thread in clerk-daemon processes translation. Started/stopped via voice commands or dashboard
- Summary works similarly: `clerk-util summarize` generates minutes via the external API
# config.yaml
translation_provider: api # Translation via external API
llm_provider: api # Summary via external API
api_endpoint: https://api.openai.com/v1
api_model: gpt-4o

LibreTranslate handles translation only. It runs locally without any external API or Claude Code (summary still needs llm_provider).
| Use case | Translation | Summary | Notes |
|---|---|---|---|
| Best quality (with Claude Code) | translation_provider: claude | llm_provider: claude | Highest quality, requires Claude Code |
| Autonomous (external API) | translation_provider: api | llm_provider: api | No Claude Code needed, quality varies by model |
| Fully local | translation_provider: libretranslate | — | No LLM needed, lower quality |
| Hybrid | translation_provider: api | llm_provider: claude | Auto translation + high-quality summary |
You can start/stop clerk-daemon and generate meeting minutes from Claude Code:
/shadow-clerk start # Start clerk-daemon in the background (with translation subagent)
/shadow-clerk start --language ja # Start with options
/shadow-clerk stop # Stop clerk-daemon
/shadow-clerk start meeting       # Start a meeting session (honors auto_translate)
/shadow-clerk end meeting         # End a meeting session (honors auto_summary)
/shadow-clerk # Update minutes from transcript diff
/shadow-clerk full # Regenerate minutes from full transcript
/shadow-clerk status # Check current status
/shadow-clerk translate start # Start translation loop (foreground)
/shadow-clerk translate stop # Stop translation loop
Note: `/shadow-clerk start` launches a background subagent for command monitoring. When `translation_provider: claude`, dashboard-initiated translation (start/regenerate) is processed by this subagent.
Generated meeting minutes are saved to ~/.local/share/shadow-clerk/summary-YYYYMMDD.md.
Customize defaults and auto-features in ~/.local/share/shadow-clerk/config.yaml:
# shadow-clerk config
translate_language: en # Translation target language (ja/en/etc)
auto_translate: false # Auto-start translation on start meeting
auto_summary: false # Auto-generate summary on end meeting
default_language: null # Default language for clerk-daemon (null=auto-detect)
default_model: small # Default Whisper model for clerk-daemon
output_directory: null # Transcript output directory (null=data directory)
llm_provider: claude # LLM for summary ("claude" or "api")
translation_provider: null # Translation provider (null=use llm_provider, "claude", "api", "libretranslate")
api_endpoint: null # OpenAI Compatible API base URL
api_model: null # API model name (gpt-4o, etc.)
api_key_env: SHADOW_CLERK_API_KEY # Environment variable name for API key
summary_source: null # Summary source (null=auto: prefer translation if exists / "transcript" / "translate")
summary_language: null # Summary output language (null=fallback to ui_language / ja, en, zh, ...)
libretranslate_endpoint: null # LibreTranslate API URL (e.g. http://localhost:5000)
libretranslate_api_key: null # LibreTranslate API key (null if not required)
libretranslate_spell_check: false # Spell check before LibreTranslate translation
spell_check_model: mbyhphat/t5-japanese-typo-correction # Spell check model
custom_commands: [] # Custom voice commands (list of pattern + action)
initial_prompt: null # Whisper initial_prompt (vocabulary hints for recognition)
voice_command_key: f23 # Push-to-Talk key (null=disabled)
wake_word: シェルク # Wake word (trigger word for voice commands)
whisper_beam_size: 5 # Whisper beam size (1=fast, 5=accurate)
whisper_compute_type: int8 # Compute precision (int8/float16/float32)
whisper_device: cpu # Device (cpu/cuda)
interim_transcription: false # Interim transcription (real-time display while speaking)
interim_model: base # Model for interim transcription
japanese_asr_model: default # Japanese ASR model (default/kotoba-whisper/reazonspeech-k2)
kotoba_whisper_model: kotoba-tech/kotoba-whisper-v2.0-faster # Kotoba-Whisper model
interim_japanese_asr_model: default # Japanese ASR for interim transcription
ui_language: ja # UI language (ja/en) — dashboard, terminal output, LLM prompts

Manage configuration from Claude Code:
/shadow-clerk config show # Show current config
/shadow-clerk config set default_model tiny # Change a setting
/shadow-clerk config set auto_translate true # Enable auto-translation
/shadow-clerk config init # Generate default config file
With auto_translate: true, translation starts automatically on /shadow-clerk start meeting.
With auto_summary: true, meeting minutes are generated automatically on /shadow-clerk end meeting.
When summary_source is unset (null/auto), the summary is generated from the translation file if one exists (falling back to the transcript if not). To pin the behavior explicitly:
/shadow-clerk config set summary_source transcript # always use transcript
/shadow-clerk config set summary_source translate # always use translation (fallback to transcript if missing)
summary_language controls the output language of the summary. When unset (null), it falls back to ui_language:
/shadow-clerk config set summary_language en # summarize in English
/shadow-clerk config set summary_language ja # summarize in Japanese
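The two fallback chains above can be summarized in a short sketch (illustrative logic written for this document, not the tool's actual code):

```python
def resolve_summary_source(summary_source, translation_exists: bool) -> str:
    """null/auto prefers the translation file when it exists."""
    if summary_source in (None, "auto"):
        return "translate" if translation_exists else "transcript"
    if summary_source == "translate" and not translation_exists:
        return "transcript"  # documented fallback when translation is missing
    return summary_source

def resolve_summary_language(summary_language, ui_language: str) -> str:
    """null falls back to ui_language."""
    return summary_language or ui_language

print(resolve_summary_source(None, translation_exists=True))          # translate
print(resolve_summary_source("translate", translation_exists=False))  # transcript
print(resolve_summary_language(None, ui_language="ja"))               # ja
```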
shadow-clerk/ # Repository
pyproject.toml # Project definition & dependencies
src/shadow_clerk/ # Main package
__init__.py # Data directory configuration
clerk_daemon.py # Recording, VAD, transcription & dashboard
llm_client.py # External API translation & summary
i18n.py # Internationalization (ja/en)
clerk_util.py # Data directory operations & process management
data/
SKILL.md.template # Claude Code Skill template
skills/
SKILL.md # Claude Code Skill definition (development)
~/.local/share/shadow-clerk/ # Runtime data
transcript-YYYYMMDD.txt # Transcription output (date-based)
transcript-YYYYMMDDHHMM.txt # Meeting session transcript
transcript-YYYYMMDDHHMM@Title.txt # Meeting session transcript (with event title)
transcript-YYYYMMDD-<lang>.txt # Translation output
summary-YYYYMMDD.md # Meeting minutes (corresponds to transcript)
summary-YYYYMMDDHHMM@Title.md # Meeting minutes (named session)
glossary.txt # Glossary (TSV: translation terms & reading-based text replacement)
config.yaml # Configuration file
gcal_token.json # Google Calendar OAuth token (created by gcal-auth)
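If you edit glossary.txt by hand, a simple two-column term/replacement TSV can be read as below. The exact column layout is an assumption for illustration; check the actual file before relying on it:

```python
import csv
import io

# Illustrative only: assumes one tab-separated pair per line,
# which may not match glossary.txt's real layout.
SAMPLE = "OSS\tオープンソースソフトウェア\nLLM\t大規模言語モデル\n"

def load_glossary(text: str) -> dict[str, str]:
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row[1] for row in reader if len(row) >= 2}

glossary = load_glossary(SAMPLE)
print(glossary["LLM"])  # 大規模言語モデル
```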
# List available devices
clerk-daemon --list-devices
# PipeWire: check status
wpctl status
# PulseAudio: list sources
pactl list short sources

On PipeWire, check sink (output) devices with wpctl status.
On PulseAudio, look for sources containing .monitor with pactl list short sources.
You can also specify the device number manually:
clerk-daemon --monitor 5

Make sure libportaudio2 is installed:
dpkg -l | grep portaudio

If you see PortAudioError: Error initializing PortAudio: ... PulseAudio_Initialize: Can't connect to server, the PulseAudio-compatible service may have crashed. On PipeWire systems, restart pipewire-pulse:
systemctl --user restart pipewire-pulse

Use a lighter model with --model tiny:
clerk-daemon --model tiny

The japanese_asr_model setting selects the ASR backend used when language=ja. When the language changes to something other than ja, it automatically reverts to standard Whisper.
| Value | Model | Requires | Japanese accuracy | CPU speed |
|---|---|---|---|---|
| default | Standard Whisper | — | Depends on model size | Depends on model size |
| kotoba-whisper | Kotoba-Whisper | Auto-downloaded on first use | High (rivals large-v3) | ~medium |
| reazonspeech-k2 | ReazonSpeech k2 | uv sync --extra reazonspeech | High | Fast |
Kotoba-Whisper retains the full large-v3 encoder (32 layers) while distilling the decoder down to just 2 layers. Since it has only 2 decoder layers, beam=5 has almost no speed penalty.
ReazonSpeech k2 uses sherpa-onnx for inference. When selected, Whisper-specific settings (default_model, whisper_beam_size, whisper_compute_type, initial_prompt) are not used.
Selection guide:
| Use case | Settings |
|---|---|
| Japanese-focused, accuracy priority | japanese_asr_model: kotoba-whisper, whisper_beam_size: 5 |
| Japanese-focused, fast & accurate | japanese_asr_model: reazonspeech-k2 |
| Japanese-focused, speed priority (CPU) | japanese_asr_model: default, default_model: small, whisper_beam_size: 3 |
| Multilingual | japanese_asr_model: kotoba-whisper, default_model: small (Kotoba for ja, small for others) |
Interim transcription:
interim_japanese_asr_model controls which Japanese ASR model is used for interim transcription (real-time display while speaking). On CPU, keeping the default (default with a lightweight model like tiny/base) is recommended.
# Japanese accuracy priority (GPU recommended)
japanese_asr_model: kotoba-whisper
interim_japanese_asr_model: kotoba-whisper
whisper_beam_size: 5
# Japanese accuracy + fast interim (CPU recommended)
japanese_asr_model: kotoba-whisper
interim_japanese_asr_model: default
interim_model: base
whisper_beam_size: 5 # Kotoba has only 2 decoder layers, beam=5 is fine
# ReazonSpeech (fast & accurate, CPU friendly)
japanese_asr_model: reazonspeech-k2
interim_japanese_asr_model: default
interim_model: base
# Maximum speed (CPU)
japanese_asr_model: default
default_model: small
interim_model: base
whisper_beam_size: 1