Shadow-clerk

A tool that records web meeting audio in real time and transcribes it. It also supports translation and meeting minutes generation.

Runs on Ubuntu + PipeWire / PulseAudio environments.

Features and requirements

| Feature | Requires | Quality | Speed | Related settings |
|---|---|---|---|---|
| Transcription (default) | faster-whisper (included) | 3 | 4 | `default_model`, `default_language` |
| Transcription (Kotoba-Whisper) | Same (auto-downloaded on first use) | 5 | 3 | `japanese_asr_model: kotoba-whisper` |
| Transcription (ReazonSpeech) | `uv sync --extra reazonspeech` | 5 | 4 | `japanese_asr_model: reazonspeech-k2` |
| Interim transcription | Same | 2 | 5 | `interim_transcription: true`, `interim_model` |
| Translation (LibreTranslate) | LibreTranslate server | 2 | 4 | `translation_provider: libretranslate` |
| Translation (OpenAI compatible API) | OpenAI compatible API | 3-5 | 2-5 | `translation_provider: api`, `api_endpoint`, `api_model` |
| Translation (Claude) | Claude Code | 5 | 2 | `translation_provider: claude` |
| Language detection (pre-translation) | langdetect (included) | - | - | Automatically detects the source language to select the correct prompt |
| Summary (Claude) | Claude Code | 5 | 3 | `llm_provider: claude` |
| Summary (OpenAI compatible API) | OpenAI compatible API | 3-5 | 2-5 | `llm_provider: api`, `api_endpoint`, `api_model` |
| Voice commands (PTT) | None (built-in) | - | - | `voice_command_key` |
| Voice commands (LLM matching) | OpenAI compatible API | - | - | `api_endpoint`, `api_model` |
| Spell check (pre-translation) | transformers (auto-downloaded on first use) | - | - | `libretranslate_spell_check: true` |

Minimal setup without LLM: Transcription + LibreTranslate translation requires no external API or Claude Code. Everything runs locally.
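For example, a fully local setup needs nothing beyond pointing at a running LibreTranslate server in `~/.local/share/shadow-clerk/config.yaml` (endpoint shown for a default local install):

```yaml
# config.yaml: fully local setup (transcription + LibreTranslate translation)
translation_provider: libretranslate
libretranslate_endpoint: http://localhost:5000
```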

See the Feature Tour for a visual walkthrough with screenshots.

Setup

1. System packages

sudo apt install libportaudio2 portaudio19-dev

2. Install

git clone https://gitlab.edocode.co.jp/common/shadow-clerk.git
cd shadow-clerk
| | Command |
|---|---|
| Basic | `uv tool install -e .` |
| + ReazonSpeech | `uv tool install -e ".[reazonspeech]" --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr"` |
| + Spell check | `uv tool install -e ".[spell-check]"` |
| + Both (ReazonSpeech + Spell check) | `uv tool install -e ".[spell-check,reazonspeech]" --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr"` |
| + Google Calendar | `uv tool install -e ".[gcal]"` |
| All | `uv tool install -e ".[spell-check,gcal,reazonspeech]" --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr"` |

Note: uv tool install maintains a single environment per tool. When reinstalling with different extras, use --force — without it, uv tool install reports "already installed" and does not add the extra. Only the extras specified in the command are included; previously installed extras are removed.

2a. For development

| | Command |
|---|---|
| Basic | `uv sync` |
| + ReazonSpeech | `uv sync --extra reazonspeech` |
| + Spell check | `uv sync --extra spell-check` |
| + Both (ReazonSpeech + Spell check) | `uv sync --extra spell-check --extra reazonspeech` |
| + Google Calendar | `uv sync --extra gcal` |
| All | `uv sync --extra spell-check --extra gcal --extra reazonspeech` |

This is all you need for transcription. The following optional extras are available:

Optional: Japanese ASR models

Kotoba-Whisper — No extra install required. The model is auto-downloaded on first use. Just set:

# config.yaml
japanese_asr_model: kotoba-whisper

ReazonSpeech k2 — Requires the reazonspeech extra:

uv tool install -e ".[reazonspeech]" \
  --with "reazonspeech-k2-asr @ git+https://github.com/reazon-research/ReazonSpeech.git#subdirectory=pkg/k2-asr"
# or for development:
uv sync --extra reazonspeech
# config.yaml
japanese_asr_model: reazonspeech-k2

Optional: Spell check (pre-translation correction)

Requires the spell-check extra (installs transformers, torch, sentencepiece):

uv tool install -e ".[spell-check]"
# or for development:
uv sync --extra spell-check
# config.yaml
libretranslate_spell_check: true
spell_check_model: mbyhphat/t5-japanese-typo-correction  # default

The spell check model is auto-downloaded on first use. It corrects Japanese speech recognition typos before sending text to LibreTranslate.
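The ordering guarantee (correct first, then translate) can be sketched as a tiny pipeline; `correct_then_translate`, `corrector`, and `translator` are hypothetical names standing in for the T5 typo-correction model and the LibreTranslate request:

```python
def correct_then_translate(text, corrector, translator, enabled=True):
    """Run the spell-check corrector (when enabled) before handing text to the translator."""
    if enabled:
        text = corrector(text)
    return translator(text)
```

With `libretranslate_spell_check: false` the corrector is skipped and the raw transcript goes straight to translation.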

Optional: Google Calendar integration

Automatically starts and ends meeting sessions based on your Google Calendar schedule. Requires the gcal extra:

uv tool install -e ".[gcal]"
# or for development:
uv sync --extra gcal

Then authenticate and configure:

# One-time OAuth setup (opens browser)
clerk-util gcal-auth ~/credentials.json

# Enable in config
clerk-util write-config-value gcal_integration true
clerk-util write-config-value gcal_credentials_file ~/credentials.json

When enabled, clerk-daemon polls Google Calendar every 60 seconds. Events automatically trigger start_meeting / end_meeting, creating transcript files named transcript-YYYYMMDDHHMM@EventTitle.txt.

See docs/google-calendar-setup.md for full setup instructions including how to obtain credentials.json from Google Cloud Console.

The remaining steps are optional; set them up if you need translation or summarization.

3. (Optional) LibreTranslate setup

Local translation without LLM. Install via Docker or pip:

# Docker (recommended)
docker run -d -p 5000:5000 libretranslate/libretranslate

# Or pip
pip install libretranslate
libretranslate --host 0.0.0.0 --port 5000

Configuration:

# config.yaml
translation_provider: libretranslate
libretranslate_endpoint: http://localhost:5000
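Under the hood this amounts to a POST against the server's `/translate` endpoint. A sketch of the request body per LibreTranslate's public API (`libretranslate_payload` is a hypothetical helper, not part of shadow-clerk):

```python
def libretranslate_payload(text, source, target, api_key=None):
    """Request body for LibreTranslate's POST /translate endpoint."""
    body = {"q": text, "source": source, "target": target, "format": "text"}
    if api_key:  # only needed when the server enforces an API key
        body["api_key"] = api_key
    return body
```

The daemon would send this as JSON to `libretranslate_endpoint` plus `/translate`.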

4. (Optional) OpenAI compatible API setup

Used for translation, summarization, and LLM voice command matching:

# config.yaml — OpenAI
llm_provider: api
api_endpoint: https://api.openai.com/v1
api_model: gpt-4o
# Add SHADOW_CLERK_API_KEY=sk-... to ~/.local/share/shadow-clerk/.env

# config.yaml — Ollama (local)
llm_provider: api
api_endpoint: http://localhost:11434/v1
api_model: llama3
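Whichever endpoint you point at, the request shape is the same OpenAI-compatible `/chat/completions` call. A sketch of how such a request might be assembled (`chat_request` is a hypothetical helper; the key is read from the environment variable named by `api_key_env`):

```python
import os

def chat_request(cfg, messages):
    """Assemble URL, headers, and body for an OpenAI-compatible /chat/completions call."""
    key = os.environ.get(cfg.get("api_key_env", "SHADOW_CLERK_API_KEY"), "")
    url = cfg["api_endpoint"].rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if key:  # local servers such as Ollama usually accept keyless requests
        headers["Authorization"] = f"Bearer {key}"
    body = {"model": cfg["api_model"], "messages": messages}
    return url, headers, body
```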

5. (Optional) Register as Claude Code Skill

For managing minutes generation, translation, and controls from Claude Code:

clerk-util claude-setup

This generates ~/.claude/skills/shadow-clerk/SKILL.md and adds permissions to ~/.claude/settings.local.json.

Usage

Starting the daemon

If you installed via uv tool install:

clerk-daemon

For development (uv sync):

uv run clerk-daemon

Note: uv run uses the project .venv, while uv tool install uses its own isolated environment. Make sure extras (e.g. spell-check, reazonspeech) are installed in the matching environment.

Recording & transcription

# Basic (record mic + system audio, auto-transcribe)
clerk-daemon

# List available devices
clerk-daemon --list-devices

# With options
clerk-daemon \
  --language ja \
  --model small \
  --output ~/my-transcript.txt \
  --verbose

Press Ctrl+C to stop recording.

Voice commands

Push-to-Talk (recommended)

Hold down the Menu key (next to Right Alt) while speaking a command — no wake word needed:

[Hold Menu key] "start translation" → Translation starts
[Hold Menu key] "start meeting"     → Meeting session starts

The trigger key can be changed via voice_command_key in config.yaml (ctrl_r, ctrl_l, alt_r, alt_l, shift_r, shift_l). Set to null to disable.

Prefix mode (fallback)

During recording, say the wake word (default: "sheruku" / "シェルク") followed by a command for hands-free control:

| Voice command | Action |
|---|---|
| "sheruku, start meeting" | Start a new meeting session |
| "sheruku, end meeting" | End the meeting session |
| "sheruku, language ja" | Switch transcription language to Japanese |
| "sheruku, language en" | Switch transcription language to English |
| "sheruku, unset language" | Reset to auto-detect |
| "sheruku, start translation" | Start the translation loop |
| "sheruku, stop translation" | Stop the translation loop |

The separator (comma, space) between the wake word and command is optional. The wake word can be changed via wake_word in config.yaml.
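The wake-word rule can be pictured as a regex match (a sketch, not the actual parser):

```python
import re

WAKE_WORDS = ("sheruku", "シェルク")

def extract_command(utterance: str):
    """Return the command following the wake word; the comma/space separator is optional."""
    for wake in WAKE_WORDS:
        m = re.match(rf"\s*{re.escape(wake)}[,、]?\s*(.+)", utterance, re.IGNORECASE)
        if m:
            return m.group(1).strip()
    return None
```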

Custom voice commands

You can register custom voice commands in config.yaml under custom_commands. They are evaluated after built-in commands:

custom_commands:
  - pattern: "youtube"
    action: "xdg-open https://www.youtube.com"
  - pattern: "gmail|mail"
    action: "xdg-open https://mail.google.com"

  • pattern: Regular expression (case-insensitive)
  • action: Shell command to execute
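The evaluation order (built-ins first, then custom patterns) could be sketched like this (`dispatch` and `builtins` are hypothetical names, not shadow-clerk internals):

```python
import re
import subprocess

def dispatch(command: str, builtins: dict, custom_commands: list) -> bool:
    """Run a voice command: built-ins take precedence, then custom pattern/action pairs."""
    if command in builtins:
        builtins[command]()
        return True
    for entry in custom_commands:
        if re.search(entry["pattern"], command, re.IGNORECASE):
            subprocess.Popen(entry["action"], shell=True)  # action is a shell command
            return True
    return False
```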

LLM fallback

If a voice command doesn't match any built-in or custom command and api_endpoint is configured, the utterance is sent to the LLM as a query. The response is printed to stdout and saved to .clerk_response.

"sheruku, what is 1+1?" → LLM returns the answer

CLI options

| Option | Description | Default |
|---|---|---|
| `--output`, `-o` | Output file path | `~/.local/share/shadow-clerk/transcript-YYYYMMDD.txt` |
| `--model`, `-m` | Whisper model size (tiny, base, small, medium, large-v3) | `small` |
| `--language`, `-l` | Language code (ja, en, etc.); auto-detect if omitted | Auto |
| `--mic` | Microphone device number | Auto-detect |
| `--monitor` | Monitor device number (sounddevice) | Auto-detect |
| `--backend` | Audio backend (auto, pipewire, pulseaudio, sounddevice) | `auto` |
| `--list-devices` | List devices and exit | - |
| `--verbose`, `-v` | Verbose logging | - |
| `--dashboard` / `--no-dashboard` | Enable/disable dashboard | Enabled |
| `--dashboard-port` | Dashboard port number | `8765` |
| `--beam-size` | Whisper beam size (1=fast, 5=accurate) | `5` |
| `--compute-type` | Whisper compute precision (int8, float16, float32) | `int8` |
| `--device` | Whisper device (cpu, cuda) | `cpu` |

Translation & Summary Providers

Translation and summary each support multiple providers with different operation modes:

Claude mode (translation_provider: claude / llm_provider: claude)

Runs via the Claude Code Skill (/shadow-clerk). Claude performs translation and summary inline.

  • Highest quality — especially for Japanese homophone correction (ja→ja)
  • Requires Claude Code — must be running in a Claude Code terminal session
  • How translation works: /shadow-clerk start launches clerk-daemon with a background subagent that handles translation and command monitoring. Dashboard-initiated translation is also processed by this subagent
  • Foreground translation: /shadow-clerk translate start runs the translation loop directly in the terminal (polling output is visible; use /shadow-clerk start for background operation instead)
# config.yaml
translation_provider: claude   # Translation by Claude
llm_provider: claude           # Summary by Claude (default)

API mode (translation_provider: api / llm_provider: api)

clerk-daemon calls an external API (OpenAI-compatible) internally. Claude Code is not required.

  • Works without Claude Code — clerk-daemon handles translation and summary on its own
  • Quality depends on model — high-end models (GPT-4o) produce good results; smaller models may struggle with Japanese correction
  • How translation works: An internal thread in clerk-daemon processes translation. Started/stopped via voice commands or dashboard
  • Summary works similarly: clerk-util summarize generates minutes via the external API
# config.yaml
translation_provider: api     # Translation via external API
llm_provider: api             # Summary via external API
api_endpoint: https://api.openai.com/v1
api_model: gpt-4o

LibreTranslate mode (translation_provider: libretranslate)

Translation only. Runs locally without any external API or Claude Code (summary still needs llm_provider).

Recommended configurations

| Use case | Translation | Summary | Notes |
|---|---|---|---|
| Best quality (with Claude Code) | `translation_provider: claude` | `llm_provider: claude` | Highest quality, requires Claude Code |
| Autonomous (external API) | `translation_provider: api` | `llm_provider: api` | No Claude Code needed, quality varies by model |
| Fully local | `translation_provider: libretranslate` | - | No LLM needed, lower quality |
| Hybrid | `translation_provider: api` | `llm_provider: claude` | Auto translation + high-quality summary |

Meeting minutes (Claude Code Skill)

You can start/stop clerk-daemon and generate meeting minutes from Claude Code:

/shadow-clerk start                    # Start clerk-daemon in the background (with translation subagent)
/shadow-clerk start --language ja      # Start with options
/shadow-clerk stop                     # Stop clerk-daemon
/shadow-clerk start meeting            # Start a meeting session (auto_translate linked)
/shadow-clerk end meeting              # End a meeting session (auto_summary linked)
/shadow-clerk          # Update minutes from transcript diff
/shadow-clerk full     # Regenerate minutes from full transcript
/shadow-clerk status   # Check current status
/shadow-clerk translate start          # Start translation loop (foreground)
/shadow-clerk translate stop           # Stop translation loop

Note: /shadow-clerk start launches a background subagent for command monitoring. When translation_provider: claude, dashboard-initiated translation (start/regenerate) is processed by this subagent.

Generated meeting minutes are saved to ~/.local/share/shadow-clerk/summary-YYYYMMDD.md.

Configuration file

Customize defaults and auto-features in ~/.local/share/shadow-clerk/config.yaml:

# shadow-clerk config
translate_language: en        # Translation target language (ja/en/etc)
auto_translate: false         # Auto-start translation on start meeting
auto_summary: false           # Auto-generate summary on end meeting
default_language: null        # Default language for clerk-daemon (null=auto-detect)
default_model: small          # Default Whisper model for clerk-daemon
output_directory: null        # Transcript output directory (null=data directory)
llm_provider: claude          # LLM for summary ("claude" or "api")
translation_provider: null    # Translation provider (null=use llm_provider, "claude", "api", "libretranslate")
api_endpoint: null            # OpenAI Compatible API base URL
api_model: null               # API model name (gpt-4o, etc.)
api_key_env: SHADOW_CLERK_API_KEY  # Environment variable name for API key
summary_source: null          # Summary source (null=auto: prefer translation if exists / "transcript" / "translate")
summary_language: null        # Summary output language (null=fallback to ui_language / ja, en, zh, ...)
libretranslate_endpoint: null     # LibreTranslate API URL (e.g. http://localhost:5000)
libretranslate_api_key: null      # LibreTranslate API key (null if not required)
libretranslate_spell_check: false # Spell check before LibreTranslate translation
spell_check_model: mbyhphat/t5-japanese-typo-correction  # Spell check model
custom_commands: []               # Custom voice commands (list of pattern + action)
initial_prompt: null              # Whisper initial_prompt (vocabulary hints for recognition)
voice_command_key: f23         # Push-to-Talk key (null=disabled)
wake_word: シェルク              # Wake word (trigger word for voice commands)
whisper_beam_size: 5           # Whisper beam size (1=fast, 5=accurate)
whisper_compute_type: int8     # Compute precision (int8/float16/float32)
whisper_device: cpu            # Device (cpu/cuda)
interim_transcription: false   # Interim transcription (real-time display while speaking)
interim_model: base            # Model for interim transcription
japanese_asr_model: default    # Japanese ASR model (default/kotoba-whisper/reazonspeech-k2)
kotoba_whisper_model: kotoba-tech/kotoba-whisper-v2.0-faster  # Kotoba-Whisper model
interim_japanese_asr_model: default  # Japanese ASR for interim transcription
ui_language: ja                # UI language (ja/en) — dashboard, terminal output, LLM prompts

Manage configuration from Claude Code:

/shadow-clerk config show                     # Show current config
/shadow-clerk config set default_model tiny   # Change a setting
/shadow-clerk config set auto_translate true  # Enable auto-translation
/shadow-clerk config init                     # Generate default config file

With auto_translate: true, translation starts automatically on /shadow-clerk start meeting. With auto_summary: true, meeting minutes are generated automatically on /shadow-clerk end meeting.

Summary source selection

When summary_source is unset (null/auto), the summary is generated from the translation file if one exists (falling back to the transcript if not). To pin the behavior explicitly:

/shadow-clerk config set summary_source transcript   # always use transcript
/shadow-clerk config set summary_source translate    # always use translation (fallback to transcript if missing)

Summary language

summary_language controls the output language of the summary. When unset (null), it falls back to ui_language:

/shadow-clerk config set summary_language en   # summarize in English
/shadow-clerk config set summary_language ja   # summarize in Japanese

File structure

shadow-clerk/                          # Repository
  pyproject.toml                       # Project definition & dependencies
  src/shadow_clerk/                    # Main package
    __init__.py                        # Data directory configuration
    clerk_daemon.py                    # Recording, VAD, transcription & dashboard
    llm_client.py                      # External API translation & summary
    i18n.py                            # Internationalization (ja/en)
    clerk_util.py                      # Data directory operations & process management
    data/
      SKILL.md.template                # Claude Code Skill template
  skills/
    SKILL.md                           # Claude Code Skill definition (development)

~/.local/share/shadow-clerk/           # Runtime data
  transcript-YYYYMMDD.txt              # Transcription output (date-based)
  transcript-YYYYMMDDHHMM.txt          # Meeting session transcript
  transcript-YYYYMMDDHHMM@Title.txt    # Meeting session transcript (with event title)
  transcript-YYYYMMDD-<lang>.txt       # Translation output
  summary-YYYYMMDD.md                  # Meeting minutes (corresponds to transcript)
  summary-YYYYMMDDHHMM@Title.md        # Meeting minutes (named session)
  glossary.txt                         # Glossary (TSV: translation terms & reading-based text replacement)
  config.yaml                          # Configuration file
  gcal_token.json                      # Google Calendar OAuth token (created by gcal-auth)

Troubleshooting

Device not found

# List available devices
clerk-daemon --list-devices

# PipeWire: check status
wpctl status

# PulseAudio: list sources
pactl list short sources

Monitor source (system audio) not detected

On PipeWire, check sink (output) devices with wpctl status. On PulseAudio, look for sources containing .monitor with pactl list short sources.

You can also specify the device number manually:

clerk-daemon --monitor 5

PortAudio error

Make sure libportaudio2 is installed:

dpkg -l | grep portaudio

If you see PortAudioError: Error initializing PortAudio: ... PulseAudio_Initialize: Can't connect to server, the PulseAudio-compatible service may have crashed. On PipeWire systems, restart pipewire-pulse:

systemctl --user restart pipewire-pulse

Slow transcription

Use a lighter model with --model tiny:

clerk-daemon --model tiny

Japanese ASR models

The japanese_asr_model setting selects the ASR backend used when language=ja. When the language changes to something other than ja, it automatically reverts to standard Whisper.

| Value | Model | Requires | Japanese accuracy | CPU speed |
|---|---|---|---|---|
| `default` | Standard Whisper | - | Depends on model size | Depends on model size |
| `kotoba-whisper` | Kotoba-Whisper | Auto-downloaded on first use | High (rivals large-v3) | ~medium |
| `reazonspeech-k2` | ReazonSpeech k2 | `uv sync --extra reazonspeech` | High | Fast |

Kotoba-Whisper retains the full large-v3 encoder (32 layers) while distilling the decoder down to just 2 layers, so beam=5 carries almost no speed penalty.

ReazonSpeech k2 uses sherpa-onnx for inference. When selected, Whisper-specific settings (default_model, whisper_beam_size, whisper_compute_type, initial_prompt) are not used.
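Put together, the per-language backend choice reduces to a simple rule (a sketch, not the actual implementation):

```python
def select_asr_backend(language, japanese_asr_model="default"):
    """Pick the ASR backend for the current utterance language."""
    if language == "ja" and japanese_asr_model != "default":
        return japanese_asr_model  # "kotoba-whisper" or "reazonspeech-k2"
    return "whisper"  # standard Whisper for everything else (and for ja with "default")
```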

Selection guide:

| Use case | Settings |
|---|---|
| Japanese-focused, accuracy priority | `japanese_asr_model: kotoba-whisper`, `whisper_beam_size: 5` |
| Japanese-focused, fast & accurate | `japanese_asr_model: reazonspeech-k2` |
| Japanese-focused, speed priority (CPU) | `japanese_asr_model: default`, `default_model: small`, `whisper_beam_size: 3` |
| Multilingual | `japanese_asr_model: kotoba-whisper`, `default_model: small` (Kotoba for ja, small for others) |

Interim transcription:

interim_japanese_asr_model controls which Japanese ASR model is used for interim transcription (real-time display while speaking). On CPU, keeping the default (default with a lightweight model like tiny/base) is recommended.

# Japanese accuracy priority (GPU recommended)
japanese_asr_model: kotoba-whisper
interim_japanese_asr_model: kotoba-whisper
whisper_beam_size: 5

# Japanese accuracy + fast interim (CPU recommended)
japanese_asr_model: kotoba-whisper
interim_japanese_asr_model: default
interim_model: base
whisper_beam_size: 5        # Kotoba has only 2 decoder layers, beam=5 is fine

# ReazonSpeech (fast & accurate, CPU friendly)
japanese_asr_model: reazonspeech-k2
interim_japanese_asr_model: default
interim_model: base

# Maximum speed (CPU)
japanese_asr_model: default
default_model: small
interim_model: base
whisper_beam_size: 1
