feat: add opt-in TwelveLabs Pegasus on-screen visual context by mohit-twelvelabs · Pull Request #580 · Huanshere/VideoLingo

mohit-twelvelabs · 2026-06-27T07:18:51Z

Hi! I'm Mohit, I work at TwelveLabs (@mohit-twelvelabs).

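Summary (originally in Simplified Chinese)

The current summary/terminology extraction step (_4_1_summarize.py) relies only on the audio transcript, so it cannot see on-screen text, product/brand names, UI labels, or charts. This PR adds an optional TwelveLabs Pegasus visual context: when enabled, Pegasus analyzes the video frames once and injects a description of what is on screen into the summary/terminology prompt, which helps disambiguate on-screen content during segmentation and translation (for example proper nouns and brand names that appear on screen).

Fully optional and non-breaking: disabled by default (pegasus.enabled: false), and with no key configured the pipeline behaves exactly as before. A free API key is available at twelvelabs.io, with a generous free tier.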

What this adds

VideoLingo's summary/terminology step (core/_4_1_summarize.py) builds the theme and glossary from the transcript only, so it's blind to on-screen text, product/brand names, UI labels, and charts that the narration never says aloud. This PR adds an opt-in TwelveLabs Pegasus visual-context pass:

New core/utils/pegasus_context.py — uploads the input video once and asks Pegasus 1.5 to describe the on-screen visual layer.
The description is injected into get_summary_prompt(...) (new optional visual_context arg) so the summary + terminology extraction can disambiguate proper nouns and on-screen terms before translation/segmentation.
New pegasus: block in config.yaml.
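
For reviewers who want the shape of the prompt change without opening the diff, here is a minimal sketch. It is not the code in this PR: the real get_summary_prompt(...) has its own signature and prompt text, and the source_content parameter name and the prompt wording below are assumptions made for illustration. The only point is that an empty visual_context leaves the prompt unchanged.

```python
def get_summary_prompt(source_content: str, visual_context: str = "") -> str:
    """Simplified, hypothetical stand-in for the real prompt builder."""
    prompt = (
        "Summarize the video and extract key terminology.\n\n"
        f"Transcript:\n{source_content}\n"
    )
    # Append the visual section only when there is something to add,
    # so the prompt is identical to the old one when the feature is off.
    if visual_context.strip():
        prompt += f"\nOn-screen visual context:\n{visual_context.strip()}\n"
    return prompt
```

Because the new argument defaults to an empty string, existing callers that pass only the transcript keep working.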

Why it helps VideoLingo

The pipeline's translation quality hinges on the summary/terminology step. Feeding it what's actually on screen (e.g. a brand logo, an app name in the UI, a chart label) lets it pick the right translation for ambiguous terms and proper nouns that the audio alone can't pin down.

Opt-in / non-breaking

Disabled by default (pegasus.enabled: false). With no key configured, get_visual_context() returns "" and the pipeline behaves exactly as before.
Key is read from pegasus.api_key, falling back to the TWELVELABS_API_KEY env var — no key is hardcoded.
All Pegasus errors are caught and reported as a warning, so a hiccup never breaks translation. The result is cached to output/log/ for resumed runs. Videos over the 200MB direct-upload cap are guarded against.
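
The guards listed above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of core/utils/pegasus_context.py: the describe_video callable stands in for the actual upload + Pegasus call (no TwelveLabs SDK calls are shown), the cache file name is hypothetical, and skipping oversized files is one possible way to guard the cap.

```python
import os
import warnings
from pathlib import Path
from typing import Callable

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # direct-upload cap mentioned in this PR


def get_visual_context(
    video_path: str,
    config: dict,
    describe_video: Callable[[str, str], str],  # hypothetical: (path, api_key) -> text
    cache_path: str = "output/log/pegasus_context.txt",  # hypothetical file name
) -> str:
    pegasus = config.get("pegasus") or {}
    # Opt-in: a missing or disabled block is a no-op.
    if not pegasus.get("enabled", False):
        return ""
    # Config key first, then the environment variable; nothing is hardcoded.
    api_key = pegasus.get("api_key") or os.environ.get("TWELVELABS_API_KEY", "")
    if not api_key:
        return ""
    # Reuse the cached description on resumed runs.
    cache = Path(cache_path)
    if cache.exists():
        return cache.read_text(encoding="utf-8")
    try:
        if os.path.getsize(video_path) > MAX_UPLOAD_BYTES:
            warnings.warn("Video exceeds the direct-upload cap; skipping visual context.")
            return ""
        description = describe_video(video_path, api_key)
    except Exception as exc:  # any failure degrades to "no visual context"
        warnings.warn(f"Pegasus visual context failed: {exc}")
        return ""
    cache.parent.mkdir(parents=True, exist_ok=True)
    cache.write_text(description, encoding="utf-8")
    return description
```

Every failure path returns an empty string, which is the same value the disabled path returns, so downstream code needs no special handling.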

How it was tested

tests/test_pegasus_context.py: 4 no-network unit tests verifying that the feature is genuinely opt-in (disabled / no-key → no-op) and that the prompt is unchanged without visual context but includes it when present. All pass.
A live test gated on TWELVELABS_API_KEY (skipped without it) that performs a real asset upload + Pegasus analyze — verified passing locally against the TwelveLabs API (returns a non-empty on-screen description).
Modules compile and import cleanly; config loads.
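
For readers unfamiliar with the gating pattern, here is a small sketch of how a no-network check and a key-gated live check can sit side by side. It uses the standard-library unittest module and a hypothetical feature_is_active helper; the actual tests in tests/test_pegasus_context.py may use a different framework and different names.

```python
import os
import unittest


def feature_is_active(config: dict) -> bool:
    """Hypothetical helper: True only when enabled and a key is available."""
    pegasus = config.get("pegasus") or {}
    key = pegasus.get("api_key") or os.environ.get("TWELVELABS_API_KEY", "")
    return bool(pegasus.get("enabled")) and bool(key)


class OptInTests(unittest.TestCase):
    """No-network checks: these never touch the API."""

    def test_disabled_is_noop(self):
        config = {"pegasus": {"enabled": False, "api_key": "k"}}
        self.assertFalse(feature_is_active(config))

    def test_missing_block_is_noop(self):
        self.assertFalse(feature_is_active({}))


@unittest.skipUnless(
    os.environ.get("TWELVELABS_API_KEY"), "TWELVELABS_API_KEY not set"
)
class LiveTests(unittest.TestCase):
    """Skipped unless a key is present in the environment."""

    def test_key_is_visible(self):
        # A real live test would upload a short clip here and assert
        # that the returned description is non-empty.
        self.assertTrue(os.environ["TWELVELABS_API_KEY"])
```

Running python -m unittest on a file like this reports the live class as skipped when the variable is unset, so CI without credentials stays green.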

Note: I couldn't run VideoLingo's full GPU/WhisperX pipeline end-to-end in my Linux sandbox (heavy CUDA/ML deps), so the integration follows the repo's existing backend conventions (requests-style modules, load_key, rprint, output/log/ caching) and was validated at the unit + live-API level. Happy to adjust to your preferences.

twelvelabs>=1.2.8 is added to requirements.txt (pure-Python SDK; the import is guarded so it's only needed when the feature is enabled).
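
One common way to guard an optional dependency is shown below, as a sketch only. The load_twelvelabs_sdk name is hypothetical and the PR may guard the import differently; the idea is that a missing package yields None instead of an ImportError at module import time.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_twelvelabs_sdk() -> Optional[ModuleType]:
    """Return the twelvelabs module if installed, otherwise None."""
    try:
        return importlib.import_module("twelvelabs")
    except ImportError:
        # The SDK is only needed when the feature is enabled, so a
        # missing package is not an error for users who leave it off.
        return None
```

Callers can then treat a None result the same way as a disabled feature.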

You can grab a free API key at https://twelvelabs.io — there's a generous free tier.

The summary/terminology step works from the transcript only and is blind to on-screen text, product/brand names, UI labels and charts. When enabled via the new pegasus config block, TwelveLabs Pegasus describes that visual layer once and feeds it into the summary prompt to disambiguate segmentation and translation of on-screen content. Opt-in and non-breaking: disabled by default, and with no key configured the pipeline behaves exactly as before. Adds focused tests (no-network unit tests plus a live Pegasus check gated on TWELVELABS_API_KEY).

mohit-twelvelabs wants to merge 1 commit into Huanshere:main from mohit-twelvelabs:feat/twelvelabs-integration
