From 8eb83cdd989d84b09ccfa6f7b292d00e85340fb1 Mon Sep 17 00:00:00 2001
From: Alex Liang <leungpuingai@gmail.com>
Date: Thu, 11 Jun 2026 13:58:40 +0800
Subject: [PATCH 1/2] feat: v3.0.2 - i18n, audio-only flow, task control, UX
 polish
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Bumps version to 3.0.2.

Major themes:

i18n
- Browser language auto-detection (Accept-Language) on first load,
  with a top-right language selector that overrides the choice per
  session.
- Removes the duplicate language selector from the sidebar.
- Routes display_language through query params + session_state, with
  config.yaml as a fallback only.
- Adds normalize_language_code() to map zh / zh-CN / zh-HK / zh-Hant
  and similar variants to the supported set.
- Translates previously hard-coded UI strings: WhisperX runtime, TTS
  engine names, Voice / 302ai API / ElevenLabs API labels, "Star on
  GitHub" button, YouTube resolution "Best".
- Fixes the 'here' link text staying in English in the zh-CN / zh-HK
  welcome string.
- Adds a CSS overlay for the file_uploader internals (Streamlit has no
  official i18n for these) covering the "Drag and drop file here",
  "Limit ... per file" and "Browse files" labels.
- Hides the Streamlit developer toolbar (client.toolbarMode = "viewer")
  and disables the file watcher (server.fileWatcherType = "none") so
  the English "File change / Rerun / Always rerun" prompts no longer
  appear.
- Fills missing translation keys across all seven languages: en /
  zh-CN / zh-HK / es / fr / ja / ru.
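
A minimal sketch of the language normalization and Accept-Language
selection described above. The function names, the routing of
Traditional Chinese tags to zh-HK, and the "en" default are assumptions
for illustration; the real normalize_language_code() in
translations/translations.py may differ.

```python
# Supported set taken from the translation files touched by this patch.
SUPPORTED = ("en", "zh-CN", "zh-HK", "es", "fr", "ja", "ru")

# Assumption: these subtags indicate Traditional Chinese and map to zh-HK.
_TRADITIONAL_HINTS = ("hant", "hk", "tw", "mo")


def normalize_language_code_sketch(code, default="en"):
    """Map a raw language tag such as 'zh_Hant-TW' to a supported code."""
    if not code:
        return default
    parts = code.strip().replace("_", "-").lower().split("-")
    if parts[0] == "zh":
        if any(part in _TRADITIONAL_HINTS for part in parts[1:]):
            return "zh-HK"
        return "zh-CN"
    for lang in SUPPORTED:
        if lang.lower() == parts[0]:
            return lang
    return default


def pick_from_accept_language(header, default="en"):
    """Return the best supported language from an Accept-Language header."""
    ranked = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        tag, params = tag.strip(), params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag and tag != "*":
            # Sort by descending quality, then by original position.
            ranked.append((-quality, position, tag))
    for _, _, tag in sorted(ranked):
        normalized = normalize_language_code_sketch(tag, default=None)
        if normalized:
            return normalized
    return default
```

The session-level selector would then simply override whatever this
returns on first load.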

Audio-only input flow
- Adds output/input_manifest.json, written by the upload / YouTube
  download path, recording the original media type. find_media_file()
  now reads the manifest first, so generated artefacts (dub.mp3,
  normalized_dub.wav) no longer poison detection.
- find_audio_files() now skips generated audio names.
- find_media_file() distinguishes "no media" from "multiple media"
  errors instead of silently falling back.
- The sidebar no longer persists burn_subtitles = false when the
  input is audio; the toggle is only disabled in the UI.
- Main pipeline now flows download -> subtitles -> (optional) dubbing,
  showing the dubbing section only after subtitles are done AND the
  input is not audio.
- Adds prepare_audio_for_asr() so audio-only inputs are normalized to
  16 kHz / mono / mp3 without going through video conversion.
- Removes the obsolete convert_audio_to_video() placeholder path.
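
A simplified, standalone sketch of the manifest-first detection
described above. The helper names and the extension sets are
illustrative assumptions, and unlike the patched find_media_file() this
version treats any two candidate inputs as a "multiple media" error.

```python
import json
import os

MANIFEST_NAME = "input_manifest.json"
GENERATED_AUDIO_NAMES = {"dub.mp3", "normalized_dub.wav"}
VIDEO_EXTS = {"mp4", "mkv", "mov"}   # assumed subset of allowed formats
AUDIO_EXTS = {"mp3", "wav", "flac"}  # assumed subset of allowed formats


def write_manifest(out_dir, media_path, media_type):
    """Record which file is the original input and what kind it is."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, MANIFEST_NAME), "w", encoding="utf-8") as f:
        json.dump({"path": media_path, "type": media_type}, f, indent=2)


def detect_media(out_dir):
    """Return (path, type), trusting the manifest before scanning files."""
    manifest = os.path.join(out_dir, MANIFEST_NAME)
    if os.path.exists(manifest):
        with open(manifest, "r", encoding="utf-8") as f:
            data = json.load(f)
        path = data.get("path") or ""
        if data.get("type") in {"video", "audio"} and os.path.exists(path):
            return path, data["type"]

    # Fallback scan: ignore artefacts the pipeline itself generates.
    candidates = []
    for name in sorted(os.listdir(out_dir)):
        ext = os.path.splitext(name)[1][1:].lower()
        if ext in VIDEO_EXTS:
            candidates.append((os.path.join(out_dir, name), "video"))
        elif ext in AUDIO_EXTS and name not in GENERATED_AUDIO_NAMES:
            candidates.append((os.path.join(out_dir, name), "audio"))

    if not candidates:
        raise ValueError("No media file found.")
    if len(candidates) > 1:
        raise ValueError(f"Multiple media files found: {len(candidates)}.")
    return candidates[0]
```

Keeping the two error messages distinct is what lets the UI show the
upload view for "no media" and an explicit error for everything else.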

Task control (pause / resume / stop)
- TaskRunner gains a class-level _current pointer and
  TaskRunner.check_cancel() so long-running core loops can
  cooperatively cancel.
- Adds a core.utils.check_cancel() wrapper, imported via the existing
  `from core.utils import *` pattern.
- Inserts check_cancel() into the hot loops: ASR segment loop,
  translate parallel loop, TTS warmup / parallel collection / chunk
  merge, audio segment merge, translate_lines entry. On stop, parallel
  loops also cancel futures that have not started yet.
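
A rough sketch of the cooperative-cancel pattern described above. The
class RunnerSketch, the exception TaskCancelled and the helper
run_parallel are hypothetical names, not the real TaskRunner API; only
the stop path is modelled, not pause / resume.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed


class TaskCancelled(Exception):
    """Raised inside a worker loop once the user has requested a stop."""


class RunnerSketch:
    _current = None  # class-level pointer to the active runner

    def __init__(self):
        self._stop = threading.Event()

    def start(self):
        RunnerSketch._current = self

    def stop(self):
        self._stop.set()

    @classmethod
    def check_cancel(cls):
        runner = cls._current
        if runner is not None and runner._stop.is_set():
            raise TaskCancelled("stopped by user")


def run_parallel(items, work, max_workers=2):
    """Collect results, cancelling not-yet-started futures on any error."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(work, item) for item in items]
        try:
            for future in as_completed(futures):
                RunnerSketch.check_cancel()
                results.append(future.result())
        except BaseException:
            # Future.cancel() only affects futures that have not started.
            for pending in futures:
                pending.cancel()
            raise
    return results
```

Work that is already running is not interrupted; the loop merely stops
waiting for new results at its next check.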

Done markers and completion detection
- Adds output/.subtitle_done and output/.dubbing_done markers, written
  by TaskRunner as the last step of each stage.
- text_done / audio_done now prefer the marker and fall back to
  checking that all final outputs are present, so half-failed runs are
  no longer mistaken for completion.
- Surfaces subtitle length tuning controls (max_split_length,
  subtitle.max_length) in an expandable section above "Start
  Processing Subtitles", with suggested ranges and a "Restore
  defaults" button.
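
The marker-first completion check from the first two bullets could look
roughly like this. The helper names and the output filenames used in
the usage note are placeholders, not the actual list checked by
text_done / audio_done.

```python
import os


def mark_stage_done(out_dir, marker):
    """Write the marker as the very last step of a stage."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, marker), "w", encoding="utf-8") as f:
        f.write("done\n")


def stage_done(out_dir, marker, required_outputs):
    """Prefer the marker; otherwise require every final output to exist."""
    if os.path.exists(os.path.join(out_dir, marker)):
        return True
    if not required_outputs:
        # Without a marker and without a known output list, assume not done.
        return False
    return all(
        os.path.exists(os.path.join(out_dir, name)) for name in required_outputs
    )
```

For example, stage_done("output", ".subtitle_done", ["src.srt",
"trans.srt"]) stays False for a run that produced only one of the two
(hypothetical) files and never reached the marker.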

Robustness fixes
- ElevenLabs ASR: elev2whisper() now always emits word-level
  timestamps; process_transcription() tolerates segments without
  `words` by synthesizing a single word entry from the segment text,
  avoiding a KeyError.
- download_video_section now surfaces detection errors (e.g. multiple
  media files in output/) with a clear message and a "Clear output and
  reselect" button, instead of silently falling back to the upload
  view.
- Re-upload of the same file is detected via session_state, avoiding
  an infinite rerun loop.
- give_star_button rewritten to a plain string template; the previous
  f-string broke on the literal `{` in the embedded CSS.
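
To illustrate the give_star_button fix: inside an f-string every
literal CSS brace has to be doubled, whereas a plain template with
placeholder tokens needs no escaping. The markup, the placeholder
tokens and the function name below are made up for this sketch and are
not the real button code.

```python
import html

# Plain (non-f) string: CSS braces are just characters here.
BUTTON_TEMPLATE = """<style>
.star-btn { padding: 4px 8px; border-radius: 6px; }
</style>
<a class="star-btn" href="__URL__">__LABEL__</a>"""


def render_star_button(url, label):
    """Fill the placeholders, escaping values before they reach the HTML."""
    return (
        BUTTON_TEMPLATE
        .replace("__URL__", html.escape(url, quote=True))
        .replace("__LABEL__", html.escape(label))
    )
```

The equivalent f-string would need `{{ padding: 4px 8px; }}` for every
rule, which is easy to miss when the CSS is edited later.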

Tooling and config
- OneKeyStart.bat consolidated: auto-detects .venv (uv install) or
  falls back to the legacy Conda env "videolingo"; OneKeyStart_uv.bat
  removed.
- Logs now go to logs/videolingo_<timestamp>.log instead of being
  scattered across the project root.
- .streamlit/config.toml: client.toolbarMode = "viewer",
  server.fileWatcherType = "none", server.maxUploadSize preserved.
- .gitignore: keeps logs/ ignored and adds videolingo_*.log.
- setup.py + config.yaml header bumped to 3.0.2.
- config.yaml defaults changed: api.model '' -> 'gpt-5.5',
  demucs true -> false, burn_subtitles true -> false.

No new dependencies. No CLI behavior changes.
---
 .gitignore                              |   5 +-
 .streamlit/config.toml                  |   6 +-
 OneKeyStart_uv.bat                      |  19 ---
 config.yaml                             |  10 +-
 core/_10_gen_audio.py                   |  24 ++--
 core/_11_merge_audio.py                 |   1 +
 core/_12_dub_to_vid.py                  |   5 +
 core/_1_ytdlp.py                        |  72 ++++++++++
 core/_2_asr.py                          |  16 ++-
 core/_4_2_translate.py                  |  12 +-
 core/_7_sub_into_vid.py                 |   7 +-
 core/asr_backend/audio_preprocess.py    |  41 +++++-
 core/asr_backend/elevenlabs_asr.py      |   3 +-
 core/st_utils/download_video_section.py | 169 +++++++++++++++++-------
 core/st_utils/imports_and_utils.py      |  10 +-
 core/st_utils/sidebar_setting.py        |  69 ++++++----
 core/st_utils/task_runner.py            |  29 ++++
 core/translate_lines.py                 |   1 +
 core/utils/__init__.py                  |  25 +++-
 core/utils/models.py                    |  11 +-
 core/utils/onekeycleanup.py             |  11 +-
 setup.py                                |   2 +-
 st.py                                   | 154 +++++++++++++++++++--
 translations/en.json                    |  39 +++++-
 translations/es.json                    |  32 ++++-
 translations/fr.json                    |  32 ++++-
 translations/ja.json                    |  32 ++++-
 translations/ru.json                    |  32 ++++-
 translations/translations.py            |  90 ++++++++++++-
 translations/zh-CN.json                 |  41 +++++-
 translations/zh-HK.json                 |  41 +++++-
 31 files changed, 882 insertions(+), 159 deletions(-)
 delete mode 100644 OneKeyStart_uv.bat

diff --git a/.gitignore b/.gitignore
index b646718e..de46b8a4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -172,4 +172,7 @@ config.backup.yaml
 runtime/
 dev/
 installer_files/
-logs/
\ No newline at end of file
+
+# Streamlit runtime logs from OneKeyStart.bat
+logs/
+videolingo_*.log
diff --git a/.streamlit/config.toml b/.streamlit/config.toml
index 8c550bea..38567ef9 100644
--- a/.streamlit/config.toml
+++ b/.streamlit/config.toml
@@ -1,2 +1,6 @@
 [server]
-maxUploadSize = 4096
\ No newline at end of file
+maxUploadSize = 4096
+fileWatcherType = "none"
+
+[client]
+toolbarMode = "viewer"
diff --git a/OneKeyStart_uv.bat b/OneKeyStart_uv.bat
deleted file mode 100644
index 8cb2c70c..00000000
--- a/OneKeyStart_uv.bat
+++ /dev/null
@@ -1,19 +0,0 @@
-@echo off
-cd /D "%~dp0"
-
-:: Log file with timestamp
-for /f "tokens=2 delims==" %%I in ('wmic os get localdatetime /value') do set dt=%%I
-set LOGFILE=videolingo_%dt:~0,8%_%dt:~8,6%.log
-
-echo [%date% %time%] VideoLingo starting... > "%LOGFILE%"
-echo Log file: %LOGFILE%
-
-if exist ".venv\Scripts\streamlit.exe" (
-    .venv\Scripts\streamlit run st.py 2>&1 | powershell -Command "$input | Tee-Object -FilePath '%LOGFILE%' -Append"
-) else if exist ".venv\Scripts\python.exe" (
-    .venv\Scripts\python -m streamlit run st.py 2>&1 | powershell -Command "$input | Tee-Object -FilePath '%LOGFILE%' -Append"
-) else (
-    echo ERROR: .venv not found. Please run setup first: | powershell -Command "$input | Tee-Object -FilePath '%LOGFILE%' -Append"
-    echo   python setup_env.py
-)
-pause
diff --git a/config.yaml b/config.yaml
index c4b98cb2..a91bacd0 100644
--- a/config.yaml
+++ b/config.yaml
@@ -1,7 +1,7 @@
 # * Settings marked with * are advanced settings that won't appear in the Streamlit page and can only be modified manually in config.py
 # recommend to set in streamlit page
 # -------------------
-# version: "3.0.0"
+# version: "3.0.2"
 # author: "Huanshere"
 # -------------------
 
@@ -11,9 +11,9 @@ display_language: "zh-CN"
 
 # API settings
 api:
-  key: 'your-api-key'
+  key: 'YOUR_API_KEY'
   base_url: 'https://yunwu.ai'
-  model: ''
+  model: 'gpt-5.5'
   llm_support_json: false
 # *Number of LLM multi-threaded accesses, set to 1 if using local LLM
 max_workers: 4
@@ -22,7 +22,7 @@ max_workers: 4
 target_language: '简体中文'
 
 # Whether to use Demucs for vocal separation before transcription
-demucs: true
+demucs: false
 
 whisper:
   # ["large-v3", "large-v3-turbo"]. Note: for zh model will force to use Belle/large-v3
@@ -38,7 +38,7 @@ whisper:
   elevenlabs_api_key: 'your_elevenlabs_api_key'
 
 # Whether to burn subtitles into the video
-burn_subtitles: true
+burn_subtitles: false
 
 ## ======================== Advanced Settings ======================== ##
 # *🔬 h264_nvenc GPU acceleration for ffmpeg, make sure your GPU supports it
diff --git a/core/_10_gen_audio.py b/core/_10_gen_audio.py
index 94e52454..a80a650e 100644
--- a/core/_10_gen_audio.py
+++ b/core/_10_gen_audio.py
@@ -85,6 +85,7 @@ def generate_tts_audio(tasks_df: pd.DataFrame) -> pd.DataFrame:
         warmup_size = min(WARMUP_SIZE, len(tasks_df))
         for _, row in tasks_df.head(warmup_size).iterrows():
             try:
+                check_cancel()
                 number, real_dur = process_row(row, tasks_df)
                 tasks_df.loc[tasks_df['number'] == number, 'real_dur'] = real_dur
                 progress.advance(task)
@@ -103,14 +104,20 @@ def generate_tts_audio(tasks_df: pd.DataFrame) -> pd.DataFrame:
                     for _, row in remaining_tasks.iterrows()
                 ]
                 
-                for future in as_completed(futures):
-                    try:
-                        number, real_dur = future.result()
-                        tasks_df.loc[tasks_df['number'] == number, 'real_dur'] = real_dur
-                        progress.advance(task)
-                    except Exception as e:
-                        rprint(f"[red]❌ Error: {str(e)}[/red]")
-                        raise e
+                try:
+                    for future in as_completed(futures):
+                        check_cancel()
+                        try:
+                            number, real_dur = future.result()
+                            tasks_df.loc[tasks_df['number'] == number, 'real_dur'] = real_dur
+                            progress.advance(task)
+                        except Exception as e:
+                            rprint(f"[red]❌ Error: {str(e)}[/red]")
+                            raise e
+                except BaseException:
+                    for f in futures:
+                        f.cancel()
+                    raise
 
     rprint("[bold green]✨ TTS audio generation completed![/bold green]")
     return tasks_df
@@ -149,6 +156,7 @@ def merge_chunks(tasks_df: pd.DataFrame) -> pd.DataFrame:
     
     for index, row in tasks_df.iterrows():
         if row['cut_off'] == 1:
+            check_cancel()
             chunk_df = tasks_df.iloc[chunk_start:index+1].reset_index(drop=True)
             speed_factor, keep_gaps = process_chunk(chunk_df, accept, min_speed)
             
diff --git a/core/_11_merge_audio.py b/core/_11_merge_audio.py
index 41c8ac16..014cb952 100644
--- a/core/_11_merge_audio.py
+++ b/core/_11_merge_audio.py
@@ -58,6 +58,7 @@ def merge_audio_segments(audios, new_sub_times, sample_rate):
         merge_task = progress.add_task("🎵 Merging audio segments...", total=len(audios))
         
         for i, (audio_file, time_range) in enumerate(zip(audios, new_sub_times)):
+            check_cancel()
             if not os.path.exists(audio_file):
                 console.print(f"[bold yellow]⚠️  Warning: File {audio_file} does not exist, skipping...[/bold yellow]")
                 progress.advance(merge_task)
diff --git a/core/_12_dub_to_vid.py b/core/_12_dub_to_vid.py
index da7b2895..4bbdbe58 100644
--- a/core/_12_dub_to_vid.py
+++ b/core/_12_dub_to_vid.py
@@ -30,6 +30,11 @@
 
 def merge_video_audio():
     """Merge video and audio, and reduce video volume"""
+    from core._1_ytdlp import is_audio_only_input
+    if is_audio_only_input():
+        rprint("[bold green]🎵 Audio-only input: skipping dubbing video merge. Dubbed audio is in the `output` directory.[/bold green]")
+        return
+
     VIDEO_FILE = find_video_files()
     background_file = _BACKGROUND_AUDIO_FILE
     
diff --git a/core/_1_ytdlp.py b/core/_1_ytdlp.py
index 6064ae27..c0511029 100644
--- a/core/_1_ytdlp.py
+++ b/core/_1_ytdlp.py
@@ -1,9 +1,14 @@
 import os,sys
 import glob
+import json
 import re
 import subprocess
 from core.utils import *
 
+OUTPUT_DIR = "output"
+INPUT_MANIFEST = "input_manifest.json"
+GENERATED_AUDIO_NAMES = {"dub.mp3", "normalized_dub.wav"}
+
 def sanitize_filename(filename):
     # Remove or replace illegal characters
     filename = re.sub(r'[<>:"/\\|?*]', '', filename)
@@ -51,6 +56,27 @@ def download_video_ytdlp(url, save_path='output', resolution='1080'):
             new_filename = sanitize_filename(filename)
             if new_filename != filename:
                 os.rename(os.path.join(save_path, file), os.path.join(save_path, new_filename + ext))
+    media_file = find_video_files(save_path)
+    write_input_manifest(media_file, "video", save_path)
+
+def write_input_manifest(media_file: str, media_type: str, save_path='output'):
+    os.makedirs(save_path, exist_ok=True)
+    manifest_path = os.path.join(save_path, INPUT_MANIFEST)
+    media_path = media_file.replace("\\", "/") if sys.platform.startswith('win') else media_file
+    with open(manifest_path, "w", encoding="utf-8") as f:
+        json.dump({"path": media_path, "type": media_type}, f, ensure_ascii=False, indent=2)
+
+def _read_input_manifest(save_path='output'):
+    manifest_path = os.path.join(save_path, INPUT_MANIFEST)
+    if not os.path.exists(manifest_path):
+        return None
+    with open(manifest_path, "r", encoding="utf-8") as f:
+        data = json.load(f)
+    media_file = data.get("path")
+    media_type = data.get("type")
+    if media_type not in {"video", "audio"} or not media_file or not os.path.exists(media_file):
+        return None
+    return media_file.replace("\\", "/") if sys.platform.startswith('win') else media_file, media_type
 
 def find_video_files(save_path='output'):
     video_files = [file for file in glob.glob(save_path + "/*") if os.path.splitext(file)[1][1:].lower() in load_key("allowed_video_formats")]
@@ -62,6 +88,52 @@ def find_video_files(save_path='output'):
         raise ValueError(f"Number of videos found {len(video_files)} is not unique. Please check.")
     return video_files[0]
 
+def find_audio_files(save_path='output'):
+    audio_files = [file for file in glob.glob(save_path + "/*") if os.path.splitext(file)[1][1:].lower() in load_key("allowed_audio_formats")]
+    if sys.platform.startswith('win'):
+        audio_files = [file.replace("\\", "/") for file in audio_files]
+    audio_files = [file for file in audio_files if os.path.basename(file) not in GENERATED_AUDIO_NAMES]
+    if len(audio_files) != 1:
+        raise ValueError(f"Number of audio files found {len(audio_files)} is not unique. Please check.")
+    return audio_files[0]
+
+def _safe_find_video_file(save_path='output'):
+    try:
+        return find_video_files(save_path)
+    except ValueError as e:
+        if "found 0" in str(e):
+            return None
+        raise
+
+def _safe_find_audio_file(save_path='output'):
+    try:
+        return find_audio_files(save_path)
+    except ValueError as e:
+        if "found 0" in str(e):
+            return None
+        raise
+
+def find_media_file(save_path='output'):
+    manifest = _read_input_manifest(save_path)
+    if manifest:
+        return manifest
+    video_file = _safe_find_video_file(save_path)
+    if video_file:
+        return video_file, "video"
+    audio_file = _safe_find_audio_file(save_path)
+    if audio_file:
+        return audio_file, "audio"
+    raise ValueError("No media file found. Please download or upload a media file first.")
+
+def is_audio_only_input(save_path='output'):
+    # True when the input is a standalone audio file (no video present).
+    # In this case VideoLingo only produces subtitle files; no video output.
+    try:
+        _, media_type = find_media_file(save_path)
+        return media_type == "audio"
+    except Exception:
+        return False
+
 if __name__ == '__main__':
     # Example usage
     url = input('Please enter the URL of the video you want to download: ')
diff --git a/core/_2_asr.py b/core/_2_asr.py
index f54e8b10..24e071ec 100644
--- a/core/_2_asr.py
+++ b/core/_2_asr.py
@@ -1,14 +1,17 @@
 from core.utils import *
 from core.asr_backend.demucs_vl import demucs_audio
-from core.asr_backend.audio_preprocess import process_transcription, convert_video_to_audio, split_audio, save_results, normalize_audio_volume
-from core._1_ytdlp import find_video_files
+from core.asr_backend.audio_preprocess import process_transcription, convert_video_to_audio, prepare_audio_for_asr, split_audio, save_results, normalize_audio_volume
+from core._1_ytdlp import find_media_file
 from core.utils.models import *
 
 @check_file_exists(_2_CLEANED_CHUNKS)
 def transcribe():
-    # 1. video to audio
-    video_file = find_video_files()
-    convert_video_to_audio(video_file)
+    # 1. prepare audio
+    media_file, media_type = find_media_file()
+    if media_type == "video":
+        convert_video_to_audio(media_file)
+    else:
+        prepare_audio_for_asr(media_file)
 
     # 2. Demucs vocal separation:
     if load_key("demucs"):
@@ -34,6 +37,7 @@ def transcribe():
         rprint("[cyan]🎤 Transcribing audio with ElevenLabs API...[/cyan]")
 
     for start, end in segments:
+        check_cancel()
         result = ts(_RAW_AUDIO_FILE, vocal_audio, start, end)
         all_results.append(result)
     
@@ -47,4 +51,4 @@ def transcribe():
     save_results(df)
         
 if __name__ == "__main__":
-    transcribe()
\ No newline at end of file
+    transcribe()
diff --git a/core/_4_2_translate.py b/core/_4_2_translate.py
index 1376f88f..dca7ffcb 100644
--- a/core/_4_2_translate.py
+++ b/core/_4_2_translate.py
@@ -67,9 +67,15 @@ def translate_all():
                 future = executor.submit(translate_chunk, chunk, chunks, theme_prompt, i)
                 futures.append(future)
             results = []
-            for future in concurrent.futures.as_completed(futures):
-                results.append(future.result())
-                progress.update(task, advance=1)
+            try:
+                for future in concurrent.futures.as_completed(futures):
+                    check_cancel()
+                    results.append(future.result())
+                    progress.update(task, advance=1)
+            except BaseException:
+                for f in futures:
+                    f.cancel()
+                raise
 
     results.sort(key=lambda x: x[0])  # Sort results based on original order
     
diff --git a/core/_7_sub_into_vid.py b/core/_7_sub_into_vid.py
index 7a2e253f..239c10b4 100644
--- a/core/_7_sub_into_vid.py
+++ b/core/_7_sub_into_vid.py
@@ -41,6 +41,11 @@ def check_gpu_available():
         return False
 
 def merge_subtitles_to_video():
+    from core._1_ytdlp import is_audio_only_input
+    if is_audio_only_input():
+        rprint("[bold green]🎵 Audio-only input: skipping video merge. Subtitle files are ready in the `output` directory.[/bold green]")
+        return
+
     video_file = find_video_files()
     os.makedirs(os.path.dirname(OUTPUT_VIDEO), exist_ok=True)
 
@@ -103,4 +108,4 @@ def merge_subtitles_to_video():
             process.kill()
 
 if __name__ == "__main__":
-    merge_subtitles_to_video()
\ No newline at end of file
+    merge_subtitles_to_video()
diff --git a/core/asr_backend/audio_preprocess.py b/core/asr_backend/audio_preprocess.py
index 0d0db2ff..19738d83 100644
--- a/core/asr_backend/audio_preprocess.py
+++ b/core/asr_backend/audio_preprocess.py
@@ -52,6 +52,27 @@ def convert_video_to_audio(video_file: str):
         subprocess.run(cmd, check=True, stderr=subprocess.PIPE)
         rprint(f"[green]🎬➡️🎵 Converted <{video_file}> to <{_RAW_AUDIO_FILE}> with FFmpeg\n[/green]")
 
+def prepare_audio_for_asr(audio_file: str):
+    os.makedirs(_AUDIO_DIR, exist_ok=True)
+    if not os.path.exists(_RAW_AUDIO_FILE):
+        rprint(f"[blue]🎵 Preparing uploaded audio for ASR with FFmpeg ......[/blue]")
+        if _ffmpeg_has_encoder('libmp3lame'):
+            cmd = [
+                'ffmpeg', '-y', '-i', audio_file, '-vn',
+                '-c:a', 'libmp3lame', '-b:a', '32k',
+                '-ar', '16000', '-ac', '1',
+                '-metadata', 'encoding=UTF-8', _RAW_AUDIO_FILE
+            ]
+        else:
+            rprint("[yellow]⚠️ libmp3lame not found in ffmpeg, falling back to WAV (PCM) encoding[/yellow]")
+            cmd = [
+                'ffmpeg', '-y', '-i', audio_file, '-vn',
+                '-c:a', 'pcm_s16le', '-ar', '16000', '-ac', '1',
+                '-f', 'wav', _RAW_AUDIO_FILE
+            ]
+        subprocess.run(cmd, check=True, stderr=subprocess.PIPE)
+        rprint(f"[green]🎵 Prepared <{audio_file}> as <{_RAW_AUDIO_FILE}>\n[/green]")
+
 def get_audio_duration(audio_file: str) -> float:
     """Get the duration of an audio file using ffmpeg."""
     cmd = ['ffmpeg', '-i', audio_file]
@@ -111,8 +132,22 @@ def process_transcription(result: Dict) -> pd.DataFrame:
     for segment in result['segments']:
         # Get speaker_id, if not exists, set to None
         speaker_id = segment.get('speaker_id', None)
-        
-        for word in segment['words']:
+
+        words = segment.get('words')
+        if not words:
+            # Some ASR backends (e.g. ElevenLabs without word-level timestamps)
+            # return segments without per-word entries. Synthesize a single
+            # word from the segment text so downstream alignment still works.
+            seg_text = (segment.get('text') or '').strip()
+            if not seg_text:
+                continue
+            words = [{
+                'word': seg_text,
+                'start': segment.get('start'),
+                'end': segment.get('end'),
+            }]
+
+        for word in words:
             # Check word length
             if len(word["word"]) > 30:
                 rprint(f"[yellow]⚠️ Warning: Detected word longer than 30 characters, skipping: {word['word']}[/yellow]")
@@ -178,4 +213,4 @@ def save_results(df: pd.DataFrame):
     rprint(f"[green]📊 Excel file saved to {_2_CLEANED_CHUNKS}[/green]")
 
 def save_language(language: str):
-    update_key("whisper.detected_language", language)
\ No newline at end of file
+    update_key("whisper.detected_language", language)
diff --git a/core/asr_backend/elevenlabs_asr.py b/core/asr_backend/elevenlabs_asr.py
index 5a4c5dda..af151b63 100644
--- a/core/asr_backend/elevenlabs_asr.py
+++ b/core/asr_backend/elevenlabs_asr.py
@@ -124,7 +124,8 @@ def transcribe_audio_elevenlabs(raw_audio_path, vocal_audio_path, start = None,
                     word['end'] += start
         
         rprint(f"[green]✓ Transcription completed in {time.time() - start_time:.2f} seconds[/green]")
-        parsed_result = elev2whisper(result)
+        # Keep word-level timestamps so downstream process_transcription has `words`.
+        parsed_result = elev2whisper(result, word_level_timestamp=True)
         os.makedirs(os.path.dirname(LOG_FILE), exist_ok=True)
         with open(LOG_FILE, "w", encoding="utf-8") as f:
             json.dump(parsed_result, f, indent=4, ensure_ascii=False)
diff --git a/core/st_utils/download_video_section.py b/core/st_utils/download_video_section.py
index 5f3023e5..c31ac909 100644
--- a/core/st_utils/download_video_section.py
+++ b/core/st_utils/download_video_section.py
@@ -1,76 +1,145 @@
 import os
 import re
 import shutil
-import subprocess
 from time import sleep
 
 import streamlit as st
-from core._1_ytdlp import download_video_ytdlp, find_video_files
+from core._1_ytdlp import download_video_ytdlp, find_media_file, write_input_manifest
 from core.utils import *
 from translations.translations import translate as t
 
 OUTPUT_DIR = "output"
 
+
+def _css_text(value):
+    return str(value).replace("\\", "\\\\").replace('"', '\\"')
+
+
+def _inject_file_uploader_i18n():
+    # Streamlit does not expose official i18n for file_uploader internals.
+    # Streamlit 1.49 DOM:
+    #   div[data-testid="stFileUploaderDropzoneInstructions"]
+    #     > span    (cloud icon, must keep)
+    #     > div     (column flex)
+    #         > span  (1st: "Drag and drop ... here")
+    #         > span  (2nd: "Limit ... · MP4, MOV ...")
+    # So we target ONLY the two direct child spans of the inner div, leaving
+    # the icon and other elements untouched.
+    drag_text = _css_text(t("Drag and drop file here"))
+    limit_text = _css_text(t("Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A"))
+    browse_text = _css_text(t("Browse files"))
+    st.markdown(
+        f"""
+        <style>
+        /* Title line */
+        div[data-testid="stFileUploaderDropzoneInstructions"] > div > span:nth-of-type(1) {{
+            font-size: 0 !important;
+            line-height: 1.4;
+        }}
+        div[data-testid="stFileUploaderDropzoneInstructions"] > div > span:nth-of-type(1)::before {{
+            content: "{drag_text}";
+            font-size: 1rem;
+        }}
+        /* Sub line (limit + accepted formats) */
+        div[data-testid="stFileUploaderDropzoneInstructions"] > div > span:nth-of-type(2) {{
+            font-size: 0 !important;
+            line-height: 1.4;
+        }}
+        div[data-testid="stFileUploaderDropzoneInstructions"] > div > span:nth-of-type(2)::before {{
+            content: "{limit_text}";
+            font-size: 0.8rem;
+        }}
+        /* Browse files button */
+        div[data-testid="stFileUploader"] button[kind="secondary"] {{
+            font-size: 0 !important;
+        }}
+        div[data-testid="stFileUploader"] button[kind="secondary"]::before {{
+            content: "{browse_text}";
+            font-size: 0.875rem;
+        }}
+        </style>
+        """,
+        unsafe_allow_html=True,
+    )
+
 def download_video_section():
     st.header(t("a. Download or Upload Video"))
     with st.container(border=True):
         try:
-            video_file = find_video_files()
-            st.video(video_file)
+            media_file, media_type = find_media_file()
+            if media_type == "video":
+                st.video(media_file)
+            else:
+                st.audio(media_file)
             if st.button(t("Delete and Reselect"), key="delete_video_button"):
-                os.remove(video_file)
+                os.remove(media_file)
                 if os.path.exists(OUTPUT_DIR):
                     shutil.rmtree(OUTPUT_DIR)
+                st.session_state.pop("_processed_upload_id", None)
                 sleep(1)
                 st.rerun()
             return True
-        except:
-            col1, col2 = st.columns([3, 1])
-            with col1:
-                url = st.text_input(t("Enter YouTube link:"))
-            with col2:
-                res_dict = {
-                    "360p": "360",
-                    "1080p": "1080",
-                    "Best": "best"
-                }
-                target_res = load_key("ytb_resolution")
-                res_options = list(res_dict.keys())
-                default_idx = list(res_dict.values()).index(target_res) if target_res in res_dict.values() else 0
-                res_display = st.selectbox(t("Resolution"), options=res_options, index=default_idx)
-                res = res_dict[res_display]
-            if st.button(t("Download Video"), key="download_button", width="stretch"):
-                if url:
-                    with st.spinner("Downloading video..."):
-                        download_video_ytdlp(url, resolution=res)
+        except ValueError as e:
+            if "No media file found" not in str(e):
+                st.error(t("Media file detection failed: {error}").replace("{error}", str(e)))
+                if st.button(t("Clear output and reselect"), key="clear_output_button"):
+                    if os.path.exists(OUTPUT_DIR):
+                        shutil.rmtree(OUTPUT_DIR)
+                    st.session_state.pop("_processed_upload_id", None)
                     st.rerun()
+                return False
+        except Exception:
+            pass
 
-            uploaded_file = st.file_uploader(t("Or upload video"), type=load_key("allowed_video_formats") + load_key("allowed_audio_formats"))
-            if uploaded_file:
-                if os.path.exists(OUTPUT_DIR):
-                    shutil.rmtree(OUTPUT_DIR)
-                os.makedirs(OUTPUT_DIR, exist_ok=True)
+        col1, col2 = st.columns([3, 1])
+        with col1:
+            url = st.text_input(t("Enter YouTube link:"))
+        with col2:
+            res_dict = {
+                "360p": "360",
+                "1080p": "1080",
+                t("Best"): "best"
+            }
+            target_res = load_key("ytb_resolution")
+            res_options = list(res_dict.keys())
+            default_idx = list(res_dict.values()).index(target_res) if target_res in res_dict.values() else 0
+            res_display = st.selectbox(t("Resolution"), options=res_options, index=default_idx)
+            res = res_dict[res_display]
+        if st.button(t("Download Video"), key="download_button", width="stretch"):
+            if url:
+                with st.spinner(t("Downloading video...")):
+                    download_video_ytdlp(url, resolution=res)
+                st.rerun()
+
+        _inject_file_uploader_i18n()
+        uploaded_file = st.file_uploader(t("Upload local media file"), type=load_key("allowed_video_formats") + load_key("allowed_audio_formats"))
+        if uploaded_file:
+            upload_id = f"{uploaded_file.name}:{uploaded_file.size}"
+            if st.session_state.get("_processed_upload_id") == upload_id:
+                try:
+                    find_media_file()
+                    st.warning(t("Upload was already processed. Delete and reselect to upload again."))
+                    return False
+                except Exception:
+                    st.session_state.pop("_processed_upload_id", None)
+
+            if os.path.exists(OUTPUT_DIR):
+                shutil.rmtree(OUTPUT_DIR)
+            os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+            raw_name = uploaded_file.name.replace(' ', '_')
+            name, ext = os.path.splitext(raw_name)
+            clean_name = re.sub(r'[^\w\-_\.]', '', name) + ext.lower()
                 
-                raw_name = uploaded_file.name.replace(' ', '_')
-                name, ext = os.path.splitext(raw_name)
-                clean_name = re.sub(r'[^\w\-_\.]', '', name) + ext.lower()
-                    
-                with open(os.path.join(OUTPUT_DIR, clean_name), "wb") as f:
-                    f.write(uploaded_file.getbuffer())
+            with open(os.path.join(OUTPUT_DIR, clean_name), "wb") as f:
+                f.write(uploaded_file.getbuffer())
 
-                if ext.lower() in load_key("allowed_audio_formats"):
-                    convert_audio_to_video(os.path.join(OUTPUT_DIR, clean_name))
-                st.rerun()
-            else:
-                return False
+            media_path = os.path.join(OUTPUT_DIR, clean_name)
+            media_ext = ext.lower().lstrip(".")
+            media_type = "video" if media_ext in load_key("allowed_video_formats") else "audio"
+            write_input_manifest(media_path, media_type)
 
-def convert_audio_to_video(audio_file: str) -> str:
-    output_video = os.path.join(OUTPUT_DIR, 'black_screen.mp4')
-    if not os.path.exists(output_video):
-        print(f"🎵➡️🎬 Converting audio to video with FFmpeg ......")
-        ffmpeg_cmd = ['ffmpeg', '-y', '-f', 'lavfi', '-i', 'color=c=black:s=640x360', '-i', audio_file, '-shortest', '-c:v', 'libx264', '-c:a', 'aac', '-pix_fmt', 'yuv420p', output_video]
-        subprocess.run(ffmpeg_cmd, check=True, capture_output=True, text=True, encoding='utf-8')
-        print(f"🎵➡️🎬 Converted <{audio_file}> to <{output_video}> with FFmpeg\n")
-        # delete audio file
-        os.remove(audio_file)
-    return output_video
+            st.session_state["_processed_upload_id"] = upload_id
+            st.rerun()
+        else:
+            return False
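Reviewer sketch (not part of the patch): the upload branch above can be exercised outside Streamlit. This is a minimal sketch, assuming the allowed-format lists are stored without a leading dot, which the `lstrip(".")` call in the patch suggests; the two list constants and the helper names `sanitize_upload_name`, `classify_media` and `upload_id` are hypothetical stand-ins for `load_key(...)` and the inline code.

```python
import os
import re

# Assumed format lists; the real values come from config.yaml via load_key().
ALLOWED_VIDEO_FORMATS = ["mp4", "mov", "avi", "mkv", "flv", "wmv", "webm"]
ALLOWED_AUDIO_FORMATS = ["wav", "mp3", "flac", "m4a"]


def sanitize_upload_name(original_name: str) -> str:
    """Mirror the patch: spaces become underscores, other unsafe chars are dropped."""
    raw_name = original_name.replace(" ", "_")
    name, ext = os.path.splitext(raw_name)
    return re.sub(r"[^\w\-_\.]", "", name) + ext.lower()


def classify_media(file_name: str) -> str:
    """Return "video" for a known video extension, otherwise "audio"."""
    ext = os.path.splitext(file_name)[1].lower().lstrip(".")
    return "video" if ext in ALLOWED_VIDEO_FORMATS else "audio"


def upload_id(name: str, size: int) -> str:
    """Same name:size key the patch keeps in session_state to skip re-processing."""
    return f"{name}:{size}"
```

Note that the patch treats every non-video extension as audio, so the classification is only as strict as the `type=` filter passed to `st.file_uploader`.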
diff --git a/core/st_utils/imports_and_utils.py b/core/st_utils/imports_and_utils.py
index c6929094..23569e92 100644
--- a/core/st_utils/imports_and_utils.py
+++ b/core/st_utils/imports_and_utils.py
@@ -26,7 +26,7 @@ def download_subtitle_zip_button(text: str):
     )
 
 # st.markdown
-give_star_button = """
+_GIVE_STAR_BUTTON_TEMPLATE = """
 <style>
     .github-button {
         display: block;
@@ -48,11 +48,15 @@ def download_subtitle_zip_button(text: str):
 </style>
 <a href="https://github.com/Huanshere/VideoLingo" target="_blank" style="text-decoration: none;">
     <div class="github-button">
-        Star on GitHub 🌟
+        __STAR_LABEL__
     </div>
 </a>
 """
 
+
+def give_star_button():
+    return _GIVE_STAR_BUTTON_TEMPLATE.replace("__STAR_LABEL__", t("Star on GitHub 🌟"))
+
 button_style = """
 <style>
 div.stButton > button:first-child {
@@ -116,4 +120,4 @@ def download_subtitle_zip_button(text: str):
     box-shadow: none !important;
 }
 </style>
-"""
\ No newline at end of file
+"""
diff --git a/core/st_utils/sidebar_setting.py b/core/st_utils/sidebar_setting.py
index a300de5b..df462cce 100644
--- a/core/st_utils/sidebar_setting.py
+++ b/core/st_utils/sidebar_setting.py
@@ -1,7 +1,6 @@
 import streamlit as st
 import requests
 from translations.translations import translate as t
-from translations.translations import DISPLAY_LANGUAGES
 from core.utils import *
 
 
@@ -52,15 +51,6 @@ def page_setting():
         unsafe_allow_html=True,
     )
 
-    display_language = st.selectbox(
-        "Display Language 🌐",
-        options=list(DISPLAY_LANGUAGES.keys()),
-        index=list(DISPLAY_LANGUAGES.values()).index(load_key("display_language")),
-    )
-    if DISPLAY_LANGUAGES[display_language] != load_key("display_language"):
-        update_key("display_language", DISPLAY_LANGUAGES[display_language])
-        st.rerun()
-
     # with st.expander(t("Youtube Settings"), expanded=True):
     #     config_input(t("Cookies Path"), "youtube.cookies_path")
 
@@ -181,6 +171,11 @@ def page_setting():
             t("WhisperX Runtime"),
             options=["local", "cloud", "elevenlabs"],
             index=["local", "cloud", "elevenlabs"].index(load_key("whisper.runtime")),
+            format_func=lambda x: {
+                "local": t("Local"),
+                "cloud": t("Cloud"),
+                "elevenlabs": t("ElevenLabs"),
+            }[x],
             help=t(
                 "Local runtime requires >8GB GPU, cloud runtime requires 302ai API key, elevenlabs runtime requires ElevenLabs API key"
             ),
@@ -191,7 +186,7 @@ def page_setting():
         if runtime == "cloud":
             config_input(t("WhisperX 302ai API"), "whisper.whisperX_302_api_key")
         if runtime == "elevenlabs":
-            config_input(("ElevenLabs API"), "whisper.elevenlabs_api_key")
+            config_input(t("ElevenLabs API"), "whisper.elevenlabs_api_key")
 
         with c2:
             target_language = st.text_input(
@@ -216,16 +211,26 @@ def page_setting():
             update_key("demucs", demucs)
             st.rerun()
 
-        burn_subtitles = st.toggle(
-            t("Burn-in Subtitles"),
-            value=load_key("burn_subtitles"),
-            help=t(
-                "Whether to burn subtitles into the video, will increase processing time"
-            ),
-        )
-        if burn_subtitles != load_key("burn_subtitles"):
-            update_key("burn_subtitles", burn_subtitles)
-            st.rerun()
+        from core._1_ytdlp import is_audio_only_input
+        audio_only = is_audio_only_input()
+        if audio_only:
+            st.toggle(
+                t("Burn-in Subtitles"),
+                value=False,
+                disabled=True,
+                help=t("Audio-only input produces subtitle files only; no video is generated."),
+            )
+        else:
+            burn_subtitles = st.toggle(
+                t("Burn-in Subtitles"),
+                value=load_key("burn_subtitles"),
+                help=t(
+                    "Whether to burn subtitles into the video, will increase processing time"
+                ),
+            )
+            if burn_subtitles != load_key("burn_subtitles"):
+                update_key("burn_subtitles", burn_subtitles)
+                st.rerun()
     with st.expander(t("Dubbing Settings"), expanded=True):
         tts_methods = [
             "azure_tts",
@@ -238,10 +243,22 @@ def page_setting():
             "sf_cosyvoice2",
             "f5tts",
         ]
+        tts_method_labels = {
+            "azure_tts": t("Azure TTS"),
+            "openai_tts": t("OpenAI TTS"),
+            "fish_tts": t("Fish TTS"),
+            "sf_fish_tts": t("SiliconFlow Fish TTS"),
+            "edge_tts": t("Edge TTS"),
+            "gpt_sovits": t("GPT-SoVITS"),
+            "custom_tts": t("Custom TTS"),
+            "sf_cosyvoice2": t("SiliconFlow CosyVoice2"),
+            "f5tts": t("F5-TTS"),
+        }
         select_tts = st.selectbox(
             t("TTS Method"),
             options=tts_methods,
             index=tts_methods.index(load_key("tts_method")),
+            format_func=lambda x: tts_method_labels[x],
         )
         if select_tts != load_key("tts_method"):
             update_key("tts_method", select_tts)
@@ -269,14 +286,14 @@ def page_setting():
                 update_key("sf_fish_tts.mode", selected_mode)
                 st.rerun()
             if selected_mode == "preset":
-                config_input("Voice", "sf_fish_tts.voice")
+                config_input(t("Voice"), "sf_fish_tts.voice")
 
         elif select_tts == "openai_tts":
-            config_input("302ai API", "openai_tts.api_key")
+            config_input(t("302ai API"), "openai_tts.api_key")
             config_input(t("OpenAI Voice"), "openai_tts.voice")
 
         elif select_tts == "fish_tts":
-            config_input("302ai API", "fish_tts.api_key")
+            config_input(t("302ai API"), "fish_tts.api_key")
             fish_tts_character = st.selectbox(
                 t("Fish TTS Character"),
                 options=list(load_key("fish_tts.character_id_dict").keys()),
@@ -289,7 +306,7 @@ def page_setting():
                 st.rerun()
 
         elif select_tts == "azure_tts":
-            config_input("302ai API", "azure_tts.api_key")
+            config_input(t("302ai API"), "azure_tts.api_key")
             config_input(t("Azure Voice"), "azure_tts.voice")
 
         elif select_tts == "gpt_sovits":
@@ -321,7 +338,7 @@ def page_setting():
             config_input(t("SiliconFlow API Key"), "sf_cosyvoice2.api_key")
 
         elif select_tts == "f5tts":
-            config_input("302ai API", "f5tts.302_api")
+            config_input(t("302ai API"), "f5tts.302_api")
 
 
 def check_api():
diff --git a/core/st_utils/task_runner.py b/core/st_utils/task_runner.py
index 8a38248f..dba8f301 100644
--- a/core/st_utils/task_runner.py
+++ b/core/st_utils/task_runner.py
@@ -40,9 +40,32 @@ class TaskRunner:
     _thread: threading.Thread | None = None
     _steps: list = field(default_factory=list)
 
+    # Class-level pointer to the currently executing runner so that long-running
+    # core functions can call ``TaskRunner.check_cancel()`` without needing a
+    # direct reference. The pointer is only meaningful inside the background
+    # thread that ``start()`` launches.
+    _current: "TaskRunner | None" = None
+
     def __post_init__(self):
         self._pause_event.set()  # not paused initially
 
+    # ------ Cancellation helpers (called from core code) ------
+
+    @classmethod
+    def check_cancel(cls) -> None:
+        """Block while paused and raise :class:`StopTask` if a stop was requested.
+
+        Callable from any thread, but reads the one class-level ``_current`` runner;
+        a no-op when no runner is active (e.g. core scripts invoked from the CLI).
+        """
+        runner = cls._current
+        if runner is None:
+            return
+        # Block while paused so long loops freeze on pause too.
+        runner._pause_event.wait()
+        if runner._stop_event.is_set():
+            raise StopTask()
+
     # ------ Singleton per session_state ------
     @staticmethod
     def get(session_state, key: str = "_task_runner") -> "TaskRunner":
@@ -121,6 +144,7 @@ def progress(self) -> float:
 
     def _run(self):
         """Execute steps sequentially in background thread."""
+        type(self)._current = self
         try:
             for i, (label, func) in enumerate(self._steps):
                 # Check stop before each step
@@ -141,7 +165,12 @@ def _run(self):
                 func()
 
             self.state = "completed"
+        except StopTask:
+            self.state = "stopped"
         except Exception as e:
             self.error_msg = str(e)
             self.state = "error"
             traceback.print_exc()
+        finally:
+            if type(self)._current is self:
+                type(self)._current = None
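Reviewer sketch (not part of the patch): the cooperative cancellation added to `TaskRunner` can be reproduced in isolation. The class below is a simplified stand-in, assuming only what the hunks show (a class-level `_current` pointer, a pause event that is set when not paused, and a stop event); `MiniRunner` and `long_loop` are hypothetical names, and the real `TaskRunner` has more state than this.

```python
import threading
import time


class StopTask(Exception):
    """Raised inside the worker thread to unwind a cancelled task."""


class MiniRunner:
    # Class-level pointer to the active runner, as in the patch.
    _current = None

    def __init__(self):
        self.state = "idle"
        self._stop_event = threading.Event()
        self._pause_event = threading.Event()
        self._pause_event.set()  # set means "not paused"

    @classmethod
    def check_cancel(cls):
        runner = cls._current
        if runner is None:
            return  # no active runner, e.g. CLI usage
        runner._pause_event.wait()  # blocks while paused
        if runner._stop_event.is_set():
            raise StopTask()

    def run(self, steps):
        type(self)._current = self
        self.state = "running"
        try:
            for step in steps:
                step()
            self.state = "completed"
        except StopTask:
            self.state = "stopped"
        finally:
            if type(self)._current is self:
                type(self)._current = None


def long_loop():
    # Stand-in for a core function that polls the hook between work items.
    for _ in range(1000):
        MiniRunner.check_cancel()
        time.sleep(0.01)


runner = MiniRunner()
worker = threading.Thread(target=runner.run, args=([long_loop],))
worker.start()
time.sleep(0.05)
runner._stop_event.set()  # request a stop from the "UI" thread
worker.join(timeout=5)
```

Because `_current` is a single class attribute, two runners executing at once would overwrite each other's pointer; the sketch, like the patch, relies on only one runner being active at a time.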
diff --git a/core/translate_lines.py b/core/translate_lines.py
index fd03da4c..11c0fe63 100644
--- a/core/translate_lines.py
+++ b/core/translate_lines.py
@@ -19,6 +19,7 @@ def valid_translate_result(result: dict, required_keys: list, required_sub_keys:
     return {"status": "success", "message": "Translation completed"}
 
 def translate_lines(lines, previous_content_prompt, after_cotent_prompt, things_to_note_prompt, summary_prompt, index = 0):
+    check_cancel()
     shared_prompt = generate_shared_prompt(previous_content_prompt, after_cotent_prompt, summary_prompt, things_to_note_prompt)
 
     # Retry translation if the length of the original text and the translated text are not the same, or if the specified key is missing
diff --git a/core/utils/__init__.py b/core/utils/__init__.py
index b864f4e4..2dc062eb 100644
--- a/core/utils/__init__.py
+++ b/core/utils/__init__.py
@@ -7,4 +7,27 @@
 except ImportError:
     pass
 
-__all__ = ["ask_gpt", "except_handler", "check_file_exists", "load_key", "update_key", "rprint", "get_joiner"]
\ No newline at end of file
+
+def check_cancel():
+    """Cooperative cancellation hook for long-running core loops.
+
+    Imports lazily to avoid coupling core scripts to the Streamlit-side
+    TaskRunner. Becomes a no-op when no runner is active (CLI usage).
+    """
+    try:
+        from core.st_utils.task_runner import TaskRunner
+    except Exception:
+        return
+    TaskRunner.check_cancel()
+
+
+__all__ = [
+    "ask_gpt",
+    "except_handler",
+    "check_file_exists",
+    "load_key",
+    "update_key",
+    "rprint",
+    "get_joiner",
+    "check_cancel",
+]
\ No newline at end of file
diff --git a/core/utils/models.py b/core/utils/models.py
index 8e15c829..73a358d5 100644
--- a/core/utils/models.py
+++ b/core/utils/models.py
@@ -25,6 +25,13 @@
 _AUDIO_SEGS_DIR = "output/audio/segs"
 _AUDIO_TMP_DIR = "output/audio/tmp"
 
+# ------------------------------------------
+# Done markers (written by st.py task runner after a stage finishes
+# cleanly; absence implies the stage did not complete).
+# ------------------------------------------
+_TEXT_DONE_MARKER = "output/.subtitle_done"
+_AUDIO_DONE_MARKER = "output/.dubbing_done"
+
 # ------------------------------------------
 # 导出
 # ------------------------------------------
@@ -45,5 +52,7 @@
     "_BACKGROUND_AUDIO_FILE",
     "_AUDIO_REFERS_DIR",
     "_AUDIO_SEGS_DIR",
-    "_AUDIO_TMP_DIR"
+    "_AUDIO_TMP_DIR",
+    "_TEXT_DONE_MARKER",
+    "_AUDIO_DONE_MARKER",
 ]
diff --git a/core/utils/onekeycleanup.py b/core/utils/onekeycleanup.py
index 286a8e76..f5199bcc 100644
--- a/core/utils/onekeycleanup.py
+++ b/core/utils/onekeycleanup.py
@@ -1,13 +1,12 @@
 import os
 import glob
-from core._1_ytdlp import find_video_files
+from core._1_ytdlp import find_media_file
 import shutil
 
 def cleanup(history_dir="history"):
-    # Get video file name
-    video_file = find_video_files()
-    video_name = video_file.split("/")[1]
-    video_name = os.path.splitext(video_name)[0]
+    # Get input media file name
+    media_file, _ = find_media_file()
+    video_name = os.path.splitext(os.path.basename(media_file))[0]
     video_name = sanitize_filename(video_name)
     
     # Create required folders
@@ -77,4 +76,4 @@ def sanitize_filename(filename):
     return filename
 
 if __name__ == "__main__":
-    cleanup()
\ No newline at end of file
+    cleanup()
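Reviewer sketch (not part of the patch): the switch from `split("/")[1]` to `os.path.basename` in `cleanup()` matters once the media path is nested or uses backslashes. A small comparison, where `media_stem` and `old_stem` are hypothetical names for the new and the replaced logic:

```python
import ntpath
import os


def media_stem(path: str) -> str:
    """File name without directory or extension, as the patched cleanup() computes it."""
    return os.path.splitext(os.path.basename(path))[0]


def old_stem(path: str) -> str:
    """The replaced logic: assumes exactly one forward-slash directory level."""
    return os.path.splitext(path.split("/")[1])[0]


# ntpath applies Windows path rules on any platform, which makes this line portable.
windows_stem = ntpath.splitext(ntpath.basename(r"output\clip.mp4"))[0]
```

With a nested path such as `output/nested/clip.mp4`, the old logic picks the directory name, and with a backslash-only path it raises `IndexError`.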
diff --git a/setup.py b/setup.py
index 07bad53e..b1021893 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,7 @@
 from setuptools import setup, find_packages
 
 NAME = 'VideoLingo'
-VERSION = '3.0.0'
+VERSION = '3.0.2'
 
 with open('requirements.txt', encoding='utf-8') as f:
     requirements = f.read().splitlines()
diff --git a/st.py b/st.py
index b8486e94..992534c2 100644
--- a/st.py
+++ b/st.py
@@ -14,6 +14,8 @@ def _configure_utf8_console():
 from core.st_utils.imports_and_utils import *
 from core.st_utils.task_runner import TaskRunner
 from core import *
+from translations.translations import DISPLAY_LANGUAGES, init_display_language, set_display_language
+from core.utils.models import _TEXT_DONE_MARKER, _AUDIO_DONE_MARKER
 
 # SET PATH
 current_dir = os.path.dirname(os.path.abspath(__file__))
@@ -103,6 +105,20 @@ def _task_control_panel(runner_key: str):
 # ─── Text processing ───
 
 
+def _touch(path):
+    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
+    with open(path, "w", encoding="utf-8") as f:
+        f.write("")
+
+
+def _clear_path(path):
+    if os.path.exists(path):
+        try:
+            os.remove(path)
+        except OSError:
+            pass
+
+
 def _get_text_steps():
     """Return the subtitle processing steps as (label, callable) list."""
     steps = [
@@ -129,15 +145,90 @@ def _get_text_steps():
             t("Merging subtitles into the video"),
             _7_sub_into_vid.merge_subtitles_to_video,
         ),
+        (
+            t("Finalize subtitle outputs"),
+            lambda: _touch(_TEXT_DONE_MARKER),
+        ),
     ]
     return steps
 
 
+def _subtitle_length_controls():
+    """Render inline controls for the two subtitle-length tunables.
+
+    Both values live in config.yaml and are read by:
+      - max_split_length    → core/_3_2_split_meaning.py (first pass NLP cut)
+      - subtitle.max_length → core/_5_split_sub.py       (final subtitle line)
+    """
+    DEFAULT_MAX_SPLIT_LENGTH = 20
+    DEFAULT_MAX_LENGTH = 75
+    MAX_LENGTH_KEY = "subtitle.max_length"
+
+    with st.expander(t("Subtitle length tuning"), expanded=False):
+        st.caption(
+            t(
+                "These two values control how subtitles are cut. "
+                "Smaller = more, shorter lines. Larger = fewer, longer lines."
+            )
+        )
+
+        new_max_split = st.number_input(
+            t("max_split_length (rough cut, words/tokens per chunk)"),
+            min_value=8,
+            max_value=60,
+            value=int(load_key("max_split_length")),
+            step=1,
+            help=t(
+                "Suggested: 18-25. Below 18 cuts too finely and hurts translation; "
+                "above 25 makes downstream subtitle splitting hard to align."
+            ),
+            key="cfg_max_split_length",
+            width=220,
+        )
+        new_max_length = st.number_input(
+            t("max_length (max characters per subtitle line)"),
+            min_value=20,
+            max_value=200,
+            value=int(load_key(MAX_LENGTH_KEY)),
+            step=1,
+            help=t(
+                "Suggested: 50-90. Lower if a subtitle line looks crowded on screen; "
+                "raise if lines are split too aggressively."
+            ),
+            key="cfg_max_length",
+            width=220,
+        )
+
+        changed = False
+        if new_max_split != load_key("max_split_length"):
+            update_key("max_split_length", int(new_max_split))
+            changed = True
+        if new_max_length != load_key(MAX_LENGTH_KEY):
+            update_key(MAX_LENGTH_KEY, int(new_max_length))
+            changed = True
+
+        if st.button(
+            t("Restore defaults ({split}/{length})")
+            .replace("{split}", str(DEFAULT_MAX_SPLIT_LENGTH))
+            .replace("{length}", str(DEFAULT_MAX_LENGTH)),
+            key="restore_subtitle_length_defaults",
+        ):
+            update_key("max_split_length", DEFAULT_MAX_SPLIT_LENGTH)
+            update_key(MAX_LENGTH_KEY, DEFAULT_MAX_LENGTH)
+            st.rerun()
+
+        if changed:
+            st.rerun()
+
+
 def text_processing_section():
     st.header(t("b. Translate and Generate Subtitles"))
     runner = TaskRunner.get(st.session_state, "_text_runner")
+    from core._1_ytdlp import is_audio_only_input
+    audio_only = is_audio_only_input()
 
     with st.container(border=True):
+        final_text_step = t("Generate subtitle files") if audio_only else t("Merging subtitles into the video")
         st.markdown(
             f"""
         <p style='font-size: 20px;'>
@@ -147,26 +238,33 @@ def text_processing_section():
             2. {t("Sentence segmentation using NLP and LLM")}<br>
             3. {t("Summarization and multi-step translation")}<br>
             4. {t("Cutting and aligning long subtitles")}<br>
-            5. {t("Generating timeline and subtitles")}<br>
-            6. {t("Merging subtitles into the video")}
+            5. {final_text_step}
         """,
             unsafe_allow_html=True,
         )
 
-        if not os.path.exists(SUB_VIDEO):
+        text_done = os.path.exists(_TEXT_DONE_MARKER) or (
+            os.path.exists("output/trans.srt")
+            and os.path.exists("output/src.srt")
+            and (audio_only or os.path.exists(SUB_VIDEO))
+        )
+
+        if not text_done:
             if runner.is_active:
                 _task_control_panel("_text_runner")
             elif runner.is_done:
                 _task_control_panel("_text_runner")
             else:
+                _subtitle_length_controls()
                 if st.button(
                     t("Start Processing Subtitles"), key="text_processing_button"
                 ):
+                    _clear_path(_TEXT_DONE_MARKER)
                     steps = _get_text_steps()
                     runner.start(steps)
                     st.rerun()
         else:
-            if load_key("burn_subtitles"):
+            if not audio_only and load_key("burn_subtitles") and os.path.exists(SUB_VIDEO):
                 st.video(SUB_VIDEO)
             download_subtitle_zip_button(text=t("Download All Srt Files"))
 
@@ -193,11 +291,17 @@ def _get_audio_steps():
         (t("Generate and merge audio files"), _10_gen_audio.gen_audio),
         (t("Merge full audio"), _11_merge_audio.merge_full_audio),
         (t("Merge final audio into video"), _12_dub_to_vid.merge_video_audio),
+        (
+            t("Finalize dubbing outputs"),
+            lambda: _touch(_AUDIO_DONE_MARKER),
+        ),
     ]
     return steps
 
 
 def audio_processing_section():
+    from core._1_ytdlp import is_audio_only_input
+    audio_only = is_audio_only_input()
     st.header(t("c. Dubbing"))
     runner = TaskRunner.get(st.session_state, "_audio_runner")
 
@@ -210,12 +314,17 @@ def audio_processing_section():
             1. {t("Generate audio tasks and chunks")}<br>
             2. {t("Extract reference audio")}<br>
             3. {t("Generate and merge audio files")}<br>
-            4. {t("Merge final audio into video")}
+            4. {t("Merge full audio")}<br>
+            5. {t("Merge final audio into video")}
         """,
             unsafe_allow_html=True,
         )
 
-        if not os.path.exists(DUB_VIDEO):
+        audio_done = os.path.exists(_AUDIO_DONE_MARKER) or (
+            os.path.exists("output/dub.mp3")
+            and (audio_only or os.path.exists(DUB_VIDEO))
+        )
+        if not audio_done:
             if runner.is_active:
                 _task_control_panel("_audio_runner")
             elif runner.is_done:
@@ -224,6 +333,7 @@ def audio_processing_section():
                 if st.button(
                     t("Start Audio Processing"), key="audio_processing_button"
                 ):
+                    _clear_path(_AUDIO_DONE_MARKER)
                     steps = _get_audio_steps()
                     runner.start(steps)
                     st.rerun()
@@ -233,9 +343,10 @@ def audio_processing_section():
                     "Audio processing is complete! You can check the audio files in the `output` folder."
                 )
             )
-            if load_key("burn_subtitles"):
+            if not audio_only and load_key("burn_subtitles") and os.path.exists(DUB_VIDEO):
                 st.video(DUB_VIDEO)
             if st.button(t("Delete dubbing files"), key="delete_dubbing_files"):
+                _clear_path(_AUDIO_DONE_MARKER)
                 delete_dubbing_files()
                 st.rerun()
             if st.button(t("Archive to 'history'"), key="cleanup_in_audio_processing"):
@@ -247,9 +358,26 @@ def audio_processing_section():
 
 
 def main():
-    logo_col, _ = st.columns([1, 1])
+    init_display_language()
+    st.set_option("client.toolbarMode", "viewer")
+
+    logo_col, lang_col = st.columns([3, 1])
     with logo_col:
         st.image("docs/logo.png", width="stretch")
+    with lang_col:
+        language_values = list(DISPLAY_LANGUAGES.values())
+        current_language = init_display_language()
+        selected_language = st.selectbox(
+            t("Display Language 🌐"),
+            options=list(DISPLAY_LANGUAGES.keys()),
+            index=language_values.index(current_language) if current_language in language_values else 0,
+            key="display_language_selector",
+        )
+        new_language = DISPLAY_LANGUAGES[selected_language]
+        if new_language != current_language:
+            set_display_language(new_language)
+            st.rerun()
+
     st.markdown(button_style, unsafe_allow_html=True)
     welcome_text = t(
         'Hello, welcome to VideoLingo. If you encounter any issues, feel free to get instant answers with our Free QA Agent <a href="https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh" target="_blank">here</a>! You can also try out our SaaS website at <a href="https://videolingo.io" target="_blank">videolingo.io</a> for free!'
@@ -261,10 +389,12 @@ def main():
     # add settings
     with st.sidebar:
         page_setting()
-        st.markdown(give_star_button, unsafe_allow_html=True)
-    download_video_section()
-    text_processing_section()
-    audio_processing_section()
+        st.markdown(give_star_button(), unsafe_allow_html=True)
+    if download_video_section():
+        text_done = text_processing_section()
+        from core._1_ytdlp import is_audio_only_input
+        if text_done and not is_audio_only_input():
+            audio_processing_section()
 
 
 if __name__ == "__main__":
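Reviewer sketch (not part of the patch): the done-marker logic in st.py can be checked against a temporary directory. This is a minimal sketch under stated assumptions: `touch` and `clear_path` mirror `_touch` and `_clear_path`, `text_stage_done` is a hypothetical helper that restates the `text_done` expression, and `output_sub.mp4` is a placeholder because the value of `SUB_VIDEO` is not shown in the patch.

```python
import os
import tempfile


def touch(path: str) -> None:
    """Create (or truncate) an empty marker file, creating parent dirs as needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write("")


def clear_path(path: str) -> None:
    """Remove the marker if present; ignore filesystem errors."""
    if os.path.exists(path):
        try:
            os.remove(path)
        except OSError:
            pass


def text_stage_done(out_dir: str, audio_only: bool) -> bool:
    """Marker first, then the fallback check on the output files themselves."""
    marker = os.path.join(out_dir, ".subtitle_done")
    srt_ok = all(
        os.path.exists(os.path.join(out_dir, name)) for name in ("trans.srt", "src.srt")
    )
    # Placeholder file name standing in for SUB_VIDEO.
    video_ok = audio_only or os.path.exists(os.path.join(out_dir, "output_sub.mp4"))
    return os.path.exists(marker) or (srt_ok and video_ok)


work_dir = tempfile.mkdtemp()
before = text_stage_done(work_dir, audio_only=True)
touch(os.path.join(work_dir, ".subtitle_done"))
after = text_stage_done(work_dir, audio_only=True)
clear_path(os.path.join(work_dir, ".subtitle_done"))
cleared = text_stage_done(work_dir, audio_only=True)
```

The fallback branch means a stage can still count as done without its marker when the expected output files already exist, for example outputs produced before this version.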
diff --git a/translations/en.json b/translations/en.json
index 3a99c0de..21a90b62 100644
--- a/translations/en.json
+++ b/translations/en.json
@@ -122,5 +122,42 @@
     "Task completed!": "Task completed!",
     "Task stopped": "Task stopped",
     "Task error": "Task error",
-    "OK": "OK"
+    "OK": "OK",
+    "Display Language 🌐": "Display Language 🌐",
+    "Downloading video...": "Downloading video...",
+    "Best": "Best",
+    "Upload local media file": "Upload local media file",
+    "Upload was already processed. Delete and reselect to upload again.": "Upload was already processed. Delete and reselect to upload again.",
+    "Audio-only input produces subtitle files only; no video is generated.": "Audio-only input produces subtitle files only; no video is generated.",
+    "Local": "Local",
+    "Cloud": "Cloud",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "ElevenLabs API",
+    "Voice": "Voice",
+    "302ai API": "302ai API",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "SiliconFlow Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "Custom TTS",
+    "SiliconFlow CosyVoice2": "SiliconFlow CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "Star on GitHub 🌟",
+    "Generate subtitle files": "Generate subtitle files",
+    "Drag and drop file here": "Drag and drop file here",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A",
+    "Browse files": "Browse files",
+    "Media file detection failed: {error}": "Media file detection failed: {error}",
+    "Clear output and reselect": "Clear output and reselect",
+    "Finalize subtitle outputs": "Finalize subtitle outputs",
+    "Finalize dubbing outputs": "Finalize dubbing outputs",
+    "Subtitle length tuning": "Subtitle length tuning",
+    "These two values control how subtitles are cut. Smaller = more, shorter lines. Larger = fewer, longer lines.": "These two values control how subtitles are cut. Smaller = more, shorter lines. Larger = fewer, longer lines.",
+    "max_split_length (rough cut, words/tokens per chunk)": "max_split_length (rough cut, words/tokens per chunk)",
+    "Suggested: 18-25. Below 18 cuts too finely and hurts translation; above 25 makes downstream subtitle splitting hard to align.": "Suggested: 18-25. Below 18 cuts too finely and hurts translation; above 25 makes downstream subtitle splitting hard to align.",
+    "max_length (max characters per subtitle line)": "max_length (max characters per subtitle line)",
+    "Suggested: 50-90. Lower if a subtitle line looks crowded on screen; raise if lines are split too aggressively.": "Suggested: 50-90. Lower if a subtitle line looks crowded on screen; raise if lines are split too aggressively.",
+    "Restore defaults ({split}/{length})": "Restore defaults ({split}/{length})"
 }
diff --git a/translations/es.json b/translations/es.json
index 469fba84..6d79e172 100644
--- a/translations/es.json
+++ b/translations/es.json
@@ -122,5 +122,35 @@
     "Task completed!": "¡Tarea completada!",
     "Task stopped": "Tarea detenida",
     "Task error": "Error en la tarea",
-    "OK": "OK"
+    "OK": "OK",
+    "Display Language 🌐": "Idioma de visualización 🌐",
+    "Downloading video...": "Descargando video...",
+    "Best": "Mejor",
+    "Upload local media file": "Subir archivo local de audio o video",
+    "Upload was already processed. Delete and reselect to upload again.": "La subida ya fue procesada. Elimine y vuelva a seleccionar para subir de nuevo.",
+    "Audio-only input produces subtitle files only; no video is generated.": "La entrada solo de audio produce únicamente archivos de subtítulos; no se genera video.",
+    "Local": "Local",
+    "Cloud": "Nube",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "API de ElevenLabs",
+    "Voice": "Voz",
+    "302ai API": "API de 302ai",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "SiliconFlow Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "TTS personalizado",
+    "SiliconFlow CosyVoice2": "SiliconFlow CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "Dar estrella en GitHub 🌟",
+    "Generate subtitle files": "Generar archivos de subtítulos",
+    "Drag and drop file here": "Arrastre y suelte el archivo aquí",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "Límite 4GB por archivo · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A",
+    "Browse files": "Buscar archivos",
+    "Media file detection failed: {error}": "Error al detectar el archivo multimedia: {error}",
+    "Clear output and reselect": "Limpiar salida y volver a seleccionar",
+    "Finalize subtitle outputs": "Finalizar resultados de subtítulos",
+    "Finalize dubbing outputs": "Finalizar resultados de doblaje"
 }
diff --git a/translations/fr.json b/translations/fr.json
index dcdac7af..d05f48c6 100644
--- a/translations/fr.json
+++ b/translations/fr.json
@@ -122,5 +122,35 @@
     "Task completed!": "Tâche terminée !",
     "Task stopped": "Tâche arrêtée",
     "Task error": "Erreur de tâche",
-    "OK": "OK"
+    "OK": "OK",
+    "Display Language 🌐": "Langue d'affichage 🌐",
+    "Downloading video...": "Téléchargement de la vidéo...",
+    "Best": "Meilleure",
+    "Upload local media file": "Importer un fichier audio ou vidéo local",
+    "Upload was already processed. Delete and reselect to upload again.": "Ce fichier importé a déjà été traité. Supprimez-le et sélectionnez-le à nouveau pour le réimporter.",
+    "Audio-only input produces subtitle files only; no video is generated.": "Une entrée audio seule produit uniquement des fichiers de sous-titres ; aucune vidéo n'est générée.",
+    "Local": "Local",
+    "Cloud": "Cloud",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "API ElevenLabs",
+    "Voice": "Voix",
+    "302ai API": "API 302ai",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "SiliconFlow Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "TTS personnalisé",
+    "SiliconFlow CosyVoice2": "SiliconFlow CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "Mettre une étoile sur GitHub 🌟",
+    "Generate subtitle files": "Générer les fichiers de sous-titres",
+    "Drag and drop file here": "Glissez-déposez le fichier ici",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "Limite de 4GB par fichier · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A",
+    "Browse files": "Parcourir les fichiers",
+    "Media file detection failed: {error}": "Échec de la détection du fichier média : {error}",
+    "Clear output and reselect": "Effacer la sortie et resélectionner",
+    "Finalize subtitle outputs": "Finaliser les sorties de sous-titres",
+    "Finalize dubbing outputs": "Finaliser les sorties de doublage"
 }
diff --git a/translations/ja.json b/translations/ja.json
index 4595ba97..8afc3d5a 100644
--- a/translations/ja.json
+++ b/translations/ja.json
@@ -122,5 +122,35 @@
     "Task completed!": "タスク完了！",
     "Task stopped": "タスクが停止されました",
     "Task error": "タスクエラー",
-    "OK": "OK"
+    "OK": "OK",
+    "Display Language 🌐": "表示言語 🌐",
+    "Downloading video...": "動画をダウンロード中...",
+    "Best": "最高品質",
+    "Upload local media file": "ローカルの音声/動画ファイルをアップロード",
+    "Upload was already processed. Delete and reselect to upload again.": "このアップロードはすでに処理済みです。再アップロードするには削除して選択し直してください。",
+    "Audio-only input produces subtitle files only; no video is generated.": "音声のみの入力では字幕ファイルだけを生成し、動画は生成しません。",
+    "Local": "ローカル",
+    "Cloud": "クラウド",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "ElevenLabs API",
+    "Voice": "ボイス",
+    "302ai API": "302ai API",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "SiliconFlow Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "カスタム TTS",
+    "SiliconFlow CosyVoice2": "SiliconFlow CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "GitHubでスター 🌟",
+    "Generate subtitle files": "字幕ファイルを生成",
+    "Drag and drop file here": "ここにファイルをドラッグ＆ドロップ",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "1ファイル4GBまで · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A",
+    "Browse files": "ファイルを選択",
+    "Media file detection failed: {error}": "メディアファイルの検出に失敗しました: {error}",
+    "Clear output and reselect": "出力をクリアして再選択",
+    "Finalize subtitle outputs": "字幕出力を仕上げる",
+    "Finalize dubbing outputs": "吹き替え出力を仕上げる"
 }
diff --git a/translations/ru.json b/translations/ru.json
index ec7e97fe..bb27367a 100644
--- a/translations/ru.json
+++ b/translations/ru.json
@@ -122,5 +122,35 @@
     "Task completed!": "Задача выполнена!",
     "Task stopped": "Задача остановлена",
     "Task error": "Ошибка задачи",
-    "OK": "OK"
+    "OK": "OK",
+    "Display Language 🌐": "Язык интерфейса 🌐",
+    "Downloading video...": "Загрузка видео...",
+    "Best": "Лучшее",
+    "Upload local media file": "Загрузить локальный аудио- или видеофайл",
+    "Upload was already processed. Delete and reselect to upload again.": "Этот загруженный файл уже обработан. Удалите и выберите его снова для повторной загрузки.",
+    "Audio-only input produces subtitle files only; no video is generated.": "При аудио-вводе создаются только файлы субтитров; видео не создается.",
+    "Local": "Локально",
+    "Cloud": "Облако",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "API ElevenLabs",
+    "Voice": "Голос",
+    "302ai API": "API 302ai",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "SiliconFlow Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "Пользовательский TTS",
+    "SiliconFlow CosyVoice2": "SiliconFlow CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "Поставить звезду на GitHub 🌟",
+    "Generate subtitle files": "Создать файлы субтитров",
+    "Drag and drop file here": "Перетащите файл сюда",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "Лимит 4GB на файл · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A",
+    "Browse files": "Выбрать файлы",
+    "Media file detection failed: {error}": "Не удалось определить медиафайл: {error}",
+    "Clear output and reselect": "Очистить вывод и выбрать заново",
+    "Finalize subtitle outputs": "Финализировать вывод субтитров",
+    "Finalize dubbing outputs": "Финализировать вывод дубляжа"
 }
diff --git a/translations/translations.py b/translations/translations.py
index 01e10186..e3538fff 100644
--- a/translations/translations.py
+++ b/translations/translations.py
@@ -10,6 +10,93 @@
     "🇫🇷 Français": "fr",
 }
 
+SUPPORTED_LANGUAGES = set(DISPLAY_LANGUAGES.values())
+
+
+def normalize_language_code(language):
+    if not language:
+        return None
+
+    code = str(language).replace("_", "-").lower()
+    if code in {"zh", "zh-cn", "zh-sg"} or code.startswith("zh-hans"):
+        return "zh-CN"
+    if code in {"zh-hk", "zh-tw", "zh-mo"} or code.startswith("zh-hant"):
+        return "zh-HK"
+
+    base_code = code.split("-")[0]
+    if base_code in SUPPORTED_LANGUAGES:
+        return base_code
+    return None
+
+
+def _language_from_accept_language(header):
+    for item in (header or "").split(","):
+        code = normalize_language_code(item.split(";")[0].strip())
+        if code:
+            return code
+    return None
+
+
+def _config_language():
+    try:
+        from core.utils.config_utils import load_key
+
+        return normalize_language_code(load_key("display_language"))
+    except Exception:
+        return None
+
+
+def _streamlit_language():
+    try:
+        import streamlit as st
+
+        if "_display_language" in st.session_state:
+            return normalize_language_code(st.session_state["_display_language"])
+
+        query_language = st.query_params.get("lang")
+        if isinstance(query_language, list):
+            query_language = query_language[0] if query_language else None
+        query_language = normalize_language_code(query_language)
+        if query_language:
+            return query_language
+
+        return _language_from_accept_language(st.context.headers.get("accept-language", ""))
+    except Exception:
+        return None
+
+
+def get_current_language(default="en"):
+    return _streamlit_language() or _config_language() or default
+
+
+def init_display_language():
+    language = get_current_language(default="en")
+    try:
+        import streamlit as st
+
+        st.session_state.setdefault("_display_language", language)
+    except Exception:
+        pass
+    return language
+
+
+def set_display_language(language):
+    language = normalize_language_code(language) or "en"
+    try:
+        import streamlit as st
+
+        st.session_state["_display_language"] = language
+        st.query_params["lang"] = language
+    except Exception:
+        pass
+    try:
+        from core.utils.config_utils import update_key
+
+        update_key("display_language", language)
+    except Exception:
+        pass
+    return language
+
 # Load the language file based on user selection
 def load_translations(language="en"):
     with open(f'translations/{language}.json', 'r', encoding='utf-8') as file:
@@ -17,9 +104,8 @@ def load_translations(language="en"):
 
 # Function to fetch the translation
 def translate(key):
-    from core.utils.config_utils import load_key
     try:
-        display_language = load_key("display_language")
+        display_language = get_current_language()
         translations = load_translations(display_language)
         translation = translations.get(key)
         if translation is None:
diff --git a/translations/zh-CN.json b/translations/zh-CN.json
index 052ab034..df77f89c 100644
--- a/translations/zh-CN.json
+++ b/translations/zh-CN.json
@@ -75,7 +75,7 @@
     "Merge full audio": "合并完整音频",
     "Merge dubbing to the video": "将配音合并到视频中",
     "Audio processing complete! 🎇": "音频处理完成! 🎇",
-    "Hello, welcome to VideoLingo. If you encounter any issues, feel free to get instant answers with our Free QA Agent <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">here</a>! You can also try out our SaaS website at <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a> for free!": "欢迎来到VideoLingo。如果遇到任何问题，随时可以通过我们的免费问答助手 <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">here</a> 获取即时解答！还可以免费试用我们的SaaS网站 <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a>！",
+    "Hello, welcome to VideoLingo. If you encounter any issues, feel free to get instant answers with our Free QA Agent <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">here</a>! You can also try out our SaaS website at <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a> for free!": "欢迎来到VideoLingo。如果遇到任何问题，随时可以通过我们的免费问答助手 <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">这里</a> 获取即时解答！还可以免费试用我们的SaaS网站 <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a>！",
     "WhisperX Runtime": "WhisperX 运行环境",
     "Local runtime requires >8GB GPU, cloud runtime requires 302ai API key, elevenlabs runtime requires ElevenLabs API key": "本地运行需要>8GB显存GPU，云端运行需要302ai API密钥，elevenlabs运行需要ElevenLabs API密钥",
     "WhisperX 302ai API": "WhisperX 302ai API密钥",
@@ -122,5 +122,42 @@
     "Task completed!": "任务完成！",
     "Task stopped": "任务已停止",
     "Task error": "任务出错",
-    "OK": "确定"
+    "OK": "确定",
+    "Display Language 🌐": "显示语言 🌐",
+    "Downloading video...": "正在下载视频...",
+    "Best": "最佳",
+    "Upload local media file": "上传本地音视频文件",
+    "Upload was already processed. Delete and reselect to upload again.": "这个上传文件已经处理过。如需重新上传，请先删除并重新选择。",
+    "Audio-only input produces subtitle files only; no video is generated.": "纯音频输入只生成字幕文件，不会生成视频。",
+    "Local": "本地",
+    "Cloud": "云端",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "ElevenLabs API密钥",
+    "Voice": "声音",
+    "302ai API": "302ai API密钥",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "硅基流动 Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "自定义 TTS",
+    "SiliconFlow CosyVoice2": "硅基流动 CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "在 GitHub 点星 🌟",
+    "Generate subtitle files": "生成字幕文件",
+    "Drag and drop file here": "将文件拖到这里",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "单个文件限制 4GB · MP4、MOV、AVI、MKV、FLV、WMV、WEBM、WAV、MP3、FLAC、M4A",
+    "Browse files": "浏览文件",
+    "Media file detection failed: {error}": "媒体文件识别失败：{error}",
+    "Clear output and reselect": "清空输出并重新选择",
+    "Finalize subtitle outputs": "完成字幕产出收尾",
+    "Finalize dubbing outputs": "完成配音产出收尾",
+    "Subtitle length tuning": "字幕长度微调",
+    "These two values control how subtitles are cut. Smaller = more, shorter lines. Larger = fewer, longer lines.": "这两个值控制字幕怎么切。值越小，断句越多、每条越短；值越大，断句越少、每条越长。",
+    "max_split_length (rough cut, words/tokens per chunk)": "max_split_length（粗切，每段词/Token 数）",
+    "Suggested: 18-25. Below 18 cuts too finely and hurts translation; above 25 makes downstream subtitle splitting hard to align.": "建议 18-25。小于 18 会切得太碎，影响翻译质量；大于 25 会让后续字幕拆分难对齐。",
+    "max_length (max characters per subtitle line)": "max_length（每行字幕字符数上限）",
+    "Suggested: 50-90. Lower if a subtitle line looks crowded on screen; raise if lines are split too aggressively.": "建议 50-90。如果一行字幕看着挤就调小；如果一句话被拆得太碎就调大。",
+    "Restore defaults ({split}/{length})": "恢复默认值（{split}/{length}）"
 }
diff --git a/translations/zh-HK.json b/translations/zh-HK.json
index cdb4e012..56d2ea50 100644
--- a/translations/zh-HK.json
+++ b/translations/zh-HK.json
@@ -75,7 +75,7 @@
     "Merge full audio": "合併完整音頻",
     "Merge dubbing to the video": "將配音合併到影片中",
     "Audio processing complete! 🎇": "音頻處理完成! 🎇",
-    "Hello, welcome to VideoLingo. If you encounter any issues, feel free to get instant answers with our Free QA Agent <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">here</a>! You can also try out our SaaS website at <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a> for free!": "歡迎來到VideoLingo。如果遇到任何問題，隨時可以透過我們的免費問答助手 <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">here</a> 獲取即時解答！還可以免費試用我們的SaaS網站 <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a>！",
+    "Hello, welcome to VideoLingo. If you encounter any issues, feel free to get instant answers with our Free QA Agent <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">here</a>! You can also try out our SaaS website at <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a> for free!": "歡迎來到VideoLingo。如果遇到任何問題，隨時可以透過我們的免費問答助手 <a href=\"https://share.fastgpt.in/chat/share?shareId=066w11n3r9aq6879r4z0v9rh\" target=\"_blank\">這裡</a> 獲取即時解答！還可以免費試用我們的SaaS網站 <a href=\"https://videolingo.io\" target=\"_blank\">videolingo.io</a>！",
     "WhisperX Runtime": "WhisperX 運行環境",
     "Local runtime requires >8GB GPU, cloud runtime requires 302ai API key, elevenlabs runtime requires ElevenLabs API key": "本地運行需要>8GB顯存GPU，雲端運行需要302ai API金鑰，elevenlabs運行需要ElevenLabs API金鑰",
     "WhisperX 302ai API": "WhisperX 302ai API金鑰",
@@ -122,5 +122,42 @@
     "Task completed!": "任務完成！",
     "Task stopped": "任務已停止",
     "Task error": "任務出錯",
-    "OK": "確定"
+    "OK": "確定",
+    "Display Language 🌐": "顯示語言 🌐",
+    "Downloading video...": "正在下載影片...",
+    "Best": "最佳",
+    "Upload local media file": "上傳本地音頻或影片檔案",
+    "Upload was already processed. Delete and reselect to upload again.": "這個上傳檔案已經處理過。如需重新上傳，請先刪除並重新選擇。",
+    "Audio-only input produces subtitle files only; no video is generated.": "純音頻輸入只生成字幕檔案，不會生成影片。",
+    "Local": "本地",
+    "Cloud": "雲端",
+    "ElevenLabs": "ElevenLabs",
+    "ElevenLabs API": "ElevenLabs API金鑰",
+    "Voice": "聲音",
+    "302ai API": "302ai API金鑰",
+    "Azure TTS": "Azure TTS",
+    "OpenAI TTS": "OpenAI TTS",
+    "Fish TTS": "Fish TTS",
+    "SiliconFlow Fish TTS": "矽基流動 Fish TTS",
+    "Edge TTS": "Edge TTS",
+    "GPT-SoVITS": "GPT-SoVITS",
+    "Custom TTS": "自訂 TTS",
+    "SiliconFlow CosyVoice2": "矽基流動 CosyVoice2",
+    "F5-TTS": "F5-TTS",
+    "Star on GitHub 🌟": "在 GitHub 點星 🌟",
+    "Generate subtitle files": "生成字幕檔案",
+    "Drag and drop file here": "將檔案拖到這裡",
+    "Limit 4GB per file · MP4, MOV, AVI, MKV, FLV, WMV, WEBM, WAV, MP3, FLAC, M4A": "單個檔案限制 4GB · MP4、MOV、AVI、MKV、FLV、WMV、WEBM、WAV、MP3、FLAC、M4A",
+    "Browse files": "瀏覽檔案",
+    "Media file detection failed: {error}": "媒體檔案識別失敗：{error}",
+    "Clear output and reselect": "清空輸出並重新選擇",
+    "Finalize subtitle outputs": "完成字幕產出收尾",
+    "Finalize dubbing outputs": "完成配音產出收尾",
+    "Subtitle length tuning": "字幕長度微調",
+    "These two values control how subtitles are cut. Smaller = more, shorter lines. Larger = fewer, longer lines.": "這兩個值控制字幕怎麼切。值越小，斷句越多、每條越短；值越大，斷句越少、每條越長。",
+    "max_split_length (rough cut, words/tokens per chunk)": "max_split_length（粗切，每段詞/Token 數）",
+    "Suggested: 18-25. Below 18 cuts too finely and hurts translation; above 25 makes downstream subtitle splitting hard to align.": "建議 18-25。小於 18 會切得太碎，影響翻譯品質；大於 25 會讓後續字幕拆分難對齊。",
+    "max_length (max characters per subtitle line)": "max_length（每行字幕字元數上限）",
+    "Suggested: 50-90. Lower if a subtitle line looks crowded on screen; raise if lines are split too aggressively.": "建議 50-90。如果一行字幕看著擠就調小；如果一句話被拆得太碎就調大。",
+    "Restore defaults ({split}/{length})": "恢復預設值（{split}/{length}）"
 }

From e0f39bdca56fcd556adc5d4b0b5c70051b01136e Mon Sep 17 00:00:00 2001
From: Alex Liang <leungpuingai@gmail.com>
Date: Thu, 11 Jun 2026 15:38:50 +0800
Subject: [PATCH 2/2] feat: v3.0.3 installer refactor and shared env
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

English:
- Add stage-based installer.py with resumable dependency setup and health checks
- Support shared uv venv via setup_env.py and OneKeyStart.bat
- Make OneKeyStart.bat run pre-launch checks and repair incomplete environments
- Relax spaCy patch pin and make Demucs installation more resilient
- Fall back to global HuggingFace cache when project WhisperX cache is incomplete
- Bump version metadata to v3.0.3

中文:
- 新增分阶段 installer.py,支持可重复执行的依赖安装和环境健康检查
- setup_env.py 与 OneKeyStart.bat 支持共享 uv venv
- OneKeyStart.bat 启动前自动检查环境,缺失或过期时自动修复
- 放宽 spaCy patch 版本约束,提升 Demucs 安装容错
- 项目内 WhisperX 缓存不完整时自动回退到全局 HuggingFace 缓存
- 版本元数据升级到 v3.0.3
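
Illustration of the cache fallback rule (a minimal sketch, not the code in
this patch): the project cache is used only when a snapshot directory holds
all required model files; otherwise the loader is left to use the default
global HuggingFace cache. The helper names, the snapshot id "abc123" and the
repo id below are hypothetical and chosen only for the demo; the required
file names and the "models--<org>--<name>" layout follow the diff.

```python
import tempfile
from pathlib import Path

# File names that the patch treats as a complete faster-whisper snapshot.
REQUIRED_FILES = {"config.json", "model.bin", "tokenizer.json"}


def hf_repo_cache_dir(cache_root, repo_id):
    # HuggingFace cache layout: <root>/models--<org>--<name>
    return Path(cache_root) / f"models--{repo_id.replace('/', '--')}"


def has_complete_snapshot(cache_root, repo_id):
    snapshots = hf_repo_cache_dir(cache_root, repo_id) / "snapshots"
    if not snapshots.is_dir():
        return False
    return any(
        snap.is_dir() and all((snap / name).exists() for name in REQUIRED_FILES)
        for snap in snapshots.iterdir()
    )


def pick_download_root(project_cache, repo_id):
    # None means "do not pass download_root", i.e. use the global cache.
    return project_cache if has_complete_snapshot(project_cache, repo_id) else None


def demo():
    repo_id = "Systran/faster-whisper-large-v3"  # illustrative repo id
    with tempfile.TemporaryDirectory() as root:
        snap = hf_repo_cache_dir(root, repo_id) / "snapshots" / "abc123"
        snap.mkdir(parents=True)
        (snap / "config.json").write_text("{}")  # simulate an interrupted download
        partial = pick_download_root(root, repo_id)
        for name in ("model.bin", "tokenizer.json"):
            (snap / name).write_text("")
        complete = pick_download_root(root, repo_id)
        return partial, complete == root
```

Calling demo() builds a half-downloaded cache in a temporary directory, checks
it, completes it, and checks it again, so the fallback decision can be
inspected without touching a real model cache.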
---
 OneKeyStart.bat                    |  90 ++++++-
 config.yaml                        |   2 +-
 core/asr_backend/whisperX_local.py |  42 +++-
 install.py                         | 266 +-------------------
 installer.py                       | 389 +++++++++++++++++++++++++++++
 requirements.txt                   |   4 +-
 setup.py                           |   2 +-
 setup_env.py                       | 288 +++++++++------------
 8 files changed, 651 insertions(+), 432 deletions(-)
 create mode 100644 installer.py

diff --git a/OneKeyStart.bat b/OneKeyStart.bat
index f58aff66..ab35955f 100644
--- a/OneKeyStart.bat
+++ b/OneKeyStart.bat
@@ -1,11 +1,85 @@
 @echo off
-chcp 65001 >nul 2>&1
-call conda activate videolingo 2>nul
-set PYTHONWARNINGS=ignore
-python "%~dp0launch.py"
-if %errorlevel% neq 0 (
-    echo.
-    echo  Pre-flight checks or Streamlit failed. See logs\ for details.
-    echo.
+setlocal EnableExtensions
+cd /D "%~dp0"
+
+for /F "tokens=1,2 delims=#" %%A in ('"prompt #$H#$E# & echo on & for %%B in (1) do rem"') do set "ESC=%%B"
+set "C_RESET=%ESC%[0m"
+set "C_GREEN=%ESC%[32m"
+set "C_YELLOW=%ESC%[33m"
+set "C_RED=%ESC%[31m"
+set "C_CYAN=%ESC%[36m"
+set "C_BOLD=%ESC%[1m"
+
+if not exist "logs" mkdir "logs"
+for /f %%I in ('powershell -NoProfile -Command "Get-Date -Format yyyyMMdd_HHmmss"') do set "dt=%%I"
+set "LOGFILE=logs\videolingo_%dt%.log"
+set "CHECK_ONLY="
+if /I "%~1"=="--check-only" set "CHECK_ONLY=1"
+
+echo [%date% %time%] VideoLingo starting... > "%LOGFILE%"
+echo %C_CYAN%Log file:%C_RESET% %LOGFILE%
+
+set "VENV_LABEL="
+set "VENV_PY="
+
+set "SHARED_VENV=%USERPROFILE%\.venvs\videolingo"
+if exist "%SHARED_VENV%\Scripts\python.exe" (
+    set "VENV_LABEL=shared venv"
+    set "VENV_PY=%SHARED_VENV%\Scripts\python.exe"
+    goto venv_found
 )
+
+if exist ".venv\Scripts\python.exe" (
+    set "VENV_LABEL=project .venv"
+    set "VENV_PY=.venv\Scripts\python.exe"
+    goto venv_found
+)
+
+where conda >nul 2>nul
+if %errorlevel%==0 (
+    echo %C_YELLOW%No uv venv found, falling back to Conda env "videolingo"...%C_RESET%
+    call conda activate videolingo
+    python installer.py --check --quiet
+    if errorlevel 1 (
+        echo %C_YELLOW%Conda env is incomplete or outdated. Repairing...%C_RESET%
+        python installer.py --yes
+        if errorlevel 1 goto install_failed
+    )
+    if defined CHECK_ONLY (
+        echo %C_GREEN%Environment check passed. --check-only set, not starting Streamlit.%C_RESET%
+        goto end
+    )
+    echo %C_GREEN%Starting VideoLingo with Conda...%C_RESET%
+    python -m streamlit run st.py 2>&1 | powershell -NoProfile -Command "$input | Tee-Object -FilePath '%LOGFILE%' -Append"
+    goto end
+)
+
+echo %C_RED%ERROR: No usable VideoLingo environment found.%C_RESET%
+echo Run one of these first:
+echo   python setup_env.py --shared
+echo   python setup_env.py
+goto end
+
+:venv_found
+echo %C_GREEN%Detected %VENV_LABEL%:%C_RESET% %VENV_PY%
+"%VENV_PY%" installer.py --check --quiet
+if errorlevel 1 (
+    echo %C_YELLOW%Environment is incomplete or outdated. Repairing with installer.py...%C_RESET%
+    "%VENV_PY%" installer.py --yes
+    if errorlevel 1 goto install_failed
+)
+
+if defined CHECK_ONLY (
+    echo %C_GREEN%Environment check passed. --check-only set, not starting Streamlit.%C_RESET%
+    goto end
+)
+
+echo %C_GREEN%Starting VideoLingo with %VENV_LABEL%...%C_RESET%
+"%VENV_PY%" -m streamlit run st.py 2>&1 | powershell -NoProfile -Command "$input | Tee-Object -FilePath '%LOGFILE%' -Append"
+goto end
+
+:install_failed
+echo %C_RED%Install/repair failed. Check the messages above and the log file.%C_RESET%
+
+:end
 pause
diff --git a/config.yaml b/config.yaml
index a91bacd0..e9d30c44 100644
--- a/config.yaml
+++ b/config.yaml
@@ -1,7 +1,7 @@
 # * Settings marked with * are advanced settings that won't appear in the Streamlit page and can only be modified manually in config.py
 # recommend to set in streamlit page
 # -------------------
-# version: "3.0.2"
+# version: "3.0.3"
 # author: "Huanshere"
 # -------------------
 
diff --git a/core/asr_backend/whisperX_local.py b/core/asr_backend/whisperX_local.py
index da96c7b8..2caab063 100644
--- a/core/asr_backend/whisperX_local.py
+++ b/core/asr_backend/whisperX_local.py
@@ -4,6 +4,7 @@
 import subprocess
 import torch
 import functools
+from pathlib import Path
 
 warnings.filterwarnings("ignore")
 
@@ -34,6 +35,22 @@ def _patched_torch_load(*args, **kwargs):
 from core.utils import *
 MODEL_DIR = load_key("model_dir")
 
+
+def _hf_cache_dir_for_repo(cache_root, repo_id):
+    return Path(cache_root) / f"models--{repo_id.replace('/', '--')}"
+
+
+def _has_complete_hf_snapshot(cache_root, repo_id):
+    repo_dir = _hf_cache_dir_for_repo(cache_root, repo_id)
+    snapshots = repo_dir / "snapshots"
+    if not snapshots.exists():
+        return False
+    required_files = {"config.json", "model.bin", "tokenizer.json"}
+    for snapshot in snapshots.iterdir():
+        if snapshot.is_dir() and all((snapshot / name).exists() for name in required_files):
+            return True
+    return False
+
 @except_handler("failed to check hf mirror", default_return=None)
 def check_hf_mirror():
     mirrors = {'Official': 'huggingface.co', 'Mirror': 'hf-mirror.com'}
@@ -76,6 +93,7 @@ def transcribe_audio(raw_audio_file, vocal_audio_file, start, end):
         rprint(f"[cyan]📦 Batch size:[/cyan] {batch_size}, [cyan]⚙️ Compute type:[/cyan] {compute_type}")
     rprint(f"[green]▶️ Starting WhisperX for segment {start:.2f}s to {end:.2f}s...[/green]")
     
+    download_root = MODEL_DIR
     if WHISPER_LANGUAGE == 'zh':
         model_name = "Huan69/Belle-whisper-large-v3-zh-punct-fasterwhisper"
         local_model = os.path.join(MODEL_DIR, "Belle-whisper-large-v3-zh-punct-fasterwhisper")
@@ -86,14 +104,34 @@ def transcribe_audio(raw_audio_file, vocal_audio_file, start, end):
     if os.path.exists(local_model):
         rprint(f"[green]📥 Loading local WHISPER model:[/green] {local_model} ...")
         model_name = local_model
+        download_root = None
     else:
         rprint(f"[green]📥 Using WHISPER model from HuggingFace:[/green] {model_name} ...")
+        # If the project-local cache is missing or only partially downloaded,
+        # let HuggingFace use the default global cache. This avoids getting
+        # stuck on a half-created ./_model_cache after a network interruption.
+        repo_id = model_name if "/" in model_name else f"Systran/faster-whisper-{model_name}"
+        if not _has_complete_hf_snapshot(MODEL_DIR, repo_id):
+            rprint(
+                "[yellow]⚠️ Project model cache is incomplete; "
+                "falling back to the global HuggingFace cache.[/yellow]"
+            )
+            download_root = None
 
     vad_options = {"vad_onset": 0.500,"vad_offset": 0.363}
     asr_options = {"temperatures": [0],"initial_prompt": "",}
     whisper_language = None if 'auto' in WHISPER_LANGUAGE else WHISPER_LANGUAGE
     rprint("[bold yellow] You can ignore warning of `Model was trained with torch 1.10.0+cu102, yours is 2.0.0+cu118...`[/bold yellow]")
-    model = whisperx.load_model(model_name, device, compute_type=compute_type, language=whisper_language, vad_options=vad_options, asr_options=asr_options, download_root=MODEL_DIR)
+    load_kwargs = dict(
+        device=device,
+        compute_type=compute_type,
+        language=whisper_language,
+        vad_options=vad_options,
+        asr_options=asr_options,
+    )
+    if download_root:
+        load_kwargs["download_root"] = download_root
+    model = whisperx.load_model(model_name, **load_kwargs)
 
     def load_audio_segment(audio_file, start, end):
         # Use whisperx's ffmpeg-based loader instead of librosa.load() which
@@ -147,4 +185,4 @@ def load_audio_segment(audio_file, start, end):
                 word['start'] += start
             if 'end' in word:
                 word['end'] += start
-    return result
\ No newline at end of file
+    return result
diff --git a/install.py b/install.py
index 9e38f035..c0e87953 100644
--- a/install.py
+++ b/install.py
@@ -1,263 +1,19 @@
-import os, sys
-import platform
-import subprocess
-sys.path.append(os.path.dirname(os.path.abspath(__file__)))
+"""Compatibility wrapper for the stage-based installer.
 
-ascii_logo = """
-__     ___     _            _     _                    
-\ \   / (_) __| | ___  ___ | |   (_)_ __   __ _  ___  
- \ \ / /| |/ _` |/ _ \/ _ \| |   | | '_ \ / _` |/ _ \ 
-  \ V / | | (_| |  __/ (_) | |___| | | | | (_| | (_) |
-   \_/  |_|\__,_|\___|\___/|_____|_|_| |_|\__, |\___/ 
-                                          |___/        
+Historically users ran ``python install.py`` and the app launched at the end.
+Keep that behavior here while moving the real installation logic to
+``installer.py`` so setup_env.py and launchers can reuse it safely.
 """
 
-def install_package(*packages):
-    subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
+from __future__ import annotations
 
-def check_nvidia_gpu():
-    install_package("nvidia-ml-py")
-    import pynvml
-    from translations.translations import translate as t
-    initialized = False
-    try:
-        pynvml.nvmlInit()
-        initialized = True
-        device_count = pynvml.nvmlDeviceGetCount()
-        if device_count > 0:
-            print(t("Detected NVIDIA GPU(s)"))
-            for i in range(device_count):
-                handle = pynvml.nvmlDeviceGetHandleByIndex(i)
-                name = pynvml.nvmlDeviceGetName(handle)
-                print(f"GPU {i}: {name}")
-            return True
-        else:
-            print(t("No NVIDIA GPU detected"))
-            return False
-    except pynvml.NVMLError:
-        print(t("No NVIDIA GPU detected or NVIDIA drivers not properly installed"))
-        return False
-    finally:
-        if initialized:
-            pynvml.nvmlShutdown()
+import sys
 
-def check_ffmpeg():
-    from rich.console import Console
-    from rich.panel import Panel
-    from translations.translations import translate as t
-    console = Console()
+from installer import main
 
-    try:
-        # Check if ffmpeg is installed
-        subprocess.run(['ffmpeg', '-version'], stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
-        console.print(Panel(t("✅ FFmpeg is already installed"), style="green"))
-    except (subprocess.CalledProcessError, FileNotFoundError):
-        system = platform.system()
-        install_cmd = ""
-        
-        if system == "Windows":
-            install_cmd = "choco install ffmpeg"
-            extra_note = t("Install Chocolatey first (https://chocolatey.org/)")
-        elif system == "Darwin":
-            install_cmd = "brew install ffmpeg"
-            extra_note = t("Install Homebrew first (https://brew.sh/)")
-        elif system == "Linux":
-            install_cmd = "sudo apt install ffmpeg  # Ubuntu/Debian\nsudo yum install ffmpeg  # CentOS/RHEL"
-            extra_note = t("Use your distribution's package manager")
-        
-        console.print(Panel.fit(
-            t("❌ FFmpeg not found\n\n") +
-            f"{t('🛠️ Install using:')}\n[bold cyan]{install_cmd}[/bold cyan]\n\n" +
-            f"{t('💡 Note:')}\n{extra_note}\n\n" +
-            f"{t('🔄 After installing FFmpeg, please run this installer again:')}\n[bold cyan]python install.py[/bold cyan]",
-            style="red"
-        ))
-        raise SystemExit(t("FFmpeg is required. Please install it and run the installer again."))
-
-    # Warn if ffmpeg lacks libmp3lame (common with conda-forge builds)
-    try:
-        result = subprocess.run(['ffmpeg', '-encoders'], capture_output=True, text=True, timeout=10)
-        if 'libmp3lame' not in result.stdout:
-            console.print(Panel.fit(
-                "⚠️ Your ffmpeg does not include [bold]libmp3lame[/bold] (MP3 encoder).\n"
-                "This is common with conda-forge ffmpeg builds.\n\n"
-                "VideoLingo will fall back to WAV encoding automatically, but for\n"
-                "smaller intermediate files, consider installing a full ffmpeg:\n\n"
-                "[bold cyan]" + (
-                    "winget install Gyan.FFmpeg" if platform.system() == "Windows"
-                    else "brew install ffmpeg" if platform.system() == "Darwin"
-                    else "sudo apt install ffmpeg"
-                ) + "[/bold cyan]",
-                style="yellow"
-            ))
-    except Exception:
-        pass
-
-def _detect_cuda_version_from_smi():
-    """Detect CUDA version from nvidia-smi output (driver's CUDA capability)."""
-    import re
-    try:
-        result = subprocess.run(
-            ["nvidia-smi"], capture_output=True, text=True, timeout=10
-        )
-        m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", result.stdout)
-        if m:
-            return (int(m.group(1)), int(m.group(2)))
-    except Exception:
-        pass
-    return None
-
-
-def _detect_cuda_index():
-    """Detect the CUDA version and return the best PyTorch wheel index URL.
-    Falls back to cu126 when detection fails.
-
-    For RTX 50 series (Blackwell architecture, compute capability 10.0+),
-    we need PyTorch wheels compiled with CUDA 12.8+ that include sm_100 kernels.
-
-    We prefer nvidia-smi (driver CUDA version) over nvcc (toolkit version) because:
-    - Driver version determines what CUDA features the GPU can run at runtime
-    - Toolkit version is for compilation, not runtime compatibility
-    - Blackwell GPUs need cu129+ wheels even if user has older CUDA toolkit installed
-    """
-    cuda_version = _detect_cuda_version_from_smi()
-
-    # Map CUDA major.minor to PyTorch wheel index.
-    # For CUDA 13.x (RTX 50 series / Blackwell), use cu129 which includes sm_100 kernels.
-    INDEX = "https://download.pytorch.org/whl"
-    CU_TAGS = [
-        ((13, 0), "cu129"),  # CUDA 13.x (Blackwell / RTX 50 series)
-        ((12, 9), "cu129"),  # CUDA 12.9+
-        ((12, 8), "cu128"),  # CUDA 12.8+
-        ((12, 6), "cu126"),  # CUDA 12.6+
-    ]
-
-    if cuda_version:
-        for min_ver, tag in CU_TAGS:
-            if cuda_version >= min_ver:
-                return f"{INDEX}/{tag}"
-
-    # Default: cu126 is the broadest CUDA 12 index for PyTorch 2.8
-    return f"{INDEX}/cu126"
-
-def main():
-    install_package("requests", "rich", "ruamel.yaml", "InquirerPy")
-    from rich.console import Console
-    from rich.panel import Panel
-    from rich.box import DOUBLE
-    from InquirerPy import inquirer
-    from translations.translations import translate as t
-    from translations.translations import DISPLAY_LANGUAGES
-    from core.utils.config_utils import load_key, update_key
-    from core.utils.decorator import except_handler
-
-    console = Console()
-    
-    width = max(len(line) for line in ascii_logo.splitlines()) + 4
-    welcome_panel = Panel(
-        ascii_logo,
-        width=width,
-        box=DOUBLE,
-        title="[bold green]🌏[/bold green]",
-        border_style="bright_blue"
-    )
-    console.print(welcome_panel)
-    # Language selection
-    current_language = load_key("display_language")
-    # Find the display name for current language code
-    current_display = next((k for k, v in DISPLAY_LANGUAGES.items() if v == current_language), "🇬🇧 English")
-    selected_language = DISPLAY_LANGUAGES[inquirer.select(
-        message="Select language / 选择语言 / 選擇語言 / 言語を選択 / Seleccionar idioma / Sélectionner la langue / Выберите язык:",
-        choices=list(DISPLAY_LANGUAGES.keys()),
-        default=current_display
-    ).execute()]
-    update_key("display_language", selected_language)
-
-    console.print(Panel.fit(t("🚀 Starting Installation"), style="bold magenta"))
-
-    # Configure mirrors
-    # add a check to ask user if they want to configure mirrors
-    if inquirer.confirm(
-        message=t("Do you need to auto-configure PyPI mirrors? (Recommended if you have difficulty accessing pypi.org)"),
-        default=True
-    ).execute():
-        from core.utils.pypi_autochoose import main as choose_mirror
-        choose_mirror()
-
-    # Detect system and GPU
-    has_gpu = platform.system() != 'Darwin' and check_nvidia_gpu()
-
-    @except_handler("Failed to install PyTorch", retry=1, delay=5)
-    def install_pytorch():
-        if has_gpu:
-            console.print(Panel(t("🎮 NVIDIA GPU detected, installing CUDA version of PyTorch..."), style="cyan"))
-            cuda_index = _detect_cuda_index()
-            console.print(f"[cyan]📦 Using PyTorch index:[/cyan] {cuda_index}")
-            subprocess.check_call([sys.executable, "-m", "pip", "install", "torch==2.8.0", "torchaudio==2.8.0", "--index-url", cuda_index])
-        else:
-            system_name = "🍎 MacOS" if platform.system() == 'Darwin' else "💻 No NVIDIA GPU"
-            console.print(Panel(t(f"{system_name} detected, installing CPU version of PyTorch... Note: it might be slow during whisperX transcription."), style="cyan"))
-            subprocess.check_call([sys.executable, "-m", "pip", "install", "torch==2.8.0", "torchaudio==2.8.0"])
-
-    @except_handler("Failed to install project", retry=1, delay=5)
-    def install_requirements():
-        # Install demucs separately with --no-deps to avoid its outdated
-        # torchaudio<2.2 constraint conflicting with whisperx's torchaudio>=2.5.1.
-        # demucs works fine with torchaudio 2.6.0 at runtime.
-        console.print(Panel(t("Installing demucs (--no-deps to avoid torchaudio conflict)..."), style="cyan"))
-        subprocess.check_call([sys.executable, "-m", "pip", "install", "--no-deps", "demucs[dev]@git+https://github.com/adefossez/demucs"])
-        # demucs --no-deps skips its own dependencies; install the ones it
-        # actually needs at runtime that aren't already pulled in elsewhere.
-        console.print(Panel(t("Installing demucs runtime dependencies..."), style="cyan"))
-        subprocess.check_call([sys.executable, "-m", "pip", "install", "dora-search", "openunmix", "lameenc"])
-
-        console.print(Panel(t("Installing project in editable mode using `pip install -e .`"), style="cyan"))
-        subprocess.check_call([sys.executable, "-m", "pip", "install", "-e", "."], env={**os.environ, "PYTHONIOENCODING": "utf-8"})
-
-    @except_handler("Failed to install Noto fonts")
-    def install_noto_font():
-        # Detect Linux distribution type
-        if os.path.exists('/etc/debian_version'):
-            # Debian/Ubuntu systems
-            cmd = ['sudo', 'apt-get', 'install', '-y', 'fonts-noto']
-            pkg_manager = "apt-get"
-        elif os.path.exists('/etc/redhat-release'):
-            # RHEL/CentOS/Fedora systems
-            cmd = ['sudo', 'yum', 'install', '-y', 'google-noto*']
-            pkg_manager = "yum"
-        else:
-            console.print("Warning: Unrecognized Linux distribution, please install Noto fonts manually", style="yellow")
-            return
-
-        subprocess.run(cmd, check=True)
-        console.print(f"✅ Successfully installed Noto fonts using {pkg_manager}", style="green")
-
-    if platform.system() == 'Linux':
-        install_noto_font()
-    
-    install_pytorch()
-    install_requirements()
-    check_ffmpeg()
-    
-    # First panel with installation complete and startup command
-    panel1_text = (
-        t("Installation completed") + "\n\n" +
-        t("Now I will run this command to start the application:") + "\n" +
-        "[bold]streamlit run st.py[/bold]\n" +
-        t("Note: First startup may take up to 1 minute")
-    )
-    console.print(Panel(panel1_text, style="bold green"))
-
-    # Second panel with troubleshooting tips
-    panel2_text = (
-        t("If the application fails to start:") + "\n" +
-        "1. " + t("Check your network connection") + "\n" +
-        "2. " + t("Re-run the installer: [bold]python install.py[/bold]")
-    )
-    console.print(Panel(panel2_text, style="yellow"))
-
-    # start the application
-    subprocess.Popen([sys.executable, "-m", "streamlit", "run", "st.py"])
 
 if __name__ == "__main__":
-    main()
+    args = sys.argv[1:]
+    if "--check" not in args and "--launch" not in args and "--no-launch" not in args:
+        args.append("--launch")
+    raise SystemExit(main(args))
diff --git a/installer.py b/installer.py
new file mode 100644
index 00000000..aa58e235
--- /dev/null
+++ b/installer.py
@@ -0,0 +1,389 @@
+"""Resumable VideoLingo installer and environment checker.
+
+This script is intentionally split from setup_env.py:
+- setup_env.py creates/selects the venv.
+- installer.py installs packages inside the selected venv.
+- OneKeyStart.bat starts the app and can call ``installer.py --check``.
+
+The installer is stage-based and safe to rerun. Network-sensitive optional
+packages (Demucs, spaCy model downloads) warn instead of breaking the whole
+installation.
+"""
+
+from __future__ import annotations
+
+import argparse
+import hashlib
+import importlib
+import importlib.metadata as metadata
+import json
+import os
+import platform
+import re
+import shutil
+import subprocess
+import sys
+import time
+from pathlib import Path
+
+
+ROOT = Path(__file__).resolve().parent
+STATE_FILE = Path(sys.prefix) / ".videolingo-install.json"
+REQUIREMENTS = ROOT / "requirements.txt"
+
+TORCH_VERSION = "2.8.0"
+TORCH_INDEX = "https://download.pytorch.org/whl"
+BOOTSTRAP_PACKAGES = ["requests", "rich", "ruamel.yaml", "InquirerPy", "packaging"]
+FILTERED_REQUIREMENTS = {"spacy", "whisperx"}
+DEMUX_GIT = "demucs[dev]@git+https://github.com/adefossez/demucs@b9ab48cad45976ba42b2ff17b229c071f0df9390"
+
+
+def run(cmd: list[str], retries: int = 0, env: dict[str, str] | None = None) -> None:
+    for attempt in range(retries + 1):
+        print("  > " + " ".join(str(x) for x in cmd), flush=True)
+        proc = subprocess.run(cmd, cwd=ROOT, env=env)
+        if proc.returncode == 0:
+            return
+        if attempt < retries:
+            delay = min(20, 3 * (attempt + 1))
+            print(f"  Command failed, retrying in {delay}s ({attempt + 1}/{retries})...")
+            time.sleep(delay)
+    raise subprocess.CalledProcessError(proc.returncode, cmd)
+
+
+def pip_install(packages: list[str], retries: int = 2, extra_args: list[str] | None = None) -> None:
+    if not packages:
+        return
+    cmd = [
+        sys.executable,
+        "-m",
+        "pip",
+        "install",
+        "--disable-pip-version-check",
+        "--prefer-binary",
+        "--retries",
+        "5",
+        "--timeout",
+        "120",
+    ]
+    if extra_args:
+        cmd.extend(extra_args)
+    cmd.extend(packages)
+    env = os.environ.copy()
+    env.setdefault("PIP_NO_INPUT", "1")
+    run(cmd, retries=retries, env=env)
+
+
+def soft_pip_install(packages: list[str], retries: int = 1, extra_args: list[str] | None = None) -> bool:
+    try:
+        pip_install(packages, retries=retries, extra_args=extra_args)
+        return True
+    except Exception as exc:
+        print(f"  Warning: optional install failed: {exc}")
+        return False
+
+
+def package_version(name: str) -> str | None:
+    try:
+        return metadata.version(name)
+    except metadata.PackageNotFoundError:
+        return None
+
+
+def package_ok(name: str, prefix: str | None = None) -> bool:
+    version = package_version(name)
+    if version is None:
+        return False
+    return prefix is None or version.split("+")[0].startswith(prefix)
+
+
+def import_ok(module: str) -> bool:
+    try:
+        importlib.import_module(module)
+        return True
+    except Exception:
+        return False
+
+
+def requirements_hash() -> str:
+    h = hashlib.sha256()
+    h.update(REQUIREMENTS.read_bytes())
+    h.update(f"torch={TORCH_VERSION}\n".encode())
+    h.update(DEMUX_GIT.encode())
+    return h.hexdigest()
+
+
+def load_state() -> dict:
+    if not STATE_FILE.exists():
+        return {}
+    try:
+        return json.loads(STATE_FILE.read_text(encoding="utf-8"))
+    except Exception:
+        return {}
+
+
+def save_state() -> None:
+    data = {
+        "requirements_hash": requirements_hash(),
+        "python": sys.version.split()[0],
+        "torch": package_version("torch"),
+        "torchaudio": package_version("torchaudio"),
+        "spacy": package_version("spacy"),
+        "whisperx": package_version("whisperx"),
+        "demucs": package_version("demucs"),
+        "updated_at": time.strftime("%Y-%m-%d %H:%M:%S"),
+    }
+    STATE_FILE.write_text(json.dumps(data, indent=2), encoding="utf-8")
+
+
+def requirement_name(line: str) -> str | None:
+    line = line.strip()
+    if not line or line.startswith("#") or line.startswith("-"):
+        return None
+    line = line.split(";", 1)[0].strip()
+    name = re.split(r"\s*(?:==|>=|<=|~=|!=|>|<|\[)", line, maxsplit=1)[0]
+    return name.strip().lower().replace("_", "-") or None
+
+
+def read_base_requirements() -> list[str]:
+    reqs: list[str] = []
+    for raw in REQUIREMENTS.read_text(encoding="utf-8").splitlines():
+        name = requirement_name(raw)
+        if not name or name in FILTERED_REQUIREMENTS:
+            continue
+        reqs.append(raw.strip())
+    return reqs
+
+
+def detect_nvidia_gpu() -> bool:
+    if platform.system() == "Darwin":
+        return False
+    try:
+        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=10)
+        return result.returncode == 0
+    except Exception:
+        return False
+
+
+def detect_cuda_version_from_smi() -> tuple[int, int] | None:
+    try:
+        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True, timeout=10)
+        match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", result.stdout)
+        if match:
+            return int(match.group(1)), int(match.group(2))
+    except Exception:
+        pass
+    return None
+
+
+def detect_torch_index() -> str:
+    cuda_version = detect_cuda_version_from_smi()
+    tags = [
+        ((13, 0), "cu129"),
+        ((12, 9), "cu129"),
+        ((12, 8), "cu128"),
+        ((12, 6), "cu126"),
+    ]
+    if cuda_version:
+        for minimum, tag in tags:
+            if cuda_version >= minimum:
+                return f"{TORCH_INDEX}/{tag}"
+    return f"{TORCH_INDEX}/cu126"
+
+
+def install_bootstrap() -> None:
+    print("\n[1/7] Bootstrap installer packages")
+    missing = [pkg for pkg in BOOTSTRAP_PACKAGES if package_version(pkg) is None]
+    if missing:
+        pip_install(missing)
+    else:
+        print("  Bootstrap packages already installed.")
+
+
+def maybe_configure_mirror(auto_mirror: bool) -> None:
+    if not auto_mirror:
+        return
+    print("\n[2/7] Configure PyPI mirror")
+    try:
+        from core.utils.pypi_autochoose import main as choose_mirror
+
+        choose_mirror()
+    except Exception as exc:
+        print(f"  Warning: mirror auto-config failed: {exc}")
+
+
+def install_torch(force: bool = False) -> None:
+    print("\n[3/7] Install PyTorch / torchaudio")
+    if not force and package_ok("torch", TORCH_VERSION) and package_ok("torchaudio", TORCH_VERSION):
+        print(f"  torch {package_version('torch')} and torchaudio {package_version('torchaudio')} already installed.")
+        return
+    packages = [f"torch=={TORCH_VERSION}", f"torchaudio=={TORCH_VERSION}"]
+    if detect_nvidia_gpu():
+        index = detect_torch_index()
+        print(f"  NVIDIA GPU detected. Using PyTorch index: {index}")
+        pip_install(packages, retries=3, extra_args=["--index-url", index])
+    else:
+        print("  No NVIDIA GPU detected. Installing CPU PyTorch wheels.")
+        pip_install(packages, retries=3)
+
+
+def install_base_requirements(force: bool = False) -> None:
+    print("\n[4/7] Install base requirements")
+    state = load_state()
+    current_hash = requirements_hash()
+    previous_hash = state.get("requirements_hash")
+    if not force and previous_hash == current_hash and health_check(quiet=True, require_demucs=False, check_state=False) == 0:
+        print("  Environment already matches requirements hash; skipping base install.")
+        return
+    if not force and previous_hash is None and health_check(quiet=True, require_demucs=False, check_state=False) == 0:
+        print("  Packages are already healthy; writing fresh install state later.")
+        return
+    if previous_hash and previous_hash != current_hash:
+        print("  requirements.txt changed; syncing base requirements.")
+    pip_install(read_base_requirements(), retries=3)
+
+
+def install_spacy(force: bool = False) -> None:
+    print("\n[5/7] Install spaCy")
+    if not force and package_ok("spacy", "3.8."):
+        print(f"  spacy {package_version('spacy')} already installed.")
+        return
+    # Keep this flexible. Exact spaCy patch releases can disappear for a Python
+    # minor version, which made plain `pip install -r requirements.txt` brittle.
+    pip_install(["spacy>=3.8.7,<3.9"], retries=3)
+
+
+def install_whisperx(force: bool = False) -> None:
+    print("\n[6/7] Install WhisperX")
+    if not force and package_version("whisperx") is not None:
+        print(f"  whisperx {package_version('whisperx')} already installed.")
+        return
+    pip_install(["whisperx>=3.8.1"], retries=3)
+
+
+def install_demucs(force: bool = False, require: bool = False) -> None:
+    print("\n[7/7] Install Demucs (optional)")
+    if not force and package_version("demucs") is not None and import_ok("demucs.api"):
+        print(f"  demucs {package_version('demucs')} already installed.")
+        return
+    pip_install(["dora-search", "openunmix", "lameenc"], retries=3)
+    if soft_pip_install([DEMUX_GIT], retries=2, extra_args=["--no-deps"]):
+        return
+    print("  Falling back to PyPI demucs. Demucs is optional; install can continue if this fails.")
+    ok = soft_pip_install(["demucs==4.0.1"], retries=2, extra_args=["--no-deps"])
+    if require and not ok:
+        raise RuntimeError("Demucs installation failed")
+
+
+def install_project_metadata() -> None:
+    print("\n[post] Register project metadata (no dependency resolution)")
+    soft_pip_install(["-e", str(ROOT)], retries=1, extra_args=["--no-deps"])
+
+
+def check_ffmpeg() -> bool:
+    if not shutil.which("ffmpeg"):
+        print("  ERROR: ffmpeg not found in PATH.")
+        if platform.system() == "Windows":
+            print("  Install with: winget install Gyan.FFmpeg")
+        elif platform.system() == "Darwin":
+            print("  Install with: brew install ffmpeg")
+        else:
+            print("  Install with your distribution package manager, e.g. sudo apt install ffmpeg")
+        return False
+    return True
+
+
+def health_check(quiet: bool = False, require_demucs: bool = False, check_state: bool = True) -> int:
+    errors: list[str] = []
+    warnings: list[str] = []
+    state = load_state()
+    if check_state:
+        if state.get("requirements_hash") and state.get("requirements_hash") != requirements_hash():
+            errors.append("requirements changed since the last install; rerun installer.py")
+        elif not state.get("requirements_hash"):
+            errors.append("install state file is missing; rerun installer.py once to enable change detection")
+    required = {
+        "streamlit": None,
+        "openai": None,
+        "pandas": None,
+        "torch": TORCH_VERSION,
+        "torchaudio": TORCH_VERSION,
+        "spacy": "3.8.",
+        "whisperx": None,
+    }
+    for package, prefix in required.items():
+        version = package_version(package)
+        if version is None:
+            errors.append(f"missing package: {package}")
+        elif prefix and not version.split("+")[0].startswith(prefix):
+            errors.append(f"{package} version {version} does not match expected {prefix}*")
+    if require_demucs and package_version("demucs") is None:
+        errors.append("missing optional package required by flag: demucs")
+    elif package_version("demucs") is None:
+        warnings.append("demucs is not installed; vocal separation will be unavailable")
+    if not shutil.which("ffmpeg"):
+        errors.append("ffmpeg not found in PATH")
+    if not quiet:
+        print("\nEnvironment check")
+        for package in ["streamlit", "torch", "torchaudio", "spacy", "whisperx", "demucs"]:
+            print(f"  {package}: {package_version(package) or 'missing'}")
+        for warning in warnings:
+            print(f"  WARN: {warning}")
+        for error in errors:
+            print(f"  ERROR: {error}")
+    return 1 if errors else 0
+
+
+def launch_streamlit() -> int:
+    env = os.environ.copy()
+    env["PYTHONWARNINGS"] = "ignore"
+    return subprocess.run([sys.executable, "-m", "streamlit", "run", "st.py"], cwd=ROOT, env=env).returncode
+
+
+def install_all(args: argparse.Namespace) -> int:
+    install_bootstrap()
+    maybe_configure_mirror(args.auto_mirror)
+    install_torch(force=args.force)
+    install_base_requirements(force=args.force)
+    install_spacy(force=args.force)
+    install_whisperx(force=args.force)
+    if not args.skip_demucs:
+        install_demucs(force=args.force, require=args.require_demucs)
+    install_project_metadata()
+    ffmpeg_ok = check_ffmpeg()
+    save_state()
+    status = health_check(require_demucs=args.require_demucs)
+    if not ffmpeg_ok or status != 0:
+        return 1
+    if args.launch:
+        return launch_streamlit()
+    print("\nInstall complete. Start with OneKeyStart.bat or: python -m streamlit run st.py")
+    return 0
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Install or check VideoLingo dependencies")
+    parser.add_argument("--check", action="store_true", help="check environment health only")
+    parser.add_argument("--quiet", action="store_true", help="quiet check output")
+    parser.add_argument("--force", action="store_true", help="force reinstall staged packages")
+    parser.add_argument("--auto-mirror", action="store_true", help="auto-select and configure a PyPI mirror")
+    parser.add_argument("--skip-demucs", action="store_true", help="skip optional Demucs install")
+    parser.add_argument("--require-demucs", action="store_true", help="fail if Demucs cannot be installed")
+    parser.add_argument("--launch", action="store_true", help="launch Streamlit after a successful install")
+    parser.add_argument("--yes", action="store_true", help="no-op; accepted so non-interactive wrappers can pass it")
+    parser.add_argument("--no-launch", action="store_true", help="compatibility alias; launching is opt-in")
+    return parser
+
+
+def main(argv: list[str] | None = None) -> int:
+    parser = build_parser()
+    args = parser.parse_args(argv)
+    if args.no_launch:
+        args.launch = False
+    if args.check:
+        return health_check(quiet=args.quiet, require_demucs=args.require_demucs)
+    return install_all(args)
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
diff --git a/requirements.txt b/requirements.txt
index 2050d02c..cd30fb20 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -13,7 +13,9 @@ PyYAML==6.0.3
 replicate==0.33.0
 requests==2.32.5
 resampy==0.4.3
-spacy==3.8.11
+# Keep spaCy on the 3.8 line. Exact patch pins can be unavailable for a
+# specific Python minor version on some mirrors, which makes setup brittle.
+spacy>=3.8.7,<3.9
 streamlit==1.49.1
 streamlit-searchbox
 yt-dlp
diff --git a/setup.py b/setup.py
index b1021893..f122e3e7 100644
--- a/setup.py
+++ b/setup.py
@@ -1,7 +1,7 @@
 from setuptools import setup, find_packages
 
 NAME = 'VideoLingo'
-VERSION = '3.0.2'
+VERSION = '3.0.3'
 
 with open('requirements.txt', encoding='utf-8') as f:
     requirements = f.read().splitlines()
diff --git a/setup_env.py b/setup_env.py
index da097336..657ab900 100644
--- a/setup_env.py
+++ b/setup_env.py
@@ -1,221 +1,181 @@
-"""
-VideoLingo Environment Setup (No Anaconda Required)
-
-This script provides a conda-free installation path using `uv` (by Astral).
-It automatically:
-  1. Installs uv if not found
-  2. Creates a .venv with Python 3.10
-  3. Runs install.py inside the venv
+"""Create a VideoLingo Python environment, then run the stage-based installer.
 
-Usage:
-  python setup_env.py          # Full setup (any system Python 3.x works)
-  python setup_env.py --skip-install  # Only create venv, don't run install.py
+Default behavior creates a project-local ``.venv``. Use ``--shared`` to create
+or reuse ``~/.venvs/videolingo`` so multiple VideoLingo checkouts share the same
+heavy dependencies (PyTorch, WhisperX, Demucs, etc.).
 """
 
+from __future__ import annotations
+
+import argparse
 import os
-import sys
+import platform
 import shutil
 import subprocess
-import platform
+import sys
+from pathlib import Path
+
 
 PYTHON_VERSION = "3.10"
-VENV_DIR = ".venv"
-SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
+SCRIPT_DIR = Path(__file__).resolve().parent
+LOCAL_VENV = SCRIPT_DIR / ".venv"
+SHARED_VENV = Path.home() / ".venvs" / "videolingo"
 
 
-def run(cmd, check=True, **kwargs):
-    """Run a command and return the CompletedProcess."""
-    print(f"  > {' '.join(cmd) if isinstance(cmd, list) else cmd}")
+def run(cmd: list[str], check: bool = True, **kwargs) -> subprocess.CompletedProcess:
+    print("  > " + " ".join(str(x) for x in cmd))
     return subprocess.run(cmd, check=check, **kwargs)
 
 
-def is_uv_installed():
-    """Check if uv is available on PATH."""
+def is_uv_installed() -> bool:
     return shutil.which("uv") is not None
 
 
-def install_uv():
-    """Install uv using platform-appropriate method with fallbacks."""
-    print("\n[1/3] Installing uv...")
-
+def install_uv() -> None:
+    print("\n[1/3] Checking uv")
     if is_uv_installed():
-        ver = subprocess.run(
-            ["uv", "--version"], capture_output=True, text=True
-        ).stdout.strip()
+        ver = subprocess.run(["uv", "--version"], capture_output=True, text=True).stdout.strip()
         print(f"  uv is already installed: {ver}")
         return
 
-    system = platform.system()
-    if system == "Windows":
-        _install_uv_windows()
+    if platform.system() == "Windows":
+        methods = [
+            ["winget", "install", "astral-sh.uv", "--accept-package-agreements", "--accept-source-agreements"],
+            ["powershell", "-ExecutionPolicy", "ByPass", "-c", "irm https://astral.sh/uv/install.ps1 | iex"],
+            [sys.executable, "-m", "pip", "install", "uv"],
+        ]
     else:
-        # macOS / Linux
-        try:
-            run(["sh", "-c", "curl -LsSf https://astral.sh/uv/install.sh | sh"])
-        except subprocess.CalledProcessError:
-            print("  curl installer failed, trying pip...")
-            run([sys.executable, "-m", "pip", "install", "uv"])
-
-    # After installation, uv may not be on PATH in the current session.
-    if not is_uv_installed():
-        _add_uv_to_path()
-
-    if not is_uv_installed():
-        print(
-            "\n*** ERROR: uv was installed but not found on PATH. ***\n"
-            "Please restart your terminal and run this script again.\n"
-            "Or install uv manually: https://docs.astral.sh/uv/getting-started/installation/"
-        )
-        sys.exit(1)
-
-    ver = subprocess.run(
-        ["uv", "--version"], capture_output=True, text=True
-    ).stdout.strip()
-    print(f"  uv installed successfully: {ver}")
-
-
-def _install_uv_windows():
-    """Try multiple methods to install uv on Windows."""
-    methods = [
-        ("winget", ["winget", "install", "astral-sh.uv",
-                     "--accept-package-agreements", "--accept-source-agreements"]),
-        ("PowerShell installer", [
-            "powershell", "-ExecutionPolicy", "ByPass", "-c",
-            "irm https://astral.sh/uv/install.ps1 | iex"
-        ]),
-        ("pip", [sys.executable, "-m", "pip", "install", "uv"]),
-    ]
+        methods = [
+            ["sh", "-c", "curl -LsSf https://astral.sh/uv/install.sh | sh"],
+            [sys.executable, "-m", "pip", "install", "uv"],
+        ]
 
-    for name, cmd in methods:
+    for cmd in methods:
         try:
-            print(f"  Trying {name}...")
             run(cmd)
-            # Check if PATH needs updating after install
-            if not is_uv_installed():
-                _add_uv_to_path()
+            add_uv_to_path()
             if is_uv_installed():
+                print("  uv installed successfully")
                 return
         except (subprocess.CalledProcessError, FileNotFoundError):
-            print(f"  {name} failed, trying next method...")
-            continue
+            print("  install method failed, trying next method...")
 
-    print("  All installation methods failed.")
+    raise SystemExit("ERROR: uv could not be installed. Install it manually: https://docs.astral.sh/uv/")
 
 
-def _add_uv_to_path():
-    """Try to add uv's default install location to PATH for this session."""
-    home = os.path.expanduser("~")
+def add_uv_to_path() -> None:
     candidates = [
-        os.path.join(home, ".local", "bin"),
-        os.path.join(home, ".cargo", "bin"),
-        os.path.join(os.environ.get("LOCALAPPDATA", ""), "uv", "bin"),
-        os.path.join(os.environ.get("LOCALAPPDATA", ""), "Programs", "uv"),
+        Path.home() / ".local" / "bin",
+        Path.home() / ".cargo" / "bin",
+        Path(os.environ.get("LOCALAPPDATA", "")) / "uv" / "bin",
+        Path(os.environ.get("LOCALAPPDATA", "")) / "Programs" / "uv",
+        Path(os.environ.get("LOCALAPPDATA", "")) / "Microsoft" / "WinGet" / "Links",
     ]
-    for p in candidates:
-        if not os.path.isdir(p):
-            continue
-        uv_name = "uv.exe" if platform.system() == "Windows" else "uv"
-        if os.path.isfile(os.path.join(p, uv_name)):
-            os.environ["PATH"] = p + os.pathsep + os.environ["PATH"]
+    name = "uv.exe" if platform.system() == "Windows" else "uv"
+    for path in candidates:
+        if (path / name).is_file():
+            os.environ["PATH"] = str(path) + os.pathsep + os.environ.get("PATH", "")
             return
 
 
-def create_venv():
-    """Create a virtual environment with Python 3.10 using uv."""
-    print(f"\n[2/3] Creating virtual environment with Python {PYTHON_VERSION}...")
-
-    venv_path = os.path.join(SCRIPT_DIR, VENV_DIR)
-
-    if os.path.exists(venv_path):
-        # Check if existing venv has the right Python version
-        python_exe = _get_venv_python(venv_path)
-        if python_exe and os.path.isfile(python_exe):
-            result = subprocess.run(
-                [python_exe, "--version"], capture_output=True, text=True
-            )
-            ver = result.stdout.strip()
-            if "3.10" in ver:
-                print(f"  .venv already exists with {ver}, reusing it.")
-                return python_exe
-
-        print("  Removing existing .venv (wrong Python version)...")
-        shutil.rmtree(venv_path, ignore_errors=True)
-
-    # uv venv will auto-download Python 3.10 if not present
-    # --seed installs pip/setuptools into the venv (install.py needs pip)
-    run(["uv", "venv", "--seed", "--python", PYTHON_VERSION, VENV_DIR], cwd=SCRIPT_DIR)
-
-    python_exe = _get_venv_python(venv_path)
-    if not python_exe or not os.path.isfile(python_exe):
-        print("*** ERROR: Failed to create virtual environment. ***")
-        sys.exit(1)
-
-    result = subprocess.run(
-        [python_exe, "--version"], capture_output=True, text=True
-    )
-    print(f"  Virtual environment created: {result.stdout.strip()}")
-    return python_exe
-
-
-def _get_venv_python(venv_path):
-    """Get the Python executable path inside the venv."""
+def venv_python(venv_path: Path) -> Path:
     if platform.system() == "Windows":
-        return os.path.join(venv_path, "Scripts", "python.exe")
-    else:
-        return os.path.join(venv_path, "bin", "python")
-
+        return venv_path / "Scripts" / "python.exe"
+    return venv_path / "bin" / "python"
 
-def run_install(python_exe):
-    """Run install.py using the venv's Python."""
-    print("\n[3/3] Running install.py...")
-    install_script = os.path.join(SCRIPT_DIR, "install.py")
 
-    # Prepare env for install.py subprocess:
-    env = os.environ.copy()
-    # 1. Avoid pip cache permission errors (common on Windows when cache dir
-    #    is locked or has restrictive ACLs from a previous Python install)
-    env["PIP_NO_CACHE_DIR"] = "1"
-    # 2. Put venv Scripts/bin on PATH so install.py can find streamlit etc.
-    venv_path = os.path.join(SCRIPT_DIR, VENV_DIR)
+def venv_bin(venv_path: Path) -> Path:
     if platform.system() == "Windows":
-        venv_bin = os.path.join(venv_path, "Scripts")
-    else:
-        venv_bin = os.path.join(venv_path, "bin")
-    env["PATH"] = venv_bin + os.pathsep + env.get("PATH", "")
+        return venv_path / "Scripts"
+    return venv_path / "bin"
+
+
+def python_version_ok(python_exe: Path) -> bool:
+    if not python_exe.is_file():
+        return False
+    result = subprocess.run([str(python_exe), "--version"], capture_output=True, text=True)
+    return f"Python {PYTHON_VERSION}." in (result.stdout or result.stderr)
+
+
+def create_venv(path: Path, yes: bool = False) -> Path:
+    print(f"\n[2/3] Creating/reusing virtual environment: {path}")
+    python_exe = venv_python(path)
+    if python_version_ok(python_exe):
+        result = subprocess.run([str(python_exe), "--version"], capture_output=True, text=True)
+        print(f"  Reusing existing venv: {result.stdout.strip() or result.stderr.strip()}")
+        return python_exe
+
+    if path.exists():
+        if not yes:
+            answer = input(f"  Existing venv at {path} is not Python {PYTHON_VERSION}. Remove and recreate it? [y/N] ").strip().lower()
+            if answer != "y":
+                raise SystemExit("Cancelled.")
+        shutil.rmtree(path, ignore_errors=True)
+
+    path.parent.mkdir(parents=True, exist_ok=True)
+    run(["uv", "venv", "--seed", "--python", PYTHON_VERSION, str(path)], cwd=SCRIPT_DIR)
+    if not python_version_ok(python_exe):
+        raise SystemExit(f"ERROR: failed to create a Python {PYTHON_VERSION} virtual environment")
+    return python_exe
 
-    run([python_exe, install_script], cwd=SCRIPT_DIR, env=env)
 
+def run_installer(python_exe: Path, args: argparse.Namespace) -> None:
+    print("\n[3/3] Installing VideoLingo dependencies")
+    env = os.environ.copy()
+    env["PATH"] = str(venv_bin(python_exe.parent.parent)) + os.pathsep + env.get("PATH", "")
+    cmd = [str(python_exe), str(SCRIPT_DIR / "installer.py"), "--yes"]
+    if args.auto_mirror:
+        cmd.append("--auto-mirror")
+    if args.force:
+        cmd.append("--force")
+    if args.skip_demucs:
+        cmd.append("--skip-demucs")
+    if args.require_demucs:
+        cmd.append("--require-demucs")
+    run(cmd, cwd=SCRIPT_DIR, env=env)
+
+
+def build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Create and install a VideoLingo environment")
+    parser.add_argument("--shared", action="store_true", help=f"use shared venv at {SHARED_VENV}")
+    parser.add_argument("--path", help="custom venv path; takes precedence over --shared")
+    parser.add_argument("--skip-install", action="store_true", help="only create/reuse the venv")
+    parser.add_argument("--auto-mirror", action="store_true", help="auto-select a PyPI mirror before install")
+    parser.add_argument("--skip-demucs", action="store_true", help="skip optional Demucs install")
+    parser.add_argument("--require-demucs", action="store_true", help="fail if Demucs cannot be installed")
+    parser.add_argument("--force", action="store_true", help="force reinstall staged packages")
+    parser.add_argument("--yes", action="store_true", help="non-interactive; recreate wrong-version venvs")
+    return parser
+
+
+def main() -> None:
+    args = build_parser().parse_args()
+    target = Path(args.path).expanduser() if args.path else (SHARED_VENV if args.shared else LOCAL_VENV)
 
-def main():
     print("=" * 60)
-    print("  VideoLingo Environment Setup (conda-free)")
+    print("  VideoLingo Environment Setup")
     print("=" * 60)
-    print(f"\n  Project dir : {SCRIPT_DIR}")
+    print(f"  Project dir : {SCRIPT_DIR}")
     print(f"  Python ver  : {PYTHON_VERSION}")
-    print(f"  Venv dir    : {VENV_DIR}")
-
-    skip_install = "--skip-install" in sys.argv
+    print(f"  Venv path   : {target}")
 
     install_uv()
-    python_exe = create_venv()
+    python_exe = create_venv(target, yes=args.yes)
 
-    if skip_install:
-        print(f"\n  --skip-install: Skipping install.py")
-        print(f"\n  To install dependencies manually:")
-        print(f"    {python_exe} install.py")
+    if args.skip_install:
+        print("\n  --skip-install: dependencies were not installed")
+        print(f"  To install later: {python_exe} {SCRIPT_DIR / 'installer.py'} --yes")
     else:
-        run_install(python_exe)
+        run_installer(python_exe, args)
 
     print("\n" + "=" * 60)
-    print("  Setup complete!")
+    print("  Setup complete")
     print("=" * 60)
-    print(f"\n  To start VideoLingo:")
     if platform.system() == "Windows":
-        print(f"    .venv\\Scripts\\streamlit run st.py")
-        print(f"    (or double-click OneKeyStart_uv.bat)")
+        print("  Start with: OneKeyStart.bat")
     else:
-        print(f"    .venv/bin/streamlit run st.py")
-    print()
+        streamlit = venv_bin(target) / "streamlit"
+        print(f"  Start with: {streamlit} run st.py")
 
 
 if __name__ == "__main__":