7 changes: 7 additions & 0 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-20) — Voice-Command Router](#whats-new-2026-06-20--voice-command-router)
- [What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing](#whats-new-2026-06-20--locale-aware-number-currency--date-parsing)
- [What's new (2026-06-20) — Perceptual-Hash Image Dedupe](#whats-new-2026-06-20--perceptual-hash-image-dedupe)
- [What's new (2026-06-20) — S3-Compatible Artifact Store](#whats-new-2026-06-20--s3-compatible-artifact-store)
@@ -96,6 +97,12 @@

---

## What's new (2026-06-20) — Voice-Command Router

Trigger flows hands-free from recognized speech. Full reference: [`docs/source/Eng/doc/new_features/v44_features_doc.rst`](docs/source/Eng/doc/new_features/v44_features_doc.rst).

- **`VoiceRouter`** (`AC_voice_register` / `AC_voice_dispatch` / `AC_voice_list` / `AC_voice_clear`, `ac_*`): maps spoken trigger phrases to `AC_*` action lists; feed it recognized text and it runs the closest registered command (phrase matching reuses the fuzzy matcher, so "save the file" fires "save file"). **Speech-to-text is out of scope and injectable**: the router takes text plus `recognizer`/`runner` callables, so routing is fully unit-tested without audio or any speech dependency (a real Vosk/mic recognizer plugs into `listen_once`).
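
The fuzzy phrase matching described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: `difflib.SequenceMatcher` stands in for the project's fuzzy matcher (whose actual scoring is not shown in this change), and `best_phrase` is a hypothetical helper that is not part of `VoiceRouter`.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def best_phrase(text: str, phrases: Iterable[str],
                threshold: float = 0.7) -> Optional[str]:
    """Return the registered phrase most similar to ``text``, or None.

    Hypothetical helper: difflib is an assumed stand-in for the
    project's fuzzy matcher.
    """
    best, best_score = None, 0.0
    for phrase in phrases:
        score = SequenceMatcher(None, text.lower(), phrase.lower()).ratio()
        if score > best_score:
            best, best_score = phrase, score
    # Accept the best candidate only if it clears the threshold.
    return best if best_score >= threshold else None


registered = ["save file", "close window"]
print(best_phrase("save the file", registered))   # close enough to "save file"
print(best_phrase("open browser", registered))    # nothing clears the threshold
```

A threshold keeps unrelated utterances from firing a command; lowering it trades fewer missed commands for more accidental triggers.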

## What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing

Parse localized numbers/currency/dates. Full reference: [`docs/source/Eng/doc/new_features/v43_features_doc.rst`](docs/source/Eng/doc/new_features/v43_features_doc.rst).
7 changes: 7 additions & 0 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## 目录

- [本次更新 (2026-06-20) — 语音指令路由器](#本次更新-2026-06-20--语音指令路由器)
- [本次更新 (2026-06-20) — 区域设置感知的数字、货币与日期解析](#本次更新-2026-06-20--区域设置感知的数字货币与日期解析)
- [本次更新 (2026-06-20) — 感知哈希图像去重](#本次更新-2026-06-20--感知哈希图像去重)
- [本次更新 (2026-06-20) — S3 兼容成品存储](#本次更新-2026-06-20--s3-兼容成品存储)
@@ -95,6 +96,12 @@

---

## 本次更新 (2026-06-20) — 语音指令路由器

以已识别语音免手动触发流程。完整参考:[`docs/source/Zh/doc/new_features/v44_features_doc.rst`](../docs/source/Zh/doc/new_features/v44_features_doc.rst)。

- **`VoiceRouter`**(`AC_voice_register` / `AC_voice_dispatch` / `AC_voice_list` / `AC_voice_clear`、`ac_*`):将语音触发短语映射到 `AC_*` 动作列表;喂入已识别文本即执行最接近的已注册指令(短语匹配重用模糊匹配器,因此「save the file」会触发「save file」)。**语音转文本不在范围内且可注入** —— 路由器接受文本与 `recognizer`/`runner` 可调用对象,因此路由在无音频、无任何语音依赖下完整单元测试(真实 Vosk/麦克风识别器接入 `listen_once`)。

## 本次更新 (2026-06-20) — 区域设置感知的数字、货币与日期解析

解析本地化的数字/货币/日期。完整参考:[`docs/source/Zh/doc/new_features/v43_features_doc.rst`](../docs/source/Zh/doc/new_features/v43_features_doc.rst)。
7 changes: 7 additions & 0 deletions README/README_zh-TW.md
@@ -12,6 +12,7 @@

## 目錄

- [本次更新 (2026-06-20) — 語音指令路由器](#本次更新-2026-06-20--語音指令路由器)
- [本次更新 (2026-06-20) — 區域設定感知的數字、貨幣與日期解析](#本次更新-2026-06-20--區域設定感知的數字貨幣與日期解析)
- [本次更新 (2026-06-20) — 感知雜湊影像去重](#本次更新-2026-06-20--感知雜湊影像去重)
- [本次更新 (2026-06-20) — S3 相容成品儲存](#本次更新-2026-06-20--s3-相容成品儲存)
@@ -95,6 +96,12 @@

---

## 本次更新 (2026-06-20) — 語音指令路由器

以已辨識語音免手動觸發流程。完整參考:[`docs/source/Zh/doc/new_features/v44_features_doc.rst`](../docs/source/Zh/doc/new_features/v44_features_doc.rst)。

- **`VoiceRouter`**(`AC_voice_register` / `AC_voice_dispatch` / `AC_voice_list` / `AC_voice_clear`、`ac_*`):將語音觸發片語對應到 `AC_*` 動作清單;餵入已辨識文字即執行最接近的已註冊指令(片語比對重用模糊比對器,因此「save the file」會觸發「save file」)。**語音轉文字不在範圍內且可注入** —— 路由器接受文字與 `recognizer`/`runner` 可呼叫物件,因此路由在無音訊、無任何語音相依下完整單元測試(真實 Vosk/麥克風辨識器接入 `listen_once`)。

## 本次更新 (2026-06-20) — 區域設定感知的數字、貨幣與日期解析

解析在地化的數字/貨幣/日期。完整參考:[`docs/source/Zh/doc/new_features/v43_features_doc.rst`](../docs/source/Zh/doc/new_features/v43_features_doc.rst)。
57 changes: 57 additions & 0 deletions docs/source/Eng/doc/new_features/v44_features_doc.rst
@@ -0,0 +1,57 @@
Voice-Command Router
====================

``VoiceRouter`` maps spoken trigger *phrases* to ``AC_*`` action lists: feed it
the text of a recognized utterance and it runs the closest registered command,
which gives hands-free triggering of automation flows. Phrase matching reuses the
project's fuzzy matcher, so "save the file" still fires a ``"save file"`` command
despite recognizer noise.

Speech-to-text is intentionally **out of scope and injectable**: the router takes
already-recognized *text*. A real microphone/Vosk recognizer is supplied as a
``recognizer`` callable to :meth:`VoiceRouter.listen_once`, which keeps the
routing logic fully unit-testable without audio or any speech dependency. The
module does not import ``PySide6``.

Headless API
------------

.. code-block:: python

    from je_auto_control import VoiceRouter

    router = VoiceRouter(threshold=0.7)
    router.register("save file", [["AC_hotkey", {"keys": ["ctrl", "s"]}]])
    router.register("close window", [["AC_close_window", {}]])

    router.dispatch("save the file")  # fuzzy-matches -> runs the save actions

    # with a real recognizer (any callable returning text):
    def vosk_listen() -> str:
        ...  # capture audio, return transcript
    router.listen_once(vosk_listen)

``dispatch`` (and ``listen_once``) accept a ``runner`` callable that executes the
matched action list. It defaults to the executor; inject a fake to test routing
without running real automation. ``match`` returns the best ``VoiceCommand`` at or
above ``threshold`` (or ``None`` if nothing qualifies); ``register`` replaces an
existing phrase; ``phrases`` and ``clear`` inspect and reset the registry.
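
The injection points described above can be illustrated with a self-contained stand-in. This is a sketch under assumptions, not the real ``VoiceRouter``: ``SketchRouter`` is a hypothetical class, ``difflib`` replaces the project's fuzzy matcher, ``runner`` is required here rather than defaulting to the executor, and the ``{"matched", "phrase"}`` result shape on a miss is assumed from the keys the executor adapter reads.

```python
from difflib import SequenceMatcher
from typing import Any, Callable, Dict, List, Optional

Actions = List[Any]


class SketchRouter:
    """Illustrative stand-in for VoiceRouter; not the real implementation."""

    def __init__(self, threshold: float = 0.7) -> None:
        self.threshold = threshold
        self._commands: Dict[str, Actions] = {}

    def register(self, phrase: str, actions: Actions) -> None:
        # Registering an existing phrase replaces its actions.
        self._commands[phrase] = actions

    def match(self, text: str) -> Optional[str]:
        best, best_score = None, 0.0
        for phrase in self._commands:
            score = SequenceMatcher(None, text, phrase).ratio()
            if score > best_score:
                best, best_score = phrase, score
        return best if best_score >= self.threshold else None

    def dispatch(self, text: str,
                 runner: Callable[[Actions], Any]) -> Dict[str, Any]:
        phrase = self.match(text)
        if phrase is None:
            return {"matched": False, "phrase": None}
        runner(self._commands[phrase])  # the only place actions execute
        return {"matched": True, "phrase": phrase}

    def listen_once(self, recognizer: Callable[[], str],
                    runner: Callable[[Actions], Any]) -> Dict[str, Any]:
        # The recognizer is any callable returning transcript text.
        return self.dispatch(recognizer(), runner)


# Test-style usage: a fake recognizer and a fake runner, no audio involved.
executed: List[Actions] = []
router = SketchRouter()
router.register("save file", [["AC_hotkey", {"keys": ["ctrl", "s"]}]])
outcome = router.listen_once(lambda: "save the file", executed.append)
print(outcome, executed)
```

Because both the transcript source and the action executor are plain callables, a unit test can assert on what would have run without touching a microphone or the desktop.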

Executor commands
-----------------

A module-level default router (``default_voice_router``) backs the executor and
MCP surfaces:

================================  ===================================================
Command                           Effect
================================  ===================================================
``AC_voice_register``             Map a ``phrase`` to an ``actions`` list.
``AC_voice_dispatch``             Run the command best matching recognized ``text``.
``AC_voice_list``                 List registered phrases.
``AC_voice_clear``                Remove all registered commands.
================================  ===================================================

``actions`` accepts either a list or a JSON-encoded string of a list, so the
visual builder, whose fields are plain strings, can supply it. The same operations
are exposed as MCP tools (``ac_voice_register`` / ``ac_voice_dispatch`` /
``ac_voice_list`` / ``ac_voice_clear``) and as Script Builder commands under
**Agent**.
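
The list-or-JSON-string handling mentioned above could look roughly like the sketch below. It is an assumption-laden illustration: ``coerce_actions`` is a hypothetical helper, and the executor's own ``_coerce_list`` is not shown in this change, so its real behavior (for example on invalid input) may differ.

```python
import json
from typing import Any, List


def coerce_actions(actions: Any) -> List[Any]:
    """Accept a list, or a JSON string encoding a list (hypothetical helper)."""
    if isinstance(actions, str):
        # The visual builder supplies text, so decode it first.
        actions = json.loads(actions)
    if not isinstance(actions, list):
        raise TypeError("actions must be a list or a JSON-encoded list")
    return actions


from_builder = coerce_actions('[["AC_hotkey", {"keys": ["ctrl", "s"]}]]')
from_python = coerce_actions([["AC_hotkey", {"keys": ["ctrl", "s"]}]])
print(from_builder == from_python)
```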
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -66,6 +66,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v41_features_doc
doc/new_features/v42_features_doc
doc/new_features/v43_features_doc
doc/new_features/v44_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
51 changes: 51 additions & 0 deletions docs/source/Zh/doc/new_features/v44_features_doc.rst
@@ -0,0 +1,51 @@
語音指令路由器
==============

``VoiceRouter`` 將語音觸發*片語*對應到 ``AC_*`` 動作清單:餵入一段已辨識語句的文字,它
就會執行最接近的已註冊指令 —— 免手動觸發自動化流程。片語比對重用本專案的模糊比對器,因
此即使辨識有雜訊,「save the file」仍會觸發 ``"save file"`` 指令。

語音轉文字刻意**不在範圍內且可注入**:路由器接受的是已辨識的*文字*。真實的麥克風/Vosk
辨識器以 ``recognizer`` 可呼叫物件傳入 :meth:`VoiceRouter.listen_once`,如此路由邏輯可
在無音訊、無任何語音相依的情況下完整單元測試。不匯入 ``PySide6``。

無頭 API
--------

.. code-block:: python

    from je_auto_control import VoiceRouter

    router = VoiceRouter(threshold=0.7)
    router.register("save file", [["AC_hotkey", {"keys": ["ctrl", "s"]}]])
    router.register("close window", [["AC_close_window", {}]])

    router.dispatch("save the file")  # 模糊比對 -> 執行儲存動作

    # 搭配真實辨識器(任何回傳文字的可呼叫物件):
    def vosk_listen() -> str:
        ...  # 擷取音訊、回傳逐字稿
    router.listen_once(vosk_listen)

``dispatch``(與 ``listen_once``)接受 ``runner`` 來執行動作清單 —— 預設為執行器;注入假
物件即可在不執行真實自動化下測試路由。``match`` 回傳達到或高於 ``threshold`` 的最佳
``VoiceCommand``(否則 ``None``);``register`` 會取代既有片語;``phrases`` / ``clear``
則用於檢視與重置。

執行器指令
----------

模組層級的預設路由器支撐 executor/MCP 介面:

================================  ===================================================
指令                                效果
================================  ===================================================
``AC_voice_register``             將 ``phrase`` 對應到 ``actions`` 清單。
``AC_voice_dispatch``             執行最符合已辨識 ``text`` 的指令。
``AC_voice_list``                 列出已註冊片語。
``AC_voice_clear``                移除所有已註冊指令。
================================  ===================================================

``actions`` 接受清單或 JSON 字串清單(因此視覺化建構器可用)。相同操作亦提供為 MCP 工具
(``ac_voice_register`` / ``ac_voice_dispatch`` / ``ac_voice_list`` /
``ac_voice_clear``),以及 Script Builder 中 **Agent** 分類下的指令。
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -66,6 +66,7 @@ AutoControl 所有功能的完整使用指南。
doc/new_features/v41_features_doc
doc/new_features/v42_features_doc
doc/new_features/v43_features_doc
doc/new_features/v44_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
5 changes: 5 additions & 0 deletions je_auto_control/__init__.py
@@ -247,6 +247,10 @@
from je_auto_control.utils.locale_parse import (
    format_currency, format_date, format_decimal, parse_decimal, parse_number,
)
# Voice-command router (injectable speech-to-text)
from je_auto_control.utils.voice import (
    VoiceCommand, VoiceRouter, default_voice_router,
)
# Background popup/interrupt watchdog (unattended automation)
from je_auto_control.utils.watchdog import (
    PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -700,6 +704,7 @@ def start_autocontrol_gui(*args, **kwargs):
    "images_similar",
    "format_currency", "format_date", "format_decimal", "parse_decimal",
    "parse_number",
    "VoiceCommand", "VoiceRouter", "default_voice_router",
    # MCP server
    "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
    "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
25 changes: 25 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -994,6 +994,31 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
        ),
        description="Format an ISO date string for a locale.",
    ))
    specs.append(CommandSpec(
        "AC_voice_register", "Agent", "Voice: Register Command",
        fields=(
            FieldSpec("phrase", FieldType.STRING, placeholder="save file"),
            FieldSpec("actions", FieldType.STRING,
                      placeholder='[["AC_hotkey", {"keys": ["ctrl", "s"]}]]'),
        ),
        description="Map a spoken phrase to an action list (JSON).",
    ))
    specs.append(CommandSpec(
        "AC_voice_dispatch", "Agent", "Voice: Dispatch Text",
        fields=(FieldSpec("text", FieldType.STRING,
                          placeholder="save the file"),),
        description="Run the command best matching recognized text.",
    ))
    specs.append(CommandSpec(
        "AC_voice_list", "Agent", "Voice: List Commands",
        fields=(),
        description="List registered voice-command phrases.",
    ))
    specs.append(CommandSpec(
        "AC_voice_clear", "Agent", "Voice: Clear Commands",
        fields=(),
        description="Remove all registered voice commands.",
    ))
    specs.append(CommandSpec(
        "AC_generate_sop", "Report", "Generate SOP Document",
        fields=(
31 changes: 31 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -3168,6 +3168,33 @@ def _format_date(value: str, locale: str = "en_US",
    return {"text": format_date(value, locale, fmt)}


def _voice_register(phrase: str, actions: Any) -> Dict[str, Any]:
    """Adapter: register a voice command on the default router."""
    from je_auto_control.utils.voice import default_voice_router
    default_voice_router.register(phrase, _coerce_list(actions))
    return {"phrases": default_voice_router.phrases()}


def _voice_dispatch(text: str) -> Dict[str, Any]:
    """Adapter: run the command best matching recognized ``text``."""
    from je_auto_control.utils.voice import default_voice_router
    outcome = default_voice_router.dispatch(text)
    return {"matched": outcome["matched"], "phrase": outcome["phrase"]}


def _voice_list() -> Dict[str, Any]:
    """Adapter: list registered voice-command phrases."""
    from je_auto_control.utils.voice import default_voice_router
    return {"phrases": default_voice_router.phrases()}


def _voice_clear() -> Dict[str, Any]:
    """Adapter: clear all registered voice commands."""
    from je_auto_control.utils.voice import default_voice_router
    default_voice_router.clear()
    return {"cleared": True}


class Executor:
    """
    Executor
@@ -3434,6 +3461,10 @@ def __init__(self):
            "AC_format_decimal": _format_decimal,
            "AC_format_currency": _format_currency,
            "AC_format_date": _format_date,
            "AC_voice_register": _voice_register,
            "AC_voice_dispatch": _voice_dispatch,
            "AC_voice_list": _voice_list,
            "AC_voice_clear": _voice_clear,
            "AC_a11y_record_start": _a11y_record_start,
            "AC_a11y_record_stop": _a11y_record_stop,
            "AC_a11y_record_events": _a11y_record_events,
43 changes: 42 additions & 1 deletion je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3008,6 +3008,47 @@ def locale_tools() -> List[MCPTool]:
    ]


def voice_tools() -> List[MCPTool]:
    return [
        MCPTool(
            name="ac_voice_register",
            description=("Register a voice command: a trigger 'phrase' and an "
                         "'actions' list (AC_* steps) to run when recognized "
                         "speech best-matches it. Returns {phrases}."),
            input_schema=schema(
                {"phrase": {"type": "string"},
                 "actions": {"type": "array", "items": {"type": "object"}}},
                ["phrase", "actions"]),
            handler=h.voice_register,
            annotations=SIDE_EFFECT_ONLY,
        ),
        MCPTool(
            name="ac_voice_dispatch",
            description=("Run the command whose phrase best matches recognized "
                         "'text' (fuzzy). Returns {matched, phrase}."),
            input_schema=schema({"text": {"type": "string"}}, ["text"]),
            handler=h.voice_dispatch,
            annotations=SIDE_EFFECT_ONLY,
        ),
        MCPTool(
            name="ac_voice_list",
            description="List registered voice-command phrases. Returns "
                        "{phrases}.",
            input_schema=schema({}),
            handler=h.voice_list,
            annotations=READ_ONLY,
        ),
        MCPTool(
            name="ac_voice_clear",
            description="Remove all registered voice commands. Returns "
                        "{cleared}.",
            input_schema=schema({}),
            handler=h.voice_clear,
            annotations=SIDE_EFFECT_ONLY,
        ),
    ]


def unattended_tools() -> List[MCPTool]:
    return [
        MCPTool(
@@ -4069,7 +4110,7 @@ def media_assert_tools() -> List[MCPTool]:
     credential_lease_tools, egress_tools, approval_testing_tools,
     trajectory_eval_tools, compliance_tools, agent_trace_tools,
     video_report_tools, fuzzy_tools, artifact_store_tools, image_dedup_tools,
-    locale_tools,
+    locale_tools, voice_tools,
     screen_record_tools,
     process_and_shell_tools, remote_desktop_tools, gamepad_tools,
     usb_passthrough_tools, assertion_tools, data_source_tools,
23 changes: 23 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1448,6 +1448,29 @@ def format_date(value, locale="en_US", fmt="medium"):
    return {"text": _fmt(value, locale, fmt)}


def voice_register(phrase, actions):
    from je_auto_control.utils.voice import default_voice_router
    default_voice_router.register(phrase, actions)
    return {"phrases": default_voice_router.phrases()}


def voice_dispatch(text):
    from je_auto_control.utils.voice import default_voice_router
    outcome = default_voice_router.dispatch(text)
    return {"matched": outcome["matched"], "phrase": outcome["phrase"]}


def voice_list():
    from je_auto_control.utils.voice import default_voice_router
    return {"phrases": default_voice_router.phrases()}


def voice_clear():
    from je_auto_control.utils.voice import default_voice_router
    default_voice_router.clear()
    return {"cleared": True}


def vlm_locate(description: str,
               screen_region: Optional[List[int]] = None,
               model: Optional[str] = None) -> Optional[List[int]]:
6 changes: 6 additions & 0 deletions je_auto_control/utils/voice/__init__.py
@@ -0,0 +1,6 @@
"""Voice-command router: map recognized phrases to AC_* action lists."""
from je_auto_control.utils.voice.voice_router import (
    VoiceCommand, VoiceRouter, default_voice_router,
)

__all__ = ["VoiceCommand", "VoiceRouter", "default_voice_router"]