diff --git a/README.md b/README.md index e9a8a9fc..a3cefad1 100644 --- a/README.md +++ b/README.md @@ -13,6 +13,7 @@ ## Table of Contents +- [What's new (2026-06-19) — Semantic Screen State](#whats-new-2026-06-19--semantic-screen-state) - [What's new (2026-06-19) — Set-of-Marks Overlay](#whats-new-2026-06-19--set-of-marks-overlay) - [What's new (2026-06-19) — Checkpoint & Resume](#whats-new-2026-06-19--checkpoint--resume) - [What's new (2026-06-19) — i18n / l10n Testing](#whats-new-2026-06-19--i18n--l10n-testing) @@ -75,6 +76,13 @@ --- +## What's new (2026-06-19) — Semantic Screen State + +The semantic companion to the pixel diff, full stack. Full reference: [`docs/source/Eng/doc/new_features/v23_features_doc.rst`](docs/source/Eng/doc/new_features/v23_features_doc.rst). + +- **Snapshot & diff** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed` (`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`, `ac_*`): normalize the a11y tree to `{role, name, bbox}` and report what **appeared / vanished / moved** with a human-readable summary — the feedback signal an agent needs to verify a step ("Save dialog appeared"). +- **Describe the screen** — `describe_screen` (`AC_describe_screen`, `ac_describe_screen`): a compact "where am I" — role counts + interactive control labels. + ## What's new (2026-06-19) — Set-of-Marks Overlay The standard VLM-grounding format, full stack. Full reference: [`docs/source/Eng/doc/new_features/v22_features_doc.rst`](docs/source/Eng/doc/new_features/v22_features_doc.rst). 
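The snapshot/diff contract in the README entry above (identity is `(role, name)`; a matched element whose bbox changed counts as moved) can be made concrete with a small standalone sketch. It re-implements that contract for illustration only and does not import `je_auto_control`; the names `key_of` and `semantic_diff` are hypothetical. One consequence of keying rows by `(role, name)` in a dict, as the `diff_snapshots` in this PR also does, is that several unnamed elements sharing a role collapse into a single entry, with the last one winning.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def key_of(row: Row) -> Tuple[str, str]:
    # Identity is (role, name); the bbox is deliberately not part of the key.
    return (row.get("role", ""), row.get("name", ""))


def semantic_diff(before: List[Row], after: List[Row]) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the diff_snapshots contract."""
    # Building dicts collapses rows that share a key; the last row wins.
    before_map = {key_of(r): r for r in before}
    after_map = {key_of(r): r for r in after}
    added = [r for k, r in after_map.items() if k not in before_map]
    removed = [r for k, r in before_map.items() if k not in after_map]
    # "Moved" means same identity on both sides but a different bounding box.
    moved = [
        {"role": k[0], "name": k[1],
         "before": before_map[k].get("bbox"), "after": r.get("bbox")}
        for k, r in after_map.items()
        if k in before_map and r.get("bbox") != before_map[k].get("bbox")
    ]
    summary = (
        [f"appeared: {r.get('role', '')} {r.get('name', '')}".strip() for r in added]
        + [f"vanished: {r.get('role', '')} {r.get('name', '')}".strip() for r in removed]
        + [f"moved: {m['role']} {m['name']}".strip() for m in moved]
    )
    return {"added": added, "removed": removed, "moved": moved,
            "summary": summary,
            "changed_count": len(added) + len(removed) + len(moved)}


before = [{"role": "button", "name": "OK", "bbox": [0, 0, 10, 10]}]
after = [{"role": "button", "name": "OK", "bbox": [5, 5, 10, 10]},
         {"role": "window", "name": "Save", "bbox": [0, 0, 200, 100]}]
print(semantic_diff(before, after)["summary"])
# ['appeared: window Save', 'moved: button OK']
```

Because of the key collapse, appending a second `{"role": "row", "name": ""}` element next to an existing one is reported as a move of that entry (or not at all if the boxes match), never as an appearance.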
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md index 3c82946d..c62a449c 100644 --- a/README/README_zh-CN.md +++ b/README/README_zh-CN.md @@ -12,6 +12,7 @@ ## 目录 +- [本次更新 (2026-06-19) — 语义屏幕状态](#本次更新-2026-06-19--语义屏幕状态) - [本次更新 (2026-06-19) — Set-of-Marks 叠图](#本次更新-2026-06-19--set-of-marks-叠图) - [本次更新 (2026-06-19) — 检查点与续跑](#本次更新-2026-06-19--检查点与续跑) - [本次更新 (2026-06-19) — i18n / l10n 测试](#本次更新-2026-06-19--i18n--l10n-测试) @@ -74,6 +75,13 @@ --- +## 本次更新 (2026-06-19) — 语义屏幕状态 + +像素差异的语义对应物,走完整五层。完整参考:[`docs/source/Zh/doc/new_features/v23_features_doc.rst`](../docs/source/Zh/doc/new_features/v23_features_doc.rst)。 + +- **快照与差异** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed`(`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`、`ac_*`):把 a11y 树规范化为 `{role, name, bbox}`,报告**出现 / 消失 / 移动**并附人类可读摘要——agent 验证某步效果所需的反馈信号(「Save 对话框出现了」)。 +- **描述屏幕** — `describe_screen`(`AC_describe_screen`、`ac_describe_screen`):廉价的「我在哪」——各 role 计数 + 交互控件标签。 + ## 本次更新 (2026-06-19) — Set-of-Marks 叠图 VLM 定位的标准格式,走完整五层。完整参考:[`docs/source/Zh/doc/new_features/v22_features_doc.rst`](../docs/source/Zh/doc/new_features/v22_features_doc.rst)。 diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md index abda423c..103d791e 100644 --- a/README/README_zh-TW.md +++ b/README/README_zh-TW.md @@ -12,6 +12,7 @@ ## 目錄 +- [本次更新 (2026-06-19) — 語意螢幕狀態](#本次更新-2026-06-19--語意螢幕狀態) - [本次更新 (2026-06-19) — Set-of-Marks 疊圖](#本次更新-2026-06-19--set-of-marks-疊圖) - [本次更新 (2026-06-19) — 檢查點與續跑](#本次更新-2026-06-19--檢查點與續跑) - [本次更新 (2026-06-19) — i18n / l10n 測試](#本次更新-2026-06-19--i18n--l10n-測試) @@ -74,6 +75,13 @@ --- +## 本次更新 (2026-06-19) — 語意螢幕狀態 + +像素差異的語意對應物,走完整五層。完整參考:[`docs/source/Zh/doc/new_features/v23_features_doc.rst`](../docs/source/Zh/doc/new_features/v23_features_doc.rst)。 + +- **快照與差異** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed`(`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`、`ac_*`):把 a11y 樹正規化為 `{role, name, bbox}`,回報**出現 / 
消失 / 移動**並附人類可讀摘要——agent 驗證某步效果所需的回饋訊號(「Save 對話框出現了」)。 +- **描述螢幕** — `describe_screen`(`AC_describe_screen`、`ac_describe_screen`):廉價的「我在哪」——各 role 計數 + 互動控制項標籤。 + ## 本次更新 (2026-06-19) — Set-of-Marks 疊圖 VLM 定位的標準格式,走完整五層。完整參考:[`docs/source/Zh/doc/new_features/v22_features_doc.rst`](../docs/source/Zh/doc/new_features/v22_features_doc.rst)。 diff --git a/docs/source/Eng/doc/new_features/v23_features_doc.rst b/docs/source/Eng/doc/new_features/v23_features_doc.rst new file mode 100644 index 00000000..2d38bbf3 --- /dev/null +++ b/docs/source/Eng/doc/new_features/v23_features_doc.rst @@ -0,0 +1,47 @@ +================================================== +New Features (2026-06-19) — Semantic Screen State +================================================== + +The *semantic* companion to the existing pixel (visual-regression) diff: +snapshot the accessibility tree, diff two snapshots into what **appeared / +vanished / moved**, and get a compact structured **description** of the +screen. This is the feedback signal an agent needs to verify a step's +effect and orient itself. Pure standard library; full stack. + +.. contents:: + :local: + :depth: 2 + + +Snapshot & diff +=============== + +:: + + from je_auto_control import snapshot, diff_snapshots, snapshot_screen, screen_changed + + before = snapshot_screen() # baseline from the live a11y tree + ... # perform a step + delta = screen_changed() # diff vs the baseline + delta["summary"] # ["appeared: window Save", "moved: button OK"] + +``snapshot`` normalizes elements to ``[{role, name, bbox}]`` (identity = +``(role, name)``); ``diff_snapshots(before, after)`` returns ``added`` / +``removed`` / ``moved`` lists plus a human-readable ``summary`` and +``changed_count``. ``snapshot_screen`` snapshots the *live* tree and caches it as the baseline; +``screen_changed`` diffs the live tree against that baseline, then makes the new snapshot the baseline, so repeated calls report incremental changes. Exposed as ``AC_screen_snapshot`` / +``AC_screen_diff`` / ``AC_screen_changed``.
+ + +Describe the screen +=================== + +:: + + from je_auto_control import describe_screen + + describe_screen() # {app, element_count, by_role: {...}, controls: [...]} + +A cheap "where am I" for an agent: counts per role and the labels of the +interactive controls. Exposed as ``AC_describe_screen`` / +``ac_describe_screen`` (and ``ac_screen_*`` for the diff family). diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst index 44c04b96..7831a2fd 100644 --- a/docs/source/Eng/eng_index.rst +++ b/docs/source/Eng/eng_index.rst @@ -45,6 +45,7 @@ Comprehensive guides for all AutoControl features. doc/new_features/v20_features_doc doc/new_features/v21_features_doc doc/new_features/v22_features_doc + doc/new_features/v23_features_doc doc/ocr_backends/ocr_backends_doc doc/observability/observability_doc doc/operations_layer/operations_layer_doc diff --git a/docs/source/Zh/doc/new_features/v23_features_doc.rst b/docs/source/Zh/doc/new_features/v23_features_doc.rst new file mode 100644 index 00000000..83345583 --- /dev/null +++ b/docs/source/Zh/doc/new_features/v23_features_doc.rst @@ -0,0 +1,44 @@ +========================================== +新功能 (2026-06-19) — 語意螢幕狀態 +========================================== + +既有像素(視覺回歸)差異的*語意*對應物:快照 accessibility 樹、把兩份 +快照差異成**出現 / 消失 / 移動**,並取得螢幕的精簡結構化**描述**。這是 +agent 驗證某步效果與自我定位所需的回饋訊號。純標準庫;走完整五層。 + +.. contents:: + :local: + :depth: 2 + + +快照與差異 +========== + +:: + + from je_auto_control import snapshot, diff_snapshots, snapshot_screen, screen_changed + + before = snapshot_screen() # 從即時 a11y 樹取基準 + ...
# 執行某個步驟 + delta = screen_changed() # 與基準比對 + delta["summary"] # ["appeared: window Save", "moved: button OK"] + +``snapshot`` 把元素正規化為 ``[{role, name, bbox}]``(識別 = +``(role, name)``);``diff_snapshots(before, after)`` 回傳 ``added`` / +``removed`` / ``moved`` 清單,加上人類可讀的 ``summary`` 與 +``changed_count``。``snapshot_screen`` 擷取即時樹並快取為基準;``screen_changed`` 與該基準比對後 +會以新快照取代基準,因此連續呼叫回報的是增量變化。對應 ``AC_screen_snapshot`` / ``AC_screen_diff`` / +``AC_screen_changed``。 + + +描述螢幕 +======== + +:: + + from je_auto_control import describe_screen + + describe_screen() # {app, element_count, by_role: {...}, controls: [...]} + +給 agent 的廉價「我在哪」:各 role 計數與互動控制項的標籤。對應 +``AC_describe_screen`` / ``ac_describe_screen``(差異家族則為 ``ac_screen_*``)。 diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst index 28ec678a..bf662c19 100644 --- a/docs/source/Zh/zh_index.rst +++ b/docs/source/Zh/zh_index.rst @@ -45,6 +45,7 @@ AutoControl 所有功能的完整使用指南。 doc/new_features/v20_features_doc doc/new_features/v21_features_doc doc/new_features/v22_features_doc + doc/new_features/v23_features_doc doc/ocr_backends/ocr_backends_doc doc/observability/observability_doc doc/operations_layer/operations_layer_doc diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py index c087b231..891091b1 100644 --- a/je_auto_control/__init__.py +++ b/je_auto_control/__init__.py @@ -169,6 +169,11 @@ from je_auto_control.utils.set_of_marks import ( mark_click, mark_elements, mark_screen, render_marks, resolve_mark, ) +# Semantic screen state (snapshot/diff + structured description) +from je_auto_control.utils.screen_state import ( + describe_screen, diff_snapshots, screen_changed, snapshot, + snapshot_screen, +) # Background popup/interrupt watchdog (unattended automation) from je_auto_control.utils.watchdog import ( PopupWatchdog, WatchdogRule, default_popup_watchdog, @@ -594,6 +599,8 @@ def start_autocontrol_gui(*args, **kwargs): "Checkpoint", "CheckpointStore", "run_resumable", "mark_click", "mark_elements", "mark_screen",
"render_marks", "resolve_mark", + "describe_screen", "diff_snapshots", "screen_changed", "snapshot", + "snapshot_screen", # MCP server "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt", "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool", diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py index 6e84d913..e034c37e 100644 --- a/je_auto_control/gui/script_builder/command_schema.py +++ b/je_auto_control/gui/script_builder/command_schema.py @@ -662,6 +662,30 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None: _add_i18n_specs(specs) _add_checkpoint_specs(specs) _add_set_of_marks_specs(specs) + _add_screen_state_specs(specs) + + +def _add_screen_state_specs(specs: List[CommandSpec]) -> None: + app = FieldSpec("app_name", FieldType.STRING, optional=True) + specs.append(CommandSpec( + "AC_screen_snapshot", "Native UI", "Screen: Snapshot Baseline", + fields=(app,), + description="Snapshot the a11y tree as a semantic-diff baseline.", + )) + specs.append(CommandSpec( + "AC_screen_diff", "Native UI", "Screen: Diff Snapshots", + description="Semantic diff of 'before'/'after' snapshots (JSON view).", + )) + specs.append(CommandSpec( + "AC_screen_changed", "Native UI", "Screen: What Changed", + fields=(app,), + description="Diff the live screen against the last snapshot baseline.", + )) + specs.append(CommandSpec( + "AC_describe_screen", "Native UI", "Screen: Describe", + fields=(app,), + description="Structured 'where am I' (role counts + control labels).", + )) def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None: diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py index 4bc97df6..3558321b 100644 --- a/je_auto_control/utils/executor/action_executor.py +++ b/je_auto_control/utils/executor/action_executor.py @@ -2756,6 +2756,31 @@ def _mark_click(mark_id: int) -> Dict[str, Any]: return {"clicked": 
mark_click(int(mark_id))} +def _screen_snapshot(app_name: Optional[str] = None) -> Dict[str, Any]: + """Adapter: snapshot the live a11y tree as a diff baseline.""" + from je_auto_control.utils.screen_state import snapshot_screen + return {"snapshot": snapshot_screen(app_name=app_name)} + + +def _screen_diff(before: List[Dict[str, Any]], + after: List[Dict[str, Any]]) -> Dict[str, Any]: + """Adapter: semantic diff between two snapshots.""" + from je_auto_control.utils.screen_state import diff_snapshots + return diff_snapshots(before, after) + + +def _screen_changed(app_name: Optional[str] = None) -> Dict[str, Any]: + """Adapter: diff the live screen against the last snapshot baseline.""" + from je_auto_control.utils.screen_state import screen_changed + return screen_changed(app_name=app_name) + + +def _describe_screen(app_name: Optional[str] = None) -> Dict[str, Any]: + """Adapter: structured 'where am I' description of the live screen.""" + from je_auto_control.utils.screen_state import describe_screen + return describe_screen(app_name=app_name) + + class Executor: """ Executor @@ -2968,6 +2993,10 @@ def __init__(self): "AC_checkpoint_clear": _checkpoint_clear, "AC_mark_screen": _mark_screen, "AC_mark_click": _mark_click, + "AC_screen_snapshot": _screen_snapshot, + "AC_screen_diff": _screen_diff, + "AC_screen_changed": _screen_changed, + "AC_describe_screen": _describe_screen, "AC_a11y_record_start": _a11y_record_start, "AC_a11y_record_stop": _a11y_record_stop, "AC_a11y_record_events": _a11y_record_events, diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py index 96c9bda2..ad0e2f0f 100644 --- a/je_auto_control/utils/mcp_server/tools/_factories.py +++ b/je_auto_control/utils/mcp_server/tools/_factories.py @@ -2329,6 +2329,47 @@ def set_of_marks_tools() -> List[MCPTool]: ] +def screen_state_tools() -> List[MCPTool]: + _SNAP = {"type": "array", "items": {"type": "object"}} + return [ + MCPTool( + 
name="ac_screen_snapshot", + description=("Snapshot the live accessibility tree to " + "[{role, name, bbox}] and cache it as the diff " + "baseline."), + input_schema=schema({"app_name": {"type": "string"}}), + handler=h.screen_snapshot, + annotations=SIDE_EFFECT_ONLY, + ), + MCPTool( + name="ac_screen_diff", + description=("Semantic diff between two snapshots: what appeared / " + "vanished / moved, with a human-readable summary."), + input_schema=schema({"before": _SNAP, "after": _SNAP}, + required=["before", "after"]), + handler=h.screen_diff, + annotations=READ_ONLY, + ), + MCPTool( + name="ac_screen_changed", + description=("Diff the live screen against the last " + "ac_screen_snapshot baseline (agent feedback signal: " + "'Save dialog appeared')."), + input_schema=schema({"app_name": {"type": "string"}}), + handler=h.screen_changed, + annotations=SIDE_EFFECT_ONLY, + ), + MCPTool( + name="ac_describe_screen", + description=("Compact 'where am I' description of the live screen: " + "{app, element_count, by_role, controls}."), + input_schema=schema({"app_name": {"type": "string"}}), + handler=h.describe_screen, + annotations=READ_ONLY, + ), + ] + + def unattended_tools() -> List[MCPTool]: return [ MCPTool( @@ -3383,7 +3424,7 @@ def media_assert_tools() -> List[MCPTool]: skill_library_tools, guardrail_tools, a2a_tools, office_tools, agent_memory_tools, determinism_tools, observer_tools, sbom_tools, sharding_tools, data_quality_tools, i18n_tools, - checkpoint_tools, set_of_marks_tools, + checkpoint_tools, set_of_marks_tools, screen_state_tools, screen_record_tools, process_and_shell_tools, remote_desktop_tools, gamepad_tools, usb_passthrough_tools, assertion_tools, data_source_tools, diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py index a802b3c9..47637ae2 100644 --- a/je_auto_control/utils/mcp_server/tools/_handlers.py +++ b/je_auto_control/utils/mcp_server/tools/_handlers.py @@ -1143,6 +1143,26 
@@ def mark_click(mark_id): return {"clicked": _mc(int(mark_id))} +def screen_snapshot(app_name=None): + from je_auto_control.utils.screen_state import snapshot_screen + return {"snapshot": snapshot_screen(app_name=app_name)} + + +def screen_diff(before, after): + from je_auto_control.utils.screen_state import diff_snapshots + return diff_snapshots(before, after) + + +def screen_changed(app_name=None): + from je_auto_control.utils.screen_state import screen_changed as _sc + return _sc(app_name=app_name) + + +def describe_screen(app_name=None): + from je_auto_control.utils.screen_state import describe_screen as _ds + return _ds(app_name=app_name) + + def vlm_locate(description: str, screen_region: Optional[List[int]] = None, model: Optional[str] = None) -> Optional[List[int]]: diff --git a/je_auto_control/utils/screen_state/__init__.py b/je_auto_control/utils/screen_state/__init__.py new file mode 100644 index 00000000..17e250d7 --- /dev/null +++ b/je_auto_control/utils/screen_state/__init__.py @@ -0,0 +1,9 @@ +"""Semantic screen state: snapshot/diff and a structured screen description.""" +from je_auto_control.utils.screen_state.screen_state import ( + describe_screen, diff_snapshots, screen_changed, snapshot, snapshot_screen, +) + +__all__ = [ + "describe_screen", "diff_snapshots", "screen_changed", "snapshot", + "snapshot_screen", +] diff --git a/je_auto_control/utils/screen_state/screen_state.py b/je_auto_control/utils/screen_state/screen_state.py new file mode 100644 index 00000000..f516a82d --- /dev/null +++ b/je_auto_control/utils/screen_state/screen_state.py @@ -0,0 +1,134 @@ +"""Semantic screen state — snapshot/diff and a structured description. + +AutoControl ships a *pixel* diff (visual regression); this is the *semantic* +companion. 
A snapshot normalizes the accessibility tree to +``{role, name, bbox}`` rows; :func:`diff_snapshots` reports what **appeared**, +**vanished**, or **moved** ("Save dialog appeared", "row added") — the +feedback signal an agent needs to verify a step's effect. :func:`describe_screen` +returns a compact "where am I" structure (role counts + control labels) for an +agent's orientation. + +Pure standard library; imports no ``PySide6``. The pure functions +(``snapshot`` / ``diff_snapshots`` / ``describe_screen`` with supplied +elements) are unit-testable without a live desktop. +""" +from typing import Any, Dict, List, Optional + +_INTERACTIVE_HINTS = ("button", "edit", "text", "combo", "check", "radio", + "menu", "link", "tab", "list", "slider") +_last_snapshot: List[Dict[str, Any]] = [] + + +def _role_of(element: Any) -> str: + if isinstance(element, dict): + return str(element.get("role") or "") + return str(getattr(element, "role", "") or "") + + +def _name_of(element: Any) -> str: + if isinstance(element, dict): + return str(element.get("name") or element.get("text") or "") + return str(getattr(element, "name", "") or "") + + +def _bbox_of(element: Any) -> List[int]: + if isinstance(element, dict): + raw = element.get("bbox") or element.get("bounds") or [] + else: + raw = getattr(element, "bounds", []) or [] + return list(raw) + + +def snapshot(elements: List[Any]) -> List[Dict[str, Any]]: + """Normalize elements to ``[{role, name, bbox}]`` for diffing.""" + return [{"role": _role_of(el), "name": _name_of(el), "bbox": _bbox_of(el)} + for el in elements] + + +def _key(item: Dict[str, Any]) -> tuple: + return (item.get("role", ""), item.get("name", "")) + + +def _moved_items(before_map: Dict[tuple, Dict[str, Any]], + after_map: Dict[tuple, Dict[str, Any]]) -> List[Dict[str, Any]]: + moved = [] + for key, item in after_map.items(): + prior = before_map.get(key) + if prior is not None and item.get("bbox") != prior.get("bbox"): + moved.append({"role": key[0], "name": 
key[1], + "before": prior.get("bbox"), + "after": item.get("bbox")}) + return moved + + +def _diff_summary(added: List[Dict[str, Any]], removed: List[Dict[str, Any]], + moved: List[Dict[str, Any]]) -> List[str]: + lines = [f"appeared: {i['role']} {i['name']}".strip() for i in added] + lines += [f"vanished: {i['role']} {i['name']}".strip() for i in removed] + lines += [f"moved: {m['role']} {m['name']}".strip() for m in moved] + return lines + + +def diff_snapshots(before: List[Dict[str, Any]], + after: List[Dict[str, Any]]) -> Dict[str, Any]: + """Diff two snapshots into ``{added, removed, moved, summary}``. + + Identity is ``(role, name)``; ``moved`` are matched items whose bbox + changed. ``summary`` is a list of human-readable strings. + """ + before_map = {_key(i): i for i in before} + after_map = {_key(i): i for i in after} + added = [after_map[k] for k in after_map if k not in before_map] + removed = [before_map[k] for k in before_map if k not in after_map] + moved = _moved_items(before_map, after_map) + return {"added": added, "removed": removed, "moved": moved, + "summary": _diff_summary(added, removed, moved), + "changed_count": len(added) + len(removed) + len(moved)} + + +def _live_elements(app_name: Optional[str]) -> List[Any]: + from je_auto_control.utils.accessibility.accessibility_api import ( + list_accessibility_elements) + return list_accessibility_elements(app_name=app_name) + + +def snapshot_screen(app_name: Optional[str] = None) -> List[Dict[str, Any]]: + """Snapshot the live accessibility tree and cache it as the baseline.""" + snap = snapshot(_live_elements(app_name)) + _last_snapshot.clear() + _last_snapshot.extend(snap) + return snap + + +def screen_changed(app_name: Optional[str] = None) -> Dict[str, Any]: + """Diff the live screen against the last :func:`snapshot_screen` baseline.""" + before = list(_last_snapshot) + after = snapshot(_live_elements(app_name)) + _last_snapshot.clear() + _last_snapshot.extend(after) + return 
diff_snapshots(before, after) + + +def _is_interactive(role: str) -> bool: + lowered = role.lower() + return any(hint in lowered for hint in _INTERACTIVE_HINTS) + + +def describe_screen(elements: Optional[List[Any]] = None, + app_name: Optional[str] = None) -> Dict[str, Any]: + """Return a compact structured description of the screen. + + ``{app, element_count, by_role, controls}`` where ``controls`` lists the + labels of interactive elements — a cheap "where am I" for an agent. + """ + items = snapshot(elements if elements is not None + else _live_elements(app_name)) + by_role: Dict[str, int] = {} + controls: List[str] = [] + for item in items: + role = item["role"] or "(unknown)" + by_role[role] = by_role.get(role, 0) + 1 + if item["name"] and _is_interactive(role): + controls.append(item["name"]) + return {"app": app_name or "", "element_count": len(items), + "by_role": by_role, "controls": controls} diff --git a/test/unit_test/headless/test_screen_state_batch.py b/test/unit_test/headless/test_screen_state_batch.py new file mode 100644 index 00000000..4149f73d --- /dev/null +++ b/test/unit_test/headless/test_screen_state_batch.py @@ -0,0 +1,71 @@ +"""Headless tests for semantic screen state: snapshot/diff and describe. 
+Pure stdlib; diffs/describe run on supplied elements (no live desktop).""" +import je_auto_control as ac +from je_auto_control.utils.screen_state import ( + describe_screen, diff_snapshots, snapshot) + + +def test_snapshot_normalizes(): + snap = snapshot([{"role": "button", "name": "OK", "bbox": [0, 0, 10, 10]}]) + assert snap == [{"role": "button", "name": "OK", "bbox": [0, 0, 10, 10]}] + + +def test_diff_reports_appeared_vanished_moved(): + before = [{"role": "button", "name": "OK", "bbox": [0, 0, 10, 10]}, + {"role": "edit", "name": "user", "bbox": [0, 20, 50, 10]}] + after = [{"role": "button", "name": "OK", "bbox": [5, 5, 10, 10]}, + {"role": "window", "name": "Save", "bbox": [0, 0, 200, 100]}] + diff = diff_snapshots(before, after) + assert [a["name"] for a in diff["added"]] == ["Save"] + assert [r["name"] for r in diff["removed"]] == ["user"] + assert [m["name"] for m in diff["moved"]] == ["OK"] + assert diff["changed_count"] == 3 + assert any("appeared: window Save" in s for s in diff["summary"]) + + +def test_diff_no_change(): + snap = [{"role": "button", "name": "OK", "bbox": [0, 0, 10, 10]}] + diff = diff_snapshots(snap, snap) + assert diff["changed_count"] == 0 and diff["summary"] == [] + + +def test_describe_groups_and_lists_controls(): + elements = [ + {"role": "button", "name": "Save", "bbox": [0, 0, 10, 10]}, + {"role": "button", "name": "Cancel", "bbox": [0, 0, 10, 10]}, + {"role": "window", "name": "Dlg", "bbox": [0, 0, 10, 10]}, + ] + out = describe_screen(elements=elements, app_name="MyApp") + assert out["app"] == "MyApp" and out["element_count"] == 3 + assert out["by_role"]["button"] == 2 + assert set(out["controls"]) == {"Save", "Cancel"} # window not a control + + +# --- wiring --------------------------------------------------------------- + +def test_executor_wiring(): + rec = ac.execute_action([["AC_screen_diff", { + "before": [], "after": [{"role": "x", "name": "y", "bbox": []}]}]]) + assert any("appeared" in str(v) for v in 
rec.values()) + known = ac.executor.known_commands() + assert {"AC_screen_snapshot", "AC_screen_diff", "AC_screen_changed", + "AC_describe_screen"} <= known + + +def test_mcp_and_builder_wiring(): + from je_auto_control.utils.mcp_server.tools import ( + build_default_tool_registry) + names = {t.name for t in build_default_tool_registry()} + assert {"ac_screen_snapshot", "ac_screen_diff", "ac_screen_changed", + "ac_describe_screen"} <= names + from je_auto_control.gui.script_builder.command_schema import _build_specs + cmds = {s.command for s in _build_specs()} + assert {"AC_screen_snapshot", "AC_screen_diff", "AC_screen_changed", + "AC_describe_screen"} <= cmds + + +def test_facade_exports(): + for attr in ("snapshot", "diff_snapshots", "screen_changed", + "snapshot_screen", "describe_screen"): + assert hasattr(ac, attr) + assert attr in ac.__all__
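The `describe_screen` heuristic in `screen_state.py` flags a role as interactive when its lowercased name contains any entry of `_INTERACTIVE_HINTS` as a substring. The standalone sketch below copies that tuple and logic (the names `is_interactive` and `describe` are hypothetical, and it does not import `je_auto_control`) to show one side effect of plain substring matching. With role strings such as `StaticText` or `TableCell` (assumed here; real role names depend on the accessibility backend), static labels and table cells also land in `controls`, because they contain `text` and `tab`.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# Same hint tuple as _INTERACTIVE_HINTS in screen_state.py.
INTERACTIVE_HINTS = ("button", "edit", "text", "combo", "check", "radio",
                     "menu", "link", "tab", "list", "slider")


def is_interactive(role: str) -> bool:
    # Case-insensitive *substring* match: "tab" also matches "TableCell".
    lowered = role.lower()
    return any(hint in lowered for hint in INTERACTIVE_HINTS)


def describe(rows: List[Dict[str, Any]],
             app: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical stand-in for describe_screen on pre-normalized rows."""
    # Empty roles are bucketed as "(unknown)", as in the diff.
    roles = [row.get("role") or "(unknown)" for row in rows]
    controls = [row["name"] for row, role in zip(rows, roles)
                if row.get("name") and is_interactive(role)]
    return {"app": app or "", "element_count": len(rows),
            "by_role": dict(Counter(roles)), "controls": controls}


rows = [
    {"role": "button", "name": "Save"},
    {"role": "StaticText", "name": "Username:"},
    {"role": "TableCell", "name": "42"},
    {"role": "window", "name": "Dlg"},
]
print(describe(rows, app="MyApp")["controls"])
# ['Save', 'Username:', '42']
```

If that noise matters for an agent prompt, matching whole role tokens or an explicit allow-list of roles would be a stricter alternative; that is a suggestion, not what this PR implements.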