Integration-Automation · JE-Chen · Jun 19, 2026 · Jun 19, 2026
diff --git a/README.md b/README.md
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-19) — Semantic Screen State](#whats-new-2026-06-19--semantic-screen-state)
 - [What's new (2026-06-19) — Set-of-Marks Overlay](#whats-new-2026-06-19--set-of-marks-overlay)
 - [What's new (2026-06-19) — Checkpoint & Resume](#whats-new-2026-06-19--checkpoint--resume)
 - [What's new (2026-06-19) — i18n / l10n Testing](#whats-new-2026-06-19--i18n--l10n-testing)
@@ -75,6 +76,13 @@
 
 ---
 
+## What's new (2026-06-19) — Semantic Screen State
+
+The semantic companion to the pixel diff, full stack. Full reference: [`docs/source/Eng/doc/new_features/v23_features_doc.rst`](docs/source/Eng/doc/new_features/v23_features_doc.rst).
+
+- **Snapshot & diff** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed` (`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`, `ac_*`): normalize the a11y tree to `{role, name, bbox}` and report what **appeared / vanished / moved** with a human-readable summary — the feedback signal an agent needs to verify a step ("Save dialog appeared").
+- **Describe the screen** — `describe_screen` (`AC_describe_screen`, `ac_describe_screen`): a compact "where am I" — role counts + interactive control labels.
+
 ## What's new (2026-06-19) — Set-of-Marks Overlay
 
 The standard VLM-grounding format, full stack. Full reference: [`docs/source/Eng/doc/new_features/v22_features_doc.rst`](docs/source/Eng/doc/new_features/v22_features_doc.rst).
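
To make the appeared / vanished / moved vocabulary in the new Semantic Screen State entry concrete, here is a minimal sketch of a semantic diff keyed on `(role, name)`, the identity named in the v23 reference doc added by this change. It is not the code in `je_auto_control.utils.screen_state`, which this diff does not show. The name `sketch_diff`, the `[x, y, w, h]` bbox layout, the shape of each list entry, and the `vanished:` summary prefix are assumptions; only the top-level keys and the `appeared:` / `moved:` wording follow the documented output.

```python
from typing import Any, Dict, List, Tuple

Element = Dict[str, Any]      # assumed shape: {"role": str, "name": str, "bbox": [x, y, w, h]}
Identity = Tuple[str, str]    # (role, name)


def _index(snapshot: List[Element]) -> Dict[Identity, Element]:
    """Key elements by (role, name); a later duplicate overwrites an earlier one."""
    return {(el["role"], el["name"]): el for el in snapshot}


def sketch_diff(before: List[Element], after: List[Element]) -> Dict[str, Any]:
    """Hypothetical stand-in for a semantic diff; not the library's implementation."""
    old, new = _index(before), _index(after)
    added = [new[k] for k in new if k not in old]
    removed = [old[k] for k in old if k not in new]
    # Same identity on both sides but a different bounding box counts as "moved".
    moved = [new[k] for k in new if k in old and new[k]["bbox"] != old[k]["bbox"]]
    summary = (
        [f"appeared: {el['role']} {el['name']}" for el in added]
        + [f"vanished: {el['role']} {el['name']}" for el in removed]
        + [f"moved: {el['role']} {el['name']}" for el in moved]
    )
    return {
        "added": added,
        "removed": removed,
        "moved": moved,
        "summary": summary,
        "changed_count": len(added) + len(removed) + len(moved),
    }


before = [
    {"role": "button", "name": "OK", "bbox": [10, 10, 80, 24]},
    {"role": "button", "name": "Cancel", "bbox": [100, 10, 80, 24]},
]
after = [
    {"role": "button", "name": "OK", "bbox": [10, 40, 80, 24]},     # same identity, new bbox
    {"role": "window", "name": "Save", "bbox": [0, 0, 400, 300]},   # new identity
]
delta = sketch_diff(before, after)
print(delta["summary"])
```

One consequence of this identity choice, at least in the sketch: two elements that share a role and a name collapse into a single entry, so duplicates such as repeated "OK" buttons cannot be told apart.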

diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
@@ -12,6 +12,7 @@
 
 ## 目录
 
+- [本次更新 (2026-06-19) — 语义屏幕状态](#本次更新-2026-06-19--语义屏幕状态)
 - [本次更新 (2026-06-19) — Set-of-Marks 叠图](#本次更新-2026-06-19--set-of-marks-叠图)
 - [本次更新 (2026-06-19) — 检查点与续跑](#本次更新-2026-06-19--检查点与续跑)
 - [本次更新 (2026-06-19) — i18n / l10n 测试](#本次更新-2026-06-19--i18n--l10n-测试)
@@ -74,6 +75,13 @@
 
 ---
 
+## 本次更新 (2026-06-19) — 语义屏幕状态
+
+像素差异的语义对应物,走完整五层。完整参考:[`docs/source/Zh/doc/new_features/v23_features_doc.rst`](../docs/source/Zh/doc/new_features/v23_features_doc.rst)。
+
+- **快照与差异** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed`(`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`、`ac_*`):把 a11y 树规范化为 `{role, name, bbox}`,报告**出现 / 消失 / 移动**并附人类可读摘要——agent 验证某步效果所需的反馈信号(「Save 对话框出现了」)。
+- **描述屏幕** — `describe_screen`(`AC_describe_screen`、`ac_describe_screen`):廉价的「我在哪」——各 role 计数 + 交互控件标签。
+
 ## 本次更新 (2026-06-19) — Set-of-Marks 叠图
 
 VLM 定位的标准格式,走完整五层。完整参考:[`docs/source/Zh/doc/new_features/v22_features_doc.rst`](../docs/source/Zh/doc/new_features/v22_features_doc.rst)。

diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
@@ -12,6 +12,7 @@
 
 ## 目錄
 
+- [本次更新 (2026-06-19) — 語意螢幕狀態](#本次更新-2026-06-19--語意螢幕狀態)
 - [本次更新 (2026-06-19) — Set-of-Marks 疊圖](#本次更新-2026-06-19--set-of-marks-疊圖)
 - [本次更新 (2026-06-19) — 檢查點與續跑](#本次更新-2026-06-19--檢查點與續跑)
 - [本次更新 (2026-06-19) — i18n / l10n 測試](#本次更新-2026-06-19--i18n--l10n-測試)
@@ -74,6 +75,13 @@
 
 ---
 
+## 本次更新 (2026-06-19) — 語意螢幕狀態
+
+像素差異的語意對應物,走完整五層。完整參考:[`docs/source/Zh/doc/new_features/v23_features_doc.rst`](../docs/source/Zh/doc/new_features/v23_features_doc.rst)。
+
+- **快照與差異** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed`(`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`、`ac_*`):把 a11y 樹正規化為 `{role, name, bbox}`,回報**出現 / 消失 / 移動**並附人類可讀摘要——agent 驗證某步效果所需的回饋訊號(「Save 對話框出現了」)。
+- **描述螢幕** — `describe_screen`(`AC_describe_screen`、`ac_describe_screen`):廉價的「我在哪」——各 role 計數 + 互動控制項標籤。
+
 ## 本次更新 (2026-06-19) — Set-of-Marks 疊圖
 
 VLM 定位的標準格式,走完整五層。完整參考:[`docs/source/Zh/doc/new_features/v22_features_doc.rst`](../docs/source/Zh/doc/new_features/v22_features_doc.rst)。

diff --git a/docs/source/Eng/doc/new_features/v23_features_doc.rst b/docs/source/Eng/doc/new_features/v23_features_doc.rst
@@ -0,0 +1,47 @@
+==================================================
+New Features (2026-06-19) — Semantic Screen State
+==================================================
+
+The *semantic* companion to the existing pixel (visual-regression) diff:
+snapshot the accessibility tree, diff two snapshots into what **appeared /
+vanished / moved**, and get a compact structured **description** of the
+screen. This is the feedback signal an agent needs to verify a step's
+effect and orient itself. Standard library only; exposed through the Python API, ``AC_*`` executor commands, ``ac_*`` MCP tools, and the GUI script builder.
+
+.. contents::
+   :local:
+   :depth: 2
+
+
+Snapshot & diff
+===============
+
+::
+
+    from je_auto_control import snapshot, diff_snapshots, snapshot_screen, screen_changed
+
+    before = snapshot_screen()      # baseline from the live a11y tree
+    ...                              # perform a step
+    delta = screen_changed()         # diff vs the baseline
+    delta["summary"]                 # ["appeared: window Save", "moved: button OK"]
+
+``snapshot`` normalizes elements to ``[{role, name, bbox}]`` (identity =
+``(role, name)``); ``diff_snapshots(before, after)`` returns ``added`` /
+``removed`` / ``moved`` lists plus a human-readable ``summary`` and
+``changed_count``. ``snapshot_screen`` / ``screen_changed`` capture and diff
+the *live* tree (caching the baseline). Exposed as ``AC_screen_snapshot`` /
+``AC_screen_diff`` / ``AC_screen_changed``.
+
+
+Describe the screen
+===================
+
+::
+
+    from je_auto_control import describe_screen
+
+    describe_screen()    # {app, element_count, by_role: {...}, controls: [...]}
+
+A cheap "where am I" for an agent: counts per role and the labels of the
+interactive controls. Exposed as ``AC_describe_screen`` /
+``ac_describe_screen`` (and ``ac_screen_*`` for the diff family).
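
As an illustration of the description's shape, the sketch below builds the documented keys (`app`, `element_count`, `by_role`, `controls`) from an already-normalized snapshot. It is a guess at the idea rather than the library's code: `sketch_describe`, the `INTERACTIVE_ROLES` set, and the choice to report controls as plain name strings are all assumptions.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# Assumed set of roles treated as "interactive"; the real list is not shown in this change.
INTERACTIVE_ROLES = {"button", "checkbox", "combobox", "link", "menuitem", "textbox"}


def sketch_describe(snapshot: List[Dict[str, Any]],
                    app_name: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical summary of a [{role, name, bbox}] snapshot."""
    by_role = Counter(el["role"] for el in snapshot)
    # Keep only labelled elements whose role is in the assumed interactive set.
    controls = [
        el["name"] for el in snapshot
        if el["role"] in INTERACTIVE_ROLES and el["name"]
    ]
    return {
        "app": app_name,
        "element_count": len(snapshot),
        "by_role": dict(by_role),
        "controls": controls,
    }


snapshot = [
    {"role": "window", "name": "Editor", "bbox": [0, 0, 800, 600]},
    {"role": "button", "name": "Save", "bbox": [10, 10, 80, 24]},
    {"role": "button", "name": "Cancel", "bbox": [100, 10, 80, 24]},
    {"role": "text", "name": "Ready", "bbox": [10, 570, 200, 20]},
]
description = sketch_describe(snapshot, app_name="Notepad")
print(description)
```

The result stays small however large the tree is, since it carries one counter per role plus the labels of actionable controls.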
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
@@ -45,6 +45,7 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v20_features_doc
    doc/new_features/v21_features_doc
    doc/new_features/v22_features_doc
+   doc/new_features/v23_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

diff --git a/docs/source/Zh/doc/new_features/v23_features_doc.rst b/docs/source/Zh/doc/new_features/v23_features_doc.rst
@@ -0,0 +1,44 @@
+==========================================
+新功能 (2026-06-19) — 語意螢幕狀態
+==========================================
+
+既有像素(視覺回歸)差異的*語意*對應物:快照 accessibility 樹、把兩份
+快照差異成**出現 / 消失 / 移動**,並取得螢幕的精簡結構化**描述**。這是
+agent 驗證某步效果與自我定位所需的回饋訊號。純標準庫;走完整五層。
+
+.. contents::
+   :local:
+   :depth: 2
+
+
+快照與差異
+==========
+
+::
+
+    from je_auto_control import snapshot, diff_snapshots, snapshot_screen, screen_changed
+
+    before = snapshot_screen()      # 從即時 a11y 樹取基準
+    ...                              # 執行某個步驟
+    delta = screen_changed()         # 與基準比對
+    delta["summary"]                 # ["appeared: window Save", "moved: button OK"]
+
+``snapshot`` 把元素正規化為 ``[{role, name, bbox}]``(識別 =
+``(role, name)``);``diff_snapshots(before, after)`` 回傳 ``added`` /
+``removed`` / ``moved`` 清單,加上人類可讀的 ``summary`` 與
+``changed_count``。``snapshot_screen`` / ``screen_changed`` 擷取並比對*即時*
+樹(會快取基準)。對應 ``AC_screen_snapshot`` / ``AC_screen_diff`` /
+``AC_screen_changed``。
+
+
+描述螢幕
+========
+
+::
+
+    from je_auto_control import describe_screen
+
+    describe_screen()    # {app, element_count, by_role: {...}, controls: [...]}
+
+給 agent 的廉價「我在哪」:各 role 計數與互動控制項的標籤。對應
+``AC_describe_screen`` / ``ac_describe_screen``(差異家族則為 ``ac_screen_*``)。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
@@ -45,6 +45,7 @@ AutoControl 所有功能的完整使用指南。
    doc/new_features/v20_features_doc
    doc/new_features/v21_features_doc
    doc/new_features/v22_features_doc
+   doc/new_features/v23_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
@@ -169,6 +169,11 @@
 from je_auto_control.utils.set_of_marks import (
     mark_click, mark_elements, mark_screen, render_marks, resolve_mark,
 )
+# Semantic screen state (snapshot/diff + structured description)
+from je_auto_control.utils.screen_state import (
+    describe_screen, diff_snapshots, screen_changed, snapshot,
+    snapshot_screen,
+)
 # Background popup/interrupt watchdog (unattended automation)
 from je_auto_control.utils.watchdog import (
     PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -594,6 +599,8 @@ def start_autocontrol_gui(*args, **kwargs):
     "Checkpoint", "CheckpointStore", "run_resumable",
     "mark_click", "mark_elements", "mark_screen", "render_marks",
     "resolve_mark",
+    "describe_screen", "diff_snapshots", "screen_changed", "snapshot",
+    "snapshot_screen",
     # MCP server
     "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
     "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",

diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
@@ -577,7 +577,7 @@
        FieldSpec("automation_id", FieldType.STRING, optional=True),
    )
    specs.append(CommandSpec(
        "AC_control_get_value", "Native UI", "Get Control Value",
        fields=fields,
        description="Read a native control's value via the accessibility API.",
    ))
@@ -662,6 +662,30 @@
     _add_i18n_specs(specs)
     _add_checkpoint_specs(specs)
     _add_set_of_marks_specs(specs)
+    _add_screen_state_specs(specs)
+
+
+def _add_screen_state_specs(specs: List[CommandSpec]) -> None:
+    app = FieldSpec("app_name", FieldType.STRING, optional=True)
+    specs.append(CommandSpec(
+        "AC_screen_snapshot", "Native UI", "Screen: Snapshot Baseline",
+        fields=(app,),
+        description="Snapshot the a11y tree as a semantic-diff baseline.",
+    ))
+    specs.append(CommandSpec(
+        "AC_screen_diff", "Native UI", "Screen: Diff Snapshots",
+        description="Semantic diff of 'before'/'after' snapshots (JSON view).",
+    ))
+    specs.append(CommandSpec(
+        "AC_screen_changed", "Native UI", "Screen: What Changed",
+        fields=(app,),
+        description="Diff the live screen against the last snapshot baseline.",
+    ))
+    specs.append(CommandSpec(
+        "AC_describe_screen", "Native UI", "Screen: Describe",
+        fields=(app,),
+        description="Structured 'where am I' (role counts + control labels).",
+    ))
 
 
 def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None:
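
Each spec registered above names a command that a script step can invoke, and the executor change in this same diff maps those names to adapter functions. The toy dispatcher below sketches that name-to-callable lookup with stub adapters so it runs without `je_auto_control`. `run_actions`, `EVENT_TABLE`, the stubs, and the `[name, kwargs]` step shape are assumptions for illustration, not the `Executor` class itself.

```python
from typing import Any, Callable, Dict, List, Optional


def _stub_snapshot(app_name: Optional[str] = None) -> Dict[str, Any]:
    """Stand-in adapter: pretends to capture a baseline for the named app."""
    name = app_name or "desktop"
    return {"snapshot": [{"role": "window", "name": name, "bbox": [0, 0, 1, 1]}]}


def _stub_changed(app_name: Optional[str] = None) -> Dict[str, Any]:
    """Stand-in adapter: reports that nothing changed."""
    return {"added": [], "removed": [], "moved": [], "summary": [], "changed_count": 0}


# Hypothetical table mirroring the command-name keys added in this change.
EVENT_TABLE: Dict[str, Callable[..., Dict[str, Any]]] = {
    "AC_screen_snapshot": _stub_snapshot,
    "AC_screen_changed": _stub_changed,
}


def run_actions(actions: List[list]) -> List[Dict[str, Any]]:
    """Look up each step's command name and call it with the step's keyword arguments."""
    results = []
    for step in actions:
        name = step[0]
        kwargs = step[1] if len(step) > 1 else {}
        handler = EVENT_TABLE.get(name)
        if handler is None:
            raise KeyError(f"unknown command: {name}")
        results.append(handler(**kwargs))
    return results


results = run_actions([
    ["AC_screen_snapshot", {"app_name": "Notepad"}],
    ["AC_screen_changed"],                      # optional app_name omitted
])
print(results[1]["changed_count"])
```

Because `app_name` is optional in every spec that has it, a step may omit the argument dictionary entirely, which the sketch treats as an empty set of keyword arguments.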

diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
@@ -2756,6 +2756,31 @@ def _mark_click(mark_id: int) -> Dict[str, Any]:
     return {"clicked": mark_click(int(mark_id))}
 
 
+def _screen_snapshot(app_name: Optional[str] = None) -> Dict[str, Any]:
+    """Adapter: snapshot the live a11y tree as a diff baseline."""
+    from je_auto_control.utils.screen_state import snapshot_screen
+    return {"snapshot": snapshot_screen(app_name=app_name)}
+
+
+def _screen_diff(before: List[Dict[str, Any]],
+                 after: List[Dict[str, Any]]) -> Dict[str, Any]:
+    """Adapter: semantic diff between two snapshots."""
+    from je_auto_control.utils.screen_state import diff_snapshots
+    return diff_snapshots(before, after)
+
+
+def _screen_changed(app_name: Optional[str] = None) -> Dict[str, Any]:
+    """Adapter: diff the live screen against the last snapshot baseline."""
+    from je_auto_control.utils.screen_state import screen_changed
+    return screen_changed(app_name=app_name)
+
+
+def _describe_screen(app_name: Optional[str] = None) -> Dict[str, Any]:
+    """Adapter: structured 'where am I' description of the live screen."""
+    from je_auto_control.utils.screen_state import describe_screen
+    return describe_screen(app_name=app_name)
+
+
 class Executor:
     """
     Executor
@@ -2968,6 +2993,10 @@ def __init__(self):
             "AC_checkpoint_clear": _checkpoint_clear,
             "AC_mark_screen": _mark_screen,
             "AC_mark_click": _mark_click,
+            "AC_screen_snapshot": _screen_snapshot,
+            "AC_screen_diff": _screen_diff,
+            "AC_screen_changed": _screen_changed,
+            "AC_describe_screen": _describe_screen,
             "AC_a11y_record_start": _a11y_record_start,
             "AC_a11y_record_stop": _a11y_record_stop,
             "AC_a11y_record_events": _a11y_record_events,

diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -2329,6 +2329,47 @@ def set_of_marks_tools() -> List[MCPTool]:
     ]
 
 
+def screen_state_tools() -> List[MCPTool]:
+    _SNAP = {"type": "array", "items": {"type": "object"}}
+    return [
+        MCPTool(
+            name="ac_screen_snapshot",
+            description=("Snapshot the live accessibility tree to "
+                         "[{role, name, bbox}] and cache it as the diff "
+                         "baseline."),
+            input_schema=schema({"app_name": {"type": "string"}}),
+            handler=h.screen_snapshot,
+            annotations=SIDE_EFFECT_ONLY,
+        ),
+        MCPTool(
+            name="ac_screen_diff",
+            description=("Semantic diff between two snapshots: what appeared / "
+                         "vanished / moved, with a human-readable summary."),
+            input_schema=schema({"before": _SNAP, "after": _SNAP},
+                                required=["before", "after"]),
+            handler=h.screen_diff,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_screen_changed",
+            description=("Diff the live screen against the last "
+                         "ac_screen_snapshot baseline (agent feedback signal: "
+                         "'Save dialog appeared')."),
+            input_schema=schema({"app_name": {"type": "string"}}),
+            handler=h.screen_changed,
+            annotations=SIDE_EFFECT_ONLY,
+        ),
+        MCPTool(
+            name="ac_describe_screen",
+            description=("Compact 'where am I' description of the live screen: "
+                         "{app, element_count, by_role, controls}."),
+            input_schema=schema({"app_name": {"type": "string"}}),
+            handler=h.describe_screen,
+            annotations=READ_ONLY,
+        ),
+    ]
+
+
 def unattended_tools() -> List[MCPTool]:
     return [
         MCPTool(
@@ -3383,7 +3424,7 @@ def media_assert_tools() -> List[MCPTool]:
     skill_library_tools, guardrail_tools, a2a_tools, office_tools,
     agent_memory_tools, determinism_tools, observer_tools,
     sbom_tools, sharding_tools, data_quality_tools, i18n_tools,
-    checkpoint_tools, set_of_marks_tools,
+    checkpoint_tools, set_of_marks_tools, screen_state_tools,
     screen_record_tools,
     process_and_shell_tools, remote_desktop_tools, gamepad_tools,
     usb_passthrough_tools, assertion_tools, data_source_tools,
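
For a sense of what a client sends, the sketch below builds a Model Context Protocol `tools/call` request for `ac_screen_diff` and checks the two arguments that the schema above marks as required. The JSON-RPC envelope follows the MCP specification. `build_tools_call`, the request `id`, and the sample snapshots are made up for illustration, and how this server's transport frames the message is not shown in this change.

```python
import json
from typing import Any, Dict, List

REQUIRED_ARGS = ("before", "after")   # mirrors required=["before", "after"] in the tool schema


def build_tools_call(request_id: int,
                     before: List[Dict[str, Any]],
                     after: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: wrap ac_screen_diff arguments in a JSON-RPC 2.0 request."""
    arguments = {"before": before, "after": after}
    for key in REQUIRED_ARGS:
        # The schema declares both arguments as arrays of objects.
        if not isinstance(arguments[key], list):
            raise TypeError(f"{key} must be a list of objects")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ac_screen_diff", "arguments": arguments},
    }


request = build_tools_call(
    1,
    before=[{"role": "button", "name": "OK", "bbox": [10, 10, 80, 24]}],
    after=[{"role": "window", "name": "Save", "bbox": [0, 0, 400, 300]}],
)
payload = json.dumps(request)
print(payload)
```

The other three tools take only an optional `app_name` string, so their `arguments` object can be empty.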

diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1143,6 +1143,26 @@ def mark_click(mark_id):
     return {"clicked": _mc(int(mark_id))}
 
 
+def screen_snapshot(app_name=None):
+    from je_auto_control.utils.screen_state import snapshot_screen
+    return {"snapshot": snapshot_screen(app_name=app_name)}
+
+
+def screen_diff(before, after):
+    from je_auto_control.utils.screen_state import diff_snapshots
+    return diff_snapshots(before, after)
+
+
+def screen_changed(app_name=None):
+    from je_auto_control.utils.screen_state import screen_changed as _sc
+    return _sc(app_name=app_name)
+
+
+def describe_screen(app_name=None):
+    from je_auto_control.utils.screen_state import describe_screen as _ds
+    return _ds(app_name=app_name)
+
+
 def vlm_locate(description: str,
                screen_region: Optional[List[int]] = None,
                model: Optional[str] = None) -> Optional[List[int]]:

diff --git a/je_auto_control/utils/screen_state/__init__.py b/je_auto_control/utils/screen_state/__init__.py
@@ -0,0 +1,9 @@
+"""Semantic screen state: snapshot/diff and a structured screen description."""
+from je_auto_control.utils.screen_state.screen_state import (
+    describe_screen, diff_snapshots, screen_changed, snapshot, snapshot_screen,
+)
+
+__all__ = [
+    "describe_screen", "diff_snapshots", "screen_changed", "snapshot",
+    "snapshot_screen",
+]
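
The docs in this change say `snapshot_screen` caches a baseline that `screen_changed` later diffs against. One way to picture that pairing is the sketch below, which injects the capture and diff functions so it runs without a live accessibility tree. `BaselineTracker` and its helpers are hypothetical. Whether the real `screen_changed` refreshes the baseline after each call, and what it does when no baseline exists, is not visible in this diff; the sketch keeps the baseline and raises.

```python
from typing import Any, Callable, Dict, List, Optional

Snapshot = List[Dict[str, Any]]


class BaselineTracker:
    """Hypothetical holder for a cached baseline snapshot."""

    def __init__(self, capture: Callable[[], Snapshot],
                 diff: Callable[[Snapshot, Snapshot], Dict[str, Any]]) -> None:
        self._capture = capture
        self._diff = diff
        self._baseline: Optional[Snapshot] = None

    def snapshot_screen(self) -> Snapshot:
        """Capture the current state and remember it as the baseline."""
        self._baseline = self._capture()
        return self._baseline

    def screen_changed(self) -> Dict[str, Any]:
        """Diff a fresh capture against the remembered baseline (assumed behaviour)."""
        if self._baseline is None:
            raise RuntimeError("no baseline; call snapshot_screen() first")
        return self._diff(self._baseline, self._capture())


def added_only_diff(before: Snapshot, after: Snapshot) -> Dict[str, Any]:
    """Tiny diff used for the demo: report elements whose (role, name) is new."""
    known = {(el["role"], el["name"]) for el in before}
    return {"added": [el for el in after if (el["role"], el["name"]) not in known]}


ok_button = {"role": "button", "name": "OK", "bbox": [10, 10, 80, 24]}
save_window = {"role": "window", "name": "Save", "bbox": [0, 0, 400, 300]}
frames = iter([[ok_button], [ok_button, save_window]])   # fake screen: two successive states

tracker = BaselineTracker(lambda: next(frames), added_only_diff)
tracker.snapshot_screen()           # baseline: only the OK button
delta = tracker.screen_changed()    # second frame: the Save window is new
print(delta["added"])
```

Injecting the capture function also makes the snapshot-then-act-then-check loop testable without a desktop session.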