8 changes: 8 additions & 0 deletions README.md
@@ -13,6 +13,7 @@

## Table of Contents

- [What's new (2026-06-19) — Semantic Screen State](#whats-new-2026-06-19--semantic-screen-state)
- [What's new (2026-06-19) — Set-of-Marks Overlay](#whats-new-2026-06-19--set-of-marks-overlay)
- [What's new (2026-06-19) — Checkpoint & Resume](#whats-new-2026-06-19--checkpoint--resume)
- [What's new (2026-06-19) — i18n / l10n Testing](#whats-new-2026-06-19--i18n--l10n-testing)
@@ -75,6 +76,13 @@

---

## What's new (2026-06-19) — Semantic Screen State

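The semantic companion to the pixel (visual-regression) diff, exposed across the full stack: the Python API, `AC_*` executor commands, `ac_*` MCP tools, and the script-builder GUI. Full reference: [`docs/source/Eng/doc/new_features/v23_features_doc.rst`](docs/source/Eng/doc/new_features/v23_features_doc.rst).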

- **Snapshot & diff** — `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed` (`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`, `ac_*`): normalize the a11y tree to `{role, name, bbox}` and report what **appeared / vanished / moved** with a human-readable summary — the feedback signal an agent needs to verify a step ("Save dialog appeared").
- **Describe the screen** — `describe_screen` (`AC_describe_screen`, `ac_describe_screen`): a compact "where am I" — role counts + interactive control labels.
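
The verify-a-step loop described above can be sketched roughly as follows: take a baseline, run the step, then look for the expected change in the diff summary. `verify_step` and the three callables it accepts are hypothetical names introduced for illustration and are not part of the library; in real use `take_baseline` and `get_delta` would presumably be `snapshot_screen` and `screen_changed`. The summary string format is assumed from the documented example (`"appeared: window Save"`) and may differ in practice.

```python
from typing import Any, Callable, Dict


def verify_step(step: Callable[[], None],
                take_baseline: Callable[[], Any],
                get_delta: Callable[[], Dict[str, Any]],
                expected: str) -> Dict[str, Any]:
    """Run one step and report whether `expected` shows up in the diff summary."""
    take_baseline()                      # cache the "before" state
    step()                               # perform the UI action under test
    summary = list(get_delta().get("summary", []))
    return {"ok": expected in summary, "summary": summary}


# Stand-ins so the sketch runs without a live desktop (hypothetical data).
def fake_baseline() -> list:
    return []


def fake_step() -> None:
    pass


def fake_delta() -> Dict[str, Any]:
    return {"summary": ["appeared: window Save"], "changed_count": 1}


result = verify_step(fake_step, fake_baseline, fake_delta,
                     expected="appeared: window Save")
print(result["ok"])  # True
```

Passing the capture and diff functions in as parameters keeps the check testable without an accessibility backend; an agent loop could retry or escalate when `ok` is false.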

## What's new (2026-06-19) — Set-of-Marks Overlay

The standard VLM-grounding format, full stack. Full reference: [`docs/source/Eng/doc/new_features/v22_features_doc.rst`](docs/source/Eng/doc/new_features/v22_features_doc.rst).
8 changes: 8 additions & 0 deletions README/README_zh-CN.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-06-19): Semantic Screen State](#本次更新-2026-06-19--语义屏幕状态)
- [What's new (2026-06-19): Set-of-Marks Overlay](#本次更新-2026-06-19--set-of-marks-叠图)
- [What's new (2026-06-19): Checkpoint & Resume](#本次更新-2026-06-19--检查点与续跑)
- [What's new (2026-06-19): i18n / l10n Testing](#本次更新-2026-06-19--i18n--l10n-测试)
@@ -74,6 +75,13 @@

---

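## What's new (2026-06-19): Semantic Screen State

The semantic counterpart to the pixel diff, implemented across all five layers. Full reference: [`docs/source/Zh/doc/new_features/v23_features_doc.rst`](../docs/source/Zh/doc/new_features/v23_features_doc.rst).

- **Snapshot & diff**: `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed` (`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`, `ac_*`) normalize the a11y tree to `{role, name, bbox}` and report what **appeared / vanished / moved**, with a human-readable summary. This is the feedback signal an agent needs to verify the effect of a step ("Save dialog appeared").
- **Describe the screen**: `describe_screen` (`AC_describe_screen`, `ac_describe_screen`) gives a cheap "where am I": per-role counts plus the labels of interactive controls.

## What's new (2026-06-19): Set-of-Marks Overlay

The standard format for VLM grounding, implemented across all five layers. Full reference: [`docs/source/Zh/doc/new_features/v22_features_doc.rst`](../docs/source/Zh/doc/new_features/v22_features_doc.rst).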
8 changes: 8 additions & 0 deletions README/README_zh-TW.md
@@ -12,6 +12,7 @@

## Table of Contents

- [What's new (2026-06-19): Semantic Screen State](#本次更新-2026-06-19--語意螢幕狀態)
- [What's new (2026-06-19): Set-of-Marks Overlay](#本次更新-2026-06-19--set-of-marks-疊圖)
- [What's new (2026-06-19): Checkpoint & Resume](#本次更新-2026-06-19--檢查點與續跑)
- [What's new (2026-06-19): i18n / l10n Testing](#本次更新-2026-06-19--i18n--l10n-測試)
@@ -74,6 +75,13 @@

---

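## What's new (2026-06-19): Semantic Screen State

The semantic counterpart to the pixel diff, implemented across all five layers. Full reference: [`docs/source/Zh/doc/new_features/v23_features_doc.rst`](../docs/source/Zh/doc/new_features/v23_features_doc.rst).

- **Snapshot & diff**: `snapshot` / `diff_snapshots` / `snapshot_screen` / `screen_changed` (`AC_screen_snapshot` / `AC_screen_diff` / `AC_screen_changed`, `ac_*`) normalize the a11y tree to `{role, name, bbox}` and report what **appeared / vanished / moved**, with a human-readable summary. This is the feedback signal an agent needs to verify the effect of a step ("Save dialog appeared").
- **Describe the screen**: `describe_screen` (`AC_describe_screen`, `ac_describe_screen`) gives a cheap "where am I": per-role counts plus the labels of interactive controls.

## What's new (2026-06-19): Set-of-Marks Overlay

The standard format for VLM grounding, implemented across all five layers. Full reference: [`docs/source/Zh/doc/new_features/v22_features_doc.rst`](../docs/source/Zh/doc/new_features/v22_features_doc.rst).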
47 changes: 47 additions & 0 deletions docs/source/Eng/doc/new_features/v23_features_doc.rst
@@ -0,0 +1,47 @@
==================================================
New Features (2026-06-19) — Semantic Screen State
==================================================

The *semantic* companion to the existing pixel (visual-regression) diff:
snapshot the accessibility tree, diff two snapshots into what **appeared /
vanished / moved**, and get a compact structured **description** of the
screen. This is the feedback signal an agent needs to verify a step's
effect and orient itself. Pure standard library; full stack.

.. contents::
   :local:
   :depth: 2


Snapshot & diff
===============

::

    from je_auto_control import snapshot, diff_snapshots, snapshot_screen, screen_changed

    before = snapshot_screen()  # baseline from the live a11y tree
    ...                         # perform a step
    delta = screen_changed()    # diff vs the baseline
    delta["summary"]            # ["appeared: window Save", "moved: button OK"]

``snapshot`` normalizes elements to ``[{role, name, bbox}]`` (identity =
``(role, name)``); ``diff_snapshots(before, after)`` returns ``added`` /
``removed`` / ``moved`` lists plus a human-readable ``summary`` and
``changed_count``. ``snapshot_screen`` / ``screen_changed`` capture and diff
the *live* tree (caching the baseline). Exposed as ``AC_screen_snapshot`` /
``AC_screen_diff`` / ``AC_screen_changed``.
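
The diff semantics described above can be illustrated with a small standalone sketch. It is not the library's implementation: ``diff_sketch`` is a hypothetical name, the ``[x, y, w, h]`` bbox layout, the ``from`` / ``to`` keys on moved entries, and the ``vanished:`` summary prefix are all assumptions, and only the ``(role, name)`` identity and the returned key names come from the description above.

```python
from typing import Any, Dict, List, Tuple

Element = Dict[str, Any]   # {"role": str, "name": str, "bbox": [x, y, w, h]} (assumed layout)
Key = Tuple[str, str]      # identity = (role, name)


def _index(snap: List[Element]) -> Dict[Key, Element]:
    # If two elements share (role, name), the later one wins in this sketch.
    return {(e["role"], e["name"]): e for e in snap}


def diff_sketch(before: List[Element], after: List[Element]) -> Dict[str, Any]:
    """Illustrative semantic diff keyed on (role, name)."""
    old, new = _index(before), _index(after)
    added = [e for k, e in new.items() if k not in old]
    removed = [e for k, e in old.items() if k not in new]
    moved = [
        {"role": k[0], "name": k[1], "from": old[k]["bbox"], "to": e["bbox"]}
        for k, e in new.items()
        if k in old and old[k]["bbox"] != e["bbox"]
    ]
    summary = (
        [f"appeared: {e['role']} {e['name']}" for e in added]
        + [f"vanished: {e['role']} {e['name']}" for e in removed]
        + [f"moved: {m['role']} {m['name']}" for m in moved]
    )
    return {"added": added, "removed": removed, "moved": moved,
            "summary": summary, "changed_count": len(summary)}


before = [
    {"role": "window", "name": "Editor", "bbox": [0, 0, 800, 600]},
    {"role": "button", "name": "OK", "bbox": [10, 10, 80, 24]},
]
after = [
    {"role": "window", "name": "Editor", "bbox": [0, 0, 800, 600]},
    {"role": "button", "name": "OK", "bbox": [10, 40, 80, 24]},
    {"role": "window", "name": "Save", "bbox": [100, 100, 400, 300]},
]
delta = diff_sketch(before, after)
print(delta["summary"])  # ['appeared: window Save', 'moved: button OK']
```

Keying on ``(role, name)`` means a renamed control shows up as one removal plus one addition rather than as a move, which is a trade-off of this identity choice.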


Describe the screen
===================

::

    from je_auto_control import describe_screen

    describe_screen()  # {app, element_count, by_role: {...}, controls: [...]}

A cheap "where am I" for an agent: counts per role and the labels of the
interactive controls. Exposed as ``AC_describe_screen`` /
``ac_describe_screen`` (and ``ac_screen_*`` for the diff family).
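
A rough sketch of how such a description could be built from a snapshot list follows. ``describe_sketch`` and ``INTERACTIVE_ROLES`` are hypothetical names, the set of roles treated as interactive is an assumption, and representing ``controls`` as a list of label strings is a guess at the shape; only the four top-level keys come from the example above.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

# Assumption: this set of "interactive" roles is illustrative only.
INTERACTIVE_ROLES = frozenset(
    {"button", "checkbox", "combobox", "edit", "link", "menuitem"}
)


def describe_sketch(elements: List[Dict[str, Any]],
                    app: Optional[str] = None) -> Dict[str, Any]:
    """Summarize a snapshot as role counts plus interactive control labels."""
    by_role = Counter(e["role"] for e in elements)
    controls = [e["name"] for e in elements
                if e["role"] in INTERACTIVE_ROLES and e["name"]]
    return {"app": app, "element_count": len(elements),
            "by_role": dict(by_role), "controls": controls}


elements = [
    {"role": "window", "name": "Save As", "bbox": [100, 100, 400, 300]},
    {"role": "edit", "name": "File name", "bbox": [120, 140, 300, 24]},
    {"role": "button", "name": "Save", "bbox": [300, 360, 80, 24]},
    {"role": "button", "name": "Cancel", "bbox": [390, 360, 80, 24]},
]
info = describe_sketch(elements, app="Notepad")
print(info["by_role"])   # {'window': 1, 'edit': 1, 'button': 2}
print(info["controls"])  # ['File name', 'Save', 'Cancel']
```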
1 change: 1 addition & 0 deletions docs/source/Eng/eng_index.rst
@@ -45,6 +45,7 @@ Comprehensive guides for all AutoControl features.
doc/new_features/v20_features_doc
doc/new_features/v21_features_doc
doc/new_features/v22_features_doc
doc/new_features/v23_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
44 changes: 44 additions & 0 deletions docs/source/Zh/doc/new_features/v23_features_doc.rst
@@ -0,0 +1,44 @@
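==================================================
New Features (2026-06-19): Semantic Screen State
==================================================

The *semantic* counterpart to the existing pixel (visual-regression) diff:
snapshot the accessibility tree, diff two snapshots into what **appeared /
vanished / moved**, and get a compact structured **description** of the
screen. This is the feedback signal an agent needs to verify the effect of a
step and to orient itself. Pure standard library; implemented across all five
layers.

.. contents::
   :local:
   :depth: 2


Snapshot & diff
===============

::

    from je_auto_control import snapshot, diff_snapshots, snapshot_screen, screen_changed

    before = snapshot_screen()  # take the baseline from the live a11y tree
    ...                         # perform a step
    delta = screen_changed()    # compare against the baseline
    delta["summary"]            # ["appeared: window Save", "moved: button OK"]

``snapshot`` normalizes elements to ``[{role, name, bbox}]`` (identity =
``(role, name)``); ``diff_snapshots(before, after)`` returns ``added`` /
``removed`` / ``moved`` lists plus a human-readable ``summary`` and
``changed_count``. ``snapshot_screen`` / ``screen_changed`` capture and diff
the *live* tree (the baseline is cached). Exposed as ``AC_screen_snapshot`` /
``AC_screen_diff`` / ``AC_screen_changed``.


Describe the screen
===================

::

    from je_auto_control import describe_screen

    describe_screen()  # {app, element_count, by_role: {...}, controls: [...]}

A cheap "where am I" for an agent: per-role counts and the labels of the
interactive controls. Exposed as ``AC_describe_screen`` /
``ac_describe_screen`` (the diff family uses ``ac_screen_*``).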
1 change: 1 addition & 0 deletions docs/source/Zh/zh_index.rst
@@ -45,6 +45,7 @@ Complete usage guides for all AutoControl features.
doc/new_features/v20_features_doc
doc/new_features/v21_features_doc
doc/new_features/v22_features_doc
doc/new_features/v23_features_doc
doc/ocr_backends/ocr_backends_doc
doc/observability/observability_doc
doc/operations_layer/operations_layer_doc
7 changes: 7 additions & 0 deletions je_auto_control/__init__.py
@@ -169,6 +169,11 @@
from je_auto_control.utils.set_of_marks import (
    mark_click, mark_elements, mark_screen, render_marks, resolve_mark,
)
# Semantic screen state (snapshot/diff + structured description)
from je_auto_control.utils.screen_state import (
    describe_screen, diff_snapshots, screen_changed, snapshot,
    snapshot_screen,
)
# Background popup/interrupt watchdog (unattended automation)
from je_auto_control.utils.watchdog import (
    PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -594,6 +599,8 @@ def start_autocontrol_gui(*args, **kwargs):
    "Checkpoint", "CheckpointStore", "run_resumable",
    "mark_click", "mark_elements", "mark_screen", "render_marks",
    "resolve_mark",
    "describe_screen", "diff_snapshots", "screen_changed", "snapshot",
    "snapshot_screen",
    # MCP server
    "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
    "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
24 changes: 24 additions & 0 deletions je_auto_control/gui/script_builder/command_schema.py
@@ -577,7 +577,7 @@
        FieldSpec("automation_id", FieldType.STRING, optional=True),
    )
    specs.append(CommandSpec(
        "AC_control_get_value", "Native UI", "Get Control Value",

Check failure on line 580 in je_auto_control/gui/script_builder/command_schema.py (SonarQubeCloud / SonarCloud Code Analysis): Define a constant instead of duplicating this literal "Native UI" 17 times. See more on https://sonarcloud.io/project/issues?id=Integration-Automation_AutoControlGUI&issues=AZ7fI6sHk_nvyaeE7yqd&open=AZ7fI6sHk_nvyaeE7yqd&pullRequest=231

        fields=fields,
        description="Read a native control's value via the accessibility API.",
    ))
@@ -662,6 +662,30 @@
    _add_i18n_specs(specs)
    _add_checkpoint_specs(specs)
    _add_set_of_marks_specs(specs)
    _add_screen_state_specs(specs)


def _add_screen_state_specs(specs: List[CommandSpec]) -> None:
    app = FieldSpec("app_name", FieldType.STRING, optional=True)
    specs.append(CommandSpec(
        "AC_screen_snapshot", "Native UI", "Screen: Snapshot Baseline",
        fields=(app,),
        description="Snapshot the a11y tree as a semantic-diff baseline.",
    ))
    specs.append(CommandSpec(
        "AC_screen_diff", "Native UI", "Screen: Diff Snapshots",
        description="Semantic diff of 'before'/'after' snapshots (JSON view).",
    ))
    specs.append(CommandSpec(
        "AC_screen_changed", "Native UI", "Screen: What Changed",
        fields=(app,),
        description="Diff the live screen against the last snapshot baseline.",
    ))
    specs.append(CommandSpec(
        "AC_describe_screen", "Native UI", "Screen: Describe",
        fields=(app,),
        description="Structured 'where am I' (role counts + control labels).",
    ))


def _add_set_of_marks_specs(specs: List[CommandSpec]) -> None:
29 changes: 29 additions & 0 deletions je_auto_control/utils/executor/action_executor.py
@@ -2756,6 +2756,31 @@ def _mark_click(mark_id: int) -> Dict[str, Any]:
    return {"clicked": mark_click(int(mark_id))}


def _screen_snapshot(app_name: Optional[str] = None) -> Dict[str, Any]:
    """Adapter: snapshot the live a11y tree as a diff baseline."""
    from je_auto_control.utils.screen_state import snapshot_screen
    return {"snapshot": snapshot_screen(app_name=app_name)}


def _screen_diff(before: List[Dict[str, Any]],
                 after: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Adapter: semantic diff between two snapshots."""
    from je_auto_control.utils.screen_state import diff_snapshots
    return diff_snapshots(before, after)


def _screen_changed(app_name: Optional[str] = None) -> Dict[str, Any]:
    """Adapter: diff the live screen against the last snapshot baseline."""
    from je_auto_control.utils.screen_state import screen_changed
    return screen_changed(app_name=app_name)


def _describe_screen(app_name: Optional[str] = None) -> Dict[str, Any]:
    """Adapter: structured 'where am I' description of the live screen."""
    from je_auto_control.utils.screen_state import describe_screen
    return describe_screen(app_name=app_name)


class Executor:
    """
    Executor
@@ -2968,6 +2993,10 @@ def __init__(self):
            "AC_checkpoint_clear": _checkpoint_clear,
            "AC_mark_screen": _mark_screen,
            "AC_mark_click": _mark_click,
            "AC_screen_snapshot": _screen_snapshot,
            "AC_screen_diff": _screen_diff,
            "AC_screen_changed": _screen_changed,
            "AC_describe_screen": _describe_screen,
            "AC_a11y_record_start": _a11y_record_start,
            "AC_a11y_record_stop": _a11y_record_stop,
            "AC_a11y_record_events": _a11y_record_events,
43 changes: 42 additions & 1 deletion je_auto_control/utils/mcp_server/tools/_factories.py
@@ -2329,6 +2329,47 @@ def set_of_marks_tools() -> List[MCPTool]:
    ]


def screen_state_tools() -> List[MCPTool]:
    _SNAP = {"type": "array", "items": {"type": "object"}}
    return [
        MCPTool(
            name="ac_screen_snapshot",
            description=("Snapshot the live accessibility tree to "
                         "[{role, name, bbox}] and cache it as the diff "
                         "baseline."),
            input_schema=schema({"app_name": {"type": "string"}}),
            handler=h.screen_snapshot,
            annotations=SIDE_EFFECT_ONLY,
        ),
        MCPTool(
            name="ac_screen_diff",
            description=("Semantic diff between two snapshots: what appeared / "
                         "vanished / moved, with a human-readable summary."),
            input_schema=schema({"before": _SNAP, "after": _SNAP},
                                required=["before", "after"]),
            handler=h.screen_diff,
            annotations=READ_ONLY,
        ),
        MCPTool(
            name="ac_screen_changed",
            description=("Diff the live screen against the last "
                         "ac_screen_snapshot baseline (agent feedback signal: "
                         "'Save dialog appeared')."),
            input_schema=schema({"app_name": {"type": "string"}}),
            handler=h.screen_changed,
            annotations=SIDE_EFFECT_ONLY,
        ),
        MCPTool(
            name="ac_describe_screen",
            description=("Compact 'where am I' description of the live screen: "
                         "{app, element_count, by_role, controls}."),
            input_schema=schema({"app_name": {"type": "string"}}),
            handler=h.describe_screen,
            annotations=READ_ONLY,
        ),
    ]


def unattended_tools() -> List[MCPTool]:
    return [
        MCPTool(
@@ -3383,7 +3424,7 @@ def media_assert_tools() -> List[MCPTool]:
     skill_library_tools, guardrail_tools, a2a_tools, office_tools,
     agent_memory_tools, determinism_tools, observer_tools,
     sbom_tools, sharding_tools, data_quality_tools, i18n_tools,
-    checkpoint_tools, set_of_marks_tools,
+    checkpoint_tools, set_of_marks_tools, screen_state_tools,
     screen_record_tools,
     process_and_shell_tools, remote_desktop_tools, gamepad_tools,
     usb_passthrough_tools, assertion_tools, data_source_tools,
20 changes: 20 additions & 0 deletions je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1143,6 +1143,26 @@ def mark_click(mark_id):
    return {"clicked": _mc(int(mark_id))}


def screen_snapshot(app_name=None):
    from je_auto_control.utils.screen_state import snapshot_screen
    return {"snapshot": snapshot_screen(app_name=app_name)}


def screen_diff(before, after):
    from je_auto_control.utils.screen_state import diff_snapshots
    return diff_snapshots(before, after)


def screen_changed(app_name=None):
    from je_auto_control.utils.screen_state import screen_changed as _sc
    return _sc(app_name=app_name)


def describe_screen(app_name=None):
    from je_auto_control.utils.screen_state import describe_screen as _ds
    return _ds(app_name=app_name)


def vlm_locate(description: str,
               screen_region: Optional[List[int]] = None,
               model: Optional[str] = None) -> Optional[List[int]]:
9 changes: 9 additions & 0 deletions je_auto_control/utils/screen_state/__init__.py
@@ -0,0 +1,9 @@
"""Semantic screen state: snapshot/diff and a structured screen description."""
from je_auto_control.utils.screen_state.screen_state import (
    describe_screen, diff_snapshots, screen_changed, snapshot, snapshot_screen,
)

__all__ = [
    "describe_screen", "diff_snapshots", "screen_changed", "snapshot",
    "snapshot_screen",
]