diff --git a/README.md b/README.md
index 3b82389f..e91e379d 100644
--- a/README.md
+++ b/README.md
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-20) — Coordinate-Space Mapping (Model Grid ⇄ Physical Pixels)](#whats-new-2026-06-20--coordinate-space-mapping-model-grid--physical-pixels)
 - [What's new (2026-06-20) — Voice-Command Router](#whats-new-2026-06-20--voice-command-router)
 - [What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing](#whats-new-2026-06-20--locale-aware-number-currency--date-parsing)
 - [What's new (2026-06-20) — Perceptual-Hash Image Dedupe](#whats-new-2026-06-20--perceptual-hash-image-dedupe)
@@ -97,6 +98,12 @@
 
 ---
 
+## What's new (2026-06-20) — Coordinate-Space Mapping (Model Grid ⇄ Physical Pixels)
+
+Translate computer-use model clicks to real pixels. Full reference: [`docs/source/Eng/doc/new_features/v45_features_doc.rst`](docs/source/Eng/doc/new_features/v45_features_doc.rst).
+
+- **`CoordinateSpace` / `xga_space` / `normalized_space` / `downscale_png`** (`AC_to_physical` / `AC_to_model`, `ac_*`): computer-use/VLA models click in a fixed grid (Anthropic downscales to XGA; Gemini returns a 1000×1000 grid), not physical pixels. This maps both ways (round + clamp), `xga_space` aspect-preserves without upscaling, and `downscale_png` resizes a screenshot to the model's input size (Pillow, already core). Pure-arithmetic mapping — unit-tested without a model/GPU.
+
 ## What's new (2026-06-20) — Voice-Command Router
 
 Trigger flows hands-free from recognized speech. Full reference: [`docs/source/Eng/doc/new_features/v44_features_doc.rst`](docs/source/Eng/doc/new_features/v44_features_doc.rst).
diff --git a/README/README_zh-CN.md b/README/README_zh-CN.md
index 7696d8af..cc053d9f 100644
--- a/README/README_zh-CN.md
+++ b/README/README_zh-CN.md
@@ -12,6 +12,7 @@
 
 ## 目录
 
+- [本次更新 (2026-06-20) — 坐标空间映射(模型网格 ⇄ 物理像素)](#本次更新-2026-06-20--坐标空间映射模型网格--物理像素)
 - [本次更新 (2026-06-20) — 语音指令路由器](#本次更新-2026-06-20--语音指令路由器)
 - [本次更新 (2026-06-20) — 区域设置感知的数字、货币与日期解析](#本次更新-2026-06-20--区域设置感知的数字货币与日期解析)
 - [本次更新 (2026-06-20) — 感知哈希图像去重](#本次更新-2026-06-20--感知哈希图像去重)
@@ -96,6 +97,12 @@
 
 ---
 
+## 本次更新 (2026-06-20) — 坐标空间映射(模型网格 ⇄ 物理像素)
+
+将电脑操作模型的点击转成物理像素。完整参考:[`docs/source/Zh/doc/new_features/v45_features_doc.rst`](../docs/source/Zh/doc/new_features/v45_features_doc.rst)。
+
+- **`CoordinateSpace` / `xga_space` / `normalized_space` / `downscale_png`**(`AC_to_physical` / `AC_to_model`、`ac_*`):电脑操作/VLA 模型以固定网格点击(Anthropic 缩小到 XGA;Gemini 返回 1000×1000 网格),而非物理像素。本功能双向映射(四舍五入 + 夹限),`xga_space` 保持长宽比且不放大,`downscale_png` 将截图缩到模型输入尺寸(Pillow,已是核心)。纯算术映射 —— 无需模型/GPU 即可单元测试。
+
 ## 本次更新 (2026-06-20) — 语音指令路由器
 
 以已识别语音免手动触发流程。完整参考:[`docs/source/Zh/doc/new_features/v44_features_doc.rst`](../docs/source/Zh/doc/new_features/v44_features_doc.rst)。
diff --git a/README/README_zh-TW.md b/README/README_zh-TW.md
index 7eadc21f..c4578c07 100644
--- a/README/README_zh-TW.md
+++ b/README/README_zh-TW.md
@@ -12,6 +12,7 @@
 
 ## 目錄
 
+- [本次更新 (2026-06-20) — 座標空間對映(模型網格 ⇄ 實體像素)](#本次更新-2026-06-20--座標空間對映模型網格--實體像素)
 - [本次更新 (2026-06-20) — 語音指令路由器](#本次更新-2026-06-20--語音指令路由器)
 - [本次更新 (2026-06-20) — 區域設定感知的數字、貨幣與日期解析](#本次更新-2026-06-20--區域設定感知的數字貨幣與日期解析)
 - [本次更新 (2026-06-20) — 感知雜湊影像去重](#本次更新-2026-06-20--感知雜湊影像去重)
@@ -96,6 +97,12 @@
 
 ---
 
+## 本次更新 (2026-06-20) — 座標空間對映(模型網格 ⇄ 實體像素)
+
+將電腦操作模型的點擊轉成真實像素。完整參考:[`docs/source/Zh/doc/new_features/v45_features_doc.rst`](../docs/source/Zh/doc/new_features/v45_features_doc.rst)。
+
+- **`CoordinateSpace` / `xga_space` / `normalized_space` / `downscale_png`**(`AC_to_physical` / `AC_to_model`、`ac_*`):電腦操作/VLA 模型以固定網格點擊(Anthropic 縮小到 XGA;Gemini 回傳 1000×1000 網格),而非實體像素。本功能雙向對映(四捨五入 + 夾限),`xga_space` 保持長寬比且不放大,`downscale_png` 將截圖縮到模型輸入尺寸(Pillow,已是核心)。純算術對映 —— 無需模型/GPU 即可單元測試。
+
 ## 本次更新 (2026-06-20) — 語音指令路由器
 
 以已辨識語音免手動觸發流程。完整參考:[`docs/source/Zh/doc/new_features/v44_features_doc.rst`](../docs/source/Zh/doc/new_features/v44_features_doc.rst)。
diff --git a/docs/source/Eng/doc/new_features/v45_features_doc.rst b/docs/source/Eng/doc/new_features/v45_features_doc.rst
new file mode 100644
index 00000000..729d20ea
--- /dev/null
+++ b/docs/source/Eng/doc/new_features/v45_features_doc.rst
@@ -0,0 +1,45 @@
+Coordinate-Space Mapping (Model Grid ⇄ Physical Pixels)
+=======================================================
+
+Computer-use / VLA models do not click in physical pixels. Anthropic recommends
+downscaling the screenshot to XGA (~1024×768) and mapping clicks back; Gemini's
+computer-use model returns a normalized **1000×1000** grid; others assume the
+display size you declared. ``CoordinateSpace`` captures the physical resolution
+and the model's grid and converts both ways, so an agent loop can feed the model
+a right-sized screenshot and translate its clicks back to real coordinates.
+
+The mapping is pure arithmetic (no dependency); :func:`downscale_png` uses Pillow
+(already a core dependency). Imports no ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        CoordinateSpace, xga_space, normalized_space, downscale_png)
+
+    space = normalized_space(1920, 1080, grid=1000)  # Gemini-style 1000x1000
+    space.to_physical(500, 500)  # -> (960, 540) model click -> real pixel
+    space.to_model(960, 540)  # -> (500, 500) real pixel -> model grid
+
+    xga = xga_space(2560, 1440)  # Anthropic-style downscale, aspect-preserved
+    small_png = downscale_png(screenshot_png, xga)  # send this to the model
+
+``xga_space`` preserves aspect ratio and never upscales; ``normalized_space``
+builds a square grid. Both ``to_physical`` / ``to_model`` round and clamp to valid
+pixel/grid bounds.
+
+Executor commands
+-----------------
+
+================================ ===================================================
+Command                          Effect
+================================ ===================================================
+``AC_to_physical``               Map a model-grid ``(x, y)`` to physical pixels.
+``AC_to_model``                  Map physical pixels to a model grid (inverse).
+================================ ===================================================
+
+Both take ``x, y, physical_w, physical_h, model_w, model_h`` and return
+``{x, y}``. The same operations are exposed as MCP tools (``ac_to_physical`` /
+``ac_to_model``) and as Script Builder commands under **Agent**.
diff --git a/docs/source/Eng/eng_index.rst b/docs/source/Eng/eng_index.rst
index df05947f..35514b17 100644
--- a/docs/source/Eng/eng_index.rst
+++ b/docs/source/Eng/eng_index.rst
@@ -67,6 +67,7 @@ Comprehensive guides for all AutoControl features.
     doc/new_features/v42_features_doc
     doc/new_features/v43_features_doc
     doc/new_features/v44_features_doc
+    doc/new_features/v45_features_doc
     doc/ocr_backends/ocr_backends_doc
     doc/observability/observability_doc
     doc/operations_layer/operations_layer_doc
diff --git a/docs/source/Zh/doc/new_features/v45_features_doc.rst b/docs/source/Zh/doc/new_features/v45_features_doc.rst
new file mode 100644
index 00000000..e90ddb7e
--- /dev/null
+++ b/docs/source/Zh/doc/new_features/v45_features_doc.rst
@@ -0,0 +1,42 @@
+座標空間對映(模型網格 ⇄ 實體像素)
+====================================
+
+電腦操作 / VLA 模型並不是以實體像素點擊。Anthropic 建議將螢幕截圖縮小到 XGA
+(~1024×768)再把點擊映射回去;Gemini 的電腦操作模型回傳正規化的 **1000×1000** 網格;
+其他模型則假設你宣告的顯示尺寸。``CoordinateSpace`` 捕捉實體解析度與模型網格並雙向轉
+換,因此 agent loop 可餵給模型一張尺寸正確的截圖,並把它的點擊轉回真實座標。
+
+對映為純算術(無相依);:func:`downscale_png` 使用 Pillow(已是核心相依)。不匯入
+``PySide6``。
+
+無頭 API
+--------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        CoordinateSpace, xga_space, normalized_space, downscale_png)
+
+    space = normalized_space(1920, 1080, grid=1000)  # Gemini 式 1000x1000
+    space.to_physical(500, 500)  # -> (960, 540) 模型點擊 -> 真實像素
+    space.to_model(960, 540)  # -> (500, 500) 真實像素 -> 模型網格
+
+    xga = xga_space(2560, 1440)  # Anthropic 式縮小,保持長寬比
+    small_png = downscale_png(screenshot_png, xga)  # 把這張送給模型
+
+``xga_space`` 會保持長寬比且永不放大;``normalized_space`` 建立方形網格。
+``to_physical`` / ``to_model`` 皆會四捨五入並夾限到有效的像素/網格範圍內。
+
+執行器指令
+----------
+
+================================ ===================================================
+指令                               效果
+================================ ===================================================
+``AC_to_physical``               將模型網格 ``(x, y)`` 對映到實體像素。
+``AC_to_model``                  將實體像素對映到模型網格(反向)。
+================================ ===================================================
+
+兩者皆接受 ``x, y, physical_w, physical_h, model_w, model_h`` 並回傳 ``{x, y}``。相同操
+作亦提供為 MCP 工具(``ac_to_physical`` / ``ac_to_model``),以及 Script Builder 中
+**Agent** 分類下的指令。
diff --git a/docs/source/Zh/zh_index.rst b/docs/source/Zh/zh_index.rst
index a4d71f19..668fdc02 100644
--- a/docs/source/Zh/zh_index.rst
+++ b/docs/source/Zh/zh_index.rst
@@ -67,6 +67,7 @@ AutoControl 所有功能的完整使用指南。
     doc/new_features/v42_features_doc
     doc/new_features/v43_features_doc
     doc/new_features/v44_features_doc
+    doc/new_features/v45_features_doc
     doc/ocr_backends/ocr_backends_doc
     doc/observability/observability_doc
     doc/operations_layer/operations_layer_doc
diff --git a/je_auto_control/__init__.py b/je_auto_control/__init__.py
index 3dec62a8..9a565936 100644
--- a/je_auto_control/__init__.py
+++ b/je_auto_control/__init__.py
@@ -251,6 +251,10 @@
 from je_auto_control.utils.voice import (
     VoiceCommand, VoiceRouter, default_voice_router,
 )
+# Coordinate-space mapping (model grid <-> physical pixels)
+from je_auto_control.utils.coordinate_space import (
+    CoordinateSpace, downscale_png, normalized_space, xga_space,
+)
 # Background popup/interrupt watchdog (unattended automation)
 from je_auto_control.utils.watchdog import (
     PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -705,6 +709,7 @@ def start_autocontrol_gui(*args, **kwargs):
     "format_currency", "format_date", "format_decimal",
     "parse_decimal", "parse_number",
     "VoiceCommand", "VoiceRouter", "default_voice_router",
+    "CoordinateSpace", "downscale_png", "normalized_space", "xga_space",
     # MCP server
     "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
     "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
diff --git a/je_auto_control/gui/script_builder/command_schema.py b/je_auto_control/gui/script_builder/command_schema.py
index 96a8a478..8d70be51 100644
--- a/je_auto_control/gui/script_builder/command_schema.py
+++ b/je_auto_control/gui/script_builder/command_schema.py
@@ -1019,6 +1019,28 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
         fields=(),
         description="Remove all registered voice commands.",
     ))
+    specs.append(CommandSpec(
+        "AC_to_physical", "Agent", "Coords: Model -> Physical",
+        fields=(
+            FieldSpec("x", FieldType.FLOAT), FieldSpec("y", FieldType.FLOAT),
+            FieldSpec("physical_w", FieldType.INT),
+            FieldSpec("physical_h", FieldType.INT),
+            FieldSpec("model_w", FieldType.INT),
+            FieldSpec("model_h", FieldType.INT),
+        ),
+        description="Map a model-grid coordinate to physical pixels.",
+    ))
+    specs.append(CommandSpec(
+        "AC_to_model", "Agent", "Coords: Physical -> Model",
+        fields=(
+            FieldSpec("x", FieldType.INT), FieldSpec("y", FieldType.INT),
+            FieldSpec("physical_w", FieldType.INT),
+            FieldSpec("physical_h", FieldType.INT),
+            FieldSpec("model_w", FieldType.INT),
+            FieldSpec("model_h", FieldType.INT),
+        ),
+        description="Map a physical-pixel coordinate to a model grid.",
+    ))
     specs.append(CommandSpec(
         "AC_generate_sop", "Report", "Generate SOP Document",
         fields=(
diff --git a/je_auto_control/utils/coordinate_space/__init__.py b/je_auto_control/utils/coordinate_space/__init__.py
new file mode 100644
index 00000000..17d35eca
--- /dev/null
+++ b/je_auto_control/utils/coordinate_space/__init__.py
@@ -0,0 +1,8 @@
+"""Coordinate-space mapping between model grids and physical pixels."""
+from je_auto_control.utils.coordinate_space.coordinate_space import (
+    CoordinateSpace, downscale_png, normalized_space, xga_space,
+)
+
+__all__ = [
+    "CoordinateSpace", "downscale_png", "normalized_space", "xga_space",
+]
diff --git a/je_auto_control/utils/coordinate_space/coordinate_space.py b/je_auto_control/utils/coordinate_space/coordinate_space.py
new file mode 100644
index 00000000..54bd21d5
--- /dev/null
+++ b/je_auto_control/utils/coordinate_space/coordinate_space.py
@@ -0,0 +1,76 @@
+"""Map coordinates between a model's grid and physical screen pixels.
+
+Computer-use / VLA models do not click in physical pixels: Anthropic recommends
+downscaling the screenshot to XGA (~1024x768) and mapping clicks back; Gemini
+computer-use returns a normalized **1000x1000** grid; others assume the declared
+display size. A :class:`CoordinateSpace` captures the physical resolution and the
+model's grid and converts both ways, so an agent loop can send the model a
+right-sized screenshot and translate its clicks back to real coordinates.
+
+Pure arithmetic for the mapping (no dependency); :func:`downscale_png` uses
+Pillow, which is already a core dependency. Imports no ``PySide6``.
+"""
+from dataclasses import dataclass
+from typing import Tuple
+
+
+@dataclass(frozen=True)
+class CoordinateSpace:
+    """A mapping between physical pixels and a model coordinate grid."""
+
+    physical_w: int
+    physical_h: int
+    model_w: int
+    model_h: int
+
+    def to_physical(self, x: float, y: float) -> Tuple[int, int]:
+        """Map a model-space ``(x, y)`` to physical pixels (clamped, rounded)."""
+        px = round(x * self.physical_w / self.model_w)
+        py = round(y * self.physical_h / self.model_h)
+        return (_clamp(px, self.physical_w), _clamp(py, self.physical_h))
+
+    def to_model(self, x: int, y: int) -> Tuple[int, int]:
+        """Map physical pixels ``(x, y)`` to model space (clamped, rounded)."""
+        mx = round(x * self.model_w / self.physical_w)
+        my = round(y * self.model_h / self.physical_h)
+        return (_clamp(mx, self.model_w), _clamp(my, self.model_h))
+
+    @property
+    def model_size(self) -> Tuple[int, int]:
+        """The model grid as ``(width, height)``."""
+        return (self.model_w, self.model_h)
+
+
+def _clamp(value: int, size: int) -> int:
+    return max(0, min(int(value), size - 1))
+
+
+def xga_space(physical_w: int, physical_h: int, *, max_w: int = 1024,
+              max_h: int = 768) -> CoordinateSpace:
+    """Build a space that fits the screen within ``max_w`` x ``max_h``.
+
+    The aspect ratio is preserved: the smaller scale factor (capped at 1.0) is
+    used for both axes, per the Anthropic "downscale to XGA" recommendation.
+    """
+    scale = min(max_w / physical_w, max_h / physical_h, 1.0)
+    model_w = max(1, round(physical_w * scale))
+    model_h = max(1, round(physical_h * scale))
+    return CoordinateSpace(physical_w, physical_h, model_w, model_h)
+
+
+def normalized_space(physical_w: int, physical_h: int, *,
+                     grid: int = 1000) -> CoordinateSpace:
+    """Build a square normalized grid (default 1000x1000, Gemini-style)."""
+    return CoordinateSpace(physical_w, physical_h, grid, grid)
+
+
+def downscale_png(png: bytes, space: CoordinateSpace) -> bytes:
+    """Resize a PNG screenshot to ``space``'s model size (for model input)."""
+    import io
+
+    from PIL import Image
+    with Image.open(io.BytesIO(png)) as image:
+        resized = image.convert("RGB").resize(space.model_size)
+    buffer = io.BytesIO()
+    resized.save(buffer, format="PNG")
+    return buffer.getvalue()
diff --git a/je_auto_control/utils/executor/action_executor.py b/je_auto_control/utils/executor/action_executor.py
index 442f8f2b..2a50cdab 100644
--- a/je_auto_control/utils/executor/action_executor.py
+++ b/je_auto_control/utils/executor/action_executor.py
@@ -3195,6 +3195,24 @@ def _voice_clear() -> Dict[str, Any]:
     return {"cleared": True}
 
 
+def _to_physical(x: float, y: float, physical_w: int, physical_h: int,
+                 model_w: int, model_h: int) -> Dict[str, Any]:
+    """Adapter: map a model-grid coordinate to physical pixels."""
+    from je_auto_control.utils.coordinate_space import CoordinateSpace
+    px, py = CoordinateSpace(physical_w, physical_h, model_w,
+                             model_h).to_physical(x, y)
+    return {"x": px, "y": py}
+
+
+def _to_model(x: int, y: int, physical_w: int, physical_h: int,
+              model_w: int, model_h: int) -> Dict[str, Any]:
+    """Adapter: map a physical-pixel coordinate to a model grid."""
+    from je_auto_control.utils.coordinate_space import CoordinateSpace
+    mx, my = CoordinateSpace(physical_w, physical_h, model_w,
+                             model_h).to_model(x, y)
+    return {"x": mx, "y": my}
+
+
 class Executor:
     """
     Executor
@@ -3465,6 +3483,8 @@ def __init__(self):
             "AC_voice_dispatch": _voice_dispatch,
             "AC_voice_list": _voice_list,
             "AC_voice_clear": _voice_clear,
+            "AC_to_physical": _to_physical,
+            "AC_to_model": _to_model,
             "AC_a11y_record_start": _a11y_record_start,
             "AC_a11y_record_stop": _a11y_record_stop,
             "AC_a11y_record_events": _a11y_record_events,
diff --git a/je_auto_control/utils/mcp_server/tools/_factories.py b/je_auto_control/utils/mcp_server/tools/_factories.py
index b91cdf87..64d15f8e 100644
--- a/je_auto_control/utils/mcp_server/tools/_factories.py
+++ b/je_auto_control/utils/mcp_server/tools/_factories.py
@@ -3049,6 +3049,33 @@ def voice_tools() -> List[MCPTool]:
     ]
 
 
+def coordinate_space_tools() -> List[MCPTool]:
+    _DIMS = {"x": {"type": "number"}, "y": {"type": "number"},
+             "physical_w": {"type": "integer"},
+             "physical_h": {"type": "integer"},
+             "model_w": {"type": "integer"}, "model_h": {"type": "integer"}}
+    _REQ = ["x", "y", "physical_w", "physical_h", "model_w", "model_h"]
+    return [
+        MCPTool(
+            name="ac_to_physical",
+            description=("Map a model-grid coordinate (e.g. a 1000x1000 or XGA "
+                         "click from a computer-use model) to physical screen "
+                         "pixels. Returns {x, y}."),
+            input_schema=schema(dict(_DIMS), list(_REQ)),
+            handler=h.to_physical,
+            annotations=READ_ONLY,
+        ),
+        MCPTool(
+            name="ac_to_model",
+            description=("Map a physical-pixel coordinate to a model grid "
+                         "(inverse of ac_to_physical). Returns {x, y}."),
+            input_schema=schema(dict(_DIMS), list(_REQ)),
+            handler=h.to_model,
+            annotations=READ_ONLY,
+        ),
+    ]
+
+
 def unattended_tools() -> List[MCPTool]:
     return [
         MCPTool(
@@ -4110,7 +4137,7 @@ def media_assert_tools() -> List[MCPTool]:
     credential_lease_tools, egress_tools, approval_testing_tools,
     trajectory_eval_tools, compliance_tools, agent_trace_tools,
     video_report_tools, fuzzy_tools, artifact_store_tools, image_dedup_tools,
-    locale_tools, voice_tools,
+    locale_tools, voice_tools, coordinate_space_tools,
     screen_record_tools, process_and_shell_tools, remote_desktop_tools,
     gamepad_tools, usb_passthrough_tools, assertion_tools,
     data_source_tools,
diff --git a/je_auto_control/utils/mcp_server/tools/_handlers.py b/je_auto_control/utils/mcp_server/tools/_handlers.py
index 8685dcb9..decc0e77 100644
--- a/je_auto_control/utils/mcp_server/tools/_handlers.py
+++ b/je_auto_control/utils/mcp_server/tools/_handlers.py
@@ -1471,6 +1471,20 @@ def voice_clear():
     return {"cleared": True}
 
 
+def to_physical(x, y, physical_w, physical_h, model_w, model_h):
+    from je_auto_control.utils.coordinate_space import CoordinateSpace
+    px, py = CoordinateSpace(physical_w, physical_h, model_w,
+                             model_h).to_physical(x, y)
+    return {"x": px, "y": py}
+
+
+def to_model(x, y, physical_w, physical_h, model_w, model_h):
+    from je_auto_control.utils.coordinate_space import CoordinateSpace
+    mx, my = CoordinateSpace(physical_w, physical_h, model_w,
+                             model_h).to_model(x, y)
+    return {"x": mx, "y": my}
+
+
 def vlm_locate(description: str,
                screen_region: Optional[List[int]] = None,
                model: Optional[str] = None) -> Optional[List[int]]:
diff --git a/test/unit_test/headless/test_coordinate_space_batch.py b/test/unit_test/headless/test_coordinate_space_batch.py
new file mode 100644
index 00000000..93ae65bb
--- /dev/null
+++ b/test/unit_test/headless/test_coordinate_space_batch.py
@@ -0,0 +1,85 @@
+"""Headless tests for coordinate-space mapping. The math path is pure stdlib;
+the PNG downscale runs under importorskip(PIL). No Qt imports."""
+import pytest
+
+import je_auto_control as ac
+from je_auto_control.utils.coordinate_space import (
+    CoordinateSpace, normalized_space, xga_space)
+
+
+def test_normalized_space_round_trip():
+    space = normalized_space(1920, 1080, grid=1000)
+    assert space.model_size == (1000, 1000)
+    # centre maps both ways within rounding
+    mx, my = space.to_model(960, 540)
+    assert (mx, my) == (500, 500)
+    px, py = space.to_physical(500, 500)
+    assert abs(px - 960) <= 1 and abs(py - 540) <= 1
+
+
+def test_to_physical_scales_from_grid():
+    space = normalized_space(1000, 500, grid=100)
+    assert space.to_physical(50, 50) == (500, 250)
+    assert space.to_physical(100, 100) == (999, 499)  # clamped to last pixel
+
+
+def test_xga_preserves_aspect_and_fits():
+    space = xga_space(1920, 1080)  # 16:9 fits in 1024x768
+    assert space.model_w <= 1024 and space.model_h <= 768
+    # aspect ratio preserved
+    assert abs(space.model_w / space.model_h - 1920 / 1080) < 0.02
+    assert space.model_w == 1024  # width-bound for 16:9
+
+
+def test_xga_no_upscale_for_small_screens():
+    space = xga_space(800, 600)  # already within XGA -> unchanged
+    assert (space.model_w, space.model_h) == (800, 600)
+
+
+def test_clamping_is_in_bounds():
+    space = CoordinateSpace(100, 100, 10, 10)
+    assert space.to_physical(999, 999) == (99, 99)
+    assert space.to_model(-5, -5) == (0, 0)
+
+
+def test_downscale_png_matches_model_size():
+    Image = pytest.importorskip("PIL.Image")
+    import io
+    buf = io.BytesIO()
+    Image.new("RGB", (640, 480), (1, 2, 3)).save(buf, format="PNG")
+    from je_auto_control.utils.coordinate_space import downscale_png
+    space = normalized_space(640, 480, grid=64)
+    out = downscale_png(buf.getvalue(), space)
+    with Image.open(io.BytesIO(out)) as resized:
+        assert resized.size == (64, 64)
+
+
+# --- wiring ---------------------------------------------------------------
+
+def test_executor_round_trip():
+    rec = ac.execute_action([[
+        "AC_to_physical",
+        {"x": 500, "y": 500, "physical_w": 1920, "physical_h": 1080,
+         "model_w": 1000, "model_h": 1000},
+    ]])
+    point = next(v for v in rec.values() if isinstance(v, dict))
+    assert abs(point["x"] - 960) <= 1 and abs(point["y"] - 540) <= 1
+
+
+def test_wiring():
+    known = ac.executor.known_commands()
+    assert {"AC_to_physical", "AC_to_model"} <= known
+    from je_auto_control.utils.mcp_server.tools import (
+        build_default_tool_registry)
+    names = {t.name for t in build_default_tool_registry()}
+    assert {"ac_to_physical", "ac_to_model"} <= names
+    from je_auto_control.gui.script_builder.command_schema import _build_specs
+    cmds = {s.command for s in _build_specs()}
+    assert {"AC_to_physical", "AC_to_model"} <= cmds
+
+
+def test_facade_exports():
+    for attr in ("CoordinateSpace", "xga_space", "normalized_space",
+                 "downscale_png"):
+        assert hasattr(ac, attr)
+        assert attr in ac.__all__