Commit 2b7bd07

Merge pull request #251 from Integration-Automation/feat/locale-parse-batch
Add locale-aware number/currency/date parsing
2 parents a3daf70 + 486b1db commit 2b7bd07

16 files changed

Lines changed: 437 additions & 0 deletions

README.md

Lines changed: 7 additions & 0 deletions
@@ -13,6 +13,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing](#whats-new-2026-06-20--locale-aware-number-currency--date-parsing)
 - [What's new (2026-06-20) — Perceptual-Hash Image Dedupe](#whats-new-2026-06-20--perceptual-hash-image-dedupe)
 - [What's new (2026-06-20) — S3-Compatible Artifact Store](#whats-new-2026-06-20--s3-compatible-artifact-store)
 - [What's new (2026-06-20) — Fuzzy String Matching & Dedupe](#whats-new-2026-06-20--fuzzy-string-matching--dedupe)
@@ -95,6 +96,12 @@
 
 ---
 
+## What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing
+
+Parse localized numbers/currency/dates. Full reference: [`docs/source/Eng/doc/new_features/v43_features_doc.rst`](docs/source/Eng/doc/new_features/v43_features_doc.rst).
+
+- **`parse_decimal` / `parse_number` / `format_decimal` / `format_currency` / `format_date`** (`AC_parse_decimal` / `AC_parse_number` / `AC_format_decimal` / `AC_format_currency` / `AC_format_date`, `ac_*`): OCR/UI text like `"1.234,56"` (de_DE) parses correctly to `1234.56` via **Babel**'s CLDR data, and values format back per-locale. `babel` is an optional `[locale]` extra, imported lazily; functional tests run under `importorskip` (wiring/facade always verified).
+
 ## What's new (2026-06-20) — Perceptual-Hash Image Dedupe
 
 Collapse near-identical screenshots. Full reference: [`docs/source/Eng/doc/new_features/v42_features_doc.rst`](docs/source/Eng/doc/new_features/v42_features_doc.rst).

README/README_zh-CN.md

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing](#本次更新-2026-06-20--区域设置感知的数字货币与日期解析)
 - [What's new (2026-06-20) — Perceptual-Hash Image Dedupe](#本次更新-2026-06-20--感知哈希图像去重)
 - [What's new (2026-06-20) — S3-Compatible Artifact Store](#本次更新-2026-06-20--s3-兼容成品存储)
 - [What's new (2026-06-20) — Fuzzy String Matching & Dedupe](#本次更新-2026-06-20--模糊字符串匹配与去重)
@@ -94,6 +95,12 @@
 
 ---
 
+## What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing
+
+Parse localized numbers/currency/dates. Full reference: [`docs/source/Zh/doc/new_features/v43_features_doc.rst`](../docs/source/Zh/doc/new_features/v43_features_doc.rst)
+
+- **`parse_decimal` / `parse_number` / `format_decimal` / `format_currency` / `format_date`** (`AC_parse_decimal` / `AC_parse_number` / `AC_format_decimal` / `AC_format_currency` / `AC_format_date`, `ac_*`): OCR/UI text such as `"1.234,56"` (de_DE) is parsed correctly to `1234.56` through **Babel**'s CLDR data, and values can be formatted back per locale. `babel` is an optional `[locale]` extra and is imported lazily; functional tests run under `importorskip` (wiring/facade are always verified).
+
 ## What's new (2026-06-20) — Perceptual-Hash Image Dedupe
 
 Collapse near-identical screenshots. Full reference: [`docs/source/Zh/doc/new_features/v42_features_doc.rst`](../docs/source/Zh/doc/new_features/v42_features_doc.rst)

README/README_zh-TW.md

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,7 @@
 
 ## Table of Contents
 
+- [What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing](#本次更新-2026-06-20--區域設定感知的數字貨幣與日期解析)
 - [What's new (2026-06-20) — Perceptual-Hash Image Dedupe](#本次更新-2026-06-20--感知雜湊影像去重)
 - [What's new (2026-06-20) — S3-Compatible Artifact Store](#本次更新-2026-06-20--s3-相容成品儲存)
 - [What's new (2026-06-20) — Fuzzy String Matching & Dedupe](#本次更新-2026-06-20--模糊字串比對與去重)
@@ -94,6 +95,12 @@
 
 ---
 
+## What's new (2026-06-20) — Locale-Aware Number, Currency & Date Parsing
+
+Parse localized numbers/currency/dates. Full reference: [`docs/source/Zh/doc/new_features/v43_features_doc.rst`](../docs/source/Zh/doc/new_features/v43_features_doc.rst)
+
+- **`parse_decimal` / `parse_number` / `format_decimal` / `format_currency` / `format_date`** (`AC_parse_decimal` / `AC_parse_number` / `AC_format_decimal` / `AC_format_currency` / `AC_format_date`, `ac_*`): OCR/UI text such as `"1.234,56"` (de_DE) is parsed correctly to `1234.56` through **Babel**'s CLDR data, and values can be formatted back per locale. `babel` is an optional `[locale]` extra and is imported lazily; functional tests run under `importorskip` (wiring/facade are always verified).
+
 ## What's new (2026-06-20) — Perceptual-Hash Image Dedupe
 
 Collapse near-identical screenshots. Full reference: [`docs/source/Zh/doc/new_features/v42_features_doc.rst`](../docs/source/Zh/doc/new_features/v42_features_doc.rst)
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+Locale-Aware Number, Currency & Date Parsing
+============================================
+
+Text scraped from a localized UI or OCR rarely parses with Python's ``float()``:
+``"1.234,56"`` means 1234.56 in ``de_DE``, but ``float()`` rejects it. These
+helpers parse such strings (and format values back) using **Babel**'s CLDR
+data, so flows can read and assert on numbers, currency, and dates across
+locales.
+
+``babel`` is an **optional** dependency (``pip install je_auto_control[locale]``)
+imported lazily, so the package stays importable without it; the functions raise
+a clear error only when called without Babel. The module does not import ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        parse_decimal, parse_number, format_decimal, format_currency,
+        format_date)
+
+    parse_decimal("1.234,56", locale="de_DE")               # -> 1234.56
+    parse_number("1,234", locale="en_US")                   # -> 1234
+
+    format_decimal(1234.5, locale="en_US")                  # -> "1,234.5"
+    format_currency(1234.5, "USD", locale="en_US")          # -> "$1,234.50"
+    format_date("2026-06-20", locale="de_DE", fmt="short")  # -> "20.06.26"
+
+``format_date`` accepts an ISO ``YYYY-MM-DD`` string or a ``date`` object and a
+``fmt`` of ``short`` / ``medium`` / ``long`` / ``full``. Parsing and formatting
+round-trip within a locale.
+
+.. note::
+
+    The functional path requires Babel; CI runs these tests under
+    ``importorskip`` so they execute wherever Babel is installed and are skipped
+    otherwise. The wiring/facade are always verified.
+
+Executor commands
+-----------------
+
+================================ ===================================================
+Command                          Effect
+================================ ===================================================
+``AC_parse_decimal``             ``{value}`` float from a locale decimal string.
+``AC_parse_number``              ``{value}`` int from a locale integer string.
+``AC_format_decimal``            ``{text}`` number formatted for a locale.
+``AC_format_currency``           ``{text}`` currency (ISO 4217) for a locale.
+``AC_format_date``               ``{text}`` ISO date formatted for a locale.
+================================ ===================================================
+
+The same operations are exposed as MCP tools (``ac_parse_decimal`` /
+``ac_parse_number`` / ``ac_format_decimal`` / ``ac_format_currency`` /
+``ac_format_date``) and as Script Builder commands under **Data**.
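
Reviewer note on the motivating case in this document: the sketch below uses only the standard library to show why `float()` rejects `"1.234,56"` and what a locale-aware parser has to do instead. `naive_parse_decimal`, `float_rejects`, and the `SEPARATORS` table are hypothetical illustrations, not the code in `locale_parse`, which according to the document delegates to Babel's CLDR data.

```python
from decimal import Decimal

# Hypothetical, hand-written separator table for illustration only.
# The helpers in this commit rely on Babel's CLDR data, not a table like this.
SEPARATORS = {
    "en_US": {"group": ",", "decimal": "."},
    "de_DE": {"group": ".", "decimal": ","},
}


def naive_parse_decimal(text: str, locale: str = "en_US") -> Decimal:
    """Drop the locale's grouping symbol, then normalise the decimal mark."""
    symbols = SEPARATORS[locale]
    cleaned = text.strip().replace(symbols["group"], "")
    cleaned = cleaned.replace(symbols["decimal"], ".")
    return Decimal(cleaned)


def float_rejects(text: str) -> bool:
    """Return True when the built-in float() cannot parse the text."""
    try:
        float(text)
    except ValueError:
        return True
    return False
```

Reading the same text under the wrong locale does not fail in this sketch; it silently yields a different number (`"1.234,56"` read as `en_US` becomes `1.23456`), which is one reason the locale is passed explicitly in every call shown above.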

docs/source/Eng/eng_index.rst

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ Comprehensive guides for all AutoControl features.
    doc/new_features/v40_features_doc
    doc/new_features/v41_features_doc
    doc/new_features/v42_features_doc
+   doc/new_features/v43_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+Locale-Aware Number, Currency & Date Parsing
+============================================
+
+Text captured from a localized UI or OCR rarely passes straight through Python's ``float()``:
+``"1.234,56"`` is a little over twelve hundred in ``de_DE`` but malformed for ``float``. These helpers
+parse such strings (and can format values back) using **Babel**'s CLDR data, so flows can read and
+assert on numbers, currency, and dates across locales.
+
+``babel`` is an **optional** dependency (``pip install je_auto_control[locale]``) and is imported lazily,
+so the package stays importable without it; the functions raise a clear error only when they are
+called without Babel installed. The module does not import ``PySide6``.
+
+Headless API
+------------
+
+.. code-block:: python
+
+    from je_auto_control import (
+        parse_decimal, parse_number, format_decimal, format_currency,
+        format_date)
+
+    parse_decimal("1.234,56", locale="de_DE")               # -> 1234.56
+    parse_number("1,234", locale="en_US")                   # -> 1234
+
+    format_decimal(1234.5, locale="en_US")                  # -> "1,234.5"
+    format_currency(1234.5, "USD", locale="en_US")          # -> "$1,234.50"
+    format_date("2026-06-20", locale="de_DE", fmt="short")  # -> "20.06.26"
+
+``format_date`` accepts an ISO ``YYYY-MM-DD`` string or a ``date`` object; ``fmt`` may be ``short`` /
+``medium`` / ``long`` / ``full``. Parsing and formatting round-trip consistently within one locale.
+
+.. note::
+
+    The functional path requires Babel; CI runs these tests under ``importorskip``, so they run where
+    Babel is installed and are skipped otherwise. The wiring/facade are always verified.
+
+Executor commands
+-----------------
+
+================================ ===================================================
+Command                          Effect
+================================ ===================================================
+``AC_parse_decimal``             ``{value}`` float from a locale decimal string.
+``AC_parse_number``              ``{value}`` int from a locale integer string.
+``AC_format_decimal``            ``{text}`` number formatted for a locale.
+``AC_format_currency``           ``{text}`` currency (ISO 4217) for a locale.
+``AC_format_date``               ``{text}`` ISO date formatted for a locale.
+================================ ===================================================
+
+The same operations are also available as MCP tools (``ac_parse_decimal`` / ``ac_parse_number`` /
+``ac_format_decimal`` / ``ac_format_currency`` / ``ac_format_date``) and as commands under the
+**Data** category in Script Builder.
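
Reviewer note on the `format_date` input contract described in this document (an ISO `YYYY-MM-DD` string or a `date` object): a minimal stdlib sketch of that normalisation step might look like the following. `to_date` and `format_short_de` are hypothetical names, and the hard-coded day.month.two-digit-year layout only mimics the `"20.06.26"` output shown in the example; the committed code formats through Babel rather than `strftime`.

```python
from datetime import date, datetime
from typing import Union


def to_date(value: Union[str, date]) -> date:
    """Accept an ISO YYYY-MM-DD string or a date object and return a date."""
    # datetime is a subclass of date, so it is checked first and truncated.
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    return date.fromisoformat(value)


def format_short_de(value: Union[str, date]) -> str:
    """Hard-coded layout that mimics the de_DE short example in the docs."""
    return to_date(value).strftime("%d.%m.%y")
```

Normalising to a `date` up front keeps the formatting path identical for both input types, so executor callers can pass plain strings while Python callers pass objects.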

docs/source/Zh/zh_index.rst

Lines changed: 1 addition & 0 deletions
@@ -65,6 +65,7 @@ Complete usage guides for all AutoControl features.
    doc/new_features/v40_features_doc
    doc/new_features/v41_features_doc
    doc/new_features/v42_features_doc
+   doc/new_features/v43_features_doc
    doc/ocr_backends/ocr_backends_doc
    doc/observability/observability_doc
    doc/operations_layer/operations_layer_doc

je_auto_control/__init__.py

Lines changed: 6 additions & 0 deletions
@@ -243,6 +243,10 @@
 from je_auto_control.utils.image_dedup import (
     average_hash, dedupe_images, dhash, hamming_distance, images_similar,
 )
+# Locale-aware number/currency/date parsing & formatting (optional babel)
+from je_auto_control.utils.locale_parse import (
+    format_currency, format_date, format_decimal, parse_decimal, parse_number,
+)
 # Background popup/interrupt watchdog (unattended automation)
 from je_auto_control.utils.watchdog import (
     PopupWatchdog, WatchdogRule, default_popup_watchdog,
@@ -694,6 +698,8 @@ def start_autocontrol_gui(*args, **kwargs):
     "set_default_store",
     "average_hash", "dedupe_images", "dhash", "hamming_distance",
     "images_similar",
+    "format_currency", "format_date", "format_decimal", "parse_decimal",
+    "parse_number",
     # MCP server
     "AuditLogger", "HttpMCPServer", "MCPContent", "MCPPrompt",
     "MCPPromptArgument", "MCPResource", "MCPServer", "MCPTool",
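
Reviewer note: `__init__.py` imports `locale_parse` unconditionally, so the "optional babel" promise depends on that module deferring its own Babel import until a function is called. The body of `locale_parse` is not part of this excerpt, so the following is only a sketch of one common way to do that. `require_optional` and `parse_decimal_sketch` are hypothetical names; `babel.numbers.parse_decimal` is the real Babel function the sketch would reach if Babel is installed.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise RuntimeError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install je_auto_control[{extra}]"
        ) from error


def parse_decimal_sketch(text: str, locale: str = "en_US"):
    """Hypothetical wrapper: Babel is imported only when this is called."""
    numbers = require_optional("babel.numbers", "locale")
    return numbers.parse_decimal(text, locale=locale)
```

With this shape, importing the package never touches Babel, and a missing install surfaces as one clear message at call time instead of an `ImportError` at import time.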

je_auto_control/gui/script_builder/command_schema.py

Lines changed: 48 additions & 0 deletions
@@ -946,6 +946,54 @@ def _add_misc_specs(specs: List[CommandSpec]) -> None:
         ),
         description="Collapse near-duplicate images by perceptual hash.",
     ))
+    specs.append(CommandSpec(
+        "AC_parse_decimal", "Data", "Locale: Parse Decimal",
+        fields=(
+            FieldSpec("text", FieldType.STRING, placeholder="1.234,56"),
+            FieldSpec("locale", FieldType.STRING, optional=True,
+                      default="en_US"),
+        ),
+        description="Parse a locale-formatted decimal string to a float.",
+    ))
+    specs.append(CommandSpec(
+        "AC_parse_number", "Data", "Locale: Parse Number",
+        fields=(
+            FieldSpec("text", FieldType.STRING, placeholder="1,234"),
+            FieldSpec("locale", FieldType.STRING, optional=True,
+                      default="en_US"),
+        ),
+        description="Parse a locale-formatted integer string to an int.",
+    ))
+    specs.append(CommandSpec(
+        "AC_format_decimal", "Data", "Locale: Format Decimal",
+        fields=(
+            FieldSpec("value", FieldType.FLOAT),
+            FieldSpec("locale", FieldType.STRING, optional=True,
+                      default="en_US"),
+        ),
+        description="Format a number for a locale.",
+    ))
+    specs.append(CommandSpec(
+        "AC_format_currency", "Data", "Locale: Format Currency",
+        fields=(
+            FieldSpec("value", FieldType.FLOAT),
+            FieldSpec("currency", FieldType.STRING, placeholder="USD"),
+            FieldSpec("locale", FieldType.STRING, optional=True,
+                      default="en_US"),
+        ),
+        description="Format a value as currency (ISO 4217) for a locale.",
+    ))
+    specs.append(CommandSpec(
+        "AC_format_date", "Data", "Locale: Format Date",
+        fields=(
+            FieldSpec("value", FieldType.STRING, placeholder="2026-06-20"),
+            FieldSpec("locale", FieldType.STRING, optional=True,
+                      default="en_US"),
+            FieldSpec("fmt", FieldType.ENUM, optional=True, default="medium",
+                      choices=("short", "medium", "long", "full")),
+        ),
+        description="Format an ISO date string for a locale.",
+    ))
     specs.append(CommandSpec(
         "AC_generate_sop", "Report", "Generate SOP Document",
         fields=(
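
Reviewer note: the specs above declare which fields are optional and what their defaults are. How Script Builder consumes them is not shown in this diff, so the sketch below only illustrates the general idea of turning such a declaration into keyword arguments. `FieldSketch` and `CommandSketch` are hypothetical stand-ins, not the project's `FieldSpec` and `CommandSpec` classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical stand-in for FieldSpec: name, optionality, default."""
    name: str
    optional: bool = False
    default: Optional[Any] = None


@dataclass(frozen=True)
class CommandSketch:
    """Hypothetical stand-in for CommandSpec: command name plus its fields."""
    command: str
    fields: Tuple[FieldSketch, ...]

    def build_kwargs(self, supplied: Dict[str, Any]) -> Dict[str, Any]:
        """Fill defaults for optional fields and reject missing required ones."""
        kwargs: Dict[str, Any] = {}
        for spec in self.fields:
            if spec.name in supplied:
                kwargs[spec.name] = supplied[spec.name]
            elif spec.optional:
                kwargs[spec.name] = spec.default
            else:
                raise ValueError(f"{self.command}: missing field {spec.name!r}")
        return kwargs


# Mirrors the field names and defaults declared for AC_format_date above.
FORMAT_DATE = CommandSketch("AC_format_date", (
    FieldSketch("value"),
    FieldSketch("locale", optional=True, default="en_US"),
    FieldSketch("fmt", optional=True, default="medium"),
))
```

The defaults in the specs match the adapter signatures in `action_executor.py` (`locale="en_US"`, `fmt="medium"`), so a command built with only `value` behaves the same in the GUI and in the executor.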

je_auto_control/utils/executor/action_executor.py

Lines changed: 37 additions & 0 deletions
@@ -3136,6 +3136,38 @@ def _dedupe_images(paths: Any, max_distance: int = 5) -> Dict[str, Any]:
                                     max_distance=max_distance)}
 
 
+def _parse_decimal(text: str, locale: str = "en_US") -> Dict[str, Any]:
+    """Adapter: parse a locale-formatted decimal string to a float."""
+    from je_auto_control.utils.locale_parse import parse_decimal
+    return {"value": parse_decimal(text, locale)}
+
+
+def _parse_number(text: str, locale: str = "en_US") -> Dict[str, Any]:
+    """Adapter: parse a locale-formatted integer string to an int."""
+    from je_auto_control.utils.locale_parse import parse_number
+    return {"value": parse_number(text, locale)}
+
+
+def _format_decimal(value: float, locale: str = "en_US") -> Dict[str, Any]:
+    """Adapter: format a number for a locale."""
+    from je_auto_control.utils.locale_parse import format_decimal
+    return {"text": format_decimal(value, locale)}
+
+
+def _format_currency(value: float, currency: str,
+                     locale: str = "en_US") -> Dict[str, Any]:
+    """Adapter: format a value as currency for a locale."""
+    from je_auto_control.utils.locale_parse import format_currency
+    return {"text": format_currency(value, currency, locale)}
+
+
+def _format_date(value: str, locale: str = "en_US",
+                 fmt: str = "medium") -> Dict[str, Any]:
+    """Adapter: format an ISO date string for a locale."""
+    from je_auto_control.utils.locale_parse import format_date
+    return {"text": format_date(value, locale, fmt)}
+
+
 class Executor:
     """
     Executor
@@ -3397,6 +3429,11 @@ def __init__(self):
             "AC_s3_delete": _s3_delete,
             "AC_image_hash": _image_hash,
             "AC_dedupe_images": _dedupe_images,
+            "AC_parse_decimal": _parse_decimal,
+            "AC_parse_number": _parse_number,
+            "AC_format_decimal": _format_decimal,
+            "AC_format_currency": _format_currency,
+            "AC_format_date": _format_date,
             "AC_a11y_record_start": _a11y_record_start,
             "AC_a11y_record_stop": _a11y_record_stop,
             "AC_a11y_record_events": _a11y_record_events,
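
Reviewer note: the adapters above are registered by name in the executor's command table and each returns a small dict (`{"value": ...}` or `{"text": ...}`). The executor's real dispatch loop is outside this diff, so the sketch below is a reduced illustration of the pattern. `run_action`, `COMMANDS`, and `_parse_number_sketch` are hypothetical, and the `[command_name, kwargs]` action shape is an assumption about the action format, not something this commit shows.

```python
from typing import Any, Callable, Dict


def _parse_number_sketch(text: str, locale: str = "en_US") -> Dict[str, Any]:
    """Stand-in adapter: handles only en_US-style grouping, without Babel."""
    return {"value": int(text.replace(",", ""))}


# Hypothetical miniature of the executor's name -> callable table.
COMMANDS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "AC_parse_number": _parse_number_sketch,
}


def run_action(action: list) -> Dict[str, Any]:
    """Run one [command_name, kwargs] action through the table."""
    name = action[0]
    kwargs = action[1] if len(action) > 1 else {}
    if name not in COMMANDS:
        raise KeyError(f"unknown command: {name}")
    return COMMANDS[name](**kwargs)
```

For example, `run_action(["AC_parse_number", {"text": "1,234"}])` returns `{"value": 1234}` in this sketch. Returning a dict from every adapter gives later steps a stable key to read regardless of which command produced the result.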
