Ko tn staging v1 #388

Open
tbartley94 wants to merge 7 commits into main from ko_tn_staging_v1

@tbartley94
Member

What does this PR do?

Add a one line overview of what this PR aims to accomplish.

Before your PR is "Ready for review"

Pre checks:

  • Have you signed your commits? Use git commit -s to sign.
  • Do all unit tests finish successfully before sending the PR?
    1. Run pytest or, if your machine does not have a GPU, pytest --cpu from the root folder (provided you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
    2. Run the Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
  • If you are adding a new feature: have you added test cases for both pytest and Sparrowhawk here?
  • Have you added __init__.py for every folder and subfolder, including the data folder, which has .TSV files?
  • Have you followed the CodeQL results and removed unused variables and imports (the report is at the bottom of the PR in the GitHub review box)?
  • Have you added the correct license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
  • If you copied nemo_text_processing/text_normalization/en/graph_utils.py, your header's second line should be Copyright 2015 and onwards Google, Inc. See an example here.
  • Remove import guards (try import: ... except: ...) if not already done.
  • If you added a new language or a new feature, please update the NeMo documentation (it lives in a different repo).
  • Have you added your language support to tools/text_processing_deployment/pynini_export.py?
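
The __init__.py item in the checklist above is easy to miss for data folders. As a rough aid, here is a minimal sketch of a local check; it is not part of the repository's tooling. The helper names `find_missing_init` and `demo`, and the folder layout built in the demo (`ko`, `data`, `taggers`, `numbers.tsv`), are hypothetical and used only for illustration.

```python
import os
import tempfile
from pathlib import Path


def find_missing_init(root):
    """Return sorted paths (relative to root) of folders without __init__.py.

    Hypothetical helper: walks root itself and every subfolder.
    """
    root = Path(root)
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into hidden folders or Python cache folders.
        dirnames[:] = [
            d for d in dirnames if not d.startswith(".") and d != "__pycache__"
        ]
        if "__init__.py" not in filenames:
            missing.append(Path(dirpath).relative_to(root).as_posix())
    return sorted(missing)


def demo():
    """Build a throwaway tree (hypothetical layout) and report offending folders."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "ko"
        (pkg / "data").mkdir(parents=True)
        (pkg / "taggers").mkdir()
        (pkg / "__init__.py").touch()
        (pkg / "taggers" / "__init__.py").touch()
        # The data folder holds only a TSV file and no __init__.py.
        (pkg / "data" / "numbers.tsv").write_text("1\tone\n")
        return find_missing_init(pkg)


if __name__ == "__main__":
    print(demo())
```

In the demo tree only the data folder is reported, since it contains a TSV file but no __init__.py. To check a real checkout, call `find_missing_init` on the language folder you added.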

PR Type:

  • New Feature
  • Bugfix
  • Documentation
  • Test

If you haven't finished some of the above items, you can still open a "Draft" PR.

bbae0312 and others added 7 commits June 10, 2025 10:04
* Add Korean TN support for cardinal numbers and postprocessing

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor Korean TN cardinal and postprocessing logic based on review feedback

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add __init__.py to ko/data directory

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* Update KO_TN_CACHE to trigger Korean CI run

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

---------

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add Korean TN support for cardinal numbers and postprocessing

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor Korean TN cardinal and postprocessing logic based on review feedback

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* Add Korean Ordinal TN logic and test cases

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* Add support for 0 in ordinal tagger

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove .far files

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(ko/ordinal): update ordinal FST based on review feedback

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat(ko/decimal): add Korean decimal TN support

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* feat(ko): Add fraction tagger and verbalizer with tests

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(ko): Update decimal and fraction taggers

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(ko/date): update date tagger and sparrowhawk test

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko(TN): Date TN fixes & cleanup

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko(TN): Add Time tagger/verbalizer + tests

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko(TN): Date — strict YYYY for delimited formats; define single-year 1–4 digit behavior

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(ko/money): polish tagger/verbalizer & expand tests

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko: refactor money/telephone taggers & verbalizers

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* ko: update money/telephone taggers and telephone verbalizer

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* ko: update telephone taggers

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

---------

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data)

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update KO electronic & measure taggers/verbalizers and test cases

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Edited as per review feedback

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Korean TN fixes: cardinal, decimal, fraction, date

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add ko electronic extensions and improve electronic/telephone normalization

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix Korean TN issues and update test cases

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix Korean TN electronic and post-processing issues

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* Fix Korean TN spacing and electronic/cardinal handling

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* Fix optional token separator and remove redundant whitespace normalization

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove unused KO post_processing and update exporter

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>

---------

Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
tbartley94 requested a review from mgrafu on February 20, 2026 20:34

2 participants