CLDR-18980 Tweak constructed interval formats#5773
Conversation
|
Making progress, but have to iron out the tests. |
|
Steven, could you look at this? I don't understand the build failures here.
When I run Maven test locally, it builds and all the tests complete with no errors, except for one that is completely independent of my changes, a problem with mul. But then |
|
The failure i see is missing "flen" in Error: /home/runner/work/cldr/cldr/tools/cldr-code/src/main/java/org/unicode/cldr/util/CldrIntervalFormat.java:[266,52] cannot find symbol Mul unrelated |
|
Thanks that is probably it; fixed. Very strange that Maven test didn't complain on my local system. |
srl295
left a comment
There was a problem hiding this comment.
Approving but comments
- we've been keeping URLs in CLDRURLS.java. Should cleanup and move the new items there.
- why did you need to turn off locations in check, and why via a environment variable and not an option?
There is definitely more cleanup of the code that would be good. I tried to confine myself to what was directly involved. But if I hit this again, I can fix those if you point them out.
It is just too painful to wade through the redundant red lines. I didn't want to turn them off by default, and wanted to talk a bit about the best way to do it. |
Thinking about maintanability and finding those URLs in the same place.
Makes sense. Maybe an option to aggregate by type would be better, i.e. only show each subtype once and then give a count. |
|
Good ideas for the future. |
|
I think this is read to merge (may need to tweak for tests). |
|
Happy to finish reviewing the docs update for #5775 after this PR is merged and available to check on Smoke test and/or on Staging so that I can compare the docs against the final version. |
|
I need a review on this today |
srl295
left a comment
There was a problem hiding this comment.
OK for now. Lots of concern about freezing "what locales do now" into a test, that seems ripe for confusion later.
| public static final UnicodeSet SEPARATOR_SPACINGS = | ||
| new UnicodeSet("[\\u0020\\u00A0\\u202F\\u2009]").freeze(); | ||
|
|
||
| public static final UnicodeSet SEPARATORS_WITH_SPACES = |
| static final Pattern normal = | ||
| Pattern.compile(SEPARATOR_SPACINGS + "?" + SEPARATORS + SEPARATOR_SPACINGS + "?"); |
| return firstPattern + separatorString + secondPattern; | ||
| } | ||
|
|
||
| /** These may need to be adjusted if new locales do something radically different. */ |
There was a problem hiding this comment.
just what I was going to say. It's not great having yet more static lists buried in tests.
done |
CLDR-18980
See #5775 for the Info Hub changes (easier to do that in Github Web).
The main changes are:
tools/cldr-code/src/main/java/org/unicode/cldr/test/CheckDates.java
tools/cldr-code/src/main/java/org/unicode/cldr/test/ConsoleCheckCLDR.java
tools/cldr-code/src/main/java/org/unicode/cldr/util/CldrIntervalFormat.java
tools/cldr-code/src/main/java/org/unicode/cldr/util/DatetimeUtilities.java
tools/cldr-code/src/test/java/org/unicode/cldr/unittest/TestDateOrder.java
Major refactoring. Also fixed the check for datetime numeric separators, and a bunch of the error messages.
Note
The constructed intervals can't always be right, because we can't necessarily determine whether …M#y… (where # is some punctuation or letter) is a separator or a terminator. If a separator, we want to remove it if it ends up next to the interval separator (eg '–'); if it is an adjunct like 月, we want to keep it.
However, the comparison is very useful, because it points up many gratuitous inconsistencies between the interval formats and available formats. Here is a spreadsheet of differences, with a Key.
Comparison
https://docs.google.com/spreadsheets/d/12eGl72fX4cVgOgIKk2FUPWWUtBX31WG4yLFa8juxIGs/edit
There are warning samples in the tab warning samples.
https://docs.google.com/spreadsheets/d/12eGl72fX4cVgOgIKk2FUPWWUtBX31WG4yLFa8juxIGs/edit?gid=789063264#gid=789063264
ALLOW_MANY_COMMITS=true