6 changes: 6 additions & 0 deletions pythainlp/khavee/core.py
@@ -54,6 +54,9 @@ def check_sara(self, word: str) -> str:
         sara = []
         countoa = 0

+        if not word:
+            return ""
Comment on lines +57 to +58

Copilot AI Mar 29, 2026

These new empty-input behaviors (returning "" for check_sara/check_marttra) are not covered by existing khavee tests. Add unit tests that exercise empty-string input to prevent regressions back to IndexError.

Copilot AI Mar 29, 2026

Returning an empty string for empty input is inconsistent with the existing error signaling in this API (e.g. returning "Can't find Sara in this word" when no vowel is detected). Consider either returning the same "Can't find…" string for empty input as well, or raising a ValueError, so callers don't need to special-case "".

Suggested change
-        return ""
+        return "Can't find Sara in this word"

Member

Errors and warnings should be signaled through the standard mechanisms (raising an exception or issuing a warning), not through the return value.
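A minimal sketch of what that could look like for the `check_sara` case, assuming invalid input is reported with `ValueError`. `check_sara_strict` and `_SAMPLE_VOWELS` are hypothetical names, and the vowel detection is a toy lookup over a few Thai vowel signs rather than khavee's real logic; only the error-signalling pattern is the point.

```python
# A few Thai vowel signs (ะ า ิ ี ึ ื ุ ู เ แ โ ใ ไ), only to make the sketch runnable.
# Assumption: this is not khavee's real vowel table.
_SAMPLE_VOWELS = frozenset(
    "\u0e30\u0e32\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39\u0e40\u0e41\u0e42\u0e43\u0e44"
)


def check_sara_strict(word: str) -> str:
    """Hypothetical check_sara variant that raises instead of returning messages."""
    if not word:
        raise ValueError("word must be a non-empty string")
    found = [ch for ch in word if ch in _SAMPLE_VOWELS]
    if not found:
        raise ValueError(f"cannot find sara in {word!r}")
    return found[0]  # first vowel sign found (toy result)


# Callers handle failures explicitly instead of comparing against magic strings.
for candidate in ["มา", "", "กก"]:
    try:
        print(candidate, "->", check_sara_strict(candidate))
    except ValueError as exc:
        print(f"invalid input: {exc}")
```

With this shape, a normal return always means a vowel was found, so callers never have to special-case `""` or a "Can't find…" string.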


         # In case of การันย์
         if "์" in word[-1]:
             word = word[:-2]

Copilot AI Mar 29, 2026

After stripping a trailing karun (word = word[:-2]), word can become empty (e.g. input like "ก์"). The function later indexes word[-1] (e.g. in the ออ handling), which will still raise IndexError. Consider re-checking if not word immediately after the karun-strip (and returning an appropriate value) to fully guard against empty-after-normalization cases.

Suggested change
-            word = word[:-2]
+            word = word[:-2]
+            # After removing the karun, the word may become empty (e.g. "ก์")
+            if not word:
+                return ""
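The underlying rule is that the emptiness check has to be repeated after every normalization step that can shorten the string. A minimal sketch, assuming the karun rule from the diff (drop the trailing karun together with the letter before it); `strip_trailing_karun` and `last_char_after_normalization` are hypothetical helpers, and the sketch raises `ValueError` instead of returning `""`, in line with the maintainer's comment above.

```python
KARUN = "\u0e4c"  # ์ (thanthakhat), the karun mark


def strip_trailing_karun(word: str) -> str:
    """Mirror the diff's rule: drop a trailing karun and the letter before it."""
    if word and word[-1] == KARUN:
        return word[:-2]
    return word


def last_char_after_normalization(word: str) -> str:
    """Hypothetical helper showing the re-check after normalization."""
    if not word:
        raise ValueError("word must be a non-empty string")
    word = strip_trailing_karun(word)
    # "ก์" normalizes to "", so the emptiness check must be repeated here
    if not word:
        raise ValueError("word is empty after removing the karun")
    return word[-1]
```

For example, `"จันทร์"` normalizes to `"จันท"`, so the helper returns `"ท"`, while `"ก์"` normalizes to `""` and raises instead of reaching `word[-1]`.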

@@ -253,6 +256,9 @@ def check_marttra(self, word: str) -> str:
         word = self.handle_karun_sound_silence(word)
         word = remove_tonemark(word)

+        if not word:
+            return ""

Copilot AI Mar 29, 2026

Returning an empty string for empty input is inconsistent with the existing error signaling in this API (e.g. returning "Can't find Marttra in this word" for unclassified words). Consider returning the same "Can't find…" string or raising a ValueError to avoid an ambiguous "" result.

Suggested change
-        return ""
+        return "Can't find Marttra in this word"

Member

Errors and warnings should be signaled through the standard mechanisms (raising an exception or issuing a warning), not through the return value.
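One way to apply this to `check_marttra` is to separate the two failure modes: an empty word is a caller error and raises `ValueError`, while a word that simply cannot be classified triggers `warnings.warn` and returns `None`. This split is an assumption, not something the PR or the maintainer prescribes; `check_marttra_strict` and `_SAMPLE_FINALS` are hypothetical, and the mapping covers only six basic final consonants (open syllables and other finals are deliberately left out).

```python
import warnings
from typing import Optional

# Toy subset of final consonant -> marttra class.
# Assumption: not khavee's real tables; open syllables and other finals are omitted.
_SAMPLE_FINALS = {"ก": "กก", "ด": "กด", "บ": "กบ", "ง": "กง", "น": "กน", "ม": "กม"}


def check_marttra_strict(word: str) -> Optional[str]:
    """Hypothetical check_marttra variant: raise on bad input, warn when unclassified."""
    if not word:
        raise ValueError("word must be a non-empty string")
    if word[-1] == "\u0e33":  # sara am (ำ) ends in an m sound, as in the diff
        return "กม"
    marttra = _SAMPLE_FINALS.get(word[-1])
    if marttra is None:
        # Not a caller error, so warn instead of raising.
        warnings.warn(f"cannot find marttra in {word!r}", stacklevel=2)
    return marttra
```

Callers that want strictness can escalate the warning with `warnings.simplefilter("error")`, without the function having to return a sentinel string.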


         # Check for ำ at the end (represents "am" sound, ends with m)
         if word[-1] == "ำ":
             return "กม"
7 changes: 6 additions & 1 deletion pythainlp/morpheme/word_formation.py
@@ -38,7 +38,12 @@ def nighit(w1: str, w2: str) -> str:
     newword = []
     newword.append(list_w1[0])
     newword.append("ั")
-    consonant_start = [i for i in list_w2 if i in set(thai_consonants)][0]
+    consonants_in_w2 = [i for i in list_w2 if i in set(thai_consonants)]

Copilot AI Mar 29, 2026

set(thai_consonants) is being constructed inside the list comprehension, which recreates the set for each element of list_w2. Build the set once (e.g. at module scope or once per function call) and reuse it for membership checks.
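A sketch of the suggested fix: build the set once at module scope and reuse it. In the real module this would presumably just be `frozenset(thai_consonants)`; here the set is derived from a Unicode range so the block runs on its own, under the assumption that the consonant letters are U+0E01 to U+0E2E minus the vowel letters ฤ (U+0E24) and ฦ (U+0E26). `THAI_CONSONANT_SET` and `consonants_in` are hypothetical names.

```python
from typing import List

# Assumption: the 44 Thai consonant letters are U+0E01..U+0E2E,
# excluding the vowel letters ฤ (U+0E24) and ฦ (U+0E26).
THAI_CONSONANT_SET = frozenset(
    chr(cp) for cp in range(0x0E01, 0x0E2F) if cp not in (0x0E24, 0x0E26)
)  # built once, at import time


def consonants_in(text: str) -> List[str]:
    """Return the Thai consonants of text in order, reusing the prebuilt set."""
    return [ch for ch in text if ch in THAI_CONSONANT_SET]
```

A `frozenset` also makes it explicit that the lookup table is read-only, and each membership test stays an average constant-time hash lookup.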

+    if not consonants_in_w2:
+        raise ValueError(
+            f"w2 {w2!r} contains no Thai consonants."
+        )
+    consonant_start = consonants_in_w2[0]
Comment on lines +41 to +46

Copilot AI Mar 29, 2026

This new branch (raising ValueError when w2 contains no Thai consonants) isn’t covered by the existing nighit tests. Add a unit test for a vowel-only (or empty) w2 to ensure it raises ValueError instead of regressing to IndexError.
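A minimal sketch of that test using `unittest`'s `assertRaises`. Because `nighit` itself is not reproduced here, the test targets `first_consonant_of_w2`, a hypothetical stand-in that copies the new guard from the diff; its consonant set assumes the 44 letters U+0E01 to U+0E2E without ฤ and ฦ. In the real suite the same `assertRaises(ValueError)` blocks would wrap calls to `nighit`.

```python
import unittest

# Assumption: the 44 Thai consonant letters (U+0E01..U+0E2E without ฤ and ฦ).
_THAI_CONSONANTS = frozenset(
    chr(cp) for cp in range(0x0E01, 0x0E2F) if cp not in (0x0E24, 0x0E26)
)


def first_consonant_of_w2(w2: str) -> str:
    """Hypothetical stand-in for the new guard inside nighit()."""
    consonants_in_w2 = [ch for ch in w2 if ch in _THAI_CONSONANTS]
    if not consonants_in_w2:
        raise ValueError(f"w2 {w2!r} contains no Thai consonants.")
    return consonants_in_w2[0]


class TestNighitGuard(unittest.TestCase):
    def test_vowel_only_w2_raises(self):
        with self.assertRaises(ValueError):
            first_consonant_of_w2("\u0e32")  # "า", a vowel sign only

    def test_empty_w2_raises(self):
        with self.assertRaises(ValueError):
            first_consonant_of_w2("")

    def test_first_consonant_is_returned(self):
        self.assertEqual(first_consonant_of_w2("\u0e01\u0e32"), "\u0e01")  # "กา" -> "ก"
```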

     if consonant_start in ["ก", "ช", "ค", "ข", "ง"]:
         newword.append("ง")
     elif consonant_start in ["จ", "ฉ", "ช", "ฌ"]: