
ren1244 commented on Jul 31, 2025

I ran into the same issue described in #173.
After reviewing the source code, I found that it is caused by an incorrect Unicode range check for Kanji characters.
This pull request fixes that check.
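Assuming "Kanji" here refers to the QR code Kanji mode, that mode stores Shift JIS double-byte codes in the ranges 0x8140 to 0x9FFC and 0xE040 to 0xEBBF. A test based only on a Unicode block (for example, CJK Unified Ideographs) therefore also accepts ideographs that have no Shift JIS code at all. The sketch below shows a stricter check. It assumes Python's built-in `shift_jis` codec as the character table, and `is_kanji_mode_char` is a hypothetical name, not a function from this project.

```python
def is_kanji_mode_char(ch: str) -> bool:
    """Return True if the single character `ch` fits QR Kanji mode.

    Hypothetical helper: instead of testing a Unicode block, it tries the
    real Shift JIS encoding (Python's `shift_jis` codec, assumed here as the
    reference table) and then checks the byte ranges Kanji mode accepts.
    """
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    try:
        encoded = ch.encode("shift_jis")
    except UnicodeEncodeError:
        # No Shift JIS code at all, so Kanji mode cannot represent it.
        return False
    if len(encoded) != 2:
        # Single-byte codes (ASCII, half-width katakana) are not Kanji mode.
        return False
    code = int.from_bytes(encoded, "big")
    return 0x8140 <= code <= 0x9FFC or 0xE040 <= code <= 0xEBBF
```

Encoding with the actual codec, rather than maintaining a hand-written list of Unicode ranges, keeps the check consistent with whatever table the encoder later uses to produce the Shift JIS bytes.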

For example, the input string 兩個黃鸝鳴翠柳 should be segmented as:

  • Kanji: 兩個
  • Byte: 黃鸝
  • Kanji: 鳴翠柳

In the previous version, 黃鸝 was mistakenly classified as Kanji, even though these characters are not valid Kanji-mode characters.
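The segmentation in the example above amounts to splitting the input into maximal runs of Kanji-mode and non-Kanji-mode characters. A simplified two-mode sketch follows. It assumes Python's `shift_jis` codec decides Kanji-mode membership, and both `is_kanji_mode_char` and `segment` are hypothetical names. Real QR encoders also consider Numeric and Alphanumeric modes and the cost of switching modes, which this sketch ignores.

```python
from itertools import groupby


def is_kanji_mode_char(ch: str) -> bool:
    """Hypothetical check: True if `ch` has a Shift JIS double-byte code
    inside the Kanji-mode ranges 0x8140-0x9FFC or 0xE040-0xEBBF."""
    try:
        encoded = ch.encode("shift_jis")
    except UnicodeEncodeError:
        return False  # not representable in Shift JIS at all
    if len(encoded) != 2:
        return False  # single-byte code, e.g. ASCII
    code = int.from_bytes(encoded, "big")
    return 0x8140 <= code <= 0x9FFC or 0xE040 <= code <= 0xEBBF


def segment(text: str) -> list[tuple[str, str]]:
    """Split `text` into consecutive ("Kanji" | "Byte", run) pairs.

    groupby() yields one group per maximal run of characters that share
    the same is_kanji_mode_char() result.
    """
    return [
        ("Kanji" if kanji else "Byte", "".join(run))
        for kanji, run in groupby(text, key=is_kanji_mode_char)
    ]
```

Whether this sketch reproduces the expected split of 兩個黃鸝鳴翠柳 depends on the Shift JIS table in use; the grouping logic itself only relies on the per-character check.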
