Unicode 17.0.0 #1006
Would the name Historical note: I originally implemented
I considered The Unicode name for Maybe we should call it
Compute the step table when "extract-char-cases.ss" is run.
The name The extra work to handle indic conjuncts slows down I pushed a commit at https://github.com/mflatt/ChezScheme/tree/grapheme-step-table for your consideration. It moves your improved
Using a table for the step function is a great idea! I incorporated your code and moved the $char-extended-pictographic? function so that it uses the updated grapheme-break table. I also renamed the new function to
@burgerrg All of your changes look good to me! I noticed a bug in the code that I added: if
Thanks for finding that! I masked out the state fixnum. Please double-check that I have the right mask. I also found a couple places in 5_4.mo that report expected character counts and updated them.
@mflatt, thank you for your help with this!
Support Unicode 17.0.0.
The function char-indic-break-property was added to support correct grapheme cluster identification for Indic scripts.
The grapheme cluster break test was updated to use the test file from the Unicode Consortium.
Follow unicode/Readme to make future Unicode updates.
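
For readers unfamiliar with why an Indic break property is needed: rule GB9c of UAX #29 forbids a grapheme cluster break before a consonant when it follows a consonant, at least one linker (such as a virama), and any number of extend characters. The following is a minimal Python sketch of that rule as a table-driven step function. It is not the Chez Scheme implementation from this pull request; the names `INCB`, `STEP`, and `conjunct_joins` are hypothetical, and the character map is a tiny hand-written subset rather than the real Unicode data.

```python
# Indic_Conjunct_Break classes. The map below is a tiny hand-written
# subset for illustration only, not the full Unicode property data.
NONE, CONSONANT, EXTEND, LINKER = range(4)

INCB = {
    "\u0915": CONSONANT,  # DEVANAGARI LETTER KA
    "\u0937": CONSONANT,  # DEVANAGARI LETTER SSA
    "\u094D": LINKER,     # DEVANAGARI SIGN VIRAMA
    "\u200D": EXTEND,     # ZERO WIDTH JOINER
}

# States of the (hypothetical) conjunct tracker.
START, AFTER_CONSONANT, AFTER_LINKER = range(3)

# STEP[state][class] -> next state.
# Column order: NONE, CONSONANT, EXTEND, LINKER.
STEP = [
    [START, AFTER_CONSONANT, START, START],                    # START
    [START, AFTER_CONSONANT, AFTER_CONSONANT, AFTER_LINKER],   # AFTER_CONSONANT
    [START, AFTER_CONSONANT, AFTER_LINKER, AFTER_LINKER],      # AFTER_LINKER
]


def conjunct_joins(text):
    """Return the indices i where GB9c forbids a break before text[i]."""
    state = START
    joins = []
    for i, ch in enumerate(text):
        cls = INCB.get(ch, NONE)
        # A consonant arriving after consonant + linker joins the cluster.
        if state == AFTER_LINKER and cls == CONSONANT:
            joins.append(i)
        state = STEP[state][cls]
    return joins
```

With this sketch, `conjunct_joins("\u0915\u094D\u0937")` reports that SSA joins the preceding KA + virama, while `conjunct_joins("\u0915\u0937")` reports no join because there is no linker between the two consonants. A full implementation would combine this state with the other grapheme cluster break rules.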