-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Update CaseInsensitive table from hybrid of (Unicode 6.3 and later, up to 8.0) to full Unicode 8.0 support. Fixes #517 #2356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
What was the scenario that you are trying to fix? Is there need to add some test cases? |
|
@kunalspathak - The initial commit I pushed was just a table normalization change that I just wanted to put up for preview so I could show to others. Edit: the full PR is now up. |
… Unicode 8.0 [1]. Fixed by breaking range 0x01B8, 0x01BD of pairs into the individual mappings, following the usual convention in this table of unrolling pairs rather than using an entry showing a range of pairs. Correct behavior under Under case-insensitive /i is: Shows 01B8 == 01B9 under /i. Shows 01BA != 01BB under /i. Shows 01BC == 01BD under /i. Relevant lines of the UnicodeData.txt for Unicode 8.0 [1]: 01B8;LATIN CAPITAL LETTER EZH REVERSED;Lu;0;L;;;;;N;LATIN CAPITAL LETTER REVERSED YOGH;;;01B9; 01B9;LATIN SMALL LETTER EZH REVERSED;Ll;0;L;;;;;N;LATIN SMALL LETTER REVERSED YOGH;;01B8;;01B8 01BA;LATIN SMALL LETTER EZH WITH TAIL;Ll;0;L;;;;;N;LATIN SMALL LETTER YOGH WITH TAIL;;;; 01BB;LATIN LETTER TWO WITH STROKE;Lo;0;L;;;;;N;LATIN LETTER TWO BAR;;;; 01BC;LATIN CAPITAL LETTER TONE FIVE;Lu;0;L;;;;;N;;;;01BD; 01BD;LATIN SMALL LETTER TONE FIVE;Ll;0;L;;;;;N;;;01BC;;01BCC -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later.
…) in case-insensitive equivalence table which caused matches to be missed. Correct behavior under Under case-insensitive /i is: 01C4 == 01C5 == 01C6 : DZ WITH CARON (uppercase, titlecase, lowercase) 01C7 == 01C8 == 01C9 : LJ (uppercase, titlecase, lowercase) 01CA == 01CB == 01CC : NJ (uppercase, titlecase, lowercase) 01F1 == 01F2 == 01F3 : DZ (uppercase, titlecase, lowercase) Relevant lines of the UnicodeData.txt for Unicode 8.0 [1][2]: 01C4;LATIN CAPITAL LETTER DZ WITH CARON;Lu;0;L;<compat> 0044 017D;;;;N;LATIN CAPITAL LETTER D Z HACEK;;;01C6;01C5 01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5 01C6;LATIN SMALL LETTER DZ WITH CARON;Ll;0;L;<compat> 0064 017E;;;;N;LATIN SMALL LETTER D Z HACEK;;01C4;;01C5 01C7;LATIN CAPITAL LETTER LJ;Lu;0;L;<compat> 004C 004A;;;;N;LATIN CAPITAL LETTER L J;;;01C9;01C8 01C8;LATIN CAPITAL LETTER L WITH SMALL LETTER J;Lt;0;L;<compat> 004C 006A;;;;N;LATIN LETTER CAPITAL L SMALL J;;01C7;01C9;01C8 01C9;LATIN SMALL LETTER LJ;Ll;0;L;<compat> 006C 006A;;;;N;LATIN SMALL LETTER L J;;01C7;;01C8 01CA;LATIN CAPITAL LETTER NJ;Lu;0;L;<compat> 004E 004A;;;;N;LATIN CAPITAL LETTER N J;;;01CC;01CB 01CB;LATIN CAPITAL LETTER N WITH SMALL LETTER J;Lt;0;L;<compat> 004E 006A;;;;N;LATIN LETTER CAPITAL N SMALL J;;01CA;01CC;01CB 01CC;LATIN SMALL LETTER NJ;Ll;0;L;<compat> 006E 006A;;;;N;LATIN SMALL LETTER N J;;01CA;;01CB 01F1;LATIN CAPITAL LETTER DZ;Lu;0;L;<compat> 0044 005A;;;;N;;;;01F3;01F2 01F2;LATIN CAPITAL LETTER D WITH SMALL LETTER Z;Lt;0;L;<compat> 0044 007A;;;;N;;;01F1;01F3;01F2 01F3;LATIN SMALL LETTER DZ;Ll;0;L;<compat> 0064 007A;;;;N;;;01F1;;01F2 -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later. [2] These lines in Unicode 8.0 are equivalent to the lines in Unicode 9.0.
… added regression tests. Relevant lines of the UnicodeData.txt for Unicode 8.0 [1][2]: 01DE;LATIN CAPITAL LETTER A WITH DIAERESIS AND MACRON;Lu;0;L;00C4 0304;;;;N;LATIN CAPITAL LETTER A DIAERESIS MACRON;;;01DF; 01DF;LATIN SMALL LETTER A WITH DIAERESIS AND MACRON;Ll;0;L;00E4 0304;;;;N;LATIN SMALL LETTER A DIAERESIS MACRON;;01DE;;01DE 01E0;LATIN CAPITAL LETTER A WITH DOT ABOVE AND MACRON;Lu;0;L;0226 0304;;;;N;LATIN CAPITAL LETTER A DOT MACRON;;;01E1; 01E1;LATIN SMALL LETTER A WITH DOT ABOVE AND MACRON;Ll;0;L;0227 0304;;;;N;LATIN SMALL LETTER A DOT MACRON;;01E0;;01E0 01E2;LATIN CAPITAL LETTER AE WITH MACRON;Lu;0;L;00C6 0304;;;;N;LATIN CAPITAL LETTER A E MACRON;;;01E3; 01E3;LATIN SMALL LETTER AE WITH MACRON;Ll;0;L;00E6 0304;;;;N;LATIN SMALL LETTER A E MACRON;;01E2;;01E2 01E4;LATIN CAPITAL LETTER G WITH STROKE;Lu;0;L;;;;;N;LATIN CAPITAL LETTER G BAR;;;01E5; 01E5;LATIN SMALL LETTER G WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER G BAR;;01E4;;01E4 01E6;LATIN CAPITAL LETTER G WITH CARON;Lu;0;L;0047 030C;;;;N;LATIN CAPITAL LETTER G HACEK;;;01E7; 01E7;LATIN SMALL LETTER G WITH CARON;Ll;0;L;0067 030C;;;;N;LATIN SMALL LETTER G HACEK;;01E6;;01E6 01E8;LATIN CAPITAL LETTER K WITH CARON;Lu;0;L;004B 030C;;;;N;LATIN CAPITAL LETTER K HACEK;;;01E9; 01E9;LATIN SMALL LETTER K WITH CARON;Ll;0;L;006B 030C;;;;N;LATIN SMALL LETTER K HACEK;;01E8;;01E8 01EA;LATIN CAPITAL LETTER O WITH OGONEK;Lu;0;L;004F 0328;;;;N;LATIN CAPITAL LETTER O OGONEK;;;01EB; 01EB;LATIN SMALL LETTER O WITH OGONEK;Ll;0;L;006F 0328;;;;N;LATIN SMALL LETTER O OGONEK;;01EA;;01EA 01EC;LATIN CAPITAL LETTER O WITH OGONEK AND MACRON;Lu;0;L;01EA 0304;;;;N;LATIN CAPITAL LETTER O OGONEK MACRON;;;01ED; 01ED;LATIN SMALL LETTER O WITH OGONEK AND MACRON;Ll;0;L;01EB 0304;;;;N;LATIN SMALL LETTER O OGONEK MACRON;;01EC;;01EC 01EE;LATIN CAPITAL LETTER EZH WITH CARON;Lu;0;L;01B7 030C;;;;N;LATIN CAPITAL LETTER YOGH HACEK;;;01EF; 01EF;LATIN SMALL LETTER EZH WITH CARON;Ll;0;L;0292 030C;;;;N;LATIN SMALL LETTER YOGH HACEK;;01EE;;01EE 01F0;LATIN SMALL LETTER J WITH CARON;Ll;0;L;006A 030C;;;;N;LATIN SMALL LETTER J HACEK;;;; <NO MAPPING> 01F1;LATIN CAPITAL LETTER DZ;Lu;0;L;<compat> 0044 005A;;;;N;;;;01F3;01F2 DZ (uppercase) 01F2;LATIN CAPITAL LETTER D WITH SMALL LETTER Z;Lt;0;L;<compat> 0044 007A;;;;N;;;01F1;01F3;01F2 DZ (titlecase) 01F3;LATIN SMALL LETTER DZ;Ll;0;L;<compat> 0064 007A;;;;N;;;01F1;;01F2 DZ (lowercase) 01F4;LATIN CAPITAL LETTER G WITH ACUTE;Lu;0;L;0047 0301;;;;N;;;;01F5; [3] 01F5;LATIN SMALL LETTER G WITH ACUTE;Ll;0;L;0067 0301;;;;N;;;01F4;;01F4 [3] -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later. [2] These lines in Unicode 8.0 are equivalent to the lines in Unicode 9.0. [3] Already included in the table as a pair mapping.
Relevant lines of the UnicodeData.txt for Unicode 8.0 [1][2]: 03F3;GREEK LETTER YOT;Ll;0;L;;;;;N;;;037F;;037F 037F;GREEK CAPITAL LETTER YOT;Lu;0;L;;;;;N;;;;03F3; -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later. [2] These lines in Unicode 8.0 are equivalent to the lines in Unicode 9.0.
Relevant lines of the UnicodeData.txt for Unicode 8.0 [1][2]: 0528;CYRILLIC CAPITAL LETTER EN WITH LEFT HOOK;Lu;0;L;;;;;N;;;;0529; 0529;CYRILLIC SMALL LETTER EN WITH LEFT HOOK;Ll;0;L;;;;;N;;;0528;;0528 052A;CYRILLIC CAPITAL LETTER DZZHE;Lu;0;L;;;;;N;;;;052B; 052B;CYRILLIC SMALL LETTER DZZHE;Ll;0;L;;;;;N;;;052A;;052A 052C;CYRILLIC CAPITAL LETTER DCHE;Lu;0;L;;;;;N;;;;052D; 052D;CYRILLIC SMALL LETTER DCHE;Ll;0;L;;;;;N;;;052C;;052C 052E;CYRILLIC CAPITAL LETTER EL WITH DESCENDER;Lu;0;L;;;;;N;;;;052F; 052F;CYRILLIC SMALL LETTER EL WITH DESCENDER;Ll;0;L;;;;;N;;;052E;;052E A698;CYRILLIC CAPITAL LETTER DOUBLE O;Lu;0;L;;;;;N;;;;A699; A699;CYRILLIC SMALL LETTER DOUBLE O;Ll;0;L;;;;;N;;;A698;;A698 A69A;CYRILLIC CAPITAL LETTER CROSSED O;Lu;0;L;;;;;N;;;;A69B; A69B;CYRILLIC SMALL LETTER CROSSED O;Ll;0;L;;;;;N;;;A69A;;A69A -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later. [2] These lines in Unicode 8.0 are equivalent to the lines in Unicode 9.0.
Relevant lines of the UnicodeData.txt for Unicode 8.0 [1][2]: Uppercase: 13A0-13EF Lowercase: AB70-ABBF Extra letters in case-mapped pairs: 13F0-13FD -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later. [2] These lines in Unicode 8.0 are equivalent to the lines in Unicode 9.0.
Relevant lines of the UnicodeData.txt for Unicode 8.0 [1][2]: A796;LATIN CAPITAL LETTER B WITH FLOURISH;Lu;0;L;;;;;N;;;;A797; A797;LATIN SMALL LETTER B WITH FLOURISH;Ll;0;L;;;;;N;;;A796;;A796 A798;LATIN CAPITAL LETTER F WITH STROKE;Lu;0;L;;;;;N;;;;A799; A799;LATIN SMALL LETTER F WITH STROKE;Ll;0;L;;;;;N;;;A798;;A798 A79A;LATIN CAPITAL LETTER VOLAPUK AE;Lu;0;L;;;;;N;;;;A79B; A79B;LATIN SMALL LETTER VOLAPUK AE;Ll;0;L;;;;;N;;;A79A;;A79A A79C;LATIN CAPITAL LETTER VOLAPUK OE;Lu;0;L;;;;;N;;;;A79D; A79D;LATIN SMALL LETTER VOLAPUK OE;Ll;0;L;;;;;N;;;A79C;;A79C A79E;LATIN CAPITAL LETTER VOLAPUK UE;Lu;0;L;;;;;N;;;;A79F; A79F;LATIN SMALL LETTER VOLAPUK UE;Ll;0;L;;;;;N;;;A79E;;A79E A7AA;LATIN CAPITAL LETTER H WITH HOOK;Lu;0;L;;;;;N;;;;0266; A7AB;LATIN CAPITAL LETTER REVERSED OPEN E;Lu;0;L;;;;;N;;;;025C; A7AC;LATIN CAPITAL LETTER SCRIPT G;Lu;0;L;;;;;N;;;;0261; A7AD;LATIN CAPITAL LETTER L WITH BELT;Lu;0;L;;;;;N;;;;026C; A7B0;LATIN CAPITAL LETTER TURNED K;Lu;0;L;;;;;N;;;;029E; A7B1;LATIN CAPITAL LETTER TURNED T;Lu;0;L;;;;;N;;;;0287; A7B2;LATIN CAPITAL LETTER J WITH CROSSED-TAIL;Lu;0;L;;;;;N;;;;029D; A7B3;LATIN CAPITAL LETTER CHI;Lu;0;L;;;;;N;;;;AB53; A7B4;LATIN CAPITAL LETTER BETA;Lu;0;L;;;;;N;;;;A7B5; A7B5;LATIN SMALL LETTER BETA;Ll;0;L;;;;;N;;;A7B4;;A7B4 A7B6;LATIN CAPITAL LETTER OMEGA;Lu;0;L;;;;;N;;;;A7B7; A7B7;LATIN SMALL LETTER OMEGA;Ll;0;L;;;;;N;;;A7B6;;A7B6 AB53;LATIN SMALL LETTER CHI;Ll;0;L;;;;;N;;;A7B3;;A7B3 -- [1] Currently fixing bugs in Unicode 8.0 because the source code claims compliance with Unicode 8.0 at the moment. Will update to Unicode 9.0 later. [2] These lines in Unicode 8.0 are equivalent to the lines in Unicode 9.0.
|
/cc @bterlson @mathiasbynens - Care to take a look? Also, any chance you could point me to where in the spec it mentions which case-insensitive matches should not be done without /u flag? For this PR I focused on compat with v8 for those codepoints. |
See step 2 of https://tc39.github.io/ecma262/#sec-runtime-semantics-canonicalize-ch (“If Unicode is true, then … else …”). I’ll take a closer look at this patch as soon as possible! |
|
For me it seems like reviewing the tool is just as important as the change in this PR, so I would encourage the other reviewers to do the same. Perhaps we need a separate PR for that? |
|
@tcare I can submit a PR for that but since I don't plan on merging that in, at least not to PR: dilijev#3 |
|
@mathiasbynens Thanks! Basically it seems the difference is without /u mapping under toUpperCase, and with /u, relying on CaseFolding.txt. In a lot of cases, the CaseFolding.txt file provides exactly the same mapping as UnicodeData.txt, but I guess there might be some differences versus what is implemented in toUpperCase? In the case of our table, we use the Basically under I'm still surprised about the Cherokee letters, because the Cherokee letters are now split into uppercase and lowercase ranges since Unicode 8.0. It seems like the lowercase/uppercase pairs should still match considering there are a lot of multilingual plane code points with upper/lower pairs for which there is a match without /u. |
|
I actually generated much larger test files which test all of the equivalences captured in this table. I'll add those tests to this PR. |
|
I might look into reducing the size of these test files to save space. Minification! @curtisman Thinks that the tool which generates the table, once reviewed, could probably be checked in to ChakraCore in |
|
Didn't save many lines by minifying the tests, but saved nearly 100 KB on final code size. The extra diff in the original test file will still bloat the repo somewhat. Oh well. |
… done with /u. Add generated tests. The generated test cases exhaustively test all case-insensitive single-character matches under Unicode 8.0. Could probably have written these tests as a mirror of the table from the source and some logic from the table generator, but this is more straightforward. Includes tests for which cases should not case-fold without /u flag. (Focused on v8 compat.) Minified the function names, and removed spaces and semicolons to save file size at a slight readability cost, but since they're not very readable without an encyclopediac knowledge of unicode code points, readability seems like a non-goal.
|
Looks good, I also had a quick look at the tool and a discussion with Doug offline. We have v8 compat achieved with the tests on this change so I am happy. |
|
@mathiasbynens I'm planning to merge this today since I'm going on vacation for 2 weeks and don't want to let this sit here without additional internal coverage. However, I would still greatly appreciate your feedback! |
|
LGTM |
|
Thanks @Cellule and @boingoing for your help reviewing! |
…nicode 6.3 and later, up to 8.0) to full Unicode 8.0 support. Fixes #517 Merge pull request #2356 from dilijev:unicase Update CaseInsensitive table from hybrid of (Unicode 6.3 and later, up to 8.0) to full Unicode 8.0 support. Fixes #517 Note: The current standard wants Unicode 9.0 but it might be too risky to update that far in a stabilization branch. Opened #2367 to track this work item. The table was generated in the past but then was (mostly) manually edited to include various optimizations and to fix bugs over the years. To make sure we got a complete update, I wrote a tool to generate the table. ## CaseInsensitive mapping generator tool PR: dilijev#3 Source: https://github.com/dilijev/ChakraCore/tree/CaseInsensitive/tools/Unicode/CaseInsensitive From this tool I was able to see and apply the differences from the current implementation to the correct implementation. In order to keep the change as small as possible, I used the diff as a reference for what needed changing and left out non-essential diffs. Additionally, the tool generates a suite of tests to track regressions against the update and ensure that the implementation does what is expected. I took some key tests from that suite and created the test file contained in this PR. # Overview of Changes I have staged the changes to hopefully make this easier to review. Here's an overview. NOTE: The individual commits list or summarize the relevant lines of UnicodeData.txt where applicable. First, I normalized the existing table to a reasonable format (same as the output of the tool) to make the later commits more clear. This involves fixing the casing and sorting deltas on each line in ascending order. 3d0f37f Next, I fixed a few bugs with the current table that were preventing some cases from being matched correctly. abb5d91 4894d24 25049de Added new codepoints: f197902 - GREEK LETTER YOT af2d083 - Cyrillic cba5439 - Cherokee 6c25a51 - Latin extensions Other tests and fixes: cb736ab - Add test cases from #517 to ensure those issues are fixed. fbfb953 - 0x0345 and case-insensitive equivalent characters with or without /u flag. dc3e750 - Case-insensitive matching for Cherokee only with /u. [1] d96eed5 - All other Unicode 8.0 cases of case-insensitive matching only with /u. [1] Added generated tests. [1] These were with a focus on compat with v8 as determined by running the full regression test suite I generated against node-6.9.4-LTS and node-7.4.0 (latest), and double-checking a handful of tests against the latest stable Chrome (v 55). # Test Coverage * **Regex test run successful!** `Summary: E:\d\RegexTestCollateral had 151147 tests; 0 failures` * Internal and slow tests pass. * Note that PRs are merged with the target branch before running Jenkins checks so attempting to run slow tests on this PR would result in failures as per #2316 -- but running them locally on this branch, the tests pass. # Reviewers @tcare @bterlson @boingoing @Cellule - Thank you for your assistance with this change and for your code reviews.
… hybrid of (Unicode 6.3 and later, up to 8.0) to full Unicode 8.0 support. Fixes #517 Merge pull request #2356 from dilijev:unicase Update CaseInsensitive table from hybrid of (Unicode 6.3 and later, up to 8.0) to full Unicode 8.0 support. Fixes #517 Note: The current standard wants Unicode 9.0 but it might be too risky to update that far in a stabilization branch. Opened #2367 to track this work item. The table was generated in the past but then was (mostly) manually edited to include various optimizations and to fix bugs over the years. To make sure we got a complete update, I wrote a tool to generate the table. ## CaseInsensitive mapping generator tool PR: dilijev#3 Source: https://github.com/dilijev/ChakraCore/tree/CaseInsensitive/tools/Unicode/CaseInsensitive From this tool I was able to see and apply the differences from the current implementation to the correct implementation. In order to keep the change as small as possible, I used the diff as a reference for what needed changing and left out non-essential diffs. Additionally, the tool generates a suite of tests to track regressions against the update and ensure that the implementation does what is expected. I took some key tests from that suite and created the test file contained in this PR. # Overview of Changes I have staged the changes to hopefully make this easier to review. Here's an overview. NOTE: The individual commits list or summarize the relevant lines of UnicodeData.txt where applicable. First, I normalized the existing table to a reasonable format (same as the output of the tool) to make the later commits more clear. This involves fixing the casing and sorting deltas on each line in ascending order. 3d0f37f Next, I fixed a few bugs with the current table that were preventing some cases from being matched correctly. abb5d91 4894d24 25049de Added new codepoints: f197902 - GREEK LETTER YOT af2d083 - Cyrillic cba5439 - Cherokee 6c25a51 - Latin extensions Other tests and fixes: cb736ab - Add test cases from #517 to ensure those issues are fixed. fbfb953 - 0x0345 and case-insensitive equivalent characters with or without /u flag. dc3e750 - Case-insensitive matching for Cherokee only with /u. [1] d96eed5 - All other Unicode 8.0 cases of case-insensitive matching only with /u. [1] Added generated tests. [1] These were with a focus on compat with v8 as determined by running the full regression test suite I generated against node-6.9.4-LTS and node-7.4.0 (latest), and double-checking a handful of tests against the latest stable Chrome (v 55). # Test Coverage * **Regex test run successful!** `Summary: E:\d\RegexTestCollateral had 151147 tests; 0 failures` * Internal and slow tests pass. * Note that PRs are merged with the target branch before running Jenkins checks so attempting to run slow tests on this PR would result in failures as per #2316 -- but running them locally on this branch, the tests pass. # Reviewers @tcare @bterlson @boingoing @Cellule - Thank you for your assistance with this change and for your code reviews.
|
Belated LGTM, although I skipped the generated files in favor of the code that generates them, i.e. dilijev#3. 👍 Glad to see this merged! |
Update CaseInsensitive table from hybrid of (Unicode 6.3 and later, up to 8.0) to full Unicode 8.0 support.
Fixes #517
Note: The current standard wants Unicode 9.0 but it might be too risky to update that far in a stabilization branch. Opened #2367 to track this work item.
The table was generated in the past but then was (mostly) manually edited to include various optimizations and to fix bugs over the years. To make sure we got a complete update, I wrote a tool to generate the table.
CaseInsensitive mapping generator tool
PR: dilijev#3
Source: https://github.com/dilijev/ChakraCore/tree/CaseInsensitive/tools/Unicode/CaseInsensitive
From this tool I was able to see and apply the differences from the current implementation to the correct implementation. In order to keep the change as small as possible, I used the diff as a reference for what needed changing and left out non-essential diffs. Additionally, the tool generates a suite of tests to track regressions against the update and ensure that the implementation does what is expected. I took some key tests from that suite and created the test file contained in this PR.
Overview of Changes
I have staged the changes to hopefully make this easier to review. Here's an overview.
NOTE: The individual commits list or summarize the relevant lines of UnicodeData.txt where applicable.
First, I normalized the existing table to a reasonable format (same as the output of the tool) to make the later commits more clear. This involves fixing the casing and sorting deltas on each line in ascending order.
3d0f37f
Next, I fixed a few bugs with the current table that were preventing some cases from being matched correctly.
abb5d91
4894d24
25049de
Added new codepoints:
f197902 - GREEK LETTER YOT
af2d083 - Cyrillic
cba5439 - Cherokee
6c25a51 - Latin extensions
Other tests and fixes:
cb736ab - Add test cases from #517 to ensure those issues are fixed.
fbfb953 - 0x0345 and case-insensitive equivalent characters with or without /u flag.
dc3e750 - Case-insensitive matching for Cherokee only with /u. [1]
d96eed5 - All other Unicode 8.0 cases of case-insensitive matching only with /u. [1] Added generated tests.
[1] These were with a focus on compat with v8 as determined by running the full regression test suite I generated against node-6.9.4-LTS and node-7.4.0 (latest), and double-checking a handful of tests against the latest stable Chrome (v 55).
Test Coverage
Summary: E:\d\RegexTestCollateral had 151147 tests; 0 failuresReviewers
@tcare @bterlson @boingoing @Cellule - Thank you for your assistance with this change and for your code reviews.