fn/matches.re.xml: re00984 unicode-version #6

Open
zadean opened this issue Aug 19, 2019 · 2 comments

Comments

@zadean
Contributor

zadean commented Aug 19, 2019

Test re00984 checks a large number of code points against the \w character class escape.
The characters ⌈ (U+2308) and ⌉ (U+2309) are in this list. These code points were moved from \p{S} to \p{P} in Unicode version 6.3, and therefore dropped out of \w.
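
For anyone who wants to reproduce this locally, here is a rough sketch in Python. It assumes that \w in XSD regular expressions means "any character not in \p{P}, \p{Z} or \p{C}", and it relies on whatever Unicode version the local Python build ships with, so the output will differ between runtimes. The helper name `xsd_word_char` is made up for this illustration and is not part of any library.

```python
import unicodedata


def xsd_word_char(ch: str) -> bool:
    """Approximate XSD \\w: any character NOT in \\p{P}, \\p{Z} or \\p{C}.

    Hypothetical helper for illustration; the result depends on the Unicode
    database bundled with the running Python interpreter.
    """
    # The first letter of the general category is the major class (L, M, N, P, S, Z, C).
    return unicodedata.category(ch)[0] not in ("P", "Z", "C")


# Report which Unicode version the answers below are based on.
print("Unicode data version:", unicodedata.unidata_version)

# LEFT CEILING and RIGHT CEILING, the two characters from re00984.
for ch in ("\u2308", "\u2309"):
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"category={unicodedata.category(ch)}, in \\w={xsd_word_char(ch)}"
    )
```

On a runtime with Unicode 6.3 or later data, both characters should report a punctuation category and therefore not match \w; with older data they should still report a symbol category.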

Perhaps the test should include the "unicode-version" dependency flag for version "6.2"?

@michaelhkay
Contributor

michaelhkay commented Aug 19, 2019 via email

@zadean
Contributor Author

zadean commented Aug 20, 2019

@michaelhkay You make a very good point, and a separate test for the reclassified characters is definitely the better answer.

I took a quick look through the notes for the Unicode updates since 6.3 and found only a few more category changes, none of which seem to break anything in the current test suite.

Just a side note:
It may also be of interest to "modernize" a bit by adding some of the new emoji/emoticon code points to the \p{So} tests (re00169 & re00207). I imagine they are showing up in real data, and adding them would add value to the test cases. Not that this suite is a Unicode test suite, but just a few to show some level of compliance for the newer characters. But that is something for a different issue.
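
As a rough sketch of how candidates could be vetted before being added, the snippet below checks that each code point is classified as So by the local Unicode data. The candidate list is purely illustrative (it is an assumption, not taken from re00169 or re00207), and `is_other_symbol` is a made-up helper name.

```python
import unicodedata

# Illustrative candidates only: SNOWMAN, GRINNING FACE, ROCKET.
# A real test would choose its own list of code points.
candidates = [0x2603, 0x1F600, 0x1F680]


def is_other_symbol(cp: int) -> bool:
    """Return True if the code point has general category So (Symbol, other)."""
    return unicodedata.category(chr(cp)) == "So"


for cp in candidates:
    ch = chr(cp)
    # Fall back to a placeholder name if the local Unicode data has no entry.
    name = unicodedata.name(ch, "<unnamed>")
    print(f"U+{cp:04X} {name}: So={is_other_symbol(cp)}")
```

Any candidate that is unassigned in the runtime's Unicode version will show up as unnamed and not So, which is a quick way to spot code points that are too new for a given processor.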
