fn/matches.re.xml: re00984 unicode-version #6

zadean · 2019-08-19T17:20:36Z

Test re00984 checks a large number of code points against the \w multi-character escape.
The characters ⌈ (U+2308) and ⌉ (U+2309) are in this list. In Unicode 6.3 these code points were moved from \p{S} to \p{P}. Because \w excludes every character in \p{P}, \p{Z} and \p{C}, they are no longer matched by \w.

Perhaps the test should include the "unicode-version" dependency flag for version "6.2"?
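A minimal Python sketch of the reasoning above. It assumes Python's `unicodedata` module as a stand-in for a processor's Unicode tables. XSD defines `\w` as `[#x0000-#x10FFFF]-[\p{P}\p{Z}\p{C}]`, so a change of general category alone is enough to move a character into or out of `\w`. The helper name `in_xsd_w` is hypothetical. `unicodedata.ucd_3_2_0` is the Unicode 3.2.0 database that CPython ships alongside its current one, used here as "pre-6.3" data.

```python
import unicodedata

def in_xsd_w(ch: str, db=unicodedata) -> bool:
    """Hypothetical helper: does `ch` match the XSD/XPath \\w escape?

    \\w is every character except those in \\p{P}, \\p{Z} and \\p{C}, i.e.
    any character whose general category does not start with P, Z or C.
    `db` is any object with a `category()` method (the module itself, or
    `unicodedata.ucd_3_2_0` for the older data).
    """
    return db.category(ch)[0] not in ("P", "Z", "C")

for ch in "\u2308\u2309":  # LEFT CEILING, RIGHT CEILING
    old = unicodedata.ucd_3_2_0.category(ch)  # Unicode 3.2.0 data (pre-6.3)
    new = unicodedata.category(ch)            # data bundled with this Python
    print(f"U+{ord(ch):04X} {old} -> {new}: "
          f"\\w {in_xsd_w(ch, unicodedata.ucd_3_2_0)} -> {in_xsd_w(ch)}")
```

On any current Python 3 this reports Sm -> Ps and Sm -> Pe, with \w membership going from True to False, which is exactly the difference the test trips over.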

michaelhkay · 2019-08-19T17:39:31Z (via email)

It would be a shame to put that dependency on the whole test; better to move the relevant part into a separate test with its own dependency.
Michael Kay

zadean · 2019-08-20T18:53:35Z

@michaelhkay You make a very good point, and a separate test for the reclassified characters is definitely the better answer.

I took a quick look through the release notes for the Unicode updates since 6.3 and found only a few more category changes, none of which seem to break anything in the test suite as it currently stands.
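One way to automate that kind of check is sketched below. It assumes CPython's `unicodedata`, which only offers Unicode 3.2.0 (`unicodedata.ucd_3_2_0`) and the interpreter's own bundled version. The comparison therefore spans every change between those two versions, not just 6.2 to 6.3, and overstates what a "unicode-version 6.2" dependency would need to cover. The function name `w_flips` is hypothetical. It lists characters assigned in the old data whose \w membership differs in the new data.

```python
import sys
import unicodedata

def w_flips(old_db, new_db=unicodedata):
    """Hypothetical helper: code points assigned in `old_db` whose XSD \\w
    membership differs in `new_db`.

    \\w membership depends only on the first letter of the general category
    (P, Z and C are excluded). Code points unassigned (Cn) in the old data
    are skipped, since newly assigned characters would swamp the result.
    """
    flips = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        old = old_db.category(ch)
        if old == "Cn":
            continue
        new = new_db.category(ch)
        if (old[0] in "PZC") != (new[0] in "PZC"):
            flips.append((cp, old, new))
    return flips

for cp, old, new in w_flips(unicodedata.ucd_3_2_0):
    print(f"U+{cp:04X} {old} -> {new}  {unicodedata.name(chr(cp), '?')}")
```

Each reported code point is a candidate for the same treatment as ⌈ and ⌉: either keep it out of version-neutral tests or move it into a test with a unicode-version dependency.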

Just a side note:
It may also be worth "modernizing" a bit by adding some of the newer emoji/emoticon code points to the \p{So} tests (re00169 and re00207). I imagine they are showing up in real data, so adding them would add value to the test cases. This suite is not a Unicode test suite, but a few such characters would show some level of compliance for the newer characters. That is something for a different issue, though.
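As a quick sanity check before adding such inputs, the sketch below confirms a few emoji are \p{So} in current Unicode data. It assumes Python's `unicodedata` reflects the data a processor would use. The three characters are illustrative picks, not taken from re00169 or re00207.

```python
import unicodedata

# Illustrative candidates only (not from the existing tests):
# GRINNING FACE, SLICE OF PIZZA, HEAVY BLACK HEART
CANDIDATES = ["\U0001F600", "\U0001F355", "\u2764"]

for ch in CANDIDATES:
    # \p{So} in XSD/XPath regexes corresponds to general category "So"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)}")
```

Every candidate should print category So, so each would be a valid positive input for a \p{So} test.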