Multibyte characters ignored on OS X #1

cpsdqs · 2016-09-18T19:22:24Z

Characters with large code points are ignored for some reason, possibly because they use multiple bytes.

miestasmia · 2016-09-19T14:35:22Z

I'm not sure what you're trying to write, but AFAIK U+F09F does not exist.

cpsdqs · 2016-09-19T14:39:58Z

Each of the first six non-space characters uses four bytes, not two: e.g. F0 9F 90 A1, which is U+1F421 (blowfish).
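For reference, the four-byte form can be checked by hand: a lead byte of the form 11110xxx carries 3 payload bits and each 10xxxxxx continuation byte carries 6, for 21 bits in total. The Python 3 sketch below decodes F0 9F 90 A1 that way and cross-checks the result against the built-in decoder. The function name decode_utf8_4byte is hypothetical and not part of unilookup.

```python
def decode_utf8_4byte(seq: bytes) -> int:
    """Decode one 4-byte UTF-8 sequence (lead byte 11110xxx) to a code point."""
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 sequence")
    cp = seq[0] & 0x07                      # 3 payload bits from the lead byte
    for cont in seq[1:]:
        if (cont & 0xC0) != 0x80:           # continuation bytes look like 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (cont & 0x3F)      # append 6 payload bits
    return cp


blowfish = bytes([0xF0, 0x9F, 0x90, 0xA1])
cp = decode_utf8_4byte(blowfish)
print(f"U+{cp:04X}")                        # U+1F421
assert cp == ord(blowfish.decode("utf-8"))  # agrees with Python's own decoder
```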

miestasmia · 2016-09-19T14:54:19Z

Sorry, my bad. I'm afraid I can't reproduce it.

miestasmia · 2017-09-24T10:11:42Z

@cpsdqs Can you please confirm if this is still an issue in the latest version? Some of the internals have been changed.

cpsdqs · 2017-09-24T10:20:42Z

sure is

miestasmia · 2017-09-24T10:25:27Z

I still cannot reproduce this, which leads me to think it's a Mac-only issue. Could you try running this on a Linux machine by any chance?

miestasmia · 2017-09-24T10:30:48Z

Additionally, could you try running export PYTHONIOENCODING=utf-8 before running unilookup on OS X?

cpsdqs · 2017-09-24T10:46:28Z

Doesn't work either ¯\_(ツ)_/¯

['D83C', 'DF29', '000A'] vs. ['1F329', '000A']

Seems to be an issue with Python itself.
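The two lists differ in a telling way: D83C DF29 is the UTF-16 surrogate-pair encoding of U+1F329. That pattern is consistent with a "narrow" (UCS-2) Python 2 build, which stores characters above U+FFFF as two 16-bit code units, whereas a "wide" (UCS-4) build stores them as one. As far as I know, the system Python 2 on OS X was a narrow build (sys.maxunicode == 0xFFFF) while many Linux distributions shipped wide builds (sys.maxunicode == 0x10FFFF); treat that as an assumption about the reporter's setup, not something confirmed in this thread. A small Python 3 sketch of the surrogate arithmetic, with hypothetical helper names:

```python
def combine_surrogates(high: int, low: int) -> int:
    """Join a UTF-16 high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def split_surrogates(cp: int) -> tuple:
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is in the BMP or out of range")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# First list from the OS X run: two code units for one character, plus the newline
units = ["D83C", "DF29", "000A"]
cp = combine_surrogates(int(units[0], 16), int(units[1], 16))
print(f"U+{cp:04X}")                                    # U+1F329
print([f"{u:04X}" for u in split_surrogates(0x1F329)])  # ['D83C', 'DF29']
```

Since Python 3.3 (PEP 393) there is no narrow/wide build distinction, so iterating over a str always yields whole code points.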

miestasmia · 2017-09-24T10:55:26Z

Could you try manually setting the input string instead of reading from stdin, so we can try to figure out where the issue arises?

cpsdqs · 2017-09-24T11:00:40Z

Replacing sys.stdin on line 31 with ['🌩'] and adding # coding=utf-8 on line 2 doesn't really work. (The encoding declaration is needed because the script doesn't even run without it: SyntaxError: Non-ASCII character '\xf0' in file ./unilookup on line 31, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details.)

miestasmia · 2017-09-24T11:04:12Z

Hm, could you try adding print char in the second for loop (at line 33)?

cpsdqs · 2017-09-24T11:06:45Z

prints:

���
���

(all of them u+FFFD) … (side note: unilookup works fine for echo '��' | unilookup)

miestasmia · 2017-09-24T11:08:44Z

Okay, so that seems to be where it breaks, because I'm getting the individual characters on Linux. I'll look into it.

miestasmia · 2017-09-24T11:18:53Z

From what I've been able to find, this is an issue with how Python 2 (on some platforms) handles Unicode. The solution here would be to read the raw byte stream, manually determine each character's byte length (as done here), and then split the byte stream accordingly. This will require quite a bit of refactoring, but I'll try to get it done soonish.
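The byte-splitting approach described above (read the lead byte to get the sequence length, then slice) might look roughly like this in Python 3. It is a hypothetical illustration, not the eventual unilookup code; the helper names utf8_seq_len, split_utf8 and code_points are made up for the example. Each slice is still passed through a strict decode, so malformed sequences raise an error instead of being silently dropped.

```python
from typing import List


def utf8_seq_len(lead: int) -> int:
    """Return the length of a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1                      # 0xxxxxxx: ASCII
    if (lead & 0xE0) == 0xC0:
        return 2                      # 110xxxxx
    if (lead & 0xF0) == 0xE0:
        return 3                      # 1110xxxx
    if (lead & 0xF8) == 0xF0:
        return 4                      # 11110xxx
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


def split_utf8(data: bytes) -> List[bytes]:
    """Split raw UTF-8 bytes into one chunk per encoded character."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_seq_len(data[i])
        chunk = data[i:i + n]
        if len(chunk) != n:
            raise ValueError("truncated UTF-8 sequence at end of input")
        chunks.append(chunk)
        i += n
    return chunks


def code_points(data: bytes) -> List[str]:
    """Hex code point of each character; strict decode rejects malformed chunks."""
    return [f"{ord(chunk.decode('utf-8')):04X}" for chunk in split_utf8(data)]


print(code_points("\U0001F329\n".encode("utf-8")))  # ['1F329', '000A']
```

In the actual script the bytes would come from sys.stdin.buffer.read() rather than a literal; the example output matches the second list in the comparison quoted earlier in the thread.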

miestasmia closed this as completed Sep 19, 2016

miestasmia added the bug label Sep 19, 2016

miestasmia reopened this Sep 24, 2017

miestasmia self-assigned this Sep 24, 2017

miestasmia added the help wanted label Sep 24, 2017

miestasmia changed the title ~~Characters are ignored~~ Multibyte characters ignored on OS X Oct 4, 2017
