A couple of cases that generate the wrong markup.
1. nested html elements in link
<a href="http://example.com/"><span>foo</span></a><a href="http://example.com/">bar</a> foo bar
gets converted to
[foo](http://example.com/) bar foo [bar](http://example.com/)
when I would expect
[foo](http://example.com/)[bar](http://example.com/) foo bar
What happens: the walker enters the `a` and then enters the nested `span`, so it finds the same text twice. The proper label text was already consumed by the first match, so the second search keeps going until it finds another match further along. In some cases (like above) it even linkifies the wrong text.
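To make the failure mode concrete, here is a minimal sketch of a cursor-based matcher. It is not the library's actual code: `VisitedNode`, `linkifyByCursor`, and the `skipNested` flag are hypothetical names, DOM nodes are modeled as plain objects, and the plaintext `foobar foo bar` is an assumption (the exact whitespace the browser provides may differ). Skipping nodes nested inside a link is just one possible fix.

```typescript
// Hypothetical, simplified stand-in for a DOM node in the order a walker visits it.
interface VisitedNode {
  href?: string // set only for <a> elements
  text: string // text the matcher tries to locate in the plaintext
  insideLink: boolean // true when an ancestor is an <a>
}

function linkifyByCursor(plaintext: string, nodes: VisitedNode[], skipNested: boolean): string {
  let markdown = plaintext
  let cursor = 0 // everything before this index counts as already consumed
  for (const node of nodes) {
    // Possible fix: do not match text that the enclosing <a> already matched.
    if (skipNested && node.insideLink) continue
    const found = markdown.indexOf(node.text, cursor)
    if (found < 0) continue
    if (node.href !== undefined) {
      const link = `[${node.text}](${node.href})`
      markdown = markdown.slice(0, found) + link + markdown.slice(found + node.text.length)
      cursor = found + link.length
    } else {
      // Non-link text only advances the cursor.
      cursor = found + node.text.length
    }
  }
  return markdown
}

const visitedNodes: VisitedNode[] = [
  {href: 'http://example.com/', text: 'foo', insideLink: false}, // first <a>
  {text: 'foo', insideLink: true}, // <span> inside the first <a>
  {href: 'http://example.com/', text: 'bar', insideLink: false}, // second <a>
]

// Without the fix, the <span> consumes the second "foo" and the wrong "bar" gets linked.
const nestedBuggy = linkifyByCursor('foobar foo bar', visitedNodes, false)
// With the fix, both links land on the first occurrences.
const nestedFixed = linkifyByCursor('foobar foo bar', visitedNodes, true)
console.log(nestedBuggy)
console.log(nestedFixed)
```

In a real DOM walker, the same idea could probably be expressed as a node filter that rejects children of link elements, but I have not checked how that interacts with the rest of the code.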
2. line breaks in link
<a href="https://example.com/">foo
bar</a>
(notice the line break) gets converted to
foo bar
when I would expect
[foo bar](https://example.com/)
What happens: the node's `textContent` is `'foo\nbar'`, which doesn't match any text in the plaintext, where the line break shows up as a single space.
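A small sketch of the mismatch and one possible way around it: collapse whitespace runs in the label before searching. `normalizeLabel` is a hypothetical helper, and the plaintext `foo bar` is assumed from the output shown above.

```typescript
// Collapse runs of spaces, tabs, and line breaks into a single space.
function normalizeLabel(textContent: string): string {
  return textContent.replace(/[\t\n\r ]+/g, ' ')
}

const pastedPlaintext = 'foo bar' // assumed plaintext for the pasted link
const rawLabel = 'foo\nbar' // textContent of the <a> element

// The raw label is not found, so no link is created.
const rawLabelIndex = pastedPlaintext.indexOf(rawLabel)

// The normalized label is found and can be replaced with a markdown link.
const cleanLabel = normalizeLabel(rawLabel)
const cleanLabelIndex = pastedPlaintext.indexOf(cleanLabel)
const lineBreakMarkdown =
  cleanLabelIndex < 0
    ? pastedPlaintext
    : pastedPlaintext.slice(0, cleanLabelIndex) +
      `[${cleanLabel}](https://example.com/)` +
      pastedPlaintext.slice(cleanLabelIndex + cleanLabel.length)

console.log(rawLabelIndex)
console.log(lineBreakMarkdown)
```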
2b. `<br>` in link
Another similar (but possibly harder to fix) case is
<a href="https://example.com/">foo<br>bar</a>
which, like case 2, doesn't create a link:
foo
bar
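For what it's worth, here is a rough sketch of why whitespace normalization alone would not help here, plus one untested idea. The `textContent` of the link is `'foobar'` (the `<br>` contributes no text) while the plaintext is `'foo\nbar'`. The idea is to match the link's text segments with optional whitespace between them. `findSegmentedLabel` and `escapeRegExp` are hypothetical helpers, and joining the label with a space is an arbitrary choice on my part.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Find the text segments of a link in the plaintext, allowing any whitespace between them.
function findSegmentedLabel(
  plaintext: string,
  segments: string[],
  from: number
): {start: number; end: number} | null {
  const pattern = new RegExp(segments.map(escapeRegExp).join('\\s*'), 'g')
  pattern.lastIndex = from // start searching at the cursor position
  const match = pattern.exec(plaintext)
  return match ? {start: match.index, end: match.index + match[0].length} : null
}

const brPlaintext = 'foo\nbar' // plaintext as shown above
const brSegments = ['foo', 'bar'] // text nodes on either side of the <br>

// The concatenated textContent 'foobar' never appears in the plaintext.
const brNaiveIndex = brPlaintext.indexOf(brSegments.join(''))

const brSpan = findSegmentedLabel(brPlaintext, brSegments, 0)
const brMarkdown = brSpan
  ? brPlaintext.slice(0, brSpan.start) +
    `[${brSegments.join(' ')}](https://example.com/)` +
    brPlaintext.slice(brSpan.end)
  : brPlaintext

console.log(brNaiveIndex)
console.log(brMarkdown)
```

This turns the line break inside the label into a space, which may or may not be what people want.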
In general, since the code can't rely on the browser to properly deal with HTML content, some of these corner cases will probably keep popping up. But this happened in the real world (try copying and pasting the first news entry in the deprecated section here) and it seemed to be significant enough to report.
I will follow up with a PR that suggests potential fixes for 1 and 2 (but not 2b), but this is not my field so it might be far from good.
Thanks for fixing these edge cases, @sorcio ❇️! Note that plaintext paste (Ctrl/Cmd + Shift + V, improvements coming) can always be used if future edge cases are encountered.