This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Only minimally escape text when rendering to markdown.
ClosedPublic

Authored by sammccall on Mar 5 2020, 8:44 AM.

Details

Summary

Conservatively escaping everything is bad in coc.nvim which shows the markdown
to the user, and we have reports of it causing problems for other parsers.

Fixes https://github.com/clangd/clangd/issues/301

Diff Detail

Event Timeline

sammccall created this revision.Mar 5 2020, 8:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 5 2020, 8:44 AM
sammccall updated this revision to Diff 248493.Mar 5 2020, 8:47 AM

couple more tests

Just writing down my investigation results for context:
Looks like we'll never escape $%'(,/:;?@[^{|} anymore.

Markdown already doesn't provide backslash escaping for $%',/:;?@^| which leaves us with parentheses ([{}:

  • [ looks fine as it is only used for hyperlinks and we escape the ] if need be
  • '(' again seems to be only special for providing a text for hyperlinks, so that should also be OK.
  • {} apparently doesn't do anything in normal markdown but some extensions make use of it for adding ids to headers, i.e # header1 {#mycrazyheader}, so again should be OK.
clang-tools-extra/clangd/FormattedString.cpp
69

what is the ? for ?

73

we early exited already in case of empty contents

130

nit: if(!StartsLine || !Before.empty()))

132

nit: After.ltrim('#')

133

markdown is weird :(

139

s/the a/the/

150

_ seems to behave different than * :(

it seems to rather depend on the spaces around the text being emphasized, i.e

foo _ bar _ foo -> no emphasis
foo _ bar_ foo -> no emphasis
foo _bar_ foo -> emphasis on bar
foo_bar_ foo -> no emphasis

so this should rather be Before.endswith(" ") && isAlpha(After) for the beginning of emphasis and the opposite for the ending.
Not sure if there's an easy way to decide on it in isolation.

162

nit: return IsBullet() || RulerLength() || !SpaceSurrounds();

164

is this really worth all the trouble ?

166

I believe it is also allowed to have whitespaces(less than 4?) before the > to be interpreted as a quote marker.

sammccall updated this revision to Diff 250624.Mar 16 2020, 1:38 PM
sammccall marked 10 inline comments as done.

Address review comments

Sorry for the slow turnaround, bit off more than I could chew on a few fronts :-(

clang-tools-extra/clangd/FormattedString.cpp
69

<?foo... is a processing instruction, which is valid inline HTML (and therefore needs escaping)

132

I'm trying to avoid repeating the known values of C as there's various fallthrough/copied cases and it seems more fragile.

164

Simplified it a bit. I think it's worth not escaping < when not matched with >. Sadly the rule can't be quite that simple because multiple chunks can form a tag.

166

Yes, but a chunk never begins with whitespace (see assert at top of function)

kadircet accepted this revision.Mar 16 2020, 11:36 PM

thanks, lgtm!

clang-tools-extra/clangd/FormattedString.cpp
150

regarding this one, i suppose we'll just be escaping in some unnecessary cases(like the 2nd and the 4th), but still better than the current state so nvm.

This revision is now accepted and ready to land.Mar 16 2020, 11:36 PM
sammccall marked an inline comment as done.Mar 17 2020, 5:10 AM

Filed https://github.com/google/llvm-premerge-checks/issues/147 for the spurious unit test failure.

clang-tools-extra/clangd/FormattedString.cpp
150

Oops, I forgot to reply to this one. Good catch that * and _ are different.
The rules are indeed really complicated (particularly for _ next to punctuation) and we don't know whether we are at the start or end.

However my reading of the spec says alnum_alnum never needs to be escaped, and that's incredibly common, so I've added that special case. (Same is true for alnum___alnum, but I don't think that's common enough to bother with).

sammccall updated this revision to Diff 250730.Mar 17 2020, 5:10 AM

Fix '_' rules.

thanks, still LG

This revision was automatically updated to reflect the committed changes.