Conservatively escaping everything is bad in coc.nvim which shows the markdown
to the user, and we have reports of it causing problems for other parsers.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Just writing down my investigation results for context:
Looks like we'll never escape $%'(,/:;?@[^{|} anymore.
Markdown already doesn't provide backslash escaping for $%',/:;?@^| which leaves us with parentheses ([{}:
- [ looks fine as it is only used for hyperlinks and we escape the ] if need be
- '(' again seems to be only special for providing a text for hyperlinks, so that should also be OK.
- {} apparently doesn't do anything in normal markdown but some extensions make use of it for adding ids to headers, i.e # header1 {#mycrazyheader}, so again should be OK.
clang-tools-extra/clangd/FormattedString.cpp | ||
---|---|---|
69 | what is the ? for ? | |
73 | we early exited already in case of empty contents | |
130 | nit: if(!StartsLine || !Before.empty())) | |
132 | nit: After.ltrim('#') | |
133 | markdown is weird :( | |
139 | s/the a/the/ | |
150 | _ seems to behave different than * :( it seems to rather depend on the spaces around the text being emphasized, i.e foo _ bar _ foo -> no emphasis foo _ bar_ foo -> no emphasis foo _bar_ foo -> emphasis on bar foo_bar_ foo -> no emphasis so this should rather be Before.endswith(" ") && isAlpha(After) for the beginning of emphasis and the opposite for the ending. | |
162 | nit: return IsBullet() || RulerLength() || !SpaceSurrounds(); | |
164 | is this really worth all the trouble ? | |
166 | I believe it is also allowed to have whitespaces(less than 4?) before the > to be interpreted as a quote marker. |
Sorry for the slow turnaround, bit off more than I could chew on a few fronts :-(
clang-tools-extra/clangd/FormattedString.cpp | ||
---|---|---|
69 | <?foo... is a processing instruction, which is valid inline HTML (and therefore needs escaping) | |
132 | I'm trying to avoid repeating the known values of C as there's various fallthrough/copied cases and it seems more fragile. | |
164 | Simplified it a bit. I think it's worth not escaping < when not matched with >. Sadly the rule can't be quite that simple because multiple chunks can form a tag. | |
166 | Yes, but a chunk never begins with whitespace (see assert at top of function) |
thanks, lgtm!
clang-tools-extra/clangd/FormattedString.cpp | ||
---|---|---|
150 | regarding this one, i suppose we'll just be escaping in some unnecessary cases(like the 2nd and the 4th), but still better than the current state so nvm. |
Filed https://github.com/google/llvm-premerge-checks/issues/147 for the spurious unit test failure.
clang-tools-extra/clangd/FormattedString.cpp | ||
---|---|---|
150 | Oops, I forgot to reply to this one. Good catch that * and _ are different. However my reading of the spec says alnum_alnum never needs to be escaped, and that's incredibly common, so I've added that special case. (Same is true for alnum___alnum, but I don't think that's common enough to bother with). |
what is the ? for ?