This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Do not trigger go-to-def textual fallback inside string literals
ClosedPublic

Authored by nridge on Mar 12 2020, 2:09 PM.

Diff Detail

Event Timeline

nridge created this revision.Mar 12 2020, 2:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2020, 2:09 PM

This is my attempt to avoid triggering the textual fallback for string literals, as per the discussion in the issue.

To classify tokens into the categories discussed in the issue, I resurrected and modified the getTokenFlavor() function that was removed in D75176.

However, it does not seem to be working as I expect -- Lexer::getRawToken() is returning TokenKind::raw_identifier inside the string literal rather than TokenKind::string_literal.

Am I misunderstanding how this API is supposed to work? I thought "raw" meant "no preprocessor", and things like string literals would still be recognized in raw mode.

kadircet added inline comments.Mar 16 2020, 12:51 AM
clang-tools-extra/clangd/XRefs.cpp
366

you can rather use AST.getTokens().spelledTokenAt(Loc) to get preprocessed token, and pass that into getTokenFlavor rather than a SourceLocation.

sammccall added inline comments.Mar 16 2020, 3:58 AM
clang-tools-extra/clangd/XRefs.cpp
360

I don't think we need this function, which just maps one enum onto another - just use the token kind directly?

366

for a bit more context: running Lexer::getRawToken runs a raw lex that only sees the piece of text you point at. If you're inside a huge comment, it won't know that.

Using AST.getTokens() uses the results of an earlier raw lex of the whole file.

nridge marked an inline comment as done.Mar 17 2020, 1:06 PM
nridge added inline comments.
clang-tools-extra/clangd/XRefs.cpp
366

Hmm, I've tried this and spelledTokenAt() seems to return null for comment tokens.

nridge marked an inline comment as done.Mar 17 2020, 1:17 PM
nridge added inline comments.
clang-tools-extra/clangd/XRefs.cpp
366

It looks like there are two reasons for this:

  • The lexer that is producing the token buffer is using inKeepCommentMode() == false
  • spelledTokenAt() only returns a result if you give it the location at the beginning of the token, not something in the middle
nridge updated this revision to Diff 250885.Mar 17 2020, 1:23 PM

Use TokenBuffer instead of a raw lexer.

Note that getting this to word required enabling comment-retention mode for
the lexer which produces the TokenBuffer. I'm not sure if this is desirable
in general.

nridge retitled this revision from [WIP] [clangd] Do not trigger go-to-def textual fallback inside string literals to [clangd] Do not trigger go-to-def textual fallback inside string literals.Mar 17 2020, 2:11 PM
sammccall added inline comments.Mar 17 2020, 2:18 PM
clang-tools-extra/clangd/XRefs.cpp
367

this means you're not going to resolve foo in a.^foo (you're touching two tokens).

The cleanest thing seems to be to use the word you've identified: iterate over the spelledTokensTouching(WordStart) and accept the one where tok.range(SM).touches(WordOffset + Word.size())

clang/lib/Tooling/Syntax/Tokens.cpp
343 ↗(On Diff #250885)

Yikes, I didn't remember TokenBuffer doesn't currently record comment tokens.
I'm afraid this isn't a trivial change and would definitely need tests to verify it doesn't interfere with translating between spelled/expanded tokens (I'm pretty sure there are tests that comments *aren't* retained now in TokensTest.cpp, part of SyntaxTests)

Given that, the shorter route for this patch would be to blacklist string literals rather than whitelisting comments + identifiers.

nridge updated this revision to Diff 251418.Mar 19 2020, 10:46 AM

Take a blacklisting approach

nridge marked 5 inline comments as done.Mar 19 2020, 10:47 AM
sammccall accepted this revision.Mar 19 2020, 1:53 PM
This revision is now accepted and ready to land.Mar 19 2020, 1:53 PM
nridge marked an inline comment as done.Mar 19 2020, 2:19 PM
nridge added inline comments.
clang-tools-extra/clangd/XRefs.cpp
381

Whoops, meant to use WordStart rather than Loc here.

nridge updated this revision to Diff 251476.Mar 19 2020, 2:22 PM

Use WordStart

This revision was automatically updated to reflect the committed changes.