Replace usage of RawLexer with syntax tokens inside ReplayPreamble.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang-tools-extra/clangd/ParsedAST.cpp | ||
---|---|---|
137 | tokenizing the whole file an extra time on every AST build seems a bit sad - this is considerably more lexing than we were doing before. Probably doesn't matter? We could trim this to the preamble bounds I guess. Or even compute it once when the preamble is built, since we assume all the bytes are the same? I guess SourceLocations are the problem... we could just translate offsets into the new SM, but that gets messy. | |
176 | why raw encoding? | |
178 | this looks like a linear search for each #include | |
189 | Not clear what "imitate the PP logic" means. |
clang-tools-extra/clangd/ParsedAST.cpp | ||
---|---|---|
137 | Implemented a way to partially tokenize a file in D74962. It should be okay for ReplayPreamble, as only clang-tidy checks depend on this logic and we are not planning to emit diagnostics with stale preambles. | |
178 | Made it logarithmic instead. We could also make it linear in total if we decide to rely on the fact that MainFileIncludes are sorted. I believe that is currently true, but it is never promised by the include collector. | |
189 | It was referring to the fact that we were performing the PP.LookupIdentifierInfo call to set the kind etc. | |
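
For illustration, the logarithmic lookup discussed for line 178 could look roughly like the sketch below. This is not the code from the patch: the `Inclusion` struct is a simplified, hypothetical stand-in for clangd's include record, `findIncludeAt` is a made-up helper name, and the sortedness of the include list is an assumption (asserted here, since the include collector does not promise it).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for clangd's include record.
struct Inclusion {
  unsigned HashOffset;  // Byte offset of the '#' in the main file.
  std::string Written;  // Spelling of the included file, e.g. "\"bar.h\"".
};

// Returns the include whose '#' is at exactly Offset, or nullptr if none.
// Assumes Includes is sorted by HashOffset; the assert documents that
// assumption rather than silently relying on it.
const Inclusion *findIncludeAt(const std::vector<Inclusion> &Includes,
                               unsigned Offset) {
  assert(std::is_sorted(Includes.begin(), Includes.end(),
                        [](const Inclusion &L, const Inclusion &R) {
                          return L.HashOffset < R.HashOffset;
                        }));
  // Binary search: first include whose offset is not less than Offset.
  auto It = std::lower_bound(
      Includes.begin(), Includes.end(), Offset,
      [](const Inclusion &Inc, unsigned Off) { return Inc.HashOffset < Off; });
  if (It == Includes.end() || It->HashOffset != Offset)
    return nullptr;
  return &*It;
}
```

Each lookup costs O(log n) comparisons. If the include directives are also visited in source order, a single forward-moving cursor over the same sorted list would make the whole pass linear in total, at the cost of depending even more heavily on the ordering.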
- Address comments.
- Add tests by mimicking a clang-tidy check.
- only tokenize the preamble section, not the whole file.
clang-tools-extra/clangd/ParsedAST.cpp | ||
---|---|---|
182 | nit: IncludeTok | |
clang-tools-extra/clangd/unittests/ParsedASTTests.cpp | ||
384 | I think it would be clearer to have parallel named point/range lists rather than doing index math. So the annotated code would be pretty verbose like: $hash^#$include[[import]] $filerange^"$file[[bar.h]]"... |
Breaks win: http://45.33.8.238/win/9705/step_9.txt
(Or your other change that landed at the same time)
tokenizing the whole file an extra time on every AST build seems a bit sad - this is considerably more lexing than we were doing before. Probably doesn't matter?
We could trim this to the preamble bounds I guess. Or even compute it once when the preamble is built, since we assume all the bytes are the same? I guess SourceLocations are the problem... we could just translate offsets into the new SM, but that gets messy.
On the other hand, assuming the preamble isn't going to change at all seems like an assumption not long for this world.
On the first hand again, maybe we'll have to revisit looots of stuff (go to definition and everything) once that assumption breaks anyway.