This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add a flag to allow indexing of reserved identifiers
AbandonedPublic

Authored by nridge on Jun 28 2023, 12:53 AM.

Details

Reviewers
hokein
kadircet
Summary

Some projects, like the Linux kernel, make extensive use of such
identifiers, making it a usability issue if they can't be indexed.

Fixes https://github.com/clangd/clangd/issues/1680

Diff Detail

Event Timeline

nridge created this revision.Jun 28 2023, 12:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2023, 12:53 AM
Herald added a subscriber: arphaman. · View Herald Transcript
nridge requested review of this revision.Jun 28 2023, 12:53 AM

No tests yet, posting for early feedback.

clang-tools-extra/clangd/index/StdLib.cpp
233

I'm thinking no: the only time you'd want reserved identifiers from the standard library is if you're working on the standard library itself, but in that case the standard library source files would be part of the project and you'd be indexing them "normally", not via indexStandardLibrary() (so your configuration would be:

Index:
  ReservedIdentifiers: true
  StandardLibrary: false

and the StandardLibrary: false means this codepath is not taken).

clang-tools-extra/clangd/indexer/IndexerMain.cpp
48

I'm thinking we should, so that projects that want to enable this (like e.g. the kernel) can use a remote index if desired.

I'm just not sure if we have the machinery in place to use Config from clangd-indexer.

clang-tools-extra/clangd/refactor/Rename.cpp
239

Would it be inappropriate to set this in the SymbolCollector::Options constructor, instead of every place where we call it?

nridge edited the summary of this revision. (Show Details)Jun 28 2023, 1:01 AM
nridge edited the summary of this revision. (Show Details)Jul 8 2023, 2:53 PM

Thanks for putting this together, it definitely seems like something people want.

My main concern is that the configuration has the wrong scope.
We're checking whether reserved-ident indexing is on for the TU we're indexing, but really we want to specify which headers we should index reserved-idents from.
If hell freezes over and the linux kernel starts using the C++ standard library one day, we want the preamble index to contain __free_pages() but not std::__variant_impl_helper. But the preamble indexing all runs under the same config.
(Also we more or less assume that all TUs will index a header the same way: if we've indexed foo.h under A.cpp, we won't reindex it under B.cpp even if the config is different. So we get "first-one-wins" behavior...)

I don't think we can afford to re-evaluate the config each time we switch between headers while indexing. (Stupid over-flexible config system is too slow... I've learned my lesson from that one!)
So I'm not really sure what to suggest...

  • we could do it this way anyway and accept that it's basically only usable as a global flag: this is probably fine for the linux kernel as it's mostly isolated anyway (no standard library).
  • we could add some rules (regexes?) to the config that describe which headers can have interesting symbols. This could be slow too, but at least cacheable, like the self-contained check
  • maybe we can come up with some heuristics to improve the "no reserved names" rule, with or without configuration. (Maybe apply it to only -isystem headers? Linux uses angle-quoted includes, but not actually -isystem AFAICS)

What do you think? Is this a real concern or am I missing something?

clang-tools-extra/clangd/ConfigFragment.h
205

nit: it's the *symbols* we're indexing, based on whether their names are reserved identifiers.

207

For some reason I find the name "ReservedNames" a little clearer.

I tend to think of an "identifier" as a type of token at the lexer level, and a "name" as its role at the language level. But I don't think this actually matches any standard terminology, so no need to change it unless it also reads better to you.

  • maybe we can come up with some heuristics to improve the "no reserved names" rule, with or without configuration. (Maybe apply it to only -isystem headers? Linux uses angle-quoted includes, but not actually -isystem AFAICS)

I tried this out, and it behaves well at least for linux & the C++ stdlib on my system. D155381 if you like the approach.

I have the feeling there may still be some exceptions, but we don't know about them yet and it sidesteps the config questions...

Yeah, I'm happy to go with D155381 instead.

(Stupid over-flexible config system is too slow... I've learned my lesson from that one!)

The issues with the config system are not entirely obvious to me. If, by chance, you've written some sort of post-mortem / discussion of the shortcomings and how you might design a config system in the future, I'd be interested to read it.

Yeah, I'm happy to go with D155381 instead.

(Stupid over-flexible config system is too slow... I've learned my lesson from that one!)

The issues with the config system are not entirely obvious to me. If, by chance, you've written some sort of post-mortem / discussion of the shortcomings and how you might design a config system in the future, I'd be interested to read it.

I haven't, but this is a useful idea and I'd like to.

The main regret is making it too powerful/complicated without properly considering the consequences:

  • the if conditions and the way we plug in fragments mean we really need to compute the config for each file we want to consider. Really, probably few conditions are used and something with simpler fs-hierarchical scope would be much easier to cache.
  • to avoid having to do too much interpretation of the config as we repeatedly process the same file, we have the compile step - but this makes processing many *different* files in quick succession more expensive

As a result, we can *barely* use it to control background indexing (evaluating the config to know whether each file in the project should be indexed takes nontrivial time) and we can't use it for things like this where we want to configure per-header but this would end up significantly increasing indexing cost.

There are some other things, like the amount of plumbing that's needed for new settings. It would be good to reflect on the alternatives a bit again, though.

nridge abandoned this revision.Jul 21 2023, 2:40 PM

Abandoning in favour of D155381.