This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Factor out the heuristic resolver code into its own class
ClosedPublic

Authored by nridge on Nov 29 2020, 3:06 PM.

Details

Summary

Since the heuristic resolver is created in a place where the
ASTContext is available, it can store the ASTContext and the
NameFactory hack can be removed.

The patch also renames getMembersReferencedViaDependentName()
to resolveDependentMember().

Diff Detail

Event Timeline

nridge created this revision.Nov 29 2020, 3:06 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2020, 3:06 PM
nridge requested review of this revision.Nov 29 2020, 3:06 PM

Shopping the patch around to some possible reviewers.

nridge updated this revision to Diff 311476.Dec 13 2020, 4:28 PM

Add a missed call site

Sorry for taking a while to get to this...

I'm not sure this is really a helpful use of Context. (Though some would argue that there are no good uses, my manager was appalled when we added it...)
Does this enable some future work, or is it reasonable to evaluate this patch in isolation?

There's of course no hard-and-fast rules but my checklist is:

  • is this some kind of optional input, like overriding some behavior? (i.e. forgetting it isn't terrible, and we needn't burden all callers)
  • does the parameter otherwise need to be plumbed through many layers, most of which don't use it?
  • will parameter otherwise need to be added to public interfaces where they don't make sense?

Here it's a mandatory input, and the alternative is cluttering a couple of layers of purely-private interfaces, so this seems like a lot of magic to avoid a fairly small amount of awkwardness.

(The original use cases for context were around VFS state for embedders, where *all* these things were true - the lspEncoding() is another place where this is IMO a pretty good tradeoff)

my manager was appalled when we added it...

:-)

Does this enable some future work, or is it reasonable to evaluate this patch in isolation?

I was inspired to do this by a work-in-progress patch I have to fix https://github.com/clangd/clangd/issues/451, for which I need to make Sema available in a similar way. That patch isn't quite ready for review (there are some crashes / assertion failures that I need to iron out), but I posted what I have so far at D93522 if you're curious.

Here it's a mandatory input, and the alternative is cluttering a couple of layers of purely-private interfaces, so this seems like a lot of magic to avoid a fairly small amount of awkwardness.

I agree that there is no fundamental need to use Context, it just seemed convenient (and hey, ASTContext even has Context in its name :-)). I could just pass the ASTContext as a parameter to targetDecl() and other relevant functions.

Perhaps, with a view to eventually needing Sema as well, I should just pass ParsedAST instead?

Ah, I suspected something like this was afoot! The patch looks... exciting :-) And I definitely agree we should remove the hacks if we're going to have access to ASTContext anyway.

I think an explicit parameter here is probably a good idea, it's a bit sad to lose targetDecl's simple signature but seems justified at this point.
But I also think we should consider splitting out the heuristic resolution parts into a separate module, maybe leading to a signature like:

targetDecl(DynTypedNode N, HeuristicResolver * = nullptr)

The HeuristicResolver object could provide primitives like finding the pointee type, template patterns etc. It would hold a reference to Sema/ASTContext.

There are a few reasons for this:

  • isolating complexity: FindTargets is mostly fairly broad-but-simple (many cases, but mostly trivial AST traversal) while heuristic resolution is fairly narrow-but-complex. Having the complexity in the way hinders understanding for people who, say, want to improve basic handling of some ObjC nodes that we forgot.
  • testing: Being able to test heuristic resolution primitives directly rather than through targetDecl seems valuable. Particularly for operations that involve sema and template instantiation, I expect we'll find crashes/problems and direct tests will expose them more clearly. Conversely the fallback heuristics can get in the way of testing the basic functionality (we've dealt with this a bit in xrefs).
  • encapsulation: I don't love the idea of exposing the Sema god-object everywhere via ParsedAST. We've gotten a long way without this, it's a really big and complicated hammer, and my suspicion is most places we'd be tempted to use it are bad or at least risky. If we expose HeuristicResolver from ParsedAST instead of Sema directly, we keep the barrier high.
  • reuse: might we also want to use the heuristic resolution for code completion at some point, e.g. LHS of MemberExpr? (We could abstract it via some callbacks, but I also suspect it's not coupled to clangd and could just be lifted up to clang at some point)

What do you think about this?

nridge planned changes to this revision.Jan 11 2021, 12:00 AM

That all sounds reasonable -- thanks for the suggestions!

I've been meaning to factor out the heuristic resolution stuff into its own file anyways, this patch is as good a time as any to do it.

nridge updated this revision to Diff 317272.Jan 18 2021, 12:11 AM

Change approach to factor out a HeuristicResolver class

I didn't make it an optional parameter to allTargetDecls() etc. If you feel there are call sites which would benefit from not passing one in, we can change this.

nridge retitled this revision from [clangd] Use clangd's Context mechanism to make the ASTContext of the AST being operated on available everywhere to [clangd] Factor out the heuristic resolver code into its own class.Jan 18 2021, 12:11 AM
nridge edited the summary of this revision. (Show Details)
nridge edited the summary of this revision. (Show Details)

Thanks a lot for doing this!
I have some thoughts on how we can give it a nice API/more clearly separate it from FindTarget/improve internal structure.

I'm not sure how we should sequence that/how much to do.
One the one hand, simply moving this code is progress! On the other hand, after splitting it the API is more relevant.
We can do certainly land this as mostly moving the code with its existing structure if you like, and I or you could try to clean up the interface later.

I would like to get your feedback on the ideas though as I spent a *long* time staring at this code today and want to make sure I'm not crazy :-)

Change approach to factor out a HeuristicResolver class

I didn't make it an optional parameter to allTargetDecls() etc. If you feel there are call sites which would benefit from not passing one in, we can change this.

As far as I know we should always have this available in production, so I don't think it needs to be optional in the signature.
However I think it should be *nullable* so we can test this separately, if that makes sense?

clang-tools-extra/clangd/FindTarget.cpp
641

only Identifier needs the heuristic resolution, TypeSpec[WithTemplate] can just pull out the stored type

clang-tools-extra/clangd/HeuristicResolver.cpp
114 ↗(On Diff #317272)

We shouldn't need these cases, if we're only trying to support dependent E.
(And otherwise, there are lots of other cases we could potentially need to support here eventually)

clang-tools-extra/clangd/HeuristicResolver.h
37 ↗(On Diff #317272)

This could use some class documentation explaining the idea - basically that clang's AST generally stores only fully-resolved edges, and therefore template ASTs are missing lots of information that we can guess in common cases.

37 ↗(On Diff #317272)

We should have some shared contract around how cases where heuristics are not needed (i.e. non-dependent code). And given the current functions, it's not obvious what this contract should be.

On the one hand, if we attempt to handle such cases here, I'm not sure we've really separated this from FindTarget - e.g. it's hard to find a crisp definition of resolveExprToDecls that's distinct from targetDecls(). And we end up duplicating code between the two.
On the other hand, if we don't, then internal calls (like the implementation of resolveNestedNameSpecifierToType) are really awkward.

In either case, I think the resolution is to make sure the scope of the heuristic bits are as narrow/isolated as possible. Then if we want to handle all cases, it's manageable to do so. And if we don't, we can easily define a peer e.g. resolveNNSToType(const NNS&, const HeuristicResolver *) either here or in AST.h.

(Currently it appears the actual code currently handles some but not all non-dependent cases)

50 ↗(On Diff #317272)

Doc seems stale

55 ↗(On Diff #317272)

This seems like a really nice example of a heuristic lookup with a clear contract :-)

56 ↗(On Diff #317272)

nit: const throughout?

60 ↗(On Diff #317272)

This is pretty vaguely defined. From reading the implementation, this can choose to return two things:

  1. a ValueDecl representing the value of E
  2. a TypeDecl representing the type of E

Case 1 is used by heuristic go-to-def, which never hits case 2 because it only passes certain kinds of E.
Case 1+2 are used internally by the heuristic resolver to resolve base types, and are unified by resolveDeclsToType which handles them as two cases.

This doesn't seem like a clean public API :-) It also seems to lead to some convolution internally.


I think we should distribute the work of this function into a few smaller functions with couple of flavors.

Some that deal with specific node types, extracted from resolveExprToDecls:

  • Decl* memberExprTarget(CXXDependentMemberExpr*). Uses exprToType (for the base type), pointee (for arrow exprs), resolveDependentMember (the final lookup).
  • Type* callExprType(CallExpr*). Uses exprToType (for the callee type).
  • Decl* declRefExprTarget(DependentScopeDeclRefExpr*) - this is currently a trivial wrapper for resolveDependentMember.

And some that are more generic building blocks:

  • Type* exprType(Expr*), which tries to handle any expr, dispatching to the node-specific function and falling back to Expr::getType().
  • getPointeeType(Type*) and resolveDependentMember(...) as now

It's not entirely clear what should be public vs private, but one idea I quite like is: all the node-specific methods are public, and all the building blocks are private.

  • This would mean adding some more node-specific methods to cover the external callers of resolveDependentMember, but in each case I looked at this seems to make sense. (This neatly deals with the slightly grungy signature of resolveDependentMember by hiding it)
  • it also mostly answers the question about contracts on non-dependent code: most non-dependent node types simply can't be passed in
64 ↗(On Diff #317272)

The contract is a bit fuzzy here, and both the caller and callee have to deal with NNS recursion. It could be narrowed by requiring the resolved parent type to be passed in...

clang-tools-extra/clangd/ParsedAST.h
113

is this intended to be nullable? (If not, why the pointer type?)

152

Optional<HeuristicResolver> or simply HeuristicResolver?

clang-tools-extra/clangd/unittests/FindTargetTests.cpp
90 ↗(On Diff #317272)

If we really want these to be separate, I think we should be using a null resolver except for the tests that are explicitly testing heuristics resolution. (And those tests could check both with/without resolver and verify it's required...)

Thanks for the comments! I'll have a more detailed look later, but just wanted to ask one clarifying question for now:

clang-tools-extra/clangd/unittests/FindTargetTests.cpp
90 ↗(On Diff #317272)

I'm not sure what you mean by "if we really want these to be separate". What are "these"?

sammccall added inline comments.Jan 18 2021, 12:17 PM
clang-tools-extra/clangd/unittests/FindTargetTests.cpp
90 ↗(On Diff #317272)

I mean the separation between the "hard" AST logic in FindTarget and the heuristic resolution. And by "we" I really mean "I", I guess :-)

Ideally we'd test these both separately and smoke-test the interaction, but I think the low-hanging fruit is to use all the existing tests, with most of them testing FindTarget in isolation and the few exceptions testing FindTarget+HeuristicResolver together.

nridge added inline comments.Jan 20 2021, 11:52 AM
clang-tools-extra/clangd/unittests/FindTargetTests.cpp
90 ↗(On Diff #317272)

My instict is to keep the codepath exercised by the tests as close as possible to the codepath exercised in production, because I've been bitten many times by "1. Looks like we've regressed scenario X. 2. But we have a test for scenario X. 3. Oh, but the test does something subtly different from what happens in production." So, if in production we always pass a non-null HeuristicResolver, I would prefer that we do so in the test as well.

Perhaps, as a compromise, we could annotate heuristic results as such (perhaps using a new DeclRelation::Heuristic flag?), and check for the expected presence / absence of that flag in the tests?

nridge updated this revision to Diff 318910.Jan 24 2021, 11:46 PM
nridge marked 4 inline comments as done.

Address (many of the) review comments

nridge added inline comments.Jan 24 2021, 11:59 PM
clang-tools-extra/clangd/HeuristicResolver.h
60 ↗(On Diff #317272)

Thanks, you make some good points about this method being messy (and particularly that we're duplicating some "resolve the type of a non-dependent expression" logic, which has bothered me as well).

I like your proposed breakdown into smaller methods, and I partially implemented it.

A couple of notes:

  • I had to keep the MemberExpr case around. It could not be subsumed by Expr::getType(), because for a MemberExpr that returns this thing, while getMemberDecl()->getType() returns the function type of the member function, and we want the latter (for this call site).
  • We don't quite achieve "non-dependent node types simply can't be passed in", because resolveCallExpr() can be called with a non-dependent call expr. However, it currently has no external callers -- perhaps we should make it private?

I left resolveDependentMember() public (and left its external callers alone) for now. I can make it private and add additional public wrappers if you'd like, but I'm wondering what the motivation is (especially as you've described it as "a really nice example of a heuristic lookup with a clear contract" :-)).

64 ↗(On Diff #317272)

I'm a bit unclear on what you mean by "both the caller and the callee have to deal with NNS recursion".

Looking through the callees, other than the recursive call in the implementation they all just make a single call to resolveNestedNameSpecifiedToType() and don't deal with NNS recursion?

clang-tools-extra/clangd/ParsedAST.h
152

I wanted to make it just HeuristicResolver but it's accessed through a const ParsedAST which would only expose a const HeuristicResolver *, and I had already sprinkled HeuristicResolver * (without the const) everywhere :)

I'll clean this up after we settle the nullability question.

Thanks for the thoughtful comments about the API surface and such! I think I've addressed or responded to most of them.

There are a few outstanding comments about how the HeuristicResolver object is stored and passed around (const, pointer, etc.), which I'll come back to once we come to a resolution on the nullability question.

(This patch should be ready for another round of review.)

Sorry for the looong delay getting back to this :-\

I do think this should be nullable (and const-friendly) but otherwise this is good to land.

clang-tools-extra/clangd/HeuristicResolver.h
73 ↗(On Diff #318910)

nit: NameFactory is a weird name now

60 ↗(On Diff #317272)

Those notes/caveats seem totally fine to me.

resolveCallExpr ... perhaps we should make it private?
left resolveDependentMember() public ... I can make it private... I'm wondering what the motivation is

I just think it's a little easier to understand how to use this class, and how to maintain/extend it, if the public methods form a consistent set with the same level of abstraction. Having all the public methods handle node types is tempting from this point of view.

I won't insist on this. We can inline the handling of UnusedUsingValueDecl into the caller as it happens to be simple, but I don't see a strong reason to do so.

64 ↗(On Diff #317272)

I'm a bit unclear on what you mean by "both the caller and the callee have to deal with NNS recursion".

You're right, I'm not sure what I was thinking here.

Maybe change the doc on this function to a "a dependent nested name specifier (i.e. Kind == Identifier)" and then adding an implementation comment "also handle TypeSpec case needed for recursion" or something...
I think that would make the contract clearer and more consistent with other functions.

clang-tools-extra/clangd/unittests/FindTargetTests.cpp
90 ↗(On Diff #317272)

Adding a DeclRelation flag means these results will be excluded unless we explicitly include Heuristic in the filter.
(Maybe that's a good thing? Seems easy to forget though.)

Another way to solve this in the tests is to run them *all* with/without the heuristic resolver and mark the testcases where heuristic is required for them to pass.

I really would like the resolver to be nullable to enforce the layering, and to ensure the core functionality doesn't grow complex dependencies that would preclude its use in new places (even if only for prototyping). e.g. if targetDecl requires heuristic resolver, and that grows a dep on sema, then it might be hard to use in places where we don't have ParsedAST (background indexing maybe?)

nridge updated this revision to Diff 323886.Feb 15 2021, 10:25 PM
nridge marked 10 inline comments as done.

Rebase and address remaining comments

Ok, I've made HeuristicResolver nullable, in the sense that I've added null checks arounds its use in TargetFinder.

I haven't refactored the tests to pass in a null resolver / annotate the ones that do or don't need one. Would you like that done in this patch, or a follow-up?

clang-tools-extra/clangd/HeuristicResolver.h
60 ↗(On Diff #317272)

Your argument for consistency, together with the fact that this allows moving the Filter lambdas out of the header file, has convinced me. Done!

clang-tools-extra/clangd/ParsedAST.h
113

I changed this to return const HeuristicResolver *, but kept it as a pointer for a couple of reasons:

  • I couldn't make ParsedAST store HeuristicResolver by value because it would make ParsedAST non-movable (HeuristicResolver is non-movable because it has a member of reference type (ASTContext&)).
  • The consumer functions (targetDecl() etc.) expect a pointer, so this avoids writing &AST->getHeuristicResolver() at every call site).
152

See previous comment for why I couldn't make it HeuristicResolver. (Optional doesn't work for the same reason.)

sammccall accepted this revision.Feb 16 2021, 12:47 AM

Thanks so much for pushing this through, it looks great.

I haven't refactored the tests to pass in a null resolver / annotate the ones that do or don't need one. Would you like that done in this patch, or a follow-up?

Let's go ahead and land this, I'm happy to make the test changes.
(Not worried about regressions, as clearly we're never passing null in production)

clang-tools-extra/clangd/HeuristicResolver.h
65 ↗(On Diff #323886)

nit: maybe just me, but "type of a NNS" is confusing because my brain thinks it's something like "type of an expr".

Maybe "type denoted by a NNS"?

Maybe also "Note that *dependent* name specifiers always name types, not e.g. namespaces".

This revision is now accepted and ready to land.Feb 16 2021, 12:47 AM
nridge updated this revision to Diff 323906.Feb 16 2021, 1:06 AM

address nit

This revision was landed with ongoing or failed builds.Feb 16 2021, 1:11 AM
This revision was automatically updated to reflect the committed changes.