This is an archive of the discontinued LLVM Phabricator instance.

Introduces DynTypedMatcher as a new concept that replaces the UntypedBaseMatcher and TypedMatcher. Due to DynTypedNode the basic dynamically typed matcher interface can now be simplified.
ClosedPublic

Authored by klimek on Sep 1 2012, 4:27 PM.

Details

Summary

Also switches the traversal interfaces to use DynTypedNode;
this is in preperation for the hasAncestor implementation, and
also allows us to need fewer changes when we want to add new
nodes to traverse, thus making the code a little more decoupled.

Main design concerns: I went back towards the original design
of getNodeAs to return a pointer, and switched DynTypedNode::get
to always return a pointer (in case of value types like QualType
the pointer points into the storage of DynTypedNode, thus allowing
us to treat all the nodes the same from the point of view of a
user of the DynTypedNodes.

Adding the QualType implementation for DynTypedNode was needed
for the recursive traversal interface changes.

Diff Detail

Event Timeline

silvas added a comment.Sep 1 2012, 9:28 PM

Looking pretty good!

include/clang/ASTMatchers/ASTMatchFinder.h
98–100

The name "Trigger" here isn't really speaking to me (and the comment isn't helping much either).

After looking a bit more into what this is used for, I honestly don't know why the comment didn't come along as clear, since it describe it pretty well.

I think it has to do with the 's in MatchCallback*'s, which masks the clear 1-1 correspondence going on here. I think it would be better to phrase this as "A DynTypedMatcher and the MatchCallback to be invoked when it matches".

Sorry, idk why this comment threw me off, but for whatever reason I truly had to dive all the way into the source to get a clue about what this is even for.

Also, putting this right above the Triggers vector would increase locality.

132–133

Since "trigger" is just a plain pair, it's not really clear at all what "executed against the AST means" for a trigger without some supporting documentation. Also as I mentioned above, putting the definition of Trigger nearby would help too.

(in general, a more self-documenting name than "trigger" would be good ("MatcherWithAction"?)).

include/clang/ASTMatchers/ASTTypeTraits.h
92–132

It seems like all of these traits classes are basically the same boilerplate with NT_* used for the tag and the actual stored type used as the type of the placement new and some reinterpret casting.

The only one that seems to differ between them is get which for Decl and Stmt has a dyn_cast, while QualType doesn't.

It would be nice if this duplication could be reduced, for example by having the traits class explicitly encode the constant (e.g. in an enum), and implementing as must as possible generically. I not, then at least a comment explaining the duplication situation would be good.

lib/ASTMatchers/ASTMatchFinder.cpp
81–85

A switch on the kind, with an "unreachable" default would allow clang to warn on incomplete coverage of the enum during future maintenance.

257–258

Maybe put a // FIXME near this to keep things grep'able.

353–355

I know this isn't directly related to this patch, but this is really the nerve-center of the matching process, and deserves to be better explained (or cross-referenced to where it is explained).

klimek added inline comments.Sep 2 2012, 4:22 AM
include/clang/ASTMatchers/ASTMatchFinder.h
132–133

Yea, I didn't really like the name either, and the more I think about it, the worse "Trigger" sounds.

How about MatchAction?

include/clang/ASTMatchers/ASTTypeTraits.h
92–132

The problem is that the QualType one is sufficiently different (using the storage for storage of the actual object).

The other two are similar enough, but on the other hand I'm not sure whether pulling out yet another abstraction is a good idea already.

lib/ASTMatchers/ASTMatchFinder.cpp
81–85

I actually don't want to expose the kind enum. Perhaps I'm a little too cautious, but I like my interfaces as small as possible.

I'm more thinking about having something built into RAV that would make sure all types are handled, but that would make RAV depend on the DynTypedNodes. I guess we'll see how it goes...

silvas added inline comments.Sep 2 2012, 8:51 AM
include/clang/ASTMatchers/ASTMatchFinder.h
132–133

To me, MatchAction sounds like "the action corresponding to a match", which is what MatchCallback is.

lib/ASTMatchers/ASTMatchFinder.cpp
81–85

Ok, no biggie.

My impression is that DynTypedNode will eventually graduate to be a public interface for general AST manipulation, and that the NodeTypeTag enum will naturally be part of the interface to allow users to switch on it. Only time will tell though. It's not a big deal at the moment though.

klimek added inline comments.Sep 3 2012, 6:23 AM
include/clang/ASTMatchers/ASTMatchFinder.h
132–133

Ok, couldn't come up with a better name and reverted the pull-out of the abstraction for now. Instead did a tyepdef of MatchCallback in the .cpp file for brevity of the type.

lib/ASTMatchers/ASTMatchFinder.cpp
257–258

There is actually no FIXME here - this code will not need to be adapted. The assert() is just there for the case where getMemoizationData() is out of sync with the rest of the DynTypedNode (which hopefully will not happen ;)

353–355

Hm, which parts need more explanations? Anything specific?

klimek updated this revision to Unknown Object (????).Sep 3 2012, 6:30 AM

Updated in response to Sean's comments.

silvas added a comment.Sep 3 2012, 7:25 AM

Comment at: lib/ASTMatchers/ASTMatchFinder.cpp:353-355
@@ -367,4 +352,5 @@
+ this, &Builder)) {

  BoundNodesTree BoundNodes = Builder.build();
  MatchVisitor Visitor(ActiveASTContext, It->second);
  BoundNodes.visitMatches(&Visitor);
}

Sean Silva wrote:

I know this isn't directly related to this patch, but this is really the nerve-center of the matching process, and deserves to be better explained (or cross-referenced to where it is explained).

Hm, which parts need more explanations? Anything specific?

It's not that it's that hard to understand, but that it's hard to
find. So, for example, I think it would be good to mention it in the
comment at the top of the file. Also, I think a comment on match()
should definitely say "this is the core of the matching process", or
something like that. In other words, in order to understand the code,
this is the best place to start at (AFAIK), and the reader should be
aware of that so that they don't waste their time.

If there is another place which is a better place to start, then that
should be the thing called out in the comment at the top of the file
of course.

--Sean Silva

silvas added a comment.Sep 3 2012, 7:27 AM

Comment at: lib/ASTMatchers/ASTMatchFinder.cpp:353-355
@@ -367,4 +352,5 @@
+ this, &Builder)) {

  BoundNodesTree BoundNodes = Builder.build();
  MatchVisitor Visitor(ActiveASTContext, It->second);
  BoundNodes.visitMatches(&Visitor);
}

Sean Silva wrote:

I know this isn't directly related to this patch, but this is really the nerve-center of the matching process, and deserves to be better explained (or cross-referenced to where it is explained).

Hm, which parts need more explanations? Anything specific?

It's not that it's that hard to understand, but that it's hard to
find. So, for example, I think it would be good to mention it in the
comment at the top of the file. Also, I think a comment on match()
should definitely say "this is the core of the matching process", or
something like that. In other words, in order to understand the code,
this is the best place to start at (AFAIK), and the reader should be
aware of that so that they don't waste their time.

If there is another place which is a better place to start, then that
should be the thing called out in the comment at the top of the file
of course.

--Sean Silva

Mostly nits and comment changes ..

include/clang/ASTMatchers/ASTMatchers.h
67

Blank line after \brief unless this is intentional.

include/clang/ASTMatchers/ASTTypeTraits.h
28

.. kinds of AST nodes ..

51

maybe: (like \c Stmt and \c Decl)

53

\c QualType here and \c DynTypedNode below

89

Some more documentation (or a reference) to what a QualType actually is (Type-pointer + 3Bits) might help understanding this easier.

96

I'd be consistent with getMemoizationData (spacing around * and using Storage.buffer).

Actually, how about:
return dyn_cast<T>(getMemoizationData());
?

110

same as above

lib/ASTMatchers/ASTMatchFinder.cpp
34

Add a comment why we don't want memoization on QualTypes (e.g. might be incorrect together with the coming hasAncestor matchers or unnecessary as there are no big QualType hierarchies).

Also, maybe add FIXME to add memoization to for TypeLocs, once we match those (TypeLocs form bigger trees and presumably memoization is worth it). Also then it might make sense to store DynTypedNodes instead of void*!?

257

Why couldn't you reach this with a DynTypedNode representing a QualType? If you can, add test.

348

As you are here anyway, you could rename this to use the standard I and E, but up to you.

klimek updated this revision to Unknown Object (????).Sep 3 2012, 11:07 AM

Updated according to comments.

include/clang/ASTMatchers/ASTMatchers.h
67

Done.

include/clang/ASTMatchers/ASTTypeTraits.h
28

Done.

51

Done.

53

Done.

89

I don't think we currently rely on that - for this implementation, all you need to know is that QualType doesn't have pointer identity and thus we need to store it by-value.

96

I dislike using getMemoizationData here. The other way around makes sense.

Storage.buffer is different, because we get 'Storage' as a const char[] in this method. Perhaps I should rename it to Buffer to be more consistent?

110

Same answer :P

lib/ASTMatchers/ASTMatchFinder.cpp
34

Tried to clarify in the comments.

Every TypeLoc has a QualType, so I don't see why the second argument holds - we'll either want to add memoization for both TypeLoc and QualType, or for neither.

Using DynTypedNode in the key doesn't make sense at all - we'll never want to get the original value back, we just need a one-way function that is unique.

257

Because matchesDescendantOf (ASTMatchersInternal:444) only works for Decl and Stmt currently...

348

Done.

djasper added inline comments.Sep 3 2012, 1:54 PM
include/clang/ASTMatchers/ASTTypeTraits.h
89

.. which I would find easier to understand than what the comment currently says.

How about:

/ Note that we can store \c Decls and \c Stmts by pointer as they are
/ guaranteed be unique pointers pointing to dedicated storage in the
/ AST. \c QualTypes on the other hand do not have storage or unique
/ pointers and thus need to be stored by value.
/// Further info: <reference to QualType>

(with the "Further info" completely optional).

96

Yes, the other way around makes more sense. I definitely like having only one reinterpret_cast :-).

'Storage' is fine, I just did not read carefully enough, sorry!

lib/ASTMatchers/ASTMatchFinder.cpp
34

Yes, exactly, every TypeLoc has a QualType, so it is sufficient to memoize the TypeLoc (memoizing the QualTypes, too, would at best lead to an improvement by a small constant factor). However, TypeLocs can form longer chains and we don't want to match whole subchains repeatedly for hasDescendant()..

"We just need a one-way function that is unique".. This is sort of what DynTypedNode provides (it already knows that for Decls and Stmts the pointer is unique and for QualTypes it has to use the value). Would be good to store information like that in only one class (even though we will never need the retrieval-logic). On the other hand, if you disagree on the first point and say that we will only ever want to memoize Stmts and Decls, this is a mute point ;-).

silvas added a comment.Sep 3 2012, 7:18 PM

/ Note that we can store \c Decls and \c Stmts by pointer as they are
/ guaranteed be unique pointers pointing to dedicated storage in the
/ AST. \c QualTypes on the other hand do not have storage or unique
/ pointers and thus need to be stored by value.
/// Further info: <reference to QualType>

This sounds more like it should be in Clang AST 101, not in the depths
of the AST matcher implementation. Someone who doesn't know what a
QualType is isn't going to make it this far into the file; as in all
writing, considering your audience is important. As such, IMO this
comment isn't necessary. I think that Manuel's original comment very
nicely summarizes the status quo as far as "why we're storing the
QualType by value"; namely, that this is what the AST gives us.

However, an explanation is needed here:

/// \brief Supported base node types.

enum NodeTypeTag {	
  NT_Decl,	
  NT_Stmt,	
  NT_QualType	
} Tag;

, because QualType isn't really a "base node". Since Type/TypeLoc
_are_ actually bases, an explanation for why and in what sense
QualType is being used as a "base" here and not Type* or TypeLoc would
be highly desirable. Or maybe "base" is subtly off from being the
right concept here, which I think is more along the lines of "a handle
for a distinct kind of entity in the AST"; it may be easier to just
say "base" though :P.

--Sean Silva

silvas added a comment.Sep 3 2012, 7:20 PM

/ Note that we can store \c Decls and \c Stmts by pointer as they are
/ guaranteed be unique pointers pointing to dedicated storage in the
/ AST. \c QualTypes on the other hand do not have storage or unique
/ pointers and thus need to be stored by value.
/// Further info: <reference to QualType>

This sounds more like it should be in Clang AST 101, not in the depths
of the AST matcher implementation. Someone who doesn't know what a
QualType is isn't going to make it this far into the file; as in all
writing, considering your audience is important. As such, IMO this
comment isn't necessary. I think that Manuel's original comment very
nicely summarizes the status quo as far as "why we're storing the
QualType by value"; namely, that this is what the AST gives us.

However, an explanation is needed here:

/// \brief Supported base node types.

enum NodeTypeTag {	
  NT_Decl,	
  NT_Stmt,	
  NT_QualType	
} Tag;

, because QualType isn't really a "base node". Since Type/TypeLoc
_are_ actually bases, an explanation for why and in what sense
QualType is being used as a "base" here and not Type* or TypeLoc would
be highly desirable. Or maybe "base" is subtly off from being the
right concept here, which I think is more along the lines of "a handle
for a distinct kind of entity in the AST"; it may be easier to just
say "base" though :P.

--Sean Silva

klimek updated this revision to Unknown Object (????).Sep 3 2012, 10:26 PM

Changed comment.

include/clang/ASTMatchers/ASTTypeTraits.h
89

Done.

lib/ASTMatchers/ASTMatchFinder.cpp
34

Why can TypeLocs form longer chains than QualTypes?

I also think the notion of a key for a node is sufficiently different from a pointer to the node. I'd rather pull out a NodeKey class or something at one point. For now, I'd rather keep it simple.

LGTM

lib/ASTMatchers/ASTMatchFinder.cpp
34

I think that is just what TypeLocs do ;-).

I still think there is a lot of similarity between what we wnat to store here and what DynTypedNode currently does. However, I am perfectly happy keeping this simple (and a void*) in this CL.

Generally looks good. A few comments:

include/clang/ASTMatchers/ASTMatchFinder.h
135

I would prefer a clearer name, even if it is on the wordy side. Trigger doesn't really fit. Some possibilities:
MatcherMatchActionPair
MatcherMatchCallbackPair
MatcherAndActionPair

include/clang/ASTMatchers/ASTMatchersInternal.h
375

This comment is stale; might as well fix it while we're nearby. The comment doesn't include CXXCtorInitializer.

include/clang/ASTMatchers/ASTTypeTraits.h
89

typo: "guaranteed be" -> "guaranteed to be"

92

I think it would be better to use boost::any here. Then you don't have to explicitly specify all the types. You will still be able to store the QualType by value, but you won't have to worry about it in the declaration of Storage.

96

Why the change to reinterpret_cast? It seems a fair bit messier. This would also be cleaner using boost::any.

djasper added inline comments.Sep 5 2012, 1:45 AM
include/clang/ASTMatchers/ASTMatchFinder.h
135

Maybe: CallbackForMatcher?

include/clang/ASTMatchers/ASTTypeTraits.h
92

LLVM and Boost don't mix ;-).... Boost is quite massive and would create a significant burden on clang. The specialized abstractions provided by LLVM are quite sufficient for most use cases.

The best data structure would probably by a union, however until we can use C++11, unions don't support members with non-trivial constructors. That is why we use this helper structure for the time being.

96

As said above, no boost in LLVM. However, I agree that we could probably continue passing in a void* here, right? Even for the QualType we are passing a pointer to the storage inside DynTypedNode, which should work.

klimek added inline comments.Sep 5 2012, 4:20 AM
include/clang/ASTMatchers/ASTMatchFinder.h
135

Renamed to MatcherCallbackPairs. I dislike pretty much all of the names, but that seems to be most straight forward.

include/clang/ASTMatchers/ASTMatchersInternal.h
375

Done.

include/clang/ASTMatchers/ASTTypeTraits.h
89

Done.

96

The change from void* is because now we have an actual storage (the character array), and I think funneling this through a void* will make it less explicit that we're doing something fishy here.

klimek updated this revision to Unknown Object (????).Sep 5 2012, 4:21 AM

Implements review comments.

mdiamond accepted this revision.Sep 5 2012, 3:43 PM

LGTM.

include/clang/ASTMatchers/ASTTypeTraits.h
92

Hrmm... I talked briefly with Manuel about adding the Any abstraction to llvm/Support and I tried to broach the idea on the IRC channel but didn't get any response. I think that's the way to go in the future, but in the short term, I don't think it should hold up this CL.

klimek closed this revision.Sep 13 2012, 10:38 AM