This is an archive of the discontinued LLVM Phabricator instance.

[AST] RecursiveASTVisitor visits lambda classes when implicit visitation is on.
ClosedPublic

Authored by sammccall on Jan 8 2019, 8:49 AM.

Details

Summary

This fixes ASTContext's parent map for nodes in such classes (e.g. operator()).
https://bugs.llvm.org/show_bug.cgi?id=39949

This changes the observed shape of the AST for implicit RAVs.

  • this includes AST MatchFinder: cxxRecordDecl() now matches lambda classes, functionDecl() matches the call operator, and the parent chain is body -> call operator -> lambda class -> lambdaexpr rather than body -> lambdaexpr.
  • this appears not to matter for the ASTImporterLookupTable builder
  • this doesn't matter for the other RAVs in-tree

In order to do this, we remove the TraverseLambdaBody hook. The problem is it's
hard/weird to ensure this hook is called when traversing via the implicit class.
There were just two users of this hook in-tree, who use it to skip bodies.
I replaced these with explicitly traversing the captures only. Another approach
would be recording the bodies when the lambda is visited, and then recognizing
them later.
I'd be open to suggestion on how to preserve this hook, instead.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Jan 8 2019, 8:49 AM
JonasToth added a comment.EditedJan 8 2019, 2:20 PM

I still see the unit-test crashing for ExprMutAnalyzer (just apply the last two tests https://github.com/JonasToth/clang/blob/fix_crash/unittests/Analysis/ExprMutationAnalyzerTest.cpp and run check-clang-unit)
The clang-tidy check does not crash anymore, but readability-function-size regresses. It probably doesn't properly count for lambdas anymore.

sammccall updated this revision to Diff 180796.Jan 9 2019, 12:08 AM

Ignore lambdas in FunctionSizeCheck (functionDecl() now matches their operator()).

I think they should probably be included, but don't want to change the check
semantics in this patch.

I still see the unit-test crashing for ExprMutAnalyzer (just apply the last two tests https://github.com/JonasToth/clang/blob/fix_crash/unittests/Analysis/ExprMutationAnalyzerTest.cpp and run check-clang-unit)

The ReproduceFailure11 test passes for me, and the ReproduceFailureMinimal fails to compile (a&&).

The clang-tidy check does not crash anymore, but readability-function-size regresses. It probably doesn't properly count for lambdas anymore.

Rather, it previously didn't count lambda bodies as functions (because functionDecl() didn't include them), and now it does :-)
I think counting them is better behavior, but I've modified the matcher to preserve the behavior for now.

@klimek: would it be better to preserve the odd behavior of the functionDecl() matcher, and add a new functionOrLambdaDecl()? It seems too surprising to me.

sammccall edited the summary of this revision. (Show Details)Jan 9 2019, 1:00 AM

I still see the unit-test crashing for ExprMutAnalyzer (just apply the last two tests https://github.com/JonasToth/clang/blob/fix_crash/unittests/Analysis/ExprMutationAnalyzerTest.cpp and run check-clang-unit)

The ReproduceFailure11 test passes for me, and the ReproduceFailureMinimal fails to compile (a&&).

Wupsi, thats odd. I am currently not at my PC and can't fix, but it seems that it should be "template <class T> T forward(T & A) { return static_cast<T&&>(A); }" instead of a&&. If you have the time to try again, great! Otherwise I will check this evening (8:00-9:00 evening berlin timezone).

The clang-tidy check does not crash anymore, but readability-function-size regresses. It probably doesn't properly count for lambdas anymore.

Rather, it previously didn't count lambda bodies as functions (because functionDecl() didn't include them), and now it does :-)
I think counting them is better behavior, but I've modified the matcher to preserve the behavior for now.

@klimek: would it be better to preserve the odd behavior of the functionDecl() matcher, and add a new functionOrLambdaDecl()? It seems too surprising to me.

We can change the clang-tidy check as well. I think lambdas should be considered functionDecls, shouldnt they? WDYT @aaron.ballman ?

klimek added a comment.Jan 9 2019, 2:59 AM

I still see the unit-test crashing for ExprMutAnalyzer (just apply the last two tests https://github.com/JonasToth/clang/blob/fix_crash/unittests/Analysis/ExprMutationAnalyzerTest.cpp and run check-clang-unit)

The ReproduceFailure11 test passes for me, and the ReproduceFailureMinimal fails to compile (a&&).

The clang-tidy check does not crash anymore, but readability-function-size regresses. It probably doesn't properly count for lambdas anymore.

Rather, it previously didn't count lambda bodies as functions (because functionDecl() didn't include them), and now it does :-)
I think counting them is better behavior, but I've modified the matcher to preserve the behavior for now.

@klimek: would it be better to preserve the odd behavior of the functionDecl() matcher, and add a new functionOrLambdaDecl()? It seems too surprising to me.

I'm in favor of functionDecl really being a very simple matcher. Every time we tried to be smart in the end it turned out that wasn't the right call.

The timing of this is unfortunate because the change will not be needed soon. Refactoring is underway to extract a generic traverser from ASTDumper. Then the parent/child traversal of ASTMatchFinder can be implemented in terms of it without any of these kinds of problems. See

http://clang-developers.42468.n3.nabble.com/Dumping-AST-information-to-other-formats-tp4062988p4063004.html

The timing of this is unfortunate because the change will not be needed soon.

RecursiveASTVisitor should be fixed (even if it won't be used for the parent map) unless it's going away entirely - it's supposed to traverse the whole AST.

Refactoring is underway to extract a generic traverser from ASTDumper. Then the parent/child traversal of ASTMatchFinder can be implemented in terms of it without any of these kinds of problems.

That sounds interesting (though it's not obvious to me why ASTDumper's traversal is inherently less prone to these issues than RecursiveASTVisitor).
I'm not inclined to hold off fixing this bug though, because:

  • it's blocking other patches, and the changes you're talking about seem likely to be quite involved and need some details worked out
  • if we are going to switch from the RAV parent map to an ASTDumper-traverser-derived map, then the closer that switch is to a no-op, the easier it will be to land.

I'm not inclined to hold off fixing this bug though, because:

  • it's blocking other patches, and the changes you're talking about seem likely to be quite involved and need some details worked out
  • if we are going to switch from the RAV parent map to an ASTDumper-traverser-derived map, then the closer that switch is to a no-op, the easier it will be to land.

That makes sense to me, but I wanted to draw your attention to the refactoring effort.

The timing of this is unfortunate because the change will not be needed soon. Refactoring is underway to extract a generic traverser from ASTDumper. Then the parent/child traversal of ASTMatchFinder can be implemented in terms of it without any of these kinds of problems. See

http://clang-developers.42468.n3.nabble.com/Dumping-AST-information-to-other-formats-tp4062988p4063004.html

Should still be fixed for 8.0 (probably with this patch). The refactoring is more realistic to land in 9.0 i guess?

Should still be fixed for 8.0 (probably with this patch). The refactoring is more realistic to land in 9.0 i guess?

That's why I said timing is unfortunate :).

@klimek: would it be better to preserve the odd behavior of the functionDecl() matcher, and add a new functionOrLambdaDecl()? It seems too surprising to me.

We can change the clang-tidy check as well. I think lambdas should be considered functionDecls, shouldnt they? WDYT @aaron.ballman ?

I think it depends on whether we want the AST matchers to match what the user wrote (syntax) or how the program behaves (semantics). If the user writes a lambda, they did not syntactically write a function declaration and so it makes sense for functionDecl() to not match. However, the semantics of a lambda are that they act as a functor (in effect) which includes a class declaration, a function declaration for the function call operator, conversion operators, etc and so it does make sense for users to want to match those semantic components. Given that, I kind of think we should have functionDecl() match only functions, and give users some other way to match the semantic declarations in a consistent manner. Alternatively, we could decide semantics are what we want to match (because it's what the AST encodes) and instead we give users a way to request to only match syntax.

It's always bothered me that tools like clang-query have behavior like this around lambdas, given:

int main() {
  auto l = [](){};
}

you get behavior like:

clang-query> m cxxRecordDecl()

Match #1:

1 match.
clang-query>

Should still be fixed for 8.0 (probably with this patch). The refactoring is more realistic to land in 9.0 i guess?

9.0, IMO.

klimek added a comment.Jan 9 2019, 6:39 AM

@klimek: would it be better to preserve the odd behavior of the functionDecl() matcher, and add a new functionOrLambdaDecl()? It seems too surprising to me.

We can change the clang-tidy check as well. I think lambdas should be considered functionDecls, shouldnt they? WDYT @aaron.ballman ?

I think it depends on whether we want the AST matchers to match what the user wrote (syntax) or how the program behaves (semantics). If the user writes a lambda, they did not syntactically write a function declaration and so it makes sense for functionDecl() to not match. However, the semantics of a lambda are that they act as a functor (in effect) which includes a class declaration, a function declaration for the function call operator, conversion operators, etc and so it does make sense for users to want to match those semantic components. Given that, I kind of think we should have functionDecl() match only functions, and give users some other way to match the semantic declarations in a consistent manner. Alternatively, we could decide semantics are what we want to match (because it's what the AST encodes) and instead we give users a way to request to only match syntax.

The problem is that while I agree it would be nice if we matched either exactly the semantic or syntactic form of the code, the reality is that we match whatever the AST represents when we visit all nodes. Instead o f promising folks something that we can't hold (because nobody has time to fully check which matchers will match which form), I think telling people that we match the AST is the more honest and less surprising result in the end.

Given that, I kind of think we should have functionDecl() match only functions, and give users some other way to match the semantic declarations in a consistent manner. Alternatively, we could decide semantics are what we want to match (because it's what the AST encodes) and instead we give users a way to request to only match syntax.

I believe matching the implied semantic nodes is how closer to how matchers behave in general (corresponding to the fact that the ASTMatcher RecursiveASTVisitor sets shouldVisitImplicitCode to true). e.g.

$ cat ~/test.cc
void foo() { for (char c : "abc") {} }
$ bin/clang-query ~/test.cc --
clang-query> set output detailed-ast
clang-query> match binaryOperator()

Match #1:

Binding for "root":
BinaryOperator 0x471f038 </usr/local/google/home/sammccall/test.cc:1:26> '_Bool' '!='
|-ImplicitCastExpr 0x471f008 <col:26> 'const char *':'const char *' <LValueToRValue>
| `-DeclRefExpr 0x471efc8 <col:26> 'const char *':'const char *' lvalue Var 0x471ed48 '__begin1' 'const char *':'const char *'
`-ImplicitCastExpr 0x471f020 <col:26> 'const char *':'const char *' <LValueToRValue>
  `-DeclRefExpr 0x471efe8 <col:26> 'const char *':'const char *' lvalue Var 0x471edb8 '__end1' 'const char *':'const char *'
etc

Obviously this is only true when such nodes are present in the AST at the time of matching (if I'm understanding Manuel's comment correctly).

@klimek: would it be better to preserve the odd behavior of the functionDecl() matcher, and add a new functionOrLambdaDecl()? It seems too surprising to me.

We can change the clang-tidy check as well. I think lambdas should be considered functionDecls, shouldnt they? WDYT @aaron.ballman ?

I think it depends on whether we want the AST matchers to match what the user wrote (syntax) or how the program behaves (semantics). If the user writes a lambda, they did not syntactically write a function declaration and so it makes sense for functionDecl() to not match. However, the semantics of a lambda are that they act as a functor (in effect) which includes a class declaration, a function declaration for the function call operator, conversion operators, etc and so it does make sense for users to want to match those semantic components. Given that, I kind of think we should have functionDecl() match only functions, and give users some other way to match the semantic declarations in a consistent manner. Alternatively, we could decide semantics are what we want to match (because it's what the AST encodes) and instead we give users a way to request to only match syntax.

The problem is that while I agree it would be nice if we matched either exactly the semantic or syntactic form of the code, the reality is that we match whatever the AST represents when we visit all nodes. Instead o f promising folks something that we can't hold (because nobody has time to fully check which matchers will match which form), I think telling people that we match the AST is the more honest and less surprising result in the end.

I both agree and disagree. I think being able to point users to the concrete thing we're matching against is really useful (and honest). I also think surprising is subjective -- the current behavior with lambdas is baffling to me because it's an inconsistent mixture of syntax and semantics that we match against and that can be hard to remember when writing matchers. Also, it's hard to say we're matching the exact AST for lambdas.

int main() {
  auto l = [](){ return 12; };
  int (*fp)() = l;
}

gives us this AST

TranslationUnitDecl 0x2c3521ed618 <<invalid sloc>> <invalid sloc>
|-TypedefDecl 0x2c3521edef0 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x2c3521edbb0 '__int128'
|-TypedefDecl 0x2c3521edf58 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x2c3521edbd0 'unsigned __int128'
|-TypedefDecl 0x2c3521ee2a8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString '__NSConstantString_tag'
| `-RecordType 0x2c3521ee030 '__NSConstantString_tag'
|   `-CXXRecord 0x2c3521edfa8 '__NSConstantString_tag'
|-TypedefDecl 0x2c3521ee340 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x2c3521ee300 'char *'
|   `-BuiltinType 0x2c3521ed6b0 'char'
|-TypedefDecl 0x2c3521ee3a8 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'char *'
| `-PointerType 0x2c3521ee300 'char *'
|   `-BuiltinType 0x2c3521ed6b0 'char'
`-FunctionDecl 0x2c3521ee450 <C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:1:1, line:4:1> line:1:5 main 'int ()'
  `-CompoundStmt 0x2c3524c1d00 <col:12, line:4:1>
    |-DeclStmt 0x2c3524c1a40 <line:2:3, col:30>
    | `-VarDecl 0x2c3521ee590 <col:3, col:29> col:8 used l '(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)':'(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)' cinit
    |   `-LambdaExpr 0x2c3524c1830 <col:12, col:29> '(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)'
    |     |-CXXRecordDecl 0x2c3524c1228 <col:12> col:12 implicit class definition
    |     | |-DefinitionData lambda pass_in_registers empty standard_layout trivially_copyable literal can_const_default_init
    |     | | |-DefaultConstructor defaulted_is_constexpr
    |     | | |-CopyConstructor simple trivial has_const_param needs_implicit implicit_has_const_param
    |     | | |-MoveConstructor exists simple trivial needs_implicit
    |     | | |-CopyAssignment trivial has_const_param needs_implicit implicit_has_const_param
    |     | | |-MoveAssignment
    |     | | `-Destructor simple irrelevant trivial
    |     | |-CXXMethodDecl 0x2c3524c1360 <col:15, col:29> col:12 used constexpr operator() 'int () const' inline
    |     | | `-CompoundStmt 0x2c3524c1568 <col:16, col:29>
    |     | |   `-ReturnStmt 0x2c3524c1558 <col:18, col:25>
    |     | |     `-IntegerLiteral 0x2c3524c1408 <col:25> 'int' 12
    |     | |-CXXConversionDecl 0x2c3524c16e0 <col:12> col:12 implicit used constexpr operator int (*)() 'int (*() const)()' inline
    |     | | `-CompoundStmt 0x2c3524c1c98 <col:12>
    |     | |   `-ReturnStmt 0x2c3524c1c88 <col:12>
    |     | |     `-ImplicitCastExpr 0x2c3524c1c70 <col:12> 'int (*)()' <FunctionToPointerDecay>
    |     | |       `-DeclRefExpr 0x2c3524c1c50 <col:12> 'int ()' lvalue CXXMethod 0x2c3524c1780 '__invoke' 'int ()'
    |     | |-CXXMethodDecl 0x2c3524c1780 <col:12> col:12 implicit used __invoke 'int ()' static inline
    |     | | `-CompoundStmt 0x2c3524c1c40 <col:12>
    |     | `-CXXDestructorDecl 0x2c3524c1860 <col:12> col:12 implicit referenced ~ 'void () noexcept' inline default trivial
    |     `-CompoundStmt 0x2c3524c1568 <col:16, col:29>
    |       `-ReturnStmt 0x2c3524c1558 <col:18, col:25>
    |         `-IntegerLiteral 0x2c3524c1408 <col:25> 'int' 12
    `-DeclStmt 0x2c3524c1ce8 <line:3:3, col:18>
      `-VarDecl 0x2c3524c1af0 <col:3, col:17> col:9 fp 'int (*)()' cinit
        `-ImplicitCastExpr 0x2c3524c1cd0 <col:17> 'int (*)()' <UserDefinedConversion>
          `-CXXMemberCallExpr 0x2c3524c1cb0 <col:17> 'int (*)()'
            `-MemberExpr 0x2c3524c1c10 <col:17> '<bound member function type>' .operator int (*)() 0x2c3524c16e0
              `-ImplicitCastExpr 0x2c3524c1bf8 <col:17> 'const (lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)' lvalue <NoOp>
                `-DeclRefExpr 0x2c3524c1b50 <col:17> '(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)':'(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)' lvalue Var 0x2c3521ee590 'l' '(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)':'(lambda at C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:2:12)'

but the matcher behavior is

clang-query> m functionDecl()

Match #1:

C:\Users\aballman.GRAMMATECH\Desktop\test.cpp:1:1: note: "root" binds here
int main() {
^~~~~~~~~~~~
1 match.
clang-query> m cxxRecordDecl()

Match #1:

1 match.

So we match the CXXRecordDecl for the lambda but none of the functions declared within it.

However, as you say, we may not be able to support syntactic vs semantic traversals. I was envisioning something along the lines of the syntactic traversal looking at the source location of the AST node when deciding whether to match -- if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

Given that, I kind of think we should have functionDecl() match only functions, and give users some other way to match the semantic declarations in a consistent manner. Alternatively, we could decide semantics are what we want to match (because it's what the AST encodes) and instead we give users a way to request to only match syntax.

I believe matching the implied semantic nodes is how closer to how matchers behave in general (corresponding to the fact that the ASTMatcher RecursiveASTVisitor sets shouldVisitImplicitCode to true). e.g.

I agree that we more closely match on semantics than we do on syntax and I think that's the correct default (as it will match the nodes in the AST that the user can see through dumping).

Given that, I kind of think we should have functionDecl() match only functions, and give users some other way to match the semantic declarations in a consistent manner. Alternatively, we could decide semantics are what we want to match (because it's what the AST encodes) and instead we give users a way to request to only match syntax.

I believe matching the implied semantic nodes is how closer to how matchers behave in general (corresponding to the fact that the ASTMatcher RecursiveASTVisitor sets shouldVisitImplicitCode to true). e.g.

$ cat ~/test.cc
void foo() { for (char c : "abc") {} }
$ bin/clang-query ~/test.cc --
clang-query> set output detailed-ast
clang-query> match binaryOperator()

Match #1:

Binding for "root":
BinaryOperator 0x471f038 </usr/local/google/home/sammccall/test.cc:1:26> '_Bool' '!='
|-ImplicitCastExpr 0x471f008 <col:26> 'const char *':'const char *' <LValueToRValue>
| `-DeclRefExpr 0x471efc8 <col:26> 'const char *':'const char *' lvalue Var 0x471ed48 '__begin1' 'const char *':'const char *'
`-ImplicitCastExpr 0x471f020 <col:26> 'const char *':'const char *' <LValueToRValue>
  `-DeclRefExpr 0x471efe8 <col:26> 'const char *':'const char *' lvalue Var 0x471edb8 '__end1' 'const char *':'const char *'
etc

Obviously this is only true when such nodes are present in the AST at the time of matching (if I'm understanding Manuel's comment correctly).

Note that I've fixed that too,

http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/7sLs34

but both the matcher and the dumper need to get the new behavior at the same time. I agree that we shouldn't match on implicit nodes. See also

http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/sNny36

See how Manuel had to explain this here years ago: https://youtu.be/VqCkCDFLSsc?t=1748

There's more that we should do to simplify AST dumping and matching: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/uBshw1

My point being that we shouldn't try to rush some way of treating lambdas like functionDecl()s or anything like that.

I suggest not trying to make any such drastic changes for 8.0, try to fix the bug in a minimal way if possible, and have a more considered approach to the future of AST Matchers for after the release.

I suggest not trying to make any such drastic changes for 8.0, try to fix the bug in a minimal way if possible, and have a more considered approach to the future of AST Matchers for after the release.

+1

if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

That is exactly what I've implemented. I skip invisible nodes in matching and dumping: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/EuYjAn

This requires the ast dump traverser to be extracted so that the 'IgnoreInvisble' state can be set when it is dumping, so that it skips over such nodes. Currently there is no public API for ASTDumper. That can be changed.

if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

That is exactly what I've implemented. I skip invisible nodes in matching and dumping: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/EuYjAn

Just pasting the AST for comparison with the one Aaron posted:

int main() {
  auto l = [](){ return 12; };
  int (*fp)() = l;
}
TranslationUnitDecl 0x5604ea6343e8 <<invalid sloc>> <invalid sloc>
`-FunctionDecl 0x5604ea670470 <<source>:1:1, line:4:1> line:1:5 main 'int (void)'
  `-CompoundStmt 0x5604ea69b0e0 <col:12, line:4:1>
    |-DeclStmt 0x5604ea671260 <line:2:3, col:30>
    | `-VarDecl 0x5604ea6705b0 <col:3, col:29> col:8 used l 'class (lambda at <source>:2:12)':'class (lambda at <source>:2:12)' cinit
    |   `-LambdaExpr 0x5604ea670c60 <col:12, col:29> 'class (lambda at <source>:2:12)'
    |     `-CompoundStmt 0x5604ea670990 <col:16, col:29>
    |       `-ReturnStmt 0x5604ea670978 <col:18, col:25>
    |         `-IntegerLiteral 0x5604ea670838 <col:25> 'int' 12
    `-DeclStmt 0x5604ea69b0c8 <line:3:3, col:18>
      `-VarDecl 0x5604ea671310 <col:3, col:17> col:9 fp 'int (*)(void)' cinit
        `-DeclRefExpr 0x5604ea69af00 <col:17> 'class (lambda at <source>:2:12)':'class (lambda at <source>:2:12)' lvalue Var 0x5604ea6705b0 'l' 'class (lambda at <source>:2:12)':'class (lambda at <source>:2:12)'

I suggest not trying to make any such drastic changes for 8.0, try to fix the bug in a minimal way if possible, and have a more considered approach to the future of AST Matchers for after the release.

+1

Can you elaborate?

This patch is an attempt to solve the problem "some nodes associated with lambdas have no parents".
This manifests as assertion failures (or with assertions off, incorrect results) for some matchers, on some code - people have encountered several examples where this happens, it's hard to know where to target a more targeted fix.
The invariant violated is "RAV traverses every AST node (when implicit traversal is enabled)" - is there a way to fix this issue sufficiently without addressing that?

Is the suggestion to make that change, but then modify the semantics of the functionDecl() etc matchers to hide it? Or something else?

if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

That is exactly what I've implemented. I skip invisible nodes in matching and dumping: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/EuYjAn

So what happens when someone asks about the parent of an invisible node?

e.g. match(hasParent(decl().bind("parent")), X, Ctx) where X is the operator() function of a lambda class. (This is basically the case in the bug that this patch fixes)

I suggest not trying to make any such drastic changes for 8.0, try to fix the bug in a minimal way if possible, and have a more considered approach to the future of AST Matchers for after the release.

+1

Can you elaborate?

This patch is an attempt to solve the problem "some nodes associated with lambdas have no parents".

This is not a new kind of issue. I reported a variant of it too months ago: https://bugs.llvm.org/show_bug.cgi?id=37629.

As such I do think that this is not so urgent and can be fixed after the traverser class is extracted.

One of the problems is that visually tracking through parents/children of an AST dump does not represent the same relationships as can be expressed with hasParent() AST Matcher, because AST Matchers use the RAV and AST dumps do not.

However, we can make both use the same class (the generic traverser I am extracting from ASTDumper), and then that class of bug will be solved as I understand it.

This manifests as assertion failures (or with assertions off, incorrect results) for some matchers, on some code - people have encountered several examples where this happens, it's hard to know where to target a more targeted fix.
The invariant violated is "RAV traverses every AST node (when implicit traversal is enabled)" - is there a way to fix this issue sufficiently without addressing that?

Yes - don't use RAV to traverse parents when AST-matching.

Is the suggestion to make that change, but then modify the semantics of the functionDecl() etc matchers to hide it? Or something else?

My suggestion is to extract the traverser from ASTDumper first, then fix this issue.

if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

That is exactly what I've implemented. I skip invisible nodes in matching and dumping: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/EuYjAn

So what happens when someone asks about the parent of an invisible node?

e.g. match(hasParent(decl().bind("parent")), X, Ctx) where X is the operator() function of a lambda class. (This is basically the case in the bug that this patch fixes)

Assuming that whether to skip invisible nodes is part of the Ctx, the X would simply not be in context, just like if the X were not in the TraversalScope of the Ctx.

Is the suggestion to make that change, but then modify the semantics of the functionDecl() etc matchers to hide it? Or something else?

My suggestion is to extract the traverser from ASTDumper first, then fix this issue.

I understand that you are working on a clean solution which is fine and can be done in 9.0, but:

  • this patch is small, almost non-intrusive and can be obsolete in 9.0, i think no one would be sad about that
  • we need to fix this issue, lambda traversal is absolutly normal in ASTMatchers. Once you include the STL in your code, you automatically encounter this use case
  • we break current code that is in trunk already (for a longer time)
  • this patch does not break any other tests so far, so it does not seem to inccur bigger issues, on the other hand redoing AST matching most likely will.

The semantic issues that came up make sense to address (within this or a similar complex patch), but the current change does not seem to change a lot.
Even if it does, introducing false positives/negatives for lambda-cases is still better then leaving the crashing as is.

if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

That is exactly what I've implemented. I skip invisible nodes in matching and dumping: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/EuYjAn

So what happens when someone asks about the parent of an invisible node?

e.g. match(hasParent(decl().bind("parent")), X, Ctx) where X is the operator() function of a lambda class. (This is basically the case in the bug that this patch fixes)

Assuming that whether to skip invisible nodes is part of the Ctx, the X would simply not be in context, just like if the X were not in the TraversalScope of the Ctx.

Note that as I have prototyped it, the system is perhaps too-eager, not representing the FunctionDecl (or the ParmVarDecls) at all: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/tTEjs3

I know what the fix is, and the result will probably be something like

TranslationUnitDecl 0x5604ea6343e8 <<invalid sloc>> <invalid sloc>
`-FunctionDecl 0x5604ea670470 <<source>:1:1, line:4:1> line:1:5 main 'int (void)'
  `-CompoundStmt 0x5604ea69b0e0 <col:12, line:4:1>
    |-DeclStmt 0x5604ea671260 <line:2:3, col:30>
    | `-VarDecl 0x5604ea6705b0 <col:3, col:29> col:8 used l 'class (lambda at <source>:2:12)':'class (lambda at <source>:2:12)' cinit
    |   `-LambdaExpr 0x5604ea670c60 <col:12, col:29> 'class (lambda at <source>:2:12)'
    |     |-ParmVarDecl 0x558298e6c4b8 <col:15, col:19> col:19 used param 'int'
    |     `-CompoundStmt 0x5604ea670990 <col:16, col:29>
    |       `-ReturnStmt 0x5604ea670978 <col:18, col:25>
    |         `-IntegerLiteral 0x5604ea670838 <col:25> 'int' 12
    `-DeclStmt 0x5604ea69b0c8 <line:3:3, col:18>
      `-VarDecl 0x5604ea671310 <col:3, col:17> col:9 fp 'int (*)(void)' cinit
        `-DeclRefExpr 0x5604ea69af00 <col:17> 'class (lambda at <source>:2:12)':'class (lambda at <source>:2:12)' lvalue Var 0x5604ea6705b0 'l' 'class (lambda at <source>:2:12)':'class (lambda at <source>:2:12)'

and `parmVarDecl(hasName('param'), hasParent(lambdaExpr())) would be true, as you would expect from reading the AST.

This is of course subject to change and subject to review.

Is the suggestion to make that change, but then modify the semantics of the functionDecl() etc matchers to hide it? Or something else?

My suggestion is to extract the traverser from ASTDumper first, then fix this issue.

I understand that you are working on a clean solution which is fine and can be done in 9.0, but:

  • this patch is small, almost non-intrusive and can be obsolete in 9.0, i think no one would be sad about that
  • we need to fix this issue, lambda traversal is absolutly normal in ASTMatchers. Once you include the STL in your code, you automatically encounter this use case
  • we break current code that is in trunk already (for a longer time)
  • this patch does not break any other tests so far, so it does not seem to inccur bigger issues, on the other hand redoing AST matching most likely will.

The semantic issues that came up make sense to address (within this or a similar complex patch), but the current change does not seem to change a lot.
Even if it does, introducing false positives/negatives for lambda-cases is still better then leaving the crashing as is.

Yes, I was mostly responding to the suggestion to treat lambdas as functionDecls(), which I think is too drastic to do now, with other solutions on the horizon which may be better.

I don't object to something targeted to fixing the assert as you describe and which this patch seems to be. Sorry if that wasn't clear.

Can you add a test that checks that a lambda declared within another lambda also traverses as expected?

I suggest not trying to make any such drastic changes for 8.0, try to fix the bug in a minimal way if possible, and have a more considered approach to the future of AST Matchers for after the release.

+1

Can you elaborate?

I can elaborate my +1. :-) I don't think we should try to do anything remotely close to what I was talking about with semantic vs syntactic traversals in 8.0, but only consider it for 9.0. Instead, I prefer this minimal fix that addresses the regression with parent maps. We can revisit the question about adding a new traversal model later.

This manifests as assertion failures (or with assertions off, incorrect results) for some matchers, on some code - people have encountered several examples where this happens, it's hard to know where to target a more targeted fix.
The invariant violated is "RAV traverses every AST node (when implicit traversal is enabled)" - is there a way to fix this issue sufficiently without addressing that?

Yes - don't use RAV to traverse parents when AST-matching.

OK, this is certainly a much more invasive change, and isn't going to make the release.
I understand that you don't think this crash is worth fixing for 8.0, while others disagree.
I don't particularly have a dog in this fight, but having seen now three reports of this crash makes me nervous about ignoring it.

if the location isn't somewhere in user code, then don't consider the node or its children for traversal. However, that may be insufficient and equally as mysterious behavior.

That is exactly what I've implemented. I skip invisible nodes in matching and dumping: http://ec2-18-191-7-3.us-east-2.compute.amazonaws.com:10240/z/EuYjAn

So what happens when someone asks about the parent of an invisible node?

e.g. match(hasParent(decl().bind("parent")), X, Ctx) where X is the operator() function of a lambda class. (This is basically the case in the bug that this patch fixes)

Assuming that whether to skip invisible nodes is part of the Ctx, the X would simply not be in context, just like if the X were not in the TraversalScope of the Ctx.

Ah, so any skipped nodes would not have any parents at all.
The implementation of the idea would just be removing the assert - instead of assuming non-visited nodes is a bug, we assume they're *meant* to be excluded.

However, such nodes are very easy to reach, inside or outside of a matcher.
e.g. callExpr(callee(functionDecl())) will match a lambda call, and "func" will be the operator().
Having such nodes then not work with parent/ancestor matchers seems surprising and not obviously the best design choice (vs e.g. visiting all nodes including implicit, or breaking symmetry between downward and upward traversal).

I guess what I'm saying is your suggestion seems like a significant change to the design of (ancestor) matchers, and that the approach in this patch is the more conservative one - making e.g. the implicit FunctionDecls visible to matchers is consistent with clang -ast-dump and the behavior of matchers on other parts of the AST, even if that's something you'd like to change on an LLVM-9 timeline.

Yes - don't use RAV to traverse parents when AST-matching.

OK, this is certainly a much more invasive change, and isn't going to make the release.
I understand that you don't think this crash is worth fixing for 8.0, while others disagree.

Actually, I was trying to explain what can come in the future in response to a question, but I replied with a stronger-looking opinion than I intended.

I don't mind a non-invasive change like this going in now/soon, but I oppose more-drastic changes right now.

However, such nodes are very easy to reach, inside or outside of a matcher.
e.g. callExpr(callee(functionDecl())) will match a lambda call, and "func" will be the operator().
Having such nodes then not work with parent/ancestor matchers seems surprising and not obviously the best design choice (vs e.g. visiting all nodes including implicit, or breaking symmetry between downward and upward traversal).

I believe I have a solution to this, but this is not the place to discuss it. We can discuss it when I propose that patch :). I'll be sure to add you to the review.

@sammccall I (hopefully) fixed the type-issue in my test-cases. They nevertheless fail both on me, on latest llvm+clang+your patch. Clang-tidy still the same, so maybe there is a difference in our builds?
Do you build with assertions on?
I work on linux, ubuntu 18.04, building with clang-6.0, maybe there is a difference?

Just in general, I'd like to add that my experience over the years dealing with folks trying to do AST matchers is that the inability to express something is much more of a problem than matching too much, simply because the test cases people think of are usually small, and when running a transformation on a very large codebase, the cost of false negatives is significantly higher than the cost of false positive: the cost of a false negative means that you may produce something incorrect. The cost of a false positive is that you adapt your matcher to exclude the false negative.

@sammccall I (hopefully) fixed the type-issue in my test-cases. They nevertheless fail both on me, on latest llvm+clang+your patch. Clang-tidy still the same, so maybe there is a difference in our builds?
Do you build with assertions on?
I work on linux, ubuntu 18.04, building with clang-6.0, maybe there is a difference?

Your tests indeed fail (I was rebuilding AnalysisTests instead of ClangAnalysisTests, oops!).
The // namespace std comments out the rest of the test, because the test contains no newlines. Try using raw strings instead :-)
After removing the comments, the tests pass.

I believe I have a solution to this, but this is not the place to discuss it. We can discuss it when I propose that patch :). I'll be sure to add you to the review.

Understood, looking forward to hearing it! It would certainly be nice if there was a way to run matchers over a simpler/more consistent AST when appropriate.

@aaron.ballman @JonasToth I think this is ready for review again :-)

(Sorry, I forgot to mention: the corresponding clang-tidy fix is in D56552 - I didn't manage to switch to monorepo yet :-/)

aaron.ballman accepted this revision.Jan 10 2019, 10:30 AM

LGTM with one minor testing request (conditional on the test passing, of course).

unittests/Tooling/RecursiveASTVisitorTests/LambdaExpr.cpp
68 ↗(On Diff #180796)

Can you add a test that has a lambda within a lambda to ensure that everything is traversed as expected?

This revision is now accepted and ready to land.Jan 10 2019, 10:30 AM

@sammccall I (hopefully) fixed the type-issue in my test-cases. They nevertheless fail both on me, on latest llvm+clang+your patch. Clang-tidy still the same, so maybe there is a difference in our builds?
Do you build with assertions on?
I work on linux, ubuntu 18.04, building with clang-6.0, maybe there is a difference?

Your tests indeed fail (I was rebuilding AnalysisTests instead of ClangAnalysisTests, oops!).
The // namespace std comments out the rest of the test, because the test contains no newlines. Try using raw strings instead :-)
After removing the comments, the tests pass.

Same result! They do crash with the wrong comment. Thats probably the next surprise? :) I will pretend I didn't see anything ;)
Both the tests run and the const-analysis does not crash anymore, too.

Thank you very much for fixing the issue.

sammccall marked an inline comment as done.Jan 14 2019, 2:30 AM
This revision was automatically updated to reflect the committed changes.