This patch adds interpretation of binding declarations resulting from a
structured binding (DecompositionDecl) to a tuple-like type. Currently, the
framework only supports binding to a struct.
Fixes issue #57252.
Differential D139544
[clang][dataflow] Add support for structured bindings of tuple-like types. ymandel on Dec 7 2022, 8:08 AM. Authored by
Details This patch adds interpretation of binding declarations resulting from a Fixes issue #57252.
Diff Detail
Event TimelineComment Actions
I'm not sure that this is possible. If you look at the AST of the following example std::pair<int, int> p{1, 2}; auto [a, b] = p; you get `-DecompositionDecl 0x55b0ac862428 'std::pair<int, int>':'std::pair<int, int>' cinit |-CXXConstructExpr | `-ImplicitCastExpr | `-DeclRefExpr 'p' |-BindingDecl a | |-VarDecl a | | `-CallExpr | | |-ImplicitCastExpr | | | `-DeclRefExpr 'get' | | `-ImplicitCastExpr | | `-DeclRefExpr Decomposition 0x55b0ac862428 '' | `-DeclRefExpr 'a' ` ... Basically get<>() is getting the element from the copy of the struct, that's being created when the DecompositionDecl happens. 7: p 8: [B1.7] (ImplicitCastExpr, NoOp, const pair<int, int>) 9: [B1.8] (CXXConstructExpr, [B1.10], std::pair<int, int>) 10: auto = p; 11: get<0UL> 12: [B1.11] (ImplicitCastExpr, FunctionToPointerDecay, typename tuple_element<0UL, pair<int, int> >::type &&(*)(pair<int, int> &&) noexcept) 13: 14: [B1.13] (ImplicitCastExpr, NoOp, std::pair<int, int>) 15: [B1.12]([B1.14]) 16: std::tuple_element<0, std::pair<int, int>>::type a = get<0UL>(); Here 13 would be the DecompositionDecl. The variable declarations must appear after the DecompositionDecl, Comment Actions Thanks for clarifying the issues here. A few thoughts:
Example: https://godbolt.org/z/qcWMdG33T #include <tuple> std::tuple<bool, int> mk(); void f() { auto [b, i] = mk(); (void)i; } CFG: void f() [B2 (ENTRY)] Succs (1): B1 [B1] 1: mk 2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, std::tuple<_Bool, int> (*)(void)) 3: [B1.2]() (CXXRecordTypedCall, [B1.4]) 4: auto = mk(); 5: get<0UL> 6: [B1.5] (ImplicitCastExpr, FunctionToPointerDecay, typename tuple_element<0UL, tuple<_Bool, int> >::type &&(*)(tuple<_Bool, int> &&) noexcept) 7: 8: [B1.7] (ImplicitCastExpr, NoOp, std::tuple<_Bool, int>) 9: [B1.6]([B1.8]) 10: std::tuple_element<0, std::tuple<bool, int>>::type b = get<0UL>(); 11: get<1UL> 12: [B1.11] (ImplicitCastExpr, FunctionToPointerDecay, typename tuple_element<1UL, tuple<_Bool, int> >::type &&(*)(tuple<_Bool, int> &&) noexcept) 13: 14: [B1.13] (ImplicitCastExpr, NoOp, std::tuple<_Bool, int>) 15: [B1.12]([B1.14]) 16: std::tuple_element<1, std::tuple<bool, int>>::type i = get<1UL>(); 17: i 18: (void)[B1.17] (CStyleCastExpr, ToVoid, void) Preds (1): B2 Succs (1): B0 [B0 (EXIT)] Preds (1): B1 Here, both lines 7 and 13 are blank but seem to be the underlying tuple that's being deconstructed. Are these different instances of the DecompositionDecl or are they (unprinted) references back to line 4?
Finally, a nit: why doesn't line 13 in your example, and lines 7 and 13 in my example, print? Is that something that I could add to the CFG printer? Comment Actions Gabor -- An alternative solution to this problem, assuming the current structure of the CFG, is to add a map to the Environment of type VarDecl -> BindingDecl that registers BindingDecls in need of initialization. Then, I could modify the DeclStmt interpretation to check this map for each VarDecl it processes, processing the corresponding BindingDecl if there is one. This solution, while a bit more complicated, has the mild advantage of incurring less cost (on average) since VarDecls are less common than DeclRefExpr. Also, it's eager instead of lazy, which seems somewhat easier to reason about. Comment Actions
std::pair<> is actually tuple-like if that's what you meant. Only tuple-like types have holding vars inside a DecompositionDecl.
Those lines are what you see as DeclRefExpr Decomposition '' in the AST. Which are indeed references back to line 4 as you mentioned.
I think in this case the elements do proceed in order of operation. When you create a structured binding by value to a tuple like type, the copy constructor of the type is invoked first. Let's assume that you have a tuple-like std::pair<int, int> p{1, 2}; and you create the structured binding auto [a, b] = p;.
The CFG resembles this behaviour. If you look at both of the example CFGs in this discussion, you can see that both of them begin with the constructor invocation. 7: p 8: [B1.7] (ImplicitCastExpr, NoOp, const pair<int, int>) 9: [B1.8] (CXXConstructExpr, [B1.10], std::pair<int, int>) 10: auto = p; 1: mk 2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, std::tuple<_Bool, int> (*)(void)) 3: [B1.2]() (CXXRecordTypedCall, [B1.4]) 4: auto = mk(); Points 4 and 10 are a bit weird, but that's where the copy is created, however the copy is unnamed. This unnamed copy is what I refered to as tmp. Note that when you do a regular copy-constructor invocation like std::tuple<bool, int> mk; std::tuple<bool, int> mk2 = mk; you would get 3: mk 4: [B1.3] (ImplicitCastExpr, NoOp, const class std::tuple<_Bool, int>) 5: [B1.4] (CXXConstructExpr, [B1.6], std::tuple<_Bool, int>) 6: std::tuple<bool, int> mk2 = mk; ^^^This is the name that's missing from the CFGs above. After this the get<>() calls simply use this unnamed copy to initialize the elements from first to last, so everything seems to proceed in order in the CFG.
If I remember correctly (and my intuition is correct based on the examples) a DeclRefExpr in the CFG is the name of the variable being referenced. If you look at the AST in my example, you can see that the DeclRefExpr to the decomposition is DeclRefExpr Decomposition '', where '' Comment Actions Domján -- thanks for the detailed explanations -- this has been really helpful.
Agreed -- it definitely should appear before, but I think it would be good to appear _after_ as well. That is, I think there's a catch-22: the temporary's construction should appear before, but the binding decls, which use the synthetic variables, should appear after. What do you think of duplicating it in the CFG, possibly with a new (statement) element kind (to mark it as a special)? There are other workarounds that we could do on our end, but the duplication would avoid the need for such -- for us and any other clients. Comment Actions
I'm confused a bit here. Right now the CFG looks like this: <constructor_call> <binding_decl_0> ... <binding_decl_n> Based on what you say I assume you want something to happen between constructor_call and binding_decl_0.
Well, I don't think duplicating a statement in the CFG is a good idea as long as there are other solutions for this problem. This seems to be a Comment Actions Sorry, I should have explained in more detail (refernce example: https://godbolt.org/z/dE9ojcjj7). *Perhaps I'm still misreading the CFG, but I don't see the binding decls serialized in the CFG, only the synthesized var decls.* That is, here's my reading of the CFG: <constructor_call> <synthesized_var_decl_0> ... <synthesized_var_decl_n> In more detail, here is my annotated reading of the CFG, based on our discussion: [B1] 1: mk 2: [B1.1] (ImplicitCastExpr, FunctionToPointerDecay, std::tuple<int, int> (*)(void)) 3: [B1.2]() (CXXRecordTypedCall, [B1.4]) 4: auto = mk(); // <constructor_call> (with <binding_decl_0>, <binding_decl_1> nested as children) 5: get<0UL> 6: [B1.5] (ImplicitCastExpr, FunctionToPointerDecay, typename tuple_element<0, tuple<int, int> >::type (*)(tuple<int, int>)) 7: // <ref to anonymous temporary defined on line 4> 8: [B1.7] (ImplicitCastExpr, NoOp, std::tuple<int, int>) 9: [B1.8] (CXXConstructExpr, [B1.10]+0, tuple<int, int>) 10: [B1.6]([B1.9]) 11: [B1.10] 12: std::tuple_element<0, std::tuple<int, int>>::type i0 = get<0UL>(); // <synth_var_decl_0> 13: get<1UL> 14: [B1.13] (ImplicitCastExpr, FunctionToPointerDecay, typename tuple_element<1, tuple<int, int> >::type (*)(tuple<int, int>)) 15: // <ref to anonymous temporary defined on line 4> 16: [B1.15] (ImplicitCastExpr, NoOp, std::tuple<int, int>) 17: [B1.16] (CXXConstructExpr, [B1.18]+0, tuple<int, int>) 18: [B1.14]([B1.17]) 19: [B1.18] 20: std::tuple_element<1, std::tuple<int, int>>::type i1 = get<1UL>(); // <synth_var_decl_1> 21: i0 // ref to <binding_decl_0> 22: (void)[B1.21] (CStyleCastExpr, ToVoid, void) Here is the crux of the issue: per the AST, line 21 is reference is to the *BindingDecl*, not the (synthesized) VarDecl: -DeclRefExpr <col:9> 'std::tuple_element<0, std::tuple<int, int>>::type':'int' lvalue Binding 0x559942acf350 'i0' 'std::tuple_element<0, std::tuple<int, int>>::type':'int' So, in order to make sense of that reference, I need to define the BindingDecl. But, that in turn is defined in terms of the (synthesized) VarDecl. So, I want to insert between elements 20 and 21: 20.a: <binding_decl_0> // references synthesized variable `i0` from line 12 in its binding 20.b: <binding_decl_1> // references synthesized variable `i1` from line 20 in its binding The resulting simplified CFG would be: <constructor_call> <synthesized_var_decl_0> ... <synthesized_var_decl_n> <binding_decl_0> ... <binding_decl_n> I suggested repeating the DecompositionDecl because it includes the BindingDecls, but spelling them out explicitly would probably be even better. Right now, the only chance I get to visit the BindingDecls is when I encounter line 4, as children of the DecompositionDecl, at which point the binding of each decl reference as-yet-undefined (synthesized) VarDecls, hence my dilemma. I'm fine with a custom solution -- that's indeed what this patch contains and I'm happy to leave it at that. Thank you, again, for your patience hashing this out! Comment Actions Those BindingDecls never appear in the CFG unless one of the is used. Take a look at the other cases. 1.) Arrays void foo() { int arr[2]; auto [i0, i1] = arr; } [B1] 1: int arr[2]; 2: arr 3: [B1.2] (ImplicitCastExpr, ArrayToPointerDecay, int *) 4: * 5: [B1.3][[B1.4]] 6: [B1.5] (ImplicitCastExpr, LValueToRValue, int) 7: {[B1.6]} 8: auto = {arr[*]}; Preds (1): B2 Succs (1): B0 2.) Structs struct S { int x; int y; }; void foo() { S s; auto [i0, i1] = s; } [B1] 1: (CXXConstructExpr, [B1.2], S) 2: S s; 3: s 4: [B1.3] (ImplicitCastExpr, NoOp, const struct S) 5: [B1.4] (CXXConstructExpr, [B1.6], S) 6: auto = s; Preds (1): B2 Succs (1): B0 i0 and i1 are not in the CFG. They only get added, when one of them is actually used. int x = i0; 9: i0 10: [B1.9] (ImplicitCastExpr, LValueToRValue, int) 11: int x = i0; That means. when you see the DeclRefExpr you mentioned, the synthesized variable has already been created and the BindingDecl can see it. -DeclRefExpr <col:9> 'std::tuple_element<0, std::tuple<int, int>>::type':'int' lvalue Binding 0x559942acf350 'i0' 'std::tuple_element<0, std::tuple<int, int>>::type':'int' By the time you see this node, the variable has already been created and you can access it using BindingDecl::getHoldingVar(). You want to analyze these Comment Actions
Indeed. That''s the issue (which this discussion has helped clarify) -- the binding decls are not in the CFG and I was expecting them to be.
Great, that's exactly what this patch does. So, it sounds like we have what we need. Thank you! Gabor -- now that I understand the issues involved, I think there are two options here:
Based on this discussion, I'm inclined to go with the first option. Moreover, we could use it *for all* BindingDecls, not just tuple-typed decls and thereby entirely drop the custom handling of BindingDecls. WDYT? Comment Actions
Sorry, I should fix my response above: I *never* see the BindingDecls in the CFG, whether or not they are used. Can you send me an example CFG that contains the BindingDecls? Comment Actions
It was my fault, I was wrong with the wording. I meant the DeclRefExprs that refer to the BindingDecls. You can That's also how the static analyzer handles the BindingDecls in ExprEngine::VisitCommonDeclRefExpr(). Comment Actions Simplifies the implementation and removes mention of any potential CFG problem. This new version does the work eagerly by forcing a location for the synthesized variable. Comment Actions Got it, thanks! So, I think everything is resolved now, you've been incredibly helpful. Based on my now understanding the CFG representation of DecompositionDecls a lot better, I found a simpler solution that doesn't even need to involve our handling of DeclRefExprs and is therefore consistent with how we already handle the struct references. I think there's a separate discussion we might want to have about whether it would be valuable (to the static analyzer and the dataflow framework, among others) to update CFG to explicitly represent BindingDecls and the AST nodes in their bindings. But, that seems better for a discourse thread. :) Comment Actions This approach looks good to me. Some context: we kept the CFGs lightweight because it looks like we did not need to do any extensions for the Clang Static Analyzer. I'm glad the dataflow framework can also work with the current representation. On the other hand, I think structured bindings in C++ are really limited, e.g., we cannot nest them, and it is nowhere near what pattern matching can do in other languages like Rust. I do remember seeing papers about extending structured bindings in at least some dimensions like nesting. I wonder if making the representation more explicit in the CFG as structured bindings get more complex will simplify implementations in the future, but that is a problem for tomorrow :)
|
Nit commented out code.