This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Structured binding to tuple-like types
ClosedPublic

Authored by isuckatcs on Jun 29 2022, 10:21 AM.

Details

Summary

Introducing support for creating structured bindings to tuple-like types.

The difference between binding a tuple-like type and the other cases is that
we have an additional variable that holds the value of evaluating the binding,
this is referred to as HoldingVar.

The HoldingVar cannot be bound directly to the store, as there is no DeclStmt
for it. As a result the idea is to evaluate the initializer of this variable and keep
it alive in the environment with the bindings.

The BindingDecl has access to this initializer, so when it is evaluated, we can
read the value from the environment and bind it to the desired expression.

This approach is still a draft, and has to be tested deeper.

Diff Detail

Event Timeline

isuckatcs created this revision.Jun 29 2022, 10:21 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2022, 10:21 AM
isuckatcs requested review of this revision.Jun 29 2022, 10:21 AM
isuckatcs edited the summary of this revision. (Show Details)Jun 29 2022, 10:23 AM
NoQ added a comment.Jun 29 2022, 2:25 PM

Oh, clever!

Is there a DeclStmt for the entire DecompositionDecl? If so, maybe move all values at once to the Store there? I'm thinking this because if more complex types start showing up (eg., by-value get), Environment won't have enough room to represent them. It would be natural to represent a C++ object as a region with a bunch of bindings. We could probably still represent it as a single LazyCompoundValue and re-compute the region every time it's used but that's highly unusual compared to how everything else works so I suspect we'll run into a bunch of cornercase inconveniences. On the other hand, temporarily "foretelling" the variable while the initializer is evaluated, and then permanently storing data in the variable, is exactly how C++ variable constructors normally work.

When I encounter the BindingDecl I have access to 2 things:

  • the initializer expression through getHoldingVar()
  • a DeclRefExpr to the variable, the value is bound to (e.g.: in case of auto[u] = tpl; the DeclRefExpr points to u

I have tried putting u into the store (though after the DeclStmt visitor), however it is immediately marked as a dead symbol, and gets removed.
I cannot put it into the CFG, as there is no DeclStmt that contains it.

Also, there is a problem with storing the initializer in the environment too. If get<>() returns by value, there is a MaterializeTemporaryExpr,
and a temporary region gets created. At this point the environment contains two get<>()s, one with the returned value, and one with a
reference to the element of the temporary region. At some point both of them is removed from the environment, even though the binding
is alive, so I lose the way to read the value from the store.

A snippet that reproduces it:

struct Test { int v = 1; };

namespace std {

    template <>
    struct tuple_size<Test> { static const std::size_t value = 1; };

    template <std::size_t I>
    struct tuple_element<I, Test> { using type = int; };
}

template <std::size_t I>
int get(Test t) { return t.v + 1; }

int main() {
    Test p;
    auto [u] = p;

    u = 10; <-- 10 is moved to the temporary region
    <-- `get<>()`s are removed from the environment

    int x = u; <-- `Unknown` is read
}
isuckatcs updated this revision to Diff 441373.Jun 30 2022, 6:09 AM

This approach is able to handle the case when get<>() returns by value.
Note: so far I've only been working with POD types.

Opposed to the previous approach, now the get<>() calls are evaluated after
the temporary object construction, but before the DeclStmt.

3: p
4: [B1.3] (ImplicitCastExpr, NoOp, const struct Test)
5: [B1.4] (CXXConstructExpr, [B1.18], struct Test)
6: auto = p;
<-- `get<>()`s used to be called here

===========================================================
3: p
4: [B1.3] (ImplicitCastExpr, NoOp, const struct Test)
5: [B1.4] (CXXConstructExpr, [B1.18], struct Test)
<-- `get<>()`s are called here
18: auto = p;

To keep the return value alive, some hack is introduced to the Liveness Analysis.

I would like to note, that evaluating the get<>() functions after each other, without
storing the return value in between is fine. The elements of the binding will be initialized to the
return value of the get<>()s, but the parameter passed to the function is the still the original
object.

An example might help understanding it better:

struct Test {
    int u = 1;
    int v = 2;
};

template <std::size_t I>
int get(Test &t) {
    // return for u
    if (I == 0)
        return 5;

    // return for v
    return t.u + 1;
}

void foo() {
    Test p;
    auto &[u, v] = p;

    std::cout << u << " " << v << "\n";
}

===========================================================
5 2

In case of get<>() returning by value, we don't want to evaluate the MaterializeTemporaryExpression,
as that would create a new temporary region, which we lose access to. We want
to place the returned value directly into the new region being created.

To achieve it, some hack is introduced to the CFG too. Basically we ignore the mentioned
expression.

The next thing to do, it to copy the resulting return values to the bound variables. As the
bound varaibles are not in the CFG, and creating them one-by-one makes no sense, as
they will be cleaned up at the end of the statement, the idea is to place them into the
temporary region in the same order, as the bindings appear.

This results in an O(n) lookup, as reading the value requires iterating over all of the bindings,
to find the index of the region, where the bindings is stored.

isuckatcs updated this revision to Diff 441552.EditedJun 30 2022, 4:49 PM

Introduced more hacks to handle binding by reference.

The problem is that if we bind by reference, we don't actually
reference the members of the original tuple-like type, but of a
temporary copy instead. This is an issue, if get<>() returns by
value, so each element of the temporary copy will be a value, not
a reference, so this temporary region and the original object
has to be treated as seperate entities.

Take a look at this snippet: https://godbolt.org/z/oWTjK8545

NoQ added a comment.Jun 30 2022, 8:04 PM

there is a MaterializeTemporaryExpr, and a temporary region gets created

Is it actually correct to create a temporary region when we see this MaterializeTemporaryExpr? IIRC our procedure is doing that incorrectly as a hack when no ConstructionContext is present (it's always bad when it's needed but it's not there) so there's no better region to work with. But in case of POD types it's probably the right solution regardless, as they aren't strictly required to have an address before materialization. But generally the alternative is for MaterializeTemporaryExpr to resolve to the actual address of the object that already exists. I'm not sure if this ever happens in C++17 though, mandatory copy elision may eliminate all such materializations and leave only the ones where a temporary region is indeed required. But then again, this region doesn't necessarily need to be created, if it's a non-POD type then the temporary region should have been already created when constructor was called, then the transfer function for MaterializeTemporaryExpr will simply look that up.

So, anyway, I'd like to step back and slow down a bit: you're saying you're adding hacks, we're supposed to reduce the amount of hacks, not increase them! I'd like to at least explore the correct, hack-free solution before implementing hacks.

Could you show us the AST we're dealing with, and walk us through every expression and explain how you interpret them?

I have tried putting u into the store (though after the DeclStmt visitor), however it is immediately marked as a dead symbol, and gets removed.

My suggestion was to still have liveness changes, except they affect declaration liveness, not expression liveness.

martong added inline comments.Jul 1 2022, 5:55 AM
clang/test/Analysis/live-bindings-test.cpp
118

Unrelated change, please remove.

Could you show us the AST we're dealing with, and walk us through every expression and explain how you interpret them?

I tried to create a detailed explanation of how I interpret things, what problems I'm facing, and what solutions I could find to
them.

1.) Binding by value

a) Built-in types

First I want to take a look at examples from built-in types like std::pair<>. The simplest snippet, that a developer might create is the
following:

std::pair<int, int> p{1, 2};
auto [u, v] = p;

The AST of the decomposition is the following:

|-DeclStmt
| `-DecompositionDecl 0x5649cbff2b88 used 'std::pair<int, int>':'std::pair<int, int>' cinit
|   |-CXXConstructExpr 'std::pair<int, int>':'std::pair<int, int>' 'void (const std::pair<int, int> &) noexcept'
|   | `-ImplicitCastExpr 'const std::pair<int, int>' lvalue <NoOp>
|   |   `-DeclRefExpr 'std::pair<int, int>':'std::pair<int, int>' lvalue Var 'p' 'std::pair<int, int>':'std::pair<int, int>'
|   |-BindingDecl u 'std::tuple_element<0, std::pair<int, int>>::type':'int'
|   | |-VarDecl implicit used u 'std::tuple_element<0, std::pair<int, int>>::type &&' cinit
|   | | `-CallExpr 'typename tuple_element<0UL, pair<int, int>>::type':'int' xvalue adl
|   | |   |-ImplicitCastExpr 'typename tuple_element<0UL, pair<int, int>>::type &&(*)(pair<int, int> &&) noexcept' <FunctionToPointerDecay>
|   | |   | `-DeclRefExpr 'typename tuple_element<0UL, pair<int, int>>::type &&(pair<int, int> &&) noexcept' lvalue Function 'get' 'typename tuple_element<0UL, pair<int, int>>::type &&(pair<int, int> &&) noexcept' (FunctionTemplate 'get')
|   | |   `-ImplicitCastExpr 'std::pair<int, int>':'std::pair<int, int>' xvalue <NoOp>
|   | |     `-DeclRefExpr 'std::pair<int, int>':'std::pair<int, int>' lvalue Decomposition 0x5649cbff2b88 '' 'std::pair<int, int>':'std::pair<int, int>'
|   | `-DeclRefExpr 'std::tuple_element<0, std::pair<int, int>>::type':'int' lvalue Var 'u' 'std::tuple_element<0, std::pair<int, int>>::type &&'
|   `- <<removed the second BindingDecl for readability>>

The way I interpret it, is that first a copy of p gets created, and that copy becomes the "decomposed object".
Then for each binding we take this "decomposed object" and pass it to std::get<>() as an xvalue, which also returns with an xvalue.
The returned value is used to initialize a variable corresponding to the identifiers, u and v in this case. When we encounter one
of the symbols in the code, we basically work with these variables.

The problem

There is only one DeclStmt, which holds the DecompositionDecl, and the indentifier holding var (i.e.: the VarDecls seen under BidningDecls)
initializers take this object as parameter.

If the holding var initializers are evaluated after the DeclStmt, we can't store the result, as there is no other DeclStmt we might visit.
If we evaluate the holding var initializers before the DeclStmt, technically the object they take as parameter doesn't exist.

My solution

Basically I evaluate the holding var initializers between the CXXConstructExpr and the DeclStmt by modifying the CFG.
That way the copy of the object is created, and this copy is automatically passed to the get<>() functions. Then, the return value of the
get<>() functions are kept alive in the environment until the DeclStmt is visited. Then I take these return values and place them into the
temporaray region of the store created by CXXConstructExpr in the same order as the indentifiers appear in the source code.

b.) User defined types

Every developer is allowed to define their own tuple-like types and overload the get<>() function. Let's assume, a developer creates a
get<>() overload, that returns by value.

struct Test { int u = 1; };

<<other definitions that make Test tuple-like>>

template <std::size_t I>
int get(Test &&t) { return t.u; }

Creating a structured binding to this type is the same as the one in the previous section.

Test p;
auto [u] = p;

The resulting AST is different thought.

|-DeclStmt 
| `-DecompositionDecl 0x55aa4a537a90 used 'Test':'Test' cinit
|   |-CXXConstructExpr 'Test':'Test' 'void (const Test &) noexcept'
|   | `-ImplicitCastExpr 'const Test' lvalue <NoOp>
|   |   `-DeclRefExpr 'Test' lvalue Var 'p' 'Test'
|   `-BindingDecl u 'std::tuple_element<0, Test>::type':'int'
|     |-VarDecl implicit used u 'std::tuple_element<0, Test>::type &&' cinit
|     | `-ExprWithCleanups 'int' xvalue
|     |   `-MaterializeTemporaryExpr 'int' xvalue extended by Var 'u' 'std::tuple_element<0, Test>::type &&'
|     |     `-CallExpr 'int' adl
|     |       |-ImplicitCastExpr 'int (*)(Test &&)' <FunctionToPointerDecay>
|     |       | `-DeclRefExpr 'int (Test &&)' lvalue Function 'get' 'int (Test &&)' (FunctionTemplate 'get')
|     |       `-ImplicitCastExpr 'Test':'Test' xvalue <NoOp>
|     |         `-DeclRefExpr 'Test':'Test' lvalue Decomposition 0x55aa4a537a90 '' 'Test':'Test'
|     `-DeclRefExpr 'std::tuple_element<0, Test>::type':'int' lvalue Var 'u' 'std::tuple_element<0, Test>::type &&'

The difference is that it's not directly the CallExpr that initializes the holding var, it is wrapped into a MaterializeTemporaryExpr and
an ExprWithCleanups node.

The problem

Apart from the previous section, the additional problem is that after evaluating the CallExpr we get the return value in the environment
like get<0>() -> 1, which is then copied to a temporary region because of the MaterializeTemporaryExpr. After that the return value
is removed from the environment along with the entry that points to this temporary region, so there is no way to access it in the DeclStmt
visitor.

My solution

Ignore the MaterializeTemporaryExpr, and carry the return value in the environment to the DeclStmt visitor. The rest is the same as in the
previous section.

2.) Binding by reference

In case of references the only difference is the DecompositionDecl initializer.

std::pair<int, int> p{1, 2};
auto &[u, v] = p;
|-DeclStmt 
| `-DecompositionDecl 0x55cf0d0a95f0 used 'std::pair<int, int> &' cinit
|   |-DeclRefExpr 'std::pair<int, int>':'std::pair<int, int>' lvalue Var 'p' 'std::pair<int, int>':'std::pair<int, int>'
|   |-BindingDecl u 'std::tuple_element<0, std::pair<int, int>>::type':'int'
|   | |-VarDecl implicit used u 'std::tuple_element<0, std::pair<int, int>>::type &' cinit
|   | | `-CallExpr 'typename tuple_element<0UL, pair<int, int>>::type':'int' lvalue adl
|   | |   |-ImplicitCastExpr 'typename tuple_element<0UL, pair<int, int>>::type &(*)(pair<int, int> &) noexcept' <FunctionToPointerDecay>
|   | |   | `-DeclRefExpr 'typename tuple_element<0UL, pair<int, int>>::type &(pair<int, int> &) noexcept' lvalue Function 'get' 'typename tuple_element<0UL, pair<int, int>>::type &(pair<int, int> &) noexcept' (FunctionTemplate 'get')
|   | |   `-DeclRefExpr 'std::pair<int, int>':'std::pair<int, int>' lvalue Decomposition 0x55cf0d0a95f0 '' 'std::pair<int, int> &'
|   | `-DeclRefExpr 'std::tuple_element<0, std::pair<int, int>>::type':'int' lvalue Var 'u' 'std::tuple_element<0, std::pair<int, int>>::type &'
|   `- <<removed the second BindingDecl for readability>>
The problem

Opposed to the case of binding by value, here no temporary object is created, that can be passed to get<>(), so the DeclStmt must be
evaluated before the get<>() calls. However if that happens, the opportunity to bind the return values to the store is lost, as there is no
other statement that allows us to place the value into the store.

My solution

I duplicate the DeclStmt in the CFG, so it is visited twice. Once before the get calls and once after it. So the evaluation looks like this:

  1. visit the DeclRefExpr
  2. visit the DeclStmt and bind the decomposition to the object
  3. evaluate the get<>() function
  4. visit the DeclStmt again and place the return values to the store

The proper solution

In this case the proper solution would be to evaluate the DeclStmt, so that the temporary object is created. Then create a separate variable
for each of the structured binding identifiers (the ones denoted by u or v in the above examples), and keep these variables alive somehow.

The BindingDecl stores a DeclRefExpr to the appropriate variable, so evaluating it is straightforward, if the variables exist.

isuckatcs updated this revision to Diff 441809.EditedJul 1 2022, 4:23 PM
isuckatcs marked an inline comment as done.

I removed all of the hacks. Also the HoldingVars are now lazyly created, which means,
creation happens at the first time of attempting to read their value. Until then the
environment holds their values (it even holds them after, removing them is a todo).

So this is better than the previous attempt, which was full of hacks, but the environment
is still polluted as it was in the first one. At the moment I don't see how could we evade that.

isuckatcs added inline comments.Jul 1 2022, 4:49 PM
clang/test/Analysis/live-bindings-test.cpp
112

Though we might pass a Mytuple to this function, that has it's fields initialized, so is the warning reported here a bug?

isuckatcs updated this revision to Diff 441960.Jul 3 2022, 9:05 AM

Added more testcases

isuckatcs updated this revision to Diff 441964.Jul 3 2022, 10:17 AM

Updated

clang/test/Analysis/live-bindings-test.cpp
112

Ignore this comment, it turned out to be a bug.

This comment was removed by isuckatcs.
NoQ added a comment.Jul 4 2022, 10:05 PM

The way I interpret it, is that first a copy of p gets created, and that copy becomes the "decomposed object".
Then for each binding we take this "decomposed object" and pass it to std::get<>() as an xvalue, which also returns with an xvalue.
The returned value is used to initialize a variable corresponding to the identifiers, u and v in this case. When we encounter one
of the symbols in the code, we basically work with these variables.

The problem

There is only one DeclStmt, which holds the DecompositionDecl, and the indentifier holding var (i.e.: the VarDecls seen under BidningDecls)
initializers take this object as parameter.
If the holding var initializers are evaluated after the DeclStmt, we can't store the result, as there is no other DeclStmt we might visit.
If we evaluate the holding var initializers before the DeclStmt, technically the object they take as parameter doesn't exist.

Ok interesting, so there's still a copy/move construction into the decomposition variable, and then the bindings are looking into that copy.

Given that there's copy/move construction, there's already a system in place to keep the variable alive: it's "objects under construction". So just like in case of normal variables, the DeclStmt goes last in the CFG, but as soon as we hit CXXConstructExpr, its ConstructionContext is supposed to tell us about the upcoming DeclStmt and its VarDecl. Then the objects-under-construction state trait serves as an anchor to mark the variable as live until we hit DeclStmt in the CFG, which is exactly as much time as we need - after that live variables analysis will keep it alive.

This is the existing solution, it's probably not perfect, but it's universal enough to keep everything uniform, as simple as it gets with these things. Like, I could have implemented it in CFG-based live variable analysis instead, but that doesn't cover 30 other subclasses of construction contexts. So instead of 30 different changes in the liveness analysis algorithm, we have a single path-sensitive state trait that keeps everything we need alive for as long as we need it:

/// ExprEngine::removeDead()
843 for (auto I : CleanedState->get<ObjectsUnderConstruction>()) {
844   if (SymbolRef Sym = I.second.getAsSymbol())
845     SymReaper.markLive(Sym);
846   if (const MemRegion *MR = I.second.getAsRegion())
847     SymReaper.markLive(MR);
848 }

So at a glance, unless something's wrong, the decomposition variable is supposed to be kept alive by this clause until its DeclStmt is hit, just because it's a memory region that's a target of construction. The object definitely already exists (in terms of the Standard) because it was recently passed as this into the constructor. It's also live with respect to our path-sensitive notion of liveness because the trait keeps it alive.

I guess there's a matter of holding vars liveness. They definitely deserve the same lifetime as the decomposition variable. That's one change in live variables analysis that's definitely necessary. Then, separately from that, we need to keep these variables alive *before* the DeclStmt (after CXXConstructExpr) by using the trait. We might use the objects-under-construction trait for that - add another clause to it "if I.first is a DeclStmt that defines a DecompositionDecl, keep holding var regions alive".

So it sounds like a plan to me - evaluate everything in its natural order and use the trait as a backup for liveness issues.

2.) Binding by reference

Ohh. So it's not going to bind by value anyway, instead it'll contextually convert the prvalue into xvalue (this is literally what materialization is, https://en.cppreference.com/w/cpp/language/implicit_conversion#Temporary_materialization, "let's take the number 5 and uniquely determine where in memory it's stored"). In this case CXXTempObjectRegion is indeed the correct value of the MaterializeTemporaryExpr in all cases, be it construction or a primitive type. Except if it's construction, the CXXTempObjectRegion already exists by the time we hit MaterializeTemporaryExpr (we've just used it as this inside the constructor), we just need to remember what it was, but in case of primitive types we need to create a fresh region when we hit the MaterializeTemporaryExpr, because we didn't need one yet.

isuckatcs updated this revision to Diff 442278.Jul 5 2022, 6:28 AM
  • Fixed an issue that caused a crash
  • Added more testcases
isuckatcs updated this revision to Diff 442384.Jul 5 2022, 1:22 PM
  • Created an additional DeclStmt for each HoldingVar in the CFG
  • Added more testcases

Note: std::pair<> is a tuple-like type, however in our system-header-simulator-cxx.h it is mocked as a regular, non tuple-like type.

isuckatcs updated this revision to Diff 442388.Jul 5 2022, 1:42 PM

Added a check on the return value of SVal::getAsRegion() before passing it to ProgramState::getSVal(). The MemoryRegion* taken as parameter is asserted later, so if it is a nullptr, clang crashes.

NoQ added a comment.Jul 5 2022, 10:03 PM

This patch is starting to look really beautiful!

Note: std::pair<> is a tuple-like type, however in our system-header-simulator-cxx.h it is mocked as a regular, non tuple-like type.

In this case it's better to correct the simulator. We're simulating things "as needed", probably the previous tests didn't rely on it being tuple-like. If they did, you can give them another mock structure that's not tuple-like.

NoQ added inline comments.Jul 5 2022, 10:08 PM
clang/test/Analysis/live-bindings-test.cpp
119

Ok so do we ideally want this to warn? I suspect that we do. Of course the constructor for DecompositionDecl may have weird side effects, but if they just wanted these side effects they would have created a regular variable, they didn't need to decompose anything. So it's dead code one way or the other, which warrants a warning (because that's the only thing the dead stores checker ever warns about).

How difficult do you think it is to make this test pass? Maybe add a FIXME here so that people knew they should address it later, or if they accidentally address it they knew it's a good thing?

isuckatcs marked an inline comment as done.Jul 6 2022, 3:18 AM
isuckatcs added inline comments.
clang/test/Analysis/live-bindings-test.cpp
119

It seems there are many cases, when we ignore dead code, see DeadStoreObs::observeStmt(). One of these cases is if the initializer of a DeclStmt is a CXXConstructExpr. So as a result we don't get the warning even for simpler statements like NonPOD a, b; a = b;, even though we get it for int a, b; a = b;.

In the example above, the initializer is a copy constructor, hence a CXXConstructExpr.

Looking at the explanation of why we ignore dead code, I think it will require some work to get that warning showing, so I put a FIXME there instead.

// Don't warn on C++ objects (yet) until we can show that their
// constructors/destructors don't have side effects.
if (isa<CXXConstructExpr>(E))
  return;
isuckatcs updated this revision to Diff 442500.Jul 6 2022, 3:40 AM
isuckatcs marked an inline comment as done.

Added a FIXME

NoQ accepted this revision.Jul 6 2022, 7:08 PM

Ok wonderful! This is the hardest patch so far, let's evaluate it on some real-world code before landing.

clang/test/Analysis/live-bindings-test.cpp
119

Aha great! Sounds like this could be an easy fix if we're into it.

This revision is now accepted and ready to land.Jul 6 2022, 7:08 PM

Added testcases for structured binding syntax 2 and 3.

This revision was landed with ongoing or failed builds.Jul 26 2022, 1:24 AM
This revision was automatically updated to reflect the committed changes.