Implement P2718R0 "Lifetime extension in range-based for loops" (https://wg21.link/P2718R0)
Details
- Reviewers
aaron.ballman cor3ntin erichkeane rsmith shafik hubert.reinterpretcast sammccall - Group Reviewers
Restricted Project
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Sorry, I missed the ping on Discord.
Thanks for working on this
I don't feel qualified to review this, but I don't think there are nearly enough tests.
FYI there is a previous attempt at this here https://reviews.llvm.org/D139586 - with some test suggestions
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | This isn't the right way to model the behavior here -- the presence or absence of an ExprWithCleanups is just a convenience to tell consumers of the AST whether they should expect to see cleanups later or not, and doesn't carry an implication of affecting the actual temporary lifetimes and storage durations. The outcome that we should be aiming to reach is that all MaterializeTemporaryExprs created as part of processing the for-range-initializer are marked as being lifetime-extended by the for-range variable. Probably the simplest way to handle that would be to track the current enclosing for-range-initializer variable in the ExpressionEvaluationContextRecord, and whenever a MaterializeTemporaryExpr is created, if there is a current enclosing for-range-initializer, mark that MaterializeTemporaryExpr as being lifetime-extended by it. |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | Awesome! Thanks a lot for your advice, this is very helpful! I want to take a longer look at it. |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | Thanks for your tips! I have a question that what's the correct way to extent the lifetime of CXXBindTemporaryExpr? Can I just materialize the temporary? It may replaced by MaterializeTemporaryExpr, and then I can mark it as being lifetime-extended by the for-range variable. |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | Eg. void f() { int v[] = {42, 17, 13}; Mutex m; for (int x : static_cast<void>(LockGuard(m)), v) // lock released in C++ 2020 { LockGuard guard(m); // OK in C++ 2020, now deadlocks } } BinaryOperator 0x135036220 'int[3]' lvalue ',' |-CXXStaticCastExpr 0x1350361d0 'void' static_cast<void> <ToVoid> | `-CXXFunctionalCastExpr 0x135036198 'LockGuard':'struct LockGuard' functional cast to LockGuard <ConstructorConversion> | `-CXXBindTemporaryExpr 0x135036178 'LockGuard':'struct LockGuard' (CXXTemporary 0x135036178) | `-CXXConstructExpr 0x135036140 'LockGuard':'struct LockGuard' 'void (Mutex &)' | `-DeclRefExpr 0x135035e18 'Mutex':'class Mutex' lvalue Var 0x135035b40 'm' 'Mutex':'class Mutex' `-DeclRefExpr 0x135036200 'int[3]' lvalue Var 0x135035928 'v' 'int[3]' |
Marking that as needing revision while the author explores the direction outlined by Richard :)
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | If MaterializeTemporaryExpr represents a "temporary materialization conversion", then the above should already have one just under the static_cast to void (since the cast operand would be a discarded-value expression). There may be unfortunate effects from materializing temporaries for discarded-value expressions though: Technically, temporaries are also created for objects having scalar type. Currently, MaterializeTemporaryExpr is documented as being tied to reference binding, but that is not correct: for example, MaterializeTemporaryExpr also appears when a member access is made on a temporary of class type. |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | http://eel.is/c++draft/class.temporary says: [Note 3: Temporary objects are materialized: ....... (2.6) when a prvalue that has type other than cv void appears as a discarded-value expression ([expr.context]). — end note] Seems we should materialized the discard-value expression in this case, WDYT? |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | I think we should, but what is the codegen fallout? Would no-opt builds start writing 42 into allocated memory for static_cast<void>(42)? |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | Thanks for your confirm @hubert.reinterpretcast ! I have tried locally, the generated IR of void f() is: define void @f()() { entry: %v = alloca [3 x i32], align 4 %m = alloca %class.Mutex, align 8 %__range1 = alloca ptr, align 8 %ref.tmp = alloca %struct.LockGuard, align 8 %__begin1 = alloca ptr, align 8 %__end1 = alloca ptr, align 8 %x = alloca i32, align 4 %guard = alloca %struct.LockGuard, align 8 call void @llvm.memcpy.p0.p0.i64(ptr align 4 %v, ptr align 4 @__const.f().v, i64 12, i1 false) %call = call noundef ptr @Mutex::Mutex()(ptr noundef nonnull align 8 dereferenceable(64) %m) %call1 = call noundef ptr @LockGuard::LockGuard(Mutex&)(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp, ptr noundef nonnull align 8 dereferenceable(64) %m) store ptr %v, ptr %__range1, align 8 %0 = load ptr, ptr %__range1, align 8 %arraydecay = getelementptr inbounds [3 x i32], ptr %0, i64 0, i64 0 store ptr %arraydecay, ptr %__begin1, align 8 %1 = load ptr, ptr %__range1, align 8 %arraydecay2 = getelementptr inbounds [3 x i32], ptr %1, i64 0, i64 0 %add.ptr = getelementptr inbounds i32, ptr %arraydecay2, i64 3 store ptr %add.ptr, ptr %__end1, align 8 br label %for.cond for.cond: ; preds = %for.inc, %entry %2 = load ptr, ptr %__begin1, align 8 %3 = load ptr, ptr %__end1, align 8 %cmp = icmp ne ptr %2, %3 br i1 %cmp, label %for.body, label %for.cond.cleanup for.cond.cleanup: ; preds = %for.cond %call5 = call noundef ptr @LockGuard::~LockGuard()(ptr noundef nonnull align 8 dereferenceable(8) %ref.tmp) #6 br label %for.end for.body: ; preds = %for.cond %4 = load ptr, ptr %__begin1, align 8 %5 = load i32, ptr %4, align 4 store i32 %5, ptr %x, align 4 %call3 = call noundef ptr @LockGuard::LockGuard(Mutex&)(ptr noundef nonnull align 8 dereferenceable(8) %guard, ptr noundef nonnull align 8 dereferenceable(64) %m) %call4 = call noundef ptr @LockGuard::~LockGuard()(ptr noundef nonnull align 8 dereferenceable(8) %guard) #6 br label %for.inc for.inc: ; preds = %for.body %6 = load ptr, ptr %__begin1, align 8 %incdec.ptr = getelementptr inbounds i32, ptr %6, i32 1 store ptr %incdec.ptr, ptr %__begin1, align 8 br label %for.cond for.end: ; preds = %for.cond.cleanup %call6 = call noundef ptr @Mutex::~Mutex()(ptr noundef nonnull align 8 dereferenceable(64) %m) #6 ret void } The AST of void f() is: |-FunctionDecl 0x14311d408 <line:62:1, line:69:1> line:62:6 used f 'void ()' | `-CompoundStmt 0x143808898 <col:10, line:69:1> | |-DeclStmt 0x14311d710 <line:63:3, col:25> | | `-VarDecl 0x14311d528 <col:3, col:24> col:7 used v 'int[3]' cinit | | `-InitListExpr 0x14311d648 <col:13, col:24> 'int[3]' | | |-IntegerLiteral 0x14311d590 <col:14> 'int' 42 | | |-IntegerLiteral 0x14311d5b0 <col:18> 'int' 17 | | `-IntegerLiteral 0x14311d5d0 <col:22> 'int' 13 | |-DeclStmt 0x14311d9f0 <line:64:3, col:10> | | `-VarDecl 0x14311d740 <col:3, col:9> col:9 used m 'Mutex':'Mutex' callinit destroyed | | `-CXXConstructExpr 0x14311d9b0 <col:9> 'Mutex':'Mutex' 'void ()' | `-CXXForRangeStmt 0x143808700 <line:65:3, line:68:3> | |-<<<NULL>>> | |-DeclStmt 0x14311e2c0 <line:65:16> | | `-VarDecl 0x14311dfa8 <col:16, col:49> col:16 implicit used __range1 'int (&)[3]' cinit | | `-ExprWithCleanups 0x14311e238 <col:16, col:49> 'int[3]' lvalue | | `-BinaryOperator 0x14311de38 <col:16, col:49> 'int[3]' lvalue ',' | | |-CXXStaticCastExpr 0x14311dde8 <col:16, col:46> 'void' static_cast<void> <ToVoid> | | | `-MaterializeTemporaryExpr 0x14311ddd0 <col:34, col:45> 'LockGuard':'LockGuard' xvalue extended by Var 0x14311dfa8 '__range1' 'int (&)[3]' | | | `-CXXFunctionalCastExpr 0x14311dd98 <col:34, col:45> 'LockGuard':'LockGuard' functional cast to LockGuard <ConstructorConversion> | | | `-CXXBindTemporaryExpr 0x14311dd78 <col:34, col:45> 'LockGuard':'LockGuard' (CXXTemporary 0x14311dd78) | | | `-CXXConstructExpr 0x14311dd40 <col:34, col:45> 'LockGuard':'LockGuard' 'void (Mutex &)' | | | `-DeclRefExpr 0x14311da18 <col:44> 'Mutex':'Mutex' lvalue Var 0x14311d740 'm' 'Mutex':'Mutex' | | `-DeclRefExpr 0x14311de18 <col:49> 'int[3]' lvalue Var 0x14311d528 'v' 'int[3]' | |-DeclStmt 0x143808588 <col:14> | | `-VarDecl 0x14311e360 <col:14> col:14 implicit used __begin1 'int *':'int *' cinit | | `-ImplicitCastExpr 0x143808400 <col:14> 'int *' <ArrayToPointerDecay> | | `-DeclRefExpr 0x14311e2d8 <col:14> 'int[3]' lvalue Var 0x14311dfa8 '__range1' 'int (&)[3]' | |-DeclStmt 0x1438085a0 <col:14> | | `-VarDecl 0x143808248 <col:14, col:16> col:14 implicit used __end1 'int *':'int *' cinit | | `-BinaryOperator 0x143808468 <col:14, col:16> 'int *' '+' | | |-ImplicitCastExpr 0x143808450 <col:14> 'int *' <ArrayToPointerDecay> | | | `-DeclRefExpr 0x14311e2f8 <col:14> 'int[3]' lvalue Var 0x14311dfa8 '__range1' 'int (&)[3]' | | `-IntegerLiteral 0x143808430 <col:16> 'long' 3 | |-BinaryOperator 0x143808628 <col:14> 'bool' '!=' | | |-ImplicitCastExpr 0x1438085f8 <col:14> 'int *':'int *' <LValueToRValue> | | | `-DeclRefExpr 0x1438085b8 <col:14> 'int *':'int *' lvalue Var 0x14311e360 '__begin1' 'int *':'int *' | | `-ImplicitCastExpr 0x143808610 <col:14> 'int *':'int *' <LValueToRValue> | | `-DeclRefExpr 0x1438085d8 <col:14> 'int *':'int *' lvalue Var 0x143808248 '__end1' 'int *':'int *' | |-UnaryOperator 0x143808668 <col:14> 'int *':'int *' lvalue prefix '++' | | `-DeclRefExpr 0x143808648 <col:14> 'int *':'int *' lvalue Var 0x14311e360 '__begin1' 'int *':'int *' | |-DeclStmt 0x14311dee0 <col:8, col:50> | | `-VarDecl 0x14311de78 <col:8, col:14> col:12 x 'int' cinit | | `-ImplicitCastExpr 0x1438086d0 <col:14> 'int' <LValueToRValue> | | `-UnaryOperator 0x1438086b8 <col:14> 'int' lvalue prefix '*' cannot overflow | | `-ImplicitCastExpr 0x1438086a0 <col:14> 'int *':'int *' <LValueToRValue> | | `-DeclRefExpr 0x143808680 <col:14> 'int *':'int *' lvalue Var 0x14311e360 '__begin1' 'int *':'int *' | `-CompoundStmt 0x143808880 <line:66:3, line:68:3> | `-DeclStmt 0x143808868 <line:67:5, col:23> | `-VarDecl 0x143808778 <col:5, col:22> col:15 guard 'LockGuard':'LockGuard' callinit destroyed | `-CXXConstructExpr 0x143808820 <col:15, col:22> 'LockGuard':'LockGuard' 'void (Mutex &)' | `-DeclRefExpr 0x1438087e0 <col:21> 'Mutex':'Mutex' lvalue Var 0x14311d740 'm' 'Mutex':'Mutex'
I have got this result with my locally build clang define noundef i32 @main() { entry: %retval = alloca i32, align 4 %ref.tmp = alloca i32, align 4 store i32 0, ptr %retval, align 4 store i32 42, ptr %ref.tmp, align 4 call void @f()() ret i32 0 } |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 |
@rsmith, in terms of the AST, do you have advice on the approach we should take in generating MaterializeTemporaryExprs for discarded-value expressions? Do we go with a language-specification purest direction and generate these for all prvalue object-typed discarded-value expressions? Do we limit generating them to range-for initializers? Do we further specialize and limit to non-scalar types or, even more specifically, to cases involving constructors or destructors? |
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | The cleanest solution seems to be that we create all the language-specified temporary materialization conversions; that approach will minimize the chance of subtle bugs in other areas of language semantics, and give a more faithful AST to external AST consumers. I think I have two questions:
I think the first step should be to add the code to create the new MTEs. Then we can explore points (1) and (2) to decide if we do so only within for-range initializers or everywhere. |
Thanks for your comments and sorry for the very late reply, I have been investigating how to achieve it these days.
clang/lib/Sema/SemaExprCXX.cpp | ||
---|---|---|
8901–8914 | Seems it was ignored before: https://github.com/llvm/llvm-project/blob/cb63fa00d1f723e3b4e2fb5e6645595eb0a6e84c/clang/lib/Sema/SemaExprCXX.cpp#L8219-L8228
I try to implement as wording by the standard and add a new flags variable CanIgnore to identify that the MaterializeTemporaryExpr can be ignored, but it broken 300+ tests, and I will try to fix these.
Can I just dump the status of BumpPtrAllocator? or does clang has any fancy option to do this? |
Sorry for the late reply. I tried to investigate the memory impact of creating these additional materializations by build the whole llvm-project and compare the number of MaterializedTemporaryExpr created during parsing.
Steps:
- Add a public member uint64_t NumMetarilizedTemporaryExpr = 0; in ASTContext .
- Increment the value of NumMetarilizedTemporaryExpr in Sema::CreateMaterializeTemporaryExpr.
- Write the NumMetarilizedTemporaryExpr into a text file when ASTContext destruction, each translation unit will append a line to the file to record the value of NumMetarilizedTemporaryExpr .
- Build the entire llvm-project separately using the compiler that creates addational materializations and the compiler that doesn't.
- Sum the numbers produced by each translation unit.
The result is:
Item | Count |
Addational Materialized Temporarys Total | 50655585 |
Clang Trunk Total | 18346347 |
Diff | 32309238 |
The more detail result in https://yronglin.github.io
The gap between these two numbers is very large. So I'think we can create additional materializations only within for-range initializers. I'm not sure if I can find a way to only create materializes for temporaries that need to have an extended lifetime, WDYT?
@rsmith asked for something slightly different (but I am not sure how to collect that info): He wanted the AST size delta.
For what we have, I think the detailed results table would be more useful if we add some size metric for the files and the increase expressed as a percentage. Perhaps sorting by file size would help contextualize the significance of the percentage change.
Thanks to Aaron's suggestion in Discard, the initial idea was to count the number of MaterializedTemporaryExpr created in the process of building llvm-project, and then calculate the precise memory increment data, without being affected by the operating system state. Maybe I can refine this table, add a column to show the memory increment percentage change of each file.
Only create addational metarialized temporary in for-range-init.
FIXME: Need to handle function default argument, and add more test to make sure the generated LLVM IR is correct.
FIXME: Need add test in clang/test/CXX/special/class.temporary/p6.cpp to check generated LLVM IR.
Should we dump the real default argument AST but not only CXXDefaultArgExpr if CXXDefaultArgExpr has been rewritted?
@cor3ntin I have reused EnsureImmediateInvocationInDefaultArgs to rewrite CXXDefaultArgExpr, does this is a correct approach?
I have updated the table, and it was sorted by Count Inc col.
Num files: 7445
Item | Count | Mem |
Addational Materialized Temporarys Total | 50655585 | 1187240.2734 (KB) |
Clang Trunk Total | 18346347 | 429992.5078 (KB) |
Diff | 32309238 | 757247.7656 (KB) |
Avg | 4339.7230 | 4.2380 (KB/file) |
Fix ci failure, and introduce an variable in ExpressionEvaluationContextRecord to rewrite default argument.
Do not spass MaterializePRValueInDiscardStatement in PushExpressionEvaluationContext.
I'm really not qualified to give you meaningful feedback on design here, but I spotted a few minor things that can be improved, I hope it helps!
clang/include/clang/Sema/Sema.h | ||
---|---|---|
1342 | Can IsRewriteDefaultArgument and MaterializePRValueInDiscardStatement have different values? | |
1344–1346 | ||
1346 | Either IsInLifetimeExtendingContext or IsInCXXForRangeInitializer seem like better names | |
1348–1357 | ||
9774 | ||
clang/lib/Parse/ParseDecl.cpp | ||
2330–2333 ↗ | (On Diff #551871) | We need to decide whether we want to backport thgis behavior to older language mode, i think that would make sense. |
2340–2344 ↗ | (On Diff #551871) | A discarded statement is the non-taken branch of an if constexpr, so using discarded expression (or discarded value expression) will lead to less confusion. |
clang/lib/Sema/SemaExpr.cpp | ||
18168 ↗ | (On Diff #551871) | Whitespace only change |
clang/lib/Sema/SemaExprCXX.cpp | ||
8222–8223 | Correct me if I'm wrong but the thing we want to avoid here is to avoid creating unnecessary AST nodes, right? | |
clang/lib/Sema/SemaInit.cpp | ||
7660 ↗ | (On Diff #551871) |
@cor3ntin Thanks for your review!
clang/include/clang/Sema/Sema.h | ||
---|---|---|
1342 | Currently they have the same value, but they mean different things, so I use two separated variables. | |
1346 | Thanks for your suggestion, IsInLifetimeExtendingContext looks good to me, I can use this var name later. | |
clang/lib/Sema/SemaExpr.cpp | ||
6250 ↗ | (On Diff #551871) | @cor3ntin Does EnsureImmediateInvocationInDefaultArgs need to be renamed to a more generic name? |
18168 ↗ | (On Diff #551871) | Thanks, I'll remove this. |
clang/lib/Sema/SemaExprCXX.cpp | ||
8222–8223 | Yeah, ignore unnecessary AST nodes and runtime memory allocation. The redundant alloca/store instructions may be removed during the optimization phase. Eg. static_cast<void>(42); define noundef i32 @main() { entry: %retval = alloca i32, align 4 %ref.tmp = alloca i32, align 4 store i32 0, ptr %retval, align 4 store i32 42, ptr %ref.tmp, align 4 call void @f()() ret i32 0 } |
The changes in SemaInit.cpp don't look correct to me. Trying to recurse through the initializer and find every nested temporary is highly error-prone; I can't tell you which expression nodes you forgot to recurse through, but I'm sure there are some.
I think the approach being taken here is not really maintainable. We don't want the initialization code to need to know how to recurse through an expression after the fact and find all the temporaries that might need to be lifetime-extended, and we don't need it to do that either. Instead, we should make the expression evaluation context track the current for-range variable, and in Sema::CreateMaterializeTemporaryExpr, we should create a temporary that is already set to be lifetime-extended by the loop variable.
clang/include/clang/Sema/Sema.h | ||
---|---|---|
1341–1342 | Can you expand this comment to more fully describe what this flag governs? Which default argument? How would it be rewritten? | |
clang/lib/Parse/ParseDecl.cpp | ||
2330–2333 ↗ | (On Diff #551871) | Definitely not. This is a visible behavior change to a rule that was not in any way incorrect or broken, and for which there were no implementation concerns. The change was not voted into the standard as a DR (see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4929.html). Applying this behavior in older standard versions would be a conformance bug. |
@rsmith Thanks a lot for your comments and sorry for the very late reply, I was on vacation some time ago.
I agree, I have reverted the changes in SemaInit.cpp. I have ever tried to do lifetime-extend in Sema::CreateMaterializeTemporaryExpr, but I fall into some trouble, the MaterializeTemporaryExpr was created before the for-range VarDecl, so we don't have a VarDecl for MaterializeTemporaryExpr::setExtendingDecl in Sema::CreateMaterializeTemporaryExpr, I have a possible solution here, Eg. we allocate a memory block which has same size of VarDecl, and pass this pointer to MaterializeTemporaryExpr::setExtendingDecl as a placeholder VarDecl, when we build the for-range statement, we just construct a VarDecl in the memory block that we allocated before. But this approach doesn't look very good and not maintainable. So I use ForRangeInitTemporaryLifetimeExtensionVisitor to visit every temporaries in the initializer and extend the lifetime. And ForRangeInitTemporaryLifetimeExtensionVisitor was derived from RecursiveASTVisitor, maybe this approach be able to handle all temporaries. WDYT?
clang/include/clang/Sema/Sema.h | ||
---|---|---|
1341–1342 | Yeah, this variable has been removed. By default, all CallExprs in Clang share the same CXXDefaultArgExpr from parameters, but in some contexts (such as lifetime extension), the lifetime of the temporaries in the current default parameters needs to be extended, so the CXXDefaultArgExpr needs to be copied and the lifetime of temporaries in the copy need to be extended. |
@yronglin We are sorry it takes so much time to get feedback. Richard and Hubert have little bandwidth for reviews. Others, including me, don't feel qualified to provide good feedback.
@Endill Thanks for your reply, let's wait for them have time. You are clang experts, please feel free to comments.
I'm also not an authority here, but would like to help get this unstuck. It seems like how to tie the MaterializeTemporaryExprs to the lifetime-extending decl is the biggest outstanding question?
I think the RecursiveASTVisitor solution is still fragile.
Looking at BuildForRangeVarDecl, it seems it doesn't need any inputs that would preclude building it earlier, before parsing the range expression.
I haven't looked at *exactly* how to restructure the interaction between Parser/Sema to enable this, maybe it's a little ugly, but it seems justified in order to be able to use ExpressionEvaluationContextRecord as intended.
clang/lib/Sema/SemaStmt.cpp | ||
---|---|---|
2741 | I think this will walk into the body of lambdas and extend the lifetime of temporaries found there. You could explicitly prevent this, but this looks like the same sort of fragility Richard was concerned with: Expression evaluation context seems to be the right way to control the propagation, and it'd be nice not to reinvent it. |
Many thanks for your review!
I have canceled RecursiveASTVisitor plan. I try to use ExpressionEvaluationContextRecord and collect temporaries in CreateMaterializeTemporaryExpr when parsing for-range-init, and then extend lifetime of the collected temporaries in ActOnCXXForRangeStmt. WDYT?
Can IsRewriteDefaultArgument and MaterializePRValueInDiscardStatement have different values?
Maybe MaterializePRValueInDiscardStatement is sufficient