This is an archive of the discontinued LLVM Phabricator instance.

[Coroutines] Reuse storage for local variables with non-overlapping lifetimes
ClosedPublic

Authored by ChuanqiXu on Sep 13 2020, 11:09 PM.

Details

Summary

bug 45566 shows the process of building coroutine frame won't consider that the lifetimes of different local variables are not overlapped, which means the compiler could generates smaller frame.

struct big_structure {
    big_structure();
    char dummy[500];
};
cppcoro::task<void> alternative_paths(bool cond) {
   // 1072 byte coroutine frame
    if (cond) {
        big_structure a;
        process(a);
        co_await something();
    } else {
        big_structure b;
        process2(b);
        co_await something();
    }
}
void normal_function(bool cond) {
    // 512 byte stack frame
    if (cond) {
        big_structure a;
        process(a);
    } else {
        big_structure b;
        process2(b);
    }
}

This patch calculate the lifetime range of each alloca by StackLifetime class. Then the patch build non-overlapped sets for allocas whose lifetime ranges are not overlapped. We use the largest type in a non-overlapped set as the field type in the frame. In insertSpills process, if we find the type of field is not the same with the alloca, we cast the pointer to the field type to the pointer to the alloca type. Since the lifetime range of alloca in one non-overlapped set is not overlapped with each other, it should be ok to reuse the storage space in the frame.

Test plan: check-llvm, check-clang, cppcoro, folly

Diff Detail

Event Timeline

ChuanqiXu created this revision.Sep 13 2020, 11:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 13 2020, 11:09 PM
ChuanqiXu requested review of this revision.Sep 13 2020, 11:09 PM

Thank you for working on this optimization!

This type of optimizations is typically not enabled in O0, and we probably don't want to either. Should we gate it under the optimization level?

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
365–369

Related to the optimization level suggestion, if we are to gate this to optimization levels, we should also put all of these into a local analyzer class, and give it an interface to get the info of each alloca, such that the caller can be as independent from this optimization as possible.

484

What is this referring to? Allocas in this function seems to be SmallVector type.

509–512

I believe you can obtain the list of suspends through Shape.CoroSuspends?

Thank you for working on this optimization!

This type of optimizations is typically not enabled in O0, and we probably don't want to either. Should we gate it under the optimization level?

This optimization depends on lifetime markers, which would not be produced in O0. So this optimization won't be enabled in O0.

Thank you for working on this optimization!

This type of optimizations is typically not enabled in O0, and we probably don't want to either. Should we gate it under the optimization level?

This optimization depends on lifetime markers, which would not be produced in O0. So this optimization won't be enabled in O0.

That's not entirely true though. lifetime markers could be enabled with ASAN in O0 for instance.

ChuanqiXu added inline comments.Sep 14 2020, 8:06 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
484

We add the field for allocas in addFieldForAllocas, where the allocas are in DenseMap. I think it is more clear to move the comment to addFieldForAllocas function.

Thank you for working on this optimization!

This type of optimizations is typically not enabled in O0, and we probably don't want to either. Should we gate it under the optimization level?

This optimization depends on lifetime markers, which would not be produced in O0. So this optimization won't be enabled in O0.

That's not entirely true though. lifetime markers could be enabled with ASAN in O0 for instance.

Oh, I don't recognize there are other cases. I would like to refactor this optimization into a local analyzer pass.

lxfind added inline comments.Sep 14 2020, 8:20 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
484

I see. In that case, my understanding is that the order of allocas in the frame may be different with the order in the source code not because we use DenseMap but because we sort using the size.

ChuanqiXu added inline comments.Sep 14 2020, 8:55 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
484

Yes. In my experiments, the mismatch order happens after I sort the allocas. I think DenseMap may also be responsible to this mismatch in some conditions. Because we add the field for allocas by such way:

for (Set in NonOverlappedSet)
    add field for first alloca in the Set

Because NonOverlappedSet is a DenseMap, the order we add field is not guaranteed. I think both the sorts and the DenseMap are responsible for the mismatch. I would note this in the comment later.

lxfind added inline comments.Sep 14 2020, 9:22 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
484

hmm good point. In that case we have a real problem here. We would want the compilation result to be deterministic, which means we need a way to guarantee the order when iterating the map.
Why is NonOverlapedAllocas a DenseMap though? Looking at the way you are adding elements to it, seems like it can very well just be a SmallVector.

ChuanqiXu added inline comments.Sep 14 2020, 9:47 PM
llvm/lib/Transforms/Coroutines/CoroFrame.cpp
484

The compilation result is deterministic although we can't get the result by just a watch. I think it is a good idea to let NonOverlappedSet be a SmallVector. However, as we would sort the allocas by size, the order in the frame would still mismatch the order in the source. I think the order of allocas in the frame may not matter a lot.

Gate this optimization under optimization level and put the optimization code into a helper class.

lxfind added inline comments.Sep 17 2020, 11:57 AM
llvm/lib/Passes/PassBuilder.cpp
898

I don't quite understand how the pass registration works... Seems that here we are creating the CoroSplitPass with level passed in already. Why do we need to do it again in PassRegistry.def?

ChuanqiXu added inline comments.Sep 20 2020, 2:17 AM
llvm/lib/Passes/PassBuilder.cpp
898

I think the code in PassRegistry.def is dealing with pass name parsing with opt tool. Maybe we need to invoke coro-split pass individually:

opt -coro-split ...

Thanks for the improvements. I believe the implementation can still be cleaner:
Instead of passing all those parameters to the constructor of FrameTypeBuilder, we could:

  • Move all the implementation details into addFieldForAllocas function. We don't even need the NonOverlappedAllocaInfo class, all variables and the body of run() could go in there.
  • We can change the type of ForSpill member of Field from pointer to a SmallVector of Spill. That way, you can handle ForSpill universally without knowing whether this optimization was turned on or not.
llvm/lib/Passes/PassBuilder.cpp
1728–1733

I think you can use the same code like this, which appears to be simpler.

  • Move all the implementation details into addFieldForAllocas function.
  • Change the type of ForSpill member of Field from pointer to a SmallVector of Spill.
ChuanqiXu added inline comments.Sep 21 2020, 11:07 PM
llvm/lib/Passes/PassBuilder.cpp
1728–1733

I am wondering how should I handle unknown ParamName in this style. Should I just skip it? Or should I return an error? which implies that we should add a new static member to PassBuilder::OptimizationLevel as PassBuilder::OptimizationLevel::Unknown.

lxfind added inline comments.Sep 21 2020, 11:43 PM
llvm/lib/Passes/PassBuilder.cpp
1728–1733

To be honest I don't think coroutine requires a fine-grained optimization level control from 0-3. So it would be sufficient if we just use a bool to indicate whether it will be optimized. O1,O2,O3 will set it to true, otherwise it can stay false.
Alternatively, if you prefer to use OptimizationLevel, I think it's fine to set it to O0 if it's unknown level.

However, at a high level, I am not sure if it's worth all the trouble adding all those changes to PassBuilder just to be able to specify the syntax coro-split<O2> to opt tool. Since you already added a separate flag to control this, which is sufficient to test opt, as long as the optimization is gated when it's invoked through Clang optimization pipeline, I would say it's good enough.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
361

Use ForSpillType?

390

I think we can just define one version of addFieldForAlloca with ForSpillType ForSpills = {}

436

Do we need 2 versions of addField? Would it work if we set ForSpillType ForSpill = {} as the parameter?

ChuanqiXu updated this revision to Diff 293373.Sep 22 2020, 1:17 AM
  • Remove the feature that we can pass optimization level to opt tool like -passes='coro-split<On>'
  • remove redundant addFieldForAlloca and addField
ChuanqiXu added inline comments.Sep 22 2020, 1:20 AM
llvm/lib/Passes/PassBuilder.cpp
1728–1733

I think your suggestion is right. The coroutine split pass don't need a fine-grained optimization control mechanism now.

llvm/lib/Transforms/Coroutines/CoroFrame.cpp
361

thanks, done

390

done

436

done

lxfind added inline comments.Sep 22 2020, 8:47 AM
llvm/lib/Transforms/Coroutines/Coroutines.cpp
62

So what happens if the compiler is passed with "-Os" or "-Oz"?

lxfind accepted this revision.Sep 22 2020, 8:54 AM

Overall LGTM. Thank you for addressing the feedback!
One last thing, the optimization level code can be further simplified. Since you are obtaining the OptLevel from the PassManagerBuilder, which is an integer. I feel it would be easier if you just stick with an integer optlevel instead of converting it to OptimizationLevel. The reason OptimizationLevel exists is to be able to model both OptLevel and SizeLevel at the same time, but here you don't really care about the SizeLevel. So I would suggest just use an int for it and save all the trouble converting.

llvm/lib/Passes/PassBuilder.cpp
1933

Nit: Unnecessary change

llvm/lib/Transforms/Coroutines/Coroutines.cpp
62

Never mind. That won't affect OptLevel.

This revision is now accepted and ready to land.Sep 22 2020, 8:54 AM

Change the type of OptLevel in Coro Split Passes as unsigned instead of PassBuilder::OptimizationLevel.

Overall LGTM. Thank you for addressing the feedback!
One last thing, the optimization level code can be further simplified. Since you are obtaining the OptLevel from the PassManagerBuilder, which is an integer. I feel it would be easier if you just stick with an integer optlevel instead of converting it to OptimizationLevel. The reason OptimizationLevel exists is to be able to model both OptLevel and SizeLevel at the same time, but here you don't really care about the SizeLevel. So I would suggest just use an int for it and save all the trouble converting.

Thanks for your suggestion!

Please remove unnecessary headers include and declarations

lxfind added inline comments.Sep 23 2020, 8:46 PM
llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

This isn't needed anymore

ChuanqiXu updated this revision to Diff 293983.Sep 24 2020, 2:38 AM

Remove unnecessary headers and declarations.

llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

If CoroSplitPass don't accept PassBuilder::OptimizationLevel as a construct argument, we may need to transfer PassBuilder::OptimizationLevel to unsigned in PassBuilder.cpp.

lxfind added inline comments.Sep 24 2020, 10:03 AM
llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

Yes just call getSpeedupLevel() on Level when passing it. This optimization is independent to the size optimization flags, just speed.

lxfind added inline comments.Sep 24 2020, 10:05 AM
llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

Well, I guess it's reasonable to run it when size optimization is turned on. Feel free to just move the call to std::max to the callsite.

lxfind added inline comments.Sep 24 2020, 10:35 AM
llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

This also gets a bit fuzzy because we are mixing speed level and size level now. It goes back to my earlier suggestion, where we may just need to use a boolean to track whether coroutine needs to be optimized. I doubt we need a fine-grained control on the optimization level for coroutines.

ChuanqiXu updated this revision to Diff 294255.Sep 25 2020, 2:14 AM

instead OptLevel in CoroSplit with bool to control whether the compiler would optimize a coroutine.

ChuanqiXu added inline comments.Sep 25 2020, 2:16 AM
llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

Yes, I think we should do simple things simple. It is fine to control the optimization behavior by a bool condition.

junparser added inline comments.Sep 26 2020, 9:13 PM
llvm/include/llvm/Transforms/Coroutines/CoroSplit.h
28–32

Would you please change WouldOptimize to other meaningful words.

ChuanqiXu updated this revision to Diff 294544.Sep 27 2020, 2:54 AM

rename wouldOptimize to ReuseFrameSlot.

junparser accepted this revision.Sep 27 2020, 11:00 PM

LGTM, thanks for the patch!