This is an archive of the discontinued LLVM Phabricator instance.

[ArgPromotion][AMDGPU] New MSSA-based function argument promotion pass with input/output argument support
Needs RevisionPublic

Authored by vpykhtin on Feb 4 2022, 9:34 AM.

Details

Reviewers
rampitec
arsenm
foad
jdoerfert
asbirlea
aeubanks
nikic
Group Reviewers
Restricted Project
Summary

Targets like AMDGPU can have performance benefit when replacing an argument passed by
reference with an argument passed by value on input and/or return value on output.
This isn't only simplifies the function body but also allows SROA for allocas for such
pointer arguments and helps analysis passes which quit on escaping pointers.

Although submitted very late this can be viewed as an early-preview. This definitely
requires more testing but I would like to know if I'm missing something fundamental.
Despite that this code has already been tested on some applications and seem working
to some extent.

I decided not to modify existing ArgumentPromotion pass because the change is substantial
and the new pass would allow gradual involvement for the concerned targets. Biggest change
is the use of MSSA for clobber testing and support for input/output arguments. So far GEPs
aren't supported but supposed to.

ArgumentPromotion code has been used as the source of inspiration and fragments of code. Another
used source is 'promoteLoopAccessesToScalars' function from LICM which does very similar thing
for loops. The way how a pointer can be treated as a safe to promote for an output argument
has been borrowed from there.

The original ArgumentPromotion has been submitted by Chris Lattner in 2004 and has another 78
contributors since then so I'm not really sure who to add as a reviewer here, please advise or add.

Clobber testing using MSSA is the most critical part here not only for performance reasons but
also for correctness - I would like it to be thoroughly reviewed.

Diff Detail

Event Timeline

vpykhtin created this revision.Feb 4 2022, 9:34 AM
vpykhtin requested review of this revision.Feb 4 2022, 9:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2022, 9:34 AM

This looks like a better version of AMDGPURewriteOutArguments which was never enabled. I wanted to replace that to use MSSA

Can you check if the testcases in llvm/test/CodeGen/AMDGPU/rewrite-out-arguments.ll are redundant and/or covered by the ones you add here?

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
176

dyn_cast instead of isa and cast

179

dyn_cast instead of isa and cast

185

No else after return

325–327

Weird formatting

814

The terminator can't be null

825

I would assume this happens in the regular verifier with expensive checks?

857

Demorgan this

913

Capitalize

929

Braces

955

Braces

1386

It's weird to have raw deletes in llvm code, why do you need this here?

llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
1104

I'd like to see some tests with more exotic memory operations (atomics, memcpy/memset, target memory intrinsics etc.).

I also don't see negative tests for some of the skipped conditions (e.g. musttail). Can you also add a test for invoke, and captured function address

arsenm added inline comments.Feb 4 2022, 11:10 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
746

Why require all the way to -O3? -O2?

rampitec added inline comments.Feb 4 2022, 12:04 PM
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
308

else after continue is not needed.

780

A single function and BBEntryValue/BBExitValue as an argument?

1345

Don't you need to add at least MemorySSAWrapperPass and AAResultsWrapperPass?

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

arsenm added a comment.Feb 4 2022, 1:02 PM

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

The idea is to allow SROA in a caller, what's the point of doing it on a non-alloca pointers?

nikic requested changes to this revision.Feb 7 2022, 12:30 AM
nikic added a subscriber: nikic.

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

This revision now requires changes to proceed.Feb 7 2022, 12:30 AM
vpykhtin updated this revision to Diff 406419.Feb 7 2022, 6:05 AM

Some of the per review issues addressed.

arsenm added a comment.Feb 7 2022, 6:11 AM

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

The idea is to allow SROA in a caller, what's the point of doing it on a non-alloca pointers?

For return values, we can support way more values returned in registers than in pointer passed parameters. The same applies to passed parameters, we could pull more arguments into registers

vpykhtin marked 11 inline comments as done.Feb 7 2022, 6:29 AM

Can you check if the testcases in llvm/test/CodeGen/AMDGPU/rewrite-out-arguments.ll are redundant and/or covered by the ones you add here?

@arsenm I looked into the test and its not directly suitable for this pass as behaives differently, however the cases can be used for testing.

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

@nikic Thanks, why is this better? (half done BTW)

llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
746

I thought its quite "agressive" kind of opt, may be 02 should suite better.

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
325–327

lint however tries to convince me the previous formatting was better.

825

Those checks aren't enough here because I modify MSSA and then reuse it for the next promoted argument.

1345

Well I cannot use it because this is a CallGraph pass and those analyses are available per function. Instead I use FunctionAnalysisManager to get per function results, the drawback is that I have to invalidate it. However this is only for the legacy pass manager.

1386

It was taken from ArgumentPromotion code, will take a look if its really needed.

llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
1104

Right, will add more tests. However current implementation is quite conservative as relies on MSSA to find clobbers, it requires additional false-clobber checks as was done by Stas in https://reviews.llvm.org/D118419.

ormris removed a subscriber: ormris.Feb 7 2022, 10:29 AM

This probably needs to be limited to only private and flat pointers in case of the AMDGPU. A target callback may be needed to check if a pointer argument is beneficial to promote.

I don't see why the type matters at all

The idea is to allow SROA in a caller, what's the point of doing it on a non-alloca pointers?

For return values, we can support way more values returned in registers than in pointer passed parameters. The same applies to passed parameters, we could pull more arguments into registers

Hm... That makes sense. We have discussed it offline with Valery, likely we do not need a target check for profitability. We would however want to limit a number of such promotions so that we do not accidentally turn a global store into a private store if we ran out of output registers.

nikic added a comment.Feb 8 2022, 2:32 AM

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

@nikic Thanks, why is this better? (half done BTW)

This method is deprecated and will be removed in the future, see https://llvm.org/docs/OpaquePointers.html for context.

Please ensure your implementation does not depend on calls to getPointerElementType() -- you can determine the type based on load/store instructions.

@nikic Thanks, why is this better? (half done BTW)

This method is deprecated and will be removed in the future, see https://llvm.org/docs/OpaquePointers.html for context.

See https://reviews.llvm.org/D118685 changes to the original argpromo pass to make it work with opaque pointers (which makes it nicer in general).

vpykhtin updated this revision to Diff 408386.Feb 14 2022, 5:38 AM
vpykhtin marked 2 inline comments as done.

@nikic please take a look now, I've addressed the issue with opaque pointers.

Also changed namings for arguments and loaded values so they match existing pass to simplify testing.

vpykhtin updated this revision to Diff 408460.Feb 14 2022, 9:39 AM

Updated opaque-ptr.ll test to use ALL prefix to simplify checks

vpykhtin updated this revision to Diff 409608.Feb 17 2022, 5:35 AM

@arsenm I've added inoutargs2.ll test which is modified rewrite_out_arguments.ll. Original checks are saved with REF prefix, differences marked with "; ***" prefix, please take a look. Most of the differences are related to bitcasts which my patch doesn't support yet, I'm not sure if it should be included in this commit

rampitec added inline comments.Feb 17 2022, 10:54 AM
llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll
1

Fix file mode.

nikic added inline comments.Feb 18 2022, 3:41 AM
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
711

For the actual promotion, have you considered making use of PromoteMemToReg? Basically, replace the old argument with an alloca that is stored on entry and read on exit, and then run mem2reg on that alloca?

define i32 @test(i32 %arg) {
  %old.arg = alloca i32
  store i32, i32* %old.arg
  // Code uses old.arg
  %ret = load i32, i32* %old.arg
  ret i32 %ret
}
llvm/test/Transforms/ArgumentPromotion/inoutargs.ll
1

Please use update_test_checks.py.

vpykhtin added inline comments.Feb 18 2022, 11:25 AM
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
711

I think this is a good idea to reuse the working code, however I would rather make some sort of refactoring on it than trying to please it :) There is similar code in LICM promoteLoopAccessesToScalars, so it seems we would benefit of such mem2reg framework.

xbolva00 added inline comments.
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
1

So.. better version of ArgumentPromotion? Any plans to replace ArgumentPromotion with MSSAArgPromotionPass?

vpykhtin added inline comments.Feb 20 2022, 3:11 AM
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
1

For now it lacks of GEP, bitcast and byval struct passing support but eventually it will be added. I'm not sure if every target would benefit of [in/]out argument support and assume it would be done on per target basis: at least it would require to take in account added register pressure (if its an issue) with target-specific callback. There is no such callback in this patch yet and I'm going to add it later.

nikic added a comment.Feb 20 2022, 3:35 AM

From a cursory look at the implementation, does this handle unwinding properly?

store i32 0, ptr %arg
call void @may_unwind() readnone
store i32 1, ptr %arg

I believe promotion is not possible in this case, because there's no good way to provide the new value on the unwind path to the caller.

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
711

LICM uses the LoadAndStorePromoter utility instead -- I'm not really familiar with the differences between these approaches. From what I can tell, LoadAndStorePromoter is SSAUpdater-based and can work without DomTree, while PromoteMemToReg is more similar to the code you use here, in that it uses IDF.

llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll
5

What's up with these REF labels?

From a cursory look at the implementation, does this handle unwinding properly?

store i32 0, ptr %arg
call void @may_unwind() readnone
store i32 1, ptr %arg

I believe promotion is not possible in this case, because there's no good way to provide the new value on the unwind path to the caller.

Thanks, I completely missed this part and its not working correctly now, I will fix that.

llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
711

Yes. you’re right PromoteMemToReg is good here and it does much more – handles intrinsics, debug info etc. It’s alloca centered but I think it’s relatively easy to make it more generic: separate the part which decides if the transformation is safe and the part that actually performs it. It uses LargeBlockInfo to keep track of instruction ordering which can be substituted with similar MSSA functionality if it’s present.

llvm/test/Transforms/ArgumentPromotion/inoutargs2.ll
5

We have AMDGPURewriteOutArguments class which has similar functionality but works differently. @arsenm has tests for it and I put it here (modified in a way it suits my pass) so we could compare covered testcases. REFs are original checks from rewrite-out-arguments.ll so we could see the difference, maybe they should be removed later.

arsenm added inline comments.Nov 16 2022, 3:41 PM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
746

I think it should be in the default, which is -O2

Herald added a project: Restricted Project. · View Herald TranscriptNov 16 2022, 3:41 PM
nlopes added inline comments.Nov 17 2022, 12:21 AM
llvm/lib/Transforms/IPO/MSSAArgPromotion.cpp
792

Please use PoisonValue instead of UndefValue whenever possible as we are trying to remove undef from LLVM.
Thank you!

nikic requested changes to this revision.Mar 7 2023, 1:38 AM

Marking this as changes requested per the unwinding comment above.

I think I also got a mail at some point about the removed isMustSet() function. I think the best way to do that now is to use BatchAA and do a separate isMustAlias() query, which should be cached at that point.

This revision now requires changes to proceed.Mar 7 2023, 1:38 AM

Going to reclaim this work