This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Add pass to replace out arguments
ClosedPublic

Authored by arsenm on Jul 27 2017, 8:19 PM.

Details

Summary

It is better to return arguments directly in registers
if we are making a call rather than introducing expensive
stack usage. In one of sample compile from one of
Blender's many kernel variants, this fires on about
~20 different functions. Future improvements may be to
recognize simple cases where the pointer is indexing a small
array. This also fails when the store to the out argument
is in a separate block from the return, which happens in
a few of the Blender functions. This should also probably
be using MemorySSA which might help with that.

I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with
a ModulePass.

I'm also not sure where it should run.I think it should
run before DeadArgumentElimination, so maybe either
EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

Diff Detail

Event Timeline

arsenm created this revision.Jul 27 2017, 8:19 PM
rampitec added inline comments.Jul 27 2017, 9:34 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
120

I'm not sure you actually can do it on a non-private. Other address spaces are externally visible and stores to them shall not be easily eliminated. The exception is a non-aliasing LDS variable with function local scope.

300

Consider also marking it always inline.

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
362

Check opt level.

arsenm added inline comments.Jul 27 2017, 10:30 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
120

The memory access isn't actually eliminated here, it's just moved to the stub function. and relies on other passes to fix it so it is OK. The hope with private is it the temporary alloca will be SROAable

300

The stub function is marked as always inline. The new function here steals the body of the old function, the original function is then marked always inline

rampitec added inline comments.Jul 27 2017, 10:37 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
120

This is true, but I do not believe other passes would be able to eliminate it. It looks like just an overhead.

300

I mean stub gets original attributes. The new function gets original attributes + always inline. That way it will be always inlined into stub if otherwise not simplified. I.e. if at the end call to a stub will not be replaced with a call to the new function we will have no extra call.

You can also check the heuristic isWrapperOnlyCall() in the AMDInline.cpp from HSAIL, which basically implements the same logic: inline everything into a wrapper/stub.

arsenm added inline comments.Jul 27 2017, 10:40 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
300

This is what happens, it's just mechanically the original IR function becomes the stub

rampitec added inline comments.Jul 27 2017, 10:42 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
300

If the original had no always inline, new will not get it either. In a worst case we can end up with two calls.

arsenm added inline comments.Jul 27 2017, 10:48 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
300

Line 367

rampitec added inline comments.Jul 27 2017, 11:16 PM
lib/Target/AMDGPU/AMDGPURewriteOutArguments.cpp
300

OK, probably inlining the stub itself is as good as inlining original into the stub. Cleared, thanks.

rampitec accepted this revision.Jul 27 2017, 11:16 PM

LGTM, but I would suggest checking opt level before adding to the PM.

This revision is now accepted and ready to land.Jul 27 2017, 11:16 PM

LGTM, but I would suggest checking opt level before adding to the PM.

I think I'll just drop adding the pass for now until I figure out where best to put it. Also it's probably better to put calls without this through more testing before enabling this by default

LGTM, but I would suggest checking opt level before adding to the PM.

I think I'll just drop adding the pass for now until I figure out where best to put it. Also it's probably better to put calls without this through more testing before enabling this by default

You can always add an option off by default.

arsenm closed this revision.Jul 28 2017, 11:40 AM

r309416