This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Handle constant LDS uses from different kernels
ClosedPublic

Authored by rampitec on Jun 3 2021, 4:07 PM.

Details

Summary

This allows to lower an LDS variable into a kernel structure
even if there is a constant expression used from different
kernels.

Diff Detail

Event Timeline

rampitec created this revision.Jun 3 2021, 4:07 PM
rampitec requested review of this revision.Jun 3 2021, 4:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 3 2021, 4:07 PM
Herald added a subscriber: wdng. · View Herald Transcript

@hsmhsm It depends on the convertConstantExprsToInstructions() you have implemented in the D103225. I would appreciate if you extract the helper into a separate review. Kudos for the helper BTW.

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.

rampitec added a subscriber: tra.Jun 3 2021, 4:35 PM
hsmhsm added inline comments.Jun 4 2021, 1:20 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
307

Probably, we can change

Instruction *I = dyn_cast<Instruction>(U.getUser());
return I && I->getFunction() == F;

to

Instruction *I = cast<Instruction>(U.getUser());
return I->getFunction() == F;
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
51

We probably better avoid recursion when it is possible to avoid.

54–60

Within the pointer replacement patch https://reviews.llvm.org/D103225, we already have this functionality in place - please take a look at the utility function getFunctionToInstsMap(). Probably we can reuse it here. Once we get FunctionToInstsMap, we can consider only key F, and ignore all others.

If you think it is a good idea, then, I can create new patch by taking out this utility function from https://reviews.llvm.org/D103225. I leave it to you.

hsmhsm added a comment.Jun 4 2021, 2:20 AM

@hsmhsm It depends on the convertConstantExprsToInstructions() you have implemented in the D103225. I would appreciate if you extract the helper into a separate review. Kudos for the helper BTW.

Pushed new patch at https://reviews.llvm.org/D103661

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.

I am not sure, if we can completely eliminate module lds. My understanding is that it is still required to handle within non-kernel function used LDS. But, instead of directly dealing with LDS, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.

foad added inline comments.Jun 4 2021, 6:44 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
297

Can this be for (auto &U : make_early_inc_range(GV->users()))?

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45

Why do %4, %5, %6 need to be Instructions? Couldn't they could be left as ConstantExprs?

rampitec updated this revision to Diff 349932.Jun 4 2021, 11:41 AM
rampitec marked 3 inline comments as done.

Addressed review comments.

I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.

We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.

I am not sure, if we can completely eliminate module lds. My understanding is that it is still required to handle within non-kernel function used LDS. But, instead of directly dealing with LDS, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.

Not completely, but we can omit module lds for functions which are only used from a single kernel at the very least.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
297

Thanks!

307

Not really, there can be non-instruction uses from outside of the kernel. In fact assert above is also not correct, we may not have converted non-kernel constant exprs.

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
54–60

It is not the same, I am only interested in instructions with constant expr uses, a much smaller set.

llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45

It converts the whole constant expr, this is how D103661 works (and you have already commented there). I am not sure that is important though. Do you see any benefits of having long constant expressions vs instructions? We have to admit the testcase is quite degenerate too, more a torture test than a real life use.

rampitec added inline comments.Jun 4 2021, 2:16 PM
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45

Actually these have to be instructions. Reading this huge expression... It uses two lds globals, both constexprs are the innermost. If we replace an innermost expression with an instruction we then have to replace all constantexpr uses with instructions too since a constantexpr cannot use an instruction.

rampitec added inline comments.Jun 4 2021, 2:29 PM
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45

Moreover, in our case we always replace an innermost expression as it all end up with a GlobalVariable. D103661 could probably stop replacing down the operands if it met the CE we requested to replace already, but that is never our scenario.

In this example we would always ask to replace @kern and @both users. I.e. an expression to pass to the helper is 'i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*)' for the first occurrence. It shall trigger the whole operand expression replacement.

However, helper function might have been improved to stop earlier if somebody would call it with a bigger expression, say 'i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*'. If that would be a case convertConstantExprsToInstructions() might just stop with producing the addrspacecast instruction and do not convert the inner bitcast.

hsmhsm accepted this revision.Jun 6 2021, 8:23 AM

LGTM except for the minor comment below,

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
36

Not sure. what is the reall difference between SmallVector<User *> Stack({U}); and SmallVector<User *> Stack{U};

This revision is now accepted and ready to land.Jun 6 2021, 8:23 AM
rampitec updated this revision to Diff 350351.Jun 7 2021, 10:24 AM
rampitec marked an inline comment as done.

Changed initializer style.

llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp
36

No real difference except that I am not using c++11 syntax. Let's go in style and save couple symbols anyway.

rampitec marked an inline comment as done.Jun 7 2021, 12:45 PM
rampitec added inline comments.
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll
43–45

Why do %4, %5, %6 need to be Instructions? Couldn't they could be left as ConstantExprs?

With the change in the D103661 we do not convert @both anymore, which is a believe a right thing, it stays in module lds. Only the tree to @kern is converted.

This revision was landed with ongoing or failed builds.Jun 7 2021, 3:44 PM
This revision was automatically updated to reflect the committed changes.
tra added a comment.Jun 16 2021, 2:26 PM

FYI. I've just got an assertion in the pass. I'll post a reduced reproducer when I have it.
Meanwhile here' the crash info:

F0616 14:20:09.488221 1150352 logging.cc:107] assert.h assertion failed at third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:269 in typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::PointerType, Y = llvm::Type]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!"
*** Check failure stack trace: ***
    @     0x55555d4253df  absl::logging_internal::LogMessage::Die()
    @     0x55555d424e54  absl::logging_internal::LogMessage::SendToLog()
    @     0x55555d424b7f  absl::logging_internal::LogMessage::Flush()
    @     0x55555d425ae9  absl::logging_internal::LogMessageFatal::~LogMessageFatal()
    @     0x55555d4239c4  __assert_fail
    @     0x555558fddf77  llvm::GetElementPtrInst::Create()
    @     0x55555d1077d8  llvm::ConstantExpr::getAsInstruction()
    @     0x55555d201d42  llvm::convertConstantExprsToInstructions()
    @     0x55555d20125c  llvm::convertConstantExprsToInstructions()
    @     0x55555b47e1bd  llvm::AMDGPU::replaceConstantUsesInFunction()
    @     0x55555b2a40d5  (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS()
    @     0x55555b2a31df  (anonymous namespace)::AMDGPULowerModuleLDS::runOnModule()
    @     0x55555d1bff34  llvm::legacy::PassManagerImpl::run()
    @     0x555558fae1b9  (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager()
    @     0x555558fa9537  clang::EmitBackendOutput()
    @     0x555558fa66a5  clang::BackendConsumer::HandleTranslationUnit()
    @     0x555559c210b4  clang::ParseAST()
    @     0x5555599cb106  clang::FrontendAction::Execute()
    @     0x55555993fdcf  clang::CompilerInstance::ExecuteAction()
    @     0x555558bdbff3  clang::ExecuteCompilerInvocation()
    @     0x555558bcfd54  cc1_main()
    @     0x555558bcd6e7  ExecuteCC1Tool()
    @     0x555558bcd3fd  main
    @     0x7ffff7d29bbd  __libc_start_main
    @     0x555558bca0a9  _start

FYI. I've just got an assertion in the pass. I'll post a reduced reproducer when I have it.
Meanwhile here' the crash info:

F0616 14:20:09.488221 1150352 logging.cc:107] assert.h assertion failed at third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:269 in typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::PointerType, Y = llvm::Type]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!"
*** Check failure stack trace: ***
    @     0x55555d4253df  absl::logging_internal::LogMessage::Die()
    @     0x55555d424e54  absl::logging_internal::LogMessage::SendToLog()
    @     0x55555d424b7f  absl::logging_internal::LogMessage::Flush()
    @     0x55555d425ae9  absl::logging_internal::LogMessageFatal::~LogMessageFatal()
    @     0x55555d4239c4  __assert_fail
    @     0x555558fddf77  llvm::GetElementPtrInst::Create()
    @     0x55555d1077d8  llvm::ConstantExpr::getAsInstruction()
    @     0x55555d201d42  llvm::convertConstantExprsToInstructions()
    @     0x55555d20125c  llvm::convertConstantExprsToInstructions()
    @     0x55555b47e1bd  llvm::AMDGPU::replaceConstantUsesInFunction()
    @     0x55555b2a40d5  (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS()
    @     0x55555b2a31df  (anonymous namespace)::AMDGPULowerModuleLDS::runOnModule()
    @     0x55555d1bff34  llvm::legacy::PassManagerImpl::run()
    @     0x555558fae1b9  (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager()
    @     0x555558fa9537  clang::EmitBackendOutput()
    @     0x555558fa66a5  clang::BackendConsumer::HandleTranslationUnit()
    @     0x555559c210b4  clang::ParseAST()
    @     0x5555599cb106  clang::FrontendAction::Execute()
    @     0x55555993fdcf  clang::CompilerInstance::ExecuteAction()
    @     0x555558bdbff3  clang::ExecuteCompilerInvocation()
    @     0x555558bcfd54  cc1_main()
    @     0x555558bcd6e7  ExecuteCC1Tool()
    @     0x555558bcd3fd  main
    @     0x7ffff7d29bbd  __libc_start_main
    @     0x555558bca0a9  _start

Thanks for the report! Reproducer will be handy.

tra added a comment.Jun 16 2021, 3:08 PM

Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50

$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll

Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50

$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll

Thanks, reproduced.

Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll

LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50

$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll

Thanks, reproduced.

D104425

foad added inline comments.Jun 17 2021, 1:44 AM
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
295

There is no need to create any Instructions. You are just replacing one Constant (GV) with another (GEP).