This allows to lower an LDS variable into a kernel structure
even if there is a constant expression used from different
kernels.
Details
Diff Detail
Unit Tests
Event Timeline
I think from this we are just one step from removing module lds except for potentially indirect functions. Everything else can be moved into kernel lds structure.
We do have a problem with excessive use of lds in rocBLAS because of the module lds already, so technically we have a regression with it. There is a w/a but it is better be solved sooner rather than later.
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp | ||
---|---|---|
276 | Probably, we can change Instruction *I = dyn_cast<Instruction>(U.getUser()); return I && I->getFunction() == F; to Instruction *I = cast<Instruction>(U.getUser()); return I->getFunction() == F; | |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
51 | We probably better avoid recursion when it is possible to avoid. | |
54–60 | Within the pointer replacement patch https://reviews.llvm.org/D103225, we already have this functionality in place - please take a look at the utility function getFunctionToInstsMap(). Probably we can reuse it here. Once we get FunctionToInstsMap, we can consider only key F, and ignore all others. If you think it is a good idea, then, I can create new patch by taking out this utility function from https://reviews.llvm.org/D103225. I leave it to you. |
Pushed new patch at https://reviews.llvm.org/D103661
I am not sure, if we can completely eliminate module lds. My understanding is that it is still required to handle within non-kernel function used LDS. But, instead of directly dealing with LDS, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.
I am not sure, if we can completely eliminate module lds. My understanding is that it is still required to handle within non-kernel function used LDS. But, instead of directly dealing with LDS, it should deal with pointers, hence the patch https://reviews.llvm.org/D103225.
Not completely, but we can omit module lds for functions which are only used from a single kernel at the very least.
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp | ||
---|---|---|
266 | Thanks! | |
276 | Not really, there can be non-instruction uses from outside of the kernel. In fact assert above is also not correct, we may not have converted non-kernel constant exprs. | |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
54–60 | It is not the same, I am only interested in instructions with constant expr uses, a much smaller set. | |
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll | ||
43–45 | It converts the whole constant expr, this is how D103661 works (and you have already commented there). I am not sure that is important though. Do you see any benefits of having long constant expressions vs instructions? We have to admit the testcase is quite degenerate too, more a torture test than a real life use. |
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll | ||
---|---|---|
43–45 | Actually these have to be instructions. Reading this huge expression... It uses two lds globals, both constexprs are the innermost. If we replace an innermost expression with an instruction we then have to replace all constantexpr uses with instructions too since a constantexpr cannot use an instruction. |
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll | ||
---|---|---|
43–45 | Moreover, in our case we always replace an innermost expression as it all end up with a GlobalVariable. D103661 could probably stop replacing down the operands if it met the CE we requested to replace already, but that is never our scenario. In this example we would always ask to replace @kern and @both users. I.e. an expression to pass to the helper is 'i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*)' for the first occurrence. It shall trigger the whole operand expression replacement. However, helper function might have been improved to stop earlier if somebody would call it with a bigger expression, say 'i32* addrspacecast (i32 addrspace(3)* bitcast (float addrspace(3)* @both to i32 addrspace(3)*) to i32*'. If that would be a case convertConstantExprsToInstructions() might just stop with producing the addrspacecast instruction and do not convert the inner bitcast. |
LGTM except for the minor comment below,
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
36 | Not sure. what is the reall difference between SmallVector<User *> Stack({U}); and SmallVector<User *> Stack{U}; |
Changed initializer style.
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
36 | No real difference except that I am not using c++11 syntax. Let's go in style and save couple symbols anyway. |
llvm/test/CodeGen/AMDGPU/lower-module-lds-constantexpr.ll | ||
---|---|---|
43–45 |
With the change in the D103661 we do not convert @both anymore, which is a believe a right thing, it stays in module lds. Only the tree to @kern is converted. |
FYI. I've just got an assertion in the pass. I'll post a reduced reproducer when I have it.
Meanwhile here' the crash info:
F0616 14:20:09.488221 1150352 logging.cc:107] assert.h assertion failed at third_party/llvm/llvm-project/llvm/include/llvm/Support/Casting.h:269 in typename cast_retty<X, Y *>::ret_type llvm::cast(Y *) [X = llvm::PointerType, Y = llvm::Type]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!" *** Check failure stack trace: *** @ 0x55555d4253df absl::logging_internal::LogMessage::Die() @ 0x55555d424e54 absl::logging_internal::LogMessage::SendToLog() @ 0x55555d424b7f absl::logging_internal::LogMessage::Flush() @ 0x55555d425ae9 absl::logging_internal::LogMessageFatal::~LogMessageFatal() @ 0x55555d4239c4 __assert_fail @ 0x555558fddf77 llvm::GetElementPtrInst::Create() @ 0x55555d1077d8 llvm::ConstantExpr::getAsInstruction() @ 0x55555d201d42 llvm::convertConstantExprsToInstructions() @ 0x55555d20125c llvm::convertConstantExprsToInstructions() @ 0x55555b47e1bd llvm::AMDGPU::replaceConstantUsesInFunction() @ 0x55555b2a40d5 (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS() @ 0x55555b2a31df (anonymous namespace)::AMDGPULowerModuleLDS::runOnModule() @ 0x55555d1bff34 llvm::legacy::PassManagerImpl::run() @ 0x555558fae1b9 (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager() @ 0x555558fa9537 clang::EmitBackendOutput() @ 0x555558fa66a5 clang::BackendConsumer::HandleTranslationUnit() @ 0x555559c210b4 clang::ParseAST() @ 0x5555599cb106 clang::FrontendAction::Execute() @ 0x55555993fdcf clang::CompilerInstance::ExecuteAction() @ 0x555558bdbff3 clang::ExecuteCompilerInvocation() @ 0x555558bcfd54 cc1_main() @ 0x555558bcd6e7 ExecuteCC1Tool() @ 0x555558bcd3fd main @ 0x7ffff7d29bbd __libc_start_main @ 0x555558bca0a9 _start
Minimized reproducer: https://gist.github.com/Artem-B/44d8fa3f1bf0a3c992f4fe5bcf678c3f#file-lds-assert-ll
LLVM version I've tested with: 47f18af55fd59e813144cc76711806d57a160e50
$ bin/opt -amdgpu-lower-module-lds -disable-output LDS-assert.ll
llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp | ||
---|---|---|
264 | There is no need to create any Instructions. You are just replacing one Constant (GV) with another (GEP). |
There is no need to create any Instructions. You are just replacing one Constant (GV) with another (GEP).