When a constant expression CE is being converted into a corresponding instruction I,
CE is supposed to be replaced by I. However, it is possible that CE is being used multiple
times within a parent instruction PI. Make sure that *all* the uses of CE within PI are
replaced by I.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll | ||
---|---|---|
114 | I don't understand this test change. You've introduced an illegal pair of addrspacecasts, going from addrspace(3) to addrspace(0) back to addrspace(4) |
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll | ||
---|---|---|
114 | Let me first explain in detail, what bug this patch fixes, and then come back to changes within this test. Consider following Instruction: I / \ C1 C2 / \ C3 C3 Let us assume that C3 uses an lds, then we want to convert C3 to an instruction, which in turn forces the paths (C1, C3) and (C2, C3) to convert into instructions. So, the expected instruction sequence after this constant expression to instruction conversion is as follows by assuming that the path (C1, C2) is processed first, and then the path (C2, C3). I3 (correspond to C3) I1 (I3) (correspond to C1) I2 (I3) (correspond to C2) I (I1, I2) However, before this patch, there was a bug, and due to this bug, the instruction sequence was used to be as below. I3 (correspond to C3) I1 (I3) (correspond to C1) I2 (C3) (correspond to C2) I (I1, I2) Notice the difference between correct expected result and buggy result. where buggy result gives I2 (C3) instead of I2 (I3). This patch fixes this issue. Now coming back to testcase, I have made changes to this testcase by introducing addrspace constant expressions in order to mimic the above explained scenario, and to test that it is properly handled. Here, I really did not care about legal/illegal aspects of addrspacecasts and also I did not bother about the semantics of the constant expression tree. Since it is a compile time testcase and since we care about only code transformation here, I assumed that it is fine as long as we test for a correct sequence of code transformation after the pass for a given input testcase even though input testcase is illegal from the perspective of actual programming semantics. However, if you think that we should come-up with a more meaningful testcase here, I will do so. |
llvm/lib/IR/ReplaceConstant.cpp | ||
---|---|---|
93 | Why not to remove it immediately? |
llvm/lib/IR/ReplaceConstant.cpp | ||
---|---|---|
93 | Consider below example, where C2 uses lds. In this case, we have two paths, but both the paths have same sequence, that is (C1, C2). So, it again causes the same bug which https://reviews.llvm.org/D104425 has fixed. That is, we land up replacing already dead constant expression. I / \ C1 C1 / \ C2 C2 |
llvm/lib/IR/ReplaceConstant.cpp | ||
---|---|---|
93 | I mean, dead users are already dead users. In this case you would just call removeDeadConstantUsers twice, which shall not be a problem as far as I understand. Just why a separate loop is needed? |
llvm/lib/IR/ReplaceConstant.cpp | ||
---|---|---|
93 | What I mean here is - when we process one of the duplicate paths, constant expression is dead hence it will get removed. Now, when we try to process other remaining path, both C1 and C2 are already visited, and also they are dead by now, which result in compiler crash as below. 0. Program arguments: /home/mahesha/ROCm/github/llvm/rel/build/bin/opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds lower-kernel-lds-constexpr.ll -o hsm.ll #0 0x00005565dca163cf PrintStackTraceSignalHandler(void*) Signals.cpp:0:0 #1 0x00005565dca13b7d SignalHandler(int) Signals.cpp:0:0 #2 0x00007f4cbeccb980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980) #3 0x00005565db3699b4 llvm::GetElementPtrInst::Create(llvm::Type*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::Twine const&, llvm::Instruction*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0xa0a9b4) #4 0x00005565dc03cd03 llvm::ConstantExpr::getAsInstruction(llvm::Instruction*) const (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x16ddd03) #5 0x00005565dc18896d llvm::convertConstantExprsToInstructions(llvm::Instruction*, std::map<llvm::Use*, std::vector<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> >, std::allocator<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> > > >, std::less<llvm::Use*>, std::allocator<std::pair<llvm::Use* const, std::vector<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> >, std::allocator<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> > > > > > >&, llvm::SmallPtrSetImpl<llvm::Instruction*>*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x182996d) #6 0x00005565dc18ab6f llvm::convertConstantExprsToInstructions(llvm::Instruction*, llvm::ConstantExpr*, llvm::SmallPtrSetImpl<llvm::Instruction*>*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x182bb6f) #7 0x00005565dcdca871 llvm::AMDGPU::replaceConstantUsesInFunction(llvm::ConstantExpr*, llvm::Function const*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x246b871) #8 0x00005565db33af7f (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS(llvm::Module&, llvm::Function*) AMDGPULowerModuleLDSPass.cpp:0:0 #9 0x00005565db33ee08 llvm::AMDGPULowerModuleLDSPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x9dfe08) #10 0x00005565db0de9a1 llvm::detail::PassModel<llvm::Module, llvm::AMDGPULowerModuleLDSPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x77f9a1) #11 0x00005565dc172f54 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x1813f54) #12 0x00005565db05f8c8 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x7008c8) #13 0x00005565dafdf821 main (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x680821) #14 0x00007f4cbd95fbf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0 #15 0x00005565db05279a _start (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x6f379a) |
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll | ||
---|---|---|
129 | A more meaningful test that we can really codegen would be more helpful. Also this testcase could use a comment |
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll | ||
---|---|---|
129 | Here is the more meaningful testcase - which stores self address. WRONG OUTPUT WITHOUT THE PATCH: %1 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k6.lds.t, %llvm.amdgcn.kernel.k6.lds.t addrspace(3)* @llvm.amdgcn.kernel.k6.lds, i32 0, i32 0), i32 0, i32 2 %2 = ptrtoint i32 addrspace(3)* %1 to i32 store i32 %2, i32 addrspace(3)* getelementptr inbounds ([4 x i32], [4 x i32] addrspace(3)* @lds.5, i32 0, i32 2), align 4 ret void CORRECT OUTPUT WITH THE PATCH: %1 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k6.lds.t, %llvm.amdgcn.kernel.k6.lds.t addrspace(3)* @llvm.amdgcn.kernel.k6.lds, i32 0, i32 0), i32 0, i32 2 %2 = ptrtoint i32 addrspace(3)* %1 to i32 store i32 %2, i32 addrspace(3)* %1, align 8 ret void ALSO, THE GFX900 ISA FOR THE SAME: ; %bb.0: v_mov_b32_e32 v0, 8 v_mov_b32_e32 v1, 0 s_mov_b32 m0, -1 ds_write_b32 v1, v0 offset:8 s_endpgm |
This doesn't work with phi nodes that have multiple uses of the same constantexpr. Specifically it creates replacement instructions for that constantexpr and inserts them on the earliest incoming edge that uses that constantexpr, then replaces all uses with the new instruction. That works iff the earliest incoming edge happens to dominate the other incoming edges that use the same constantexpr, otherwise it emits broken IR.
I can't work out how this is supposed to work. One phase builds a collectConstantExprPaths object and a second uses it. I think the error is in the call to II->replaceUsesOfWith(CE, NI). As a workaround I'm currently using the following as a replacement for these functions which seems to be fine so far but will probably fail the test cases written for the above.
void eliminateConstantExprsFromInstruction(Instruction *I) { SmallVector<Instruction *> Worklist{I}; while (!Worklist.empty()) { Instruction *I = Worklist.pop_back_val(); for (Use &U : I->operands()) { auto *BI = I; if (auto *Phi = dyn_cast<PHINode>(I)) { BasicBlock *BB = Phi->getIncomingBlock(U); BI = &(*(BB->getFirstInsertionPt())); } if (ConstantExpr *C = dyn_cast<ConstantExpr>(U.get())) { Instruction *NI = C->getAsInstruction(BI); Worklist.push_back(NI); U.set(NI); C->removeDeadConstantUsers(); } } } }
With the plan to drop most constant expressions, could we start opting LDS globals out of constant expressions?
Appealing, but I think one of the things we're going to want to do with LDS globals is build arrays containing constantexpr pointing into them. Presently looking at patching lower module lds to not do this at all but I think that's thwarted by constantexpr being folded together.
Example repro.
; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds < %s | FileCheck %s ; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds < %s | FileCheck %s @var = addrspace(3) global i32 undef, align 4 define amdgpu_kernel void @func(i32 %c) { entry: switch i32 %c, label %return [ i32 0, label %bb0 i32 1, label %bb1 ] bb0: br label %bb2 bb1: br label %bb2 bb2: %tmp = phi ptr [ addrspacecast (ptr addrspace(3) @var to ptr), %bb0 ], [ addrspacecast (ptr addrspace(3) @var to ptr), %bb1 ] br label %return return: ret void }
Instruction does not dominate all uses! %0 = addrspacecast ptr addrspace(3) @llvm.amdgcn.kernel.func.lds to ptr %tmp = phi ptr [ %0, %bb0 ], [ %0, %bb1 ]
Why not to remove it immediately?