This is an archive of the discontinued LLVM Phabricator instance.

[IR] Replace *all* uses of a constant expression by corresponding instruction
ClosedPublic

Authored by hsmhsm on Oct 28 2021, 6:51 AM.

Details

Summary

When a constant expression CE is being converted into a corresponding instruction I,
CE is supposed to be replaced by I. However, it is possible that CE is being used multiple
times within a parent instruction PI. Make sure that *all* the uses of CE within PI are
replaced by I.

Diff Detail

Event Timeline

hsmhsm created this revision.Oct 28 2021, 6:51 AM
hsmhsm requested review of this revision.Oct 28 2021, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2021, 6:51 AM
hsmhsm added a reviewer: foad.Oct 29 2021, 1:34 AM
hsmhsm edited the summary of this revision. (Show Details)
arsenm added inline comments.Oct 29 2021, 6:16 AM
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll
114

I don't understand this test change. You've introduced an illegal pair of addrspacecasts, going from addrspace(3) to addrspace(0) back to addrspace(4)

arsenm requested changes to this revision.Oct 29 2021, 6:17 AM
This revision now requires changes to proceed.Oct 29 2021, 6:17 AM
hsmhsm added inline comments.Oct 29 2021, 9:20 AM
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll
114

Let me first explain in detail, what bug this patch fixes, and then come back to changes within this test.

Consider following Instruction:

      I
    /  \
  C1   C2
 /        \
C3        C3

Let us assume that C3 uses an lds, then we want to convert C3 to an instruction, which in turn forces the paths (C1, C3) and (C2, C3) to convert into instructions.

So, the expected instruction sequence after this constant expression to instruction conversion is as follows by assuming that the path (C1, C2) is processed first, and then the path (C2, C3).

I3          (correspond to C3)
I1 (I3)     (correspond to C1)
I2 (I3)     (correspond to C2)
I  (I1, I2)

However, before this patch, there was a bug, and due to this bug, the instruction sequence was used to be as below.

I3          (correspond to C3) 
I1 (I3)     (correspond to C1)
I2 (C3)     (correspond to C2)
I  (I1, I2)

Notice the difference between correct expected result and buggy result. where buggy result gives I2 (C3) instead of I2 (I3). This patch fixes this issue.

Now coming back to testcase, I have made changes to this testcase by introducing addrspace constant expressions in order to mimic the above explained scenario, and to test that it is properly handled. Here, I really did not care about legal/illegal aspects of addrspacecasts and also I did not bother about the semantics of the constant expression tree.

Since it is a compile time testcase and since we care about only code transformation here, I assumed that it is fine as long as we test for a correct sequence of code transformation after the pass for a given input testcase even though input testcase is illegal from the perspective of actual programming semantics.

However, if you think that we should come-up with a more meaningful testcase here, I will do so.

hsmhsm updated this revision to Diff 383436.Oct 29 2021, 11:08 AM

Rebase and fix review comments by Matt.

hsmhsm marked an inline comment as done.Oct 29 2021, 11:11 AM
rampitec added inline comments.Oct 29 2021, 11:26 AM
llvm/lib/IR/ReplaceConstant.cpp
101

Why not to remove it immediately?

hsmhsm added inline comments.Oct 29 2021, 12:02 PM
llvm/lib/IR/ReplaceConstant.cpp
101

Consider below example, where C2 uses lds. In this case, we have two paths, but both the paths have same sequence, that is (C1, C2). So, it again causes the same bug which https://reviews.llvm.org/D104425 has fixed. That is, we land up replacing already dead constant expression.

      I
    /  \
  C1   C1
 /        \
C2        C2
hsmhsm marked an inline comment as done.Oct 29 2021, 12:06 PM
rampitec added inline comments.Oct 29 2021, 12:10 PM
llvm/lib/IR/ReplaceConstant.cpp
101

I mean, dead users are already dead users. In this case you would just call removeDeadConstantUsers twice, which shall not be a problem as far as I understand. Just why a separate loop is needed?

hsmhsm added inline comments.Oct 29 2021, 12:27 PM
llvm/lib/IR/ReplaceConstant.cpp
101

What I mean here is - when we process one of the duplicate paths, constant expression is dead hence it will get removed. Now, when we try to process other remaining path, both C1 and C2 are already visited, and also they are dead by now, which result in compiler crash as below.

0.      Program arguments: /home/mahesha/ROCm/github/llvm/rel/build/bin/opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds lower-kernel-lds-constexpr.ll -o hsm.ll
 #0 0x00005565dca163cf PrintStackTraceSignalHandler(void*) Signals.cpp:0:0
 #1 0x00005565dca13b7d SignalHandler(int) Signals.cpp:0:0
 #2 0x00007f4cbeccb980 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12980)
 #3 0x00005565db3699b4 llvm::GetElementPtrInst::Create(llvm::Type*, llvm::Value*, llvm::ArrayRef<llvm::Value*>, llvm::Twine const&, llvm::Instruction*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0xa0a9b4)
 #4 0x00005565dc03cd03 llvm::ConstantExpr::getAsInstruction(llvm::Instruction*) const (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x16ddd03)
 #5 0x00005565dc18896d llvm::convertConstantExprsToInstructions(llvm::Instruction*, std::map<llvm::Use*, std::vector<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> >, std::allocator<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> > > >, std::less<llvm::Use*>, std::allocator<std::pair<llvm::Use* const, std::vector<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> >, std::allocator<std::vector<llvm::ConstantExpr*, std::allocator<llvm::ConstantExpr*> > > > > > >&, llvm::SmallPtrSetImpl<llvm::Instruction*>*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x182996d)
 #6 0x00005565dc18ab6f llvm::convertConstantExprsToInstructions(llvm::Instruction*, llvm::ConstantExpr*, llvm::SmallPtrSetImpl<llvm::Instruction*>*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x182bb6f)
 #7 0x00005565dcdca871 llvm::AMDGPU::replaceConstantUsesInFunction(llvm::ConstantExpr*, llvm::Function const*) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x246b871)
 #8 0x00005565db33af7f (anonymous namespace)::AMDGPULowerModuleLDS::processUsedLDS(llvm::Module&, llvm::Function*) AMDGPULowerModuleLDSPass.cpp:0:0
 #9 0x00005565db33ee08 llvm::AMDGPULowerModuleLDSPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x9dfe08)
#10 0x00005565db0de9a1 llvm::detail::PassModel<llvm::Module, llvm::AMDGPULowerModuleLDSPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x77f9a1)
#11 0x00005565dc172f54 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x1813f54)
#12 0x00005565db05f8c8 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::StringRef>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool) (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x7008c8)
#13 0x00005565dafdf821 main (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x680821)
#14 0x00007f4cbd95fbf7 __libc_start_main /build/glibc-S9d2JN/glibc-2.27/csu/../csu/libc-start.c:344:0
#15 0x00005565db05279a _start (/home/mahesha/ROCm/github/llvm/rel/build/bin/opt+0x6f379a)
rampitec accepted this revision.Oct 29 2021, 12:52 PM

LGTM

llvm/lib/IR/ReplaceConstant.cpp
101

I see. Thanks for checking!

arsenm added inline comments.Oct 29 2021, 1:42 PM
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll
129

A more meaningful test that we can really codegen would be more helpful. Also this testcase could use a comment

hsmhsm updated this revision to Diff 383552.Oct 29 2021, 8:34 PM

Fix review comments by Matt.

hsmhsm marked an inline comment as done.Oct 29 2021, 8:41 PM
hsmhsm added inline comments.
llvm/test/CodeGen/AMDGPU/lower-kernel-lds-constexpr.ll
129

Here is the more meaningful testcase - which stores self address.

WRONG OUTPUT WITHOUT THE PATCH:

%1 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k6.lds.t, %llvm.amdgcn.kernel.k6.lds.t addrspace(3)* @llvm.amdgcn.kernel.k6.lds, i32 0, i32 0), i32 0, i32 2
%2 = ptrtoint i32 addrspace(3)* %1 to i32
store i32 %2, i32 addrspace(3)* getelementptr inbounds ([4 x i32], [4 x i32] addrspace(3)* @lds.5, i32 0, i32 2), align 4
ret void

CORRECT OUTPUT WITH THE PATCH:

%1 = getelementptr inbounds [4 x i32], [4 x i32] addrspace(3)* getelementptr inbounds (%llvm.amdgcn.kernel.k6.lds.t, %llvm.amdgcn.kernel.k6.lds.t addrspace(3)* @llvm.amdgcn.kernel.k6.lds, i32 0, i32 0), i32 0, i32 2
%2 = ptrtoint i32 addrspace(3)* %1 to i32
store i32 %2, i32 addrspace(3)* %1, align 8
ret void

ALSO, THE GFX900 ISA FOR THE SAME:

; %bb.0:
        v_mov_b32_e32 v0, 8
        v_mov_b32_e32 v1, 0
        s_mov_b32 m0, -1
        ds_write_b32 v1, v0 offset:8
        s_endpgm
arsenm accepted this revision.Nov 1 2021, 9:48 AM
This revision is now accepted and ready to land.Nov 1 2021, 9:48 AM
This revision was automatically updated to reflect the committed changes.
hsmhsm marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2022, 4:46 PM

This doesn't work with phi nodes that have multiple uses of the same constantexpr. Specifically it creates replacement instructions for that constantexpr and inserts them on the earliest incoming edge that uses that constantexpr, then replaces all uses with the new instruction. That works iff the earliest incoming edge happens to dominate the other incoming edges that use the same constantexpr, otherwise it emits broken IR.

I can't work out how this is supposed to work. One phase builds a collectConstantExprPaths object and a second uses it. I think the error is in the call to II->replaceUsesOfWith(CE, NI). As a workaround I'm currently using the following as a replacement for these functions which seems to be fine so far but will probably fail the test cases written for the above.

void eliminateConstantExprsFromInstruction(Instruction *I) {
  SmallVector<Instruction *> Worklist{I};

  while (!Worklist.empty()) {
    Instruction *I = Worklist.pop_back_val();
    for (Use &U : I->operands()) {

      auto *BI = I;
      if (auto *Phi = dyn_cast<PHINode>(I)) {
        BasicBlock *BB = Phi->getIncomingBlock(U);
        BI = &(*(BB->getFirstInsertionPt()));
      }

      if (ConstantExpr *C = dyn_cast<ConstantExpr>(U.get())) {
        Instruction *NI = C->getAsInstruction(BI);
        Worklist.push_back(NI);
        U.set(NI);
        C->removeDeadConstantUsers();
      }
    }
  }
}
arsenm added a comment.Aug 5 2022, 9:38 AM

This doesn't work with phi nodes that have multiple uses of the same constantexpr. Specifically it creates replacement instructions for that constantexpr and inserts them on the earliest incoming edge that uses that constantexpr, then replaces all uses with the new instruction. That works iff the earliest incoming edge happens to dominate the other incoming edges that use the same constantexpr, otherwise it emits broken IR.

I can't work out how this is supposed to work. One phase builds a collectConstantExprPaths object and a second uses it. I think the error is in the call to II->replaceUsesOfWith(CE, NI). As a workaround I'm currently using the following as a replacement for these functions which seems to be fine so far but will probably fail the test cases written for the above.

void eliminateConstantExprsFromInstruction(Instruction *I) {
  SmallVector<Instruction *> Worklist{I};

  while (!Worklist.empty()) {
    Instruction *I = Worklist.pop_back_val();
    for (Use &U : I->operands()) {

      auto *BI = I;
      if (auto *Phi = dyn_cast<PHINode>(I)) {
        BasicBlock *BB = Phi->getIncomingBlock(U);
        BI = &(*(BB->getFirstInsertionPt()));
      }

      if (ConstantExpr *C = dyn_cast<ConstantExpr>(U.get())) {
        Instruction *NI = C->getAsInstruction(BI);
        Worklist.push_back(NI);
        U.set(NI);
        C->removeDeadConstantUsers();
      }
    }
  }
}

With the plan to drop most constant expressions, could we start opting LDS globals out of constant expressions?

With the plan to drop most constant expressions, could we start opting LDS globals out of constant expressions?

Appealing, but I think one of the things we're going to want to do with LDS globals is build arrays containing constantexpr pointing into them. Presently looking at patching lower module lds to not do this at all but I think that's thwarted by constantexpr being folded together.

Example repro.

; RUN: opt -S -mtriple=amdgcn-- -amdgpu-lower-module-lds < %s | FileCheck %s
; RUN: opt -S -mtriple=amdgcn-- -passes=amdgpu-lower-module-lds < %s | FileCheck %s

@var = addrspace(3) global i32 undef, align 4

define amdgpu_kernel void @func(i32 %c) {
entry:
  switch i32 %c, label %return [
    i32 0, label %bb0
    i32 1, label %bb1
  ]

bb0:
  br label %bb2

bb1:
  br label %bb2

bb2:
  %tmp = phi ptr [ addrspacecast (ptr addrspace(3) @var to ptr), %bb0 ],
                 [ addrspacecast (ptr addrspace(3) @var to ptr), %bb1 ]
  br label %return

return:
  ret void
}
Instruction does not dominate all uses!
  %0 = addrspacecast ptr addrspace(3) @llvm.amdgcn.kernel.func.lds to ptr
  %tmp = phi ptr [ %0, %bb0 ], [ %0, %bb1 ]