PromoteAlloca now uses SSAUpdater, it doesn't need SROA to clean-up after it anymore.
Internal testing shows no noticeable performance impact.
Paths
| Differential D156398
[AMDGPU] Remove post-PromoteAlloca SROA run ClosedPublic Authored by Pierre-vh on Jul 27 2023, 12:15 AM.
Details
Summary PromoteAlloca now uses SSAUpdater, it doesn't need SROA to clean-up after it anymore. Internal testing shows no noticeable performance impact.
Diff Detail
Event Timelinearsenm added inline comments.
This revision is now accepted and ready to land.Jul 27 2023, 11:22 AM Comment Actions @arsenm There is a crash in amdgpu_isel.test from check-llvm-tools-updatetestchecks ; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s define i64 @i64_test(i64 %i) nounwind readnone { %loc = alloca i64 %j = load i64, i64 * %loc %r = add i64 %i, %j ret i64 %r } The FrameIndex fails: t25: i32,ch = load<(dereferenceable load (s32) from %ir.loc, align 8)> # D:1 t0, FrameIndex:i64<0>, undef:i64 LLVM ERROR: Cannot select: t6: i64 = FrameIndex<0> In function: i64_test Likewise there is a regression in amdgpu_generated_funcs.ll tests. I think SROA was doing some (good) stuff for these tests. I would suggest to just use O3 there and open a separate (internal or external) ticket for that ISel failure. What do you think? Comment Actions
This test is just broken. It's using the wrong address space for the alloca and sroa just happened to delete it. Also this should use opaque pointers
This revision is now accepted and ready to land.Aug 10 2023, 2:59 PM Closed by commit rG89e91e4c0c60: [AMDGPU] Remove post-PromoteAlloca SROA run (authored by Pierre-vh). · Explain WhyAug 10 2023, 11:29 PM This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 549255 llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-sibling-call.ll
llvm/test/CodeGen/AMDGPU/captured-frame-index.ll
llvm/test/CodeGen/AMDGPU/cgp-addressing-modes.ll
llvm/test/CodeGen/AMDGPU/extload-private.ll
llvm/test/CodeGen/AMDGPU/frame-index-elimination.ll
llvm/test/CodeGen/AMDGPU/ipra.ll
llvm/test/CodeGen/AMDGPU/llc-pipeline.ll
llvm/test/CodeGen/AMDGPU/load-hi16.ll
llvm/test/CodeGen/AMDGPU/load-lo16.ll
llvm/test/CodeGen/AMDGPU/nested-calls.ll
llvm/test/CodeGen/AMDGPU/parallelandifcollapse.ll
llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
llvm/test/CodeGen/AMDGPU/sibling-call.ll
llvm/test/CodeGen/AMDGPU/store-hi16.ll
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll.expected
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.generated.expected
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_generated_funcs.ll.nogenerated.expected
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected
|
This is a question for my curiosity. Presumably if the 80 bytes/lane for %alloca is now on the stack, shouldn't we expect some other value like VGPRs to go down by 80 bytes (-20 VGPRs)?