This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove post-PromoteAlloca SROA run
ClosedPublic

Authored by Pierre-vh on Jul 27 2023, 12:15 AM.

Details

Reviewers
arsenm
foad
Group Reviewers
Restricted Project
Commits
rG89e91e4c0c60: [AMDGPU] Remove post-PromoteAlloca SROA run
Summary

PromoteAlloca now uses SSAUpdater, it doesn't need SROA to clean-up after it anymore.

Internal testing shows no noticeable performance impact.

Diff Detail

Event Timeline

Pierre-vh created this revision.Jul 27 2023, 12:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2023, 12:15 AM
Pierre-vh requested review of this revision.Jul 27 2023, 12:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2023, 12:15 AM
Joe_Nash added inline comments.
llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
182

This is a question for my curiosity. Presumably if the 80 bytes/lane for %alloca is now on the stack, shouldn't we expect some other value like VGPRs to go down by 80 bytes (-20 VGPRs)?

arsenm accepted this revision.Jul 27 2023, 11:22 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/resource-optimization-remarks.ll
182

Mechanically that's not really how it works. In this case the stack isn't actually used for anything other than filler content (it's kind of a bug this was optimized out to begin with, this memset probably should have been volatile)

This revision is now accepted and ready to land.Jul 27 2023, 11:22 AM

@arsenm There is a crash in amdgpu_isel.test from check-llvm-tools-updatetestchecks
This function fails ISel:

; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s

define i64 @i64_test(i64 %i) nounwind readnone {
  %loc = alloca i64
  %j = load i64, i64 * %loc
  %r = add i64 %i, %j
  ret i64 %r
}

The FrameIndex fails:

t25: i32,ch = load<(dereferenceable load (s32) from %ir.loc, align 8)> # D:1 t0, FrameIndex:i64<0>, undef:i64
LLVM ERROR: Cannot select: t6: i64 = FrameIndex<0>
In function: i64_test

Likewise there is a regression in amdgpu_generated_funcs.ll tests.

I think SROA was doing some (good) stuff for these tests.
A simple fix seems to be to enable optimizations (-O3 works) for the crash.

I would suggest to just use O3 there and open a separate (internal or external) ticket for that ISel failure. What do you think?

Add test updates

Pierre-vh requested review of this revision.Jul 28 2023, 12:37 AM
arsenm added a comment.Aug 1 2023, 4:52 PM

@arsenm There is a crash in amdgpu_isel.test from check-llvm-tools-updatetestchecks
This function fails ISel:

; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s

define i64 @i64_test(i64 %i) nounwind readnone {
  %loc = alloca i64
  %j = load i64, i64 * %loc
  %r = add i64 %i, %j
  ret i64 %r
}

This test is just broken. It's using the wrong address space for the alloca and sroa just happened to delete it. Also this should use opaque pointers

arsenm added inline comments.Aug 1 2023, 5:00 PM
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll
4 ↗(On Diff #545037)
Pierre-vh updated this revision to Diff 546709.Aug 2 2023, 11:20 PM

Rebase, also fix other test

arsenm accepted this revision.Aug 10 2023, 2:59 PM
This revision is now accepted and ready to land.Aug 10 2023, 2:59 PM
This revision was automatically updated to reflect the committed changes.