This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Remove dead stack objects
AbandonedPublic

Authored by rampitec on Nov 11 2020, 4:15 PM.

Details

Reviewers
arsenm
Summary

We did not remove unused stack objects which may leave us
with unused scratch allocation.

Diff Detail

Event Timeline

rampitec created this revision.Nov 11 2020, 4:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 11 2020, 4:15 PM
rampitec requested review of this revision.Nov 11 2020, 4:15 PM
arsenm added inline comments.Nov 11 2020, 5:17 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

Do we really need to compact these, or are we just not skipping these somewhere?

rampitec added inline comments.Nov 11 2020, 5:27 PM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

As far as I understand we don't. We mark it for delete, and then it is compacted. Look at the new testcase, after pei this slot in the middle disappeares completely. The other tests I also had to correct to use stack.0 so that it is not compacted and tests still need to use large offsets, otherwise they were using small offsets.

Or are you asking why wasn't it removed earlier? Honestly I don't know how a generic code could remove them. I had to mark as used slots we are reserving just for spilling and generic code doesn't have any way to know it is used. There is code trying to do so in StackSlotColoring, but it is a dead code because LiveSlots is a dead analysis doing nothing. I guess if it would succeed it will just happily remove yet unused slots we are reserving for reg spills and everything will crash.

rampitec added inline comments.Nov 12 2020, 9:43 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

I.e. look at the idot8s.ll from out tests. After selection we have:

Frame Objects:
  fi#0: size=4, align=4, at location [SP]
  fi#1: size=4, align=4, at location [SP]
Function Live Ins: $sgpr0_sgpr1 in %1

These are not used anywhere, but also there is no place to remove these. I can imagine we could also deadcode/fold some FI uses and leave behind unused slots.

rampitec added inline comments.Nov 12 2020, 10:22 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

And SDag does it to legalize bitcast from i32 to v4i8, extract elements and sign extend them, and then optimizes it away, leaving FI behind.

arsenm added inline comments.Nov 12 2020, 11:20 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

I think postprocessing this here is the wrong way to fix this. This also seems like a suspicious example, is there a better one?

rampitec added inline comments.Nov 12 2020, 11:23 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

That the only one I found. For the postproseccing, look at the StackSlotColoring::ScanForSpillSlotRefs(), it does the same thing and then StackSlotColoring::ColorSlots() removed unused FIs at the end. It just never works.

Also take in mind that requesting scratch automatically limits occupancy even if unused.

arsenm added inline comments.Nov 12 2020, 11:26 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

In this situation, I think this should count as stack usage. It's an earlier problem that these stack objects remain. This shouldn't be needed for the purposes of getting correct stack register initialization

rampitec added inline comments.Nov 12 2020, 11:27 AM
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
499

Then we will have regression: initialize rsrc where we don't do it now.

rampitec abandoned this revision.Nov 12 2020, 3:56 PM