We did not remove unused stack objects which may leave us
with unused scratch allocation.
Details
- Reviewers
arsenm
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | Do we really need to compact these, or are we just not skipping these somewhere? |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | As far as I understand we don't. We mark it for delete, and then it is compacted. Look at the new testcase, after pei this slot in the middle disappeares completely. The other tests I also had to correct to use stack.0 so that it is not compacted and tests still need to use large offsets, otherwise they were using small offsets. Or are you asking why wasn't it removed earlier? Honestly I don't know how a generic code could remove them. I had to mark as used slots we are reserving just for spilling and generic code doesn't have any way to know it is used. There is code trying to do so in StackSlotColoring, but it is a dead code because LiveSlots is a dead analysis doing nothing. I guess if it would succeed it will just happily remove yet unused slots we are reserving for reg spills and everything will crash. |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | I.e. look at the idot8s.ll from out tests. After selection we have: Frame Objects: fi#0: size=4, align=4, at location [SP] fi#1: size=4, align=4, at location [SP] Function Live Ins: $sgpr0_sgpr1 in %1 These are not used anywhere, but also there is no place to remove these. I can imagine we could also deadcode/fold some FI uses and leave behind unused slots. |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | And SDag does it to legalize bitcast from i32 to v4i8, extract elements and sign extend them, and then optimizes it away, leaving FI behind. |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | I think postprocessing this here is the wrong way to fix this. This also seems like a suspicious example, is there a better one? |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | That the only one I found. For the postproseccing, look at the StackSlotColoring::ScanForSpillSlotRefs(), it does the same thing and then StackSlotColoring::ColorSlots() removed unused FIs at the end. It just never works. Also take in mind that requesting scratch automatically limits occupancy even if unused. |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | In this situation, I think this should count as stack usage. It's an earlier problem that these stack objects remain. This shouldn't be needed for the purposes of getting correct stack register initialization |
llvm/lib/Target/AMDGPU/SIMachineFunctionInfo.cpp | ||
---|---|---|
499 | Then we will have regression: initialize rsrc where we don't do it now. |
Do we really need to compact these, or are we just not skipping these somewhere?