This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Reduce stack pointer alignment
Needs ReviewPublic

Authored by Flakebi on Sep 15 2020, 8:35 AM.

Details

Summary

HSA still needs this for OpenCL but it is not needed for other
subtargets.

Diff Detail

Event Timeline

Flakebi created this revision.Sep 15 2020, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 15 2020, 8:35 AM
Flakebi requested review of this revision.Sep 15 2020, 8:35 AM
arsenm added inline comments.Sep 15 2020, 8:38 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
944–946

This isn't a real ABI requirement and I don't think should vary based on the triple. Why do you want to re-reduce this?

Flakebi added inline comments.Sep 15 2020, 8:48 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
944–946

As far as I understand it, an alignment of 16 is needed for OpenCL but otherwise not required (as real memory addresses for lanes are aligned to 4 only anyway).
Forcing an alignment of 16 means that spilling one vgpr will reserve 1 kiB of scratch memory where only 256 Byte are needed.
If a single sgpr needs to be spilled to a vgpr (which is then in turn spilled to scratch), we need 1 kiB of memory for a 4 Byte value, which seems quite a lot to me.

Is there a downside of requiring an alignment of 4? Are there cases outside OpenCL where a higher alignment is required and the stack would need to be realigned?

arsenm added inline comments.Sep 15 2020, 8:51 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
944–946

It's not needed, it just introduces stack realignment if you have any stack objects with a higher alignment. Values up to 16-bytes are common (and 8 are very common). This isn't a property of the source language since the target ABI still has higher alignments for common types.

arsenm added inline comments.Sep 15 2020, 8:53 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
944–946

I have been thinking we should add an optimization pass to reduce the alignment of allocas if the address isn't captured

arsenm added inline comments.Sep 16 2020, 10:24 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
944–946

To be clear, reducing this back to 4 (I had it at 4 originally) may be a good plan with some optimizations, but I don't think this should vary based on the triple

Flakebi added inline comments.Oct 2 2020, 2:31 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
944–946

Thanks for the helpful explanation. I might come back to this in a while.

arsenm resigned from this revision.Nov 18 2022, 5:11 PM

I think this should be abandoned. Requiring 16-byte stack alignment doesn't require 16-byte alignment for a VGPR spill. The spill alignment requirement is only 4. This mostly matters for when stack realignment is necessary. The stack waste comes only in call frame contexts

Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2022, 5:11 PM