This is an archive of the discontinued LLVM Phabricator instance.

[amdgpu] Increase alignment of all LDS variables
ClosedPublic

Authored by JonChesterfield on Dec 9 2021, 5:55 PM.

Details

Summary

Currently the superalign option only increases the alignment of
variables that are moved into the module.lds block. Change that to all LDS
variables. Also only increase the alignment once, instead of once per function.

Diff Detail

Event Timeline

JonChesterfield created this revision.Dec 9 2021, 5:55 PM
JonChesterfield requested review of this revision.Dec 9 2021, 5:55 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2021, 5:55 PM
  • add missing chunk to diff

This will increase the LDS used by some cases (specifically, padding around variables that aren't moved into the module.lds block). I assume that's expected to be a net win. Internal CI running at 611966.

llvm/lib/Target/AMDGPU/AMDGPULowerModuleLDSPass.cpp
229

Somewhat misleading diff. processUsedLDS has had the if (SuperAlignLDSGlobals) {} block copied out into a helper function but is otherwise unchanged.

llvm/test/CodeGen/AMDGPU/lower-kernel-lds-super-align.ll
1–2

preference for explicit here, means the test keeps working if we change the default on the pass

arsenm added a comment.Dec 9 2021, 6:23 PM

This will increase the LDS used by some cases (specifically, padding around variables that aren't moved into the module.lds block). I assume that's expected to be a net win. Internal CI running at 611966.

If this is supposed to be the guaranteed to work in all cases pass, isn't that an issue? Also it's probably not a net win if the increase pushes over an occupancy window

I thought overaligning these would be a loss in general when it was proposed (hence the flag) but haven't done the benchmarking. Kind of taking it on faith that someone else has since the feature was implemented. epsdb passed fwiw.

To be more precise, I'm not convinced this makes much real world difference as most variables are naturally aligned anyway but it looks like a mistake that we only change the subset of LDS variables that are already slow to access and allocated in many places.

Also totally happy with deleting this feature or defaulting it to off. The point of this exercise is to get it out of the 'lowering' LDS pass and into a to be written 'optimisation' LDS pass, where this is the (minor?) semantic half of that change and moving it to a separate pass would then be non-functional.

This revision is now accepted and ready to land.Dec 10 2021, 11:13 AM

Cool. If it hits us in a benchmark there's the chicken switch.

This revision was landed with ongoing or failed builds.Dec 12 2021, 11:31 AM
This revision was automatically updated to reflect the committed changes.