This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fine tune LDS misaligned access speed
ClosedPublic

Authored by rampitec on Apr 21 2022, 5:18 PM.


Event Timeline

rampitec created this revision. Apr 21 2022, 5:18 PM
Herald added a project: Restricted Project. Apr 21 2022, 5:18 PM
rampitec requested review of this revision. Apr 21 2022, 5:18 PM
Herald added a subscriber: wdng.
arsenm added inline comments. May 17 2022, 2:38 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1598

What do the numbers mean?

rampitec added inline comments. May 17 2022, 2:45 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1598

Roughly, 'it operates at a speed comparable to an N-bit wide load'. With full alignment, ds128 is slower than ds96, for example. If underaligned, the access is comparable in speed to a single dword access, which then means 32 < 128, and it is faster to issue a wide load regardless. 1 simply means 'slow, don't do it'. That is, when comparing an aligned load to a wider load that would no longer be aligned, the latter is slower.

But essentially it is just a rank; the values are not additive.
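
To make the 'rank' idea concrete, here is a minimal sketch of how such a value might be produced and compared. It is not the code from the patch: the function names (speedRank, isFasterThan), the parameters, and the exact alignment thresholds for each tier are assumptions made for illustration, based only on the description above. It also does not model the remark that a fully aligned ds128 is slower than ds96.

```cpp
// Hypothetical sketch; names and tiers are illustrative, not the patch's code.
// Returns a "speed rank" for an LDS access. The value is only meaningful
// when compared with another rank, never when summed.
unsigned speedRank(unsigned sizeInBits, unsigned alignBytes,
                   unsigned requiredAlignBytes) {
  if (alignBytes >= requiredAlignBytes)
    return sizeInBits; // sufficiently aligned: comparable to an N-bit load
  if (alignBytes < 4)
    return 32;         // assumed tier: comparable to a single dword access
  return 1;            // assumed tier: slow, don't do it
}

// A candidate lowering is preferred only if its rank is strictly higher.
bool isFasterThan(unsigned candidateRank, unsigned currentRank) {
  return candidateRank > currentRank;
}
```

Under these assumed tiers, a 128-bit access that requires 16-byte alignment ranks 128 when 16-byte aligned, 32 when only 1-byte aligned, and 1 when 8-byte aligned. An aligned 64-bit access (rank 64) therefore compares as faster than the wider 128-bit access at 8-byte alignment, which matches the 'aligned load versus wider, no longer aligned load' case in the comment.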

arsenm added inline comments. May 17 2022, 3:07 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1598

This needs to be commented

rampitec updated this revision to Diff 430197. May 17 2022, 3:17 PM
rampitec marked an inline comment as done.

Added comments about the values used.

rampitec marked an inline comment as done. May 17 2022, 3:17 PM

Ping. If there is no interest, I will drop it and the stack of changes above it.

arsenm accepted this revision. Jun 6 2022, 2:45 PM
This revision is now accepted and ready to land. Jun 6 2022, 2:45 PM
rampitec updated this revision to Diff 476604. Nov 18 2022, 2:28 PM

Rebased and updated one test where we started to split a slow store.

arsenm accepted this revision. Nov 18 2022, 3:49 PM
rampitec updated this revision to Diff 478399. Nov 28 2022, 3:25 PM

Added vectorization tests.

This revision was landed with ongoing or failed builds. Nov 28 2022, 4:12 PM
This revision was automatically updated to reflect the committed changes.