Move feature/bug checks into a single place,
allowsMisalignedMemoryAccessesImpl.
This is mostly NFCI except for the order of selection in a couple of places.
A separate change may be needed to stop lying about Fast.
Differential D123343
[AMDGPU] Refactor LDS alignment checks.
Closed. Public. Authored by rampitec on Apr 7 2022, 3:50 PM.
Event Timeline

I am not sure we really want to tell the truth about 'Fast' here. If we report that a DS read misaligned by 1 byte is slow, the vectorizer will not combine 2 of them, and we will get 2 separate ds_read_b32 instead of one ds_read2_b32. It is slow, but the ds_read2_b32 is still faster than 2 separate instructions equally misaligned. This is what happens then: https://reviews.llvm.org/differential/diff/421361/

rampitec added a child revision: D123524: [AMDGPU] Split unaligned 3 DWORD DS operations. Apr 11 2022, 10:48 AM
This revision is now accepted and ready to land. Apr 11 2022, 5:43 PM
This revision was landed with ongoing or failed builds. Apr 12 2022, 7:49 AM
Closed by commit rGb8e09f15539a: [AMDGPU] Refactor LDS alignment checks. (authored by rampitec).
This revision was automatically updated to reflect the committed changes.
Maybe LoadStoreVectorizer should be changed to create slow instructions, if the instructions being combined were slow already.
I have a workaround for this in D123634, but in general I do not think 'fast' or 'slow' is the right measure. Something is not fast or slow, but faster or slower than something else. This would be a big change though.
rampitec added inline comments.
Speaking of fast and slow, this might be modeled with an unsigned speed rank. The direct translation of the current bool IsFast would be to 0 and 1. A target may then use more ranks: the higher the number, the faster the access.
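The speed-rank idea can be sketched in a few lines. This is only an illustration of the proposal, not code from the patch: the names `SlowRank`, `FastRank`, `rankFromIsFast` and `shouldCombine` are hypothetical, and the rule "combine when the wide access is at least as fast as the narrow ones" is an assumption about how a client such as the vectorizer might use the rank.

```cpp
#include <cassert>

// Hypothetical rank scale: 0 and 1 mirror the two states of the current
// bool IsFast. A target could define additional, higher ranks.
constexpr unsigned SlowRank = 0;
constexpr unsigned FastRank = 1;

// Direct translation of the existing bool into a rank.
constexpr unsigned rankFromIsFast(bool IsFast) {
  return IsFast ? FastRank : SlowRank;
}

// Assumed client-side rule: a combined (wide) access is acceptable when it
// is at least as fast as the separate (narrow) accesses it replaces.
constexpr bool shouldCombine(unsigned WideRank, unsigned NarrowRank) {
  return WideRank >= NarrowRank;
}

// Small self-check of the behaviour described above.
inline void checkRankExamples() {
  // Two equally slow misaligned reads may still be combined.
  assert(shouldCombine(rankFromIsFast(false), rankFromIsFast(false)));
  // A slow wide access does not replace fast narrow accesses.
  assert(!shouldCombine(rankFromIsFast(false), rankFromIsFast(true)));
}
```

With a relative comparison like this, a misaligned wide read that is slow in absolute terms is still preferred over two equally misaligned narrow reads, which a single 'fast'/'slow' flag cannot express.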
Revision Contents
Diff 422232
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
llvm/test/CodeGen/AMDGPU/load-local-redundant-copies.ll
llvm/test/CodeGen/AMDGPU/store-local.128.ll
You don't need this - it is already handled by using PowerOf2Ceil to initialize RequiredAlignment.
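For reference, the rounding behaviour this comment relies on can be shown in a standalone sketch. `powerOf2Ceil` below is a local stand-in for `llvm::PowerOf2Ceil` so that the snippet compiles without LLVM headers, and `requiredAlignmentBytes` is a hypothetical helper that assumes the argument passed is the access size in bytes; the actual initialization in the patch may differ.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for llvm::PowerOf2Ceil: smallest power of two >= A, and 0 for A == 0.
inline uint64_t powerOf2Ceil(uint64_t A) {
  if (A == 0)
    return 0;
  --A;
  // Smear the highest set bit into all lower bit positions.
  A |= A >> 1;
  A |= A >> 2;
  A |= A >> 4;
  A |= A >> 8;
  A |= A >> 16;
  A |= A >> 32;
  return A + 1;
}

// Hypothetical helper: alignment in bytes for an access of SizeInBits,
// assuming the size in bytes is what gets rounded up.
inline uint64_t requiredAlignmentBytes(unsigned SizeInBits) {
  return powerOf2Ceil(SizeInBits / 8);
}

// Small self-check: a 96-bit (12-byte) access rounds up to 16 bytes.
inline void checkAlignmentExamples() {
  assert(requiredAlignmentBytes(96) == 16);
  assert(requiredAlignmentBytes(64) == 8);
}
```

Under that assumption, a size that is not a power of two is already rounded up when the alignment is initialized, so no separate adjustment is needed afterwards.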