Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.
Details
Diff Detail
Unit Tests
Event Timeline
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1375 | No else after return | |
1377 | This is a windows driver bug and doesn't deserve mentioning here. We do not know what the host OS is | |
1378–1379 | This should use an explicit subtarget feature check, not for the generation. The subtarget feature also theoretically captures this being a setting. We have hasUnalignedBufferAccess already (I believe the same control is used for DS and regular memory instructions, but I'm not sure) | |
1384 | No else after return |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1377 | I agree that it's a bug, but I find it reasonable to mention it here. I'd change the comment though to specifically call out that this should be considered a bug in the Windows KMD. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1377 | It's not specific to this instance though, this belongs with the place where we assume unaligned access for amdhsa |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1377 | Okay, that's fair. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1377 | No, it makes an incorrect statement. The alignment requirement only exists up to gfx8, so it should say "... require 16-byte alignment on gfx8 and older". I also don't think we enforce dword alignment on gfx9+ anywhere for any other kind of memory access, so we shouldn't do it here. The mess on Windows is a problem, but if we really need to work around it, it should be a target "feature" (misfeature, really, but oh well). |
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
700 ↗ | (On Diff #275099) | I believe this is actually the same control as UnalignedBufferAccess, so a new feature isn't needed (but this needs double checking) |
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h | ||
---|---|---|
700 ↗ | (On Diff #275099) | I believe LDS only become fully-featured with gfx9. |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1377 | So you're recommending two new features? One to allow misaligned DS instructions and one that requires dword alignment (currently hasUnalignedDSAccess but should be renamed in this case) alongside default requirement of 16 byte? |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1377 | I think the features are "supports DS unaligned access" and "unaligned access is enabled". We have unaligned-buffer-access already, which should be moved into the target CPU definitions on targets where it works. We should add a new equivalent feature for DS, also added to the target devices. We also need another feature to enable the unaligned access corresponding to the driver setting. I'm not sure the unaligned DS alignment helps with the b64/b96/b128 cases (I think they still require alignment even with unaligned support? but this is the kind of thing I always have a hard time figuring out from the documentation) |
This is the way these requirements and options make sense to me and match the docs.
As for a feature that turns unalingned access for both buffer and ds instructions, I guess it could be something that just acts like -mattr=+unaligned-buffer-access,+unaligned-ds-access but I'm not sure if it's really necessary.
Also added another child revision that selects these instructions for SDag the same way as for GlobalISel https://reviews.llvm.org/D84403 so the tests will make more sense.
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
109 ↗ | (On Diff #280093) | I believe this also controls vmem, so the "dynamic" control could be named something not-DS specific like "unaligned-access-mode" or something? |
116–120 ↗ | (On Diff #280093) | It would be better to invert this, so requiring aligned access it the default. unaligned-ds-access? |
- Addressed comments.
I'm still not sure if -unaligned-buffer-access and -unaligned-access-mode are supposed to be considered the same option.
If both are actually supposed to be controlled by SH_MEM_CONFIG.alignment_mode then they can be combined into one option as presented in: https://reviews.llvm.org/D84522 (in which case we can combine it with this patch, but for now I'll keep it separate).
Moved tests from child revision to here (mistakenly put them in wrong patch):
- llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
- llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/multiple_tails.ll
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1367–1368 | We should do something to avoid every user needing to consider both ModeEnabled && ModeWorks. You can leave that for the later cleanup patch though; either you can move this combination check into a helper method, or somehow make the lack of hardware support imply the mode is disabled |
- Rebase
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
1367–1368 | This is handled in child revision. |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local-128.mir | ||
---|---|---|
103 | This lost the m0 initialization which is an issue |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local-128.mir | ||
---|---|---|
103 | Actually this failed to select either way. I suspect what is happening is this now hits the function isn't legal verifier check in a debug build. I suspect this test will fail in a release build that does not do the legality check. You either need to use the disable-gisel-legality-check flag, or split the tests to not select this on targets that don't use the instruction |
Issue was in (load 16, align 4, addrspace 3) which should not be legal for gfx7 but because of -global-isel-abort=0 if it crashes it would just give the same MIR as input.
I've changed those to align 8. D81638 will update it again to pick DS_READ_B128 or DS_READ2_B64.
Also there was only run line for GFX7 but multiple checks lines. I added a new run line and removed redundant ones.
We should do something to avoid every user needing to consider both ModeEnabled && ModeWorks. You can leave that for the later cleanup patch though; either you can move this combination check into a helper method, or somehow make the lack of hardware support imply the mode is disabled