Page MenuHomePhabricator

[AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores
ClosedPublic

Authored by mbrkusanin on Jun 29 2020, 10:13 AM.

Details

Summary

Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.

Diff Detail

Event Timeline

mbrkusanin created this revision.Jun 29 2020, 10:13 AM
arsenm added inline comments.Jun 29 2020, 3:30 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1417

No else after return

1419

This is a windows driver bug and doesn't deserve mentioning here. We do not know what the host OS is

1420–1421

This should use an explicit subtarget feature check, not for the generation. The subtarget feature also theoretically captures this being a setting. We have hasUnalignedBufferAccess already (I believe the same control is used for DS and regular memory instructions, but I'm not sure)

1426

No else after return

nhaehnle added inline comments.Jul 1 2020, 6:18 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

I agree that it's a bug, but I find it reasonable to mention it here. I'd change the comment though to specifically call out that this should be considered a bug in the Windows KMD.

arsenm added inline comments.Jul 1 2020, 6:42 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

It's not specific to this instance though, this belongs with the place where we assume unaligned access for amdhsa

nhaehnle added inline comments.Jul 1 2020, 7:10 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

Okay, that's fair.

mbrkusanin updated this revision to Diff 275099.Jul 2 2020, 7:03 AM
mbrkusanin marked 4 inline comments as done.Jul 2 2020, 7:14 AM
mbrkusanin added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

Is the new wording ok?

1420–1421

Added hasUnalignedDSAccess.

nhaehnle added inline comments.Jul 6 2020, 7:51 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

No, it makes an incorrect statement. The alignment requirement only exists up to gfx8, so it should say "... require 16-byte alignment on gfx8 and older".

I also don't think we enforce dword alignment on gfx9+ anywhere for any other kind of memory access, so we shouldn't do it here. The mess on Windows is a problem, but if we really need to work around it, it should be a target "feature" (misfeature, really, but oh well).

arsenm added inline comments.Jul 6 2020, 1:01 PM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
701

I believe this is actually the same control as UnalignedBufferAccess, so a new feature isn't needed (but this needs double checking)

nhaehnle added inline comments.Jul 8 2020, 8:13 AM
llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h
701

I believe LDS only become fully-featured with gfx9.

mbrkusanin marked an inline comment as done.Jul 14 2020, 3:00 AM
mbrkusanin added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

So you're recommending two new features? One to allow misaligned DS instructions and one that requires dword alignment (currently hasUnalignedDSAccess but should be renamed in this case) alongside default requirement of 16 byte?

arsenm added inline comments.Jul 14 2020, 3:29 PM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1419

I think the features are "supports DS unaligned access" and "unaligned access is enabled". We have unaligned-buffer-access already, which should be moved into the target CPU definitions on targets where it works. We should add a new equivalent feature for DS, also added to the target devices. We also need another feature to enable the unaligned access corresponding to the driver setting.

I'm not sure the unaligned DS alignment helps with the b64/b96/b128 cases (I think they still require alignment even with unaligned support? but this is the kind of thing I always have a hard time figuring out from the documentation)

mbrkusanin edited the summary of this revision. (Show Details)

This is the way these requirements and options make sense to me and match the docs.
As for a feature that turns unalingned access for both buffer and ds instructions, I guess it could be something that just acts like -mattr=+unaligned-buffer-access,+unaligned-ds-access but I'm not sure if it's really necessary.

mbrkusanin added a comment.EditedJul 23 2020, 7:01 AM

Also added another child revision that selects these instructions for SDag the same way as for GlobalISel https://reviews.llvm.org/D84403 so the tests will make more sense.

arsenm added inline comments.Jul 23 2020, 1:53 PM
llvm/lib/Target/AMDGPU/AMDGPU.td
109

I believe this also controls vmem, so the "dynamic" control could be named something not-DS specific like "unaligned-access-mode" or something?

116–120

It would be better to invert this, so requiring aligned access it the default. unaligned-ds-access?

  • Addressed comments.

I'm still not sure if -unaligned-buffer-access and -unaligned-access-mode are supposed to be considered the same option.
If both are actually supposed to be controlled by SH_MEM_CONFIG.alignment_mode then they can be combined into one option as presented in: https://reviews.llvm.org/D84522 (in which case we can combine it with this patch, but for now I'll keep it separate).

mbrkusanin retitled this revision from AMDGPU: Fix alignment requirements for 96bit and 128bit local loads and stores to [AMDGPU] Fix alignment requirements for 96bit and 128bit local loads and stores.Jul 24 2020, 6:05 AM

Moved tests from child revision to here (mistakenly put them in wrong patch):

  • llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/merge-stores.ll
  • llvm/test/Transforms/LoadStoreVectorizer/AMDGPU/multiple_tails.ll
  • Rebase
  • Ping
arsenm accepted this revision.Aug 10 2020, 12:11 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1401–1402

We should do something to avoid every user needing to consider both ModeEnabled && ModeWorks. You can leave that for the later cleanup patch though; either you can move this combination check into a helper method, or somehow make the lack of hardware support imply the mode is disabled

This revision is now accepted and ready to land.Aug 10 2020, 12:11 PM
  • Rebase
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
1401–1402

This is handled in child revision.

arsenm requested changes to this revision.Aug 18 2020, 9:23 AM
arsenm added inline comments.
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local-128.mir
103

This lost the m0 initialization which is an issue

This revision now requires changes to proceed.Aug 18 2020, 9:23 AM
arsenm added inline comments.Aug 18 2020, 9:41 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-load-local-128.mir
103

Actually this failed to select either way. I suspect what is happening is this now hits the function isn't legal verifier check in a debug build. I suspect this test will fail in a release build that does not do the legality check. You either need to use the disable-gisel-legality-check flag, or split the tests to not select this on targets that don't use the instruction

Issue was in (load 16, align 4, addrspace 3) which should not be legal for gfx7 but because of -global-isel-abort=0 if it crashes it would just give the same MIR as input.
I've changed those to align 8. D81638 will update it again to pick DS_READ_B128 or DS_READ2_B64.

Also there was only run line for GFX7 but multiple checks lines. I added a new run line and removed redundant ones.

arsenm accepted this revision.Aug 20 2020, 12:52 PM
This revision is now accepted and ready to land.Aug 20 2020, 12:52 PM