Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasNoUnalignedDSAccessSupport() can be
now used to quickly check both.
Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPU.td | ||
---|---|---|
90 | I think we still need something to indicate the hardware supports the unaligned global/buffer memory vs. the driver set the config register to use it. | |
93 | Hardware support for unaligned global loads and stores | |
118 | Maybe change "Support" to "Enable" since this is the "dynamic" control. Also could make it clearer that this doesn't allow these on targets that don't support that specific memory space. Should also probably move this down to the section under // Subtarget Features (options and debugging) | |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
1452 | Probably needs to logically be hasUnalignedAccessMode && hardwareSupportsUnalignedGlobal (unless we somehow force off the dynamic mode feature if the hardware doesn't support it). maybe this should follow the model of ECC/XNACK |
- Addressed comments.
Reworked features to use ECC/XNACK model but this way FeatureDoesNotSupportUnalignedBufferAccess and FeatureDoesNotSupportUnalignedDSAccess seem kind of redundant (and long). It does make it more clear what each feature is supposed to do.
We are about to change the xnack and sramecc subtarget features to remove the "DoesNot" so I would suggest that FeatureDoesNotSupportUnalignedBufferAccess changes to FeatureSupportUnalignedBufferAccess.
Thanks for updating the subtarget features. I defer to @arsenm for the rest of the review.
mbrkusanin added a reverting change: rG8b08fa0103c8: Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access".
Revert "[AMDGPU] Reorganize GCN subtarget features for unaligned access"
This reverts commit f5cd7ec9f3fc969ff5e1feed961996844333de3b.
Certain rocPRIM/rocThrust/hipCUB tests were failing because of this change.
I fixed the underlying problems with D90607 and reapplied this patch in 830ed64ccd28bfa72fe93636e165f4e15a1d3af9.
I think we still need something to indicate the hardware supports the unaligned global/buffer memory vs. the driver set the config register to use it.