These are legal since we can do a 96-bit load on some subtargets, but
this is only for vector loads. If we can't widen the load, it needs to
be broken down once known scalar. For 16-byte alignment, widen to a
128-bit load.
Details
Diff Detail
Event Timeline
What about DS_READ? Following are also broken:
define <3 x i32> @v_load_lds_v3i32_align1(<3 x i32> addrspace(3)* %ptr) { %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 1 ret <3 x i32> %load } define <3 x i32> @v_load_lds_v3i32_align2(<3 x i32> addrspace(3)* %ptr) { %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 2 ret <3 x i32> %load } define <3 x i32> @v_load_lds_v3i32_align4(<3 x i32> addrspace(3)* %ptr) { %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 4 ret <3 x i32> %load } define <3 x i32> @v_load_lds_v3i32_align8(<3 x i32> addrspace(3)* %ptr) { %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 8 ret <3 x i32> %load } define <3 x i32> @v_load_lds_v3i32_align16(<3 x i32> addrspace(3)* %ptr) { %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 16 ret <3 x i32> %load }
It can be done with the same code in applyMappingLoad just by checking:
if (PtrBank == &AMDGPU::VGPRRegBank) { if (LoadSize != 96) return false; Register PtrReg = MI.getOperand(1).getReg(); const LLT PtrTy = MRI.getType(PtrReg); if (PtrTy.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) { ... } }
and using VGPR bank in
ApplyRegBankMapping O(*this, MRI, &AMDGPU::VGPRRegBank);
But should it select ds_read_b96 instead? SDag does not select it either.
This case should be handled in the regular legalizer since it isn't contextually dependent on whether the pointer is SGPR or VGPR
I'm not much of an expert here but this looks reasonable to me.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1114 | All you need is assert(FirstSize % EltSize == 0), no parens around %. The other part of the condition is obviously implied. | |
1127–1128 | Perhaps assert that EltTy is a power of 2 size (or equivalently that the divsion 128 / EltTy.getSizeInBits() is exact) to guard against being called with weird types like v1s96 or v2s48? |
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp | ||
---|---|---|
1127–1128 | Thankfully 1 element vectors don't exist in globalisel, but I would expect this to handle v2s48 (although one should never reach here I guess) |
All you need is assert(FirstSize % EltSize == 0), no parens around %. The other part of the condition is obviously implied.