This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Fix 8-byte aligned, 96-bit scalar loads
ClosedPublic

Authored by arsenm on Jun 9 2020, 6:25 PM.

Details

Summary

These are legal since we can do a 96-bit load on some subtargets, but
this is only for vector loads. If we can't widen the load, it needs to
be broken down once known scalar. For 16-byte alignment, widen to a
128-bit load.

Diff Detail

Event Timeline

arsenm created this revision.Jun 9 2020, 6:25 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2020, 6:25 PM

What about DS_READ? Following are also broken:

define <3 x i32> @v_load_lds_v3i32_align1(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 1
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align2(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 2
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align4(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 4
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align8(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 8
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align16(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 16
  ret <3 x i32> %load
}

It can be done with the same code in applyMappingLoad just by checking:

 if (PtrBank == &AMDGPU::VGPRRegBank) {
    if (LoadSize != 96)
      return false;

    Register PtrReg = MI.getOperand(1).getReg();
    const LLT PtrTy = MRI.getType(PtrReg);

    if (PtrTy.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
     ...
  }
}

and using VGPR bank in

ApplyRegBankMapping O(*this, MRI, &AMDGPU::VGPRRegBank);

But should it select ds_read_b96 instead? SDag does not select it either.

What about DS_READ? Following are also broken:

define <3 x i32> @v_load_lds_v3i32_align1(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 1
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align2(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 2
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align4(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 4
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align8(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 8
  ret <3 x i32> %load
}

define <3 x i32> @v_load_lds_v3i32_align16(<3 x i32> addrspace(3)* %ptr) {
  %load = load <3 x i32>, <3 x i32> addrspace(3)* %ptr, align 16
  ret <3 x i32> %load
}

It can be done with the same code in applyMappingLoad just by checking:

 if (PtrBank == &AMDGPU::VGPRRegBank) {
    if (LoadSize != 96)
      return false;

    Register PtrReg = MI.getOperand(1).getReg();
    const LLT PtrTy = MRI.getType(PtrReg);

    if (PtrTy.getAddressSpace() == AMDGPUAS::LOCAL_ADDRESS) {
     ...
  }
}

and using VGPR bank in

ApplyRegBankMapping O(*this, MRI, &AMDGPU::VGPRRegBank);

But should it select ds_read_b96 instead? SDag does not select it either.

This case should be handled in the regular legalizer since it isn't contextually dependent on whether the pointer is SGPR or VGPR

foad accepted this revision.Jun 15 2020, 7:14 AM

I'm not much of an expert here but this looks reasonable to me.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1114

All you need is assert(FirstSize % EltSize == 0), no parens around %. The other part of the condition is obviously implied.

1127–1128

Perhaps assert that EltTy is a power of 2 size (or equivalently that the divsion 128 / EltTy.getSizeInBits() is exact) to guard against being called with weird types like v1s96 or v2s48?

This revision is now accepted and ready to land.Jun 15 2020, 7:14 AM
arsenm closed this revision.Jun 15 2020, 8:33 AM
arsenm marked 2 inline comments as done.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1127–1128

Thankfully 1 element vectors don't exist in globalisel, but I would expect this to handle v2s48 (although one should never reach here I guess)