This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Fix negative offset folding for buffer_load
ClosedPublic

Authored by Petar.Avramovic on Nov 12 2020, 4:50 AM.

Details

Summary

Buffer_load does unsigned offset calculations. Don't fold
operands of 32-bit add that are likely to cause unsigned add
overflow (common case is when one of the operands is negative).

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptNov 12 2020, 4:50 AM
Petar.Avramovic requested review of this revision.Nov 12 2020, 4:50 AM
foad added inline comments.Nov 12 2020, 5:28 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.mir
51

Can you pre-commit this test so we can see the diff?

Pre-commit the test.

foad added inline comments.Nov 12 2020, 6:32 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
16

I am really not sure about this change. I think Reg is always a 32-bit value here (perhaps we should assert this), so any G_ADD that we decompose will be a 32-bit add, so using unsigned (or perhaps int) for the offset makes sense.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1377

I think the hardware zero-extends each component of the address from 32 to 64 bits before adding them together. So the question here is: does zext(add(x, y)) == add(zext(x), zext(y))? The answer is "only if add(x, y) has no unsigned overflow".

So I don't think "Offset >= 0" is exactly the right condition here, but it does fix the common case of a small negative immediate offset, so I'm happy with this as a short term fix.

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
16

There is range checking for the offset so the container for value should not make big difference. Choosing int or unsigned brings ambiguity for '32-bit values with 1 at highest bit' (I would assume that it is most probably negative offset then large positive offset so int could work). With int64 we know if value was positive or negative in ir.

Also see below (1), (2):

28

(2) while we use getZExtValue here. Is this intended?

36

(1) m_ICst uses getSExtValue() on APInt

foad added inline comments.Nov 12 2020, 8:52 AM
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
16

With int64 we know if value was positive or negative in ir.

No, it's just a 32-bit value, in IR and in MIR. You don't get any extra information by returning int64_t.

28

getSExtValue vs getZExtValue affects how the 32-bit value is extended to int64_t. But if you make this function return a 32-bit result, then that doesn't matter.

arsenm added inline comments.Nov 12 2020, 9:06 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1377

Don't we have the sign bit is zero check?

llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
16

I meant that APInt and int64_t (used by G_CONSTANT) keep sign info and we know what was intended use of that 32 bit value.
Given 32 bit value 0xFFFFFFFC, we can't handle it here if it was used as negative offset (-4 as int64_t) since soffset behaves as unsigned in address calculation. However, if that is a large positive(unsigned) offset (4294967292 as int64_t) folding this into soffset behaves correctly.

Petar.Avramovic edited the summary of this revision. (Show Details)
foad added inline comments.Nov 19 2020, 6:47 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1377

Not sure what you mean. Are you suggesting only decomposing the ADD if we know the sign bit is zero in both operands? That would be safe, but I'm not sure how often that test would succeed.

arsenm added inline comments.Nov 19 2020, 6:59 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1377

Yes, this is what we do in a few places in the DAG already

arsenm added inline comments.Dec 14 2020, 7:28 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1324

Shouldn't use unsigned here

foad added inline comments.Dec 14 2020, 7:33 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1324

I don't think we need this function at all. Surely (int)Val >= 0 is clear enough?

Sorry about the delay. Use (int)Offset to check for one in most significant bit and potential unsigned overflow. DAG equivalent in SITargetLowering::setBufferOffsets also uses int for offset.

foad accepted this revision.Dec 18 2020, 6:04 AM

LGTM. It is correct in more cases than it was before, and it mostly matches what the DAG version does.

llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1375

The DAG version of setBufferOffsets doesn't have this case. Is it done later on when the register classes are known?

This revision is now accepted and ready to land.Dec 18 2020, 6:04 AM
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1375

SDAG just gives up and uses add as base (1 extra instruction compared to what global-isel does). See @s_buffer_load_f32_offset_add_vgpr_sgpr in GlobalISel/llvm.amdgcn.s.buffer.load.ll for example.