Buffer_load does unsigned offset calculations. Don't fold
operands of a 32-bit add that are likely to cause unsigned add
overflow (the common case is when one of the operands is negative).
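For illustration, a minimal standalone sketch (hypothetical helper and names, not code from the patch) of the rule the summary describes: when matching base + immediate for a buffer offset, refuse to fold an immediate whose sign bit is set, since the hardware computes the offset with unsigned arithmetic.

```cpp
#include <cstdint>
#include <optional>

struct BaseAndOffset {
  uint32_t Base;   // stands in for the base register operand
  uint32_t Offset; // immediate folded into the instruction
};

// Hypothetical helper: split `Base + Imm` only when folding is safe.
static std::optional<BaseAndOffset> matchBasePlusOffset(uint32_t Base,
                                                        uint32_t Imm) {
  // A set sign bit usually means a negative offset in the IR; folding
  // it would wrap the unsigned offset calculation, so keep the add.
  if (static_cast<int32_t>(Imm) < 0)
    return std::nullopt;
  return BaseAndOffset{Base, Imm};
}
```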
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.mir
Line 51: Can you pre-commit this test so we can see the diff?
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: I am really not sure about this change. I think Reg is always a 32-bit value here (perhaps we should assert this), so any G_ADD that we decompose will be a 32-bit add, so using unsigned (or perhaps int) for the offset makes sense.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: I think the hardware zero-extends each component of the address from 32 to 64 bits before adding them together. So the question here is: does zext(add(x, y)) == add(zext(x), zext(y))? The answer is "only if add(x, y) has no unsigned overflow". So I don't think "Offset >= 0" is exactly the right condition here, but it does fix the common case of a small negative immediate offset, so I'm happy with this as a short-term fix.
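A quick standalone check of that identity with assumed values: zext(add(x, y)) and add(zext(x), zext(y)) diverge exactly when the 32-bit add overflows.

```cpp
#include <cassert>
#include <cstdint>

int main() {
  uint32_t X = 16;
  uint32_t Y = static_cast<uint32_t>(-4); // 0xFFFFFFFC, a small negative immediate

  // zext(add(x, y)): the 32-bit add wraps to 12, then zero-extends.
  uint64_t ZextOfAdd = static_cast<uint64_t>(X + Y); // 12

  // add(zext(x), zext(y)): what the hardware computes when it
  // zero-extends each 32-bit address component before adding.
  uint64_t AddOfZext = uint64_t(X) + uint64_t(Y);    // 0x10000000C

  assert(ZextOfAdd != AddOfZext); // the add overflowed, so they differ
}
```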
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: There is range checking for the offset, so the container for the value should not make a big difference. Choosing int or unsigned brings ambiguity for 32-bit values with a 1 in the highest bit (I would assume it is more probably a negative offset than a large positive offset, so int could work). With int64 we know whether the value was positive or negative in the IR. Also see (1) and (2) below:
Line 28: (2) while we use getZExtValue here. Is this intended?
Line 36: (1) m_ICst uses getSExtValue() on APInt.
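To make the (1)/(2) distinction concrete, a small sketch (assumed constant, not from the patch) of how the two APInt accessors extend the same 32-bit value:

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>

void extensionDemo() {
  // A 32-bit constant with the sign bit set, e.g. -16 as written in IR.
  llvm::APInt C(32, static_cast<uint64_t>(-16), /*isSigned=*/true);

  int64_t Sext = C.getSExtValue();  // sign-extends: -16
  uint64_t Zext = C.getZExtValue(); // zero-extends: 0xFFFFFFF0 = 4294967280

  assert(Sext == -16 && Zext == UINT64_C(0xFFFFFFF0));
}
```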
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: No, it's just a 32-bit value, in IR and in MIR. You don't get any extra information by returning int64_t.
Line 28: getSExtValue vs getZExtValue affects how the 32-bit value is extended to int64_t. But if you make this function return a 32-bit result, then that doesn't matter.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: Don't we have the "sign bit is zero" check?
llvm/lib/Target/AMDGPU/AMDGPUGlobalISelUtils.cpp
Line 16: I meant that APInt and int64_t (used by G_CONSTANT) keep sign information, so we know what the intended use of that 32-bit value was.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: Not sure what you mean. Are you suggesting only decomposing the ADD if we know the sign bit is zero in both operands? That would be safe, but I'm not sure how often that test would succeed.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1377: Yes, this is what we do in a few places in the DAG already.
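For reference, a hedged sketch of that DAG-side pattern (hypothetical function and operand names N0/N1; SelectionDAG::SignBitIsZero is the existing query):

```cpp
#include "llvm/CodeGen/SelectionDAG.h"

// Only decompose the ADD when the sign bit of both 32-bit operands is
// known zero: then each operand is < 2^31, the sum is < 2^32, and the
// unsigned add cannot overflow, so zero-extending each component is safe.
static bool canDecomposeAdd(llvm::SelectionDAG &DAG, llvm::SDValue N0,
                            llvm::SDValue N1) {
  return DAG.SignBitIsZero(N0) && DAG.SignBitIsZero(N1);
}
```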
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1324: Shouldn't use unsigned here.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1324: I don't think we need this function at all. Surely (int)Val >= 0 is clear enough?
Sorry about the delay. Use (int)Offset to check for a one in the most significant bit, i.e. potential unsigned overflow. The DAG equivalent in SITargetLowering::setBufferOffsets also uses int for the offset.
LGTM. It is correct in more cases than it was before, and it mostly matches what the DAG version does.
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1375: The DAG version of setBufferOffsets doesn't have this case. Is it done later on, when the register classes are known?
llvm/lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
Line 1375: SDAG just gives up and uses the add as the base (one extra instruction compared to what GlobalISel does). See @s_buffer_load_f32_offset_add_vgpr_sgpr in GlobalISel/llvm.amdgcn.s.buffer.load.ll for an example.