This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Handle 32-bit address wraparounds for SMRD opcodes
ClosedPublic

Authored by mareko on Aug 23 2018, 6:08 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

mareko created this revision.Aug 23 2018, 6:08 PM
arsenm added inline comments.Aug 23 2018, 11:11 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗(On Diff #162317)

Can we check NUW flags on the add instead?

test/CodeGen/AMDGPU/constant-address-space-32bit.ll
289 ↗(On Diff #162317)

The constant should canonically be i32

mareko added inline comments.Aug 24 2018, 9:19 AM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗(On Diff #162317)

Yes, but where would I check for 256 * 1024?

nhaehnle added inline comments.Aug 24 2018, 10:00 AM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗(On Diff #162317)

The question is, why the check for 256 * 1024 in the first place? That seems rather fishy. I guess it could be made to work if we define that the 32-bit address space only goes up to 4GB - 256KB, but then that should be added to the address space documentation.

The other issue is that I think some hardware generations have the IMM offset as a *signed* byte offset. At least for gfx9 that's what the docs say. So then the check should depend on the hardware generation.

mareko added inline comments.Aug 24 2018, 10:26 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1362–1369 ↗(On Diff #162317)

arsenm asked me to check for NUW, which would disable IMM offset for all current users. Do you have a better idea?

I'm not 100% sure the NUW thing works here, because the actual instruction has the conversion to 64-bit before the add IIRC

Right, that matches my understanding: the SMRD/SMEM instruction does a 64-bit addition, so if the 32-bit (add X, imm) were to have an unsigned wraparound, moving it into the immediate of the SMRD/SMEM would remove the wraparound and therefore be incorrect.

Conversely, if the 32-bit add is nuw, then putting the immediate into the SMEM is okay. So we should definitely add this as a condition that allows the move.

The question is: what to do about the 256 KiB condition? It is rather hacky, so ideally we'd do without, but that requires us to figure out how earlier passes can be coaxed into generating adds with nuw set (or live with less efficient code, which would suck). This needs looking at the getelementptr lowering.

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Would the combination of:

  • check NUW here
  • create inbounds GEP

make good use of SMEM/SMRD immediates?

The DAG isn't good about preserving NUW. I had some old patches I never got reviews to try to improve this

That would be good, yeah. Note here it's not so much a case of preserving NUW as it is of deducing as much NUW as possible from getelementptrs.

We can ignore old Mesa + new LLVM, because LLVM 7 is the first release to have 32-bit pointers, and I think we can fix that before release.

For internal driver additions, e.g. offset + small constant, the driver can use NUW or NSW. For app additions, we would just use normal Add. As long as NUW or NSW is preserved, we should be fine.

That sounds like the way to go is testing just the NUW bit here in SelectSMRDOffset?

@arsenm Where do you have the patches that preserve NUW?

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Would the combination of:

  • check NUW here
  • create inbounds GEP

make good use of SMEM/SMRD immediates?

The code in SelectionDAGBuilder::visitGetElementPtr is unsafe for our case, because it doesn't know that the addition of 32-bit addresses is performed in 64 bits. Even x+4 can overflow in 32 bits but not 64 bits. The GEP can be "inbounds", but it doesn't change anything. However, we can hackishly use the inbounds flag to mean that the addition is safe, because GEP with inbounds and offset <= INT_MAX is converted to "add nuw".

mareko updated this revision to Diff 163011.Aug 28 2018, 11:48 PM

This fixes GPU hangs with OpenGL bindless handle arithmetic.

nhaehnle accepted this revision.Aug 29 2018, 6:39 AM

Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).

Would the combination of:

  • check NUW here
  • create inbounds GEP

make good use of SMEM/SMRD immediates?

The code in SelectionDAGBuilder::visitGetElementPtr is unsafe for our case, because it doesn't know that the addition of 32-bit addresses is performed in 64 bits. Even x+4 can overflow in 32 bits but not 64 bits. The GEP can be "inbounds", but it doesn't change anything. However, we can hackishly use the inbounds flag to mean that the addition is safe, because GEP with inbounds and offset <= INT_MAX is converted to "add nuw".

Yes, x+4 can overflow, but 'inbounds' is basically a promise that that won't happen. I'm pretty sure there's nothing hackish or unsafe about it.

First, 'inbounds' on getelementptr means (according to LangRef): interpreting all array offsets as signed integers and performing all arithmetic with infinite precision (i.e., no wraparound), the resulting pointer stays in bounds of the object pointed to by the base pointer at each step.

Now, visitGetElementPtr translates inbounds getelementptr on addrspace(6) with a non-negative array index into an add nuw i32. This is correct, because memory objects themselves never wrap around (there is no memory object that starts below 2^32 and ends above 0) and so the in bounds promise means that there is indeed no unsigned wrapping when adding as 32-bits.

Second, when we encounter (load (add nuw i32 reg, const)), and we put the const into the SMRD, then yes, the addition will be performed in 64 bits. But that doesn't change the wrapping behavior because there was no wrapping to begin with.

I think this change is good to go as-is.

This revision is now accepted and ready to land.Aug 29 2018, 6:39 AM
This revision was automatically updated to reflect the committed changes.