This fixes GPU hangs with OpenGL bindless handle arithmetic.
|1362–1369 ↗||(On Diff #162317)|
The question is, why the check for 256 * 1024 in the first place? That seems rather fishy. I guess it could be made to work if we define that the 32-bit address space only goes up to 4GB - 256KB, but then that should be added to the address space documentation.
The other issue is that I think some hardware generations have the IMM offset as a *signed* byte offset. At least for gfx9 that's what the docs say. So then the check should depend on the hardware generation.
Right, that matches my understanding: the SMRD/SMEM instruction does a 64-bit addition, so if the 32-bit (add X, imm) were to have an unsigned wraparound, moving it into the immediate of the SMRD/SMEM would remove the wraparound and therefore be incorrect.
Conversely, if the 32-bit add is nuw, then putting the immediate into the SMEM is okay. So we should definitely add this as a condition that allows the move.
The question is: what to do about the 256 KiB condition? It is rather hacky, so ideally we'd do without, but that requires us to figure out how earlier passes can be coaxed into generating adds with nuw set (or live with less efficient code, which would suck). This needs looking at the getelementptr lowering.
Looking at SelectionDAGBuilder::visitGetElementPtr, nuw is set under certain conditions for inbounds getelementptr. I suspect we should be able to make most GEPs inbounds in Mesa - it just means that we never, not even temporarily, try to take addresses outside of properly allocated memory objects (buffers, arrays of descriptors).
Would the combination of:
- check NUW here
- create inbounds GEP
make good use of SMEM/SMRD immediates?
We can ignore old Mesa + new LLVM, because LLVM 7 is the first release to have 32-bit pointers, and I think we can fix that before release.
For internal driver additions, e.g. offset + small constant, the driver can use NUW or NSW. For app additions, we would just use normal Add. As long as NUW or NSW is preserved, we should be fine.
@arsenm Where do you have the patches that preserve NUW?
The code in SelectionDAGBuilder::visitGetElementPtr is unsafe for our case, because it doesn't know that the addition of 32-bit addresses is performed in 64 bits. Even x+4 can overflow in 32 bits but not 64 bits. The GEP can be "inbounds", but it doesn't change anything. However, we can hackishly use the inbounds flag to mean that the addition is safe, because GEP with inbounds and offset <= INT_MAX is converted to "add nuw".
Yes, x+4 can overflow, but 'inbounds' is basically a promise that that won't happen. I'm pretty sure there's nothing hackish or unsafe about it.
First, 'inbounds' on getelementptr means (according to LangRef): interpreting all array offsets as signed integers and performing all arithmetic with infinite precision (i.e., no wraparound), the resulting pointer stays in bounds of the object pointed to by the base pointer at each step.
Now, visitGetElementPtr translates inbounds getelementptr on addrspace(6) with a non-negative array index into an add nuw i32. This is correct, because memory objects themselves never wrap around (there is no memory object that starts below 2^32 and ends above 0) and so the in bounds promise means that there is indeed no unsigned wrapping when adding as 32-bits.
Second, when we encounter (load (add nuw i32 reg, const)), and we put the const into the SMRD, then yes, the addition will be performed in 64 bits. But that doesn't change the wrapping behavior because there was no wrapping to begin with.
I think this change is good to go as-is.