There's a desire to move away from undef in LLVM. Currently we want to
have the addressspace(3) variables use poison instead.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can you elaborate on why this would be the right thing to do?
We generally do want to treat shared memory loads/stores as volatile and block optimizations on them, as shared memory is often modified from other threads. In that sense neither poison nor undef appear to reflect that. I guess we may need to freeze all values loaded from shared memory.
E.g. if we have code like this, use of 'x' is fine, even though to LLVM it would look as if we always read uninitialized value from x.
__shared__ int x; __global__ void kernel() { if (threadIdx.x == 0) { x = compute(); } syncthreads(); if (threadIdx.x != 0) { if (x != 42) // LLVM should not be allowed to assume any specific value for x, and must do the actual comparison here. do_something(x); } }
This was in reponse to https://reviews.llvm.org/D147572 suggesting that we use poison in general over undef. I feel the semantics are roughly equivalent here, so I don't think this patch changes any existing behavior. All of these values get addrspacecasted before they're used, but as far as I know those simply propagate poison values and don't imply any sort of volatility.
I'm somewhat concerned about this property of poison: "It is correct to replace a poison value with an undef value or any value of the type."
The way I read it, poison *may* make it possible to assume whatever value is convenient for optimization passes, which is not what we want for the values read from shared memory.
Your example is not impacted by the Undef -> Poison change. What you describe is either a valid transformation for both, or invalid for both.
The example from before cannot be optimized as is, but if you make x internal, we already propagate the value just fine (with undef as initializer):
https://godbolt.org/z/bo3zn56jM
That said, I can see that we might want to be consistent with other loads from uninitialized memory.
As of right now, this mostly results in undef, not poison.
While only slightly different, it basically prevents people from comparing the undef values loaded from uninitialized memory while getting "sane" results.
@nlopes @nikic I assume loading uninitialized memory is and will remain undef?
Are they not impacted because it "happens to work" now, or because both undef and poison are guaranteed to do the right thing here? That's what I'm unsure about, as poison sounds like it would give the compiler more freedom to optimize, while I'm afraid that it may already have too much freedom as it is.
The example from before cannot be optimized as is, but if you make x internal, we already propagate the value just fine (with undef as initializer):
https://godbolt.org/z/bo3zn56jM
Exactly. That's where GPUs throw a wrench in LLVM's view of the world. While that variable may be internal to the module, as far as the symbol visibility is concerned, it is accessible from the other threads and may be modified by them.
The example above produces wrong code as nothing ever will be stored in x, before we load from it.
There is only few real differences, e.g., if you have icmp <op> undef, <val> you get a result that is not undef but with poison you get poison.
For the transformation I was talking about, the one you described and my example showcased, it does not matter though.
We can assume x is 99 because it is otherwise either undef or poison and for the latter two we can just "pick" 99 too. So, we'll always load 99 is correct in both cases.
The example from before cannot be optimized as is, but if you make x internal, we already propagate the value just fine (with undef as initializer):
https://godbolt.org/z/bo3zn56jMExactly. That's where GPUs throw a wrench in LLVM's view of the world. While that variable may be internal to the module, as far as the symbol visibility is concerned, it is accessible from the other threads and may be modified by them.
The example above produces wrong code as nothing ever will be stored in x, before we load from it.
You lost me. If nothing is stored in x, 99 is a perfectly fine value to "read" from x. The optimization is sound, IMHO.
Feel free to modify the example and cause it to break, we use this in production so finding bugs is good ;)
[EDIT: Just to make it clear, the Attributor is aware of threading to the degree that it needs to be. At least modulo bugs.]
I've no clue about these shared variables.
If you require that there's a store before any load, then the initialization value doesn't matter and can (should) be poison.
If it's ok to do an uninitialized of these shared variables, then you need to decide what's the semantics you want. If you want that comparison in the example above to be executed, poison is not ok.
Note that in C++, reading an uninitialized integer is UB, so you can use poison safely.
This optimization specifically replaces local C++ variables like in this example https://godbolt.org/z/hfM4b35q9. I think this is a valid transformation at least in this optimization at least because it targets local variables. I think the other discussions about how we handle the threading concepts are unrelated to this patch.
the other discussions about how we handle the threading concepts are unrelated to this patch.
Agreed. Sorry about getting somewhat off-topic.
I have no objection to the patch.