Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The nice thing to do would be to route this through a TTI hook.
Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
6419 | Bunch more could be handled (I can’t name one that shouldn’t) |
I'd appreciate some confirmation that this makes sense, for some or all of the cases. Here's my thinking:
case Intrinsic::amdgcn_fmed3: case Intrinsic::amdgcn_fmul_legacy: case Intrinsic::amdgcn_fract: case Intrinsic::amdgcn_rcp: case Intrinsic::amdgcn_rsq:
These are just standard arithmetic operations, so should be handled the same as generic arithmetic intrinsics like maxnum.
case Intrinsic::amdgcn_icmp:
This is like ballot, a convergent operation that depends on values in other lanes.
case Intrinsic::amdgcn_interp_p1: case Intrinsic::amdgcn_interp_p1_f16: case Intrinsic::amdgcn_interp_p2: case Intrinsic::amdgcn_interp_p2_f16:
These ones load values from LDS as well as doing arithmetic, but we don't expose the load to the compiler so I don't think there's any reason to say that the loaded value could be poison.
case Intrinsic::amdgcn_mbcnt_hi: case Intrinsic::amdgcn_mbcnt_lo:
The result of these depends on the lane ID of the executing lane.
case Intrinsic::amdgcn_readfirstlane: case Intrinsic::amdgcn_readlane:
These depend on values from other lanes. For readlane we could define that the result is poison if the lane number argument is out of range, but AFAIK we do not do that - the spec is just that you get whatever the corresponding hardware instruction would return, and hardware instructions never return poison.
case Intrinsic::amdgcn_softwqm: case Intrinsic::amdgcn_wqm: case Intrinsic::amdgcn_wwm:
These ones return the input value unmodified.
In summary, for the intrinsics whose results depend on some extra state, there is no point in saying that that state could be poison, unless it is exposed as a regular IR value that the compiler can reason about.
Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined
That might work for creating poison, but what about propagating poison? Wouldn't it be beneficial to say that target arithmetic intrinsics still propagate poison like regular arithmetic does?
Yes.
I also realized we do have some sources of poison. The various intrinsics to access inputs return poison when called through functions with the corresponding "amdgpu-no..." attributes or incompatible calling conventions. I'd expect to cover every intrinsic which selects to a real instruction to be covered here
This would be a pretty big annoyance, because ValueTracking is used everywhere, and this would require adding a TTI dependency to pretty much all passes. The practical outcome would probably be that most callers just wouldn't pass TTI. As such, prefer the approach in this patch.
Also I would argue that no target intrinsics can produce poison. Arithmetic poison ultimately exists because of the desire to tolerate different edge case handling on different targets, but target instructions should be well defined
Just to give an obvious counter-example: Load-like intrinsics tend to produce poison (because there might be poison in memory).
But that would be a property of the memory, not the operation itself. I also thought load poison was UB?
I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.
In the AMDGPU backend we have lots of load-like intrinsics that are mostly opaque to the compiler, except that it knows they read from some address space.
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
6419 | amdgcn_log, amdgcn_fmad_ftz, int_amdgcn_mul_i24, int_amdgcn_mul_u24, int_amdgcn_mulhi_i24, int_amdgcn_mulhi_u24, int_amdgcn_inverse_ballot, int_amdgcn_ballot, int_amdgcn_cvt_pk_u8_f32, int_amdgcn_mqsad_u32_u8, int_amdgcn_mqsad_pk_u16_u8, int_amdgcn_qsad_pk_u16_u8, int_amdgcn_sad_u16, int_amdgcn_sad_hi_u8, int_amdgcn_msad_u8, int_amdgcn_sad_u8, f int_amdgcn_lerp, int_amdgcn_sbfe, int_amdgcn_ubfe, int_amdgcn_mov_dpp, int_amdgcn_update_dpp, int_amdgcn_ds_permute, int_amdgcn_ds_bpermute, int_amdgcn_perm, int_amdgcn_permlane16, int_amdgcn_permlanex16, int_amdgcn_mov_dpp8, int_amdgcn_s_get_waveid_in_workgroup, int_amdgcn_permlane64, int_amdgcn_fdot2, int_amdgcn_fdot2_f16_f16, int_amdgcn_fdot2_bf16_bf16, int_amdgcn_fdot2_f32_bf16, int_amdgcn_sdot2, int_amdgcn_udot2, int_amdgcn_sdot4, int_amdgcn_udot4, int_amdgcn_sudot4, int_amdgcn_udot8, int_amdgcn_sudot8 All the MFMA, there are a bunch more cvt* int_amdgcn_fdiv_fast |
Not sure I follow. Do you want to define your load intrinsics are performing an implicit freeze? You could do that, but I'm not sure it is desirable.
llvm/lib/Analysis/ValueTracking.cpp | ||
---|---|---|
6419 | If you want to add that many intrinsics, it's probably better to add a new intrinsic property to the TableGen intrinsic definitions. That should also make it easier to see how complete the coverage is. (Easiest way to do that is probably a new function attribute.) |
I was wondering about that case myself. I don't see the benefit of saying that a load-like intrinsic can return poison, unless the compiler recognizes it as a load well enough to do things like forwarding a previously stored value through it.
Not sure I follow. Do you want to define your load intrinsics are performing an implicit freeze? You could do that, but I'm not sure it is desirable.
I don't understand the big picture here: what possible benefit is there to telling the compiler that the value that is loaded can (or can't) be poison, when the compiler knows nothing else about the provenance of that value?
(By contrast, for a normal IR load, I can see that tracking the poison-ness of the loaded value might be useful, so that IR passes can add or remove store->load pairs without affecting poison-ness.)
there might be poison in memory
Perhaps https://llvm.org/docs/LangRef.html#poison-values should mention this interesting fact.
Okay, I think I get what you mean now. You want to say the load result is non-poison/undef, because the compiler will never replace the load with a literal undef/poison value due to lack of optimization support, and as such you can't introduce practical miscompiles with it. This makes me pretty uncomfortable, because you'd effectively be lying, just in a way that shouldn't matter. Maybe @nlopes has some insight on whether this could cause any specific issues.
Right. I have a similar issue with the arguments passed into GPU kernels, which are launched by hardware:
define amdgpu_cs void @f(i32 %x, i32 %y) { ... }
The compiler will never see IR for a call to this function. You're not allowed to call kernels from IR. The argument values %x and %y are set up by hardware, and I think I'd like to mark them as noundef so the compiler can assume they are never poison.
If a function call is guaranteed to never be inlined, then it's ok to mark the declaration as not returning undef/poison.
But you can't mark the function definition as such, since the function may in fact return a poison and then you end up with UB.
For AMD's special load intrinsics, it's fine to mark them as non-undef/poison as long as they are not lowered to vanilla load instructions in SelDag. As then the lowering would be wrong. You need to keep the façade until assembly time.
The semantics needs to be consistent throughout the pipeline. It's ok to "lie" about the semantics of something, as long as the compiler never gets to see the real thing. The real stuff can only meet at linker time. At that point there's no poison or undef values, and therefore your promises about a function/instruction not returning undef/poison are correct.
That's why you can give whatever semantics you want to declarations, as long as you only see the definition in assembly/SelDag and the semantics you promised correspond to the asm/SelDag thing.
My 2 cents:
For the interp intrinsic even the underlying memory is sort of known to never contain poison. The data ultimately comes from an export out of a vertex shader, and *that* data could be poison, but there's a lot of fixed function hardware machinery in-between and we can always pretend that this machinery has a "freeze" somewhere. (This prevents a hypothetical transform that constant-propagates poison from the vertex shader into the fragment shader, but I really don't think that such a transform is useful for performance.)
I'm less comfortable about saying that readlane cannot create poison. Reading from an out-of-range lane is basically like reading from an arbitrary uninitialized register, which is basically the ur-version of poison. I'm not sure that it matters, but it *may* matter via fun facts like: readlane is memory(none), but if you duplicate it and the source lane is out of range, you may get different results from the duplicates. Even readfirstlane is "interesting": the only way for readfirstlane to return poison is if the input is poison; however it is sufficient for the input *in a different thread* to be poison. There may be dragons there.
Apart from that, I'm generally in favor of this change.
If we are agreed that intrinsics that map down to a normal arithmetic instruction should propagate poison, then I will rework this patch with a new NoCreatePoison intrinsic attribute.
Bunch more could be handled (I can’t name one that shouldn’t)