Fixes part of https://bugs.llvm.org/show_bug.cgi?id=42178. We'll also want to handle uadd.sat(a, b) - b, which is more complicated.
We will expand umax back to usubsat in the backend if profitable.
Differential D63060
[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178) nikic on Jun 9 2019, 7:41 AM. Authored by
Details Fixes part of https://bugs.llvm.org/show_bug.cgi?id=42178. We'll also want to handle uadd.sat(a, b) - b, which is more complicated. We will expand umax back to usubsat in the backend if profitable.
Diff Detail
Event TimelineComment Actions This seems correct: ---------------------------------------- %sat = usub_sat i8 C1, C2 %res = add i8 %sat, C2 => %sat = usub_sat i8 C1, C2 %t0 = icmp ult i8 C1, C2 %res = select i1 %t0, i8 C2, i8 C1 Done: 1 Optimization is correct! But it seems to un-undef things: ---------------------------------------- %sat = usub_sat i8 %a, %b %res = add i8 %sat, %b => %sat = usub_sat i8 %a, %b %t0 = icmp ult i8 %a, %b %res = select i1 %t0, i8 %b, i8 %a ERROR: Value mismatch for i8 %res Example: i8 %a = undef i8 %b = #x10 (16) i8 %sat = undef i1 %t0 = undef Source value: undef Target value: #x02 (2) ? Comment Actions @lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero). Comment Actions It's not explicitly mentioned, but rather a direct consequence of undef semantics: For usub_sat i8 undef, 0x10 to be undef, the following proposition would have to hold: For any X in i8 there exists an Y in i8 such that X == usub_sat Y, 0x10. However, for X = 0xff there is no such Y. Comment Actions As pointed out on the GH issue, alive is correct here and it's just the printing of the counterexample that was wrong. The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :( Not sure what to do here now... Comment Actions We would pessimize all folds which could hit A but one-use check would fail now. Canonicalize to llvm.maximum intrinsics (if they start support ints, @spatel )? My motivating example in PR42207 for other simplifications also needs max/min :( Comment Actions Abandoning as I doubt we'll have a short-term answer for this. @xbolva00 Adding new fundamental intrinsics is a lot of work, I have some doubts that it will be worthwhile for min/max. Comment Actions Cannot we do this transformation in the last instcombine run? Easier than new intrinsic... Comment Actions There were discussions about integer min/max intrinsics some years ago but stalled .. old motivation cases + our new motivation cases should indicate that integer version of max/min intrinsics are worth. This comment was removed by lebedev.ri. Comment Actions Thanks for answer. Why this is issue ? since we can check if A is undef Comment Actions Not really, if isa<undef>(a) fires, then that would have been constant-folded away before it gets here https://godbolt.org/z/JcknUp Comment Actions I was *just* thinking about the inverse of this transformation as a way to increase available ILP on CPUs where there are more pipes that can do arithmetic than compares.... cool!
Comment Actions
Is there a PR for that? Can you open it please? |
Comment doesn't seem to match what the code is doing.