This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178)
ClosedPublic

Authored by nikic on Jun 9 2019, 7:41 AM.

Details

Summary

Fixes part of https://bugs.llvm.org/show_bug.cgi?id=42178. We'll also want to handle uadd.sat(a, b) - b, which is more complicated.

We will expand umax back to usubsat in the backend if profitable.

Diff Detail

Event Timeline

nikic created this revision.Jun 9 2019, 7:41 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 9 2019, 7:41 AM

This seems correct:

----------------------------------------
  %sat = usub_sat i8 C1, C2
  %res = add i8 %sat, C2
=>
  %sat = usub_sat i8 C1, C2
  %t0 = icmp ult i8 C1, C2
  %res = select i1 %t0, i8 C2, i8 C1

Done: 1
Optimization is correct!

But it seems to un-undef things:

----------------------------------------
  %sat = usub_sat i8 %a, %b
  %res = add i8 %sat, %b
=>
  %sat = usub_sat i8 %a, %b
  %t0 = icmp ult i8 %a, %b
  %res = select i1 %t0, i8 %b, i8 %a

ERROR: Value mismatch for i8 %res

Example:
i8 %a = undef
i8 %b = #x10 (16)
i8 %sat = undef
i1 %t0 = undef
Source value: undef
Target value: #x02 (2)

?

nikic added a comment.Jun 9 2019, 7:55 AM

@lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero).

@lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero).

Humm, that could explain part of the issue, but is that somewhere in langref?

lebedev.ri added a subscriber: regehr.
nikic added a comment.Jun 9 2019, 8:27 AM

@lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero).

Humm, that could explain part of the issue, but is that somewhere in langref?

It's not explicitly mentioned, but rather a direct consequence of undef semantics: For usub_sat i8 undef, 0x10 to be undef, the following proposition would have to hold: For any X in i8 there exists an Y in i8 such that X == usub_sat Y, 0x10. However, for X = 0xff there is no such Y.

nikic added a comment.Jun 9 2019, 10:17 AM

As pointed out on the GH issue, alive is correct here and it's just the printing of the counterexample that was wrong.

The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :(

Not sure what to do here now...

lebedev.ri requested changes to this revision.EditedJun 9 2019, 10:34 AM

Guess we are again missing freeze instruction ? :(

This revision now requires changes to proceed.Jun 9 2019, 10:34 AM
xbolva00 added a comment.EditedJun 9 2019, 10:42 AM

We would pessimize all folds which could hit A but one-use check would fail now.

Canonicalize to llvm.maximum intrinsics (if they start support ints, @spatel )?

My motivating example in PR42207 for other simplifications also needs max/min :(

nikic abandoned this revision.Jun 9 2019, 12:43 PM

Abandoning as I doubt we'll have a short-term answer for this.

@xbolva00 Adding new fundamental intrinsics is a lot of work, I have some doubts that it will be worthwhile for min/max.

Cannot we do this transformation in the last instcombine run? Easier than new intrinsic...

There were discussions about integer min/max intrinsics some years ago but stalled .. old motivation cases + our new motivation cases should indicate that integer version of max/min intrinsics are worth.

I would like to ask if the increase of uses due to new transformation is generally banned in InstCombine (any pattern which we would like to fold to min/max pattern) or not? If banned, what is acceptable solution for such cases ?

@spatel @efriedma

This comment was removed by lebedev.ri.

I would like to ask if the increase of uses due to new transformation is generally banned in InstCombine (any pattern which we would like to fold to min/max pattern) or not? If banned, what is acceptable solution for such cases ?

*uses* or *instructions*? Only instruction count should not increase.
use-count is unspecified, unless of course it's illegal as per undef semantics.

@spatel @efriedma

Thanks for answer.
This comment:
“The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :(“

Why this is issue ? since we can check if A is undef

Thanks for answer.
This comment:
“The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :(“

Why this is issue ? since we can check if A is undef

Not really, if isa<undef>(a) fires, then that would have been constant-folded away before it gets here https://godbolt.org/z/JcknUp
And if it didn't fire, it doesn't mean that a is guaranteed to not be undef.

Ah, thanks. So (dead ?) FreezeInst is only solution ..

FreezeInst landed.

nikic reclaimed this revision.Aug 26 2020, 1:28 PM
This revision now requires changes to proceed.Aug 26 2020, 1:28 PM
nikic updated this revision to Diff 288102.Aug 26 2020, 1:29 PM
nikic changed the repository for this revision from rL LLVM to rG LLVM Github Monorepo.

Update to use umax intrinsic instead, which does not have undef troubles.

lebedev.ri accepted this revision.Aug 26 2020, 1:32 PM
This revision is now accepted and ready to land.Aug 26 2020, 1:32 PM
This revision was landed with ongoing or failed builds.Aug 28 2020, 12:52 PM
This revision was automatically updated to reflect the committed changes.

I was *just* thinking about the inverse of this transformation as a way to increase available ILP on CPUs where there are more pipes that can do arithmetic than compares.... cool!

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1428

Comment doesn't seem to match what the code is doing.

We'll also want to handle uadd.sat(a, b) - b, which is more complicated.

Is there a PR for that? Can you open it please?