This is an archive of the discontinued LLVM Phabricator instance.

----------------------------------------
  %sat = usub_sat i8 C1, C2
  %res = add i8 %sat, C2
=>
  %sat = usub_sat i8 C1, C2
  %t0 = icmp ult i8 C1, C2
  %res = select i1 %t0, i8 C2, i8 C1

Done: 1
Optimization is correct!

But it seems to un-undef things:

----------------------------------------
  %sat = usub_sat i8 %a, %b
  %res = add i8 %sat, %b
=>
  %sat = usub_sat i8 %a, %b
  %t0 = icmp ult i8 %a, %b
  %res = select i1 %t0, i8 %b, i8 %a

ERROR: Value mismatch for i8 %res

Example:
i8 %a = undef
i8 %b = #x10 (16)
i8 %sat = undef
i1 %t0 = undef
Source value: undef
Target value: #x02 (2)

@lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero).

In D63060#1535589, @nikic wrote:

@lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero).

Humm, that could explain part of the issue, but is that somewhere in langref?

lebedev.ri added a reviewer: nlopes. Jun 9 2019, 8:17 AM

lebedev.ri added a subscriber: regehr.

In D63060#1535594, @lebedev.ri wrote:

In D63060#1535589, @nikic wrote:

@lebedev.ri That looks like a bug in alive, usub_sat i8 undef, 0x10 is not undef (we fold to zero).

Humm, that could explain part of the issue, but is that somewhere in langref?

It's not explicitly mentioned, but rather a direct consequence of undef semantics: for usub_sat i8 undef, 0x10 to be undef, the following proposition would have to hold: for any X in i8 there exists a Y in i8 such that X == usub_sat Y, 0x10. However, for X = 0xff there is no such Y.
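The proposition above can be checked by brute force. The following is only a sketch in plain C++, not LLVM code: `usub_sat` is a hand-written model of the intrinsic on 8-bit values, and `reachable_results` is a hypothetical helper name introduced here for illustration.

```cpp
#include <bitset>
#include <cstdint>

// Hand-written model of llvm.usub.sat.i8: a - b, clamped at 0 instead of wrapping.
std::uint8_t usub_sat(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// Hypothetical helper: the set of values usub_sat(Y, c) can take as Y ranges
// over every i8 value. If the call were "undef", this set would have to
// contain all 256 values.
std::bitset<256> reachable_results(std::uint8_t c) {
  std::bitset<256> seen;
  for (int y = 0; y < 256; ++y)
    seen.set(usub_sat(static_cast<std::uint8_t>(y), c));
  return seen;
}
```

Under this model, `reachable_results(0x10)` covers only 0x00 through 0xef, so 0xff is not reachable and the result cannot be a full undef.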

As pointed out on the GH issue, alive is correct here and it's just the printing of the counterexample that was wrong.

The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :(
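To make the multi-use problem concrete, here is a rough C++ sketch (not part of the patch) that models an undef %a as a value that may be picked independently at each use. All function names are made up for illustration. The source pattern reads %a once; the icmp/select target reads it twice, so the target can produce results the source never could.

```cpp
#include <bitset>
#include <cstdint>

// Source pattern: usub_sat(a, b) + b, where the undef a is read exactly once.
std::bitset<256> source_results(std::uint8_t b) {
  std::bitset<256> out;
  for (int a = 0; a < 256; ++a) {
    const int sat = a > b ? a - b : 0;              // usub_sat i8 a, b
    out.set(static_cast<std::uint8_t>(sat + b));    // add i8 (wrapping)
  }
  return out;
}

// Target pattern: select (icmp ult a, b), b, a. The undef a is read twice,
// so the compare and the select arm may each observe a different value.
std::bitset<256> target_results(std::uint8_t b) {
  std::bitset<256> out;
  for (int a_cmp = 0; a_cmp < 256; ++a_cmp)
    for (int a_sel = 0; a_sel < 256; ++a_sel)
      out.set(a_cmp < b ? b : a_sel);
  return out;
}

// The rewrite is a refinement only if every target result is also a
// possible source result.
bool target_refines_source(std::uint8_t b) {
  return (target_results(b) & ~source_results(b)).none();
}
```

With b = 0x10 this reproduces the counterexample reported above: the target can yield 2 (the compare sees a value of at least 16, the select arm sees 2), while the source can only yield values from 16 upward.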

Not sure what to do here now...

Guess we are again missing a freeze instruction? :(

This revision now requires changes to proceed. Jun 9 2019, 10:34 AM

We would pessimize all folds that could match A but whose one-use check would now fail.

Canonicalize to llvm.maximum intrinsics (if they start supporting ints, @spatel)?

My motivating example in PR42207 for other simplifications also needs max/min :(

Abandoning as I doubt we'll have a short-term answer for this.

@xbolva00 Adding new fundamental intrinsics is a lot of work, I have some doubts that it will be worthwhile for min/max.

Can't we do this transformation in the last instcombine run? Easier than a new intrinsic...

There were discussions about integer min/max intrinsics some years ago, but they stalled. The old motivating cases plus our new ones should indicate that integer versions of the max/min intrinsics are worthwhile.

I would like to ask whether an increase in the number of uses due to a new transformation is generally banned in InstCombine (for any pattern we would like to fold to a min/max pattern). If it is banned, what is an acceptable solution for such cases?

@spatel @efriedma

lebedev.ri added a comment. Jun 24 2019, 6:43 AM

This comment was removed by lebedev.ri.

In D63060#1555484, @xbolva00 wrote:

I would like to ask whether an increase in the number of uses due to a new transformation is generally banned in InstCombine (for any pattern we would like to fold to a min/max pattern). If it is banned, what is an acceptable solution for such cases?

*uses* or *instructions*? Only instruction count should not increase.
use-count is unspecified, unless of course it's illegal as per undef semantics.

@spatel @efriedma

Thanks for the answer.
This comment:
“The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :(”

Why is this an issue, since we can check whether A is undef?

In D63060#1555522, @xbolva00 wrote:

Thanks for the answer.
This comment:
“The transform is not correct, because it increases the number of uses of A. If A is undef, it may evaluate to a different value for each use :(”

Why is this an issue, since we can check whether A is undef?

Not really: if isa<undef>(a) fires, then that would have been constant-folded away before it gets here: https://godbolt.org/z/JcknUp
And if it didn't fire, that doesn't mean that a is guaranteed not to be undef.

Ah, thanks. So a (dead?) FreezeInst is the only solution...

nikic mentioned this in rL365999: [InstCombine] add tests for umin/umax via usub.sat; NFC. Jul 13 2019, 6:05 AM

spatel mentioned this in rL366000: Revert "[InstCombine] add tests for umin/umax via usub.sat; NFC". Jul 13 2019, 6:17 AM

spatel mentioned this in rG22cc1030f6a9: Revert "[InstCombine] add tests for umin/umax via usub.sat; NFC".

FreezeInst landed.

nikic reclaimed this revision. Aug 26 2020, 1:28 PM

This revision now requires changes to proceed. Aug 26 2020, 1:28 PM

Update to use umax intrinsic instead, which does not have undef troubles.
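For reference, the arithmetic identity behind the final fold, usub.sat(a, b) + b == umax(a, b), can be checked exhaustively for i8. This is only a standalone C++ sketch: `usub_sat` is a hand-written model of the intrinsic and `usub_add_is_umax_for_all_i8` is a hypothetical helper name. It says nothing about undef; that part relies on umax reading each operand only once.

```cpp
#include <algorithm>
#include <cstdint>

// Hand-written model of llvm.usub.sat.i8: a - b, clamped at 0 instead of wrapping.
std::uint8_t usub_sat(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}

// Hypothetical helper: true if usub_sat(a, b) + b == umax(a, b) for all i8 pairs.
bool usub_add_is_umax_for_all_i8() {
  for (int ai = 0; ai < 256; ++ai) {
    for (int bi = 0; bi < 256; ++bi) {
      const auto a = static_cast<std::uint8_t>(ai);
      const auto b = static_cast<std::uint8_t>(bi);
      // The cast models the wrapping `add i8`; for this pattern it never wraps.
      const auto sum = static_cast<std::uint8_t>(usub_sat(a, b) + b);
      if (sum != std::max(a, b))
        return false;
    }
  }
  return true;
}
```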

lebedev.ri accepted this revision. Aug 26 2020, 1:32 PM

This revision is now accepted and ready to land. Aug 26 2020, 1:32 PM

Thanks!

Harbormaster completed remote builds in B69677: Diff 288102. Aug 26 2020, 3:42 PM

This revision was landed with ongoing or failed builds. Aug 28 2020, 12:52 PM

Closed by commit rGffe05dd12593: [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178) (authored by nikic).

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rGffe05dd12593: [InstCombine] usub.sat(a, b) + b => umax(a, b) (PR42178).

Herald added a subscriber: danielkiss. Aug 28 2020, 12:52 PM

I was *just* thinking about the inverse of this transformation as a way to increase available ILP on CPUs where there are more pipes that can do arithmetic than compares.... cool!

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp
1428	Comment doesn't seem to match what the code is doing.

nikic mentioned this in rG57a26bb7b435: [InstCombine] Fix typo in comment (NFC). Aug 29 2020, 1:19 AM

We'll also want to handle uadd.sat(a, b) - b, which is more complicated.

Is there a PR for that? Can you open it please?
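As a point of reference for that follow-up, one candidate identity is uadd.sat(a, b) - b == umin(a, ~b). This rewrite is not stated anywhere in this review; it is offered here as an assumption that the C++ sketch below checks exhaustively for i8. Both functions are hand-written models with hypothetical names, not LLVM APIs.

```cpp
#include <algorithm>
#include <cstdint>

// Hand-written model of llvm.uadd.sat.i8: a + b, clamped at 0xff instead of wrapping.
std::uint8_t uadd_sat(std::uint8_t a, std::uint8_t b) {
  const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
  return sum > 0xffu ? std::uint8_t{0xff} : static_cast<std::uint8_t>(sum);
}

// Hypothetical helper: true if uadd_sat(a, b) - b == umin(a, ~b) for all i8 pairs.
bool uadd_sub_is_umin_not_for_all_i8() {
  for (int ai = 0; ai < 256; ++ai) {
    for (int bi = 0; bi < 256; ++bi) {
      const auto a = static_cast<std::uint8_t>(ai);
      const auto b = static_cast<std::uint8_t>(bi);
      const auto diff = static_cast<std::uint8_t>(uadd_sat(a, b) - b); // sub i8
      const auto not_b = static_cast<std::uint8_t>(~b);               // xor i8 b, -1
      if (diff != std::min(a, not_b))
        return false;
    }
  }
  return true;
}
```

Unlike the usub.sat case, this form needs an extra inversion of b unless b is a constant, so it is not a one-for-one replacement.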

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp	8 lines
llvm/test/Transforms/InstCombine/saturating-add-sub.ll	23 lines

Diff 288681

llvm/lib/Transforms/InstCombine/InstCombineAddSub.cpp

[... 1,419 lines not shown ...]

Instruction *InstCombinerImpl::visitAdd(BinaryOperator &I) {
  if (Instruction *V =
          canonicalizeCondSignextOfHighBitExtractToSignextHighBitExtract(I))
    return V;

  if (Instruction *SatAdd = foldToUnsignedSaturatedAdd(I))
    return SatAdd;

  // usub.sat(A, B) + A => umax(A, B)

jroelofs (inline comment, Not Done): Comment doesn't seem to match what the code is doing.

- // usub.sat(A, B) + A => umax(A, B)
+ // usub.sat(A, B) + B => umax(A, B)

  if (match(&I, m_c_BinOp(
          m_OneUse(m_Intrinsic<Intrinsic::usub_sat>(m_Value(A), m_Value(B))),
          m_Deferred(B)))) {
    return replaceInstUsesWith(I,
        Builder.CreateIntrinsic(Intrinsic::umax, {I.getType()}, {A, B}));
  }

  return Changed ? &I : nullptr;
}

/// Eliminate an op from a linear interpolation (lerp) pattern.
static Instruction *factorizeLerp(BinaryOperator &I,
                                  InstCombiner::BuilderTy &Builder) {
  Value *X, *Y, *Z;
  if (!match(&I, m_c_FAdd(m_OneUse(m_c_FMul(m_Value(Y),

[... 847 lines not shown ...]

llvm/test/Transforms/InstCombine/saturating-add-sub.ll

[... 1,038 lines not shown ...]
;
%aa = add nsw <2 x i8> %a, <i8 7, i8 6>		%aa = add nsw <2 x i8> %a, <i8 7, i8 6>
%bb = and <2 x i8> %b, <i8 7, i8 6>		%bb = and <2 x i8> %b, <i8 7, i8 6>
%r = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> %aa, <2 x i8> %bb)		%r = call <2 x i8> @llvm.ssub.sat.v2i8(<2 x i8> %aa, <2 x i8> %bb)
ret <2 x i8> %r		ret <2 x i8> %r
}		}

define i8 @test_scalar_usub_add(i8 %a, i8 %b) {		define i8 @test_scalar_usub_add(i8 %a, i8 %b) {
; CHECK-LABEL: @test_scalar_usub_add(		; CHECK-LABEL: @test_scalar_usub_add(
; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[A:%.*]], i8 [[B:%.*]])		; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[A:%.*]], i8 [[B:%.*]])
; CHECK-NEXT: [[RES:%.*]] = add i8 [[SAT]], [[B]]		; CHECK-NEXT: ret i8 [[TMP1]]
; CHECK-NEXT: ret i8 [[RES]]
;		;
%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 %b)		%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 %b)
%res = add i8 %sat, %b		%res = add i8 %sat, %b
ret i8 %res		ret i8 %res
}		}

define i8 @test_scalar_usub_add_extra_use(i8 %a, i8 %b, i8* %p) {		define i8 @test_scalar_usub_add_extra_use(i8 %a, i8 %b, i8* %p) {
; CHECK-LABEL: @test_scalar_usub_add_extra_use(		; CHECK-LABEL: @test_scalar_usub_add_extra_use(
; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[A:%.*]], i8 [[B:%.*]])		; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[A:%.*]], i8 [[B:%.*]])
; CHECK-NEXT: store i8 [[SAT]], i8* [[P:%.*]], align 1		; CHECK-NEXT: store i8 [[SAT]], i8* [[P:%.*]], align 1
; CHECK-NEXT: [[RES:%.*]] = add i8 [[SAT]], [[B]]		; CHECK-NEXT: [[RES:%.*]] = add i8 [[SAT]], [[B]]
; CHECK-NEXT: ret i8 [[RES]]		; CHECK-NEXT: ret i8 [[RES]]
;		;
%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 %b)		%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 %b)
store i8 %sat, i8* %p		store i8 %sat, i8* %p
%res = add i8 %sat, %b		%res = add i8 %sat, %b
ret i8 %res		ret i8 %res
}		}

define i8 @test_scalar_usub_add_commuted(i8 %a, i8 %b) {		define i8 @test_scalar_usub_add_commuted(i8 %a, i8 %b) {
; CHECK-LABEL: @test_scalar_usub_add_commuted(		; CHECK-LABEL: @test_scalar_usub_add_commuted(
; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[A:%.*]], i8 [[B:%.*]])		; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[A:%.*]], i8 [[B:%.*]])
; CHECK-NEXT: [[RES:%.*]] = add i8 [[SAT]], [[B]]		; CHECK-NEXT: ret i8 [[TMP1]]
; CHECK-NEXT: ret i8 [[RES]]
;		;
%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 %b)		%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 %b)
%res = add i8 %b, %sat		%res = add i8 %b, %sat
ret i8 %res		ret i8 %res
}		}

define i8 @test_scalar_usub_add_commuted_wrong(i8 %a, i8 %b) {		define i8 @test_scalar_usub_add_commuted_wrong(i8 %a, i8 %b) {
; CHECK-LABEL: @test_scalar_usub_add_commuted_wrong(		; CHECK-LABEL: @test_scalar_usub_add_commuted_wrong(
; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[B:%.*]], i8 [[A:%.*]])		; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[B:%.*]], i8 [[A:%.*]])
; CHECK-NEXT: [[RES:%.*]] = add i8 [[SAT]], [[B]]		; CHECK-NEXT: [[RES:%.*]] = add i8 [[SAT]], [[B]]
; CHECK-NEXT: ret i8 [[RES]]		; CHECK-NEXT: ret i8 [[RES]]
;		;
%sat = call i8 @llvm.usub.sat.i8(i8 %b, i8 %a)		%sat = call i8 @llvm.usub.sat.i8(i8 %b, i8 %a)
%res = add i8 %sat, %b		%res = add i8 %sat, %b
ret i8 %res		ret i8 %res
}		}

define i8 @test_scalar_usub_add_const(i8 %a) {		define i8 @test_scalar_usub_add_const(i8 %a) {
; CHECK-LABEL: @test_scalar_usub_add_const(		; CHECK-LABEL: @test_scalar_usub_add_const(
; CHECK-NEXT: [[SAT:%.*]] = call i8 @llvm.usub.sat.i8(i8 [[A:%.*]], i8 42)		; CHECK-NEXT: [[TMP1:%.*]] = call i8 @llvm.umax.i8(i8 [[A:%.*]], i8 42)
; CHECK-NEXT: [[RES:%.*]] = add nuw i8 [[SAT]], 42		; CHECK-NEXT: ret i8 [[TMP1]]
; CHECK-NEXT: ret i8 [[RES]]
;		;
%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 42)		%sat = call i8 @llvm.usub.sat.i8(i8 %a, i8 42)
%res = add i8 %sat, 42		%res = add i8 %sat, 42
ret i8 %res		ret i8 %res
}		}

define i8 @test_scalar_uadd_sub(i8 %a, i8 %b) {		define i8 @test_scalar_uadd_sub(i8 %a, i8 %b) {
; CHECK-LABEL: @test_scalar_uadd_sub(		; CHECK-LABEL: @test_scalar_uadd_sub(
[... 50 lines not shown ...]
;
%sat = call i8 @llvm.uadd.sat.i8(i8 %a, i8 42)		%sat = call i8 @llvm.uadd.sat.i8(i8 %a, i8 42)
%res = sub i8 %sat, 42		%res = sub i8 %sat, 42
ret i8 %res		ret i8 %res
}		}

define i1 @scalar_uadd_eq_zero(i8 %a, i8 %b) {		define i1 @scalar_uadd_eq_zero(i8 %a, i8 %b) {
; CHECK-LABEL: @scalar_uadd_eq_zero(		; CHECK-LABEL: @scalar_uadd_eq_zero(
; CHECK-NEXT: [[TMP1:%.*]] = or i8 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = or i8 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = icmp eq i8 [[TMP1]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp eq i8 [[TMP1]], 0
; CHECK-NEXT: ret i1 [[TMP2]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%sat = call i8 @llvm.uadd.sat.i8(i8 %a, i8 %b)		%sat = call i8 @llvm.uadd.sat.i8(i8 %a, i8 %b)
%cmp = icmp eq i8 %sat, 0		%cmp = icmp eq i8 %sat, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @scalar_uadd_ne_zero(i8 %a, i8 %b) {		define i1 @scalar_uadd_ne_zero(i8 %a, i8 %b) {
; CHECK-LABEL: @scalar_uadd_ne_zero(		; CHECK-LABEL: @scalar_uadd_ne_zero(
; CHECK-NEXT: [[TMP1:%.*]] = or i8 [[A:%.*]], [[B:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = or i8 [[A:%.*]], [[B:%.*]]
; CHECK-NEXT: [[TMP2:%.*]] = icmp ne i8 [[TMP1]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp ne i8 [[TMP1]], 0
; CHECK-NEXT: ret i1 [[TMP2]]		; CHECK-NEXT: ret i1 [[CMP]]
;		;
%sat = call i8 @llvm.uadd.sat.i8(i8 %a, i8 %b)		%sat = call i8 @llvm.uadd.sat.i8(i8 %a, i8 %b)
%cmp = icmp ne i8 %sat, 0		%cmp = icmp ne i8 %sat, 0
ret i1 %cmp		ret i1 %cmp
}		}

define i1 @scalar_usub_eq_zero(i8 %a, i8 %b) {		define i1 @scalar_usub_eq_zero(i8 %a, i8 %b) {
; CHECK-LABEL: @scalar_usub_eq_zero(		; CHECK-LABEL: @scalar_usub_eq_zero(
[... 520 lines not shown ...]
