
Details

Reviewers

nikic
lebedev.ri
spatel

Commits

rG59b56e5c579c: [InstCombine] Expand usub_sat patterns to handle constants

Summary

Constant operands reach this fold as add %x, -C rather than as the sub one would expect, so they need some extra matchers to canonicalise them towards usub_sat.

----------------------------------------
Optimization: Forwards
Precondition: true
  %cmp = icmp ugt i32 %a, 10
  %sub = add i32 %a, -10
  %sel = select i1 %cmp, i32 %sub, i32 0
=>
  %sel = usub_sat %a, 10

Done: 1
Optimization is correct!

----------------------------------------
Optimization: Backwards
Precondition: true
  %cmp = icmp ult i32 %a, 2
  %sub = add i32 %a, -2
  %sel = select i1 %cmp, i32 %sub, i32 0
=>
  %x = usub_sat 2, %a
  %sel = sub 0, %x

Done: 1
Optimization is correct!
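
The two Alive proofs above can also be cross-checked by brute force at a narrower width. This is a minimal sketch and not part of the patch: it assumes 8-bit values, and the helper names (`usub_sat8`, `add_neg`, `forwards_src`, `backwards_src`, `check_constant_folds`) are hypothetical. It compares each source pattern against its usub_sat form for every `%a` and `C`.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtract on 8 bits (models llvm.usub.sat.i8).
static uint8_t usub_sat8(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : 0;
}

// add %a, -C: the form a constant subtraction takes before this fold.
static uint8_t add_neg(uint8_t a, uint8_t c) {
  return static_cast<uint8_t>(a + static_cast<uint8_t>(-c));
}

// Forwards:  select (icmp ugt %a, C), (add %a, -C), 0
static uint8_t forwards_src(uint8_t a, uint8_t c) {
  return a > c ? add_neg(a, c) : 0;
}

// Backwards: select (icmp ult %a, C), (add %a, -C), 0
static uint8_t backwards_src(uint8_t a, uint8_t c) {
  return a < c ? add_neg(a, c) : 0;
}

// Asserts that both folds hold for every 8-bit %a and C.
void check_constant_folds() {
  for (int a = 0; a < 256; ++a) {
    for (int c = 0; c < 256; ++c) {
      uint8_t A = static_cast<uint8_t>(a), C = static_cast<uint8_t>(c);
      // Forwards target: usub_sat %a, C
      assert(forwards_src(A, C) == usub_sat8(A, C));
      // Backwards target: sub 0, (usub_sat C, %a)
      uint8_t negated = static_cast<uint8_t>(0 - usub_sat8(C, A));
      assert(backwards_src(A, C) == negated);
    }
  }
}
```

Calling `check_constant_folds()` from a test driver should complete without tripping an assertion. An exhaustive 8-bit check is weaker than the Alive proof, but it is a quick sanity test when experimenting with new pattern variants.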

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Oct 28 2019, 8:36 AM

Herald added a project: Restricted Project. Oct 28 2019, 8:36 AM

Herald added a subscriber: hiraditya.

dmgreen edited the summary of this revision. Oct 28 2019, 8:36 AM

Looks like there should be some extra patterns for "one-off" constants too.

Thank you for looking into this.

llvm/test/Transforms/InstCombine/unsigned_saturated_sub.ll
357–358	In this pattern either one of these two instructions must be one-use and go away.

lebedev.ri requested changes to this revision. Oct 30 2019, 2:23 PM

This revision now requires changes to proceed. Oct 30 2019, 2:23 PM

Sorry for the delay. I was trying to look into the "off by one" patterns, the last two of these:

Name: Forwards
Pre: C1 == -C2
  %cmp = icmp ugt i8 %a, C1
  %sub = add i8 %a, C2
  %sel = select i1 %cmp, i8 %sub, i8 0
=>
  %sel = usub_sat %a, C1

Name: Backwards
Pre: C1 == -C2
  %cmp = icmp ult i8 %a, C1
  %sub = add i8 %a, C2
  %sel = select i1 %cmp, i8 %sub, i8 0
=>
  %x = usub_sat C1, %a
  %sel = sub 0, %x

Name: Forwards_off
Pre: C1 + 1 == -C2 && C2 != 0
  %cmp = icmp ugt i8 %a, C1
  %sub = add i8 %a, C2
  %sel = select i1 %cmp, i8 %sub, i8 0
=>
  %sel = usub_sat %a, C1 + 1

Name: Backwards_off
Pre: C1 - 1 == -C2 && C1 != 0
  %cmp = icmp ult i8 %a, C1
  %sub = add i8 %a, C2
  %sel = select i1 %cmp, i8 %sub, i8 0
=>
  %x = usub_sat C1 - 1, %a
  %sel = sub 0, %x

They are a bit fiddly though and deserve to be their own commit.
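
To see why the off-by-one patterns are fiddly, it can help to check them exhaustively together with their preconditions. The following is a sketch under assumptions: 8-bit values as in the Alive snippets above, and hypothetical helper names (`select_ugt`, `select_ult`, `check_off_by_one_folds`). For each `C1` it derives `C2` from the stated precondition and skips the cases the precondition excludes.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtract on 8 bits (models llvm.usub.sat.i8).
static uint8_t usub_sat8(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : 0;
}

// select (icmp ugt %a, C1), (add %a, C2), 0
static uint8_t select_ugt(uint8_t a, uint8_t c1, uint8_t c2) {
  return a > c1 ? static_cast<uint8_t>(a + c2) : 0;
}

// select (icmp ult %a, C1), (add %a, C2), 0
static uint8_t select_ult(uint8_t a, uint8_t c1, uint8_t c2) {
  return a < c1 ? static_cast<uint8_t>(a + c2) : 0;
}

// Asserts Forwards_off and Backwards_off for every 8-bit %a and C1.
void check_off_by_one_folds() {
  for (int c1 = 0; c1 < 256; ++c1) {
    for (int a = 0; a < 256; ++a) {
      uint8_t A = static_cast<uint8_t>(a), C1 = static_cast<uint8_t>(c1);

      // Forwards_off: C1 + 1 == -C2 && C2 != 0
      uint8_t fwd_c2 = static_cast<uint8_t>(-(c1 + 1));
      if (fwd_c2 != 0) {
        uint8_t c1_plus_1 = static_cast<uint8_t>(c1 + 1);
        assert(select_ugt(A, C1, fwd_c2) == usub_sat8(A, c1_plus_1));
      }

      // Backwards_off: C1 - 1 == -C2 && C1 != 0
      if (c1 != 0) {
        uint8_t bwd_c2 = static_cast<uint8_t>(-(c1 - 1));
        uint8_t x = usub_sat8(static_cast<uint8_t>(c1 - 1), A);
        assert(select_ult(A, C1, bwd_c2) == static_cast<uint8_t>(0 - x));
      }
    }
  }
}
```

The guards matter in this model. Without `C2 != 0`, the forwards case fails at `C1 = 255`, where `C1 + 1` wraps to 0 and `usub_sat %a, 0` returns `%a` instead of 0. Without `C1 != 0`, the backwards case fails at `C1 = 0`, where the compare is never true but `C1 - 1` wraps to 255.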

(marking as reviewed)

Will you also look into the edge-case patterns, like:

unsigned t0(unsigned num) {
    return num > 0 ? num-1 : num;
}

https://godbolt.org/z/DsBxd8

Name: unsigned minus 1
  %2 = icmp ugt i32 %0, 1
  %3 = add i32 %0, -1
  %4 = select i1 %2, i32 %3, i32 0
  ret i32 %4
=>
  %r = usub_sat %0, 1
  ret %r
  %3 = add i32 %0, -1
  %2 = icmp ugt i32 %0, 1
  %4 = select i1 %2, i32 %3, i32 0

Done: 1
Optimization is correct!

(Not sure what other ones there are; this is the one that I noticed in real code.)
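
As a quick cross-check of this edge case, the C function and the IR shape from the Alive snippet can both be compared against a saturating subtract of 1. This is only a sketch: it tests a handful of boundary values rather than all 32-bit inputs, and `usub_sat32`, `ugt1_form` and `check_minus_one` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtract on 32 bits (models llvm.usub.sat.i32).
static uint32_t usub_sat32(uint32_t a, uint32_t b) {
  return a > b ? a - b : 0;
}

// The C function from the comment above.
static uint32_t t0(uint32_t num) {
  return num > 0 ? num - 1 : num;
}

// The IR shape in the Alive snippet:
//   select (icmp ugt %0, 1), (add %0, -1), 0
static uint32_t ugt1_form(uint32_t x) {
  return x > 1 ? x + UINT32_MAX : 0;  // adding UINT32_MAX adds -1 modulo 2^32
}

// Asserts that both spellings agree with usub_sat(x, 1) on boundary values.
void check_minus_one() {
  const uint32_t samples[] = {0u, 1u, 2u, 3u, 0x7fffffffu,
                              0x80000000u, UINT32_MAX - 1, UINT32_MAX};
  for (uint32_t x : samples) {
    assert(t0(x) == usub_sat32(x, 1));
    assert(ugt1_form(x) == usub_sat32(x, 1));
  }
}
```

Both spellings compute the same value, but they compare against different constants (0 versus 1). In this diff the `icmp ugt i32 %a, 0` spelling, test `max_sub_ugt_c01`, keeps its original CHECK lines, while `max_sub_ugt_c1` folds to usub.sat.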

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
722–724	Err, this isn't quite right. I should have been more specific. We produce a single instruction, `@llvm.usub.sat`, so there shouldn't be a one-use check in general. But we may also need to negate the result. If we do, then we need to ensure that a single instruction goes away: either the `icmp` or the `sub`.
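
The reasoning in this comment can be written down as a small instruction-count model. This is a sketch under the assumption that instruction count is the only cost being considered; `Candidate`, `instructions_after` and `should_fold` are hypothetical names and not LLVM APIs. It checks that the bail-out condition `IsNegative && !TrueVal->hasOneUse() && !ICI->hasOneUse()` rejects exactly the case where the fold would add an instruction.

```cpp
#include <cassert>

// Hypothetical summary of one candidate fold; not an LLVM type.
struct Candidate {
  bool needs_negate;  // result is -usub.sat(a, b)
  bool cmp_one_use;   // the icmp is used only by the select
  bool sub_one_use;   // the sub/add is used only by the select
};

// Before the fold: icmp + sub + select.
constexpr int instructions_before() { return 3; }

// After the fold: usub.sat, an optional negate, and whichever of the
// icmp and sub must stay alive because they have other users.
int instructions_after(const Candidate &c) {
  return 1 + (c.needs_negate ? 1 : 0) + (c.cmp_one_use ? 0 : 1) +
         (c.sub_one_use ? 0 : 1);
}

// Inverse of the bail-out condition discussed in this thread.
bool should_fold(const Candidate &c) {
  return !(c.needs_negate && !c.sub_one_use && !c.cmp_one_use);
}

// Asserts that the condition matches "does not increase instruction count"
// for all eight combinations of the three flags.
void check_cost_model() {
  for (int bits = 0; bits < 8; ++bits) {
    Candidate c{(bits & 1) != 0, (bits & 2) != 0, (bits & 4) != 0};
    assert(should_fold(c) ==
           (instructions_after(c) <= instructions_before()));
  }
}
```

In this model the non-negated form can never grow the code, because the select always goes away and at most the icmp and sub remain. Only the negated form with both the icmp and the sub still in use ends up at four instructions.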

This revision now requires changes to proceed. Nov 10 2019, 10:20 AM

dmgreen marked an inline comment as done. Nov 28 2019, 10:21 AM

dmgreen added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
722–724	OK. I see. I was thinking you were referring to assembly instructions being longer if the icmp had multiple uses. There may be some benefit to making sure the output is actually smaller. I've updated the code. Let me know if this was or wasn't what you were thinking.

Sorry for the delay. I have adjusted the one-use check.

lebedev.ri marked an inline comment as done. Nov 28 2019, 10:50 AM

lebedev.ri added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
707–710	I have doubts about the comment. If `b < a`, then `b - a` will wrap, and we'll return the wrapped result. And if `b >= a`, then `b - a` will not wrap, but we will saturate to `0`. What am I missing?
722–724	Yep, this seems to be what I meant.

dmgreen marked 2 inline comments as done. Nov 28 2019, 12:46 PM

dmgreen added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
707–710	From the comment below, this looks like it is referring to `(a > b) ? b - a : 0 -> -usub.sat(a, b)`. I'll try to update it to be clearer.
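
The swapped form can be checked directly: when `a > b`, the wrapped value of `b - a` is the two's-complement negation of `a - b`, which is what `-usub.sat(a, b)` produces. A minimal sketch, assuming 8-bit values and hypothetical helper names (`swapped_src`, `check_swapped_sub`):

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtract on 8 bits (models llvm.usub.sat.i8).
static uint8_t usub_sat8(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : 0;
}

// (a > b) ? b - a : 0, with b - a computed modulo 2^8.
static uint8_t swapped_src(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(b - a) : 0;
}

// Asserts (a > b) ? b - a : 0 == -usub.sat(a, b) for every 8-bit pair.
void check_swapped_sub() {
  for (int a = 0; a < 256; ++a) {
    for (int b = 0; b < 256; ++b) {
      uint8_t A = static_cast<uint8_t>(a), B = static_cast<uint8_t>(b);
      uint8_t negated = static_cast<uint8_t>(0 - usub_sat8(A, B));
      assert(swapped_src(A, B) == negated);
    }
  }
}
```

For example, with `a = 5` and `b = 3` the source yields the wrapped value 254, and `usub_sat8(5, 3)` is 2, whose 8-bit negation is also 254.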

dmgreen updated this revision to Diff 231464. Nov 28 2019, 12:52 PM

LGTM, thank you.

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp
707–710	Aha, okay, thank you.
722–725	I would recommend committing this as a separate commit first.

This revision is now accepted and ready to land. Nov 28 2019, 1:22 PM

Closed by commit rG59b56e5c579c: [InstCombine] Expand usub_sat patterns to handle constants (authored by dmgreen). Nov 30 2019, 9:40 AM

This revision was automatically updated to reflect the committed changes.

dmgreen mentioned this in rG3a1bef5616c3: [InstCombine] Adjust usub_sat fold one use checks.

Diff 231591

llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp

(698 lines not shown)
   if (Pred == ICmpInst::ICMP_ULE || Pred == ICmpInst::ICMP_ULT) {
     // (b < a) ? a - b : 0 -> (a > b) ? a - b : 0
     std::swap(A, B);
     Pred = ICmpInst::getSwappedPredicate(Pred);
   }

   assert((Pred == ICmpInst::ICMP_UGE || Pred == ICmpInst::ICMP_UGT) &&
          "Unexpected isUnsigned predicate!");

-  // Account for swapped form of subtraction: ((a > b) ? b - a : 0).
+  // Ensure the sub is of the form:
+  // (a > b) ? a - b : 0 -> usub.sat(a, b)
+  // (a > b) ? b - a : 0 -> -usub.sat(a, b)
+  // Checking for both a-b and a+(-b) as a constant.
   bool IsNegative = false;
-  if (match(TrueVal, m_Sub(m_Specific(B), m_Specific(A))))
+  const APInt *C;
+  if (match(TrueVal, m_Sub(m_Specific(B), m_Specific(A))) ||
+      (match(A, m_APInt(C)) &&
+       match(TrueVal, m_Add(m_Specific(B), m_SpecificInt(-*C)))))
     IsNegative = true;
-  else if (!match(TrueVal, m_Sub(m_Specific(A), m_Specific(B))))
+  else if (!match(TrueVal, m_Sub(m_Specific(A), m_Specific(B))) &&
+           !(match(B, m_APInt(C)) &&
+             match(TrueVal, m_Add(m_Specific(A), m_SpecificInt(-*C)))))
     return nullptr;

   // If we are adding a negate and the sub and icmp are used anywhere else, we
   // would end up with more instructions.
   if (IsNegative && !TrueVal->hasOneUse() && !ICI->hasOneUse())
     return nullptr;

   // (a > b) ? a - b : 0 -> usub.sat(a, b)
   // (a > b) ? b - a : 0 -> -usub.sat(a, b)
   Value *Result = Builder.CreateBinaryIntrinsic(Intrinsic::usub_sat, A, B);
   if (IsNegative)
     Result = Builder.CreateNeg(Result);
   return Result;
 }
(2,056 lines not shown)
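
The matching logic in this hunk can be mimicked outside LLVM with a toy value model. This is a sketch only: `Operand`, `BinOp`, `Match`, `classify` and the helper predicates are hypothetical stand-ins, not LLVM's `PatternMatch` API, and constants are modelled as 8-bit values. It shows how a true value of the form `x + (-C)` is accepted alongside `x - y`, and how the operand order decides whether the result must be negated.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::Value: a named variable or an i8 constant.
struct Operand {
  bool is_const;
  uint8_t value;  // variable id when !is_const, constant value otherwise
  bool operator==(const Operand &o) const {
    return is_const == o.is_const && value == o.value;
  }
};

// Hypothetical stand-in for the select's true value; opcode is '+' or '-'.
struct BinOp {
  char opcode;
  Operand lhs, rhs;
};

enum class Match { None, Plain, Negated };

// True if v is `l - r`.
static bool is_sub(const BinOp &v, Operand l, Operand r) {
  return v.opcode == '-' && v.lhs == l && v.rhs == r;
}

// True if c is a constant and v is `x + (-c)`.
static bool is_add_neg_const(const BinOp &v, Operand x, Operand c) {
  Operand neg{true, static_cast<uint8_t>(-c.value)};
  return c.is_const && v.opcode == '+' && v.lhs == x && v.rhs == neg;
}

// Classify the true value for a compare already normalised to a > b.
Match classify(Operand a, Operand b, const BinOp &true_val) {
  if (is_sub(true_val, b, a) || is_add_neg_const(true_val, b, a))
    return Match::Negated;  // (a > b) ? b - a : 0 -> -usub.sat(a, b)
  if (is_sub(true_val, a, b) || is_add_neg_const(true_val, a, b))
    return Match::Plain;    // (a > b) ? a - b : 0 -> usub.sat(a, b)
  return Match::None;
}

// Usage: the two constant shapes exercised by the tests in this revision.
void check_classify_examples() {
  Operand x{false, 0};
  // icmp ugt %x, 10 with add %x, -10 (246 is -10 modulo 2^8).
  assert(classify(x, Operand{true, 10}, BinOp{'+', x, Operand{true, 246}}) ==
         Match::Plain);
  // icmp ult %x, 2, swapped to 2 > %x, with add %x, -2 (254 is -2).
  assert(classify(Operand{true, 2}, x, BinOp{'+', x, Operand{true, 254}}) ==
         Match::Negated);
}
```

As in the hunk, the negated form is tried first and the constant is always taken from whichever compare operand plays the role of the subtrahend.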

llvm/test/Transforms/InstCombine/builtin-dynamic-object-size.ll

(42 lines not shown)
 entry:
   %ptr = call i8* @malloc(i64 %sz)
   %ptr2 = getelementptr inbounds i8, i8* %ptr, i32 2
   %calc_size = call i64 @llvm.objectsize.i64.p0i8(i8* %ptr2, i1 false, i1 true, i1 true)
   ret i64 %calc_size
 }

 ; CHECK: define i64 @internal_pointer(i64 %sz)
 ; CHECK-NEXT: entry:
-; CHECK-NEXT: %0 = add i64 %sz, -2
-; CHECK-NEXT: %1 = icmp ult i64 %sz, 2
-; CHECK-NEXT: %2 = select i1 %1, i64 0, i64 %0
-; CHECK-NEXT: ret i64 %2
+; CHECK-NEXT: %0 = call i64 @llvm.usub.sat.i64(i64 %sz, i64 2)
+; CHECK-NEXT: ret i64 %0
 ; CHECK-NEXT: }

 define i64 @uses_nullptr_no_fold() {
 entry:
   %res = call i64 @llvm.objectsize.i64.p0i8(i8* null, i1 false, i1 true, i1 true)
   ret i64 %res
 }

(55 lines not shown)

llvm/test/Transforms/InstCombine/unsigned_saturated_sub.ll

(248 lines not shown)
 ;
   %sel = select i1 %cmp, i64 0 ,i64 %sub
   %extrasub = sub i64 %a, %b
   call void @use(i64 %extrasub)
   ret i64 %sel
 }

 define i32 @max_sub_ugt_c1(i32 %a) {
 ; CHECK-LABEL: @max_sub_ugt_c1(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i32 [[A:%.*]], 1
-; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A]], -1
-; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 0
-; CHECK-NEXT: ret i32 [[SEL]]
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.usub.sat.i32(i32 [[A:%.*]], i32 1)
+; CHECK-NEXT: ret i32 [[TMP1]]
 ;
   %cmp = icmp ugt i32 %a, 1
   %sub = add i32 %a, -1
   %sel = select i1 %cmp, i32 %sub ,i32 0
   ret i32 %sel
 }

 define i32 @max_sub_ugt_c01(i32 %a) {
 ; CHECK-LABEL: @max_sub_ugt_c01(
 ; CHECK-NEXT: [[CMP:%.*]] = icmp eq i32 [[A:%.*]], 0
 ; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A]], -1
 ; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP]], i32 0, i32 [[SUB]]
 ; CHECK-NEXT: ret i32 [[SEL]]
 ;
   %cmp = icmp ugt i32 %a, 0
   %sub = add i32 %a, -1
   %sel = select i1 %cmp, i32 %sub ,i32 0
   ret i32 %sel
 }

 define i32 @max_sub_ugt_c10(i32 %a) {
 ; CHECK-LABEL: @max_sub_ugt_c10(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ugt i32 [[A:%.*]], 10
-; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A]], -10
-; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 0
-; CHECK-NEXT: ret i32 [[SEL]]
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.usub.sat.i32(i32 [[A:%.*]], i32 10)
+; CHECK-NEXT: ret i32 [[TMP1]]
 ;
   %cmp = icmp ugt i32 %a, 10
   %sub = add i32 %a, -10
   %sel = select i1 %cmp, i32 %sub, i32 0
   ret i32 %sel
 }

 define i32 @max_sub_ugt_c910(i32 %a) {
(54 lines not shown)
 ;
   %cmp = icmp ult i32 %a, 1
   %sub = add i32 %a, -1
   %sel = select i1 %cmp, i32 %sub, i32 0
   ret i32 %sel
 }

 define i32 @max_sub_ult_c2(i32 %a) {
 ; CHECK-LABEL: @max_sub_ult_c2(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[A:%.*]], 2
-; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A]], -2
-; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 0
-; CHECK-NEXT: ret i32 [[SEL]]
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.usub.sat.i32(i32 2, i32 [[A:%.*]])
+; CHECK-NEXT: [[TMP2:%.*]] = sub nsw i32 0, [[TMP1]]
+; CHECK-NEXT: ret i32 [[TMP2]]
 ;
   %cmp = icmp ult i32 %a, 2
   %sub = add i32 %a, -2
   %sel = select i1 %cmp, i32 %sub, i32 0
   ret i32 %sel
 }

 define i32 @max_sub_ult_c2_oneuseicmp(i32 %a) {
 ; CHECK-LABEL: @max_sub_ult_c2_oneuseicmp(
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[A:%.*]], 2
-; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A]], -2
-; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 0
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.usub.sat.i32(i32 2, i32 [[A]])
+; CHECK-NEXT: [[TMP2:%.*]] = sub nsw i32 0, [[TMP1]]
 ; CHECK-NEXT: call void @usei1(i1 [[CMP]])
-; CHECK-NEXT: ret i32 [[SEL]]
+; CHECK-NEXT: ret i32 [[TMP2]]
 ;
   %cmp = icmp ult i32 %a, 2
   %sub = add i32 %a, -2
   %sel = select i1 %cmp, i32 %sub, i32 0
   call void @usei1(i1 %cmp)
   ret i32 %sel
 }

 define i32 @max_sub_ult_c2_oneusesub(i32 %a) {
 ; CHECK-LABEL: @max_sub_ult_c2_oneusesub(
-; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[A:%.*]], 2
-; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A]], -2
-; CHECK-NEXT: [[SEL:%.*]] = select i1 [[CMP]], i32 [[SUB]], i32 0
+; CHECK-NEXT: [[SUB:%.*]] = add i32 [[A:%.*]], -2
+; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.usub.sat.i32(i32 2, i32 [[A]])
+; CHECK-NEXT: [[TMP2:%.*]] = sub nsw i32 0, [[TMP1]]
 ; CHECK-NEXT: call void @usei32(i32 [[SUB]])
-; CHECK-NEXT: ret i32 [[SEL]]
+; CHECK-NEXT: ret i32 [[TMP2]]
 ;
   %cmp = icmp ult i32 %a, 2
   %sub = add i32 %a, -2
   %sel = select i1 %cmp, i32 %sub, i32 0
   call void @usei32(i32 %sub)
   ret i32 %sel
 }

(61 lines not shown)

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Expand usub_sat patterns to handle constants
ClosedPublic
