This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/include/llvm/IR/PatternMatch.h
- llvm/lib/CodeGen/CodeGenPrepare.cpp
- llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
- llvm/test/Transforms/CodeGenPrepare/AArch64/overflow-intrinsics.ll
- llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll

Differential D74228

[PatternMatch] Match XOR variant of unsigned-add overflow check.
Closed · Public

Authored by fhahn on Feb 7 2020, 9:13 AM.

Details

Reviewers

nikic
RKSimon
lebedev.ri
spatel

Commits

rGe01a3d49c224: [PatternMatch] Match XOR variant of unsigned-add overflow check.

Summary

InstCombine folds (a + b <u a) to (a ^ -1 <u b), which does not match
the pattern CodeGenPrepare expects via UAddWithOverflow.

This causes a regression compared to Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
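The fold is valid because, for n-bit unsigned values, a + b wraps exactly when b exceeds ~a (that is, 2^n - 1 - a), so (a + b <u a) and (a ^ -1 <u b) always agree. The following is a minimal C++ sketch, not code from the patch: it assumes 8-bit operands so that every input pair can be checked exhaustively, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Source-level overflow check: does a + b wrap around at 8 bits?
bool overflowViaAdd(uint8_t a, uint8_t b) {
  uint8_t sum = static_cast<uint8_t>(a + b);  // wraps modulo 256
  return sum < a;                             // (a + b) <u a
}

// Form produced by the InstCombine fold: (a ^ -1) <u b.
bool overflowViaXor(uint8_t a, uint8_t b) {
  uint8_t notA = static_cast<uint8_t>(a ^ 0xFF);  // a ^ -1 at 8 bits, i.e. ~a
  return notA < b;
}

// Exhaustively compares both predicates over all 8-bit input pairs.
bool formsAgree() {
  for (unsigned a = 0; a < 256; ++a) {
    for (unsigned b = 0; b < 256; ++b) {
      uint8_t x = static_cast<uint8_t>(a);
      uint8_t y = static_cast<uint8_t>(b);
      if (overflowViaAdd(x, y) != overflowViaXor(x, y))
        return false;
    }
  }
  return true;
}
```

For example, with a = 200 and b = 100 the sum 300 wraps to 44, which is below 200, and ~200 = 55 is below 100, so both forms report overflow. Note that the XOR form never materializes the sum, which is why a matcher that only looks for an add instruction does not recognize it.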

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Feb 7 2020, 9:13 AM

Herald added a project: Restricted Project. Feb 7 2020, 9:13 AM

Herald added a subscriber: kristof.beyls.

Harbormaster completed remote builds in B45948: Diff 243190. Feb 7 2020, 9:49 AM

Tests missing.
What about the commuted variant?

llvm/include/llvm/IR/PatternMatch.h
1709	// (a ^ -1 <u b)
1712	Will avoiding variable result in 80-char column overflow?
llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
1	This doesn't seem right..

nikic added inline comments. Feb 7 2020, 10:07 AM

llvm/include/llvm/IR/PatternMatch.h
1711	Does m_SpecificInt sign extend? I'd use `m_AllOnes()` here. Or for that matter `m_Not(m_Value(Op1))`, though that will require reordering.

Resurrect test, adjust users of UAddWithOverflow. I am not sure if we should change the name of the matcher or the third operand, as the 'Sum' option now may return the XOR.
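To illustrate why the 'Sum' operand becomes ambiguous, here is a self-contained C++ sketch of the matching logic over a hypothetical mini-IR. Node, Op, UAddMatch and matchUAddWithOverflow are invented for illustration and are not LLVM's PatternMatch API; the sketch also omits the one-use check on the XOR mentioned in the summary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mini-IR, just enough to show the shape of the match.
enum class Op { Arg, Const, Add, Xor, ICmpULT };

struct Node {
  Op op;
  uint64_t imm = 0;           // constant value when op == Op::Const
  const Node *lhs = nullptr;  // first operand of Add/Xor/ICmpULT
  const Node *rhs = nullptr;  // second operand of Add/Xor/ICmpULT
};

struct UAddMatch {
  const Node *a = nullptr;
  const Node *b = nullptr;
  const Node *sum = nullptr;  // the Add, or the Xor in the new variant
};

// True if n is a constant whose low `bits` bits are all ones (i.e. -1).
bool isAllOnes(const Node *n, unsigned bits) {
  uint64_t mask = bits >= 64 ? ~0ULL : ((1ULL << bits) - 1);
  return n->op == Op::Const && (n->imm & mask) == mask;
}

// Recognizes (a + b) <u a, (a + b) <u b, and (a ^ -1) <u b.
bool matchUAddWithOverflow(const Node *cmp, unsigned bits, UAddMatch &out) {
  if (cmp->op != Op::ICmpULT)
    return false;
  const Node *l = cmp->lhs;
  const Node *r = cmp->rhs;
  if (l->op == Op::Add && (r == l->lhs || r == l->rhs)) {
    out = UAddMatch{l->lhs, l->rhs, l};
    return true;
  }
  // Xor variant: there is no add at all, so "sum" binds to the xor itself.
  if (l->op == Op::Xor && isAllOnes(l->rhs, bits)) {
    out = UAddMatch{l->lhs, r, l};
    return true;
  }
  return false;
}
```

A caller that needs the actual sum would therefore have to check the opcode of the matched node (sum->op == Op::Add) before reusing it, since for the XOR variant that node holds ~a, not a + b.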

In D74228#1864311, @lebedev.ri wrote:

Tests missing.

Should be back again.

What about the commuted variant?

I will add those if the direction here is the right one.

Herald added a subscriber: hiraditya. Feb 7 2020, 11:37 AM

Harbormaster completed remote builds in B45968: Diff 243254. Feb 7 2020, 12:26 PM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

IIRC, the history of transforms for overflow ops is something like this:

We tried to canonicalize to them in instcombine, but this caused perf regressions.
We converted some of them in CGP instead.
We converted more of them in CGP (this was my attempt), but this caused perf and compile-time regressions.
We crippled the CGP transforms to avoid regressions.

I'm not sure what the status is currently in SDAG. Given that we already have the code in CGP, and this is a small addition, I wouldn't object to extending that code. If we could re-implement everything that is in CGP in SDAG, that might be a better option...although that might be counter-productive if we assume we're soon heading to or are already in a GlobalISel world.

I might have missed some of the recent patches on overflow intrinsics. It's not clear to me from the diff what InstCombine's role in this patch is - can we add a code comment and/or regression test to explain what we expect to happen for this (and other patterns)?

In D74228#1865587, @spatel wrote:

I might have missed some of the recent patches on overflow intrinsics. It's not clear to me from the diff what InstCombine's role in this patch is - can we add a code comment and/or regression test to explain what we expect to happen for this (and other patterns)?

InstCombine started turning (a + b <u a) into (a ^ -1 <u b), but CGP looks for (a + b <u a) and misses the case now. Does that answer your question?

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

In D74228#1867093, @fhahn wrote:

InstCombine started turning (a + b <u a) into (a ^ -1 <u b), but CGP looks for (a + b <u a) and misses the case now. Does that answer your question?

Ah, I see it now. I wasn't reading the diff in InstCombiner::visitICmpInst() correctly. Might be nicer to include something like:

// m_UAddWithOverflow can match patterns that do not include
// an explicit "add" instruction, so check the opcode of the matched op.

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

Yes, that would be an improvement. Two potential options:

Add 'opt' RUNs that include CGP to llvm/test/Transforms/PhaseOrdering (this is probably stretching the intent of "PhaseOrdering"; we'll need a new target-specific test directory because CGP will require that we pick a target).
Add full end-to-end tests for C --> asm to test-suite.

Document additional pattern match by UAddWithOverflow and add comment on why we match for AddInst explicitly in InstCombine.

In D74228#1867452, @spatel wrote:

Add full end-to-end tests for C --> asm to test-suite.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

Harbormaster failed remote builds in B46211: Diff 243816! Feb 11 2020, 5:16 AM

In D74228#1869304, @fhahn wrote:

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance... but it's probably worth raising again on the list, because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

For example:

$ llc -o - ov.ll -mtriple=sparcv9
uaddo_no_math_use:                      ! @uaddo_no_math_use
! %bb.0:
	add %o0, %o1, %o2
	mov	%g0, %o3
	cmp %o2, %o0
	movcs	%xcc, 1, %o3
	mov	42, %o0
	cmp %o3, 0
	retl
	movne	%icc, %o1, %o0
no_intrinsic:                           ! @no_intrinsic
! %bb.0:
	xor %o0, -1, %o2
	mov	42, %o0
	cmp %o2, %o1
	retl
	movcs	%xcc, %o1, %o0

Hopefully, I'm seeing the whole problem now. Let me know if I'm still missing it.

In D74228#1869653, @spatel wrote:
...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check if there are actual users of the sum and default to OFF if there are no actual users? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient than in SelDag and we would also need an appropriate target hook there I think. Unless we match that pattern in the target implementations.

What do you think?

For example:

$ llc -o - ov.ll -mtriple=sparcv9
uaddo_no_math_use:                      ! @uaddo_no_math_use
! %bb.0:
	add %o0, %o1, %o2
	mov	%g0, %o3
	cmp %o2, %o0
	movcs	%xcc, 1, %o3
	mov	42, %o0
	cmp %o3, 0
	retl
	movne	%icc, %o1, %o0
no_intrinsic:                           ! @no_intrinsic
! %bb.0:
	xor %o0, -1, %o2
	mov	42, %o0
	cmp %o2, %o1
	retl
	movcs	%xcc, %o1, %o0

Hopefully, I'm seeing the whole problem now. Let me know if I'm still missing it.

I think that should cover it, thanks for sharing the problematic case on SPARC.

In D74228#1870169, @fhahn wrote:
The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check if there are actual users of the sum and default to OFF if there are no actual users? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient than in SelDag and we would also need an appropriate target hook there I think. Unless we match that pattern in the target implementations.

What do you think?

I think it would be fine to tighten up the default hook and then allow targets to opt in with more awareness. OTOH in SDAG, we could just check that UADDO is legal/custom, so no custom hook needed?

I think that should cover it, thanks for sharing the problematic case on SPARC.

That took some hunting. :)
But I just wanted to show that we need to be careful with these transforms. We've generated a lot of back-and-forth regressions with the overflow intrinsics.

In D74228#1870505, @spatel wrote:

I think it would be fine to tighten up the default hook and then allow targets to opt in with more awareness. OTOH in SDAG, we could just check that UADDO is legal/custom, so no custom hook needed?

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's where we already do the expansion for similar cases.

That took some hunting. :)
But I just wanted to show that we need to be careful with these transforms. We've generated a lot of back-and-forth regressions with the overflow intrinsics.

+1, I've run into a few I am trying to address with this and other patches in the area :)

In D74228#1870646, @fhahn wrote:

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's where we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent. So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?
@nikic or others - does that match your expectations?

In D74228#1870877, @spatel wrote:

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

Taking a step back here, I'd like to question why InstCombine does this fold in the first place. This was introduced in https://github.com/llvm/llvm-project/commit/1cf0734b2f6b346993c41c18f4d6a9f2d1e11189, but I'm not quite clear on the motivations (and if they still apply after we introduced saturation intrinsics?). As we consider a + b < a the canonical overflow pattern for the case where a + b is used, why wouldn't we use the same pattern for the case where it is not used as well?

Should we simply start forming @llvm.with.overflow intrinsics in middle-end more often?

In D74228#1877776, @nikic wrote:

Taking a step back here, I'd like to question why InstCombine does this fold in the first place. This was introduced in https://github.com/llvm/llvm-project/commit/1cf0734b2f6b346993c41c18f4d6a9f2d1e11189, but I'm not quite clear on the motivations (and if they still apply after we introduced saturation intrinsics?). As we consider a + b < a the canonical overflow pattern for the case where a + b is used, why wouldn't we use the same pattern for the case where it is not used as well?

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that. For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

In D74228#1877769, @lebedev.ri wrote:

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

In D74228#1878174, @spatel wrote:

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGTM

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that.

Okay, that makes sense. I'm not even sure we could do the reverse fold, thanks to the magic of undef. (A < ~B) => (A + B < B) for A = -1, B = undef, the former is guaranteed false, while the latter could be true or false as the two uses of B can be chosen independently.

For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

Yes, definitely. We can't even canonicalize to (unsigned) overflow intrinsics even if the result is used, because they are so badly optimized right now.

So overall, I agree as well that the approach in this patch is fine :)

In D74228#1878189, @nikic wrote:

Okay, that makes sense. I'm not even sure we could do the reverse fold, thanks to the magic of undef. (A < ~B) => (A + B < B) for A = -1, B = undef, the former is guaranteed false, while the latter could be true or false as the two uses of B can be chosen independently.

We almost can now, it would be C = freeze(B); (A + C < C)
Almost, because I think codegen patch D29014 is stuck :/

In D74228#1878177, @lebedev.ri wrote:

SGTM

Great, I'll add the hook

In D74228#1878307, @fhahn wrote:

Great, I'll add the hook

I've put up a patch adjusting the shouldFormOverflowOp hook to consider whether the math result is used: D74722
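The policy discussed above (only form the intrinsic by default when the math result has users, and let targets such as X86 and AArch64 opt in for overflow-only checks) can be sketched as follows. This is a hypothetical C++ model of the decision, not the actual signature or implementation from D74722; TargetInfo and its fields are invented for illustration.

```cpp
#include <cassert>

// Hypothetical summary of what a target reports; not LLVM's TargetLowering.
struct TargetInfo {
  bool uaddoLegalOrCustom;    // assumed: target can lower UADDO without expanding it
  bool formWhenOnlyFlagUsed;  // assumed: target opts in for overflow-only checks
};

// Decide whether to replace (a + b <u a) or (a ^ -1 <u b) with
// uadd.with.overflow, given whether the sum itself has other users.
bool shouldFormOverflowOp(const TargetInfo &t, bool mathUsed) {
  if (!t.uaddoLegalOrCustom)
    return false;                  // never form it when UADDO would be expanded
  if (mathUsed)
    return true;                   // both results are consumed
  return t.formWhenOnlyFlagUsed;   // default off; targets may opt in
}
```

Under this default, a target whose expansion is poor when only the flag is consumed (as in the SPARC output earlier in the thread) keeps the plain compare, while an opted-in target still gets the intrinsic.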

fhahn added a parent revision: D74722: [TargetLower] Update shouldFormOverflowOp check if math is used. Feb 17 2020, 8:31 AM

I'd like to see (at least?) the commutative case handled.

llvm/include/llvm/IR/PatternMatch.h
1712	Hm, I don't really see one-use checks in similar neighboring patterns here.

use m_c_Xor to handle commutative case

fhahn marked an inline comment as done. Feb 18 2020, 7:13 AM

fhahn added inline comments.

llvm/include/llvm/IR/PatternMatch.h
1712	I can drop it here, but I think we should be more restrictive here, because we cannot use the math result for the xor. Alternatively we can have the use check in CGP.

fhahn added a parent revision: D74771: [PatternMatch] Move UAddWithOverflow matchers further down (NFC). Feb 18 2020, 7:13 AM

lebedev.ri added inline comments. Feb 18 2020, 7:25 AM

llvm/include/llvm/IR/PatternMatch.h
2096	We know constant is always on rhs of binop. I was talking about `b u> (a ^ -1)` case there. Also, not strictly related to this patch, but does this simply ignore inverted cases, like `(a ^ -1) u>= b`?

Harbormaster completed remote builds in B46708: Diff 245159. Feb 18 2020, 7:30 AM

Actually update to match b u> (a ^ -1). Remove m_c_Xor again. I'll make sure the code in PatternMatch.h stays at its current position.

fhahn marked an inline comment as done. Feb 18 2020, 2:45 PM

fhahn added inline comments.

llvm/include/llvm/IR/PatternMatch.h
2096	I was talking about b u> (a ^ -1) case there.

Right, that should be fixed in the latest version. I've updated `replaceMathCmpWithIntrinsic` to take the A and B args directly, rather than getting them from the BO, because with Xor that's not straightforward.

Also, not strictly related to this patch, but does this simply ignore inverted cases, like (a ^ -1) u>= b?

For now, yes. Not sure how common that is, but it could be added as a follow-up.
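The predicate orientations raised in this exchange can be checked against a reference definition of unsigned wraparound. A minimal C++ sketch, assuming 8-bit operands for an exhaustive check; the helper names are hypothetical. It shows that the commuted form b u> (a ^ -1) is the same overflow check, while the inverted form (a ^ -1) u>= b is its negation (the "no overflow" check), so handling it would mean consuming the negated overflow flag.

```cpp
#include <cassert>
#include <cstdint>

// Reference definition: does the 8-bit sum a + b wrap around?
bool wraps(uint8_t a, uint8_t b) { return unsigned(a) + unsigned(b) > 0xFFu; }

uint8_t notOf(uint8_t a) { return static_cast<uint8_t>(~a); }  // a ^ -1

bool xorUlt(uint8_t a, uint8_t b) { return notOf(a) < b; }          // (a ^ -1) u< b
bool xorUgtCommuted(uint8_t a, uint8_t b) { return b > notOf(a); }  // b u> (a ^ -1)
bool xorUge(uint8_t a, uint8_t b) { return notOf(a) >= b; }         // (a ^ -1) u>= b

// Checks every 8-bit pair against the reference definition.
bool orientationsConsistent() {
  for (unsigned i = 0; i < 256; ++i) {
    for (unsigned j = 0; j < 256; ++j) {
      uint8_t a = static_cast<uint8_t>(i), b = static_cast<uint8_t>(j);
      bool ov = wraps(a, b);
      if (xorUlt(a, b) != ov || xorUgtCommuted(a, b) != ov) return false;
      if (xorUge(a, b) != !ov) return false;  // inverted form means "no overflow"
    }
  }
  return true;
}
```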

Update test checks, assertion.

This seems good to me but would be good for @nikic to comment.

This revision is now accepted and ready to land. Feb 18 2020, 3:52 PM

LG to me as well.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1230	nit: xor or XOR

fhahn mentioned this in rG7cbf710396df: [CGP] Precommit tests for D74228. Feb 19 2020, 1:00 AM

Harbormaster failed remote builds in B46763: Diff 245278! Feb 19 2020, 2:35 AM

Harbormaster failed remote builds in B46766: Diff 245285! Feb 19 2020, 4:04 AM

Closed by commit rGe01a3d49c224: [PatternMatch] Match XOR variant of unsigned-add overflow check. (authored by fhahn). Feb 19 2020, 6:32 AM

This revision was automatically updated to reflect the committed changes.

Not yet 100% sure, but I suspect this is breaking the clang self-host, see:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o 
/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/include -Itools/clang/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/libxml2 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -MF tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o.d -o tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic/SourceManager.cpp
Instruction does not dominate all uses!
  %1 = load i32, i32* %Size.i, align 8, !tbaa !66
  %0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %FID.coerce, i32 %1)
in function _ZNK5clang13SourceManager17getPreviousFileIDENS_6FileIDE
fatal error: error in backend: Broken function found, compilation aborted!

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597

In D74228#1883057, @vsk wrote:

Not yet 100% sure, but I suspect this is breaking the clang self-host, see: […]

Probably. I've reverted the change for now.

fhahn mentioned this in rG36300bc1ae5e: [CGP] Precommit tests for D74228. (Jul 14 2020, 4:27 PM)

Revision Contents

Path                                                                  Size
llvm/include/llvm/IR/PatternMatch.h                                   16 lines
llvm/lib/CodeGen/CodeGenPrepare.cpp                                   28 lines
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp               7 lines
llvm/test/Transforms/CodeGenPrepare/AArch64/overflow-intrinsics.ll    16 lines
llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll        14 lines

Diff 245397

llvm/include/llvm/IR/PatternMatch.h

[1,668 unchanged lines not shown]
/// m_UnordFMin(L, R) = L iff L or R are NaN		/// m_UnordFMin(L, R) = L iff L or R are NaN
template <typename LHS, typename RHS>		template <typename LHS, typename RHS>
inline MaxMin_match<FCmpInst, LHS, RHS, ufmin_pred_ty>		inline MaxMin_match<FCmpInst, LHS, RHS, ufmin_pred_ty>
m_UnordFMin(const LHS &L, const RHS &R) {		m_UnordFMin(const LHS &L, const RHS &R) {
return MaxMin_match<FCmpInst, LHS, RHS, ufmin_pred_ty>(L, R);		return MaxMin_match<FCmpInst, LHS, RHS, ufmin_pred_ty>(L, R);
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Matchers for overflow check patterns: e.g. (a + b) u< a		// Matchers for overflow check patterns: e.g. (a + b) u< a, (a ^ -1) <u b
		// Note that S might be matched to other instructions than AddInst.
//		//

template <typename LHS_t, typename RHS_t, typename Sum_t>		template <typename LHS_t, typename RHS_t, typename Sum_t>
struct UAddWithOverflow_match {		struct UAddWithOverflow_match {
LHS_t L;		LHS_t L;
RHS_t R;		RHS_t R;
Sum_t S;		Sum_t S;

[14 unchanged lines not shown; context: if (Pred == ICmpInst::ICMP_ULT)]
if (AddExpr.match(ICmpLHS) && (ICmpRHS == AddLHS || ICmpRHS == AddRHS))		if (AddExpr.match(ICmpLHS) && (ICmpRHS == AddLHS || ICmpRHS == AddRHS))
return L.match(AddLHS) && R.match(AddRHS) && S.match(ICmpLHS);		return L.match(AddLHS) && R.match(AddRHS) && S.match(ICmpLHS);

// a >u (a + b), b >u (a + b)		// a >u (a + b), b >u (a + b)
if (Pred == ICmpInst::ICMP_UGT)		if (Pred == ICmpInst::ICMP_UGT)
if (AddExpr.match(ICmpRHS) && (ICmpLHS == AddLHS || ICmpLHS == AddRHS))		if (AddExpr.match(ICmpRHS) && (ICmpLHS == AddLHS || ICmpLHS == AddRHS))
return L.match(AddLHS) && R.match(AddRHS) && S.match(ICmpRHS);		return L.match(AddLHS) && R.match(AddRHS) && S.match(ICmpRHS);

		Value *Op1;
		lebedev.ri (inline comment, Done): // (a ^ -1 <u b)
		auto XorExpr = m_OneUse(m_Xor(m_Value(Op1), m_AllOnes()));
		// (a ^ -1) <u b
		nikic (inline comment, Done): Does m_SpecificInt sign extend? I'd use `m_AllOnes()` here. Or for that matter `m_Not(m_Value(Op1))`, though that will require reordering.
		if (Pred == ICmpInst::ICMP_ULT) {
		lebedev.ri (inline comment, Done): Will avoiding variable result in 80-char column overflow?
		lebedev.ri (inline comment, Not Done): Hm, i don't really see one-use checks in similar neighboring patterns here.
		fhahn (author, inline reply, Done): I can drop it here, but I think we should be more restrictive here, because we cannot use the math result for the xor. Alternatively we can have the use check in CGP.
		if (XorExpr.match(ICmpLHS))
		return L.match(Op1) && R.match(ICmpRHS) && S.match(ICmpLHS);
		}
		// b > u (a ^ -1)
		if (Pred == ICmpInst::ICMP_UGT) {
		if (XorExpr.match(ICmpRHS))
		return L.match(Op1) && R.match(ICmpLHS) && S.match(ICmpRHS);
		}

// Match special-case for increment-by-1.		// Match special-case for increment-by-1.
if (Pred == ICmpInst::ICMP_EQ) {		if (Pred == ICmpInst::ICMP_EQ) {
// (a + 1) == 0		// (a + 1) == 0
// (1 + a) == 0		// (1 + a) == 0
if (AddExpr.match(ICmpLHS) && m_ZeroInt().match(ICmpRHS) &&		if (AddExpr.match(ICmpLHS) && m_ZeroInt().match(ICmpRHS) &&
(m_One().match(AddLHS) || m_One().match(AddRHS)))		(m_One().match(AddLHS) || m_One().match(AddRHS)))
return L.match(AddLHS) && R.match(AddRHS) && S.match(ICmpLHS);		return L.match(AddLHS) && R.match(AddRHS) && S.match(ICmpLHS);
// 0 == (a + 1)		// 0 == (a + 1)
[358 unchanged lines not shown]

inline VScaleVal_match m_VScale(const DataLayout &DL) {		inline VScaleVal_match m_VScale(const DataLayout &DL) {
return VScaleVal_match(DL);		return VScaleVal_match(DL);
}		}

} // end namespace PatternMatch		} // end namespace PatternMatch
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_IR_PATTERNMATCH_H		#endif // LLVM_IR_PATTERNMATCH_H
		lebedev.ri (inline comment, Not Done): We know constant is always on rhs of binop. I was talking about `b u> (a ^ -1)` case there. Also, not strictly related to this patch, but does this simply ignore inverted cases, like `(a ^ -1) u>= b`?
		fhahn (author, inline reply, Done):
		> I was talking about b u> (a ^ -1) case there.
		Right, that should be fixed in the latest version. I've updated `replaceMathCmpWithIntrinsic` to take the A and B args directly, rather than getting them from the BO, because with Xor that's not straightforward.
		> Also, not strictly related to this patch, but does this simply ignore inverted cases, like (a ^ -1) u>= b?
		For now, yes. Not sure how common that is, but it could be added as a follow-up.
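The new xor matcher relies on the unsigned identity that (a + b) <u a, (a ^ -1) <u b and the commuted b >u (a ^ -1) all hold exactly when a + b overflows. The C++ sketch below is not part of the patch; the function names are hypothetical. It checks the three forms exhaustively for 8-bit values, where a ^ -1 is simply the headroom 255 - a left above a:

```cpp
#include <cstdint>

// (a + b) <u a: the wrapped 8-bit sum is below a exactly when a + b
// overflowed. This is the pattern m_UAddWithOverflow already matched.
bool overflowViaAdd(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b) < a;
}

// (a ^ -1) <u b: for an 8-bit unsigned a, a ^ -1 equals 255 - a, the
// headroom left above a, so the sum overflows exactly when b exceeds it.
bool overflowViaXor(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a ^ 0xFFu) < b;
}

// b >u (a ^ -1): the commuted compare discussed in the review.
bool overflowViaXorCommuted(std::uint8_t a, std::uint8_t b) {
  return b > static_cast<std::uint8_t>(a ^ 0xFFu);
}

// Compare all three predicates against the mathematical definition of
// unsigned overflow for every pair of 8-bit values.
bool allFormsAgree() {
  for (unsigned x = 0; x <= 0xFF; ++x) {
    for (unsigned y = 0; y <= 0xFF; ++y) {
      const bool Expected = x + y > 0xFF;
      const auto a = static_cast<std::uint8_t>(x);
      const auto b = static_cast<std::uint8_t>(y);
      if (overflowViaAdd(a, b) != Expected ||
          overflowViaXor(a, b) != Expected ||
          overflowViaXorCommuted(a, b) != Expected)
        return false;
    }
  }
  return true;
}
```

Because the xor form never materializes a + b, the matched instruction cannot supply the "math" result of the intrinsic, which is why the patch only accepts a one-use xor.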

llvm/lib/CodeGen/CodeGenPrepare.cpp


[393 unchanged lines not shown; context: bool performAddressTypePromotion(]
Instruction *&Inst,		Instruction *&Inst,
bool AllowPromotionWithoutCommonHeader,		bool AllowPromotionWithoutCommonHeader,
bool HasPromoted, TypePromotionTransaction &TPT,		bool HasPromoted, TypePromotionTransaction &TPT,
SmallVectorImpl<Instruction *> &SpeculativelyMovedExts);		SmallVectorImpl<Instruction *> &SpeculativelyMovedExts);
bool splitBranchCondition(Function &F, bool &ModifiedDT);		bool splitBranchCondition(Function &F, bool &ModifiedDT);
bool simplifyOffsetableRelocate(Instruction &I);		bool simplifyOffsetableRelocate(Instruction &I);

bool tryToSinkFreeOperands(Instruction *I);		bool tryToSinkFreeOperands(Instruction *I);
bool replaceMathCmpWithIntrinsic(BinaryOperator *BO, CmpInst *Cmp,		bool replaceMathCmpWithIntrinsic(BinaryOperator *BO, Value *Arg0,
		Value *Arg1, CmpInst *Cmp,
Intrinsic::ID IID);		Intrinsic::ID IID);
bool optimizeCmp(CmpInst *Cmp, bool &ModifiedDT);		bool optimizeCmp(CmpInst *Cmp, bool &ModifiedDT);
bool combineToUSubWithOverflow(CmpInst *Cmp, bool &ModifiedDT);		bool combineToUSubWithOverflow(CmpInst *Cmp, bool &ModifiedDT);
bool combineToUAddWithOverflow(CmpInst *Cmp, bool &ModifiedDT);		bool combineToUAddWithOverflow(CmpInst *Cmp, bool &ModifiedDT);
};		};

} // end anonymous namespace		} // end anonymous namespace

[769 unchanged lines not shown; context: static bool OptimizeNoopCopyExpression(CastInst *CI, const TargetLowering &TLI,]
// If, after promotion, these are the same types, this is a noop copy.		// If, after promotion, these are the same types, this is a noop copy.
if (SrcVT != DstVT)		if (SrcVT != DstVT)
return false;		return false;

return SinkCast(CI);		return SinkCast(CI);
}		}

bool CodeGenPrepare::replaceMathCmpWithIntrinsic(BinaryOperator *BO,		bool CodeGenPrepare::replaceMathCmpWithIntrinsic(BinaryOperator *BO,
		Value *Arg0, Value *Arg1,
CmpInst *Cmp,		CmpInst *Cmp,
Intrinsic::ID IID) {		Intrinsic::ID IID) {
if (BO->getParent() != Cmp->getParent()) {		if (BO->getParent() != Cmp->getParent()) {
// We used to use a dominator tree here to allow multi-block optimization.		// We used to use a dominator tree here to allow multi-block optimization.
// But that was problematic because:		// But that was problematic because:
// 1. It could cause a perf regression by hoisting the math op into the		// 1. It could cause a perf regression by hoisting the math op into the
// critical path.		// critical path.
// 2. It could cause a perf regression by creating a value that was live		// 2. It could cause a perf regression by creating a value that was live
// across multiple blocks and increasing register pressure.		// across multiple blocks and increasing register pressure.
// 3. Use of a dominator tree could cause large compile-time regression.		// 3. Use of a dominator tree could cause large compile-time regression.
// This is because we recompute the DT on every change in the main CGP		// This is because we recompute the DT on every change in the main CGP
// run-loop. The recomputing is probably unnecessary in many cases, so if		// run-loop. The recomputing is probably unnecessary in many cases, so if
// that was fixed, using a DT here would be ok.		// that was fixed, using a DT here would be ok.
return false;		return false;
}		}

// We allow matching the canonical IR (add X, C) back to (usubo X, -C).		// We allow matching the canonical IR (add X, C) back to (usubo X, -C).
Value *Arg0 = BO->getOperand(0);
Value *Arg1 = BO->getOperand(1);
if (BO->getOpcode() == Instruction::Add &&		if (BO->getOpcode() == Instruction::Add &&
IID == Intrinsic::usub_with_overflow) {		IID == Intrinsic::usub_with_overflow) {
assert(isa<Constant>(Arg1) && "Unexpected input for usubo");		assert(isa<Constant>(Arg1) && "Unexpected input for usubo");
Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));		Arg1 = ConstantExpr::getNeg(cast<Constant>(Arg1));
}		}

// Insert at the first instruction of the pair.		// Insert at the first instruction of the pair.
Instruction *InsertPt = nullptr;		Instruction *InsertPt = nullptr;
for (Instruction &Iter : *Cmp->getParent()) {		for (Instruction &Iter : *Cmp->getParent()) {
if (&Iter == BO || &Iter == Cmp) {		if (&Iter == BO || &Iter == Cmp) {
InsertPt = &Iter;		InsertPt = &Iter;
break;		break;
}		}
}		}
assert(InsertPt != nullptr && "Parent block did not contain cmp or binop");		assert(InsertPt != nullptr && "Parent block did not contain cmp or binop");

IRBuilder<> Builder(InsertPt);		IRBuilder<> Builder(InsertPt);
Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);		Value *MathOV = Builder.CreateBinaryIntrinsic(IID, Arg0, Arg1);
		if (BO->getOpcode() != Instruction::Xor) {
Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");		Value *Math = Builder.CreateExtractValue(MathOV, 0, "math");
Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
BO->replaceAllUsesWith(Math);		BO->replaceAllUsesWith(Math);
		} else
		assert(BO->hasOneUse() &&
		"Patterns with XOr should use the BO only in the compare");
		nikic (inline comment, Not Done): nit: xor or XOR
		Value *OV = Builder.CreateExtractValue(MathOV, 1, "ov");
Cmp->replaceAllUsesWith(OV);		Cmp->replaceAllUsesWith(OV);
BO->eraseFromParent();
Cmp->eraseFromParent();		Cmp->eraseFromParent();
		BO->eraseFromParent();
return true;		return true;
}		}

/// Match special-case patterns that check for unsigned add overflow.		/// Match special-case patterns that check for unsigned add overflow.
static bool matchUAddWithOverflowConstantEdgeCases(CmpInst *Cmp,		static bool matchUAddWithOverflowConstantEdgeCases(CmpInst *Cmp,
BinaryOperator *&Add) {		BinaryOperator *&Add) {
// Add = add A, 1; Cmp = icmp eq A,-1 (overflow if A is max val)		// Add = add A, 1; Cmp = icmp eq A,-1 (overflow if A is max val)
// Add = add A,-1; Cmp = icmp ne A, 0 (overflow if A is non-zero)		// Add = add A,-1; Cmp = icmp ne A, 0 (overflow if A is non-zero)
[23 unchanged lines not shown]
}		}

/// Try to combine the compare into a call to the llvm.uadd.with.overflow		/// Try to combine the compare into a call to the llvm.uadd.with.overflow
/// intrinsic. Return true if any changes were made.		/// intrinsic. Return true if any changes were made.
bool CodeGenPrepare::combineToUAddWithOverflow(CmpInst *Cmp,		bool CodeGenPrepare::combineToUAddWithOverflow(CmpInst *Cmp,
bool &ModifiedDT) {		bool &ModifiedDT) {
Value *A, *B;		Value *A, *B;
BinaryOperator *Add;		BinaryOperator *Add;
if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B), m_BinOp(Add))))		if (!match(Cmp, m_UAddWithOverflow(m_Value(A), m_Value(B), m_BinOp(Add)))) {
if (!matchUAddWithOverflowConstantEdgeCases(Cmp, Add))		if (!matchUAddWithOverflowConstantEdgeCases(Cmp, Add))
return false;		return false;
		// Set A and B in case we match matchUAddWithOverflowConstantEdgeCases.
		A = Add->getOperand(0);
		B = Add->getOperand(1);
		}

if (!TLI->shouldFormOverflowOp(ISD::UADDO,		if (!TLI->shouldFormOverflowOp(ISD::UADDO,
TLI->getValueType(*DL, Add->getType()),		TLI->getValueType(*DL, Add->getType()),
Add->hasNUsesOrMore(2)))		Add->hasNUsesOrMore(2)))
return false;		return false;

// We don't want to move around uses of condition values this late, so we		// We don't want to move around uses of condition values this late, so we
// check if it is legal to create the call to the intrinsic in the basic		// check if it is legal to create the call to the intrinsic in the basic
// block containing the icmp.		// block containing the icmp.
if (Add->getParent() != Cmp->getParent() && !Add->hasOneUse())		if (Add->getParent() != Cmp->getParent() && !Add->hasOneUse())
return false;		return false;

if (!replaceMathCmpWithIntrinsic(Add, Cmp, Intrinsic::uadd_with_overflow))		if (!replaceMathCmpWithIntrinsic(Add, A, B, Cmp,
		Intrinsic::uadd_with_overflow))
return false;		return false;

// Reset callers - do not crash by iterating over a dead instruction.		// Reset callers - do not crash by iterating over a dead instruction.
ModifiedDT = true;		ModifiedDT = true;
return true;		return true;
}		}

bool CodeGenPrepare::combineToUSubWithOverflow(CmpInst *Cmp,		bool CodeGenPrepare::combineToUSubWithOverflow(CmpInst *Cmp,
[45 unchanged lines not shown; context: bool CodeGenPrepare::combineToUSubWithOverflow(CmpInst *Cmp,]
if (!Sub)		if (!Sub)
return false;		return false;

if (!TLI->shouldFormOverflowOp(ISD::USUBO,		if (!TLI->shouldFormOverflowOp(ISD::USUBO,
TLI->getValueType(*DL, Sub->getType()),		TLI->getValueType(*DL, Sub->getType()),
Sub->hasNUsesOrMore(2)))		Sub->hasNUsesOrMore(2)))
return false;		return false;

if (!replaceMathCmpWithIntrinsic(Sub, Cmp, Intrinsic::usub_with_overflow))		if (!replaceMathCmpWithIntrinsic(Sub, Sub->getOperand(0), Sub->getOperand(1),
		Cmp, Intrinsic::usub_with_overflow))
return false;		return false;

// Reset callers - do not crash by iterating over a dead instruction.		// Reset callers - do not crash by iterating over a dead instruction.
ModifiedDT = true;		ModifiedDT = true;
return true;		return true;
}		}

/// Sink the given CmpInst into user blocks to reduce the number of virtual		/// Sink the given CmpInst into user blocks to reduce the number of virtual
[6,146 unchanged lines not shown]
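The thread does not spell out why the committed version failed with "Instruction does not dominate all uses!". One plausible reading, which is an assumption and not confirmed in this review: in the xor form, the intrinsic's second argument b is an operand of the compare rather than of the binop, so inserting the call at the first instruction of the pair (the xor) can place it before b is defined, like the `%1 = load` in the log above. The toy C++ model below uses hypothetical types and value names (`x`, `cmp`), not LLVM's API, to contrast inserting at the first of the pair with inserting at the compare:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model (not LLVM's API): each instruction defines one
// named value and reads some named operands.
struct ToyInst {
  std::string Def;
  std::vector<std::string> Ops;
};

// Mirrors the "insert at the first instruction of the pair" loop: walk the
// block and stop at the binop or the compare, whichever comes first.
// SkipBinOp models only considering the compare (e.g. for the xor form).
std::size_t pickInsertPt(const std::vector<ToyInst> &Block,
                         const std::string &BinOp, const std::string &Cmp,
                         bool SkipBinOp) {
  for (std::size_t I = 0; I < Block.size(); ++I)
    if ((!SkipBinOp && Block[I].Def == BinOp) || Block[I].Def == Cmp)
      return I;
  return Block.size();
}

// True if every intrinsic argument is a function argument or is defined by
// an instruction strictly before position Pos in the same block.
bool argsDefinedBefore(const std::vector<ToyInst> &Block, std::size_t Pos,
                       const std::vector<std::string> &Args,
                       const std::set<std::string> &FnArgs) {
  for (const std::string &A : Args) {
    if (FnArgs.count(A))
      continue;
    bool Found = false;
    for (std::size_t I = 0; I < Pos && I < Block.size(); ++I)
      if (Block[I].Def == A)
        Found = true;
    if (!Found)
      return false;
  }
  return true;
}

// Block shaped like the failing case: the compare's other operand (%1, a
// load) is defined after the xor but before the compare.
std::vector<ToyInst> failingBlock() {
  return {{"x", {"FID.coerce"}},  // %x = xor i32 %FID.coerce, -1
          {"1", {"Size.i"}},      // %1 = load i32, i32* %Size.i
          {"cmp", {"x", "1"}}};   // %cmp = icmp ult i32 %x, %1
}
```

In this model, inserting at the xor (index 0) leaves `%1` undefined at the call, while inserting at the compare (index 2) sees both arguments.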

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

[5,562 unchanged lines not shown; context: if (Instruction *Res = foldICmpWithMinMax(I))]
}		}

Instruction *AddI = nullptr;		Instruction *AddI = nullptr;
if (match(&I, m_UAddWithOverflow(m_Value(A), m_Value(B),		if (match(&I, m_UAddWithOverflow(m_Value(A), m_Value(B),
m_Instruction(AddI))) &&		m_Instruction(AddI))) &&
isa<IntegerType>(A->getType())) {		isa<IntegerType>(A->getType())) {
Value *Result;		Value *Result;
Constant *Overflow;		Constant *Overflow;
if (OptimizeOverflowCheck(Instruction::Add, /*Signed*/false, A, B,		// m_UAddWithOverflow can match patterns that do not include an explicit
*AddI, Result, Overflow)) {		// "add" instruction, so check the opcode of the matched op.
		if (AddI->getOpcode() == Instruction::Add &&
		OptimizeOverflowCheck(Instruction::Add, /*Signed*/ false, A, B, *AddI,
		Result, Overflow)) {
replaceInstUsesWith(*AddI, Result);		replaceInstUsesWith(*AddI, Result);
return replaceInstUsesWith(I, Overflow);		return replaceInstUsesWith(I, Overflow);
}		}
}		}

// (zext a) * (zext b) --> llvm.umul.with.overflow.		// (zext a) * (zext b) --> llvm.umul.with.overflow.
if (match(Op0, m_Mul(m_ZExt(m_Value(A)), m_ZExt(m_Value(B))))) {		if (match(Op0, m_Mul(m_ZExt(m_Value(A)), m_ZExt(m_Value(B))))) {
if (Instruction *R = processUMulZExtIdiom(I, Op0, Op1, this))		if (Instruction *R = processUMulZExtIdiom(I, Op0, Op1, this))
[558 unchanged lines not shown]

llvm/test/Transforms/CodeGenPrepare/AArch64/overflow-intrinsics.ll

[96 unchanged lines not shown]
store i64 %add, i64* %res		store i64 %add, i64* %res
ret i64 %Q		ret i64 %Q
}		}

; Instcombine folds (a + b <u a) to (a ^ -1 <u b). Make sure we match this		; Instcombine folds (a + b <u a) to (a ^ -1 <u b). Make sure we match this
; pattern as well.		; pattern as well.
define i64 @uaddo6_xor(i64 %a, i64 %b) {		define i64 @uaddo6_xor(i64 %a, i64 %b) {
; CHECK-LABEL: @uaddo6_xor(		; CHECK-LABEL: @uaddo6_xor(
; CHECK-NEXT: [[X:%.*]] = xor i64 [[A:%.*]], -1		; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[A:%.*]], i64 [[B:%.*]])
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[X]], [[B:%.*]]		; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
; CHECK-NEXT: [[Q:%.*]] = select i1 [[CMP]], i64 [[B]], i64 42		; CHECK-NEXT: [[Q:%.*]] = select i1 [[OV]], i64 [[B]], i64 42
; CHECK-NEXT: ret i64 [[Q]]		; CHECK-NEXT: ret i64 [[Q]]
;		;
%x = xor i64 %a, -1		%x = xor i64 %a, -1
%cmp = icmp ult i64 %x, %b		%cmp = icmp ult i64 %x, %b
%Q = select i1 %cmp, i64 %b, i64 42		%Q = select i1 %cmp, i64 %b, i64 42
ret i64 %Q		ret i64 %Q
}		}

define i64 @uaddo6_xor_commuted(i64 %a, i64 %b) {		define i64 @uaddo6_xor_commuted(i64 %a, i64 %b) {
; CHECK-LABEL: @uaddo6_xor_commuted(		; CHECK-LABEL: @uaddo6_xor_commuted(
; CHECK-NEXT: [[X:%.*]] = xor i64 -1, [[A:%.*]]		; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[A:%.*]], i64 [[B:%.*]])
; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[X]], [[B:%.*]]		; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
; CHECK-NEXT: [[Q:%.*]] = select i1 [[CMP]], i64 [[B]], i64 42		; CHECK-NEXT: [[Q:%.*]] = select i1 [[OV]], i64 [[B]], i64 42
; CHECK-NEXT: ret i64 [[Q]]		; CHECK-NEXT: ret i64 [[Q]]
;		;
%x = xor i64 -1, %a		%x = xor i64 %a, -1
%cmp = icmp ult i64 %x, %b		%cmp = icmp ugt i64 %b, %x
%Q = select i1 %cmp, i64 %b, i64 42		%Q = select i1 %cmp, i64 %b, i64 42
ret i64 %Q		ret i64 %Q
}		}

declare void @use(i64)		declare void @use(i64)

define i64 @uaddo6_xor_multi_use(i64 %a, i64 %b) {		define i64 @uaddo6_xor_multi_use(i64 %a, i64 %b) {
; CHECK-LABEL: @uaddo6_xor_multi_use(		; CHECK-LABEL: @uaddo6_xor_multi_use(
[39 unchanged lines not shown]

llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	lebedev.ri (inline comment, Not Done): This doesn't seem right..
	; RUN: opt -codegenprepare -S < %s | FileCheck %s			; RUN: opt -codegenprepare -S < %s | FileCheck %s
	; RUN: opt -enable-debugify -codegenprepare -S < %s 2>&1 | FileCheck %s -check-prefix=DEBUG			; RUN: opt -enable-debugify -codegenprepare -S < %s 2>&1 | FileCheck %s -check-prefix=DEBUG

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
	target triple = "x86_64-apple-darwin10.0.0"			target triple = "x86_64-apple-darwin10.0.0"

	define i64 @uaddo1_overflow_used(i64 %a, i64 %b) nounwind ssp {			define i64 @uaddo1_overflow_used(i64 %a, i64 %b) nounwind ssp {
	; CHECK-LABEL: @uaddo1_overflow_used(			; CHECK-LABEL: @uaddo1_overflow_used(
	[138 unchanged lines not shown]
	exit:			exit:
	ret i64 0			ret i64 0
	}			}

	; Instcombine folds (a + b <u a) to (a ^ -1 <u b). Make sure we match this			; Instcombine folds (a + b <u a) to (a ^ -1 <u b). Make sure we match this
	; pattern as well.			; pattern as well.
	define i64 @uaddo6_xor(i64 %a, i64 %b) {			define i64 @uaddo6_xor(i64 %a, i64 %b) {
	; CHECK-LABEL: @uaddo6_xor(			; CHECK-LABEL: @uaddo6_xor(
	; CHECK-NEXT: [[X:%.*]] = xor i64 [[A:%.*]], -1			; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[X]], [[B:%.*]]			; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
	; CHECK-NEXT: [[Q:%.*]] = select i1 [[CMP]], i64 [[B]], i64 42			; CHECK-NEXT: [[Q:%.*]] = select i1 [[OV]], i64 [[B]], i64 42
	; CHECK-NEXT: ret i64 [[Q]]			; CHECK-NEXT: ret i64 [[Q]]
	;			;
	%x = xor i64 %a, -1			%x = xor i64 %a, -1
	%cmp = icmp ult i64 %x, %b			%cmp = icmp ult i64 %x, %b
	%Q = select i1 %cmp, i64 %b, i64 42			%Q = select i1 %cmp, i64 %b, i64 42
	ret i64 %Q			ret i64 %Q
	}			}

	define i64 @uaddo6_xor_commuted(i64 %a, i64 %b) {			define i64 @uaddo6_xor_commuted(i64 %a, i64 %b) {
	; CHECK-LABEL: @uaddo6_xor_commuted(			; CHECK-LABEL: @uaddo6_xor_commuted(
	; CHECK-NEXT: [[X:%.*]] = xor i64 -1, [[A:%.*]]			; CHECK-NEXT: [[TMP1:%.*]] = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 [[A:%.*]], i64 [[B:%.*]])
	; CHECK-NEXT: [[CMP:%.*]] = icmp ult i64 [[X]], [[B:%.*]]			; CHECK-NEXT: [[OV:%.*]] = extractvalue { i64, i1 } [[TMP1]], 1
	; CHECK-NEXT: [[Q:%.*]] = select i1 [[CMP]], i64 [[B]], i64 42			; CHECK-NEXT: [[Q:%.*]] = select i1 [[OV]], i64 [[B]], i64 42
	; CHECK-NEXT: ret i64 [[Q]]			; CHECK-NEXT: ret i64 [[Q]]
	;			;
	%x = xor i64 -1, %a			%x = xor i64 %a, -1
	%cmp = icmp ult i64 %x, %b			%cmp = icmp ult i64 %x, %b
	%Q = select i1 %cmp, i64 %b, i64 42			%Q = select i1 %cmp, i64 %b, i64 42
	ret i64 %Q			ret i64 %Q
	}			}

	declare void @use(i64)			declare void @use(i64)

	define i64 @uaddo6_xor_multi_use(i64 %a, i64 %b) {			define i64 @uaddo6_xor_multi_use(i64 %a, i64 %b) {
	[442 unchanged lines not shown]
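At the source level, the `@uaddo6_xor` tests correspond roughly to the C++ below. This is a hypothetical illustration, not code from the review. `__builtin_add_overflow` (a GCC/Clang builtin, which Clang generally lowers to `llvm.uadd.with.overflow` for unsigned operands) serves only as a reference for the overflow bit that the CHECK lines expect after CodeGenPrepare. Both functions select b on overflow and 42 otherwise:

```cpp
#include <cstdint>

// Source-level shape of @uaddo6_xor: the overflow check as InstCombine
// canonicalizes it, (a ^ -1) <u b, feeding a select.
std::uint64_t uaddo6_xor(std::uint64_t a, std::uint64_t b) {
  return (~a < b) ? b : 42;
}

// The same check written against an explicit overflow flag, i.e. the shape
// the updated CHECK lines expect (uadd.with.overflow + extractvalue 1).
// __builtin_add_overflow stores the wrapped sum and returns true if the
// mathematical sum did not fit in the result type.
std::uint64_t uaddo6_with_builtin(std::uint64_t a, std::uint64_t b) {
  std::uint64_t Sum;
  bool Overflow = __builtin_add_overflow(a, b, &Sum);
  (void)Sum; // only the overflow bit is used, as in the test
  return Overflow ? b : 42;
}
```

Since neither function uses the sum itself, the xor form loses nothing when it is rewritten to the intrinsic; only the overflow bit is consumed.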