This is an archive of the discontinued LLVM Phabricator instance.

[PatternMatch] Match XOR variant of unsigned-add overflow check.
ClosedPublic

Authored by fhahn on Feb 7 2020, 9:13 AM.

Details

Summary

InstCombine folds (a + b <u a) to (a ^ -1 <u b), and that does not match
the expected pattern in CodeGenPrepare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.
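
For reference, a minimal C++ sketch of the kind of source-level check affected (an illustration only, not necessarily the exact code behind the godbolt link above):

bool will_overflow(unsigned a, unsigned b) {
  // Clang emits (a + b <u a) for this. When only the comparison result is
  // used, InstCombine rewrites it to (a ^ -1 <u b), which the existing
  // UAddWithOverflow matcher in CodeGenPrepare did not recognize.
  return a + b < a;
}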

Diff Detail

Event Timeline

fhahn created this revision.Feb 7 2020, 9:13 AM
Herald added a project: Restricted Project.Feb 7 2020, 9:13 AM

Tests missing.
What about the commuted variant?

llvm/include/llvm/IR/PatternMatch.h
1708
// (a ^ -1 <u b)
1711

Will avoiding the variable result in an 80-char column overflow?

llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
1

This doesn't seem right..

nikic added inline comments.Feb 7 2020, 10:07 AM
llvm/include/llvm/IR/PatternMatch.h
1710

Does m_SpecificInt sign extend? I'd use m_AllOnes() here. Or for that matter m_Not(m_Value(Op1)), though that will require reordering.
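
For illustration, a hedged sketch of the two matcher spellings being discussed (simplified; ICmpLHS and Op1 are placeholder names, not the identifiers in the diff):

#include "llvm/IR/PatternMatch.h"
using namespace llvm;
using namespace llvm::PatternMatch;

static bool matchesNotOfOp1(Value *ICmpLHS, Value *&Op1) {
  // Spelling 1: match (Op1 ^ -1) against an explicit all-ones constant,
  // which sidesteps any sign-extension question around m_SpecificInt.
  if (match(ICmpLHS, m_Xor(m_Value(Op1), m_AllOnes())))
    return true;
  // Spelling 2: m_Not matches the same xor-with-all-ones pattern, but
  // binding Op1 through it requires reordering the surrounding checks.
  return match(ICmpLHS, m_Not(m_Value(Op1)));
}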

fhahn updated this revision to Diff 243254.Feb 7 2020, 11:37 AM
fhahn marked 3 inline comments as done.

Resurrect test, adjust users of UAddWithOverflow. I am not sure if we should change the name of the matcher or the third operand, as the 'Sum' operand may now return the XOR.

Tests missing.

Should be back again.

What about the commuted variant?

I will add those if the direction here is the right one.

nikic added a comment.Feb 8 2020, 4:33 AM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

spatel added a comment.Feb 8 2020, 7:20 AM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

IIRC, the history of transforms for overflow ops is something like this:

  1. We tried to canonicalize to them in instcombine, but this caused perf regressions.
  2. We converted some of them in CGP instead.
  3. We converted more of them in CGP (this was my attempt), but this caused perf and compile-time regressions.
  4. We crippled the CGP transforms to avoid regressions.

I'm not sure what the status is currently in SDAG. Given that we already have the code in CGP, and this is a small addition, I wouldn't object to extending that code. If we could re-implement everything that is in CGP in SDAG, that might be a better option...although that might be counter-productive if we assume we're soon heading to or are already in a GlobalISel world.

I might have missed some of the recent patches on overflow intrinsics. It's not clear to me from the diff what InstCombine's role in this patch is - can we add a code comment and/or regression test to explain what we expect to happen for this (and other patterns)?

fhahn added a comment.Feb 10 2020, 7:09 AM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

IIRC, the history of transforms for overflow ops is something like this:

  1. We tried to canonicalize to them in instcombine, but this caused perf regressions.
  2. We converted some of them in CGP instead.
  3. We converted more of them in CGP (this was my attempt), but this caused perf and compile-time regressions.
  4. We crippled the CGP transforms to avoid regressions.

I'm not sure what the status is currently in SDAG. Given that we already have the code in CGP, and this is a small addition, I wouldn't object to extending that code. If we could re-implement everything that is in CGP in SDAG, that might be a better option...although that might be counter-productive if we assume we're soon heading to or are already in a GlobalISel world.

I might have missed some of the recent patches on overflow intrinsics. It's not clear to me from the diff what InstCombine's role in this patch is - can we add a code comment and/or regression test to explain what we expect to happen for this (and other patterns)?

InstCombine started turning (a + b <u a) into (a ^ -1 <u b), but CGP looks for (a + b <u a) and misses the case now. Does that answer your question?

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

InstCombine started turning (a + b <u a) into (a ^ -1 <u b), but CGP looks for (a + b <u a) and misses the case now. Does that answer your question?

Ah, I see it now. I wasn't reading the diff in InstCombiner::visitICmpInst() correctly. Might be nicer to include something like:

// m_UAddWithOverflow can match patterns that do not include
// an explicit "add" instruction, so check the opcode of the matched op.
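
A hedged sketch of what that check could look like in InstCombiner::visitICmpInst (variable names are illustrative, not necessarily those in the committed diff):

Value *A, *B;
Instruction *AddI;
if (match(&I, m_UAddWithOverflow(m_Value(A), m_Value(B),
                                 m_Instruction(AddI))) &&
    // m_UAddWithOverflow can match patterns that do not include
    // an explicit "add" instruction, so check the opcode of the matched op.
    AddI->getOpcode() == Instruction::Add) {
  // ... existing transform that forms the overflow intrinsic ...
}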

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

Yes, that would be an improvement. Two potential options:

  1. Add 'opt' RUNs that include CGP to llvm/test/Transforms/PhaseOrdering (this is probably stretching the intent of "PhaseOrdering"; we'll need a new target-specific test directory because CGP will require that we pick a target).
  2. Add full end-to-end tests for C --> asm to test-suite.
fhahn updated this revision to Diff 243816.Feb 11 2020, 4:51 AM

Document the additional pattern matched by UAddWithOverflow and add a comment on why we match an AddInst explicitly in InstCombine.

fhahn added a comment.Feb 11 2020, 4:52 AM

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

Yes, that would be an improvement. Two potential options:

  1. Add 'opt' RUNs that include CGP to llvm/test/Transforms/PhaseOrdering (this is probably stretching the intent of "PhaseOrdering"; we'll need a new target-specific test directory because CGP will require that we pick a target).
  2. Add full end-to-end tests for C --> asm to test-suite.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance....but it's probably worth raising again on the list because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

For example:

$ llc -o - ov.ll -mtriple=sparcv9
uaddo_no_math_use:                      ! @uaddo_no_math_use
! %bb.0:
	add %o0, %o1, %o2
	mov	%g0, %o3
	cmp %o2, %o0
	movcs	%xcc, 1, %o3
	mov	42, %o0
	cmp %o3, 0
	retl
	movne	%icc, %o1, %o0
no_intrinsic:                           ! @no_intrinsic
! %bb.0:
	xor %o0, -1, %o2
	mov	42, %o0
	cmp %o2, %o1
	retl
	movcs	%xcc, %o1, %o0

Hopefully, I'm seeing the whole problem now. Let me know if I'm still missing it.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance....but it's probably worth raising again on the list because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check whether the sum has any actual users, and default to off when it does not? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient than in SelDAG, and we would also need an appropriate target hook there, unless we match that pattern in the target implementations.

What do you think?

For example:

$ llc -o - ov.ll -mtriple=sparcv9
uaddo_no_math_use:                      ! @uaddo_no_math_use
! %bb.0:
	add %o0, %o1, %o2
	mov	%g0, %o3
	cmp %o2, %o0
	movcs	%xcc, 1, %o3
	mov	42, %o0
	cmp %o3, 0
	retl
	movne	%icc, %o1, %o0
no_intrinsic:                           ! @no_intrinsic
! %bb.0:
	xor %o0, -1, %o2
	mov	42, %o0
	cmp %o2, %o1
	retl
	movcs	%xcc, %o1, %o0

Hopefully, I'm seeing the whole problem now. Let me know if I'm still missing it.

I think that should cover it, thanks for sharing the problematic case on SPARC.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance....but it's probably worth raising again on the list because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check whether the sum has any actual users, and default to off when it does not? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient than in SelDAG, and we would also need an appropriate target hook there, unless we match that pattern in the target implementations.

What do you think?

I think it would be fine to tighten up the default hook and then allow targets to opt in with more awareness. OTOH, in SDAG we could just check that UADDO is legal/custom, so no custom hook needed?
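
For what it's worth, a rough sketch of the SDAG-side guard being suggested (hypothetical combine code, assuming names like N0 and TLI from the usual DAGCombiner context):

// In a hypothetical setcc -> uaddo combine: only form UADDO when the
// target marks it Legal or Custom for the value type.
EVT VT = N0.getValueType();
if (!TLI.isOperationLegalOrCustom(ISD::UADDO, VT))
  return SDValue();
// ... otherwise rewrite (setult (xor a, -1), b) to the overflow result
// of (uaddo a, b) ...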

I think that should cover it, thanks for sharing the problematic case on SPARC.

That took some hunting. :)
But I just wanted to show that we need to be careful with these transforms. We've generated a lot of back-and-forth regressions with the overflow intrinsics.

fhahn added a comment.Feb 11 2020, 1:26 PM

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check whether the sum has any actual users, and default to off when it does not? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient than in SelDAG, and we would also need an appropriate target hook there, unless we match that pattern in the target implementations.

What do you think?

I think it would be fine to tighten up the default hook and then allow targets to opt in with more awareness. OTOH in SDAG, we could just check that UADDO is legal/custom, so no custom hook needed?

So should I prepare a set of patches for the improved target hook? Or SelDAG? I would slightly prefer CGP for that, also with GlobalISel in mind, and that's where we already do the expansion for similar cases.

I think that should cover it, thanks for sharing the problematic case on SPARC.

That took some hunting. :)
But I just wanted to show that we need to be careful with these transforms. We've generated a lot of back-and-forth regressions with the overflow intrinsics.

+1, I've run into a few I am trying to address with this and other patches in the area :)

So should I prepare a set of patches for the improved target hook? Or SelDAG? I would slightly prefer CGP for that, also with GlobalISel in mind, and that's where we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent. So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?
@nikic or others - does that match your expectations?

So should I prepare a set of patches for the improved target hook? Or SelDAG? I would slightly prefer CGP for that, also with GlobalISel in mind, and that's where we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent.

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

nikic added a comment.Feb 15 2020, 8:40 AM

Taking a step back here, I'd like to question why InstCombine does this fold in the first place. This was introduced in https://github.com/llvm/llvm-project/commit/1cf0734b2f6b346993c41c18f4d6a9f2d1e11189, but I'm not quite clear on the motivations (and if they still apply after we introduced saturation intrinsics?). As we consider a + b < a the canonical overflow pattern for the case where a + b is used, why wouldn't we use the same pattern for the case where it is not used as well?

Should we simply start forming @llvm.with.overflow intrinsics in the middle-end more often?

Taking a step back here, I'd like to question why InstCombine does this fold in the first place. This was introduced in https://github.com/llvm/llvm-project/commit/1cf0734b2f6b346993c41c18f4d6a9f2d1e11189, but I'm not quite clear on the motivations (and if they still apply after we introduced saturation intrinsics?). As we consider a + b < a the canonical overflow pattern for the case where a + b is used, why wouldn't we use the same pattern for the case where it is not used as well?

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that. For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

So should I prepare a set of patches for the improved target hook? Or SelDAG? I would slightly prefer CGP for that, also with GlobalISel in mind, and that's where we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent.

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

So should I prepare a set of patches for the improved target hook? Or SelDAG? I would slightly prefer CGP for that, also with GlobalISel in mind, and that's where we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent.

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGTM

nikic added a comment.Feb 16 2020, 6:43 AM

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that.

Okay, that makes sense. I'm not even sure we could do the reverse fold, thanks to the magic of undef: for (A < ~B) => (A + B < B) with A = -1, B = undef, the former is guaranteed false, while the latter could be true or false, as the two uses of B can be chosen independently.

For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

Yes, definitely. We can't even canonicalize to (unsigned) overflow intrinsics even if the result is used, because they are so badly optimized right now.

So overall, I agree as well that the approach in this patch is fine :)

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that.

Okay, that makes sense. I'm not even sure we could do the reverse fold, thanks to the magic of undef: for (A < ~B) => (A + B < B) with A = -1, B = undef, the former is guaranteed false, while the latter could be true or false, as the two uses of B can be chosen independently.

We almost can now; it would be C = freeze(B); (A + C < C).
Almost, because I think codegen patch D29014 is stuck :/
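
As a hedged illustration of that freeze-based rewrite (IRBuilder-style sketch; A, B, and Builder are placeholders):

// C = freeze(B); then compare (A + C <u C), so both uses see the same value.
Value *C = Builder.CreateFreeze(B);
Value *Sum = Builder.CreateAdd(A, C);
Value *Cmp = Builder.CreateICmpULT(Sum, C);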

For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

Yes, definitely. We can't even canonicalize to (unsigned) overflow intrinsics even if the result is used, because they are so badly optimized right now.

So overall, I agree as well that the approach in this patch is fine :)

fhahn added a comment.Feb 16 2020, 9:30 AM

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGTM

Great, I'll add the hook.

fhahn added a comment.Feb 17 2020, 8:27 AM

@nikic or others - does that match your expectations?

I admittedly haven't paid much attention to what's happening here,
but I agree that this pattern is missed, and do think we should handle the multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGTM

Great, I'll add the hook.

I've put up a patch adjusting the shouldFormOverflowOp hook to consider whether the math result is used: D74722
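
For context, a hedged sketch of the direction that hook change takes (the shape of the hook only; see D74722 for the actual signature and default logic):

// TargetLowering hook, sketched with an extra flag for whether the add
// result has users beyond the overflow check.
virtual bool shouldFormOverflowOp(unsigned Opcode, EVT VT,
                                  bool MathUsed) const {
  // Stay conservative by default when only the overflow bit is needed;
  // targets such as X86/AArch64 can override to opt in for that case too.
  return MathUsed && isOperationLegalOrCustom(Opcode, VT);
}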

I'd like to see (at least?) the commutative case handled.

llvm/include/llvm/IR/PatternMatch.h
1711

Hm, I don't really see one-use checks in similar neighboring patterns here.

fhahn updated this revision to Diff 245159.Feb 18 2020, 7:10 AM

use m_c_Xor to handle commutative case

fhahn marked an inline comment as done.Feb 18 2020, 7:13 AM
fhahn added inline comments.
llvm/include/llvm/IR/PatternMatch.h
1711

I can drop it, but I think we should be more restrictive here, because we cannot use the math result for the xor. Alternatively, we can have the use check in CGP.

lebedev.ri added inline comments.Feb 18 2020, 7:25 AM
llvm/include/llvm/IR/PatternMatch.h
2089

We know the constant is always on the RHS of the binop.
I was talking about the b u> (a ^ -1) case there.

Also, not strictly related to this patch, but does this simply ignore inverted cases, like (a ^ -1) u>= b?

fhahn updated this revision to Diff 245278.Feb 18 2020, 2:42 PM

Actually update to match b u> (a ^ -1). Remove m_c_Xor again. I'll make sure the code in PatternMatch.h stays at its current position.

fhahn marked an inline comment as done.Feb 18 2020, 2:45 PM
fhahn added inline comments.
llvm/include/llvm/IR/PatternMatch.h
2089

I was talking about the b u> (a ^ -1) case there.

Right, that should be fixed in the latest version. I've updated replaceMathCmpWithIntrinsic to take the A and B args directly, rather than getting them from the BO, because with Xor that's not straightforward.

Also, not strictly related to this patch, but does this simply ignore inverted cases, like (a ^ -1) u>= b?

For now, yes. Not sure how common that is, but it could be added as a follow-up.

fhahn updated this revision to Diff 245285.Feb 18 2020, 3:03 PM

Update test checks, assertion.

lebedev.ri accepted this revision.Feb 18 2020, 3:52 PM

This seems good to me but would be good for @nikic to comment.

This revision is now accepted and ready to land.Feb 18 2020, 3:52 PM
nikic accepted this revision.Feb 19 2020, 12:31 AM

LG to me as well.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1233

nit: xor or XOR

This revision was automatically updated to reflect the committed changes.
vsk added a subscriber: vsk.Feb 19 2020, 10:26 AM

Not yet 100% sure, but I suspect this is breaking the clang self-host, see:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o 
/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/include -Itools/clang/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/libxml2 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -MF tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o.d -o tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic/SourceManager.cpp
Instruction does not dominate all uses!
  %1 = load i32, i32* %Size.i, align 8, !tbaa !66
  %0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %FID.coerce, i32 %1)
in function _ZNK5clang13SourceManager17getPreviousFileIDENS_6FileIDE
fatal error: error in backend: Broken function found, compilation aborted!

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597

In D74228#1883057, @vsk wrote:

Not yet 100% sure, but I suspect this is breaking the clang self-host, see:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o 
/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/include -Itools/clang/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/libxml2 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -MF tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o.d -o tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic/SourceManager.cpp
Instruction does not dominate all uses!
  %1 = load i32, i32* %Size.i, align 8, !tbaa !66
  %0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %FID.coerce, i32 %1)
in function _ZNK5clang13SourceManager17getPreviousFileIDENS_6FileIDE
fatal error: error in backend: Broken function found, compilation aborted!

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597

Probably. I've reverted the change for now.