Page MenuHomePhabricator

[PatternMatch] Match XOR variant of unsigned-add overflow check.
ClosedPublic

Authored by fhahn on Feb 7 2020, 9:13 AM.

Details

Summary

Instcombine folds (a + b <u a) to (a ^ -1 <u b) and that does not match
the expected pattern in CodeGenPerpare via UAddWithOverflow.

This causes a regression over Clang 7 on both X86 and AArch64:
https://gcc.godbolt.org/z/juhXYV

This patch extends UAddWithOverflow to also catch the XOR case, if the
XOR is only used in the ICMP. This covers just a single case, but I'd
like to make sure I am not missing anything before tackling the other
cases.

Diff Detail

Event Timeline

fhahn created this revision.Feb 7 2020, 9:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2020, 9:13 AM

Tests missing.
What about commutted variant?

llvm/include/llvm/IR/PatternMatch.h
1709
// (a ^ -1 <u b)
1712

Will avoiding variable result in 80-char column overflow?

llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll
1

This doesn't seem right..

nikic added inline comments.Feb 7 2020, 10:07 AM
llvm/include/llvm/IR/PatternMatch.h
1711

Does m_SpecificInt sign extend? I'd use m_AllOnes() here. Or for that matter m_Not(m_Value(Op1)), though that will require reordering.

fhahn updated this revision to Diff 243254.Feb 7 2020, 11:37 AM
fhahn marked 3 inline comments as done.

Resurrect test, adjust users of UAddWithOverflow. I am not sure if we should change the name of the matcher or the third operand, as the 'Sum' option now may return the XOR.

Tests missing.

Should be back again.

What about commutted variant?

I will add those if the direction here is the right one.

nikic added a comment.Feb 8 2020, 4:33 AM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

spatel added a comment.Feb 8 2020, 7:20 AM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

IIRC, the history of transforms for overflow ops is something like this:

  1. We tried to canonicalize to them in instcombine, but this caused perf regressions.
  2. We converted some of them in CGP instead.
  3. We converted more of them in CGP (this was my attempt), but this caused perf and compile-time regressions.
  4. We crippled the CGP transforms to avoid regressions.

I'm not sure what the status is currently in SDAG. Given that we already have the code in CGP, and this is a small addition, I wouldn't object to extending that code. If we could re-implement everything that is in CGP in SDAG, that might be a better option...although that might be counter-productive if we assume we're soon heading to or are already in a GlobalISel world.

I might have missed some of the recent patches on overflow intrinsics. It's not clear to me from the diff what InstCombine's role in this patch is - can we add a code comment and/or regression test to explain what we expect to happen for this (and other patterns)?

fhahn added a comment.Feb 10 2020, 7:09 AM

Not sure this is the right way to go about it. As we're only using the "overflow" result here, and don't need to collect two nominally independent instructions (potentially from different BBs), I don't think there's a strong reason to do this transform in CGP, as opposed to a setcc->uaddo DAG combine (if uaddo is legal).

Maybe @spatel can chime in, who implemented the handling in CGP.

IIRC, the history of transforms for overflow ops is something like this:

  1. We tried to canonicalize to them in instcombine, but this caused perf regressions.
  2. We converted some of them in CGP instead.
  3. We converted more of them in CGP (this was my attempt), but this caused perf and compile-time regressions.
  4. We crippled the CGP transforms to avoid regressions.

    I'm not sure what the status is currently in SDAG. Given that we already have the code in CGP, and this is a small addition, I wouldn't object to extending that code. If we could re-implement everything that is in CGP in SDAG, that might be a better option...although that might be counter-productive if we assume we're soon heading to or are already in a GlobalISel world.

    I might have missed some of the recent patches on overflow intrinsics. It's not clear to me from the diff what InstCombine's role in this patch is - can we add a code comment and/or regression test to explain what we expect to happen for this (and other patterns)?

InstCombine started turning (a + b <u a) into (a ^ -1 <u b), but CGP looks for (a + b <u a) and misses the case now. Does that answer your question?

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

InstCombine started turning (a + b <u a) into (a ^ -1 <u b), but CGP looks for (a + b <u a) and misses the case now. Does that answer your question?

Ah, I see it now. I wasn't reading the diff in InstCombiner::visitICmpInst() correctly. Might be nicer to include something like:

// m_UAddWithOverflow can match patterns that do not include
// an explicit "add" instruction, so check the opcode of the matched op.

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

Yes, that would be an improvement. Two potential options:

  1. Add 'opt' RUNs that include CGP to llvm/test/Transforms/PhaseOrdering (this is probably stretching the intent of "PhaseOrdering"; we'll need a new target-specific test directory because CGP will require that we pick a target).
  2. Add full end-to-end tests for C --> asm to test-suite.
fhahn updated this revision to Diff 243816.Feb 11 2020, 4:51 AM

Document additional pattern match by UAddWithOverflow and add comment on why we match for AddInst explicitly in InstCombine.

fhahn added a comment.Feb 11 2020, 4:52 AM

It might be good to have tests that run instcombine & CGP to catch regressions such as this?

Yes, that would be an improvement. Two potential options:

  1. Add 'opt' RUNs that include CGP to llvm/test/Transforms/PhaseOrdering (this is probably stretching the intent of "PhaseOrdering"; we'll need a new target-specific test directory because CGP will require that we pick a target).
  2. Add full end-to-end tests for C --> asm to test-suite.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance....but it's probably worth raising again on the list because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

For example:

$ llc -o - ov.ll -mtriple=sparcv9
uaddo_no_math_use:                      ! @uaddo_no_math_use
! %bb.0:
	add %o0, %o1, %o2
	mov	%g0, %o3
	cmp %o2, %o0
	movcs	%xcc, 1, %o3
	mov	42, %o0
	cmp %o3, 0
	retl
	movne	%icc, %o1, %o0
no_intrinsic:                           ! @no_intrinsic
! %bb.0:
	xor %o0, -1, %o2
	mov	42, %o0
	cmp %o2, %o1
	retl
	movcs	%xcc, %o1, %o0

Hopefully, I'm seeing the whole problem now. Let me know if I'm still missing it.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance....but it's probably worth raising again on the list because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check if there are actual users of the sum and default to OFF if there are actual users? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient then in SelDag and we would also need an appropriate target hook there I think. Unless we match that pattern in the target implementations.

What do you think?

For example:

$ llc -o - ov.ll -mtriple=sparcv9
uaddo_no_math_use:                      ! @uaddo_no_math_use
! %bb.0:
	add %o0, %o1, %o2
	mov	%g0, %o3
	cmp %o2, %o0
	movcs	%xcc, 1, %o3
	mov	42, %o0
	cmp %o3, 0
	retl
	movne	%icc, %o1, %o0
no_intrinsic:                           ! @no_intrinsic
! %bb.0:
	xor %o0, -1, %o2
	mov	42, %o0
	cmp %o2, %o1
	retl
	movcs	%xcc, %o1, %o0

Hopefully, I'm seeing the whole problem now. Let me know if I'm still missing it.

I think that should cover it, thanks for sharing the problematic case on SPARC.

I remember that there was a discussion on llvm-dev about this topic. Do you know if we already have similar tests in test-suite?

I don't follow test-suite closely, but I'm not seeing anything that checks asm output at first glance....but it's probably worth raising again on the list because that topic seems to come up every few months.

But back to this patch - I'm just now looking at the test case in this patch, and I think that I missed the point of the earlier comment by @nikic. Let's step back and answer if CGP is the right place to do this transform.

If we are comparing codegen for these 2 patterns:

define i64 @uaddo_no_math_use(i64 %a, i64 %b) {
  %t = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 %a, i64 %b)
  %ov = extractvalue { i64, i1 } %t, 1
  %Q = select i1 %ov, i64 %b, i64 42
  ret i64 %Q
}

define i64 @no_intrinsic(i64 %a, i64 %b) {
  %x = xor i64 %a, -1
  %cmp = icmp ult i64 %x, %b
  %Q = select i1 %cmp, i64 %b, i64 42
  ret i64 %Q
}

...then should we form an intrinsic in CGP or create ISD::UADDO later? Although this CGP transform uses a TLI hook (shouldFormOverflowOp), that hook is aggressive by default - it assumed we would generate this intrinsic only if we needed both results (similarly, we can argue that we're missing a canonicalization to replace the intrinsic call in the first example). If we want to create the intrinsic even if we don't need both results, then we may need a DAG reversal because the default expansion does not seem to give us optimal asm for all targets.

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check if there are actual users of the sum and default to OFF if there are actual users? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient then in SelDag and we would also need an appropriate target hook there I think. Unless we match that pattern in the target implementations.

What do you think?

I think it would be fine to tighten up the default hook and then allow targets to opt in with more awareness. OTOH in SDAG, we could just check that UADDO is legal/custom, so no custom hook needed?

I think that should cover it, thanks for sharing the problematic case on SPARC.

That took some hunting. :)
But I just wanted to show that we need to be careful with these transforms. We've generated a lot of back-and-forth regressions with the overflow intrinsics.

fhahn added a comment.Feb 11 2020, 1:26 PM

The example below is interesting! It seems like we have the same problem with the add version of UAddWithOverflow, as we also generate uadd calls if the sum is not used besides the compare (e.g. the @uaddo1 test case in llvm/test/Transforms/CodeGenPrepare/X86/overflow-intrinsics.ll)

Maybe we should adjust the shouldFormOverflowOp hook to check if there are actual users of the sum and default to OFF if there are actual users? And then opt in on X86 & AArch64, where I think it should be beneficial for overflow-only checks. I think doing it in CGP is more convenient then in SelDag and we would also need an appropriate target hook there I think. Unless we match that pattern in the target implementations.

What do you think?

I think it would be fine to tighten up the default hook and then allow targets to opt in with more awareness. OTOH in SDAG, we could just check that UADDO is legal/custom, so no custom hook needed?

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's were we already do the expansion for similar cases.

I think that should cover it, thanks for sharing the problematic case on SPARC.

That took some hunting. :)
But I just wanted to show that we need to be careful with these transforms. We've generated a lot of back-and-forth regressions with the overflow intrinsics.

+1, I've run into a few I am trying to address with this and other patches in the area :)

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's were we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent. So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?
@nikic or others - does that match your expectations?

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's were we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent.

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid match attention to what's happening here,
but i agree that this pattern is missed, and do think we should handle multi-BB case.

nikic added a comment.Feb 15 2020, 8:40 AM

Taking a step back here, I'd like to question why InstCombine does this fold in the first place. This was introduced in https://github.com/llvm/llvm-project/commit/1cf0734b2f6b346993c41c18f4d6a9f2d1e11189, but I'm not quite clear on the motivations (and if they still apply after we introduced saturation intrinsics?). As we consider a + b < a the canonical overflow pattern for the case where a + b is used, why wouldn't we use the same pattern for the case where it is not used as well?

Should we simply start forming @llvm.with.overflow intrinsics in middle-end more often?

Taking a step back here, I'd like to question why InstCombine does this fold in the first place. This was introduced in https://github.com/llvm/llvm-project/commit/1cf0734b2f6b346993c41c18f4d6a9f2d1e11189, but I'm not quite clear on the motivations (and if they still apply after we introduced saturation intrinsics?). As we consider a + b < a the canonical overflow pattern for the case where a + b is used, why wouldn't we use the same pattern for the case where it is not used as well?

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that. For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's were we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent.

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid match attention to what's happening here,
but i agree that this pattern is missed, and do think we should handle multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

So should I prepare a set of patches for the improved target hook? Or SelDag? I would slightly prefer CGP for that, also with GlobalISel in mind and that's were we already do the expansion for similar cases.

It's an odd argument to add to CGP if we are using GlobalISel for codegen. CGP is intended to be a SDAG-only hack because of the basic block limitation there. Without the block limitation, we should favor doing transforms in the "official" codegen form. So IIUC, GISel should never use CGP. Eventually, the GISel path must re-implement the combines in both CGP and SDAG to become perf-equivalent.

So I'd favor SDAG on this patch unless there's evidence that the 'not' and 'cmp' are not always in the same block?

It is quite trivial to come up with a contrived example that shows this can reasonably happen:
https://godbolt.org/z/Jj92M9

@nikic or others - does that match your expectations?

I admittedly haven't paid match attention to what's happening here,
but i agree that this pattern is missed, and do think we should handle multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGMT

nikic added a comment.Feb 16 2020, 6:43 AM

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that.

Okay, that makes sense. I'm not even sure we could do the reverse fold, thanks to the magic of undef. (A < ~B) => (A + B < B) for A = -1, B = undef, the former is guaranteed false, while the latter could be true or false as the two uses of B can be chosen independently.

For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

Yes, definitely. We can't even canonicalize to (unsigned) overflow intrinsics even if the result is used, because they are so badly optimized right now.

So overall, I agree as well that the approach in this patch is fine :)

The transform makes sense in IR because it eliminates a use of a variable and trades an 'add' for a 'not' - both of those are better for analysis and enabling subsequent folding. If we are questioning that transform, then we have to be ok with reversing that transform: turn 'not' into 'add' and create an extra use of a variable. To do that, we'd need to show that that is at least not harmful in codegen. Given the history of regressions using overflow intrinsics, I'm skeptical that we want to reverse that.

Okay, that makes sense. I'm not even sure we could do the reverse fold, thanks to the magic of undef. (A < ~B) => (A + B < B) for A = -1, B = undef, the former is guaranteed false, while the latter could be true or false as the two uses of B can be chosen independently.

We almost can now, it would be C = freeze(B); (A + C < C)
Almost, because i think codegen patch D29014 is stuck :/

For similar reasons, I don't think we want to canonicalize to overflow intrinsics if we're not using the math part of the op - that would be harmful to IR-level transforms and codegen unless we teach every pass to undo those?

Yes, definitely. We can't even canonicalize to (unsigned) overflow intrinsics even if the result is used, because they are so badly optimized right now.

So overall, I agree as well that the approach in this patch is fine :)

fhahn added a comment.Feb 16 2020, 9:30 AM

@nikic or others - does that match your expectations?

I admittedly haven't paid match attention to what's happening here,
but i agree that this pattern is missed, and do think we should handle multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGMT

Great, I'll add the hook

fhahn added a comment.Feb 17 2020, 8:27 AM

@nikic or others - does that match your expectations?

I admittedly haven't paid match attention to what's happening here,
but i agree that this pattern is missed, and do think we should handle multi-BB case.

Ok, then I think we can proceed with this patch in CGP, but as mentioned, I think we need to tighten the default TLI hook, so we don't create regressions for some targets.

SGMT

Great, I'll add the hook

I've put up a patch adjusting the shouldFormOverflowOp hook to consider whether the math result is used: D74722

I'd like to see (at least?) the commutative case handled.

llvm/include/llvm/IR/PatternMatch.h
1712

Hm, i don't really see one-use checks in similar neighboring patterns here.

fhahn updated this revision to Diff 245159.Feb 18 2020, 7:10 AM

use m_c_Xor to handle commutative case

fhahn marked an inline comment as done.Feb 18 2020, 7:13 AM
fhahn added inline comments.
llvm/include/llvm/IR/PatternMatch.h
1712

I can drop it here, but I think we should be more restrictive here, because we cannot use the math result for the xor. Alternatively we can have the use check in CGP.

lebedev.ri added inline comments.Feb 18 2020, 7:25 AM
llvm/include/llvm/IR/PatternMatch.h
2096

We know constant is always on rhs of binop.
I was talking about b u> (a ^ -1) case there.

Also, not strictly related to this patch, but does this simply ignore inverted cases, like (a ^ -1) u>= b?

fhahn updated this revision to Diff 245278.Feb 18 2020, 2:42 PM

Actually update to match b u> (a ^ -1). Remove m_c_Xor again. I'll make sure the code in PatternMatch.h stays at its current position.

fhahn marked an inline comment as done.Feb 18 2020, 2:45 PM
fhahn added inline comments.
llvm/include/llvm/IR/PatternMatch.h
2096

I was talking about b u> (a ^ -1) case there.

Right, that should be fixed in the latest version. I've updated replaceMathCmpWithIntrinsic to take the A and B args directly, rather than getting it from the BO, because with Xor that's not straight forward.

Also, not strictly related to this patch, but does this simply ignore inverted cases, like (a ^ -1) u>= b?

For now yes. Not sure how common that is, but it could be added as follow-up

fhahn updated this revision to Diff 245285.Feb 18 2020, 3:03 PM

Update test checks, assertion.

lebedev.ri accepted this revision.Feb 18 2020, 3:52 PM

This seems good to me but would be good for @nikic to comment.

This revision is now accepted and ready to land.Feb 18 2020, 3:52 PM
nikic accepted this revision.Feb 19 2020, 12:31 AM

LG to me as well.

llvm/lib/CodeGen/CodeGenPrepare.cpp
1230

nit: xor or XOR

This revision was automatically updated to reflect the committed changes.
vsk added a subscriber: vsk.Feb 19 2020, 10:26 AM

Not yet 100% sure, but I suspect this is breaking the clang self-host, see:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o 
/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/include -Itools/clang/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/libxml2 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -MF tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o.d -o tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic/SourceManager.cpp
Instruction does not dominate all uses!
  %1 = load i32, i32* %Size.i, align 8, !tbaa !66
  %0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %FID.coerce, i32 %1)
in function _ZNK5clang13SourceManager17getPreviousFileIDENS_6FileIDE
fatal error: error in backend: Broken function found, compilation aborted!

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597

In D74228#1883057, @vsk wrote:

Not yet 100% sure, but I suspect this is breaking the clang self-host, see:

FAILED: tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o 
/Users/buildslave/jenkins/workspace/lldb-cmake/host-compiler/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/include -Itools/clang/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/libxml2 -Iinclude -I/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include -Wdocumentation -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -fmodules -fmodules-cache-path=/Users/buildslave/jenkins/workspace/lldb-cmake/lldb-build/module.cache -fcxx-modules -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3  -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -MF tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o.d -o tools/clang/lib/Basic/CMakeFiles/obj.clangBasic.dir/SourceManager.cpp.o -c /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/clang/lib/Basic/SourceManager.cpp
Instruction does not dominate all uses!
  %1 = load i32, i32* %Size.i, align 8, !tbaa !66
  %0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %FID.coerce, i32 %1)
in function _ZNK5clang13SourceManager17getPreviousFileIDENS_6FileIDE
fatal error: error in backend: Broken function found, compilation aborted!

http://lab.llvm.org:8080/green/view/LLDB/job/lldb-cmake/9597

Probably. I've revert the change for now.