This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY
ClosedPublic

Authored by davezarzycki on Nov 11 2019, 6:28 AM.

Details

Summary

Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.

Work remaining: match ADD, etc.

Diff Detail

Event Timeline

davezarzycki created this revision.Nov 11 2019, 6:28 AM

Can somebody CC the right System Z experts? A couple of their tests fail after this change.

@davezarzycki Please can you give more detail on the SystemZ breakages?

With all the targets enabled, only System Z fails. It looks like somebody just needs to update the assembly.

FAIL: LLVM :: CodeGen/SystemZ/int-uadd-03.ll (15196 of 34316)
******************** TEST 'LLVM :: CodeGen/SystemZ/int-uadd-03.ll' FAILED ********************
Script:
--
: 'RUN: at line 3';   /home/dave/b/l/t/bin/llc < /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-uadd-03.ll -mtriple=s390x-linux-gnu | /home/dave/b/l/t/bin/FileCheck /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-uadd-03.ll
--
Exit Code: 1

Command Output (stderr):
--
/home/dave/s/l/llvm/test/CodeGen/SystemZ/int-uadd-03.ll:202:10: error: CHECK: expected string not found in input
; CHECK: algf %r2, 160(%r15)
         ^
<stdin>:207:2: note: scanning from here
 llgfr %r1, %r13
 ^
<stdin>:239:2: note: possible intended match here
 algf %r1, 160(%r15) # 4-byte Folded Reload
 ^

--

********************
FAIL: LLVM :: CodeGen/SystemZ/int-usub-03.ll (15237 of 34316)
******************** TEST 'LLVM :: CodeGen/SystemZ/int-usub-03.ll' FAILED ********************
Script:
--
: 'RUN: at line 3';   /home/dave/b/l/t/bin/llc < /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-usub-03.ll -mtriple=s390x-linux-gnu | /home/dave/b/l/t/bin/FileCheck /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-usub-03.ll
--
Exit Code: 1

Command Output (stderr):
--
/home/dave/s/l/llvm/test/CodeGen/SystemZ/int-usub-03.ll:210:10: error: CHECK: expected string not found in input
; CHECK: slgf %r2, 160(%r15)
         ^
<stdin>:215:2: note: scanning from here
 lgr %r3, %r2
 ^
<stdin>:251:2: note: possible intended match here
 slgf %r1, 160(%r15) # 4-byte Folded Reload
 ^

--

********************

Testing Time: 34.33s
********************
Failing Tests (2):
    LLVM :: CodeGen/SystemZ/int-uadd-03.ll
    LLVM :: CodeGen/SystemZ/int-usub-03.ll

  Expected Passes    : 33723
  Expected Failures  : 150
  Unsupported Tests  : 441
  Unexpected Failures: 2
RKSimon added a comment.EditedNov 11 2019, 7:29 AM

Why can't you include the updated systemz tests in this patch?

It looks like the check is a bit too strict, something like the following should fix the problem:

diff --git a/llvm/test/CodeGen/SystemZ/int-uadd-03.ll b/llvm/test/CodeGen/SystemZ/int-uadd-03.ll
index d57f8a84411..b7b9883ecc9 100644
--- a/llvm/test/CodeGen/SystemZ/int-uadd-03.ll
+++ b/llvm/test/CodeGen/SystemZ/int-uadd-03.ll
@@ -199,7 +199,7 @@ define zeroext i1 @f10(i64 %src, i64 %index, i64 %a, i64 *%res) {
 define zeroext i1 @f11(i32 *%ptr0) {
 ; CHECK-LABEL: f11:
 ; CHECK: brasl %r14, foo@PLT
-; CHECK: algf %r2, 160(%r15)
+; CHECK: algf {{%r[0-9]+}}, 160(%r15)
 ; CHECK: br %r14
   %ptr1 = getelementptr i32, i32 *%ptr0, i64 2
   %ptr2 = getelementptr i32, i32 *%ptr0, i64 4
diff --git a/llvm/test/CodeGen/SystemZ/int-usub-03.ll b/llvm/test/CodeGen/SystemZ/int-usub-03.ll
index 4e5f99fcee2..5e0a947772c 100644
--- a/llvm/test/CodeGen/SystemZ/int-usub-03.ll
+++ b/llvm/test/CodeGen/SystemZ/int-usub-03.ll
@@ -207,7 +207,7 @@ define zeroext i1 @f10(i64 %src, i64 %index, i64 %a, i64 *%res) {
 define zeroext i1 @f11(i32 *%ptr0) {
 ; CHECK-LABEL: f11:
 ; CHECK: brasl %r14, foo@PLT
-; CHECK: slgf %r2, 160(%r15)
+; CHECK: slgf {{%r[0-9]+}}, 160(%r15)
 ; CHECK: br %r14
   %ptr1 = getelementptr i32, i32 *%ptr0, i64 2
   %ptr2 = getelementptr i32, i32 *%ptr0, i64 4
}
  1. Merged the minor System Z fixes into the prerequisite patch: https://reviews.llvm.org/D70077
  2. Added a paranoia check to ensure that the carry in value is i1.

@davezarzycki Please can you rebase?

Rebased with regenerated tests.

RKSimon added inline comments.Nov 12 2019, 5:26 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2823

We handle OR and XOR here but I don't see anything that treats them differently.

davezarzycki marked an inline comment as done.Nov 12 2019, 7:02 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2823

That's because it doesn't matter in this scenario. Either (uaddo A, B) overflows, or (uaddo *, In) overflows, but they cannot both overflow. I can add a comment about this if you want.

RKSimon added inline comments.Nov 12 2019, 7:29 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2823

yes please - your diagram doesn't mention xor at all

Added commentary about how it's impossible for both carry bits in the diamond to be set at the same time.

(added potential reviewers based on carry/overflow experience)

Thank you @spatel for adding more reviewers. Questions I have as a non-expert of LLVM design:

  1. Why is UADDO/USUBO an intrinsic but ADDCARRY/SUBCARRY is not? This is unfortunate given that clang and perhaps other languages have addcarry/subcarry intrinsics.
  2. Is the diamond being updated correctly? Or does it work by accident?
  3. Is there a good example of how to break a diamond when updating the DAG?
chfast added a subscriber: chfast.Nov 14 2019, 5:15 AM

Oh sh*i, this is the same thing I'm doing in https://reviews.llvm.org/D70012.

At least I can add my tests to the pool :)

  1. Why is UADDO/USUBO an intrinsic but ADDCARRY/SUBCARRY is not? This is unfortunate given that clang and perhaps other languages have addcarry/subcarry intrinsics.

I don't know the full history, but my guess is that the *.with.overflow intrinsics were assumed to be good enough to hold the patterns together through IR optimization, and the backend could relatively easily map those to DAG nodes. If that's not correct, then we could make a case for adding to, extending, or replacing the current set of intrinsics.

That does lead to questions I was already wondering about:

  1. Can we do this transform in IR (instcombine)?
  2. There are tests with intrinsics and tests with "raw" IR - are we missing IR transforms that should canonicalize to 1 form or the other?
davezarzycki added a comment.EditedNov 14 2019, 7:16 AM
  1. Why is UADDO/USUBO an intrinsic but ADDCARRY/SUBCARRY is not? This is unfortunate given that clang and perhaps other languages have addcarry/subcarry intrinsics.

I don't know the full history, but my guess is that the *.with.overflow intrinsics were assumed to be good enough to hold the patterns together through IR optimization, and the backend could relatively easily map those to DAG nodes. If that's not correct, then we could make a case for adding to, extending, or replacing the current set of intrinsics.

That does lead to questions I was already wondering about:

  1. Can we do this transform in IR (instcombine)?
  2. There are tests with intrinsics and tests with "raw" IR - are we missing IR transforms that should canonicalize to 1 form or the other?

As the first question and according to the SCM history (r234638 / b6c5914308132acc9289335ed6a92b31f9484631):

This change moves creating calls to llvm.uadd.with.overflow from InstCombine to CodeGenPrep. Combining overflow check patterns into calls to the said intrinsic in InstCombine inhibits optimization because it introduces an intrinsic call that not all other transforms and analyses understand.

As to the second question, the tests with intrinsics was generated via clang's __builtin_addc/subc. The tests without intrinsics were generated via pure C and hence more potential solutions exist that can be matched.

[EDIT] – I should add that CodeGenPrep doesn't match all eligible cases where llvm.uadd.with.overflow is possible, which hurts this proposed optimization.

Hi everybody. Feedback is still appreciated. What's blocking this change from being committed?

lebedev.ri added inline comments.Nov 18 2019, 12:04 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2842–2847

I find this confusing.
Which one of Carry0 and Carry1 is the first one?
Can this be simplified to something like

// ???
if (Carry1.getOperand(0) != Carry0.getValue(0))
  std::swap(Carry0, Carry1);
if (Carry1.getOperand(0) != Carry0.getValue(0))
  return SDValue();

This would be more idiomatic..

2852–2854

Do we care what Carry0.getOperand(1) is, what produces it?

davezarzycki added inline comments.Nov 18 2019, 12:45 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2842–2847

Sure. I can refactor that. I guess the comment wasn't clear enough. The code wants the UADDO (or USUBO) that does the primary arithmetic to be Carry0. I.e. the node at the top of the ASCII art. The UADDO (or USUBO) that does the carry/borrow in is Carry1. I.e. the node in the middle of the ASCII art.

2852–2854

We want that operand to be a carry bit (or at least plausibly so), so that we don't create bogus addcarry/subcarry nodes.

davezarzycki edited the summary of this revision. (Show Details)
  1. Check TLI.isOperationLegalOrCustom().
  2. Handle AND being used to merge flags.
  3. More source comments.
lebedev.ri added inline comments.Nov 18 2019, 3:04 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2819–2820

Since you've drawn the original graph, also showing the result might be good?

2842–2847

Please mark done comments as done.

2846

Ah, so the numbering is from bottom to top, i expected the other way around.

2853–2855

s/ISDOp/NewOp/

2861–2865

There's getAsCarry() should it be used here, or are we okay with some other i1?
(can we have some other i1 by now?)

2871–2874

This could use a few comments

davezarzycki marked 13 inline comments as done.Nov 18 2019, 4:29 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2846

I've swapped the ordering to match your expectations.

2861–2865

That function is non-recursive, so it doesn't work. In other words, it doesn't know how to handle carry flags being merged. I can change that if people are okay with that.

lebedev.ri added inline comments.Nov 18 2019, 5:17 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2846

(It's just that i think this ordering is more prevalent, i've practically not seen the opposite ordering in the codebase)

2861–2865

No real opinion from me here.

2873

I'm not following this bit, why do we do this only for ISD::AND?

davezarzycki marked 4 inline comments as done and an inline comment as not done.Nov 18 2019, 5:33 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2873

As the comment above tries to explain, we can prove in this diamond UADDO/USUBO scenario that it is impossible for both UADDO/USUBO nodes to overflow at the same time. Either one does, or the other, therefore using AND to merge the carry flags will always return zero.

chfast added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2848

Have you seen the getAsCarry() helper? It might help simplifying matching here.

2854

The ADDCARRY rather replaces UADDO so you should use the debug location of the Carry1 node.

2873

I think you should use CombineTo() here to bump stats counters, produce some debug logs, updated worklist and delete replaced nodes.

It can be something like:

CombineTo(Carry1.getNode(), Merged.getValue(0), undef);
return CombineTo(N, Merged.getValue(1));

Both Carry1 and N should be deleted by CombineTo().

davezarzycki marked 4 inline comments as done.Nov 18 2019, 6:15 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2848

As mentioned elsewhere in this thread, getAsCarry() would need to be updated in order to be used here. In particular, it would need become aware of this proposed optimization because the carry flag might not have been merged yet.

2854

The ADDCARRY replaces three operations. Why is one node better than another for the debug location?

2873

Okay. I'll look into that.

lebedev.ri added inline comments.Nov 18 2019, 6:37 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2854

I'd say the current debuginfo is correct.

2873

As the comment above tries to explain, we can prove in this diamond UADDO/USUBO scenario that it is impossible for both UADDO/USUBO nodes to overflow at the same time. Either one does, or the other, therefore using AND to merge the carry flags will always return zero.

Yes, but that wasn't the question i was asking.
My question was - what happens with uses of Carry1.getValue(0) if we have non-AND?
Since you don't replace, they will still use old nodes,
which means the old instruction chain will stay?
I think this may missing more tests.

davezarzycki marked an inline comment as done.Nov 18 2019, 7:55 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2873

There is a test for the zero case. See bogus_add_U320_without_i128_and in addcarry.ll.

In any case, I'll switch to CombineTo

Switch to DAGCombiner::CombineTo

davezarzycki marked 4 inline comments as done.Nov 18 2019, 9:14 AM

With the exception of some debate among the reviewers about what the right debug location should be, I think I've addressed all actionable comments so far.

lebedev.ri added inline comments.Nov 18 2019, 9:52 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2873

FWIW the previous version (no CombineTo(), only a singled RAUW call) looked more better to me,
but maybe i'm missing the point as to why that was wrong.

2874–2875

I'm sorry, i still don't understand why this is being done only in ISD::AND case.
Please explain that in a new comment in the code.

lebedev.ri added inline comments.Nov 18 2019, 10:02 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2874–2875

To be specific, i don't understand why this isn't:

DAG.ReplaceAllUsesOfValueWith(Carry1.getValue(0), Merged.getValue(0)); // debatable, but done always
if (N->getOpcode() == ISD::AND) // Both carrys can't overflow at the same time
  return DAG.getConstant(0, DL, MVT::i1);
return Merged.getValue(1);
davezarzycki marked 2 inline comments as done.Nov 19 2019, 12:29 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2874–2875

That also works. I'll switch to that.

Why in your mind is the first line of your suggestion "debatable"?

lebedev.ri marked an inline comment as done.Nov 19 2019, 12:40 AM
lebedev.ri added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2874–2875

Because i do not understand why currently Combiner.CombineTo() is used,
maybe using DAG.ReplaceAllUsesOfValueWith() would be incorrect there.

chfast added inline comments.Nov 19 2019, 12:40 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2873

To my understanding CombineTo() is a rich version of RAUW. It does RAUW, but also can add new nodes to the DAG Combiner worklist, updates DAG combiner stats, prints debug logs about which nodes has been combined into which, handles removed nodes.

lebedev.ri added inline comments.Nov 19 2019, 12:58 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2873

I would think simply returning the new value would be enough,
and whatever code called us will do all the same anyway?

davezarzycki marked 2 inline comments as done.Nov 19 2019, 1:14 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2874–2875

I tried this:

-  Combiner.CombineTo(Carry1.getNode(), Merged.getValue(0),
-                     DAG.getUNDEF(Merged.getValue(1).getValueType()));
+  DAG.ReplaceAllUsesOfValueWith(Carry1.getValue(0), Merged.getValue(0));
   if (N->getOpcode() == ISD::AND)
-    return Combiner.CombineTo(N, DAG.getConstant(0, DL, MVT::i1));
-  return Combiner.CombineTo(N, Merged.getValue(1));
+    return DAG.getConstant(0, DL, MVT::i1);
+  return Merged.getValue(1);

And it also worked. Who is responsible for this code and decide which approach is preferable?

davezarzycki marked an inline comment as done.Nov 19 2019, 10:26 PM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2874–2875

Ping. @RKSimon @spatel @craig.topper – As the top three committers to DAGCombiner.cpp in the last two years, do you have any preference about DAG.ReplaceAllUsesOfValueWith(…) versus Combiner.CombineTo(…)? Otherwise I believe that all feedback has been addressed to date. What's blocking approval? Thanks! :-)

Why does Carry1's result 0 not get replaced for the xor/or case?

Unless I'm misunderstanding you, I think the many rounds of review feedback are clouding the code review. Lines 2874 and 2875 replace Carry1's result 0 in all cases (OR, XOR, and AND).

Unless I'm misunderstanding you, I think the many rounds of review feedback are clouding the code review. Lines 2874 and 2875 replace Carry1's result 0 in all cases (OR, XOR, and AND).

Doesn't line 2873 return if the opcode is not ISD::AND?

Unless I'm misunderstanding you, I think the many rounds of review feedback are clouding the code review. Lines 2874 and 2875 replace Carry1's result 0 in all cases (OR, XOR, and AND).

Doesn't line 2873 return if the opcode is not ISD::AND?

Ah. Oops. I paused on uploading new patches while @chfast and @lebedev.ri were debating whether DAG.ReplaceAllUsesOfValueWith(…) or Combiner.CombineTo(…) was better. (Do you have a preference?) I'll update the patch now with the former.

Update the patch now that the DAG RAUW versus Combiner discussion has settled down. I personally like the former because it's simpler, but I do like the arguments in favor of the DAGCombiner. Either way, I'll defer to a code owner.

Almost LG to me, but apparently i have more comments.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2810

Wait, In is some third carry-in bit, right? This diagram doesn't really make that obvious.

2818

Out = (addcarry A, B, In).carry

2835–2838

Why can Carry0.getValue(0) only be in Carry1.getOperand(0), but not in Carry1.getOperand(1)?
I think there is no such restriction at least for add.

2844–2846

We are replacing and/or/xor node, and we only create a single replacement node,
so i don't think we need any one-use checks here?

2847

I think it will be helpful to

SDValue CarryBit = Carry1.getOperand(1);
davezarzycki marked 10 inline comments as done.Nov 20 2019, 5:01 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2810

No. There is one carry in bit in this graph: In. There are two carry out bits being merged in a way that implies an addcarry or subcarry is possible. I can try to make this more obvious.

2818

Sure. In the next patch.

2835–2838

Sure. I'll make it more robust in the next patch. For whatever it may be worth, this is the order output by CodeGenPrepare when creates implicit UADDO/USUBO nodes. It's also happens to be the order that clang generates via its addc/subc builtins.

2844–2846

I was trying to be conservative. After some thought, I think dropping the one-use tests is harmless. I'll remove them from the next patch.

2847

Sure. In the next patch.

davezarzycki marked 5 inline comments as done.
  1. Prettier ASCII art.
  2. Add robustness against the carry in bit swapping positions during optimization passes.
  3. Dropped use-once checks on the first uaddo/usubo node.

This patch fixes a bug introduced during the last round of feedback. By making the patch robust against the optimizer swapping operands to UADDO, the patch started tolerating the carry in bit being on the lefthand side of USUBO. That is wrong, and it's now fixed.

lebedev.ri accepted this revision.Nov 20 2019, 6:05 AM

With latest update i think this is basically LG, last few non-functional comments.
But please wait a bit for @deadalnix / @craig.topper / @RKSimon.

This patch fixes a bug introduced during the last round of feedback. By making the patch robust against the optimizer swapping operands to UADDO, the patch started tolerating the carry in bit being on the lefthand side of USUBO. That is wrong, and it's now fixed.

Thank you! I was just figuring out if that bug *wasn't* there.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2813

Since you are drawing add, you could swap it's operands, and thus CarryIn wouldn't have to intercept PartialCarryOutX. Just a thought, not sure it matters.

2859–2862
if (CarryIn.getOpcode() != ISD::ZERO_EXTEND)
  return SDValue();
CarryIn = CarryIn.getOperand(0);
if (CarryIn.getValueType() != MVT::i1)
  return SDValue();

It's better than seeing CarryIn.getOperand(0) later and going "wait what why we use first operand from CarryIn instruction?"

2867

CarryIn);

This revision is now accepted and ready to land.Nov 20 2019, 6:05 AM
lebedev.ri added inline comments.Nov 20 2019, 6:18 AM
test/CodeGen/X86/addcarry.ll
623

Please precommit all tests first.

davezarzycki marked 5 inline comments as done.Nov 20 2019, 6:22 AM
davezarzycki added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2813

I'd really rather not. The current diagram matches the UADDO/USUBO generated by CodeGenPrepare from plain IR. It also matches the UADDO/USUBO that clang emits as a part of its builtin addc/subc intrinsics. Also, if one substitutes usubo for uaddo, the diagram still holds true, whereas if the add operands were swapped, the diagram would be wrong for usubo.

2859–2862

Sure. I'll change that before committing.

lebedev.ri marked an inline comment as done.Nov 20 2019, 6:28 AM
lebedev.ri added inline comments.
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2813

Sure, that was only an observation.

davezarzycki marked 3 inline comments as done.Nov 20 2019, 6:30 AM
davezarzycki added inline comments.
test/CodeGen/X86/addcarry.ll
623

Oh, whoops. I missed this feedback once I saw the "ready to land" status change. Sorry. At least all of the other tests were pre-committed.

lebedev.ri added inline comments.Nov 20 2019, 6:35 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2829

Just make it a member function?

2837–2838

Very last note, i think it would be best to also

if (N->getOpcode() != ISD::AND && N->getOpcode() != ISD::OR && N->getOpcode() != ISD::XOR)
  return SDValue();

(or maybe that should be an assert)

davezarzycki added a comment.EditedNov 20 2019, 6:54 AM

There was this line:

But please wait a bit for @deadalnix / @craig.topper / @RKSimon.

Oh, sorry, I missed that there was a comment above the quoted text in addition to below it. I'm happy to revert this if somebody wants, or fix things in a followup patch.

I'm a bit late to the party, but this is nothing short of amazing. Congratulations!

I have found one case not covered by this case: in case of subtraction when we only care about the final carry bit (the difference is discarded) the first subtraction will be to compare. So here we may want to also search for cmp. I did a prototype by duplicating this code and modifying in for the cmp case only it this works. But I'm not sure if mixing it all here will be good idea. Anyway, I wanted to reach out about it and maybe fix the TODO from this patch in the mean time.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
2862

Do you remember this TODO here? I'd like to fix it if you can provide more details about it.

Via Web