Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR.
Work remaining: match ADD, etc.
Differential D70079
[SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRY davezarzycki on Nov 11 2019, 6:28 AM. Authored by
Details Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR. Work remaining: match ADD, etc.
Diff Detail
Event TimelineComment Actions Can somebody CC the right System Z experts? A couple of their tests fail after this change. Comment Actions With all the targets enabled, only System Z fails. It looks like somebody just needs to update the assembly. FAIL: LLVM :: CodeGen/SystemZ/int-uadd-03.ll (15196 of 34316) ******************** TEST 'LLVM :: CodeGen/SystemZ/int-uadd-03.ll' FAILED ******************** Script: -- : 'RUN: at line 3'; /home/dave/b/l/t/bin/llc < /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-uadd-03.ll -mtriple=s390x-linux-gnu | /home/dave/b/l/t/bin/FileCheck /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-uadd-03.ll -- Exit Code: 1 Command Output (stderr): -- /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-uadd-03.ll:202:10: error: CHECK: expected string not found in input ; CHECK: algf %r2, 160(%r15) ^ <stdin>:207:2: note: scanning from here llgfr %r1, %r13 ^ <stdin>:239:2: note: possible intended match here algf %r1, 160(%r15) # 4-byte Folded Reload ^ -- ******************** FAIL: LLVM :: CodeGen/SystemZ/int-usub-03.ll (15237 of 34316) ******************** TEST 'LLVM :: CodeGen/SystemZ/int-usub-03.ll' FAILED ******************** Script: -- : 'RUN: at line 3'; /home/dave/b/l/t/bin/llc < /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-usub-03.ll -mtriple=s390x-linux-gnu | /home/dave/b/l/t/bin/FileCheck /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-usub-03.ll -- Exit Code: 1 Command Output (stderr): -- /home/dave/s/l/llvm/test/CodeGen/SystemZ/int-usub-03.ll:210:10: error: CHECK: expected string not found in input ; CHECK: slgf %r2, 160(%r15) ^ <stdin>:215:2: note: scanning from here lgr %r3, %r2 ^ <stdin>:251:2: note: possible intended match here slgf %r1, 160(%r15) # 4-byte Folded Reload ^ -- ******************** Testing Time: 34.33s ******************** Failing Tests (2): LLVM :: CodeGen/SystemZ/int-uadd-03.ll LLVM :: CodeGen/SystemZ/int-usub-03.ll Expected Passes : 33723 Expected Failures : 150 Unsupported Tests : 441 Unexpected Failures: 2 Comment Actions It looks like the check is a bit too strict, something like the following should fix the problem: diff --git a/llvm/test/CodeGen/SystemZ/int-uadd-03.ll b/llvm/test/CodeGen/SystemZ/int-uadd-03.ll index d57f8a84411..b7b9883ecc9 100644 --- a/llvm/test/CodeGen/SystemZ/int-uadd-03.ll +++ b/llvm/test/CodeGen/SystemZ/int-uadd-03.ll @@ -199,7 +199,7 @@ define zeroext i1 @f10(i64 %src, i64 %index, i64 %a, i64 *%res) { define zeroext i1 @f11(i32 *%ptr0) { ; CHECK-LABEL: f11: ; CHECK: brasl %r14, foo@PLT -; CHECK: algf %r2, 160(%r15) +; CHECK: algf {{%r[0-9]+}}, 160(%r15) ; CHECK: br %r14 %ptr1 = getelementptr i32, i32 *%ptr0, i64 2 %ptr2 = getelementptr i32, i32 *%ptr0, i64 4 diff --git a/llvm/test/CodeGen/SystemZ/int-usub-03.ll b/llvm/test/CodeGen/SystemZ/int-usub-03.ll index 4e5f99fcee2..5e0a947772c 100644 --- a/llvm/test/CodeGen/SystemZ/int-usub-03.ll +++ b/llvm/test/CodeGen/SystemZ/int-usub-03.ll @@ -207,7 +207,7 @@ define zeroext i1 @f10(i64 %src, i64 %index, i64 %a, i64 *%res) { define zeroext i1 @f11(i32 *%ptr0) { ; CHECK-LABEL: f11: ; CHECK: brasl %r14, foo@PLT -; CHECK: slgf %r2, 160(%r15) +; CHECK: slgf {{%r[0-9]+}}, 160(%r15) ; CHECK: br %r14 %ptr1 = getelementptr i32, i32 *%ptr0, i64 2 %ptr2 = getelementptr i32, i32 *%ptr0, i64 4 } Comment Actions
Comment Actions Added commentary about how it's impossible for both carry bits in the diamond to be set at the same time. Comment Actions Thank you @spatel for adding more reviewers. Questions I have as a non-expert of LLVM design:
Comment Actions Oh sh*i, this is the same thing I'm doing in https://reviews.llvm.org/D70012. At least I can add my tests to the pool :) Comment Actions I don't know the full history, but my guess is that the *.with.overflow intrinsics were assumed to be good enough to hold the patterns together through IR optimization, and the backend could relatively easily map those to DAG nodes. If that's not correct, then we could make a case for adding to, extending, or replacing the current set of intrinsics. That does lead to questions I was already wondering about:
Comment Actions As the first question and according to the SCM history (r234638 / b6c5914308132acc9289335ed6a92b31f9484631):
As to the second question, the tests with intrinsics was generated via clang's __builtin_addc/subc. The tests without intrinsics were generated via pure C and hence more potential solutions exist that can be matched. [EDIT] – I should add that CodeGenPrep doesn't match all eligible cases where llvm.uadd.with.overflow is possible, which hurts this proposed optimization. Comment Actions Hi everybody. Feedback is still appreciated. What's blocking this change from being committed?
Comment Actions
Comment Actions With the exception of some debate among the reviewers about what the right debug location should be, I think I've addressed all actionable comments so far.
Comment Actions Unless I'm misunderstanding you, I think the many rounds of review feedback are clouding the code review. Lines 2874 and 2875 replace Carry1's result 0 in all cases (OR, XOR, and AND). Comment Actions Ah. Oops. I paused on uploading new patches while @chfast and @lebedev.ri were debating whether DAG.ReplaceAllUsesOfValueWith(…) or Combiner.CombineTo(…) was better. (Do you have a preference?) I'll update the patch now with the former. Comment Actions Update the patch now that the DAG RAUW versus Combiner discussion has settled down. I personally like the former because it's simpler, but I do like the arguments in favor of the DAGCombiner. Either way, I'll defer to a code owner. Comment Actions Almost LG to me, but apparently i have more comments.
Comment Actions
Comment Actions This patch fixes a bug introduced during the last round of feedback. By making the patch robust against the optimizer swapping operands to UADDO, the patch started tolerating the carry in bit being on the lefthand side of USUBO. That is wrong, and it's now fixed. Comment Actions With latest update i think this is basically LG, last few non-functional comments. Thank you! I was just figuring out if that bug *wasn't* there.
Comment Actions Oh, sorry, I missed that there was a comment above the quoted text in addition to below it. I'm happy to revert this if somebody wants, or fix things in a followup patch. Comment Actions I have found one case not covered by this case: in case of subtraction when we only care about the final carry bit (the difference is discarded) the first subtraction will be to compare. So here we may want to also search for cmp. I did a prototype by duplicating this code and modifying in for the cmp case only it this works. But I'm not sure if mixing it all here will be good idea. Anyway, I wanted to reach out about it and maybe fix the TODO from this patch in the mean time.
|
Wait, In is some third carry-in bit, right? This diagram doesn't really make that obvious.