This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Improve carry reconstruction in combineCarryDiamond.
ClosedPublic

Authored by deadalnix on Jul 5 2023, 10:53 AM.

Details

Summary

The gain is usually suffiscient to go the extra mile and reconstruct a carry in some cases.

Diff Detail

Event Timeline

deadalnix created this revision.Jul 5 2023, 10:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2023, 10:53 AM
deadalnix requested review of this revision.Jul 5 2023, 10:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 5 2023, 10:53 AM
deadalnix updated this revision to Diff 537474.Jul 5 2023, 1:17 PM

Fix SystemZ tests

deadalnix added inline comments.
llvm/test/CodeGen/SystemZ/int-uadd-01.ll
313–318

@uweigand Maybe you could advise on what's the right way forward here?

Hi @deadalnix , it looks to me the test case exposes an actual bug. A simplified test would be this:

define i1 @f14(i32 %val0, i32 %val1, i32 %val2) {
  %t0 = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %val0, i32 %val1)
  %add0 = extractvalue {i32, i1} %t0, 0
  %obit0 = extractvalue {i32, i1} %t0, 1
  %t1 = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %add0, i32 %val2)
  %add1 = extractvalue {i32, i1} %t1, 0
  %obit1 = extractvalue {i32, i1} %t1, 1

  %res = or i1 %obit0, %obit1
  ret i1 %res
}

declare {i32, i1} @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone

This gets initially expanded into:

t0: ch,glue = EntryToken
  t2: i32,ch = CopyFromReg t0, Register:i32 %0
  t4: i32,ch = CopyFromReg t0, Register:i32 %1
t7: i32,i1 = uaddo t2, t4
        t6: i32,ch = CopyFromReg t0, Register:i32 %2
      t8: i32,i1 = uaddo t7, t6
    t9: i1 = or t7:1, t8:1
  t10: i32 = any_extend t9

but after DAGCombine with your change this ends up as:

    t2: i32,ch = CopyFromReg t0, Register:i32 %0
    t4: i32,ch = CopyFromReg t0, Register:i32 %1
    t6: i32,ch = CopyFromReg t0, Register:i32 %2
  t14: i32,i1 = uaddo_carry t2, t4, t6
t10: i32 = any_extend t14:1

This looks wrong to me - the last operand of uaddo_carry is supposed to be a boolean flag indicating incoming carry, but here we're passing in one of the i32 inputs in full. Am I missing something here?

deadalnix planned changes to this revision.Jul 6 2023, 5:36 AM

Hi @deadalnix , it looks to me the test case exposes an actual bug. A simplified test would be this:
...

Good catch. The carry reconstruction is a bit too optimistic. I'll rework this.

deadalnix updated this revision to Diff 537934.Jul 6 2023, 5:07 PM

Fix overly enthusiastic carry promotion.

barannikov88 added inline comments.Jul 7 2023, 3:24 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3542

It does not look like this TODO was addressed, unless I'm misunderstanding it.

3547–3548

It is now an arbitrary boolean value, not necessarily a carry/borrow.
I don't know whether this is good or bad, although it looks like it was indended to match only carry/borrow-producing nodes.
If go that road I'd also consider replacing this call with more general bit tracking (ComputeNumSignBits et al.).

deadalnix added inline comments.Jul 7 2023, 4:02 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3542

Indeed. I did not think that really needed after this logic change so I removed the TODO.

deadalnix added inline comments.Jul 7 2023, 4:05 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
3547–3548

In this case, we replace a complex web of instructions by just one add. Reconstructing a carry is worth it. The test cases that were changed bellow are good exemple of that, the carry is passed as an argument to the functions, so it is not possible to know if these are "real" carries or not, and it is nevertheless worth doing the transform.

In this case, we replace a complex web of instructions by just one add. Reconstructing a carry is worth it. The test cases that were changed bellow are good exemple of that, the carry is passed as an argument to the functions, so it is not possible to know if these are "real" carries or not, and it is nevertheless worth doing the transform.

I was thinking about cases like feeding uaddo by a setcc, but that does not seem to pay off. E.g.:

declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64)

define { i64, i1 } @subcarry_fake_carry(i64 %a, i64 %b, i32 %x, i32 %y) {
    %t1 = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
    %partial = extractvalue { i64, i1 } %t1, 0
    %k1 = extractvalue { i64, i1 } %t1, 1

    %carryin = icmp eq i32 %x, %y
    %zcarryin = zext i1 %carryin to i64
    %s = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %partial, i64 %zcarryin)
    %k2 = extractvalue { i64, i1 } %s, 1

    %carryout = or i1 %k1, %k2

    %ret = insertvalue { i64, i1 } %s, i1 %carryout, 1
    ret { i64, i1 } %ret
}

$ llc test.ll -o - -x86-asm-syntax=intel
Before this patch:

mov     rax, rdi
sub     rax, rsi
setb    sil
xor     edi, edi
cmp     edx, ecx
sete    dil
sub     rax, rdi
setb    dl
or      dl, sil

with this patch:

mov     rax, rdi
cmp     edx, ecx
sete    cl
add     cl, -1
sbb     rax, rsi
setb    dl

With my suggestion:

mov     rax, rdi
xor     edi, edi
cmp     edx, ecx
sete    dil
add     rdi, -1
sbb     rax, rsi
setb    dl

Ok, this looks good to me as-is. There are things that can be improved e.g. taking into accout "boolean contents flavor" and the type of the boolean value (which appears to be i8 on x86), but they seem to be more or less independent of this change.
It's not exactly my area, so I'll let someone else take a look.

deadalnix updated this revision to Diff 541016.Jul 17 2023, 7:20 AM

Tighten conditions and add a test case.

deadalnix updated this revision to Diff 541505.Jul 18 2023, 6:38 AM

Rebase, ping?

RKSimon accepted this revision.Jul 22 2023, 9:40 AM

LGTM

This revision is now accepted and ready to land.Jul 22 2023, 9:40 AM

In this case, we replace a complex web of instructions by just one add. Reconstructing a carry is worth it. The test cases that were changed bellow are good exemple of that, the carry is passed as an argument to the functions, so it is not possible to know if these are "real" carries or not, and it is nevertheless worth doing the transform.

I was thinking about cases like feeding uaddo by a setcc, but that does not seem to pay off. E.g.:

declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64)

define { i64, i1 } @subcarry_fake_carry(i64 %a, i64 %b, i32 %x, i32 %y) {
    %t1 = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b)
    %partial = extractvalue { i64, i1 } %t1, 0
    %k1 = extractvalue { i64, i1 } %t1, 1

    %carryin = icmp eq i32 %x, %y
    %zcarryin = zext i1 %carryin to i64
    %s = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %partial, i64 %zcarryin)
    %k2 = extractvalue { i64, i1 } %s, 1

    %carryout = or i1 %k1, %k2

    %ret = insertvalue { i64, i1 } %s, i1 %carryout, 1
    ret { i64, i1 } %ret
}

It looks this test cases could be added to the repo.

It looks this test cases could be added to the repo.

It already is in subcarry.ll . I'll rebase and check everything's good with this, and if so land.