The gain is usually suffiscient to go the extra mile and reconstruct a carry in some cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Hi @deadalnix , it looks to me the test case exposes an actual bug. A simplified test would be this:
define i1 @f14(i32 %val0, i32 %val1, i32 %val2) { %t0 = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %val0, i32 %val1) %add0 = extractvalue {i32, i1} %t0, 0 %obit0 = extractvalue {i32, i1} %t0, 1 %t1 = call {i32, i1} @llvm.uadd.with.overflow.i32(i32 %add0, i32 %val2) %add1 = extractvalue {i32, i1} %t1, 0 %obit1 = extractvalue {i32, i1} %t1, 1 %res = or i1 %obit0, %obit1 ret i1 %res } declare {i32, i1} @llvm.uadd.with.overflow.i32(i32, i32) nounwind readnone
This gets initially expanded into:
t0: ch,glue = EntryToken t2: i32,ch = CopyFromReg t0, Register:i32 %0 t4: i32,ch = CopyFromReg t0, Register:i32 %1 t7: i32,i1 = uaddo t2, t4 t6: i32,ch = CopyFromReg t0, Register:i32 %2 t8: i32,i1 = uaddo t7, t6 t9: i1 = or t7:1, t8:1 t10: i32 = any_extend t9
but after DAGCombine with your change this ends up as:
t2: i32,ch = CopyFromReg t0, Register:i32 %0 t4: i32,ch = CopyFromReg t0, Register:i32 %1 t6: i32,ch = CopyFromReg t0, Register:i32 %2 t14: i32,i1 = uaddo_carry t2, t4, t6 t10: i32 = any_extend t14:1
This looks wrong to me - the last operand of uaddo_carry is supposed to be a boolean flag indicating incoming carry, but here we're passing in one of the i32 inputs in full. Am I missing something here?
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
3542 | It does not look like this TODO was addressed, unless I'm misunderstanding it. | |
3548 | It is now an arbitrary boolean value, not necessarily a carry/borrow. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
3542 | Indeed. I did not think that really needed after this logic change so I removed the TODO. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
3548 | In this case, we replace a complex web of instructions by just one add. Reconstructing a carry is worth it. The test cases that were changed bellow are good exemple of that, the carry is passed as an argument to the functions, so it is not possible to know if these are "real" carries or not, and it is nevertheless worth doing the transform. |
In this case, we replace a complex web of instructions by just one add. Reconstructing a carry is worth it. The test cases that were changed bellow are good exemple of that, the carry is passed as an argument to the functions, so it is not possible to know if these are "real" carries or not, and it is nevertheless worth doing the transform.
I was thinking about cases like feeding uaddo by a setcc, but that does not seem to pay off. E.g.:
declare { i64, i1 } @llvm.usub.with.overflow.i64(i64, i64) define { i64, i1 } @subcarry_fake_carry(i64 %a, i64 %b, i32 %x, i32 %y) { %t1 = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %a, i64 %b) %partial = extractvalue { i64, i1 } %t1, 0 %k1 = extractvalue { i64, i1 } %t1, 1 %carryin = icmp eq i32 %x, %y %zcarryin = zext i1 %carryin to i64 %s = call { i64, i1 } @llvm.usub.with.overflow.i64(i64 %partial, i64 %zcarryin) %k2 = extractvalue { i64, i1 } %s, 1 %carryout = or i1 %k1, %k2 %ret = insertvalue { i64, i1 } %s, i1 %carryout, 1 ret { i64, i1 } %ret }
$ llc test.ll -o - -x86-asm-syntax=intel
Before this patch:
mov rax, rdi sub rax, rsi setb sil xor edi, edi cmp edx, ecx sete dil sub rax, rdi setb dl or dl, sil
with this patch:
mov rax, rdi cmp edx, ecx sete cl add cl, -1 sbb rax, rsi setb dl
With my suggestion:
mov rax, rdi xor edi, edi cmp edx, ecx sete dil add rdi, -1 sbb rax, rsi setb dl
Ok, this looks good to me as-is. There are things that can be improved e.g. taking into accout "boolean contents flavor" and the type of the boolean value (which appears to be i8 on x86), but they seem to be more or less independent of this change.
It's not exactly my area, so I'll let someone else take a look.
It already is in subcarry.ll . I'll rebase and check everything's good with this, and if so land.
It is now an arbitrary boolean value, not necessarily a carry/borrow.
I don't know whether this is good or bad, although it looks like it was indended to match only carry/borrow-producing nodes.
If go that road I'd also consider replacing this call with more general bit tracking (ComputeNumSignBits et al.).