[AArch64] Fix sub with carry 13403a70e45b2d22878ba59fc211f8dba3a8deba introduced a bug where we generate the outgoing carry inverted, which in turn breaks the lowering of @llvm.usub.sat.i128, returning the normal difference on saturation and zero otherwise. Note that AArch64 has peculiar semantics where the subtraction instructions generate borrow inverted. The problem is that we mix the two forms of semantics -- the normal carry and inverted carry -- in the area of extended precision subtractions. Specifically, we have three problems: - lowerADDSUBCARRY takes the non-inverted incoming carry from a subtraction and feeds it to SBCS without inverting it first. - lowerADDSUBCARRY makes available the outgoing carry from SBCS without inverting it. - foldOverflowCheck folds: (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry) When the incoming carry flag is set, CSET LO results in zero. CMP in turn generates a borrow, *clearing* the carry flag. Instead, we should fold: (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry) When the incoming carry flag is set, CSET LO results in zero. CMP does not generate a borrow, *setting* the carry flag. IIUC, we should use the normal (that is, non-inverted) semantics for carry everywhere. This patch fixes the three problems above. This patch does not add any new testcases because we have a plenty of them covering the instruction in question. In particular, @u128_saturating_sub is identical to the testcase in the motivating issue. Fixes: #55253
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
3324–3327 | The comment here could be improved. Maybe spell out "if Invert is true, value is 0 if 'C' bit is set, and 1 if it is not set". | |
3371 | It's a bit concerning to me that you're changing OutFlag, but not OpCarryIn. If we're inverting the output flag, don't we need to invert the input flag too? Or is it already getting inverted correctly? It might make sense to also add tests for i256 addition and subtraction. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
3371 | Just checked; this is what we currently generate for i256 subtraction: subs x0, x0, x4 sbcs x1, x1, x5 cset w8, hs cmp w8, #1 sbcs x2, x2, x6 cset w8, hs cmp w8, #1 sbcs x3, x3, x7 ret I guess maybe the input is already handled correctly. |
I've fixed the treatment of the incoming carry as well as foldOverflowCheck. Please take a look. Thanks!
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
3371 | It turns out that we have to invert the incoming carry, too. We appear to generate correct code -- subs immediately followed by sbcs, but that's because wrong semantics is canceled by another piece of wrong semantics. lowerADDSUBCARRY generates wrong code, taking in the incoming carry and feeding it to take in the incoming carry without inverting it. Meanwhile, foldOverflowCheck expects the wrong code from lowerADDSUBCARRY and folds it away. That is, if you disable foldOverflowCheck by teaching it to always return SDValue(), then you would get wrong code between subs and sbcs`. In the latest revision, of the patch I've corrected both -- the treatment of the incoming carry and the folding done in foldOverflowCheck. |
Is it worth updating the ISD node description in ISDOpcodes.h? because this is how the bug happened in the first place, with a literal interpretation of "incoming carry" and "and the output carry".
@efriedma, I understand that you've LGTMed an earlier revision, but I'd appreciate a review on the latest one. As you suspected, the incoming carry also needs fixing. Thanks in advance!
LGTM
I agree it makes sense to update the comment in ISDOpcodes.h; might as well do it here. And I still want to see an i256 sub testcase. But you can do that in a followup.
The comment here could be improved. Maybe spell out "if Invert is true, value is 0 if 'C' bit is set, and 1 if it is not set".