[AArch64] Fix sub with carry
13403a70e45b2d22878ba59fc211f8dba3a8deba introduced a bug where we
generate the outgoing carry inverted, which in turn breaks the
lowering of @llvm.usub.sat.i128, returning the normal difference on
saturation and zero otherwise.
Note that AArch64 has peculiar semantics where the subtraction
instructions generate borrow inverted. The problem is that we mix the
two forms of semantics -- the normal carry and inverted carry -- in
the area of extended precision subtractions. Specifically, we have
three problems:
- lowerADDSUBCARRY takes the non-inverted incoming carry from a
subtraction and feeds it to SBCS without inverting it first.
- lowerADDSUBCARRY makes available the outgoing carry from SBCS
without inverting it.
- foldOverflowCheck folds:
(SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)
When the incoming carry flag is set, CSET LO results in zero. CMP
in turn generates a borrow, *clearing* the carry flag. Instead, we
should fold:
(SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)
When the incoming carry flag is set, CSET LO results in zero. CMP
does not generate a borrow, *setting* the carry flag.
IIUC, we should use the normal (that is, non-inverted) semantics for
carry everywhere.
This patch fixes the three problems above.
This patch does not add any new testcases because we have a plenty of
them covering the instruction in question. In particular,
@u128_saturating_sub is identical to the testcase in the motivating
issue.
Fixes: #55253Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 3324–3327 | The comment here could be improved. Maybe spell out "if Invert is true, value is 0 if 'C' bit is set, and 1 if it is not set". | |
| 3371 | It's a bit concerning to me that you're changing OutFlag, but not OpCarryIn. If we're inverting the output flag, don't we need to invert the input flag too? Or is it already getting inverted correctly? It might make sense to also add tests for i256 addition and subtraction. | |
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 3371 | Just checked; this is what we currently generate for i256 subtraction: subs x0, x0, x4 sbcs x1, x1, x5 cset w8, hs cmp w8, #1 sbcs x2, x2, x6 cset w8, hs cmp w8, #1 sbcs x3, x3, x7 ret I guess maybe the input is already handled correctly. | |
I've fixed the treatment of the incoming carry as well as foldOverflowCheck. Please take a look. Thanks!
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
|---|---|---|
| 3371 | It turns out that we have to invert the incoming carry, too. We appear to generate correct code -- subs immediately followed by sbcs, but that's because wrong semantics is canceled by another piece of wrong semantics. lowerADDSUBCARRY generates wrong code, taking in the incoming carry and feeding it to take in the incoming carry without inverting it. Meanwhile, foldOverflowCheck expects the wrong code from lowerADDSUBCARRY and folds it away. That is, if you disable foldOverflowCheck by teaching it to always return SDValue(), then you would get wrong code between subs and sbcs`. In the latest revision, of the patch I've corrected both -- the treatment of the incoming carry and the folding done in foldOverflowCheck. | |
Is it worth updating the ISD node description in ISDOpcodes.h? because this is how the bug happened in the first place, with a literal interpretation of "incoming carry" and "and the output carry".
@efriedma, I understand that you've LGTMed an earlier revision, but I'd appreciate a review on the latest one. As you suspected, the incoming carry also needs fixing. Thanks in advance!
LGTM
I agree it makes sense to update the comment in ISDOpcodes.h; might as well do it here. And I still want to see an i256 sub testcase. But you can do that in a followup.
The comment here could be improved. Maybe spell out "if Invert is true, value is 0 if 'C' bit is set, and 1 if it is not set".