This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Fix sub with carry
ClosedPublic

Authored by kazu on May 4 2022, 9:16 PM.

Details

Summary
[AArch64] Fix sub with carry

13403a70e45b2d22878ba59fc211f8dba3a8deba introduced a bug where we
generate the outgoing carry inverted, which in turn breaks the
lowering of @llvm.usub.sat.i128, returning the normal difference on
saturation and zero otherwise.

Note that AArch64 has peculiar semantics where the subtraction
instructions generate borrow inverted.  The problem is that we mix the
two forms of semantics -- the normal carry and inverted carry -- in
the area of extended precision subtractions.  Specifically, we have
three problems:

- lowerADDSUBCARRY takes the non-inverted incoming carry from a
  subtraction and feeds it to SBCS without inverting it first.

- lowerADDSUBCARRY makes available the outgoing carry from SBCS
  without inverting it.

- foldOverflowCheck folds:

  (SBC{S} l r (CMP (CSET LO carry) 1)) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  in turn generates a borrow, *clearing* the carry flag.  Instead, we
  should fold:

  (SBC{S} l r (CMP 0 (CSET LO carry))) => (SBC{S} l r carry)

  When the incoming carry flag is set, CSET LO results in zero.  CMP
  does not generate a borrow, *setting* the carry flag.

IIUC, we should use the normal (that is, non-inverted) semantics for
carry everywhere.

This patch fixes the three problems above.

This patch does not add any new testcases because we have a plenty of
them covering the instruction in question.  In particular,
@u128_saturating_sub is identical to the testcase in the motivating
issue.

Fixes: #55253

Diff Detail

Event Timeline

kazu created this revision.May 4 2022, 9:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 9:16 PM
kazu requested review of this revision.May 4 2022, 9:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2022, 9:16 PM
efriedma added inline comments.May 5 2022, 10:52 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3321–3324

The comment here could be improved. Maybe spell out "if Invert is true, value is 0 if 'C' bit is set, and 1 if it is not set".

3367

It's a bit concerning to me that you're changing OutFlag, but not OpCarryIn. If we're inverting the output flag, don't we need to invert the input flag too? Or is it already getting inverted correctly?

It might make sense to also add tests for i256 addition and subtraction.

efriedma added inline comments.May 5 2022, 10:56 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3367

Just checked; this is what we currently generate for i256 subtraction:

subs    x0, x0, x4
sbcs    x1, x1, x5
cset    w8, hs
cmp     w8, #1
sbcs    x2, x2, x6
cset    w8, hs
cmp     w8, #1
sbcs    x3, x3, x7
ret

I guess maybe the input is already handled correctly.

kazu updated this revision to Diff 427442.May 5 2022, 2:04 PM

Update the comment.

efriedma accepted this revision.May 5 2022, 2:06 PM

LGTM

This revision is now accepted and ready to land.May 5 2022, 2:06 PM
kazu updated this revision to Diff 427547.May 6 2022, 12:44 AM
kazu marked an inline comment as done.

Fix foldOverflowCheck.

kazu edited the summary of this revision. (Show Details)May 6 2022, 12:56 AM
kazu requested review of this revision.May 6 2022, 1:01 AM

I've fixed the treatment of the incoming carry as well as foldOverflowCheck. Please take a look. Thanks!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3367

It turns out that we have to invert the incoming carry, too.

We appear to generate correct code -- subs immediately followed by sbcs, but that's because wrong semantics is canceled by another piece of wrong semantics. lowerADDSUBCARRY generates wrong code, taking in the incoming carry and feeding it to take in the incoming carry without inverting it. Meanwhile, foldOverflowCheck expects the wrong code from lowerADDSUBCARRY and folds it away.

That is, if you disable foldOverflowCheck by teaching it to always return SDValue(), then you would get wrong code between subs and sbcs`.

In the latest revision, of the patch I've corrected both -- the treatment of the incoming carry and the folding done in foldOverflowCheck.

Is it worth updating the ISD node description in ISDOpcodes.h? because this is how the bug happened in the first place, with a literal interpretation of "incoming carry" and "and the output carry".

kazu edited the summary of this revision. (Show Details)May 6 2022, 9:35 AM
kazu added a comment.May 6 2022, 9:37 AM

Is it worth updating the ISD node description in ISDOpcodes.h? because this is how the bug happened in the first place, with a literal interpretation of "incoming carry" and "and the output carry".

Sure, I am happy to post a separate patch for that.

kazu added a comment.May 6 2022, 9:40 AM

@efriedma, I understand that you've LGTMed an earlier revision, but I'd appreciate a review on the latest one. As you suspected, the incoming carry also needs fixing. Thanks in advance!

efriedma accepted this revision.May 6 2022, 10:13 AM

LGTM

I agree it makes sense to update the comment in ISDOpcodes.h; might as well do it here. And I still want to see an i256 sub testcase. But you can do that in a followup.

This revision is now accepted and ready to land.May 6 2022, 10:13 AM
jgorbe added a subscriber: jgorbe.May 6 2022, 10:16 AM
This revision was landed with ongoing or failed builds.May 6 2022, 11:04 AM
This revision was automatically updated to reflect the committed changes.