Page MenuHomePhabricator

[AArch64] Improve codegen for inverted overflow checking intrinsics

Authored by aemerson on Sep 21 2017, 5:10 PM.



[AArch64] Improve codegen for inverted overflow checking intrinsics.

E.g. if we have a (xor(overflow-bit), 1) where overflow-bit comes from an intrinsic like llvm.sadd.with.overflow then we can kill the xor and use the inverted condition code for the CSEL.


Diff Detail


Event Timeline

aemerson created this revision.Sep 21 2017, 5:10 PM
kristof.beyls added inline comments.Oct 5 2017, 1:51 AM
1980–2012 ↗(On Diff #116289)

Hi Amara,

I'm only vaguely familiar with this area of the code base.
My understanding is that you're aiming to have one more pattern apply involving an xor node.
I think it'd be nice to write out the pattern in a comment just like is available for the pattern matched in the existing code in LowerXOR.

Apart from that, with my unfamiliarity of this code base, I wonder why this pattern matching optimization is done during lowering.
Are there good reasons this optimization isn't done elsewhere, e.g. described by a tablegen pattern or during DAGCombine (e.g. in performXorCombine in this same source file)?
Apologies if the answer is blatantly obvious to people more experienced in this area.



javed.absar added inline comments.Oct 5 2017, 2:09 AM
1987 ↗(On Diff #116289)

Perhaps the COp1 test should be first test for efficiency e.g.

if (ConstantSDNode *OtherIsConst = dyn_cast<ConstantSDNode>(Other) ) {

if (....) {


COp1 => OtherIsConst

aemerson added inline comments.Oct 5 2017, 3:06 AM
1980–2012 ↗(On Diff #116289)

Thanks for taking a look. This is part of lowering because I want to re-use the code for detecting the overflow arithmetic nodes in getAArch64XALUOOp(). That helper function is used to recognize the patterns in a few other places like LowerBR_CC and LowerSelect for example. If we leave this combine until later we can't re-use that as the pattern is destroyed.

I'll improve the comment to better explain the transformation.

1987 ↗(On Diff #116289)

I'd prefer not to have nested if statements as that makes the rest of the block cramped. I have noticed however that isOneConstant() can be used instead, I'll use that and move it earlier in the condition.

kristof.beyls added inline comments.Oct 5 2017, 5:52 AM
1980–2012 ↗(On Diff #116289)

Thanks Amara, that makes sense to me.
Now that I've looked at getAArch64XALUOOp() and where it's used, I couldn't help but notice that almost everywhere it's used, the same boiler-plate kind-of code is present around it:

if (Sel.getResNo() == 1 &&
      (Opc == ISD::SADDO || Opc == ISD::UADDO || Opc == ISD::SSUBO ||
       Opc == ISD::USUBO || Opc == ISD::SMULO || Opc == ISD::UMULO) && ...) {
      // Only lower legal XALUO ops.
    if (!DAG.getTargetLoweringInfo().isTypeLegal(LHS->getValueType(0)))
      return SDValue();
    AArch64CC::CondCode OFCC;
    SDValue Value, Overflow;
    std::tie(Value, Overflow) = getAArch64XALUOOp(OFCC, CCVal.getValue(0), DAG);
    SDValue CCVal = DAG.getConstant(OFCC, DL, MVT::i32);
  return DAG.getNode(AArch64ISD::CSEL, DL, Op.getValueType(), TVal, FVal,
                   CCVal, Overflow);

Couldn't that code be factored out somehow instead of having it copy pasted a few times?
I think that could improve the readability of the code and result in the advantages that the DRY-principle brings.

aemerson added inline comments.Oct 5 2017, 8:03 AM
1980–2012 ↗(On Diff #116289)

Yeah, I think this is a borderline case. There's not a way I can see that can factor things out in a nice way due to the differences in the initial if-condition as well as how the AArch64 CC is mutated in two of the cases (while maintaining readability). I can split out the Opc checks though as they're the same in all cases?

kristof.beyls added inline comments.Oct 5 2017, 12:07 PM
1980–2012 ↗(On Diff #116289)

I hope factoring out something like

bool matchesLegalSuAddSubMulWithOverflow(SDValue Op, SelectionDAG &DAG)
  unsigned Opc = Op.getOpcode();
  if (Op.getResNo() == 1 &&
      (Opc == ISD::SADDO || Opc == ISD::UADDO || Opc == ISD::SSUBO ||
       Opc == ISD::USUBO || Opc == ISD::SMULO || Opc == ISD::UMULO))
    if (DAG.getTargetLoweringInfo().isTypeLegal(Op->getValueType(0)))
    return true;
  return false;

and using that would already reduce the copy paste a lot; and the actual logic of the code might become a bit more simple and obvious?
But it's debatable, so I'll leave the decision on whether to factor this out or not up to you.

aemerson updated this revision to Diff 117994.Oct 6 2017, 7:15 AM

Rebased and updated diff to use a helper function for the overflow op result checks. I didn't put the type legal check into the helper because it's used to do an early exit rather than to continue onto the rest of the lower method.

kristof.beyls accepted this revision.Oct 9 2017, 1:44 AM

Thanks Amara, LGTM modulo one tiny nitpick which you should feel free to disagree with.

659–670 ↗(On Diff #117994)

Just one more tiny remark: I think it might be slightly easier to maintain this test file if the tests you added here we be each next to the "positive" test you copy pasted it from higher up in this file. E.g. moving saddo.not.i32 come right after

This revision is now accepted and ready to land.Oct 9 2017, 1:44 AM
This revision was automatically updated to reflect the committed changes.