This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Combine ISD::SETCC into AArch64ISD::ANDS
ClosedPublic

Authored by bcl5980 on Mar 11 2022, 1:14 AM.

Details

Summary

When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false).
ANd if SetCC input use this number, DAG combiner will generate one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison
Fix https://github.com/llvm/llvm-project/issues/54283

Diff Detail

Event Timeline

bcl5980 created this revision.Mar 11 2022, 1:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 1:14 AM
bcl5980 requested review of this revision.Mar 11 2022, 1:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 1:14 AM

Can someone help to review this? This is the first time I try to contribute to LLVM so it will be grateful if someone give me any suggestion.

I've added some comments on the patch itself but these are likely redundant based on my top level question, which is whether the optimisation would be better placed in emitComparison which is used in several places including the lowering code for ISD::SETCC. It's responsible for deciding how best to set the flags and already includes emitting AArch64ISD::ANDS so your work feels like a natural extension.

As a side comment the diff itself lacks context and so, for example, things like the name of the changed function is unknown without searching a local checkout. When manually uploading a patch to phabricator if you use git show HEAD -U999999 the patch will include the context, which will be visible on phabricator. Alternatively you can use the arcanist tool to upload patches. See https://llvm.org/docs/Phabricator.html

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17286

Personally I wouldn't bother with factoring out this condition because it makes the diff look bigger than it actually is and causes more indentation. I'd just add your new combine test in isolation. i.e.

if (Cond == ISD::SETNE && isOneConstant(RHS) &&
    LHS->getOpcode() == AArch64ISD::CSEL &&
    isNullConstant(LHS->getOperand(0)) && isOneConstant(LHS->getOperand(1)) &&
    LHS->hasOneUse()) {
17307–17308

It's worth restricting this combine until later in the code generation pipeline because introducing target specific nodes (i.e. AArch64ISD::) early can result in loosing useful optimisations and/or sometimes introduces legalisation issues. The above block get's away with it because it requires one of its inputs to be a a target specific node and so will naturally only trigger late.

I doubt it's necessary to restrict the combine until after legalisation, so it's likely just a case of adding

if (DCI.isBeforeLegalize())
  return SDValue();

at the top of this function so that you can be sure the types are legal and DAGCombine has had enough chance to do all the target independent combines.

17310–17311

Do you need to descend into the operands of ISD::SRL? Does LHS->getValueType(0) work? because the input and output types should match.

paulwalker-arm added a comment.EditedMar 14 2022, 4:40 AM

An alternative suggested is to perhaps keep the DAGCombine but have it just canonicalise the shift pattern to use ISD::AND [1], that way I believe the existing code in emitComparison will do what you need. If that works then that's my preferred option as it keeps things simple and relatively target agnostic.

[1]
setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne

An alternative suggested is to perhaps keep the DAGCombine but have it just canonicalise the shift pattern to use ISD::AND [1], that way I believe the existing code in emitComparison will do what you need. If that works then that's my preferred option as it keeps things simple and relatively target agnostic.

[1]
setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne

It looks when I do this in combine , and will be replaced to shift again soon.

Combining: t31: i32 = setcc t14, Constant:i64<0>, setne:ch
Is 0 legal add imm: yes
Creating constant: t34: i64 = Constant<-8192>
Creating new node: t35: i64 = and t2, Constant:i64<-8192>

Combining: t25: ch = setne

Combining: t20: i64 = Constant<0>

Combining: t14: i64 = srl t2, Constant:i64<13>

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17307–17308

It looks ISD::SETCC will lower to AArch64ISD::CSEL in custom legalize. Do we need to add a new function like performCSELCombine to do combine on AArch64ISD::CSEL?

bcl5980 marked an inline comment as not done.Mar 14 2022, 7:55 AM
bcl5980 added inline comments.Mar 14 2022, 8:04 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17307–17308

I'm sorry I find the function performCSELCombine. And another way is combine AArch64ISD::SUBS
SUBS (srl x, imm), 0 --> ANDS x, (-1 << imm)
which way do you think is better?

paulwalker-arm added a comment.EditedMar 14 2022, 8:30 AM

Not sure I understand. The combining: .... text just means it's visiting a node. There's no following text to suggest it has created anything new like there is when the AND is created. My guess is that DAGCombine continues to visit the operands of the original node despite it being replace.

I just tried

// setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
if (Cond == ISD::SETNE && isNullConstant(RHS) &&
    LHS->getOpcode() == ISD::SRL && isa<ConstantSDNode>(LHS->getOperand(1)) &&
    LHS->hasOneUse()) {
  EVT TstVT = LHS->getValueType(0);
  if (TstVT == MVT::i32 || TstVT == MVT::i64) {
    uint64_t TstImm = -1ULL << LHS->getConstantOperandVal(1);
    SDValue TST = DAG.getNode(ISD::AND, DL, TstVT, LHS->getOperand(0),
                              DAG.getConstant(TstImm, DL, TstVT));
    return DAG.getNode(ISD::SETCC, DL, VT, TST, RHS, N->getOperand(2));
  }
}

using origin/main from a few minutes ago and I get the same new output for llvm/test/CodeGen/AArch64/arm64-xaluo.ll as you do. Are you seeing something different?

Not sure I understand. The combining: .... text just means it's visiting a node. There's no following text to suggest it has created anything new like there is when the AND is created. My guess is that DAGCombine continues to visit the operands of the original node despite it being replace.

I just tried

// setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
if (Cond == ISD::SETNE && isNullConstant(RHS) &&
    LHS->getOpcode() == ISD::SRL && isa<ConstantSDNode>(LHS->getOperand(1)) &&
    LHS->hasOneUse()) {
  EVT TstVT = LHS->getValueType(0);
  if (TstVT == MVT::i32 || TstVT == MVT::i64) {
    uint64_t TstImm = -1ULL << LHS->getConstantOperandVal(1);
    SDValue TST = DAG.getNode(ISD::AND, DL, TstVT, LHS->getOperand(0),
                              DAG.getConstant(TstImm, DL, TstVT));
    return DAG.getNode(ISD::SETCC, DL, VT, TST, RHS, N->getOperand(2));
  }
}

using origin/main from a few minutes ago and I get the same new output for llvm/test/CodeGen/AArch64/arm64-xaluo.ll as you do. Are you seeing something different?

Yeah, it works. Test failed because this code generate tst w8, 0xff00(not 0xffffff00). That is also correct as knownbits is 16 bit for i8 mul.
So the question is which way do you think is better?

performSUBSCombine: subs (srl x, imm), 0 ==> ands x, (-1 << imm)
or
performSETCCCombine: setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne

Both of them keep ISD before legalize.

Thanks for the investigation. I think the options are:

  1. DAG combine for setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
  2. update emitComparison to replace replace single use srl with ands.

I don't see "performSUBSCombine: subs (srl x, imm), 0 ==> ands x, (-1 << imm)" as an option as the only reason we would see this idiom is when emitComparison emits it.

Of the two approaches (1) is my favourite purely because it tries to canonicalise the DAG, which is always beneficial. I say this because I guess if we're seeing srl as something to support then how far away are we from wanting to support shl and thus I'd rather canonicalise the DAG rather than make emitComparison progressively more complex.

bcl5980 updated this revision to Diff 415313.Mar 14 2022, 9:11 PM
bcl5980 edited the summary of this revision. (Show Details)

Older pattern: srl x, imm ==> ands x, (-1 << imm)
New pattern: srl x, imm ==> and x, (-1 << imm)

bcl5980 marked 3 inline comments as done.Mar 14 2022, 9:25 PM
bcl5980 edited the summary of this revision. (Show Details)Mar 15 2022, 10:47 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17392

This test is fragile because for example it will assert when TstVT is a scalable vector, whilst also return true for v2i32, which I doubt you want. I'm guessing it's only the isa<ConstantSDNode>(LHS->getOperand(1)) condition that's saving you here.

Given you're emitting general ISD nodes do you really care what the type is? If not then perhaps you just need if (TstVTisScalarInteger())?

bcl5980 added inline comments.Mar 16 2022, 7:11 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17392

Thanks for the mention. ScalarInterger check is necessary.
We can get benifit from SRL pattern when the data type is i128. This check is only used for avoid i128.
How about TstVT.getFixedSizeInBits() <= 64 && TstVT.isScalarInteger() ?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
17392

Works for me but please reverse the tests so that we only call getFixedSizeInBits once we know TstVT is a scalar integer.

bcl5980 updated this revision to Diff 416372.Mar 17 2022, 6:33 PM

Disable the combination when op type is vector

bcl5980 marked 2 inline comments as done.Mar 17 2022, 6:33 PM
paulwalker-arm accepted this revision.Mar 18 2022, 4:22 AM
This revision is now accepted and ready to land.Mar 18 2022, 4:22 AM

Hi paulwalker-arm, thanks for your patience. Since it is my first change, I don't have commit access yet. Could you help to commit it please? Thank you.
user.name: chenglin.bi
user.email: chenglin.bi@cixcomputing.com

This revision was automatically updated to reflect the committed changes.