When N > 12, (2^N -1) is not a legal add immediate (isLegalAddImmediate will return false).
ANd if SetCC input use this number, DAG combiner will generate one more SRL instruction.
So combine [setcc (srl x, imm), 0, ne] to [setcc (and x, (-1 << imm)), 0, ne] to get better optimization in emitComparison
Fix https://github.com/llvm/llvm-project/issues/54283
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can someone help to review this? This is the first time I try to contribute to LLVM so it will be grateful if someone give me any suggestion.
I've added some comments on the patch itself but these are likely redundant based on my top level question, which is whether the optimisation would be better placed in emitComparison which is used in several places including the lowering code for ISD::SETCC. It's responsible for deciding how best to set the flags and already includes emitting AArch64ISD::ANDS so your work feels like a natural extension.
As a side comment the diff itself lacks context and so, for example, things like the name of the changed function is unknown without searching a local checkout. When manually uploading a patch to phabricator if you use git show HEAD -U999999 the patch will include the context, which will be visible on phabricator. Alternatively you can use the arcanist tool to upload patches. See https://llvm.org/docs/Phabricator.html
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17286 | Personally I wouldn't bother with factoring out this condition because it makes the diff look bigger than it actually is and causes more indentation. I'd just add your new combine test in isolation. i.e. if (Cond == ISD::SETNE && isOneConstant(RHS) && LHS->getOpcode() == AArch64ISD::CSEL && isNullConstant(LHS->getOperand(0)) && isOneConstant(LHS->getOperand(1)) && LHS->hasOneUse()) { | |
17307–17308 | It's worth restricting this combine until later in the code generation pipeline because introducing target specific nodes (i.e. AArch64ISD::) early can result in loosing useful optimisations and/or sometimes introduces legalisation issues. The above block get's away with it because it requires one of its inputs to be a a target specific node and so will naturally only trigger late. I doubt it's necessary to restrict the combine until after legalisation, so it's likely just a case of adding if (DCI.isBeforeLegalize()) return SDValue(); at the top of this function so that you can be sure the types are legal and DAGCombine has had enough chance to do all the target independent combines. | |
17310–17311 | Do you need to descend into the operands of ISD::SRL? Does LHS->getValueType(0) work? because the input and output types should match. |
An alternative suggested is to perhaps keep the DAGCombine but have it just canonicalise the shift pattern to use ISD::AND [1], that way I believe the existing code in emitComparison will do what you need. If that works then that's my preferred option as it keeps things simple and relatively target agnostic.
[1]
setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
It looks when I do this in combine , and will be replaced to shift again soon.
Combining: t31: i32 = setcc t14, Constant:i64<0>, setne:ch
Is 0 legal add imm: yes
Creating constant: t34: i64 = Constant<-8192>
Creating new node: t35: i64 = and t2, Constant:i64<-8192>
Combining: t25: ch = setne
Combining: t20: i64 = Constant<0>
Combining: t14: i64 = srl t2, Constant:i64<13>
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17307–17308 | It looks ISD::SETCC will lower to AArch64ISD::CSEL in custom legalize. Do we need to add a new function like performCSELCombine to do combine on AArch64ISD::CSEL? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17307–17308 | I'm sorry I find the function performCSELCombine. And another way is combine AArch64ISD::SUBS |
Not sure I understand. The combining: .... text just means it's visiting a node. There's no following text to suggest it has created anything new like there is when the AND is created. My guess is that DAGCombine continues to visit the operands of the original node despite it being replace.
I just tried
// setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne if (Cond == ISD::SETNE && isNullConstant(RHS) && LHS->getOpcode() == ISD::SRL && isa<ConstantSDNode>(LHS->getOperand(1)) && LHS->hasOneUse()) { EVT TstVT = LHS->getValueType(0); if (TstVT == MVT::i32 || TstVT == MVT::i64) { uint64_t TstImm = -1ULL << LHS->getConstantOperandVal(1); SDValue TST = DAG.getNode(ISD::AND, DL, TstVT, LHS->getOperand(0), DAG.getConstant(TstImm, DL, TstVT)); return DAG.getNode(ISD::SETCC, DL, VT, TST, RHS, N->getOperand(2)); } }
using origin/main from a few minutes ago and I get the same new output for llvm/test/CodeGen/AArch64/arm64-xaluo.ll as you do. Are you seeing something different?
Yeah, it works. Test failed because this code generate tst w8, 0xff00(not 0xffffff00). That is also correct as knownbits is 16 bit for i8 mul.
So the question is which way do you think is better?
performSUBSCombine: subs (srl x, imm), 0 ==> ands x, (-1 << imm)
or
performSETCCCombine: setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
Both of them keep ISD before legalize.
Thanks for the investigation. I think the options are:
- DAG combine for setcc (srl x, imm), 0, ne ==> setcc (and x, (-1 << imm)), 0, ne
- update emitComparison to replace replace single use srl with ands.
I don't see "performSUBSCombine: subs (srl x, imm), 0 ==> ands x, (-1 << imm)" as an option as the only reason we would see this idiom is when emitComparison emits it.
Of the two approaches (1) is my favourite purely because it tries to canonicalise the DAG, which is always beneficial. I say this because I guess if we're seeing srl as something to support then how far away are we from wanting to support shl and thus I'd rather canonicalise the DAG rather than make emitComparison progressively more complex.
Older pattern: srl x, imm ==> ands x, (-1 << imm)
New pattern: srl x, imm ==> and x, (-1 << imm)
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17392 | This test is fragile because for example it will assert when TstVT is a scalable vector, whilst also return true for v2i32, which I doubt you want. I'm guessing it's only the isa<ConstantSDNode>(LHS->getOperand(1)) condition that's saving you here. Given you're emitting general ISD nodes do you really care what the type is? If not then perhaps you just need if (TstVTisScalarInteger())? |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17392 | Thanks for the mention. ScalarInterger check is necessary. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17392 | Works for me but please reverse the tests so that we only call getFixedSizeInBits once we know TstVT is a scalar integer. |
Hi paulwalker-arm, thanks for your patience. Since it is my first change, I don't have commit access yet. Could you help to commit it please? Thank you.
user.name: chenglin.bi
user.email: chenglin.bi@cixcomputing.com
Personally I wouldn't bother with factoring out this condition because it makes the diff look bigger than it actually is and causes more indentation. I'd just add your new combine test in isolation. i.e.