This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Sink operands to allow for bitselect instructions
ClosedPublic

Authored by pranavk on Mar 30 2023, 1:46 PM.

Details

Reviewers
efriedma
dmgreen
Summary

The pattern Or(And(A, MaskValue), And(B, ~MaskValue)), where ~MaskValue = Xor(MaskValue, -1) gets lowered to bitselect instruction when NEON is available. However, when this pattern is in a loop and MaskValue lives outside of the immediate basic block, instruction selection isn't able to choose bitselect and we end up with sequence of ORs and ANDs. This patch sinks such MaskValue into the basic block to allow backend to select bit select instructions.

This will solve performance bugs mentioned in this comment: https://github.com/llvm/llvm-project/issues/49305#issuecomment-1440828393

VBSL intrinsics can be found here: https://developer.arm.com/architectures/instruction-sets/intrinsics/#q=vbsl

Diff Detail

Event Timeline

pranavk created this revision.Mar 30 2023, 1:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 1:46 PM
pranavk requested review of this revision.Mar 30 2023, 1:46 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 30 2023, 1:46 PM
scw edited reviewers, added: efriedma; removed: eli.friedman.
scw added subscribers: scw, echristo.
efriedma added subscribers: bsmith, paulwalker-arm, lenary.

The primary tradeoff here is that existing optimizations won't understand the intrinsic... for example, we can't constant-fold, or automatically invert the mask. But making the intrinsics more predictably produce efficient sequences is probably worthwhile.

(Any other opinions here?)

I guess I should note both the examples in https://github.com/llvm/llvm-project/issues/49305 could probably be fixed in other ways... we have heuristics to, for example, sink logic ops into loops when it's profitable. But that requires someone to notice the specific issues, and take the time to diagnose/fix them.

My preference would be for fixing the code we have, not introducing new intrinsics. Intrinsics act as black-boxes for the optimizer, and I'm pretty sure I've heard of cases in the past of the compiler optimizing the or/and/xor's to nicer sequences of instructions. It would be a shame to lose that.

The number of instructions in bsl is quite high compared to many intrinsics, they probably have a high chance of going wrong. But if we know of ways to optimize them (through shouldSinkOperands), then that would be my preference. It can end up helping all cases, not just the intrinsics.

pranavk updated this revision to Diff 521040.May 10 2023, 10:56 AM
pranavk edited the summary of this revision. (Show Details)

Change shouldSinkOperand to allow backend to generate bitselect instructions

pranavk planned changes to this revision.May 10 2023, 10:58 AM

I agree. I changed the implementation to not introduce the intrinsic. I will need another change in InstCombine to handle case #1 mentioned on github bug report. I will have separate patch for it changing InstCombine. Thanks

pranavk requested review of this revision.May 10 2023, 10:59 AM
pranavk updated this revision to Diff 521114.May 10 2023, 3:05 PM

[AArch64][InstCombine] Bail out for bitselect instructions

pranavk updated this revision to Diff 521115.May 10 2023, 3:06 PM

[AArch64] Change shouldSinkOperand to allow bitselect instructions

pranavk retitled this revision from [AArch64] Add IR intrinsics for vbsl* C intrinsics to [AArch64] Sink operands to allow for bitselect instructions.May 10 2023, 3:14 PM
pranavk edited the summary of this revision. (Show Details)

Thanks for working on this. I noticed there was another instance of vbsl being reported recently in https://github.com/llvm/llvm-project/issues/62642. Hopefully it can be addresses via extra optimizations too.

Can you add a testcase for the issues in https://github.com/llvm/llvm-project/issues/49305? And look into the existing tests.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14343

That sounds like it might be a bug that happens if it tries to sink too many operands? From what I remember the order they are put into Ops might matter. And if it is sinking to the Or it might need to add both the And as well as the Not.

14346

I->hasOneUse()

14348

I->user_back();

14364

Can this be simplified with a m_Not matcher? In general instructions will be canonicalized so that constants are on the RHS.

If we are sinking the Not, I feel this should want to test that the pattern makes up a bsl, and if it does then sink the operand of I. i.e. something like checking that OI is m_c_Or(m_c_And(m_Value(A),m_Value(B)),m_specific(I)), and then checking that I is m_c_And(m_Not(m_Specific(A)),m_Value(D)) or the other way around.

pranavk planned changes to this revision.May 11 2023, 3:27 PM
pranavk marked 2 inline comments as done.
pranavk added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14343

When I tried to sink Or, I didn't add both Ands in the Vector. So it was sinking it just before Or even though it was used by And one instruction before the location where it sunk it. So, I don't think it's a bug. I just forgot to add Ands to the vector. I tried adding Ands to the vector and it works. So I have changed my implementation to switch-case under Instruction::Or as I think that makes it easier to read this code.

14364

Absolutely. I didn't look close enough it seems in PatternMatcher.h file. After discovering whole bunch of pattern matchers, I have shortened/simplified this implementation using m_not, etc. Please look at the latest patch.

I have additionally added more guards/checks to prevent this from happening when one of the And, or its operands are not available in same basic block.

pranavk updated this revision to Diff 521474.May 11 2023, 3:28 PM
pranavk edited the summary of this revision. (Show Details)

Address reviewer comments

tests coming

pranavk updated this revision to Diff 521545.May 11 2023, 8:17 PM

More concise pattern matching

dmgreen accepted this revision.May 14 2023, 1:51 AM

Thanks. LGTM with a few extra suggestions.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14357

I'm not sure if this is necessary, so long as some of the operands can be sunk, but it is probably OK for the moment to keep as-is.

14361–14363

I think this can avoid the loop if we just use
Ops.push_back(&MainAnd->getOperandUse(MainAnd->getOperand(0) == IA ? 1 : 0));

llvm/test/CodeGen/AArch64/aarch64-bit-gen.ll
148

Can you run utils/update_llc_test_checks.py on the file, to generate the runtime checks? There will be more of them but that should be OK in this case. It doesn't looks too large.

This revision is now accepted and ready to land.May 14 2023, 1:51 AM
pranavk updated this revision to Diff 522309.May 15 2023, 1:00 PM
pranavk marked 3 inline comments as done.

address reviewer comments

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14361–14363

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

pranavk added inline comments.May 15 2023, 1:01 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14361–14363

Done. this was left as part of my final refactoring. Uploaded new patch.

dmgreen accepted this revision.May 15 2023, 11:30 PM

Thanks. LGTM

pranavk closed this revision.May 17 2023, 2:54 PM

I noticed there was another instance of vbsl being reported recently in https://github.com/llvm/llvm-project/issues/62642. Hopefully it can be addresses via extra optimizations too.

This is another InstCombine problem -- as soon as it sees constant, InstCombine runs demanded bits pass/analysis and tries to do clever tricks with sequence of and/and/or. We need to teach InstCombine here to not touch the sequence as we expect it to be vectorized in instruction lowering. I will try to fix this when looking at the other InstCombine problem (https://reviews.llvm.org/D150316) we are talking about.

This comment was removed by pranavk.