Page MenuHomePhabricator

[AArch64] Sink operands to allow for bitselect instructions

Authored by pranavk on Mar 30 2023, 1:46 PM.



The pattern Or(And(A, MaskValue), And(B, ~MaskValue)), where ~MaskValue = Xor(MaskValue, -1) gets lowered to bitselect instruction when NEON is available. However, when this pattern is in a loop and MaskValue lives outside of the immediate basic block, instruction selection isn't able to choose bitselect and we end up with sequence of ORs and ANDs. This patch sinks such MaskValue into the basic block to allow backend to select bit select instructions.

This will solve performance bugs mentioned in this comment:

VBSL intrinsics can be found here:

Diff Detail

Unit TestsFailed

60,100 msx64 debian > MLIR.Examples/standalone::test.toy
Script: -- : 'RUN: at line 1'; "/etc/cmake/bin/cmake" "/var/lib/buildkite-agent/builds/llvm-project/mlir/examples/standalone" -G "Ninja" -DCMAKE_CXX_COMPILER=/usr/bin/clang++ -DCMAKE_C_COMPILER=/usr/bin/clang -DLLVM_ENABLE_LIBCXX=OFF -DMLIR_DIR=/var/lib/buildkite-agent/builds/llvm-project/build/lib/cmake/mlir -DLLVM_USE_LINKER=lld -DPython3_EXECUTABLE="/usr/bin/python3.9"

Event Timeline

pranavk created this revision.Mar 30 2023, 1:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 1:46 PM
pranavk requested review of this revision.Mar 30 2023, 1:46 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 30 2023, 1:46 PM
scw edited reviewers, added: efriedma; removed: eli.friedman.
scw added subscribers: scw, echristo.
efriedma added subscribers: bsmith, paulwalker-arm, lenary.

The primary tradeoff here is that existing optimizations won't understand the intrinsic... for example, we can't constant-fold, or automatically invert the mask. But making the intrinsics more predictably produce efficient sequences is probably worthwhile.

(Any other opinions here?)

I guess I should note both the examples in could probably be fixed in other ways... we have heuristics to, for example, sink logic ops into loops when it's profitable. But that requires someone to notice the specific issues, and take the time to diagnose/fix them.

My preference would be for fixing the code we have, not introducing new intrinsics. Intrinsics act as black-boxes for the optimizer, and I'm pretty sure I've heard of cases in the past of the compiler optimizing the or/and/xor's to nicer sequences of instructions. It would be a shame to lose that.

The number of instructions in bsl is quite high compared to many intrinsics, they probably have a high chance of going wrong. But if we know of ways to optimize them (through shouldSinkOperands), then that would be my preference. It can end up helping all cases, not just the intrinsics.

pranavk updated this revision to Diff 521040.Wed, May 10, 10:56 AM
pranavk edited the summary of this revision. (Show Details)

Change shouldSinkOperand to allow backend to generate bitselect instructions

pranavk planned changes to this revision.Wed, May 10, 10:58 AM

I agree. I changed the implementation to not introduce the intrinsic. I will need another change in InstCombine to handle case #1 mentioned on github bug report. I will have separate patch for it changing InstCombine. Thanks

pranavk requested review of this revision.Wed, May 10, 10:59 AM
pranavk updated this revision to Diff 521114.Wed, May 10, 3:05 PM

[AArch64][InstCombine] Bail out for bitselect instructions

pranavk updated this revision to Diff 521115.Wed, May 10, 3:06 PM

[AArch64] Change shouldSinkOperand to allow bitselect instructions

pranavk retitled this revision from [AArch64] Add IR intrinsics for vbsl* C intrinsics to [AArch64] Sink operands to allow for bitselect instructions.Wed, May 10, 3:14 PM
pranavk edited the summary of this revision. (Show Details)

Thanks for working on this. I noticed there was another instance of vbsl being reported recently in Hopefully it can be addresses via extra optimizations too.

Can you add a testcase for the issues in And look into the existing tests.


That sounds like it might be a bug that happens if it tries to sink too many operands? From what I remember the order they are put into Ops might matter. And if it is sinking to the Or it might need to add both the And as well as the Not.






Can this be simplified with a m_Not matcher? In general instructions will be canonicalized so that constants are on the RHS.

If we are sinking the Not, I feel this should want to test that the pattern makes up a bsl, and if it does then sink the operand of I. i.e. something like checking that OI is m_c_Or(m_c_And(m_Value(A),m_Value(B)),m_specific(I)), and then checking that I is m_c_And(m_Not(m_Specific(A)),m_Value(D)) or the other way around.

pranavk planned changes to this revision.Thu, May 11, 3:27 PM
pranavk marked 2 inline comments as done.
pranavk added inline comments.

When I tried to sink Or, I didn't add both Ands in the Vector. So it was sinking it just before Or even though it was used by And one instruction before the location where it sunk it. So, I don't think it's a bug. I just forgot to add Ands to the vector. I tried adding Ands to the vector and it works. So I have changed my implementation to switch-case under Instruction::Or as I think that makes it easier to read this code.


Absolutely. I didn't look close enough it seems in PatternMatcher.h file. After discovering whole bunch of pattern matchers, I have shortened/simplified this implementation using m_not, etc. Please look at the latest patch.

I have additionally added more guards/checks to prevent this from happening when one of the And, or its operands are not available in same basic block.

pranavk updated this revision to Diff 521474.Thu, May 11, 3:28 PM
pranavk edited the summary of this revision. (Show Details)

Address reviewer comments

tests coming

pranavk updated this revision to Diff 521545.Thu, May 11, 8:17 PM

More concise pattern matching

dmgreen accepted this revision.Sun, May 14, 1:51 AM

Thanks. LGTM with a few extra suggestions.


I'm not sure if this is necessary, so long as some of the operands can be sunk, but it is probably OK for the moment to keep as-is.


I think this can avoid the loop if we just use
Ops.push_back(&MainAnd->getOperandUse(MainAnd->getOperand(0) == IA ? 1 : 0));


Can you run utils/ on the file, to generate the runtime checks? There will be more of them but that should be OK in this case. It doesn't looks too large.

This revision is now accepted and ready to land.Sun, May 14, 1:51 AM
pranavk updated this revision to Diff 522309.Mon, May 15, 1:00 PM
pranavk marked 3 inline comments as done.

address reviewer comments



pranavk added inline comments.Mon, May 15, 1:01 PM

Done. this was left as part of my final refactoring. Uploaded new patch.

dmgreen accepted this revision.Mon, May 15, 11:30 PM

Thanks. LGTM

pranavk closed this revision.Wed, May 17, 2:54 PM

I noticed there was another instance of vbsl being reported recently in Hopefully it can be addresses via extra optimizations too.

This is another InstCombine problem -- as soon as it sees constant, InstCombine runs demanded bits pass/analysis and tries to do clever tricks with sequence of and/and/or. We need to teach InstCombine here to not touch the sequence as we expect it to be vectorized in instruction lowering. I will try to fix this when looking at the other InstCombine problem ( we are talking about.

This comment was removed by pranavk.