This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][DAGCombiner] fold instruction BIC from ISD::AND
AbandonedPublic

Authored by bcl5980 on Dec 8 2022, 12:39 AM.

Details

Summary

This patch add a new target ISD AArch64ISD::SBIC to represent scalar version instruction bic.
And select the ISD in the stage combine from this pattern:
(~X | C) & Y --> bic Y, (X & ~C)

Diff Detail

Event Timeline

bcl5980 created this revision.Dec 8 2022, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2022, 12:39 AM
bcl5980 requested review of this revision.Dec 8 2022, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 8 2022, 12:39 AM
bcl5980 retitled this revision from [AArch64][DAGCombiner] fold BIC from AND in dagcombiner to [AArch64][DAGCombiner] fold instruction BIC from ISD::AND.Dec 8 2022, 12:40 AM
bcl5980 updated this revision to Diff 481530.Dec 8 2022, 11:46 PM

add shiftreg selection for bic

bcl5980 updated this revision to Diff 481532.Dec 9 2022, 12:01 AM

improve code in td file

mingmingl added inline comments.Dec 13 2022, 4:18 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16046

An alternative is and Y, (orn C, X).

https://godbolt.org/z/PGnTdTMKW shows orn is generated if RHS of or is not a constant or RHS is a large constant, but orn not generated for small constant.

https://gist.github.com/minglotus-6/dd2c75a2253b128081125a578e7ff6c6 is the llc ISel debug output for small constant, and https://gist.github.com/minglotus-6/2692d59e5c939941895d3581a4bdcbea for large constant.

I wonder if fixing ISel (after dag-combiner) is a more general solution (i.e., selects ~X | C to orn, and fixes this motivating case as well).

16091

nit: add a comment like LHS won't be a constant sd node if RHS is not a constant, due to canonicalization, or better assert the conditon.

llvm/test/CodeGen/AArch64/logical-op-with-not.ll
39–40

Related with scalar BIC DAG node (probably not a part of this patch)

%4 = xor i32 %3, -1
%5 = and i32 %4, %1

itself could be a BIC as well.

llc with -debug shows xor %3, -1 is combined into or (xor %0, -1), -65281 so tblgen pattern won't see the %4 %5 sequence.

bcl5980 added inline comments.Dec 13 2022, 7:16 PM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
16046

Yeah, we have two ways to fix the issue:

and + orn
and + bic

The problem for orn is it always need to involve a mov. So I choice and+bic to fix the issue.
I also try to add the code into ISelDAG, but it looks the code become complex and it will lose some optimization for the first and. Like the case @bic_shiftedreg_from_and.

bcl5980 added inline comments.Dec 13 2022, 10:32 PM
llvm/test/CodeGen/AArch64/logical-op-with-not.ll
39–40

Yeah, actually what this change do is reverting the xor + or to xor + and.

Hello. In general we have found it is better to avoid AArch64ISD nodes if possible. That way the benefits from generic dag combines keep applying, as opposed to becoming black boxes that the rest of the optimizer cannot see. Sometimes they are necessary, and that might be the case here, but is it possible to just adjust the code using standard nodes? It doesn't seem like this applies very often though - perhaps that makes it OK in this case.

Hello. In general we have found it is better to avoid AArch64ISD nodes if possible. That way the benefits from generic dag combines keep applying, as opposed to becoming black boxes that the rest of the optimizer cannot see. Sometimes they are necessary, and that might be the case here, but is it possible to just adjust the code using standard nodes? It doesn't seem like this applies very often though - perhaps that makes it OK in this case.

The headache thing here is DAGCombiner will revert the pattern to the origin if I use standard node. So I try to add a new AArch64ISD.

Hello. In general we have found it is better to avoid AArch64ISD nodes if possible. That way the benefits from generic dag combines keep applying, as opposed to becoming black boxes that the rest of the optimizer cannot see. Sometimes they are necessary, and that might be the case here, but is it possible to just adjust the code using standard nodes? It doesn't seem like this applies very often though - perhaps that makes it OK in this case.

The headache thing here is DAGCombiner will revert the pattern to the origin if I use standard node. So I try to add a new AArch64ISD.

Maybe I can put the transform to after legalize to make sure standard nodes exist longer.

The headache thing here is DAGCombiner will revert the pattern to the origin if I use standard node. So I try to add a new AArch64ISD.

Maybe I can put the transform to after legalize to make sure standard nodes exist longer.

Putting the transform late can certainly help, thats a good option if it is needed. It can sometimes be better to just prevent the DAG combine that is going in the wrong direction. There are a set of methods like shouldFoldConstantShiftPairToMask and isDesirableToCommuteWithShift that can be used by the target to control how the DAGCombiner acts. There can be problems if the transform is towards a canonical form that other DAG combines rely upon, but it sounds like in this case it might be easy enough to add?

The headache thing here is DAGCombiner will revert the pattern to the origin if I use standard node. So I try to add a new AArch64ISD.

Maybe I can put the transform to after legalize to make sure standard nodes exist longer.

Putting the transform late can certainly help, thats a good option if it is needed. It can sometimes be better to just prevent the DAG combine that is going in the wrong direction. There are a set of methods like shouldFoldConstantShiftPairToMask and isDesirableToCommuteWithShift that can be used by the target to control how the DAGCombiner acts. There can be problems if the transform is towards a canonical form that other DAG combines rely upon, but it sounds like in this case it might be easy enough to add?

These method like shouldFoldConstantShiftPairToMask or isDesirableToCommuteWithShift only prevent DAG to transform a good pattern to bad pattern for us. But if the code is the bad pattern at first, we still can't optimize the case. So if DAGCombiner already has some canonicalization, I prefer to optimize the pattern after DAG combine. And I am not sure if we need add a TLI interface, what interface can we add to make the code general?

These method like shouldFoldConstantShiftPairToMask or isDesirableToCommuteWithShift only prevent DAG to transform a good pattern to bad pattern for us. But if the code is the bad pattern at first, we still can't optimize the case. So if DAGCombiner already has some canonicalization, I prefer to optimize the pattern after DAG combine. And I am not sure if we need add a TLI interface, what interface can we add to make the code general?

It appears to be this code that is transforming not(and(x,C)) -> or(not(x), not(C)): https://github.com/llvm/llvm-project/blob/85d049a089d4a6a4a67145429ea5d8e155651138/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L8746
There would be advantages and disadvantages to both methods, but it seems like disabling that with a target hook, either universally or in specific cases should work and might be a little more general than this version. mvn_shiftedreg_from_and does get a little worse though, which can maybe be fixed in the DAG2DAG matchers? It could help other architectures if the not constant isn't as cheap as the original too.
If the reverse pattern is needed, it should be OK to add either as a DAG combine or in AArch64ISelLowering, so long as it does not fight with the existing combines.

I would recommend avoiding new AArch64ISD nodes, but if the alternative doesn't work then make sure we only start creating the node late - after the DAG has been legalized.

bcl5980 abandoned this revision.Jan 5 2023, 10:58 PM