This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Optimize comparison chains
ClosedPublic

Authored by Kmeakin on Feb 23 2022, 10:52 AM.

Details

Summary

LLVM generates sub-optimal code for sequences of chained comparions - ie code of the form
(x1 relop1 y1) boolop1 (x2 relop2 y2) boolop2 ... (xn relopn yn)
where relop is one of {<=, <, >=, >, ==, !=} and boolop is one of {&&, ||}.

LLVM currently emits chains of CMP+CSET for each comparison, and then AND/ORR to combine the results of the comparisons. This can be replaced by a chain of CCMPs and a single CSET at the end.

For example.
(x1 < x2) && (x3 > x4) && (x5 != x6) && (x7 == x8)
generates

cmp     w2, w3
cset    w8, hi
cmp     w0, w1
cset    w9, lo
cmp     w4, w5
and     w8, w8, w9
cset    w9, ne
cmp     w6, w7
and     w8, w8, w9
cset    w9, eq
and     w0, w8, w9

but the more efficient code would be:

cmp w2, w3
ccmp w0, w1, #2, hi
ccmp w4, w5, #4, lo
ccmp w6, w7, #0, ne
cset w0, eq

This patch generalizes https://reviews.llvm.org/D118327 to cases where results of comparisons are ORRed together, and where the comparison is performed with CCMP instead of CMP/SUBS

Diff Detail

Event Timeline

Kmeakin created this revision.Feb 23 2022, 10:52 AM
Kmeakin requested review of this revision.Feb 23 2022, 10:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2022, 10:52 AM
Kmeakin retitled this revision from rebase ontop of David Green's patch to Optimize comparison chains on AArch64.Feb 23 2022, 11:00 AM
Kmeakin edited the summary of this revision. (Show Details)
Kmeakin removed a subscriber: hiraditya.
Kmeakin retitled this revision from Optimize comparison chains on AArch64 to [AArch64] Optimize comparison chains.Feb 23 2022, 11:04 AM
Kmeakin added a reviewer: dmgreen.
Kmeakin removed a subscriber: kristof.beyls.
Kmeakin edited the summary of this revision. (Show Details)Feb 23 2022, 11:06 AM

Do you have any cases where the existing And lowering wasn't already performing the folds that the new method does? I feel like it was already working OK, and it may be better to work on top of it as opposed to writing it from scratch. It doesn't seem to be needed for any of the tests, and this example from the commit message already looks OK: https://godbolt.org/z/Mdf8nj5ae. I'm not sure if it's worth sharing the method between And and Or, but this might be better focussing on the Or code.

This patch generalizes https://reviews.llvm.org/D118327 to cases where results of comparisons are ORRed together,

Cool, that's great to see. It looks like it will be very useful.

and where the comparison is performed with CCMP instead of CMP/SUBS

I think the existing PerformANDCSELCombine method already handled that. It only requires one of the two operands to be a SUBS, the other can be anything that sets flags, which it re-uses directly. The SUBS gets converted to a CCMP, the other flag-setting instruction is uses as-is as the input.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14101

AL and NV conditions are inverses of one another, but they shouldn't come up, as they would just always pick one of the operands without needing result of the comparison. You could probably change it to an assert, but they shouldn't be generated from anywhere in practice.

14251

I think this code was fine before. Checking for CSEL will inherently check that the type is legal..

Kmeakin updated this revision to Diff 412075.Mar 1 2022, 6:19 AM

Extend performANDORCSELCombine instead of replacing it

Thanks for the updates. Looks like a good patch, if we can clean up the details a little.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
14051

This is only used in one place? If so it might be simpler inline, or at least closer to the use.

14105–14106

Can you move this comment next to tryCombineToEXTR whilst you are here too?

14115

This can be removed?

14230

Leave this in, I would think?

llvm/test/CodeGen/AArch64/arm64-ccmp.ll
756–757

I think this comment can be removed now. The code looks fine.

llvm/test/CodeGen/AArch64/cmp-chains.ll
2

It can be good to pre-commit the tests, so just the differences get shown in the review. It makes it easier to see what changed.

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 2:59 AM
Kmeakin updated this revision to Diff 412435.Mar 2 2022, 8:29 AM

Inline IsBool, move comment next to tryCombineToEXTR, remove obsolete comments from tests

On a general note, I'm a bit concerned that we now have two different codepaths for generating ccmp during isel: emitConjunction, and performANDORCSELCombine. As far as I can tell, they're largely overlapping; emitConjunction is more general, and performANDORCSELCombine triggers in more cases, but they're both doing basically the same transform. Can we consolidate this code somehow?

dmgreen accepted this revision.Mar 3 2022, 12:17 AM

On a general note, I'm a bit concerned that we now have two different codepaths for generating ccmp during isel: emitConjunction, and performANDORCSELCombine. As far as I can tell, they're largely overlapping; emitConjunction is more general, and performANDORCSELCombine triggers in more cases, but they're both doing basically the same transform. Can we consolidate this code somehow?

Yep I was thinking of that when I ended up using emitConjunction from branches recently. I think that this method of peephole optimizing bit at a time is a more sensible way to go than trying to do things all at once. It's a more "ISel" way of doing things, and should capture more cases. But a few more folds might be needed before we get to the point that we don't need emitConjunction any more.

As far as this patch goes, it LGTM. Thanks for the updates.

This revision is now accepted and ready to land.Mar 3 2022, 12:17 AM
This revision was landed with ongoing or failed builds.Mar 4 2022, 7:11 AM
This revision was automatically updated to reflect the committed changes.