This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] expand is-power-of-2 pattern that uses popcount
ClosedPublic

Authored by spatel on Aug 19 2022, 8:20 AM.

Details

Summary

(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)

Adjust the legality check to avoid the poor codegen on AArch64. We probably only want to use popcount on this pattern when it is a single instruction.

This should fix issue #57225

Diff Detail

Event Timeline

spatel created this revision.Aug 19 2022, 8:20 AM
spatel requested review of this revision.Aug 19 2022, 8:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2022, 8:20 AM
davezarzycki added inline comments.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18593–18594

For your consideration:

(ctpop x) <= 1 --> (x & x-1) == 0
(ctpop x) > 1 --> (x & x-1) != 0

spatel added inline comments.Aug 21 2022, 6:03 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18593–18594

Ah, I forgot that we added a popcount target hook with D89952. Do you think we should use that for this pattern too? Or did you decide it wasn't worth pursuing (D91093)?

I'm looking back at the comments in:
https://bugs.llvm.org/show_bug.cgi?id=47825
...and we cited AArch64 scalar custom lowering/combining as a missing piece. But we could adapt the TLI hook if that seems better.

davezarzycki added inline comments.Aug 22 2022, 4:34 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18593–18594

I think the D91093 work is good, but it stopped being a good use of my time to pursue. (Sorry)

Nevertheless the target hook is there. This seems like a reasonable use of it, no?

spatel added inline comments.Aug 22 2022, 6:05 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18593–18594

Reviewing the earlier versions of D89952 - it doesn't seem like this pattern really fits the intent of the TLI hook?

This pattern will always expand to exactly one subtract+mask (+null check), so it's always going to be cheap based on the cost metric.

But we also have a legality check preceding both of these blocks. We can clean this up a bit and still have a smaller patch that gets us the expected results, so I'll post that alternative and see if it looks better.

spatel updated this revision to Diff 454479.Aug 22 2022, 6:18 AM

Patch updated:
Adjust the legality checks instead of adding custom combining for AArch64.
This preserves x86 codegen for these patterns by keeping the hacky vector type check in place (without that we'd create a likely slower scalar "popcnt+cmp+set" instead of an "lea+test+set".

spatel retitled this revision from [AArch64] expand is-power-of-2 pattern that uses popcount to [SDAG] expand is-power-of-2 pattern that uses popcount.Aug 22 2022, 6:22 AM
spatel edited the summary of this revision. (Show Details)

I had almost written something into 57225 that said, effectively, custom isn't a very good costmodel as it can mean so many things, so adding a target hook sounds like a good way forward.

This sounds good for AArch64. Could there be cases where the CTPOP (+setcc) is custom lowered but cheaper than the Add+And+setcc+setcc+and/or? My guess would probably be no, it would either be legal or quite expensive.

I had almost written something into 57225 that said, effectively, custom isn't a very good costmodel as it can mean so many things, so adding a target hook sounds like a good way forward.

This sounds good for AArch64. Could there be cases where the CTPOP (+setcc) is custom lowered but cheaper than the Add+And+setcc+setcc+and/or? My guess would probably be no, it would either be legal or quite expensive.

It's a fuzzy/general problem - we generally assume that a legal op is as cheap as it gets (usually one instruction), and a custom op is not as cheap (some sequence of legal instructions), but there's no guarantee on that.

I haven't seen any examples where this pattern is going wrong yet (after this patch), but it is complicated - @davezarzycki was looking at a large/messy set of potential x86 vector ISA trade-offs.

As noted earlier, the existing TLI hook doesn't quite fit the usage we want here. We could alter it to return a cost of 0 or something like that, but I thought that muddied the definition for the intended pattern (where we're expanding in a loop).

dmgreen accepted this revision.Aug 23 2022, 11:21 AM

I agree, this looks like a good improvement. It might be worth adding a i32 version of the popcount compare tests. Otherwise LGTM

This revision is now accepted and ready to land.Aug 23 2022, 11:21 AM