(ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0)
Adjust the legality check to avoid the poor codegen on AArch64. We probably only want to use popcount on this pattern when it is a single instruction.
This should fix issue #57225
Differential D132237
[SDAG] expand is-power-of-2 pattern that uses popcount spatel on Aug 19 2022, 8:20 AM. Authored by
Details (ctpop x) == 1 --> (x != 0) && ((x & x-1) == 0) Adjust the legality check to avoid the poor codegen on AArch64. We probably only want to use popcount on this pattern when it is a single instruction. This should fix issue #57225
Diff Detail
Event Timeline
Comment Actions Patch updated: Comment Actions I had almost written something into 57225 that said, effectively, custom isn't a very good costmodel as it can mean so many things, so adding a target hook sounds like a good way forward. This sounds good for AArch64. Could there be cases where the CTPOP (+setcc) is custom lowered but cheaper than the Add+And+setcc+setcc+and/or? My guess would probably be no, it would either be legal or quite expensive. Comment Actions It's a fuzzy/general problem - we generally assume that a legal op is as cheap as it gets (usually one instruction), and a custom op is not as cheap (some sequence of legal instructions), but there's no guarantee on that. I haven't seen any examples where this pattern is going wrong yet (after this patch), but it is complicated - @davezarzycki was looking at a large/messy set of potential x86 vector ISA trade-offs. As noted earlier, the existing TLI hook doesn't quite fit the usage we want here. We could alter it to return a cost of 0 or something like that, but I thought that muddied the definition for the intended pattern (where we're expanding in a loop). Comment Actions I agree, this looks like a good improvement. It might be worth adding a i32 version of the popcount compare tests. Otherwise LGTM |