When CTPOP is not legal, emulating it can be quite expensive. In D89952, a TLI hook was added to allow targets to tune how expensive CTPOP emulation is. This patch attempts to do basic/easy tuning. More refinement is certainly possible, especially when a legal CTPOP is available but with the "wrong" type/size.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Fixed the malformed diff/patch.
Also, PPC reviewers would be appreciated. Thanks for any recommendations.
Adding more potential reviewers.
This is 2 independent patches in 1 proposal, so you may want to split it to avoid waiting on approval for both targets.
I made some general x86 comments, but someone that knows AVX512 better should have a look at those test diffs in particular.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
5347 | We should have a code comment here to describe approximately what the difference in x86 asm will be with these settings. As-is { 2, 4, 6, 10, 14 } is a set of magic numbers. Need to explain why a particular data type (i8, etc) is treated differently than other types. | |
llvm/test/CodeGen/X86/vector-popcnt-128-ult-ugt.ll | ||
473–484 | Are we saying that the longer sequence is better because it avoids the constant loads and/or potentially expensive vpshufb's? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
5335 | How are we able to make that assertion? I don't see anything in the caller that filters out scalar types. We probably need at least 1 scalar test as a sanity check if the plan is that this optimization is vector-only for x86. |