This is an archive of the discontinued LLVM Phabricator instance.

[LegalizeVectors] Improve vector CTPOP expansion
AbandonedPublic

Authored by bruno on May 25 2015, 11:44 AM.

Details

Summary

This patch is a follow up from vector CTPOP work started in http://reviews.llvm.org/D6531

It modifies current target independent vector CTPOP expansion to implement a parallel version of the algorithm presented in http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel

A new TLI hook is provided to let the target decide for a vector type whether it should use the unrolled CTPOP expansion or the algorithm implemented in this patch. This is specially useful for x86 where unrolling, parallel bitmath and custom lowering dispute the better performance depending on the type. It looks like this can benefit other target as well. PowerPC folks, maybe this could show gains for vector types pre-POWER8?

The patch depends upon http://reviews.llvm.org/D6531 to be applied first so that the tests can run smoothly.

Diff Detail

Repository
rL LLVM

Event Timeline

bruno updated this revision to Diff 26464.May 25 2015, 11:44 AM
bruno retitled this revision from to [LegalizeVectors] Improve vector CTPOP expansion.
bruno updated this object.
bruno edited the test plan for this revision. (Show Details)
bruno added reviewers: chandlerc, hfinkel, nadav, delena.
bruno set the repository for this revision to rL LLVM.
bruno added subscribers: Unknown Object (MLST), kbarton, wschmidt.
wschmidt added a comment.EditedMay 27 2015, 9:43 AM

Yes, for Power prior to P8, this would be an improvement. Since earlier processors have no quick path between vector registers and GPRs, scalarization requires two trips through memory, each of which will incur the evil load-hit-store stall. So even though we have scalar popcntb, popcntw, and popcntd, we would prefer not to use them if we can keep the computation in VRs. Once your patch is approved, we should likely override with "bool PPCTargetLowering::isCheapToUnrollVectorPopCount(EVT) const" that just returns false for all types, and add test cases. We should test the performance to be certain, but I'm confident this would be a win.

(For P8, CTPOP is legal for all vector types, so IIUC we wouldn't go through this logic.)

hfinkel added inline comments.May 27 2015, 11:44 AM
include/llvm/Target/TargetLowering.h
280

This highlights an unfortunate terminology collision that we really should fix. Essentially everywhere in LLVM, "Unroll" refers to something done to loops (not vectors). The operation referred to here is called 'Scalarize', not only in TTI and its users, but also all over CodeGen too, and I think that's a better name (we're breaking the vector apart into scalar operations, and then rebuilding a vector from the results).

I'd like to not make the problem worse with this patch: Please call the operation Scalarize, not Unroll. Fixing the other few places in CodeGen (just the users of DAG.UnrollVectorOp in practice) would also be a nice follow-up.

bruno abandoned this revision.Apr 13 2017, 11:28 AM

I don't have plans to work on this any time soon. Abandoning.