This is an archive of the discontinued LLVM Phabricator instance.

Recognize pattern for ctpop in instcombine
AbandonedPublic

Authored by kparzysz on Dec 22 2015, 9:19 AM.

Details

Reviewers
majnemer
Summary

Recognize the pattern for population count in the instruction combiner and replace it with a call to the corresponding intrinsic.
The pattern is based on that in "Hacker's Delight".

Diff Detail

Repository
rL LLVM

Event Timeline

kparzysz updated this revision to Diff 43452.Dec 22 2015, 9:19 AM
kparzysz retitled this revision from to Recognize patterns for ctpop and ctlz in instcombine.
kparzysz updated this object.
kparzysz added a reviewer: majnemer.
kparzysz set the repository for this revision to rL LLVM.
kparzysz added a subscriber: llvm-commits.
kparzysz updated this revision to Diff 43453.Dec 22 2015, 9:27 AM

Removed all the unnecessary (or unwanted) options from the function attributes in the unit tests.

majnemer edited edge metadata.Dec 22 2015, 10:25 AM

Unless I am mistaken, the ctpop and ctlz pieces of this code are independent of one another. Please split the patch into two patches, one for each transform.

lib/Transforms/InstCombine/InstCombineAddSub.cpp
1181–1183

Seeing as how this is a rather rare occurrence for adds, can you sort this to the bottom? The Simplify... naming convention implies that no new instruction needs to be created which is not the case for this transform. I'd call it optimizeCtpopIdiom or something similar.

test/Transforms/InstCombine/ctpop-match.ll
57

Please CHECK a little more thoroughly, it would be nice to see the call to ctpop consume the argument, etc.

95–97

This test case doesn't look reduced.

They are not completely independent. The pattern for ctlz relies on having the call to Intrinsic::ctpop already in the code.

kparzysz updated this revision to Diff 43461.Dec 22 2015, 10:56 AM
kparzysz edited edge metadata.

Addresses all the comments except for splitting the patch into two. The ctpop part does not depend on ctlz, but ctlz utilizes ctpop.

kparzysz marked 3 inline comments as done.Dec 22 2015, 10:57 AM

Marked comments as "done".

kparzysz updated this revision to Diff 43465.Dec 22 2015, 11:00 AM

Restored a blank line accidentally deleted from visitSub.

Are you ok with keeping both parts together, or would you still prefer to have them separated?

From previous comments it looks like this patch should just contain the ctpop patterns and a followup patch adds the ctlz work.

An additional followup could be to recognise the pattern cttz(x) = ctpop((x & -x)- 1)

Additionally, is there any chance that this could be tweaked to detect the same ctpop / lzcnt patterns for vector types as well?

Yes, and I asked if it's still ok for these two parts to go together. I can split it if there is a strong preference for that.

I haven't thought about the vector versions and I haven't seen any code that does this for vectors, so I'm not sure what patterns to look for. I don't know if the pattern for scalars would also apply to vectors.

I can add cttz.

kparzysz updated this revision to Diff 45393.Jan 20 2016, 8:17 AM
kparzysz retitled this revision from Recognize patterns for ctpop and ctlz in instcombine to Recognize pattern for ctpop in instcombine.
kparzysz updated this object.
hfinkel added a subscriber: hfinkel.Feb 2 2016, 7:59 PM
hfinkel added inline comments.
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1058

I think it would be better if the comment above matchCtpopW explained what matchCtpopW does, and a comment above optimizeToCtpop explained what optimizeToCtpop does. The:

// ... If BW is less than the bit-width of the type of V, then
// BI == ctpop(V) if the bits of V beyond BW are zero.

is a bit confusing here because it is describing behavior implemented in a different function.

1156

Instead of checking for known zero bits, why not transform this into a ctpop(v & mask)? If the upper bits are known to be zero, then the mask will go away in the next instcombine iteration regardless.

Does the rest of the optimization pipeline benefit from this canonicalization? If not, I would recommend moving this to CGP.

To answer David's question: the main reason why it's here is that after the instruction combiner, the code is transformed in a way that makes the detection of the popcnt pattern more difficult. At this stage, there is some regularity in the IR that can be exploited to check for popcnt patterns of varying widths.
As far as downstream optimizations are involved---this transformation will reduce code size by quite a bit, and any optimization that can be limited by code size could potentially be enabled by this change.

kparzysz marked an inline comment as done.Feb 3 2016, 7:38 AM
kparzysz added inline comments.
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1156

Because then the pattern is not really computing the population count. The instructions in the pattern operate on the whole value, including the high bits, and these bits contribute to the final result.

kparzysz updated this revision to Diff 46785.Feb 3 2016, 7:40 AM

Updated comments to reflect Hal's remarks.

To answer David's question: the main reason why it's here is that after the instruction combiner, the code is transformed in a way that makes the detection of the popcnt pattern more difficult. At this stage, there is some regularity in the IR that can be exploited to check for popcnt patterns of varying widths.
As far as downstream optimizations are involved---this transformation will reduce code size by quite a bit, and any optimization that can be limited by code size could potentially be enabled by this change.

I am concerned that you will slow down InstCombine by a considerable margin by checking for PopCnt. Most of the time, the work done to perform this optimization will not be fruitful.

We've recently been through a similar odyssey for bswap/bitreverse (see r257875). We managed to move that logic to CGP without too much drama.

kparzysz abandoned this revision.Mar 3 2016, 2:29 PM

Will implement this in another location (most likely CodeGenPrepare).