This is an archive of the discontinued LLVM Phabricator instance.

[X86] Use xmm registers to implement 64-bit popcnt on 32-bit targets if possible if popcnt instruction is not available
ClosedPublic

Authored by craig.topper on Mar 21 2019, 12:13 PM.

Details

Summary

On 32-bit targets without popcnt, we currently expand 64-bit popcnt to sequences of arithmetic and logic ops for each 32-bit half and then add the 32-bit halves together. If we have xmm registers we can use those to implement the operation instead. This results in fewer instructions than doing two separate 32-bit popcnt sequences.
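As a rough illustration of the trade-off described above, here is a minimal sketch in plain scalar C++. It is not the SelectionDAG code from this patch and not the instruction sequence LLVM emits. The function names (popcount32, popcount64_split, popcount64_wide) are hypothetical. The split form mirrors the "two 32-bit sequences plus an add" expansion. The wide form only shows the idea of a single pass over all 64 bits, which a 64-bit-wide register makes possible.

```cpp
#include <cassert>
#include <cstdint>

// Classic shift-and-mask population count for one 32-bit value.
inline std::uint32_t popcount32(std::uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);                  // 2-bit partial sums
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);  // 4-bit partial sums
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;                  // 8-bit partial sums
  return (v * 0x01010101u) >> 24;                    // add the four bytes
}

// Split form: run the 32-bit sequence on each half, then add the results.
inline std::uint32_t popcount64_split(std::uint64_t v) {
  return popcount32(static_cast<std::uint32_t>(v)) +
         popcount32(static_cast<std::uint32_t>(v >> 32));
}

// Wide form: the same steps applied once to the full 64-bit value.
inline std::uint32_t popcount64_wide(std::uint64_t v) {
  v = v - ((v >> 1) & 0x5555555555555555ull);
  v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
  v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0Full;
  return static_cast<std::uint32_t>((v * 0x0101010101010101ull) >> 56);
}

// Small self-check: both forms must agree on the same input.
inline void check_forms_agree(std::uint64_t v) {
  assert(popcount64_split(v) == popcount64_wide(v));
}
```

The split form repeats every mask-and-shift step twice and needs a final add, while the wide form performs each step once. That is the intuition behind the instruction-count saving claimed in the summary.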

Event Timeline

craig.topper created this revision. Mar 21 2019, 12:13 PM
Herald added a project: Restricted Project. Mar 21 2019, 12:13 PM
Herald added a subscriber: hiraditya.

Make sure that NoImplicitFloat is not set before doing the transform.

spatel accepted this revision. Mar 22 2019, 1:09 PM

LGTM

llvm/lib/Target/X86/X86ISelLowering.cpp
26716

Clearer to make this use the constant opcode getNode(ISD::CTPOP...) instead of using N->getOpcode() again.

This revision is now accepted and ready to land. Mar 22 2019, 1:09 PM
craig.topper marked an inline comment as done. Mar 22 2019, 1:31 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
26716

Yeah, I'll change that. When I wrote it I was thinking we might want to do this for cttz and ctlz too, but their expansions still use bsr/bsf or lzcnt/tzcnt, so the vector version is probably worse.

This revision was automatically updated to reflect the committed changes.