On 32-bit targets without popcnt, we currently expand 64-bit popcnt to sequences of arithmetic and logic ops for each 32-bit half and then add the results for the two halves together. If we have xmm registers, we can use those to implement the operation instead. This results in fewer instructions than doing two separate 32-bit popcnt sequences.
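As a rough illustration of why the wider register helps, here is a minimal scalar sketch in portable C++. It is not the code from this patch and not the instruction sequence the backend emits. The function names (`popcount32`, `popcount64_split`, `popcount64_wide`, `checkAgree`) are hypothetical, and the 64-bit integer only stands in for one 64-bit lane of an xmm register; the real vector sequence may sum the bytes differently.

```cpp
#include <cassert>
#include <cstdint>

// Bit-parallel popcount of one 32-bit word (the usual arithmetic/logic expansion).
static uint32_t popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555u);                 // 2-bit partial sums
    v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u); // 4-bit partial sums
    v = (v + (v >> 4)) & 0x0F0F0F0Fu;                 // 8-bit partial sums
    return (v * 0x01010101u) >> 24;                   // add the four bytes
}

// Models the current expansion: run the sequence once per 32-bit half,
// then add the two results.
static uint32_t popcount64_split(uint64_t x) {
    uint32_t lo = static_cast<uint32_t>(x);
    uint32_t hi = static_cast<uint32_t>(x >> 32);
    return popcount32(lo) + popcount32(hi);
}

// Models the wide approach: one pass of the same steps over all 64 bits.
// (Assumption: a 64-bit integer stands in for a 64-bit vector lane.)
static uint32_t popcount64_wide(uint64_t v) {
    v = v - ((v >> 1) & 0x5555555555555555ull);
    v = (v & 0x3333333333333333ull) + ((v >> 2) & 0x3333333333333333ull);
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return static_cast<uint32_t>((v * 0x0101010101010101ull) >> 56); // add the eight bytes
}

// Sanity helper: both formulations must agree on any input.
static void checkAgree(uint64_t x) {
    assert(popcount64_split(x) == popcount64_wide(x));
}
```

The split version runs every mask-and-shift step twice and needs a final add, while the wide version runs each step once. That is the intuition behind the instruction-count saving described above.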
Details
Diff Detail
Event Timeline
LGTM
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
26715 | Clearer to make this use the constant opcode getNode(ISD::CTPOP...) instead of using N->getOpcode() again. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
26715 | Yeah, I'll change that. When I wrote it I was thinking we might want to do this for cttz and ctlz too, but the expansions of those still use bsr/bsf or lzcnt/tzcnt, so the vector version is probably worse. |