This is an archive of the discontinued LLVM Phabricator instance.

[MathExtras] Add intrinsic for the PopulationCounter in Visual Studio
AbandonedPublic

Authored by ekatz on Oct 23 2019, 4:47 AM.

Details

Summary

The Visual Studio popcnt and popcnt64 intrinsics should be used the same way the GNU __builtin_popcount is already used.

Diff Detail

Event Timeline

ekatz created this revision.Oct 23 2019, 4:47 AM
craig.topper added a comment.EditedOct 23 2019, 3:18 PM

I'm pretty sure MSVC will always turn this into a popcnt instruction on X86. It has no fall back because MSVC doesn't have a concept of a target CPU. So this would only work on a Nehalem or newer CPU that implements the instruction.

ekatz added a comment.Oct 28 2019, 3:15 PM

I'm pretty sure MSVC will always turn this into a popcnt instruction on X86. It has no fall back because MSVC doesn't have a concept of a target CPU. So this would only work on a Nehalem or newer CPU that implements the instruction.

Actually, it seems MSVC never does that optimization. Not on 32-bit or 64-bit.
Regarding the supported CPU issue; Nehalem is over a decade old. How far back do we want to support? I think it is reasonable to assume popcnt is always available.

I'm pretty sure MSVC will always turn this into a popcnt instruction on X86. It has no fall back because MSVC doesn't have a concept of a target CPU. So this would only work on a Nehalem or newer CPU that implements the instruction.

Actually, it seems MSVC never does that optimization. Not on 32-bit or 64-bit.

Never does what optimization?

Regarding the supported CPU issue; Nehalem is over a decade old. How far back do we want to support? I think it is reasonable to assume popcnt is always ava

Never does what optimization?

Sorry, I though you meant that MSVC will automatically convert the popcnt bit twiddling pattern to the popcnt intrinsic. This optimization is done in LLVM, but not in MSVC.

In any case, as I wrote:

Regarding the supported CPU issue; Nehalem is over a decade old. How far back do we want to support? I think it is reasonable to assume popcnt is always available.

RKSimon added a subscriber: RKSimon.

GCC's __builtin_popcnt guarantees fallback to generic code if the cpu doesn't support a CTPOP-style instruction, I'm not sure how many people build clang with -march=native (or whatever) but I'd expect most people to just build for generic x86_64, which means we're probably executing a slow generic path anyhow.

The MSVC popcnt intrinsics always generates a x86 POPCNT instruction and we can't assume that this is available - its annoying but I can guarantee that somebody out there is building/running on an old box (or VM) that will fail.

As Craig mentioned MSVC is awful at building with target cpu defines - its gotten better and defines AVX__/AVX2__ etc. more recently but that's a poor wrapper for checking whether POPCNT can be used.

Abandon this? As I explained, MSVC doesn't provide a guaranteed fallback for targets without the POPCNT instruction.

I thought I might add a dynamic detection to the CMake files, as it already does that for headers and other supported functionalities. Didn't find the time for it, as it is a pretty low priority. But, what do you think?

TBH I don't think this is worth it unless you can demonstrate actual reductions in compile time - once C++20 lands and we're allowed to use its features we can move to using std::popcount and avoid all of this.

ekatz abandoned this revision.Jan 14 2020, 9:48 AM

I agree. This seems like too much of a hassle. I'll abandon this patch.