On affected Intel CPUs, popcnt has a false dependency on its destination register dest: the instruction only writes dest, but the CPU treats it as an input, so popcnt waits until dest is ready before executing.
More details:
http://stackoverflow.com/questions/25078285/replacing-a-32-bit-loop-count-variable-with-64-bit-introduces-crazy-performance
Workaround?
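One workaround discussed in the linked question is to break the dependency by zeroing the destination register right before the popcnt. The sketch below assumes GCC or Clang on x86-64; the names popcnt_nodep and count_bits are hypothetical, chosen only for illustration. The inline-assembly path is used only when the compiler is told popcnt is available (for example with -mpopcnt); otherwise a portable bit-clearing loop is used so the code still runs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: popcount with the false dependency broken. */
static inline uint64_t popcnt_nodep(uint64_t x)
{
#if defined(__GNUC__) && defined(__x86_64__) && defined(__POPCNT__)
    uint64_t r;
    /* xor r32, r32 is a zeroing idiom: it gives the destination a fresh,
     * ready value, so popcnt no longer waits on whatever wrote it before.
     * "=&r" (early clobber) keeps r in a different register than x. */
    __asm__("xorl %k0, %k0\n\t"
            "popcntq %1, %0"
            : "=&r"(r)
            : "r"(x)
            : "cc");
    return r;
#else
    /* Portable fallback (assumption: no popcnt available at compile time).
     * Clears the lowest set bit once per iteration. */
    uint64_t c = 0;
    while (x) {
        x &= x - 1;
        c++;
    }
    return c;
#endif
}

/* Hypothetical usage: total number of set bits in a buffer. */
static uint64_t count_bits(const uint64_t *buf, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += popcnt_nodep(buf[i]);
    return total;
}
```

Build with -mpopcnt or -march=native to exercise the assembly path. Whether the extra xor actually helps depends on the CPU and on the code the compiler generates around it, so measure before and after.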