I recently discovered a performance regression in
test-suite/Multisource/Benchmarks/ptrdist/ks/ks.test solely due to
changing alignment. If you add 16 nops to the top of
FindMaxGpAndSwap, performance goes down by about 10%. If you then add a
32-byte alignment directive at the top of the first loop of the
function, the performance comes back.
Verified on both Haswell and Sandy Bridge.
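
For illustration, here is a minimal sketch of where such a directive
would sit, assuming GCC or Clang with GNU-style inline assembly.
sum_aligned is a hypothetical stand-in, not the actual loop in
FindMaxGpAndSwap, and the compiler may still place instructions between
the directive and the loop head, so the disassembly should be checked
before drawing conclusions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hot loop; the real loop body in ks is
   not reproduced here. */
long sum_aligned(const long *v, size_t n) {
    long total = 0;

    /* Assumption: GCC or Clang with a GNU-style assembler.
       .p2align 5 pads the instruction stream up to the next
       2^5 = 32-byte boundary. */
    __asm__ volatile(".p2align 5");

    for (size_t i = 0; i < n; ++i)
        total += v[i];

    return total;
}
```

The padding only changes code layout, not the result, so any
performance difference between builds with and without the directive
can be attributed to alignment.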