This is an archive of the discontinued LLVM Phabricator instance.

[X86] Enable AVX512BW for memcmp
ClosedPublic

Authored by davezarzycki on Oct 4 2019, 4:34 AM.

Details

Reviewers
craig.topper

Diff Detail

Event Timeline

davezarzycki created this revision.Oct 4 2019, 4:34 AM
craig.topper added inline comments.Oct 4 2019, 9:58 AM
test/CodeGen/X86/memcmp.ll
1015 ↗(On Diff #223186)

I'm not sure this is better than the avx code. vpcmpeq to mask register and kortest are both 3 cycle latency. The vpcmpb to xmm is 1 cyc. vpmovmskb is 1 cyc and cmpl is 1 cyc.

That's a reasonable point. From my perspective:

  1. Staying in the opmask registers as long as possible relieves general purpose register pressure.
  2. When/if MaxLoadsPerMemcmp becomes greater than 2, then keeping the intermediate results in the opmask registers is the right thing to do.
  3. I have a pending patch to increase MaxLoadsPerMemcmp to 4 if-and-only-if a compare against zero is happening. Changing the value to 8 (if-and-only-if a compare against zero is happening) will require more work (I think).
  4. In the case of 512-bit ops, using AVX512BW instead of just AVX512F better matches people's mental model (that byte-wise memcmp() should generate byte-wise vector instructions).

What do you think?

I'm fine with the 512-bit change to use byte compare with AVX512BW.

For 128/256, I wonder about the PTEST suggestion in the FIXME above this code. Unfortunately PTEST has 3 cycle latency on recent Intel CPUs. But it would avoid the GPR use entirely.

davezarzycki retitled this revision from [X86] Enable AVX512BW and AVX512VL for memcmp to [X86] Enable AVX512BW for memcmp.

Simplified to just enabling AVX512BW for 512 and let 128/256 keep using AVX/AVX2.

PS – I trust your knowledge of instruction latencies. I wonder what Agner Fog did wrong if he measured PMOVMSKB as having 2-3 cycles of latency.

I think I just misremembered MOVMSKB latency. I think it is 2-3. I should have looked it up. I remembered from something previously that VPCMPEQ/KORTEST were both 3 cycs.

Is this patch ready to land then?

This revision is now accepted and ready to land.Oct 5 2019, 9:33 AM