This is an archive of the discontinued LLVM Phabricator instance.

[X86] NFC: expand inline memcmp test coverage
ClosedPublic

Authored by davezarzycki on Oct 20 2019, 3:05 AM.

Details

Summary
  1. Adds SSE4.1 coverage.
  2. Adds prefer-256-bit or not coverage.
  3. Adds more power-of-two tests up to 512 bytes.
  4. Adds power-of-two-minus-one tests to verify overlapping loads.
  5. Adds power-of-two-plus-one-half tests (48, 96, 192, and 384).
  6. Adds greater-than/less-than tests from 16 to 512 bytes.

Diff Detail

Event Timeline

davezarzycki created this revision.Oct 20 2019, 3:05 AM

Please note: 48 byte and 96 byte comparisons have terrible code gen on AVX2 and later.

And now that I've had a chance to rebase D69044 ("up to four load pairs") on top of this updated test file, I can report that:

  1. The 48 and 96 byte memcmps do not improve for AVX2 or AVX512.
  2. The AVX1 code gen is relatively reasonable for 48 bytes: three XMM compares. It could have been one YMM compare and one zero extended XMM compare.

I think I figured out why 48 and 96 bytes are awful. It seems that lowering a vector that is the result of a zero extended scalar generates terrible code. Should combineVectorSizedSetCCEquality detect the zero extend and create an ISD::INSERT_SUBVECTOR node? Or should something more fundamental detect this scenario and create the ISD::INSERT_SUBVECTOR?

Ping. Is this ready to land? I also have a forthcoming patch for the 48 and 96 byte scenarios.

Ping. I think this is ready and is useful to other proposals like D69044 and D69157. Any objections?

xbolva00 accepted this revision.Oct 26 2019, 8:11 AM
xbolva00 added a subscriber: xbolva00.

Yeah, please precommit new testcases..

This revision is now accepted and ready to land.Oct 26 2019, 8:11 AM
craig.topper accepted this revision.Oct 26 2019, 10:18 AM

LGTM too

davezarzycki closed this revision.Oct 26 2019, 11:22 AM

0d0509384f054cb4f13260786ee48163ac94d123