This is an archive of the discontinued LLVM Phabricator instance.

[X86] Prefer KORTEST on Knights Landing or later for memcmp()
ClosedPublic

Authored by davezarzycki on Oct 17 2019, 9:24 PM.

Details

Summary

PTEST and especially the MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting.

KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0

Instead of:

KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0

Diff Detail

Event Timeline

davezarzycki created this revision.Oct 17 2019, 9:24 PM
craig.topper added inline comments.Oct 17 2019, 10:07 PM
lib/Target/X86/X86ISelLowering.cpp
42624

WidenVector isn't officially zeroing the upper bits. It's inserting into an undef vector. The assembly for the test cases is coming out correctly, but I think we really need to explicitly put 0s in the upper bits in the DAG.

No longer leave the upper vector register as UNDEF.

Rebased the patch after r375215

Ping. I think this is ready to land. What am I missing? Thanks!

Sorry for the delay a lot of us have been traveling and at the developers conference.

lib/Target/X86/X86ISelLowering.cpp
42612

This isn't ScalarToVector. It's Vector to wider Vector right?

42620

You can just use getIntPtrConstant here. VecIdxVT will return pointer type. That's consistent with other insert_subvectors.

lib/Target/X86/X86TargetTransformInfo.h
88

I think this should be with the CodeGen control options. The FeaturePrefer128Bit/256Bit were special because they are properties of the CPUs and they can be implied by a function attribute.

I can't explain why SlowUAMem32 and SlowUAMem16 are in separate sections....

davezarzycki marked 2 inline comments as done.Oct 24 2019, 4:20 AM

Thanks for getting back to me. This isn't urgent so please enjoy the conference!

lib/Target/X86/X86ISelLowering.cpp
42612

The memcmp expansion creates large scalar values and that are normally bitcast to a vector with this closure. In the case of Xeon Phi, it may also widen the vector too. If you have a better name for the closure, I'll happily rename it.

lib/Target/X86/X86TargetTransformInfo.h
88

Interesting. I was seriously considering naming this "SlowPTESTAndMOVMSK" to be consistent with the "Fast" and "Slow" pattern for fast/slow instructions. I can make this a CodeGen control option if you want, but please help me understand why slow PTEST/MOVMSK instructions (a.k.a. "prefer mask registers") is different than the other slow feature flags. Thanks!

craig.topper added inline comments.Oct 24 2019, 10:48 AM
lib/Target/X86/X86ISelLowering.cpp
42612

I missed that the bitcast is in here too. So it is scalar to vector. Sorry about that.

lib/Target/X86/X86TargetTransformInfo.h
88

Its not different that's why I wanted it grouped with the Fast/Slow flags.

Updated to address the last round of feedback. Also, this is rebased on top of D69222 so that we can see more of the net code gen changes.

This revision is now accepted and ready to land.Oct 26 2019, 10:19 AM
davezarzycki closed this revision.Oct 26 2019, 11:23 AM

11c920207afa92ad13fdf72daba14c9af336293a