PTEST and especially the MOVMSK instructions are slow on Knights Landing or later. As a bonus, this patch increases instruction parallelism by emitting.
KORTEST(PCMPNEQ(a, b), PCMPNEQ(c, d)) == 0
Instead of:
KORTEST(AND(PCMPEQ(a, b), PCMPEQ(c, d))) == ~0
This isn't ScalarToVector. It's Vector to wider Vector right?