[x86] avoid vector load narrowing with extracted store uses (PR42305)
This is an exception to the rule that we should prefer xmm ops to ymm ops.
As shown in PR42305:
...the store folding opportunity with vextractf128 may result in better
perf by reducing the instruction count.
Differential Revision: https://reviews.llvm.org/D63517