This replaces the build_vector lowering code that was just added in D80013 and matches the pattern later from the x86-specific "vzext_movl". That seems to result in the same or better improvements and gets rid of the 'TODO' items from that patch.
AFAICT, we always shrink wider constant vectors to 128-bit on these patterns, so we still get the implicit zero-extension to ymm/zmm without wasting space on larger vector constants. There's a trade-off there because that means we miss potential load-folding.
Similarly, we could load scalar constants here with implicit zero-extension even to 128-bit. That saves constant space, but it means we forego load-folding, and so it increases register pressure. This seems like a good middle-ground between those 2 options?
Would creating X86ISD::VZEXT_LOAD be better? We'd need to zext i8/i16 cases to i32 constant pool entries but we'd avoid the upper zero constants.