The lowering patterns for X86ISD::VZEXT_MOVL for 128-bit to 256-bit vectors were just copying the lower xmm instead of actually masking off the first scalar using a xmm blend (we make use of the implicit zeroing of the upper ymm).
Fix for PR25320.
Differential D14151
[X86][AVX] Fix lowering of X86ISD::VZEXT_MOVL for 128-bit -> 256-bit extension RKSimon on Oct 28 2015, 9:04 AM. Authored by
Details The lowering patterns for X86ISD::VZEXT_MOVL for 128-bit to 256-bit vectors were just copying the lower xmm instead of actually masking off the first scalar using a xmm blend (we make use of the implicit zeroing of the upper ymm). Fix for PR25320.
Diff Detail
Event Timeline
Comment Actions Hi Simon,
Comment Actions I'll update this patch to just be a fix, with just the removed lines (and the altered tests) and then prepare a second patch that improves the code quality. We are missing a potentially big perf gain for when the upper half of a register can be implicitly zero'd with VEX encoded instructions - especially on 128-bit ALUs such as Jaguar and Sandy Bridge.
|