
[X86, AVX] improve insertion into zero element of 256-bit vector

Authored by spatel on Mar 25 2015, 10:06 AM.



This patch allows AVX blend instructions to handle insertion into the low element of a 256-bit vector for the appropriate data types.

For f32, instead of:

vblendps	$1, %xmm1, %xmm0, %xmm1 ## xmm1 = xmm1[0],xmm0[1,2,3]
vblendps	$15, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1,2,3],ymm0[4,5,6,7]

we get:

vblendps	$1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3,4,5,6,7]

For f64, instead of:

vmovsd	%xmm1, %xmm0, %xmm1     ## xmm1 = xmm1[0],xmm0[1]
vblendpd	$3, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0,1],ymm0[2,3]

we get:

vblendpd	$1, %ymm1, %ymm0, %ymm0 ## ymm0 = ymm1[0],ymm0[1,2,3]
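The equivalence of the old and new sequences above can be checked with a small lane-level model. This is only a sketch, not part of the patch: the helper names (`blend`, `old_f32`, `new_f32`, `old_f64`, `new_f64`) are hypothetical, and registers are modelled as plain Python lists. It assumes the AT&T operand order used in the listings, where a set bit i in the immediate selects lane i from the first source operand.

```python
def blend(imm, src_a, src_b):
    """Model vblendps/vblendpd as written in AT&T syntax:
    `vblendp? $imm, src_a, src_b, dst`.
    Bit i of imm set -> lane i comes from src_a, otherwise from src_b."""
    assert len(src_a) == len(src_b)
    return [a if (imm >> i) & 1 else b
            for i, (a, b) in enumerate(zip(src_a, src_b))]

def old_f32(ymm0, ymm1):
    # vblendps $1, %xmm1, %xmm0, %xmm1 (128-bit: low four lanes only)
    xmm1 = blend(0b0001, ymm1[:4], ymm0[:4])
    # A VEX-encoded 128-bit write zeroes the upper half of the ymm register.
    ymm1 = xmm1 + [0.0] * 4
    # vblendps $15, %ymm1, %ymm0, %ymm0
    return blend(0b1111, ymm1, ymm0)

def new_f32(ymm0, ymm1):
    # vblendps $1, %ymm1, %ymm0, %ymm0
    return blend(0b0001, ymm1, ymm0)

def old_f64(ymm0, ymm1):
    # vmovsd %xmm1, %xmm0, %xmm1: low lane from xmm1, high lane from xmm0
    xmm1 = [ymm1[0], ymm0[1]]
    ymm1 = xmm1 + [0.0] * 2  # upper half zeroed by the VEX 128-bit write
    # vblendpd $3, %ymm1, %ymm0, %ymm0
    return blend(0b0011, ymm1, ymm0)

def new_f64(ymm0, ymm1):
    # vblendpd $1, %ymm1, %ymm0, %ymm0
    return blend(0b0001, ymm1, ymm0)

# Illustrative values: ymm0 is the destination vector, ymm1 holds the
# scalar to insert in lane 0 (its other lanes are arbitrary).
v8 = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
s8 = [99.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0]
v4 = [20.0, 21.0, 22.0, 23.0]
s4 = [77.0, -1.0, -2.0, -3.0]

f32_match = old_f32(v8, s8) == new_f32(v8, s8)
f64_match = old_f64(v4, s4) == new_f64(v4, s4)
```

In this model both pairs produce the same lanes: only element 0 is taken from the scalar register, which is why a single 256-bit blend with immediate 1 is sufficient.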

For the hardware-neglected integer data types, I left a TODO comment in the code and added regression tests for a follow-on patch.

Event Timeline

spatel updated this revision to Diff 22655. Mar 25 2015, 10:06 AM
spatel retitled this revision to [X86, AVX] improve insertion into zero element of 256-bit vector.
spatel updated this object.
spatel edited the test plan for this revision.
spatel added reviewers: andreadb, qcolombet, RKSimon.
spatel added a subscriber: Unknown Object (MLST).
nadav added a subscriber: nadav. Mar 25 2015, 10:12 AM

This looks good to me.

This revision was automatically updated to reflect the committed changes.

Thanks, Nadav! Checked in at r233199.