This is an archive of the discontinued LLVM Phabricator instance.

generate extract_subvector node to avoid disastrous shuffle vector codegen
AbandonedPublic

Authored by spatel on Dec 11 2014, 11:27 AM.

Details

Summary

This is a partial fix for PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). When we extract multiple consecutive elements from a vector to create a build_vector, we should try to form an extract_subvector instead of relying solely on getVectorShuffle().

The difference in output for the simplest v4f64 test case looks like this:

vextractf128	$1, %ymm0, %xmm0
vpermilpd	$1, %xmm0, %xmm1 ## xmm1 = xmm0[1,0]
vunpcklpd	%xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0],xmm1[0]
vmovapd	%xmm0, (%rdi)
vzeroupper
retq

Becomes:

vextractf128	$1, %ymm0, (%rdi)
vzeroupper
retq

We should still fix the shuffle problem in the x86 backend, but I thought it was best to solve the higher-level problem first. There's also a bug in the x86 backend dealing with arbitrary indexing and lowering the EXTRACT_SUBVECTOR node, so I've limited this patch to firing on the (most common?) case of half-vector extractions. This pattern emerges in particular on SandyBridge because it cracks 32-byte memops in half causing mismatches in vector sizes.

Diff Detail

Event Timeline

spatel updated this revision to Diff 17183.Dec 11 2014, 11:27 AM
spatel retitled this revision from to generate extract_subvector node to avoid disastrous shuffle vector codegen.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: mkuper, chandlerc, andreadb, hfinkel.
spatel added a subscriber: Unknown Object (MLST).
mkuper edited edge metadata.Dec 16 2014, 2:50 AM

I looked at PR15872 yesterday, and ended up with something very similar, although more general.
Too bad I didn't really look at this review before starting. :-\

Then again, maybe I ended up too general, although I don't think I'm running into the miscompile.
Will add you to the review for that in a few minutes, but I'm fine with either version, or some merge of them, as long as it solves both PRs (and I'd still like someone more experienced with this to give a LGTM :-) ).

spatel abandoned this revision.Dec 16 2014, 10:50 AM

Abandoning patch in favor of D6678.