This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets
ClosedPublic

Authored by RKSimon on Nov 30 2014, 1:37 PM.

Details

Summary

4i32 shuffles that insert a single element into a zero vector lower to X86vzmovl, which was using (v)blendps and so caused domain-switch stalls. This patch fixes this by using (v)pblendw instead.
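
As a rough illustration of why the two instructions are interchangeable here, the sketch below models both blends in scalar C++. It is not code from the patch: the function names are hypothetical, and the immediates (0x1 for blendps, 0x03 for pblendw) are derived from the documented instruction semantics rather than quoted from the tablegen patterns. blendps selects per 32-bit lane, pblendw selects per 16-bit word, so taking lane 0 means setting the low two word bits.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V4i32 = std::array<std::uint32_t, 4>;

// Scalar model of BLENDPS: bit i of imm picks 32-bit lane i from b, else from a.
V4i32 blendps_model(const V4i32 &a, const V4i32 &b, unsigned imm) {
  assert(imm <= 0xF && "blendps uses a 4-bit lane mask");
  V4i32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = ((imm >> i) & 1u) ? b[i] : a[i];
  return r;
}

// Scalar model of PBLENDW: bit i of imm picks 16-bit word i from b, else from a.
V4i32 pblendw_model(const V4i32 &a, const V4i32 &b, unsigned imm) {
  assert(imm <= 0xFF && "pblendw uses an 8-bit word mask");
  std::array<std::uint16_t, 8> wa, wb, wr;
  std::memcpy(wa.data(), a.data(), sizeof(V4i32));
  std::memcpy(wb.data(), b.data(), sizeof(V4i32));
  for (int i = 0; i < 8; ++i)
    wr[i] = ((imm >> i) & 1u) ? wb[i] : wa[i];
  V4i32 r;
  std::memcpy(r.data(), wr.data(), sizeof(V4i32));
  return r;
}

// vzmovl-style result (hypothetical helper names): element 0 of x, upper lanes zero.
V4i32 vzmovl_via_blendps(const V4i32 &x) { return blendps_model(V4i32{}, x, 0x1); }
V4i32 vzmovl_via_pblendw(const V4i32 &x) { return pblendw_model(V4i32{}, x, 0x03); }
```

Both helpers return the same vector for any input; the difference on hardware is only which execution domain the instruction belongs to, which this scalar model does not capture.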

The updated tests in test/CodeGen/X86/sse41.ll still contain a domain stall due to the use of insertps - I'm looking at fixing this in a future patch.

Pre-SSE4.1 targets are still affected by a similar domain stall using movss - we could fix this by using 2 x ( punpckldq XMM, zero ) in series - if people agree I'll make a patch for this as well.
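
The proposed 2 x punpckldq sequence can be checked with a similar scalar model. This is only a sketch under the assumption that "zero" is an all-zero register and that the second unpack consumes the result of the first; the function names are hypothetical and nothing here is taken from an actual patch.

```cpp
#include <array>
#include <cstdint>

using V4i32 = std::array<std::uint32_t, 4>;

// Scalar model of PUNPCKLDQ dst, src: interleave the low two 32-bit lanes
// of dst and src, giving {dst0, src0, dst1, src1}.
V4i32 punpckldq_model(const V4i32 &dst, const V4i32 &src) {
  return V4i32{dst[0], src[0], dst[1], src[1]};
}

// Two unpacks against zero in series (hypothetical helper name).
V4i32 vzmovl_via_two_punpckldq(const V4i32 &x) {
  const V4i32 zero{};
  const V4i32 step1 = punpckldq_model(x, zero); // {x0, 0, x1, 0}
  return punpckldq_model(step1, zero);          // {x0, 0, 0, 0}
}
```

The first unpack moves x1 out of lane 1 and the second drops it entirely, leaving element 0 in place with zeros above it, all using integer-domain instructions available before SSE4.1.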

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 16753. Nov 30 2014, 1:37 PM
RKSimon retitled this revision from to [X86][SSE] Keep 4i32 vector insertions in integer domain on SSE4.1 targets.
RKSimon updated this object.
RKSimon edited the test plan for this revision.
RKSimon added reviewers: chandlerc, andreadb.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: Unknown Object (MLST).
andreadb accepted this revision. Dec 1 2014, 2:58 AM
andreadb edited edge metadata.

Hi Simon,

Thanks for fixing those tablegen patterns! The patch looks good to me.

Cheers,
Andrea

This revision is now accepted and ready to land. Dec 1 2014, 2:58 AM
RKSimon closed this revision. Dec 2 2014, 2:32 PM
RKSimon updated this revision to Diff 16829.

Closed by commit rL223165 (authored by @RKSimon).

chandlerc edited edge metadata. Dec 2 2014, 8:14 PM

I only have one concern here, and it is just a very general concern: