This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Attempt to extract vector elements through target shuffles
ClosedPublic

Authored by RKSimon on Feb 20 2017, 11:14 AM.

Details

Summary

DAGCombiner already supports peeking thorough shuffles to improve vector element extraction, but legalization often leaves us in situations where we need to extract vector elements after shuffles have already been lowered.

This patch adds support for VECTOR_EXTRACT_ELEMENT/PEXTRW/PEXTRB instructions to attempt to handle target shuffles as well. I've covered some basic scenarios including handling shuffle mask scaling and the implicit zero-extension of PEXTRW/PEXTRB, there is more that could be done here (that I've mentioned in TODOs) but I haven't found many cases where its worth it.

Additional venues to consider include adding some form of 'TargetLowering::isVectorExtractionLegal()' check to try to handle more of these cases in DAGCombiner, and a SelectionDAG form of SimplifyDemandedVectorElts.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Feb 20 2017, 11:14 AM
filcab added inline comments.Feb 21 2017, 6:57 AM
lib/Target/X86/X86ISelLowering.cpp
28877 ↗(On Diff #89137)

What if you have a mask vector of 8 bits? Maybe check for SrcSVT == i1?

28922 ↗(On Diff #89137)

Seems to me this would still be valid with different types (extract_element with a different type), but we can't do anything in that case. If so, shouldn't we return SDValue(); instead of asserting?

34441 ↗(On Diff #89137)

Is this an optimization or a correctness thing? Why can't you simply call combineExtractVectorElt since it will try that combine anyway?

test/CodeGen/X86/promote-vec3.ll
19 ↗(On Diff #89137)

Interesting.

RKSimon added inline comments.Feb 21 2017, 7:53 AM
lib/Target/X86/X86ISelLowering.cpp
28922 ↗(On Diff #89137)

It shouldn't happen as we test for after legalization at the start of the function, but I can replace it with an early out.

34441 ↗(On Diff #89137)

combineExtractVectorElt has a load of other combines inside it (notably load extraction) that can't handle PEXTRW/PEXTRB and their implicit zeroing of the upper bits of their i32 result. I'll make this clearer in the TODO at the top of combineExtractVectorElt_SSE

filcab added inline comments.Feb 21 2017, 8:01 AM
lib/Target/X86/X86ISelLowering.cpp
28922 ↗(On Diff #89137)

If it's a "shouldn't ever happen", as it seems to be, then the assert is best.

34441 ↗(On Diff #89137)

Thanks.

RKSimon updated this revision to Diff 89243.Feb 21 2017, 10:39 AM

Updated based on Filipe's comments.

RKSimon marked 4 inline comments as done.Feb 21 2017, 10:42 AM
RKSimon added inline comments.
test/CodeGen/X86/promote-vec3.ll
19 ↗(On Diff #89137)

In a future patch we can hopefully fold extract(insert(v,x,c),c) -> x

RKSimon updated this revision to Diff 89879.Feb 27 2017, 7:04 AM

ping? rebased on trunk

filcab accepted this revision.Feb 27 2017, 11:16 AM
This revision is now accepted and ready to land.Feb 27 2017, 11:16 AM
This revision was automatically updated to reflect the committed changes.