This is an archive of the discontinued LLVM Phabricator instance.

[x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered.
ClosedPublic

Authored by chandlerc on Jul 7 2014, 3:20 AM.

Details

Summary

When we use widening to legalize vector types, extend nodes are a real
challenge. Either the input or output is likely to be legal, but in many
cases not both. As a consequence, we don't really have any way to
represent this situation and the prior code in the widening legalization
framework would just scalarize the extend operation completely.

This patch introduces a new DAG node to represent doing a zero extend of
a vector "in register". The core of the idea is to allow legal but
different vector types in the input and output. The output vector must
have fewer lanes but wider elements. The operation is defined to zero
extend the low elements of the input to the size of the output elements,
and drop all of the high elements which don't have a corresponding lane
in the output vector.

It also includes generic expansion of this node in terms of blending
a zero vector into the high elements of the vector and bitcasting
across. This in turn yields extremely nice code for x86 SSE2 when we use
the new widening legalization logic in conjunction with the new shuffle
lowering logic.

There is still more to do here. We need to support sign extension, any
extension, and potentially int-to-float conversions. My current plan is
to continue using similar synthetic nodes to model each of these
transitions with generic lowering code for each one.

However, with this patch LLVM already reaches performance parity with
GCC for the core C loops of the x264 code (assuming you disable the
hand-written assembly versions) when compiling for SSE2 and SSE3
architectures and enabling the new widening and lowering logic for
vectors.

Diff Detail

Repository
rL LLVM

Event Timeline

chandlerc updated this revision to Diff 11113.Jul 7 2014, 3:20 AM
chandlerc retitled this revision from to [x86] Add a ZERO_EXTEND_VECTOR_INREG DAG node and use it when widening vector types to be legal and a ZERO_EXTEND node is encountered..
chandlerc updated this object.
chandlerc edited the test plan for this revision. (Show Details)
chandlerc added a subscriber: Unknown Object (MLST).
bkramer added inline comments.Jul 7 2014, 4:28 AM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
750 ↗(On Diff #11113)

Can we get away with creating illegal shuffles here or should we ask the backend if the mask is legal first?

arsenm added a subscriber: arsenm.Jul 7 2014, 11:27 AM
arsenm added inline comments.
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
750 ↗(On Diff #11113)

I'm pretty sure this needs to be checked. I fixed a similar issue in http://reviews.llvm.org/D4320

chandlerc added inline comments.Jul 7 2014, 11:49 AM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
750 ↗(On Diff #11113)

I believe we can get away with this, but it's a bit dicey I agree.

First, to Matt's point -- we can *definitely* create illegal types here, i don't know whot D4320 is talking about. If you look at SelectionDAGISel::CodeGenAndEmitDAG, you'll see that we re-type-legalize after the vector operation legalization. There are also specific comments around the vector operation legalization that it is expected to at times produce illegal types.

What we can't do here is produce an illegal *vector* type which could then result in an illegal vector operation, which could then result in scalarization with an illegal scalar type. We won't get a chance to fix the scalarized version unless we produce it in this pass of the legalizer.

So the risk in the current approach is that I don't know if the legalizer correctly recurses on the newly created VECTOR_SHUFFLE node to legalize that operation. If it doesn't then we would need to recurse on it within VectorLegalizer. I'll check that.

filcab edited edge metadata.Jul 7 2014, 4:18 PM

LGTM after the legality check gets taken care of (either because we already check the resulting node or by recursing manually).

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2420 ↗(On Diff #11113)

s/in the/is the/?

filcab accepted this revision.Jul 7 2014, 4:18 PM
filcab edited edge metadata.
This revision is now accepted and ready to land.Jul 7 2014, 4:18 PM
grosbach edited edge metadata.Jul 7 2014, 4:25 PM

LGTM too.

hfinkel added inline comments.Jul 8 2014, 1:26 PM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
745 ↗(On Diff #11113)

Is the big-endian case covered by one of the existing regression tests?

chandlerc added inline comments.Jul 9 2014, 3:37 AM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
750 ↗(On Diff #11113)

Just to close the loop, in LegalizeVectorOps.cpp:307-311 the code recurses over any legalized result node which differs from the original node, so returning an illegal VECTOR_SHUFFLE node (or even a bitcast of a VECTOR_SHUFFLE node, as LegalizeOp always recurses across operands) should always re-enter the vector operation legalization framework before we finish the phase of legalization. So essentially, this does appear to be a valid way of recursively delegating legalization from one node to another provided we don't end up with a cycle. If we do, it will show up as an infinite loop immediately, so I'm not too worried.

chandlerc added inline comments.Jul 9 2014, 3:52 AM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
745 ↗(On Diff #11113)

No, I'm not aware of any way to write a big-endian test case that even produces a ZERO_EXTEND_VECTOR_INREG node. Until there is wider usage of widening legalization on big-endian targets, it isn't really possible to test this code.

I'm going to commit this for now (as it is dead code on BE targets) and let me know if you'd rather turn this into an assert that BE isn't implemented so we remember to add a test, or any other approach for big-endian. Writing a test that covers it is trivial, its just the challenge of even reaching the code.

lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
2420 ↗(On Diff #11113)

Done when committed.

chandlerc closed this revision.Jul 9 2014, 4:06 AM
chandlerc updated this revision to Diff 11188.

Closed by commit rL212610 (authored by @chandlerc).