This is an archive of the discontinued LLVM Phabricator instance.

Convert a vselect into a concat_vector if possible
ClosedPublic

Authored by filcab on May 26 2014, 8:36 PM.

Details

Summary

If both vector args to vselect are concat_vectors and the condition is
constant and picks half a vector from each argument, convert the vselect
into a concat_vectors.

Added tests.

Diff Detail

Repository
rL LLVM

Event Timeline

filcab updated this revision to Diff 9818.May 26 2014, 8:36 PM
filcab retitled this revision from to Convert a vselect into a concat_vector if possible.
filcab updated this object.
filcab edited the test plan for this revision. (Show Details)
filcab added reviewers: nadav, delena, grosbach.
filcab added a subscriber: Unknown Object (MLST).
delena added inline comments.May 26 2014, 11:56 PM
lib/Target/X86/X86ISelLowering.cpp
17876 ↗(On Diff #9818)

You do not check that the types are legal. On what stage this optimization should happen? <2 x float> that you take in your test is illegal.

Actually, seeing as it's generic, maybe I should move it to a more generic part of DAGCombiner, instead of the X86 backend.

What do you think, Elena? Nadav?

Filipe

lib/Target/X86/X86ISelLowering.cpp
17876 ↗(On Diff #9818)

This is a generic optimization, which allows us to eliminate a vselect and replace it with a simpler concat_vectors, that's why I'm doing it before legalize types.

Moving this to DAGCombine makes sense to me.

lib/Target/X86/X86ISelLowering.cpp
17845 ↗(On Diff #9818)

Does this do with right thing is BottomHalf is undef? It seems like you'd want to pick the first non-undef operand (and similarly with TopHalf below).

filcab updated this revision to Diff 9860.May 27 2014, 7:21 PM

Moved the transform to DAGCombine.cpp, since it's generic, not X86 specific.

Also added the asserts Nadav requested and addressed the bug pointed out by Hal.

Nadav: Merging the loops would make for a big, complicated mess, instead
of two small, simple loops. I'd prefer to keep them separated, if you
don't mind.

hfinkel added inline comments.May 28 2014, 12:43 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4634 ↗(On Diff #9860)

Are you sure this is always true on all possible code paths? If you are sure, please add a comment explaining where this is handled and why it will always happen first. Otherwise, please just make this a normal check.

If half is undef, and the other half is X (either true or false), this will
fall under the if statements before the call to this function in
visitSELECT. Around line 4707 (isBuildVectorAll{Zeros,Ones}). I also added
a comment at the call site of this function.

Filipe

  • Original Message -----

From: "Filipe Cabecinhas" <filcab+llvm.phabricator@gmail.com>
To: "filcab+llvm phabricator" <filcab+llvm.phabricator@gmail.com>, nrotem@apple.com, "elena demikhovsky"
<elena.demikhovsky@intel.com>, grosbach@apple.com
Cc: hfinkel@anl.gov, llvm-commits@cs.uiuc.edu
Sent: Wednesday, May 28, 2014 2:48:17 AM
Subject: Re: [PATCH] Convert a vselect into a concat_vector if possible

If half is undef, and the other half is X (either true or false),
this will
fall under the if statements before the call to this function in
visitSELECT. Around line 4707 (isBuildVectorAll{Zeros,Ones}). I also
added
a comment at the call site of this function.

Ah, right. isBuildVectorAllZeros also skips undefs.

-Hal

Filipe

http://reviews.llvm.org/D3916

hfinkel added inline comments.May 28 2014, 1:27 AM
lib/CodeGen/SelectionDAG/DAGCombiner.cpp
4637 ↗(On Diff #9860)

Why does operand 0 derive from operand 1 of the inputs?

filcab updated this revision to Diff 9902.May 28 2014, 7:29 PM

Fixed a bug pointed by Hal Finkel, when returning the new concat_vectors

The code was doing the exact opposite of what it should.
I also added the specific registers to the test, in a way that accounts for some order change from the instruction scheduling.

I'm having a problem in making this test generic, though. Should I do a test for some additional architectures? The test/CodeGen/Generic directory only seems to check that llc doesn't error out when running on it. But I see no way to test this optimization without actually looking at the code.

Should I do more tests? Any way to check this that I didn't think about?

Thanks,

  • Filipe

Aside from fixing up the test, this LGTM.

test/CodeGen/X86/vselect.ll
269 ↗(On Diff #9902)

I think can be made a little more specific. You know what the input registers are (because they're dictated by the calling convention), so you can do something like this:

; CHECK-DAG: movlhps %xmm2, %xmm{{[01]}}
; CHECK-DAG: movlhps %xmm3, %xmm{{[01]}}

(where the CHECK-DAGs can match in either order). Because the add is commutative, it is hard to pattern match the resulting registers in a stable way (because they could be flipped). If you make the test use a subtract (so that the correct order is fixed), then we can be even more specific:

; CHECK-DAG: movlhps %xmm2, %[[REG1:[xmm[01]]]]
; CHECK-DAG: movlhps %xmm3, %[[REG2:[xmm[01]]]]
; CHECK-NEXT: subps [[REG1]], [[REG2]]

(or something like that).

Thanks, I didn't remember check-dag. Btw, if we use check-dag, we wouldn't
need the %xmm{{[01]}} regex either! unless I'm looking at it wrong.

I'll commit later today.

Thank you,

Filipe

filcab closed this revision.May 30 2014, 4:10 PM
filcab updated this revision to Diff 9970.

Closed by commit rL209929 (authored by @filcab).