This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner][X86] Teach visitCONCAT_VECTORS to combine (concat_vectors (concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
ClosedPublic

Authored by craig.topper on Aug 19 2019, 5:42 PM.

Details

Summary

I also had to tweak one existing X86 combine to avoid a regression there. I don't think we need the IdxVal == 0 check on the out insert_subvector. From the other index checks, we know the we had a subvector with some number of 0 elements above it. We can safely drop those 0 elements inserting just the smaller subvector anywhere into the larger zero vector.

This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.

I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Aug 19 2019, 5:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2019, 5:42 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper marked an inline comment as done.Aug 19 2019, 5:45 PM
craig.topper added inline comments.
llvm/test/CodeGen/X86/vec_umulo.ll
2516 ↗(On Diff #216029)

By instruction count this is a regression, but I'm not sure exactly what the difference is.

RKSimon added inline comments.Aug 20 2019, 3:06 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
43259 ↗(On Diff #216029)

Are we sure that this works in the general IdxVal case?

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
7593 ↗(On Diff #216029)

This looks like we're missing a computeKnownBitsForTargetNode handling for a X86ISD opcode?

craig.topper marked 2 inline comments as done.Aug 20 2019, 8:26 AM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
43259 ↗(On Diff #216029)

I think so. We know we were inserting a subvector and some zeroes into a zero vector. This just shrinks the subvector of the insertion so we don’t transfer the zeroes. Since everything but the new subvector is going to be zero that should be fine. The only thing that should matter is that the extract and inner insert use the same index so we know the extract covers the whole subvector and possibly some zeroes. Does that sound right?

llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
7593 ↗(On Diff #216029)

There aren’t any target nodes here. The kshifts should be coming from isel for an insert_subvector. I think we’re probably missing a combine for insert_subvector into zero followed by an insert into undef. Maybe with an extract between them.

craig.topper marked an inline comment as done.Aug 20 2019, 10:06 AM
craig.topper added inline comments.
llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
7593 ↗(On Diff #216029)

For these cases we end up with a DAG like this.

t33: v16i1 = BUILD_VECTOR Constant:i8<0>, Constant:i8<0>, Constant:i8<0>, ....
t6: v2i1 = setcc t2, t4, seteq:ch

      t34: v16i1 = insert_subvector t33, t6, Constant:i64<0>
    t35: v8i1 = extract_subvector t34, Constant:i64<0>
  t18: i8 = bitcast t35
t36: i32 = zero_extend t18

We can't simplfy the t33 input to the insert_subvector, since t6 is only 2 bits. We can probably add a combine to turn the bitcast into a v16i1->i16 bitcast from the insert_subvector to get rid of the extract. Then the zero_extend will be from i16 to i32 which we should be able to optimize out through an isel pattern that we have for that to use KMOVW.

Replace the combineInsertSubVector change with a new combine in combineExtractSubvector. This just narrows an insert into zero if we are extracting a smaller portion of it and the extract is only the only user. This should be a more straightforward change. The combine I edited in combineInsertSubvector has no one use checks so doing an index other than zero may not be the best choice.

Rebase after adding a bitcast optimization for v8i1.

RKSimon accepted this revision.Aug 20 2019, 2:35 PM

LGTM - cheers

This revision is now accepted and ready to land.Aug 20 2019, 2:35 PM
This revision was automatically updated to reflect the committed changes.