This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add DAG combine to turn (vzmovl (insert_subvector undef, X, 0)) into (insert_subvector allzeros, (vzmovl X), 0)
ClosedPublic

Authored by craig.topper on Jun 18 2019, 12:43 PM.

Details

Summary

128/256 bit scalar_to_vectors are canonicalized to (insert_subvector undef, (scalar_to_vector), 0). We have isel patterns that match this pattern when it is used by a vzmovl, so that we can use a 128-bit instruction and a subreg_to_reg.

This patch detects the (insert_subvector undef) portion of this and pulls it through the vzmovl, creating a narrower vzmovl and an insert_subvector allzeros. We can then match the insert_subvector into a subreg_to_reg operation by itself, and fall back on the existing (vzmovl (scalar_to_vector)) patterns.
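To illustrate why pulling the insert_subvector through the vzmovl preserves the result, here is a lane-level toy model. It is not LLVM code and not the patch itself: `Vec`, `insertSubvectorLow`, `beforeCombine`, `afterCombine`, and the `undefFill` parameter are all hypothetical names invented for this sketch, and undef lanes are modelled as an arbitrary fill value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: a vector is just a list of 32-bit lanes.
using Vec = std::vector<std::uint32_t>;

// vzmovl: keep lane 0, zero every other lane.
Vec vzmovl(const Vec &v) {
  assert(!v.empty() && "vzmovl needs at least one lane");
  Vec out(v.size(), 0);
  out[0] = v[0];
  return out;
}

// insert_subvector at index 0: overwrite the low lanes of `base` with `sub`.
Vec insertSubvectorLow(Vec base, const Vec &sub) {
  assert(sub.size() <= base.size() && "subvector must fit in the base vector");
  for (std::size_t i = 0; i < sub.size(); ++i)
    base[i] = sub[i];
  return base;
}

// Original form: (vzmovl (insert_subvector undef, X, 0)).
// `undefFill` stands in for whatever the undef lanes happen to hold.
Vec beforeCombine(const Vec &x, std::size_t wideLanes, std::uint32_t undefFill) {
  return vzmovl(insertSubvectorLow(Vec(wideLanes, undefFill), x));
}

// Rewritten form: (insert_subvector allzeros, (vzmovl X), 0).
Vec afterCombine(const Vec &x, std::size_t wideLanes) {
  return insertSubvectorLow(Vec(wideLanes, 0), vzmovl(x));
}
```

In this model both forms yield lane 0 of X followed by zeros, whatever value fills the undef lanes, since the wide vzmovl overwrites those lanes with zero anyway. The rewritten form only needs a narrow vzmovl, which is what lets the existing 128-bit patterns apply.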

Note, while the scalar_to_vector case is the motivating case, I didn't restrict the combine to just that case. I'm also wondering about shrinking any 256/512 vzmovl to an extract_subvector+vzmovl+insert_subvector(allzeros), but I fear that would have bad implications for shuffle combining.

I also think there is more canonicalization we can do with vzmovl with loads or scalar_to_vector with loads to create vzload.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Jun 18 2019, 12:43 PM
Herald added a project: Restricted Project. Jun 18 2019, 12:43 PM
Herald added a subscriber: hiraditya.
craig.topper marked 3 inline comments as done. Jun 18 2019, 12:53 PM
craig.topper added inline comments.
llvm/test/CodeGen/X86/avx-load-store.ll
243 ↗(On Diff #205416)

This is because we emit a SUBREG_TO_REG+MOV for insert_subvector(zero), and our post-processing peephole that removes the MOV when possible doesn't run at -O0.

llvm/test/CodeGen/X86/vec_extract-avx.ll
147 ↗(On Diff #205416)

This is due to some inconsistencies between our handling of v4i64 vzmovl and v2i64 vzmovl. You'll see this same test case was changed by D63373 in a different way. The really weird thing here is that we're reducing the size of a load in an isel pattern, which isn't good since we don't check if it's volatile. We should remove these kinds of isel patterns and do something in DAG combine to move towards vzload.

llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
1507 ↗(On Diff #205416)

This is also due to inconsistencies between v4f64 and v2f64 vzmovl handling. We also generate this same code after D63373.

spatel accepted this revision. Jun 21 2019, 11:27 AM

LGTM

This revision is now accepted and ready to land. Jun 21 2019, 11:27 AM
This revision was automatically updated to reflect the committed changes.