[X86][SSE] Don't coalesce v4i32 extracts (RFC)
ClosedPublic

Authored by RKSimon on Jan 19 2018, 10:36 AM.

Details

Summary

We currently coalesce v4i32 extracts from all 4 elements to 2 v2i64 extracts + shifts/sign-extends.

This seems to have been added back in the days when we tended to spill vectors and reload scalars, or ended up with repeated shuffles moving everything down to the 0'th index. I don't think either of these is likely these days, as we have better EXTRACT_VECTOR_ELT and VECTOR_SHUFFLE handling, and the existing code tends to make things very difficult for various vector and load combines.

This patch proposes to drop the extract coalescing code. In the test coverage we have, this is a net gain. Our tests could be vector biased, but I don't think this is a big problem.
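For readers unfamiliar with the transform being removed, the following is a minimal scalar model of the coalescing described above, not the actual X86 lowering code. It assumes little-endian lane layout, and the type and function names (`V4i32`, `extractV2i64`, `extractCoalesced`, `extractDirect`, `formsAgree`) are hypothetical. It only illustrates that a 32-bit lane can be recovered from a 64-bit extract by truncation (even lanes) or a shift by 32 (odd lanes).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a v4i32 value: four 32-bit lanes.
using V4i32 = std::array<int32_t, 4>;

// Model of a v2i64 extract: lanes {2*half, 2*half+1} viewed as one
// 64-bit element, assuming little-endian lane order.
uint64_t extractV2i64(const V4i32 &v, unsigned half) {
  assert(half < 2);
  uint64_t lo = static_cast<uint32_t>(v[2 * half]);
  uint64_t hi = static_cast<uint32_t>(v[2 * half + 1]);
  return lo | (hi << 32);
}

// Coalesced form: one 64-bit extract, then truncate (even lane)
// or shift right by 32 (odd lane) to get the 32-bit value.
int32_t extractCoalesced(const V4i32 &v, unsigned lane) {
  assert(lane < 4);
  uint64_t wide = extractV2i64(v, lane / 2);
  uint32_t bits = (lane % 2 == 0) ? static_cast<uint32_t>(wide)
                                  : static_cast<uint32_t>(wide >> 32);
  return static_cast<int32_t>(bits);
}

// Direct form: a plain per-lane extract, which is what remains once
// the coalescing is dropped.
int32_t extractDirect(const V4i32 &v, unsigned lane) {
  assert(lane < 4);
  return v[lane];
}

// Returns true if both forms yield the same value for every lane.
bool formsAgree(const V4i32 &v) {
  for (unsigned lane = 0; lane < 4; ++lane)
    if (extractCoalesced(v, lane) != extractDirect(v, lane))
      return false;
  return true;
}
```

Both forms compute the same values; the difference is only in which nodes the later combines get to see.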

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision. Jan 19 2018, 10:36 AM

ping - comments?

niravd accepted this revision. Jan 26 2018, 7:28 AM

This seems reasonable to me. The test outputs all look better or more optimizable (Have you looked at why the ands aren't being coalesced into a vector?)

This revision is now accepted and ready to land. Jan 26 2018, 7:28 AM

Thanks - we do have x86 lowering code that vectorizes BUILDVECTOR(X & C1, Y & C2, .....) -> BUILDVECTOR(X, Y, ...) & BUILDVECTOR(C1, C2, ...) but nothing that does the reverse, even though both are often generated during legalization. I'm always a bit afraid of doing too much vectorization code in the DAG.
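As a rough illustration of the BUILDVECTOR fold mentioned above, here is a scalar model of the two equivalent forms. This is a sketch, not the x86 lowering code: `Lanes4`, `buildOfAnds`, `andOfBuilds` and `checkEquivalent` are hypothetical names, and a four-lane vector is assumed.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a four-lane vector of 32-bit integers.
using Lanes4 = std::array<uint32_t, 4>;

// Scalar form: BUILDVECTOR(X & C1, Y & C2, ...).
// Each lane is masked individually before the vector is built.
Lanes4 buildOfAnds(const Lanes4 &x, const Lanes4 &c) {
  return {{x[0] & c[0], x[1] & c[1], x[2] & c[2], x[3] & c[3]}};
}

// Vector form: BUILDVECTOR(X, Y, ...) & BUILDVECTOR(C1, C2, ...).
// The operands are treated as whole vectors and ANDed lane-wise,
// which models a single vector AND.
Lanes4 andOfBuilds(const Lanes4 &x, const Lanes4 &c) {
  Lanes4 result{};
  for (std::size_t i = 0; i < result.size(); ++i)
    result[i] = x[i] & c[i];
  return result;
}

// Debug-time check that both forms produce identical lanes.
void checkEquivalent(const Lanes4 &x, const Lanes4 &c) {
  assert(buildOfAnds(x, c) == andOfBuilds(x, c));
}
```

The fold in the comment goes from the first form to the second; the reverse direction would start from `andOfBuilds` and produce `buildOfAnds`.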

This revision was automatically updated to reflect the committed changes.