This is an archive of the discontinued LLVM Phabricator instance.

[X86] Separate the memory size of vzext_load/vextract_store from the element size of the result type. Use them to improve the codegen of v2f32 loads/stores with sse1 only.
ClosedPublic

Authored by craig.topper on Jul 10 2019, 1:37 PM.

Details

Summary

SSE1 only supports v4f32, but it does have instructions like movlps/movhps that load/store 64 bits of memory.
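
For illustration, here is a minimal C++ sketch of what a 64-bit load into (and store out of) a v4f32 register looks like at the source level, using the SSE1 intrinsics `_mm_loadl_pi` and `_mm_storel_pi`, which typically map to movlps. The helper names `load_v2f32_zext` and `store_low_v2f32` are hypothetical and are not part of the patch; the non-SSE fallback is only there so the sketch builds on other targets.

```cpp
#include <cstring>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE1 intrinsics
#endif

// Hypothetical helper: load two floats (64 bits) into lanes 0-1 of a
// four-float vector and zero lanes 2-3, then write all four lanes to out.
void load_v2f32_zext(const float *src, float out[4]) {
#if defined(__SSE__)
  __m128 v = _mm_setzero_ps();                                // upper lanes start as zero
  v = _mm_loadl_pi(v, reinterpret_cast<const __m64 *>(src));  // 64-bit load into the low half
  _mm_storeu_ps(out, v);
#else
  // Portable fallback with the same observable result.
  out[2] = 0.0f;
  out[3] = 0.0f;
  std::memcpy(out, src, 2 * sizeof(float));
#endif
}

// Hypothetical helper: store only the low two floats (64 bits) of a
// four-float vector to dst.
void store_low_v2f32(const float in[4], float *dst) {
#if defined(__SSE__)
  __m128 v = _mm_loadu_ps(in);
  _mm_storel_pi(reinterpret_cast<__m64 *>(dst), v);           // 64-bit store of the low half
#else
  std::memcpy(dst, in, 2 * sizeof(float));
#endif
}
```

In both helpers the register type is a full v4f32 while the memory access is only 64 bits wide, which is the mismatch between node VT and memory VT that this patch models.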

This patch breaks the connection between the node VT of the vzext_load/vextract_store patterns and the memory VT, enabling a v4f32 node with a 64-bit memory VT. I've used i64 as the memory VT here. I've written the PatFrag predicate to check only the store size, not the specific VT. I think the VT will only matter for CSE purposes. We could use v2f32, but if we want to start using these operations in more places, a simple integer type might make the most sense.
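
A rough, self-contained C++ model of a size-only predicate may help show why i64 and v2f32 are interchangeable as far as pattern matching goes. Everything here (`SimpleVT`, `MemNodeSketch`, `storeSizeInBytes`, `isVZextLoad64`) is a hypothetical stand-in written for this illustration; it is not LLVM's API and not the predicate text from the patch.

```cpp
// Hypothetical stand-in for a value type: a name plus its size in bits.
struct SimpleVT {
  const char *name;
  unsigned sizeInBits;
};

// Hypothetical stand-in for a memory node: the VT of the produced value
// and, separately, the VT describing the bytes touched in memory.
struct MemNodeSketch {
  SimpleVT resultVT;
  SimpleVT memoryVT;
};

// Number of bytes needed to store a value of this type (rounded up).
unsigned storeSizeInBytes(const SimpleVT &vt) {
  return (vt.sizeInBits + 7) / 8;
}

// Size-only predicate: accept any node whose memory access is 64 bits,
// regardless of which 64-bit type was chosen as the memory VT.
bool isVZextLoad64(const MemNodeSketch &n) {
  return storeSizeInBytes(n.memoryVT) == 8;
}
```

Under this model, a v4f32 node with memory VT i64 and one with memory VT v2f32 both satisfy the predicate, while a node with a 32-bit memory VT does not.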

I'd like to maybe use this same thing for SSE2 and later as well, but that will need more work to be supported by EltsFromConsecutiveLoads to avoid regressing lit tests. I'd maybe also like to combine bitcasts with these load/store nodes now that the types are disconnected. And I'd also like to consider canonicalizing (scalar_to_vector + load) to vzext_load.

If you want, I can split the mechanical tablegen stuff where I added the 32/64 off from the sse1 change.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Jul 10 2019, 1:37 PM
Herald added a project: Restricted Project. Jul 10 2019, 1:37 PM
Herald added a subscriber: hiraditya.
RKSimon accepted this revision. Jul 14 2019, 7:49 AM

No objections, LGTM

This revision is now accepted and ready to land. Jul 14 2019, 7:49 AM
This revision was automatically updated to reflect the committed changes.