This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Split 0 vector stores into scalar store pairs.
Closed, Public

Authored by gberry on Nov 11 2016, 1:33 PM.

Details

Summary

Replace a store of a zero-splat vector with scalar stores of WZR/XZR.
The load/store optimizer pass will then merge them into store-pair instructions.
This should be better than a movi to create the vector zero followed by
a vector store if the zero constant is not reused, since one
instruction and one register live range are removed.

For example, the final generated code should be:

stp xzr, xzr, [x0]

instead of:

movi v0.2d, #0
str q0, [x0]
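
The first half of the transformation (vector store to scalar stores) can be sketched as a small standalone model. This is only an illustration under stated assumptions: `splitZeroVectorStore` is a hypothetical helper that works on strings, not the SelectionDAG code in this patch, and it only covers 32-bit and 64-bit elements.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not the LLVM API: returns the scalar zero stores that
// would replace a store of an all-zero vector with NumElts elements of
// EltBits bits each, addressed from the base register Base.
std::vector<std::string> splitZeroVectorStore(unsigned NumElts,
                                              unsigned EltBits,
                                              const std::string &Base) {
  assert((EltBits == 32 || EltBits == 64) &&
         "sketch covers only W- and X-sized elements");
  const std::string ZeroReg = EltBits == 64 ? "xzr" : "wzr";
  const unsigned EltBytes = EltBits / 8;

  std::vector<std::string> Stores;
  for (unsigned I = 0; I < NumElts; ++I) {
    // Each element gets its own store at its byte offset from the base.
    const unsigned Offset = I * EltBytes;
    const std::string Addr =
        Offset == 0 ? "[" + Base + "]"
                    : "[" + Base + ", #" + std::to_string(Offset) + "]";
    Stores.push_back("str " + ZeroReg + ", " + Addr);
  }
  return Stores;
}
```

For a v2i64 store to x0, this model yields two str xzr instructions at offsets 0 and 8, which is the form the load/store optimizer is then expected to merge into the single stp.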

Diff Detail

Repository
rL LLVM

Event Timeline

gberry updated this revision to Diff 77663. Nov 11 2016, 1:33 PM
gberry retitled this revision to [AArch64] Split 0 vector stores into scalar store pairs.
gberry updated this object.
gberry added a subscriber: llvm-commits.
MatzeB accepted this revision. Nov 11 2016, 1:50 PM
MatzeB edited edge metadata.

LGTM

lib/Target/AArch64/AArch64ISelLowering.cpp
8817 (On Diff #77663)

Should use StoreSDNode &St as it cannot be nullptr.
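
A minimal sketch of the difference between the two signatures. `StoreNode` and both functions are hypothetical stand-ins for illustration, not the code under review, and the volatility check is an assumed placeholder condition.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::StoreSDNode, reduced to one field.
struct StoreNode {
  bool IsVolatile = false;
};

// Pointer parameter: the signature permits nullptr, so the callee has to
// guard against it even though no caller ever passes one.
bool canSplitPtr(const StoreNode *St) {
  assert(St && "caller must pass a valid store");
  return !St->IsVolatile;
}

// Reference parameter: the signature itself documents that a store is always
// present, so no null check is needed.
bool canSplitRef(const StoreNode &St) {
  return !St.IsVolatile;
}
```

Both functions return the same result for a valid node. The reference version simply moves the "never null" contract into the type.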

8823 (On Diff #77663)

Without this if, wouldn't we get the following for a 3-element vector?

stp xzr, xzr, [x0]
str xzr, [x0, #16]

That would be a sensible thing to do. Or does this need more work for a future commit?
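
The pairing asked about here can be modeled with a short sketch. It assumes consecutive 8-byte XZR stores from a single base register. `pairZeroStores` is a hypothetical helper and does not reflect how the load/store optimizer is actually implemented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not the load/store optimizer itself: pairs up NumStores
// consecutive 8-byte XZR stores starting at [Base], leaving an odd one as str.
std::vector<std::string> pairZeroStores(unsigned NumStores,
                                        const std::string &Base) {
  assert(!Base.empty() && "a base register name is required");
  auto Addr = [&Base](unsigned Offset) {
    return Offset == 0 ? "[" + Base + "]"
                       : "[" + Base + ", #" + std::to_string(Offset) + "]";
  };

  std::vector<std::string> Out;
  unsigned I = 0;
  // Merge adjacent stores two at a time into store-pair instructions.
  for (; I + 1 < NumStores; I += 2)
    Out.push_back("stp xzr, xzr, " + Addr(I * 8));
  // An odd trailing store stays a single scalar store.
  if (I < NumStores)
    Out.push_back("str xzr, " + Addr(I * 8));
  return Out;
}
```

With three stores, the model produces one stp at offset 0 followed by one str at offset 16, which is the sequence proposed in the comment.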

8944 (On Diff #77663)

Nothing. (I wanted to delete this comment, but Phabricator seems to be buggy.)

test/CodeGen/AArch64/ldst-opt.ll
1337 (On Diff #77663)

A sentence explaining what we are testing would be nice. Same for the next function.

This revision is now accepted and ready to land. Nov 11 2016, 1:50 PM
gberry marked an inline comment as done. Nov 14 2016, 11:48 AM

Thanks. I owe you two follow-up changes, which should be coming shortly.

lib/Target/AArch64/AArch64ISelLowering.cpp
8817 (On Diff #77663)

I'll fix this in a quick follow-up change, since it involves changing an NFC refactor that this change depends on.

8823 (On Diff #77663)

Catching the v3i32 and v2i32 cases is a little more complicated because of legalization, but I can handle those in a follow-up change.

This revision was automatically updated to reflect the committed changes.