This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Improve single vector lane stores
ClosedPublic

Authored by evandro on May 9 2018, 11:48 AM.

Details

Summary

When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.

Diff Detail

Event Timeline

evandro created this revision.May 9 2018, 11:48 AM
evandro updated this revision to Diff 145977.May 9 2018, 11:50 AM
sebpop added a subscriber: sebpop.May 9 2018, 11:58 AM

This fixes a perf regression we were seeing with generation of vectors smaller than 64 bit: https://reviews.llvm.org/D45821

efriedma added inline comments.
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2385

Do we also need patterns for STURB and friends?

evandro added inline comments.May 9 2018, 2:50 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2385

I came across some code indicating that we do. But I'm working on that in a different patch, testing pending.

efriedma added inline comments.May 9 2018, 3:10 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2385

Okay; please add some test coverage for the offsets we current do/don't handle.

evandro updated this revision to Diff 146021.May 9 2018, 4:02 PM
evandro updated this revision to Diff 146022.
evandro marked an inline comment as done.
efriedma added inline comments.May 9 2018, 5:10 PM
llvm/test/CodeGen/AArch64/arm64-st1.ll
51

Please don't use CHECK-NOT like this; it isn't clear what we're actually generating.

evandro marked an inline comment as done.May 10 2018, 7:47 AM
evandro updated this revision to Diff 146126.May 10 2018, 7:51 AM
efriedma added inline comments.May 10 2018, 10:52 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2256

The VecROStoreLane0Pat<ro8, store, v16i8, VecROStoreLane0Pat<ro16, store, v8i16, and VecROStoreLane0Pat<ro32, truncstorei32, v4i32 patterns will never match. And there's a missing truncstorei8 pattern.

evandro added inline comments.May 10 2018, 11:16 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2256

True, except for VecROStoreLane0Pat<ro8, ..., since FPR8, used by STRBro{W,X}, can only contain untyped.

evandro added inline comments.May 10 2018, 12:05 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2256

Or do you mean like so?

` defm : VecStoreULane0Pat<truncstorei8, v8i16, i32, hsub, STURBi>;

defm : VecStoreULane0Pat<truncstorei16, v8i16, i32, hsub, STURHi>;
defm : VecStoreULane0Pat<store,         v8f16, f16, hsub, STURHi>;
defm : VecStoreULane0Pat<truncstorei8,  v4i32, i32, ssub, STURBi>;
defm : VecStoreULane0Pat<truncstorei16, v4i32, i32, ssub, STURHi>;
defm : VecStoreULane0Pat<truncstorei32, v4i32, i32, ssub, STURSi>;`
efriedma added inline comments.May 11 2018, 12:36 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2256

I'm not sure where you're going with your suggestion.

Due to legalization, the result of a vector_extract can't be i8 or i16; the patterns need to account for that.

evandro added inline comments.May 11 2018, 1:41 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.td
2256

I'll update this patch and then, please, let me know if I'm following what you mean.

evandro updated this revision to Diff 146404.May 11 2018, 1:44 PM
efriedma accepted this revision.May 11 2018, 4:09 PM

LGTM. (We should also add the i8 patterns at some point, but we can do that as a followup.)

This revision is now accepted and ready to land.May 11 2018, 4:09 PM

LGTM. (We should also add the i8 patterns at some point, but we can do that as a followup.)

Trying to add i8 hasn't been as straightforward. It seems to be hinged on FPR8 holding only untyped, but the approach that I'm trying would not change that. I'll keep you posted.

Thank you.

This revision was automatically updated to reflect the committed changes.