When storing the 0th lane of a vector, use a simpler and usually more efficient scalar store instead.
Details
Diff Detail
Event Timeline
This fixes a perf regression we were seeing with generation of vectors smaller than 64 bit: https://reviews.llvm.org/D45821
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2382 | Do we also need patterns for STURB and friends? |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2382 | I came across some code indicating that we do. But I'm working on that in a different patch, testing pending. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2382 | Okay; please add some test coverage for the offsets we current do/don't handle. |
llvm/test/CodeGen/AArch64/arm64-st1.ll | ||
---|---|---|
68 | Please don't use CHECK-NOT like this; it isn't clear what we're actually generating. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2252–2256 | The VecROStoreLane0Pat<ro8, store, v16i8, VecROStoreLane0Pat<ro16, store, v8i16, and VecROStoreLane0Pat<ro32, truncstorei32, v4i32 patterns will never match. And there's a missing truncstorei8 pattern. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2252–2256 | True, except for VecROStoreLane0Pat<ro8, ..., since FPR8, used by STRBro{W,X}, can only contain untyped. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2252–2256 | Or do you mean like so? ` defm : VecStoreULane0Pat<truncstorei8, v8i16, i32, hsub, STURBi>; defm : VecStoreULane0Pat<truncstorei16, v8i16, i32, hsub, STURHi>; defm : VecStoreULane0Pat<store, v8f16, f16, hsub, STURHi>; defm : VecStoreULane0Pat<truncstorei8, v4i32, i32, ssub, STURBi>; defm : VecStoreULane0Pat<truncstorei16, v4i32, i32, ssub, STURHi>; defm : VecStoreULane0Pat<truncstorei32, v4i32, i32, ssub, STURSi>;` |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2252–2256 | I'm not sure where you're going with your suggestion. Due to legalization, the result of a vector_extract can't be i8 or i16; the patterns need to account for that. |
llvm/lib/Target/AArch64/AArch64InstrInfo.td | ||
---|---|---|
2252–2256 | I'll update this patch and then, please, let me know if I'm following what you mean. |
LGTM. (We should also add the i8 patterns at some point, but we can do that as a followup.)
Trying to add i8 hasn't been as straightforward. It seems to be hinged on FPR8 holding only untyped, but the approach that I'm trying would not change that. I'll keep you posted.
Thank you.
The VecROStoreLane0Pat<ro8, store, v16i8, VecROStoreLane0Pat<ro16, store, v8i16, and VecROStoreLane0Pat<ro32, truncstorei32, v4i32 patterns will never match. And there's a missing truncstorei8 pattern.