Depends on D114642
Original review https://reviews.llvm.org/D112201
OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
Differential D117003
[SchedModels][CortexA55] Add ASIMD integer instructions
Closed. Authored by kpdev42 on Jan 11 2022, 3:04 AM.
Event Timeline

Comment
I'm not sure how much I love the predicate matching in the scheduler, as opposed to just matching instruction opcodes. There are quite a few instructions which narrow or enlarge vectors, where register types are misleading. Can you at least move the code so it doesn't look like this is bolted onto the end of the existing schedule :)
This revision is now accepted and ready to land. (Feb 14 2022, 1:06 AM)

This revision was landed with ongoing or failed builds. (Feb 17 2022, 2:43 AM)

Closed by commit rG37fa99eda0f5: [SchedModels][CortexA55] Add ASIMD integer instructions (authored by kpdev42).

This revision was automatically updated to reflect the committed changes.

Comment
Hello. I'm getting a few reports of this making performance worse, especially on Cortex-A510 CPUs. I think that adding the forwarding paths that are present on the A55 but not available on the A510 is causing more hazards, and performance drops significantly in places, because the code is compiled for -mcpu=generic. The A510 generally has higher throughput, but also higher latencies in places. We may need to back out some of these changes, even if it makes the A55 model less precise, at least in the short term. We might need to take the route of not hurting other CPUs, provided the more precise model doesn't help A55 performance much.

Comment
I have partially reverted this in 61b616755aced8ed7afc48ffd152f02194b9d201. I was trying not to undo the whole thing, so I just removed the forwarding paths and some other parts that were making performance worse around the "L" instructions. The rest was honestly making some performance worse too, but some things were better, and the parts removed seemed to be causing much of the change. We probably need to be more careful going forward to benchmark on more CPUs, not just the Cortex-A55. The schedule is used by any -mcpu=generic compile, so even if it makes for a less accurate model of the A55, we may need to strike more of a balance between different CPUs until we have a better option.
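
As a rough illustration of the forwarding-path concern raised in the comments above, here is a minimal Python sketch. It assumes that a forwarding path is modeled as a read advance that shortens the latency the scheduler sees between a producer and its consumer, which is how LLVM scheduling models usually express it. The 4-cycle latency, the 1-cycle forwarding value, and the function names are hypothetical and are not taken from the A55 or A510 models.

```python
def modeled_latency(write_latency: int, read_advance: int = 0) -> int:
    """Latency the scheduler assumes between a producer and its consumer.

    A forwarding path (read advance) lets the consumer read the result early,
    so it shortens the assumed latency; the result never goes below zero.
    """
    return max(write_latency - read_advance, 0)


def stall_cycles(write_latency: int,
                 modeled_read_advance: int,
                 actual_read_advance: int = 0) -> int:
    """Cycles an in-order core would wait if the model is too optimistic.

    `modeled_read_advance` is what the schedule claims; `actual_read_advance`
    is what the hardware running the code really provides.
    """
    modeled = modeled_latency(write_latency, modeled_read_advance)
    actual = modeled_latency(write_latency, actual_read_advance)
    return max(actual - modeled, 0)


# Hypothetical numbers, chosen only to make the effect visible.
WRITE_LATENCY = 4   # cycles until the result is normally available
FORWARDING = 1      # cycles saved by the forwarding path in the model

# Core that has the forwarding path: the model matches, no stall.
print(stall_cycles(WRITE_LATENCY, FORWARDING, actual_read_advance=FORWARDING))

# Core without the forwarding path: the consumer is scheduled too early.
print(stall_cycles(WRITE_LATENCY, FORWARDING, actual_read_advance=0))
```

In this toy model, a schedule tuned for a core with the forwarding path places dependent instructions one cycle closer together than a core without it can sustain, which is the kind of extra hazard the comment describes for -mcpu=generic builds.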
Revision Contents
Diff 409560
llvm/lib/Target/AArch64/AArch64SchedA55.td
llvm/test/Analysis/CostModel/AArch64/vector-select.ll
llvm/test/CodeGen/AArch64/GlobalISel/combine-udiv.ll
llvm/test/CodeGen/AArch64/aarch64-dup-ext.ll
llvm/test/CodeGen/AArch64/active_lane_mask.ll
llvm/test/CodeGen/AArch64/addsub-constant-folding.ll
llvm/test/CodeGen/AArch64/arm64-AdvSIMD-Scalar.ll
llvm/test/CodeGen/AArch64/arm64-fcopysign.ll
llvm/test/CodeGen/AArch64/arm64-sli-sri-opt.ll
llvm/test/CodeGen/AArch64/arm64-subvector-extend.ll
llvm/test/CodeGen/AArch64/arm64-vhadd.ll
llvm/test/CodeGen/AArch64/cmp-select-sign.ll
llvm/test/CodeGen/AArch64/dag-numsignbits.ll
llvm/test/CodeGen/AArch64/div-rem-pair-recomposition-signed.ll
llvm/test/CodeGen/AArch64/div-rem-pair-recomposition-unsigned.ll
llvm/test/CodeGen/AArch64/expand-vector-rot.ll
llvm/test/CodeGen/AArch64/f16-instructions.ll
llvm/test/CodeGen/AArch64/fcopysign.ll
llvm/test/CodeGen/AArch64/fptosi-sat-vector.ll
llvm/test/CodeGen/AArch64/fptoui-sat-vector.ll
llvm/test/CodeGen/AArch64/funnel-shift-rot.ll
llvm/test/CodeGen/AArch64/insert-subvector-res-legalization.ll
llvm/test/CodeGen/AArch64/lowerMUL-newload.ll
llvm/test/CodeGen/AArch64/minmax-of-minmax.ll
llvm/test/CodeGen/AArch64/minmax.ll
llvm/test/CodeGen/AArch64/overeager_mla_fusing.ll
llvm/test/CodeGen/AArch64/ragreedy-local-interval-cost.ll
llvm/test/CodeGen/AArch64/sadd_sat_vec.ll
llvm/test/CodeGen/AArch64/sat-add.ll
llvm/test/CodeGen/AArch64/selectcc-to-shiftand.ll
llvm/test/CodeGen/AArch64/signbit-shift.ll
llvm/test/CodeGen/AArch64/sink-addsub-of-const.ll
llvm/test/CodeGen/AArch64/sinksplat.ll
llvm/test/CodeGen/AArch64/sitofp-fixed-legal.ll
llvm/test/CodeGen/AArch64/srem-seteq-illegal-types.ll
llvm/test/CodeGen/AArch64/srem-seteq-vec-nonsplat.ll
llvm/test/CodeGen/AArch64/srem-seteq-vec-splat.ll
llvm/test/CodeGen/AArch64/ssub_sat_vec.ll
llvm/test/CodeGen/AArch64/sve-fixed-length-int-div.ll
llvm/test/CodeGen/AArch64/sve-fixed-length-int-mulh.ll
llvm/test/CodeGen/AArch64/sve-fixed-length-int-rem.ll
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll
llvm/test/CodeGen/AArch64/sve-vscale-attr.ll
llvm/test/CodeGen/AArch64/uadd_sat_vec.ll
llvm/test/CodeGen/AArch64/urem-seteq-illegal-types.ll
llvm/test/CodeGen/AArch64/urem-seteq-vec-nonsplat.ll
llvm/test/CodeGen/AArch64/urem-seteq-vec-nonzero.ll
llvm/test/CodeGen/AArch64/urem-seteq-vec-splat.ll
llvm/test/CodeGen/AArch64/urem-seteq-vec-tautological.ll
llvm/test/CodeGen/AArch64/usub_sat_vec.ll
llvm/test/CodeGen/AArch64/vec_cttz.ll
llvm/test/CodeGen/AArch64/vec_uaddo.ll
llvm/test/CodeGen/AArch64/vec_umulo.ll
llvm/test/CodeGen/AArch64/vecreduce-add.ll
llvm/test/CodeGen/AArch64/vecreduce-and-legalization.ll
llvm/test/CodeGen/AArch64/vecreduce-fmax-legalization.ll
llvm/test/CodeGen/AArch64/vecreduce-fmin-legalization.ll
llvm/test/CodeGen/AArch64/vector-fcopysign.ll
llvm/test/CodeGen/AArch64/vselect-constants.ll
llvm/test/tools/llvm-mca/AArch64/Cortex/A55-neon-instructions.s
"01" in the dual-issue tables means the instruction must be the first item (slot 0). "10" would be EndGroup, and is mostly limited to certain branches and rets.