mflr is kind of expensive on Power version smaller than 10, so we should schedule the store for the mflr's def away from mflr.
In epilogue, the expensive mtlr has no user for its def, so it doesn't matter that the load and the mtlr are back-to-back.
Paths
| Differential D137423
[PowerPC] make expensive mflr be away from its user in the function prologue ClosedPublic Authored by shchenz on Nov 4 2022, 7:16 AM.
Details
Summary mflr is kind of expensive on Power version smaller than 10, so we should schedule the store for the mflr's def away from mflr. In epilogue, the expensive mtlr has no user for its def, so it doesn't matter that the load and the mtlr are back-to-back.
Diff Detail
Event Timeline
shchenz retitled this revision from [PowerPC][WIP] make expensive mflr be awfy from its user in the function prologue to [PowerPC] make expensive mflr be away from its user in the function prologue. Comment Actionsfix all LIT failures shchenz added a parent revision: D137612: [PowerPC] add a new subtarget feature FastMFLR.Nov 7 2022, 10:47 PM shchenz marked an inline comment as done. shchenz added inline comments.
shchenz marked an inline comment as done. This revision is now accepted and ready to land.Nov 14 2022, 10:05 AM This revision was landed with ongoing or failed builds.Nov 14 2022, 6:14 PM Closed by commit rGeb7d16ea2564: [PowerPC] make expensive mflr be away from its user in the function prologue (authored by shchenz). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 475324 llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
llvm/test/CodeGen/PowerPC/2007-09-08-unaligned.ll
llvm/test/CodeGen/PowerPC/2007-11-16-landingpad-split.ll
llvm/test/CodeGen/PowerPC/2008-10-28-f128-i32.ll
llvm/test/CodeGen/PowerPC/2010-05-03-retaddr1.ll
llvm/test/CodeGen/PowerPC/CSR-fit.ll
llvm/test/CodeGen/PowerPC/Frames-dyn-alloca-with-func-call.ll
llvm/test/CodeGen/PowerPC/MCSE-caller-preserved-reg.ll
llvm/test/CodeGen/PowerPC/P10-stack-alignment.ll
llvm/test/CodeGen/PowerPC/PR35812-neg-cmpxchg.ll
llvm/test/CodeGen/PowerPC/aix-cc-abi.ll
llvm/test/CodeGen/PowerPC/aix-cc-byval-mem.ll
llvm/test/CodeGen/PowerPC/aix-cc-byval.ll
llvm/test/CodeGen/PowerPC/aix-cc-ext-vec-abi.ll
llvm/test/CodeGen/PowerPC/aix-crspill.ll
llvm/test/CodeGen/PowerPC/aix-csr.ll
llvm/test/CodeGen/PowerPC/aix-emit-tracebacktable.ll
llvm/test/CodeGen/PowerPC/aix-framepointer-save-restore.ll
llvm/test/CodeGen/PowerPC/aix-llvm-intrinsic.ll
llvm/test/CodeGen/PowerPC/aix-lr.ll
llvm/test/CodeGen/PowerPC/aix-sret-param.ll
llvm/test/CodeGen/PowerPC/aix-tls-gd-double.ll
llvm/test/CodeGen/PowerPC/aix-tls-gd-int.ll
llvm/test/CodeGen/PowerPC/aix-tls-gd-longlong.ll
llvm/test/CodeGen/PowerPC/aix-tls-xcoff-reloc-large.ll
llvm/test/CodeGen/PowerPC/aix-tls-xcoff-reloc.ll
llvm/test/CodeGen/PowerPC/aix-user-defined-memcpy.ll
llvm/test/CodeGen/PowerPC/aix-vec-arg-spills.ll
llvm/test/CodeGen/PowerPC/aix-vector-stack-caller.ll
llvm/test/CodeGen/PowerPC/aix-xcoff-reloc.ll
llvm/test/CodeGen/PowerPC/aix-xcoff-symbol-rename.ll
llvm/test/CodeGen/PowerPC/all-atomics.ll
llvm/test/CodeGen/PowerPC/alloca-crspill.ll
llvm/test/CodeGen/PowerPC/atomics-i128-ldst.ll
llvm/test/CodeGen/PowerPC/atomics-i128.ll
llvm/test/CodeGen/PowerPC/atomics-indexed.ll
llvm/test/CodeGen/PowerPC/atomics.ll
llvm/test/CodeGen/PowerPC/branch-on-store-cond.ll
llvm/test/CodeGen/PowerPC/byval.ll
llvm/test/CodeGen/PowerPC/canonical-merge-shuffles.ll
llvm/test/CodeGen/PowerPC/constant-pool.ll
llvm/test/CodeGen/PowerPC/csr-split.ll
llvm/test/CodeGen/PowerPC/ctrloop-constrained-fp.ll
llvm/test/CodeGen/PowerPC/ctrloop-fp128.ll
llvm/test/CodeGen/PowerPC/cxx_tlscc64.ll
llvm/test/CodeGen/PowerPC/disable-ctr-ppcf128.ll
llvm/test/CodeGen/PowerPC/elf64-byval-cc.ll
llvm/test/CodeGen/PowerPC/expand-foldable-isel.ll
llvm/test/CodeGen/PowerPC/f128-aggregates.ll
llvm/test/CodeGen/PowerPC/f128-arith.ll
llvm/test/CodeGen/PowerPC/f128-branch-cond.ll
llvm/test/CodeGen/PowerPC/f128-compare.ll
llvm/test/CodeGen/PowerPC/f128-conv.ll
llvm/test/CodeGen/PowerPC/f128-fma.ll
llvm/test/CodeGen/PowerPC/f128-passByValue.ll
llvm/test/CodeGen/PowerPC/f128-rounding.ll
llvm/test/CodeGen/PowerPC/f128-truncateNconv.ll
llvm/test/CodeGen/PowerPC/fast-isel-branch.ll
llvm/test/CodeGen/PowerPC/float-load-store-pair.ll
llvm/test/CodeGen/PowerPC/fmf-propagation.ll
llvm/test/CodeGen/PowerPC/fminnum.ll
llvm/test/CodeGen/PowerPC/fp-int128-fp-combine.ll
llvm/test/CodeGen/PowerPC/fp-strict-conv-f128.ll
llvm/test/CodeGen/PowerPC/fp-strict-conv-spe.ll
llvm/test/CodeGen/PowerPC/fp-strict-f128.ll
llvm/test/CodeGen/PowerPC/fp-strict-fcmp.ll
llvm/test/CodeGen/PowerPC/fp-strict-round.ll
llvm/test/CodeGen/PowerPC/fp-strict.ll
llvm/test/CodeGen/PowerPC/fp128-bitcast-after-operation.ll
llvm/test/CodeGen/PowerPC/frem.ll
llvm/test/CodeGen/PowerPC/funnel-shift.ll
llvm/test/CodeGen/PowerPC/handle-f16-storage-type.ll
llvm/test/CodeGen/PowerPC/inlineasm-i64-reg.ll
llvm/test/CodeGen/PowerPC/jump-tables-collapse-rotate.ll
llvm/test/CodeGen/PowerPC/larger-than-red-zone.ll
llvm/test/CodeGen/PowerPC/lower-intrinsics-afn-mass_notail.ll
llvm/test/CodeGen/PowerPC/lower-intrinsics-fast-mass_notail.ll
llvm/test/CodeGen/PowerPC/machine-pre.ll
llvm/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll
llvm/test/CodeGen/PowerPC/no-duplicate.ll
llvm/test/CodeGen/PowerPC/not-fixed-frame-object.ll
llvm/test/CodeGen/PowerPC/out-of-range-dform.ll
llvm/test/CodeGen/PowerPC/pcrel_ldst.ll
llvm/test/CodeGen/PowerPC/pow-025-075-intrinsic-scalar-mass-fast.ll
llvm/test/CodeGen/PowerPC/pow-025-075-nointrinsic-scalar-mass-fast.ll
llvm/test/CodeGen/PowerPC/ppc-prologue.ll
llvm/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll
llvm/test/CodeGen/PowerPC/ppc32-nest.ll
llvm/test/CodeGen/PowerPC/ppc64-P9-setb.ll
llvm/test/CodeGen/PowerPC/ppc64-byval-larger-struct.ll
llvm/test/CodeGen/PowerPC/ppc64-byval-multi-store.ll
llvm/test/CodeGen/PowerPC/ppc64-inlineasm-clobber.ll
llvm/test/CodeGen/PowerPC/ppc64-nest.ll
llvm/test/CodeGen/PowerPC/ppc64-notoc-rm-relocation.ll
llvm/test/CodeGen/PowerPC/ppc64-rop-protection-aix.ll
llvm/test/CodeGen/PowerPC/ppc64-rop-protection.ll
llvm/test/CodeGen/PowerPC/ppcf128-constrained-fp-intrinsics.ll
llvm/test/CodeGen/PowerPC/ppcf128-endian.ll
llvm/test/CodeGen/PowerPC/pr33547.ll
llvm/test/CodeGen/PowerPC/pr36292.ll
llvm/test/CodeGen/PowerPC/pr41088.ll
llvm/test/CodeGen/PowerPC/pr43527.ll
llvm/test/CodeGen/PowerPC/pr43976.ll
llvm/test/CodeGen/PowerPC/pr44183.ll
llvm/test/CodeGen/PowerPC/pr45301.ll
llvm/test/CodeGen/PowerPC/pr45432.ll
llvm/test/CodeGen/PowerPC/pr47373.ll
llvm/test/CodeGen/PowerPC/pr48519.ll
llvm/test/CodeGen/PowerPC/pr48527.ll
llvm/test/CodeGen/PowerPC/pr49092.ll
llvm/test/CodeGen/PowerPC/pr55463.ll
llvm/test/CodeGen/PowerPC/pr56469.ll
llvm/test/CodeGen/PowerPC/read-set-flm.ll
llvm/test/CodeGen/PowerPC/recipest.ll
llvm/test/CodeGen/PowerPC/reg-scavenging.ll
llvm/test/CodeGen/PowerPC/remove-redundant-load-imm.ll
llvm/test/CodeGen/PowerPC/retaddr.ll
llvm/test/CodeGen/PowerPC/retaddr2.ll
llvm/test/CodeGen/PowerPC/retaddr_multi_levels.ll
llvm/test/CodeGen/PowerPC/scalar-rounding-ops.ll
llvm/test/CodeGen/PowerPC/sms-cpy-1.ll
llvm/test/CodeGen/PowerPC/sms-phi-1.ll
llvm/test/CodeGen/PowerPC/sms-phi-3.ll
llvm/test/CodeGen/PowerPC/spe.ll
llvm/test/CodeGen/PowerPC/srem-lkk.ll
llvm/test/CodeGen/PowerPC/srem-seteq-illegal-types.ll
llvm/test/CodeGen/PowerPC/stack-restore-with-setjmp.ll
llvm/test/CodeGen/PowerPC/store_fptoi.ll
llvm/test/CodeGen/PowerPC/tailcall-speculatable-callee.ll
llvm/test/CodeGen/PowerPC/testComparesi32gtu.ll
llvm/test/CodeGen/PowerPC/testComparesi32ltu.ll
llvm/test/CodeGen/PowerPC/tocSaveInPrologue.ll
llvm/test/CodeGen/PowerPC/urem-lkk.ll
llvm/test/CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll
llvm/test/CodeGen/PowerPC/vector-reduce-fadd.ll
llvm/test/DebugInfo/XCOFF/explicit-section.ll
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
The cost of an mflr is a characteristic of an implementation, not of the architecture. A future processor might have a slow mflr. I think a new subtarget feature is needed for this.