This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Implement SchedModel for Power7
ClosedPublic

Authored by qiucf on Aug 23 2023, 10:12 PM.

Details

Reviewers
nemanjai
shchenz
stefanp
Group Reviewers
Restricted Project
Commits
rG69b056d5638b: [PowerPC] Implement SchedModel for Power7
Summary

This patch implements the resource-based SchedModel for the Power7 CPU. Only the autogenerated cases are updated in this revision.

Diff Detail

Event Timeline

qiucf created this revision. · Aug 23 2023, 10:12 PM
Herald added a project: Restricted Project. · Aug 23 2023, 10:12 PM
qiucf requested review of this revision. · Aug 23 2023, 10:12 PM
qiucf updated this revision to Diff 555652. · Sep 3 2023, 10:26 PM
qiucf added reviewers: Restricted Project, nemanjai, shchenz, stefanp.
qiucf updated this revision to Diff 556106. · Sep 6 2023, 10:23 PM
qiucf updated this revision to Diff 556250. · Sep 8 2023, 6:27 AM
shchenz added inline comments. · Sep 12 2023, 7:34 PM
llvm/lib/Target/PowerPC/PPCScheduleP7.td
33

What rules do we follow when adding unsupported features here? I believe we are missing lots of CPU features defined in PPC.td.
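For readers unfamiliar with the mechanism being discussed, the following is a minimal TableGen sketch of where such a list lives. The record name `P7UnsupportedSketch` is hypothetical, and the list is deliberately incomplete; `IsISA2_07` is the one predicate named later in this thread. It is not the content of the landed PPCScheduleP7.td, and the fragment is meant to sit inside the PowerPC backend's .td files, where the predicate is defined.

```tablegen
// Hypothetical sketch; the record name and the list contents are
// illustrative assumptions, not the landed PPCScheduleP7.td.
def P7UnsupportedSketch : SchedMachineModel {
  // Ask TableGen to check that instructions have scheduling information.
  let CompleteModel = 1;

  // Instructions guarded by a predicate in this list are treated as
  // unsupported on this CPU, so the model does not need to describe them.
  // A real list would name every feature the CPU lacks.
  let UnsupportedFeatures = [IsISA2_07];
}
```

The trade-off raised in the comment is that a short list leaves more instructions for the model to cover, while a long list has to be kept in sync with the features defined in PPC.td.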

51

I checked P7_bookIV version 2.1; section 4.1.3 says there is only one VMX unit:

One VMX execution unit capable of executing simple FX, complex FX, permute and 4-way SIMD single-precision FP ops

• Twelve execution units
– Two symmetric load/store units (LSU), also capable of executing simple fixed-point ops
– Two symmetric fixed-point units (FXU)
– Four floating-point units (FPU), implemented as two 2-way SIMD operations for double- and single-precision. Scalar binary floating point instructions can only use two FPUs.
– One VMX execution unit capable of executing simple FX, complex FX, permute and 4-way SIMD single-precision FP ops
– One decimal floating-point unit (DFU)
– 1 Branch execution unit (BR)
– 1 CR Logical execution unit (CRL)
60

Please use meaningful names.

64
Out of order issue of up to 8 operations into the following 8 issue ports
– Two load or store operations
– Two fixed-point operations
– Two issue ports shared by two floating-point, two VSX, two VMX and one DFP ops
– One branch operation
– One condition register operation
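As a rough illustration of how the issue ports quoted above could be written down in TableGen, here is a minimal sketch. All record names (`P7PortsSketch`, `P7Sketch_*`), the grouping into resources, and the latency value are hypothetical and are not taken from the landed PPCScheduleP7.td. Only `SchedMachineModel`, `ProcResource`, and `SchedWriteRes` are the generic classes from llvm/Target/TargetSchedule.td.

```tablegen
// Hypothetical sketch; names and numbers are illustrative assumptions,
// not the contents of the landed PPCScheduleP7.td.
def P7PortsSketch : SchedMachineModel {
  let CompleteModel = 0;  // this sketch does not describe every instruction
}

let SchedModel = P7PortsSketch in {
  // One ProcResource per group of issue ports from the quoted list;
  // the template argument is the number of identical units in the group.
  def P7Sketch_LS : ProcResource<2>;  // two load or store operations
  def P7Sketch_FX : ProcResource<2>;  // two fixed-point operations
  def P7Sketch_VS : ProcResource<2>;  // two ports shared by FP, VSX, VMX and DFP ops
  def P7Sketch_BR : ProcResource<1>;  // one branch operation
  def P7Sketch_CR : ProcResource<1>;  // one condition register operation

  // A write that occupies one fixed-point port.
  def P7Sketch_FX_1C : SchedWriteRes<[P7Sketch_FX]> {
    let Latency = 1;  // placeholder value, assumed for illustration only
  }
}
```

Grouping the ports this way gives 2 + 2 + 2 + 1 + 1 = 8 units, matching the "8 issue ports" in the quoted text. Since VMX has a single execution unit behind the two shared ports, a real model would need an extra constraint for VMX operations that this sketch omits.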
218

Hmm, in the version above, I see latency info for VSX scalar instructions.

qiucf updated this revision to Diff 556622. · Sep 12 2023, 9:25 PM
qiucf marked 4 inline comments as done.
  • Update VSX instructions according to the new version of the UM
  • Rename issue ports
  • Add IsISA2_07 to unsupported features
  • VMX has only one port
shchenz accepted this revision as: shchenz. · Sep 12 2023, 10:16 PM

LGTM with a small fix in one LIT.

I assume that: 1. you fixed all the LIT failures except the auto-generated cases (the summary of the patch needs an update); 2. you checked that the performance impact of this patch is positive for -mcpu=pwr7.

llvm/lib/Target/PowerPC/PPCScheduleP7.td
33

Hmm, OK, let's use this setting for now. I think we may need a similar approach in SystemZ arch. That's clearer and more robust.

llvm/test/CodeGen/PowerPC/aix-vector-stack-caller.ll
85

Fix the broken case.

This revision is now accepted and ready to land. · Sep 12 2023, 10:16 PM
qiucf marked an inline comment as done. · Sep 12 2023, 11:52 PM

> LGTM with a small fix in one LIT.
>
> I assume that: 1. you fixed all the LIT failures except the auto-generated cases (the summary of the patch needs an update); 2. you checked that the performance impact of this patch is positive for -mcpu=pwr7.

Sure.

This revision was landed with ongoing or failed builds. · Sep 13 2023, 12:02 AM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/PowerPC/vec_cmpd_p7.ll