This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Scheduler description for Rocket Core
ClosedPublic

Authored by HsiangKai on Oct 8 2019, 10:22 PM.

Details

Summary

[RISCV] Pipeline scheduler model for RISCV Rocket micro-architecture using the MIScheduler interface. Support for 32 and 64-bit Rocket cores is implemented.

Diff Detail

Event Timeline

compilerguy created this revision.Oct 8 2019, 10:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2019, 10:22 PM
khchen added a subscriber: khchen.Oct 8 2019, 10:27 PM
kito-cheng added inline comments.Oct 8 2019, 11:04 PM
llvm/lib/Target/RISCV/RISCVInstrInfoM.td
35

Predicates are removed by accident?

A few nits

llvm/lib/Target/RISCV/RISCVSchedRocket32.td
10

Hmmmm :)

llvm/lib/Target/RISCV/RISCVSchedRocket64.td
10

Hmmmm :)

lenary added inline comments.Oct 10 2019, 6:46 AM
llvm/lib/Target/RISCV/RISCVSchedRocket32.td
2

This should match the filename.

llvm/lib/Target/RISCV/RISCVSchedRocket64.td
2

This should match the filename.

luismarques added inline comments.
llvm/lib/Target/RISCV/RISCVSchedRocket32.td
80

typo

@javedabsar (or @javed.absar) I seem to recall you have experience with schedulers. If you could give us a hand here that'd be great! :)

@javedabsar (or @javed.absar) I seem to recall you have experience with schedulers. If you could give us a hand here that'd be great! :)

Sure no problem Roger :)

Could you please point me to some doc which describes the pipeline model of RISCVRocket64 - i.e. what kind of processing units are available, how each instruction flows through the pipeline (fully pipelined or partially, latencies, resource dependences)?

That would be my starting point to match against the schedules defined in schedule*.td.

Is there a forwarding mechanism in Rocket? If so, that can be added to make the model more accurate.

llvm/lib/Target/RISCV/RISCVSchedRocket32.td
74

I guess its input dependent. Probably better to take average case 17.
Is 'Rocket32UnitIMul]' intentional (i.e. Div using Mul instead of UnitDiv) ?
If DIV is not pipelined then please put ResourceCycle = ... , as well.

llvm/lib/Target/RISCV/RISCVSchedRocket64.td
2

What are the main differences in terms of schedule between Rocket63 and . Looks like to me that a lot of code can be factored out into a common td

93

You could use let Latency = 2 in { ... } to avoid repeating

Hi Michael:

By the way, thanks for the link - https://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/ . But the waybacklink doesnt work for me. Is there an alternate way to get the arch doc. Thanks

apazos added inline comments.Oct 16 2019, 5:32 PM
llvm/lib/Target/RISCV/RISCVSchedRocket32.td
17

Can you add to the comments a link to the processor model documentation used in the schedule definitions (latencies, function units types, etc.)?

Have you been able to run utils/schedcover.py on Rocket32Model and Rocket64Model?

Some suggestions inlined.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
342

Specify SchedReadWrite for $rs1.

400

If all BranchCC_rri have the same Sched<> list, it could be specified in BranchCC_rri class.

401

I supposed you should specify each operand with SchedReadWrite type. It will be the interface between instructions and scheduling model. So, I think we should preserve the place holder for input operands. It will look like

def LB : Load_ri<0b000, "lb">, Sched<[WriteLDB, ReadLDB]>;

All the following instructions should specify the SchedReadWrite types for each operand, no matter output operand or input operand.

llvm/lib/Target/RISCV/RISCVSchedule.td
106

They are the only two SchedRead definitions. However, there is no place using them. I think you should specify more SchedRead types and associate these SchedRead to input operands in instruction definitions.

If you have let UnsupportedFeatures = [IsRV32] (or similarly for RV64) in your model definition, you can avoid needing to list a bunch of scheduler resources with let Unsupported = 1;.

jrtc27 added a comment.EditedOct 18 2019, 6:19 AM

I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):

def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;

Do we need separate schedule information for compressed and uncompressed instructions? I would assume that any sensible RVC implementation would decode the two to exactly the same control logic (other than for PC incrementing and the original instruction for mtval).

I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):

def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;

In general, InstRW is used to describe subtarget-specific scheduling behavior for specific instructions. If we use InstRW to describe the whole pipeline model, we need to create the mappings between scheduling types (SchedReadWrite) to instructions for every subtarget. It is a tedious work when adding a new subtarget for the target. I think it is more reasonable to associate SchedReadWrite list to instructions for target and specify latencies and cycles to these SchedReadWrite in subtarget. We only need to associate SchedReadWrite list to instructions only once and specify latencies and cycles for every subtarget. It is more modular and it is easier for subtarget.

I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):

def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;

In general, InstRW is used to describe subtarget-specific scheduling behavior for specific instructions. If we use InstRW to describe the whole pipeline model, we need to create the mappings between scheduling types (SchedReadWrite) to instructions for every subtarget. It is a tedious work when adding a new subtarget for the target. I think it is more reasonable to associate SchedReadWrite list to instructions for target and specify latencies and cycles to these SchedReadWrite in subtarget. We only need to associate SchedReadWrite list to instructions only once and specify latencies and cycles for every subtarget. It is more modular and it is easier for subtarget.

I too think that defining SchedReadWrite and associating them with Instruction classes is better (rather than wholly relying on InstRW). In fact there was a period I rewrote large part of ARM schedule.tds to move away from InstRW approach.

Having said that, it looks like we might be defining too many SchedReadWrite categories to differentiate instruction types that may actually have identical latency and resource numbers.

Addressed review comments.

Hi Michael:

By the way, thanks for the link - https://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/ . But the waybacklink doesnt work for me. Is there an alternate way to get the arch doc. Thanks

Hi @javed.absar,

Thanks for your review and comments. You could find the pipeline model in http://www-inst.eecs.berkeley.edu/~cs250/fa13/handouts/lab2-riscv.pdf, page 13. In addition, SiFive U54 is based on Rocket microarchitecture, so you could also refer to https://sifive.cdn.prismic.io/sifive%2Fac48c4fd-af85-46be-9dd9-22aa34ba6977_u54mc-core-complex-manual-v19.05.pdf for pipeline information. I know the information is limited in the documents. If there is any question that you can not find answer from documents, welcome to raise your questions. Thanks.

Do we need separate schedule information for compressed and uncompressed instructions? I would assume that any sensible RVC implementation would decode the two to exactly the same control logic (other than for PC incrementing and the original instruction for mtval).

We merged the schedule information for c-ext instructions. Thanks.

HsiangKai added inline comments.Nov 27 2019, 10:49 PM
llvm/lib/Target/RISCV/RISCVSchedRocket64.td
2

We also think that it should be factored out for the common part. However, we need to provide SchedMachineModel attribute to WriteRes and ReadAdvance and the model is different for Rocket32 and Rocket64. We have no idea how to factor out the common parts. Do you have any suggestions?

khchen added a comment.Dec 6 2019, 9:01 AM

BTW, I used this patch and D71124 to run the CoreMark on HiFive unleashed with -O3 -mllvm -enable-misched -mllvm -enable-post-misched -mcpu=rocket-rv64,
the result is better than -O3 -mllvm -enable-misched -mllvm -enable-post-misched and get the performance 3.5% speedup (use ld.bfd and libgcc).

Jim added a comment.Dec 9 2019, 1:28 AM

A few nits.
LLVM prefers spaces to tabs.

llvm/lib/Target/RISCV/RISCV.td
99

Blank line should not be deleted.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
450

indentation with tabs.

456

indentation with tabs.

468

indentation with tabs.

478

indentation with tabs.

947

indentation with tabs.

952

indentation with tabs.

967

indentation with tabs.

HsiangKai commandeered this revision.Dec 9 2019, 8:21 PM
HsiangKai updated this revision to Diff 232993.
HsiangKai edited reviewers, added: compilerguy; removed: HsiangKai.

Fix format typo.

Overall, it looks nearly good enough. I don't know how the scheduler will behave is interesting cases when there is no ShedRead for each argument in the ins list of the instructions. However, it shouldn't be too difficult to add them in the instruction classes.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
400

Agreed.

llvm/lib/Target/RISCV/RISCVSchedRocket32.td
32

You might consider setting BufferSize = 1 for this non-pipelined units.

37

You might consider setting BufferSize = 1 for this non-pipelined units.

61

Typically, for most values, the latency is near the worst case. I suggest using the worst case here. And, as @javed.absar said, it also needs ResourceCycle = <worst case>; here as well.

llvm/lib/Target/RISCV/RISCVSchedRocket64.td
31

You might consider setting BufferSize = 1 for this non-pipelined units.

36

You might consider setting BufferSize = 1 for this non-pipelined units.

60

Again, I suggest using the worst case here and below.

101

Per the comment above, is it 5 or 6?

126

Outline the Latency to avoid repetitions.

129

Ditto.

llvm/lib/Target/RISCV/RISCVSchedule.td
23

I don't think that nop needs a scheduling class.

28
s/WriteLDDW/WriteLDD/
33
s/WriteLSTDW/WriteSTD/
37
s/WriteAtomicLDDW/WriteAtomicLDD/
39
s/WriteAtomicSTDW/WriteAtomicSTD/
58

Swap this line with the one above it.

84
s/WriteFLd32/WriteFLD32/
85
s/WriteFLd64/WriteFLD64/
86
s/WriteFSt32s/WriteFST32/
87
s/WriteFSt64/WriteFST64/
evandro added inline comments.Jan 13 2020, 2:02 PM
llvm/lib/Target/RISCV/RISCVSchedule.td
23

Oh, never mind me. A SchedWrite is necessary to capture its µop, assuming it's not subsumed by the fetch engine.

shiva0217 added inline comments.Jan 13 2020, 10:16 PM
llvm/lib/Target/RISCV/RISCVSchedule.td
106

Will the SchedRead for input operands be added, so they could be used by ReadAdvance to describe forwarding rules?

HsiangKai marked an inline comment as done.Jan 14 2020, 9:08 PM
HsiangKai added inline comments.
llvm/lib/Target/RISCV/RISCVSchedule.td
106

I also think so. I will add SchedRead for input operands.

Address Evandro's comments.

lenary added inline comments.Jan 15 2020, 1:10 AM
llvm/lib/Target/RISCV/RISCVSchedule.td
106

I presume this patch is not yet ready to land because it is still missing SchedRead for most input operands? Do you expect to have those ready before the LLVM 10.0 branch?

HsiangKai marked an inline comment as done.Jan 15 2020, 6:53 AM
HsiangKai added inline comments.
llvm/lib/Target/RISCV/RISCVSchedule.td
106

I will add SchedRead for input operands for these two days.

HsiangKai added reviewers: evandro, lenary.

Add SchedRead for input operands.

Except for a handful of gnats, this is really close to LGTM.

  • The stores also need a SchedRead for register being stored, besides one for the base address register.
  • nop, both I and C, should have a ShedWrite to capture their µop, as some µarchitectures might subsume the µop in the front end.
  • The Pseudo class definition in RISCVInstrFormats.td should set the SchedRW to an empty list [] so that you wouldn't have to include any SchedReadWrite class for pseudo instructions.
llvm/lib/Target/RISCV/RISCVInstrInfo.td
403–405

Stores also read the base address in a register.

498

Read base address in register.

llvm/lib/Target/RISCV/RISCVInstrInfoC.td
330

Again, the base address is in an input register.

364

This should use the same SchedReadWrite class as for the I nop.

533

Input register needs a SchedRead.

592

Same ShedReadWrite class as for the I nop.

llvm/lib/Target/RISCV/RISCVInstrInfoD.td
82

SchedReads missing.

llvm/lib/Target/RISCV/RISCVSchedule.td
127

A single SchedRead for the base address register should suffice for RV32 and RV64 loads and stores.

HsiangKai updated this revision to Diff 238691.Jan 16 2020, 8:32 PM

Address Evandro's comments.

It LGTM, but let's wait till tomorrow, when the patch will be 3 business days old, for any input.

evandro accepted this revision.Jan 21 2020, 1:27 PM

Going thrice!

This revision is now accepted and ready to land.Jan 21 2020, 1:27 PM
lenary accepted this revision.Jan 21 2020, 1:39 PM

I am happy with the changes, though unfamiliar with Schedulers specifically (except what I have tried to learn when reading this patch).

I am happy that the reviews have been thorough enough for this patch to land.

This revision was automatically updated to reflect the committed changes.