Page MenuHomePhabricator

[RISCV] Scheduler description for Rocket Core
Needs ReviewPublic

Authored by compilerguy on Oct 8 2019, 10:22 PM.

Details

Summary

[RISCV] Pipeline scheduler model for RISCV Rocket micro-architecture using the MIScheduler interface. Support for 32 and 64-bit Rocket cores is implemented.

Diff Detail

Event Timeline

compilerguy created this revision.Oct 8 2019, 10:22 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 8 2019, 10:22 PM
khchen added a subscriber: khchen.Oct 8 2019, 10:27 PM
kito-cheng added inline comments.Oct 8 2019, 11:04 PM
llvm/lib/Target/RISCV/RISCVInstrInfoM.td
35

Predicates are removed by accident?

A few nits

llvm/lib/Target/RISCV/RISCVSchedRocket32.td
2

This should match the filename.

10

Hmmmm :)

llvm/lib/Target/RISCV/RISCVSchedRocket64.td
2

This should match the filename.

10

Hmmmm :)

luismarques added inline comments.
llvm/lib/Target/RISCV/RISCVSchedRocket32.td
80

typo

@javedabsar (or @javed.absar) I seem to recall you have experience with schedulers. If you could give us a hand here that'd be great! :)

@javedabsar (or @javed.absar) I seem to recall you have experience with schedulers. If you could give us a hand here that'd be great! :)

Sure no problem Roger :)

Could you please point me to some doc which describes the pipeline model of RISCVRocket64 - i.e. what kind of processing units are available, how each instruction flows through the pipeline (fully pipelined or partially, latencies, resource dependences)?

That would be my starting point to match against the schedules defined in schedule*.td.

Is there a forwarding mechanism in Rocket? If so, that can be added to make the model more accurate.

llvm/lib/Target/RISCV/RISCVSchedRocket32.td
74

I guess its input dependent. Probably better to take average case 17.
Is 'Rocket32UnitIMul]' intentional (i.e. Div using Mul instead of UnitDiv) ?
If DIV is not pipelined then please put ResourceCycle = ... , as well.

llvm/lib/Target/RISCV/RISCVSchedRocket64.td
2

What are the main differences in terms of schedule between Rocket63 and . Looks like to me that a lot of code can be factored out into a common td

93

You could use let Latency = 2 in { ... } to avoid repeating

Hi Michael:

By the way, thanks for the link - https://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/ . But the waybacklink doesnt work for me. Is there an alternate way to get the arch doc. Thanks

apazos added inline comments.Oct 16 2019, 5:32 PM
llvm/lib/Target/RISCV/RISCVSchedRocket32.td
17

Can you add to the comments a link to the processor model documentation used in the schedule definitions (latencies, function units types, etc.)?

Have you been able to run utils/schedcover.py on Rocket32Model and Rocket64Model?

Some suggestions inlined.

llvm/lib/Target/RISCV/RISCVInstrInfo.td
339

Specify SchedReadWrite for $rs1.

390

If all BranchCC_rri have the same Sched<> list, it could be specified in BranchCC_rri class.

396

I supposed you should specify each operand with SchedReadWrite type. It will be the interface between instructions and scheduling model. So, I think we should preserve the place holder for input operands. It will look like

def LB : Load_ri<0b000, "lb">, Sched<[WriteLDB, ReadLDB]>;

All the following instructions should specify the SchedReadWrite types for each operand, no matter output operand or input operand.

llvm/lib/Target/RISCV/RISCVSchedule.td
106

They are the only two SchedRead definitions. However, there is no place using them. I think you should specify more SchedRead types and associate these SchedRead to input operands in instruction definitions.

If you have let UnsupportedFeatures = [IsRV32] (or similarly for RV64) in your model definition, you can avoid needing to list a bunch of scheduler resources with let Unsupported = 1;.

jrtc27 added a comment.EditedOct 18 2019, 6:19 AM

I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):

def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;

Do we need separate schedule information for compressed and uncompressed instructions? I would assume that any sensible RVC implementation would decode the two to exactly the same control logic (other than for PC incrementing and the original instruction for mtval).

I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):

def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;

In general, InstRW is used to describe subtarget-specific scheduling behavior for specific instructions. If we use InstRW to describe the whole pipeline model, we need to create the mappings between scheduling types (SchedReadWrite) to instructions for every subtarget. It is a tedious work when adding a new subtarget for the target. I think it is more reasonable to associate SchedReadWrite list to instructions for target and specify latencies and cycles to these SchedReadWrite in subtarget. We only need to associate SchedReadWrite list to instructions only once and specify latencies and cycles for every subtarget. It is more modular and it is easier for subtarget.

I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):

def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;

In general, InstRW is used to describe subtarget-specific scheduling behavior for specific instructions. If we use InstRW to describe the whole pipeline model, we need to create the mappings between scheduling types (SchedReadWrite) to instructions for every subtarget. It is a tedious work when adding a new subtarget for the target. I think it is more reasonable to associate SchedReadWrite list to instructions for target and specify latencies and cycles to these SchedReadWrite in subtarget. We only need to associate SchedReadWrite list to instructions only once and specify latencies and cycles for every subtarget. It is more modular and it is easier for subtarget.

I too think that defining SchedReadWrite and associating them with Instruction classes is better (rather than wholly relying on InstRW). In fact there was a period I rewrote large part of ARM schedule.tds to move away from InstRW approach.

Having said that, it looks like we might be defining too many SchedReadWrite categories to differentiate instruction types that may actually have identical latency and resource numbers.

Addressed review comments.

Hi Michael:

By the way, thanks for the link - https://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/ . But the waybacklink doesnt work for me. Is there an alternate way to get the arch doc. Thanks

Hi @javed.absar,

Thanks for your review and comments. You could find the pipeline model in http://www-inst.eecs.berkeley.edu/~cs250/fa13/handouts/lab2-riscv.pdf, page 13. In addition, SiFive U54 is based on Rocket microarchitecture, so you could also refer to https://sifive.cdn.prismic.io/sifive%2Fac48c4fd-af85-46be-9dd9-22aa34ba6977_u54mc-core-complex-manual-v19.05.pdf for pipeline information. I know the information is limited in the documents. If there is any question that you can not find answer from documents, welcome to raise your questions. Thanks.

Do we need separate schedule information for compressed and uncompressed instructions? I would assume that any sensible RVC implementation would decode the two to exactly the same control logic (other than for PC incrementing and the original instruction for mtval).

We merged the schedule information for c-ext instructions. Thanks.

HsiangKai added inline comments.Wed, Nov 27, 10:49 PM
llvm/lib/Target/RISCV/RISCVSchedRocket64.td
2

We also think that it should be factored out for the common part. However, we need to provide SchedMachineModel attribute to WriteRes and ReadAdvance and the model is different for Rocket32 and Rocket64. We have no idea how to factor out the common parts. Do you have any suggestions?

khchen added a comment.Fri, Dec 6, 9:01 AM

BTW, I used this patch and D71124 to run the CoreMark on HiFive unleashed with -O3 -mllvm -enable-misched -mllvm -enable-post-misched -mcpu=rocket-rv64,
the result is better than -O3 -mllvm -enable-misched -mllvm -enable-post-misched and get the performance 3.5% speedup (use ld.bfd and libgcc).