[RISCV] Pipeline scheduler model for RISCV Rocket micro-architecture using the MIScheduler interface. Support for 32 and 64-bit Rocket cores is implemented.
Details
Diff Detail
Event Timeline
llvm/lib/Target/RISCV/RISCVInstrInfoM.td | ||
---|---|---|
35 | Predicates are removed by accident? |
llvm/lib/Target/RISCV/RISCVSchedRocket32.td | ||
---|---|---|
79 | typo |
@javedabsar (or @javed.absar) I seem to recall you have experience with schedulers. If you could give us a hand here that'd be great! :)
Sure no problem Roger :)
Could you please point me to some doc which describes the pipeline model of RISCVRocket64 - i.e. what kind of processing units are available, how each instruction flows through the pipeline (fully pipelined or partially, latencies, resource dependences)?
That would be my starting point to match against the schedules defined in schedule*.td.
Is there a forwarding mechanism in Rocket? If so, that can be added to make the model more accurate.
llvm/lib/Target/RISCV/RISCVSchedRocket32.td | ||
---|---|---|
73 | I guess its input dependent. Probably better to take average case 17. | |
llvm/lib/Target/RISCV/RISCVSchedRocket64.td | ||
1 | What are the main differences in terms of schedule between Rocket63 and . Looks like to me that a lot of code can be factored out into a common td | |
92 | You could use let Latency = 2 in { ... } to avoid repeating |
Hi Michael:
By the way, thanks for the link - https://www.lowrisc.org/docs/tagged-memory-v0.1/rocket-core/ . But the waybacklink doesnt work for me. Is there an alternate way to get the arch doc. Thanks
llvm/lib/Target/RISCV/RISCVSchedRocket32.td | ||
---|---|---|
16 | Can you add to the comments a link to the processor model documentation used in the schedule definitions (latencies, function units types, etc.)? Have you been able to run utils/schedcover.py on Rocket32Model and Rocket64Model? |
Some suggestions inlined.
llvm/lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
323 | Specify SchedReadWrite for $rs1. | |
374 | If all BranchCC_rri have the same Sched<> list, it could be specified in BranchCC_rri class. | |
380 | I supposed you should specify each operand with SchedReadWrite type. It will be the interface between instructions and scheduling model. So, I think we should preserve the place holder for input operands. It will look like def LB : Load_ri<0b000, "lb">, Sched<[WriteLDB, ReadLDB]>; All the following instructions should specify the SchedReadWrite types for each operand, no matter output operand or input operand. | |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
105 | They are the only two SchedRead definitions. However, there is no place using them. I think you should specify more SchedRead types and associate these SchedRead to input operands in instruction definitions. |
If you have let UnsupportedFeatures = [IsRV32] (or similarly for RV64) in your model definition, you can avoid needing to list a bunch of scheduler resources with let Unsupported = 1;.
I wonder whether, instead of putting all the scheduling resource information as part of the instruction definition, we should be doing something like we do with patterns, ie declaring them separately (either in each RISCVInstrInfoX.td, or in RISCVSchedule.td to keep scheduling completely separate from encoding and codegen). For example (formatting aside):
def : InstRW<[WriteFALU32], (instrs FADD_S, FSUB_S, FSGNJ_S, FSGNJN_S, FSGNJX_S, FMIN_S, FMAX_S)>;
Do we need separate schedule information for compressed and uncompressed instructions? I would assume that any sensible RVC implementation would decode the two to exactly the same control logic (other than for PC incrementing and the original instruction for mtval).
In general, InstRW is used to describe subtarget-specific scheduling behavior for specific instructions. If we use InstRW to describe the whole pipeline model, we need to create the mappings between scheduling types (SchedReadWrite) to instructions for every subtarget. It is a tedious work when adding a new subtarget for the target. I think it is more reasonable to associate SchedReadWrite list to instructions for target and specify latencies and cycles to these SchedReadWrite in subtarget. We only need to associate SchedReadWrite list to instructions only once and specify latencies and cycles for every subtarget. It is more modular and it is easier for subtarget.
I too think that defining SchedReadWrite and associating them with Instruction classes is better (rather than wholly relying on InstRW). In fact there was a period I rewrote large part of ARM schedule.tds to move away from InstRW approach.
Having said that, it looks like we might be defining too many SchedReadWrite categories to differentiate instruction types that may actually have identical latency and resource numbers.
Hi @javed.absar,
Thanks for your review and comments. You could find the pipeline model in http://www-inst.eecs.berkeley.edu/~cs250/fa13/handouts/lab2-riscv.pdf, page 13. In addition, SiFive U54 is based on Rocket microarchitecture, so you could also refer to https://sifive.cdn.prismic.io/sifive%2Fac48c4fd-af85-46be-9dd9-22aa34ba6977_u54mc-core-complex-manual-v19.05.pdf for pipeline information. I know the information is limited in the documents. If there is any question that you can not find answer from documents, welcome to raise your questions. Thanks.
llvm/lib/Target/RISCV/RISCVSchedRocket64.td | ||
---|---|---|
1 | We also think that it should be factored out for the common part. However, we need to provide SchedMachineModel attribute to WriteRes and ReadAdvance and the model is different for Rocket32 and Rocket64. We have no idea how to factor out the common parts. Do you have any suggestions? |
BTW, I used this patch and D71124 to run the CoreMark on HiFive unleashed with -O3 -mllvm -enable-misched -mllvm -enable-post-misched -mcpu=rocket-rv64,
the result is better than -O3 -mllvm -enable-misched -mllvm -enable-post-misched and get the performance 3.5% speedup (use ld.bfd and libgcc).
A few nits.
LLVM prefers spaces to tabs.
llvm/lib/Target/RISCV/RISCV.td | ||
---|---|---|
86 | Blank line should not be deleted. | |
llvm/lib/Target/RISCV/RISCVInstrInfo.td | ||
429 | indentation with tabs. | |
436 | indentation with tabs. | |
449 | indentation with tabs. | |
459 | indentation with tabs. | |
925 | indentation with tabs. | |
930 | indentation with tabs. | |
945 | indentation with tabs. |
Overall, it looks nearly good enough. I don't know how the scheduler will behave is interesting cases when there is no ShedRead for each argument in the ins list of the instructions. However, it shouldn't be too difficult to add them in the instruction classes.
llvm/lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
374 | Agreed. | |
llvm/lib/Target/RISCV/RISCVSchedRocket32.td | ||
32 | You might consider setting BufferSize = 1 for this non-pipelined units. | |
37 | You might consider setting BufferSize = 1 for this non-pipelined units. | |
61 | Typically, for most values, the latency is near the worst case. I suggest using the worst case here. And, as @javed.absar said, it also needs ResourceCycle = <worst case>; here as well. | |
llvm/lib/Target/RISCV/RISCVSchedRocket64.td | ||
31 | You might consider setting BufferSize = 1 for this non-pipelined units. | |
36 | You might consider setting BufferSize = 1 for this non-pipelined units. | |
60 | Again, I suggest using the worst case here and below. | |
101 | Per the comment above, is it 5 or 6? | |
126 | Outline the Latency to avoid repetitions. | |
129 | Ditto. | |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
23 | I don't think that nop needs a scheduling class. | |
28 | s/WriteLDDW/WriteLDD/ | |
33 | s/WriteLSTDW/WriteSTD/ | |
37 | s/WriteAtomicLDDW/WriteAtomicLDD/ | |
39 | s/WriteAtomicSTDW/WriteAtomicSTD/ | |
58 | Swap this line with the one above it. | |
84 | s/WriteFLd32/WriteFLD32/ | |
85 | s/WriteFLd64/WriteFLD64/ | |
86 | s/WriteFSt32s/WriteFST32/ | |
87 | s/WriteFSt64/WriteFST64/ |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
---|---|---|
23 | Oh, never mind me. A SchedWrite is necessary to capture its µop, assuming it's not subsumed by the fetch engine. |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
---|---|---|
105 | Will the SchedRead for input operands be added, so they could be used by ReadAdvance to describe forwarding rules? |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
---|---|---|
105 | I also think so. I will add SchedRead for input operands. |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
---|---|---|
105 | I presume this patch is not yet ready to land because it is still missing SchedRead for most input operands? Do you expect to have those ready before the LLVM 10.0 branch? |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
---|---|---|
105 | I will add SchedRead for input operands for these two days. |
Except for a handful of gnats, this is really close to LGTM.
- The stores also need a SchedRead for register being stored, besides one for the base address register.
- nop, both I and C, should have a ShedWrite to capture their µop, as some µarchitectures might subsume the µop in the front end.
- The Pseudo class definition in RISCVInstrFormats.td should set the SchedRW to an empty list [] so that you wouldn't have to include any SchedReadWrite class for pseudo instructions.
llvm/lib/Target/RISCV/RISCVInstrInfo.td | ||
---|---|---|
382–384 | Stores also read the base address in a register. | |
479 | Read base address in register. | |
llvm/lib/Target/RISCV/RISCVInstrInfoC.td | ||
330 | Again, the base address is in an input register. | |
364 | This should use the same SchedReadWrite class as for the I nop. | |
527 | Input register needs a SchedRead. | |
586 | Same ShedReadWrite class as for the I nop. | |
llvm/lib/Target/RISCV/RISCVInstrInfoD.td | ||
79 | SchedReads missing. | |
llvm/lib/Target/RISCV/RISCVSchedule.td | ||
127 | A single SchedRead for the base address register should suffice for RV32 and RV64 loads and stores. |
It LGTM, but let's wait till tomorrow, when the patch will be 3 business days old, for any input.
I am happy with the changes, though unfamiliar with Schedulers specifically (except what I have tried to learn when reading this patch).
I am happy that the reviews have been thorough enough for this patch to land.
Blank line should not be deleted.