SiFive7's scalar execution consists of 4 stages AG, M1, M2, WB.
Most simple arithmetic and branch instructions can execute in
either AG or M2.
If the operands are ready, the instruction will execute in the AG
stage. Otherwise, it executes in the M2 stage. Everything is fully
bypassed, so dependent instructions should only see 1 cycle latency.
This patch adds ReadAdvances to pretend that these instructions
execute in the M2 ALU and reads their operands then. This allows
the scheduler to schedule dependent instructions back to back.
I've increased branch latency to 3 since they are also executed in both
stages. Still need to fix JALR, but I want to cleanup some scheduler
classes first.
Multiply, cpop and division instructions can only start in the AG stage.
Still need to do some work for FP instructions that produce integer results.
I've added an llvm-mca test that creates a long dependency chain.
The timeline view can show that things are bypassed. I
didn't check all permutations, but we have some variety.