When DFA is not used, resource consumption is now counted from the
"Cycles" value.
ResMII is now calculated by totaling the "Cycles" values for each
resource and dividing that total by the number of units of the
resource. Previously, ResMII was overestimated because each resource
was assumed to be consumed for the number of cycles given by "Latency".
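As a rough illustration of the calculation described above (not the code in this patch), the bound can be sketched in Python. The function name and dictionaries are hypothetical, and rounding each quotient up before taking the maximum is an assumption; the numbers are a few rows copied from the "Modified result" output in this description.

```python
def resource_mii(total_cycles, units):
    """Return a resource-constrained lower bound on II.

    total_cycles: resource name -> "Cycles" summed over all loop instructions
    units:        resource name -> number of available units
    Hypothetical helper for illustration only.
    """
    bound = 1  # II can never be smaller than 1
    for name, consumed in total_cycles.items():
        n = units[name]
        # Integer ceiling of consumed / n (assumed rounding).
        bound = max(bound, (consumed + n - 1) // n)
    return bound


# Rows taken from the resource table printed by --pipeliner-dbg-res.
units = {"ALU": 4, "DISP_NBR": 6, "DIV": 2, "IP_EXEC": 4}
consumed = {"ALU": 5, "DISP_NBR": 15, "DIV": 8, "IP_EXEC": 9}

print(resource_mii(consumed, units))  # 4, limited by DIV (8 cycles / 2 units)
```

With these inputs the sketch agrees with the `res=4` reported in the modified output.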
Resource reservation is modified in the same way. When the value of
"Cycles" is larger than 1, one unit of the resource is considered
consumed in each of that many cycles, starting from the scheduled
cycle. To realize this, ResourceManager maintains a resource table for
all slots. Previously, resource consumption was always 1 unit for
1 cycle, regardless of the value of "Cycles" or "Latency".
In addition, the number of instructions issued per cycle is now
constrained by "IssueWidth".
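The reservation rule and the issue-width limit described above can be sketched as a small modulo reservation table. This is only a sketch under assumptions: the class and method names are hypothetical and do not mirror the actual ResourceManager interface, and folding cycles modulo II is assumed from how the MRT output is indexed by slot.

```python
from collections import Counter


class ModuloReservationTable:
    """Hypothetical stand-in for the per-slot resource table (not the LLVM class)."""

    def __init__(self, ii, units, issue_width):
        self.ii = ii
        self.units = units                          # resource name -> unit count
        self.issue_width = issue_width
        self.used = [Counter() for _ in range(ii)]  # per-slot resource usage
        self.issued = [0] * ii                      # per-slot instruction count

    def _demand(self, cycle, uses):
        # One unit per resource for every cycle it stays busy, folded modulo II.
        demand = Counter()
        for name, cycles in uses.items():
            for offset in range(cycles):
                demand[((cycle + offset) % self.ii, name)] += 1
        return demand

    def can_reserve(self, cycle, uses):
        # The issue slot must have room, and no resource may be oversubscribed.
        if self.issued[cycle % self.ii] >= self.issue_width:
            return False
        return all(self.used[slot][name] + n <= self.units[name]
                   for (slot, name), n in self._demand(cycle, uses).items())

    def reserve(self, cycle, uses):
        if not self.can_reserve(cycle, uses):
            return False
        for (slot, name), n in self._demand(cycle, uses).items():
            self.used[slot][name] += n
        self.issued[cycle % self.ii] += 1
        return True


mrt = ModuloReservationTable(ii=4, units={"DIV": 2}, issue_width=8)
mrt.reserve(0, {"DIV": 4})  # one DIV unit busy in slots 0, 1, 2, 3
mrt.reserve(1, {"DIV": 4})  # wraps around: slots 1, 2, 3, 0
print(mrt.can_reserve(2, {"DIV": 1}))  # False: both DIV units are busy in every slot
```

Here `uses` maps a resource name to its "Cycles" value for one instruction, so an instruction with "Cycles" of 4 occupies a unit in four consecutive slots rather than only in the slot where it is issued.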
When DFA is used, the scheduling results are unchanged.
Example:
- Command: $ llc -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 --ppc-enable-pipeliner --debug-only=pipeliner --pipeliner-dbg-res --pipeliner-max-stages=10 llvm/test/CodeGen/PowerPC/sms-phi-2.ll
- Previous result:
  MII = 15 MAX_II = 25 (rec=2, res=15)
  Schedule Found? 1 (II=15)
- Modified result:
  #Insts: 7, IssueWidth: 8, Cycles: 1
  ID  Name      Units  Consumed  Cycles
   1  ALU           4         5       2
   2  ALUE          2         0       0
   3  ALUO          2         0       0
   4  BR            1         0       0
   5  CY            1         0       0
   6  DFU           1         0       0
   7  DISP_NBR      6        15       3
   8  DISP_SS       4         8       2
   9  DISPb01       2         0       0
  10  DISPx02       2         4       2
  11  DISPx13       2         4       2
  12  DISPxab       2         3       2
  13  DIV           2         8       4
  14  DP            4         1       1
  15  DPE           2         0       0
  16  DPO           2         0       0
  17  IP_AGEN       4         1       1
  18  IP_EXEC       4         9       3
  19  IP_EXECE      2         1       1
  20  IP_EXECO      2         1       1
  21  LS            4         1       1
  22  PM            2         0       0
  MII = 4 MAX_II = 14 (rec=2, res=4)
  MRT:
  Slot  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22  #Insts
     0  0  0  0  0  0  0  3  2  0  2  2  1  2  1  0  0  0  2  1  1  0  0       2
     1  3  0  0  0  0  0  5  4  0  2  2  2  2  0  0  0  1  4  0  0  1  0       3
     2  0  0  0  0  0  0  3  2  0  0  0  0  2  0  0  0  0  1  0  0  0  0       0
     3  2  0  0  0  0  0  4  0  0  0  0  0  2  0  0  0  0  2  0  0  0  0       2
  Schedule Found? 1 (II=4)
The modification yields a tighter ResMII (4 instead of 15 in the example above), and the loop is actually scheduled at that II, even though resource management during scheduling is now more restrictive.
The modifications will produce more aggressive schedules, but the final result may not differ significantly because of the limit on the maximum number of stages (3).
I believe that only ARM Cortex-M7 enables the pipeliner by default. What are your thoughts on the impact of this modification?
This will need to be adjusted for the use of NumMicroOps, as described in later comments.