This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Allow parallel uses of a resource
Abandoned · Public

Authored by dpenry on Jan 13 2021, 7:58 AM.

Details

Summary

At present, if a write resource is listed more than once by an
instruction, the resource is assumed to be used sequentially
across multiple cycles. The new ResourceUses attribute
of ProcWriteResource permits multiple simultaneous uses to be
specified.

Use the ResourceUses information when performing machine scheduling
and other codegen tasks.

A use of this new annotation is in revision: https://reviews.llvm.org/D94605

Diff Detail

Event Timeline

dpenry created this revision. Jan 13 2021, 7:58 AM
dpenry requested review of this revision. Jan 13 2021, 7:58 AM
Herald added a project: Restricted Project. Jan 13 2021, 7:58 AM
dpenry edited the summary of this revision. Jan 13 2021, 8:08 AM

I have only skimmed through this patch once; however, I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

The "problematic" resource is M7UnitVPort

def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }

In your case, you want to allow the consumption of both resource units from a single write.
You can do that if you convert M7UnitVPort into a group (see the example below):

def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }

def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;

At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:

Example - before:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>

Example - after:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>

In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
The obvious downside is that currently you need to declare multiple resources to do what you want to do.


Unfortunately, I have tried doing this with a resource group, with no success. ExpandProcResources ends up marking the resource group as used for multiple cycles:

From CortexM7ModelSchedClasses:

{DBGFIELD("IIC_fpFMAC64_WriteFPMAC64_ReadFPMAC_ReadFPMUL_ReadFPMUL") 1, true, false, 161, 4, 795, 1, 132, 3}, // #136

From ARMWriteProcResTable:

{ 9,  1}, // #161
{10,  2}, // #162
{11,  1}, // #163
{12,  1}, // #164

From CortexM7ModelProcResources:

{"M7UnitVFP",       1, 0, 0, nullptr}, // #9
{"M7UnitVPort",     2, 0, 0, CortexM7ModelProcResourceSubUnits + 1}, // #10
{"M7UnitVPort0",    1, 0, 0, nullptr}, // #11
{"M7UnitVPort1",    1, 0, 0, nullptr}, // #12

In the end, the test in lines 1139-1140 of SubtargetEmitter.cpp forces multiple uses of a resource -- whether they are explicitly stated in an InstRW, implied by using different resources in a resource group, or hierarchically stated as using subunits of the resource -- to take multiple cycles. That test seems so fundamental to the way that current schedule descriptions work that it seemed better to introduce the additional Uses notation than to change it.
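The summing behavior visible in the generated tables above can be sketched with a tiny toy model (plain Python, illustrative only; the dictionaries stand in for the generated tables and are not LLVM data structures):

```python
# Toy model of how a group's resource cycles are derived when a write
# names individual units of that group. Illustrative only, not LLVM code.

# Units consumed by the write, each for 1 cycle.
unit_cycles = {"M7UnitVPort0": 1, "M7UnitVPort1": 1}

# Groups and their member units.
groups = {"M7UnitVPort": ["M7UnitVPort0", "M7UnitVPort1"]}

# A group's resource cycles are the sum of its members' contributions.
table = dict(unit_cycles)
for group, members in groups.items():
    table[group] = sum(unit_cycles.get(m, 0) for m in members)

print(table)  # {'M7UnitVPort0': 1, 'M7UnitVPort1': 1, 'M7UnitVPort': 2}
```

This reproduces the `{10, 2}` entry for M7UnitVPort: two cycles booked against the group, one contributed by each unit.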


Maybe I am missing some context here (apologies in case), but why is that a problem in practice?

This is how I see it:

Resource-cycles are there to limit the resource throughput. The write from your example can only be issued when both ports (M7UnitVPort0 and M7UnitVPort1) are available. If group M7UnitVPort is partially or fully used, then your write needs to be delayed until both ports become available. The model assumes that micro-opcodes are all dispatched at the same cycle. We cannot currently model "delayed consumption of resources", so resource consumption starts immediately at the beginning of the issue cycle.
In practice, what that means is that ports are "consumed" during the entire duration of the issue cycle. The two resource cycles set by ExpandProcResources for group M7UnitVPort are in practice contributed by the underlying units (i.e. 1 cycle of M7UnitVPort0, and 1 cycle by M7UnitVPort1). So the group doesn't need to be consumed for any extra cycles.
That write alone is enough to maximise the throughput of M7UnitVPort; no other write that uses M7UnitVPort0 and/or M7UnitVPort1 can issue during that same cycle.


Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles. And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, subclasses, or using the same resource twice in the InstRW. What it cares about is the list of resources and cycles in the WriteProcResTable.

So what appears to be happening when the resource-group method is used is that MachineScheduler, when scheduling one of these instructions that uses both M7UnitVPort0 and M7UnitVPort1, marks M7UnitVPort0 and M7UnitVPort1 as occupied until the next cycle and M7UnitVPort as occupied until two cycles from now (SchedBoundary::bumpNode, line 2427, for the top-down scheduling side). Similarly, the top-down check for scheduling this instruction looks for the first cycle in which all three of M7UnitVPort0, M7UnitVPort1, and M7UnitVPort are available, and doesn't try to find two units of M7UnitVPort (SchedBoundary::getNextResourceCycle). (Note that bottom-up scheduling takes the cycle count into account by adding it to the first available cycle rather than by incorporating it into the recorded occupancy, with essentially the same effect.) This does two unwanted things:

  1. It allows another M7UnitVPort user (e.g., a VLDR) to simultaneously issue in this cycle, which is not wanted.
  2. It prevents two M7UnitVPort users (e.g., VLDR.F32) from simultaneously issuing in the next cycle, when they should be able to.

It does succeed in preventing two dual-M7UnitVPort users from simultaneously issuing due to the limitations in M7UnitVPort0 and M7UnitVPort1.
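The flat bookkeeping described above can be sketched as a toy model (plain Python, illustrative only; `next_cycle` and `issue` are hypothetical stand-ins for the availability check and SchedBoundary::bumpNode, not their actual logic). It reproduces the second unwanted effect: a later single-port user is pushed back by the group reservation even though both physical ports are free again:

```python
# Toy sketch of flat per-resource bookkeeping: one "next free" cycle per
# named resource, with no notion that a group's cycles could be served
# by its member units. Illustrative only, not the MachineScheduler code.

reserved = {}  # resource -> first cycle at which it is free again

def next_cycle(uses, start=0):
    """Earliest cycle >= start at which every listed resource is free."""
    return max([start] + [reserved.get(r, 0) for r in uses])

def issue(uses, cycle):
    """Occupy each resource for its cycle count, starting at `cycle`."""
    for res, cycles in uses.items():
        reserved[res] = cycle + cycles

# Dual-port FP instruction: both units for 1cy each, group booked for 2cy.
dual = {"M7UnitVPort0": 1, "M7UnitVPort1": 1, "M7UnitVPort": 2}
c = next_cycle(dual)  # cycle 0
issue(dual, c)

# A single-port user consumes 1cy of the group only. The flat model makes
# it wait until cycle 2, although both ports are free again at cycle 1.
single = {"M7UnitVPort": 1}
print(next_cycle(single))  # 2
```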

yroux added a subscriber: yroux. Mar 9 2021, 1:39 AM

I'm not experienced enough in the machine scheduler to approve it, but the description of the problem and the proposed solution make sense to me.

Are there still any concerns about this patch?

<snip>

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

What you have described is a classic scenario for processor resource groups. You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (at least, in the logic that does the bookkeeping of resource cycles for groups).

More generally, an algorithm cannot ignore the resource-cycle contributions of individual units to a group. Otherwise, group latencies are incorrectly computed.

I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles. And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, subclasses, or using the same resource twice in the InstRW. What it cares about is the list of resources and cycles in the WriteProcResTable.

The number of cycles reported in WriteProcResTable is not a problem here. In fact, it is actually correct and it should be 2.
For groups, the number of resource cycles reported by WriteProcResTable doesn't necessarily translate to actual latency. Some of (if not all) the resource cycles consumed by a group may often map to the same runtime cycle. That is because each individual unit starts consumption at relative cycle #0. So there is clearly an overlap. It implies that resource cycles for groups can be consumed in parallel.

In WriteProcResTable, group resource cycles are computed by summing all the individual contributions from all the resource units (that, plus any extra cycles explicitly declared for the group). That's how you end up with 2cy for M7UnitVPort, and that is correct.

If MachineScheduler believes that those 2 resource cycles translate to a 2cy latency, then that's a bug in MachineScheduler.

In your case, group M7UnitVPort is consumed for 2 "resource cycles". However, it doesn't mean that the group will only be available every other cycle. Those two resource cycles are contributed by M7UnitVPort0 and M7UnitVPort1 (one resource cycle each), and the resource unit consumption always happens at relative cycle #0. In reality, those two resource cycles are effectively the same cycle (i.e. units are consumed in parallel for 1cy).

Again, apologies if I still don't get the full picture. But I strongly believe at this point that there might be a wrong assumption in the MachineScheduler on how resource cycles are set for groups.
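The distinction being drawn here, between resource cycles booked against a group and the wall-clock cycles for which the group is actually occupied, can be shown with a short sketch (plain Python, illustrative only):

```python
# Two units each contribute 1 resource cycle, and both start at relative
# cycle #0, so their contributions overlap in time. Illustrative only.

unit_busy = {"M7UnitVPort0": 1, "M7UnitVPort1": 1}  # cycles, from cycle 0

# Resource cycles booked against the group: the sum of the contributions.
group_resource_cycles = sum(unit_busy.values())

# Wall-clock cycles the group is occupied: the units run in parallel,
# so it is the max of the contributions, not the sum.
group_busy_cycles = max(unit_busy.values())

print(group_resource_cycles, group_busy_cycles)  # 2 1
```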

<snip>

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

What you have described is a classic scenario for processor resource groups. You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (at least, in the logic that does the bookkeeping of resource cycles for groups).

More generally, an algorithm cannot ignore the resource-cycle contributions of individual units to a group. Otherwise, group latencies are incorrectly computed.

That is certainly what I would have expected resource groups to do.

I do get that resource consumption begins immediately, but the scheduling model certainly does allow a resource to be occupied for multiple cycles. And the MachineScheduler doesn't seem to care about how it came to be that way -- whether through resource groups, subclasses, or using the same resource twice in the InstRW. What it cares about is the list of resources and cycles in the WriteProcResTable.

The number of cycles reported in WriteProcResTable is not a problem here. In fact, it is actually correct and it should be 2.
For groups, the number of resource cycles reported by WriteProcResTable doesn't necessarily translate to actual latency. Some of (if not all) the resource cycles consumed by a group may often map to the same runtime cycle. That is because each individual unit starts consumption at relative cycle #0. So there is clearly an overlap. It implies that resource cycles for groups can be consumed in parallel.

In WriteProcResTable, group resource cycles are computed by summing all the individual contributions from all the resource units (that, plus any extra cycles explicitly declared for the group). That's how you end up with 2cy for M7UnitVPort, and that is correct.

That is the computation I'm seeing.

If MachineScheduler believes that those 2 resource cycles translate to a 2cy latency, then that's a bug in MachineScheduler.

In your case, group M7UnitVPort is consumed for 2 "resource cycles". However, it doesn't mean that the group will only be available every other cycle. Those two resource cycles are contributed by M7UnitVPort0 and M7UnitVPort1 (one resource cycle each), and the resource unit consumption always happens at relative cycle #0. In reality, those two resource cycles are effectively the same cycle (i.e. units are consumed in parallel for 1cy).

That's what I don't see happening in MachineScheduler. It sees the group as if it were a separate resource which is consumed for two cycles and doesn't try to find parallel units within the group to provide those two cycles of resource consumption. As far as I can tell, the concept of groups does not exist at all in MachineScheduler.

Again, apologies if I still don't get the full picture. But I strongly believe at this point that there might be a wrong assumption in the MachineScheduler on how resource cycles are set for groups.

I think where we're getting to is that MachineScheduler has never been made to work "as expected" with groups. If groups are the accepted way to specify this sort of resource usage, then making MachineScheduler use group information would seem to be preferable to adding a new annotation. However, I do have one reservation. Groups are used fairly widely at present -- I see them in PowerPC, X86, AArch64, and ARM. I am not at all sanguine about changing something which would have such widespread effects without more people chiming in. Any idea who else should be part of this discussion?


When it comes to scheduling models, I personally think that @atrick is the most knowledgeable person. Most of the people interested in scheduling models have already been CC'd, so I don't know who else could be added...

On X86, we don't particularly care how resource bookkeeping is done because we still use the old post-RA scheduler.
The analysis conducted by that algorithm is purely "latency based"; it only tracks the completion of writes, so it is entirely data driven. I may be wrong, but the last time I looked at that algorithm, no particular checks were performed on processor resources. So I don't think that this issue affects X86, at least.

Speaking about X86 scheduling models, groups are mainly used in two cases:

  1. to model hardware schedulers (each one with its own buffer).
  2. to restrict the set of ports/pipes that can be consumed by instructions.

About 1: Intel (out-of-order) processors (at least those for which there is a scheduling model upstream) tend to implement a single unified scheduler in hardware.
AMD processors on the other hand, tend to use a mixed approach, with multiple schedulers in the Integer unit (typically one scheduler per ALU pipe), and a unified scheduler for the FPU. The FPU is often implemented using a coprocessor model; it is disjoint from the Integer cluster, and it often uses a unified scheduler to serve the underlying pipes.

This is taken from the Haswell model:

// 60 Entry Unified Scheduler
def HWPortAny : ProcResGroup<[HWPort0, HWPort1, HWPort2, HWPort3, HWPort4, HWPort5, HWPort6, HWPort7]> {
  let BufferSize=60;
}

The unified scheduler is basically our HWPortAny, which literally sees ALL the hardware ports. So, any instruction will always consume HWPortAny resource cycles.
There are also other groups, but those are fully contained in HWPortAny, and are only used to restrict which ports are usable for specific instructions (based on their scheduling class).

Since HWPortAny contains all the other groups and all the existing port units, the expectation is that each write will consume HWPortAny resource cycles too.
Actually, the more ports are consumed during a cycle, the higher the number of HWPortAny resource cycles consumed. To put it another way, the higher the issue throughput, the higher the number of resource cycles consumed by HWPortAny. It is essentially a measure of throughput; I definitely don't see it as a measure of latency.

You could easily imagine what problems it would cause for X86 models if we started using a scheduling algorithm which doesn't distinguish between groups and normal units. HWPortAny is just an example of a group which would be negatively impacted by that heuristic. HWPortAny would likely always stall for multiple cycles whenever the issue throughput is more than 1.

In reality, group availability should only depend on whether the contained resource units are available or not.
This is how I see it, and it is also how I have implemented resource usage in llvm-mca. The ResourceManager component in llvm-mca is responsible for the bookkeeping of processor resources; it internally tracks resource consumption. If all the units of a group are consumed during a cycle, then (during that same cycle) the group is unavailable for other instructions.
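The availability rule just described can be sketched as follows (plain Python, illustrative only; this toy class is not the actual llvm-mca ResourceManager, and the method names are invented for the example):

```python
# Toy version of the bookkeeping rule: a group is unavailable only while
# all of its member units are in use. Illustrative only, not llvm-mca code.

class ResourceManager:
    def __init__(self, groups):
        self.groups = groups  # group name -> list of member units
        self.in_use = set()   # units currently consumed this cycle

    def consume(self, unit):
        self.in_use.add(unit)

    def release(self, unit):
        self.in_use.discard(unit)

    def group_available(self, group):
        # Available as long as at least one member unit is free.
        return any(u not in self.in_use for u in self.groups[group])

rm = ResourceManager({"M7UnitVPort": ["M7UnitVPort0", "M7UnitVPort1"]})
rm.consume("M7UnitVPort0")
print(rm.group_available("M7UnitVPort"))  # True: one port still free
rm.consume("M7UnitVPort1")
print(rm.group_available("M7UnitVPort"))  # False: both ports busy
```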

OK. So X86 isn't a concern because it isn't using MachineScheduler (at least not post-RA).

Indeed, after looking a bit to brainstorm how MachineScheduler would change, I note that only when BufferSize of a resource is 0 does MachineScheduler update any of the resource usage (ReservedCycles). And with a bit more digging, there is currently only one upstream scheduling model which uses both BufferSize = 0 and resource groups -- and the resource group itself doesn't have BufferSize=0. So, a change to MachineScheduler to respect groups when BufferSize = 0 doesn't look like it's likely to cause much disruption.

I'll go try that out and see what happens.


Yeah. From what I remember, BufferSize=0 is kind of special because it is used to simulate in-order units in an otherwise out-of-order backend.
Which - if you think of groups like schedulers - kinda makes sense. A scheduler with no buffer is forced to issue instructions immediately at dispatch time, so dispatching is effectively equivalent to issuing. I had to specially support that in llvm-mca (it gave me some headaches).

There should be a nice comment about it in TargetSchedule.td.

dpenry abandoned this revision. Apr 19 2021, 1:31 PM

Abandoned because it was replaced by D98976.