Page MenuHomePhabricator

dpenry (David Penry)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 12 2020, 6:37 AM (26 w, 3 d)

Recent Activity

Mon, Apr 19

dpenry abandoned D94605: [ARM] Use ResourceUses annotations for Cortex-M7.

Abandoned because was replaced by D98977

Mon, Apr 19, 1:31 PM · Restricted Project
dpenry abandoned D94604: [CodeGen] Allow parallel uses of a resource.

Abandoned because was replaced by D98976

Mon, Apr 19, 1:31 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Now using APInt (thanks for that.... I'd wanted something like that but had forgotten what it was called!) and a helper function.

Mon, Apr 19, 11:02 AM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Created helper function and used APInt instead of uint64_t.

Mon, Apr 19, 10:59 AM · Restricted Project

Fri, Apr 16

dpenry updated the summary of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Fri, Apr 16, 5:06 PM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Fixed a comment

Fri, Apr 16, 4:50 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Marked comments as done which are no longer relevant to the code.

Fri, Apr 16, 4:46 PM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Major rework to address review comments. Much simpler code!

Fri, Apr 16, 4:43 PM · Restricted Project

Apr 15 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

OK, I think we are in agreement about what they're supposed to allow/disallow when ResourceCycles is uniform. I'd like to make a stab at making this work in the general cases of SchedWriteRes you mentioned before as well:

Apr 15 2021, 11:33 AM · Restricted Project

Apr 14 2021

dpenry updated the diff for D98977: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.

Added a test case

Apr 14 2021, 5:29 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I have added a test case which might clarify how the scheduling improves with the scheduler changes; the t2ADDrr is able to dual-issue with VADDD, but VLDRS is not.

Apr 14 2021, 5:26 PM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Reordered D98976 and D98977 and added a test case
to show improvement in scheduling results

Apr 14 2021, 5:11 PM · Restricted Project

Apr 13 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I had another look at the patch.

The resource cycle computation doesn't look correct. I think that there is a bit of confusion on how resource cycles of a group are allocated to the underlying sub-units. See my comments below.

Apr 13 2021, 3:39 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I'm not sure this avoids the criticisms from D94604 - we have resource groups and the child resource units, and we can control the width of both of these to model usage each cycle.

But for starters, I'd like to see tests of some kind (llvm-mca timeline or whatever you prefer) that clearly shows the issue you think you're encountering.

The issue comes when using resource groups with BufferSize = 0 -- MachineScheduler just doesn't care about the group indicating a parallel use of resources, though from the discussion in D94604, it seems clear that this is what resource groups are intended to model. It hasn't shown up as an issue before because resource groups with BufferSize = 0 haven't been used before D98977 (which tries to use resource groups in the CortexM7 scheduling model instead of directly using multiple copies of a resource, as code comments indicate that using a resource twice for an in-order model is supposed to mean using it over multiple cycles.)

Some sort of test showing the change in scheduling outcomes is a good idea and I'll come up with something.

This patch doesn't look right to me. I share the same concerns as Simon.

How resource cycles are contributed by individual units of a group has nothing to do with the value of BufferSize.
I don't understand why you think it should make a difference.

A BufferSize of zero simply means that the dispatch event must coincide with the issue event. In the absence of a buffer, opcodes are forced to be sent immediately for execution to the underlying resource units. Basically, there is no delayed execution, and resource consumption starts immediately. It doesn't say anything about how and which units are selected for execution.

Apr 13 2021, 8:36 AM · Restricted Project

Apr 12 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I'm not sure this avoids the criticisms from D94604 - we have resource groups and the child resource units, and we can control the width of both of these to model usage each cycle.

But for starters, I'd like to see tests of some kind (llvm-mca timeline or whatever you prefer) that clearly shows the issue you think you're encountering.

Apr 12 2021, 1:34 PM · Restricted Project

Apr 7 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

<ping>

Apr 7 2021, 10:12 AM · Restricted Project

Mar 19 2021

dpenry added reviewers for D98977: [ARM] Use ProcResGroup in Cortex-M7 scheduling model: dmgreen, SjoerdMeijer.
Mar 19 2021, 11:57 AM · Restricted Project
dpenry updated the summary of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Mar 19 2021, 11:56 AM · Restricted Project
dpenry updated the summary of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Mar 19 2021, 11:54 AM · Restricted Project
dpenry requested review of D98977: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.
Mar 19 2021, 11:50 AM · Restricted Project
dpenry requested review of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Mar 19 2021, 11:46 AM · Restricted Project
dpenry abandoned D98974: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.

Messed up and wanted another revision before this one and can't convince arc to do the right thing for me.

Mar 19 2021, 11:42 AM · Restricted Project
dpenry requested review of D98974: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.
Mar 19 2021, 11:33 AM · Restricted Project

Mar 11 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

OK. So X86 isn't a concern because it isn't using MachineScheduler (at least not post-RA).

Mar 11 2021, 8:41 AM · Restricted Project

Mar 10 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

<snip>

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

What you have described is a classic scenario for processor resource groups You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (at least, the logic that does the bookkeping of resource cycles for groups).

More in general, an algorithm cannot ignore the resource cycle contributions of individual units to a group. Otherwise, group latencies are incorrectly computed.

Mar 10 2021, 6:26 PM · Restricted Project

Jan 19 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

The "problematic" resource is M7UnitVPort

def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }

In your case, you want to allow the consumption of both resource units from a single write.
You can do that if you convert M7UnitVPort into a group (see example below)

def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }

def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;

At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:

Example - before:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>

Example - after:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>

In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
The obvious downside is that currently you need to declare multiple resources to do what you want to do.

Unfortunately, I have tried doing this with a resource group with no success. ExpandProcResources ends up marking the resource group as used for multiple cycles:

From CortexM7ModelSchedClasses:

{DBGFIELD("IIC_fpFMAC64_WriteFPMAC64_ReadFPMAC_ReadFPMUL_ReadFPMUL") 1, true, false, 161, 4, 795, 1, 132, 3}, // #136

From ARMWriteProcResTable:

{ 9,  1}, // #161
{10,  2}, // #162
{11,  1}, // #163
{12,  1}, // #164

From CortexM7ModelProcResources:

{"M7UnitVFP",       1, 0, 0, nullptr}, // #9
{"M7UnitVPort",     2, 0, 0, CortexM7ModelProcResourceSubUnits + 1}, // #10
{"M7UnitVPort0",    1, 0, 0, nullptr}, // #11
{"M7UnitVPort1",    1, 0, 0, nullptr}, // #12

In the end, the test in lines 1139-1140 of SubTargetEmitter.cpp forces multiple uses of a resource -- whether they be explicitly stated in an InstRW, implied by using different resources in a resource group, or hierarchically stated as using subunits of the resource -- to take multiple cycles. That test seems so fundamental to the way that current schedule descriptions work that it seemed better to introduce the additional Uses notation than to change it.

Maybe I am missing some context here (apologies in case), but why is that a problem in practice?

This is how I see it:

Resource-cycles are there to limit the resource throughput. The write from your example can only be issued when both ports (M7UnitVPort0 and M7UnitVPort1) are available. If group M7UnitVPort is partially or fully used, then your write needs to be delayed until both ports become available. The model assumes that micro-opcodes are all dispatched at the same cycle. We cannot currently model "delayed consumption of resources", so resource consumption starts immediately at the beginning of the issue cycle.
In practice, what that means is that ports are "consumed" during the entire duration of the issue cycle. The two resource cycles set by ExpandProcResources for group M7UnitVPort are in practice contributed by the underlying units (i.e. 1 cycle of M7UnitVPort0, and 1 cycle by M7UnitVPort1). So the group doesn't need to be consumed for any extra cycles.
That write alone is enough to maximise the throughput of M7UnitVPort; no other write that uses M7UnitVPort0 and/or M7UnitVPort1 can issue during that same cycle.

Jan 19 2021, 7:55 AM · Restricted Project

Jan 18 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

The "problematic" resource is M7UnitVPort

def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }

In your case, you want to allow the consumption of both resource units from a single write.
You can do that if you convert M7UnitVPort into a group (see example below)

def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }

def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;

At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:

Example - before:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>

Example - after:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>

In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
The obvious downside is that currently you need to declare multiple resources to do what you want to do.

Jan 18 2021, 11:28 AM · Restricted Project

Jan 13 2021

dpenry added reviewers for D94605: [ARM] Use ResourceUses annotations for Cortex-M7: dmgreen, SjoerdMeijer.
Jan 13 2021, 8:10 AM · Restricted Project
dpenry updated the summary of D94604: [CodeGen] Allow parallel uses of a resource.
Jan 13 2021, 8:08 AM · Restricted Project
dpenry requested review of D94605: [ARM] Use ResourceUses annotations for Cortex-M7.
Jan 13 2021, 8:06 AM · Restricted Project
dpenry requested review of D94604: [CodeGen] Allow parallel uses of a resource.
Jan 13 2021, 7:58 AM · Restricted Project

Dec 21 2020

dpenry updated the diff for D93054: [ARM] Add bank conflict hazarding.

More formatting changes

Dec 21 2020, 9:01 AM · Restricted Project

Dec 18 2020

dpenry updated the diff for D93054: [ARM] Add bank conflict hazarding.

Formatting and variable name changes

Dec 18 2020, 3:50 PM · Restricted Project
dpenry added inline comments to D93054: [ARM] Add bank conflict hazarding.
Dec 18 2020, 3:42 PM · Restricted Project

Dec 10 2020

dpenry added reviewers for D93054: [ARM] Add bank conflict hazarding: SjoerdMeijer, dmgreen.
Dec 10 2020, 11:21 AM · Restricted Project
dpenry requested review of D93054: [ARM] Add bank conflict hazarding.
Dec 10 2020, 11:19 AM · Restricted Project

Nov 12 2020

dpenry requested review of D91355: [ARM] Cortex-M7 schedule.
Nov 12 2020, 7:59 AM · Restricted Project