Page MenuHomePhabricator

dpenry (David Penry)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 12 2020, 6:37 AM (91 w, 5 d)

Recent Activity

Yesterday

dpenry added a comment to D128941: [ModuloSchedule] Add interface call to accept/reject SMS schedules.

Comments addressed.

Tue, Aug 16, 4:16 PM · Restricted Project, Restricted Project
dpenry updated the diff for D128941: [ModuloSchedule] Add interface call to accept/reject SMS schedules.

Responding to comments

Tue, Aug 16, 4:15 PM · Restricted Project, Restricted Project

Mon, Jul 18

dpenry updated the diff for D128941: [ModuloSchedule] Add interface call to accept/reject SMS schedules.
  • Changes to comments
  • Debug message explaining that target opted out of schedule
  • Fixes to variable names
  • Naming of functions within test case
Mon, Jul 18, 3:29 PM · Restricted Project, Restricted Project
dpenry added a comment to D128941: [ModuloSchedule] Add interface call to accept/reject SMS schedules.

At a high level shouldUseSchedule sounds like a useful addition, but much of the logic in the current Arm implementation doesn't seem very Arm specific (correct me if I'm wrong). Would it be beneficial to incorporate the register pressure calculations into SwingSchedulerDAG, which the target could opt into checking.

Mon, Jul 18, 3:26 PM · Restricted Project, Restricted Project

Jun 30 2022

dpenry added reviewers for D128941: [ModuloSchedule] Add interface call to accept/reject SMS schedules: thopre, bcahoon, jsji, jmolloy, dmgreen.
Jun 30 2022, 1:36 PM · Restricted Project, Restricted Project
dpenry requested review of D128941: [ModuloSchedule] Add interface call to accept/reject SMS schedules.
Jun 30 2022, 1:29 PM · Restricted Project, Restricted Project

Jun 7 2022

dpenry committed rG907aedbb3d08: [NFC] Fix spelling/newlines in comments/debug messages (authored by dpenry).
[NFC] Fix spelling/newlines in comments/debug messages
Jun 7 2022, 9:40 AM · Restricted Project, Restricted Project
dpenry closed D127162: [NFC] Fix spelling/newlines in comments/debug messages.
Jun 7 2022, 9:39 AM · Restricted Project, Restricted Project

Jun 6 2022

dpenry added a reviewer for D127162: [NFC] Fix spelling/newlines in comments/debug messages: dmgreen.
Jun 6 2022, 3:40 PM · Restricted Project, Restricted Project
dpenry requested review of D127162: [NFC] Fix spelling/newlines in comments/debug messages.
Jun 6 2022, 3:39 PM · Restricted Project, Restricted Project

May 26 2022

dpenry committed rG917dc0749b77: [ARM] Recognize t2LoopEnd for software pipelining (authored by dpenry).
[ARM] Recognize t2LoopEnd for software pipelining
May 26 2022, 9:57 AM · Restricted Project, Restricted Project
dpenry closed D126322: [ARM] Recognize t2LoopEnd for software pipelining.
May 26 2022, 9:56 AM · Restricted Project, Restricted Project
dpenry updated the diff for D126322: [ARM] Recognize t2LoopEnd for software pipelining.

Fix comment

May 26 2022, 7:51 AM · Restricted Project, Restricted Project

May 25 2022

dpenry updated the diff for D126322: [ARM] Recognize t2LoopEnd for software pipelining.

Responding to review comments from dmgreen

May 25 2022, 12:31 PM · Restricted Project, Restricted Project

May 24 2022

dpenry published D126322: [ARM] Recognize t2LoopEnd for software pipelining for review.
May 24 2022, 1:16 PM · Restricted Project, Restricted Project

Apr 29 2022

dpenry committed rGdcb77643e344: Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM (authored by dpenry).
Reapply [CodeGen][ARM] Enable Swing Module Scheduling for ARM
Apr 29 2022, 10:56 AM · Restricted Project, Restricted Project

Apr 28 2022

dpenry updated subscribers of D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

The reversion is because buildbot clang builds failed; there's an unused private field in a structure that isn't liked.

Apr 28 2022, 1:36 PM · Restricted Project, Restricted Project
dpenry added a reverting change for rG28d09bbbc3d0: [CodeGen][ARM] Enable Swing Module Scheduling for ARM: rGfa49021c68ef: Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM".
Apr 28 2022, 1:30 PM · Restricted Project, Restricted Project
dpenry committed rGfa49021c68ef: Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM" (authored by dpenry).
Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM"
Apr 28 2022, 1:30 PM · Restricted Project, Restricted Project
dpenry added a reverting change for D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM: rGfa49021c68ef: Revert "[CodeGen][ARM] Enable Swing Module Scheduling for ARM".
Apr 28 2022, 1:29 PM · Restricted Project, Restricted Project
dpenry committed rG28d09bbbc3d0: [CodeGen][ARM] Enable Swing Module Scheduling for ARM (authored by dpenry).
[CodeGen][ARM] Enable Swing Module Scheduling for ARM
Apr 28 2022, 1:02 PM · Restricted Project, Restricted Project
dpenry closed D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.
Apr 28 2022, 1:01 PM · Restricted Project, Restricted Project

Apr 27 2022

dpenry added inline comments to D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.
Apr 27 2022, 8:32 AM · Restricted Project, Restricted Project

Apr 26 2022

dpenry updated the diff for D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

Edit in response to JanekvO's comment

Apr 26 2022, 8:56 AM · Restricted Project, Restricted Project
dpenry added inline comments to D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.
Apr 26 2022, 8:12 AM · Restricted Project, Restricted Project
dpenry added inline comments to D124267: [MachinePipeliner] Fix unscheduled instruction.
Apr 26 2022, 7:49 AM · Restricted Project, Restricted Project

Apr 22 2022

dpenry added inline comments to D124267: [MachinePipeliner] Fix unscheduled instruction.
Apr 22 2022, 3:10 PM · Restricted Project, Restricted Project
dpenry updated the diff for D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

Rebase in hopes the pre-merge buildbot will be kind

Apr 22 2022, 11:33 AM · Restricted Project, Restricted Project

Apr 12 2022

dpenry updated the diff for D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

Changes as per dmgreen

Apr 12 2022, 11:18 AM · Restricted Project, Restricted Project
dpenry added a comment to D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

Made modifications per dmgreen. Uploading new patch.

Apr 12 2022, 11:18 AM · Restricted Project, Restricted Project

Apr 7 2022

dpenry added a comment to D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

It's great to see extensions to the pipeliner so that it works on more targets and loops that don't assume a hardware loop. A little while back a couple of functions, see shoulgnoreForPipelining, were added to support regular loops. However, I don't see that function used. Perhaps that effort was never completely finished? Do we need to get rid of that function, or try to combine the work you've done with that work?

Apr 7 2022, 8:05 AM · Restricted Project, Restricted Project

Mar 31 2022

dpenry added a comment to D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

mark comments done

Mar 31 2022, 1:44 PM · Restricted Project, Restricted Project
dpenry updated the diff for D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

Responding to review comments:

Mar 31 2022, 1:42 PM · Restricted Project, Restricted Project

Mar 29 2022

dpenry added reviewers for D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM: MatzeB, bcahoon, jsji, jmolloy, dmgreen.
Mar 29 2022, 11:15 AM · Restricted Project, Restricted Project
dpenry requested review of D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.
Mar 29 2022, 11:06 AM · Restricted Project, Restricted Project

Apr 19 2021

dpenry abandoned D94605: [ARM] Use ResourceUses annotations for Cortex-M7.

Abandoned because was replaced by D98977

Apr 19 2021, 1:31 PM · Restricted Project
dpenry abandoned D94604: [CodeGen] Allow parallel uses of a resource.

Abandoned because was replaced by D98976

Apr 19 2021, 1:31 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Now using APInt (thanks for that.... I'd wanted something like that but had forgotten what it was called!) and a helper function.

Apr 19 2021, 11:02 AM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Created helper function and used APInt instead of uint64_t.

Apr 19 2021, 10:59 AM · Restricted Project

Apr 16 2021

dpenry updated the summary of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Apr 16 2021, 5:06 PM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Fixed a comment

Apr 16 2021, 4:50 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Marked comments as done which are no longer relevant to the code.

Apr 16 2021, 4:46 PM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Major rework to address review comments. Much simpler code!

Apr 16 2021, 4:43 PM · Restricted Project

Apr 15 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

OK, I think we are in agreement about what they're supposed to allow/disallow when ResourceCycles is uniform. I'd like to make a stab at making this work in the general cases of SchedWriteRes you mentioned before as well:

Apr 15 2021, 11:33 AM · Restricted Project

Apr 14 2021

dpenry updated the diff for D98977: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.

Added a test case

Apr 14 2021, 5:29 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I have added a test case which might clarify how the scheduling improves with the scheduler changes; the t2ADDrr is able to dual-issue with VADDD, but VLDRS is not.

Apr 14 2021, 5:26 PM · Restricted Project
dpenry updated the diff for D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

Reordered D98976 and D98977 and added a test case
to show improvement in scheduling results

Apr 14 2021, 5:11 PM · Restricted Project

Apr 13 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I had another look at the patch.

The resource cycle computation doesn't look correct. I think that there is a bit of confusion on how resource cycles of a group are allocated to the underlying sub-units. See my comments below.

Apr 13 2021, 3:39 PM · Restricted Project
dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I'm not sure this avoids the criticisms from D94604 - we have resource groups and the child resource units, and we can control the width of both of these to model usage each cycle.

But for starters, I'd like to see tests of some kind (llvm-mca timeline or whatever you prefer) that clearly shows the issue you think you're encountering.

The issue comes when using resource groups with BufferSize = 0 -- MachineScheduler just doesn't care about the group indicating a parallel use of resources, though from the discussion in D94604, it seems clear that this is what resource groups are intended to model. It hasn't shown up as an issue before because resource groups with BufferSize = 0 haven't been used before D98977 (which tries to use resource groups in the CortexM7 scheduling model instead of directly using multiple copies of a resource, as code comments indicate that using a resource twice for an in-order model is supposed to mean using it over multiple cycles.)

Some sort of test showing the change in scheduling outcomes is a good idea and I'll come up with something.

This patch doesn't look right to me. I share the same concerns as Simon.

How resource cycles are contributed by individual units of a group has nothing to do with the value of BufferSize.
I don't understand why you think it should make a difference.

A BufferSize of zero simply means that the dispatch event must coincide with the issue event. In the absence of a buffer, opcodes are forced to be sent immediately for execution to the underlying resource units. Basically, there is no delayed execution, and resource consumption starts immediately. It doesn't say anything about how and which units are selected for execution.

Apr 13 2021, 8:36 AM · Restricted Project

Apr 12 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

I'm not sure this avoids the criticisms from D94604 - we have resource groups and the child resource units, and we can control the width of both of these to model usage each cycle.

But for starters, I'd like to see tests of some kind (llvm-mca timeline or whatever you prefer) that clearly shows the issue you think you're encountering.

Apr 12 2021, 1:34 PM · Restricted Project

Apr 7 2021

dpenry added a comment to D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.

<ping>

Apr 7 2021, 10:12 AM · Restricted Project

Mar 19 2021

dpenry added reviewers for D98977: [ARM] Use ProcResGroup in Cortex-M7 scheduling model: dmgreen, SjoerdMeijer.
Mar 19 2021, 11:57 AM · Restricted Project
dpenry updated the summary of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Mar 19 2021, 11:56 AM · Restricted Project
dpenry updated the summary of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Mar 19 2021, 11:54 AM · Restricted Project
dpenry requested review of D98977: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.
Mar 19 2021, 11:50 AM · Restricted Project
dpenry requested review of D98976: [CodeGen] Use ProcResGroup information in SchedBoundary.
Mar 19 2021, 11:46 AM · Restricted Project
dpenry abandoned D98974: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.

Messed up and wanted another revision before this one and can't convince arc to do the right thing for me.

Mar 19 2021, 11:42 AM · Restricted Project
dpenry requested review of D98974: [ARM] Use ProcResGroup in Cortex-M7 scheduling model.
Mar 19 2021, 11:33 AM · Restricted Project

Mar 11 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

OK. So X86 isn't a concern because it isn't using MachineScheduler (at least not post-RA).

Mar 11 2021, 8:41 AM · Restricted Project

Mar 10 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

<snip>

Perhaps I should state explicitly what it is that needs to be modeled for the Cortex-M7 scheduler to make sure we're on the same page:

  1. Some instructions require the entire FP datapath
  2. Other instructions require half of the FP datapath
  3. It is possible to dual-issue two instructions each requiring half of the FP datapath
  4. It is not possible to dual-issue instructions requiring the entire FP datapath with instructions requiring the entire FP datapath or half of the FP datapath.

I would love it if there was a way to just make this work out of the box. However, stating that a resource is used twice (that's what's in the current code) or that there's a resource group with two parts (as suggested) doesn't do the trick. Nor did trying to define VPort0 and VPort1 as sub-units of VPort.

What you have described is a classic scenario for processor resource groups You can have a group for the entire FP datapath, and then model each half of the FP datapath separately with a resource unit. If this doesn't work for you, then it is a bug in the MachineScheduler (at least, the logic that does the bookkeping of resource cycles for groups).

More in general, an algorithm cannot ignore the resource cycle contributions of individual units to a group. Otherwise, group latencies are incorrectly computed.

Mar 10 2021, 6:26 PM · Restricted Project

Jan 19 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

The "problematic" resource is M7UnitVPort

def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }

In your case, you want to allow the consumption of both resource units from a single write.
You can do that if you convert M7UnitVPort into a group (see example below)

def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }

def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;

At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:

Example - before:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>

Example - after:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>

In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
The obvious downside is that currently you need to declare multiple resources to do what you want to do.

Unfortunately, I have tried doing this with a resource group with no success. ExpandProcResources ends up marking the resource group as used for multiple cycles:

From CortexM7ModelSchedClasses:

{DBGFIELD("IIC_fpFMAC64_WriteFPMAC64_ReadFPMAC_ReadFPMUL_ReadFPMUL") 1, true, false, 161, 4, 795, 1, 132, 3}, // #136

From ARMWriteProcResTable:

{ 9,  1}, // #161
{10,  2}, // #162
{11,  1}, // #163
{12,  1}, // #164

From CortexM7ModelProcResources:

{"M7UnitVFP",       1, 0, 0, nullptr}, // #9
{"M7UnitVPort",     2, 0, 0, CortexM7ModelProcResourceSubUnits + 1}, // #10
{"M7UnitVPort0",    1, 0, 0, nullptr}, // #11
{"M7UnitVPort1",    1, 0, 0, nullptr}, // #12

In the end, the test in lines 1139-1140 of SubTargetEmitter.cpp forces multiple uses of a resource -- whether they be explicitly stated in an InstRW, implied by using different resources in a resource group, or hierarchically stated as using subunits of the resource -- to take multiple cycles. That test seems so fundamental to the way that current schedule descriptions work that it seemed better to introduce the additional Uses notation than to change it.

Maybe I am missing some context here (apologies in case), but why is that a problem in practice?

This is how I see it:

Resource-cycles are there to limit the resource throughput. The write from your example can only be issued when both ports (M7UnitVPort0 and M7UnitVPort1) are available. If group M7UnitVPort is partially or fully used, then your write needs to be delayed until both ports become available. The model assumes that micro-opcodes are all dispatched at the same cycle. We cannot currently model "delayed consumption of resources", so resource consumption starts immediately at the beginning of the issue cycle.
In practice, what that means is that ports are "consumed" during the entire duration of the issue cycle. The two resource cycles set by ExpandProcResources for group M7UnitVPort are in practice contributed by the underlying units (i.e. 1 cycle of M7UnitVPort0, and 1 cycle by M7UnitVPort1). So the group doesn't need to be consumed for any extra cycles.
That write alone is enough to maximise the throughput of M7UnitVPort; no other write that uses M7UnitVPort0 and/or M7UnitVPort1 can issue during that same cycle.

Jan 19 2021, 7:55 AM · Restricted Project

Jan 18 2021

dpenry added a comment to D94604: [CodeGen] Allow parallel uses of a resource.

I have only skimmed through this patch once, however I think that you can fix the problem in https://reviews.llvm.org/D94605 without introducing your new field ResourceUses.

The "problematic" resource is M7UnitVPort

def M7UnitVPort  : ProcResource<2> { let BufferSize = 0; }

In your case, you want to allow the consumption of both resource units from a single write.
You can do that if you convert M7UnitVPort into a group (see example below)

def M7UnitVPort0 : ProcResource<1> { let BufferSize = 0; }
def M7UnitVPort1 : ProcResource<1> { let BufferSize = 0; }

def M7UnitVPort : ProcResGroup<[M7UnitVPort0, M7UnitVPort1]>;

At that point, you simply enumerate the resource units in the list of consumed resources. So, something like this:

Example - before:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort, M7UnitVPort]>

Example - after:

def : WriteRes<WriteFPMAC64, [M7UnitVFP, M7UnitVPort0, M7UnitVPort1]>

In conclusion, if the goal is to be able to do something like that, then I think the syntax is already expressive enough.
The obvious downside is that currently you need to declare multiple resources to do what you want to do.

Jan 18 2021, 11:28 AM · Restricted Project

Jan 13 2021

dpenry added reviewers for D94605: [ARM] Use ResourceUses annotations for Cortex-M7: dmgreen, SjoerdMeijer.
Jan 13 2021, 8:10 AM · Restricted Project
dpenry updated the summary of D94604: [CodeGen] Allow parallel uses of a resource.
Jan 13 2021, 8:08 AM · Restricted Project
dpenry requested review of D94605: [ARM] Use ResourceUses annotations for Cortex-M7.
Jan 13 2021, 8:06 AM · Restricted Project
dpenry requested review of D94604: [CodeGen] Allow parallel uses of a resource.
Jan 13 2021, 7:58 AM · Restricted Project

Dec 21 2020

dpenry updated the diff for D93054: [ARM] Add bank conflict hazarding.

More formatting changes

Dec 21 2020, 9:01 AM · Restricted Project

Dec 18 2020

dpenry updated the diff for D93054: [ARM] Add bank conflict hazarding.

Formatting and variable name changes

Dec 18 2020, 3:50 PM · Restricted Project
dpenry added inline comments to D93054: [ARM] Add bank conflict hazarding.
Dec 18 2020, 3:42 PM · Restricted Project

Dec 10 2020

dpenry added reviewers for D93054: [ARM] Add bank conflict hazarding: SjoerdMeijer, dmgreen.
Dec 10 2020, 11:21 AM · Restricted Project
dpenry requested review of D93054: [ARM] Add bank conflict hazarding.
Dec 10 2020, 11:19 AM · Restricted Project

Nov 12 2020

dpenry requested review of D91355: [ARM] Cortex-M7 schedule.
Nov 12 2020, 7:59 AM · Restricted Project