This is an archive of the discontinued LLVM Phabricator instance.

[MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
ClosedPublic

Authored by courbet on Oct 5 2018, 5:42 AM.

Details

Summary

The pfm counters are now in the MCSubtargetInfo rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Diff Detail

Event Timeline

courbet created this revision.Oct 5 2018, 5:42 AM

Sorry for the formatting noise, I'll revert the non-related changes.

courbet updated this revision to Diff 168463.Oct 5 2018, 6:00 AM

Remove formatting noise.

gchatelet added inline comments.Oct 5 2018, 6:13 AM
include/llvm/MC/MCSchedule.h
191

Why do you need both?

199

What would ProcResName be is case of retired instruction for instance?

lib/MC/MCSchedule.cpp
152 ↗(On Diff #168457)

I don't understand the error message.

utils/TableGen/CodeGenSchedule.cpp
783

join string literals and reapply formatting.

886

Same here

1204

Can you explain what the 6 is referring to?

1432

const

1463

join string literals

1942

Can you reformat

utils/TableGen/SubtargetEmitter.cpp
1792

llvm::format would help here

OS << llvm::format("{0}ProcSchedKV, {0}ProcPfmKV, {0}WriteProcResTable, {0}WriteLatencyTable, {0}ReadAdvanceTable, ", Target);
1886

Same here.

@gchatelet a bunch of your comments are on old code touched by unwanted reformatting. I ignored these since I have reverted the formatted changes.

include/llvm/MC/MCSchedule.h
191

For consistency with MCSchedModel (see getValueForCpu() below).

199

the retired counter is a scheduler concept, not tied to proc resources. It would go with CycleCounter and UopsCounter above.

lib/MC/MCSchedule.cpp
152 ↗(On Diff #168457)

This is ensuring that the value can be constructed by the linker without running any dynamic initialization.

courbet updated this revision to Diff 168465.Oct 5 2018, 6:28 AM

use "dynamic initialization" instead of "static constructor".

courbet updated this revision to Diff 168467.Oct 5 2018, 6:50 AM

address review comments

courbet marked an inline comment as done.Oct 5 2018, 6:50 AM

Thanks

utils/TableGen/SubtargetEmitter.cpp
1792

I want to remain consistent with the rest of the file. I've reformatted to make it more readable.

RKSimon added inline comments.Oct 5 2018, 7:17 AM
lib/Target/X86/X86.td
978 ↗(On Diff #168467)

Missing btver2 details?

courbet updated this revision to Diff 168470.Oct 5 2018, 7:24 AM
courbet marked an inline comment as done.

add missing BtVer2PfmCounters to btver2.

lib/Target/X86/X86.td
978 ↗(On Diff #168467)

Thanks for the catch.

gchatelet accepted this revision.Oct 5 2018, 7:51 AM
This revision is now accepted and ready to land.Oct 5 2018, 7:51 AM

I’m not at work today, but I’d like a bit of time to review this patch.

I saw that it has already been accepted. However I am not entirely sure if I like the idea of adding another field to MCSubtargetInfo. I need to think about this.

RKSimon requested changes to this revision.Oct 5 2018, 8:13 AM
This revision now requires changes to proceed.Oct 5 2018, 8:13 AM

I’m not at work today, but I’d like a bit of time to review this patch.

I saw that it has already been accepted. However I am not entirely sure if I like the idea of adding another field to MCSubtargetInfo. I need to think about this.

Sure, take your time.

Thanks for waiting Clement.

I am now back from holiday, and I finally had time to look at your patch.

What concerns me about this approach is that MCSubtargetInfo describes a processor for the purpose of codegen.
At least for me (I know it may sound a bit philosophical), exposing the knowledge about PFMs to MCSubtargetInfo is not ideal. Ideally, the MCSubtargetInfo interface should be small and "abstract enough" to allow the description of different subprocessors for different targets. Anything perf related, should be described by other modules (i.e. in a separate class; alternatively, perf-related knowledge should be described by scheduling models).

What if instead we make PFM descriptors a customizable property tablegen class SchedMachineModel? Something that defaults to an empty set of descriptors.

On X86, knowledge about PFM counters of a vendor/family could be moved to a separate .td file (similarly to what you already do with lib/Target/X86/X86PfmCounters.td).
When a new model is created, people can either use let expressions to override the PFM set, or - alternatively - derive from a IntelSchedMachineModel class (vic. AMDSchedMachineModel) that sets some common "defaults".
Alternatively, we could introduce the concept of target vendor/processor family, if that helps mapping models to "default sets of" PFM counters.
As long as people are allowed to customize that set (either by using let expressions, or by overriding fiels using a a tablegen derived class), then it should be okay.

Not sure if it makes sense.

The bottom line is: I think we should try to keep the concept of PFMs separate from MCSubtargetInfo as much as possible.
That being said, I don't have a too strong opinion on this; if other devs think that I am wrong on this, then fine. I don't want to block the development on this.

I hope this helps.
Andrea

Hi Andrea, thanks for the comments.

I think there are two independent points that we're discussing here:
1 - Where we're putting the generated table
2 - How we specify the table.

Regarding (1):
The goal of this patch is to move the pfm counters out of the SchedModels (see PR39165), because different CPUs with different PFM counters can use the same sched model (e.g. sandybridge).
That being said, I'm not opposed to moving the table of CPU->MCPfmCountersInfo outside of the MCSubtargetInfo. One possible approach is to create a new PfmEmitter tablegen backend and PfmCounters library. If people think that's reasonable I can do that easily.

Regarding (2), I'm not sure I understand extactly what you're suggesting. Are you suggesting I do something like:

let PfmCounters = HaswellPfmCounters in {
def : HaswellProc<"haswell">;
def : HaswellProc<"core-avx2">; // Legacy alias.
}

Or more like:

// In X86.td:
def HaswellProcessorModel : HaswellProc<"haswell">;
def CoreAvx2ProcessorModel : HaswellProc<"core-avx2">; // Legacy alias.

// In X86PfmCounters.td:
def : ProcPfmCounters {
  let CycleCounter = UnhaltedCoreCyclesPfmCounter;
  let UopsCounter = UopsIssuedPfmCounter;
  let IssueCounters = [
    PfmIssueCounter<"HWPort0", "uops_dispatched_port:port_0">,
    PfmIssueCounter<"HWPort1", "uops_dispatched_port:port_1">,
    PfmIssueCounter<"HWPort2", "uops_dispatched_port:port_2">,
    PfmIssueCounter<"HWPort3", "uops_dispatched_port:port_3">,
    PfmIssueCounter<"HWPort4", "uops_dispatched_port:port_4">,
    PfmIssueCounter<"HWPort5", "uops_dispatched_port:port_5">,
    PfmIssueCounter<"HWPort6", "uops_dispatched_port:port_6">,
    PfmIssueCounter<"HWPort7", "uops_dispatched_port:port_7">
  ];
  let ProcModels = [HaswellProcessorModel, CoreAvx2ProcessorModel]
}

Thanks for the description Clement,

Hi Andrea, thanks for the comments.

I think there are two independent points that we're discussing here:
1 - Where we're putting the generated table
2 - How we specify the table.

Regarding (1):
The goal of this patch is to move the pfm counters out of the SchedModels (see PR39165), because different CPUs with different PFM counters can use the same sched model (e.g. sandybridge).

Sorry. I didn't realize this patch was trying to fix that issue.
The use case for this change was not very clear to me. I though it was just a simple: "how to inherit a default set of PMCs, so that I didn't have to specify the same set over and over..".

I still think it is a preferrable solution to not touch MCSubtargetInfo. That being said, I don't have a strong opinion; other reviewers are definitely more knowledgeable than me on exegesis..

Andrea

courbet requested review of this revision.Oct 15 2018, 11:59 PM
RKSimon added a comment.EditedOct 22 2018, 5:35 AM

Sorry for late reply!

It looks like we need to find a way to still embed this information in MCExtraProcessorInfo but allow for different CPUs PFM mappings.

One of the aims for PR39165 was to make it possible for llvm-exegesis to be run on CPUs with declared PFMs but without a model - allowing a report on raw resource usage for an instruction and to help create the model from scratch. @courbet @gchatelet do you think this would still be useful? Is it still an aim of llvm-exegesis to create models from scratch or just report on existing models?

Can you easily attach PFM mappings to the GenericModel/GenericPostRAModel?

It looks like we need to find a way to still embed this information in MCExtraProcessorInfo but allow for different CPUs PFM mappings.
Can you easily attach PFM mappings to the GenericModel/GenericPostRAModel?

I don't think an MCExtraProcessorInfo should store information for several CPUs, because MCExtraProcessorInfo is in MCSchedModel, which is selected by CPU, so that would be weird.

Before this change, we were putting the MCPfmCountersInfo inside MCSchedModel, which implicitly meant that there was a 1:1 mapping between them. This change decouples MCPfmCountersInfo from MCSchedModel.
A CPU chooses its SchedModel and PfmCounters independently.

Now the decision we have to make now is whether in TD files (see point (2) in my comment above):

  • (A) the CPU should declare its PfmCounters (this is the approach I'm taking here). Because that's very similar to what is done for MCSchedModels, I'm doing the same for PfmCounters, i.e. putting the table of Cpu->PfmCounter inside TargetSubtargetInfo, but that is actually independent from (see point (1) in my response to Andrea's comment above)
  • (B) the mapping of CPU->PfmCounter is kept separately (in X86PfmCounters.td).

One of the aims for PR39165 was to make it possible for llvm-exegesis to be run on CPUs with declared PFMs but without a model - allowing a report on raw resource usage for an instruction and to help create the model from scratch. @courbet @gchatelet do you think this would still be useful?
Is it still an aim of llvm-exegesis to create models from scratch or just report on existing models?

@lebedev.ri used llvm-exegesis to produce the bdverX model. At that point there was no SchedModel, so it still seems desirable to do this.

I don't think an MCExtraProcessorInfo should store information for several CPUs, because MCExtraProcessorInfo is in MCSchedModel, which is selected by CPU, so that would be weird.

You say that but we have multiple, very different, CPUs referencing the same model - Knights Landing uses the Haswell Model, the PileDriver model is likely to be used for the entire Bulldozer range, Sandy Bridge model is used as the default cpu - if you run llvm-exegesis in any of these cases apart for the 'main' cpu case it will just crash.

Before this change, we were putting the MCPfmCountersInfo inside MCSchedModel, which implicitly meant that there was a 1:1 mapping between them. This change decouples MCPfmCountersInfo from MCSchedModel.
A CPU chooses its SchedModel and PfmCounters independently.

Now the decision we have to make now is whether in TD files (see point (2) in my comment above):

  • (A) the CPU should declare its PfmCounters (this is the approach I'm taking here). Because that's very similar to what is done for MCSchedModels, I'm doing the same for PfmCounters, i.e. putting the table of Cpu->PfmCounter inside TargetSubtargetInfo, but that is actually independent from (see point (1) in my response to Andrea's comment above)
  • (B) the mapping of CPU->PfmCounter is kept separately (in X86PfmCounters.td).

(C) As the PFMs are only used by llvm-exegesis (and are likely to only every be) the PFM mappings should be moved entirely into llvm-exegesis code.

One of the aims for PR39165 was to make it possible for llvm-exegesis to be run on CPUs with declared PFMs but without a model - allowing a report on raw resource usage for an instruction and to help create the model from scratch. @courbet @gchatelet do you think this would still be useful?
Is it still an aim of llvm-exegesis to create models from scratch or just report on existing models?

@lebedev.ri used llvm-exegesis to produce the bdverX model. At that point there was no SchedModel, so it still seems desirable to do this.

IIRC @lebedev.ri had to create a copy of the btver2 model and then slowly iterate on it until it matched the bdver arch.

@lebedev.ri used llvm-exegesis to produce the bdverX model. At that point there was no SchedModel, so it still seems desirable to do this.

IIRC @lebedev.ri had to create a copy of the btver2 model and then slowly iterate on it

Correct.

until it matched the bdver arch.

I don't think an MCExtraProcessorInfo should store information for several CPUs, because MCExtraProcessorInfo is in MCSchedModel, which is selected by CPU, so that would be weird.

You say that but we have multiple, very different, CPUs referencing the same model - Knights Landing uses the Haswell Model, the PileDriver model is likely to be used for the entire Bulldozer range, Sandy Bridge model is used as the default cpu - if you run llvm-exegesis in any of these cases apart for the 'main' cpu case it will just crash.

That's exactly my point: Because we have very different CPUs referencing the same SchedModel, we should not store stuff that's going to vary from CPU to CPU (e.g. PfmCounters) inside the MCSchedModel.

Before this change, we were putting the MCPfmCountersInfo inside MCSchedModel, which implicitly meant that there was a 1:1 mapping between them. This change decouples MCPfmCountersInfo from MCSchedModel.
A CPU chooses its SchedModel and PfmCounters independently.

Now the decision we have to make now is whether in TD files (see point (2) in my comment above):

  • (A) the CPU should declare its PfmCounters (this is the approach I'm taking here). Because that's very similar to what is done for MCSchedModels, I'm doing the same for PfmCounters, i.e. putting the table of Cpu->PfmCounter inside TargetSubtargetInfo, but that is actually independent from (see point (1) in my response to Andrea's comment above)
  • (B) the mapping of CPU->PfmCounter is kept separately (in X86PfmCounters.td).

(C) As the PFMs are only used by llvm-exegesis (and are likely to only every be) the PFM mappings should be moved entirely into llvm-exegesis code.

This is choice (B) here, plus choice (1) in the comment I mentioned above, quoting myself:

"That being said, I'm not opposed to moving the table of CPU->MCPfmCountersInfo outside of the MCSubtargetInfo. One possible approach is to create a new PfmEmitter tablegen backend and PfmCounters library. If people think that's reasonable I can do that easily."

So it seems we have a plan now :)

One of the aims for PR39165 was to make it possible for llvm-exegesis to be run on CPUs with declared PFMs but without a model - allowing a report on raw resource usage for an instruction and to help create the model from scratch. @courbet @gchatelet do you think this would still be useful?
Is it still an aim of llvm-exegesis to create models from scratch or just report on existing models?

@lebedev.ri used llvm-exegesis to produce the bdverX model. At that point there was no SchedModel, so it still seems desirable to do this.

IIRC @lebedev.ri had to create a copy of the btver2 model and then slowly iterate on it until it matched the bdver arch.

courbet updated this revision to Diff 170656.Oct 23 2018, 8:29 AM

Move pfm counters to ExegesisTarget. There is now a separate Exegesis TableGen backend.

courbet updated this revision to Diff 170658.Oct 23 2018, 8:37 AM

AArch64 is not yet tablegened.

andreadb accepted this revision.Oct 24 2018, 11:02 AM

Move pfm counters to ExegesisTarget. There is now a separate Exegesis TableGen backend.

Thanks Clement!

I really like this approach. Overall, I think it is a much better design than the original one, since it avoid polluting MCSubtargetInfo.

Cheers
-Andrea

RKSimon accepted this revision.Oct 24 2018, 11:22 AM

LGTM - thanks @courbet

This revision is now accepted and ready to land.Oct 24 2018, 11:22 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Dec 30 2018, 5:45 PM
thakis added inline comments.
llvm/trunk/lib/Target/X86/CMakeLists.txt
16 ↗(On Diff #171043)

This list is in alphabetical order. Mind putting this entry in alphabetical order too?

thakis added inline comments.Dec 30 2018, 5:48 PM
llvm/trunk/lib/Target/X86/CMakeLists.txt
16 ↗(On Diff #171043)

Also, it looks like X86GenExegesis.inc is only used on llvm/tools/lib/X86. Shouldn't this tablegen call be there?