This is an archive of the discontinued LLVM Phabricator instance.

[llvm-mca] Initial implementation of output serialization using JSON
ClosedPublic

Authored by wolfgangp on Aug 26 2020, 10:37 AM.

Details

Summary

This is a first cut at creating JSON output for MCA views.

Generate JSON by using --json on the command line. llvm-mca generates either its readable output as before or JSON. At the moment they cannot be both generated with just one run.

4 views are implemented:

  • Summary
  • InstructionInfo
  • Timeline
  • ResourcePressure

There is a major and minor version included as key/value pairs in the output. The views appear in alphabetical order.

The test case is rather rigid. It checks for every single line emitted. It can possibly be shortened, but I don't know what a compromise would be. There is also a need to check that regular output and JSON output match from a data point of view. This would involve creating a tool that would check the 2 different outputs for deviations.

Let me know what you think.

Diff Detail

Event Timeline

wolfgangp created this revision.Aug 26 2020, 10:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2020, 10:37 AM
wolfgangp requested review of this revision.Aug 26 2020, 10:37 AM
lebedev.ri added a comment.EditedAug 26 2020, 10:53 AM

I don't like the test.
Can it simply be autogenerated, without omitting details?

llvm/test/tools/llvm-mca/JSON/X86/views.s
6 ↗(On Diff #288029)

Maybe decrease/hardcode iteration/cycle count.
I don't think you need that long of a simulation here.

llvm/tools/llvm-mca/PipelinePrinter.cpp
30–31 ↗(On Diff #288029)

I'd just inline them, less potential for dubious allocations.

llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
335 ↗(On Diff #288029)

Are you sure these have to be std::string. I'd think they can be StringLiteral, or StringRef.

llvm/tools/llvm-mca/Views/InstructionInfoView.cpp
130 ↗(On Diff #288029)

getValueOr("None") ?

llvm/tools/llvm-mca/llvm-mca.cpp
99 ↗(On Diff #288029)

I think there was some docs for the tool, it would be good to document this there, and possibly in releasenotes

andreadb added a comment.EditedAug 26 2020, 12:26 PM

Thanks Wolfgang for working on this.

I suggest a few changes to the JSON ouput:

  1. Have a single (top level) JSON object for the assembly code sequence (see example below):
"InstructionSequence": [
  "addl\t%eax, %eax",
  "addl\t%ebx, %ebx",
  "addl\t%ecx, %ecx",
  "addl\t%edx, %edx"
]
  1. Have Views reference code strings by simply using indices to the "InstructionSequence". This is to avoid to have to unnecessarily print the same instruction over and over (since different views may need to reference specific instructions).

Example (based on the code snippet from your test):

Before:

{
  "Instruction": "addl\t%ecx, %ecx",
  "Latency": 1,
  "NumMicroOpcodes": 1,
  "RThroughput": "0.5",
  <...>
}

After:

{
  "Instruction": 1
  "Latency": 1,
  "Latency": 1,
  "NumMicroOpcodes": 1,
  "RThroughput": "0.5",
}
  1. Have a top level object describing the simulated cpu. That object would need two fields:
    • A cpu name (in your test, it would report "haswell").
    • An array of processor resource units (strings).

The principle is the same as for the code sequence: if a View needs to reference a processor resource unit, then just use an index to that array.

  1. (optional) Change the ResourcePressureView object by splitting the array into multiple arrays (i.e. one per each instruction in the code sequence).

Point 4. is optional, but only if point 3 is implemented. Without point 3. there is no correspondence between the resource cycles and the (names of) consumed resources.
You can still infer the number of resources by dividing the number of elements in the array by the number of instructions in the code sequence.
Splitting that single array into a sequence of arrays would give a bit more structure and make (at least in my opinion) the entire output a bit more readable.

More in general: I suggest to remove the version number from the JSON format. We can add it in future if we need to.

About the test: I agree with Roman. We want to check both the structure and the numbers. So an autogenerated test that doesn't omit details is much much better.
Also (as Roman already suggested), please limit the number of iterations to 1. Otherwise your json output would be really too big (especially if you print the timeline!).

-Andrea

P.s.: in a future patch, we need a way to print out warnings (generated by mca during a simulation) as part of the json output.

Can it simply be autogenerated, without omitting details?

It would have to be auto-generated based on the readable output, then, using some sort of (python) script that creates the structure independently, or did you have something else in mind?

Can it simply be autogenerated, without omitting details?

It would have to be auto-generated based on the readable output, then, using some sort of (python) script that creates the structure independently, or did you have something else in mind?

You should be able to use update_mca_test_checks.py to autogenerate the CHECK lines.

lebedev.ri requested changes to this revision.Oct 8 2020, 3:44 AM
This revision now requires changes to proceed.Oct 8 2020, 3:44 AM

Hey Wolfgang,
are you still looking at this patch? Do you have any updates?

Hey Wolfgang,
are you still looking at this patch? Do you have any updates?

Yeah, sorry, I got roped into other things. I haven't forgotten about it and am planning to return to it, but right now I don't have the cycles. Anyone who wants to grab it and run with it feel free.

wolfgangp updated this revision to Diff 312424.Dec 17 2020, 3:30 AM

Apologies for the long delay, I'm finally ready to resume work on this feature. The update does the following:

  • The test case is auto-generated with a cycle count of 1, so is much smaller than before.
  • The instruction strings and the resource names are emitted in top level objects, where the views refer to the individual items by index. To emit the top level JSON object for the instruction strings I Introduced a special class with a singleton object whose only purpose is to do just that.
  • added some brief documentation for the --json option. Let me know if more is needed.
  • Addressed other miscellaneous review comments.
wolfgangp marked 5 inline comments as done.Dec 17 2020, 3:33 AM
wolfgangp added inline comments.
llvm/tools/llvm-mca/llvm-mca.cpp
99 ↗(On Diff #288029)

Added some brief documentation, need more perhaps?

RKSimon added inline comments.Dec 17 2020, 8:44 AM
llvm/tools/llvm-mca/Views/InstructionInfoView.cpp
143 ↗(On Diff #312424)

const auto &I ?

andreadb added inline comments.Dec 18 2020, 5:31 AM
llvm/test/tools/llvm-mca/JSON/X86/views.s
85–103 ↗(On Diff #312424)

There are multiple problems here:

An instruction rarely consumes more than maybe three or four processor resource units during an entrie simulation. So, array ResourcePressureInfo is going to be very sparse (subtargets may declare more than 10 units).
In practice, 'Numerator' is going to be zero most of the times for most entries.

It is not clear how to reconstruct the resource pressure view from this json output.
In this example, there are 10 processor resource units. However, the number of actual resources declared in JSON array Resources is more than 10 (i.e. it contains names for both units and groups). The ResourcePressureView only cares about resource units.

When processing the content of the json file, this forces the user to skip resource groups, and assume that units always come first within array Resources. While this may be true now, we don't want to strongly rely on an implementation detail.

Another (minor) issue is that "Denominator" is currently set to 1. However, to compute the actual pressure value the user needs to do the following math:
( Numerator / Denominator) / Executions .

This approach requires that we look elsewhere for the execution number (which is 100 in this example). Personally, I think it is more user friendly if we just do the math and stick a double in there (rather than having "Numerator" and "Denominator").

So, I suggest the following two changes:

  1. Only report entries where Numerator is different than zero.
  2. Each entry must also provide an instruction index and a resource index.
  3. Report the actual pressure value as a double (up to two digits after the dot), instead of field Numerator and Denominator.

Soomething like this:

{
  "InstructionIndex": 0,
  "ResourceIndex" : 2,
  "Pressure": 1.00
}
andreadb requested changes to this revision.Dec 18 2020, 6:03 AM

The json output is better now.
However there are things about the implementation that must be fixed/changed (see my comments below).

llvm/tools/llvm-mca/Views/View.cpp
30–39 ↗(On Diff #312424)

Please move this to a separate .cpp file.

40–50 ↗(On Diff #312424)

This should not be a static member of View.
See my previous comment.

52–53 ↗(On Diff #312424)

There is already a mechanism for adding views. We shouldn't add things globally this way. See my other comments about addView.

llvm/tools/llvm-mca/Views/View.h
35–37 ↗(On Diff #312424)

This should not be a static member of View. Please move this logic into a separate View class.

What we want is 'just yet-another-view' that prints out static information about the target CPU. That printing logic should be moved into a separate class; an instance of that class would then be added to the PipelinePrinter like any other view.

66–88 ↗(On Diff #312424)

Please move this to a separate View header file.

81 ↗(On Diff #312424)

This should be a virtual destructor

87 ↗(On Diff #312424)

This is unnecessary.
See my comment in llvm-mca.cpp. We should just rely on addView to subscribe new views, rather than doing this.

llvm/tools/llvm-mca/llvm-mca.cpp
569–572 ↗(On Diff #312424)

You should be able to just call addView like we do for any other view in this file.

Something like:

Printer.addView(
          std::make_unique<mca::InstructionViewJSONHandler>(*STI, *IP, Insts));
576–580 ↗(On Diff #312424)

I think that we should only keep a single API and avoid introducing method printJSON().

Regardless of whether the user requested a JSON report or not, the user should still be able to use the same API (i.e. just call printReport(TOF->os()))

You could add extra state to the PipelinePrinter (something like an OutputKind enum which, at the moment, is either default or json).

Also, we shouldn't be passing the MCInst array, MCPU, and STI back to printJSON.
The printer should be minimalistic and fully delegate to views the task of printing out stuff.

This revision now requires changes to proceed.Dec 18 2020, 6:03 AM
wolfgangp updated this revision to Diff 317121.Jan 15 2021, 5:30 PM
wolfgangp marked an inline comment as done.

This change is trying to use a more streamlined interface based on Andrea's feedback. I removed the printJSON method from the PipelinePrinter and the singleton object that was used to list the instructions.
Instead, each implemented view now has a <ViewName>JSON derived class that overrides the printView method. When JSON output is requested, objects of these class types are added as views. This leaves the PipelinePrinter largely unchanged.

There is a separate "InstructionView" which lists the instructions and CPU resources. Other views refer to the instructions and resources by index.

The ResourcePRessureView now lists all the non-zero resource usages, referring to the instructions and resources by index.
One thing I was not sure about: The last row of the ResourceUsage matrix has the totals for all iterations. Right now it's just printed as if it belonged to instruction N (with N being the number of instructions). I don't know if we want to do something different.

wolfgangp marked 6 inline comments as done.Jan 15 2021, 5:32 PM

I really like this JSON format. Thanks a lot!

The internal design however should be improved/simplified a bit (see my commet below).
This patch is almost ready to go; I plan to accept it once my comment below is addressed.

As a side note: shouldn't we add a line in the release notes about this new feature?
I don't actually remember the process for updating the release notes. When I firstly added llvm-mca, I remember I had to drop a line in a .rst file somewhere.
I think that this change is important enough to be advertised in the release notes (iirc there should be a section specifically for llvm tools). I don't know if the process has changed over the past few years though. Maybe @lebedev.ri and @RKSimon know more about it?

Thanks,
Andrea

llvm/tools/llvm-mca/PipelinePrinter.cpp
16–24 ↗(On Diff #317121)

In my previous comment I was suggesting to add the concept of "view output kind". A new enum type visible to all views (something like this):

enum class ViewOutputKind
{
  RAW,
  JSON
};

The PipelinePrinter could store that information as an internal field.
It could be done at construction time (or, if you really want, you could also add a new API like this setOutputKind(enum ViewOutputKind VOK);

The printReport API would still be unchanged. What would change though is the
signature of printView, which would look like this:

void PipelinePrinter::printReport(llvm::raw_ostream &OS) const {
  for (const auto &V : Views)
    V->printView(OutputKind, OS);
}

The main advantage of this approach is that you don't need to derive new views just for printing a JSON report. It also avoids the visibility change "issue" (from private to protected) to expose the internal state of a base view class to the derived classes.

Method printView for most classes might look like this:

void printView(llvm::View::FileOutputKind OK, llvm::raw_ostream &OS) const override
{
  switch (OK)
 {
 default:
 case OK_RAW:
   printViewRaw(llvm::raw_ostream &OS);  /* a new private member of InstructionInfoView */
 case OK_JSON:
   printViewJSon(llvm::raw_ostream &OS);  /* a new private member of InstructionInfoView */
 }
}

printViewRaw and printViewJSon would be private members of a view.

That way, you don't need separate classes just for the purpose of printing the report in a different format. The logic that prints json files would be fully contained in the printViewJSon.

This might also simplify a bit how the PipelinePrinter is populated (though I am not 100% sure about this).

wolfgangp updated this revision to Diff 318077.Jan 20 2021, 6:12 PM

Based on Andrea's request to use an OutputKind rather than overriding the printView method in a separate class for JSON output

  1. Introduced an output kind into the View class, which is stored in the PipelinePrinter based on the command line settings. PipelinePrinter passes the output kind to the views' PrintView method(), which dispatches either to the regular printView() or to printViewJSON().
  2. Removed the <View>JSON classes and instead added a printViewJSON virtual method which is shared by most views.
  3. the toJSON() methods have been moved back to the view classes.

I changed a couple of minor things from Andrea's proposal: 1) I used the name READABLE instead of RAW (the existing output is well formatted, so I felt the name RAW was not quite catching its essence. 2) I overloaded the old PrintView() instead of using a different name.

I didn't address the release note question yet, but I will look into what's needed here.

RKSimon added inline comments.Jan 21 2021, 3:43 AM
llvm/docs/CommandGuide/llvm-mca.rst
213 ↗(On Diff #318077)

Please can you add a mention in "llvm-project\llvm\docs\ReleaseNotes.rst" in the "Changes to the LLVM tools" section.

I can see that InstructionView has been moved outside of View.h into its own file, but I cannot see that new file anywhere in this diff.

I think that this diff is missing View/InstructionView.{h,cpp}, and llvm/Support/JSON.h.

Could you please upload your full patch?

Some minor comments below.

llvm/tools/llvm-mca/PipelinePrinter.cpp
16 ↗(On Diff #318077)

Is it necessary to include this file here?
Also, I cannot see that file in the diff. Could you please upload a full diff (assuming that this is not a phab user error... :) ).

My point is that there is nothing specific about this file which is dependent on any definitions other than those from View.h.

Also, some views might end up implicitly including it anyway.
I noticed that most views tend to include InstructionView.h (which unfortunately I am unable to see in this diff). Maybe you could move this include in there?

llvm/tools/llvm-mca/Views/ResourcePressureView.h
74 ↗(On Diff #318077)

Is protected really necessary here? It seems to me that it should be private.

llvm/tools/llvm-mca/Views/TimelineView.h
127 ↗(On Diff #318077)

Why protected?
Please correct me if I am wrong, but I don't think it needs to be protected and it can stay private.

andreadb added inline comments.Jan 21 2021, 4:33 AM
llvm/tools/llvm-mca/PipelinePrinter.cpp
16 ↗(On Diff #318077)

Nevermind. Forget about my second sentence.
JSON.h already exists and obviously it is not modified by this patch.
Apologies for the confusion.

The question about whether it should be included here is still valid though.
That being said, it is not an issue, so it is up to you whether to move that include elsewhere or not.

wolfgangp marked 4 inline comments as done.

Sorry about the missing files. Some hiccup while generating the diff.

Addressed the latest comments ('protected' was a leftover from the previous iteration, removed an unncessary include directive and added a mention in the release notes).

wolfgangp marked 2 inline comments as done.Jan 21 2021, 11:38 AM
andreadb accepted this revision.Jan 21 2021, 11:42 AM

LGTM.

Thanks!

@lebedev.ri Any more comments? You are currently blocking the patch. Cheers.

FYI I'm seeing clang diagnostic errors, I think from this commit:

Views/InstructionView.h:37:8: error: 'printView' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]

void printView(llvm::raw_ostream &) const {}
     ^

Views/View.h:37:16: note: overridden virtual function is here

virtual void printView(llvm::raw_ostream &OS) const = 0;
             ^

In file included from llvm-project/llvm/tools/llvm-mca/Views/BottleneckAnalysis.cpp:15:
In file included from llvm-mca-headers/Views/BottleneckAnalysis.h:83:
Views/InstructionView.h:46:13: error: 'getNameAsString' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]

StringRef getNameAsString() const { return "Instructions and CPU resources"; }
          ^

Views/View.h:44:21: note: overridden virtual function is here

virtual StringRef getNameAsString() const = 0;
This revision was not accepted when it landed; it landed in state Needs Review.Jan 21 2021, 3:46 PM
This revision was automatically updated to reflect the committed changes.

FYI I'm seeing clang diagnostic errors, I think from this commit:

Should be fixed now.

FYI I'm seeing clang diagnostic errors, I think from this commit:

Should be fixed now.

I'm still seeing some. Looks like your fix didn't get all of them:

Views/InstructionInfoView.h:81:15: error: 'toJSON' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]

json::Value toJSON() const;
            ^

Views/InstructionView.h:58:15: note: overridden virtual function is here

json::Value toJSON() const override;

FYI I'm seeing clang diagnostic errors, I think from this commit:

Should be fixed now.

I'm still seeing some. Looks like your fix didn't get all of them:

Sorry about that. Should be all of them now.

FYI I'm seeing clang diagnostic errors, I think from this commit:

Should be fixed now.

I'm still seeing some. Looks like your fix didn't get all of them:

Sorry about that. Should be all of them now.

Thanks! Looks good on my end now :-) Really wish some premerge checks ran with these on... :-/