Details

Reviewers

djtodoro
andreadb
RKSimon
lebedev.ri
qcolombet
holland11
gbedwell

Commits

rG0e343479a7ea: [llvm-mca] Compare multiple files

Summary

The script (llvm-mca-compare.py) uses the llvm-mca tool to print statistics to the console for multiple files.
The script requires the --llvm-mca-binary option (the relative path to the llvm-mca binary). The options --args [="-option1=<arg> -option2=<arg> ..."], -v and -h can also be used.
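
As a rough illustration of the command line described above, here is a minimal sketch using argparse. Only the option names (--llvm-mca-binary, --args, -v, -h) come from the summary; the parser layout, help strings and the `build_parser` helper are assumptions and not the script's actual code.

```python
import argparse


def build_parser():
    # Hypothetical parser: option names follow the summary, everything else is assumed.
    parser = argparse.ArgumentParser(prog="llvm-mca-compare.py")
    parser.add_argument("files", nargs="+", help="assembly files to compare")
    parser.add_argument("--llvm-mca-binary", required=True, metavar="PATH",
                        help="path to the llvm-mca binary")
    parser.add_argument("--args", default="", metavar="STRING",
                        help="extra options passed on to llvm-mca")
    parser.add_argument("-v", action="store_true",
                        help="also print the simulation setup")
    return parser  # argparse adds -h automatically


opts = build_parser().parse_args(
    ["file1.s", "--llvm-mca-binary=build/bin/llvm-mca",
     "--args=-dispatch=10 -noalias=false", "-v"])
print(opts.files, opts.llvm_mca_binary, opts.args, opts.v)
```

Passing the llvm-mca options as a single `--args=...` string keeps them opaque to the wrapper, which matches how the examples below invoke the script.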

The script is used as follows:

$ llvm-project/llvm/utils/llvm-mca-compare.py file1.s --llvm-mca-binary=build/bin/llvm-mca

Input files:
[f1]: file1.s

ITERATIONS: 100


-----------------------------------------
Code region: 1

+---------------------+--------+
|                     | [f1]:  |
+=====================+========+
| Instructions:       | 1100   |
+---------------------+--------+
| Total Cycles:       | 1097   |
+---------------------+--------+
| Total uOps:         | 1900   |
+---------------------+--------+
| Dispatch Width:     | 6      |
+---------------------+--------+
| uOps Per Cycle:     | 1.73   |
+---------------------+--------+
| IPC:                | 1.0    |
+---------------------+--------+
| Block RThroughput:  | 3.17   |
+---------------------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 2.72     | 2.76     | 1.66     | 1.68     | 3        | 2.76     | 2.76     | 1.66     |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+

$ llvm-project/llvm/utils/llvm-mca-compare.py file1.s --llvm-mca-binary=build/bin/llvm-mca -v

run: $ build/bin/llvm-mca -json file1.s

Simulation Parameters: 
-march : x86_64
-mcpu : skylake
-mtriple : x86_64-unknown-linux-gnu


Input files:
[f1]: file1.s

ITERATIONS: 100


-----------------------------------------
Code region: 1

+---------------------+--------+
|                     | [f1]:  |
+=====================+========+
| Instructions:       | 1100   |
+---------------------+--------+
| Total Cycles:       | 1097   |
+---------------------+--------+
| Total uOps:         | 1900   |
+---------------------+--------+
| Dispatch Width:     | 6      |
+---------------------+--------+
| uOps Per Cycle:     | 1.73   |
+---------------------+--------+
| IPC:                | 1.0    |
+---------------------+--------+
| Block RThroughput:  | 3.17   |
+---------------------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 2.72     | 2.76     | 1.66     | 1.68     | 3        | 2.76     | 2.76     | 1.66     |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+

$ llvm-project/llvm/utils/llvm-mca-compare.py file1.s --llvm-mca-binary=build/bin/llvm-mca --args="-dispatch=10 -noalias=false -iterations=300"  -v

run: $ build/bin/llvm-mca -dispatch=10 -noalias=false -iterations=300 -json file1.s

Simulation Parameters: 
-dispatch : 10
-march : x86_64
-mcpu : skylake
-mtriple : x86_64-unknown-linux-gnu
-noalias : False


Input files:
[f1]: file1.s

ITERATIONS: 300


-----------------------------------------
Code region: 1

+---------------------+--------+
|                     | [f1]:  |
+=====================+========+
| Instructions:       | 3300   |
+---------------------+--------+
| Total Cycles:       | 3097   |
+---------------------+--------+
| Total uOps:         | 5700   |
+---------------------+--------+
| Dispatch Width:     | 10     |
+---------------------+--------+
| uOps Per Cycle:     | 1.84   |
+---------------------+--------+
| IPC:                | 1.07   |
+---------------------+--------+
| Block RThroughput:  | 3      |
+---------------------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 2.75     | 2.75     | 1.67     | 1.67     | 3        | 2.75     | 2.75     | 1.67     |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+

$ llvm-project/llvm/utils/llvm-mca-compare.py file1.s file2.s file3.s file4.s --llvm-mca-binary=build/bin/llvm-mca

Input files:
[f1]: file1.s
[f2]: file2.s
[f3]: file3.s
[f4]: file4.s

ITERATIONS: 100


-----------------------------------------
Code region: 1

+---------------------+--------+--------+--------+--------+
|                     | [f1]:  | [f2]:  | [f3]:  | [f4]:  |
+=====================+========+========+========+========+
| Instructions:       | 1100   | 600    | 2800   | 1200   |
+---------------------+--------+--------+--------+--------+
| Total Cycles:       | 1097   | 897    | 2192   | 1096   |
+---------------------+--------+--------+--------+--------+
| Total uOps:         | 1900   | 1400   | 4500   | 2200   |
+---------------------+--------+--------+--------+--------+
| Dispatch Width:     | 6      | 6      | 6      | 6      |
+---------------------+--------+--------+--------+--------+
| uOps Per Cycle:     | 1.73   | 1.56   | 2.05   | 2.01   |
+---------------------+--------+--------+--------+--------+
| IPC:                | 1.0    | 0.67   | 1.28   | 1.09   |
+---------------------+--------+--------+--------+--------+
| Block RThroughput:  | 3.17   | 2.33   | 10     | 3.67   |
+---------------------+--------+--------+--------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 2.72     | 2.76     | 1.66     | 1.68     | 3        | 2.76     | 2.76     | 1.66     |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f2]:  | -          | -            | 1.11     | 1.95     | 1.33     | 1.34     | 2        | 1.95     | 1.99     | 1.33     |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f3]:  | -          | -            | 4.71     | 4.72     | 7        | 7        | 10       | 4.73     | 4.84     | 7        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f4]:  | -          | -            | 3        | 3        | 1.75     | 1.76     | 2        | 3.01     | 3.99     | 1.49     |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+

$ llvm-project/llvm/utils/llvm-mca-compare.py test-two-code-regions.s test-two-code-regions-opt.s --llvm-mca-binary=build/bin/llvm-mca

Input files:
[f1]: test-two-code-regions.s
[f2]: test-two-code-regions-opt.s

ITERATIONS: 100


-----------------------------------------
Code region: 1

+---------------------+--------+--------+
|                     | [f1]:  | [f2]:  |
+=====================+========+========+
| Instructions:       | 300    | 100    |
+---------------------+--------+--------+
| Total Cycles:       | 303    | 103    |
+---------------------+--------+--------+
| Total uOps:         | 300    | 100    |
+---------------------+--------+--------+
| Dispatch Width:     | 6      | 6      |
+---------------------+--------+--------+
| uOps Per Cycle:     | 0.99   | 0.97   |
+---------------------+--------+--------+
| IPC:                | 0.99   | 0.97   |
+---------------------+--------+--------+
| Block RThroughput:  | 0.75   | 0.25   |
+---------------------+--------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 0.75     | 0.75     | -        | -        | -        | 0.75     | 0.75     | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f2]:  | -          | -            | 0.25     | 0.25     | -        | -        | -        | 0.25     | 0.25     | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+



-----------------------------------------
Code region: 2

+---------------------+--------+--------+
|                     | [f1]:  | [f2]:  |
+=====================+========+========+
| Instructions:       | 200    | 100    |
+---------------------+--------+--------+
| Total Cycles:       | 203    | 103    |
+---------------------+--------+--------+
| Total uOps:         | 200    | 100    |
+---------------------+--------+--------+
| Dispatch Width:     | 6      | 6      |
+---------------------+--------+--------+
| uOps Per Cycle:     | 0.99   | 0.97   |
+---------------------+--------+--------+
| IPC:                | 0.99   | 0.97   |
+---------------------+--------+--------+
| Block RThroughput:  | 0.5    | 0.25   |
+---------------------+--------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 0.5      | 0.5      | -        | -        | -        | 0.5      | 0.5      | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f2]:  | -          | -            | 0.25     | 0.25     | -        | -        | -        | 0.25     | 0.25     | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+

$ llvm-project/llvm/utils/llvm-mca-compare.py test-one-code-region.s test-two-code-regions.s --llvm-mca-binary=build/bin/llvm-mca

Input files:
[f1]: test-one-code-region.s
[f2]: test-two-code-regions.s

ITERATIONS: 100


-----------------------------------------
Code region: 1

+---------------------+--------+--------+
|                     | [f1]:  | [f2]:  |
+=====================+========+========+
| Instructions:       | 100    | 300    |
+---------------------+--------+--------+
| Total Cycles:       | 103    | 303    |
+---------------------+--------+--------+
| Total uOps:         | 100    | 300    |
+---------------------+--------+--------+
| Dispatch Width:     | 6      | 6      |
+---------------------+--------+--------+
| uOps Per Cycle:     | 0.97   | 0.99   |
+---------------------+--------+--------+
| IPC:                | 0.97   | 0.99   |
+---------------------+--------+--------+
| Block RThroughput:  | 0.25   | 0.75   |
+---------------------+--------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | 0.25     | 0.25     | -        | -        | -        | 0.25     | 0.25     | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f2]:  | -          | -            | 0.75     | 0.75     | -        | -        | -        | 0.75     | 0.75     | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+



-----------------------------------------
Code region: 2

+---------------------+--------+--------+
|                     | [f1]:  | [f2]:  |
+=====================+========+========+
| Instructions:       | -      | 200    |
+---------------------+--------+--------+
| Total Cycles:       | -      | 203    |
+---------------------+--------+--------+
| Total uOps:         | -      | 200    |
+---------------------+--------+--------+
| Dispatch Width:     | -      | 6      |
+---------------------+--------+--------+
| uOps Per Cycle:     | -      | 0.99   |
+---------------------+--------+--------+
| IPC:                | -      | 0.99   |
+---------------------+--------+--------+
| Block RThroughput:  | -      | 0.5    |
+---------------------+--------+--------+

Resource pressure per iteration: 

+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
|        | SKLDivider | SKLFPDivider | SKLPort0 | SKLPort1 | SKLPort2 | SKLPort3 | SKLPort4 | SKLPort5 | SKLPort6 | SKLPort7 |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f1]:  | -          | -            | -        | -        | -        | -        | -        | -        | -        | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+
| [f2]:  | -          | -            | 0.5      | 0.5      | -        | -        | -        | 0.5      | 0.5      | -        |
+--------+------------+--------------+----------+----------+----------+----------+----------+----------+----------+----------+

Used assembly files:

test-two-code-regions.s (149 B)

test-one-code-region.s (51 B)

file4.s (1 KB)

file3.s (1 KB)

test-two-code-regions-opt.s (99 B)

file1.s (809 B)

file2.s (670 B)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mmatic05 created this revision. · Dec 6 2021, 2:24 AM

Herald added a reviewer: andreadb. · Dec 6 2021, 2:24 AM

Herald added subscribers: gbedwell, mgorny.

mmatic05 requested review of this revision. · Dec 6 2021, 2:24 AM

Herald added a subscriber: llvm-commits. · Dec 6 2021, 2:24 AM

djtodoro added a subscriber: petarj. · Dec 6 2021, 2:31 AM

Thanks for this! (I am not very familiar with this code, but the motivation sounds reasonable -- some nits included)

llvm/test/tools/llvm-mca/X86/Inputs/diff-option-file-3.s
65 ↗	(On Diff #391996)	`(https://github.com/mmatic05/llvm-project.git 00f463e835d4baa285bc488c9fcb859e0461e452)` not needed
llvm/test/tools/llvm-mca/X86/Inputs/diff-option-file-4.s
45 ↗	(On Diff #391996)	(https://github.com/mmatic05/llvm-project.git 00f463e835d4baa285bc488c9fcb859e0461e452) not needed
llvm/tools/llvm-mca/MachineCodePerformance.cpp
1 ↗	(On Diff #391996)	This is missing the license information.
llvm/tools/llvm-mca/MachineCodePerformance.h
11 ↗	(On Diff #391996)	I guess this should go into the cpp unit.
llvm/tools/llvm-mca/Views/ResourcePressureView.h
100 ↗	(On Diff #391996)	`SmallVectorImpl` instead?
llvm/tools/llvm-mca/llvm-mca.cpp
461 ↗	(On Diff #391996)	no need for the curly brackets here
592 ↗	(On Diff #391996)	here as well

djtodoro added inline comments. · Dec 6 2021, 2:41 AM

llvm/tools/llvm-mca/Views/SummaryView.cpp
114 ↗	(On Diff #391996)	is there another way we can get this info? I don't like these uints passed by ref here...

Harbormaster completed remote builds in B137615: Diff 391996. · Dec 6 2021, 3:33 AM

RKSimon added a reviewer: RKSimon. · Dec 6 2021, 5:24 AM

Hi @mmatic05 ,

Thanks for contributing this patch. However, I have a few questions/concerns regarding your suggested design.

Is there a reason why this feature should be implemented as part of llvm-mca?

Wouldn't it be simpler to contribute this feature as a separate script or tool?

You could have a python script that simply drives multiple mca runs under the hood, and then collects the json output from each individual run. That same script would then also diff the various outputs and pretty-print the result either as normal text, or as yet-another json file.

That way, you would be able to provide your new feature without having to change llvm-mca.
Any extension/customization on the diffing logic would be fully transparent to the main llvm-mca driver.

What do you think?

-Andrea
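
A minimal sketch of the driver approach suggested above, assuming the wrapper runs llvm-mca once per input file with `-json` and parses its standard output. The helper names `run_mca` and `collect_reports` are hypothetical; the injectable `runner` is only there so the logic can be exercised without an llvm-mca binary.

```python
import json
import subprocess


def run_mca(binary, asm_file):
    # Run one simulation and return what llvm-mca wrote to stdout.
    result = subprocess.run([binary, "-json", asm_file],
                            capture_output=True, text=True, check=True)
    return result.stdout


def collect_reports(binary, asm_files, runner=run_mca):
    # One parsed JSON report per input file, keyed by file name.
    return {asm_file: json.loads(runner(binary, asm_file))
            for asm_file in asm_files}
```

With the reports collected this way, any diffing or pretty-printing logic lives entirely in the script and llvm-mca itself stays unchanged.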

andreadb added reviewers: lebedev.ri, qcolombet, holland11. · Dec 6 2021, 6:25 AM

+1 To refactor this as a helper python script (llvm-mca-diff.py ?) we can add to llvm-project\llvm\utils - there might be some json output we're missing but I think everything else can be kept out of the exe

I didn't realise that you already have a python script to do the llvm-mca comparisons.

Would it be possible to contribute your LLVM-MCA-PRETTY-PRINTER python script instead?

+1, we can keep it within llvm-project/llvm/utils

Thanks a lot to everyone for the comments; I'm working on it.

addressed comments
added separate python script which calls llvm-mca tool and has -diff option to compare multiple files

Thanks for the update. Can you please update the summary?

djtodoro added inline comments. · Dec 7 2021, 7:48 AM

llvm/utils/llvm-mca-pretty-printer/llvm-mca-pretty-printer.py
33 ↗	(On Diff #392398)	`-use-mca` ?
39 ↗	(On Diff #392398)	Can this be more descriptive?

djtodoro added inline comments. · Dec 7 2021, 7:50 AM

llvm/utils/llvm-mca-pretty-printer/llvm-mca-pretty-printer.py
117 ↗	(On Diff #392398)	`RThroughput`

mmatic05 edited the summary of this revision. · Dec 7 2021, 7:51 AM

ntesic added a subscriber: ntesic. · Dec 7 2021, 8:00 AM

RKSimon added a reviewer: gbedwell. · Dec 7 2021, 8:21 AM

Not to descend into bike shedding, but is llvm-mca-pretty-printer.py a suitable name?

llvm/utils/llvm-mca-pretty-printer/llvm-mca-pretty-printer.py
33 ↗	(On Diff #392398)	If we follow the style used by the llvm\utils\update_*_test_checks.py scripts - this should be: --llvm-mca-binary=<path to llvm-mca>

Also, it might be easier if this was not in a subdir - llvm/utils/llvm-mca-pretty-printer.py - to avoid path problems

Harbormaster completed remote builds in B137908: Diff 392398. · Dec 7 2021, 8:42 AM

addressed comments

Harbormaster completed remote builds in B138109: Diff 392691. · Dec 8 2021, 2:25 AM

mmatic05 edited the summary of this revision. · Dec 8 2021, 2:31 AM

addressed comments

djtodoro added inline comments. · Dec 8 2021, 3:41 AM

llvm/utils/llvm-mca-compare-multiple-files.py
47 ↗	(On Diff #392700)	we should be using one naming convention (either camelCase or under_scores), but I vote for underscores
53 ↗	(On Diff #392700)	this newline is not needed
91 ↗	(On Diff #392700)	I suggest that we check for this package as: try: import termtables except ImportError: print('error: termtables not found.') return
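
The import check suggested in the last inline comment can be sketched as follows. This is a generalized variant under stated assumptions: `try_import` is a hypothetical helper, and it uses `importlib` so the same pattern works for any optional package, not just termtables.

```python
import importlib


def try_import(name):
    # Return the imported module, or None (after printing an error) if it is missing.
    try:
        return importlib.import_module(name)
    except ImportError:
        print("error: {} not found.".format(name))
        return None
```

A caller would write `termtables = try_import("termtables")` and return early when the result is `None`.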

Harbormaster completed remote builds in B138114: Diff 392700. · Dec 8 2021, 4:15 AM

RKSimon added inline comments. · Dec 8 2021, 5:28 AM

llvm/utils/llvm-mca-compare-multiple-files.py
30 ↗	(On Diff #392700)	why is skylake the default? should it be 'native'?

If the user doesn't specify an -mtriple, then we shouldn't pass a default. llvm-mca should be able to deduce the default target triple (i.e. sys::getDefaultTargetTriple()).
It should not be an x86 triple by default.

Same for the cpu, you shouldn't default it to 'skylake' (why skylake? Also, x86 may not be in the list of available targets).
Basically, that flag should not be passed to llvm-mca if the user doesn't explicitly set it. llvm-mca would default mcpu to llvm::sys::getHostCPUName().
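
One way to follow this advice, sketched under the assumption that the script keeps the user's choices as optional values: a flag is emitted only when it was explicitly set, so llvm-mca falls back to its own defaults otherwise. The `target_flags` helper is hypothetical.

```python
def target_flags(mtriple=None, mcpu=None):
    # Emit a flag only if the user set it; nothing is defaulted here.
    flags = []
    if mtriple is not None:
        flags.append("-mtriple=" + mtriple)
    if mcpu is not None:
        flags.append("-mcpu=" + mcpu)
    return flags
```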

About the flags:
do we actually need to specify -diff on the command line? Isn't it implicit if we pass multiple assembly files in input?

Also, (as a feature request) would it be possible to also get a diff of the resource pressure (for the iteration)? Comparing the summary view is important. However, I believe that most users will also want to use your script to quickly see the differences in resource pressure.

It would be very useful if we could also pass extra flags in input to llvm-mca.
As you know, flags like -noalias and/or -dispatch may affect the outcome of a simulation. When doing a diff, users should be allowed to pass extra args to llvm-mca.

This also leads to a (minor) issue with the output of this script.
The simulation setup is currently not printed out. The only context that we get is which files were passed in input to the script and the total number of iterations.

That information may not be enough to reproduce an experiment. For example, you would need to know what the other flags were (example: which subtarget was specified).

I think that we should print out which flags were actually passed in input to llvm-mca. If people are concerned about the verbosity of it, then we can do it only if the user has asked for a more verbose output (e.g. flag -v (verbose) was passed in input to the script).
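
A possible shape for that verbose output, assuming the parsed JSON report carries a top-level "SimulationParameters" object (the name quoted later in this thread). The `format_simulation_parameters` helper and the exact line format are illustrative only.

```python
def format_simulation_parameters(report):
    # `report` is a parsed llvm-mca JSON document; a missing object yields just the header.
    params = report.get("SimulationParameters", {})
    lines = ["Simulation Parameters:"]
    for flag in sorted(params):
        lines.append("{} : {}".format(flag, params[flag]))
    return "\n".join(lines)
```

The script would print this text only when -v was given, which keeps the default output short while still making an experiment reproducible on request.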

Last: have you considered also diffing .json files? I guess that the feature of diffing json files could be added in future. It would allow skipping the first step (i.e. the actual simulation step), and it would simply compare the .json outputs if compatible (i.e. target settings are the same). This is just an idea. If we think it would be useful, it could be contributed later as a follow-up patch.

I hope it makes sense.

addressed comments

Thanks a lot, everyone, for the suggestions and comments.
We do not have information about the resource pressure (for iteration) in the json object that is the result of the llvm-mca tool. This may be one of the future patches, which will parse the stdout. I would also leave the json format printing for some next patch.

Herald added a subscriber: pengfei. · Dec 9 2021, 5:29 AM

Thanks a lot, @mmatic05!

How about llvm/utils/llvm-mca-compare-multiple-files.py --> llvm/utils/llvm-mca-compare.py ?

Harbormaster completed remote builds in B138419: Diff 393119. · Dec 9 2021, 6:14 AM

In D115138#3182603, @mmatic05 wrote:

addressed comments

Thanks a lot, everyone, for the suggestions and comments.
We do not have information about the resource pressure (for iteration) in the json object that is the result of the llvm-mca tool. This may be one of the future patches, which will parse the stdout. I would also leave the json format printing for some next patch.

That's not correct.

Information about the per-iteration resource pressure is already in the json output.

Example:

addq    %rax, %rcx
addq    %rax, %rdx
addq    %rax, %rsi
addq    %rax, %rdi

Assuming the following simulation parameters:

"SimulationParameters": {
  "-march": "x86_64",
  "-mcpu": "znver1",
  "-mtriple": "x86_64-unknown-linux-gnu"
},

For 10 iterations, you get the following json output for the resource pressure view:

"Instructions": [
  "addq\t%rax, %rcx",
  "addq\t%rax, %rdx",
  "addq\t%rax, %rsi",
  "addq\t%rax, %rdi"
],
"ResourcePressureView": {
  "ResourcePressureInfo": [
    {
      "InstructionIndex": 0,
      "ResourceIndex": 5,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 1,
      "ResourceIndex": 4,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 2,
      "ResourceIndex": 3,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 3,
      "ResourceIndex": 2,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 4,
      "ResourceIndex": 2,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 4,
      "ResourceIndex": 3,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 4,
      "ResourceIndex": 4,
      "ResourceUsage": 1
    },
    {
      "InstructionIndex": 4,
      "ResourceIndex": 5,
      "ResourceUsage": 1
    }
  ]
},

There are four instructions and five indices. The last index is always for the full iteration. The index value for the iteration matches the length (in number of instructions) of the code snippet in input. So, it is 4 for that example.
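
The rule above can be sketched in a few lines: keep only the entries whose "InstructionIndex" equals the number of instructions. The data is copied from the example in this comment; the `iteration_pressure` helper is hypothetical.

```python
def iteration_pressure(instructions, pressure_info):
    # Entries whose InstructionIndex equals the instruction count describe the whole iteration.
    iteration_index = len(instructions)
    return {
        entry["ResourceIndex"]: entry["ResourceUsage"]
        for entry in pressure_info
        if entry["InstructionIndex"] == iteration_index
    }


instructions = ["addq\t%rax, %rcx", "addq\t%rax, %rdx",
                "addq\t%rax, %rsi", "addq\t%rax, %rdi"]
pressure_info = [
    {"InstructionIndex": 0, "ResourceIndex": 5, "ResourceUsage": 1},
    {"InstructionIndex": 1, "ResourceIndex": 4, "ResourceUsage": 1},
    {"InstructionIndex": 2, "ResourceIndex": 3, "ResourceUsage": 1},
    {"InstructionIndex": 3, "ResourceIndex": 2, "ResourceUsage": 1},
    {"InstructionIndex": 4, "ResourceIndex": 2, "ResourceUsage": 1},
    {"InstructionIndex": 4, "ResourceIndex": 3, "ResourceUsage": 1},
    {"InstructionIndex": 4, "ResourceIndex": 4, "ResourceUsage": 1},
    {"InstructionIndex": 4, "ResourceIndex": 5, "ResourceUsage": 1},
]
print(iteration_pressure(instructions, pressure_info))  # {2: 1, 3: 1, 4: 1, 5: 1}
```

The result maps each resource index to its usage for the full iteration. Turning an index into a resource name would need further information from the report, which this sketch does not cover.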

andreadb added inline comments. · Dec 9 2021, 6:47 AM

llvm/utils/llvm-mca-compare-multiple-files.py
34 ↗	(On Diff #393119)	I am not entirely sure this is the right approach. I suggest delegating the parsing of llvm-mca specific arguments to the llvm-mca driver itself, and parsing here only the flags that are specific to your script. In one of my previous posts, I was essentially suggesting to parse only file names and simply forward all other options to each llvm-mca invocation. Basically, llvm-mca arguments could be treated as a generic string to append to the llvm-mca invocation. You don't need to know what is being passed in input. If the llvm-mca run is successful, then those args made sense. You can always inspect the json file for the full set of simulation parameters and the summary view information (for the number of iterations). As it stands, we parse most llvm-mca arguments here, and then we let llvm-mca do the same again. This design is not ideal because it forces us to mirror changes to (non-view related) llvm-mca flags in this script too. This script doesn't need to know about the semantics of llvm-mca specific flags. Once the set of input code snippets is known, we simply forward everything else to llvm-mca. That way, you avoid duplicating the logic that parses arguments, and you can probably get rid of `verify_program_inputs`.
160–162 ↗	(On Diff #393119)	We should get this information from the json output. See json object "SimulationParameters". Example: "SimulationParameters": { "-march": "x86_64", "-mcpu": "znver1", "-mtriple": "x86_64-unknown-linux-gnu" }, That json object also stores information about flags like -noalias and -dispatch. Example: "SimulationParameters": { "-dispatch": 2, "-march": "x86_64", "-mcpu": "znver1", "-mtriple": "x86_64-unknown-linux-gnu", "-noalias": false }, The advantage of doing things this way, is that we know exactly which "native" subtarget was selected (for the case where no -mcpu is passed in input).
181–192 ↗	(On Diff #393119)	What happens with asm files declaring multiple code regions? Your script only processes the first code region. I am not sure how we should deal with those. If the idea is that this script shouldn't support multiple regions, then we should warn if one of the json outputs provides information about more than one code region.
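
The inline comments above can be sketched as follows, under two assumptions: the extra llvm-mca options arrive as one opaque string, and the JSON report lists its regions under a "CodeRegions" array (an assumption about the report layout, not something stated in this thread). Both helper names are hypothetical.

```python
import shlex


def build_mca_command(binary, extra_args, asm_file):
    # extra_args is split like a shell would split it and is never interpreted here.
    return [binary] + shlex.split(extra_args) + ["-json", asm_file]


def has_multiple_regions(report):
    # Assumed layout: one entry per code region under "CodeRegions".
    return len(report.get("CodeRegions", [])) > 1


cmd = build_mca_command("build/bin/llvm-mca",
                        "-dispatch=10 -noalias=false -iterations=300",
                        "file1.s")
print(" ".join(cmd))
# build/bin/llvm-mca -dispatch=10 -noalias=false -iterations=300 -json file1.s
```

Because the wrapper never inspects the forwarded options, new llvm-mca flags need no change in the script; a failed llvm-mca run is the only signal that the options were wrong. A script that does not support multiple regions could print a warning whenever `has_multiple_regions` returns true.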