This is an archive of the discontinued LLVM Phabricator instance.

[RFC] llvm-mca: a static performance analysis tool.
ClosedPublic

Authored by andreadb on Mar 1 2018, 9:15 AM.

Details

Summary

This is related to the RFC on llvm-dev visible at this link:
http://lists.llvm.org/pipermail/llvm-dev/2018-March/121490.html

llvm-mca is an LLVM based performance analysis tool that can be used to
statically measure the performance of code, and to help triage potential
problems with target scheduling models.

llvm-mca uses information which is already available in LLVM (e.g. scheduling
models) to statically measure the performance of machine code in a specific cpu.
Performance is measured in terms of throughput as well as processor resource
consumption. The tool currently works for processors with an out-of-order
backend, for which there is a scheduling model available in LLVM.

The main goal of this tool is not just to predict the performance of the code
when run on the target, but also help with diagnosing potential performance
issues.

Given an assembly code sequence, llvm-mca estimates the IPC (instructions per
cycle), as well as hardware resources pressure. The analysis and reporting style
were mostly inspired by the IACA tool from Intel.

There is an RFC about this tool on llvm-dev.

This review covers two patches:
A first (very small) patch that always enables the generation of processor
resource names in the SubtargetEmitter. Currently, the processor resource names
are only generated for debugging purposes, but are needed by the tool to
generate user friendly reports, so we would like to always generate them.
A second patch with the actual static analysis tool (in llvm/tools).

Once these first two patches are committed, the plan is to keep working on the
tool with the help of the community to address all of the limitations described
by the RFC (which should have been posted on llvm-dev), and find good
solutions/fixes for the design issues described by section "Known design problems".

We hope the community will find this tool useful like we have.

Special thanks to Simon Pilgrim, Filipe Cabecinhas and Greg Bedwell who really
helped me a lot by suggesting improvements and testing the tool.

Diff Detail

Event Timeline

andreadb created this revision.Mar 1 2018, 9:15 AM
andreadb edited the summary of this revision. (Show Details)
bcain added a subscriber: bcain.Mar 1 2018, 9:28 AM
spatel added a subscriber: spatel.Mar 1 2018, 9:50 AM
bcain requested changes to this revision.Mar 1 2018, 12:14 PM

Thanks for the contribution, it looks interesting.

tools/llvm-mca/BackendStatistics.h
45

I think you should probably avoid "using" statements from the global scope in header files.

tools/llvm-mca/Dispatch.h
246

I think it's an error to move the result of make_unique.

tools/llvm-mca/TimelineView.cpp
30

This does not have enough fields to match the TimelineViewEntry defn.

34

Ditto for WaitTimeEntry.

This revision now requires changes to proceed.Mar 1 2018, 12:14 PM
fhahn added a subscriber: fhahn.Mar 2 2018, 2:57 AM

Thanks Brian for the feedback!

tools/llvm-mca/BackendStatistics.h
45

I will remove it. I have already added the llvm:: prefix to raw_ostream. I think I can do the same for the remaining types.
Thanks for spotting this.

tools/llvm-mca/Dispatch.h
246

Right. I will remove the move (even at line 246). Thanks!

tools/llvm-mca/TimelineView.cpp
30

Nice catch!
I will fix it.

34

I will fix it. Thanks!

courbet added inline comments.Mar 2 2018, 4:47 AM
tools/llvm-mca/Backend.h
48

My main reservation is around the fact that there is a lot of coupling between the driver (RunCycle()/NotifyCycleBegin()/NotifyCycleEnd()) and the pipeline itself. I think this should be split into a driver class ad a configurable pipeline.
With the current design the backend has to know about all the possible pipelines and expose all the quantities that listeners might be interested in (GetNumXXXStalls(), NotifyInstructionXXX()). This does not scale well to more pipeline components (in particular to simulate the frontend). I think it would be better to manipulate IB/HWS/DU though a common interface and give this interface a way to notify the listeners.
What do you think ?

tools/llvm-mca/llvm-mca.cpp
316

I think that in the long run it should be possible for subtargets to specialize the creation of the backend. This would allow correctly modeling some finer implementation details. To that end we should add a createMCABackend() to llvm::Target. (in the future) .

utils/TableGen/SubtargetEmitter.cpp
611

I would split this to a separate diff and submit independently. These tables are so small that the added size is negligible.

courbet added inline comments.Mar 2 2018, 5:21 AM
tools/llvm-mca/BackendPrinter.cpp
171

Instead of:

BackendPrinter::AddView1(args) { View1 = make_unique<View1>(args); } 
BackendPrinter::AddView2(args) { View2 = make_unique<View2>(args); } 

BackendPrinter::print() const {
  if (View1) {
    View1->printView1();
  }
  if (View2) {
    View2->printView1();
  }
}

what about creating a BackendView interface and do:

BackendPrinter::AddView(std::unique_ptr<View>) { Views.push_back(std::move(View)); } 

BackendPrinter::print() const {
  for (const auto& View : Views) {
    View->print(getOStream(), B);
  }
}

It would make it easier to add custom views to the output, and unify the API for various views (printDispatchStalls(), printRATStatistics() , printGeneralStatistics(), printSchedulerUsage() become views).

tools/llvm-mca/TimelineView.h
139

It seems that this is used only by printTimeline(), and WaitTime is used only by printAverageWaitTimes(), is there any reason to not split TimelineView into TimelineView and WaitTimeView ?

Hi Clement,

Thanks for the feedback! Please see my answers below.

Cheers,
Andrea

tools/llvm-mca/Backend.h
48

I agree with you that this is what should be done in the medium/long term. I think it is a must if we want to let targets compose the sequence of stages.

When I originally designed this tool, I adopted a similar approach to the one you described, where every component was able to notify events to listeners through a common interface.
However, I ended up adopting this simpler but "less modular" design mainly for simplicity; the number of component/stages was not "big enough" to justify the presence of multiple event producers. Essentially, having the Backend as the single publisher of events was much simpler.

As I mentioned in the RFC, this version of the tool doesn currently model the hardware frontend. The long term goal is to teach the tool how to analyze frontend performance too. So, I would be very happy if you guys at Google would help contributing that part!

In the meantime, would it be possible to have this changed in a follow-up patch in preparation for the "future" development?

tools/llvm-mca/llvm-mca.cpp
316

I agree.

utils/TableGen/SubtargetEmitter.cpp
611

Yes. That was the plan. This patch is essentially the union of two patches.
As you wrote, the change to this file is small enough to be committed independently (thanks to your previous patch on the processor resouce group descriptors :-)).

Since this is still an RFC, I plan to keep all the code into a single patch. Once/If people are happy with this design, then the change to Tablegen would be committed first.

andreadb updated this revision to Diff 136726.Mar 2 2018, 5:50 AM

Patch updated. Addressed review comments from Brian.

andreadb marked 3 inline comments as done.Mar 2 2018, 5:51 AM
bcain accepted this revision.Mar 2 2018, 6:20 AM

FYI several of the issues that I found are ones that I discovered when I tried to build the patch with 6.0.rc2. Also, I didn't get to detail them here but there were warnings regarding virtual methods w/o virtual destructor. I believe those were on the classes inherited from HWEventListener.

This revision is now accepted and ready to land.Mar 2 2018, 6:20 AM
courbet added inline comments.Mar 2 2018, 6:27 AM
tools/llvm-mca/Backend.h
48

So, I would be very happy if you guys at Google would help contributing that part!
In the meantime, would it be possible to have this changed in a follow-up patch in preparation for the "future" development?

SGTM.

MatzeB added a subscriber: MatzeB.Mar 4 2018, 12:34 PM

This is great! I always wanted to see a IACTA like tool for other architectures.

I only glanced over the patch, so this is only the nipicky/obvious things:

  • New llvm code should use camelCase rather than CamelCase for method names.
  • Too much use of auto for my taste. I would only use it in situations where they type is immediately obvious (on the same line) such as auto x = dyn_cast<YYY>() but nowhere else. I find it makes code harder to read when I first have to search around what things a for loop is iterating over for example.
  • Needs documentation in docs/CommandGuide. Maybe you can incorporate some passages from your llvm-dev mail.

I would love to see this tool land in tree. And as far as I am concerned we can push it and continue development in-tree when 1) waited some more for other people to chime in here, 2) we verified the normal code isn't changed too much (double check code size of MC Unit name change) 3) we have some documentation.

include/llvm/MC/MCSchedule.h
27–29

Do we have an idea how much this increases code size of other llvm tools? If this turns out to be a lot then we may be better of keeping the names either in a separate library or a separate global variables so that the linker can strip them away if unused.

tools/llvm-mca/Backend.h
79

return value is always 0 and never used.

tools/llvm-mca/Dispatch.cpp
28

Move comment to the declaration.

tools/llvm-mca/Dispatch.h
85

Make WS a reference to indicate it mustn't be nullptr?

89

You should use SmallVectorImpl<> & in API to be independent of the small-size.

305

feels odd to have a return with a void value.

tools/llvm-mca/InstrBuilder.cpp
114–119

Did clang-format mess this part up?

This is great! I always wanted to see a IACTA like tool for other architectures.

Thanks Matthias!

I only glanced over the patch, so this is only the nipicky/obvious things:

  • New llvm code should use camelCase rather than CamelCase for method names.

Will do.

  • Too much use of auto for my taste. I would only use it in situations where they type is immediately obvious (on the same line) such as auto x = dyn_cast<YYY>() but nowhere else. I find it makes code harder to read when I first have to search around what things a for loop is iterating over for example.

Will do.

  • Needs documentation in docs/CommandGuide. Maybe you can incorporate some passages from your llvm-dev mail.

I will add a document there. I also plan to add a README.txt with the long description from the RFC.

I also plan to add more tests and restructure X86 so that there is a sub-directory for specific processors.
Something like "X86/BtVer2/dot-product.s"; etc.

Code comments can probably be improved, and I plan to do integrate a few more comments in the code.

I would love to see this tool land in tree. And as far as I am concerned we can push it and continue development in-tree when 1) waited some more for other people to chime in here, 2) we verified the normal code isn't changed too much (double check code size of MC Unit name change) 3) we have some documentation.

I was planning to wait a few more days to give other people an opportunity to reply.

In the meantime, I am going to update a new version of the patch that addresses your review comments.

Thanks again Matthias,
-Andrea

andreadb added inline comments.Mar 5 2018, 2:19 AM
include/llvm/MC/MCSchedule.h
27–29

I need to check this.
I expect the impact to be very small, since the number of names per processor tends to be very small. I plan to compare tool sizes before/after to see if the code size increase is negligible or not.

RKSimon added inline comments.Mar 5 2018, 2:47 AM
tools/llvm-mca/BackendPrinter.cpp
117

(style)

for (unsigned Idx = 0, E = B.GetNumInstructions(); Idx < E; ++Idx) {

Same throughout the patch

andreadb updated this revision to Diff 137006.Mar 5 2018, 8:17 AM

Patch updated.

  • Added documentation in docs/CommandGuide/llvm-mca.rst
  • Added a README.txt containing most of the useful documentation originally posted in the RFC, plus an extra "Future work" section.
  • Fixed coding style for functions/methods (as requested by Matthias)
  • Removed most uses of the 'auto' keyword (as suggested by Matthias)
  • Fixed a few spelling mistakes in code comments. Also, added a few extra code comments.
  • Added more X86 tests. Created sub-directory for BtVer2 specific tests: dot-product.s , memcpy-like-test.s, load-store-alias.s Each test now checks the timeline report. Tests memcpy-like-test.s and load-store-alias.s specifically test the behavior of flag -noalias.

This patch should also address code review comments from Simon.

@MatzeB
I checked the impact of this patch on the code size of other llvm tools.
Tested on Release build configuration enabling all the default LLVM targets. Host compiler used to build the tools is the visual studio compiler for vs2015 64-bit.
As expected, the code size increase is small. With all targets enabled, the average code size increase is 0.03%. For example, llc and opt increased in size of a 0.03% when the MC+Tablegen patch was applied. The code size increase is 0.08% in the worst case scenario (llvm-dwarfdump and llvm-rtdyld; however, these two tools are much smaller in size).

Thanks,
Andrea

RKSimon added inline comments.Mar 5 2018, 8:38 AM
docs/CommandGuide/llvm-mca.rst
16

throughput as

104

default load or store queue size?

andreadb updated this revision to Diff 137012.Mar 5 2018, 8:44 AM
andreadb marked 2 inline comments as done.

Patch updated.

This should address all the remaining review comments from Matthias.
This should also address the most recent comments from Simon on the .rst file.

andreadb marked 7 inline comments as done.Mar 5 2018, 8:47 AM
andreadb added inline comments.
tools/llvm-mca/InstrBuilder.cpp
114–119

Yes. I manually fixed it. There was another place where this happened.
It looks like clang-format doesn't like complex statements inside of a DEBUG call.

andreadb updated this revision to Diff 137159.Mar 6 2018, 4:37 AM
andreadb marked an inline comment as done.

Patch updated.

Added a virtual destructor and a virtual anchor method for HWEventListener.
Also added missing header comments to file HWEventListener.h

RKSimon accepted this revision.Mar 7 2018, 4:17 AM

LGTM - as discussed in the summary, committing the llvm core changes separately prior to the llvm-mca addition. Then the further work discussed on the llvm-dev rfc can begin, and it should make it easier for others to investigate the tool themselves and propose patches (and find bugs....).

LGTM - as discussed in the summary, committing the llvm core changes separately prior to the llvm-mca addition. Then the further work discussed on the llvm-dev rfc can begin, and it should make it easier for others to investigate the tool themselves and propose patches (and find bugs....).

Thanks Simon.

@courbet
Once this patch is landed, I plan to address the comment where you suggest to introduce a View interface. That would allow customization of the performance report.
I already have this patch, but since it's pretty mechanical - if you agree - I would just commit it directly.
The reason why I didn't integrate that patch into this patch is mainly because it causes a small refactoring of the BackendPrinter interface. Since this patch is already big, I thought it was better to have it in a separate commit. If you agree with this plan, then I would commit it once this big patch landed.

This afternoon, I will post a comment on the RFC so that people are aware that the patch is ready to land.

As Clement pointed out on the RFC thread, his team would like to integrate extra logic (for example, to enable the frontend performance analysis).
To help keep track of the work on llvm-mca, I plan to create bugs for all the relevant comments/requests. The idea is to avoid that people end up working on the same tasks.

Thanks,
Andrea

courbet accepted this revision.Mar 7 2018, 5:35 AM

@courbet
Once this patch is landed, I plan to address the comment where you suggest to introduce a View interface. That would allow customization of the performance report.
I already have this patch, but since it's pretty mechanical - if you agree - I would just commit it directly.

SGTM.

This revision was automatically updated to reflect the committed changes.