This is an archive of the discontinued LLVM Phabricator instance.

Add an llvm-opt-report tool to generate basic source-annotated optimization summaries
ClosedPublic

Authored by hfinkel on Oct 4 2016, 6:30 PM.

Details

Summary

LLVM now has the ability to record information from optimization remarks in a machine-consumable YAML file for later analysis. This can be enabled in opt (see r282539), and D25225 adds a Clang flag to do the same. This patch adds llvm-opt-report, a tool to generate basic optimization "listing" files (annotated sources with information about what optimizations were performed) from one of these YAML inputs.
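For reference, the YAML remark records consumed here look roughly like the following (the field names match the remark-streaming format from r282539; the specific values shown are illustrative, not taken from an actual run):

```yaml
--- !Passed
Pass:            inline
Name:            Inlined
DebugLoc:        { File: /tmp/v.c, Line: 17, Column: 3 }
Function:        Test
Args:
  - Callee:          foo
  - String:          ' inlined into '
  - Caller:          Test
...
```

Each record is a separate YAML document tagged with its kind (!Passed, !Missed, or !Analysis); llvm-opt-report keys off the Pass, DebugLoc, and Args fields to decide which annotation to emit for which source location.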

D19678 proposed to add this capability directly to Clang, but this more-general YAML-based infrastructure was the direction we decided upon in that review thread.

For this optimization report, I focused on making the output as succinct as possible while providing information on inlining and loop transformations. The goal here is that the source code should still be easily readable in the report. My primary inspiration is the reports generated by Cray's tools (http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html), which are highly regarded within the HPC community. Intel's compiler also has an optimization-report capability (https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).

$ cat /tmp/v.c
void bar();
void foo() { bar(); }

void Test(int *res, int *c, int *d, int *p, int n) {
  int i;

#pragma clang loop vectorize(assume_safety)
  for (i = 0; i < 1600; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  for (i = 0; i < 16; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  foo();

  foo(); bar(); foo();
}

D25225 adds -fsave-optimization-record (and -fsave-optimization-record=filename), and this would be used as follows:

$ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
$ llvm-opt-report /tmp/v.yaml > /tmp/v.lst
$ cat /tmp/v.lst

< /tmp/v.c
 2     | void bar();
 3     | void foo() { bar(); }
 4     |
 5     | void Test(int *res, int *c, int *d, int *p, int n) {
 6     |   int i;
 7     |
 8     | #pragma clang loop vectorize(assume_safety)
 9   V |   for (i = 0; i < 1600; i++) {
10     |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
11     |   }
12     |
13  U  |   for (i = 0; i < 16; i++) {
14     |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
15     |   }
16     |
17 I   |   foo();
18     |
19     |   foo(); bar(); foo();
   I   |   ^
   I   |                 ^
20     | }

Each source line gets a prefix giving the line number, plus a few columns for important optimizations: inlining, loop unrolling, and loop vectorization. An 'I' is printed next to a line where a function was inlined, a 'U' next to an unrolled loop, and a 'V' next to a vectorized loop. These are printed on the relevant code line when that seems unambiguous, or on subsequent lines when multiple candidates exist (remarks, both positive and negative, from the same optimization with different column numbers are taken to indicate potential ambiguity). When printed on a subsequent line, a '^' marks the relevant column.

Annotated sources for all relevant input files are put into the listing file (each section starting with '<' followed by the file name).

To see what this looks like for C++ code, here's a small excerpt from CodeGenAction.cpp:

340     |   // If the SMDiagnostic has an inline asm source location, translate it.
341 I   |   FullSourceLoc Loc;
342     |   if (D.getLoc() != SMLoc())
    I   |       ^
    I   |                  ^
    I   |                     ^
343     |     Loc = ConvertBackendLocation(D, Context->getSourceManager());
    I   |           ^
    I   |                                     ^
344     | 
345     |   unsigned DiagID;
346 I   |   switch (D.getKind()) {

I imagine some future enhancements to this output. Taking advantage of more-detailed information in the YAML file, I imagine the loop annotations might look like V4,2U4 for a loop vectorized with VF == 4 and interleaving by 2, and then partially unrolled by a factor of 4.

Please review.

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 73586.Oct 4 2016, 6:30 PM
hfinkel retitled this revision from to Add an llvm-opt-report tool to generate basic source-annotated optimization summaries.
hfinkel updated this object.
hfinkel added a reviewer: anemet.
hfinkel added a subscriber: llvm-commits.
hfinkel updated this revision to Diff 73647.Oct 5 2016, 7:55 AM

Updated to add vectorization and unrolling factors to the output (this can be disabled by passing -s). So the output now looks like this:

< /tmp/v.c
 2          | void bar();
 3          | void foo() { bar(); }
 4          |
 5          | void Test(int *res, int *c, int *d, int *p, int n) {
 6          |   int i;
 7          |
 8          | #pragma clang loop vectorize(assume_safety)
 9     V4,2 |   for (i = 0; i < 1600; i++) {
10          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
11          |   }
12          |
13  U16     |   for (i = 0; i < 16; i++) {
14          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
15          |   }
16          |
17 I        |   foo();
18          |
19          |   foo(); bar(); foo();
   I        |   ^
   I        |                 ^
20          | }
fhahn added a subscriber: fhahn.Oct 5 2016, 9:08 AM
anemet accepted this revision.Oct 5 2016, 9:20 AM
anemet edited edge metadata.

This is really great, LGTM!

tools/llvm-opt-report/OptReport.cpp
118

I think I understand that you're only interested in partially interpreting the records, but it would be good to add a comment explaining why you don't use the YAML<->class functionality of the YAML I/O library.

This revision is now accepted and ready to land.Oct 5 2016, 9:20 AM
hfinkel added inline comments.Oct 5 2016, 1:11 PM
tools/llvm-opt-report/OptReport.cpp
118

Exactly; will do.

This revision was automatically updated to reflect the committed changes.