Description

Add an llvm-opt-report tool to generate basic source-annotated optimization summaries

LLVM now has the ability to record information from optimization remarks in a
machine-consumable YAML file for later analysis. This can be enabled in opt
(see r282539), and D25225 adds a Clang flag to do the same. This patch adds
llvm-opt-report, a tool to generate basic optimization "listing" files
(annotated sources with information about what optimizations were performed)
from one of these YAML inputs.
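
As a rough illustration of what consuming such a file involves, the following
Python sketch pulls the remark kind, pass name and debug location out of each
record. The sample records are hand-written, and their field layout (Pass,
Name, DebugLoc with File/Line/Column) is an assumption made for illustration
rather than output copied from the compiler. The line-based matching is a
shortcut; a real consumer would use a proper YAML parser.

```python
import re

# Hand-written sample; field names and layout are assumed for illustration.
SAMPLE = """\
--- !Passed
Pass:            inline
Name:            Inlined
DebugLoc:        { File: /tmp/v.c, Line: 17, Column: 3 }
Function:        Test
...
--- !Missed
Pass:            inline
Name:            NoDefinition
DebugLoc:        { File: /tmp/v.c, Line: 19, Column: 10 }
Function:        Test
...
"""

HEADER = re.compile(r"^--- !(\w+)\s*$")
PASS = re.compile(r"^Pass:\s*(\S+)")
LOC = re.compile(
    r"^DebugLoc:\s*\{\s*File:\s*(.+?),\s*Line:\s*(\d+),\s*Column:\s*(\d+)\s*\}"
)


def read_remarks(text):
    """Return one dict per record with kind, pass, file, line and column."""
    remarks = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new record starts; its tag says whether the remark is
            # positive (Passed) or negative (Missed).
            current = {"kind": match.group(1)}
            remarks.append(current)
            continue
        if current is None:
            continue
        match = PASS.match(line)
        if match:
            current["pass"] = match.group(1)
            continue
        # Only top-level DebugLoc entries are matched (no indentation).
        match = LOC.match(line)
        if match:
            current["file"] = match.group(1)
            current["line"] = int(match.group(2))
            current["column"] = int(match.group(3))
    return remarks
```

Calling read_remarks(SAMPLE) yields two records, one per remark, which is the
kind of per-location data a listing generator needs.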

D19678 proposed adding this capability directly to Clang, but in that review
thread we decided on this more general YAML-based infrastructure instead.

For this optimization report, I focused on making the output as succinct as
possible while providing information on inlining and loop transformations. The
goal here is that the source code should still be easily readable in the
report. My primary inspiration is the reports generated by Cray's tools
(http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html),
which are highly regarded within the HPC community. Intel's compiler also has
an optimization-report capability
(https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).

$ cat /tmp/v.c
void bar();
void foo() { bar(); }

void Test(int *res, int *c, int *d, int *p, int n) {
  int i;

#pragma clang loop vectorize(assume_safety)
  for (i = 0; i < 1600; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  for (i = 0; i < 16; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  foo();

  foo(); bar(); foo();
}

D25225 adds -fsave-optimization-record (and
-fsave-optimization-record=filename), and this would be used as follows:

$ clang -O3 -o /tmp/v.o -c /tmp/v.c -fsave-optimization-record
$ llvm-opt-report /tmp/v.yaml > /tmp/v.lst
$ cat /tmp/v.lst

< /tmp/v.c
 2          | void bar();
 3          | void foo() { bar(); }
 4          |
 5          | void Test(int *res, int *c, int *d, int *p, int n) {
 6          |   int i;
 7          |
 8          | #pragma clang loop vectorize(assume_safety)
 9     V4,2 |   for (i = 0; i < 1600; i++) {
10          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
11          |   }
12          |
13  U16     |   for (i = 0; i < 16; i++) {
14          |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
15          |   }
16          |
17 I        |   foo();
18          |
19          |   foo(); bar(); foo();
   I        |   ^
   I        |                 ^
20          | }

Each source line gets a prefix giving the line number, and a few columns for
important optimizations: inlining, loop unrolling and loop vectorization. An
'I' is printed next to a line where a function was inlined, a 'U' next to an
unrolled loop, and a 'V' next to a vectorized loop. These are printed on the
relevant code line when that seems unambiguous, or on subsequent lines when
multiple potential options exist (messages, both positive and negative, from
the same optimization with different column numbers are taken to indicate
potential ambiguity). When on subsequent lines, a '^' is output in the relevant
column.
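
The placement rule above can be sketched in Python as follows. This is a
simplified illustration, not the tool's implementation: the annotate function,
the tuple layout of the remarks, and the pass names are all hypothetical, the
prefix widths are arbitrary, and unrolling/vectorization factors are left out.

```python
from collections import defaultdict

# One marker column per optimization. The pass names are assumed for
# illustration.
PASSES = [("inline", "I"), ("loop-unroll", "U"), ("loop-vectorize", "V")]


def annotate(source_lines, remarks):
    """Return listing lines for one file.

    remarks: iterable of (line, column, pass_name, applied) tuples, where
    applied is True for a performed optimization and False for a missed one.
    """
    by_line = defaultdict(list)
    for line, column, pass_name, applied in remarks:
        by_line[line].append((column, pass_name, applied))

    listing = []
    for number, text in enumerate(source_lines, start=1):
        here = by_line[number]
        marks = [" "] * len(PASSES)
        caret_rows = []
        for slot, (pass_name, marker) in enumerate(PASSES):
            # Columns of all remarks (positive and negative) from this pass.
            seen = {c for c, p, _ in here if p == pass_name}
            # Columns where the optimization was actually performed.
            applied = sorted({c for c, p, ok in here if p == pass_name and ok})
            if not applied:
                continue
            if len(seen) == 1:
                # All remarks point at one column: mark the code line itself.
                marks[slot] = marker
            else:
                # Differing columns: defer to caret rows below the code line.
                caret_rows += [(slot, marker, c) for c in applied]
        listing.append(f"{number:>3} {''.join(marks)} | {text}")
        for slot, marker, column in sorted(caret_rows, key=lambda r: r[2]):
            row = [" "] * len(PASSES)
            row[slot] = marker
            # Columns are 1-based, so column 1 puts the caret under the
            # first character of the source line.
            listing.append(f"{'':>3} {''.join(row)} | {' ' * (column - 1)}^")
    return listing
```

Note that a missed remark at a different column is enough to make a line
ambiguous, even though only the performed optimizations get a caret row.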

Annotated sources for all relevant input files are put into the listing file
(each starting with '<' and then the file name).

The -s flag omits the unrolling/vectorization factors (such as the '16' in
'U16' and the '4,2' in 'V4,2' above) from the listing.

Differential Revision: https://reviews.llvm.org/D25262