This patch implements support for annotated-source optimization reports (a.k.a. "listing" files). Aside from optimizer improvements, this is a top feature request from my performance-engineering team. Most HPC-relevant compilers have some kind of capability along these lines. The DiagnosticInfo infrastructure at the IR level was designed specifically to support the development of this kind of feature, by allowing diagnostic messages to be subclassed to carry arbitrary additional payload, although, in terms of optimizer feedback, we currently use this only with -Rpass and friends. -Rpass and related options are very useful, but they can generate a lot of output, and that output lacks significant context, making it hard to see whether the compiler is really doing what the user expects.
For this optimizer report, I focused on making the output as succinct as possible while still providing information on inlining and loop transformations. The goal is that the source code should remain easily readable in the report. My primary inspiration is the reports generated by Cray's tools (http://docs.cray.com/books/S-2496-4101/html-S-2496-4101/z1112823641oswald.html). These reports are highly regarded within the HPC community. Intel's compiler also has an optimization-report capability (https://software.intel.com/sites/default/files/managed/55/b1/new-compiler-optimization-reports.pdf).
$ cat /tmp/v.c
void bar();
void foo() { bar(); }

void Test(int *res, int *c, int *d, int *p, int n) {
  int i;

#pragma clang loop vectorize(assume_safety)
  for (i = 0; i < 1600; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  for (i = 0; i < 16; i++) {
    res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
  }

  foo();

  foo(); bar(); foo();
}
The patch adds -flisting and -flisting=filename. For the first form, where the file name is not explicitly specified, the file name is computed automatically, just as we do for split-debug output files.
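For illustration only, here is a minimal sketch of how a default listing name could be derived. It assumes the rule mirrors split-DWARF naming: use the -o path if one was given, otherwise the input's base name in the current directory, and swap the extension for .lst (matching /tmp/v.o -> /tmp/v.lst in the example). The helper name defaultListingName is hypothetical and is not taken from the patch.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not the patch's code): choose the file name used by a
// bare -flisting, assuming split-DWARF-style naming rules.
std::string defaultListingName(const std::string &OutputPath,
                               const std::string &InputPath) {
  assert((!OutputPath.empty() || !InputPath.empty()) && "need some path");
  // Prefer the -o path; otherwise use the input's base name (current dir).
  std::filesystem::path P = !OutputPath.empty()
                                ? std::filesystem::path(OutputPath)
                                : std::filesystem::path(InputPath).filename();
  P.replace_extension(".lst");  // v.o -> v.lst, v.c -> v.lst
  return P.string();
}
```

With this rule, `clang -O3 -o /tmp/v.o -c /tmp/v.c -flisting` would write /tmp/v.lst, while a compile without -o would write v.lst in the current directory.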
$ clang -O3 -o /tmp/v.o -c /tmp/v.c -flisting
$ cat /tmp/v.lst
< /tmp/v.c
 1     | void bar();
 2     | void foo() { bar(); }
 3     |
 4     | void Test(int *res, int *c, int *d, int *p, int n) {
 5     |   int i;
 6     |
 7     | #pragma clang loop vectorize(assume_safety)
 8   V |   for (i = 0; i < 1600; i++) {
 9     |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
10     |   }
11     |
12     |   for (i = 0; i < 16; i++) {
13  U  |     res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
14     |   }
15     |
16 I   |   foo();
17     |
18     |   foo(); bar(); foo();
   I   |   ^
   I   |                 ^
19     | }
20     |
Each source line gets a prefix giving the line number, and a few columns for important optimizations: inlining, loop unrolling and loop vectorization. An 'I' is printed next to a line where a function was inlined, a 'U' next to an unrolled loop, and a 'V' next to a vectorized loop. These are printed on the relevant code line when that seems unambiguous, or on subsequent lines when multiple potential options exist (messages, both positive and negative, from the same optimization with different column numbers are taken to indicate potential ambiguity). When on subsequent lines, a '^' is output in the relevant column. The 'U' appearing on line 13 rather than on the loop header (line 12) is also a problem with -Rpass=loop-unroll, and may be something we can fix in the backend.
Annotated source for all relevant input files is put into the listing file (each file's section starting with '<' and then the file name).
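The placement rule described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch's implementation: the Remark struct, the Kinds column order and renderListing are all hypothetical stand-ins for the real DiagnosticInfo-based remarks. For each line and each optimization, remarks of both polarities are collected; if they all share one column, the letter goes on the line itself, otherwise one caret line is emitted per positive remark.

```cpp
#include <algorithm>
#include <cassert>
#include <iomanip>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical remark record (not Clang's DiagnosticInfo API).
struct Remark {
  unsigned Line;  // 1-based source line
  unsigned Col;   // 1-based source column
  char Kind;      // 'I' = inlining, 'U' = unrolling, 'V' = vectorization
  bool Positive;  // true if the transformation was applied
};

// One annotation column per optimization, in this order.
static const std::string Kinds = "IUV";

std::string renderListing(const std::string &FileName,
                          const std::vector<std::string> &Lines,
                          const std::vector<Remark> &Remarks) {
  // Group remarks by line, then by optimization kind.
  std::map<unsigned, std::map<char, std::vector<Remark>>> ByLine;
  for (const Remark &R : Remarks) {
    assert(R.Col >= 1 && "columns are 1-based");
    if (Kinds.find(R.Kind) != std::string::npos)  // ignore unknown kinds
      ByLine[R.Line][R.Kind].push_back(R);
  }

  std::ostringstream OS;
  OS << "< " << FileName << "\n";  // each file section starts with '<'
  for (unsigned L = 1; L <= Lines.size(); ++L) {
    std::string Field(Kinds.size(), ' ');          // letters on the line itself
    std::vector<std::pair<unsigned, char>> Carets;  // (column, kind) below it
    for (const auto &[Kind, Rs] : ByLine[L]) {
      std::set<unsigned> Cols;  // columns of positive *and* negative remarks
      bool AnyPositive = false;
      for (const Remark &R : Rs) {
        Cols.insert(R.Col);
        AnyPositive = AnyPositive || R.Positive;
      }
      if (!AnyPositive)
        continue;  // nothing was applied for this kind on this line
      if (Cols.size() == 1) {
        Field[Kinds.find(Kind)] = Kind;  // unambiguous: mark the line
      } else {
        for (const Remark &R : Rs)  // ambiguous: one caret per applied remark
          if (R.Positive)
            Carets.emplace_back(R.Col, Kind);
      }
    }
    OS << std::setw(3) << L << ' ' << Field << " |";
    if (!Lines[L - 1].empty())
      OS << ' ' << Lines[L - 1];
    OS << "\n";
    std::sort(Carets.begin(), Carets.end());  // left-to-right caret order
    for (const auto &[Col, Kind] : Carets) {
      std::string CaretField(Kinds.size(), ' ');
      CaretField[Kinds.find(Kind)] = Kind;
      OS << "    " << CaretField << " | "
         << std::string(Col > 0 ? Col - 1 : 0, ' ') << "^\n";
    }
  }
  return OS.str();
}
```

Under this rule, line 18 of /tmp/v.c gets two caret lines rather than an 'I' on the line itself, because the two inlined foo() calls sit at different columns of the same line.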
To see what this looks like for C++ code, here's a small excerpt from CodeGenAction.cpp:
340     |   // If the SMDiagnostic has an inline asm source location, translate it.
341 I   |   FullSourceLoc Loc;
342     |   if (D.getLoc() != SMLoc())
    I   |         ^
    I   |                  ^
    I   |                     ^
343     |     Loc = ConvertBackendLocation(D, Context->getSourceManager());
    I   |           ^
    I   |                                              ^
344     |
345     |   unsigned DiagID;
346 I   |   switch (D.getKind()) {
There's obvious bikeshedding to do here, and I'm quite open to suggestions. My engineering team often calls these things "listing files", and other tools often give these files a .lst extension, thus the naming in the patch. Intel's option is -opt-report-file=filename.
After some backend enhancements (to turn the relevant remark types into proper subclasses), I'd like to extend this to also print the vectorization factor, interleaving factor and unrolling factor when relevant. With those enhancements in place, I imagine the loop annotations might look like V4,2U4 for a loop vectorized with VF == 4 and interleaved by 2, and then partially unrolled by a factor of 4.
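To make the proposed notation concrete, here is a small sketch of one possible encoding. The function loopAnnotation and its conventions are assumptions for illustration: 0 means "not transformed", and an interleave count of 1 is omitted. The patch does not yet define this format.

```cpp
#include <cassert>
#include <string>

// Hypothetical encoding of the proposed loop annotation (not in the patch).
// VF/IC are 0 if the vectorizer left the loop alone; UF is 0 if the loop
// was not unrolled.
std::string loopAnnotation(unsigned VF, unsigned IC, unsigned UF) {
  assert((VF == 0) == (IC == 0) && "VF and IC are reported together");
  std::string S;
  if (VF != 0) {
    S += "V" + std::to_string(VF);
    if (IC > 1)  // omit a trivial interleave count of 1
      S += "," + std::to_string(IC);
  }
  if (UF != 0)
    S += "U" + std::to_string(UF);
  return S;
}
```

For the example in the text, loopAnnotation(4, 2, 4) yields "V4,2U4"; a loop that was only unrolled by 8 would show "U8".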
Please review.
Should the abbreviation somehow be part of the optimization remark API, passed in just like the pass name?
It would be nice if, when someone added an optimization remark for a new optimization, it showed up here automatically. I could see how that might make the output too busy, but could we at least have the option?
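One possible reading of this suggestion, sketched with a hypothetical TaggedRemark type in which each pass supplies its own one-letter abbreviation next to its pass name. The annotation columns are then derived from whatever abbreviations actually appear, so a new pass's remarks would show up without changes to the listing code, while an optional allow-list keeps the output from getting too busy. The 'L' letter for LICM below is purely illustrative.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical remark carrying a pass-supplied abbreviation alongside the
// pass name (not an existing LLVM API).
struct TaggedRemark {
  std::string PassName;
  char Abbrev;
};

// Derive the annotation columns from the remarks actually seen. If Enabled
// is non-empty, keep only the letters it lists; otherwise keep them all.
std::string annotationColumns(const std::vector<TaggedRemark> &Remarks,
                              const std::string &Enabled) {
  std::set<char> Seen;  // sorted and de-duplicated
  for (const TaggedRemark &R : Remarks) {
    assert(R.Abbrev != ' ' && "blank is reserved for 'no annotation'");
    if (Enabled.empty() || Enabled.find(R.Abbrev) != std::string::npos)
      Seen.insert(R.Abbrev);
  }
  return std::string(Seen.begin(), Seen.end());
}
```

With remarks from inlining ('I'), vectorization ('V') and a hypothetical LICM tag ('L'), the default would produce three columns, "ILV", while an allow-list of "IV" would restore the two-column view.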