This is an archive of the discontinued LLVM Phabricator instance.

[ThinLTO] Implement summary visualizer
ClosedPublic

Authored by evgeny777 on Dec 15 2017, 9:15 AM.

Details

Summary

I find it useful on some occasions (especially when PGO is also used) to have ThinLTO combined summary exported to DOT file,
so it can be examined with GraphViz or (for large projects) analyzed with python scripts. For instance below is the graph, demonstrating
indirect call promotion of functions called cold and hot:

It shows cold and hot edges, refs (dashed lines), dead nodes (red) and is clustered by TU. This graph was obtained from following source file
(compiled with -O2 -fprofile-generate -flto=thin):

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

double H = 0, C = 0;
void hot() __attribute__((noinline)) { H += (sin(H)*cos(H) + 1); }
void cold() __attribute__((noinline)) { C += 2*cos(C); }
void call_some(void (*fun)(), unsigned count) { fun(); }

int main(int argc, char *argv[]) {
  // Both cold and hot are called indirectly
  void (*func[])() = { hot, cold };

  for (int i = 0; i < atoi(argv[1]); ++i) {
    int d = rand() % 10;
    func[d > 7]();
  }
  printf("H = %f, C = %f\n", H, C);
  return (int)(C + H);
}

The dot file is dumped when caller provides --save-temps flag and contains node GUIDs. There is a small python script (approx 10 lines) used for annotation.
The code itself is a bit messed up and lacks test case, but for now I'm just wondering if it is of any interest to anybody

Diff Detail

Repository
rL LLVM

Event Timeline

evgeny777 created this revision.Dec 15 2017, 9:15 AM

Nice! This looks very useful. What exactly does the python script do? I suppose the script must have been used to create the attached .png graph, since it has GV names and not GUIDs?

Thanks for looking at it!
Currently I'm annotating graph in the following way

  1. I use this shell command to generate file containing GUIDs and their identifiers from bitcode files:

find ./ -iname '*.o' | xargs llvm-modextract -n=0 -o - | llvm-lto -thin-lto -print-summary-global-ids - >& hashnames.txt

  1. Use this python script (annotate.py) to annotate dot file (cat hashnames.txt | python annotate.py my.index.dot > annotated.dot)
import sys
import re

pat = re.compile(r'GUID [0-9]+\(([0-9]+)\) is (.*)')
pairs = []

for line in sys.stdin:
    m = pat.match(line)
    if m:
        pairs += [(m.group(1), m.group(2))]

with open(sys.argv[1]) as f:
    for line in f:
        l2 = line
        for p in pairs:
            l2 = line.replace("@" + str(p[0]), p[1])
            if l2 != line: break
        sys.stdout.write(l2)
  1. Make graph png image using graphviz:

dot -Tpng -oannotate.png annotate.dot

I suspect it should be possible to do this after the ThinLink and add notation about the importing decision, promotion, dead-elimination, ... on every edge/nodes.

evgeny777 updated this revision to Diff 127353.Dec 18 2017, 6:56 AM

@mehdi_amini I've updated diff with one which no longer requires post-processing at the cost of having extra StringRef in GlobalValueSummaryInfo

add notation about the importing decision, promotion, dead-elimination, ... on every edge/nodes.

It this supposed to be done as comments in DOT file?

@mehdi_amini I've updated diff with one which no longer requires post-processing at the cost of having extra StringRef in GlobalValueSummaryInfo

add notation about the importing decision, promotion, dead-elimination, ... on every edge/nodes.

It this supposed to be done as comments in DOT file?

I don't know about the best way to display this, could be color, bold, label, or other markers.

I don't know about the best way to display this, could be color, bold, label, or other markers.

Currently the following scheme is used:

  1. Nodes:
  • Red - means node is dead (isLive() returns false)
  • Yellow - not eligible to import
  • Rectangles - functions
  • Dotted rectangles - aliases
  • Rounded rectangles - data members
  1. Edges
  • Solid - means function call. Color is different for different hotness value (black = unknown, blue = cold, brown = hot, bold red = critical)
  • Dashed - means reference

Any suggestions?

Also it shows linkage for functions and variables. For each function flag bits and number of instructions are shown as well.

Any comments or suggestions?

Sorry for the delay! I have one big concern below. Will look at the code that does the graph emission next, I've only skimmed that part.

include/llvm/IR/ModuleSummaryIndex.h
77 ↗(On Diff #127353)

I have a concern about keeping this string here. Who owns the string? I guess it is the string table created during bitcode reading. Do those even stay live through reading all input bitcode files and building the combined index? If not, saving a reference here is a problem. If so, then we currently have overhead that isn't typically necessary (we don't operate on IR or the names during the thin link).

I guess you added this to avoid needing the script to annotate these names instead of using the guids. Assuming we need to do something to keep these names live, I wonder if we can keep these names around somewhere under a debugging option instead (and emit the GUID by default).

112 ↗(On Diff #127353)

Note that we'll never have a GV in the combined index case. Although it may be good to support emitting this for the per-module summary as well.

evgeny777 added inline comments.Jan 12 2018, 8:49 AM
include/llvm/IR/ModuleSummaryIndex.h
77 ↗(On Diff #127353)

Who owns the string?

My understanding is that linker owns this string because it loads bitcode file to memory and keeps it there during entire link phase. This StringRef points to strtab of the module, which is, to my understanding, is a region within MemoryBuffer owned by lld. Not so sure about gold.

I've found a problem though with VST_CODE_ENTRY/VST_CODE_FINENTRY. When they're being parsed their names are created on stack. This should be fixed.

I wonder if we can keep these names around somewhere under a debugging option instead (and emit the GUID by default).

Sounds reasonable, if there is no other way. I'll take a look on code reading/writing VST entries

112 ↗(On Diff #127353)

Although it may be good to support emitting this for the per-module summary as well.

Yep, that's what I was thinking about.

mehdi_amini added inline comments.Jan 12 2018, 8:58 AM
include/llvm/IR/ModuleSummaryIndex.h
77 ↗(On Diff #127353)

My understanding is that linker owns this string because it loads bitcode file to memory and keeps it there during entire link phase

At which point of the API is this guarantee? What about other tools than the linker that would manipulate a combined summary?

At which point of the API is this guarantee?

  1. ThinLTO.ModuleMap stores BitcodeModule objects and uses them during entire thin link phase (until backend threads are launched)
  2. BitcodeModule contains pointers to BC data, so this data should be actual for module to be parsed)
  3. Module summary is exported to .dot in CombinedIndexHook which is invoked in the beginning of thin link phase

At which point of the API is this guarantee?

  1. ThinLTO.ModuleMap stores BitcodeModule objects and uses them during entire thin link phase (until backend threads are launched)
  2. BitcodeModule contains pointers to BC data, so this data should be actual for module to be parsed)
  3. Module summary is exported to .dot in CombinedIndexHook which is invoked in the beginning of thin link phase

Note that the BitcodeModule itself holds a StringRef. It looks like this essentially points into the MemoryBuffer passed in as a MemoryBufferRef to InputFile::create, which is invoked by the linker. I don't recall what the guarantees are about how long that MemoryBufferRef has to stay valid. pcc may know since he added the InputFile and string table handling.

Mehdi also raises a good point that this wouldn't work if we used a tool that operated on an existing serialized combined index, which doesn't contain symbol names.

If we do have names available, as an aside, we could use them in many debugging messages that currently emit GUID during the thin link, which would be great. I just am concerned that the LTO API may need to explicitly keep these names around, in which case it would be good to do that only under a debugging option.

@tejohnson The BM.parseModule is invoked in the very end of thin link (see runThinLTOBackendThread). How can it work if it's not guaranteed that we have valid BC data till this moment? Note: parse module uses Stream

@tejohnson The BM.parseModule is invoked in the very end of thin link (see runThinLTOBackendThread). How can it work if it's not guaranteed that we have valid BC data till this moment? Note: parse module uses Stream

Good point. Since we create this map during combined index creation, and use it through the backends, then I guess we do have this available. Mehdi - any concerns I have missed?

We won't be able to dump the graph with names on an existing combined index, but that is much less important to support.

tejohnson added inline comments.Jan 12 2018, 10:31 AM
lib/IR/ModuleSummaryIndex.cpp
275 ↗(On Diff #127353)

Why are the nodes and edges emitted this way - i.e. intra-module first, then cross-module? Why not just want the index and emit all the edges in one sweep? That would certainly be less expensive. You could just use the module Id stored in the module path string table (see ModuleSummaryIndex::getModuleId) instead of generating your own, or use the module hash

It might be nice to emit the full module path somewhere (maybe as a comment).

evgeny777 added inline comments.Jan 12 2018, 10:45 AM
lib/IR/ModuleSummaryIndex.cpp
275 ↗(On Diff #127353)

Why are the nodes and edges emitted this way

This is due to GraphViz specifics. When you define clusters (subgraphs), you should list intra-cluster edges inside cluster definitions, otherwise GraphViz won't display them correctly. So you have two options: (a) do not use clusters and list edges in any order or (b) use clusters and define intra-cluster nodes and edges within cluster definition and cross cluster edges outside of it.

You could just use the module Id stored in the module path string table
It might be nice to emit the full module path somewhere (maybe as a comment).

Makes sense

At which point of the API is this guarantee?

  1. ThinLTO.ModuleMap stores BitcodeModule objects and uses them during entire thin link phase (until backend threads are launched)
  2. BitcodeModule contains pointers to BC data, so this data should be actual for module to be parsed)
  3. Module summary is exported to .dot in CombinedIndexHook which is invoked in the beginning of thin link phase

None of these seems like API guarantee to me, it sounds like "this is what actually happens in practice today in this particular workflow".

Can I write a tool that is using the ModuleSummaryIndexBitcodeReader (or the other entry point) to create the index and release the memory buffer that contains the bitcode?

include/llvm/IR/ModuleSummaryIndex.h
71 ↗(On Diff #127353)

Are we size-sensitive here @tejohnson ?
Since the GV field is only used in per-module and and name only in combined summary, we could have some sort of union (variant if available).

At which point of the API is this guarantee?

  1. ThinLTO.ModuleMap stores BitcodeModule objects and uses them during entire thin link phase (until backend threads are launched)
  2. BitcodeModule contains pointers to BC data, so this data should be actual for module to be parsed)
  3. Module summary is exported to .dot in CombinedIndexHook which is invoked in the beginning of thin link phase

None of these seems like API guarantee to me, it sounds like "this is what actually happens in practice today in this particular workflow".

It seems it must be a guarantee of the LTO API, since ThinLTO.ModuleMap holds the BitcodeModule objects and accesses their buffers through and after the thin link.

Can I write a tool that is using the ModuleSummaryIndexBitcodeReader (or the other entry point) to create the index and release the memory buffer that contains the bitcode?

Presumably. The CombinedIndexHook that invokes this new dot dumper is only accessed via the LTO API here. But I suppose we could get in trouble if it was invoked on the index via another tool in the future. Not sure if there is a way to prevent other than documenting this StringRef clearly.

include/llvm/IR/ModuleSummaryIndex.h
71 ↗(On Diff #127353)

Yeah that is a concern. I like the idea of unioning with the GV.

At which point of the API is this guarantee?

  1. ThinLTO.ModuleMap stores BitcodeModule objects and uses them during entire thin link phase (until backend threads are launched)
  2. BitcodeModule contains pointers to BC data, so this data should be actual for module to be parsed)
  3. Module summary is exported to .dot in CombinedIndexHook which is invoked in the beginning of thin link phase

None of these seems like API guarantee to me, it sounds like "this is what actually happens in practice today in this particular workflow".

It seems it must be a guarantee of the LTO API, since ThinLTO.ModuleMap holds the BitcodeModule objects and accesses their buffers through and after the thin link.

Right, but the combined index lives in include/llvm/IR/ModuleSummaryIndex.h and the strings are populated when loading the index/bitcode in lib/Bitcode/Reader/BitcodeReader.cpp.
None of these have anything to do with LTO or the LTO API.

Can I write a tool that is using the ModuleSummaryIndexBitcodeReader (or the other entry point) to create the index and release the memory buffer that contains the bitcode?

Presumably. The CombinedIndexHook that invokes this new dot dumper is only accessed via the LTO API here. But I suppose we could get in trouble if it was invoked on the index via another tool in the future. Not sure if there is a way to prevent other than documenting this StringRef clearly.

Probably!

evgeny777 updated this revision to Diff 129941.Jan 16 2018, 5:34 AM

Changes:

  • Var/function name StringRef is not saved for legacy summary formats (when VST is used instead of strtab).
  • Now printing full module path in comment before cluster definition
  • Added comments to every edge/node
  • Added test case

ModuleId issue was also fixed

Thanks. Do you have an example of what this looks like now with a couple of modules and some cross-module edges?

include/llvm/IR/ModuleSummaryIndex.h
71 ↗(On Diff #127353)

Please union the GV and Name fields as suggested, so that we don't bloat the size. You will likely need to know whether this is a combined index or not when accessing these. Currently we don't have a flag on the index, but one could be added. You can probably get by with checking whether there are any entries in the index's ModulePathStringTable though, since I believe that is empty for the per-module summary index.

76 ↗(On Diff #129941)

Need to clearly document ownership of this string

lib/IR/ModuleSummaryIndex.cpp
174 ↗(On Diff #129941)

This comment now belongs elsewhere.

190 ↗(On Diff #129941)

Not sure what is meant by "without ValueInfo" - this function takes a ValueInfo and uses it.

213 ↗(On Diff #129941)

Rename to something like TypeOrHotness instead?

281 ↗(On Diff #129941)

Maybe use a different type of edge for the alias -> aliasee? I.e. another type beyond reference or call with hotness?

297 ↗(On Diff #129941)

might as well add continue at the end of this if body since the subsequent handling is for the non-empty ModList case.

300 ↗(On Diff #129941)

Since these are cross module edges, when would these be equal?

evgeny777 added inline comments.Jan 19 2018, 5:22 AM
lib/IR/ModuleSummaryIndex.cpp
300 ↗(On Diff #129941)

The edge representing call or ref is drawn to every module target symbol is defined. When target is a linkonce symbol there can be multiple edges representing a single call or ref, both intra-module and cross-module. As we've already drawn all intra-module edges before we skip it here.

evgeny777 added inline comments.Jan 19 2018, 5:30 AM
lib/IR/ModuleSummaryIndex.cpp
297 ↗(On Diff #129941)

Actually we add -1 to module list to draw an edge to an external node (like printf or atoi in the picture). This edge is being drawn in the loop below.

evgeny777 updated this revision to Diff 130598.Jan 19 2018, 6:00 AM

Addressed review comments

Here is what test case dot file looks like:

digraph Summary {
  // Module: /home/evgeny/work/llvm_out/build_ninja_Debug/test/ThinLTO/X86/Output/dot-dumper.ll.tmp1.bc
  subgraph cluster_0 {
    style = filled;
    color = lightgrey;
    label = "dot-dumper.ll.tmp1.bc";
    node [style=filled,fillcolor=lightblue];
    M0_9247563407992198212 [style="dotted,filled",shape="box",label="main_alias",fillcolor="red"]; // alias, dead
    M0_15822663052811949562 [shape="record",label="main|extern (inst: 4, ffl: 0000)}"]; // function
    // Edges:
    M0_9247563407992198212 -> M0_15822663052811949562 [style=dotted]; // alias
  }
  // Module: /home/evgeny/work/llvm_out/build_ninja_Debug/test/ThinLTO/X86/Output/dot-dumper.ll.tmp2.bc
  subgraph cluster_1 {
    style = filled;
    color = lightgrey;
    label = "dot-dumper.ll.tmp2.bc";
    node [style=filled,fillcolor=lightblue];
    M1_12110082535487358335 [shape="Mrecord",label="A|extern}"]; // variable
    M1_6699318081062747564 [shape="record",label="foo|extern (inst: 4, ffl: 0000)}",fillcolor="yellow"]; // function, not eligible to import
    M1_14608648041743670941 [shape="Mrecord",label="B|extern}"]; // variable
    M1_16434608426314478903 [shape="record",label="bar|extern (inst: 1, ffl: 0000)}",fillcolor="red"]; // function, dead
    // Edges:
    M1_6699318081062747564 -> M1_14608648041743670941 [style=dashed]; // ref
    M1_6699318081062747564 -> M1_12110082535487358335 [style=dashed]; // ref
  }
  M0_15822663052811949562 -> M1_12110082535487358335 [style=dashed]; // ref
  M0_15822663052811949562 -> M1_6699318081062747564 // call (hotness : Unknown)
}

Here is the picture

Thanks, just some minor stuff left.

include/llvm/IR/ModuleSummaryIndex.h
77 ↗(On Diff #130598)

Better to initialize the appropriate union member when this is constructed, based on the value of IsAnalysis. I think that would just be in getOrInsertValuePtr, which could be passed IsAnalysis from its callers. Then init GV to nullptr or Name to "" depending on that flag.

147 ↗(On Diff #130598)

Add an assert then that L.isFromAnalysis() == R.isFromAnalysis() ?

lib/Analysis/ModuleSummaryAnalysis.cpp
375 ↗(On Diff #130598)

Please update all of these invocations to document the the constant bool param. E.g. "(/*IsPerformingAnalysis=*/true)"

lib/IR/ModuleSummaryIndex.cpp
214 ↗(On Diff #130598)

I actually meant change the name of parameter Hotness to TypeOrHotness. The array can stay EdgeAttrs. Sorry for not being clear.

lib/Transforms/IPO/FunctionImport.cpp
520 ↗(On Diff #130598)

We should never be computing dead symbols on a partial (per-module) index, so Index.isPerformingAnalysis should always be false. Suggest passing false here directly (with param name in comment).

lib/Transforms/IPO/LowerTypeTests.cpp
1532 ↗(On Diff #130598)

Why is this one true? It doesn't seem like we should be marking as InAnalysis.

evgeny777 added inline comments.Jan 20 2018, 12:23 AM
lib/Transforms/IPO/LowerTypeTests.cpp
1532 ↗(On Diff #130598)

Yep, looks like a bug

evgeny777 updated this revision to Diff 130759.Jan 20 2018, 6:07 AM

Addressed review comments

tejohnson accepted this revision.Jan 20 2018, 7:51 AM

LGTM
Thanks!

This revision is now accepted and ready to land.Jan 20 2018, 7:51 AM
This revision was automatically updated to reflect the committed changes.