[XRay] A tool for Comparing xray function call graphs

Authored by varno on Jan 31 2017, 1:31 AM.



This is a tool for comparing the function graphs produced by the
llvm-xray graph too. It takes the form of a new subcommand of the
llvm-xray tool 'graph-diff'.

This initial version of the patch is very rough, but it is close to
feature complete.

Depends on D29363

Diff Detail

There are a very large number of changes, so older changes are hidden. Show Older Changes
dberris added inline comments.Feb 9 2017, 12:14 AM
416–418 ↗(On Diff #87750)

Lose the { and } ?

This revision now requires changes to proceed.Feb 9 2017, 12:14 AM
varno updated this revision to Diff 87956.Feb 9 2017, 7:59 PM
varno marked 20 inline comments as done.
  • Address reviewer comments
    • fix mistake and address comments
    • address comments
    • rebase
    • address comments
    • -pedantic
    • rebase
dberris requested changes to this revision.Feb 24 2017, 8:41 PM

Lots of readability comments, and a couple of functionality questions.

222–238 ↗(On Diff #87956)

Looks like this is better merged:

for (auto i = 0; i < N; ++i) {
  const auto &G = G[i].get();
  for (const auto &V : G.vertices())
  for (const auto &E : G.edges())
223 ↗(On Diff #87956)

For readability purposes, I'd suggest you alias G[i].get() to reduce the visual noise.

228–229 ↗(On Diff #87956)

What is "S" and "E" here? It would really help if you use better names here. It will make more sense if you do something like:

auto &FromVertex = E.first.first;
auto &ToVertex = E.first.second;
auto FromVertexOrErr = G.at(FromVertex);
auto ToVertexOrErr = G.at(ToVertex);
236 ↗(On Diff #87956)

This is really unreadable. What is P?

242 ↗(On Diff #87956)

This is really opaque -- what does IM stand for? Please pick a more meaningful name. It's also not clear why we're using a monotonic number here.

244 ↗(On Diff #87956)

I suggest using enum names instead of 0 and 1 here, to aid in readability. Something like:

enum { LEFT = 0, RIGHT = 1 };
auto &VAttr = V.second;
if (VAttr.Parent[LEFT] == nullptr || VAttr.Parent[RIGHT] == nullptr)
244–245 ↗(On Diff #87956)

So let me get this right -- if one of the vertex attribute's parents doesn't have it, then we move on. So we don't indicate that the vertex is only seen in one of the graphs?

282–302 ↗(On Diff #87956)

All this repetition seems unnecessary. It seems you might be able to do this with a loop:

auto MemPtrs[] = { &TimeStat::Count, &TimeStat::Min, &TimeStat::Median, ... };
for (auto Mem : MemPtrs) {
  auto Diff = maxDiff(MA.*Mem, MB.*Mem);
  MA.*Mem = Diff.first;
  MB.*Mem = Diff.second;
324–327 ↗(On Diff #87956)

What is R and what is MR -- this code looks really hard to read. I can't even begin to explain what this is actually doing (what's EdgeAttr.P[0]->second.S ?).

334 ↗(On Diff #87956)

What is Manumim? Did you mean Maximum?

342–345 ↗(On Diff #87956)

Why are these colours hard-coded, when you already have a ColorHelper available?

369–372 ↗(On Diff #87956)

What is this meant to do? Did you mean to provide stringified labels? Why isn't this just always returning the empty string? Is this supposed to also provide the value of the difference, in relative terms as a string appropriately coloured?

382 ↗(On Diff #87956)

Looks like you might as well just use formatv here, as it's more confusing to read this as is.

405 ↗(On Diff #87956)

Why are we truncating the tooltip? It seems it might be better to provide the whole string in the tooltip so in cases where we can keep them (in SVG) then the truncated names show up in the code, and on hover will show more information.

38–44 ↗(On Diff #87956)

Please use a better identifier than P -- Is this meant to indicate Parent? If so, just use that to make it easier to read. If not, then please use a more descriptive name.

This revision now requires changes to proceed.Feb 24 2017, 8:41 PM

@varno -- are you able to make the changes, or are you alright with me taking over this patch?

varno planned changes to this revision.Mar 2 2017, 2:37 PM

Yep. This weekend

varno updated this revision to Diff 90568.Mar 4 2017, 12:30 AM
  • address reviewer comments
  • clang-format
varno marked 11 inline comments as done.Mar 4 2017, 12:32 AM
varno added inline comments.
244–245 ↗(On Diff #87956)

That the Vertex is in only one of the graphs is encoded in the fact that the corresponding pointer is null. So Yeah We do indicate this, i.e. if there is no link the 0th or 1st vertex does not exist in the corresponding graph.

282–302 ↗(On Diff #87956)

Can't quite do this due to type issues but I have fixed this most of the way there.

342–345 ↗(On Diff #87956)

These are different colors not on the scale for if the vertex or edge does not exist (as opposed to encoding some difference. They should not come from the ColorHelper but are specified separately.

369–372 ↗(On Diff #87956)

I am not sure what to do here in labeling edges with differences, so I have a TODO here, If you can tell me what you think I should put there I will change this.

dberris requested changes to this revision.Mar 8 2017, 12:23 AM

More readability comments. In general all the escaped strings and broken up stream operations don't read well. A better option is using formatv(...) and consolidating those into format strings that are specified as raw string literals would make all the output generation code much more readable.

226 ↗(On Diff #90568)

What is V.second? Is this the vertex attribute? Aliasing this will make it much easier to read:

const auto& VAttr = V.second;
242–244 ↗(On Diff #90568)

Single line body; prefer no {}'s.

253–255 ↗(On Diff #90568)

As documented, it doesn't say what a "negative" value means -- so for values 1 and 2, it will result in:

(1/2) (0.5) - (2) = -1.5

I presume that the sign implies says the value on the left is larger? Or is that in reverse? It would be better if the comment actually stated this.

Alternatively, if you used a relative computation instead and yielded the percentage difference in one direction, and can use that as a label too. That would look something like:

-(Left - Right / Left)

If Left == 0, then it's +100%. If Right is 0 then it's -100%.

256 ↗(On Diff #90568)

Reading this now, I think 'difference' is a little misleading a name for what it does. It seems to be a scaled relative difference, because it's not just A - B (as the name "difference" would suggest). So one of: scaledRelativeDiff(...) or ratioDiff(...) would make this less confusing as it says what it means.

281–282 ↗(On Diff #90568)

Alternative way of spelling this:

std::tie(MA.*Mem, MB.*Mem) = DP;

Or just:

std::tie(MA.*Mem, MB.*Mem) =
    maxDiff(MA.*Mem, MB.*Mem, A.*Mem, B.*Mem);

This might also benefit from better names. Mem isn't the best name, but "Stat" might be more indicative of what it actually is. DoubleMemPtrs looks like it doesn't need to be given a name, the for loop could just have it there. Also, MA and MB don't mean much in terms of names. So rewriting that to be more succinct:

for (auto Stat : {&TimeStat::Min, ...})
  std::tie(StatLeft.*Stat, StatRight.*Stat) =
      maxDiff(StatLeft.*Stat, StatRight.*Stat, NewStatLeft.*Stat, NewStatRight.*Stat);
385 ↗(On Diff #90568)

Use raw string literals to avoid the problem of having to escape quotes.

385 ↗(On Diff #90568)

Can 40 not be hard-coded here? Consider a flag to control string length truncation.

412–415 ↗(On Diff #90568)

This is completely unreadable. Consider using the combination of formatv(...) and raw string literals instead. Also, please use aliases to make things like E.first.first say something a bit more readable. Something like:

const auto &SourceVertex = E.first.first;
const auto &DestinationVertex = E.first.second;
437–481 ↗(On Diff #90568)

Some of this duplication seems a little unnecessary. How about using control flow to reduce the repetition?

struct GraphData {
  bool KeepGoing;
  bool DeduceSiblings;
  string InstrMapFile;
  string Input;
  GraphRenderer::Factory Factory;
  Trace Trace;
  GraphRenderer Renderer;
  Graph Graph;
std::array<GraphData, 2> Graphs {
  { isSpecified(KeepGoing1), ... },
  { isSpecified(KeepGoing2), ...},
for (auto &Graph : Graphs) {
  Graph.Factory.KeepGoing = Graph.KeepGoing;
  auto TraceOrError = loadTraceFile(Graph.Input, true);
  if (!TraceOrError) ...
  Graph.Trace = std::move(*TraceOrError);
342–345 ↗(On Diff #87956)

I can reverse that question too -- why isn't the choice for what colour to use for when something is missing from one side not in the ColorHelper? It seems weird that these are the only colours hard-coded when I'd expect as a reader/maintainer of this that this information fits really well in the ColorHelper utility.

Something like a categorisation amongst "in both sets", "in set A", "in set B", "in neither". We could just use hard-coded colours to indicate those four states in the ColorHelper, and centralise where all the Colors are defined.

369–372 ↗(On Diff #87956)

How about providing the relative differences? Something like "+10%" or "-2.3%" in the same colour as the edge/vertex?

374–402 ↗(On Diff #90568)

Now that I read this, I'm more convinced of a mapping between the enum and data member pointers.

This revision now requires changes to proceed.Mar 8 2017, 12:23 AM

Hi @varno -- are you still working on this, or should I take this over instead?

varno marked 3 inline comments as done.Mar 17 2017, 3:34 AM

Am still working on this.

varno updated this revision to Diff 92261.Mar 18 2017, 11:42 PM
varno marked an inline comment as done.
  • Address Reviewer Comments
varno marked 12 inline comments as done.Mar 18 2017, 11:45 PM

I have fixed most of the comments, and replied to all of them, Tests incoming soon.

253–255 ↗(On Diff #90568)

I have improved the comment. I chose this function because it is symmetrical (that is diff(A,B) = -diff(B,A)) and it behaves similarly to a

Right/Left - 1 which you suggest. behaves well if Right == 0 but not if left == 0. it does not treat the left and right hand sides symmetrically, nor close to it away from 1.

369–372 ↗(On Diff #87956)

I am not convinced that this is the right statistic to show, or whether this is the right way to go about coloring edges.

dberris requested changes to this revision.Mar 19 2017, 5:58 PM
dberris added inline comments.
265–266 ↗(On Diff #92261)

So this is always a positive number?

369–372 ↗(On Diff #87956)

Why not?

User gave you two traces. We're showing them differences. Using a made-up relative difference as opposed to ((A - B / A) * 100%) seems wrong and not helpful. We want to make sure the magnitude of the difference is what we use to highlight the edge or the vertex. Any other scheme would be trying to be too clever, and be actively harmful because it wouldn't make sense to the user trying to make sense of the data. If you don't provide the relative difference then it's effectively useless -- I as a user wouldn't be able to tell how severe the difference is in relative terms which is the whole point of the diff tool.

What makes the current way better than using the relative difference for the stat chosen?

This revision now requires changes to proceed.Mar 19 2017, 5:58 PM
varno added inline comments.Mar 20 2017, 3:46 PM
265–266 ↗(On Diff #92261)

Sorry, bad comment will fix.

369–372 ↗(On Diff #87956)

Hi Dean.

I Agree that for the textrual representation that RatioDiff is not the best answer, however for coloring the differences and determining what the largest difference is, this is just fine as both as the graph plotting ratioDiff vs (A-B)/A is monotonic. The properties of the chosen funciton Namely the symmetry allow us to compare the amount of change going positive and negative with each other, which is a useful property.

The reason that this is better is that the order of diff is mostly arbitrary, and so A/B - B;/A behaves like max(A,B)/min(A,B) away from where A and B are similar and properly when A, and B are close to each other then it behaves like (A-B)/A. However even here the behavior is better behaved. Also for choosing colors this is easier. the function (A-B)/ A just treats A, and B too differently to work well for coloring edges (I tried this first at google btw as it was the "obvious" answer).

However as a User in a textual representation I agree that we want to use a different metric, that is why I store both sides of the max and minimum statistics and not just the diff. There I am not sure whether we want to show percentage difference, absolute difference or just both numbers in the change. The decision here also will likely be different for edges and vertices, with edges probably wanting something simple like the percentage change or absolute difference, while the vertices you may want to have both the old and new numbers plus an amount of change.

varno requested review of this revision.Mar 20 2017, 3:59 PM
varno added a comment.Mar 20 2017, 4:39 PM

Just doing the math now, (A/B - B/A)*100 is exactly 2x the average of x and y where x and y are defined such that :
A is x% faster than B, and B is x% slower than A. So this is really a reasonable relative difference function even intuitively, being the average of the two difference metrics.

(x = (A/B - 1)*100 and y= -(B/A - 1)*100, so (x+y)/2 = (A/B - B/A)/2)

varno planned changes to this revision.Mar 21 2017, 4:02 PM
varno updated this revision to Diff 92601.Mar 22 2017, 12:12 AM
  • Change coloring metric, add labels
dberris requested changes to this revision.Mar 26 2017, 11:13 PM

I've had a quick look, but haven't dug deep into the recent changes yet. Please add tests.

This revision now requires changes to proceed.Mar 26 2017, 11:13 PM

@varno -- are you still working on this?

varno added a comment.Apr 11 2017, 5:45 PM

I have been very busy recently but I have next week off. I will be working on it then.

varno updated this revision to Diff 96079.Apr 20 2017, 7:09 PM
  • New branch
  • address reviewer comments
  • clang-format
  • Address Reviewer Comments
  • Change coloring metric, add labels
  • tests
dberris accepted this revision.Sun, Apr 23, 10:26 PM

LGTM -- just a couple more nits, but I'll land it first and get the follow-up changes done myself. :)

Thanks for the work @varno!

310–315 ↗(On Diff #96079)

Why isn't the caller of this just doing:

llvm::find(Collection, nullptr) != Collection.end()
350 ↗(On Diff #96079)

Is it just me, or are the squiggly brackets not matched?

339–341 ↗(On Diff #96079)

This could be a static constexpr array, so it doesn't have to be initialised every time this function is called.

This revision is now accepted and ready to land.Sun, Apr 23, 10:26 PM
This revision was automatically updated to reflect the committed changes.
varno added a comment.Mon, Apr 24, 3:33 PM

Thanks for letting me finish this.

350 ↗(On Diff #96079)

two {{ escape a { in formatv, so yeah, but they have to be

jroelofs added inline comments.

'*' --> '/' copy-pasta?

sanjoy added a subscriber: sanjoy.Fri, Apr 28, 1:43 PM
sanjoy added inline comments.
310–315 ↗(On Diff #96079)

Armchair comment: we also have llvm::is_contained in STLExtras.h