This is an archive of the discontinued LLVM Phabricator instance.

[XRay][tools] Function call stack based analysis tooling for XRay traces
ClosedPublic

Authored by kpw on Jun 29 2017, 9:42 PM.

Details

Summary

This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.

The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes in a trie and indicating a "calls"
relatinship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.

When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).

The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N stacks that have the
most frequency. In the future we can provide more sophisticated query
mechanisms and potentially an export to database feature to make offline
analysis of the stack traces possible with existing tools.

Event Timeline

dberris created this revision.Jun 29 2017, 9:42 PM
dblaikie edited edge metadata.Jul 5 2017, 11:33 AM

Tests?

tools/llvm-xray/xray-stacks.cc
198

I'd probably write this as "(!Parent)" but I'm not sure if there's an especially prevailing convention in the LLVM codebase.

198–200

Skip {} on single line blocks

199

Prefer push_back over emplace_back

206

use 'auto *' in the range-for to indicate that this is a pointer

206–209

Alternatively, consider find_if (the llvm:: variant that takes a range rather than begin/end)

222–226

Usually omit {} on single line blocks

222–226

Alternatively, consider a conditional operator:

TS.emplace_back(Root ? Root : createTrieNode(R.FuncId, nullptr), R.TSC);
331–348

Not sure these explicit scopes ("{}") are sufficiently valuable (reducing the scope of 'E') to worry about/include?

353

Prefer push_back when emplace_back and puhs_back both do the same thing.

(for the same reason that one should prefer copy init (T u = v) over direct init (T u(v)) - because copy init can only cause implicit conversions, whereas direct init can perform explicit conversions - so it's easier to read code that uses the less powerful construct, since it can't do "interesting" things)

375

Could potentially write this as "[] {" I think, not sure if that's more readable though.

("()" can be omitted in a lambda, as can the return type if it's deducible from the return expressions consistently - at least I think that's supported on LLVM's supported compilers)

411–412

Unused variable?

kpw edited edge metadata.Jul 11 2017, 1:55 PM

Sorry for the delay getting comments on this. My phabricator email address lapsed into an unvalidated state and I haven't been getting messages until today.

Thanks for getting a start on this. The trie data structure looks very analogous to the Chrome Trace Viewer format given some scheme to generate stack ids like FnId->FnId->FnId.

I can pick this up and play with it, using StackTrie within llvm-xray convert. We have a sync later this afternoon, but the moral of the story is that I'll integrate this into my tree and make the changes from my comments to get started.
Would you like me to "Commandeer the revision" once I have some edits?

tools/llvm-xray/xray-stacks.cc
58

Might be worth pointing out that this expects a compiler output format (e.g. elf). Are there other supported formats? This type of option is common to all the llvm-xray sub commands. Is there a good way to hyperlink our command line options such as "look over in that manpage or in llvm-xray --help?"

64–65

Doxygen comments?

sed 's_//_///_g'
90

functions -> function

93–100

This might be more clear with two columns.

Step Duration State
push a <start time> a = ?
push b <start time> a = ?, a->b = ?
push c < start time> a = ?, a->b = ?, a->b->c = ?
pop c <end time> a = ?, a->b = ?, emit duration a->b->c
pop b <end time> a = ?, emit duration a->b
push c <start time> a = ?, a->c = ?
pop c <end time> a = ?, emit duration a->c
pop a <end time> emit duration a

The reason I find the comment confusing that it doesn't sound like there is a record emitted each time a pop happens, but only when a push follows a pop or the stack is empty.

118–119

Why are the higher levels of the stack special?

154

I think it might be worth inventing terminology like "instrumented call sequences" and "instrumented call stacks".

It's obvious from the implementation that intermediate and leaf nodes will only include xray instrumented functions, but for a user new to the tool and accustomed to sampling profilers, commonly understood terminology like call stack could serve to reinforce a misconception that each function is included in this trie.

183

Nit: We maintain a pointer -> We maintain pointers

189–190

Perhaps a comment that the uint64_t pair parameter is for start times is justified.

205

Alternatively use a DenseMap of FuncId -> TrieNode* for roots. Expected value of cardinality(root_functions) is small, so it's up to you which is preferable.

213

What does the return type mean?

234–236

We're going to create duplicate nodes for each thread that this stack id appears in if we don't search the FuncId index.

I think that this duplication is actually *good*, because it gives us another dimension to work with for statistics. Users that are interested in profiling work distribution approaches might want to compare stack-id duration across threads. We're going to lose that information once the thread stack is unwound, but we could trivially retain it.

259

Parent doesn't seem like the right name. We're not looking for the func-id's parent. We're looking for a match.

272–276

This seems really fishy to me. I think it's a bug.

It should be "auto I = Parent.base()" and I think this is more obvious if the Parent was named Match.

Parent is the reverse_iterator into the ThreadStack that points to an entry that has a *matching* function id of the function that an EXIT record is being processed for.

Calling std::next on the reverse_iterator does contain the actual parent (or caller) of that function in the thread stack, which is then turned into the forward_iterator with .base()

The caller function and all of its children have a duration recorded and are removed from the thread stack. This is wrong. The caller function isn't being exited from.

295

Fn and FN are both variable names at this point (in addition to F). That seems like a recipe for confusion.

328

I think this is accurate if you consider the thread id part of the stack identifier, otherwise see above comment. I think we should (optionally) merge stack sums across threads here.

350

Remove?

In D34863#805704, @kpw wrote:

Sorry for the delay getting comments on this. My phabricator email address lapsed into an unvalidated state and I haven't been getting messages until today.

Thanks for getting a start on this. The trie data structure looks very analogous to the Chrome Trace Viewer format given some scheme to generate stack ids like FnId->FnId->FnId.

I can pick this up and play with it, using StackTrie within llvm-xray convert. We have a sync later this afternoon, but the moral of the story is that I'll integrate this into my tree and make the changes from my comments to get started.
Would you like me to "Commandeer the revision" once I have some edits?

Yes, please! Feel free to take over the revision, happy to be a reviewer on the side for this (given the other things I'm working on). :)

kpw commandeered this revision.Jul 20 2017, 5:51 PM
kpw edited reviewers, added: dberris; removed: kpw.
kpw updated this revision to Diff 113049.Aug 29 2017, 3:21 AM
kpw marked 18 inline comments as done.

Got a working implementation and added some options for thread breakdown.

Will have to come back and tweak some LNT tests and check on the variable
name styles, but I was able to get some confidence by running the command
and manually verifying the outputs. I wanted to back up to Phabricator.

kpw updated this revision to Diff 113050.Aug 29 2017, 3:30 AM

Adding some TODOs and removing an unused struct.

dberris edited edge metadata.Aug 29 2017, 11:42 PM

Is this ready to land? What else is missing here?

tools/llvm-xray/xray-stacks.cc
293

nit: Do you need a doubly-linked list here? Or will a std::forward_list work better?

394

Do you need a case for tail exits? You might also want to log/ignore other kinds of records.

427

nit: leafs -> leaves?

431

Are you missing a reference somewhere in the signature? Otherwise you're making a copy in the call. Alternatively to this you can use an alias to make it a bit more readable:

using RootT = decltype(*Roots.begin());
auto SecondFn = [](const RootT &value) { return value.second; };
433–435

Is this something you can turn into a llvm::transform instead?

451

Use llvm::find_if instead? That takes a range already.

dberris added inline comments.Aug 29 2017, 11:44 PM
tools/llvm-xray/xray-stacks.cc
299–302

This is a remnant of an earlier implementation. I forget now what I was thinking. :)

dberris accepted this revision.Aug 30 2017, 12:01 AM

LGTM for the most part.

We can iterate on this if in case we find that there's something horribly wrong.

In particular, we should look into turning parts of this into a library, in lib/XRay (and include/llvm/XRay). Especially if we're planning to use this in the conversion tool for converting to stack-based formats as well, or in the accounting tool to support stack-based accounting.

I have a slight preference to having some of this functionality in sooner and useful, rather than later.

This revision is now accepted and ready to land.Aug 30 2017, 12:01 AM
kpw added a comment.Aug 30 2017, 12:03 AM

Thanks for the feedback Dean. I'm working on some FileCheck tests before I consider it ready to land, but I don't know that the implementation needs anything more.

It would be useful to have a chat with you about how the stacks tool can detect sibling calls. Do we have any compiler attributes/sleds planned or implemented to track that scenario?

In D34863#856235, @kpw wrote:

Thanks for the feedback Dean. I'm working on some FileCheck tests before I consider it ready to land, but I don't know that the implementation needs anything more.

More tests, more good! :)

It would be useful to have a chat with you about how the stacks tool can detect sibling calls. Do we have any compiler attributes/sleds planned or implemented to track that scenario?

It's not clear whether we can actually deduce that -- what we can do is deduce when we do tail exits at least. We need to treat tail exits as an exit of the calling function, and mark it when we're building the stack to know that the duration of the caller is defined as:

CallerStartTime - CalleeStartTime

Until we start writing out the tail exit records, we don't need to deal with these yet. But it would be good to keep in mind later.

kpw updated this revision to Diff 113634.Sep 1 2017, 6:37 PM

A few things here.

  1. Added tests.
  2. Added tracking so that prefix stacks can still be detected as unique.
  3. Removed some unused data structures and an unimplemented option.
kpw updated this revision to Diff 113660.Sep 2 2017, 3:21 PM
kpw marked 16 inline comments as done.

Read through some of the earlier comments and switched to make use of STLExtras.h

I think this is now tested and is ready to submit finally. As Dean pointed out, we
can come back to it and make lots of changes, but it's ready to be useful.

tools/llvm-xray/xray-stacks.cc
331–348

Ack.

375

Because of implicit conversion from llvm::ErrorSuccess to llvm::Error, the return type deduction fails. I also believe the return type aids reading.

394

Xray record doesn't have tail exits yet. This is now an error type in account record that gets logged though and a TODO.

433–435

I couldn't work out how to map it onto llvm::transform or std::transform. The crux of the trouble is that the contents of each pair in the map's iterator would have to be "flattened" before assignment to the output iterator. Maybe there is something in <algorithm> or STLExtras.h for this, but std::for_each with a back_inserter does it well even if it's a bit ugly.

kpw updated this revision to Diff 113661.Sep 2 2017, 3:37 PM

Fixed a test case where FileCheck was checking stderr and it should not have been.

kpw updated this revision to Diff 113662.Sep 2 2017, 3:50 PM

Cleaning up some trailing whitespace.

kpw updated this revision to Diff 113663.Sep 2 2017, 3:54 PM

Trailing newline in test case.

kpw updated this revision to Diff 113668.Sep 2 2017, 6:12 PM

Update the way an iterator type is referenced to not make assumptions about references.

kpw updated this revision to Diff 113887.Sep 5 2017, 11:26 AM

Switch from std::for_each to range based for.

Email discussion about D37417 convinced me that std::for_each is an inferior construct.

dberris accepted this revision.Sep 6 2017, 7:05 PM

LGTM -- just a couple of readability suggestions. Otherwise, good to land. :)

tools/llvm-xray/xray-stacks.cc
257–259

Is it possible to just do something like:

auto &RightCallees = MapPairIter.second;
Node->Callees.insert(Node->Callees.end(), RightCallees.begin(), RightCallees.end());

instead?

262–264

This one seems simpler as:

Node->TerminalDurations.insert(
  Node->TerminalDurations.end(),
  Left.TerminalDurations.begin(), Left.TerminalDurations.end());

Or, if there's an append that takes a range, it might be more efficient/simpler than growing the containers one element at a time.

kpw marked 2 inline comments as done.Sep 7 2017, 11:45 AM
kpw added inline comments.
tools/llvm-xray/xray-stacks.cc
257–259

I think it would be a bit more involved to do it this way and would require llvm::map_iter (has to do with mapping functions to an iterator, not the map type) to extract the values from the pair that iterating over the map presents. This seems less readable to me.

262–264

I don't know of an append that takes a range. That would be nice and I can imagine some SFINAE constructs to detect that the range is a difference iterator and only require one resize.

I didn't make the change to use insert, which imho doesn't activate pattern matchers for iterating through a range in my brain as well as for loops.