This is an archive of the discontinued LLVM Phabricator instance.

Differential D34863

[XRay][tools] Function call stack based analysis tooling for XRay traces
ClosedPublic

Authored by kpw on Jun 29 2017, 9:42 PM.

Details

Reviewers

pelikan
dblaikie
dberris

Commits

rG9420ec3378b8: [XRay][tools] Function call stack based analysis tooling for XRay traces
rL312733: [XRay][tools] Function call stack based analysis tooling for XRay traces

Summary

This change introduces a subcommand to the llvm-xray tool called
"stacks" which allows for analysing XRay traces provided as inputs and
accounting time to stacks instead of just individual functions. This
gives us a more precise view of where in a program the latency is
actually attributed.

The tool uses a trie data structure to keep track of the caller-callee
relationships as we process the XRay traces. In particular, we keep
track of the function call stack as we enter functions. While we're
doing this we're adding nodes in a trie and indicating a "calls"
relationship between the caller (current top of the stack) and the callee
(the new top of the stack). When we push function ids onto the stack, we
keep track of the timestamp (TSC) for the enter event.

When exiting functions, we are able to account the duration by getting
the difference between the timestamp of the exit event and the
corresponding entry event in the stack. This works even if we somehow
miss the exit events for intermediary functions (i.e. if the exit event
is not cleanly associated with the enter event at the top of the stack).
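
The entry and exit handling described above can be sketched as follows. This is a minimal illustration, not the code under review: `Node`, `StackAccounting`, `enter`, and `exit` are hypothetical names, a sentinel root stands in for the per-thread root handling, and durations are kept in a single list per node.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified trie node; not the TrieNode from xray-stacks.cc.
struct Node {
  int32_t FuncId;
  Node *Parent;
  std::map<int32_t, std::unique_ptr<Node>> Callees;
  std::vector<uint64_t> Durations;
};

struct StackAccounting {
  Node Root{0, nullptr, {}, {}};                  // Sentinel above all calls.
  std::vector<std::pair<Node *, uint64_t>> Stack; // (trie node, entry TSC)

  // On function entry: find or create the callee under the current top of
  // the stack, then push it together with its entry timestamp.
  void enter(int32_t FuncId, uint64_t TSC) {
    Node *Caller = Stack.empty() ? &Root : Stack.back().first;
    std::unique_ptr<Node> &Slot = Caller->Callees[FuncId];
    if (!Slot)
      Slot.reset(new Node{FuncId, Caller, {}, {}});
    Stack.emplace_back(Slot.get(), TSC);
  }

  // On function exit: search from the top of the stack for the matching
  // entry, then account and pop everything from that entry to the top. This
  // also covers callees whose own exit events were missed. Returns false if
  // no entry on the stack matches.
  bool exit(int32_t FuncId, uint64_t TSC) {
    auto Match = Stack.rbegin();
    while (Match != Stack.rend() && Match->first->FuncId != FuncId)
      ++Match;
    if (Match == Stack.rend())
      return false;
    auto First = std::next(Match).base(); // Forward iterator to the match.
    for (auto I = First; I != Stack.end(); ++I)
      I->first->Durations.push_back(TSC - I->second);
    Stack.erase(First, Stack.end());
    return true;
  }
};
```

With entries for functions 1, 2, and 3 at TSC 100, 110, and 120, an exit for function 2 at TSC 150 records durations for both 2 and 3 and leaves only function 1 on the stack.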

The output of the tool currently provides just the top N leaf functions
that contribute the most latency, and the top N stacks that occur most
frequently. In the future we can provide more sophisticated query
mechanisms and potentially an export-to-database feature to make offline
analysis of the stack traces possible with existing tools.

Diff Detail

Repository: rL LLVM

Event Timeline

dberris created this revision. Jun 29 2017, 9:42 PM

Herald added a subscriber: mgorny. Jun 29 2017, 9:42 PM

Tests?

tools/llvm-xray/xray-stacks.cc
197 ↗	(On Diff #104815)	I'd probably write this as "(!Parent)" but I'm not sure if there's an especially prevailing convention in the LLVM codebase.
197–199 ↗	(On Diff #104815)	Skip {} on single line blocks
198 ↗	(On Diff #104815)	Prefer push_back over emplace_back
205 ↗	(On Diff #104815)	use 'auto *' in the range-for to indicate that this is a pointer
205–208 ↗	(On Diff #104815)	Alternatively, consider find_if (the llvm:: variant that takes a range rather than begin/end)
221–225 ↗	(On Diff #104815)	Usually omit {} on single line blocks
221–225 ↗	(On Diff #104815)	Alternatively, consider a conditional operator: TS.emplace_back(Root ? Root : createTrieNode(R.FuncId, nullptr), R.TSC);
330–347 ↗	(On Diff #104815)	Not sure these explicit scopes ("{}") are sufficiently valuable (reducing the scope of 'E') to worry about/include?
352 ↗	(On Diff #104815)	Prefer push_back when emplace_back and push_back both do the same thing. (for the same reason that one should prefer copy init (T u = v) over direct init (T u(v)) - because copy init can only cause implicit conversions, whereas direct init can perform explicit conversions - so it's easier to read code that uses the less powerful construct, since it can't do "interesting" things)
374 ↗	(On Diff #104815)	Could potentially write this as "[] {" I think, not sure if that's more readable though. ("()" can be omitted in a lambda, as can the return type if it's deducible from the return expressions consistently - at least I think that's supported on LLVM's supported compilers)
410–411 ↗	(On Diff #104815)	Unused variable?
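
The push_back versus emplace_back argument in the comment on line 352 can be illustrated with a small sketch. `Buffer` and `makeBuffers` are hypothetical and exist only for this example; the point is that emplace_back direct-initializes the element and so can reach an explicit constructor, while push_back cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical type with an explicit constructor, used only for illustration.
struct Buffer {
  explicit Buffer(std::size_t N) : Size(N) {}
  std::size_t Size;
};

// Direct initialization (Buffer B(16)) is allowed, copy initialization
// (Buffer B = 16) is not.
static_assert(std::is_constructible<Buffer, int>::value,
              "direct init can use the explicit constructor");
static_assert(!std::is_convertible<int, Buffer>::value,
              "copy init cannot use the explicit constructor");

std::vector<Buffer> makeBuffers() {
  std::vector<Buffer> V;
  // emplace_back direct-initializes the element, so this quietly builds a
  // Buffer of size 16 from an integer.
  V.emplace_back(16);
  // push_back takes a Buffer, and an int does not convert to one implicitly,
  // so V.push_back(16) would not compile. The conversion has to be spelled
  // out at the call site instead.
  V.push_back(Buffer(32));
  return V;
}
```

When the argument already has the element type, both calls do the same work, which is the case the comment is about.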

Sorry for the delay getting comments on this. My phabricator email address lapsed into an unvalidated state and I haven't been getting messages until today.

Thanks for getting a start on this. The trie data structure looks very analogous to the Chrome Trace Viewer format given some scheme to generate stack ids like FnId->FnId->FnId.

I can pick this up and play with it, using StackTrie within llvm-xray convert. We have a sync later this afternoon, but the moral of the story is that I'll integrate this into my tree and make the changes from my comments to get started.
Would you like me to "Commandeer the revision" once I have some edits?

tools/llvm-xray/xray-stacks.cc
57 ↗	(On Diff #104815)	Might be worth pointing out that this expects a compiler output format (e.g. elf). Are there other supported formats? This type of option is common to all the llvm-xray sub commands. Is there a good way to hyperlink our command line options such as "look over in that manpage or in llvm-xray --help?"
63–64 ↗	(On Diff #104815)	Doxygen comments? sed 's_//_///_g'
89 ↗	(On Diff #104815)	functions -> function
92–99 ↗	(On Diff #104815)	This might be more clear with two columns.
	Step                    Duration State
	push a <start time>     a = ?
	push b <start time>     a = ?, a->b = ?
	push c <start time>     a = ?, a->b = ?, a->b->c = ?
	pop c <end time>        a = ?, a->b = ?, emit duration a->b->c
	pop b <end time>        a = ?, emit duration a->b
	push c <start time>     a = ?, a->c = ?
	pop c <end time>        a = ?, emit duration a->c
	pop a <end time>        emit duration a
	The reason I find the comment confusing is that it doesn't sound like there is a record emitted each time a pop happens, but only when a push follows a pop or the stack is empty.
117–118 ↗	(On Diff #104815)	Why are the higher levels of the stack special?
153 ↗	(On Diff #104815)	I think it might be worth inventing terminology like "instrumented call sequences" and "instrumented call stacks". It's obvious from the implementation that intermediate and leaf nodes will only include xray instrumented functions, but for a user new to the tool and accustomed to sampling profilers, commonly understood terminology like call stack could serve to reinforce a misconception that each function is included in this trie.
182 ↗	(On Diff #104815)	Nit: We maintain a pointer -> We maintain pointers
188–189 ↗	(On Diff #104815)	Perhaps a comment that the uint64_t pair parameter is for start times is justified.
204 ↗	(On Diff #104815)	Alternatively use a DenseMap of FuncId -> TrieNode* for roots. Expected value of cardinality(root_functions) is small, so it's up to you which is preferable.
212 ↗	(On Diff #104815)	What does the return type mean?
233–235 ↗	(On Diff #104815)	We're going to create duplicate nodes for each thread that this stack id appears in if we don't search the FuncId index. I think that this duplication is actually good, because it gives us another dimension to work with for statistics. Users that are interested in profiling work distribution approaches might want to compare stack-id duration across threads. We're going to lose that information once the thread stack is unwound, but we could trivially retain it.
258 ↗	(On Diff #104815)	Parent doesn't seem like the right name. We're not looking for the func-id's parent. We're looking for a match.
271–275 ↗	(On Diff #104815)	This seems really fishy to me. I think it's a bug. It should be "auto I = Parent.base()" and I think this would be more obvious if Parent were named Match. Parent is the reverse_iterator into the ThreadStack that points to the entry whose function id matches the function for which an EXIT record is being processed. Calling std::next on the reverse_iterator does yield the actual parent (or caller) of that function in the thread stack, which is then turned into a forward iterator with .base(). The caller function and all of its children have a duration recorded and are removed from the thread stack. This is wrong. The caller function isn't being exited from.
294 ↗	(On Diff #104815)	Fn and FN are both variable names at this point (in addition to F). That seems like a recipe for confusion.
327 ↗	(On Diff #104815)	I think this is accurate if you consider the thread id part of the stack identifier, otherwise see above comment. I think we should (optionally) merge stack sums across threads here.
349 ↗	(On Diff #104815)	Remove?
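
Since the comment on lines 271–275 turns on how a reverse iterator maps back to a forward iterator, here is a small self-contained check of that relationship. For a reverse iterator `R`, `R.base()` designates the element after `*R` in forward order, so `std::next(R).base()` designates `*R` itself. The function name and the plain `std::vector<int>` stack are assumptions made for this sketch; the real thread stack holds richer entries.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the forward index of the innermost (last) element equal to Id, or
// Stack.size() if there is none.
std::size_t indexOfInnermostMatch(const std::vector<int> &Stack, int Id) {
  auto Match = Stack.rbegin();
  while (Match != Stack.rend() && *Match != Id)
    ++Match;
  if (Match == Stack.rend())
    return Stack.size();
  // Match.base() refers to the element after *Match in forward order, so the
  // reverse iterator is stepped once before base() to land on the matched
  // element itself.
  auto Forward = std::next(Match).base();
  return static_cast<std::size_t>(Forward - Stack.begin());
}
```

For the stack {1, 2, 3}, searching for 2 returns index 1, which is the matched entry and not its caller.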

In D34863#805704, @kpw wrote:

Sorry for the delay getting comments on this. My phabricator email address lapsed into an unvalidated state and I haven't been getting messages until today.

Thanks for getting a start on this. The trie data structure looks very analogous to the Chrome Trace Viewer format given some scheme to generate stack ids like FnId->FnId->FnId.

I can pick this up and play with it, using StackTrie within llvm-xray convert. We have a sync later this afternoon, but the moral of the story is that I'll integrate this into my tree and make the changes from my comments to get started.
Would you like me to "Commandeer the revision" once I have some edits?

Yes, please! Feel free to take over the revision, happy to be a reviewer on the side for this (given the other things I'm working on). :)

kpw commandeered this revision. Jul 20 2017, 5:51 PM

kpw edited reviewers, added: dberris; removed: kpw.

Got a working implementation and added some options for thread breakdown.

Will have to come back and tweak some LNT tests and check on the variable
name styles, but I was able to get some confidence by running the command
and manually verifying the outputs. I wanted to back up to Phabricator.

Harbormaster completed remote builds in B9727: Diff 113049. Aug 29 2017, 3:21 AM

Adding some TODOs and removing an unused struct.

Is this ready to land? What else is missing here?

tools/llvm-xray/xray-stacks.cc
292 ↗	(On Diff #113050)	nit: Do you need a doubly-linked list here? Or will a `std::forward_list` work better?
393 ↗	(On Diff #113050)	Do you need a case for tail exits? You might also want to log/ignore other kinds of records.
426 ↗	(On Diff #113050)	nit: leafs -> leaves?
430 ↗	(On Diff #113050)	Are you missing a reference somewhere in the signature? Otherwise you're making a copy in the call. Alternatively to this you can use an alias to make it a bit more readable: using RootT = decltype(*Roots.begin()); auto SecondFn = [](const RootT &value) { return value.second; };
432–434 ↗	(On Diff #113050)	Is this something you can turn into a `llvm::transform` instead?
450 ↗	(On Diff #113050)	Use `llvm::find_if` instead? That takes a range already.
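
The suggestion on line 430 (take the map element by const reference and name its type with an alias) might look like the sketch below. `RootMap` and `rootValues` are hypothetical stand-ins, the mapped type is a string instead of a node pointer, and `std::transform` is used in place of the LLVM range helpers so that the example is self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the map of roots.
using RootMap = std::map<int, std::string>;

std::vector<std::string> rootValues(const RootMap &Roots) {
  // Naming the element type through value_type avoids decltype on a
  // dereferenced iterator, which yields a reference type and not the pair.
  using RootT = RootMap::value_type;
  // Taking the pair by const reference avoids copying it on every call.
  auto SecondFn = [](const RootT &Value) -> const std::string & {
    return Value.second;
  };
  std::vector<std::string> Out;
  Out.reserve(Roots.size());
  std::transform(Roots.begin(), Roots.end(), std::back_inserter(Out),
                 SecondFn);
  return Out;
}
```

This covers the one-value-per-entry case only; a map whose values are themselves containers needs an extra flattening step.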

dberris added inline comments. Aug 29 2017, 11:44 PM

tools/llvm-xray/xray-stacks.cc
298–301 ↗	(On Diff #113050)	This is a remnant of an earlier implementation. I forget now what I was thinking. :)

LGTM for the most part.

We can iterate on this in case we find that there's something horribly wrong.

In particular, we should look into turning parts of this into a library, in lib/XRay (and include/llvm/XRay). Especially if we're planning to use this in the conversion tool for converting to stack-based formats as well, or in the accounting tool to support stack-based accounting.

I have a slight preference for getting some of this functionality in and useful sooner rather than later.

This revision is now accepted and ready to land. Aug 30 2017, 12:01 AM

Thanks for the feedback Dean. I'm working on some FileCheck tests before I consider it ready to land, but I don't know that the implementation needs anything more.

It would be useful to have a chat with you about how the stacks tool can detect sibling calls. Do we have any compiler attributes/sleds planned or implemented to track that scenario?

In D34863#856235, @kpw wrote:

Thanks for the feedback Dean. I'm working on some FileCheck tests before I consider it ready to land, but I don't know that the implementation needs anything more.

More tests, more good! :)

It would be useful to have a chat with you about how the stacks tool can detect sibling calls. Do we have any compiler attributes/sleds planned or implemented to track that scenario?

It's not clear whether we can actually deduce that -- what we can do is deduce when we do tail exits at least. We need to treat tail exits as an exit of the calling function, and mark it when we're building the stack to know that the duration of the caller is defined as:

CalleeStartTime - CallerStartTime

Until we start writing out the tail exit records, we don't need to deal with these yet. But it would be good to keep in mind later.

A few things here.

Added tests.
Added tracking so that prefix stacks can still be detected as unique.
Removed some unused data structures and an unimplemented option.

Harbormaster completed remote builds in B9861: Diff 113634. Sep 1 2017, 6:37 PM

Read through some of the earlier comments and switched to make use of STLExtras.h

I think this is now tested and is ready to submit finally. As Dean pointed out, we
can come back to it and make lots of changes, but it's ready to be useful.

tools/llvm-xray/xray-stacks.cc
393 ↗	(On Diff #113050)	Xray record doesn't have tail exits yet. This is now an error type in account record that gets logged though and a TODO.
432–434 ↗	(On Diff #113050)	I couldn't work out how to map it onto llvm::transform or std::transform. The crux of the trouble is that the contents of each pair in the map's iterator would have to be "flattened" before assignment to the output iterator. Maybe there is something in <algorithm> or STLExtras.h for this, but std::for_each with a back_inserter does it well even if it's a bit ugly.
330–347 ↗	(On Diff #104815)	Ack.
374 ↗	(On Diff #104815)	Because of implicit conversion from llvm::ErrorSuccess to llvm::Error, the return type deduction fails. I also believe the return type aids reading.
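
The reply on line 374 about return type deduction can be reproduced without LLVM. In this sketch `Status` and `Success` are hypothetical stand-ins, not LLVM's classes; the only property borrowed from the discussion is that the success value has a different type from the error type and converts to it implicitly.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: Success is a distinct type that converts
// implicitly to Status.
struct Status {
  std::string Message; // Empty means success.
  bool ok() const { return Message.empty(); }
};
struct Success {
  operator Status() const { return Status{}; }
};

Status checkPositive(int X) {
  // Without "-> Status" the two return statements would deduce different
  // types (Success and Status) and the lambda would be ill-formed, so the
  // return type is spelled out.
  auto Check = [](int V) -> Status {
    if (V > 0)
      return Success();
    return Status{"value must be positive"};
  };
  return Check(X);
}
```

The explicit return type also documents at a glance what the lambda produces, which is the readability point made in the reply.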

Fixed a test case where FileCheck was checking stderr and it should not have been.

Harbormaster completed remote builds in B9866: Diff 113661. Sep 2 2017, 3:37 PM

Cleaning up some trailing whitespace.

Harbormaster completed remote builds in B9867: Diff 113662. Sep 2 2017, 3:52 PM

Trailing newline in test case.

Harbormaster completed remote builds in B9868: Diff 113663. Sep 2 2017, 3:57 PM

Update the way an iterator type is referenced to not make assumptions about references.

Harbormaster completed remote builds in B9870: Diff 113668. Sep 2 2017, 6:13 PM

Switch from std::for_each to range based for.

Email discussion about D37417 convinced me that std::for_each is an inferior construct.
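
The flattening problem mentioned in the thread (each map entry holds a container whose elements all have to end up in one output sequence) is straightforward with nested range-based for loops. A minimal sketch, assuming a hypothetical map from thread id to a vector of integers in place of the real per-thread roots:

```cpp
#include <cassert>
#include <map>
#include <vector>

// Flattens every vector stored in the map into one output vector, visiting
// the map entries in key order.
std::vector<int>
flattenValues(const std::map<int, std::vector<int>> &ByThread) {
  std::vector<int> Out;
  for (const auto &Entry : ByThread)
    for (int Value : Entry.second)
      Out.push_back(Value);
  return Out;
}
```

A single transform call does not fit here because each input element produces a variable number of outputs.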

Harbormaster completed remote builds in B9906: Diff 113887. Sep 5 2017, 11:26 AM

LGTM -- just a couple of readability suggestions. Otherwise, good to land. :)

tools/llvm-xray/xray-stacks.cc
256–258 ↗	(On Diff #113887)	Is it possible to just do something like: auto &RightCallees = MapPairIter.second; Node->Callees.insert(Node->Callees.end(), RightCallees.begin(), RightCallees.end()); instead?
261–263 ↗	(On Diff #113887)	This one seems simpler as: Node->TerminalDurations.insert( Node->TerminalDurations.end(), Left.TerminalDurations.begin(), Left.TerminalDurations.end()); Or, if there's an `append` that takes a range, it might be more efficient/simpler than growing the containers one element at a time.

Closed by commit rL312733: [XRay][tools] Function call stack based analysis tooling for XRay traces (authored by kpw). Sep 7 2017, 11:10 AM

This revision was automatically updated to reflect the committed changes.

kpw mentioned this in rL312733: [XRay][tools] Function call stack based analysis tooling for XRay traces.

kpw marked 2 inline comments as done. Sep 7 2017, 11:45 AM

kpw added inline comments.

tools/llvm-xray/xray-stacks.cc
256–258 ↗	(On Diff #113887)	I think it would be a bit more involved to do it this way and would require llvm::map_iter (has to do with mapping functions to an iterator, not the map type) to extract the values from the pair that iterating over the map presents. This seems less readable to me.
261–263 ↗	(On Diff #113887)	I don't know of an append that takes a range. That would be nice and I can imagine some SFINAE constructs to detect that the range is a difference iterator and only require one resize. I didn't make the change to use insert, which imho doesn't activate pattern matchers for iterating through a range in my brain as well as for loops.
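
For comparison, the two ways of appending one container to another that this exchange weighs are sketched below with plain `std::vector`. The function names are hypothetical, and the element type only mirrors the duration lists in spirit. Both functions produce the same result; the range insert lets the vector see the whole range at once, which typically allows a single reallocation.

```cpp
#include <cassert>
#include <vector>

// Appends Src to Dst with one range insert. Dst and Src must be different
// objects: inserting a vector's own range into itself is not allowed.
void appendRange(std::vector<long long> &Dst,
                 const std::vector<long long> &Src) {
  Dst.insert(Dst.end(), Src.begin(), Src.end());
}

// Appends Src to Dst one element at a time, growing Dst as needed.
void appendLoop(std::vector<long long> &Dst,
                const std::vector<long long> &Src) {
  for (long long D : Src)
    Dst.push_back(D);
}
```

Which form reads better is a matter of taste, as the replies above show.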

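Revision Contents

Path	Size
llvm/trunk/test/tools/llvm-xray/X86/stack-empty-case.yaml	13 lines
llvm/trunk/test/tools/llvm-xray/X86/stack-keep-going.yaml	28 lines
llvm/trunk/test/tools/llvm-xray/X86/stack-multithread.yaml	83 lines
llvm/trunk/test/tools/llvm-xray/X86/stack-simple-case.yaml	13 lines
llvm/trunk/tools/llvm-xray/CMakeLists.txt	1 line
llvm/trunk/tools/llvm-xray/xray-record-yaml.h	2 lines
llvm/trunk/tools/llvm-xray/xray-stacks.cc	634 lines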

Diff 114214

llvm/trunk/test/tools/llvm-xray/X86/stack-empty-case.yaml

				#RUN: (llvm-xray stack %s 2>&1 || echo "Checking Command Failed") | FileCheck %s
				---
				header:
				  version: 1
				  type: 0
				  constant-tsc: true
				  nonstop-tsc: true
				  cycle-frequency: 2601000000
				records:
				...
				# CHECK: llvm-xray: No instrumented calls were accounted in the input file.
				# CHECK: Checking Command Failed
				# CHECK-NOT: {{[0-9A-Z]+}}

llvm/trunk/test/tools/llvm-xray/X86/stack-keep-going.yaml

				#RUN: (llvm-xray stack %s 2>&1 1>&- || echo "Check Command Failed") | FileCheck --check-prefix HALT %s
				#RUN: (llvm-xray stack -k %s 2>&1 && echo "Check Command Succeeded") | FileCheck --check-prefix KEEP-GOING-SUCCEEDS %s
				#RUN: llvm-xray stack -k %s | FileCheck --check-prefix KEEP-GOING %s
				---
				header:
				  version: 1
				  type: 0
				  constant-tsc: true
				  nonstop-tsc: true
				  cycle-frequency: 2601000000
				records:
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10001 }
				- { type: 1, func-id: 4, cpu: 1, thread: 111, kind: function-exit, tsc: 10301 }
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10401 }
				- { type: 0, func-id: 2, cpu: 1, thread: 111, kind: function-enter, tsc: 10501 }
				- { type: 0, func-id: 3, cpu: 1, thread: 111, kind: function-enter, tsc: 10601 }
				- { type: 1, func-id: 3, cpu: 1, thread: 111, kind: function-exit, tsc: 10701 }
				- { type: 1, func-id: 2, cpu: 1, thread: 111, kind: function-exit, tsc: 10751 }
				- { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10775 }
				...

				#HALT: llvm-xray: Found record {FuncId: "#4", ThreadId: "111", RecordType: "Fn Exit"} with no matching function entry
				#HALT: Check Command Failed
				#KEEP-GOING-SUCCEEDS: Found record {FuncId: "#4", ThreadId: "111", RecordType: "Fn Exit"} with no matching function entry
				#KEEP-GOING-SUCCEEDS: Check Command Succeeded
				#KEEP-GOING: Unique Stacks: 2
				# Note the interesting case here that the stack { fn-1 } is a prefix of { fn-1, fn-2, fn-3 } but they
				# are still counted as unique stacks.

llvm/trunk/test/tools/llvm-xray/X86/stack-multithread.yaml

				#RUN: llvm-xray stack -per-thread-stacks %s | FileCheck %s --check-prefix PER-THREAD
				#RUN: llvm-xray stack -aggregate-threads %s | FileCheck %s --check-prefix AGGREGATE

				---
				header:
				  version: 1
				  type: 0
				  constant-tsc: true
				  nonstop-tsc: true
				  cycle-frequency: 2601000000
				records:
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10001 }
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10100 }
				- { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10101 }
				- { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10301 }
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10401 }
				- { type: 0, func-id: 2, cpu: 1, thread: 111, kind: function-enter, tsc: 10501 }
				- { type: 0, func-id: 3, cpu: 1, thread: 111, kind: function-enter, tsc: 10601 }
				- { type: 1, func-id: 3, cpu: 1, thread: 111, kind: function-exit, tsc: 10701 }
				- { type: 1, func-id: 2, cpu: 1, thread: 111, kind: function-exit, tsc: 10751 }
				- { type: 1, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10775 }
				- { type: 0, func-id: 1, cpu: 1, thread: 123, kind: function-enter, tsc: 10401 }
				- { type: 0, func-id: 2, cpu: 1, thread: 123, kind: function-enter, tsc: 10501 }
				- { type: 0, func-id: 3, cpu: 1, thread: 123, kind: function-enter, tsc: 10701 }
				- { type: 1, func-id: 3, cpu: 1, thread: 123, kind: function-exit, tsc: 10801 }
				- { type: 1, func-id: 2, cpu: 1, thread: 123, kind: function-exit, tsc: 10951 }
				- { type: 1, func-id: 1, cpu: 1, thread: 123, kind: function-exit, tsc: 11075 }
				- { type: 0, func-id: 2, cpu: 1, thread: 200, kind: function-enter, tsc: 0 }
				- { type: 0, func-id: 3, cpu: 1, thread: 200, kind: function-enter, tsc: 10 }
				- { type: 1, func-id: 3, cpu: 1, thread: 200, kind: function-exit, tsc: 20 }
				- { type: 1, func-id: 2, cpu: 1, thread: 200, kind: function-exit, tsc: 30 }
				...
				# PER-THREAD: Thread 123
				# PER-THREAD: Unique Stacks: 1
				# PER-THREAD: Top 10 Stacks by leaf sum:
				# PER-THREAD: Sum: 100
				# PER-THREAD: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# PER-THREAD: #0 #1{{[[:space:]]+}}1{{[[:space:]]+}}674
				# PER-THREAD: #1 #2{{[[:space:]]+}}1{{[[:space:]]+}}450
				# PER-THREAD: #2 #3{{[[:space:]]+}}1{{[[:space:]]+}}100
				# PER-THREAD: Top 10 Stacks by leaf count:
				# PER-THREAD: #0 #1{{[[:space:]]+}}1{{[[:space:]]+}}674
				# PER-THREAD: #1 #2{{[[:space:]]+}}1{{[[:space:]]+}}450
				# PER-THREAD: #2 #3{{[[:space:]]+}}1{{[[:space:]]+}}100
				# PER-THREAD: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum

				# AGGREGATE: Unique Stacks: 3
				# AGGREGATE: Top 10 Stacks by leaf sum:
				# AGGREGATE: Sum: 200

				# AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# AGGREGATE: #0 #1{{[[:space:]]+}}3{{[[:space:]]+}}1348
				# AGGREGATE: #1 #2{{[[:space:]]+}}2{{[[:space:]]+}}700
				# AGGREGATE: #2 #3{{[[:space:]]+}}2{{[[:space:]]+}}200

				# AGGREGATE: Sum: 10
				# AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# AGGREGATE: #0 #2{{[[:space:]]+}}1{{[[:space:]]+}}30
				# AGGREGATE: #1 #3{{[[:space:]]+}}1{{[[:space:]]+}}10

				# AGGREGATE: Sum: 1
				# AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# AGGREGATE: #0 #1{{[[:space:]]+}}2{{[[:space:]]+}}674
				# AGGREGATE: #1 #1{{[[:space:]]+}}1{{[[:space:]]+}}1


				# AGGREGATE: Top 10 Stacks by leaf count:

				# AGGREGATE: Count: 2
				# AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# AGGREGATE: #0 #1{{[[:space:]]+}}3{{[[:space:]]+}}1348
				# AGGREGATE: #1 #2{{[[:space:]]+}}2{{[[:space:]]+}}700
				# AGGREGATE: #2 #3{{[[:space:]]+}}2{{[[:space:]]+}}200

				# AGGREGATE: Count: 1
				# AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# AGGREGATE: #0 #2{{[[:space:]]+}}1{{[[:space:]]+}}30
				# AGGREGATE: #1 #3{{[[:space:]]+}}1{{[[:space:]]+}}10

				# AGGREGATE: Count: 1
				# AGGREGATE: lvl function{{[[:space:]]+}}count{{[[:space:]]+}}sum
				# AGGREGATE: #0 #1{{[[:space:]]+}}2{{[[:space:]]+}}674
				# AGGREGATE: #1 #1{{[[:space:]]+}}1{{[[:space:]]+}}1

llvm/trunk/test/tools/llvm-xray/X86/stack-simple-case.yaml

				#RUN: llvm-xray stack %s | FileCheck %s
				---
				header:
				  version: 1
				  type: 0
				  constant-tsc: true
				  nonstop-tsc: true
				  cycle-frequency: 2601000000
				records:
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-enter, tsc: 10001 }
				- { type: 0, func-id: 1, cpu: 1, thread: 111, kind: function-exit, tsc: 10100 }
				...
				#CHECK: Unique Stacks: 1

llvm/trunk/tools/llvm-xray/CMakeLists.txt

[... 9 lines not shown ...]
  set(LLVM_XRAY_TOOLS
    func-id-helper.cc
    xray-account.cc
    xray-color-helper.cc
    xray-converter.cc
    xray-extract.cc
    xray-extract.cc
    xray-graph.cc
    xray-graph-diff.cc
+   xray-stacks.cc
    xray-registry.cc)

  add_llvm_tool(llvm-xray llvm-xray.cc ${LLVM_XRAY_TOOLS})

llvm/trunk/tools/llvm-xray/xray-record-yaml.h

[... 91 lines not shown ...]
  static void mapping(IO &IO, xray::YAMLXRayTrace &Trace) {
    IO.mapRequired("header", Trace.Header);
    IO.mapRequired("records", Trace.Records);
  }
};

} // namespace yaml
} // namespace llvm

LLVM_YAML_IS_SEQUENCE_VECTOR(xray::YAMLXRayRecord)

#endif // LLVM_TOOLS_LLVM_XRAY_XRAY_RECORD_YAML_H

llvm/trunk/tools/llvm-xray/xray-stacks.cc

				//===- xray-stacks.cc - XRay Function Call Stack Accounting ---------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements stack-based accounting. It takes XRay traces, and
				// collates statistics across these traces to show a breakdown of time spent
				// at various points of the stack to provide insight into which functions
				// spend the most time in terms of a call stack. We provide a few
				// sorting/filtering options for zero'ing in on the useful stacks.
				//
				//===----------------------------------------------------------------------===//

				#include <forward_list>
				#include <numeric>

				#include "func-id-helper.h"
				#include "xray-registry.h"
				#include "llvm/ADT/StringExtras.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Support/Errc.h"
				#include "llvm/Support/ErrorHandling.h"
				#include "llvm/Support/FormatAdapters.h"
				#include "llvm/Support/FormatVariadic.h"
				#include "llvm/XRay/Graph.h"
				#include "llvm/XRay/InstrumentationMap.h"
				#include "llvm/XRay/Trace.h"

				using namespace llvm;
				using namespace llvm::xray;

				static cl::SubCommand Stack("stack", "Call stack accounting");
				static cl::list<std::string> StackInputs(cl::Positional,
				cl::desc("<xray trace>"), cl::Required,
				cl::sub(Stack), cl::OneOrMore);

				static cl::opt<bool>
				StackKeepGoing("keep-going", cl::desc("Keep going on errors encountered"),
				cl::sub(Stack), cl::init(false));
				static cl::alias StackKeepGoing2("k", cl::aliasopt(StackKeepGoing),
				cl::desc("Alias for -keep-going"),
				cl::sub(Stack));

				// TODO: Does there need to be an option to deduce tail or sibling calls?

				static cl::opt<std::string> StacksInstrMap(
				"instr_map",
				cl::desc("instrumentation map used to identify function ids. "
				"Currently supports elf file instrumentation maps."),
				cl::sub(Stack), cl::init(""));
				static cl::alias StacksInstrMap2("m", cl::aliasopt(StacksInstrMap),
				cl::desc("Alias for -instr_map"),
				cl::sub(Stack));

				static cl::opt<bool>
				SeparateThreadStacks("per-thread-stacks",
				cl::desc("Report top stacks within each thread id"),
				cl::sub(Stack), cl::init(false));

				static cl::opt<bool>
				AggregateThreads("aggregate-threads",
				cl::desc("Aggregate stack times across threads"),
				cl::sub(Stack), cl::init(false));

				/// A helper struct to work with formatv and XRayRecords. Makes it easier to use
				/// instrumentation map names or addresses in formatted output.
				struct format_xray_record : public FormatAdapter<XRayRecord> {
				explicit format_xray_record(XRayRecord record,
				const FuncIdConversionHelper &conv)
				: FormatAdapter<XRayRecord>(std::move(record)), Converter(&conv) {}
				void format(raw_ostream &Stream, StringRef Style) override {
				Stream << formatv(
				"{FuncId: \"{0}\", ThreadId: \"{1}\", RecordType: \"{2}\"}",
				Converter->SymbolOrNumber(Item.FuncId), Item.TId,
				DecodeRecordType(Item.RecordType));
				}

				private:
				Twine DecodeRecordType(uint16_t recordType) {
				switch (recordType) {
				case 0:
				return Twine("Fn Entry");
				case 1:
				return Twine("Fn Exit");
				default:
				// TODO: Add Tail exit when it is added to llvm/XRay/XRayRecord.h
				return Twine("Unknown");
				}
				}

				const FuncIdConversionHelper *Converter;
				};

				/// The stack command will take a set of XRay traces as arguments, and collects
				/// information about the stacks of instrumented functions that appear in the
				/// traces. We track the following pieces of information:
				///
				/// - Total time: amount of time/cycles accounted for in the traces.
				/// - Stack count: number of times a specific stack appears in the
				/// traces. Only instrumented functions show up in stacks.
				/// - Cumulative stack time: amount of time spent in a stack accumulated
				/// across the invocations in the traces.
				/// - Cumulative local time: amount of time spent in each instrumented
				/// function showing up in a specific stack, accumulated across the traces.
				///
				/// Example output for the kind of data we'd like to provide looks like the
				/// following:
				///
				/// Total time: 3.33234 s
				/// Stack ID: ...
				/// Stack Count: 2093
				///   #   Function   Local Time    (%)      Stack Time    (%)
				///   0   main       2.34 ms       0.07%    3.33234 s     100%
				///   1   foo()      3.30000 s     99.02%   3.33 s        99.92%
				///   2   bar()      30 ms         0.90%    30 ms         0.90%
				///
				/// We can also show distributions of the function call durations with
				/// statistics at each level of the stack. This works by doing the following
				/// algorithm:
				///
				/// 1. When unwinding, record the duration of each unwound function associated
				/// with the path up to which the unwinding stops. For example:
				///
				///   Step                    Duration (? means has start time)
				///
				///   push a <start time>     a = ?
				///   push b <start time>     a = ?, a->b = ?
				///   push c <start time>     a = ?, a->b = ?, a->b->c = ?
				///   pop c <end time>        a = ?, a->b = ?, emit duration(a->b->c)
				///   pop b <end time>        a = ?, emit duration(a->b)
				///   push c <start time>     a = ?, a->c = ?
				///   pop c <end time>        a = ?, emit duration(a->c)
				///   pop a <end time>        emit duration(a)
				///
				/// 2. We then account for the various stacks we've collected, and for each of
				/// them will have measurements that look like the following (continuing
				/// with the above simple example):
				///
				/// c : [<id("a->b->c"), [durations]>, <id("a->c"), [durations]>]
				/// b : [<id("a->b"), [durations]>]
				/// a : [<id("a"), [durations]>]
				///
				/// This allows us to compute, for each stack id, and each function that
				/// shows up in the stack, some important statistics like:
				///
				/// - median
				/// - 99th percentile
				/// - mean + stddev
				/// - count
				///
				/// 3. For cases where we don't have durations for some of the higher levels
				/// of the stack (perhaps instrumentation wasn't activated when the stack was
				/// entered), we can mark them appropriately.
				///
				/// Computing this data also allows us to implement lookup by call stack nodes,
				/// so that we can find functions that show up in multiple stack traces and
				/// show the statistical properties of that function in various contexts. We
				/// can compute information similar to the following:
				///
				/// Function: 'c'
				/// Stacks: 2 / 2
				/// Stack ID: ...
				/// Stack Count: ...
				/// # Function ...
				/// 0 a ...
				/// 1 b ...
				/// 2 c ...
				///
				/// Stack ID: ...
				/// Stack Count: ...
				/// # Function ...
				/// 0 a ...
				/// 1 c ...
				/// ----------------...
				///
				/// Function: 'b'
				/// Stacks: 1 / 2
				/// Stack ID: ...
				/// Stack Count: ...
				/// # Function ...
				/// 0 a ...
				/// 1 b ...
				/// 2 c ...
				///
				///
				/// To do this we require a Trie data structure that will allow us to represent
				/// all the call stacks of instrumented functions in an easily traversible
				/// manner when we do the aggregations and lookups. For instrumented call
				/// sequences like the following:
				///
				///   a()
				///    b()
				///     c()
				///     d()
				///    c()
				///
				/// We will have a representation like so:
				///
				///   a -> b -> c
				///   |    |
				///   |    +--> d
				///   |
				///   +--> c
				///
				/// We maintain a sequence of durations on the leaves and in the internal nodes
				/// as we go through and process every record from the XRay trace. We also
				/// maintain an index of unique functions, and provide a means of iterating
				/// through all the instrumented call stacks which we know about.

				struct TrieNode {
				int32_t FuncId;
				TrieNode *Parent;
				SmallVector<TrieNode *, 4> Callees;
				// Separate durations depending on whether the node is the deepest node in the
				// stack.
				SmallVector<int64_t, 4> TerminalDurations;
				SmallVector<int64_t, 4> IntermediateDurations;
				};

				/// Merges together two TrieNodes with like function ids, aggregating their
				/// callee lists and durations. The caller must provide storage where new merged
				/// nodes can be allocated in the form of a linked list.
				TrieNode *mergeTrieNodes(const TrieNode &Left, const TrieNode &Right,
				TrieNode *NewParent,
				std::forward_list<TrieNode> &NodeStore) {
				assert(Left.FuncId == Right.FuncId);
				NodeStore.push_front(TrieNode{Left.FuncId, NewParent, {}, {}, {}});
				auto I = NodeStore.begin();
				auto Node = &*I;

				// Build a map of callees from the left side.
				DenseMap<int32_t, TrieNode *> LeftCalleesByFuncId;
				for (auto *Callee : Left.Callees) {
				LeftCalleesByFuncId[Callee->FuncId] = Callee;
				}

				// Iterate through the right side, either merging with the map values or
				// directly adding to the Callees vector. The iteration also removes any
				// merged values from the left side map.
				for (auto *Callee : Right.Callees) {
				auto iter = LeftCalleesByFuncId.find(Callee->FuncId);
				if (iter != LeftCalleesByFuncId.end()) {
				Node->Callees.push_back(
				mergeTrieNodes(*(iter->second), *Callee, Node, NodeStore));
				LeftCalleesByFuncId.erase(iter);
				} else {
				Node->Callees.push_back(Callee);
				}
				}

				// Add any callees that weren't found in the right side.
				for (auto MapPairIter : LeftCalleesByFuncId) {
				Node->Callees.push_back(MapPairIter.second);
				}

				// Aggregate the durations.
				for (auto duration : Left.TerminalDurations) {
				Node->TerminalDurations.push_back(duration);
				}
				for (auto duration : Right.TerminalDurations) {
				Node->TerminalDurations.push_back(duration);
				}
				for (auto duration : Left.IntermediateDurations) {
				Node->IntermediateDurations.push_back(duration);
				}
				for (auto duration : Right.IntermediateDurations) {
				Node->IntermediateDurations.push_back(duration);
				}

				return Node;
				}

				class StackTrie {

				// We maintain pointers to the roots of the tries we see.
				DenseMap<uint32_t, SmallVector<TrieNode *, 4>> Roots;

				// We make sure all the nodes are accounted for in this list.
				std::forward_list<TrieNode> NodeStore;

				// A map of thread ids to pairs of call stack trie nodes and their start times.
				DenseMap<uint32_t, SmallVector<std::pair<TrieNode *, uint64_t>, 8>>
				ThreadStackMap;

				TrieNode *createTrieNode(uint32_t ThreadId, int32_t FuncId,
				TrieNode *Parent) {
				NodeStore.push_front(TrieNode{FuncId, Parent, {}, {}, {}});
				auto I = NodeStore.begin();
				auto Node = &*I;
				if (!Parent)
				Roots[ThreadId].push_back(Node);
				return Node;
				}

				TrieNode *findRootNode(uint32_t ThreadId, int32_t FuncId) {
				const auto &RootsByThread = Roots[ThreadId];
				auto I = find_if(RootsByThread,
				[&](TrieNode *N) { return N->FuncId == FuncId; });
				return (I == RootsByThread.end()) ? nullptr : *I;
				}

				public:
				enum class AccountRecordStatus {
				OK, // Successfully processed
				ENTRY_NOT_FOUND, // An exit record had no matching call stack entry
				UNKNOWN_RECORD_TYPE
				};

				struct AccountRecordState {
				// We keep track of whether the call stack is currently unwinding.
				bool wasLastRecordExit;

				static AccountRecordState CreateInitialState() { return {false}; }
				};

				AccountRecordStatus accountRecord(const XRayRecord &R,
				AccountRecordState *state) {
				auto &TS = ThreadStackMap[R.TId];
				switch (R.Type) {
				case RecordTypes::ENTER: {
				state->wasLastRecordExit = false;
				// When we encounter a new function entry, we want to record the TSC for
				// that entry, and the function id. Before doing so we check the top of
				// the stack to see if there are callees that already represent this
				// function.
				if (TS.empty()) {
				auto *Root = findRootNode(R.TId, R.FuncId);
				TS.emplace_back(Root ? Root : createTrieNode(R.TId, R.FuncId, nullptr),
				R.TSC);
				return AccountRecordStatus::OK;
				}

				auto &Top = TS.back();
				auto I = find_if(Top.first->Callees,
				[&](TrieNode *N) { return N->FuncId == R.FuncId; });
				if (I == Top.first->Callees.end()) {
				// We didn't find the callee in the stack trie, so we're going to
				// add to the stack then set up the pointers properly.
				auto N = createTrieNode(R.TId, R.FuncId, Top.first);
				Top.first->Callees.emplace_back(N);

				// Top may be invalidated after this statement.
				TS.emplace_back(N, R.TSC);
				} else {
				// We found the callee in the stack trie, so we'll use that pointer
				// instead, add it to the stack associated with the TSC.
				TS.emplace_back(*I, R.TSC);
				}
				return AccountRecordStatus::OK;
				}
				case RecordTypes::EXIT: {
				bool wasLastRecordExit = state->wasLastRecordExit;
				state->wasLastRecordExit = true;
				// The exit case is more interesting, since we want to be able to deduce
				// missing exit records. To do that properly, we need to look up the stack
				// and see whether the exit record matches any of the entry records. If it
				// does match, we attempt to record the durations as we pop the stack to
				// where we see the parent.
				if (TS.empty()) {
				// Short circuit, and say we can't find it.

				return AccountRecordStatus::ENTRY_NOT_FOUND;
				}

				auto FunctionEntryMatch =
				find_if(reverse(TS), [&](const std::pair<TrieNode *, uint64_t> &E) {
				return E.first->FuncId == R.FuncId;
				});
				auto status = AccountRecordStatus::OK;
				if (FunctionEntryMatch == TS.rend()) {
				status = AccountRecordStatus::ENTRY_NOT_FOUND;
				} else {
				// Account for offset of 1 between reverse and forward iterators. We
				// want the forward iterator to include the function that is exited.
				++FunctionEntryMatch;
				}
				auto I = FunctionEntryMatch.base();
				for (auto &E : make_range(I, TS.end() - 1))
				E.first->IntermediateDurations.push_back(std::max(E.second, R.TSC) -
				std::min(E.second, R.TSC));
				auto &Deepest = TS.back();
				if (wasLastRecordExit)
				Deepest.first->IntermediateDurations.push_back(
				std::max(Deepest.second, R.TSC) - std::min(Deepest.second, R.TSC));
				else
				Deepest.first->TerminalDurations.push_back(
				std::max(Deepest.second, R.TSC) - std::min(Deepest.second, R.TSC));
				TS.erase(I, TS.end());
				return status;
				}
				}
				return AccountRecordStatus::UNKNOWN_RECORD_TYPE;
				}

				bool isEmpty() const { return Roots.empty(); }

				void printStack(raw_ostream &OS, const TrieNode *Top,
				FuncIdConversionHelper &FN) {
				// Traverse the pointers up to the parent, noting the sums, then print
				// in reverse order (callers at top, callees down bottom).
				SmallVector<const TrieNode *, 8> CurrentStack;
				for (auto *F = Top; F != nullptr; F = F->Parent)
				CurrentStack.push_back(F);
				int Level = 0;
				OS << formatv("{0,-5} {1,-60} {2,+12} {3,+16}\n", "lvl", "function",
				"count", "sum");
				for (auto *F :
				reverse(make_range(CurrentStack.begin() + 1, CurrentStack.end()))) {
				auto Sum = std::accumulate(F->IntermediateDurations.begin(),
				F->IntermediateDurations.end(), 0LL);
				auto FuncId = FN.SymbolOrNumber(F->FuncId);
				OS << formatv("#{0,-4} {1,-60} {2,+12} {3,+16}\n", Level++,
				FuncId.size() > 60 ? FuncId.substr(0, 57) + "..." : FuncId,
				F->IntermediateDurations.size(), Sum);
				}
				auto *Leaf = *CurrentStack.begin();
				auto LeafSum = std::accumulate(Leaf->TerminalDurations.begin(),
				Leaf->TerminalDurations.end(), 0LL);
				auto LeafFuncId = FN.SymbolOrNumber(Leaf->FuncId);
				OS << formatv("#{0,-4} {1,-60} {2,+12} {3,+16}\n", Level++,
				LeafFuncId.size() > 60 ? LeafFuncId.substr(0, 57) + "..."
				: LeafFuncId,
				Leaf->TerminalDurations.size(), LeafSum);
				OS << "\n";
				}

				/// Prints top stacks for each thread.
				void printPerThread(raw_ostream &OS, FuncIdConversionHelper &FN) {
				for (auto iter : Roots) {
				OS << "Thread " << iter.first << ":\n";
				print(OS, FN, iter.second);
				OS << "\n";
				}
				}

				/// Prints top stacks from looking at all the leaves and ignoring thread IDs.
				/// Stacks that consist of the same function IDs but were called in different
				/// thread IDs are not considered unique in this printout.
				void printIgnoringThreads(raw_ostream &OS, FuncIdConversionHelper &FN) {
				SmallVector<TrieNode *, 4> RootValues;

				// Function to pull the values out of a map iterator.
				using RootsType = decltype(Roots.begin())::value_type;
				auto MapValueFn = [](const RootsType &Value) { return Value.second; };

				for (const auto &RootNodeRange :
				make_range(map_iterator(Roots.begin(), MapValueFn),
				map_iterator(Roots.end(), MapValueFn))) {
				for (auto *RootNode : RootNodeRange)
				RootValues.push_back(RootNode);
				}

				print(OS, FN, RootValues);
				}

				/// Merges the trie by thread id before printing top stacks.
				void printAggregatingThreads(raw_ostream &OS, FuncIdConversionHelper &FN) {
				std::forward_list<TrieNode> AggregatedNodeStore;
				SmallVector<TrieNode *, 4> RootValues;
				for (auto MapIter : Roots) {
				const auto &RootNodeVector = MapIter.second;
				for (auto *Node : RootNodeVector) {
				auto MaybeFoundIter = find_if(RootValues, [Node](TrieNode *elem) {
				return Node->FuncId == elem->FuncId;
				});
				if (MaybeFoundIter == RootValues.end()) {
				RootValues.push_back(Node);
				} else {
				RootValues.push_back(mergeTrieNodes(**MaybeFoundIter, *Node, nullptr,
				AggregatedNodeStore));
				RootValues.erase(MaybeFoundIter);
				}
				}
				}
				print(OS, FN, RootValues);
				}

				void print(raw_ostream &OS, FuncIdConversionHelper &FN,
				SmallVector<TrieNode *, 4> RootValues) {
				// Go through each of the roots, and traverse the call stack, producing the
				// aggregates as you go along. Remember these aggregates and stacks, and
				// show summary statistics about:
				//
				// - Total number of unique stacks
				// - Top 10 stacks by count
				// - Top 10 stacks by aggregate duration
				SmallVector<std::pair<const TrieNode *, uint64_t>, 11> TopStacksByCount;
				SmallVector<std::pair<const TrieNode *, uint64_t>, 11> TopStacksBySum;
				auto greater_second = [](const std::pair<const TrieNode *, uint64_t> &A,
				const std::pair<const TrieNode *, uint64_t> &B) {
				return A.second > B.second;
				};
				uint64_t UniqueStacks = 0;
				for (const auto *N : RootValues) {
				SmallVector<const TrieNode *, 16> S;
				S.emplace_back(N);

				while (!S.empty()) {
				auto Top = S.pop_back_val();

				// We only start printing the stack (by walking up the parent pointers)
				// when we get to a leaf function.
				if (!Top->TerminalDurations.empty()) {
				++UniqueStacks;
				auto TopSum = std::accumulate(Top->TerminalDurations.begin(),
				Top->TerminalDurations.end(), 0uLL);
				{
				auto E = std::make_pair(Top, TopSum);
				TopStacksBySum.insert(std::lower_bound(TopStacksBySum.begin(),
				TopStacksBySum.end(), E,
				greater_second),
				E);
				if (TopStacksBySum.size() == 11)
				TopStacksBySum.pop_back();
				}
				{
				auto E = std::make_pair(Top, Top->TerminalDurations.size());
				TopStacksByCount.insert(std::lower_bound(TopStacksByCount.begin(),
				TopStacksByCount.end(), E,
				greater_second),
				E);
				if (TopStacksByCount.size() == 11)
				TopStacksByCount.pop_back();
				}
				}
				for (const auto *C : Top->Callees)
				S.push_back(C);
				}
				}

				// Now print the statistics in the end.
				OS << "\n";
				OS << "Unique Stacks: " << UniqueStacks << "\n";
				OS << "Top 10 Stacks by leaf sum:\n\n";
				for (const auto &P : TopStacksBySum) {
				OS << "Sum: " << P.second << "\n";
				printStack(OS, P.first, FN);
				}
				OS << "\n";
				OS << "Top 10 Stacks by leaf count:\n\n";
				for (const auto &P : TopStacksByCount) {
				OS << "Count: " << P.second << "\n";
				printStack(OS, P.first, FN);
				}
				OS << "\n";
				}
				};

				std::string CreateErrorMessage(StackTrie::AccountRecordStatus Error,
				const XRayRecord &Record,
				const FuncIdConversionHelper &Converter) {
				switch (Error) {
				case StackTrie::AccountRecordStatus::ENTRY_NOT_FOUND:
				return formatv("Found record {0} with no matching function entry\n",
				format_xray_record(Record, Converter));
				default:
				return formatv("Unknown error type for record {0}\n",
				format_xray_record(Record, Converter));
				}
				}

				static CommandRegistration Unused(&Stack, []() -> Error {
				// Load each file provided as a command-line argument. For each one of them
				// account to a single StackTrie, and just print the whole trie for now.
				StackTrie ST;
				InstrumentationMap Map;
				if (!StacksInstrMap.empty()) {
				auto InstrumentationMapOrError = loadInstrumentationMap(StacksInstrMap);
				if (!InstrumentationMapOrError)
				return joinErrors(
				make_error<StringError>(
				Twine("Cannot open instrumentation map: ") + StacksInstrMap,
				std::make_error_code(std::errc::invalid_argument)),
				InstrumentationMapOrError.takeError());
				Map = std::move(*InstrumentationMapOrError);
				}

				if (SeparateThreadStacks && AggregateThreads)
				return make_error<StringError>(
				Twine("Can't specify options for per thread reporting and reporting "
				"that aggregates threads."),
				std::make_error_code(std::errc::invalid_argument));

				symbolize::LLVMSymbolizer::Options Opts(
				symbolize::FunctionNameKind::LinkageName, true, true, false, "");
				symbolize::LLVMSymbolizer Symbolizer(Opts);
				FuncIdConversionHelper FuncIdHelper(StacksInstrMap, Symbolizer,
				Map.getFunctionAddresses());
				// TODO: Someday, support output to files instead of just directly to
				// standard output.
				for (const auto &Filename : StackInputs) {
				auto TraceOrErr = loadTraceFile(Filename);
				if (!TraceOrErr) {
				if (!StackKeepGoing)
				return joinErrors(
				make_error<StringError>(
				Twine("Failed loading input file '") + Filename + "'",
				std::make_error_code(std::errc::invalid_argument)),
				TraceOrErr.takeError());
				logAllUnhandledErrors(TraceOrErr.takeError(), errs(), "");
				continue;
				}
				auto &T = *TraceOrErr;
				StackTrie::AccountRecordState AccountRecordState =
				StackTrie::AccountRecordState::CreateInitialState();
				for (const auto &Record : T) {
				auto error = ST.accountRecord(Record, &AccountRecordState);
				if (error != StackTrie::AccountRecordStatus::OK) {
				if (!StackKeepGoing)
				return make_error<StringError>(
				CreateErrorMessage(error, Record, FuncIdHelper),
				make_error_code(errc::illegal_byte_sequence));
				errs() << CreateErrorMessage(error, Record, FuncIdHelper);
				}
				}
				}
				if (ST.isEmpty()) {
				return make_error<StringError>(
				"No instrumented calls were accounted in the input file.",
				make_error_code(errc::result_out_of_range));
				}
				if (AggregateThreads) {
				ST.printAggregatingThreads(outs(), FuncIdHelper);
				} else if (SeparateThreadStacks) {
				ST.printPerThread(outs(), FuncIdHelper);
				} else {
				ST.printIgnoringThreads(outs(), FuncIdHelper);
				}
				return Error::success();
				});
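
The enter/exit accounting described in the file comment above can be sketched without any LLVM dependencies. The following is a simplified illustration, not the code from this patch: `SketchNode`, `SketchTrie`, `enter`, `exit`, and `find` are hypothetical names, there is a single thread and a single duration list per node (the patch keeps separate terminal and intermediate durations per thread), timestamps are assumed to be non-decreasing, and an unmatched exit leaves the stack untouched (the patch clears the stack in that case).

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the patch's TrieNode.
struct SketchNode {
  int32_t FuncId;
  SketchNode *Parent;
  std::vector<SketchNode *> Callees;
  std::vector<uint64_t> Durations; // one entry per completed call
};

class SketchTrie {
  std::vector<std::unique_ptr<SketchNode>> Store; // owns every node
  std::vector<SketchNode *> Roots;
  // Current call stack: trie node plus the TSC of its enter event.
  std::vector<std::pair<SketchNode *, uint64_t>> Stack;

  SketchNode *findOrCreate(std::vector<SketchNode *> &Siblings,
                           int32_t FuncId, SketchNode *Parent) {
    auto I = std::find_if(Siblings.begin(), Siblings.end(),
                          [&](SketchNode *N) { return N->FuncId == FuncId; });
    if (I != Siblings.end())
      return *I;
    Store.push_back(
        std::make_unique<SketchNode>(SketchNode{FuncId, Parent, {}, {}}));
    Siblings.push_back(Store.back().get());
    return Store.back().get();
  }

public:
  // Push the callee under the current top of the stack (or as a root).
  void enter(int32_t FuncId, uint64_t TSC) {
    SketchNode *Parent = Stack.empty() ? nullptr : Stack.back().first;
    auto &Siblings = Parent ? Parent->Callees : Roots;
    Stack.emplace_back(findOrCreate(Siblings, FuncId, Parent), TSC);
  }

  // Pop down to the most recent matching entry, recording a duration for
  // every frame popped. Returns false if no frame matches FuncId.
  bool exit(int32_t FuncId, uint64_t TSC) {
    auto Match =
        std::find_if(Stack.rbegin(), Stack.rend(),
                     [&](const auto &E) { return E.first->FuncId == FuncId; });
    if (Match == Stack.rend())
      return false;
    // base() points one past the matched frame in forward order.
    auto First = std::prev(Match.base());
    for (auto I = First; I != Stack.end(); ++I)
      I->first->Durations.push_back(TSC - I->second);
    Stack.erase(First, Stack.end());
    return true;
  }

  // Look up the node for a call path such as {a, b, c}; nullptr if absent.
  const SketchNode *find(const std::vector<int32_t> &Path) const {
    const std::vector<SketchNode *> *Level = &Roots;
    const SketchNode *Node = nullptr;
    for (int32_t Id : Path) {
      auto I = std::find_if(Level->begin(), Level->end(),
                            [&](const SketchNode *N) { return N->FuncId == Id; });
      if (I == Level->end())
        return nullptr;
      Node = *I;
      Level = &Node->Callees;
    }
    return Node;
  }
};
```

Feeding this sketch the a/b/c sequence from the comment (enter a, enter b, enter c, exit c, exit b, enter c, exit c, exit a) yields one duration each on the nodes for a, a->b, a->b->c and a->c. If the exit for an inner function is missing, the exit of its caller still pops and accounts both frames, which is the behaviour the summary describes for missed intermediary exits.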