Page MenuHomePhabricator

[trace] Introduce Hierarchical Trace Representation (HTR) and add `thread trace export ctf` command for Intel PT trace visualization
ClosedPublic

Authored by jj10306 on Jul 9 2021, 2:51 PM.

Details

Summary

This diff introduces Hierarchical Trace Representation (HTR) and creates the thread trace export ctf -f <filename> -t <thread_id> command to export an Intel PT trace's HTR to Chrome Trace Format (CTF) for visualization.

See lldb/docs/htr.rst for context/documentation on HTR.

Overview of Changes

  • Add HTR documentation (see lldb/docs/htr.rst)
  • Add HTR structures (layer, block, block metadata)
  • Implement "Basic Super Block" HTR pass
  • Add 'thread trace export ctf' command to export the HTR of an Intel PT trace to Chrome Trace Format (CTF)

As this diff is the first iteration of HTR and trace visualization, future diffs will build on this work by generalizing the internal design of HTR and implementing new HTR passes that provide better trace summarization/visualization.

See attached video for an example of Intel PT trace visualization:

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
wallace added inline comments.Jul 23 2021, 12:33 AM
lldb/source/Target/TraceHTR.cpp
152 ↗(On Diff #360965)

this Printf is too noisy. You don't really need that. Just remove it and instead of invoking cursor.GetError(), invoke cursor.IsError() after rebasing the code

160 ↗(On Diff #360965)

valid_cursor is not a nice name, just call it no_more_data, end_of_trace, or something like that

164 ↗(On Diff #360965)

just call IsError as you don't need the error message

164 ↗(On Diff #360965)

i'm a little bit confused here, lines 165 -> 167 will only be execute if the current instruction is an error, whose load address is going to be 0, so you'll get an invalid function name. This should instead be

if(!end_of_trace && !cursor.IsError()) {
  m_instruction_layer_up->AddCallInstructionMetadata(
              current_instruction_load_address,
              function_name_from_load_address(cursor.GetLoadAddress()));
} else {
  m_instruction_layer_up->AddCallInstructionMetadata(
            current_instruction_load_address, llvm::None);
}
168 ↗(On Diff #360965)

remove this

184–185 ↗(On Diff #360965)

you could also just do

llvm::DenseMap<ConstString, size_t> func_calls = m1.m_func_calls;

if the object doesn't allow it, then it's fine

187–190 ↗(On Diff #360965)

did you try just doing

func_calls[name] += num_calls;

std::map will create a zeroed value for a new key when using the [] operator, so I imagine DenseMap work the same way. Nevermind if that doesn't work.

197 ↗(On Diff #360965)

remove this assert, the code wouldn't break if num_units is 0

204–205 ↗(On Diff #360965)

this is very expensive. Let's suppose that num_units is N, then you'll invoke MergeMetadata N - 1 times, and each element of the first map will be copied N - 1 times, each one from the second map will be copied N - 2 times, etc. So you'll have N^2 copies, which is just too slow.

Instead, you have two options:

  • when you merge A and B, you don't create a new map, you just copy the elements from B to A, effectively modifying A's map. That way after N merges, each element from each map will need to be copied only once. This might intuitively makes sense from an API standpoint, because when you merge blocks, you might expect the original blocks to go away
  • remove the merge method that accepts only two blocks and do the merging here coping everything to the same single map

up to you

212–214 ↗(On Diff #360965)

remove this comment

216 ↗(On Diff #360965)

better create a new method ArePassesDone

223 ↗(On Diff #360965)

also call the ArePassesDone method here

225–227 ↗(On Diff #360965)

remove this

233 ↗(On Diff #360965)

return an llvm::Error instead of receiving a Stream

233–234 ↗(On Diff #360965)

remove the TraceExportFormat enum. You don't really need it because there's one single case for now

240–241 ↗(On Diff #360965)

include the error code in the error message. See how other parts of the code emit a meaningful error message for this kind of calls

245 ↗(On Diff #360965)

can you also show the actual error message from os?

253–257 ↗(On Diff #360965)

remove

260 ↗(On Diff #360965)

you don't need this

412 ↗(On Diff #360965)

remove this comment

412–422 ↗(On Diff #360965)

use int64_t just in case, as that's how it's defined

439–440 ↗(On Diff #360965)

this comment doens't make sense, as you are instead passing display_name to the json object

lldb/test/API/commands/trace/TestTraceExport.py
61

Add a new test in which you run this algorithm on a program with a few levels of function calls, and check that the output trace makes sense. Don't check for individual load addresses, but instead check for the appearance of the methods and blocks that you expect.

This revision now requires changes to proceed.Jul 23 2021, 12:33 AM
jj10306 marked 2 inline comments as done.Jul 23 2021, 9:41 AM
jj10306 added inline comments.
lldb/include/lldb/Target/TraceHTR.h
272 ↗(On Diff #360965)

Makes sense. How should the case when an id that the layer doesn't contain is passed in? I would like to use llvm::Optional here but it's not possible to have an Optional reference (i.e. Optional<& const HTRBlock>). I was thinking about creating different methods:

bool ContainsBlock(size_t id) const for checking if the layer contains an ID
HTRBlock const & GetBlockById(size_t id) const for retrieving a reference to a block in the layer
but this still doesn't solve the issue of if a caller calls GetBlockId with an invalid ID.

Another option would be to change GetBlockId to return Optional<HTRBlock> instead of a reference, but this would result in an unnecessary copy, when all we really need is an const reference.

339 ↗(On Diff #360965)

I based this method off of TraceCursor::DumpInstructions which takes in a Stream & to output its success/error messages to. This method (TraceHTR::Export) prints a message to the console on success (not only failure) so I'm not sure how the success message could be propagated back up to the caller. I agree that using llvm::Error here would make sense if we were only printing if an Error occurred, but since we print on success I'm not sure if we should use llvm::Error over passing the Stream.

lldb/source/Target/TraceHTR.cpp
152 ↗(On Diff #360965)

Initially I wasn't planning on displaying the error, but then I was getting runtime errors because I wasn't doing anything with the error. As a result, I decided to display the error and looked at how TraceCursor::DumpInstructions does it: s << toString(std::move(err));.
I believe if an isError() method existed then I would not have this issue about not checking/using the error and I could just use the isError() method to check for the presence of an error without actually retrieving the error object.

197 ↗(On Diff #360965)

If num_units is 0 then the loop body would never execute and thus merged_metadata would be None at the time of return. This would be problematic since we return *merged_metadata.

204–205 ↗(On Diff #360965)

Ahh yep, I've been meaning to update this method to take the first approach you described.

233 ↗(On Diff #360965)

See comment on this function's declaration in TraceHTR.h

439–440 ↗(On Diff #360965)

I was trying to indicate that GetMostFrequentlyCalledFunction returns a std::string instead of ConstString because eventually the ConsString function name will need to be converted to std::string in order to be serialized, but I will remove this comment and just change the function to return ConstString and then do the conversion in the ternary that defines display_name.

lldb/test/API/commands/trace/TestTraceExport.py
61

Yes, this makes sense. Is the way to do this by loading the JSON file's contents into memory in the test and doing assertions on that structure?

wallace added inline comments.Jul 23 2021, 10:10 AM
lldb/include/lldb/Target/TraceHTR.h
272 ↗(On Diff #360965)

Optional<const HTRBlock &> should work like a charm. You can't put the & before the const AFAIK. People normally read the & backwards as in "reference to <const HTRBlock>"

339 ↗(On Diff #360965)

It's different, because DumpInstructions is supposed to dump information in the stream, but in this case Export is printing to a file. So what if someone wants to invoke Export but not from a command? e.g. from a python script. You are making this case a little bit more complicated.
You can just return llvm::success() and let the caller print anything to Stream if available. In any case, it's standard in lldb to not print anything if the operation is a success

lldb/source/Target/TraceHTR.cpp
152 ↗(On Diff #360965)

yes my bad =P After rebasing you should be able to use IsError() :) Sorry for the hiccup

197 ↗(On Diff #360965)

Ahh, true. Then also mention in the documentation that num_units must be >= 1, otherwise someone might think that could somehow get an empty HTRBlock ><

439–440 ↗(On Diff #360965)

you shouldn't change a generic API just to ease a single instance of a usage. Better return a ConstString, or even better, a llvm::StringRef, which can point to anything. ConstString has a method to get a StringRef out of it. StringRef::str() gets you a std::string if you need it, or you can use StringRef::data() to get the const char *, which helps you avoid one copy.

lldb/test/API/commands/trace/TestTraceExport.py
61

yes, that should be fine. You can also precompile a non-trivial a.out file and include it here, if you want the code to be fully deterministic. If you make it non-trivial enough, we should be able to reuse for other intel pt tests as well

jj10306 marked an inline comment as done.Jul 23 2021, 10:28 AM
jj10306 added inline comments.
lldb/include/lldb/Target/TraceHTR.h
272 ↗(On Diff #360965)

I tried what you suggested and also just a reference (no const), but it none of them work - this is the error message:

llvm/include/llvm/ADT/Optional.h:281:5: error: 'getPointer' declared as a pointer to a reference of type 'lldb_
private::HTRBlock &'
  T *getPointer() { return &Storage.getValue(); }

A cursory search of the issue brought up many articles, here's one:
https://www.fluentcpp.com/2018/10/05/pros-cons-optional-references/

339 ↗(On Diff #360965)

Ah I see, that makes sense. I'll switch it to use llvm::Error and not print anything in the case of success, because the success indication isn't particularly useful.

lldb/source/Target/TraceHTR.cpp
439–440 ↗(On Diff #360965)

Will update that function to return a StringRef 🙂

jj10306 marked 29 inline comments as done.Jul 23 2021, 2:41 PM
jj10306 added inline comments.
lldb/include/lldb/Target/TraceHTR.h
272 ↗(On Diff #360965)

Discussed offline - decided to return HTRBlock * const with a nullptr indicating that the layer does not contain a bock with the specified block id.

lldb/source/Target/TraceHTR.cpp
160 ↗(On Diff #360965)

I was trying to use "positive logic", otherwise using something like end_of_trace would require and negating every use of it and every time the Next() is called (i.e end_of_trace =!cursor->Next()).

This is obviously a very minor thing, but I think using a well named "positive logic" variable makes the code easier to read - thoughts on using a name like more_data_in_trace? It's definitely harder to come up with a well named positive logic variable so let me know if you have any better suggestions

245 ↗(On Diff #360965)

I don't believe llvm::raw_fd_ostream stores the error message, only the error code.

wallace added inline comments.Jul 23 2021, 2:48 PM
lldb/source/Target/TraceHTR.cpp
160 ↗(On Diff #360965)

more_data_in_trace is fine!

245 ↗(On Diff #360965)

you can invoke os.error() and then convert that value to a string :)

jj10306 marked 11 inline comments as done.Jul 24 2021, 8:50 AM
jj10306 updated this revision to Diff 361460.Jul 24 2021, 8:52 AM
jj10306 edited the summary of this revision. (Show Details)

Rebase and address comments

Looks like I pulled in some unwanted changes related to the traceinstruction dumper and tsc when I rebased - will resolve this issue before landing

Pretty good. Just some minor nits. Once you rebase on top of the trace exporter plugin patch, this should be good to go.

lldb/test/API/commands/trace/TestTraceExport.py
55–57

delete this. You can actually compile a main.cpp file as part of the test and trace it with intel pt, and then check its output. You don't need "trace save" to do that.

lldb/test/API/commands/trace/intelpt-trace/functions_example.cpp
1 ↗(On Diff #361460)

rename this file to export_htr_input.cpp or something like that. Same for the a.out

26 ↗(On Diff #361460)

given that we are going to add a binary and we don't want to do it often, add another function like

void iterative_handle_request_by_id(int id, int reps) {
  for (int i = 0; i < reps; i++) {
    if (i % 2 == 0)
      slow_handle_request(id);
    else
      fast_handle_request(id);
}

just to have someone interesting to test later. Then recompile your a.out

jj10306 updated this revision to Diff 361916.Jul 26 2021, 11:34 PM

Rebased (integrate with https://reviews.llvm.org/D106501#change-SoRRVpoqDNdx) and address comments

wallace accepted this revision.Jul 27 2021, 12:35 AM

Nice! I'll land this for you

jj10306 retitled this revision from [trace] Add `thread trace export` command for Intel PT trace visualization to [trace] Introduce Hierarchical Trace Representation (HTR) and add `thread trace export ctf` command for Intel PT trace visualization.Jul 27 2021, 10:09 AM
jj10306 edited the summary of this revision. (Show Details)
This revision was not accepted when it landed; it landed in state Needs Review.Jul 28 2021, 11:04 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

Temporarily reverted this change as it breaks LLDB build on 32 bit Arm/Linux bot:
https://lab.llvm.org/buildbot/#/builders/17/builds/9497

teemperor added inline comments.
lldb/source/Plugins/TraceExporter/ctf/CommandObjectThreadTraceExportCTF.cpp
78

This generates a compiler warning:

[3344/4085] Building CXX object tools/lldb/source/Plugins/TraceExporter/ctf/CMakeFiles/lldbPluginTraceExporterCTF.dir/CommandObjectThreadTraceExportCTF.cpp.o
/Users/teemperor/3llvm/llvm-project/lldb/source/Plugins/TraceExporter/ctf/CommandObjectThreadTraceExportCTF.cpp:79:82: warning: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat]
        "Thread index %" PRIu64 " is out of range (valid values are 0 - %u).\n", tid,
                      ~~~~~~~~~                                                  ^~~

The variadic format methods are obsolete and you can just use llvm::formatv(...).str() until we have a type-safe replacement.

wallace added inline comments.Jul 29 2021, 6:52 PM
lldb/source/Plugins/TraceExporter/ctf/CommandObjectThreadTraceExportCTF.cpp
78

thanks for the suggestion. TIL :)
I'll make a quick fix for this

vsk added a subscriber: vsk.Aug 2 2021, 12:09 PM

Hey @jj10306, thanks for working on this.

@wallace @clayborg -- It seems like there are a ton of logic changes introduced in this patch that are tested at too coarse a level. I'm not confident in the changes being made here. For example, there's a bunch of subtle work being done in BasicSuperBlockMerge, but only ~one opaque high-level check that attempts to validate any of the logic: self.assertTrue(num_units_by_layer[1] == 383). Where does 383 come from? How do we know that that's the right result? If there's a bug in BasicSuperBlockMerge, would the regression test just look like changing the magic 383 number?

I'm not really comfortable with this being checked in, and would suggest a revert until some more granular unit tests can be added to the patch. (Also paging @JDevlieghere in case he has thoughts on the testing.)

lldb/docs/htr.rst
27

Would be helpful to describe a brief feature roadmap here, explaining what debugger functionality can be built on top of this HTR concept.

lldb/source/Plugins/TraceExporter/common/TraceHTR.h
38

Why not move 'func_calls' instead of copying it?

Sure thing. Sorry for not being that thorough.

foad added a subscriber: foad.Aug 3 2021, 2:17 AM
foad added inline comments.
lldb/docs/htr.rst
13

Typo "oad".

22

Stray "l" at end of line.

Hey @jj10306, thanks for working on this.

@wallace @clayborg -- It seems like there are a ton of logic changes introduced in this patch that are tested at too coarse a level. I'm not confident in the changes being made here. For example, there's a bunch of subtle work being done in BasicSuperBlockMerge, but only ~one opaque high-level check that attempts to validate any of the logic: self.assertTrue(num_units_by_layer[1] == 383). Where does 383 come from? How do we know that that's the right result? If there's a bug in BasicSuperBlockMerge, would the regression test just look like changing the magic 383 number?

I'm not really comfortable with this being checked in, and would suggest a revert until some more granular unit tests can be added to the patch. (Also paging @JDevlieghere in case he has thoughts on the testing.)

Apologies for this - will submit a patch with more comprehensive tests and other minor fixes this week!

A more comprehensive test for this logic was added in https://reviews.llvm.org/D107674