This is an archive of the discontinued LLVM Phabricator instance.

Introduce new command: thread backtrace unique
ClosedPublic

Authored by bgianfo on May 22 2017, 5:43 PM.

Details

Summary

This patch introduces a new thread backtrace command "unique".
The command is based off of "thread backtrace all" but will instead
find all threads which share matching call stacks and de-duplicate
their output, listing call stack and all the threads which share it.
This is especially useful for apps which use thread/task pools
sitting around waiting for work and cause excessive duplicate output.
I needed this behavior recently when debugging a core with 700+ threads.

Event Timeline

bgianfo created this revision.May 22 2017, 5:43 PM
jingham requested changes to this revision.May 22 2017, 6:09 PM

The idea is great. I think it is a little confusing that you would do:

(lldb) thread backtrace all

to get all threads but

(lldb) thread unique-stacks

to backtrace the unique stacks. Wouldn't it be more logical to do:

(lldb) thread backtrace unique

It's really just another specification of the threads you are backtracing.

A few inline comments, we use '_' word separators for locals.

source/Commands/CommandObjectThread.cpp
448

uniqueStacks -> unique_stacks

488–489

We do local variables as: word_other_word, not camel-case.

497

found_match

This revision now requires changes to proceed.May 22 2017, 6:09 PM
bgianfo updated this revision to Diff 99844.May 22 2017, 8:22 PM
bgianfo edited edge metadata.
bgianfo marked 3 inline comments as done.

This iteration addresses Jim's feedback.

I've updated all of the code to match the surrounding style where
I missed it originally. (branches/variable names)

I also moved the command under "thread backtrace unique".
This required refactoring the thread iterator command object.
I've tried to reuse as much as code possible in the thread iterator
so that we actually use HandleOneThread to push the rending of
of the call stack down to the BackTrace command.

The bucketing has been moved directly into the command iterator
when he user passes in the "unique" option.

bgianfo retitled this revision from Introduce new command: thread unique-stacks to Introduce new command: thread backtrace unique.May 22 2017, 8:24 PM
bgianfo edited the summary of this revision. (Show Details)
labath added a subscriber: labath.May 23 2017, 3:05 AM

Sounds like an awesome feature.

Could you please add a test for it as well?

jingham requested changes to this revision.May 23 2017, 9:24 AM

Pavel's right, it would be good to add a test case. You could modify the test case in packages/Python/lldbsuite/test/functionalities/thread/num_threads/ to this end. Note this Test Case used to be sitting mixed with the directories in the thread directory, but I just moved it into its own directory - so update before doing this. You could just make a few more thread 3's and break at the lock.unlock line. Then you should have a bunch of threads with the same stack, and your 'unique' listing would coalesce them. If you do this, please add it as a separate test (so make a new method 'test_unique' and redo the setup. That way the tests will remain independent.

There's a "functionalities/thread/backtrace_all" test that looks tempting to modify, but it doesn't actually look that amenable to modification for your purposes. The TestBacktraceAll.py there does show an example of running a command-line command, and looking for patterns in the result, which is what you'll need to add.

source/Commands/CommandObjectThread.cpp
324

Maybe "to see threads grouped by unique call stacks"? As written it sounds like you're going to show me the really cool call stacks.

This revision now requires changes to proceed.May 23 2017, 9:24 AM
clayborg edited edge metadata.May 23 2017, 10:15 AM

Add a test and fix the minor things as suggested and this will be good to go.

bgianfo updated this revision to Diff 100047.May 24 2017, 12:54 AM
bgianfo edited edge metadata.
bgianfo marked an inline comment as done.

Address Pavel/Jim/Greg's feedback with the addition of a new test.

I followed Jim's advice and extended the existing num_threads suite
so that we start more thread3's. The new test_unique_stacks test case
was added to verify that we correct bucketing of the multiple thread3
call stacks executing across threads.

Thanks for the feedback thus far!

Thanks for writing the test. We just need to make sure it runs in a stable manner.

packages/Python/lldbsuite/test/functionalities/thread/num_threads/TestNumThreads.py
91

I am afraid this is going to be very flaky. There's no guarantee that the threads will have had enough time to block in the lock call by the time you reach the breakpoint. The simplest solution would be to insert a sufficiently long (a couple of seconds) sleep in the inferior, but smarter positions are possible as well. For example, we could iterate through all SBThreads in the process and manually issue a Continue on all of them who haven't hit our predefined breakpoints yet.

packages/Python/lldbsuite/test/functionalities/thread/num_threads/main.cpp
57–59

I'd find for(auto &t: thread_3s) t.join() more understandable than this.

bgianfo updated this revision to Diff 100365.May 25 2017, 10:30 PM

Address Pavel's feedback, made the unit test more robust.

This update increases the robustness of the new test I added.
We ensure synchronization, and force the threads into the state
we want them to be in by manually stepping them into place.

thanks for the effort. I found the logic of the test quite difficult to follow, with multiple breakpoints and notify_calls(). Instead of trying to point out each problem, I figured it will be easier to write my take on how the test could look like: https://paste.ubuntu.com/24665770/. Besides making the logic flow nicer (I hope), it also avoids a couple of other potentially problematic spots in your test (reliance on the exact way we number threads, potentially infinite loop in the thread.StepInstruction calls, etc).

Let me know what you think.

emaste added a subscriber: emaste.May 26 2017, 4:36 AM
clayborg requested changes to this revision.May 26 2017, 9:59 AM

We should probably store the stacks as lldb::addr_t values that are load addresses for quicker comparisons and searches. Inlined code details things more clearly.

source/Commands/CommandObjectThread.cpp
76

Might be more efficient to store a "std::stack<lldb::addr_t>" where you store the load address of each address. The comparisons would go quicker, not to mention the construction and destruction of stack itself.

148

Add an operator function for UniqueStack that only compares the stack frames:

bool inline operator <(const UniqueStack &lhs, const UniqueStack &rhs) {
  return lhs.m_stack_frames < rhs.m_stack_frames;
}

The comparison will be cheap when m_stack_frames is a std::stack of lldb::addr_t values.

Then store as a std::set for efficient lookup later instead of a linear search.

204
std::set<UniqueSet> &unique_stacks;
220–221
const lldb::addr_t pc = frame_sp->GetStackID().GetPC();
stack_frames.push(pc);
226

use fast std::set.find() so search for the matching value.

This revision now requires changes to proceed.May 26 2017, 9:59 AM
bgianfo updated this revision to Diff 100529.May 26 2017, 11:16 PM
bgianfo edited edge metadata.
bgianfo marked 7 inline comments as done.

Address Pavel and Greg's feedback on Diff 100365.

Pavel: I took your suggestions to make the test case more readable,
I really appreciate the guidance. I did have to tweak some of the
functionality to make the test case pass reliably, as there were
still some races possible. I also saw that SBThread.Resume() seems
to occasionally result in a StopReason of eStopReasonNon. So I worked
around that by only including threads int expected output that the Resume
resulted in making it to our breakpoint. I have verified the test is
consistently passes by executing it on repeat 100 times,

Greg: Thanks for the suggestions for using std::set, and lldb:adrr_t.
The code is a lot cleaner now, and as you mentioned should perform faster.
I didn't notice any significant speed up with my 700 thread dump however.

Much cleaner. After seeing how we must construct a UniqueStack just so we can search for it, an even cleaner solution would be to make unique_stacks into:

std::map<std::vector<lldb::addr_t>

Clicked submit too early with last comments. After seeing how we previously needed to create a temp UniqueStack just to do the lookup, and also how UniqueStack make its thread index ID list mutable, a better solution is to use a std::map as shown in the inlined comments.

One issue with this fix that we will need to fix is we will need to disable showing variable values in the stack backtrace. The default frame format is:

(lldb) settings show frame-format
frame-format (format-string) = "frame #${frame.index}: ${frame.pc}{ ${module.file.basename}{`${function.name-with-args}{${frame.no-debug}${function.pc-offset}}}}{ at ${line.file.basename}:${line.number}}{${function.is-optimized} [opt]}\n"

The main problem is we have "${function.name-with-args}" in the format string. We must detect we are trying to unique stacks and not try to print any thread variables otherwise we will not be showing the correct information to the user. This would mean piping the fact we are trying to unique stacks into the FormatEntity::Format() function in FormatEntity.h/.cpp and just don't show the variables in this case.

source/Commands/CommandObjectThread.cpp
50

After seeing how we have to create a temp UniqueStack so we can search for it in our set, we can do this a bit cleaner. We can get rid of this class if we use a std::map that maps from the unique stack load addresses to the thread ID list:

std::map<std::vector<lldb::addr_t>, std::vector<uint32_t>> unique_stacks;

So more changes below...

148

Change to:

std::map<std::vector<lldb::addr_t>, std::vector<uint32_t>> unique_stacks;
159

change to:

for (const auto &stacks_tids: unique_stacks) {
161

change to:

const std::vector<uint32_t>& thread_index_ids = stacks_tids.second;
169

change to:

// Since all threads have the same call stack we can just pick the first one.
uint32_t representative_thread_id = thread_index_ids.front();
204
std::map<std::vector<lldb::addr_t>, std::vector<uint32_t>> &unique_stacks
225

Remove this.

228
auto matching_stack = unique_stacks.find(stack_frames);
229–233

Can probably replace the entire if/else with:

unique_stacks[stack_frames].push_back(thread_index_id);

The other option for fixing the problem with showing the wrong variables in the backtraces would be to make up a new frame-format string that is used for uniqued stack frames and use that format when showing uniqued stack frames:

(lldb) settings show frame-format-unique
frame-format (format-string) = "frame #${frame.index}: ${frame.pc}{ ${module.file.basename}{`${function.name}{${frame.no-debug}${function.pc-offset}}}}{ at ${line.file.basename}:${line.number}}{${function.is-optimized} [opt]}\n"

The only difference in the format string is we use ${function.name} instead of ${function.name-with-args}.

Zach made some comments via e-mail:

Couple of things:

  1. Unless you really really want iteration over this map to be in some deterministic order, better to use unordered_map. And if you do want it to be in some deterministic order, you should provide a comparison function, as the default one is probably not what you want.'

True. The previous solution had the same ordering as this map I proposed.

  1. std::map<std::vector<lldb::addr_t>, std::vector<uint32_t>> is pretty horrible to read, and offers no insight into what the keys and values are. At the very least, add some typedefs like:

typedef std::vector<lldb::addr_t> StackID;
typedef std::vector<tid_t> ThreadIDList;
std::unordered_map<StackID, ThreadIDList> TheMap;

Agreed.

  1. If performance is a concern here (700+ threads seems like it could exhibit slowdown when uniquing stacks), the best performance would be to use DenseSet<std::unique_ptr<StackInfo>>, where StackInfo contains both the address list and the thread is list as well as a pre-computed hash value, and then use a custom implementation of DenseMapInfo that just returns the hash value directly. See llvm/lib/DebugInfo/CodeView/TypeSerializer.cpp for an example.

Not sure I would go that far. When you include both things in the type (StackInfo here) then you must make one up in order to search for it right? We are mainly trying to create a collection of stacks that we can use to point the the thread IDs. I agreed it doesn't need to be sorted so std::unordered_map would work just fine.

The more I think about it, the previous approach was probably fine with the std::set and with the UniqueStack class. Lets just stick with that for now, but we still need to fix the display to not show variable values and claim that many threads have the same variable values when showing ${function.name-with-args}.

bgianfo updated this revision to Diff 100830.May 31 2017, 1:09 AM

Fix bug in "unique" backtrace output that Greg pointed out.

Introduced a new format for unique frames and plumbed that
through the stacks to be able to toggle between them both
depending on the calling arguments.

Address Pavel and Greg's feedback on Diff 100365.

Pavel: I took your suggestions to make the test case more readable,
I really appreciate the guidance. I did have to tweak some of the
functionality to make the test case pass reliably, as there were
still some races possible. I also saw that SBThread.Resume() seems
to occasionally result in a StopReason of eStopReasonNon. So I worked
around that by only including threads int expected output that the Resume
resulted in making it to our breakpoint. I have verified the test is
consistently passes by executing it on repeat 100 times,

Thanks. The fact that we are not able to rely on the operation of Resume in this case sounds like a bug. Obviously we can't condition the acceptance of this patch by fixing that issue, but we should at least track it. Can you create a bug on bugs.llvm.org, and reference it in your workaround. BTW, what's the platform you are testing this on?

clayborg accepted this revision.May 31 2017, 10:11 AM

Looks like a good starting point. Thanks for the changes.

@jingham @labath do you have any more feedback?

labath added a comment.Jun 5 2017, 4:05 AM

@jingham @labath do you have any more feedback?

As Jim pointed out, the Resume command does not do what I thought it does, so having it in the test makes no sense. One option would be to just leave out the calls to Resume (as they don't do anything anyway) and go with your workaround of testing the threads that happened to stop at the breakpoint on the first run. The other option would be to replace the Resume call with something that does what we intended (one such candidate would be thread.StepOut). I prefer the second one as it gives a more deterministic output, but I think both of them are good enough.

bgianfo updated this revision to Diff 101664.EditedJun 6 2017, 7:54 PM

Update test to use thread.StepOut() instead of thread.Resume().

labath added a comment.Jun 7 2017, 1:35 AM

Cool. Sorry about all the back-and-forth and thanks for the patience.

jingham accepted this revision.Jun 7 2017, 11:37 AM

This looks fine to me.

This revision is now accepted and ready to land.Jun 7 2017, 11:37 AM
clayborg accepted this revision.Jun 8 2017, 7:52 AM

Can someone commit this as I obviously don't have a svn commit bit?

This revision was automatically updated to reflect the committed changes.

Committed as r305197. I needed to tweak the test logic a bit, as not even thread.StepOut() did exactly what we needed there, because it was resuming also all other threads even though it was not necessary. Take a look at the resulting commit if you want to see the differences: (spoiler: the magic command that worked was thread continue #id)