This is an archive of the discontinued LLVM Phabricator instance.

[trace] Dedup different source lines when dumping instructions + refactor
ClosedPublic

Authored by wallace on Apr 18 2021, 11:09 PM.

Details

Summary

When dumping the traced instructions in a for loop, like this one

4:  for (int a = 0; a < n; a++)  
5:    do something;

there might be multiple LineEntry objects for line 4, but with different address ranges. This was causing the dump command to dump something like this:

a.out`main + 11 at main.cpp:4
  [1] 0x0000000000400518    movl   $0x0, -0x8(%rbp)
  [2] 0x000000000040051f    jmp    0x400529                  ; <+28> at main.cpp:4
a.out`main + 28 at main.cpp:4
  [3] 0x0000000000400529    cmpl   $0x3, -0x8(%rbp)
  [4] 0x000000000040052d    jle    0x400521                  ; <+20> at main.cpp:5

which is confusing, as main.cpp:4 appears twice consecutively.

This diff fixes that issue by making the line entry comparison strictly about the line, column and file name. Before it was also comparing the address ranges, which we don't need because our output is strictly about what the user sees in the source.

Besides, I've noticed that the logic that traverses instructions and calculates symbols and disassemblies had too much coupling, and made my changes harder to implement, so I decided to decouple it. Now there are two methods for iterating over the instruction of a trace. The existing one does it on raw load addresses, but the one provides a SymbolContext and an InstructionSP, and does the calculations efficiently (not as efficient as possible for now though), so the caller doesn't need to care about these details. I think I'll be using that iterator to reconstruct the call stacks.

I was able to fix a test with this change.

Diff Detail

Event Timeline

wallace requested review of this revision.Apr 18 2021, 11:09 PM
wallace created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptApr 18 2021, 11:09 PM
wallace planned changes to this revision.Apr 18 2021, 11:09 PM
wallace updated this revision to Diff 338435.EditedApr 18 2021, 11:51 PM
This comment has been deleted.
wallace planned changes to this revision.Apr 18 2021, 11:51 PM
wallace retitled this revision from json to [trace] Dedup different source lines when dumping instructions + refactor.Apr 20 2021, 4:32 PM
wallace edited the summary of this revision. (Show Details)
wallace added a reviewer: clayborg.
wallace updated this revision to Diff 339044.Apr 20 2021, 4:32 PM
wallace retitled this revision from [trace] Dedup different source lines when dumping instructions + refactor to json.

ready for review

clayborg requested changes to this revision.Apr 20 2021, 5:06 PM
clayborg added inline comments.
lldb/include/lldb/Symbol/LineEntry.h
94–104 ↗(On Diff #339044)

I would leave the compare function alone and create a new one that just compares the file and line only. Why? The compare function is comparing a lot more: LineEntry::is_terminal_entry, LineEntry::column. So make a new function, and we don't need to return a -1, 0, or 1, but a bool:

static bool FileAndLineMatches(const LineEntry &lhs, const LineEntry &rhs);

Or just compare the file and line manually where this was being called.

lldb/include/lldb/Target/Trace.h
214–222

Does this need to be exposed in the Trace.h file? Seems like this could be declared in the Trace.cpp file

224

s/byt/but/

225–240

Can we put a static function into Trace.cpp so this isn't needed in the Trace.h file?

lldb/source/Target/Trace.cpp
114–115

Remove the "Optional<Trace::InstructionSymbolInfo>" for "prev_insn and just check it prior to calling and make params const

118

we should compare the function/symbol first, then do the line entries.

127

use != instead of ^ to make the code easier to deduce.

132

call LjneEntry::FileAndLineMatches(...) or just manually compare the file and line only.

135–139

Don't we want to check the function first before the line entry? What if the function changes but we have the same file and line, like a vector:23 from STL...

Also you can probably just make this code much much simpler

142–145

Ditto, simplify:

148

No need to do a module check as the module owns the function or symbol and they won't match.

This revision now requires changes to proceed.Apr 20 2021, 5:06 PM

The title should be updated on this diff as well. Set the "json" right now

wallace retitled this revision from json to [trace] Dedup different source lines when dumping instructions + refactor.Apr 22 2021, 9:20 PM
wallace updated this revision to Diff 342448.May 3 2021, 10:30 AM
wallace marked 10 inline comments as done.

Address all comments

Harbormaster completed remote builds in B102328: Diff 342453.
clayborg requested changes to this revision.May 3 2021, 2:32 PM
clayborg added inline comments.
lldb/source/Target/Trace.cpp
163

Can or should we get the disassembler in calculate_symbol_context above after calling address.CalculateSymbolContext(...)?

175

Can this function really be called without a valid load address?

181

Remove the " * 1" here

182–183

Better to use:

bool Address::SetLoadAddress(lldb::addr_t load_addr, Target *target);
185

AddressRange::ContainsFileAddress() can't be used here after looking at how it is currently implemented. It will check if the sections are the same first and return the correct answer if they are, otherwise it will just extract a file addresses and compare them.

To do this right, we need to uncomment out the currently unused:

bool AddressRange::Contains (const Address &addr) const;

And we will need to fix and use it by making sure the modules match. If the modules don't match, we don't want to extract two file addresses from two different modules and then compare those, as two modules can both have a 0x1000 file address.

The current commented out implementation of AddressRange::Contains(...) is wrong too. Fixing correctly looks like:

bool
AddressRange::Contains(const Address &addr) const
{
  SectionSP rangeSectSP = GetBaseAddress().GetSection();
  SectionSP addrSectSP = addr.GetSection();
  if (rangeSectSP) {
    if (!addrSectSP || rangeSectSP->GetModule() != addrSectSP->GetModule())
      return false; // Modules do not match.
  } else if (addrSectSP) {
    return false; // Range has no module but "addr" does because addr has a section
  }
  // Either the modules match, or both have no module, so it is ok to compare 
  // the file addresses in this case only.
  return ContainsFileAddress(addr);
}
200–201

You can't just return if there is no symbol, you can have no symbol, but still have a function from debug info if the symbol was a local symbol since it could have been stripped from the executable at build time.

Just remove these two lines and the "else" below

204–206

Should this return a Expected<size_t>? In case there is an error we want to show? Like maybe the trace buffer was truncated, or missing when it was expected? Can we have a process that is traced, but by the time we fetch the instructions, the circular buffer was filled with other data and even though the thread was traced there is no data?

206–214
This revision now requires changes to proceed.May 3 2021, 2:32 PM
wallace marked 6 inline comments as done.May 4 2021, 9:15 AM
wallace added inline comments.
lldb/source/Target/Trace.cpp
163

i think it's fine as it is, as we don't always need a disassembly. I want to use this TraverseInstructionsWithSymbolInfo method for creating the call graph, and I don't need the disassembly for that

175

yes, whenever there are gaps in the trace

204–206

Any kind of errors are represented as failed instructions, which means that if the entire operation failed, there's one single instruction with an error message. If the thread is not traced at all, then this method return None.
There can be errors while getting the data or errors while decoding individual instructions. That was a suggestion by Labath and I think this makes the code simpler, as there is only one way to handle errors.

wallace updated this revision to Diff 342770.May 4 2021, 9:16 AM
wallace marked an inline comment as done.

address comments

clayborg accepted this revision.May 4 2021, 1:22 PM

Just fix the one issue where we use the FileSpec operator== and this is good to go!

lldb/source/Core/AddressRange.cpp
59 ↗(On Diff #342770)

It might be worth scanning the code to see if anyone is using AddressRange::ContainsFileAddress() incorrectly. It is fine to use this within a module to check if an address is in the range, but not ok if the address can come from a different module.

lldb/source/Target/Trace.cpp
164

No need to compare, just use == operator

This revision is now accepted and ready to land.May 4 2021, 1:22 PM
This revision was landed with ongoing or failed builds.May 4 2021, 7:42 PM
This revision was automatically updated to reflect the committed changes.