This is an archive of the discontinued LLVM Phabricator instance.

[NativeSession] Implement findLineNumbersByAddress functions for native llvm-symbolizer support.
ClosedPublic

Authored by akhuang on Jun 1 2020, 3:41 PM.

Details

Summary

This implements findLineNumbersByAddress in NativeSession, which takes
an address and a length and returns all lines within that address range.

Currently I'm doing this by creating a line table which is a vector of
line entries (line info, addr, file index) for all the lines from the PDB.
It then finds the specific line by binary searching the line table.

Diff Detail

Event Timeline

akhuang created this revision.Jun 1 2020, 3:41 PM
akhuang updated this revision to Diff 268314.Jun 3 2020, 2:57 PM

Should be ready for review now. I'm not quite sure if this is the best way to implement
the line table stuff; a lot of it is similar to the LineTable class in lldb.

akhuang edited the summary of this revision. (Show Details)Jun 3 2020, 4:33 PM
akhuang edited the summary of this revision. (Show Details)
akhuang added a reviewer: labath.
labath added a comment.Jun 4 2020, 6:19 AM

I'm afraid I lack sufficient context to be able to give meaningful feedback here. Right now all I can to is complain that the use of auto in the patch, while being consistent with the other code in these files, is probably not consistent with the llvm guidelines on auto usage. At least for me the auto does not make this code more readable as it obscures the types of everything, including some fairly important details like whether something is an Optional<T> or Expected<T>. I don't really have a specific suggestion on that. Just saying...

llvm/lib/DebugInfo/PDB/Native/NativeEnumLineNumbers.cpp
2–3

bad formatting

llvm/lib/DebugInfo/PDB/Native/NativeLineNumber.cpp
2–3

here too

llvm/lib/DebugInfo/PDB/Native/NativeSourceFile.cpp
38

This looks suspicious. Is the checksum really guaranteed to be null terminated? Maybe something like toStringRef(Checksum.Checksum).str() ?

llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
476–480

I'm not sure how often can this happen in PDBs, but in DWARF it happens a lot. A pattern like this has caused a significant slowdown in lldb in the past. It might be better to replace this with a better data structure. For instance you could store the data in a vector of vectors, sort it at the end of the function, and only flatten as the very last step. (That's what we did in lldb.)

502

This assumes that the table contains at least one element, but I'm not sure if that is always the case. To an unfamiliar reader it does seem like populateLineTable can legitimately return without inserting anything into the table.

I also get a distinct feeling that this iterator manipulation could be simplified, but I'm not sure how because I'm not exactly sure what is this code doing.

akhuang updated this revision to Diff 268627.Jun 4 2020, 5:02 PM
akhuang marked 5 inline comments as done.

Address comments.

I also changed the line table construction so that modules are added to the line
table as they are needed (instead of adding all at the beginning), since in practice
only a small number of lines are queried for.

@labath thanks for the comments! I removed some of the autos (but probably missed a bunch); I agree about the readability thing.

llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
476–480

Good to know. I'll try doing this.

502

My bad, I meant to put another non-empty check after calling populateLineTable.

Maybe it can be simplified; it's searching for the first address in the line table that is not greater than VA.

labath added inline comments.Jun 8 2020, 5:17 AM
llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
502

Maybe something like:

// This will find the first non-terminal entry whose address is >= VA
auto LineIter = llvm::partition_point(Lines, [&] (const auto &E) { return E.Addr < VA || (E.Addr == VA && E.IsTerminalEntry); });
if (LineIter == Lines.end() || LineIter->Addr > VA) {
  // We've gone too far. Try to back up.
  if (LineIter == Lines.begin() || std::prev(LineIter)->IsTerminalEntry) {
    // Can't back up. It's not clear to me if the existing version handles the case where the `VA` would be in a "hole" between a previous terminal entry ending before VA and the next entry which starts after VA.
    return nullptr;
  }
  --LineIter;
}
akhuang updated this revision to Diff 269990.Jun 10 2020, 4:10 PM

address comments and other random cleanup

akhuang marked an inline comment as done.Jun 10 2020, 4:26 PM
akhuang added inline comments.
llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
502

oh, ok. this does look a lot simpler.

Sorry about my review latency.

This mostly looks really good. I'm not trying to pick on you with the efficiency things. I've just been watching too many YouTube videos of Chandler's efficiency talks. Please consider the inline comments as alternatives to consider rather than change requests.

llvm/lib/DebugInfo/PDB/Native/NativeEnumLineNumbers.cpp
26

I know I argued the other way in a different patch, but the details were different. Here I think you should take the parameter by rvalue reference. I could be wrong here, but here's how I'm thinking about this.

As far as I can tell, the only invocation of this constructor is using std::move, which I don't think helps when the argument is accepted by value. Then the argument is copied into the Lines member. So that's two copies total. (Right?)

If you changed the initializer to be Lines(std::move(LineNums)), then the argument could be moved into Lines. In total, you'd up with a copy and a move, which is better.

If you changed the constructor to accept by rvalue reference, then the std::move at the call site becomes useful. Combine with with a std::move into Lines, and I think that would eliminate all copies and simply be two moves. (Moving a std::vector is a win over copying it.)

32

I'm trying to weigh the pros and cons of returning by std::unique_ptr<T> versus ErrorOr<T> or Expected<T>. I don't think either of the latter ones requires a memory allocation. But maybe there's API precedent here?

36

I'm wondering whether this should set the Index member to N (or N+1) before returning. If a caller calls getChildAtIndex(10) and then calls getNext(), would they expect child 0 or child 10 or child 11?

Is there an similar API that sets a precedent for the expectation here?

llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
531

Will LineNumbers generally contain a small number of entries? If it's usually or almost always small, it could be worth making it a SmallVector to eliminate extra allocations.

amccarth added inline comments.Jun 15 2020, 9:10 AM
llvm/lib/DebugInfo/PDB/Native/NativeEnumLineNumbers.cpp
26

Some of what I said is wrong. The existing std::move does seem to avoid a copy based on my experiments with Compiler Explorer ("godbolt"). Despite my error, I still think my recommendations are still good ideas.

akhuang updated this revision to Diff 270827.Jun 15 2020, 12:34 PM
akhuang marked 4 inline comments as done.

Address comments, and also fix some logic in the findLineNumbers function.

llvm/lib/DebugInfo/PDB/Native/NativeEnumLineNumbers.cpp
32

Yeah, I think they match the DIA APIs? This class is inheriting from the IPDBEnumChildren interface.

36

The other NativeEnum-- classes act the same way, such as NativeEnumModules::getChildAtIndex.

llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
531

Hm, I checked how many entries a NativeEnumLineNumbers normally has (using a clang crash for testing) and it appears it's often somewhere between 20 and 200.

amccarth accepted this revision.Jun 15 2020, 2:38 PM
amccarth marked 2 inline comments as done.

LGTM.

llvm/lib/DebugInfo/PDB/Native/NativeEnumLineNumbers.cpp
32

Yeah, that makes sense. OK, question withdrawn. std::unique_ptr is fine as is.

llvm/lib/DebugInfo/PDB/Native/SymbolCache.cpp
531

OK, that's more than I expected. std::vector seems an appropriate choice. Thanks for checking.

This revision is now accepted and ready to land.Jun 15 2020, 2:38 PM
akhuang edited the summary of this revision. (Show Details)Jun 15 2020, 4:51 PM
akhuang updated this revision to Diff 270917.Jun 15 2020, 4:56 PM

rebase and minor change to searchForPdb so it uses the last returned error message.

This revision was automatically updated to reflect the committed changes.