This is an archive of the discontinued LLVM Phabricator instance.

Breakpad: Parse Stack CFI records
ClosedPublic

Authored by labath on Apr 4 2019, 8:13 AM.

Details

Summary

This patch adds support for parsing STACK CFI records from breakpad
files. For the expressions specifying the values of registers, only a
basic parse is performed -- the record is split into a bunch of key-value
pairs, where both keys and values are strings. The idea is that these
will be handed off to the postfix expression -> dwarf compiler, once
it is extracted from the internals of the NativePDB plugin

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Apr 4 2019, 8:13 AM

This looks good. I have a few questions inline, but none of those are major concerns.

source/Plugins/ObjectFile/Breakpad/BreakpadRecords.h
173 ↗(On Diff #193717)

I'm not a fan of deep class hierarchies, but given that StackCFIRecord is a subset of StackCFIInitRecord and given that the naming suggests StackCFIInitRecord is a concrete type of StackCFIRecord, should StackCFIInitRecord derive from StackCFIRecord? (Or perhaps contain one?)

If not, perhaps a comment to explain why not.

unittests/ObjectFile/Breakpad/BreakpadRecordsTest.cpp
123 ↗(On Diff #193717)

All of the negative parsing tests seem to be incomplete (e.g., truncated or a missing INIT). Should there be others, like having the wrong token type? Or an invalid token? Or one that has extra tokens at the end of an otherwise valid record?

I see you're following the pattern for the other types here, but is that enough?

132 ↗(On Diff #193717)

This test relies on the parsing test for the StackCFIInitRecord having covered most of the cases. That works because the parsing implementation is shared, so I'm OK with it. Will others be concerned that this test isn't as independent as it could be?

labath marked 3 inline comments as done.Apr 5 2019, 2:45 AM
labath added inline comments.
source/Plugins/ObjectFile/Breakpad/BreakpadRecords.h
173 ↗(On Diff #193717)

Good question. I was doing the same thing as I did with FUNC and PUBLIC records, which are also very similar, but I wouln't be comfortable saying one is a special case of the other. However, the case here is not as clear, as one could consider the "init" record to be a special kind of a "stack cfi" record. OTOH, I also don't like deep hierarchies and particularly having a concrete class (init record) inherit from another concrete class (stack cfi record).

Since this code is likely to have only a single consumer (the thing which converts these into UnwindPlans), adding an AbstractCFIRecord class would seem like an overkill. Instead, how about we just collapse things even more and use just a single class to represent both records (with an Optional<addr_t> for the size)?

I don't think this should make it the job of the consumer much harder, but it will allow us to:

  • avoid needing special code to group these two records together
  • give us an excuse to call this group (section) "STACK CFI". (I have previously called this group "STACK CFI INIT" for consistency with how the FUNC+LINE groups is handled, but that didn't feel quite right here).
  • speed up parsing as we don't have to parse the "STACK CFI" part twice just to find out we're not parsing the correct record type.
unittests/ObjectFile/Breakpad/BreakpadRecordsTest.cpp
123 ↗(On Diff #193717)

Following the pattern isn't much of an excuse for me, as I was the one who established that pattern. :)

The reason the pattern is like it is was that I was trying to cover all of the paths through the parsing code. I believe I have succeeded at that, but I haven't used a profiler or anything to verify that. Knowing the implementation, it's obvious that a test with an invalid record type is not going to be interesting, as we will bail out straight after the first token. However, I agree that from a black-box testing perspective, these are interesting cases to check, so I've added a couple more that I could think of.

This is going to be the most extensively tested piece of code in lldb :P.

132 ↗(On Diff #193717)

Another advantage of merging the two record types is that it makes this comment void. :)

labath updated this revision to Diff 193849.Apr 5 2019, 2:47 AM

merge StackCFI and StackCFIInit records
add some tests

clayborg added inline comments.Apr 5 2019, 7:21 AM
source/Plugins/ObjectFile/Breakpad/BreakpadRecords.cpp
393 ↗(On Diff #193849)

Do we really need to pull the content apart into separate strings for each register? Seems like a lot of work and 99% of these we will never accessed. Maybe just store the entire string for all registers and be done?

394 ↗(On Diff #193849)

Does the format specify no space between the register name and the colon?

clayborg added inline comments.Apr 5 2019, 7:22 AM
source/Plugins/ObjectFile/Breakpad/BreakpadRecords.cpp
393 ↗(On Diff #193849)

You can add an iterator method to the StackCFIRecord record maybe for when you do want to parse each register?

My concerns were address, so LGTM. I'll leave the rest to you and clayborg.

labath marked 5 inline comments as done.Apr 8 2019, 2:39 AM
labath added inline comments.
source/Plugins/ObjectFile/Breakpad/BreakpadRecords.cpp
393 ↗(On Diff #193849)

That's a good point. I'll remove the register parsing code from this patch and just store the entire expression as one string(ref). I'll see what's the best api for splitting these up when I get to connecting this with the postfix expression parser.

394 ↗(On Diff #193849)

That is how I interpret the format "spec" https://chromium.googlesource.com/breakpad/breakpad/+/master/docs/symbol_files.md

Each registeri is the name of a register or pseudoregister. Each expression is a Breakpad postfix expression, which may contain spaces, but never ends with a colon.

That's not a very good definition because there are many ways one can split up these strings such they "don't end with a colon", but I chose this interpretation, as that's what made most sense to me (and it works for the example files I have lying around). We can always change that if we find a producer doing things slightly differently.

labath updated this revision to Diff 194108.Apr 8 2019, 2:40 AM
labath marked 2 inline comments as done.

Remove the fancy parsing of register expressions.

clayborg accepted this revision.Apr 8 2019, 7:55 AM
This revision is now accepted and ready to land.Apr 8 2019, 7:55 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptApr 9 2019, 1:03 AM