This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Support parsing DWARF expressions
ClosedPublic

Authored by rafauler on Feb 14 2018, 12:14 PM.

Details

Summary

This patch refactors the interface of DWARFFrame a little bit
with the goal of making FIEs and CIEs accessible to library users, so
they can process them in client tools that rely on LLVM. It also
enhances DWARFFrame with the capability of parsing CFI DWARF
expressions. To make it self-contained with test cases, it changes
llvm-readobj to be able to dump EH frames and checks they are correct
in a unit test.

The llvm-readobj code is Maksim Panchenko work (maksfb), but I am
supporting it in this patch for the purposes of making it
self-contained.

Diff Detail

Repository
rL LLVM

Event Timeline

rafauler created this revision.Feb 14 2018, 12:14 PM
aprantl added inline comments.Feb 14 2018, 12:35 PM
include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
26 ↗(On Diff #134293)

we build with autobrief enabled, so the \brief is redundant and should be dropped.

davide added a subscriber: davide.Feb 14 2018, 12:38 PM
rnk added a subscriber: rnk.Feb 15 2018, 12:02 PM
rnk added inline comments.
include/llvm/BinaryFormat/Dwarf.h
533–534 ↗(On Diff #134293)

Without a bounds, this seems like it's asking to crash on corrupt inputs. The pattern we adopted in the CV dumper that started life in llvm-readobj was to pass around ArrayRef<uint8_t> &Data and consume bytes from the front of the ArrayRef.

The other style common in dwarfdump is to use a DataExtractor, but personally we found it nicer to work directly with ArrayRefs.

Even if all we do is assert or report_fatal_error on buffer overrun, it's better than nothing.

lib/DebugInfo/DWARF/DWARFDebugFrame.cpp
607–614 ↗(On Diff #134293)

Yeah, modifying in ArrayRef by reference seems preferable to this offsetting dance.

Thanks Adrian, I'll drop the \brief. That makes sense, thanks for the context on DataExtractor versus other approaches. In our client code, it never made a lot of sense to use DataExtractor and that's part of the reason I was lifting DWARF decoding helpers to live on their own. Then I was using readobj as a proxy for our client code so we have something upstream that demonstrates how we are accessing and manipulating DWARF CFIs (both coming from FDEs and CIEs). Then I'll take a second look on this design to avoid pitfalls with unchecked bounds.

JDevlieghere requested changes to this revision.Feb 16 2018, 2:49 AM
JDevlieghere added inline comments.
include/llvm/BinaryFormat/Dwarf.h
534 ↗(On Diff #134293)

I'm not convinced that (1) we need these functions and (2) that they belong here. Can you look into using their counterparts in the DataExtractor?

include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
68 ↗(On Diff #134293)

Please run clang-format over the new/modified code.

100 ↗(On Diff #134293)

I believe this and the next line can be const.

108 ↗(On Diff #134293)

Let's add a comment to the next two functions as well to keep things consistent.

125 ↗(On Diff #134293)

Same here.

231 ↗(On Diff #134293)

I know this code was moved, but it would be nice if you could check which of these can be made const since you're touching these lines.

275 ↗(On Diff #134293)

Same as previous comment.

lib/DebugInfo/DWARF/DWARFDebugFrame.cpp
323 ↗(On Diff #134293)

::empty()?

407 ↗(On Diff #134293)

I think you can call ::indent(2*IndentLevel)

599 ↗(On Diff #134293)

Please don't use auto when you can't deduct the type form the RHS

This revision now requires changes to proceed.Feb 16 2018, 2:49 AM
rafauler updated this revision to Diff 135128.Feb 20 2018, 1:05 PM

The fact that DataExtractor already has read*LEB128 is an important
point. So I decided to move the only missing helper,
"readEncodedPointer", into DataExtractor. However, it is DWARF-specific.
So I've put it into DWARFDataExtractor and made the necessary
adjustments. I've addressed your other concerns as well, let me know if
this is fine.

@aprantl or @JDevlieghere have either of you looked at the decoding details carefully? If not, I'll have a go at it tomorrow; I was just skimming things today.

lib/DebugInfo/DWARF/DWARFDataExtractor.cpp
11 ↗(On Diff #135128)

Leave DWARFDataExtractor.h as the first #include because it is the module's header. Dwarf.h should go after it.

50 ↗(On Diff #135128)

You can combine the 2/4/8 cases, passing getAddressSize() to getUnsigned().

JDevlieghere requested changes to this revision.Feb 21 2018, 7:18 AM

Thanks for working on this, Rafael!

Halfway through the review I remembered that we already have a DWARFExpression class that has some logic for parsing DWARF Expressions and Operations.
Could you have a look at merging your implementation? Depending on how big a change that is, it would be great if you could split that off into a separate code review.

include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
84 ↗(On Diff #135128)

Let's return an llvm::error here instead of having unreachables in the code. The caller can then decide what to do with it.

89 ↗(On Diff #135128)

Probably the previous comment applies here too.

282 ↗(On Diff #135128)

Can we merge the private parts of this class? I'd just move this to the bottom.

lib/DebugInfo/DWARF/DWARFDataExtractor.cpp
30 ↗(On Diff #135128)

This should probably be an optional too?

This revision now requires changes to proceed.Feb 21 2018, 7:18 AM
espindola edited reviewers, added: espindola; removed: rafael.Feb 28 2018, 1:12 PM
rafauler updated this revision to Diff 136462.Feb 28 2018, 8:27 PM

Address comments.

Sorry I just saw rafael's comments, I'll address them too.

JDevlieghere added inline comments.Mar 1 2018, 1:02 AM
include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
50 ↗(On Diff #136462)

Doesn't this represent the same thing as the DWARFExpression::Operation?

rafauler added inline comments.Mar 1 2018, 10:37 AM
include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
50 ↗(On Diff #136462)

Well, not exactly. So the DWARF expression is a sequence of operations too, but they manipulate this evaluation stack whose topmost value will be considered the result of the expression after all operations have been processed. It is written in postfix notation.
It's a bit different with CFI (call frame information) instructions, which is represented in this struct. These instructions establish the frame status at each program counter inside a function (in my copy of the DWARF4 manual, this is explained in more detail on pg.127 in "Sec6. Other debugging information"). For example, the most important information about the frame is the one that tells where the frame starts, so the unwinder knows at least how to find the return address, jump to the caller and adjust the stack pointer accordingly. The address where the frame starts is called a "canonical frame address" (CFA), so one of the instructions that can set this is called "DW_CFA_def_cfa_expression", where the operand is a DWARF expression teaching the unwinder how to get to the frame start. In this case, DW_CFA_def_cfa_expression is the CFIProgram::Instruction here, and the expression is just an operand. The expression itself will be entirely interpreted in the DWARFExpression class.

rafauler updated this revision to Diff 136613.Mar 1 2018, 2:59 PM

Addressing rafael's comments

rafauler updated this revision to Diff 136634.Mar 1 2018, 4:23 PM

Address rafael's comments

A few nits but otherwise this LGTM.

include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
33 ↗(On Diff #136634)

Should this go in Dwarf.def?

45 ↗(On Diff #136634)

Probably we can use a SmallVector<2> here?

lib/DebugInfo/DWARF/DWARFDebugFrame.cpp
51 ↗(On Diff #136634)

s/Impossible/Invalid/ for consitency

tools/llvm-readobj/DwarfCFIEHPrinter.h
118 ↗(On Diff #136634)

Nit: below you inlined the variables, here you didn't. I also don't think we need the contexprnes?

rafauler added inline comments.Mar 5 2018, 3:41 PM
include/llvm/DebugInfo/DWARF/DWARFDebugFrame.h
33 ↗(On Diff #136634)

This OperandType stuff is not from DWARF specs, but this is a custom thing from our implementation created just to simplify instruction printing. I've moved this to "private" because this really shouldn't be exported, and I also enhanced the comments to better explain this enum.

rafauler updated this revision to Diff 137099.Mar 5 2018, 3:43 PM

Address comments.

rafauler updated this revision to Diff 137107.Mar 5 2018, 5:05 PM

Change the test case input from binary to YAML.

JDevlieghere accepted this revision.Mar 6 2018, 7:31 AM

Thanks Rafael. This LGTM!

This revision is now accepted and ready to land.Mar 6 2018, 7:31 AM
This revision was automatically updated to reflect the committed changes.

Thanks everyone that helped to review this, I appreciate your help.