Page MenuHomePhabricator

Introduce DWARFDataExtractor::getInitialLength

Authored by labath on Feb 13 2020, 8:01 AM.



This patch introduces a function to house the code needed to do the
DWARF64 detection dance. It also updates three instances of DWARF64
parsing code to use this function instead of the hand-rolled

This patch does _not_ attempt to handle the problem of detecting lengths
which extend past the size of the section, or cases when reads of a
single contribution accidentally escape beyond its specified length, but
I think it's useful in its own right. The idea is that this functionality will
be built on top of the function introduced here.

Diff Detail

Event Timeline

labath created this revision.Feb 13 2020, 8:01 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2020, 8:01 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
ikudrin added inline comments.Feb 13 2020, 9:45 AM

Would it be simpler if you implement getInitialLength(uint64_t *, Error *) via getInitialLength(Cursor &)? You anyway create a Cursor inside getInitialLength(uint64_t *, Error *).

labath marked an inline comment as done.Feb 14 2020, 5:31 AM
labath added inline comments.

I don't think that would help very much (or at all). The second cursor comes from a desire to not modify the original offset in case we fail to read the entire initial length field. I am not sure if any of the potential users would really need this behavior, but that is how the rest of the DataExtractor functions operate, so I thought it'd be good to maintain that.

If we stick to that requirement, then I'd still need to have two cursors (or offsets, or something) and fiddle around with them. If we drop it, we wouldn't need the second Cursor even in the current setup.

labath updated this revision to Diff 244652.Feb 14 2020, 6:45 AM
  • add a unit test
  • update debug_line parser to use the new method

This should be ready for a proper review now.

labath edited the summary of this revision. (Show Details)Feb 14 2020, 6:46 AM
labath retitled this revision from [WIP] Introduce DWARFDataExtractor::getInitialLength to Introduce DWARFDataExtractor::getInitialLength.

The patch should probably be split into smaller, more focused ones. DWARFDataExtractor::getInitialLength() and its test should go first, and then distinct patches for each debug section.


It looks like that behavior is not stated. I hardly can remember if I saw it was used anywhere, but once I fixed an issue which was (partially) caused by it, D71704. I guess we can easily drop it if that helps to simplify the code, maybe adding a warning note in the comments of the methods.

As suggested, please split this up into separate patches.


Perhaps a comment explaining what is meant by an initial length field, wouldn't be out-of-order?


I think it's more common for error/warning messages to use lower-case for their first word.

53 ↗(On Diff #244652)

I don't think this is a good change in behaviour. We should still be able to dump the total length field, with the reserved value.

195 ↗(On Diff #244652)

This is a regression in the quality of the error, as it is not clear what section this error comes from. Same goes for many of the other changes in errors.

5 ↗(On Diff #244652)

This error is basically useless. Where? What section? What does "unexpected end of data" mean?

5 ↗(On Diff #244652)

Same as above for .debug_line.


Ideally, we should check the error returned in this and similar cases, to show that it is what was expected.


It may be worth a brief comment before each pair of checks to show what the intent of the particularly case is.


Will this test work if the host endianness is different? It looks to me like 0x00010203 != {0x00, 0x01, 0x02, 0x03} for little endian hosts, since in that case, 0x00010203 is {0x03, 0x02, 0x01, 0x00}.



labath marked 3 inline comments as done.Feb 20 2020, 7:13 AM

Sorry about the delay (and thank you for the prompt response). It took a while before this got to the top of my stack.

I'll split this patch up with my next update, which will come when I figure out how to improve the error messages. My initial motivation for putting this into a single patch was to make the impact of this kind of change more obvious (I expected the change in error messages to be mildly controversial), but in retrospect, that could have also been achieved with separate linked patches. Having super-descriptive error messages is somewhat at odds with making the code reusable, but I think we can come up with a reasonable compromise.


This behavior is stated (and painstakingly reiterated) on every DataExtractor method. It looks like DWARFDataExtractor does not have these comments, but then this class is much more lightly documented. Nonetheless, I think it would be good to preserve this behavior, as a matter of consistency.

53 ↗(On Diff #244652)

Actually, I would say that here we should not print out the prologue at all (just some error about invalid initial length). Since the length field is not valid, it's hard to say whether we have any line table here at all -- and yet we will print out the whole prologue full of zeroes (even before this patch).

195 ↗(On Diff #244652)

When this is run normally on the command line (so both stdout and stderr go to the terminal), this comes out as:

warning: Unsupported reserved unit length of value 0xfffffffe at offset 0x48

So, the section where this comes from should be fairly obvious. Nonetheless, it should be pretty easy to prepend the error message with the section name. Maybe something like "warning: parsing line table prologue at offset 0x00000048: unsupported reserved unit length found of value 0xfffffffe" or "warning: parsing line table prologue: unsupported reserved unit length of value 0xfffffffe at offset 0x48" ?

The question boils down to whether it should be the DataExtractor who embeds the offset into the message, or the line table code.

ikudrin marked an inline comment as done.Feb 20 2020, 7:33 AM
ikudrin added inline comments.

Checked again and now I see it; you are right, it is indeed stated. In that case, I agree, it is better to stick with it.

jhenderson added inline comments.Feb 25 2020, 1:45 AM
195 ↗(On Diff #244652)

I don't have an answer on what layer adds the offset context to the message, as I don't really have any particular opinion on that. However, it's worth noting that other tools use this code (e.g. LLD), and they aren't printing the debug_line[0x00000048] or indeed any of the rest of the dumped output, only the warning/error. As such, without the offset context, it's impossible to see the problem from those tools' output.

labath updated this revision to Diff 246455.Feb 25 2020, 7:34 AM

Thanks for all the feedback. I've removed all changes to individual parsing code
from this patch. Will post those as separate patches.

This change now only deals with the new API.

jhenderson accepted this revision.Feb 25 2020, 8:27 AM



Nit: blank line before the using please.


is hard -> it is hard

This revision is now accepted and ready to land.Feb 25 2020, 8:27 AM
labath marked 7 inline comments as done.Feb 26 2020, 8:10 AM
labath added inline comments.

(I thought I replied to this already, but I don't see the comment anywhere.)

The test hardcodes a big-endian so this will work regardless of the host. I used big endian because the numbers come out more naturally that way, and because they are more likely to flesh out any host endiannes creeping in.

This revision was automatically updated to reflect the committed changes.
labath marked an inline comment as done.