Page MenuHomePhabricator

Add error handling to the DataExtractor class
AcceptedPublic

Authored by labath on Mon, Jun 24, 6:29 AM.

Details

Summary

This is motivated by D63591, where we realized that there isn't a really
good way of telling whether a DataExtractor is reading actual data, or
is it just returning default values because it reached the end of the
buffer.

This patch resolves that by providing a new "Cursor" class. A Cursor
object encapsulates two things:

  • the current position/offset in the DataExtractor
  • an error object

Storing the error object inside the Cursor enables one to use the same
pattern as the std::{io}stream API, where one can blindly perform a
sequence of reads and only check for errors once at the end of the
operation. Similarly to the stream API, as soon as we encounter one
error, all of the subsequent operations are skipped (return default
values) too, even if the would suceed with clear error state. Unlike the
std::stream API (but in line with other llvm APIs), we force the error
state to be checked through usage of llvm::Error.

Diff Detail

Event Timeline

labath created this revision.Mon, Jun 24, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptMon, Jun 24, 6:29 AM
Herald added a subscriber: kristina. · View Herald Transcript
labath marked 8 inline comments as done.Mon, Jun 24, 6:49 AM

Besides prototyping the "DataStream" class, this also converts debug_loc and accel table (the two pieces of code I'm familiar with) parsers to use it in order to demonstrate how would this look in action.

I have also highlighted some of the open questions in inline comments.

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
409–413 ↗(On Diff #206206)

This shows that if reaching the end of the stream is the only kind of error one may encounter, then he doesn't need an Expected<T> return type, as the error flag is implicitly returned through the stream.

That may be viewed as a good thing, or as a bad one... :)

Since these are just private methods, I would say that's a good thing here.

include/llvm/Support/DataExtractor.h
518

This isn't consistent with iostreams (which detect eof only after one attempts to read past it), but it seems to me that this makes using the class much simpler.

526

maybe readUXX would be better instead of getUXX?

529

As can be seen here, checking whether the offset is updated (which is currently the only way of checking for errors) is a very tricky thing, since some functions can also "succeed" while not updating the offset.

554–558

To be consistent with iostreams we should stop attempting to read data once we encounter the first error. Doing that would add another branch to the code. Not doing that opens up the possibility for some weird behavior, where we first try to read a uint64_t and fail, but then try to read a uint8_t and succeed because the stream happened to have one more byte left. This could be mitigated by setting Offset to UINT32_MAX on failure.

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
72 ↗(On Diff #206206)

This code had a bug where it did not check the value of the NumAtoms field. So it could end up returning millions of empty/invalid Atoms while still claiming the accelerator table is valid.

441–444 ↗(On Diff #206206)

This check wasn't fully correct, because we could still cross the EntriesBase boundary after reading the two ULEB fields.

485–488 ↗(On Diff #206206)

Some nice way of slicing the DataExtractor/DataStream would also be handy. That would also be useful when parsing DIEs from a DWARF unit (to make sure we don't cross into the neighbouring unit).

I like the overall direction of this!

include/llvm/Support/DataExtractor.h
509

We'll definitely need to add some comment here to motivate its existence, probably contrasting it with the statelessness of the DataExtractor. I saw this patch first and I wondered "why not add this to the DataExtractor directly?", before catching up on the other patch.

526

+1

550

Should we store an llvm::Error here instead? I guess it doesn't matter much if there's only one error we can detect, but OTOH it'd be nice if we're going to convert them anyway, and all these errors are consistent.

labath marked an inline comment as done.Mon, Jun 24, 9:59 AM
labath added inline comments.
include/llvm/Support/DataExtractor.h
550

If we enlist the help of DataExtractor class, then there may be more kinds of errors that we can detect (and I'm thinking we should enlist it, because checking for errors via these offsets is tricky and probably slower than if the DataExtractor set a flag directly). The one error kind that comes to mind is "invalid uleb128" -- right now the uleb functions will happily read a 1 megabyte uleb, if all the bytes have bit7 set.

Storing an Error object as a member variable is a bit tricky due to the checked flag, noncopyability, etc. We could definitely at least have a member function that returns an Error object. But I think the first question we need to answer is what kind of error semantics do we want to have here. Should it be something strictly optional (as it is right now), or should it be something that blows up if you forget to check for errors (like llvm::Error does)...

I don't really have an opinion here, as I can see a case for both things..

I'm kind of thinking a change to the cursor as Greg suggested might be better than wrapping the stream further - and probably using llvm::Error.

& yeah, I think using Error and saying "parsing failures are problems that must be handled" rather than allowing for accidentally ignoring failures,

Amounting to something like this:

Cursor C;
DataExtractor D = ...;
D.getU32(C);
D.getU64(C);
...
if (Error E = C.getError())
  ...;

That way early exits (if your record type has optional fields, etc) you can't accidentally forget your error checking (because if there's an Error inside the Cursor it'd have to be checked otherwise it'll assert that it's unchecked, etc)

But yeah, that's a huge amount of refactoring to add all the error handling into existing callers.

The new API doesn't let you eliminate *all* checks. :-)
Also it introduces some new dependencies as noted inline.

include/llvm/Support/DataExtractor.h
509

Also that this depends on DataExtractor "get" methods returning zero for past-the-end calls.

512

assert(Offset <= DE.size()); to guard against construction with a bogus Offset?

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
60 ↗(On Diff #206206)

NumAtoms still isn't validated before being used, and could consume all available memory before you get to the error check.

389 ↗(On Diff #206206)

You're now resizing the string without validating the new size.

441 ↗(On Diff #206206)

This depends on extractAttributeEncoding()returning a sentinel if you run off the end of the stream. That depends on sentinalAttrEnc() using the same values that getULEB128() returns when it runs off the end. These dependencies need to be documented.

492 ↗(On Diff #206206)

Again the requirements on sentinels need to be documented (not here, but where those functions are defined).

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
148 ↗(On Diff #206206)

Might want a static_assert for that?

181 ↗(On Diff #206206)

You've lost a range check on the size.

labath updated this revision to Diff 206430.Tue, Jun 25, 6:40 AM
labath marked 10 inline comments as done.

Cursor implementation.

I'm kind of thinking a change to the cursor as Greg suggested might be better than wrapping the stream further - and probably using llvm::Error.

& yeah, I think using Error and saying "parsing failures are problems that must be handled" rather than allowing for accidentally ignoring failures,

Amounting to something like this:

Cursor C;
DataExtractor D = ...;
D.getU32(C);
D.getU64(C);
...
if (Error E = C.getError())
  ...;

That way early exits (if your record type has optional fields, etc) you can't accidentally forget your error checking (because if there's an Error inside the Cursor it'd have to be checked otherwise it'll assert that it's unchecked, etc)

Yeah, using a cursor API seems fine. I've updated the patch to use something like you desribed above. Let me know what you think.

But yeah, that's a huge amount of refactoring to add all the error handling into existing callers.

I think this thing can be done incrementally. The first step could be to add a Error *Err = nullptr argument to all existing DataExtractor functions. This way, anyone who wishes it, can already detect errors by passing an error object. Then, the cursor api becomes just syntactic sugar on top of that, and can be introduced gradually. Once all users have been ported, the non-cursor API can be removed (or more likely replaced by something returning Expected<T>, since the non-cursor api is still handier to use for one-off parses like those in debug_str and similar..

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
60 ↗(On Diff #206206)

Indeed. Thanks for catching that. I'll have to remember to go through the comments on this patch and create regression tests once I'm done prototyping...

389 ↗(On Diff #206206)

Good catch.

This is a pattern that has been present a couple of times in this diff alone, so I've created a special DataExtractor function for safely reading a sequence of integers into a vector.

441 ↗(On Diff #206206)

Documenting them is easy. However, I would say that the more interesting question at this stage is "do they make the code more readable?". I would say "yes", because without this, correctly checking for EOF is nearly impossible. However, there may still be room for improvement here...

dblaikie added inline comments.Tue, Jun 25, 4:31 PM
lib/Support/DataExtractor.cpp
28

I /think/ these should be early-returns if the Error is already set (assuming the goal is to support repeated calls to deserialization functions with the same Error and only check once at the end of the sequence of deserializations)

Also, you will need to be careful with how you check if the Error is already set - since you want to make sure you don't set the Error to the "checked" state (otherwise the caller won't be forced to check it, which is bad).

Hmm, perhaps you actually want to test that the Error is checked. If the error is unchecked, then return early & leave it to the caller to check.

The "unexpectedEndReached" can then just do "if (E) *E = ...;" without having to check the error state.

If the goal is to treat the Error as a straight out parameter without the repeated deserialization support (because something at a higher level will handle that in some way) - then maybe this should be ErrorAsOutParameter instead?

labath marked an inline comment as done.Tue, Jul 2, 3:00 AM
labath added inline comments.
lib/Support/DataExtractor.cpp
28

The goal is "to support repeated calls to deserialization functions with the same Error and only check once at the end of the sequence of deserializations" like you've said.

I also think that it would be nice to have early returns here. The reason I am reluctant to do that is because this will add an additional branch to the hot success path, where we do not encounter an error. In reality, this will probably be two branches because we will also need to check the Err pointer, at least in the interim stage.

I hope that this would not matter in practice because once one reaches the end of the deserialization sequence and finds that the error object is set, the only thing he can reliably do is disregard all data he has read since the last time he checked for the error state.

Nonetheless having early returns here would definitely make the API more predictable. I'm going to try to get some numbers on the impact of returning early here...

labath added a comment.Tue, Jul 2, 6:40 AM

Ok, I got some numbers now. I went for a micro-benchmark-like setup to show some kind of a worst case scenario. The test setup is as follows:

I took the largest single .o file I had around (Registry.cpp.o in clangDynamicASTMatchers library). I then linked it to remove any relocations (otherwise, most of the time is spend applying those). Then I modified llvm-dwarfdump to parse the debug_loc section without dumping anything (to avoid measuring the time spend in printfs). Both llvm-dwarfdump and Registry.cpp.o I was dumping were built with -O3 -g, with asserts disabled (no FDO, LTO or other fancy stuff). This resulted in about 4.5 megabytes of debug_loc for parsing in Registry.cpp.o. Then I used the linux perf command to run llvm-dwarfdump -debug-loc 1000 times and dump the stats.

The baseline stats are:

  27.285986      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.11% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,813      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
 58,831,163      cycles:u                  #    2.156 GHz                      ( +-  0.07% )
    606,986      stalled-cycles-frontend:u #    1.03% frontend cycles idle     ( +-  3.76% )
  7,924,778      stalled-cycles-backend:u  #   13.47% backend cycles idle      ( +-  0.33% )
146,588,727      instructions:u            #    2.49  insn per cycle         
                                           #    0.05  stalled cycles per insn  ( +-  0.00% )
 29,545,620      branches:u                # 1082.813 M/sec                    ( +-  0.00% )
    222,276      branch-misses:u           #    0.75% of all branches          ( +-  0.15% )

0.027663381 seconds time elapsed                                          ( +-  0.11% )

The stats with this patch applied are:

  27.397390      task-clock:u (msec)       #    0.987 CPUs utilized            ( +-  0.10% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,833      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
 60,160,571      cycles:u                  #    2.196 GHz                      ( +-  0.07% )
    584,825      stalled-cycles-frontend:u #    0.97% frontend cycles idle     ( +-  3.37% )
 10,729,974      stalled-cycles-backend:u  #   17.84% backend cycles idle      ( +-  0.26% )
156,141,836      instructions:u            #    2.60  insn per cycle         
                                           #    0.07  stalled cycles per insn  ( +-  0.00% )
 31,599,940      branches:u                # 1153.392 M/sec                    ( +-  0.00% )
    221,247      branch-misses:u           #    0.70% of all branches          ( +-  0.06% )

0.027771865 seconds time elapsed                                          ( +-  0.10% )

The stats for a version of this patch which additionally checks for the error flag before attempting the parse (as discussed in the inline comment) are:

  27.808349      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.10% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,839      page-faults:u             #    0.102 M/sec                    ( +-  0.24% )
 62,887,388      cycles:u                  #    2.261 GHz                      ( +-  0.06% )
    575,264      stalled-cycles-frontend:u #    0.91% frontend cycles idle     ( +-  3.18% )
 14,757,888      stalled-cycles-backend:u  #   23.47% backend cycles idle      ( +-  0.23% )
167,562,307      instructions:u            #    2.66  insn per cycle         
                                           #    0.09  stalled cycles per insn  ( +-  0.00% )
 33,414,152      branches:u                # 1201.587 M/sec                    ( +-  0.00% )
    221,454      branch-misses:u           #    0.66% of all branches          ( +-  0.12% )

0.028201319 seconds time elapsed                                          ( +-  0.10% )

As can be seen, this patch increases the parsing time by about 0.4%. This is enough to be statistically significant in a benchmark like this, but probably not world-shattering (some slowdown is unavoidable with a change like this).

If we additionally enable the early returns we get an additional 1.5% slowdown (or 1.9% above baseline). Still not bad for a "micro-benchmark", but it does make one wonder, whether it is really worth it. My feeling would be that it isn't...

Ok, I got some numbers now. I went for a micro-benchmark-like setup to show some kind of a worst case scenario. The test setup is as follows:

I took the largest single .o file I had around (Registry.cpp.o in clangDynamicASTMatchers library). I then linked it to remove any relocations (otherwise, most of the time is spend applying those). Then I modified llvm-dwarfdump to parse the debug_loc section without dumping anything (to avoid measuring the time spend in printfs). Both llvm-dwarfdump and Registry.cpp.o I was dumping were built with -O3 -g, with asserts disabled (no FDO, LTO or other fancy stuff). This resulted in about 4.5 megabytes of debug_loc for parsing in Registry.cpp.o. Then I used the linux perf command to run llvm-dwarfdump -debug-loc 1000 times and dump the stats.

The baseline stats are:

  27.285986      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.11% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,813      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
 58,831,163      cycles:u                  #    2.156 GHz                      ( +-  0.07% )
    606,986      stalled-cycles-frontend:u #    1.03% frontend cycles idle     ( +-  3.76% )
  7,924,778      stalled-cycles-backend:u  #   13.47% backend cycles idle      ( +-  0.33% )
146,588,727      instructions:u            #    2.49  insn per cycle         
                                           #    0.05  stalled cycles per insn  ( +-  0.00% )
 29,545,620      branches:u                # 1082.813 M/sec                    ( +-  0.00% )
    222,276      branch-misses:u           #    0.75% of all branches          ( +-  0.15% )
 
0.027663381 seconds time elapsed                                          ( +-  0.11% )

The stats with this patch applied are:

  27.397390      task-clock:u (msec)       #    0.987 CPUs utilized            ( +-  0.10% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,833      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
 60,160,571      cycles:u                  #    2.196 GHz                      ( +-  0.07% )
    584,825      stalled-cycles-frontend:u #    0.97% frontend cycles idle     ( +-  3.37% )
 10,729,974      stalled-cycles-backend:u  #   17.84% backend cycles idle      ( +-  0.26% )
156,141,836      instructions:u            #    2.60  insn per cycle         
                                           #    0.07  stalled cycles per insn  ( +-  0.00% )
 31,599,940      branches:u                # 1153.392 M/sec                    ( +-  0.00% )
    221,247      branch-misses:u           #    0.70% of all branches          ( +-  0.06% )
 
0.027771865 seconds time elapsed                                          ( +-  0.10% )

The stats for a version of this patch which additionally checks for the error flag before attempting the parse (as discussed in the inline comment) are:

  27.808349      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.10% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,839      page-faults:u             #    0.102 M/sec                    ( +-  0.24% )
 62,887,388      cycles:u                  #    2.261 GHz                      ( +-  0.06% )
    575,264      stalled-cycles-frontend:u #    0.91% frontend cycles idle     ( +-  3.18% )
 14,757,888      stalled-cycles-backend:u  #   23.47% backend cycles idle      ( +-  0.23% )
167,562,307      instructions:u            #    2.66  insn per cycle         
                                           #    0.09  stalled cycles per insn  ( +-  0.00% )
 33,414,152      branches:u                # 1201.587 M/sec                    ( +-  0.00% )
    221,454      branch-misses:u           #    0.66% of all branches          ( +-  0.12% )
 
0.028201319 seconds time elapsed                                          ( +-  0.10% )

As can be seen, this patch increases the parsing time by about 0.4%. This is enough to be statistically significant in a benchmark like this, but probably not world-shattering (some slowdown is unavoidable with a change like this).

If we additionally enable the early returns we get an additional 1.5% slowdown (or 1.9% above baseline). Still not bad for a "micro-benchmark", but it does make one wonder, whether it is really worth it. My feeling would be that it isn't...

Sorry I've lost some of the context here (my brain's like a sieve sometimes) - the first set of results come from this patch which, is this correct: implements a wrapper around DataExtractor that handles the offset comparison to determine validity? (if the offset isn't modified, the attempted parsing was invalid in some way)

And the second results are from adding a separate "if (already seen an error) return;" to these same wrapped function calls?

What if "if (already seen an error)" was added to the "isValidOffset" function (making all offsets invalid if there's already an error) - would that collapse the early return codepaths & perhaps make things simpler/faster?

labath updated this revision to Diff 210090.Tue, Jul 16, 7:03 AM
  • update the patch to reflect our offline conversation with @dblaikie. We decided to go for the API which makes most sense (i.e. skip all reads as soon as one of them returns an error), at least until there is evidence that this makes a difference in practice (one can always prove that there is a slowdown here with a suitable micro-benchmark).
  • add tests for the new APIs
  • move the refactoring of other parsing classes into a separate patch (keeping this patch solely about DataExtractor).
labath retitled this revision from WIP: DataExtractor error handling to Add error handling to the DataExtractor class.Tue, Jul 16, 7:08 AM
labath edited the summary of this revision. (Show Details)

I think this should be ready for a proper review now. To see the new API in action, please take a look at D64798.

Minor stuff.

include/llvm/Support/DataExtractor.h
152–156

Need to add Err to the doxygen.

225–226

Need to add Err to the doxygen.

262

it -> if

288–289

Need to add Err to the doxygen.

353–354

Need to add Err to the doxygen.

401–402

Need to add Err to the doxygen.

469–473

Need to add Err to the doxygen.

473

So eof() intentionally ignores the error state? (Just making sure the contract is understood.)

unittests/Support/DataExtractorTest.cpp
10

The Error is part of the DataExtractor interface, you should not need to #include this here.

266

Test for eof() in the error case? (attempt to read too far, but eof is still false, if that's the correct contract.)

labath marked 12 inline comments as done.Wed, Jul 17, 1:52 AM
labath added inline comments.
include/llvm/Support/DataExtractor.h
473

Yes, that's analogous to std::iostream, where eof() tests the eofbit and operator bool checks the failbit (we don't have a badbit because there's no way to get a hard read error when reading from memory).

unittests/Support/DataExtractorTest.cpp
10

This is the testing support header which defines stuff that makes things like EXPECT_THAT_ERROR(..., Succeeded()) work.

labath updated this revision to Diff 210270.Wed, Jul 17, 1:53 AM
labath marked an inline comment as done.
labath edited the summary of this revision. (Show Details)

Add doxygen comments and fix typo.

probinson accepted this revision.Wed, Jul 17, 8:02 AM

I'm happy, but other people obviously have better eyesight than I do. Give Jonas and Blaikie a day to chime in, I think.

unittests/Support/DataExtractorTest.cpp
10

Doh! I'm due for an eye test.

This revision is now accepted and ready to land.Wed, Jul 17, 8:02 AM
dblaikie added inline comments.Wed, Jul 17, 11:01 AM
lib/Support/DataExtractor.cpp
18–21

Not sure this function adds value compared to writing the "createStringError" call directly in "unexpectedEndReached"?

28

The problem with this is that it clears the "unchecked" bit in the Error, which means code using DataExtractor would not get an assertion failure in Error about it being unchecked.

Might be worth having a gunit death test to demonstrate that failing to check the Error after parsing some fields does assert/crash.

34

Is the LLVM_UNLIKELY justified by performance data? (again, microbenchmarks could probably justify it in many parts of LLVM where it doesn't make a difference in practice - so I'd be inclined to leave these out for now)

I have been reminded that there's also a desire to make DataExtractor work with 64-bit section sizes. Maybe Cursor should use a 64-bit offset (i.e., size_t not uint32_t), and then migrating from non-Cursor to Cursor APIs will also do the 64-bit transition? We need to bite that bullet at some point. (I kind of expect y'all to say, no way do that later, which is fine; mainly I wanted to refresh that in our collective minds.)

I have been reminded that there's also a desire to make DataExtractor work with 64-bit section sizes. Maybe Cursor should use a 64-bit offset (i.e., size_t not uint32_t), and then migrating from non-Cursor to Cursor APIs will also do the 64-bit transition? We need to bite that bullet at some point. (I kind of expect y'all to say, no way do that later, which is fine; mainly I wanted to refresh that in our collective minds.)

That sounds like a good idea to me. It shouldn't complicate the DataExtractor implementation too much (it should be enough to make the getU & co. static functions templated also on the offset type), and since adopting the Cursor API will require changes to the code using the DataExtractor anyway, it will make it easier to check that the code is compatible with 64-bit offsets.

labath marked 3 inline comments as done.Thu, Jul 18, 7:10 AM
labath added inline comments.
lib/Support/DataExtractor.cpp
18–21

Indeed. This was a relict from earlier versions of this patch..

28

I already have such tests (DataExtractorDeathTest, below). The reason this works is because ErrorAsOutParameter clears the "checked" flag when the function returns.

We could design a way to check this that is statically known to be safe, but I'm not sure if it's worth it for a function that is local to this file (the only method i can think of involves subclassing/wrapping ErrorAsOutParameter).

34

No, I don't have any performance data for this. I'll leave these out...

labath updated this revision to Diff 210558.Thu, Jul 18, 7:12 AM
  • address feedback from @dblaikie
  • waiting to hear more opinions before updating the patch to use size_t