This is an archive of the discontinued LLVM Phabricator instance.


Differential D63713

Add error handling to the DataExtractor class
ClosedPublic

Authored by labath on Jun 24 2019, 6:29 AM.


Details

Reviewers

probinson
dblaikie
JDevlieghere
aprantl
echristo
ikudrin
lhames

Commits

rL370042: Add error handling to the DataExtractor class
rGb1f29cec2511: Add error handling to the DataExtractor class

Summary

This is motivated by D63591, where we realized that there isn't a really
good way of telling whether a DataExtractor is reading actual data or
just returning default values because it has reached the end of the
buffer.

This patch resolves that by providing a new "Cursor" class. A Cursor
object encapsulates two things:

- the current position/offset in the DataExtractor
- an error object

Storing the error object inside the Cursor enables one to use the same
pattern as the std::{io}stream API, where one can blindly perform a
sequence of reads and only check for errors once at the end of the
operation. As with the stream API, as soon as we encounter one error,
all of the subsequent operations are skipped (they return default
values), even if they would succeed with a clear error state. Unlike the
std::{io}stream API (but in line with other llvm APIs), we force the
error state to be checked through usage of llvm::Error.
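To make the Cursor semantics above concrete, here is a minimal sketch that does not depend on LLVM. It is a model under stated assumptions, not the patch's code: `ToyCursor` and `ToyExtractor` are hypothetical names, a `std::string` stands in for `llvm::Error`, and only little-endian data is handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a Cursor: a position plus a sticky error slot
// (a string here instead of llvm::Error).
struct ToyCursor {
  uint64_t Offset = 0;
  std::string Err; // empty means "no error so far"
  bool ok() const { return Err.empty(); }
};

// Hypothetical stand-in for DataExtractor, reading little-endian values.
class ToyExtractor {
public:
  explicit ToyExtractor(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  uint8_t getU8(ToyCursor &C) const { return getUnsigned<uint8_t>(C); }
  uint32_t getU32(ToyCursor &C) const { return getUnsigned<uint32_t>(C); }
  uint64_t getU64(ToyCursor &C) const { return getUnsigned<uint64_t>(C); }

private:
  template <typename T> T getUnsigned(ToyCursor &C) const {
    // Once an error is recorded, every later read is skipped and yields 0,
    // even if enough bytes would remain for this particular read.
    if (!C.ok())
      return 0;
    if (C.Offset > Data.size() || sizeof(T) > Data.size() - C.Offset) {
      C.Err = "unexpected end of data";
      return 0; // the offset is left unchanged
    }
    T Val = 0;
    for (size_t I = 0; I < sizeof(T); ++I)
      Val |= static_cast<T>(Data[C.Offset + I]) << (8 * I);
    C.Offset += sizeof(T);
    return Val;
  }

  std::vector<uint8_t> Data;
};
```

With six input bytes, a `getU32` call succeeds, a following `getU64` fails and records the error without moving the offset, and a final `getU8` returns 0 without reading even though two bytes remain; the caller inspects `C.ok()` once at the end.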

Diff Detail

Repository

rL LLVM

Build Status

Buildable 37111
Build 37110: arc lint + arc unit

Event Timeline

labath created this revision.Jun 24 2019, 6:29 AM

Herald added a project: Restricted Project.Jun 24 2019, 6:29 AM

Herald added a subscriber: kristina.

Harbormaster completed remote builds in B33786: Diff 206206.Jun 24 2019, 6:31 AM

Besides prototyping the "DataStream" class, this also converts the debug_loc and accelerator table parsers (the two pieces of code I'm familiar with) to use it, in order to demonstrate how this would look in action.

I have also highlighted some of the open questions in inline comments.

include/llvm/DebugInfo/DWARF/DWARFAcceleratorTable.h
409–413 ↗	(On Diff #206206)	This shows that if reaching the end of the stream is the only kind of error one may encounter, then one doesn't need an `Expected<T>` return type, as the error flag is implicitly returned through the stream. That may be viewed as a good thing, or as a bad one... :) Since these are just private methods, I would say that's a good thing here.
include/llvm/Support/DataExtractor.h
539	This isn't consistent with iostreams (which detect eof only after one attempts to read past it), but it seems to me that this makes using the class much simpler.
547	maybe `readUXX` would be better instead of `getUXX`?
550	As can be seen here, checking whether the offset is updated (which is currently the only way of checking for errors) is a very tricky thing, since some functions can also "succeed" while not updating the offset.
575–579	To be consistent with iostreams we should stop attempting to read data once we encounter the first error. Doing that would add another branch to the code. Not doing that opens up the possibility for some weird behavior, where we first try to read a uint64_t and fail, but then try to read a uint8_t and succeed because the stream happened to have one more byte left. This could be mitigated by setting Offset to UINT32_MAX on failure.
lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
72 ↗	(On Diff #206206)	This code had a bug where it did not check the value of the NumAtoms field. So it could end up returning millions of empty/invalid Atoms while still claiming the accelerator table is valid.
441–444 ↗	(On Diff #206206)	This check wasn't fully correct, because we could still cross the `EntriesBase` boundary after reading the two ULEB fields.
485–488 ↗	(On Diff #206206)	Some nice way of slicing the DataExtractor/DataStream would also be handy. That would also be useful when parsing DIEs from a DWARF unit (to make sure we don't cross into the neighbouring unit).
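The two pitfalls raised in these inline comments (an unchanged offset is ambiguous, and a failed wide read can be followed by a successful narrow one) can be reproduced with a hypothetical model of the offset-only API. `LegacyExtractor` is illustrative; its methods are loosely modeled on, not copied from, the `DataExtractor` of that time, and a `std::string` holds the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical model of the pre-Cursor, offset-only API: on failure a read
// returns a default value and leaves *Off untouched; there is no error flag.
struct LegacyExtractor {
  std::string Data;

  bool isValidOffsetForDataOfSize(uint64_t Off, uint64_t Size) const {
    return Off <= Data.size() && Size <= Data.size() - Off;
  }

  uint8_t getU8(uint64_t *Off) const {
    if (!isValidOffsetForDataOfSize(*Off, 1))
      return 0;
    return static_cast<uint8_t>(Data[(*Off)++]);
  }

  uint64_t getU64(uint64_t *Off) const {
    if (!isValidOffsetForDataOfSize(*Off, 8))
      return 0;
    uint64_t Val = 0;
    for (int I = 0; I < 8; ++I)
      Val |= uint64_t(static_cast<uint8_t>(Data[*Off + I])) << (8 * I);
    *Off += 8;
    return Val;
  }

  // A zero-length read legitimately succeeds without moving the offset,
  // so "offset unchanged" cannot be taken to mean "read failed".
  std::string getBytes(uint64_t *Off, uint64_t Len) const {
    if (!isValidOffsetForDataOfSize(*Off, Len))
      return std::string();
    std::string Result = Data.substr(*Off, Len);
    *Off += Len;
    return Result;
  }
};
```

Reading a `uint64_t` from three bytes fails silently, the following `getU8` still succeeds, and a zero-length `getBytes` leaves the offset unchanged exactly as a failed read would.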

I like the overall direction of this!

include/llvm/Support/DataExtractor.h
530	We'll definitely need to add some comment here to motivate its existence, probably contrasting it with the statelessness of the DataExtractor. I saw this patch first and I wondered "why not add this to the DataExtractor directly?", before catching up on the other patch.
547	+1
571	Should we store an llvm::Error here instead? I guess it doesn't matter much if there's only one error we can detect, but OTOH it'd be nice if we're going to convert them anyway, and all these errors are consistent.

labath marked an inline comment as done.Jun 24 2019, 9:59 AM

labath added inline comments.

include/llvm/Support/DataExtractor.h
571	If we enlist the help of the DataExtractor class, then there may be more kinds of errors that we can detect (and I'm thinking we should enlist it, because checking for errors via these offsets is tricky and probably slower than if the DataExtractor set a flag directly). The one error kind that comes to mind is "invalid uleb128": right now the uleb functions will happily read a 1 megabyte uleb if all the bytes have bit 7 set. Storing an `Error` object as a member variable is a bit tricky due to the checked flag, noncopyability, etc. We could definitely at least have a member function that returns an `Error` object. But I think the first question we need to answer is what error semantics we want here. Should error checking be strictly optional (as it is right now), or should it blow up if you forget to check for errors (like `llvm::Error` does)? I don't really have an opinion here, as I can see a case for both.
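As an illustration of the "invalid uleb128" error kind, the hypothetical decoder below gives up instead of consuming an unbounded run of continuation bytes. `decodeULEB128Checked` is not an LLVM function, and `std::optional` stands in for a real error channel. It deliberately rejects any encoding longer than ten bytes, which is stricter than necessary, since redundant zero-padding bytes are technically valid ULEB128.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical bounded ULEB128 decoder: returns std::nullopt instead of
// silently consuming an arbitrarily long run of continuation bytes.
// On success, Offset is advanced past the encoded value; on failure it is
// left unchanged.
std::optional<uint64_t> decodeULEB128Checked(const std::vector<uint8_t> &Data,
                                             size_t &Offset) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  size_t Pos = Offset;
  while (true) {
    if (Pos >= Data.size())
      return std::nullopt; // ran off the end: no terminating byte
    uint8_t Byte = Data[Pos++];
    uint64_t Slice = Byte & 0x7f;
    // Reject encodings whose payload does not fit in 64 bits.
    if (Shift >= 64 || (Shift == 63 && Slice > 1))
      return std::nullopt;
    Value |= Slice << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0)
      break;
  }
  Offset = Pos;
  return Value;
}
```

A 1 MiB buffer of `0xFF` bytes is rejected after ten bytes rather than read to the end, while `UINT64_MAX` (nine `0xFF` bytes followed by `0x01`) still decodes.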

I'm kind of thinking a change to the cursor as Greg suggested might be better than wrapping the stream further - and probably using llvm::Error.

& yeah, I think using Error and saying "parsing failures are problems that must be handled" rather than allowing for accidentally ignoring failures,

Amounting to something like this:

Cursor C;
DataExtractor D = ...;
D.getU32(C);
D.getU64(C);
...
if (Error E = C.getError())
  ...;

That way, with early exits (if your record type has optional fields, etc.), you can't accidentally forget your error checking (because if there's an Error inside the Cursor, it has to be checked, otherwise it'll assert that it's unchecked, etc.)
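The "you can't accidentally forget your error checking" property can be approximated without LLVM by a cursor whose destructor asserts that its error state was consulted. This is a simplified, hypothetical model: `CheckedCursor` and `skipBytes` are illustrative names, and a `std::string` replaces `llvm::Error` and its checked flag.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical cursor (not LLVM's) whose error state must be taken before
// the cursor is destroyed, loosely mirroring how an unchecked llvm::Error
// asserts. A std::string stands in for the Error payload.
class CheckedCursor {
public:
  explicit CheckedCursor(uint64_t Offset) : Offset(Offset) {}
  CheckedCursor(const CheckedCursor &) = delete;
  CheckedCursor &operator=(const CheckedCursor &) = delete;
  ~CheckedCursor() {
    // Fires if a parser returned early without ever looking at the error.
    assert(Checked && "cursor destroyed without checking its error state");
  }

  uint64_t tell() const { return Offset; }
  bool failed() const { return !Err.empty(); }

  // Used by read functions: advance on success, keep only the first error.
  void advance(uint64_t N) { Offset += N; }
  void setError(std::string Msg) {
    if (Err.empty())
      Err = std::move(Msg);
  }

  // Returns the first error (empty string on success) and marks the state
  // as checked, which is what satisfies the destructor.
  std::string takeError() {
    Checked = true;
    std::string Result = std::move(Err);
    Err.clear();
    return Result;
  }

private:
  uint64_t Offset;
  std::string Err;
  bool Checked = false;
};

// Hypothetical read function: consume Size bytes of a buffer of length Len.
// Does nothing once the cursor has already failed.
inline void skipBytes(CheckedCursor &C, uint64_t Len, uint64_t Size) {
  if (C.failed())
    return;
  if (C.tell() > Len || Size > Len - C.tell()) {
    C.setError("unexpected end of data");
    return;
  }
  C.advance(Size);
}
```

If a parser returned early without calling `takeError()`, the assertion in the destructor would fire in an assertions-enabled build, which is the behavior described above for an unchecked `Error` inside the Cursor.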

But yeah, that's a huge amount of refactoring to add all the error handling into existing callers.

The new API doesn't let you eliminate *all* checks. :-)
Also it introduces some new dependencies as noted inline.

include/llvm/Support/DataExtractor.h
530	Also that this depends on DataExtractor "get" methods returning zero for past-the-end calls.
533	`assert(Offset <= DE.size());` to guard against construction with a bogus Offset?
lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
60 ↗	(On Diff #206206)	NumAtoms still isn't validated before being used, and could consume all available memory before you get to the error check.
389 ↗	(On Diff #206206)	You're now resizing the string without validating the new size.
441 ↗	(On Diff #206206)	This depends on `extractAttributeEncoding()` returning a sentinel if you run off the end of the stream. That depends on `sentinalAttrEnc()` using the same values that `getULEB128()` returns when it runs off the end. These dependencies need to be documented.
492 ↗	(On Diff #206206)	Again the requirements on sentinels need to be documented (not here, but where those functions are defined).
lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
148 ↗	(On Diff #206206)	Might want a static_assert for that?

probinson added inline comments.Jun 24 2019, 11:26 AM

lib/DebugInfo/DWARF/DWARFDebugLoc.cpp
181 ↗	(On Diff #206206)	You've lost a range check on the size.

Cursor implementation.

Harbormaster completed remote builds in B33877: Diff 206430.Jun 25 2019, 6:42 AM

In D63713#1556126, @dblaikie wrote:
I'm kind of thinking a change to the cursor as Greg suggested might be better than wrapping the stream further - and probably using llvm::Error.

Yeah, using a cursor API seems fine. I've updated the patch to use something like what you described above. Let me know what you think.

But yeah, that's a huge amount of refactoring to add all the error handling into existing callers.

I think this can be done incrementally. The first step could be to add an `Error *Err = nullptr` argument to all existing DataExtractor functions. This way, anyone who wishes to can already detect errors by passing an error object. Then the cursor API becomes just syntactic sugar on top of that, and can be introduced gradually. Once all users have been ported, the non-cursor API can be removed (or, more likely, replaced by something returning Expected<T>, since the non-cursor API is still handier for one-off parses like those in debug_str and similar).
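The incremental plan above can be sketched as two layers: an optional error out-parameter added to an existing offset-based function, and a Cursor overload that simply forwards to it. Everything here is hypothetical (`Extractor`, its nested `Cursor`, and a `std::string` in place of `llvm::Error`), and only `getU8` is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of the incremental plan: step 1 adds an optional error
// out-parameter to the existing offset-based API; step 2 layers a Cursor
// overload on top of it as syntactic sugar.
class Extractor {
public:
  explicit Extractor(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  // Step 1: existing callers are unaffected because Err defaults to nullptr.
  uint8_t getU8(uint64_t *Off, std::string *Err = nullptr) const {
    if (Err && !Err->empty())
      return 0; // an earlier read already failed: skip this one
    if (*Off >= Data.size()) {
      if (Err)
        *Err = "unexpected end of data";
      return 0;
    }
    return Data[(*Off)++];
  }

  // Step 2: the Cursor just bundles the offset and the error.
  struct Cursor {
    uint64_t Offset = 0;
    std::string Err;
  };
  uint8_t getU8(Cursor &C) const { return getU8(&C.Offset, &C.Err); }

private:
  std::vector<uint8_t> Data;
};
```

Existing callers that pass only an offset keep their old, silent behavior, while Cursor users get the failure recorded in `C.Err`.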

lib/DebugInfo/DWARF/DWARFAcceleratorTable.cpp
60 ↗	(On Diff #206206)	Indeed. Thanks for catching that. I'll have to remember to go through the comments on this patch and create regression tests once I'm done prototyping...
389 ↗	(On Diff #206206)	Good catch. This is a pattern that has been present a couple of times in this diff alone, so I've created a special DataExtractor function for safely reading a sequence of integers into a vector.
441 ↗	(On Diff #206206)	Documenting them is easy. However, I would say that the more interesting question at this stage is "do they make the code more readable?". I would say "yes", because without this, correctly checking for EOF is nearly impossible. However, there may still be room for improvement here...
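The NumAtoms problem and the helper mentioned in the reply both come down to validating an untrusted element count against the remaining bytes before allocating. A hypothetical sketch, assuming little-endian `uint16_t` elements (`readU16Array` is an illustrative name, not the function that was added to `DataExtractor`):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: read Count little-endian uint16_t values starting at
// Offset. The count comes from untrusted input, so the required byte size is
// checked against the remaining data *before* the vector is resized.
// Returns false (leaving Out and Offset untouched) if the data is too short.
bool readU16Array(const std::vector<uint8_t> &Data, uint64_t &Offset,
                  uint64_t Count, std::vector<uint16_t> &Out) {
  if (Offset > Data.size())
    return false;
  uint64_t Remaining = Data.size() - Offset;
  // Divide instead of multiplying so a huge Count cannot overflow the check.
  if (Count > Remaining / sizeof(uint16_t))
    return false;
  Out.resize(Count);
  for (uint64_t I = 0; I < Count; ++I) {
    const uint8_t *P = &Data[Offset + 2 * I];
    Out[I] = static_cast<uint16_t>(P[0] | (P[1] << 8));
  }
  Offset += Count * sizeof(uint16_t);
  return true;
}
```

Comparing `Count` against `Remaining / sizeof(uint16_t)` rather than computing `Count * sizeof(uint16_t)` keeps a hostile count such as 2^62 from overflowing the check, so the vector is never resized to an absurd size.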

dblaikie added inline comments.Jun 25 2019, 4:31 PM

lib/Support/DataExtractor.cpp
28	I /think/ these should be early-returns if the Error is already set (assuming the goal is to support repeated calls to deserialization functions with the same Error and only check once at the end of the sequence of deserializations) Also, you will need to be careful with how you check if the Error is already set - since you want to make sure you don't set the Error to the "checked" state (otherwise the caller won't be forced to check it, which is bad). Hmm, perhaps you actually want to test that the Error is checked. If the error is unchecked, then return early & leave it to the caller to check. The "unexpectedEndReached" can then just do "if (E) *E = ...;" without having to check the error state. If the goal is to treat the Error as a straight out parameter without the repeated deserialization support (because something at a higher level will handle that in some way) - then maybe this should be ErrorAsOutParameter instead?

labath marked an inline comment as done.Jul 2 2019, 3:00 AM

labath added inline comments.

lib/Support/DataExtractor.cpp
28	The goal is "to support repeated calls to deserialization functions with the same Error and only check once at the end of the sequence of deserializations", as you've said. I also think that it would be nice to have early returns here. The reason I am reluctant to do that is that this will add an additional branch to the hot success path, where we do not encounter an error. In reality, this will probably be two branches, because we will also need to check the `Err` pointer, at least in the interim stage. I hope that this would not matter in practice, because once one reaches the end of the deserialization sequence and finds that the error object is set, the only thing one can reliably do is disregard all data read since the error state was last checked. Nonetheless, having early returns here would definitely make the API more predictable. I'm going to try to get some numbers on the impact of returning early here...

Ok, I got some numbers now. I went for a micro-benchmark-like setup to show some kind of a worst case scenario. The test setup is as follows:

I took the largest single .o file I had around (Registry.cpp.o in the clangDynamicASTMatchers library). I then linked it to remove any relocations (otherwise, most of the time is spent applying those). Then I modified llvm-dwarfdump to parse the debug_loc section without dumping anything (to avoid measuring the time spent in printfs). Both llvm-dwarfdump and the Registry.cpp.o I was dumping were built with -O3 -g, with asserts disabled (no FDO, LTO or other fancy stuff). This resulted in about 4.5 megabytes of debug_loc to parse in Registry.cpp.o. Then I used the Linux perf command to run llvm-dwarfdump -debug-loc 1000 times and dump the stats.

The baseline stats are:

  27.285986      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.11% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,813      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
 58,831,163      cycles:u                  #    2.156 GHz                      ( +-  0.07% )
    606,986      stalled-cycles-frontend:u #    1.03% frontend cycles idle     ( +-  3.76% )
  7,924,778      stalled-cycles-backend:u  #   13.47% backend cycles idle      ( +-  0.33% )
146,588,727      instructions:u            #    2.49  insn per cycle         
                                           #    0.05  stalled cycles per insn  ( +-  0.00% )
 29,545,620      branches:u                # 1082.813 M/sec                    ( +-  0.00% )
    222,276      branch-misses:u           #    0.75% of all branches          ( +-  0.15% )

0.027663381 seconds time elapsed                                          ( +-  0.11% )

The stats with this patch applied are:

  27.397390      task-clock:u (msec)       #    0.987 CPUs utilized            ( +-  0.10% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,833      page-faults:u             #    0.103 M/sec                    ( +-  0.24% )
 60,160,571      cycles:u                  #    2.196 GHz                      ( +-  0.07% )
    584,825      stalled-cycles-frontend:u #    0.97% frontend cycles idle     ( +-  3.37% )
 10,729,974      stalled-cycles-backend:u  #   17.84% backend cycles idle      ( +-  0.26% )
156,141,836      instructions:u            #    2.60  insn per cycle         
                                           #    0.07  stalled cycles per insn  ( +-  0.00% )
 31,599,940      branches:u                # 1153.392 M/sec                    ( +-  0.00% )
    221,247      branch-misses:u           #    0.70% of all branches          ( +-  0.06% )

0.027771865 seconds time elapsed                                          ( +-  0.10% )

The stats for a version of this patch which additionally checks for the error flag before attempting the parse (as discussed in the inline comment) are:

  27.808349      task-clock:u (msec)       #    0.986 CPUs utilized            ( +-  0.10% )
          0      context-switches:u        #    0.000 K/sec                  
          0      cpu-migrations:u          #    0.000 K/sec                  
      2,839      page-faults:u             #    0.102 M/sec                    ( +-  0.24% )
 62,887,388      cycles:u                  #    2.261 GHz                      ( +-  0.06% )
    575,264      stalled-cycles-frontend:u #    0.91% frontend cycles idle     ( +-  3.18% )
 14,757,888      stalled-cycles-backend:u  #   23.47% backend cycles idle      ( +-  0.23% )
167,562,307      instructions:u            #    2.66  insn per cycle         
                                           #    0.09  stalled cycles per insn  ( +-  0.00% )
 33,414,152      branches:u                # 1201.587 M/sec                    ( +-  0.00% )
    221,454      branch-misses:u           #    0.66% of all branches          ( +-  0.12% )

0.028201319 seconds time elapsed                                          ( +-  0.10% )

As can be seen, this patch increases the parsing time by about 0.4%. This is enough to be statistically significant in a benchmark like this, but probably not world-shattering (some slowdown is unavoidable with a change like this).

If we additionally enable the early returns, we get an additional 1.5% slowdown (or 1.9% above baseline). Still not bad for a "micro-benchmark", but it does make one wonder whether it is really worth it. My feeling would be that it isn't...
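The percentages quoted above follow from the mean task-clock figures of the three runs. A small check of that arithmetic (the helper and constant names are illustrative):

```cpp
#include <cassert>
#include <cmath>

// Relative slowdown, in percent, of a measurement against a reference.
inline double slowdownPercent(double Reference, double Measured) {
  return (Measured - Reference) / Reference * 100.0;
}

// Mean task-clock values (msec) from the three perf runs above.
constexpr double BaselineMs = 27.285986;
constexpr double PatchMs = 27.397390;
constexpr double EarlyReturnMs = 27.808349;
```

This yields about 0.41% for the patch, 1.50% for early returns relative to the patch, and 1.91% for early returns relative to the baseline.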


Sorry, I've lost some of the context here (my brain's like a sieve sometimes). The first set of results comes from this patch, which (is this correct?) implements a wrapper around DataExtractor that handles the offset comparison to determine validity? (If the offset isn't modified, the attempted parse was invalid in some way.)

And the second results are from adding a separate "if (already seen an error) return;" to these same wrapped function calls?

What if "if (already seen an error)" was added to the "isValidOffset" function (making all offsets invalid if there's already an error) - would that collapse the early return codepaths & perhaps make things simpler/faster?
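The suggestion to fold the "already seen an error" test into the offset validity check could look roughly like the hypothetical sketch below (`Cursor`, `Extractor` and `prepareRead` are illustrative names, with a `bool` in place of an `Error`). Whether it is actually faster was not measured in this thread; the compiler may still emit separate branches for the combined condition.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: once the cursor has failed, every offset counts as
// invalid, so the bounds check and the sticky-error check are one test.
struct Cursor {
  uint64_t Offset = 0;
  bool Failed = false;
};

class Extractor {
public:
  explicit Extractor(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  uint32_t getU32(Cursor &C) const {
    if (!prepareRead(C, 4))
      return 0;
    uint32_t Val = 0;
    for (int I = 0; I < 4; ++I)
      Val |= uint32_t(Data[C.Offset + I]) << (8 * I);
    C.Offset += 4;
    return Val;
  }

private:
  bool isValidOffsetForDataOfSize(const Cursor &C, uint64_t Size) const {
    return !C.Failed && C.Offset <= Data.size() &&
           Size <= Data.size() - C.Offset;
  }
  // A single validity test covers "already failed" and "out of bounds";
  // only the failure path touches the cursor's error state.
  bool prepareRead(Cursor &C, uint64_t Size) const {
    if (isValidOffsetForDataOfSize(C, Size))
      return true;
    C.Failed = true;
    return false;
  }
  std::vector<uint8_t> Data;
};
```

A cursor that has already failed is treated like one pointing past the end, so no separate early-return path is needed in each reader.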

update the patch to reflect our offline conversation with @dblaikie. We decided to go for the API which makes most sense (i.e. skip all reads as soon as one of them returns an error), at least until there is evidence that this makes a difference in practice (one can always prove that there is a slowdown here with a suitable micro-benchmark).
add tests for the new APIs
move the refactoring of other parsing classes into a separate patch (keeping this patch solely about DataExtractor).

Harbormaster completed remote builds in B35087: Diff 210090.Jul 16 2019, 7:03 AM

labath mentioned this in D64798: WIP: Refactor accel table and debug_loc parsers to demonstrate the new DataExtractor API.Jul 16 2019, 7:06 AM

I think this should be ready for a proper review now. To see the new API in action, please take a look at D64798.

Minor stuff.

include/llvm/Support/DataExtractor.h
169–174	Need to add Err to the doxygen.
243–244	Need to add Err to the doxygen.
280	it -> if
306–307	Need to add Err to the doxygen.
371–372	Need to add Err to the doxygen.
419–420	Need to add Err to the doxygen.
487–494	Need to add Err to the doxygen.
491	So eof() intentionally ignores the error state? (Just making sure the contract is understood.)
unittests/Support/DataExtractorTest.cpp
10	The Error is part of the DataExtractor interface, you should not need to #include this here.
266	Test for eof() in the error case? (attempt to read too far, but eof is still false, if that's the correct contract.)

labath marked 12 inline comments as done.Jul 17 2019, 1:52 AM

labath added inline comments.

include/llvm/Support/DataExtractor.h
491	Yes, that's analogous to std::iostream, where `eof()` tests the `eofbit` and `operator bool` checks the `failbit` (we don't have a `badbit` because there's no way to get a hard read error when reading from memory).
unittests/Support/DataExtractorTest.cpp
10	This is the testing support header which defines stuff that makes things like `EXPECT_THAT_ERROR(..., Succeeded())` work.
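The `eof()` contract confirmed above (position only, like `eofbit`, with failure tracked separately, like `failbit`) corresponds to the unit test suggested in the review. A hypothetical model, not LLVM's classes:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the contract discussed above: eof() only compares
// the cursor position with the data size; the error state is separate.
struct Cursor {
  uint64_t Offset = 0;
  bool Failed = false;
};

struct Extractor {
  std::vector<uint8_t> Data;

  bool eof(const Cursor &C) const { return C.Offset == Data.size(); }

  uint16_t getU16(Cursor &C) const {
    if (C.Failed || C.Offset > Data.size() || Data.size() - C.Offset < 2) {
      C.Failed = true; // a failed read does not move the offset
      return 0;
    }
    uint16_t Val =
        static_cast<uint16_t>(Data[C.Offset] | (Data[C.Offset + 1] << 8));
    C.Offset += 2;
    return Val;
  }
};
```

After an attempted read past the end, `eof()` is still false because the failed read did not move the offset; it becomes true only when a read consumes the data exactly.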

Add doxygen comments and fix typo.

Harbormaster completed remote builds in B35154: Diff 210270.Jul 17 2019, 1:53 AM

I'm happy, but other people obviously have better eyesight than I do. Give Jonas and Blaikie a day to chime in, I think.

unittests/Support/DataExtractorTest.cpp
10	Doh! I'm due for an eye test.

This revision is now accepted and ready to land.Jul 17 2019, 8:02 AM

dblaikie added inline comments.Jul 17 2019, 11:01 AM

lib/Support/DataExtractor.cpp
18–21	Not sure this function adds value compared to writing the "createStringError" call directly in "unexpectedEndReached"?
28	The problem with this is that it clears the "unchecked" bit in the Error, which means code using DataExtractor would not get an assertion failure in Error about it being unchecked. Might be worth having a gunit death test to demonstrate that failing to check the Error after parsing some fields does assert/crash.
34	Is the LLVM_UNLIKELY justified by performance data? (again, microbenchmarks could probably justify it in many parts of LLVM where it doesn't make a difference in practice - so I'd be inclined to leave these out for now)

I have been reminded that there's also a desire to make DataExtractor work with 64-bit section sizes. Maybe Cursor should use a 64-bit offset (i.e., size_t not uint32_t), and then migrating from non-Cursor to Cursor APIs will also do the 64-bit transition? We need to bite that bullet at some point. (I kind of expect y'all to say, no way do that later, which is fine; mainly I wanted to refresh that in our collective minds.)


That sounds like a good idea to me. It shouldn't complicate the DataExtractor implementation too much (it should be enough to make the getU & co. static functions templated also on the offset type), and since adopting the Cursor API will require changes to the code using the DataExtractor anyway, it will make it easier to check that the code is compatible with 64-bit offsets.
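When offsets become 64-bit, the usual bounds check must avoid computing `Offset + Size`, which can wrap around. A hypothetical helper templated on the offset type, in the spirit of the suggestion above (`isValidRange` is an illustrative name):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical overflow-safe bounds check, templated on the offset type
// (e.g. uint32_t or uint64_t). Writing it as "Offset + Size <= DataSize"
// could wrap around for offsets near the type's maximum; subtracting from
// DataSize after checking Offset <= DataSize avoids that.
template <typename OffsetT>
bool isValidRange(OffsetT Offset, uint64_t Size, uint64_t DataSize) {
  return static_cast<uint64_t>(Offset) <= DataSize &&
         Size <= DataSize - static_cast<uint64_t>(Offset);
}
```

For an offset near `UINT64_MAX`, the naive sum wraps to a small number and would accept the range, whereas the subtraction form rejects it.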

labath marked 3 inline comments as done.Jul 18 2019, 7:10 AM

labath added inline comments.

lib/Support/DataExtractor.cpp
18–21	Indeed. This was a relic from earlier versions of this patch.
28	I already have such tests (DataExtractorDeathTest, below). This works because `ErrorAsOutParameter` clears the "checked" flag when the function returns. We could design a way to check this that is statically known to be safe, but I'm not sure if it's worth it for a function that is local to this file (the only method I can think of involves subclassing/wrapping ErrorAsOutParameter).
34	No, I don't have any performance data for this. I'll leave these out...

address feedback from @dblaikie
waiting to hear more opinions before updating the patch to use size_t

Harbormaster completed remote builds in B35253: Diff 210558.Jul 18 2019, 7:12 AM

Use size_t in the Cursor versions of the API

Harbormaster completed remote builds in B35634: Diff 211720.Jul 25 2019, 5:13 AM

I finally got around to updating the patch for size_t. It ended up looking slightly uglier than I hoped for (like the need for a special isValidOffsetForDataOfSizeT), but OTOH I also was able to remove the default-null error arguments from the "legacy" APIs.

Overall I don't think this can be made much cleaner without porting existing users to size_t first. Please LMK what you think.

This probably needs to be rebased now that D64006 is in?

Yeah, I noticed that go by (pretty good stuff, btw), but I haven't gotten around to updating this yet.

Rebase/reimplement the patch on top of the uint64_t changes

Harbormaster completed remote builds in B37109: Diff 216597.Aug 22 2019, 6:24 AM

labath added a reviewer: ikudrin.Aug 22 2019, 6:25 AM

clang-format

Harbormaster completed remote builds in B37111: Diff 216600.Aug 22 2019, 6:25 AM

All the API overloads that take a Cursor should have doxygen descriptions.
No other complaints.

dblaikie added inline comments.Aug 22 2019, 1:46 PM

include/llvm/Support/DataExtractor.h
54–76	I don't feel strongly, but figured I'd mention it - do you reckon this encapsulation is worthwhile compared to having a struct with the Offset and Error being public members? It doesn't look like there'd be a lot of misuse if the two members were public.
lib/Support/DataExtractor.cpp
31	You can skip the 0 here, if you like, "return T();" should do the right thing, if I recall correctly.
36	And here
38	But maybe it'd be simpler to move "T val = 0;" (or "T val = T();") earlier and "return val" in those places that have "return T(0);"?
unittests/Support/DataExtractorTest.cpp
225	This produces undefined behavior in the non-death case: the dtor will run twice, and, importantly, the second time it'll run on something that isn't an object of the right type (because that object has already been destroyed). You could use an Optional<Cursor> to have control over the point of destruction (then "EXPECT_DEATH(opt.reset()..." to observe the unchecked Error).

@lhames - mind checking the testing of Error handling is following best practices?

Add doxygen comments. I've tried to be briefer than the existing comments, as I have found them excessively verbose and repetitive (they're longer than the implementation!).
Fix the double destruction in the death tests
Remove the template argument from the vector version of the getU8 method -- there's a lot of confusion about whether bytes should be stored as chars or uint8_ts and this was an attempt to placate both. I now think this is not a good idea, and I will try to make the users consistent instead.

Harbormaster completed remote builds in B37184: Diff 216839.Aug 23 2019, 7:10 AM

labath added inline comments.Aug 23 2019, 7:10 AM

include/llvm/Support/DataExtractor.h
54–76	No, I don't think the user could do much harm. I suppose one could somehow end up accidentally taking the address of the contained Offset member, using that for parsing, but then later expecting the regular cursor semantics to apply. But that is probably not very likely to happen. At the end of the day, it just seems nicer to me to have accessor functions instead of the user twiddling with the fields of the struct directly, but that's highly subjective.
lib/Support/DataExtractor.cpp
31	Yeah, it will, except for the `Uint24` pseudo-type, which does not have the default constructor. I could add the default constructor, but that does not matter now with the new code layout (I think `val = 0` looks better than `val = T()`)
unittests/Support/DataExtractorTest.cpp
225	Done. I've gone with a std::unique_ptr because it makes the construction of the cursor slightly nicer (missing an `in_place_t` for llvm::Optional).
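The `T val = 0` versus `T()` point can be illustrated with a hypothetical 24-bit type that, like the `Uint24` mentioned above, has no default constructor. This `Uint24` is a simplified stand-in (with a converting constructor from `uint32_t`), not LLVM's definition, and `getValue` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 24-bit value type: no default constructor, but it converts
// from an integer.
struct Uint24 {
  uint8_t Bytes[3];
  Uint24(uint32_t V) : Bytes{uint8_t(V), uint8_t(V >> 8), uint8_t(V >> 16)} {}
  uint32_t getAsUint32() const {
    return Bytes[0] | (Bytes[1] << 8) | (uint32_t(Bytes[2]) << 16);
  }
};

// Reads a little-endian value of Size bytes. "T Val = 0;" compiles for both
// built-in integers and Uint24; "T Val = T();" would not compile for Uint24.
template <typename T>
T getValue(const std::vector<uint8_t> &Data, uint64_t &Offset, unsigned Size) {
  T Val = 0; // also the result returned on failure
  if (Offset > Data.size() || Size > Data.size() - Offset)
    return Val;
  uint64_t Raw = 0;
  for (unsigned I = 0; I < Size; ++I)
    Raw |= uint64_t(Data[Offset + I]) << (8 * I);
  Offset += Size;
  Val = static_cast<T>(Raw);
  return Val;
}
```

`T Val = 0;` compiles for both instantiations because `0` converts to `Uint24` through its constructor, whereas value-initializing with `T()` requires a default constructor that `Uint24` does not have.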

make sure the tests actually compile after removing the getU8 template

Harbormaster completed remote builds in B37185: Diff 216841.Aug 23 2019, 7:16 AM

Looks good to me - if Lang has some ideas on how to tidy up the testing, that can be dealt with in follow-up commits.

This revision is now accepted and ready to land.Aug 23 2019, 4:39 PM

Closed by commit rGb1f29cec2511: Add error handling to the DataExtractor class (authored by labath).Aug 27 2019, 4:26 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: hiraditya. · View Herald TranscriptAug 27 2019, 4:26 AM

MaskRay mentioned this in D67340: [Object] Implement relocation resolver for COFF ARM/ARM64.Sep 9 2019, 1:52 AM

labath mentioned this in D67343: [DebugInfo] Change object::RelocationResolver to return Expected<uint64_t>.Sep 11 2019, 2:06 AM

Revision Contents

Path	Size
include/llvm/DebugInfo/DWARF/DWARFDataExtractor.h	7 lines
include/llvm/Support/DataExtractor.h	117 lines
lib/DebugInfo/DWARF/DWARFDataExtractor.cpp	7 lines
lib/Support/DataExtractor.cpp	129 lines
unittests/Support/DataExtractorTest.cpp	143 lines

Diff 216600

include/llvm/DebugInfo/DWARF/DWARFDataExtractor.h

[30 unchanged lines not shown; enclosing scope: `public:`]

/// Constructor for cases when there are no relocations.		/// Constructor for cases when there are no relocations.
DWARFDataExtractor(StringRef Data, bool IsLittleEndian, uint8_t AddressSize)		DWARFDataExtractor(StringRef Data, bool IsLittleEndian, uint8_t AddressSize)
: DataExtractor(Data, IsLittleEndian, AddressSize) {}		: DataExtractor(Data, IsLittleEndian, AddressSize) {}

/// Extracts a value and applies a relocation to the result if		/// Extracts a value and applies a relocation to the result if
/// one exists for the given offset.		/// one exists for the given offset.
uint64_t getRelocatedValue(uint32_t Size, uint64_t *Off,		uint64_t getRelocatedValue(uint32_t Size, uint64_t *Off,
uint64_t *SectionIndex = nullptr) const;		uint64_t *SectionIndex = nullptr,
		Error *Err = nullptr) const;

/// Extracts an address-sized value and applies a relocation to the result if		/// Extracts an address-sized value and applies a relocation to the result if
/// one exists for the given offset.		/// one exists for the given offset.
uint64_t getRelocatedAddress(uint64_t *Off, uint64_t *SecIx = nullptr) const {		uint64_t getRelocatedAddress(uint64_t *Off, uint64_t *SecIx = nullptr) const {
return getRelocatedValue(getAddressSize(), Off, SecIx);		return getRelocatedValue(getAddressSize(), Off, SecIx);
}		}
		uint64_t getRelocatedAddress(Cursor &C, uint64_t *SecIx = nullptr) const {
		return getRelocatedValue(getAddressSize(), &getOffset(C), SecIx,
		&getError(C));
		}

/// Extracts a DWARF-encoded pointer in \p Offset using \p Encoding.		/// Extracts a DWARF-encoded pointer in \p Offset using \p Encoding.
/// There is a DWARF encoding that uses a PC-relative adjustment.		/// There is a DWARF encoding that uses a PC-relative adjustment.
/// For these values, \p AbsPosOffset is used to fix them, which should		/// For these values, \p AbsPosOffset is used to fix them, which should
/// reflect the absolute address of this pointer.		/// reflect the absolute address of this pointer.
Optional<uint64_t> getEncodedPointer(uint64_t *Offset, uint8_t Encoding,		Optional<uint64_t> getEncodedPointer(uint64_t *Offset, uint8_t Encoding,
uint64_t AbsPosOffset = 0) const;		uint64_t AbsPosOffset = 0) const;

size_t size() const { return Section == nullptr ? 0 : Section->Data.size(); }		size_t size() const { return Section == nullptr ? 0 : Section->Data.size(); }
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_DEBUGINFO_DWARFDATAEXTRACTOR_H		#endif // LLVM_DEBUGINFO_DWARFDATAEXTRACTOR_H
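The new `getRelocatedAddress(Cursor &, ...)` overload works by forwarding the Cursor's offset and error to the existing pointer-based API through the protected `getOffset`/`getError` accessors that DataExtractor.h declares. The sketch below reproduces that shape in a self-contained form. `BaseExtractor`, `DerivedExtractor`, and `getBiasedU8` are hypothetical names, and the error is modeled as a `std::string` (empty means success) rather than an `llvm::Error`, so it has none of the must-check semantics.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the pattern: the base class owns the
// pointer-based API (offset pointer + optional error out-parameter), and
// the Cursor's fields are private.
class BaseExtractor {
public:
  class Cursor {
    uint64_t Offset;
    std::string Err; // Not llvm::Error: empty string means "no error".
    friend class BaseExtractor;

  public:
    explicit Cursor(uint64_t Off) : Offset(Off) {}
    explicit operator bool() const { return Err.empty(); }
    uint64_t tell() const { return Offset; }
  };

  explicit BaseExtractor(std::vector<uint8_t> D) : Data(std::move(D)) {}

  // Pointer-based API: reads one byte; sets *Err on failure if Err is set.
  uint8_t getU8(uint64_t *Off, std::string *Err = nullptr) const {
    if (Err && !Err->empty())
      return 0; // Sticky: an earlier read already failed.
    if (*Off >= Data.size()) {
      if (Err)
        *Err = "unexpected end of data";
      return 0;
    }
    return Data[(*Off)++];
  }

protected:
  // Subclasses reach the Cursor internals only through these accessors.
  static uint64_t &getOffset(Cursor &C) { return C.Offset; }
  static std::string &getError(Cursor &C) { return C.Err; }

private:
  std::vector<uint8_t> Data;
};

// A subclass adds its own Cursor overload by forwarding to the pointer API.
class DerivedExtractor : public BaseExtractor {
public:
  using BaseExtractor::BaseExtractor;

  // Hypothetical operation: read a byte and add a fixed bias.
  uint64_t getBiasedU8(Cursor &C, uint64_t Bias) const {
    std::string &E = getError(C);
    uint8_t V = getU8(&getOffset(C), &E);
    return E.empty() ? V + Bias : 0;
  }
};
```

Keeping the fields private and exposing them only to subclasses lets a derived extractor reuse the pointer-based implementation without making `Offset` and `Err` part of the public interface.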

include/llvm/Support/DataExtractor.h

//===-- DataExtractor.h ------------------------------------------ C++ --===//		//===-- DataExtractor.h ------------------------------------------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_SUPPORT_DATAEXTRACTOR_H		#ifndef LLVM_SUPPORT_DATAEXTRACTOR_H
#define LLVM_SUPPORT_DATAEXTRACTOR_H		#define LLVM_SUPPORT_DATAEXTRACTOR_H

#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Support/DataTypes.h"		#include "llvm/Support/DataTypes.h"
		#include "llvm/Support/Error.h"

namespace llvm {		namespace llvm {

/// An auxiliary type to facilitate extraction of 3-byte entities.		/// An auxiliary type to facilitate extraction of 3-byte entities.
struct Uint24 {		struct Uint24 {
uint8_t Bytes[3];		uint8_t Bytes[3];
Uint24(uint8_t U) {		Uint24(uint8_t U) {
Bytes[0] = Bytes[1] = Bytes[2] = U;		Bytes[0] = Bytes[1] = Bytes[2] = U;
[15 unchanged lines not shown; enclosing scope: `inline uint24_t getSwappedBytes(uint24_t C)`]
return uint24_t(C.Bytes[2], C.Bytes[1], C.Bytes[0]);		return uint24_t(C.Bytes[2], C.Bytes[1], C.Bytes[0]);
}		}

class DataExtractor {		class DataExtractor {
StringRef Data;		StringRef Data;
uint8_t IsLittleEndian;		uint8_t IsLittleEndian;
uint8_t AddressSize;		uint8_t AddressSize;
public:		public:
		/// A class representing a position in a DataExtractor, as well as any error
		/// encountered during extraction. It enables one to extract a sequence of
		/// values without error-checking and then checking for errors in bulk at the
		/// end. The class holds an Error object, so failing to check the result of
		/// the parse will result in a runtime error. The error flag is sticky and
		/// will cause all subsequent extraction functions to fail without even
		/// attempting to parse and without updating the Cursor offset. After clearing
		/// the error flag, one can again use the Cursor object for parsing.
		class Cursor {
		uint64_t Offset;
		Error Err;

		friend class DataExtractor;

		public:
		/// Construct a cursor for extraction from the given offset.
		explicit Cursor(uint64_t Offset) : Offset(Offset), Err(Error::success()) {}

		/// Checks whether the cursor is valid (i.e. no errors were encountered). In
		/// case of errors, this does not clear the error flag -- one must call
		/// takeError() instead.
		explicit operator bool() { return !Err; }

		/// Return the current position of this Cursor. In the error state this is
		/// the position of the Cursor before the first error was encountered.
		uint64_t tell() const { return Offset; }

		/// Return the error contained inside this Cursor, if any. Clears the internal
		/// Cursor state.
		Error takeError() { return std::move(Err); }
		};
		dblaikie: I don't feel strongly, but figured I'd mention it - do you reckon this encapsulation is worthwhile compared to having a struct with the Offset and Error being public members? It doesn't look like there'd be a lot of misuse if the two members were public.
		labath: No, I don't think the user could do much harm. I suppose one could somehow end up accidentally taking the address of the contained Offset member, use it for parsing, but then later expect the regular cursor semantics to apply. But that is probably not very likely to happen. At the end of the day, it just seems nicer to me to have accessor functions instead of the user twiddling with the fields of the struct directly, but that's highly subjective.

/// Construct with a buffer that is owned by the caller.		/// Construct with a buffer that is owned by the caller.
///		///
/// This constructor allows us to use data that is owned by the		/// This constructor allows us to use data that is owned by the
/// caller. The data must stay around as long as this object is		/// caller. The data must stay around as long as this object is
/// valid.		/// valid.
DataExtractor(StringRef Data, bool IsLittleEndian, uint8_t AddressSize)		DataExtractor(StringRef Data, bool IsLittleEndian, uint8_t AddressSize)
: Data(Data), IsLittleEndian(IsLittleEndian), AddressSize(AddressSize) {}		: Data(Data), IsLittleEndian(IsLittleEndian), AddressSize(AddressSize) {}

[66 unchanged lines not shown; enclosing scope: `public:`]
/// by the appropriate number of bytes if the value is extracted		/// by the appropriate number of bytes if the value is extracted
/// correctly. If the offset is out of bounds or there are not		/// correctly. If the offset is out of bounds or there are not
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
/// @param[in] byte_size		/// @param[in] byte_size
/// The size in bytes of the integer to extract.		/// The size in bytes of the integer to extract.
///		///
		/// @param[in,out] Err
		/// A pointer to an Error object. Upon return the Error object is set to
		/// indicate the result (success/failure) of the function. If the Error
		/// object is already set when calling this function, no extraction is
		/// performed.
		///
/// @return		/// @return
/// The unsigned integer value that was extracted, or zero on		/// The unsigned integer value that was extracted, or zero on
/// failure.		/// failure.
uint64_t getUnsigned(uint64_t *offset_ptr, uint32_t byte_size) const;		uint64_t getUnsigned(uint64_t *offset_ptr, uint32_t byte_size,
		Error *Err = nullptr) const;

		uint64_t getUnsigned(Cursor &C, uint32_t Size) const {
		return getUnsigned(&C.Offset, Size, &C.Err);
		}
		probinson: Need to add Err to the doxygen.

/// Extract a signed integer of size \a byte_size from \a *offset_ptr.		/// Extract a signed integer of size \a byte_size from \a *offset_ptr.
///		///
/// Extract a single signed integer value (sign extending if required)		/// Extract a single signed integer value (sign extending if required)
/// and update the offset pointed to by \a offset_ptr. The size of		/// and update the offset pointed to by \a offset_ptr. The size of
/// the extracted integer is specified by the \a byte_size argument.		/// the extracted integer is specified by the \a byte_size argument.
/// \a byte_size should have a value greater than or equal to one		/// \a byte_size should have a value greater than or equal to one
/// and less than or equal to eight since the return value is 64		/// and less than or equal to eight since the return value is 64
[30 unchanged lines not shown; enclosing scope: `public:`]
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
/// @return		/// @return
/// The extracted pointer value as a 64-bit integer.		/// The extracted pointer value as a 64-bit integer.
uint64_t getAddress(uint64_t *offset_ptr) const {		uint64_t getAddress(uint64_t *offset_ptr) const {
return getUnsigned(offset_ptr, AddressSize);		return getUnsigned(offset_ptr, AddressSize);
}		}
		uint64_t getAddress(Cursor &C) const { return getUnsigned(C, AddressSize); }

/// Extract a uint8_t value from \a *offset_ptr.		/// Extract a uint8_t value from \a *offset_ptr.
///		///
/// Extract a single uint8_t from the binary data at the offset		/// Extract a single uint8_t from the binary data at the offset
/// pointed to by \a offset_ptr, and advance the offset on success.		/// pointed to by \a offset_ptr, and advance the offset on success.
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
/// A pointer to an offset within the data that will be advanced		/// A pointer to an offset within the data that will be advanced
/// by the appropriate number of bytes if the value is extracted		/// by the appropriate number of bytes if the value is extracted
/// correctly. If the offset is out of bounds or there are not		/// correctly. If the offset is out of bounds or there are not
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
		/// @param[in,out] Err
		/// A pointer to an Error object. Upon return the Error object is set to
		/// indicate the result (success/failure) of the function. If the Error
		/// object is already set when calling this function, no extraction is
		/// performed.
		///
/// @return		/// @return
/// The extracted uint8_t value.		/// The extracted uint8_t value.
uint8_t getU8(uint64_t *offset_ptr) const;		uint8_t getU8(uint64_t *offset_ptr, Error *Err = nullptr) const;
		uint8_t getU8(Cursor &C) const { return getU8(&C.Offset, &C.Err); }
		probinson: Need to add Err to the doxygen.

/// Extract \a count uint8_t values from \a *offset_ptr.		/// Extract \a count uint8_t values from \a *offset_ptr.
///		///
/// Extract \a count uint8_t values from the binary data at the		/// Extract \a count uint8_t values from the binary data at the
/// offset pointed to by \a offset_ptr, and advance the offset on		/// offset pointed to by \a offset_ptr, and advance the offset on
/// success. The extracted values are copied into \a dst.		/// success. The extracted values are copied into \a dst.
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
[9 unchanged lines not shown; enclosing scope: `public:`]
///		///
/// @param[in] count		/// @param[in] count
/// The number of uint8_t values to extract.		/// The number of uint8_t values to extract.
///		///
/// @return		/// @return
/// \a dst if all values were properly extracted and copied,		/// \a dst if all values were properly extracted and copied,
/// NULL otherwise.		/// NULL otherwise.
uint8_t *getU8(uint64_t *offset_ptr, uint8_t *dst, uint32_t count) const;		uint8_t *getU8(uint64_t *offset_ptr, uint8_t *dst, uint32_t count) const;
		uint8_t *getU8(Cursor &C, uint8_t *Dst, uint32_t Count) const;

		template <typename T>
		void getU8(Cursor &C, SmallVectorImpl<T> &Dst, uint32_t Count) const {
		static_assert(
		std::is_same<T, char>::value || std::is_same<T, uint8_t>::value, "");
		if (isValidOffsetForDataOfSize(C.Offset, Count))
		Dst.resize(Count);

		// This relies on the fact that getU8 will not attempt to write to the
		// buffer if isValidOffsetForDataOfSize(C.Offset, Count) is false.
		probinson: it -> if
		getU8(C, reinterpret_cast<uint8_t *>(Dst.data()), Count);
		}

//------------------------------------------------------------------		//------------------------------------------------------------------
/// Extract a uint16_t value from \a *offset_ptr.		/// Extract a uint16_t value from \a *offset_ptr.
///		///
/// Extract a single uint16_t from the binary data at the offset		/// Extract a single uint16_t from the binary data at the offset
/// pointed to by \a offset_ptr, and update the offset on success.		/// pointed to by \a offset_ptr, and update the offset on success.
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
/// A pointer to an offset within the data that will be advanced		/// A pointer to an offset within the data that will be advanced
/// by the appropriate number of bytes if the value is extracted		/// by the appropriate number of bytes if the value is extracted
/// correctly. If the offset is out of bounds or there are not		/// correctly. If the offset is out of bounds or there are not
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
		/// @param[in,out] Err
		/// A pointer to an Error object. Upon return the Error object is set to
		/// indicate the result (success/failure) of the function. If the Error
		/// object is already set when calling this function, no extraction is
		/// performed.
		///
/// @return		/// @return
/// The extracted uint16_t value.		/// The extracted uint16_t value.
//------------------------------------------------------------------		//------------------------------------------------------------------
uint16_t getU16(uint64_t *offset_ptr) const;		uint16_t getU16(uint64_t *offset_ptr, Error *Err = nullptr) const;
		uint16_t getU16(Cursor &C) const { return getU16(&C.Offset, &C.Err); }
		probinson: Need to add Err to the doxygen.

/// Extract \a count uint16_t values from \a *offset_ptr.		/// Extract \a count uint16_t values from \a *offset_ptr.
///		///
/// Extract \a count uint16_t values from the binary data at the		/// Extract \a count uint16_t values from the binary data at the
/// offset pointed to by \a offset_ptr, and advance the offset on		/// offset pointed to by \a offset_ptr, and advance the offset on
/// success. The extracted values are copied into \a dst.		/// success. The extracted values are copied into \a dst.
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
[39 unchanged lines not shown; enclosing scope: `public:`]
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
/// A pointer to an offset within the data that will be advanced		/// A pointer to an offset within the data that will be advanced
/// by the appropriate number of bytes if the value is extracted		/// by the appropriate number of bytes if the value is extracted
/// correctly. If the offset is out of bounds or there are not		/// correctly. If the offset is out of bounds or there are not
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
		/// @param[in,out] Err
		/// A pointer to an Error object. Upon return the Error object is set to
		/// indicate the result (success/failure) of the function. If the Error
		/// object is already set when calling this function, no extraction is
		/// performed.
		///
/// @return		/// @return
/// The extracted uint32_t value.		/// The extracted uint32_t value.
uint32_t getU32(uint64_t *offset_ptr) const;		uint32_t getU32(uint64_t *offset_ptr, Error *Err = nullptr) const;
		uint32_t getU32(Cursor &C) const { return getU32(&C.Offset, &C.Err); }
		probinson: Need to add Err to the doxygen.

/// Extract \a count uint32_t values from \a *offset_ptr.		/// Extract \a count uint32_t values from \a *offset_ptr.
///		///
/// Extract \a count uint32_t values from the binary data at the		/// Extract \a count uint32_t values from the binary data at the
/// offset pointed to by \a offset_ptr, and advance the offset on		/// offset pointed to by \a offset_ptr, and advance the offset on
/// success. The extracted values are copied into \a dst.		/// success. The extracted values are copied into \a dst.
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
[22 unchanged lines not shown; enclosing scope: `public:`]
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
/// A pointer to an offset within the data that will be advanced		/// A pointer to an offset within the data that will be advanced
/// by the appropriate number of bytes if the value is extracted		/// by the appropriate number of bytes if the value is extracted
/// correctly. If the offset is out of bounds or there are not		/// correctly. If the offset is out of bounds or there are not
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
		/// @param[in,out] Err
		/// A pointer to an Error object. Upon return the Error object is set to
		/// indicate the result (success/failure) of the function. If the Error
		/// object is already set when calling this function, no extraction is
		/// performed.
		///
/// @return		/// @return
/// The extracted uint64_t value.		/// The extracted uint64_t value.
uint64_t getU64(uint64_t *offset_ptr) const;		uint64_t getU64(uint64_t *offset_ptr, Error *Err = nullptr) const;
		uint64_t getU64(Cursor &C) const { return getU64(&C.Offset, &C.Err); }
		probinson: Need to add Err to the doxygen.

/// Extract \a count uint64_t values from \a *offset_ptr.		/// Extract \a count uint64_t values from \a *offset_ptr.
///		///
/// Extract \a count uint64_t values from the binary data at the		/// Extract \a count uint64_t values from the binary data at the
/// offset pointed to by \a offset_ptr, and advance the offset on		/// offset pointed to by \a offset_ptr, and advance the offset on
/// success. The extracted values are copied into \a dst.		/// success. The extracted values are copied into \a dst.
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
[42 unchanged lines not shown; enclosing scope: `public:`]
///		///
/// @param[in,out] offset_ptr		/// @param[in,out] offset_ptr
/// A pointer to an offset within the data that will be advanced		/// A pointer to an offset within the data that will be advanced
/// by the appropriate number of bytes if the value is extracted		/// by the appropriate number of bytes if the value is extracted
/// correctly. If the offset is out of bounds or there are not		/// correctly. If the offset is out of bounds or there are not
/// enough bytes to extract this value, the offset will be left		/// enough bytes to extract this value, the offset will be left
/// unmodified.		/// unmodified.
///		///
		/// @param[in,out] Err
		/// A pointer to an Error object. Upon return the Error object is set to
		/// indicate the result (success/failure) of the function. If the Error
		/// object is already set when calling this function, no extraction is
		/// performed.
		///
/// @return		/// @return
/// The extracted unsigned integer value.		/// The extracted unsigned integer value.
uint64_t getULEB128(uint64_t *offset_ptr) const;		uint64_t getULEB128(uint64_t *offset_ptr, llvm::Error *Err = nullptr) const;
		uint64_t getULEB128(Cursor &C) const { return getULEB128(&C.Offset, &C.Err); }

		/// Advance the Cursor position by the given number of bytes.
		void skip(Cursor &C, uint64_t Length) const;
		probinson: So eof() intentionally ignores the error state? (Just making sure the contract is understood.)
		labath: Yes, that's analogous to std::iostream, where `eof()` tests the `eofbit` and `operator bool` checks the `failbit` (we don't have a `badbit` because there's no way to get a hard read error when reading from memory).

		/// Return true iff the Cursor is at the end of the buffer.
		bool eof(const Cursor &C) const { return Data.size() == C.Offset; }
		probinson: Need to add Err to the doxygen.

/// Test the validity of \a offset.		/// Test the validity of \a offset.
///		///
/// @return		/// @return
/// \b true if \a offset is a valid offset into the data in this		/// \b true if \a offset is a valid offset into the data in this
/// object, \b false otherwise.		/// object, \b false otherwise.
bool isValidOffset(uint64_t offset) const { return Data.size() > offset; }		bool isValidOffset(uint64_t offset) const { return Data.size() > offset; }

[11 unchanged lines not shown; enclosing scope: `public:`]
///		///
/// @return		/// @return
/// \b true if \a offset is a valid offset and there are enough		/// \b true if \a offset is a valid offset and there are enough
/// bytes for a pointer available at that offset, \b false		/// bytes for a pointer available at that offset, \b false
/// otherwise.		/// otherwise.
bool isValidOffsetForAddress(uint64_t offset) const {		bool isValidOffsetForAddress(uint64_t offset) const {
return isValidOffsetForDataOfSize(offset, AddressSize);		return isValidOffsetForDataOfSize(offset, AddressSize);
}		}

		protected:
		// Make it possible for subclasses to access these fields without making them
		// public.
		static uint64_t &getOffset(Cursor &C) { return C.Offset; }
		static Error &getError(Cursor &C) { return C.Err; }
};		};

} // namespace llvm		} // namespace llvm
		JDevlieghere: We'll definitely need to add some comment here to motivate its existence, probably contrasting it with the statelessness of the DataExtractor. I saw this patch first and I wondered "why not add this to the DataExtractor directly?", before catching up on the other patch.
		probinson: Also that this depends on DataExtractor "get" methods returning zero for past-the-end calls.

#endif		#endif
		labath: This isn't consistent with iostreams (which detect eof only after one attempts to read past it), but it seems to me that this makes using the class much simpler.
		labath: As can be seen here, checking whether the offset is updated (which is currently the only way of checking for errors) is a very tricky thing, since some functions can also "succeed" while not updating the offset.
		labath: To be consistent with iostreams, we should stop attempting to read data once we encounter the first error. Doing that would add another branch to the code. Not doing that opens up the possibility of some weird behavior, where we first try to read a uint64_t and fail, but then try to read a uint8_t and succeed because the stream happened to have one more byte left. This could be mitigated by setting Offset to UINT32_MAX on failure.
		labath: Maybe `readUXX` would be better instead of `getUXX`?
		JDevlieghere: +1
		JDevlieghere: Should we store an llvm::Error here instead? I guess it doesn't matter much if there's only one error we can detect, but OTOH it'd be nice if we're going to convert them anyway, and all these errors are consistent.
		labath: If we enlist the help of the DataExtractor class, then there may be more kinds of errors that we can detect (and I'm thinking we should enlist it, because checking for errors via these offsets is tricky and probably slower than if the DataExtractor set a flag directly). The one error kind that comes to mind is "invalid uleb128": right now the uleb functions will happily read a 1 megabyte uleb if all the bytes have bit 7 set. Storing an `Error` object as a member variable is a bit tricky due to the checked flag, noncopyability, etc. We could definitely at least have a member function that returns an `Error` object. But I think the first question we need to answer is what kind of error semantics we want to have here. Should it be something strictly optional (as it is right now), or should it be something that blows up if you forget to check for errors (like `llvm::Error` does)? I don't really have an opinion here, as I can see a case for both.
		probinson: `assert(Offset <= DE.size());` to guard against construction with a bogus Offset?
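The Cursor contract discussed in this file (sticky errors, a failed read leaving the offset untouched, `eof()` looking only at the position, and `takeError()` clearing the state) can be modeled without LLVM. This is a minimal sketch under those assumptions: `MiniCursor` and `MiniExtractor` are hypothetical names, reads are little-endian only, and the error is a `std::string` rather than an `llvm::Error`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, LLVM-free model of the Cursor contract: errors are sticky,
// a failed read leaves the offset unchanged, and eof() looks only at the
// position, not at the error. Empty Err means "no error".
struct MiniCursor {
  uint64_t Offset = 0;
  std::string Err;
  explicit operator bool() const { return Err.empty(); }
  // Mirrors Cursor::takeError(): hand out the error and clear the state.
  std::string takeError() { return std::exchange(Err, std::string()); }
};

class MiniExtractor {
  std::vector<uint8_t> Data;

public:
  explicit MiniExtractor(std::vector<uint8_t> D) : Data(std::move(D)) {}

  // Little-endian unsigned read of Size bytes (1..8).
  uint64_t getUnsigned(MiniCursor &C, unsigned Size) const {
    if (!C.Err.empty())
      return 0; // Sticky error: do not even attempt the read.
    if (C.Offset > Data.size() || Data.size() - C.Offset < Size) {
      C.Err = "unexpected end of data";
      return 0; // Offset intentionally left where it was.
    }
    uint64_t V = 0;
    for (unsigned I = 0; I < Size; ++I)
      V |= uint64_t(Data[C.Offset + I]) << (8 * I);
    C.Offset += Size;
    return V;
  }
  uint8_t getU8(MiniCursor &C) const { return uint8_t(getUnsigned(C, 1)); }
  uint32_t getU32(MiniCursor &C) const { return uint32_t(getUnsigned(C, 4)); }

  bool eof(const MiniCursor &C) const { return C.Offset == Data.size(); }
};
```

With this model, a failed 4-byte read followed by a 1-byte read that would still fit in the buffer returns 0 for both, which avoids the "fail on uint64_t, then succeed on uint8_t" behavior described above. The price is the up-front error check on every read, i.e. the extra branch on the success path.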

lib/DebugInfo/DWARF/DWARFDataExtractor.cpp

	//===- DWARFDataExtractor.cpp ---------------------------------------------===//			//===- DWARFDataExtractor.cpp ---------------------------------------------===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"			#include "llvm/DebugInfo/DWARF/DWARFDataExtractor.h"
	#include "llvm/BinaryFormat/Dwarf.h"			#include "llvm/BinaryFormat/Dwarf.h"
	#include "llvm/DebugInfo/DWARF/DWARFContext.h"			#include "llvm/DebugInfo/DWARF/DWARFContext.h"

	using namespace llvm;			using namespace llvm;

	uint64_t DWARFDataExtractor::getRelocatedValue(uint32_t Size, uint64_t *Off,			uint64_t DWARFDataExtractor::getRelocatedValue(uint32_t Size, uint64_t *Off,
	uint64_t *SecNdx) const {			uint64_t *SecNdx,
				Error *Err) const {
	if (SecNdx)			if (SecNdx)
	*SecNdx = object::SectionedAddress::UndefSection;			*SecNdx = object::SectionedAddress::UndefSection;
	if (!Section)			if (!Section)
	return getUnsigned(Off, Size);			return getUnsigned(Off, Size, Err);
	Optional<RelocAddrEntry> E = Obj->find(*Section, *Off);			Optional<RelocAddrEntry> E = Obj->find(*Section, *Off);
	uint64_t A = getUnsigned(Off, Size);			uint64_t A = getUnsigned(Off, Size, Err);
	if (!E)			if (!E)
	return A;			return A;
	if (SecNdx)			if (SecNdx)
	*SecNdx = E->SectionIndex;			*SecNdx = E->SectionIndex;
	uint64_t R = E->Resolver(E->Reloc, E->SymbolValue, A);			uint64_t R = E->Resolver(E->Reloc, E->SymbolValue, A);
	if (E->Reloc2)			if (E->Reloc2)
	R = E->Resolver(*E->Reloc2, E->SymbolValue2, R);			R = E->Resolver(*E->Reloc2, E->SymbolValue2, R);
	return R;			return R;
	[69 unchanged lines not shown]
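One detail of `getRelocatedValue` is worth spelling out: the relocation is looked up at the field's offset before `getUnsigned` advances the offset past it, and the `Err` pointer is simply threaded through to the read. The sketch below models this with a hypothetical relocation table (field offset to symbol value) and an `S + A` resolver. Real LLVM relocation resolvers are target-specific, and `RelocatedReader` and `readLE` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of getRelocatedValue(): relocations live in a table
// keyed by the offset of the field they patch, and the "resolver" is S + A.
struct RelocatedReader {
  std::vector<uint8_t> Data;
  std::map<uint64_t, uint64_t> SymbolValueAt; // field offset -> symbol value S

  // Little-endian read. On failure it sets *Err (if Err is non-null),
  // returns 0 and leaves *Off unchanged; it does not read at all if *Err
  // already holds an error.
  uint64_t readLE(uint64_t *Off, unsigned Size, std::string *Err) const {
    if (Err && !Err->empty())
      return 0;
    if (*Off > Data.size() || Data.size() - *Off < Size) {
      if (Err)
        *Err = "unexpected end of data";
      return 0;
    }
    uint64_t V = 0;
    for (unsigned I = 0; I < Size; ++I)
      V |= uint64_t(Data[*Off + I]) << (8 * I);
    *Off += Size;
    return V;
  }

  uint64_t getRelocatedValue(unsigned Size, uint64_t *Off,
                             std::string *Err = nullptr) const {
    // Look up the relocation *before* reading: the read moves *Off past
    // the field, so looking it up afterwards would use the wrong key.
    auto It = SymbolValueAt.find(*Off);
    uint64_t A = readLE(Off, Size, Err); // addend stored in the section
    if (It == SymbolValueAt.end())
      return A;
    return It->second + A; // S + A
  }
};
```

As in the diff, the relocation would still be applied if the read failed and returned 0; callers are expected to consult the error rather than the value.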

lib/Support/DataExtractor.cpp

//===-- DataExtractor.cpp -------------------------------------------------===//		//===-- DataExtractor.cpp -------------------------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Support/DataExtractor.h"		#include "llvm/Support/DataExtractor.h"
		#include "llvm/Support/Errc.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/Host.h"		#include "llvm/Support/Host.h"
#include "llvm/Support/SwapByteOrder.h"
#include "llvm/Support/LEB128.h"		#include "llvm/Support/LEB128.h"
		#include "llvm/Support/SwapByteOrder.h"

using namespace llvm;		using namespace llvm;

		static void unexpectedEndReached(Error *E) {
		if (E)
		*E = createStringError(errc::illegal_byte_sequence,
		"unexpected end of data");
		dblaikie: Not sure this function adds value compared to writing the "createStringError" call directly in "unexpectedEndReached"?
		labath: Indeed. This was a relic from earlier versions of this patch.
		}

		static bool isError(Error *E) { return E && *E; }

template <typename T>		template <typename T>
static T getU(uint64_t *offset_ptr, const DataExtractor *de,		static T getU(uint64_t *offset_ptr, const DataExtractor *de,
bool isLittleEndian, const char *Data) {		bool isLittleEndian, const char *Data, llvm::Error *Err) {
		dblaikie: I /think/ these should be early-returns if the Error is already set (assuming the goal is to support repeated calls to deserialization functions with the same Error and only check once at the end of the sequence of deserializations). Also, you will need to be careful with how you check if the Error is already set, since you want to make sure you don't set the Error to the "checked" state (otherwise the caller won't be forced to check it, which is bad). Hmm, perhaps you actually want to test that the Error is checked. If the error is unchecked, then return early & leave it to the caller to check. The "unexpectedEndReached" can then just do "if (E) *E = ...;" without having to check the error state. If the goal is to treat the Error as a straight out parameter without the repeated deserialization support (because something at a higher level will handle that in some way), then maybe this should be ErrorAsOutParameter instead?
		labath (author): The goal is "to support repeated calls to deserialization functions with the same Error and only check once at the end of the sequence of deserializations", like you've said. I also think that it would be nice to have early returns here. The reason I am reluctant to do that is that this will add an additional branch to the hot success path, where we do not encounter an error. In reality, this will probably be two branches, because we will also need to check the `Err` pointer, at least in the interim stage. I hope that this would not matter in practice, because once one reaches the end of the deserialization sequence and finds that the error object is set, the only thing one can reliably do is disregard all data read since the error state was last checked. Nonetheless, having early returns here would definitely make the API more predictable. I'm going to try to get some numbers on the impact of returning early here...
		dblaikie: The problem with this is that it clears the "unchecked" bit in the Error, which means code using DataExtractor would not get an assertion failure in Error about it being unchecked. Might be worth having a gunit death test to demonstrate that failing to check the Error after parsing some fields does assert/crash.
		labath (author): I already have such tests (DataExtractorDeathTest, below). The reason this works is that `ErrorAsOutParameter` clears the "checked" flag when the function returns. We could design a way to check this that is statically known to be safe, but I'm not sure it's worth it for a function that is local to this file (the only method I can think of involves subclassing/wrapping ErrorAsOutParameter).
T val = 0;		ErrorAsOutParameter ErrAsOut(Err);
		if (isError(Err))
		return T(0);
		dblaikie: You can skip the 0 here, if you like; "return T();" should do the right thing, if I recall correctly.
		labath (author): Yeah, it will, except for the `Uint24` pseudo-type, which does not have a default constructor. I could add the default constructor, but that does not matter now with the new code layout (I think `val = 0` looks better than `val = T()`).

uint64_t offset = *offset_ptr;		uint64_t offset = *offset_ptr;
if (de->isValidOffsetForDataOfSize(offset, sizeof(val))) {		if (!de->isValidOffsetForDataOfSize(offset, sizeof(T))) {
		dblaikie: Is the LLVM_UNLIKELY justified by performance data? (Again, microbenchmarks could probably justify it in many parts of LLVM where it doesn't make a difference in practice, so I'd be inclined to leave these out for now.)
		labath (author): No, I don't have any performance data for this. I'll leave these out...
		unexpectedEndReached(Err);
		return T(0);
		dblaikie: And here
		}
		T val = 0;
		dblaikie: But maybe it'd be simpler to move "T val = 0;" (or "T val = T();") earlier and "return val" in those places that have "return T(0);"?
std::memcpy(&val, &Data[offset], sizeof(val));		std::memcpy(&val, &Data[offset], sizeof(val));
if (sys::IsLittleEndianHost != isLittleEndian)		if (sys::IsLittleEndianHost != isLittleEndian)
sys::swapByteOrder(val);		sys::swapByteOrder(val);

// Advance the offset		// Advance the offset
*offset_ptr += sizeof(val);		*offset_ptr += sizeof(val);
}
return val;		return val;
}		}
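The early return debated above, combined with dblaikie's suggestion to declare `val` first and return it on every path, can be sketched outside LLVM as follows. This is a hypothetical, self-contained analogue, not the patch's code: `StickyCursor` and `readLE` are illustrative names, `std::optional<std::string>` stands in for `llvm::Error` (so the must-check enforcement discussed above is not modelled), and bytes are assembled one at a time instead of using `memcpy` plus `sys::swapByteOrder`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for DataExtractor::Cursor: a read offset plus a
// sticky error message. Unlike llvm::Error, nothing forces the caller to
// inspect Err before the cursor is destroyed.
struct StickyCursor {
  uint64_t Offset = 0;
  std::optional<std::string> Err;
};

// Reads an unsigned little-endian T at C.Offset, assembling it byte by byte
// so the result does not depend on host endianness. If C.Err is already set,
// or the read would run past the end of Data, returns 0 and leaves C.Offset
// unchanged; in the second case it also records the error.
template <typename T>
T readLE(std::string_view Data, StickyCursor &C) {
  T Val = 0; // declared first, so every exit path can just return Val
  if (C.Err)
    return Val; // early return: an earlier failure poisons later reads
  if (C.Offset > Data.size() || Data.size() - C.Offset < sizeof(T)) {
    C.Err = "unexpected end of data";
    return Val;
  }
  for (std::size_t I = 0; I < sizeof(T); ++I) {
    auto Byte = static_cast<unsigned char>(Data[C.Offset + I]);
    Val |= static_cast<T>(static_cast<T>(Byte) << (8 * I));
  }
  C.Offset += sizeof(T);
  return Val;
}
```

A caller can issue several reads back to back and inspect `C.Err` once at the end. As with the real `Cursor` in the `Cursor_tell` and `Cursor_takeError` tests, a failed read does not move the offset, and clearing the error lets reading resume from where it stopped.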

template <typename T>		template <typename T>
static T *getUs(uint64_t *offset_ptr, T *dst, uint32_t count,		static T *getUs(uint64_t *offset_ptr, T *dst, uint32_t count,
const DataExtractor *de, bool isLittleEndian, const char *Data) {		const DataExtractor *de, bool isLittleEndian, const char *Data,
		llvm::Error *Err) {
		ErrorAsOutParameter ErrAsOut(Err);
		if (isError(Err))
		return nullptr;

uint64_t offset = *offset_ptr;		uint64_t offset = *offset_ptr;

if (count > 0 && de->isValidOffsetForDataOfSize(offset, sizeof(*dst) * count)) {		if (!de->isValidOffsetForDataOfSize(offset, sizeof(*dst) * count)) {
		unexpectedEndReached(Err);
		return nullptr;
		}
for (T *value_ptr = dst, *end = dst + count; value_ptr != end;		for (T *value_ptr = dst, *end = dst + count; value_ptr != end;
++value_ptr, offset += sizeof(*dst))		++value_ptr, offset += sizeof(*dst))
*value_ptr = getU<T>(offset_ptr, de, isLittleEndian, Data);		*value_ptr = getU<T>(offset_ptr, de, isLittleEndian, Data, Err);
// Advance the offset		// Advance the offset
*offset_ptr = offset;		*offset_ptr = offset;
// Return a non-NULL pointer to the converted data as an indicator of		// Return a non-NULL pointer to the converted data as an indicator of
// success		// success
return dst;		return dst;
}		}
return nullptr;
}
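The reworked `getUs` validates the entire range before writing any element, so a failing bulk read returns `nullptr` without touching either the destination buffer or the offset. A minimal byte-only sketch of that all-or-nothing contract follows; `readBytes` is a hypothetical helper, not an LLVM API, and because it counts in bytes it needs no `sizeof(*dst) * count` multiplication.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper modelled on the patched getUs(): check that the whole
// range [Offset, Offset + Count) lies inside Data before touching Dst or
// Offset. Returns Dst on success and nullptr on failure.
uint8_t *readBytes(std::string_view Data, uint64_t &Offset, uint8_t *Dst,
                   std::size_t Count) {
  if (Offset > Data.size() || Data.size() - Offset < Count)
    return nullptr; // nothing written, Offset unchanged
  for (std::size_t I = 0; I < Count; ++I)
    Dst[I] = static_cast<uint8_t>(Data[Offset + I]);
  Offset += Count;
  return Dst; // non-null signals success, as in the original
}
```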

uint8_t DataExtractor::getU8(uint64_t *offset_ptr) const {		uint8_t DataExtractor::getU8(uint64_t *offset_ptr, llvm::Error *Err) const {
return getU<uint8_t>(offset_ptr, this, IsLittleEndian, Data.data());		return getU<uint8_t>(offset_ptr, this, IsLittleEndian, Data.data(), Err);
}		}

uint8_t *		uint8_t *
DataExtractor::getU8(uint64_t *offset_ptr, uint8_t *dst, uint32_t count) const {		DataExtractor::getU8(uint64_t *offset_ptr, uint8_t *dst, uint32_t count) const {
return getUs<uint8_t>(offset_ptr, dst, count, this, IsLittleEndian,		return getUs<uint8_t>(offset_ptr, dst, count, this, IsLittleEndian,
Data.data());		Data.data(), nullptr);
		}

		uint8_t *DataExtractor::getU8(Cursor &C, uint8_t *Dst, uint32_t Count) const {
		return getUs<uint8_t>(&C.Offset, Dst, Count, this, IsLittleEndian,
		Data.data(), &C.Err);
}		}

uint16_t DataExtractor::getU16(uint64_t *offset_ptr) const {		uint16_t DataExtractor::getU16(uint64_t *offset_ptr, llvm::Error *Err) const {
return getU<uint16_t>(offset_ptr, this, IsLittleEndian, Data.data());		return getU<uint16_t>(offset_ptr, this, IsLittleEndian, Data.data(), Err);
}		}

uint16_t *DataExtractor::getU16(uint64_t *offset_ptr, uint16_t *dst,		uint16_t *DataExtractor::getU16(uint64_t *offset_ptr, uint16_t *dst,
uint32_t count) const {		uint32_t count) const {
return getUs<uint16_t>(offset_ptr, dst, count, this, IsLittleEndian,		return getUs<uint16_t>(offset_ptr, dst, count, this, IsLittleEndian,
Data.data());		Data.data(), nullptr);
}		}

uint32_t DataExtractor::getU24(uint64_t *offset_ptr) const {		uint32_t DataExtractor::getU24(uint64_t *offset_ptr) const {
uint24_t ExtractedVal =		uint24_t ExtractedVal =
getU<uint24_t>(offset_ptr, this, IsLittleEndian, Data.data());		getU<uint24_t>(offset_ptr, this, IsLittleEndian, Data.data(), nullptr);
// The 3 bytes are in the correct byte order for the host.		// The 3 bytes are in the correct byte order for the host.
return ExtractedVal.getAsUint32(sys::IsLittleEndianHost);		return ExtractedVal.getAsUint32(sys::IsLittleEndianHost);
}		}

uint32_t DataExtractor::getU32(uint64_t *offset_ptr) const {		uint32_t DataExtractor::getU32(uint64_t *offset_ptr, llvm::Error *Err) const {
return getU<uint32_t>(offset_ptr, this, IsLittleEndian, Data.data());		return getU<uint32_t>(offset_ptr, this, IsLittleEndian, Data.data(), Err);
}		}

uint32_t *DataExtractor::getU32(uint64_t *offset_ptr, uint32_t *dst,		uint32_t *DataExtractor::getU32(uint64_t *offset_ptr, uint32_t *dst,
uint32_t count) const {		uint32_t count) const {
return getUs<uint32_t>(offset_ptr, dst, count, this, IsLittleEndian,		return getUs<uint32_t>(offset_ptr, dst, count, this, IsLittleEndian,
Data.data());		Data.data(), nullptr);
}		}

uint64_t DataExtractor::getU64(uint64_t *offset_ptr) const {		uint64_t DataExtractor::getU64(uint64_t *offset_ptr, llvm::Error *Err) const {
return getU<uint64_t>(offset_ptr, this, IsLittleEndian, Data.data());		return getU<uint64_t>(offset_ptr, this, IsLittleEndian, Data.data(), Err);
}		}

uint64_t *DataExtractor::getU64(uint64_t *offset_ptr, uint64_t *dst,		uint64_t *DataExtractor::getU64(uint64_t *offset_ptr, uint64_t *dst,
uint32_t count) const {		uint32_t count) const {
return getUs<uint64_t>(offset_ptr, dst, count, this, IsLittleEndian,		return getUs<uint64_t>(offset_ptr, dst, count, this, IsLittleEndian,
Data.data());		Data.data(), nullptr);
}		}

uint64_t		uint64_t DataExtractor::getUnsigned(uint64_t *offset_ptr, uint32_t byte_size,
DataExtractor::getUnsigned(uint64_t *offset_ptr, uint32_t byte_size) const {		llvm::Error *Err) const {
switch (byte_size) {		switch (byte_size) {
case 1:		case 1:
return getU8(offset_ptr);		return getU8(offset_ptr, Err);
case 2:		case 2:
return getU16(offset_ptr);		return getU16(offset_ptr, Err);
case 4:		case 4:
return getU32(offset_ptr);		return getU32(offset_ptr, Err);
case 8:		case 8:
return getU64(offset_ptr);		return getU64(offset_ptr, Err);
}		}
llvm_unreachable("getUnsigned unhandled case!");		llvm_unreachable("getUnsigned unhandled case!");
}		}

int64_t		int64_t
DataExtractor::getSigned(uint64_t *offset_ptr, uint32_t byte_size) const {		DataExtractor::getSigned(uint64_t *offset_ptr, uint32_t byte_size) const {
switch (byte_size) {		switch (byte_size) {
case 1:		case 1:
… 23 unchanged lines not shown …	StringRef DataExtractor::getCStrRef(uint64_t *offset_ptr) const {
StringRef::size_type Pos = Data.find('\0', Start);		StringRef::size_type Pos = Data.find('\0', Start);
if (Pos != StringRef::npos) {		if (Pos != StringRef::npos) {
*offset_ptr = Pos + 1;		*offset_ptr = Pos + 1;
return StringRef(Data.data() + Start, Pos - Start);		return StringRef(Data.data() + Start, Pos - Start);
}		}
return StringRef();		return StringRef();
}		}

uint64_t DataExtractor::getULEB128(uint64_t *offset_ptr) const {		uint64_t DataExtractor::getULEB128(uint64_t *offset_ptr,
		llvm::Error *Err) const {
assert(*offset_ptr <= Data.size());		assert(*offset_ptr <= Data.size());
		ErrorAsOutParameter ErrAsOut(Err);
		if (isError(Err))
		return 0;

const char *error;		const char *error;
unsigned bytes_read;		unsigned bytes_read;
uint64_t result = decodeULEB128(		uint64_t result = decodeULEB128(
reinterpret_cast<const uint8_t *>(Data.data() + *offset_ptr), &bytes_read,		reinterpret_cast<const uint8_t *>(Data.data() + *offset_ptr), &bytes_read,
reinterpret_cast<const uint8_t *>(Data.data() + Data.size()), &error);		reinterpret_cast<const uint8_t *>(Data.data() + Data.size()), &error);
if (error)		if (error) {
		if (Err)
		*Err = createStringError(errc::illegal_byte_sequence, error);
return 0;		return 0;
		}
*offset_ptr += bytes_read;		*offset_ptr += bytes_read;
return result;		return result;
}		}
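The decoding that `decodeULEB128` performs, together with the error reporting this patch layers on top of it, can be modelled in a few lines. The sketch below is hypothetical: `readULEB128` and its `std::string *Err` out-parameter are illustrative rather than LLVM API, it re-implements the 7-bits-per-byte loop instead of calling `llvm/Support/LEB128.h`, and its error strings are illustrative. Like the patched `getULEB128`, it returns 0 and leaves the offset alone on failure, and it skips work when an earlier error is still pending, mirroring the `isError(Err)` early return.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical ULEB128 reader shaped like the patched getULEB128(): on
// success it advances Offset past the encoding; on failure it returns 0,
// leaves Offset unchanged, and stores a message in *Err when Err is non-null.
// A non-empty *Err on entry is treated as an earlier, still-pending failure.
uint64_t readULEB128(std::string_view Data, uint64_t &Offset,
                     std::string *Err) {
  if (Err && !Err->empty())
    return 0;
  uint64_t Value = 0;
  unsigned Shift = 0;
  for (uint64_t Pos = Offset; Pos < Data.size(); ++Pos) {
    uint64_t Byte = static_cast<unsigned char>(Data[Pos]);
    uint64_t Slice = Byte & 0x7f; // low 7 bits carry the payload
    // Reject encodings whose payload bits would not fit in 64 bits.
    bool Overflows =
        Shift >= 64 ? Slice != 0 : ((Slice << Shift) >> Shift) != Slice;
    if (Overflows) {
      if (Err)
        *Err = "uleb128 too big for uint64";
      return 0;
    }
    if (Shift < 64)
      Value |= Slice << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0) { // high bit clear: this was the last byte
      Offset = Pos + 1;
      return Value;
    }
  }
  if (Err)
    *Err = "malformed uleb128, extends past end";
  return 0;
}
```

For the `leb128data` bytes used in the tests, `\xA6\x49`, this gives 0x26 + (0x49 << 7) = 38 + 9344 = 9382 and advances the offset by 2; a lone `\x80` has its continuation bit set with nothing after it, so it fails without moving the offset.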

int64_t DataExtractor::getSLEB128(uint64_t *offset_ptr) const {		int64_t DataExtractor::getSLEB128(uint64_t *offset_ptr) const {
assert(*offset_ptr <= Data.size());		assert(*offset_ptr <= Data.size());

const char *error;		const char *error;
unsigned bytes_read;		unsigned bytes_read;
int64_t result = decodeSLEB128(		int64_t result = decodeSLEB128(
reinterpret_cast<const uint8_t *>(Data.data() + *offset_ptr), &bytes_read,		reinterpret_cast<const uint8_t *>(Data.data() + *offset_ptr), &bytes_read,
reinterpret_cast<const uint8_t *>(Data.data() + Data.size()), &error);		reinterpret_cast<const uint8_t *>(Data.data() + Data.size()), &error);
if (error)		if (error)
return 0;		return 0;
*offset_ptr += bytes_read;		*offset_ptr += bytes_read;
return result;		return result;
}		}

		void DataExtractor::skip(Cursor &C, uint64_t Length) const {
		ErrorAsOutParameter ErrAsOut(&C.Err);
		if (isError(&C.Err))
		return;

		if (isValidOffsetForDataOfSize(C.Offset, Length))
		C.Offset += Length;
		else
		unexpectedEndReached(&C.Err);
		}

unittests/Support/DataExtractorTest.cpp

//===- llvm/unittest/Support/DataExtractorTest.cpp - DataExtractor tests --===//		//===- llvm/unittest/Support/DataExtractorTest.cpp - DataExtractor tests --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Support/DataExtractor.h"		#include "llvm/Support/DataExtractor.h"
		#include "llvm/Testing/Support/Error.h"
		probinson: The Error is part of the DataExtractor interface; you should not need to #include this here.
		labath (author): This is the testing support header, which defines the helpers that make things like `EXPECT_THAT_ERROR(..., Succeeded())` work.
		probinson: Doh! I'm due for an eye test.
#include "gtest/gtest.h"		#include "gtest/gtest.h"
using namespace llvm;		using namespace llvm;

namespace {		namespace {

const char numberData[] = "\x80\x90\xFF\xFF\x80\x00\x00\x00";		const char numberData[] = "\x80\x90\xFF\xFF\x80\x00\x00\x00";
const char stringData[] = "hellohello\0hello";		const char stringData[] = "hellohello\0hello";
const char leb128data[] = "\xA6\x49";		const char leb128data[] = "\xA6\x49";
… 103 unchanged lines not shown …	TEST(DataExtractorTest, LEB128_error) {
uint64_t Offset = 0;		uint64_t Offset = 0;
EXPECT_EQ(0U, DE.getULEB128(&Offset));		EXPECT_EQ(0U, DE.getULEB128(&Offset));
EXPECT_EQ(0U, Offset);		EXPECT_EQ(0U, Offset);

Offset = 0;		Offset = 0;
EXPECT_EQ(0U, DE.getSLEB128(&Offset));		EXPECT_EQ(0U, DE.getSLEB128(&Offset));
EXPECT_EQ(0U, Offset);		EXPECT_EQ(0U, Offset);
}		}

		TEST(DataExtractorTest, Cursor_tell) {
		DataExtractor DE(StringRef("AB"), false, 8);
		DataExtractor::Cursor C(0);
		// A successful read operation advances the cursor
		EXPECT_EQ('A', DE.getU8(C));
		EXPECT_EQ(1u, C.tell());

		// An unsuccessful one doesn't.
		EXPECT_EQ(0u, DE.getU16(C));
		EXPECT_EQ(1u, C.tell());

		// And neither do any subsequent operations.
		EXPECT_EQ(0, DE.getU8(C));
		EXPECT_EQ(1u, C.tell());

		consumeError(C.takeError());
		}

		TEST(DataExtractorTest, Cursor_takeError) {
		DataExtractor DE(StringRef("AB"), false, 8);
		DataExtractor::Cursor C(0);
		// Initially, the cursor is in the "success" state.
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());

		// It remains "success" after a successful read.
		EXPECT_EQ('A', DE.getU8(C));
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());

		// An unsuccessful read sets the error state.
		EXPECT_EQ(0u, DE.getU32(C));
		EXPECT_THAT_ERROR(C.takeError(), Failed());

		// Once set, the error sticks until explicitly cleared.
		EXPECT_EQ(0u, DE.getU32(C));
		EXPECT_EQ(0, DE.getU8(C));
		EXPECT_THAT_ERROR(C.takeError(), Failed());

		// At which point reads can succeed again.
		EXPECT_EQ('B', DE.getU8(C));
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());
		}

		TEST(DataExtractorTest, Cursor_chaining) {
		DataExtractor DE(StringRef("ABCD"), false, 8);
		DataExtractor::Cursor C(0);

		// Multiple reads can be chained without triggering any assertions.
		EXPECT_EQ('A', DE.getU8(C));
		EXPECT_EQ('B', DE.getU8(C));
		EXPECT_EQ('C', DE.getU8(C));
		EXPECT_EQ('D', DE.getU8(C));
		// And the error checked at the end.
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());
		}

		#if defined(GTEST_HAS_DEATH_TEST) && defined(_DEBUG)
		TEST(DataExtractorDeathTest, Cursor) {
		DataExtractor DE(StringRef("AB"), false, 8);

		// Even an unused cursor must be checked for errors:
		EXPECT_DEATH(DataExtractor::Cursor(0),
		"Success values must still be checked prior to being destroyed");

		{
		DataExtractor::Cursor C(0);
		EXPECT_EQ(0u, DE.getU32(C));
		// It must also be checked after an unsuccessful operation.
		// destruction.
		EXPECT_DEATH(C.~Cursor(), "unexpected end of data");
		EXPECT_THAT_ERROR(C.takeError(), Failed());
		}
		{
		DataExtractor::Cursor C(0);
		EXPECT_EQ('A', DE.getU8(C));
		// Same goes for a successful one.
		EXPECT_DEATH(
		C.~Cursor(),
		"Success values must still be checked prior to being destroyed");
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());
		}
		{
		DataExtractor::Cursor C(0);
		EXPECT_EQ('A', DE.getU8(C));
		EXPECT_EQ(0u, DE.getU32(C));
		// Even if a successful operation is followed by an unsuccessful one.
		EXPECT_DEATH(C.~Cursor(), "unexpected end of data");
		EXPECT_THAT_ERROR(C.takeError(), Failed());
		}
		{
		DataExtractor::Cursor C(0);
		EXPECT_EQ(0u, DE.getU32(C));
		EXPECT_EQ(0, DE.getU8(C));
		// Even if an unsuccessful operation is followed by one that would normally
		// succeed.
		EXPECT_DEATH(C.~Cursor(), "unexpected end of data");
		dblaikie: This produces undefined behavior in the non-death case (the dtor will run twice; importantly, it'll run on something that isn't an object of the right type, because that object's already been destroyed). You could use an Optional<Cursor> to have control over the point of destruction (then "EXPECT_DEATH(opt.reset()..." to observe the unchecked Error).
		labath (author): Done. I've gone with a std::unique_ptr because it makes the construction of the cursor slightly nicer (llvm::Optional lacks an `in_place_t` constructor).
		EXPECT_THAT_ERROR(C.takeError(), Failed());
		}
		}
		#endif
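dblaikie's objection to `C.~Cursor()` is general C++: explicitly calling the destructor of an automatic object does not end its scope, so the destructor runs again when the block exits, on an object that no longer exists. Owning the object through `std::unique_ptr` (what labath switched to) or an optional, and calling `reset()`, destroys it exactly once at the chosen point. The standalone sketch below uses a hypothetical `Tracked` type that counts destructor calls; its `std::optional` half shows the `std::in_place` construction that is the standard counterpart of the `in_place_t` mentioned above.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical type whose destructor has an observable side effect, standing
// in for a Cursor whose destructor fires on an unchecked error.
struct Tracked {
  explicit Tracked(int &Count) : Count(Count) {}
  ~Tracked() { ++Count; }
  int &Count;
};

// Ends two objects' lifetimes early through owning wrappers and returns how
// many destructor calls happened in total.
int destroyEarly() {
  int Destroyed = 0;
  {
    auto P = std::make_unique<Tracked>(Destroyed);
    P.reset(); // destructor runs here, and P becomes empty
    std::optional<Tracked> O(std::in_place, Destroyed);
    O.reset(); // same for std::optional, built in place via std::in_place
  } // P and O are empty, so nothing is destroyed a second time
  return Destroyed;
}
```

Applied to the death test, the statement under `EXPECT_DEATH` would become something like `C.reset()` on a `std::unique_ptr<DataExtractor::Cursor>`, with later checks going through `C->`.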

		TEST(DataExtractorTest, getU8_vector) {
		DataExtractor DE(StringRef("AB"), false, 8);
		DataExtractor::Cursor C(0);
		SmallString<2> S;

		DE.getU8(C, S, 4);
		EXPECT_THAT_ERROR(C.takeError(), Failed());
		EXPECT_EQ("", S);

		DE.getU8(C, S, 2);
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());
		EXPECT_EQ("AB", S);
		}

		TEST(DataExtractorTest, skip) {
		DataExtractor DE(StringRef("AB"), false, 8);
		DataExtractor::Cursor C(0);

		DE.skip(C, 4);
		EXPECT_THAT_ERROR(C.takeError(), Failed());
		EXPECT_EQ(0u, C.tell());

		DE.skip(C, 2);
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());
		EXPECT_EQ(2u, C.tell());
		}

		TEST(DataExtractorTest, eof) {
		DataExtractor DE(StringRef("A"), false, 8);
		DataExtractor::Cursor C(0);

		EXPECT_FALSE(DE.eof(C));

		EXPECT_EQ(0, DE.getU16(C));
		EXPECT_FALSE(DE.eof(C));
		EXPECT_THAT_ERROR(C.takeError(), Failed());
		probinson: Test for eof() in the error case? (Attempt to read too far, but eof is still false, if that's the correct contract.)

		EXPECT_EQ('A', DE.getU8(C));
		EXPECT_TRUE(DE.eof(C));
		EXPECT_THAT_ERROR(C.takeError(), Succeeded());
		}
}		}

Revision Contents

Diff 216600
