This is an archive of the discontinued LLVM Phabricator instance.

[XRay] Improve error reporting when loading traces
ClosedPublic

Authored by dberris on Aug 1 2018, 10:01 PM.

Details

Summary

This change uses a single offset pointer throughout the implementation of
the individual record parsers. This allows us to report where in a trace
file parsing failed.

We're still in an intermediate step here as we prepare to refactor this
further into a set of types and use object-oriented design principles
for a cleaner implementation. The next steps will be to allow us to
parse/dump files in a streaming fashion and incrementally build up the
structures in memory instead of the current all-or-nothing approach.

Diff Detail

Repository
rL LLVM

Event Timeline

dberris created this revision. Aug 1 2018, 10:01 PM
kpw added inline comments. Aug 5 2018, 2:43 PM
llvm/lib/XRay/Trace.cpp
71 ↗(On Diff #158682)

Missing error check here.

113 ↗(On Diff #158682)

It would be simpler to check IsValidOffset(OffsetPtr + 31) at the start of the loop. If there are incomplete records, we can identify, diagnose, and handle them here.

Since the condition for the offset not being updated is "the offset is out of bounds or there are not enough bytes to extract this value" and we know the size of the record, we can check once per record (and check before reading custom event payloads).

Granted, this relies on logic errors not causing reads past the expected 32 bytes, but interspersing all of these error checks on every read obscures the logic imho.
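
A rough sketch of the kind of per-record check I mean (kFDRRecordSize and the surrounding names are stand-ins, not necessarily what this diff uses):

// At the top of each loop iteration, make sure a whole 32-byte record is
// available before decoding any of its fields.
constexpr uint32_t kFDRRecordSize = 32;
if (!RecordExtractor.isValidOffset(OffsetPtr + kFDRRecordSize - 1))
  return createStringError(
      std::make_error_code(std::errc::executable_format_error),
      "Incomplete record at offset %u.", OffsetPtr);
// ...the individual getU8/getU16/getU64 calls can then run without
// re-checking bounds on every read.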

244 ↗(On Diff #158682)

If you move the pre-read offset check to the beginning of the loop, we could add an assert here that OffsetPtr ends up 32 bytes ahead, to keep the implementation honest.

325 ↗(On Diff #158682)

Asserts at the end of these blocks would be good, to check that OffsetPtr has advanced by the body size as expected.
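
Something like the following at the end of each block, with PreReadOffset captured before the first read (the body-size constant is a placeholder):

assert(OffsetPtr == PreReadOffset + kMetadataBodySize &&
       "record parser did not advance OffsetPtr by the expected body size");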

478 ↗(On Diff #158682)

If the source of the error is that the event size exceeds the actual buffer, it would be more informative to check that here rather than getting an error while trying to read the next record.
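
For instance (EventSize standing in for whatever the decoded size field is called):

// After decoding the custom event's size, make sure the payload actually
// fits in the remaining buffer before trying to read it.
if (!RecordExtractor.isValidOffsetForDataOfSize(OffsetPtr, EventSize))
  return createStringError(
      std::make_error_code(std::errc::executable_format_error),
      "Custom event size %u at offset %u exceeds the buffer.", EventSize,
      OffsetPtr);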

549 ↗(On Diff #158682)

In my opinion this deserves an explanation, but you've deleted the comment.

635–637 ↗(On Diff #158682)

Do you really believe this combination of auto and decltype is more clear?

646–649 ↗(On Diff #158682)

While I think this is an improvement, it runs counter to the NFC assertion, no?

793–801 ↗(On Diff #158682)

Consider that, apart from custom events, we know to expect either kFDRMetadataBodySize more valid bytes or (sizeof(FunctionRecord) - 1) == 7 more valid bytes per record. We could check that here perfectly well and avoid relying on so many different places to correctly update OffsetPtr.
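
Roughly (IsMetadata, kFDRMetadataBodySize, and FunctionRecord stand in for whatever the surrounding code calls these):

uint32_t BodySize = IsMetadata ? kFDRMetadataBodySize
                               : sizeof(FunctionRecord) - 1; // == 7
if (!RecordExtractor.isValidOffsetForDataOfSize(OffsetPtr, BodySize))
  return createStringError(
      std::make_error_code(std::errc::executable_format_error),
      "Incomplete record body at offset %u.", OffsetPtr);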

dberris updated this revision to Diff 159247. Aug 5 2018, 11:22 PM
dberris marked 9 inline comments as done.
  • fixup: address comments by @kpw
dberris added inline comments. Aug 5 2018, 11:22 PM
llvm/lib/XRay/Trace.cpp
244 ↗(On Diff #158682)

I've explicitly added a check at the beginning of the loop to see whether we can load enough bytes for a full record, instead.

635–637 ↗(On Diff #158682)

Yes, because this avoids having to guess what the types are (and lets the compiler decide them). In particular, it doesn't force narrowing/widening conversions on the values resulting from the arithmetic operations.
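
A contrived illustration of the pattern (these field names are made up, not the ones in this diff):

// The type of the destination field drives the conversion, so there is no
// hand-written uint32_t/uint64_t that could silently narrow or widen.
auto Delta = RecordExtractor.getU32(&OffsetPtr);
State.BaseTSC += static_cast<decltype(State.BaseTSC)>(Delta);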

646–649 ↗(On Diff #158682)

Right, I'll update the title.

793–801 ↗(On Diff #158682)

I thought about that, but it makes refactoring this code much harder (which I'm actually doing after this change, in my local repo). Having one central point that knows the sizes of all the records actually makes it harder to reason about the state of the parsing. This single change pushes the logic for knowing which parts of a record are padding, which parts are important, etc. down to where each record is parsed.

dberris retitled this revision from [XRay] Improve error reporting when loading traces (NFC) to [XRay] Improve error reporting when loading traces. Aug 5 2018, 11:24 PM
kpw accepted this revision. Edited Aug 6 2018, 1:19 AM

Consider creating wrapper functions around DataExtractor's getU64, getU32, getU16, and getU8 that take a DataExtractor&, an Offset&, and a descriptive Twine, and return an Expected<T>. I'm trying to come up with ideas that keep all the granular checks, as you desire, but pull some of the ceremonial noise out of the flow of the main logic.

The use would be something like:

auto result = expectU16(RecordExtractor, OffsetPtr, "cpu id");
if (auto err = result.takeError()) return err;
State.CPUId = *result;

The implementation (you'd have to duplicate functions for a few different types) would be something like:

Expected<uint16_t> expectU16(DataExtractor &RecordExtractor,
                             uint32_t &OffsetPtr,
                             const Twine &FieldDescriptor) {
  auto PreReadOffset = OffsetPtr;
  auto Result = RecordExtractor.getU16(&OffsetPtr);
  if (OffsetPtr == PreReadOffset)
    return createStringError(
        std::make_error_code(std::errc::executable_format_error),
        "Failed reading %s at offset %u.", FieldDescriptor.str().c_str(),
        OffsetPtr);
  return Result;
}

I'm not sure this would amount to much less code, but it would move boilerplate out of "the way" and I claim that readers (or future you or I) would be able to follow the core state transitions more easily.

Edit: You could even get away with defining it once as a template function, where you dispatch to specialized wrappers on RecordExtractor:

template<typename T>
auto extract(DataExtractor& extractor, uint32_t& OffsetPtr) -> T; 
// Can you declare with the auto syntax like this without defining?
// Does it even need to match the return type?
 
template<>
uint8_t extract<uint8_t>(DataExtractor& extractor, uint32_t& OffsetPtr) {
   return extractor.getU8(&OffsetPtr);
}

...

llvm/lib/XRay/Trace.cpp
121–124 ↗(On Diff #159247)

This check makes the other PreReadOffset error checks in this loop defensive coding at best, but for the moment dead code. They aren't hurting anything by being there, and are a defense against future changes, so that's OK if you really want them.

This revision is now accepted and ready to land. Aug 6 2018, 1:19 AM

Thanks. I considered doing something like that, but it turns out a more principled refactoring reduces the need to wrap the DataExtractor API in the first place. It also turns out that we can make this simpler by extracting the "parsing" and validation logic out from the loading/reading logic, and operating at a higher level. I'm close to getting these parts rewritten, at which point I may be able to do something like you're suggesting, but at a more fundamental level.
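
One possible shape (the names here are provisional, just to give an idea of the direction):

// Each record type becomes its own class; a visitor interface separates the
// byte-level loading from per-record parsing and validation.
class RecordVisitor {
public:
  virtual ~RecordVisitor() = default;
  virtual Error visit(BufferExtents &) = 0;
  virtual Error visit(FunctionRecord &) = 0;
  // ...one overload per record type.
};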

This revision was automatically updated to reflect the committed changes.