Page MenuHomePhabricator

[Profile] Handle invalid profile data

Authored by aeubanks on Jun 10 2021, 11:05 AM.



This mostly follows LLVM's InstrProfReader.cpp error handling.
Previously, attempting to merge corrupted profile data would result in
crashes. See

Diff Detail

Event Timeline

aeubanks requested review of this revision.Jun 10 2021, 11:05 AM
aeubanks created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2021, 11:05 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
aeubanks updated this revision to Diff 351222.Jun 10 2021, 11:08 AM


trying to come up with a test case

rnk added a comment.Jun 10 2021, 11:41 AM

I think it's worth the time to carefully refactor this code so we don't end up in a similar situation, maybe in the value profile data.


This is a bit of an ABI backwards compatibility concern, but I think adding a return type should be safe enough.


The joys of not having RAII...


I'm having a hard time reasoning about this code. My favorite code pattern for safely reading binary data in C++ is the "consume" pattern, illustrated in the consume* APIs in RecordSerialization.h:

Unfortunately, I don't think it translate to C very well. I guess the main recommendation I have would be to keep everything in bytes for as long as possible, and delay casting until after bounds checking.


I think we should check SrcValueProfData + SrcValueProfData->TotalSize against ProfileSize while we're here. The NVK value seems like it's not really needed, it just checks for the absence of value data.

aeubanks updated this revision to Diff 351258.Jun 10 2021, 2:05 PM

check SrcValueProfData

aeubanks updated this revision to Diff 351267.Jun 10 2021, 2:33 PM
aeubanks marked an inline comment as done.

do more pointer comparisons
still working on a test

rnk added a comment.Jun 10 2021, 2:40 PM

Sorry about all the nits, bounds checking is all about attention to detail.


size_t is unsigned, so this check is probably always false, and will probably trigger a warning in some build configurations.


size_t is unsigned, so underflow is possible here. The < 0 check below doesn't catch anything. I think that's fine: any value in the range [0, MaxNumCounters) is OK.


This is really nitty, but I think the overflow-safe way to do this addition and comparison is to use subtraction. We know both values are less than MaxNumCounters, so either subtraction is safe: NC >= MaxNumCounters - SrcCountersOffset

Also, should these be >=? I think the final condition should be:

if (SrcCountersOffset >= MaxNumCounters ||
    NC >= MaxNumCounters - SrcCountersOffset)
  return 1;

Hopefully I did that right.


nit: this variable is used once in the loop, maybe inline the addition into the loop, save a line (SrcCountersStart[SrcCountersOffset + I]). It makes it clearer that we are notionally indexing into the SrcCountersStart array, which is of length MaxNumCounters.


clang-tidy suggests a pointer cast here. Honestly, we'll have fewer casts if we change SrcValueProfData to be a const char * and cast when passing into VPMergeHook.

sorry, I think I updated while you were reviewing and some of your comments are no longer relevant

aeubanks updated this revision to Diff 351282.Jun 10 2021, 3:32 PM

add test
make SrcValueProfData a char*

there are a couple ways to handle the invalid profile

  1. remove it and bail
  2. remove it and overwrite with the current profile
  3. leave it and don't write anything more
  1. seems the least useful, 3) seems the most useful for diagnosing issues, 2) is the most "keep going in the face of failures"

perhaps we should go with 3)?

rnk accepted this revision.Jun 10 2021, 3:51 PM
rnk added subscribers: xur, vsk, davidxl.


This fixes an urgent crash bug, so let's go ahead and land it.

+@davidxl @vsk @xur, feel free to provide post commit review.


I'm not sure this will reliably trigger the corruption check on big endian platforms. I'd recommend memsetting 8 bytes to 0xAB to reduce the likelihood of buildbot emails.

I also worry about 32-bit buildbots, but I think it's fine.

This revision is now accepted and ready to land.Jun 10 2021, 3:51 PM
aeubanks updated this revision to Diff 351286.Jun 10 2021, 3:54 PM

set all 8 bytes in test

rnk added a comment.Jun 10 2021, 3:55 PM

I agree, it's a choice between 2 and 3. 3 would be consistent with what happens when the existing profraw file is from another binary, right? Maybe do that? It risks losing coverage data, and means the corrupt file is persisted. I guess that's best, since it will cause llvm-profdata merge to fail later.

aeubanks updated this revision to Diff 351288.Jun 10 2021, 4:01 PM

keep old profile around, add test for that

This revision was landed with ongoing or failed builds.Jun 10 2021, 4:12 PM
This revision was automatically updated to reflect the committed changes.
davidxl added inline comments.Jun 10 2021, 4:19 PM

Is the second check redundant ? It is already checked outside the loop.

aeubanks added inline comments.Jun 10 2021, 4:25 PM

yes, fixed in b73742bc