This is an archive of the discontinued LLVM Phabricator instance.

[InstrProf] Tighten a check for malformed data records in raw profiles
ClosedPublic

Authored by vsk on Aug 29 2019, 3:41 PM.

Details

Summary

The check needs to validate a counter offset before performing pointer
arithmetic with the (potentially corrupt) offset.

Found by UBSan's pointer overflow check.

rdar://54843625

Diff Detail

Repository
rL LLVM

Event Timeline

vsk created this revision.Aug 29 2019, 3:41 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2019, 3:41 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
vsk planned changes to this revision.Aug 29 2019, 3:44 PM

This mistakenly drops validation for NumCounters...

vsk updated this revision to Diff 217997.Aug 29 2019, 4:09 PM
vsk edited the summary of this revision. (Show Details)
  • Bring back validation of NumCounters.
davidxl added inline comments.Aug 29 2019, 7:31 PM
llvm/include/llvm/ProfileData/InstrProfReader.h
271 ↗(On Diff #217997)

add a documentation " Returns the double word offset of a counter to the start of the counter segment'

llvm/lib/ProfileData/InstrProfReader.cpp
419 ↗(On Diff #217997)

how does the corruption happen?

423 ↗(On Diff #217997)

should it assert to be zero?

Dor1s accepted this revision.Aug 29 2019, 9:24 PM

Pardon my ignorance, but what does rdar://54843625 mean? I guess it's not http://openradar.appspot.com/54843625 ? Is it something I can access? :)

llvm/include/llvm/ProfileData/InstrProfReader.h
271 ↗(On Diff #217997)

is uint64_t a "double word"?

This revision is now accepted and ready to land.Aug 29 2019, 9:24 PM
davidxl added inline comments.Aug 29 2019, 9:43 PM
llvm/lib/ProfileData/InstrProfReader.cpp
423 ↗(On Diff #217997)

Discard this comment. I read it wrong.

vsk planned changes to this revision.Aug 30 2019, 8:55 AM

Thanks for your feedback. I will address review comments and continue investigating on Tuesday.

llvm/lib/ProfileData/InstrProfReader.cpp
419 ↗(On Diff #217997)

Not yet sure, this needs more investigation.

The corruption happens on the WIP lldb code coverage bot [1]. The binary is debugserver. The failure reason is that a CounterPtr field within a InstrProfData record is incorrect. I am not sure how a merge failure could cause this, because the on-line profile merging does not mutate the CounterPtr field. It's possible that debugserver was miscompiled.

My debugging notes:

  • Symtab contains _ZN17DNBBreakpointList31FindBreakpointsThatOverlapRange, must be debugserver
  • std::__1::allocator<char>::allocator() has the bad counter pointer
  • (lldb) p RawCounts.data()
  • (const unsigned long long *) $1 = 0x00000001022b14e8
  • (lldb) p NumCounters
  • (uint32_t) $2 = 1
  • (lldb) p NamesStartAsCounter
  • (const unsigned long long *) $3 = 0x0000000100e693d0
  • TestLldbGdbServer.py times out, then the test harness kills debugserver. Possible cause?

[1] http://lab.llvm.org:8080/green/view/Experimental/job/coverage/20/consoleText

vsk added a comment.Aug 30 2019, 9:07 AM

Pardon my ignorance, but what does rdar://54843625 mean? I guess it's not http://openradar.appspot.com/54843625 ? Is it something I can access? :)

This is an Apple-internal bug identifier that tracks this change.

vsk marked an inline comment as done.Sep 3 2019, 9:12 AM
vsk added a subscriber: pete.
vsk added inline comments.
llvm/lib/ProfileData/InstrProfReader.cpp
419 ↗(On Diff #217997)

+ @pete for dyld expertise -- for context, this is about malformed profile data seen when generating code coverage, and I suspect dyld might play a role.

Some more debugging notes:

I modified the raw profile reader to create an InstrProfRecord with raw-counts = {999999, 999999} when it finds a malformed CounterPtr. Feeding in the malformed profile, I see that the first 162 records are well-formed, but that the last 6662 records have an incorrect CounterPtr field. Here is the last valid record and the first invalid record:

std::__1::__compressed_pair_elem<std::__1::allocator<char>, 1, true>::__compressed_pair_elem():
  Hash: 0x0000000000000000
  Counters: 1
  Function count: 95500
  Block counts: []
std::__1::allocator<char>::allocator():
  Hash: 0x0000000000000000
  Counters: 2
  Function count: 999999
  Block counts: [999999]

Substituting fake '999999' counts for the raw counts in malformed records is enough to allow the profile to be parsed successfully. Note that all of the names are available, which means that the names section (which follows the data section) was written correctly. Further, printing out the computed CounterOffset field for malformed records, I see that they are not random, but instead look monotonic:

CounterOffset: 28c286  NumCounters: 1
CounterOffset: 28c287  NumCounters: 3
CounterOffset: 28c28a  NumCounters: 1
CounterOffset: 28c28b  NumCounters: 1
CounterOffset: 28c28c  NumCounters: 4

I cannot reproduce this issue in local testing with debugserver. Based on all of this, a miscompile seems unlikely, as does a write failure.

It looks as if the CounterPtr fields within the DATA segment were all rebased by the dynamic loader (dyld). On Darwin, it appears that the data in llvm_prf_cnts is emitted linkonce_odr. Perhaps the debugserver process dlopen()'d an instrumented library which contained conflicting definitions of some functions: dyld would then have to pick one out of multiple candidate definitions, and then rewrite pointers to the old definitions. @pete does that sound plausible?

pete added inline comments.Sep 3 2019, 9:20 AM
llvm/lib/ProfileData/InstrProfReader.cpp
419 ↗(On Diff #217997)

It's possible yeah. I don't remember a relationship between link once ODR and weak definition coalescing, but the two could be used together if you wanted.

You can see whether these symbols would be coalesced by dyld if they are "weak binds" with (for example on libc++.1.dylib)

llvm-objdump -macho -weak-bind /usr/lib/libc++.1.dylib

davidxl accepted this revision.Sep 3 2019, 9:36 AM

lgtm

vsk marked an inline comment as done.Sep 3 2019, 9:55 AM
vsk added inline comments.
llvm/lib/ProfileData/InstrProfReader.cpp
419 ↗(On Diff #217997)

Thanks Pete. I don't think dyld is involved here. The function records with "rebased" CounterPtr fields aren't in the weak bind table.

This revision was not accepted when it landed; it landed in state Changes Planned.Sep 3 2019, 3:22 PM
This revision was automatically updated to reflect the committed changes.
w2yehia added a subscriber: w2yehia.Oct 9 2019, 4:23 AM

Hi @vsk can you provide a description/script on how to recreate the malformed-ptr-to-counter-array.profraw file when someone is changing the profile layout (for example by adding new value profiling kinds).
I'm thinking something like llvm/test/tools/llvm-profdata/raw-two-profiles.test would be nice
Thanks.

vsk added a comment.Oct 9 2019, 11:19 AM

Hi @vsk can you provide a description/script on how to recreate the malformed-ptr-to-counter-array.profraw file when someone is changing the profile layout (for example by adding new value profiling kinds).
I'm thinking something like llvm/test/tools/llvm-profdata/raw-two-profiles.test would be nice
Thanks.

Hi @w2yehia, I think the test needs to rewritten. PTAL at D68718.