This is an archive of the discontinued LLVM Phabricator instance.

[llvm-profdata] Make profile reader behavior consistent when encountering malformed profiles
AbandonedPublic

Authored by huangjd on Apr 12 2023, 7:10 PM.

Details

Summary

Changes to the profile reader behavior (previously the behavior is bugged when encountering such scenario);

  1. For extbinary format, when reading a second name table, the previous one is cleared (previously causes array out of bound)
  1. Check if profile mixes MD5 and nonMD5 functions

Diff Detail

Event Timeline

huangjd created this revision.Apr 12 2023, 7:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2023, 7:10 PM
huangjd requested review of this revision.Apr 12 2023, 7:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2023, 7:10 PM
davidxl added inline comments.Apr 13 2023, 10:30 AM
llvm/lib/ProfileData/SampleProfReader.cpp
745–746

why changing FixedLengthMD5 into a local variable?

1314

Should the code be guarded with #ifdef NDEBUG?

llvm/test/tools/llvm-profdata/sample-nametable.test
2

Are the test cases (profile data) ill-formed data or valid data? If valid, is it possible to check in the text format and convert into binary format in the testing itself?

5

Should this succeed instead? I assume without the fix it will crash?

8

Are there any expected strings?

huangjd added inline comments.Apr 14 2023, 4:49 PM
llvm/lib/ProfileData/SampleProfReader.cpp
745–746

The profile reader does not need to keep this info. It only matters while reading the section. In the future we plan to support having multiple name tables possibly with different flags set, so the reader should minimize keeping any states as a result of reading a section.

huangjd added inline comments.Apr 14 2023, 5:19 PM
llvm/test/tools/llvm-profdata/sample-nametable.test
2

They are all intentionally constructed malformed data and cannot be represented in text form, since the principle is not to trust user input, so the compiler should safeguard against such cases

5

The design decision (which I have previously asked for clarification and got no response https://discourse.llvm.org/t/llvm-profdata-what-behavior-was-intended-if-multiple-name-tables-are-present-in-an-extensible-sample-profile/69145) here is to make the reader clear a previously read name table when reading a new one. In this case the second name table only have one entry, so the function sample, being read after reading the second name table, has a string index of 1, which is out of bounds, so it should be rejected. Before the patch, it appends the second name table on the first one, while changing the fixed lenght MD5 base address to the second table, and index (out of bounds) into the sample's section, reading garbage data instead of crashing

8

This one is similar to the previous test, also out of bounds when indexing into the second table. In the original code it crashes because the fixed length MD5 base address is not set correctly

huangjd updated this revision to Diff 513825.Apr 14 2023, 5:32 PM

Add ifndef guard to debug block

huangjd marked an inline comment as done.Apr 14 2023, 5:40 PM
wenlei added inline comments.Apr 25 2023, 5:16 PM
llvm/lib/ProfileData/SampleProfReader.cpp
743–744

Why is assertion removed? Is it no longer the case?

wenlei added inline comments.Apr 25 2023, 5:35 PM
llvm/lib/ProfileData/SampleProfReader.cpp
557–558

What is the bogus End used for? Defend against call site that didn't check error and tries to continue reading?

huangjd added inline comments.Apr 26 2023, 12:14 PM
llvm/lib/ProfileData/SampleProfReader.cpp
557–558

Even though the section header of the name table should appear before the section header of the function offset table or profiles, the actual section data can be placed in arbitrary order. If the name table section is placed after the profile data section, when a string is read, End still points to the end of the profile data section which is before Data here, and readUnencodedNumber fails even if the name table index is valid.

Since we already checked a fixed length MD5 name table contains actually enough entries when reading the name table, we don't need to check for bounds here again, so setting End to max int so that readUnencodedNumber never fails bounds check

743–744

In release mode assert is ignored, and test case 2 causes a bug with a flag of FixedLengthMD5 & ~UseMD5 . I fixed the logic so that even in this case the logic is correct, so the check is no longer needed.

wenlei added inline comments.Apr 27 2023, 6:38 PM
llvm/lib/ProfileData/SampleProfReader.cpp
557–558

I think it's better to keep the invariant that End >= Data everywhere instead trying to get around it. To me, the problem here is the caller of readUnencodedNumber only set Data, but not End which leave the two in an inconsistent state. Can we also set/save/restore End there, and avoid the trick here to bypass the check?

const uint8_t *SavedData = Data;
Data = MD5NameMemStart + ((*Idx) * sizeof(uint64_t));
auto FID = readUnencodedNumber<uint64_t>();
743–744

Perhaps we should fix the logic/testcase that produced the situation FixedLengthMD5 is true while UseMD5 is false, instead of removing the assert and tolerate such cases.

In this function below, it also assumed IsMD5 needs to be true if FixedLengthMD5. I think it's reasonable to keep the assertion.

SampleProfileReaderExtBinaryBase::readNameTableSec(bool IsMD5,
                                                   bool FixedLengthMD5)
  if (!IsMD5)
    return SampleProfileReaderBinary::readNameTable();
...
  if (FixedLengthMD5)
huangjd abandoned this revision.May 5 2023, 5:25 PM