Some tests with binary IDs would fail with the error "no profile can be merged". This is because raw profiles could have unaligned headers when binary IDs are emitted, so padding must be emitted after the binary IDs to keep everything that follows aligned. This patch adds padding after each binary ID so that the next binary ID's size field is 8-byte aligned. It also adds extra checks to ensure we aren't reading corrupted data when printing binary IDs.
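The per-ID padding rule can be sketched in C as follows. This is a minimal illustration, not the compiler-rt writer. The helper names `padding_for` and `write_binary_id` are hypothetical, the buffer is assumed to be large enough, and the size field is written in host byte order. Each record is laid out as a 64-bit size, then the ID bytes, then zero bytes up to the next multiple of 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of zero bytes needed to round len up to a multiple of 8. */
static uint64_t padding_for(uint64_t len) {
  return (8 - (len % 8)) % 8;
}

/* Hypothetical writer: appends one binary ID record at byte offset off and
 * returns the offset just past it. Record layout (an assumption of this
 * sketch): [uint64_t size, host byte order][id bytes][zero padding]. */
static size_t write_binary_id(uint8_t *buf, size_t off, const uint8_t *id,
                              uint64_t len) {
  /* Each record must start aligned so that its size field is aligned. */
  assert(off % 8 == 0);
  memcpy(buf + off, &len, sizeof(len));
  off += sizeof(len);
  memcpy(buf + off, id, (size_t)len);
  off += (size_t)len;
  uint64_t pad = padding_for(len);
  memset(buf + off, 0, (size_t)pad);
  off += (size_t)pad;
  /* The next record's size field now starts on an 8-byte boundary. */
  return off;
}
```

Writing a 20-byte ID followed by a 16-byte ID places the records at offsets 0 and 32 and yields a 56-byte section. Without the 4 padding bytes after the first ID, the second size field would start at the misaligned offset 28.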
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/ProfileData/InstrProfReader.cpp
Line 548: The following code is still not robust to invalid BinaryIdLen values.
Line 558: What is the packing protocol? It seems wise to pad after *each* ID to make the next size field naturally aligned, rather than having the second size field be misaligned if the first build ID length is not a multiple of 8.
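The difference between the two packing options can be made concrete with a small offset calculation. The sketch below is hypothetical (`pad8` and `layout` are illustrative names, not LLVM code) and assumes a `[uint64_t size][ID bytes]` record shape. It records the offset of each size field under per-ID padding and under padding only at the end of the section, and returns the total section size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero bytes needed to round n up to a multiple of 8. */
static uint64_t pad8(uint64_t n) { return (8 - (n % 8)) % 8; }

/* Hypothetical layout calculator. pad_each_id != 0 pads after every ID;
 * pad_each_id == 0 packs IDs back to back and pads once at the end.
 * Stores the offset of each size field and returns the section size. */
static uint64_t layout(const uint64_t *lens, size_t n, int pad_each_id,
                       uint64_t *size_field_offsets) {
  uint64_t off = 0;
  for (size_t i = 0; i < n; ++i) {
    size_field_offsets[i] = off;
    off += sizeof(uint64_t) + lens[i];
    if (pad_each_id)
      off += pad8(lens[i]);
  }
  if (!pad_each_id)
    off += pad8(off);
  /* Either way the section as a whole ends on an 8-byte boundary. */
  assert(off % 8 == 0);
  return off;
}
```

For two 20-byte IDs, per-ID padding puts the second size field at offset 32 in a 64-byte section. Section-only padding puts it at the misaligned offset 28 in a 56-byte section. The cost of per-ID padding is at most 7 bytes per ID.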
D110188, which originally added the padding, was reverted. I'm merging that change into this one since the two are tightly coupled. Rather than adding padding after all the build IDs, we now add padding after *each* build ID so that the build ID size fields are naturally aligned.
llvm/lib/ProfileData/InstrProfReader.cpp
Line 558: Right now there's no alignment; the size is simply written immediately after the previous build ID. I updated __llvm_write_build_ids to follow this suggestion and add padding after each ID. Since there is padding between each ID, this should also remove the need for separate padding after all the build IDs. I'm wondering, though, whether padding after each ID leads to some bloat compared to padding just once.
llvm/lib/ProfileData/InstrProfReader.cpp
Line 558: Oh, I may have misunderstood at first. In terms of packing, the only requirement seems to be that each header be 8-byte aligned (https://github.com/llvm/llvm-project/blob/dcadd64986b8a84dc244d4e7faa848fb4c18cea6/llvm/lib/ProfileData/InstrProfReader.cpp#L337). I believe all the other sections, like counters, data, and names, are aligned as a whole, with padding after each section to make sure the next one is aligned. So if we want to follow suit, we could pad only after the binary IDs section, unless padding between IDs is more desirable.
llvm/lib/ProfileData/InstrProfReader.cpp
Line 558: The reason that other sections are aligned is so that normal, aligned word access can be used on their fields. The same logic applies to the size field in the BinaryIds section. There is a) no reason to expect multiple IDs in raw profiles in practice, and b) no reason to worry about a few bytes per build ID relative to the overall size of the data. IMHO it is far more important to maintain universal invariants, such as keeping every field with a natural alignment requirement aligned in the data, so that code doesn't need to use memcpy to access them safely.
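The contrast between aligned word access and memcpy can be illustrated with two hypothetical helpers (neither is LLVM code). If the writer guarantees that every size field starts at an 8-byte-aligned offset, the section can be read as an array of 64-bit words. If it does not, a portable reader must copy the bytes out with memcpy, because dereferencing a misaligned `uint64_t` pointer is undefined behavior in C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Aligned case: the section is viewed as 64-bit words, so a size field at an
 * 8-byte-aligned byte offset is just an ordinary array element. */
static uint64_t read_size_aligned(const uint64_t *words, uint64_t byte_off) {
  assert(byte_off % sizeof(uint64_t) == 0); /* the writer's invariant */
  return words[byte_off / sizeof(uint64_t)];
}

/* Unaligned case: a plain load from an arbitrary byte offset is not safe,
 * so the bytes are copied into a properly aligned local instead. */
static uint64_t read_size_unaligned(const void *base, uint64_t byte_off) {
  uint64_t v;
  memcpy(&v, (const unsigned char *)base + byte_off, sizeof(v));
  return v;
}
```

Keeping the invariant lets every reader use the first form. Dropping it forces every reader to use the second form.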
LGTM. The only question is whether these tests should instead live in compiler-rt/test/profile.
I think llvm is the better place, since these tests check that the reader operates properly on some profdata rather than checking what the profdata itself contains. That said, I do think there should also be a compiler-rt test that checks that the profile runtime generates the correct data. I'll add one shortly.
Added a compiler-rt test to check for padding. I'll leave it up for a bit in case others have comments on the test. Otherwise I'll commit this either by EOD or tomorrow.
The following code is still not robust to invalid BinaryIdLen values.
It needs to compare against Remaining (accounting for the size field just read) before using the value in arithmetic to avoid overflow risks. Computing a pointer from untrusted input and then comparing it to another pointer is never robust.
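The suggested check ordering can be sketched as follows. This is an illustrative C version, not the patch's C++ reader. `validate_binary_ids` is a hypothetical name, and the sketch assumes a `[uint64_t size][ID bytes][zero padding to 8 bytes]` layout. Every bound is checked by comparing a length against the bytes remaining, after subtracting the size field just read, and before that length is used in arithmetic or to form a pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical validator: walks a binary IDs section and counts the records.
 * Returns false on truncated or corrupted input. */
static bool validate_binary_ids(const uint8_t *data, size_t size,
                                size_t *num_ids) {
  size_t pos = 0;
  *num_ids = 0;
  while (pos < size) {
    size_t remaining = size - pos;
    if (remaining < sizeof(uint64_t))
      return false; /* Not enough bytes left for a size field. */
    uint64_t len;
    /* memcpy keeps the sketch independent of the buffer's alignment. */
    memcpy(&len, data + pos, sizeof(len));
    pos += sizeof(len);
    remaining -= sizeof(len); /* Account for the size field just read. */
    /* Compare against what is left before doing any arithmetic with len. */
    if (len > remaining)
      return false;
    uint64_t pad = (8 - (len % 8)) % 8;
    if (pad > remaining - len)
      return false; /* Padding would run past the end of the section. */
    /* len + pad <= remaining here, so this cannot overflow or overrun. */
    pos += (size_t)(len + pad);
    ++*num_ids;
    assert(pos % 8 == 0); /* Every record ends on an 8-byte boundary. */
  }
  return true;
}
```

A record whose declared length exceeds the remaining bytes, or whose padding would run past the end of the buffer, is rejected instead of being dereferenced. No pointer is ever computed from an unchecked length.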