This is an archive of the discontinued LLVM Phabricator instance.

[llvm][profile] Add padding after binary IDs
ClosedPublic

Authored by leonardchan on Sep 23 2021, 2:17 PM.

Details

Summary

Some tests with binary IDs would fail with error: no profile can be merged. This is because raw profiles could have unaligned headers when emitting binary IDs. This means padding should be emitted after binary IDs are emitted to ensure everything else is aligned. This patch adds padding after each binary ID to ensure the next binary ID size is 8-byte aligned. This also adds extra checks to ensure we aren't reading corrupted data when printing binary IDs.

Diff Detail

Event Timeline

leonardchan created this revision.Sep 23 2021, 2:17 PM
leonardchan requested review of this revision.Sep 23 2021, 2:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 2:17 PM
mcgrathr added inline comments.Sep 23 2021, 3:23 PM
llvm/lib/ProfileData/InstrProfReader.cpp
556

The following code is still not robust to invalid BinaryIdLen values.
It needs to compare against Remaining (accounting for the size field just read) before using the value in arithmetic to avoid overflow risks. Computing a pointer from untrusted input and then comparing it to another pointer is never robust.

565–567

What is the packing protocol? It seems wise to pad after *each* ID to make the next size field naturally-aligned, rather than having the second size field be misaligned if the first build ID length is not a multiple of 8.

leonardchan marked an inline comment as done.
leonardchan retitled this revision from [llvm][profile] Do not read padding when printing build IDs to [llvm][profile] Add padding after binary IDs.
leonardchan edited the summary of this revision. (Show Details)
Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 5:12 PM
Herald added a subscriber: Restricted Project. · View Herald Transcript

D110188 which originally added the padding was reverted. I'm merging that change into this since they're both tightly coupled. Rather than adding padding after all the build IDs, we instead add padding after *each* build ID such that the build ID sizes are naturally aligned.

llvm/lib/ProfileData/InstrProfReader.cpp
565–567

Right now there's no alignment and the size just seems to be added after the previous build ID. I updated __llvm_write_build_ids to take this suggestion of adding padding after each ID. This should also prevent us from needing to add padding after all the build IDs if there's padding in between each.

I'm wondering though if maybe adding padding after each ID would lead to some bloat compared to just adding padding once.

leonardchan added inline comments.Sep 23 2021, 5:23 PM
llvm/lib/ProfileData/InstrProfReader.cpp
565–567

What is the packing protocol?

Oh, I may have misunderstood at first. In terms of packing, the only requirement seems to be that each header be 8-byte aligned (https://github.com/llvm/llvm-project/blob/dcadd64986b8a84dc244d4e7faa848fb4c18cea6/llvm/lib/ProfileData/InstrProfReader.cpp#L337). I believe all the other sections like counters, data, and names are aligned as a whole with padding coming after each section to make sure the next is aligned. So if we want to follow suite, we could stick to just padding after the binary IDs section unless padding in between IDs is more desireable.

mcgrathr added inline comments.Sep 23 2021, 6:37 PM
llvm/lib/ProfileData/InstrProfReader.cpp
565–567

The reason that other sections are aligned is so that normal, aligned word access can be used on their fields. The same logic applies to the size field in the BinaryIds section. There is a) no reason to expect multiple IDs in raw profiles in practice and b) no reason to worry about a few bytes per build ID in the context of the size of the data overall. IMHO it is far more important to maintain universal invariants like arranging that fields with natural alignment requirements are all kept aligned in the data so that code doesn't need to use memcpy to access them safely.

Added tests to generate corrupted headers that we check errors for.

phosek accepted this revision.Sep 27 2021, 4:02 PM

LGTM the only question is whether these tests shouldn't rather live in compiler-rt/test/profile?

This revision is now accepted and ready to land.Sep 27 2021, 4:02 PM

LGTM the only question is whether these tests shouldn't rather live in compiler-rt/test/profile?

I think llvm might be better since they're more for checking that the reader operates properly on some profdata rather than explicitly checking what the profdata is. That said, I do think there should be perhaps a compiler-rt test here for checking that the profile runtime generates the correct data. I'll add one shortly.

leonardchan updated this revision to Diff 375434.EditedSep 27 2021, 4:42 PM

Added a compiler-rt test to check for padding. I'll leave it up for a bit in case others have comments on the test. Otherwise I'll commit this either by EOD or tomorrow.

This revision was landed with ongoing or failed builds.Sep 28 2021, 11:51 AM
This revision was automatically updated to reflect the committed changes.