Download Raw Diff

Details

Reviewers

phosek
mcgrathr
gulfem

Commits

rGb9f547e8e511: [llvm][profile] Add padding after binary IDs

Summary

Some tests with binary IDs would fail with error: no profile can be merged. This is because raw profiles could have unaligned headers when emitting binary IDs. This means padding should be emitted after binary IDs are emitted to ensure everything else is aligned. This patch adds padding after each binary ID to ensure the next binary ID size is 8-byte aligned. This also adds extra checks to ensure we aren't reading corrupted data when printing binary IDs.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	160 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
	180 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-static-initializer.S
	170 ms	x64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-tls.S

Event Timeline

leonardchan created this revision.Sep 23 2021, 2:17 PM

Herald added a subscriber: hiraditya. · View Herald TranscriptSep 23 2021, 2:17 PM

leonardchan requested review of this revision.Sep 23 2021, 2:17 PM

Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 2:17 PM

leonardchan updated this revision to Diff 374668.Sep 23 2021, 2:18 PM

Harbormaster completed remote builds in B125437: Diff 374668.Sep 23 2021, 2:47 PM

mcgrathr added inline comments.Sep 23 2021, 3:23 PM

llvm/lib/ProfileData/InstrProfReader.cpp
548	The following code is still not robust to invalid BinaryIdLen values. It needs to compare against Remaining (accounting for the size field just read) before using the value in arithmetic to avoid overflow risks. Computing a pointer from untrusted input and then comparing it to another pointer is never robust.
558	What is the packing protocol? It seems wise to pad after each ID to make the next size field naturally-aligned, rather than having the second size field be misaligned if the first build ID length is not a multiple of 8.

leonardchan mentioned this in D110188: [compiler-rt][profile] Add padding after binary IDs.Sep 23 2021, 3:32 PM

leonardchan updated this revision to Diff 374695.Sep 23 2021, 5:12 PM

leonardchan marked an inline comment as done.

leonardchan retitled this revision from [llvm][profile] Do not read padding when printing build IDs to [llvm][profile] Add padding after binary IDs.

leonardchan edited the summary of this revision. (Show Details)

Herald added a project: Restricted Project. · View Herald TranscriptSep 23 2021, 5:12 PM

Herald added a subscriber: Restricted Project. · View Herald Transcript

D110188 which originally added the padding was reverted. I'm merging that change into this since they're both tightly coupled. Rather than adding padding after all the build IDs, we instead add padding after *each* build ID such that the build ID sizes are naturally aligned.

llvm/lib/ProfileData/InstrProfReader.cpp
558	Right now there's no alignment and the size just seems to be added after the previous build ID. I updated `__llvm_write_build_ids` to take this suggestion of adding padding after each ID. This should also prevent us from needing to add padding after all the build IDs if there's padding in between each. I'm wondering though if maybe adding padding after each ID would lead to some bloat compared to just adding padding once.

leonardchan added inline comments.Sep 23 2021, 5:23 PM

llvm/lib/ProfileData/InstrProfReader.cpp
558	What is the packing protocol? Oh, I may have misunderstood at first. In terms of packing, the only requirement seems to be that each header be 8-byte aligned (https://github.com/llvm/llvm-project/blob/dcadd64986b8a84dc244d4e7faa848fb4c18cea6/llvm/lib/ProfileData/InstrProfReader.cpp#L337). I believe all the other sections like counters, data, and names are aligned as a whole with padding coming after each section to make sure the next is aligned. So if we want to follow suite, we could stick to just padding after the binary IDs section unless padding in between IDs is more desireable.

Harbormaster completed remote builds in B125454: Diff 374695.Sep 23 2021, 5:56 PM

mcgrathr added inline comments.Sep 23 2021, 6:37 PM

llvm/lib/ProfileData/InstrProfReader.cpp
558	The reason that other sections are aligned is so that normal, aligned word access can be used on their fields. The same logic applies to the size field in the BinaryIds section. There is a) no reason to expect multiple IDs in raw profiles in practice and b) no reason to worry about a few bytes per build ID in the context of the size of the data overall. IMHO it is far more important to maintain universal invariants like arranging that fields with natural alignment requirements are all kept aligned in the data so that code doesn't need to use memcpy to access them safely.

Added tests to generate corrupted headers that we check errors for.

LGTM the only question is whether these tests shouldn't rather live in compiler-rt/test/profile?

This revision is now accepted and ready to land.Sep 27 2021, 4:02 PM

Harbormaster completed remote builds in B125992: Diff 375424.Sep 27 2021, 4:06 PM

In D110365#3026182, @phosek wrote:

LGTM the only question is whether these tests shouldn't rather live in compiler-rt/test/profile?

I think llvm might be better since they're more for checking that the reader operates properly on some profdata rather than explicitly checking what the profdata is. That said, I do think there should be perhaps a compiler-rt test here for checking that the profile runtime generates the correct data. I'll add one shortly.

Added a compiler-rt test to check for padding. I'll leave it up for a bit in case others have comments on the test. Otherwise I'll commit this either by EOD or tomorrow.

Harbormaster completed remote builds in B126001: Diff 375434.Sep 27 2021, 5:00 PM

This revision was landed with ongoing or failed builds.Sep 28 2021, 11:51 AM

Closed by commit rGb9f547e8e511: [llvm][profile] Add padding after binary IDs (authored by leonardchan). · Explain Why

This revision was automatically updated to reflect the committed changes.

leonardchan added a commit: rGb9f547e8e511: [llvm][profile] Add padding after binary IDs.

This is an archive of the discontinued LLVM Phabricator instance.

[llvm][profile] Add padding after binary IDs
ClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 374668

llvm/lib/ProfileData/InstrProfReader.cpp

This is an archive of the discontinued LLVM Phabricator instance.

[llvm][profile] Add padding after binary IDsClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 374668

llvm/lib/ProfileData/InstrProfReader.cpp

[llvm][profile] Add padding after binary IDs
ClosedPublic