This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Sort function offset table to speed up profile loading.
ClosedPublic

Authored by hoy on Aug 31 2021, 5:17 PM.

Details

Summary

With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expansive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references, std::min operations, also likely cache misses. In this change I'm presorting profiles during profile generation time to avoid sorting at compile time.

Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to non-CS build, with still a small gap in build time.

Diff Detail

Event Timeline

hoy created this revision.Aug 31 2021, 5:17 PM
hoy requested review of this revision.Aug 31 2021, 5:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2021, 5:17 PM
hoy edited the summary of this revision. (Show Details)Aug 31 2021, 5:18 PM
hoy added reviewers: wenlei, wlei, wmi.
hoy updated this revision to Diff 369823.Aug 31 2021, 5:23 PM
hoy edited the summary of this revision. (Show Details)

Updating D109036: [CSSPGO] Sort function offset table to speed up profile loading.

This effectively cut down compile time by 20% for one of our large services.

Does this gets us back to where we were? or is there still small regression?

llvm/lib/ProfileData/SampleProfReader.cpp
731–732

FName -> FContext

742

assert on the order comparing to OrderedFuncOffsets->back()?

llvm/lib/ProfileData/SampleProfWriter.cpp
44

Why do we need a flag? Can we simplify this by always order the function offset table for CSSPGO profile?

hoy edited the summary of this revision. (Show Details)Aug 31 2021, 6:21 PM
hoy added a comment.Aug 31 2021, 9:13 PM

This effectively cut down compile time by 20% for one of our large services.

Does this gets us back to where we were? or is there still small regression?

We still have a small regression compared to non-CS build, but we are getting closer.

hoy added inline comments.Aug 31 2021, 9:22 PM
llvm/lib/ProfileData/SampleProfReader.cpp
731–732

Fixed.

742

The order may not be strictly the lexical order of SampleContext for md5 profiles. Md5 profiles may be converted from a string-based profile and the order of the func offset table in the md5 profile will be based on the original string order. However, the caller-callee context order, i.e, the trie tree path order, is still fulfilled. That order is what we want to load profile based on.

llvm/lib/ProfileData/SampleProfWriter.cpp
44

This is mostly for debugging and verification of the unordered path that is kept for being compatible with legacy unordered profiles.

hoy updated this revision to Diff 369839.Aug 31 2021, 9:22 PM
hoy edited the summary of this revision. (Show Details)

Addressing Wenlei's comment.

wenlei added inline comments.Aug 31 2021, 10:39 PM
llvm/lib/ProfileData/SampleProfWriter.cpp
44

I don't think we have a backward-compatibility issue, there's no need to keep the legacy unordered profile. Can we remove it? Testing just need to make sure this ordering change does not affecting importing.

hoy added inline comments.Aug 31 2021, 11:00 PM
llvm/lib/ProfileData/SampleProfWriter.cpp
44

Makes sense. Removed the switch.

hoy updated this revision to Diff 369850.Aug 31 2021, 11:00 PM

Updating D109036: [CSSPGO] Sort function offset table to speed up profile loading.

wenlei added inline comments.Aug 31 2021, 11:08 PM
llvm/lib/ProfileData/SampleProfReader.cpp
815

When suggesting the removal of ordered switch, I was thinking about always have ordered offset table for extbin CSSPGO profile. The simplification in implementation is the main reason I think removing that switch is better.

The idea is that we don't need unnecessary flexibility, so we can make assumptions about CSSPGO profile and avoid the complexity of handling the combinations of different input. With that, we either assert SecFlagOrdered is always on for extbin CSSPGO profile, or remove SecFlagOrdered completely and make that an implicit assumption for extbin CSSPGO profile.

wenlei added inline comments.Aug 31 2021, 11:11 PM
llvm/lib/ProfileData/SampleProfWriter.cpp
176

In the comment here, it would be good to point out the need for sorting due to how importing works in prelink. Basically part of the comment from readFuncProfiles.

hoy marked an inline comment as done.Aug 31 2021, 11:19 PM
hoy added inline comments.
llvm/lib/ProfileData/SampleProfReader.cpp
815

Sounds good. This makes it cleaner.

llvm/lib/ProfileData/SampleProfWriter.cpp
176

Comment added.

hoy updated this revision to Diff 369855.Aug 31 2021, 11:19 PM
hoy marked an inline comment as done.

Addressing Wenlei's comment.

wenlei accepted this revision.Aug 31 2021, 11:34 PM

lgtm, thanks.

This revision is now accepted and ready to land.Aug 31 2021, 11:34 PM
hoy updated this revision to Diff 369956.Sep 1 2021, 9:32 AM

Rebasing

wmi accepted this revision.Sep 1 2021, 11:37 AM

LGTM.

llvm/lib/ProfileData/SampleProfReader.cpp
790

can it be a const?

791

const?

hoy added inline comments.Sep 1 2021, 12:07 PM
llvm/lib/ProfileData/SampleProfReader.cpp
790

Yes, const should work.

hoy updated this revision to Diff 370018.Sep 1 2021, 12:08 PM

Addressing Wei's comment.

This revision was landed with ongoing or failed builds.Sep 1 2021, 12:18 PM
This revision was automatically updated to reflect the committed changes.