This is an archive of the discontinued LLVM Phabricator instance.

[CSSPGO] Enable pseudo probe instrumentation in O0 mode.
ClosedPublic

Authored by hoy on Sep 9 2021, 11:42 AM.

Details

Summary

Pseudo probe instrumentation was missing from O0 build. It is needed in cases where some source files are built in O0 while the others are built in optimize mode.

Diff Detail

Event Timeline

hoy created this revision.Sep 9 2021, 11:42 AM
hoy requested review of this revision.Sep 9 2021, 11:42 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 9 2021, 11:42 AM
hoy added a subscriber: spupyrev.
wlei accepted this revision.Sep 9 2021, 12:18 PM

LGTM, thanks!

This revision is now accepted and ready to land.Sep 9 2021, 12:18 PM
wenlei added a comment.Sep 9 2021, 2:01 PM

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

hoy added a comment.Sep 9 2021, 2:20 PM

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

We probably don't care about performance for O0 build. The change is for consistency, also makes the compiler happy which otherwise will complain about "Pseudo-probe-based profile requires SampleProfileProbePass" for O0 modules that don't have probes.

wmi added a comment.Sep 9 2021, 8:14 PM

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

We probably don't care about performance for O0 build. The change is for consistency, also makes the compiler happy which otherwise will complain about "Pseudo-probe-based profile requires SampleProfileProbePass" for O0 modules that don't have probes.

The complain message is emitted in SampleProfileLoader::doInitialization. llvm will not run SampleProfileLoader pass for O0 module. Why there is the complain?

hoy added a comment.Sep 9 2021, 10:31 PM

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

We probably don't care about performance for O0 build. The change is for consistency, also makes the compiler happy which otherwise will complain about "Pseudo-probe-based profile requires SampleProfileProbePass" for O0 modules that don't have probes.

The complain message is emitted in SampleProfileLoader::doInitialization. llvm will not run SampleProfileLoader pass for O0 module. Why there is the complain?

Good question. It could happen in lto postlink which by default optimizes in -O2 mode. More specifically, with the following command, both cc1 and lld will run in default mode, which is -O0 for cc1 and -O2 for lld.

clang -flto 1.cpp -v -fuse-ld=lld

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

We probably don't care about performance for O0 build. The change is for consistency, also makes the compiler happy which otherwise will complain about "Pseudo-probe-based profile requires SampleProfileProbePass" for O0 modules that don't have probes.

The complain message is emitted in SampleProfileLoader::doInitialization. llvm will not run SampleProfileLoader pass for O0 module. Why there is the complain?

We've encountered this exception while building an old version of gcc (8.3) with llvm-12. As Hongtao pointed out, they sometimes try to build targets with "-flto" but without "-O2/O3". Surely, the right "fix" would be to modify the gcc build scripts (which is possibly already done in later gcc releases); this workaround is to make sure we can also process "incorrect" builds.

wmi added a comment.Sep 10 2021, 9:24 AM

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

We probably don't care about performance for O0 build. The change is for consistency, also makes the compiler happy which otherwise will complain about "Pseudo-probe-based profile requires SampleProfileProbePass" for O0 modules that don't have probes.

The complain message is emitted in SampleProfileLoader::doInitialization. llvm will not run SampleProfileLoader pass for O0 module. Why there is the complain?

Good question. It could happen in lto postlink which by default optimizes in -O2 mode. More specifically, with the following command, both cc1 and lld will run in default mode, which is -O0 for cc1 and -O2 for lld.

clang -flto 1.cpp -v -fuse-ld=lld

I see. It seems a problem only exposed in monolithic lto. Could you add some comment before the change in PassBuilder.cpp?

hoy added a comment.Sep 10 2021, 9:52 AM

The change makes sense given instr PGO also happens for O0. But practically, if a file is being built with O0, do we care about its profile given we're not really optimizing it anyways? Functions from O0 modules are not supposed to be inlined into O1+ modules either.

We probably don't care about performance for O0 build. The change is for consistency, also makes the compiler happy which otherwise will complain about "Pseudo-probe-based profile requires SampleProfileProbePass" for O0 modules that don't have probes.

The complain message is emitted in SampleProfileLoader::doInitialization. llvm will not run SampleProfileLoader pass for O0 module. Why there is the complain?

Good question. It could happen in lto postlink which by default optimizes in -O2 mode. More specifically, with the following command, both cc1 and lld will run in default mode, which is -O0 for cc1 and -O2 for lld.

clang -flto 1.cpp -v -fuse-ld=lld

I see. It seems a problem only exposed in monolithic lto. Could you add some comment before the change in PassBuilder.cpp?

Sounds good, comment added. Actually this can also happen in thinlto. Having probe inserted in O0 mode allows users to switch between arbitrary setups.

hoy updated this revision to Diff 371942.Sep 10 2021, 9:53 AM

Updating D109531: [CSSPGO] Enable pseudo probe instrumentation in O0 mode.

wmi accepted this revision.Sep 10 2021, 10:13 AM

LGTM.

More specifically, with the following command, both cc1 and lld will run in default mode, which is -O0 for cc1 and -O2 for lld.

clang -flto 1.cpp -v -fuse-ld=lld

I'm wondering is this the expected behavior or an oversight of pass pipeline setup? In what scenario would a O0 prelink + O2 postlink make sense?

Btw, just double check - the O2 here you mentioned is not LLD's O2 for linking, but actually postlink LLVM O2, right?

llvm/lib/Passes/PassBuilder.cpp
1930

Loading a sample profile in the postlink will require pseudo probe instrumentation in the prelink.

Even with this change, is it still possible that prelink compile for some module actually doesn't have -fpseudo-probe-for-profiling, and it's on for LTO postlink? We could contrive such case, and it could happen in reality too, right? Would we have the same problem when trying to load profile for functions from modules without pseudo-probe in prelink?

hoy added inline comments.Sep 13 2021, 9:07 AM
llvm/lib/Passes/PassBuilder.cpp
1930

Yes, it could happen. The compiler will stop working too with the current implementation. We could change the error reporting to be a warning to make that pass. It is an error because we want to remind user if that's intentional.

I think it's mostly user's responsibility to be clear if pseudo probe instrumentation is needed or not, especially when passing linker flags separately. The change being made here is to ensure it works when all flags are passed via CXX_FLAGS, such as

clang -flto 1.cpp -v -fuse-ld=lld -fpseudo-probe-for-profiling -fprofile-sample-use=....

wenlei accepted this revision.Sep 14 2021, 4:52 PM

lgtm, thanks.

This revision was landed with ongoing or failed builds.Sep 14 2021, 6:13 PM
This revision was automatically updated to reflect the committed changes.