This is an archive of the discontinued LLVM Phabricator instance.

Move the SampleProfileLoader right after EarlyFPM.
ClosedPublic

Authored by danielcdh on Aug 4 2017, 11:35 AM.

Details

Summary

SampleProfileLoader pass do need to happen after some early cleanup passes so that inlining can happen correctly inside the SampleProfileLoader pass.

Event Timeline

danielcdh created this revision.Aug 4 2017, 11:35 AM
tejohnson added inline comments.Aug 4 2017, 11:55 AM
lib/Passes/PassBuilder.cpp
548

What is the downside of always doing InstCombine here?

555

Also needs comment about why before globalopt.

858–862

Can we simply invoke PGO ICP always from within buildModuleSimplificationPipeline if we're not in the ThinLTO PreLink phase? I.e. regardless of Sample or Instrumented PGO? It seems cleaner to do it in the same place if possible.

danielcdh added inline comments.Aug 4 2017, 12:38 PM
lib/Passes/PassBuilder.cpp
548

Increase compile time?

Additionally, I'd like to make this patch NFC for other compilation models.

555

comments added, also added test to cover that.

858–862

Yes, but I'd like to make this patch NFC for other modes such as instrumentation based PGO. Added FIXME

tejohnson added inline comments.Aug 4 2017, 12:52 PM
lib/Passes/PassBuilder.cpp
561

PGO ICP is called here for both the ThinLTO backend and the non-ThinLTO sample PGO case, right? If so, perhaps the comment should be modified slightly to "We perform early indirect call promotion here, before globalopt. This is important for the ThinLTO backend phase because otherwise imported..."

test/Other/new-pm-thinlto-defaults.ll
56

Where did this one go?

danielcdh marked an inline comment as done.Aug 4 2017, 12:58 PM
danielcdh added inline comments.
test/Other/new-pm-thinlto-defaults.ll
56

This change is irrelevant. I simply did it because I'm at it.

I changed to remove PGOIndirectCallPromotion from default ThinLTO pipeline if PGO is not enabled.

danielcdh updated this revision to Diff 109806.Aug 4 2017, 12:59 PM

update comment

This revision is now accepted and ready to land.Aug 4 2017, 1:15 PM
chandlerc added inline comments.Aug 4 2017, 1:25 PM
lib/Passes/PassBuilder.cpp
555–556

Above, you say that this is because SampleProfileLoaderPass needs to be here because it inlines calls. Here you talk about debug info freshness... Which is it?

The debug info freshness makes plenty of sense to me. If SampleProfileLoaderPass is *doing inlining*, I find that deeply surprising and confusing. I'm not suggesting that we need to change the design of this right now, but I think the comment needs to be *really* clear because lots of other people will be surprised by this and I'd suggest a FIXME to revisit how this is structured. (Regardless of whether inlining makes sense here, I think it should *definitely* be separated from what appears to be a pass that merely annotates the IR from a profile.)

davidxl added inline comments.Aug 4 2017, 1:36 PM
lib/Passes/PassBuilder.cpp
555–556

Should probably add a reference to the autofdo CGO 2016 paper which has detailed description of how autofdo's context senstive profile data is organized and the profile driven inline and branch profile annotation design.

danielcdh added inline comments.Aug 4 2017, 1:50 PM
lib/Passes/PassBuilder.cpp
555–556

Yes, we do invoke InlineFunction directly in SampleProfileLoader. There is a great many details about how we end up with this design in out cgo2016 paper. And I could explain to you in person if you like.

The fundamental requirement is that we need to make the IR similar with the binary that is used for profiling, in order to annotate sample profile accurately.

To get to that goal, we have choose to first inline inside the SampleProfileLoader before annotation. During the inlining we need to make the inline decision by interacting with SampleProfile interface. By doing this inside the SampleProfileLoader, we lock the use of SampleProfile interface within this file, instead of making it escape into the inliner.

Alternatively, we could have a separate inliner pass to do this job. But that inliner pass need to read the SampleProfile too. So if we do that, we need to read the sample profile twice.

I'm open to discuss about other design alternatives too.

danielcdh updated this revision to Diff 109819.Aug 4 2017, 1:57 PM

add cgo2016 paper reference

Thanks for the reviews.

Chandler, please let me know if you have more comments about this patch. If possible, I'd like to have committed before today's integration.

Thanks,
Dehao

ping...

I think Chandler's concerns for this patch has been addressed. We can continue discussion on the AutoFDO design in a separate thread. Will submit the patch by EOD if no more concerns have been raised.

chandlerc accepted this revision.Aug 7 2017, 11:17 AM

ping...

I think Chandler's concerns for this patch has been addressed. We can continue discussion on the AutoFDO design in a separate thread. Will submit the patch by EOD if no more concerns have been raised.

Sorry for delay,

Yes, all I was looking for was a comment so that a reader would better understand what the sample profile loading is doing (IE, transforming the IR). Not trying to suggest that we can or should change behavior here, just getting it more clearly documented. Thanks for that! =D

danielcdh closed this revision.Aug 7 2017, 1:24 PM