The SampleProfileLoader pass needs to run after some early cleanup passes so that the inlining performed inside SampleProfileLoader works correctly.
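To make the ordering constraint concrete, here is a minimal, self-contained C++ sketch that models the module simplification pipeline as an ordered list of pass names and checks that the sample profile loader runs after the cleanup passes. It does not use the PassBuilder API. The function names `buildSimplificationOrder` and `runsBefore` are hypothetical, and the specific cleanup passes (SimplifyCFG, SROA, EarlyCSE) and the trailing GlobalOpt entry are assumptions, since the summary only says "some early cleanup passes".

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the module simplification pipeline as an ordered
// list of pass names. The names mirror LLVM passes, but this is not the
// PassBuilder API.
std::vector<std::string> buildSimplificationOrder(bool SamplePGO) {
  std::vector<std::string> Passes;
  // Early function-level cleanup of frontend output (assumed set of passes).
  Passes.push_back("SimplifyCFG");
  Passes.push_back("SROA");
  Passes.push_back("EarlyCSE");
  if (SamplePGO) {
    // The sample profile loader (which also inlines) is scheduled only
    // after the cleanup passes above.
    Passes.push_back("SampleProfileLoader");
  }
  // Assumed later module pass, included only to show relative ordering.
  Passes.push_back("GlobalOpt");
  return Passes;
}

// Returns true if Target is in Pipeline and every pass named in Before
// appears strictly earlier than Target.
bool runsBefore(const std::vector<std::string> &Pipeline,
                const std::vector<std::string> &Before,
                const std::string &Target) {
  auto TargetIt = std::find(Pipeline.begin(), Pipeline.end(), Target);
  if (TargetIt == Pipeline.end())
    return false;
  for (const std::string &Name : Before) {
    // Search only the prefix that precedes Target.
    if (std::find(Pipeline.begin(), TargetIt, Name) == TargetIt)
      return false;
  }
  return true;
}
```

For example, `runsBefore(buildSimplificationOrder(true), {"SROA"}, "SampleProfileLoader")` holds in this model, while the loader is absent entirely when sample PGO is off.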
Event Timeline
lib/Passes/PassBuilder.cpp | ||
---|---|---|
548 | What is the downside of always doing InstCombine here? | |
552 | Also needs a comment about why this is done before globalopt. | |
854–858 | Can we simply invoke PGO ICP always from within buildModuleSimplificationPipeline if we're not in the ThinLTO PreLink phase? I.e. regardless of Sample or Instrumented PGO? It seems cleaner to do it in the same place if possible. |
lib/Passes/PassBuilder.cpp | ||
---|---|---|
558 | PGO ICP is called here for both the ThinLTO backend and the non-ThinLTO sample PGO case, right? If so, perhaps the comment should be modified slightly to "We perform early indirect call promotion here, before globalopt. This is important for the ThinLTO backend phase because otherwise imported..." | |
test/Other/new-pm-thinlto-defaults.ll | ||
---|---|---|
56 | Where did this one go? |
test/Other/new-pm-thinlto-defaults.ll | ||
---|---|---|
56 | This change is unrelated to the main fix; I made it while I was at it. It removes PGOIndirectCallPromotion from the default ThinLTO pipeline when PGO is not enabled. |
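The gating discussed in these comments (ICP only when PGO is enabled and, per the suggestion on lines 854–858, not in the ThinLTO PreLink phase) can be sketched as a small predicate. This is a hypothetical model: the enums and `shouldAddPGOIndirectCallPromotion` are illustrative names, not LLVM's types, and applying the same rule to instrumented PGO follows the reviewer's suggestion rather than anything confirmed in the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the pipeline knobs discussed in this review;
// these are not LLVM's actual types.
enum class ThinLTOPhase { None, PreLink, PostLink };
enum class PGOKind { None, Instrumented, Sample };

// Sketch of the suggested rule: schedule PGO indirect call promotion only
// when some form of PGO is enabled, and never in the ThinLTO PreLink phase.
bool shouldAddPGOIndirectCallPromotion(PGOKind PGO, ThinLTOPhase Phase) {
  if (PGO == PGOKind::None)
    return false; // Default (non-PGO) pipelines get no ICP at all.
  return Phase != ThinLTOPhase::PreLink;
}
```

Keeping the decision in one predicate reflects the reviewer's point that invoking ICP from a single place in the pipeline is cleaner than duplicating the check per PGO mode.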
lib/Passes/PassBuilder.cpp | ||
---|---|---|
552–553 | Above, you say that this is because SampleProfileLoaderPass needs to be here because it inlines calls. Here you talk about debug info freshness... Which is it? The debug info freshness makes plenty of sense to me. If SampleProfileLoaderPass is *doing inlining*, I find that deeply surprising and confusing. I'm not suggesting that we need to change the design of this right now, but I think the comment needs to be *really* clear because lots of other people will be surprised by this and I'd suggest a FIXME to revisit how this is structured. (Regardless of whether inlining makes sense here, I think it should *definitely* be separated from what appears to be a pass that merely annotates the IR from a profile.) |
lib/Passes/PassBuilder.cpp | ||
---|---|---|
552–553 | Should probably add a reference to the AutoFDO CGO 2016 paper, which describes in detail how AutoFDO's context-sensitive profile data is organized and the design of profile-driven inlining and branch profile annotation. |
lib/Passes/PassBuilder.cpp | ||
---|---|---|
552–553 | Yes, we do invoke InlineFunction directly in SampleProfileLoader. Our CGO 2016 paper has a great many details about how we ended up with this design, and I'd be happy to explain it in person if you like. The fundamental requirement is that the IR must resemble the binary that was used for profiling so that the sample profile can be annotated accurately. To achieve that, we chose to inline inside SampleProfileLoader before annotation. The inline decisions are made by querying the SampleProfile interface. Doing this inside SampleProfileLoader keeps all uses of the SampleProfile interface within this file instead of letting them escape into the inliner. Alternatively, we could have a separate inliner pass do this job, but that pass would also need to read the sample profile, so the profile would be read twice. I'm open to discussing other design alternatives too. |
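The trade-off described above (one pass owning the profile for both inline decisions and annotation, so the profile is read only once) can be illustrated with a toy C++ model. Everything here is hypothetical: `SampleProfileLoaderSketch`, the flattened `SampleProfile` map, the hotness threshold, and the use of a sample sum as the "annotation" are simplifications for illustration, not the real SampleProfileLoader or its profile format.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, highly simplified profile record: a callee that was inlined
// at some call site of a function in the profiled binary, with its samples.
struct CallSiteProfile {
  std::string Callee;
  unsigned Samples;
};
// Function name -> call-site records (toy stand-in for a real profile).
using SampleProfile = std::map<std::string, std::vector<CallSiteProfile>>;

// Toy IR function: callees still called out of line, callees inlined so
// far, and a single number standing in for profile annotation.
struct Function {
  std::string Name;
  std::vector<std::string> Calls;
  std::vector<std::string> InlinedCallees;
  unsigned TotalSamples = 0;
};

// One pass owns the profile: it first inlines call sites that were hot in
// the profiled binary, then annotates. The profile is loaded once and is
// only reachable through this class.
class SampleProfileLoaderSketch {
public:
  SampleProfileLoaderSketch(SampleProfile P, unsigned Threshold)
      : Profile(std::move(P)), HotThreshold(Threshold) {}

  void run(Function &F) {
    inlineHotCallSites(F); // make the IR resemble the profiled binary
    annotate(F);           // then attach profile data
  }

private:
  SampleProfile Profile;
  unsigned HotThreshold;

  // The inline decision queries the profile directly.
  void inlineHotCallSites(Function &F) {
    auto It = Profile.find(F.Name);
    if (It == Profile.end())
      return;
    for (const CallSiteProfile &CS : It->second) {
      if (CS.Samples < HotThreshold)
        continue;
      auto CallIt = std::find(F.Calls.begin(), F.Calls.end(), CS.Callee);
      if (CallIt == F.Calls.end())
        continue;
      F.Calls.erase(CallIt);
      F.InlinedCallees.push_back(CS.Callee);
    }
  }

  // Toy annotation: sum all samples recorded for this function.
  void annotate(Function &F) {
    F.TotalSamples = 0;
    auto It = Profile.find(F.Name);
    if (It == Profile.end())
      return;
    for (const CallSiteProfile &CS : It->second)
      F.TotalSamples += CS.Samples;
  }
};
```

Because the profile lives in a private member, the inline decision never exposes it to a separate inliner pass; splitting the two steps into separate passes would require each one to load the profile, which is the duplication the comment wants to avoid.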
Thanks for the reviews.
Chandler, please let me know if you have more comments about this patch. If possible, I'd like to have it committed before today's integration.
Thanks,
Dehao
ping...
I think Chandler's concerns about this patch have been addressed. We can continue the discussion of the AutoFDO design in a separate thread. I will submit the patch by EOD if no further concerns are raised.
Sorry for the delay.
Yes, all I was looking for was a comment so that a reader would better understand what the sample profile loading is doing (i.e., transforming the IR). I'm not trying to suggest that we can or should change the behavior here, just getting it documented more clearly. Thanks for that! =D