This is an archive of the discontinued LLVM Phabricator instance.

[SampleFDO][NFC] Refactor SampleProfileLoad to reuse the code in CodeGen
ClosedPublic

Authored by xur on Feb 1 2021, 3:51 PM.

Details

Summary

This is split from patch https://reviews.llvm.org/D95698

Break SampleProfileLoader into to a base and a derived class.
Base class (SampleProfileLoaderBaseImpl) includes the common
code for IR and MachineIR samples loader.
It will be templatelized in the following patch.

Inline and Probe related code will remain in the derived class of
SampleProfileLoader and stays in SampleProfile.cpp.

We need to refactor some functions:
(1) getInstWeight() to enable the code sharing -- put the core into getInstWeightImpl().
(2) emitAnnotation() and propagateWeights() to carve out the code specific to SampleProfileLoader.
(3) make getInstWeight() and findFunctionSamples() virtual and override

in SampleProfileLoader as they need to access the fields in the derived class.

In a following patch, I will move the base class to a new header file in
llvm/include/llvm/ProfileData

After that, I will make the base class a template. Some function will be specialized in
IR and MachineIR, respectively.

Diff Detail

Event Timeline

xur created this revision.Feb 1 2021, 3:51 PM
xur requested review of this revision.Feb 1 2021, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 1 2021, 3:51 PM

Instead of having inlining keyword, it is better to move the out of line definitions as inline bodies in the class.

Is the intention to move them (inline bodies) into the header? For those functions that are not, leave them as being defined out of line. Otherwise moving them into class body is a preparation for the next step (moving to header).

I meant the functions move the header file. In the header file, we can
achieve the same by adding an inline keyword to the out of class function
definition (vs in class definition). Out of class definitions have better
readability because some of the functions are really large.

Ok.

question: it seems SampleCoverageTracker implementation is independent of IR/MIR level. Do we need to templatize it ?

For SampleProfileLoader class, is there any part of of the code that is independent of the IR level which can be extracted into a base class (to avoid pulling a lot of definitions into header)?

wmi added a comment.Feb 2 2021, 4:59 PM

I meant the functions move the header file. In the header file, we can achieve the same by adding an inline keyword to the out of class function definition (vs in class definition). Out of class definitions have better readability because some of the functions are really large.

Have you considered to define SampleProfileImpl.cc in addition to SampleProfileImpl.h so some large function can be put in .cc file and don't need inline keyword? But they need explicit instantiation.

Good to know that SampleCoverageTracker can remain as it is (not templatized). Is it possible to not move it into the header? There are a few advantages:

  1. reduce the size of the patch and merge churns;
  2. a little better in compile time as the functions don't need to be built when building FlowSensitiveSampleLoader.

There are a few ways -- only move the class body to the header (so that inline keyword is not needed) or introduce some wrapper/helper function (taking tracker reference) with forward declaration of TrackerType.

xur updated this revision to Diff 321963.Feb 6 2021, 1:53 PM
xur retitled this revision from [SampleFDO][NFC] Add inline keywords to member functions that will be in the template header file to [SampleFDO][NFC] Refacrot SampleProfileLoad to reuse the code in MachineIR.
xur edited the summary of this revision. (Show Details)

Here is the another way to refactor the SampleProfileLoader:

I tried Wei's idea of putting all the function bodies into a cpp file and explicit instantiate for IR and MachineIR.
That idea did not work: I place a new cpp file into llvm/lib/ProfileData. But that creates a circular dependency b/w ProfileData and CodeGen.
Current implementation in sample loader requires a complete definition of machine IR classes.
It might be possible to break the dependency by rewriting some of the code. But that would be too many changes. The benefit does not justify the cost -- we still have duplicated object for IR and MachineIR and we just combine them into one object.

(sorry if this comment is off-base, I haven't quite understood/followed the reviews, but some of the discussion here seems potentially problematic, so I'll say this... )

If code is being moved into a header to avoid layering/circular dependency violations, that's not OK. Since the layering violation would still exist (from a software design perspective, though not one a linker would diagnose/trip over). Google's build system, for instance, does enforce header layering & we use/support that to avoid creating header-only dependencies.

Another way of thinking about it: We shouldn't create a situation where it's non-trivial to move code between headers and implementation files. That should always be simple/trivial to do.

I think what Rong means is that

  1. moving code to a header is for code sharing between IR and codeGenPass
  2. Wei's idea of avoiding the header move can lead to circular dependencies.

In other words, doing 1) is not to achieve circular dependency avoidance,
but just using 2) as an alternative way is not working.

I think what Rong means is that

  1. moving code to a header is for code sharing between IR and codeGenPass
  2. Wei's idea of avoiding the header move can lead to circular dependencies.

In other words, doing 1) is not to achieve circular dependency avoidance,
but just using 2) as an alternative way is not working.

I'm not quite following - if there's no .cc file where the implementation of the header could live, if we wanted to write some of it out of line, that seems like there is a circular dependency at the source/design level even if it doesn't result in linker errors. It sounds like (2) is showing that (1) has a hidden circular dependency.

That idea did not work: I place a new cpp file into llvm/lib/ProfileData. But that creates a circular dependency b/w ProfileData and CodeGen.

Oh, I think I understand now - these are (going to be) templates, and so long as they remain templates that are implicitly instantiated, there's no dependency problem - but if we tried to explicitly instantiate the CodeGen version inside ProfileData, that would create a circular dependency. *nod*

Yes, using out-of-class-but-still marked 'inline' definitions (once they're templates they don't even need the inline keyword) may help readability by making the class easier to read through without long function bodies interleaved. (the only other option would be to put the implementation in a whole separate file and have some place in each library that includes both interface and implementation headers and performs the explicit instantiation there (in the appropriate libraries) - but that's probably overkill unless the header is included in many different places in each library, such that the explicit instantiation could significantly reduce compile-time/object size, etc)

Right right. -thanks for helping me understand. Sorry for the noise.

The goal here is to have Transforms/IPO/

no it is not noise -- the questions you raised are very good :)

xur added a comment.Feb 8 2021, 10:13 AM

Thanks David for explaining this for me. Sorry for not making it clear in
the original email.

davidxl accepted this revision.Feb 8 2021, 11:13 AM

lgtm. Please wait for other reviewer's feedback too.

This revision is now accepted and ready to land.Feb 8 2021, 11:13 AM
wmi added inline comments.Feb 9 2021, 10:20 PM
llvm/lib/Transforms/IPO/SampleProfile.cpp
474–475

The usages of ProfileIsCS are all in SampleProfileLoader member functions, so it doesn't need to be defined in SampleProfileLoaderImpl?

488

Will FSAFDO need ProfAccForSymsInList to know which function is new? If not, we can leave it in SampleLoader class.

846–847

Why not remove this part and just call SampleProfileLoaderBaseImpl::getInstWeight below instead of SampleProfileLoaderBaseImpl::getInstWeightImpl?

llvm/test/Transforms/SampleProfile/inline-coverage.ll
34 ↗(On Diff #321963)

Why does this CHECK need to be moved lower?

xur added a comment.Feb 10 2021, 9:45 AM

Thanks, Wei!

llvm/lib/Transforms/IPO/SampleProfile.cpp
474–475

Yes. ProfileIsCS is not longer used in base class after we break some of the methods.
Thanks for catching this.

488

Is this only used in inline?
I'm not sure if this flag will affect the branch probability so I put into base to be safe, and I have it in FSAFDO.

If this only intends for using in inline, I can move it to base.

846–847

If we move this part down and combine the call to getInstrWeightImpl, the semantic is not the same.
I did not look deep into this, but instead of returning error_code(), we could get 0 (and calls to findCalleeFunctionSamples().

llvm/test/Transforms/SampleProfile/inline-coverage.ll
34 ↗(On Diff #321963)

I did not dig into this. The calls to remark are in the same order. But diagnostic prints different order. I don't think this matter too much. Isn't it.

wmi added inline comments.Feb 10 2021, 10:47 AM
llvm/lib/Transforms/IPO/SampleProfile.cpp
488

It is used to set function EntryCount. For old symbol (which show up in the binary where profile is collected) without profile, we set its EntryCount to 0 instead of -1, indicating it is very cold instead of unknown. It is also used in callsiteIsHot function. It won't affect branch probablity, so I think FSAFDO may not need it.

846–847

It won't change the semantic. findCalleeFunctionSamples calls findFunctionSamples inside of it too. If findFunctionSamples returns nullptr for a call inst, findCalleeFunctionSamples will return nullptr as well, and the "return 0" branch won't be taken.

llvm/test/Transforms/SampleProfile/inline-coverage.ll
34 ↗(On Diff #321963)

It could expose some hidden behavior change somewhere so may be good to find out.

xur updated this revision to Diff 322772.Feb 10 2021, 11:40 AM

Integrated Wei's review comments.
The test case turns out to be my problem. Also fixed it.

wmi accepted this revision.Feb 10 2021, 12:09 PM

LGTM.

xur retitled this revision from [SampleFDO][NFC] Refacrot SampleProfileLoad to reuse the code in MachineIR to [SampleFDO][NFC] Refactor SampleProfileLoad to reuse the code in CodeGen.Feb 10 2021, 2:12 PM
xur edited the summary of this revision. (Show Details)