
[llvm] Machine Learned policy for inlining -Oz
Needs Review · Public

Authored by mtrofin on Apr 8 2020, 2:04 PM.

Details

Summary

[llvm] Machine Learned policy for inlining -Oz

This change is a placeholder, used to reference the submitted changes related to this work.

Authors: Mircea Trofin, Yundi Qian, Eugene Brevdo

Please refer to the RFC for more details.

The change introduces a machine learned policy for -Oz inlining as a build-time opt-in.

There are two opt-in modes: 'release' and 'development'.

'Release' is the 'day-to-day compiler use'. A pre-trained ML model is compiled ahead of time into a native library, which is then used to make inlining
decisions. Determinism is ensured because the model is unchanged as the compiler runs.

Just like with hand-written heuristics, there is no formal guarantee that the policy performs well, only empirical results on benchmarks, based on which we derive a belief of general applicability.

'Development' is the mode used to train a model. Training happens offline, through reinforcement learning, by providing a training algorithm with traces of decisions made by the compiler on a training corpus (IR modules).

This initial change introduces all the 'release' mode components, together with a reference, pre-trained model, as well as key training mode components. More training mode components will be provided in subsequent changes. The reference model was trained on a Google-internal corpus of ~25K IR modules, and appears to generalize well to clang, opt, SPEC2006 (which are code bases sharing little to nothing with the training corpus), as well as some internal binaries. We observe as much as 6% size reduction (opt), and generally ~3%, when compared to clang -Oz.

To enable either/both modes, reference the buildbot script, as well as the buildbot definition.

TL;DR:

  • Release mode.

Get the tensorflow pip package, find out where it is installed:

python3 -m pip install --upgrade pip
python3 -m pip install --user tf_nightly==2.3.0.dev20200528
export TF_PIP=$(sudo -u buildbot python3 -m pip show tf_nightly | grep Location | cut -d ' ' -f 2)

Now setup the build by passing cmake -DTENSORFLOW_AOT_PATH=${TF_PIP}/tensorflow

  • Development

Get the tensorflow C API library from https://www.tensorflow.org/install/lang_c. Install it somewhere, e.g. $TF_API. Then pass -DTENSORFLOW_API_PATH=${TF_API} to cmake.

Note that TF_API points to the directory containing the include and lib directories provided by the install package.

The Release and Development modes may co-exist.

To opt-in to one of the modes (including 'default', which is the manual heuristic), pass (via -mllvm) -enable-ml-advisor={default|release|development}. Note that, for development, you also need to pass the path to the size estimator model, and/or the path to the saved model currently under training - see lib/Analysis/DevelopmentModeInlineAdvisor.cpp.

Diff Detail

Unit Tests: Failed

Time: 180 ms · Test: linux > Polly.ScopInfo::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; opt -polly-process-unprofitable -polly-remarks-minimal -polly-use-llvm-names -polly-import-jscop-dir=/mnt/disks/ssd0/agent/llvm-project/polly/test/ScopInfo -polly-codegen-verify -basic-aa -scoped-noalias -tbaa -polly-scops -analyze < /mnt/disks/ssd0/agent/llvm-project/polly/test/ScopInfo/memcpy-raw-source.ll

Event Timeline

dblaikie added inline comments.Apr 8 2020, 3:58 PM
llvm/lib/Analysis/ML/IRToNativeSizeLearning.cpp
31–32 ↗(On Diff #256104)

LLVM style guide encourages using namespace/global-scope static for file-local functions, and only using anonymous namespaces for types: https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

llvm/lib/Analysis/ML/InliningAdvisor.cpp
480 ↗(On Diff #256104)

Generally if you're passing a lambda that'll be only run immediately (or even within the scope it's declared, but not immediately) - I'd suggest using [&] for capture, treating the lambda scope the same as other scopes (like for/while/if/etc) - that implicitly have access to all outer variables

492–494 ↗(On Diff #256104)

Generally LLVM code doesn't use {} on single line blocks.

llvm/lib/Analysis/ML/InliningModelFeatureMaps.h
16–32 ↗(On Diff #256104)

Enumerators need a prefix: https://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly (or use an enum class, that forces using the enum name as a prefix)

34–54 ↗(On Diff #256104)

static things in headers are problematic (since they define new/distinct variables in every translation unit that includes the header) - you can use inline accessor functions, or maybe constexpr variables... (I forget/don't know the specific linkage rules there) or declared in the header and defined in a cpp file

llvm/lib/Analysis/ML/InliningModelRunnerProduction.h
56–63 ↗(On Diff #256104)

Ah, this is more of the "two implementations of the same entities" discussed earlier - yeah, I'd be in favor of avoiding this, as it seems like it'll make the code quite a bit harder to work with (since it'll only compile in one mode at a time, changes to these APIs will need different builds to validate that both implementations are correctly updated, etc.), plus concerns around testing, etc.

116 ↗(On Diff #256104)

Prefer = default where possible

llvm/lib/Analysis/ML/InliningModelRunnerTraining.h
24–45 ↗(On Diff #256104)

More static things in headers which will need to move to an implementation file

172–173 ↗(On Diff #256104)

Looks like an extra set of () here that could be removed

sstefan1 resigned from this revision.Apr 9 2020, 1:38 AM
sstefan1 added a subscriber: sstefan1.
lkail added a subscriber: lkail.Apr 9 2020, 2:23 AM
simoll added a subscriber: simoll.Apr 9 2020, 7:00 AM
mtrofin updated this revision to Diff 257139.Apr 13 2020, 4:17 PM
mtrofin marked 14 inline comments as done.

Small changes

mtrofin added inline comments.Apr 13 2020, 4:20 PM
llvm/lib/Analysis/InliningAdvisor.cpp
32–34 ↗(On Diff #256104)

If I understand it correctly, the idea would be to:

  • detect optional dependencies (or be hinted where they are).
  • if any/all of them are available, compile dependent code
  • use flags to pick the right implementation, if more than one is available. In extreme, if the compiler is compiled with both 'rel' and 'dev' modes, one would have 3 flag possibilities for the inliner: 'manual heuristic', 'ml-rel mode', and 'ml-dev mode'

I can see how this can result in cleaner code - probably all those ifdefs get removed, for example.

I like it!

llvm/lib/Analysis/ML/InliningModelFeatureMaps.h
34–54 ↗(On Diff #256104)

Ack - this sorts itself out with your earlier suggestion of building depending on availability of dependencies.

llvm/lib/Analysis/ML/InliningModelRunnerProduction.h
56–63 ↗(On Diff #256104)

Ack - addressing with the 'build everything if deps available'

mtrofin updated this revision to Diff 257146.Apr 13 2020, 4:35 PM

renamed FeatureList

mtrofin updated this revision to Diff 258465.Apr 17 2020, 6:16 PM
mtrofin marked an inline comment as done.

Incorporated feedback to implement the 2 modes as buildable depending on presence of dependencies, meaning they may both be present.

mtrofin updated this revision to Diff 259105.Apr 21 2020, 2:21 PM

Update/integrate


Hey - thanks for having a go at what I was describing, but this isn't /quite/ what I was thinking and still involves quite a few different bits of macro usage across different situations (some conditional functionality implemented in the build files, some of it in source files - the two different implementations (trivial/null/empty and non-trivial) #ifdef'd out, etc)

At a high level:

  • I think the preprocessor macros provided by the build system should be named after the libraries that have been detected, not the functionality those libraries are intended to provide (rather than LLVM_USE_ML_POLICY_DEV, it'd be something more like the ZLIB one (LLVM_ENABLE_TF, LLVM_ENABLE_TF_LITE, or something?))
  • I think it'd be clearer if all the conditional was done in the source/header files, not mixed between that and the build files.
  • If possible, I'd like to avoid having two different implementations of multiple classes like that - what I'd picture was one factory function that says "get the thing" and that implementation would check the runtime parameter from the user and call unconditionally provided (but conditionally implemented) functions that retrieve implementations of an interface. There would be a default/null implementation, if needed (if it's easier to have that than to have the callers check for presence/absence) - a separate class, rather than a #ifdef implementation of another class.
llvm/lib/Analysis/ML/InliningAdvisor.cpp
47–48 ↗(On Diff #258465)

LLVM doesn't usually use top-level const like this, FWIW - I'd suggest avoiding it since it does things like disable assignability, etc.

(also you don't necessarily have to write that ctor - you can use braced init to construct such a thing memberwise without having a ctor written)

117 ↗(On Diff #258465)

Drop the () around F here (instead delete F;).

But perhaps more generally: could DeletedFunctions contain unique_ptrs, so there's less manual memory management here? (instead this function could be DeletedFunctions.clear();)

469 ↗(On Diff #258465)

Generally I'd encourage using [&] for captures in any lambda that doesn't escape its scope, doubly so if it doesn't escape the statement it's defined in.

llvm/lib/Analysis/ML/Rel/InliningModelRunnerRel.cpp
24–32 ↗(On Diff #258465)

Externally visible definitions should go in headers (unless this is a definition of a pimpl or the like) - otherwise this type should be defined in an anonymous namespace (so it can't collide with other implementation details in other files)

https://llvm.org/docs/CodingStandards.html#anonymous-namespaces

mtrofin updated this revision to Diff 260689.Apr 28 2020, 10:15 AM
mtrofin marked 6 inline comments as done.

feedback

mtrofin added inline comments.Apr 28 2020, 10:16 AM
llvm/lib/Analysis/InliningAdvisor.cpp
32–34 ↗(On Diff #256104)

Updated along these lines.

llvm/lib/Analysis/ML/InliningAdvisor.cpp
47–48 ↗(On Diff #258465)

Removed the const and changed to explicit readonly accessors. I do want to underscore this is an immutable object - easier to read / maintain. If it's OK style-wise, I prefer the explicit ctor for readability, too.

117 ↗(On Diff #258465)

I need to do lookups in the DeletedFunctions set. That would require some gymnastics with std::unique_ptr around find, I think. Not sure it's worth the complexity in this case - is there an alternative?

dblaikie added inline comments.May 2 2020, 4:51 PM
llvm/lib/Analysis/ML/InliningAdvisor.cpp
136 ↗(On Diff #260689)

This'll be implicitly deleted, FWIW (similarly the assignment operators will be implicitly deleted or omitted as necessary)

173–178 ↗(On Diff #260689)

again, top level const like this is a bit quirky/uncommon in LLVM, but not outright banned

Even if the struct isn't mutable - your vector is, and the equivalent of mutation could be achieved by removing an element from the vector and adding a new one with the desired values, so the const members aren't really providing the invariants you'd like them to.

209–217 ↗(On Diff #260689)

Similarly - not sure if making these members const carries its weight - essentially it's to ensure these members aren't modified during "recordInlining"? (I don't see any other code that could be accessing these? But then I guess I wonder why aren't they private if they're only accessed by that member? Perhaps I'm missing something)

272–274 ↗(On Diff #260689)

LLVM usually skips braces on single line (some people go so far as to omit braces on single statement (even if that statement has to be wrapped over several lines)) blocks - and several single line blocks in this code skip braces - so this one (& maybe others in this patch? I haven't looked) seems inconsistent.

465 ↗(On Diff #260689)

This is often written as just "if (x.count(y))" but up to you. (about 30:1 ratio in favor of that compared to > 0 or != 0)

47–48 ↗(On Diff #258465)

But it's not really immutable (well, even less now, without const members) since it can be reassigned (OldCSI = CallSiteInfo(NewCB, NewH))

& what's the readability benefit you have in mind in terms of having a ctor?

It looks like this is only constructed in one place, and everything else refers to it via const ref - so I'm not sure it'd present a significant loss of readability if it were a plain struct?

(this isn't a "you must not do it this way", but a bit of a discussion to understand if this/things like this are worthwhile compared to many other use cases that have simple structs, etc)

117 ↗(On Diff #258465)

Ah, right. Yeah - maybe leave a FIXME to fix this once we're using C++20, which supports heterogenous lookup.

Or might be worth using SmallPtrSet (https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallptrset-h) here anyway (which doesn't & probably can't support unique_ptr elements anyway, just an unfortunate reality) for reduced memory usage/better cache locality.

mtrofin marked 2 inline comments as done.May 2 2020, 5:43 PM
mtrofin added inline comments.
llvm/lib/Analysis/ML/InliningAdvisor.cpp
173–178 ↗(On Diff #260689)

I see value in const helping a reader understand where and when mutation may come. Here, it says "when constructed, then can't change". So it helps the reader see the whole object as one value, where parts don't shift.

47–48 ↗(On Diff #258465)

Maybe I should just pass the params directly, come to think of it.

What I mean re. readability is reading stuff like callingAFunction(param1, param2..., {value1, value2},...). If I read this code, I have no idea what that tuple means, I have to go to the definition and maybe implementation of callingAFunction. Instead, callingAFunction(..., CallSiteInfo(value1, value2)) adds a bit more info. It's not much extra, but it has no perf cost, so why not. The concern is also wrt what happens later, if the type is constructed in more places.

Similarly, a naked struct would get me chasing more for "where is this field set". The getter-only design may help a reader understand this is meant to be a set-once value.

Anyway, if I just pass the scalar params, all this becomes moot.

dblaikie added inline comments.May 2 2020, 7:29 PM
llvm/lib/Analysis/ML/InliningAdvisor.cpp
173–178 ↗(On Diff #260689)

I think the inconsistency with the rest of the codebase would raise more questions than it'd answer - but it's sufficiently locally scoped it's not a huge deal. Mostly trying to discuss this not just in this case, but in the broader context of coding practices in LLVM in general.

47–48 ↗(On Diff #258465)

Maybe I should just pass the params directly, come to think of it.

If it's just two parameters, to one function/function call - yeah, doesn't seem like it's adding a lot of value.

Sometimes if a function ends up with too many arguments, grouping them into a struct can help.

What I mean re. readability is reading stuff like callingAFunction(param1, param2..., {value1, value2},...). If I read this code, I have no idea what that tuple means, I have to go to the definition and maybe implementation of callingAFunction. Instead, callingAFunction(..., CallSiteInfo(value1, value2)) adds a bit more info. It's not much extra, but it has no perf cost, so why not. The concern is also wrt what happens later, if the type is constructed in more places.

Ah, thanks for explaining - callers would still be able to write CallSiteInfo{x, y} with a struct, if the 'CallSiteInfo' helps describe things. I'd tend to leave it up to the caller to decide what's most readable? (though I agree there are probably a few places in LLVM that get a bit over-enthusiastic about {}). Also, as it happens - even with the struct as-written, you could still write the less-legible version. You'd have to mark the ctor "explicit" to disallow bare {x, y} construction. (which would still allow CallSiteInfo{x, y}, FWIW).

Similarly, a naked struct would get me chasing more for "where is this field set". The getter-only design may help a reader understand this is meant to be a set-once value.

*nod* if it's only passed across the boundary into a function to group parameters, hopefully those scopes aren't too complicated/find where it's created/used/set/etc.

(admittedly, shorter functions helps here - big long functions risk code mutating parameters in ways that might be surprising, etc - so the benefit of using "const" for function parameters ("void f1(const int i) { /* really long body where it's nice to know 'i' isn't mutated */ }") but even that wouldn't need the struct to be intrinsically immutable)

There's some value, from my perspective, in having types act like basic (what some folks call "vocabulary") types as much as possible - C++ goes to quite some lengths to make it possible, and it reduces surprises/curiosities, at least I find, when types work that way. If I see a non-copyable type, for instance, I worry that someone's keeping pointers to it in some "interesting" way that means moving/copying it would be problematic.

Anyway, if I just pass the scalar params, all this becomes moot.

Sure enough, but hopefully useful design discussions.

mtrofin updated this revision to Diff 262497.May 6 2020, 4:36 PM

keeping it in sync

asl added a subscriber: asl.Jun 17 2020, 4:09 AM
asl added inline comments.
llvm/cmake/modules/TensorFlowCompile.cmake
15 ↗(On Diff #269637)

I believe you need to provide --target_triple here as well. Otherwise you'd end up with an ELF object on macOS.

phosek added a subscriber: phosek.Jun 24 2020, 1:22 AM
phosek added inline comments.
llvm/lib/Analysis/ML/InlineModelFeatureMaps.h
14 ↗(On Diff #269637)

This is missing a <string> include, which Clang complains about.

mtrofin marked 2 inline comments as done.Jun 24 2020, 8:03 AM
mtrofin added inline comments.
llvm/lib/Analysis/ML/InlineModelFeatureMaps.h
14 ↗(On Diff #269637)

Thanks - I'll patch it correctly in D81515 and then here.

phosek added inline comments.Jun 24 2020, 11:11 AM
llvm/lib/Analysis/ML/CMakeLists.txt
19 ↗(On Diff #269637)

I think line 19 and line 25 should be swapped: TensorFlow C API should be used with LLVM_HAVE_TF_API and LLVMMLPoliciesXLA with LLVM_HAS_TF_AOT. I got link errors with the current version, but everything works after swapping those two lines.

mtrofin marked 2 inline comments as done.Jun 24 2020, 11:32 AM
mtrofin added inline comments.
llvm/lib/Analysis/ML/CMakeLists.txt
19 ↗(On Diff #269637)

Correct, yes. I'll make sure these are correct once I rebase.

mtrofin updated this revision to Diff 279314.Jul 20 2020, 11:31 AM

All work in the initial patch was upstreamed; clearing the changes in this patch, so we may use it just for tracking.

mtrofin updated this revision to Diff 279316.Jul 20 2020, 11:33 AM

(trying again - clearing this change)

mtrofin edited the summary of this revision. Jul 20 2020, 11:48 AM
mtrofin updated this revision to Diff 281016.Jul 27 2020, 12:19 PM

Added RFC doc as a md file, for reference (doesn't need to be landed)

uenoku added a comment.EditedAug 13 2020, 7:33 AM

Hi, I want to use my TensorFlow model in LLVM with MLModelRunner, like ReleaseModeModelRunner.cpp, but I have difficulty compiling the model, especially the TensorFlow AOT compilation of the model.
It would be really helpful if you could point me to the code which defines the computation graph and the signature. Is such a script already landed or published somewhere?

Hi, I want to use my TensorFlow model in LLVM with MLModelRunner, like ReleaseModeModelRunner.cpp, but I have difficulty compiling the model, especially the TensorFlow AOT compilation of the model.
It would be really helpful if you could point me to the code which defines the computation graph and the signature. Is such a script already landed or published somewhere?

We are about to publish the training algorithm, I'll update here when that's done - meanwhile, a possible way to unblock would be to inspect the expected inputs and outputs of the graph from the protobuf itself - it is in textual form - or using tensorboard, I believe it has a facility for this.

Note that the only interface between LLVM and the model is the input and output signatures, the internal structure of the model can differ.

uenoku added a comment.EditedAug 13 2020, 8:27 AM

Hi, I want to use my TensorFlow model in LLVM with MLModelRunner, like ReleaseModeModelRunner.cpp, but I have difficulty compiling the model, especially the TensorFlow AOT compilation of the model.
It would be really helpful if you could point me to the code which defines the computation graph and the signature. Is such a script already landed or published somewhere?

We are about to publish the training algorithm, I'll update here when that's done - meanwhile, a possible way to unblock would be to inspect the expected inputs and outputs of the graph from the protobuf itself - it is in textual form - or using tensorboard, I believe it has a facility for this.

Note that the only interface between LLVM and the model is the input and output signatures, the internal structure of the model can differ.

Great, I'm looking forward to it :) Thanks for the advice. I'll try again.

mtrofin edited the summary of this revision. Sep 16 2020, 10:46 AM
mtrofin edited the summary of this revision.