
[llvm] Native size estimator for training -Oz inliner
ClosedPublic

Authored by mtrofin on Jun 29 2020, 3:31 PM.

Details

Summary

This is an experimental ML-based native size estimator, necessary for
computing partial rewards during -Oz inliner policy training. Data
extraction for model training will be provided in a separate patch.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html

Diff Detail

Event Timeline

mtrofin created this revision.Jun 29 2020, 3:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2020, 3:31 PM
mtrofin updated this revision to Diff 274269.Jun 29 2020, 3:38 PM

removed duplicate file, one more test

A high-level question: is partial-reward-based training something needed in the long run (as compared with a total-reward-based pipeline)?

We're still evaluating the two, and evaluating improvements to the partial reward calculation. The good thing is that, other than the partial reward calculation, the compiler components are the same.

Thanks for working on this. Glad we are making progress. Some comments below, mostly style, nothing major.

llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h
36

I would prefer to remove the TF prefix (in the generic code parts). Since we only have a single model evaluator class, we could add using ModelEvaluator = TFModelEvaluator here. We then document it, saying that new evaluators might be added, in which case ModelEvaluator becomes an interface implemented by different classes.
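The suggested aliasing could be sketched as follows. This is illustrative only: the TFModelEvaluator body and the runModel helper here are hypothetical stand-ins, not the actual classes from the patch.

```cpp
// Minimal stand-in for the evaluator class in the patch; the real
// TFModelEvaluator wraps the TensorFlow C API.
class TFModelEvaluator {
public:
  int evaluate() const { return 42; }
};

// Today there is a single evaluator, so an alias suffices. If new
// evaluators are added later, ModelEvaluator becomes an interface
// implemented by each concrete class.
using ModelEvaluator = TFModelEvaluator;

int runModel() {
  ModelEvaluator E;
  return E.evaluate();
}
```

Generic code would then refer only to ModelEvaluator, so the later switch from an alias to a base class would not ripple through callers.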

llvm/include/llvm/Analysis/Utils/TFUtils.h
21

No undocumented top-level functions please. If this is tied to the class below, make it a documented static member.

30

Same comment as above. Do we need these to populate the llvm namespace? Either way, we need documentation.

41

Is "serve" a keyword for TF? If not, I would recommend changing it. Could you please use LLVM types where possible? The strings don't seem necessary; StringRef should do fine, and certainly for the const char. const SmallVectorImpl should be preferred over std::vector. If you think a vector of strings is safer for the future, that is fine with me.

60

Documentation.

Also below.

70

SmallVector's above if possible.

llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
18

Nit: I think these go after the llvm includes.

45

Now that I see this, we might want to consider adding ModelEvaluator right away with some default implementation and deriving from it.

60

Can we move this to the top, just below the DEBUG_TYPE? That is where people (= I) look for command line options.

66

We do not have an existing function for this? Surprising.

121

I guess this is basically fixed-size? Maybe a std::array would do too.

159

What is the difference between the opcode and the ID here? I mean why not use I.getOpcode() instead of this function?

168

Are these functions (here and above) in the global namespace and with external linkage? If possible, extend the anonymous namespace or make them static.
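For illustration, the two spellings of internal linkage suggested above look like this (the helper names are hypothetical):

```cpp
// Option 1: put the helper inside an anonymous namespace.
namespace {
int helperInAnonNamespace(int X) { return X + 1; } // internal linkage
} // namespace

// Option 2: mark the free function static.
static int staticHelper(int X) { return X * 2; } // also internal linkage
```

Either way, the symbols stop being visible outside the translation unit, which avoids accidental ODR collisions with similarly named helpers elsewhere.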

davidxl added inline comments.Jul 5 2020, 10:44 PM
llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
102

Is it possible to avoid using hardcoded opcodes?

175

Still called for every function.

llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp
98

why is the result 0?

mtrofin updated this revision to Diff 275743.Jul 6 2020, 9:42 AM
mtrofin marked 23 inline comments as done.

feedback

llvm/include/llvm/Analysis/InlineSizeEstimatorAnalysis.h
36

Happy to rename to ModelEvaluator, but I would shy away from documenting how this may pan out if we wanted to support other evaluators. The issue would be with the API typing: currently it is tied to TF. If we wanted to generalize later over a few other evaluators, we'd need to figure out common abstractions and so on - at which point the name of the base type would be the least disruptive of the changes, I think.

Note also that this code isn't that generic, in that its compilation is tied to the presence of the tensorflow C API library - the only generic part of it is that it's reused by both the size estimator in this patch, and the development mode InlineAdvisor (next patch).

Back to the original feedback, happy to rename to ModelEvaluator (i.e. not even do "using", just rename); or we can keep it "TF" because it's tied to tensorflow anyway (APIs and implementation)

wdyt?

llvm/include/llvm/Analysis/Utils/TFUtils.h
41

"serve" is the typical tag used in tensorflow for the graph that's ready to be used ("served").

Re. strings - the use case we'll have in the 'development' mode InlineAdvisor will require string concatenation, and I think at that point StringRef would lose its benefit.

The vectors will be larger (the development mode currently has 11 features, and we're looking at more) - would SmallVectorImpl still make sense?

70

ack - addressing in the previous comment

llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
45

See my previous observation about the API design.

60

OK - I flipped the ifdef, so this bubbled up, does that work?

66

Not that I can tell - see also the InstructionOpcodes enum in TargetLoweringBase.cpp, and the OtherOps enum in Instruction.h. I would hesitate to use OtherOps because it may evolve differently.

102

Not easily, and the benefit would be unclear.

The pairs were generated through analysis of the data sets used for training this model. *If* we ended up down this path (i.e. feature pairs), they would be maintained in the same way - starting from analysis of data sets, not starting from IR. Using indices is just simpler in that case.

121

Yes, but (unless I'm missing something) I'd have to specify the "N" (the size). This way, I let the vector's size determine other values (like FeatureCount).
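The trade-off being discussed can be sketched as follows; the pair values and names here are hypothetical, not the actual training-derived pairs from the patch:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// With std::vector, the element count is derived from the initializer,
// and dependent values (like a feature count) fall out of size().
static const std::vector<std::pair<unsigned, unsigned>> ImportantPairs = {
    {0, 1}, {2, 3}, {4, 5}};

std::size_t pairFeatureCount() { return ImportantPairs.size(); }

// With std::array, N must be spelled out by hand and kept in sync:
//   static const std::array<std::pair<unsigned, unsigned>, 3> ImportantPairs = ...;
```

The vector form keeps one source of truth for the count, at the cost of a heap allocation for data that is conceptually fixed at compile time.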

mtrofin updated this revision to Diff 276619.Jul 8 2020, 7:12 PM

removed the lit changes that are unnecessary (yet), and added a name for the analysis in the registry

davidxl added inline comments.Jul 9 2020, 10:18 AM
llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
218

To avoid this translation, the Features can be a wrapper of the tensor array -- with each field accessed with a symbolic index.
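The suggested wrapper might look like the sketch below. The feature names are hypothetical; the real patch defines its own feature set.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Symbolic indices into the feature buffer; the sentinel gives the count.
enum FeatureIndex : std::size_t {
  BasicBlockCount,
  InstructionCount,
  UseCount,
  NumberOfFeatures
};

// Wraps the contiguous buffer so fields are accessed by symbolic index,
// while data() can be handed straight to the tensor - no translation step.
class FunctionFeatures {
  std::array<std::int32_t, NumberOfFeatures> Buffer{};

public:
  std::int32_t &operator[](FeatureIndex I) { return Buffer[I]; }
  const std::int32_t *data() const { return Buffer.data(); }
};

std::int32_t roundTrip(std::int32_t V) {
  FunctionFeatures F;
  F[InstructionCount] = V;
  return F.data()[InstructionCount];
}
```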

264

Can we make getFunctionFeatures directly return the filled tensor -- or at least provide a wrapper? There is no need to expose the TF details with the inline sequence here.

269

Code from line 268 to 271 can probably be wrapped in a single wrapper function to hide TF details including Tensor delete
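One way to hide the tensor lifetime is RAII via unique_ptr with a custom deleter. To keep the snippet self-contained, FakeTensor and deleteTensor below are stand-ins for the opaque TF_Tensor type and TF_DeleteTensor from tensorflow/c/c_api.h:

```cpp
#include <memory>

// Stand-in for the opaque tensor type; tracks live instances for the demo.
struct FakeTensor {
  static int LiveCount;
  FakeTensor() { ++LiveCount; }
};
int FakeTensor::LiveCount = 0;

// Stand-in for TF_DeleteTensor.
void deleteTensor(FakeTensor *T) {
  --FakeTensor::LiveCount;
  delete T;
}

// Callers only ever see TensorPtr; the delete call is no longer manual.
using TensorPtr = std::unique_ptr<FakeTensor, void (*)(FakeTensor *)>;

TensorPtr makeTensor() { return TensorPtr(new FakeTensor, deleteTensor); }

int liveAfterScopedUse() {
  {
    TensorPtr T = makeTensor(); // deleted automatically at end of scope
  }
  return FakeTensor::LiveCount;
}
```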

mtrofin updated this revision to Diff 276868.Jul 9 2020, 4:26 PM
mtrofin marked 5 inline comments as done.

feedback

llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
264

I'm planning on reusing getFunctionFeatures for the other part of this functionality - extracting them to provide a training data set for the ir2native model.

269

Ya, and it looks nicer - thanks!

mtrofin marked 2 inline comments as done.Jul 9 2020, 4:27 PM
mtrofin added inline comments.
llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp
98

It's not, it's greater than 0.

MaskRay added a subscriber: MaskRay.Jul 9 2020, 4:33 PM

One point I want to raise: the model files are larger than 100KiB. Doing this once and keeping it stable for, say, 6 months is probably an acceptable pace. If folks keep iterating on the model and checking in other model files as a result on a more regular basis, I'd be more wary, as I am not sure other folks would accept this. llvm/llvm-test-suite might be a suitable place if you want to add large model files.

mtrofin marked an inline comment as done.Jul 9 2020, 4:42 PM


That makes sense - we won't want to churn these other than for large new changes, so that should be "rarely" - but I'll take a look at the llvm-test-suite as a next step, too.

davidxl added inline comments.Jul 10 2020, 9:05 AM
llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
271

can this call be folded in to getFunctionFeatures? or this interface is expected to be used in other places?

mtrofin marked 2 inline comments as done.Jul 10 2020, 9:10 AM
mtrofin added inline comments.
llvm/lib/Analysis/InlineSizeEstimatorAnalysis.cpp
271

When we use getFunctionFeatures for publishing training data, the vector won't come from a TF_Tensor anymore; it'll just come from a memory buffer. This is because just getting the data out doesn't need to depend on anything TensorFlow (I'll split this file in two at that point).

mtrofin updated this revision to Diff 277090.Jul 10 2020, 10:22 AM
mtrofin marked an inline comment as done.

Switched the saved_model protobuf to its textual representation.

davidxl accepted this revision.Mon, Jul 13, 9:00 AM

lgtm

This revision is now accepted and ready to land.Mon, Jul 13, 9:00 AM
This revision was automatically updated to reflect the committed changes.
davide added a subscriber: davide.Mon, Jul 13, 1:15 PM

This broke the modules build.

In file included from <module-includes>:109:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/Analysis/Utils/TFUtils.h:12:10: fatal error: 'tensorflow/c/c_api.h' file not found
#include "tensorflow/c/c_api.h"
         ^~~~~~~~~~~~~~~~~~~~~~
While building module 'LLVM_Backend' imported from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/lib/CodeGen/BranchFolding.h:14:
In file included from <module-includes>:3:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/CodeGen/AntiDepBreaker.h:20:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/CodeGen/ScheduleDAG.h:24:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/CodeGen/TargetLowering.h:34:
In file included from /Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/CodeGen/SelectionDAG.h:32:
/Users/buildslave/jenkins/workspace/lldb-cmake/llvm-project/llvm/include/llvm/CodeGen/MachineFunction.h:27:10: fatal error: could not build module 'LLVM_Analysis'
#include "llvm/Analysis/EHPersonalities.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To reproduce, pass -DLLVM_ENABLE_MODULES=On. Please let me know if you need anything else to repro.

In the meanwhile, reverted this in:

commit 9908a3b9f521c954cbf6adcec35b14b2f6c8da49 (HEAD -> master, origin/master, origin/HEAD)
Author: Davide Italiano <ditaliano@apple.com>
Date:   Mon Jul 13 12:59:16 2020 -0700

    Revert "[llvm] Native size estimator for training -Oz inliner"
    
    This reverts commit 83080a294ad7d145d758821bcf4354ad0cb7d299 as
    it breaks the macOS modules build.

Tried repro-ing this on macOS.

cmake -GNinja -DLLVM_ENABLE_MODULES=On -DCMAKE_BUILD_TYPE=Release ../llvm

ninja opt

I got a warning - missing submodule 'LLVM_Analysis.InlineSizeEstimatorAnalysis' - but no error.

Could you please point me to the specific bot, I'm probably missing something else in the cmake config - thanks!


sure, this is the log:
http://green.lab.llvm.org/green/view/LLDB/job/lldb-cmake/22740/console

I have some concerns about TensorFlow types and data structures being directly exposed to Analysis, and about relying on macros to conditionally compile all of this away.
This seems to me like an indication of missing layering, and of dependencies that should be cut through some form of injection.

Can you explain what in InlineSizeEstimatorAnalysis.cpp depends on TF for example? That would be a good starting point to help me understand how to break this dependency.

Thanks!

llvm/unittests/Analysis/InlineSizeEstimatorAnalysisTest.cpp
101

I believe this is not the kind of test that is usually implemented as a C++ unit test in LLVM; can you make this a lit/FileCheck test?


In 'development' mode, InlineSizeEstimatorAnalysis uses a tensorflow model to estimate the native size of a function from some features.
In this upcoming patch (https://reviews.llvm.org/D83733), also in 'development' mode, we use a tensorflow model that we want to load via the command line (see the patch description).

Note that the TF APIs used are limited to the C API surface area - basic data types and TF_Tensor. If one builds LLVM with TF API support and then wants to add an analysis, they may benefit from reusing this API.
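One possible layering for this debate, sketched with hypothetical names (ModelEvaluatorBase and MockEvaluator are illustrative, not types from the patch): generic analysis code programs against an abstract evaluator, and only the TF-backed implementation, compiled in when the TensorFlow C API is available, knows about TF types.

```cpp
#include <cstdint>
#include <vector>

// Abstract seam: analysis code depends only on this interface.
struct ModelEvaluatorBase {
  virtual ~ModelEvaluatorBase() = default;
  virtual std::int64_t evaluate(const std::vector<std::int32_t> &Features) = 0;
};

// Stand-in implementation; a real one would feed the features to a TF graph.
struct MockEvaluator : ModelEvaluatorBase {
  std::int64_t evaluate(const std::vector<std::int32_t> &Features) override {
    std::int64_t Sum = 0;
    for (std::int32_t F : Features)
      Sum += F;
    return Sum;
  }
};

std::int64_t evaluateWithMock() {
  MockEvaluator E;
  return E.evaluate({1, 2, 3});
}
```

With this shape, the conditional compilation moves behind a factory that returns nullptr (or a no-op evaluator) when TF support is not built in, instead of macros in the analysis itself.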


I understand all this, but I am still not sure we can't abstract the TF API away. Let me clarify my earlier question: can you point me to exactly which TF types and APIs need to be exposed? Starting with InlineSizeEstimatorAnalysis may be a good idea, but skimming through it I only see the FunctionFeatures being expressed in terms of std::vector of various integer sizes; it isn't clear to me what from TensorFlow is used in this file right now.


https://reviews.llvm.org/D83843 should address this.