This is an archive of the discontinued LLVM Phabricator instance.

[mlgo] Add infrastructure to use EmitC-generated models for inlining.
Needs ReviewPublic

Authored by jacobhegna on Mar 20 2023, 5:38 PM.

Details

Summary

Current MLGO models can be served in three ways:

  1. release mode, which uses a TF AOT compiler to compile models into binary blobs and a header file. This requires a TF dependency at (LLVM) build time.
  2. development mode, which loads TFLite models using the TFLite runtime dynamically by providing LLVM with a path to the .tflite file. This requires a TFLite dependency.
  3. interactive mode, which fetches actions via two pipes (one for features going out, and one for actions going in).

None of these is suitable for a general clang release, where we cannot assume any TF dependencies and want to package a model directly in the LLVM source tree.

The EmitC serving path compiles TFLite models to pure C++ code with a mild runtime dependency. The runtime is a small set of tensor kernels necessary to run a neural net. The mlcompileropt repository contains (or at least, will eventually contain) a script which automates the process of compiling the .tflite file through the various MLIR stages down to C++, and also embeds the C++ runtime directly in the autogenerated .cpp file. The result is a single (.cpp, .h) pair which can be checked into the LLVM repository and built in a normal CMake build process, with no additional dependencies.

This patch adds two things:

  1. the infrastructure to load + run EmitC-generated models for the ML inlining advisor, and
  2. a "test" policy: a TF function that always returns 1, compiled through EmitC and loaded via the above framework. This is used to test the code in (1).

You can use the EmitC module inliner in opt with the flags -enable-ml-inliner=emitc -inliner-emitc-model-name=NAME_OF_MODEL. Currently the only supported model name is InlineOzTestModel.

Diff Detail

Event Timeline

jacobhegna created this revision. Mar 20 2023, 5:38 PM
Herald added a project: Restricted Project. · View Herald Transcript · Mar 20 2023, 5:38 PM
jacobhegna requested review of this revision. Mar 20 2023, 5:38 PM

Fix the error message when trying to use the emitc model if it was not built with clang.

phosek added a subscriber: phosek. Mar 23 2023, 11:53 AM
marbre added a subscriber: marbre. Apr 18 2023, 2:37 AM

Slight update to the patch; add command line selection for registered emitc
models.

Note this still isn't ready for review; I need to trim down this patch and split it up. Just posting for visibility.

Clean up the code & delete non-test models.

jacobhegna retitled this revision from "Add initial EmitC inlining-for-size model." to "Add infrastructure to use EmitC-generated models for inlining.". Apr 21 2023, 5:32 PM
jacobhegna edited the summary of this revision.

Ok, this is ready for review. The code which compiles the .tflite models to C++ and embeds the runtime can be found in this external PR: https://github.com/google/ml-compiler-opt/pull/215

The Phabricator interface/diff looked weird; updating the diff.

Squashing commits to clean up diff.

Remove spurious changes.

jacobhegna edited the summary of this revision. Apr 21 2023, 8:14 PM

Add Apache 2.0 license to the autogenerated files + notice of original location.

Note to reviewers: I am not the most familiar with open source licensing and I
would like to get attribution + licensing reviewed carefully before this patch
lands.

Thanks for doing this!

At a high level, the dependency from the model to LLVM should go away. Also, the emitc stuff should be an implementation detail (i.e. not appear in any headers).

The models can then have their own CMakeLists.txt, which IIUC is the expected way for subdirectories.

llvm/include/llvm/Analysis/EmitCModelRegistry.h
56

StringMap?

61

why do we need this, wouldn't the macro below be sufficient?

llvm/include/llvm/Analysis/EmitCTensor.h
1 ↗(On Diff #516020)

won't this be part of the model package, i.e. something that's (maybe cloned) per-model?

llvm/include/llvm/Analysis/MLInlineEmitCModel.h
1 ↗(On Diff #516020)

Nit: isn't it "Model for inlining Oz"?

35 ↗(On Diff #516020)

can't these not be surfaced to begin with?

llvm/lib/Analysis/MLInlinerEmitCRunner.h
27

This should be in the emit-ed code, so emitc stuff shouldn't appear in any llvm-facing APIs.

64

can't we set the buffers in the ctor once? they don't get reallocated.

llvm/lib/Analysis/models/emitc/InlineOzTestModel.emitc.h
21

Not sure what the inheritance relationship buys us. We already have a common mechanism for referring to features by their enum-named index. This also seems to add a tight coupling between codegen and llvm.

If all we need is a way to identify groups of models in the registry, that could be done with the address of a static specific to each consumer, i.e. MLInlineAdvisor could have a static char ID = '\0' and the address of that ID is the key (like the legacy PM does things, too)
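For illustration, a minimal sketch of the static-ID keying suggested above, assuming a simple name registry keyed on the ID's address; the registry shape and the MLInlineAdvisorModels name are hypothetical, not code from this patch:

    // The *address* of a per-consumer static char identifies the consumer,
    // the same way the legacy pass manager identifies passes.
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct MLInlineAdvisorModels {
      static char ID; // only the address matters, never the value
    };
    char MLInlineAdvisorModels::ID = '\0';

    // Hypothetical registry keyed on those addresses.
    static std::map<const void *, std::vector<std::string>> &modelsByConsumer() {
      static std::map<const void *, std::vector<std::string>> Registry;
      return Registry;
    }

    static void registerModelFor(const void *ConsumerKey, std::string Name) {
      modelsByConsumer()[ConsumerKey].push_back(std::move(Name));
    }

    // Usage: registerModelFor(&MLInlineAdvisorModels::ID, "InlineOzTestModel");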

marbre added inline comments. Apr 26 2023, 9:15 AM
llvm/include/llvm/Analysis/EmitCTensor.h
1 ↗(On Diff #516020)

I actually had the same question in mind and I am not sure if llvm/include/llvm/Analysis is a good location for this. However, we discussed some time ago if we should upstream the whole reference implementation, see https://discourse.llvm.org/t/tosa-reference-model-from-mlir-using-emitc/4799/11.

Furthermore, I don't know if it is acceptable to upstream code not licensed under Apache-2.0 WITH LLVM exceptions. The good news is, we would be able to re-license (most of) the reference implementation.

Address comments from reviewers. The main changes are:

  1. Include the python files to generate the EmitC models in this patch, instead of in the mlcompileropt github repo.
  2. Remove the global declaration of the emitc::Tensor type and make it an implementation detail. This requires changing the signatures of the generated models to speak in terms of T* instead of Tensor<T, ...>.

llvm/include/llvm/Analysis/EmitCModelRegistry.h
56

Fixed.

61

We need to call EmitCModelRegistry<ModelT>::get().addModel(...) before main() runs in the final linked binary. The easiest way to do that is to define a global object whose constructor runs that code - that's the point of the EmitCModelRegistrationHandle. We can't (as far as I know) just have the macro, because then we wouldn't have a "context" in which to run the needed code (right now, that context is the constructor of ...Handle).

This is basically how cl::opt works, too.

Note that this mechanism is about to change when I update the patch. Instead of registering instantiated unique_ptrs of each model, I register a factory method which is capable of creating an MLModelRunner pointer. The reason for the change is that, now that the emitc models don't share a common base class, there isn't a good way to store them all in a common registry (but we can store the MLModelRunner* because the runners use a subclass templated on the emitc model).
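For illustration, a minimal sketch of the handle-plus-factory mechanism described above, with MLModelRunner stubbed out; the real EmitCModelRegistry in the patch is templated and differs in detail:

    #include <functional>
    #include <map>
    #include <memory>
    #include <string>
    #include <utility>

    struct MLModelRunnerStub {}; // stand-in for llvm::MLModelRunner

    using ModelFactory = std::function<std::unique_ptr<MLModelRunnerStub>()>;

    class EmitCModelRegistry {
    public:
      static EmitCModelRegistry &get() {
        static EmitCModelRegistry Instance;
        return Instance;
      }
      void addModel(const std::string &Name, ModelFactory Factory) {
        Factories[Name] = std::move(Factory);
      }
      std::unique_ptr<MLModelRunnerStub> create(const std::string &Name) {
        auto It = Factories.find(Name);
        if (It == Factories.end())
          return nullptr;
        return It->second();
      }

    private:
      std::map<std::string, ModelFactory> Factories;
    };

    // The handle's constructor runs before main() when a handle is defined at
    // namespace scope, which gives the registration macro a place to run code.
    struct EmitCModelRegistrationHandle {
      EmitCModelRegistrationHandle(const std::string &Name, ModelFactory F) {
        EmitCModelRegistry::get().addModel(Name, std::move(F));
      }
    };

    // A registration macro would expand to a global handle, e.g.
    // static EmitCModelRegistrationHandle RegisterInlineOzTestModel(
    //     "InlineOzTestModel", [] { return std::make_unique<MLModelRunnerStub>(); });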

llvm/include/llvm/Analysis/EmitCTensor.h
1 ↗(On Diff #516020)

It is true that the EmitC runtime includes all of this code. It's not quite cloned per-model right now, because in the python emitc code I rename the Tensor type to something like _UnusedTensorType.

The reason is that when EmitC generates models, the signatures of the functions take Tensors for input and output. If the definition of Tensor is in a local, anonymous namespace (i.e. the embedded runtime), then the code won't even compile, because there won't be an accessible declaration of Tensor for the model header.

There are really only two solutions to this:

  1. make Tensor a type that's declared in a common LLVM header that each model can #include, and so each model will use the same shared definition of Tensor. This is what I did in this commit.
  2. make the model signatures only use void* or float* and keep the definition of Tensor as an internal implementation detail.

I talked with mircea offline, and (2) seems OK for us. It avoids exposing this code outside of generated models. It weakens the types in the exported function signatures, but that's not horrible because we already deal with void* internally (and we're the only user of this modified emitc backend).
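A hypothetical sketch of what option (2) looks like at a generated model's boundary; the function name and signature below are illustrative, not the exact output of the EmitC pipeline:

    #include <cstddef>

    namespace {
    // Embedded runtime detail; invisible outside this translation unit.
    template <typename T, std::size_t N> struct Tensor {
      T Data[N];
    };
    } // namespace

    // The exported signature speaks in terms of raw pointers, so callers (the
    // LLVM-side model runner) never need a declaration of Tensor.
    extern "C" long InlineOzTestModelRun(const long *Features, std::size_t NumFeatures) {
      (void)Features;
      (void)NumFeatures;
      return 1; // the test policy always returns "inline"
    }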

1 ↗(On Diff #516020)

(see my above comment to mircea as well).

+1 to making the license change; that will be a blocker regardless of how we want to use the library. Right now I don't use any of the Eigen code, so we can ignore that (if that was the main thing that can't be relicensed).

So, originally we were planning on putting the entire reference implementation in LLVM and having models #include it. The reason we didn't do this is essentially because we wanted two conflicting things:

  1. models to maintain bit-identical stability over time - if you give a model the same bits as input today, it will give you the same answer today or six months from now.
  2. to be able to upgrade the runtime for performance/other reasons.

(2) can conflict with (1) because upgrades can create slight differences due to floating point numerical stability issues. So, instead we embed the runtime in each model individually so that new models get all the upgrades and old models get stability.

llvm/include/llvm/Analysis/MLInlineEmitCModel.h
1 ↗(On Diff #516020)

file gone, not an issue anymore

35 ↗(On Diff #516020)

the base model is going away, so not a problem anymore.

llvm/lib/Analysis/MLInlinerEmitCRunner.h
27

fixed. this function actually doesn't exist anymore.

64

fixed.

llvm/lib/Analysis/models/emitc/InlineOzTestModel.emitc.h
21

fixed, but not with the static char trick. see the patch update for details

Minor fixes.

mtrofin added inline comments. Apr 27 2023, 11:25 AM
llvm/include/llvm/Analysis/EmitCModelRegistry.h
61

ah, makes sense. thanks!

110

could this macro also #include the header - I think it'd be totally reasonable to follow a canonical naming convention where the header name is trivially derivable from EmitCModelType, for example.

llvm/include/llvm/Analysis/InlineModelFeatureMaps.h
25–48

pls add a small doc accordingly

84

can you peel this bit of the change separately and just submit it (so the part that adds the self-doc-ing and makes the 2 lists the same)

llvm/lib/Analysis/MLInlinerEmitCRunner.h
29

would it be possible for the emitC-ed code to just give us a void* lookupTensorBuffer(const std::string &) - it'd also keep the codegen-ed files smaller
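For illustration, a sketch of how a runner could bind the buffers once with such a lookup; the lookupTensorBuffer stub below stands in for what the emitC-ed code would provide, and all names here are hypothetical:

    #include <cstddef>
    #include <string>
    #include <vector>

    static float FakeBuffer[16]; // stand-in storage for the sketch

    // In the real scheme this would live in the generated .cpp and dispatch
    // on the tensor name.
    static void *lookupTensorBuffer(const std::string &Name) {
      (void)Name;
      return FakeBuffer;
    }

    class EmitCRunnerSketch {
    public:
      explicit EmitCRunnerSketch(const std::vector<std::string> &FeatureNames) {
        // The buffers are never reallocated, so resolve them once up front.
        for (const std::string &Name : FeatureNames)
          Buffers.push_back(lookupTensorBuffer(Name));
      }
      void *getBuffer(std::size_t Idx) const { return Buffers[Idx]; }

    private:
      std::vector<void *> Buffers;
    };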

llvm/lib/Analysis/models/emitc/InlineOzTestModel.emitc.h
18

can we add a loud "this is generated, avoid modifying by hand" notice somewhere in the comment? Same for the .cpp.

Address most of the comments; will move the feature macro stuff to a new patch in the next update.

llvm/include/llvm/Analysis/EmitCModelRegistry.h
110

Mmm, we can if you want, but the problem is that I think we will eventually store models for different passes in different places - for example, the regalloc models might end up somewhere in llvm/lib/CodeGen. We can put the include here if you want, but I'm not sure it's the most useful optimization.

llvm/lib/Analysis/MLInlinerEmitCRunner.h
29

fixed

llvm/lib/Analysis/models/emitc/InlineOzTestModel.emitc.h
18

fixed

mtrofin added inline comments. Apr 27 2023, 2:28 PM
llvm/include/llvm/Analysis/EmitCModelRegistry.h
110

maybe add a parameter that is the header location, so this becomes 1 gesture rather than 2 (easier to discover what needs to be given, too)

llvm/lib/Analysis/MLInlinerEmitCRunner.h
29

ok but now you can avoid the macro-ing, right?

Address comments - mainly making the emitc runner generic beyond inlining.

Rebasing to HEAD to get inline macro changes.

mtrofin retitled this revision from "Add infrastructure to use EmitC-generated models for inlining." to "[mlgo] Add infrastructure to use EmitC-generated models for inlining.".