This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Set up boilerplate build for MLIR benchmarks
ClosedPublic

Authored by SaurabhJha on Dec 6 2021, 11:45 AM.

Details

Summary

This is the start of MLIR benchmarks. It configures the CMake files necessary to get started writing actual benchmarks.

Diff Detail

Event Timeline

SaurabhJha created this revision.Dec 6 2021, 11:45 AM
SaurabhJha requested review of this revision.Dec 6 2021, 11:45 AM

Changing the commit message by removing references to the sparse kernel, since we are introducing a general benchmark framework here

SaurabhJha retitled this revision from [mlir] Set up boilerplate build for MLIR sparse kernel benchmarks to [mlir] Set up boilerplate build for MLIR benchmarks.Dec 6 2021, 12:02 PM
SaurabhJha edited the summary of this revision. (Show Details)
mehdi_amini added inline comments.Dec 6 2021, 1:38 PM
mlir/CMakeLists.txt
239

I think this should all be inside the benchmark folder.

244

This should be behind an option -DMLIR_ENABLE_CXX_BENCHMARKS or something like that.
I wouldn't use an overly generic name like MLIR_ENABLE_BENCHMARKS because I can imagine other infrastructure (Python for example) that will also do benchmarking.

SaurabhJha updated this revision to Diff 392175.Dec 6 2021, 2:05 PM

Address comments: move benchmarking configuration to inner directory and enable/disable benchmarking with a flag

aartbik added inline comments.Dec 6 2021, 4:21 PM
mlir/CMakeLists.txt
218

looks like newline is missing

mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

jumping ahead a little, you are setting up a framework that runs C++ code

how are you going to make the link from C++ to MLIR IR (i.e. how do you envision running a sparse kernel benchmark this way)?

SaurabhJha added inline comments.Dec 7 2021, 11:24 AM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

I don't know the answer yet, but I am going through some more documentation about MLIR IR, specifically around what the equivalent of IRBuilder is in the MLIR world.

What metrics of MLIR IR do we care about? For example, is code size important?

mehdi_amini added inline comments.Dec 7 2021, 11:32 AM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

code size could be something important on Mobile indeed. It isn't the main focus in general though (compared to execution time).

We can write C++ execution tests (example: https://github.com/llvm/llvm-project/blob/main/mlir/unittests/ExecutionEngine/Invoke.cpp ) but we'll need to significantly improve the runtime support to make this convenient I think.

SaurabhJha added inline comments.Dec 7 2021, 11:46 AM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

Thanks for the pointers @mehdi_amini. That's what I was thinking, somehow generating IR inline and then timing it, but looks like it's not viable yet. I will think about it more and post my findings here.

SaurabhJha added inline comments.Dec 7 2021, 12:16 PM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

We can have an external Python program, similar to llvm-lit, that can do these transformations externally and run some benchmarks. What do you think about this high-level idea? I can start in this direction if that sounds viable.

mehdi_amini added inline comments.Dec 7 2021, 9:06 PM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

I'm not sure what you mean actually, can you elaborate what you have in mind?

aartbik added inline comments.Dec 7 2021, 10:06 PM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

Note that we are actually much more advanced in building, compiling, and running MLIR IR from Python programs, take a look at https://github.com/llvm/llvm-project/blob/main/mlir/test/Integration/Dialect/SparseTensor/python/test_SpMM.py for example (which would give many opportunities to time part of the generated IR in a very fine-grained manner [after a warm JIT execution startup]). So in that sense, perhaps a Python based benchmarking framework for starters would make some sense too.
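
To make the fine-grained timing idea above concrete, here is a minimal, hypothetical sketch in plain Python; compiled_kernel, warmup_runs, and timed_runs are illustrative stand-ins, not part of the MLIR Python API:

```python
import time


def time_kernel(compiled_kernel, warmup_runs=1, timed_runs=10):
    """Time an already-compiled kernel so JIT/compile overhead stays out of the numbers."""
    for _ in range(warmup_runs):
        compiled_kernel()  # warm-up: JIT startup, caches, etc.
    samples_ns = []
    for _ in range(timed_runs):
        start = time.perf_counter_ns()
        compiled_kernel()
        samples_ns.append(time.perf_counter_ns() - start)
    return samples_ns


# Placeholder workload standing in for an execution-engine invocation.
print(time_kernel(lambda: sum(range(1_000_000))))
```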

But please explore a bit and report back to us.

SaurabhJha added inline comments.Dec 8 2021, 12:27 AM
mlir/benchmarks/MLIRStarter.cpp
3–8 ↗(On Diff #392175)

Yeah, my idea is not very concrete yet but https://github.com/llvm/llvm-project/blob/main/mlir/test/Integration/Dialect/SparseTensor/python/test_SpMM.py can be a good starting point. I will flesh this out more and come back.

Introduce a python framework for benchmarking llvm programs

@mehdi_amini @aartbik I have introduced an example Python benchmark. Let me know what you think about this new approach. It needs more fleshing out and a possible integration with llvm-lit. I have suggested a way we can use FileCheck for these benchmarks.

I am also posting a comment on the Discourse thread with more context so that more people can weigh in.

aartbik added inline comments.Dec 13 2021, 11:07 AM
mlir/benchmark/Dialect/SparseTensor/python/test_SpMM.py
135 ↗(On Diff #393755)

if you are just showing this as a first very rough example, that's fine, but we cannot check this in as is, since this is a very bad test for a benchmark (it stress tests all sparsity annotations over a very small test case)

In the long run, we want a proper benchmark to

(1) pick one sparsity annotation (or report numbers for each different annotation at least)
(2) pick one test input, larger size
(3) split the JIT part from the actual kernel execution part
(4) maybe even report various metrics, such as time to compile, time to execute, etc.

SaurabhJha added inline comments.Dec 13 2021, 11:13 AM
mlir/benchmark/Dialect/SparseTensor/python/test_SpMM.py
135 ↗(On Diff #393755)

Yes, I couldn't think of a good example, so I just copied the original sparse multiplication test to demonstrate what it would look like. In fact, we should probably not even use the timeit utility of Python.

Set up a python script to run benchmarks. Remove google benchmark setup.

Even if the current benchmarking approach is the way to go, I couldn't find a way to run the benchmarks consistently. Locally, I have been running them from the command line like this:

```bash
PYTHONPATH=build/tools/mlir/python_packages/mlir_core \
MLIR_C_RUNNER_UTILS=build/lib/libmlir_c_runner_utils.dylib \
MLIR_RUNNER_UTILS=build/lib/libmlir_runner_utils.dylib \
python mlir/benchmark/python/*.bench.py
```

Added support of pushing benchmarks to an LNT server. Also added a README.

mehdi_amini added inline comments.Dec 22 2021, 1:04 PM
mlir/benchmark/python/README.md
25 ↗(On Diff #395836)

I'm not convinced by the parameters here: I would expect the "driver" / "framework" to be mostly agnostic to what the benchmark function does.
Somewhat like Python unit tests, maybe?
I expect mostly from a framework like this to:

  • discover benchmarks (see the sketch after this list)
  • organize suites (so we need either some metadata, some naming convention, a manifest, or a directory convention eventually). Think about gtests for example.
  • allow to run entire suites or just a subset with filtering.
  • for each benchmark, run multiple times until you get confidence in the result (google benchmark does that, I think?) instead of a fixed arbitrary number of times.
  • collect and aggregate results in a report file.
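
For illustration only, a rough sketch of what discovery plus filtering could look like; the benchmark_* naming convention and the function names here are assumptions, not the eventual mbr implementation:

```python
import glob
import importlib.util
import os


def discover_benchmarks(root, name_filter=""):
    """Collect benchmark_* functions from benchmark_*.py files under root."""
    found = []
    pattern = os.path.join(root, "**", "benchmark_*.py")
    for path in glob.glob(pattern, recursive=True):
        module_name = os.path.basename(path)[:-3]
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for attr in dir(module):
            if attr.startswith("benchmark_") and name_filter in attr:
                found.append((path + "::" + attr, getattr(module, attr)))
    return found


# Example: list all benchmarks whose name mentions "sparse".
for name, func in discover_benchmarks("mlir/benchmark/python", name_filter="sparse"):
    print(name)
```
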
mlir/benchmark/python/run.py
33 ↗(On Diff #395836)

Can you add a mode where I get a local report file instead? (offline mode)

SaurabhJha added inline comments.Dec 22 2021, 3:18 PM
mlir/benchmark/python/README.md
25 ↗(On Diff #395836)

Makes total sense. I have never implemented discovery or organising suites so give me some time to design and implement it. 🙂

I do hope that the approach is right and the final outcome would mostly look like a pytest invocation.

SaurabhJha added inline comments.Dec 23 2021, 7:00 AM
mlir/benchmark/python/README.md
25 ↗(On Diff #395836)

Hey, I was implementing this and a question came to me.

In the current implementation, we compile the module once and then run the compiled module n times. If we make the driver benchmark-agnostic, we would be both compiling and running n times. Would that be acceptable?

One way we could reproduce the current arrangement is by having a convention that benchmarks return the compiled module, and then we run the compiled module at the driver level. But because we are recording running times using nano_time, the driver would have to make assumptions about the kernel's function signature to access the running times, like we are doing now. I don't think we should be imposing this much from the driver.

Do you have any thoughts on this?

SaurabhJha added inline comments.Dec 23 2021, 10:49 AM
mlir/benchmark/python/README.md
25 ↗(On Diff #395836)

Thinking about this more, I think what I am looking for is a "setup" step like in unit testing. This could be a good way to divide responsibilities between the driver and the sparse-matrix-specific setup.

I also think that each benchmark can return a float for the time taken in a run which can then be collected by the driver.

SaurabhJha added inline comments.Dec 23 2021, 12:52 PM
mlir/benchmark/python/README.md
25 ↗(On Diff #395836)

Sorry for the many messages here. I will spend some more time on this. It would be much easier if I had an implementation of these scattered ideas.

mehdi_amini added inline comments.Dec 23 2021, 4:31 PM
mlir/benchmark/python/README.md
25 ↗(On Diff #395836)

There are many ways to handle this, for example a function decorated with @benchmark could actually return a tuple of functors to split various phases, instead of the function itself being invoked and timed.

```python
@benchmark(pipeline_string, ntimes)
def benchmark_something_module():
    def setup():
        module = ir.Module.create()
        # Define arguments for the kernel function
        with ir.InsertionPoint(module.body):
            @builtin.FuncOp.from_py_func(<arguments_defined_above>)
            def kernel_name(<parameters>):
                # Kernel implementation

        compiled_module = compile(module)
        execution_engine = mlir.jit.init()
        execution_engine.register(compiled_module)
        return compiled_module, execution_engine

    def run(args):
        compiled_module = args[0]
        execution_engine = args[1]
        execution_engine.run("main", ....)

    return (setup, run)
```

The framework can then do:

```python
run_args = setup()
time(run, run_args)
```

I just made that up right now, not saying it is necessarily the best option :)

Still a work in progress but I have addressed some comments.

  1. I have abstracted everything into a library.
  2. Implemented benchmark discovery similar to pytest.
  3. Better separation of running passes, compiling, and running.

Some issues still need resolution:

  1. I am still running benchmarks a fixed number of times instead of running until the results are statistically significant.
  2. The module discovery doesn't include filtering like pytest does.

I am working on these two things right now.

Glancing at it:

  • it's nice that you're splitting the general library out into mlir/utils/
  • it seems that there is still a strong coupling between the benchmark framework and the "thing to benchmark". It isn't clear to me why the code in mlir/utils/mbr needs to know anything about the IR or the execution engine. Ideally it should be able to benchmark any random Python code. Consider that we may use this to benchmark a NumPy implementation vs MLIR, for example. This is why the example I showed you before was directly returning callbacks: they can run arbitrary code and the framework does not know about it.

This diff has these changes:

  1. Address the comment about returning a compile function and a run function from a benchmark, which the framework can use.
  2. Add filtering of benchmark paths.
  3. Dynamically determine the number of runs required for a benchmark function. I have used a strategy similar to Python's timeit (https://github.com/python/cpython/blob/main/Lib/timeit.py#L31-L33) and google benchmark (https://github.com/google/benchmark/blob/main/src/benchmark_runner.cc#L231-L253); a minimal sketch of this strategy follows below. Let me know if we need something different here.
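
For reference, a minimal sketch of that stopping strategy; the names and thresholds are illustrative, not the actual mbr code:

```python
import time


def measure(run, max_time_ns=1_000_000_000, max_runs=1_000_000):
    """Repeat run() until enough total time has elapsed or a run cap is reached."""
    samples_ns = []
    total_ns = 0
    while total_ns < max_time_ns and len(samples_ns) < max_runs:
        start = time.perf_counter_ns()
        run()
        elapsed = time.perf_counter_ns() - start
        samples_ns.append(elapsed)
        total_ns += elapsed
    return samples_ns


# The number of collected samples adapts to how fast the workload is.
print(len(measure(lambda: sum(range(10_000)))))
```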

I haven't included a README since I am not yet sure if the framework interface is something we agree on. Once we have an agreement, I will add a user guide of this library in the README.

SaurabhJha updated this revision to Diff 397059.Jan 3 2022, 6:55 AM

The latest revision contains the following changes

  1. Adding a README.
  2. Having a configuration file for the library.
  3. Having a numpy benchmark as an example where there is no compile function.
  4. Improve benchmark filtering to filter by benchmark name.

Hey @mehdi_amini @aartbik , I have addressed all outstanding issues. Can you please take a look and help me decide how to proceed further on this?

this is converging to something very nice, thanks for all your work and patience!

mlir/benchmark/python/benchmark_sparse.py
23

Please add top level comment

"""xxxx"""

describing the benchmark, just to set the right example for future benchmarks

also, nitpicky, do we want a "flat" python/benchmark_*.py structure, or do we want to specialize like

benchmark/sparse/*.py
benchmark/dense/*.py

Just curious if we should set that example now already, or whether we refactor later if the number of benchmarks grows

24

this probably belongs in common now, i.e., setting up various pipelines that will be shared by benchmarks

mlir/benchmark/python/common.py
10

top level doc on what is in this file

"""Common utilities .... """

mlir/utils/mbr/mbr/config.ini
10

what are these "newline missing" comments?

mlir/utils/mbr/mbr/discovery.py
10

please document every method with a python doc string

mlir/utils/mbr/mbr/main.py
33

try to keep the 80-col limit

Addressed latest round of comments

  1. Added python docstrings for functions and modules.
  2. Made all lines wrap at the 80-character limit.
  3. Added newlines.

Regarding the flat python/benchmark_*.py structure, my vote would be for the current structure as it's simpler and we could always move things around later. I am open to other directory structures as well, so let me know if you have any alternate thoughts :)

SaurabhJha added inline comments.Jan 18 2022, 10:06 AM
mlir/benchmark/python/common.py
27

I wonder whether this setup_pass belongs in common, as it seems specific to sparse tensors.

Add CMake targets for installation of the library

Add trailing line to mlir/.gitignore

mehdi_amini added inline comments.Jan 18 2022, 11:55 PM
mlir/.gitignore
10 ↗(On Diff #400972)

Why do all these folders show up here? In general the execution happens in the build directory (which is at the top level and not under MLIR).
We should leave our source directory "pristine".

mlir/benchmark/python/common.py
27

Yes, it isn't "common" from this point of view, but this is also the case for a helper like create_sparse_np_tensor below.

91

This is not describing the loop that is emitted and the wrapping

mlir/utils/mbr/mbr/stats.py
15

These seem fairly arbitrary and overly large to me. Where is this coming from?

Did you look into how other solutions use some statistical approach to evaluate confidence?

Address latest round of comments.

> Why do all these folders show up here? In general the execution happens in the build directory (which is at the top level and not under MLIR).
> We should leave our source directory "pristine".

I have now made an llvm-lit kind of arrangement where we move the executable to the build directory and mlir-mbr is invoked from there. That way, we won't be bothered by "*.pyc" files. Unfortunately, the egg-info and build/ directories are created by pip install -e, which we have to do to install mlir-mbr. I included a manual rm -rs in CMakeLists.txt.

> Yes, it isn't "common" from this point of view, but this is also the case for a helper like create_sparse_np_tensor below.

Absolutely. I tried splitting common into benchmark-wide common and sparse-specific common by creating a directory called mlir/benchmark/python/sparse and moving the sparse-specific common and the sparse benchmark there. Unfortunately, I couldn't work around the parent-package import problem, where the benchmark is trying to import from common in the parent directory. I kept it unchanged for now.

We need to solve this problem as I expect benchmarks would be divided into different directories.

> This is not describing the loop that is emitted and the wrapping

Updated the comment.

> These seem fairly arbitrary and overly large to me. Where is this coming from?

I took the general idea from google benchmark https://github.com/google/benchmark/blob/main/src/benchmark_runner.cc#L260-L265 and Python's timeit https://github.com/python/cpython/blob/main/Lib/timeit.py#L31-L33. The values probably need tweaking, but my understanding is that the general idea is to run until either some max time has elapsed or we have a sufficient number of measurements.

> Address latest round of comments.
>
> Why do all these folders show up here? In general the execution happens in the build directory (which is at the top level and not under MLIR).
> We should leave our source directory "pristine".
>
> I have now made an llvm-lit kind of arrangement where we move the executable to the build directory and mlir-mbr is invoked from there. That way, we won't be bothered by "*.pyc" files. Unfortunately, the egg-info and build/ directories are created by pip install -e, which we have to do to install mlir-mbr. I included a manual rm -rs in CMakeLists.txt.

llvm-lit is also a Python program; how is it set up?
The rm looks a bit hacky to me; I'd rather not touch the source directory at all.

> These seem fairly arbitrary and overly large to me. Where is this coming from?
>
> I took the general idea from google benchmark https://github.com/google/benchmark/blob/main/src/benchmark_runner.cc#L260-L265 and Python's timeit https://github.com/python/cpython/blob/main/Lib/timeit.py#L31-L33. The values probably need tweaking, but my understanding is that the general idea is to run until either some max time has elapsed or we have a sufficient number of measurements.

Wow, that's really low-tech...
Probably good enough to get started, but please make the default a couple of seconds, not 100.

> llvm-lit is also a Python program; how is it set up?
> The rm looks a bit hacky to me; I'd rather not touch the source directory at all.

Yep, got a solution for it! The problem was that mbr sought to be both a library and a CLI runner. As a library, it used to provide BenchmarkRunConfig so that benchmarks could import it and return it like this:

```python
from mbr import BenchmarkRunConfig


def some_benchmark():
    ...
    return BenchmarkRunConfig(compiler=compiler, runner=runner)
```

To make mbr available, we had to do a pip install, which created those build directories. Now I just return a two-element tuple from each benchmark and check it in the runner. So we don't need a library, and thus we can remove the pip install.
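
To illustrate the convention described above, here is a hypothetical benchmark returning a (compiler, runner) tuple; the sorting workload is just a stand-in, not an MLIR kernel:

```python
import time


def benchmark_sort_example():
    """Hypothetical benchmark following the two-element tuple convention."""

    def compiler():
        # Expensive one-time setup that should not be timed on every run.
        return list(range(1_000_000, 0, -1))

    def runner(prepared_input):
        # One timed run; returns elapsed nanoseconds.
        start = time.perf_counter_ns()
        sorted(prepared_input)
        return time.perf_counter_ns() - start

    return compiler, runner


# What a driver would do with the returned tuple:
compile_fn, run_fn = benchmark_sort_example()
prepared = compile_fn()
print(run_fn(prepared))
```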

> Probably good enough to get started, but please make the default a couple of seconds, not 100.

Made it 1 second (1e9 ns).

I have also made it default to continue running benchmarks if any benchmark raises an exception. This is similar to how unittest handles test cases that raise an exception.
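
As a rough illustration of that behaviour (hypothetical driver code, not what this revision contains):

```python
def run_all(benchmarks):
    """Run every benchmark; record failures but keep going, as unittest does."""
    results, errors = {}, {}
    for name, bench in benchmarks:
        try:
            results[name] = bench()
        except Exception as exc:  # keep running the remaining benchmarks
            errors[name] = exc
    return results, errors


# One benchmark succeeds, one raises; both outcomes are reported.
print(run_all([("ok", lambda: 42), ("broken", lambda: 1 / 0)]))
```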

Remove references to BenchmarkRunConfig from README

LGTM, if @aartbik is happy with this.

> LGTM, if @aartbik is happy with this.

@aartbik polite ping 🙂

aartbik accepted this revision.Jan 26 2022, 6:34 PM

Thanks a lot for all your efforts and patience!

LGTM as well, but with a few last nits on naming

please address those before submitting (but I am giving you a green light now already to avoid further delay once you are done with those ;-) ;-)

mlir/benchmark/python/benchmark_sparse.py
2

matrices -> tensors

31

throughout the file please replace

sparse multiplication

with

sparse matrix multiplication

(note that we use "sparse tensor" when we talk about the support in general, but for specific 2-D cases, using matrix is okay)

mlir/benchmark/python/common.py
27

Also note that thanks to Wren's work, we soon can simplify this a lot!

This revision is now accepted and ready to land.Jan 26 2022, 6:34 PM

Address final comments on naming

SaurabhJha added inline comments.Jan 27 2022, 10:17 AM
mlir/benchmark/python/common.py
27

Nice, I guess you are talking about this one. I am keeping track of that and will change this once that's in the main tree.

I hope I am guessing correctly that you are okay with this going in as it is now. I have addressed the naming changes that you raised in the last comment.

I'll wait for the build to pass before merging this in.

The build is taking quite a while. Is it okay to restart if it doesn't finish in about 9-10 hours?

Alright, this is it! The build has passed, merging.

This revision was automatically updated to reflect the committed changes.
Via Conduit