This is an archive of the discontinued LLVM Phabricator instance.

Add C++ helpers to manage Unranked Memref in native code
Needs Revision · Public

Authored by mehdi_amini on Feb 9 2021, 10:18 PM.

Details

Summary

This adds an OwningUnrankedMemRef that wraps an UnrankedMemRefType
descriptor. New helper methods to initialize and index the descriptor, as
well as an iterator (UnrankedMemrefIterator), are also provided.
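
For rough illustration only (this is not necessarily the API added by this patch): the rank-erased descriptor pairs a runtime rank with an opaque pointer to the corresponding ranked descriptor, and an owning wrapper ties the lifetime of the underlying storage to a C++ object. All names below other than UnrankedMemRefType are hypothetical.

  #include <cstdint>
  #include <memory>

  // Approximation of the rank-erased descriptor in CRunnerUtils.h.
  template <typename T>
  struct UnrankedMemRefType {
    int64_t rank;
    void *descriptor; // points at the ranked descriptor for this rank
  };

  // Hypothetical RAII holder sketching the idea from the summary: it owns the
  // element buffer and the storage backing the ranked descriptor, and releases
  // both when it goes out of scope. Filling in the ranked descriptor fields is
  // omitted here.
  template <typename T>
  class OwningUnrankedMemRefSketch {
  public:
    OwningUnrankedMemRefSketch(int64_t rank, std::unique_ptr<T[]> data,
                               std::unique_ptr<char[]> rankedStorage)
        : data(std::move(data)), rankedStorage(std::move(rankedStorage)),
          desc{rank, this->rankedStorage.get()} {}

    UnrankedMemRefType<T> &operator*() { return desc; }

  private:
    std::unique_ptr<T[]> data;
    std::unique_ptr<char[]> rankedStorage;
    UnrankedMemRefType<T> desc;
  };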

Diff Detail

Event Timeline

mehdi_amini created this revision.Feb 9 2021, 10:18 PM
mehdi_amini requested review of this revision.Feb 9 2021, 10:18 PM
Herald added a project: Restricted Project.Feb 9 2021, 10:18 PM
mehdi_amini added inline comments.Feb 9 2021, 10:19 PM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
401

Note to self: refactor this to share the implementation with the ranked iterator.

mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

CRunnerUtils is meant to run on HW that does not have a C++ runtime library.
In particular, uses of std::vector are prohibited and std::array is used instead.
Does this file fit the bill?

387

An implementation that is easier to follow and reuses common utils is to just keep the offset, add 1 to it and delinearize the indices in the stride basis.

422

That's not ok for the CRunnerUtils.
Maybe put those in RunnerUtils?

mehdi_amini added inline comments.Feb 10 2021, 9:59 AM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

We should probably refactor all this; it all seems quite poorly organized to me right now, including mixing the definition of the descriptors with a random set of utilities that look quite "ad-hoc" (printComma(), getTensorFilename(), ...).

387

I didn't quite follow what you describe here.
Why add 1 to the offset? The offset is physical in the buffer right now; are you thinking about making it a logical offset? When would the logical -> physical conversion take place?

422

I can use a SmallVector :)

Use LLVM SmallVector instead of std::vector and refactor iterator indexing to share the code between Ranked/Unranked iterators

mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

I'm very happy if you have the cycles to devote to this and tidy things up.

Basically, the original version of CRunnerUtils was running on an ARM micro, but it was a half-manual process with kludges to connect to a Makefile-based system that I didn't have access to (i.e. shipping pre-cross-compiled .o across the wall).
As time passed and this was only used as the runtime library support to write some basic integration tests, other stuff got added.

It would be great if there were a simple way to test the "works without a C++ runtime" aspect, as this was the original intent.

387

I meant you could make these functions from mlir/include/mlir/Dialect/Vector/VectorUtils.h available and reuse them:

/// Computes and returns the linearized index of 'offsets' w.r.t. 'basis'.
int64_t linearize(ArrayRef<int64_t> offsets, ArrayRef<int64_t> basis);

/// Given the strides together with a linear index in the dimension
/// space, returns the vector-space offsets in each dimension for a
/// de-linearized index.
SmallVector<int64_t, 4> delinearize(ArrayRef<int64_t> strides,
                                    int64_t linearIndex);

Then your function would just be delinearize(strides, linearize(offset, strides) + 1).
If you were interested in performance (which I would claim you shouldn't be here, or you would just use some prebaked Torch-like library), you could additionally store the linearOffset to avoid half of the compute.

Use a templated Range argument for operator []

mehdi_amini added inline comments.Feb 10 2021, 12:27 PM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

To be clear, by "C++ runtime" do you mean the non-header part of the STL?
Otherwise, without RTTI enabled, there shouldn't be anything left, I think.

387

Ah I see what you mean now, thanks.

It isn't clear to me that the +1 is enough here if the outermost stride isn't 1. I'm not sure the behavior of delinearize should even be specified for an offset that is impossible to get from linearize...
But I guess I could replace the +1 with + strides[rank-1]?
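
For illustration (with numbers made up for this example): take a 2x3 memref viewed with strides {6, 2}, i.e. a last-dimension stride of 2 rather than 1. Element (0, 2) sits at physical offset 0*6 + 2*2 = 4, and the next logical element is (1, 0) at offset 6. Adding 1 gives 5, which delinearizes against {6, 2} to (0, 2) with a leftover of 1, so it lands on no valid element; adding strides[rank-1] = 2 gives 6, which does delinearize to (1, 0). With padding between rows (say strides {8, 2}) even the latter breaks, which is what the sizes-basis formulation further down resolves.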

mehdi_amini retitled this revision from Add C++ helpers to manager Unranked Memeref in native code to Add C++ helpers to manage Unranked Memref in native code.Feb 10 2021, 12:32 PM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
372

This could be: linearize(indices, strides) ?

387

You're right, sorry, I took the wrong basis.

Here is the thinking, dropping the word "offset" as it is overloaded by now.

Indexing in the basis of the memref "sizes" gives you the virtual index; you can go from linear to multi-D as:
delinearize(sizes, linearize(indices, sizes) + 1).

Indexing in the basis of the memref "strides" gives you the physical displacement, which skips holes; you can go from linear to multi-D as:
linearize(delinearize(sizes, linearize(indices, sizes) + 1), *strides*).

So, assuming you keep a list of indices, to get the physical displacement of your next element compared to the base, you can do:
linearize(delinearize(sizes, linearize(indices, sizes) + 1), *strides*)

If you prefer instead to keep the 1-D physical displacement, you can get the displacement of the next entry with:
linearize(delinearize(sizes, linearize(delinearize(*strides*, displacement), sizes) + 1), *strides*)

Does this make sense?
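
A minimal, self-contained sketch of this scheme, assuming the row-major suffix products of the sizes as the basis of the "virtual" index space; the helper names mirror the linearize/delinearize declarations quoted above, but they are reimplemented here over std::vector purely for illustration (this is not the patch's code):

  #include <cstdint>
  #include <vector>

  // basis[i] is the number of linear steps that one unit of dimension i is
  // worth (e.g. the suffix products of the sizes, or the memref strides).
  int64_t linearize(const std::vector<int64_t> &indices,
                    const std::vector<int64_t> &basis) {
    int64_t linear = 0;
    for (size_t i = 0; i < indices.size(); ++i)
      linear += indices[i] * basis[i];
    return linear;
  }

  std::vector<int64_t> delinearize(const std::vector<int64_t> &basis,
                                   int64_t linearIndex) {
    std::vector<int64_t> indices(basis.size());
    for (size_t i = 0; i < basis.size(); ++i) {
      indices[i] = linearIndex / basis[i];
      linearIndex %= basis[i];
    }
    return indices;
  }

  // Row-major suffix products of the sizes: the basis of the "virtual" index
  // space mentioned above.
  std::vector<int64_t> suffixProducts(const std::vector<int64_t> &sizes) {
    std::vector<int64_t> basis(sizes.size(), 1);
    for (int64_t i = static_cast<int64_t>(sizes.size()) - 2; i >= 0; --i)
      basis[i] = basis[i + 1] * sizes[i + 1];
    return basis;
  }

  // Step `indices` to the next element in the sizes basis and return its
  // physical displacement from the base pointer, using the strides to skip
  // any holes in the layout.
  int64_t stepAndGetDisplacement(std::vector<int64_t> &indices,
                                 const std::vector<int64_t> &sizes,
                                 const std::vector<int64_t> &strides) {
    std::vector<int64_t> sizeBasis = suffixProducts(sizes);
    indices = delinearize(sizeBasis, linearize(indices, sizeBasis) + 1);
    return linearize(indices, strides);
  }

For example, with sizes {2, 3} and strides {8, 2}, stepping from indices (0, 2) yields indices (1, 0) and a displacement of 8.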

bondhugula added inline comments.Feb 11 2021, 6:14 PM
mlir/unittests/ExecutionEngine/Invoke.cpp
421

Is this a typo here?

mehdi_amini added inline comments.Feb 11 2021, 6:31 PM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

There is another wrinkle here: this file is built in C++11 mode? Why?
It is a bit confusing right now to have this piece of the project compiled differently, and in a way that it can't interact with any other data structure from ADT/Support; what is the purpose here?
If there is a specific need, I'd like to see it isolated in its own component (separate directory, etc.) rather than mixed with the ExecutionEngine folder, which is part of the regular C++ infra.

bondhugula added inline comments.Feb 13 2021, 3:44 AM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

I think this file is built the same way as the rest of MLIR. The comment there perhaps exists in the interest of avoiding link-time issues with libmlir_runner_utils.so - e.g., to link objects compiled from MLIR outside of the mlir-cpu-runner JIT?

mehdi_amini added inline comments.Feb 13 2021, 9:16 AM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

We have this: mlir/lib/ExecutionEngine/CMakeLists.txt:set_property(TARGET mlir_c_runner_utils PROPERTY CXX_STANDARD 11)

nicolasvasilache requested changes to this revision.Feb 14 2021, 5:50 AM
nicolasvasilache added inline comments.
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34
There is another wrinkle here: this file is built in C++11 mode? Why?
It is a bit confusing right now to have this piece of the project compiled differently, and in a way that it can't interact with any other data structure from ADT/Support; what is the purpose here?
If there is a specific need, I'd like to see it isolated in its own component (separate directory, etc.) rather than mixed with the ExecutionEngine folder, which is part of the regular C++ infra.

Yes, there are specific needs for this file: when it was created, it was used to run ModelBuilder on some ARM micro HW (both arm32 and arm64, I believe). I don't know offhand the exact processor, but we can find out if relevant. Anyway, what's relevant is that the SDK, compiler, etc. for those required what is documented at the top of the file:

  1. use C++11
  2. don't require linking in libstdc++.

If you want to iterate on a better set of requirements and think there is a better place/way to do this, by all means let's start the discussion.
Maybe we should start an integration_test/xxx directory for some e2e tests on restricted embedded HW.

In the meantime, let's not break it.

35

I don't see value in introducing a dependency on LLVM at runtime here.
Things were absolutely fine with std::array.

This revision now requires changes to proceed.Feb 14 2021, 5:50 AM
mehdi_amini added inline comments.Feb 15 2021, 11:51 AM
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

In the meantime, let's not break it.

I don't think it is reasonable to have this kind of requirement here right now.

Wanting to support an embedded environment which does not support the LLVM C++ standard is interesting, but I don't think we have an agreed goal to have this in-tree at the moment. I'm more inclined to align everything right now with what the project supports, rather than having a "project within the project" for out-of-tree reasons.

35

std::array is fine if you have a fixed rank at compile time; it does not scale to the unranked case...
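
For illustration (hypothetical names): in the ranked case the rank is a template parameter, so fixed-size storage works, whereas in the unranked case the rank is only known at runtime and the index storage has to be sized dynamically.

  #include <array>
  #include <cstdint>
  #include <vector>

  // Ranked case: the rank N is a compile-time constant, so std::array works.
  template <int N>
  struct RankedIndicesSketch {
    std::array<int64_t, N> indices;
  };

  // Unranked case: the rank comes from the descriptor at runtime, so the
  // storage must be dynamic (std::vector, llvm::SmallVector, or a manually
  // managed allocation).
  struct UnrankedIndicesSketch {
    explicit UnrankedIndicesSketch(int64_t rank) : indices(rank, 0) {}
    std::vector<int64_t> indices;
  };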

Rebase and fix API changes + typo found by Uday

mehdi_amini marked an inline comment as done.Feb 15 2021, 1:02 PM
bondhugula accepted this revision.Apr 8 2021, 9:27 PM

Looks good. Could you add an additional line to the commit summary for better context? The commit title is a bit cryptic and the connection to runtime utils and runners is missing.

mlir/include/mlir/ExecutionEngine/MemRefUtils.h
144

Nit: * -> .

322–323

Doc comments here please.

mlir/unittests/ExecutionEngine/Invoke.cpp
316–320

Can these five lines be put in a helper registerDialectsAndParseSourceString? You have four repetitions.

I haven't worked around the specific requirement in the CRunnerUtils file, and I suspect it requires some refactoring. But I also don't know the exact use-case and how to test it in order to be able to propose a path forward here. @nicolasvasilache, can you help figure out who's using this and how?

nicolasvasilache requested changes to this revision.Apr 8 2021, 11:36 PM

I haven't worked around the specific requirement in the CRunnerUtils file, and I suspect it requires some refactoring. But I also don't know the exact use-case and how to test it in order to be able to propose a path forward here. @nicolasvasilache, can you help figure out who's using this and how?

I don't either, which is why the best proxy I found is mlir/lib/ExecutionEngine/CMakeLists.txt:set_property(TARGET mlir_c_runner_utils PROPERTY CXX_STANDARD 11).
I don't think there are active users though, so this could be your window of opportunity to move everything into RunnerUtils as I suggested above.

I still question the decision to not use the lowest common denominator when the logic you need is barely more than ptr + offset.
But I also see more value in progress and iteration than in YAGNI discussions: it is perfectly valid to say that interested parties should grow/bring their own tensor library and that this is merely for providing the batteries that MLIR is lacking.
If you want to grow this into something more than simple 2xAAA, an RFC would be useful.

Please address the comments on the indexing logic.

mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
34

As usual with MLIR, everything is on a per-need basis; the need for setting up such goals has not come up before.
This probably deserves an RFC.

In the meantime, to unblock progress, I recommend moving all the code from this file into RunnerUtils.cpp and revisiting later.
CRunnerUtils.* should not stay around if you're breaking the requirements.

35

pointer + offset ?

372

I don't believe this has been addressed?

387

I don't believe this has been addressed?

This revision now requires changes to proceed.Apr 8 2021, 11:36 PM

I haven't worked around the specific requirement in the CRunnerUtils file, and I suspect it requires some refactoring. But I also don't know the exact use-case and how to test it in order to be able to propose a path forward here. @nicolasvasilache, can you help figure out who's using this and how?

I don't either, which is why the best proxy I found is mlir/lib/ExecutionEngine/CMakeLists.txt:set_property(TARGET mlir_c_runner_utils PROPERTY CXX_STANDARD 11).
I don't think there are active users though, so this could be your window of opportunity to move everything into RunnerUtils as I suggested above.

I'm not sure how I could "move" things without knowing who depends on the things I'd move :)

That said my best shot at a plan for a future layering would be:

  • A low-level layer to manipulate memref descriptors, written purely in C. That would make it a C runtime very suitable for embedding in micro environments as well, I think.
  • A higher-level layer written in modern C++ with all the possible niceties one wants. This layer could target the low-level C API to manipulate the descriptors; that said, it is possible that templates + inlining would give better perf for C++ users.

WDYT?
@bondhugula as well?
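
A rough sketch of what that layering could look like (all names below are hypothetical; the low-level part is written as C-compatible C++ so it could also be compiled as plain C):

  #include <cstdint>

  // Low-level layer: a plain-C-compatible descriptor and accessors, suitable
  // for embedded targets without a C++ runtime. (Hypothetical names/layout.)
  extern "C" {
  typedef struct {
    int64_t rank;
    void *rankedDescriptor; // {basePtr, data, offset, sizes[], strides[]}
  } mlir_c_unranked_memref;

  int64_t mlir_c_unranked_memref_rank(const mlir_c_unranked_memref *m) {
    return m->rank;
  }
  } // extern "C"

  // Higher-level layer: modern C++ conveniences built on top of the C layer
  // (or on templates + inlining when performance matters), e.g. a value-type
  // handle with member functions.
  class UnrankedMemRefHandle {
  public:
    explicit UnrankedMemRefHandle(mlir_c_unranked_memref desc) : desc(desc) {}
    int64_t rank() const { return mlir_c_unranked_memref_rank(&desc); }

  private:
    mlir_c_unranked_memref desc;
  };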

That said my best shot at a plan for a future layering would be:

This sounds great to me!

That said my best shot at a plan for a future layering would be:

This sounds great to me too.