This adds an OwningUnrankedMemRef, which wraps an UnrankedMemRefType descriptor. New helper methods to initialize and index the descriptor, as well as an iterator (UnrankedMemrefIterator), are also provided.
Details
- Reviewers: nicolasvasilache, ftynse, bondhugula, aartbik

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 400: Note to self: refactor this to share the implementation with the ranked iterator.
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: CRunner utils is meant to run on HW that does not have a C++ runtime library.
- Line 386: An implementation that is easier to follow and reuses common utils is to just keep the offset, add 1 to it, and delinearize the indices in the stride basis.
- Line 421: That's not OK for the CRunnerUtils.
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: We should probably refactor all this; it seems quite poorly organized to me right now, including mixing the definition of the descriptors with a random set of utilities that look quite "ad-hoc" (printComma(), getTensorFilename(), ...).
- Line 386: I didn't quite follow what you describe here?
- Line 421: I can use a SmallVector :)
Use LLVM SmallVector instead of std::vector and refactor iterator indexing to share the code between Ranked/Unranked iterators
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: I'm very happy if you have the cycles to devote to this and tidy things up. Basically, the original version of CRunnerUtils was running on ARM micro, but it was a half-manual process with kludges to connect to a Makefile-based system that I didn't have access to (i.e. shipping pre-cross-compiled .o files across the wall). It would be great if there was a simple way to test the "works without a C++ runtime" aspect, as this was the original
- Line 386: I meant you could make available and reuse these functions from mlir/include/mlir/Dialect/Vector/VectorUtils.h:
    /// Computes and returns the linearized index of 'offsets' w.r.t. 'basis'.
    int64_t linearize(ArrayRef<int64_t> offsets, ArrayRef<int64_t> basis);
    /// Given the strides together with a linear index in the dimension
    /// space, returns the vector-space offsets in each dimension for a
    /// de-linearized index.
    SmallVector<int64_t, 4> delinearize(ArrayRef<int64_t> strides, int64_t linearIndex);
  Then your function would just be delinearize(strides, linearize(offset, strides) + 1);
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: To be clear, by "C++ runtime" do you mean the non-header part of the STL?
- Line 386: Ah, I see what you mean now, thanks. It isn't clear to me that the +1 is enough here if the outermost stride isn't 1? I'm not sure the behavior of delinearize should even be specified for an offset that is impossible to get from linearize...
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 371: This could be: linearize(indices, strides)?
- Line 386: You're right, sorry, I took the wrong basis. Here is the thinking, dropping the word "offset" as it is overloaded by now. Indexing in the basis of memref "sizes" gives you the virtual index, and you can go from linear to multi-D. Indexing in the basis of memref "strides" gives you the physical displacement, which skips holes; again you can go from linear to multi-D. So, assuming you keep a list of indices, you can get the physical displacement of your next element compared to the base. If you prefer instead to keep the 1-D physical displacement, you can get the displacement of the next entry directly. Does this make sense?
mlir/unittests/ExecutionEngine/Invoke.cpp
- Line 408: Is this a typo here?
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: There is another wrinkle here: this file is built in C++11 mode? Why?
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: I think this file is built the same way as the rest of MLIR. The comment there perhaps exists in the interest of avoiding link-time issues with libmlir_runner_utils.so, for example, to link objects compiled from MLIR outside of the mlir-cpu-runner JIT?
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: We have this in mlir/lib/ExecutionEngine/CMakeLists.txt: set_property(TARGET mlir_c_runner_utils PROPERTY CXX_STANDARD 11)
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34:
  > There is another wrinkle here: this file is built in C++11 mode? Why? It is a bit confusing right now to have this piece of the project compiled differently, and in a way that it can't interact with any other data structure from ADT/Support; what is the purpose here? If there are specific needs, I'd like to see this isolated in its own component (separate directory, etc.) rather than mixed with the ExecutionEngine folder, which is part of the regular C++ infra.

  Yes, there are specific needs for this file: when it was created it was used to run ModelBuilder on some ARM micro HW (both arm32 and arm64, I believe). I don't know offhand the exact processor, but we can find out if relevant. Anyway, what's relevant is that the SDK, compiler, etc. for those required what is documented at the top of the file.

  If you want to iterate on a better set of requirements and think there is a better place/way to do this, by all means let's start the discussion. In the meantime, let's not break it.
- Line 35: I don't see value in introducing a dependency on LLVM at runtime here.
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: I don't think it is reasonable to have this kind of requirement here right now. Wanting to support an embedded environment which does not support the LLVM C++ standard is interesting, but I don't think we have an agreed goal to have this in-tree at the moment. I'm more inclined to align everything right now with what the project supports, rather than having a "project within the project" for out-of-tree reasons.
- Line 35: std::array is fine if you have a fixed rank at compile time; it does not scale with unranked...
Looks good. Could you add an additional line to the commit summary for better context? The commit title is a bit cryptic and the connection to runtime utils and runners is missing.
mlir/include/mlir/ExecutionEngine/MemRefUtils.h
- Line 144: Nit: * -> .
- Lines 322–323: Doc comments here, please.

mlir/unittests/ExecutionEngine/Invoke.cpp
- Lines 303–307: Can these five lines be put in a helper registerDialectsAndParseSourceString? You have four repetitions.
I haven't worked around the specific requirement in the CRunnerUtils file, and I suspect it requires some refactoring. But I also don't know the exact use case and how to test it, in order to be able to propose a path forward here. @nicolasvasilache, can you help figure out who's using this and how?
I don't either, which is why the best proxy I found is mlir/lib/ExecutionEngine/CMakeLists.txt: set_property(TARGET mlir_c_runner_utils PROPERTY CXX_STANDARD 11).
I don't think there are active users though so this could be your window of opportunity to move everything into RunnerUtils as I suggested above.
I still question the decision not to use the lowest common denominator when the logic you need is barely more than ptr + offset.
But I also see more value in progress and iteration than in YAGNI discussions: it is perfectly valid to say that interested parties should grow/bring their own tensor library and that this is merely about providing the batteries that MLIR is lacking.
If you want to grow this into something more than simple 2xAAA, an RFC would be useful.
Please address the comments on the indexing logic.
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
- Line 34: As usual with MLIR, everything is on a per-need basis; the need for setting up such goals has not come up before. In the meantime, to unblock progress, I recommend moving all the code from this file into RunnerUtils.cpp and revisiting later.
- Line 35: pointer + offset?
- Line 371: I don't believe this has been addressed?
- Line 386: I don't believe this has been addressed?
I'm not sure how I could "move" things without knowing who depends on the things I'd move :)
That said my best shot at a plan for a future layering would be:
- A low-level layer to manipulate memref descriptors, written purely in C. That would make it a C runtime very suitable for embedding in micro environments as well, I think.
- A higher-level layer written in modern C++ with all the possible niceties one wants. This layer could target the low-level C API to manipulate the descriptors; that said, it is possible that template+inline would give better performance for C++ users.
WDYT?
@bondhugula as well?
> That said my best shot at a plan for a future layering would be:

This sounds great to me!
> CRunner utils is meant to run on HW that does not have C++ runtime library.
> In particular, uses of vector are prohibited and array is used instead.

Does this file fit the bill?