Depends On D110680
Details
- Reviewers: None

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
This cost model component is pretty interesting! It looks like something that really can be a generic pass that would drive a traversal and query opinterfaces?
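For illustration only: a minimal sketch of what a generic traversal driven by a cost op interface could look like. `CostModelOpInterface` and `getEstimatedCost` are hypothetical names used for the sketch, not an existing MLIR interface.

```cpp
#include "mlir/IR/Operation.h"

// Sketch: accumulate per-op cost estimates by walking the region and querying
// a hypothetical CostModelOpInterface, instead of hard-coding estimateCostXXX
// helpers for a fixed set of op kinds.
int64_t estimateCost(mlir::Operation *root) {
  int64_t totalCost = 0;
  root->walk([&](mlir::Operation *op) {
    if (auto costOp = llvm::dyn_cast<CostModelOpInterface>(op))
      totalCost += costOp.getEstimatedCost();  // hypothetical interface method
    else
      totalCost += 1;  // unknown ops fall back to a unit cost
  });
  return totalCost;
}
```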
mlir/lib/Dialect/Async/Transforms/CostModel.cpp

| Line | Comment |
|---|---|
| 2 | (license header) |
| 90–92 | |
| 96 | Seems like all these kinds of "estimateCostXXX" functions should be behind an op interface instead, shouldn't they? |

mlir/lib/Dialect/Async/Transforms/CostModel.h

| Line | Comment |
|---|---|
| 2 | You're missing the license header, and the guard does not match the usual format, I believe. |
| 7 | Please avoid a using-namespace directive in a header. |
| 15 | Can you make it a class and expand the doc? Edit: reading the implementation, I get a better feel for what is going on, but the comment stands :) |
This seems important and cross-cutting enough to warrant an RFC on Discourse; I am surprised @mehdi_amini has not asked for this already :)
Putting a blocker until a discussion has happened.
mlir/lib/Dialect/Async/Transforms/CostModel.cpp

| Line | Comment |
|---|---|
| 96 | Yes, the code is written with that potential development in mind, but it is not clear that something that intrusive is necessary yet. The concept needs to be validated first. The biggest question is whether the estimation precision of what we have now is sufficient. If it turns out to be insufficient, we may need a better analysis. For instance, it's conceivable that the whole approach would need to be scrapped: it might be necessary to do the memory component of the analysis earlier and/or incrementally, before tiling and fusion have been lowered to ops that make it difficult to guess cache performance. It might also be beneficial to build a first-class cost model that could help drive optimizations in other parts of the pipeline, most importantly codegen. Furthermore, I suspect that once we try to use op interfaces, the cost model needs to be reimagined as that first-class entity, which might mean designing it in a way that is useful for other architectures, e.g. GPUs. That in itself will take a quarter. So my preference is to keep this small and local to the AsyncParallelFor pass. |

mlir/lib/Dialect/Async/Transforms/CostModel.h

| Line | Comment |
|---|---|
| 2 | It seems to be matching the PassDetail.h from the same directory, no? |
It would be great for the cost model to be part of core MLIR, defined with an op interface, but making it usable for all potential targets will require a lot of discussion (e.g. how can a cost model capture GPU memory coalescing? Should that be a concern of the cost model at all, or should it just track compute?). Let's start an RFC to figure out what it should look like. In the meantime we can hide the current crude implementation behind the AsyncBlockSizeComputationFunction API and keep the implementation inside the TensorFlow compiler. We can tune it for the compilation pipeline we have right now without needing to worry about the general case.
mlir/include/mlir/Dialect/Async/Passes.h

| Line | Comment |
|---|---|
| 25 | How about giving more control to the caller (similar to the TileSizeComputationFunction in linalg)? Something like this: `struct AsyncBlockSize { Value targetBlockSize; /* must be a Value of index type */ /* ... some other values? */ };` `using AsyncBlockSizeComputationFunction = std::function<AsyncBlockSize(OpBuilder &, scf::ParallelOp parallelLoop)>;` `createAsyncParallelForPass(bool asyncDispatch, int32_t numWorkerThreads, AsyncBlockSizeComputationFunction cost);` A constant int32_t targetBlockSize would then be just a trivial cost-compute function. The Async dialect can also provide a default cost function based on op properties. |

mlir/include/mlir/Dialect/Async/Passes.td

| Line | Comment |
|---|---|
| 34 | and this will become a property of the caller pipeline |
How about giving more control to the caller (similar to the TileSizeComputationFunction in linalg)?
Something like this:
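```cpp
struct AsyncBlockSize {
  Value targetBlockSize;  // must be a Value of index type
  ...                     // some other values?
};

using AsyncBlockSizeComputationFunction =
    std::function<AsyncBlockSize(OpBuilder &, scf::ParallelOp parallelLoop)>;

createAsyncParallelForPass(bool asyncDispatch, int32_t numWorkerThreads,
                           AsyncBlockSizeComputationFunction cost);
```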
A constant int32_t targetBlockSize would then be just a trivial cost-compute function. The Async dialect can also provide a default cost function based on op properties.
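For illustration only, and assuming the proposed AsyncBlockSize / AsyncBlockSizeComputationFunction signatures above were adopted (they are a proposal in this thread, not an existing upstream API), a trivial constant-block-size callback could be a small lambda that materializes the constant with the builder; spellings such as arith::ConstantIndexOp follow current upstream MLIR:

```cpp
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/Builders.h"

// Trivial cost callback: always request a fixed block size of 512 iterations,
// ignoring the properties of the parallel loop body.
AsyncBlockSizeComputationFunction constantBlockSize =
    [](mlir::OpBuilder &b, mlir::scf::ParallelOp op) {
      AsyncBlockSize result;
      result.targetBlockSize =
          b.create<mlir::arith::ConstantIndexOp>(op.getLoc(), /*value=*/512);
      return result;
    };

// The caller pipeline would then pass the callback when constructing the pass:
//   pm.addPass(createAsyncParallelForPass(/*asyncDispatch=*/true,
//                                         /*numWorkerThreads=*/8,
//                                         constantBlockSize));
```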