implements simple canonicalization which eliminates memref.dim calls if possible
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/GPU/IR/GPUOps.cpp | ||
---|---|---|
1 | FYI: this is not the full header, we also include the filename and a brief description in the top line. | |
17 | Please do not commit commented-out code. | |
20–26 | I'm pretty sure most of these are unnecessary. Please do not include files that are not used, this may increase compilation time significantly. (Also, just put this code in GPUDialect.cpp as suggested above). | |
34 | Nit: MLIR uses /// to document top-level entities such as classes and functions. | |
42 | Nit: use named accessors instead of getOperand(1) with magic numbers, index() here. | |
52 | Nit: please expand auto here. We use auto if the type is obvious from local context, e.g., there's a cast on the RHS, or annoyingly long to spell, e.g., iterators. | |
mlir/test/Dialect/GPU/canonicalize.mlir | ||
25 | Please add a newline |
mlir/lib/Dialect/GPU/IR/GPUOps.cpp | ||
---|---|---|
50 | You can use getDefiningOp<AllocOp>() which will do dyn_cast for you |
I saw in other dialects like memref that all foldings/canonicalization are placed into *Ops.cpp, while *Dialect contains basically only initialize method. I thought it's reasonable since it rains cpp files with td files. What do you think on this?
We are rather inconsistent with naming, unfortunately. There are dialects that have NameOps.cpp, NameDialect.cpp or just Name.cpp that contain op implementations. Sometimes these files are also under the IR directory. Cleanups are always welcome as separate patches annotated as non-functional changes (NFC). For functional changes, we tend to prefer the minimally intrusive version. Since op implementations are already in GPUDialect.cpp, let's keep them there in this patch.
@ftynse, @Hardcode84, thanks for your suggestions. They are all addressed. Please proceed with the review
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp | ||
---|---|---|
1116 ↗ | (On Diff #369441) | Should this code check whether the size at index is actually dynamic? This case is hard to trigger, as dim applied to a constant memref dimension is canonicalized by patterns for dim but still. |
[mlir][gpu] folds memref.dim of gpu.alloc
implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size
Thanks, @herhut for you suggestion. I added this check, hoverer appropriate test for this feature didn't come up to my mind. Do you have any ideas?
To test this, we would need to apply only the canonicalizations from the gpu dialect. The canonicalize pass does not support this, though. One could add support for limiting canonicalization to a specific dialect for testing purposes.
@ftynse WDYT?
I any case, I am happy with this to land as is.
[mlir][gpu] folds memref.dim of gpu.alloc
implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size
[mlir][gpu] folds memref.dim of gpu.alloc
implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size
In the general scheme of things, I don't think even limiting to one dialect is sufficient, we may want to select specific patterns or, at least, a set of operations for which the patterns apply. There were also concerns about always having all patterns available in the pass that makes it expensive to run... Sounds like we should think about making the canonicalizer more flexible, but this is out of scope for this patch.
@geexie let me know if you need help landing this patch.
The canonicalizer supports filtering patterns by label.
https://mlir.llvm.org/docs/PatternRewriter/#pattern-filtering-1
https://mlir.llvm.org/docs/PatternRewriter/#debug-names-and-labels
@geexie let me know if you need help landing this patch.
To test this, we would need to apply only the canonicalizations from the gpu dialect. The canonicalize pass does not support this, though. One could add support for limiting canonicalization to a specific dialect for testing purposes.
@herhut, that is a good idea. I might try implementing this.
In the general scheme of things, I don't think even limiting to one dialect is sufficient, we may want to select specific patterns or, at least, a set of operations for which the patterns apply. There were also concerns about always having all patterns available in the pass that makes it expensive to run... Sounds like we should think about making the canonicalizer more flexible, but this is out of scope for this patch.
@ftynse, I kind of understand that you mean. Do we need such functionality for no-testing use cases? do we have such level of customization in any of other passes?
The canonicalizer supports filtering patterns by label.
@rriddle, are those filters available in command line? We are kind of looking for a command
// RUN: mlir-opt %s -canonicalize,pattern=gpu --split-input-file -allow-unregistered-dialect | FileCheck %s
or something in this direction
@ftynse, I kind of understand that you mean. Do we need such functionality for no-testing use cases? do we have such level of customization in any of other passes?
I don't think we have it elsewhere. The non-test use case for this is running the canonicalizer on IR where you know only a handful of dialects are used whereas much more are loaded in the context, and we therefore waste cycles on collecting and checking the patterns that we know cannot match. I don't have first-hand experience with this case so cannot tell how bad it actually is performance-wise.
Yes, they are pass options on the canonicalizer (and any pattern pass that also hooks into those options).
https://github.com/llvm/llvm-project/blob/main/mlir/test/Transforms/test-canonicalize-filter.mlir
// RUN: mlir-opt %s -canonicalize,pattern=gpu --split-input-file -allow-unregistered-dialect | FileCheck %sor something in this direction
@rriddle, thanks for a link. I didn’t know about this possibly and it looks the functionality I need. I’d try to write a test in similar way
To test this, we would need to apply only the canonicalizations from the gpu dialect.
Can you clarify what is "this" and why it requires to only apply a subset of the canonicalization patterns? This looks unusual to me.
@mehdi_amini, I added a check which tests if memrefType has a dynamic dimension in my pass since static dimensions got folded during memref::DimOp canonicalization. I do canonicalization for gpu::Alloc. With parametrized tests I can write a test which checks that memref.dim from constant dimension is not hit by -canonicalize pass for gpu dialect for which my pass is implemented.
FYI: this is not the full header, we also include the filename and a brief description in the top line.