This is an archive of the discontinued LLVM Phabricator instance.

folds memref.dim(gpu.alloc(%size), %idx) -> %size
ClosedPublic

Authored by geexie on Aug 29 2021, 9:15 AM.

Details

Summary

Implements a simple canonicalization that eliminates memref.dim calls where possible.
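As a sketch, the rewrite turns IR like the following into a direct use of the allocation size (function and value names are illustrative, and the syntax follows the 2021-era dialect spellings):

```mlir
// Before: memref.dim queries the dynamic dimension of a gpu.alloc result.
func @fold_dim(%size : index) -> index {
  %mem = gpu.alloc(%size) : memref<?xf32>
  %c0 = constant 0 : index
  %d = memref.dim %mem, %c0 : memref<?xf32>
  return %d : index
}

// After canonicalization: the dim is folded to the allocation size operand.
func @fold_dim(%size : index) -> index {
  %mem = gpu.alloc(%size) : memref<?xf32>
  return %size : index
}
```

If the allocation then becomes dead, other patterns can remove it.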

Diff Detail

Event Timeline

geexie created this revision.Aug 29 2021, 9:15 AM
geexie requested review of this revision.Aug 29 2021, 9:15 AM

I think it can be added into GPUDialect.cpp instead of creating a new file.

ftynse requested changes to this revision.Aug 30 2021, 3:15 AM
ftynse added a subscriber: ftynse.
ftynse added inline comments.
mlir/lib/Dialect/GPU/IR/GPUOps.cpp
1 ↗(On Diff #369325)

FYI: this is not the full header, we also include the filename and a brief description in the top line.

17 ↗(On Diff #369325)

Please do not commit commented-out code.

20–26 ↗(On Diff #369325)

I'm pretty sure most of these are unnecessary. Please do not include files that are not used, this may increase compilation time significantly.

(Also, just put this code in GPUDialect.cpp as suggested above).

34 ↗(On Diff #369325)

Nit: MLIR uses /// to document top-level entities such as classes and functions.

42 ↗(On Diff #369325)

Nit: use named accessors instead of getOperand(1) with magic numbers; index() here.

52 ↗(On Diff #369325)

Nit: please expand auto here. We use auto if the type is obvious from local context, e.g., there's a cast on the RHS, or annoyingly long to spell, e.g., iterators.

mlir/test/Dialect/GPU/canonicalize.mlir
25

Please add a trailing newline.

This revision now requires changes to proceed.Aug 30 2021, 3:15 AM
Hardcode84 added inline comments.Aug 30 2021, 3:27 AM
mlir/lib/Dialect/GPU/IR/GPUOps.cpp
50 ↗(On Diff #369325)

You can use getDefiningOp<AllocOp>(), which will do the dyn_cast for you.

I think, it can be added into GPUDialect.cpp instead of creating new file

I saw in other dialects like memref that all foldings/canonicalizations are placed into *Ops.cpp, while *Dialect.cpp basically contains only the initialize method. I thought it was reasonable since it pairs the cpp files with the td files. What do you think about this?

We are rather inconsistent with naming, unfortunately. There are dialects that have NameOps.cpp, NameDialect.cpp or just Name.cpp that contain op implementations. Sometimes these files are also under the IR directory. Cleanups are always welcome as separate patches annotated as non-functional changes (NFC). For functional changes, we tend to prefer the minimally intrusive version. Since op implementations are already in GPUDialect.cpp, let's keep them there in this patch.

Got your point.

geexie updated this revision to Diff 369441.Aug 30 2021, 7:29 AM

implements canonicalization which folds memref.dim(gpu.alloca(%size), %idx) -> %size

geexie marked an inline comment as done.Aug 30 2021, 7:32 AM

@ftynse, @Hardcode84, thanks for your suggestions. They are all addressed. Please proceed with the review

ftynse accepted this revision.Aug 30 2021, 7:32 AM

Thanks!

This revision is now accepted and ready to land.Aug 30 2021, 7:32 AM
herhut added inline comments.Aug 30 2021, 7:57 AM
mlir/lib/Dialect/GPU/IR/GPUDialect.cpp
1116

Should this code check whether the size at index is actually dynamic?

This case is hard to trigger, as dim applied to a constant memref dimension is canonicalized away by the patterns for dim, but still.
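For illustration, the static case folds earlier through memref.dim's own patterns, so the added check mostly guards correctness (a sketch in 2021-era syntax; value names are illustrative):

```mlir
// dim of the static dimension 0 is folded to a constant by
// memref.dim's own canonicalization, before the gpu.alloc
// pattern would ever see it.
%mem = gpu.alloc(%size) : memref<4x?xf32>
%c0 = constant 0 : index
%d0 = memref.dim %mem, %c0 : memref<4x?xf32>  // becomes a constant 4
```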

nit: please fix alloca->alloc in description and comments

geexie updated this revision to Diff 369452.Aug 30 2021, 8:42 AM

[mlir][gpu] folds memref.dim of gpu.alloc

implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size

nit: please fix alloca->alloc in description and comments

Done, thanks!

geexie marked an inline comment as done.Aug 30 2021, 8:45 AM

Thanks, @herhut, for your suggestion. I added this check; however, an appropriate test for this feature didn't come to mind. Do you have any ideas?

herhut accepted this revision.Aug 30 2021, 9:17 AM

To test this, we would need to apply only the canonicalizations from the gpu dialect. The canonicalize pass does not support this, though. One could add support for limiting canonicalization to a specific dialect for testing purposes.

@ftynse WDYT?

In any case, I am happy for this to land as is.

geexie updated this revision to Diff 369472.Aug 30 2021, 9:53 AM

[mlir][gpu] folds memref.dim of gpu.alloc

implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size

geexie updated this revision to Diff 369474.Aug 30 2021, 9:57 AM

[mlir][gpu] folds memref.dim of gpu.alloc

implements canonicalization which folds memref.dim(gpu.alloc(%size), %idx) -> %size

nit: please fix alloca->alloc in description and comments

Finally done!

geexie retitled this revision from folds memref.dim(gpu.alloca(%size), %idx) -> %size to folds memref.dim(gpu.alloc(%size), %idx) -> %size.Aug 30 2021, 10:28 AM

In the general scheme of things, I don't think even limiting to one dialect is sufficient; we may want to select specific patterns or, at least, a set of operations to which the patterns apply. There were also concerns about always having all patterns available in the pass, which makes it expensive to run... Sounds like we should think about making the canonicalizer more flexible, but this is out of scope for this patch.

@geexie let me know if you need help landing this patch.

rriddle added a comment.EditedAug 31 2021, 1:54 AM

The canonicalizer supports filtering patterns by label.

https://mlir.llvm.org/docs/PatternRewriter/#pattern-filtering-1
https://mlir.llvm.org/docs/PatternRewriter/#debug-names-and-labels

This revision was automatically updated to reflect the committed changes.

Thanks @ftynse! Just landed.

@herhut, that is a good idea. I might try implementing this.

@ftynse, I think I understand what you mean. Do we need such functionality for non-testing use cases? Do we have this level of customization in any other passes?

@rriddle, are those filters available on the command line? We are looking for a command like

// RUN: mlir-opt %s -canonicalize,pattern=gpu --split-input-file -allow-unregistered-dialect | FileCheck %s

or something in this direction

I don't think we have it elsewhere. The non-test use case for this is running the canonicalizer on IR where you know only a handful of dialects are used while many more are loaded in the context, and we therefore waste cycles on collecting and checking patterns that we know cannot match. I don't have first-hand experience with this case, so I cannot tell how bad it actually is performance-wise.

Yes, they are pass options on the canonicalizer (and any pattern pass that also hooks into those options).

https://github.com/llvm/llvm-project/blob/main/mlir/test/Transforms/test-canonicalize-filter.mlir
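Given the filtering options linked above, a RUN line restricted to this fold could look roughly like the following (a sketch; the pattern label SimplifyDimOfAllocOp is hypothetical and would need to match the pattern's actual debug name):

```mlir
// RUN: mlir-opt %s --canonicalize="enable-patterns=SimplifyDimOfAllocOp" \
// RUN:   --split-input-file --allow-unregistered-dialect | FileCheck %s
```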

@rriddle, thanks for the link. I didn't know about this possibility, and it looks like the functionality I need. I'll try to write a test in a similar way.

To test this, we would need to apply only the canonicalizations from the gpu dialect.

Can you clarify what is "this" and why it requires to only apply a subset of the canonicalization patterns? This looks unusual to me.

@mehdi_amini, I added a check which tests whether the memrefType has a dynamic dimension at the given index, since static dimensions get folded during memref::DimOp canonicalization. My canonicalization is for gpu::Alloc. With pattern-filtered tests I could write a test which checks that memref.dim of a constant dimension is not hit by the -canonicalize pass when only the gpu pattern implemented here is enabled.