This is an archive of the discontinued LLVM Phabricator instance.

[mlir][TilingInterface] Add callback to yield a produced value.
AbandonedPublic

Authored by mravishankar on Sep 20 2022, 1:25 PM.

Details

Summary

To support cases where a fused producer value is computed within the loop
and yielded, with the yield value being used as the replacement of the fused
op; add callback that provides the caller control towards this.

Diff Detail

Event Timeline

mravishankar created this revision.Sep 20 2022, 1:25 PM
mravishankar requested review of this revision.Sep 20 2022, 1:25 PM

Set callback default to return false.

hanchung added inline comments.Nov 22 2022, 6:09 PM
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
674–684

Is it possible that it failed? If so, should we signal something? IF not, can we add an assertion and we won't need one level of nesting.

mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
272–274

should we add memref::DimOp to the list?

chelini added inline comments.Nov 23 2022, 1:47 AM
mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
93

Why use here? SliceOp is an Operation*.

mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
208
mravishankar marked an inline comment as done.

Rebase and address comments.

mravishankar marked 2 inline comments as done.

Addressing minor comment that was missed.

mlir/include/mlir/Dialect/SCF/Transforms/TileUsingInterface.h
93

Sorry, that was confusing. Fixed.

mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
674–684

Dropped the FailureOr from the called method. It always succeeds.

mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
272–274

tile and fuse doesnt work with memrefs and also with memref there is no replacement to do.

hanchung added inline comments.Nov 28 2022, 11:30 AM
mlir/lib/Dialect/SCF/Transforms/TileUsingInterface.cpp
674–684

do we need to capture the result of yieldTiledValues? It's not used, and I assume that it's only for generating IRs.

hanchung accepted this revision.Nov 30 2022, 1:19 PM
hanchung added inline comments.
mlir/test/lib/Interfaces/TilingInterface/TestTilingInterface.cpp
36

should be consistent with other uses (see below)

This revision is now accepted and ready to land.Nov 30 2022, 1:19 PM
nicolasvasilache requested changes to this revision.Nov 30 2022, 2:07 PM

This feels extremely fishy, I do not understand the use case or the value but see significant complexity increase.

Putting a blocker until I can make sense of this.

This revision now requires changes to proceed.Nov 30 2022, 2:07 PM

This feels extremely fishy, I do not understand the use case or the value but see significant complexity increase.

Putting a blocker until I can make sense of this.

I dont follow what is fishy here. This is the culmination of months long effort to fix this https://github.com/llvm/llvm-project/issues/57205 (which has been blocking things downstream for as long). The changes here are the same as what is done in lines 394 - 407 that returns the replacements for the tiled consumer. This is now returning the replacements for a fused producer when that the caller determines is worth/valid to return. It is too much state to track if this value is returnable or not from the fused code. So this is left as a caller flag, so its a complete opt-in, and controlled by the caller. I dont see the complexity increase. Happy to discuss offline. This needs to be solved, and it was determined that the previous tile+fuse method was not structured enough to fix the original issue, so it was redone to make this issue solvable...

nicolasvasilache resigned from this revision.Dec 1 2022, 2:29 AM

This feels extremely fishy, I do not understand the use case or the value but see significant complexity increase.

Putting a blocker until I can make sense of this.

I dont follow what is fishy here. This is the culmination of months long effort to fix this https://github.com/llvm/llvm-project/issues/57205 (which has been blocking things downstream for as long). The changes here are the same as what is done in lines 394 - 407 that returns the replacements for the tiled consumer. This is now returning the replacements for a fused producer when that the caller determines is worth/valid to return. It is too much state to track if this value is returnable or not from the fused code. So this is left as a caller flag, so its a complete opt-in, and controlled by the caller. I dont see the complexity increase. Happy to discuss offline. This needs to be solved, and it was determined that the previous tile+fuse method was not structured enough to fix the original issue, so it was redone to make this issue solvable...

I'll paste my comment from https://reviews.llvm.org/D138882 here.

Injecting lambda form above allows inversion of control that is convenient very short term and has almost always proven to very quickly turn to technical debt.

Given recent offline discussions I had and other parts of the codebase I have seen, I am going to more seriously push back against this anti-pattern globally.
Injecting C++ callback control from above is a sign of missing abstractions and should almost always be disallowed.
The alternative is often to refactor multiple times until the right abstractions emerge.

In other words, the transformations we add must be functional-style, statically return the pieces of IR (existing created or updated) that make sense for that transform.
This is not something customizable, if you need more information then statically return more information: no backchannels through lambdas.

If you need different behaviors, instead of injecting a dynamic mechanism through a lambda, write another transformation that takes more/different inputs and return more/different outputs.
Refactor the reusable utility helpers to avoid copy-pasta.

These transformations can then be plugged into patterns and transform dialect ops who can be responsible for the switch between different static behaviors.

Hard stop on more lambdas from above, I am happy to iterate with you on the IREE side to a good solution.

As a side note, softmax and other complex things fuse properly into scf.foreach_thread thanks to a redesigned iterative algorithm and better sets of input / outputs.
Not sure if relevant to your specific case but my gut feeling is that this is not a case that qualifies for a pass.

This revision is now accepted and ready to land.Dec 1 2022, 2:29 AM
nicolasvasilache requested changes to this revision.Dec 1 2022, 2:30 AM

Sorry, didn't mean to resign.

This revision now requires changes to proceed.Dec 1 2022, 2:30 AM

This feels extremely fishy, I do not understand the use case or the value but see significant complexity increase.

Putting a blocker until I can make sense of this.

I dont follow what is fishy here. This is the culmination of months long effort to fix this https://github.com/llvm/llvm-project/issues/57205 (which has been blocking things downstream for as long). The changes here are the same as what is done in lines 394 - 407 that returns the replacements for the tiled consumer. This is now returning the replacements for a fused producer when that the caller determines is worth/valid to return. It is too much state to track if this value is returnable or not from the fused code. So this is left as a caller flag, so its a complete opt-in, and controlled by the caller. I dont see the complexity increase. Happy to discuss offline. This needs to be solved, and it was determined that the previous tile+fuse method was not structured enough to fix the original issue, so it was redone to make this issue solvable...

I'll paste my comment from https://reviews.llvm.org/D138882 here.

Injecting lambda form above allows inversion of control that is convenient very short term and has almost always proven to very quickly turn to technical debt.

Given recent offline discussions I had and other parts of the codebase I have seen, I am going to more seriously push back against this anti-pattern globally.
Injecting C++ callback control from above is a sign of missing abstractions and should almost always be disallowed.
The alternative is often to refactor multiple times until the right abstractions emerge.

In other words, the transformations we add must be functional-style, statically return the pieces of IR (existing created or updated) that make sense for that transform.
This is not something customizable, if you need more information then statically return more information: no backchannels through lambdas.

If you need different behaviors, instead of injecting a dynamic mechanism through a lambda, write another transformation that takes more/different inputs and return more/different outputs.
Refactor the reusable utility helpers to avoid copy-pasta.

These transformations can then be plugged into patterns and transform dialect ops who can be responsible for the switch between different static behaviors.

Hard stop on more lambdas from above, I am happy to iterate with you on the IREE side to a good solution.

If I have to internalize that comment, and the places you have blocked this, all the places you mentioned are worklist based algorithms (either using pattern rewrite based fixed point solutions, or the worklist algorithm used here). In both the cases, the callbacks allow the way the worklist is built. Callbacks allow control externally without leaking too much of the implementation details. I think that is useful if done in specific/controlled ways. I am not a fan of hard red-lines. In this particular case, I can rewrite things to not use callbacks but it will have to leak too much of the implementation details to allow callers to put things together again. Thats the tradeoff IMO

As a side note, softmax and other complex things fuse properly into scf.foreach_thread thanks to a redesigned iterative algorithm and better sets of input / outputs.

Since you have put a hard stop to this, could point me to where this is. Since there is unilateral decision made to disallow callbacks, Ill try to conform to this. Also btw, not related to softmax (that already works), and not related to scf.foreach_thread cause it is not related to distribution.

Not sure if relevant to your specific case but my gut feeling is that this is not a case that qualifies for a pass.

This feels extremely fishy, I do not understand the use case or the value but see significant complexity increase.

Putting a blocker until I can make sense of this.

I dont follow what is fishy here. This is the culmination of months long effort to fix this https://github.com/llvm/llvm-project/issues/57205 (which has been blocking things downstream for as long). The changes here are the same as what is done in lines 394 - 407 that returns the replacements for the tiled consumer. This is now returning the replacements for a fused producer when that the caller determines is worth/valid to return. It is too much state to track if this value is returnable or not from the fused code. So this is left as a caller flag, so its a complete opt-in, and controlled by the caller. I dont see the complexity increase. Happy to discuss offline. This needs to be solved, and it was determined that the previous tile+fuse method was not structured enough to fix the original issue, so it was redone to make this issue solvable...

I'll paste my comment from https://reviews.llvm.org/D138882 here.

Injecting lambda form above allows inversion of control that is convenient very short term and has almost always proven to very quickly turn to technical debt.

Given recent offline discussions I had and other parts of the codebase I have seen, I am going to more seriously push back against this anti-pattern globally.
Injecting C++ callback control from above is a sign of missing abstractions and should almost always be disallowed.
The alternative is often to refactor multiple times until the right abstractions emerge.

In other words, the transformations we add must be functional-style, statically return the pieces of IR (existing created or updated) that make sense for that transform.
This is not something customizable, if you need more information then statically return more information: no backchannels through lambdas.

If you need different behaviors, instead of injecting a dynamic mechanism through a lambda, write another transformation that takes more/different inputs and return more/different outputs.
Refactor the reusable utility helpers to avoid copy-pasta.

These transformations can then be plugged into patterns and transform dialect ops who can be responsible for the switch between different static behaviors.

Hard stop on more lambdas from above, I am happy to iterate with you on the IREE side to a good solution.

If I have to internalize that comment, and the places you have blocked this, all the places you mentioned are worklist based algorithms (either using pattern rewrite based fixed point solutions, or the worklist algorithm used here). In both the cases, the callbacks allow the way the worklist is built. Callbacks allow control externally without leaking too much of the implementation details. I think that is useful if done in specific/controlled ways. I am not a fan of hard red-lines. In this particular case, I can rewrite things to not use callbacks but it will have to leak too much of the implementation details to allow callers to put things together again. Thats the tradeoff IMO

As a side note, softmax and other complex things fuse properly into scf.foreach_thread thanks to a redesigned iterative algorithm and better sets of input / outputs.

Since you have put a hard stop to this, could point me to where this is. Since there is unilateral decision made to disallow callbacks, Ill try to conform to this. Also btw, not related to softmax (that already works), and not related to scf.foreach_thread cause it is not related to distribution.

Not sure if relevant to your specific case but my gut feeling is that this is not a case that qualifies for a pass.

I apologize for the language, I realize upon rereading that it was too harsh and didn't mean it this way, I could have posted in less of a hurry and worded things better.
As we discussed offline, I would love to be able to come up with a characterization where we are very careful with C++ callbacks and avoid things that amount to inversion of control and breaking the functional contract that has worked very well for the implementation of most transformations logic.
I do think this example can be written in a way that avoids this injection and I am happy to help iterate until we get there (it may be a long process and I am sorry for the drag this has been causing for a too long time already).

The piece I was referring to I think was actually adapted from an earlier version of yours, it is only used inside the transform dialect for now but if there is a need to extract as a free function we can:
https://sourcegraph.com/github.com/llvm/llvm-project/-/blob/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp?L504

My experience with this is that this can run in 2 modes: either use the list of ops to fuse incrementally and let the transformation figure things out, or call the transformation itself in a sequence controlled from outside, which allows more flexibility.
I could offer to try the example you have in mind through this and see whether something new appears that would force us to rethink the approach.

This feels extremely fishy, I do not understand the use case or the value but see significant complexity increase.

Putting a blocker until I can make sense of this.

I dont follow what is fishy here. This is the culmination of months long effort to fix this https://github.com/llvm/llvm-project/issues/57205 (which has been blocking things downstream for as long). The changes here are the same as what is done in lines 394 - 407 that returns the replacements for the tiled consumer. This is now returning the replacements for a fused producer when that the caller determines is worth/valid to return. It is too much state to track if this value is returnable or not from the fused code. So this is left as a caller flag, so its a complete opt-in, and controlled by the caller. I dont see the complexity increase. Happy to discuss offline. This needs to be solved, and it was determined that the previous tile+fuse method was not structured enough to fix the original issue, so it was redone to make this issue solvable...

I'll paste my comment from https://reviews.llvm.org/D138882 here.

Injecting lambda form above allows inversion of control that is convenient very short term and has almost always proven to very quickly turn to technical debt.

Given recent offline discussions I had and other parts of the codebase I have seen, I am going to more seriously push back against this anti-pattern globally.
Injecting C++ callback control from above is a sign of missing abstractions and should almost always be disallowed.
The alternative is often to refactor multiple times until the right abstractions emerge.

In other words, the transformations we add must be functional-style, statically return the pieces of IR (existing created or updated) that make sense for that transform.
This is not something customizable, if you need more information then statically return more information: no backchannels through lambdas.

If you need different behaviors, instead of injecting a dynamic mechanism through a lambda, write another transformation that takes more/different inputs and return more/different outputs.
Refactor the reusable utility helpers to avoid copy-pasta.

These transformations can then be plugged into patterns and transform dialect ops who can be responsible for the switch between different static behaviors.

Hard stop on more lambdas from above, I am happy to iterate with you on the IREE side to a good solution.

If I have to internalize that comment, and the places you have blocked this, all the places you mentioned are worklist based algorithms (either using pattern rewrite based fixed point solutions, or the worklist algorithm used here). In both the cases, the callbacks allow the way the worklist is built. Callbacks allow control externally without leaking too much of the implementation details. I think that is useful if done in specific/controlled ways. I am not a fan of hard red-lines. In this particular case, I can rewrite things to not use callbacks but it will have to leak too much of the implementation details to allow callers to put things together again. Thats the tradeoff IMO

As a side note, softmax and other complex things fuse properly into scf.foreach_thread thanks to a redesigned iterative algorithm and better sets of input / outputs.

Since you have put a hard stop to this, could point me to where this is. Since there is unilateral decision made to disallow callbacks, Ill try to conform to this. Also btw, not related to softmax (that already works), and not related to scf.foreach_thread cause it is not related to distribution.

Not sure if relevant to your specific case but my gut feeling is that this is not a case that qualifies for a pass.

I apologize for the language, I realize upon rereading that it was too harsh and didn't mean it this way, I could have posted in less of a hurry and worded things better.

Thanks for that!

As we discussed offline, I would love to be able to come up with a characterization where we are very careful with C++ callbacks and avoid things that amount to inversion of control and breaking the functional contract that has worked very well for the implementation of most transformations logic.
I do think this example can be written in a way that avoids this injection and I am happy to help iterate until we get there (it may be a long process and I am sorry for the drag this has been causing for a too long time already).

The piece I was referring to I think was actually adapted from an earlier version of yours, it is only used inside the transform dialect for now but if there is a need to extract as a free function we can:
https://sourcegraph.com/github.com/llvm/llvm-project/-/blob/mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp?L504

My experience with this is that this can run in 2 modes: either use the list of ops to fuse incrementally and let the transformation figure things out, or call the transformation itself in a sequence controlled from outside, which allows more flexibility.
I could offer to try the example you have in mind through this and see whether something new appears that would force us to rethink the approach.

Thanks. I think I understand. I will hold off on this patch, and send some NFC patches that basically do

  1. Have an API entry that does one step of the fusion, i.e. one producer -> tensor.extract_slice swap.
  2. Refactor the existing scf::tileConsumerAndFuseProducersGreedily to use this and then maybe have different API entry points that allow having a list of producers and tile + fuse all of them etc. that use the entry point added in (1).

Might not get to this soon (I am on vacation for a few days). Ill send out the stack once I have something working.

mravishankar abandoned this revision.Jan 16 2023, 10:38 AM