This is an archive of the discontinued LLVM Phabricator instance.

Add element preserving trait to optimize involution folding
Needs RevisionPublic

Authored by ahmedsabie on Oct 16 2020, 11:11 AM.

Details

Summary

Adding this trait allows involution folding to be applicable in more situations

Diff Detail

Event Timeline

ahmedsabie created this revision.Oct 16 2020, 11:11 AM
ahmedsabie requested review of this revision.Oct 16 2020, 11:11 AM
rriddle requested changes to this revision.Oct 16 2020, 11:14 AM

I don't really understand the motivation here, can you detail it a bit more. Why is the original trait not enough? Also, ValuePreserving seems too broad of a name for what this is doing.

This revision now requires changes to proceed.Oct 16 2020, 11:14 AM

I don't really understand the motivation here, can you detail it a bit more. Why is the original trait not enough? Also, ValuePreserving seems too broad of a name for what this is doing.

This makes the original trait applicable in more places because it can ignore nodes that don't actually end up changing tensor values so you can still fold. This is currently used in tensorflow grappler optimizations under the same name

I don't really understand the motivation here, can you detail it a bit more. Why is the original trait not enough? Also, ValuePreserving seems too broad of a name for what this is doing.

This makes the original trait applicable in more places because it can ignore nodes that don't actually end up changing tensor values so you can still fold.

I don't understand this aspect. Is this supposed to mean that the operation would somehow mutate the input tensors? Putting the fact that tensors are value-typed aside, I would expect this to be something covered by side effects and not a trait like this. If an operation is going to mutate its input operands, that is something that should be expressed as a side effect.

This is currently used in tensorflow grappler optimizations under the same name

I would prefer to always use names that make sense in the context of MLIR, and not carry over names just because they are used in a different infrastructure. If a name can be carried over that's great(e.g. if it is widely used throughout the domain), but in this case it just doesn't make sense given how general of a name it is.

I don't really understand the motivation here, can you detail it a bit more. Why is the original trait not enough? Also, ValuePreserving seems too broad of a name for what this is doing.

This makes the original trait applicable in more places because it can ignore nodes that don't actually end up changing tensor values so you can still fold.

I don't understand this aspect. Is this supposed to mean that the operation would somehow mutate the input tensors? Putting the fact that tensors are value-typed aside, I would expect this to be something covered by side effects and not a trait like this. If an operation is going to mutate its input operands, that is something that should be expressed as a side effect.
No I mean if you had abs(permute(abs(x))) then that's the same as permute(abs(x)) but you can't generally apply this optimization to all side effect free operations, example abs(negate(abs(x))) is not the same as negate(abs(x)). ValuePreserving is capturing the property of permute that it doesn't actually produce new values when applied

I don't really understand the motivation here, can you detail it a bit more. Why is the original trait not enough? Also, ValuePreserving seems too broad of a name for what this is doing.

This makes the original trait applicable in more places because it can ignore nodes that don't actually end up changing tensor values so you can still fold.

I don't understand this aspect. Is this supposed to mean that the operation would somehow mutate the input tensors? Putting the fact that tensors are value-typed aside, I would expect this to be something covered by side effects and not a trait like this. If an operation is going to mutate its input operands, that is something that should be expressed as a side effect.

No I mean if you had abs(permute(abs(x))) then that's the same as permute(abs(x)) but you can't generally apply this optimization to all side effect free operations, example abs(negate(abs(x))) is not the same as negate(abs(x)). ValuePreserving is capturing the property of permute that it doesn't actually produce new values when applied

Thank you for explaining, I understand what you mean by value preserving now. That being said, I don't think this optimization is legal with just idempotent and "value preserving". For example, stable_sort(permute(stable_sort(x))) is not strictly the same as permute(stable_sort(x)).

Update name of trait and apply it to involutions instead

andyly requested changes to this revision.Oct 21 2020, 4:59 AM
andyly added inline comments.
mlir/include/mlir/IR/OpDefinition.h
1061–1062

Can this comment be updated to not be the same as IsIdempotent?

This revision now requires changes to proceed.Oct 21 2020, 4:59 AM

Update comment

I don't really understand the motivation here, can you detail it a bit more. Why is the original trait not enough? Also, ValuePreserving seems too broad of a name for what this is doing.

This makes the original trait applicable in more places because it can ignore nodes that don't actually end up changing tensor values so you can still fold.

I don't understand this aspect. Is this supposed to mean that the operation would somehow mutate the input tensors? Putting the fact that tensors are value-typed aside, I would expect this to be something covered by side effects and not a trait like this. If an operation is going to mutate its input operands, that is something that should be expressed as a side effect.

No I mean if you had abs(permute(abs(x))) then that's the same as permute(abs(x)) but you can't generally apply this optimization to all side effect free operations, example abs(negate(abs(x))) is not the same as negate(abs(x)). ValuePreserving is capturing the property of permute that it doesn't actually produce new values when applied

Thank you for explaining, I understand what you mean by value preserving now. That being said, I don't think this optimization is legal with just idempotent and "value preserving". For example, stable_sort(permute(stable_sort(x))) is not strictly the same as permute(stable_sort(x)).

That's a good point, it would only work if we define idempotent to split across elements but that wouldn't be called idempotent necessarily anymore. I updated the diff with a similar idea but for involution which should work although I added it as a pass rather than a trait fold because it can't be implemented as a fold I believe

ahmedsabie retitled this revision from Add value preserving trait to optimize idempotent folding to Add element preserving trait to optimize involution folding.Oct 25 2020, 8:26 PM
ahmedsabie edited the summary of this revision. (Show Details)
andyly accepted this revision.Oct 26 2020, 1:17 PM
rriddle requested changes to this revision.Oct 26 2020, 1:29 PM

Can you formulate a formal description for what you consider "element preserving"? I still have trouble understanding the implications just by looking at the name and the equation notation.

mlir/include/mlir/IR/OpBase.td
1727

nit: Use == here?

mlir/include/mlir/Transforms/Trait.h
4

I would not put this in the Transforms/ library because it prevents dialects from depending on it, i.e dialect libraries should never really depend on Transforms/.

mlir/lib/Transforms/Trait.cpp
9

Can you add a detailed comment of the folding here?

This revision now requires changes to proceed.Oct 26 2020, 1:29 PM
rriddle added inline comments.Oct 26 2020, 1:30 PM
mlir/include/mlir/Transforms/Trait.h
4

We already have some common foldings near the trait definitions, this could go there for now.

ahmedsabie added inline comments.Oct 27 2020, 10:18 AM
mlir/include/mlir/Transforms/Trait.h
4

That would be the OpDefinition.h file right? The issue is this requires inheriting from TraitRewritePattern which is in PatternMatch.h and PatternMatch.h itself relies on OpDefinition.h

rriddle added inline comments.Oct 30 2020, 1:52 AM
mlir/include/mlir/IR/PatternMatch.h
234

I think checking for a trait is easy enough to do inside of a normal rewrite pattern that this is probably overkill.

mlir/include/mlir/Transforms/Trait.h
4

Yes, Normally we expose patterns via some form of "populate*Patterns" method. Using that technique would remove the need for RewritePattern to be defined in the .h file. populateInvolutionCanonicalizationPatterns?

Orthogonally, this case starts to make me think that we should maybe have a 'getCanonicalizationPatterns' hook for traits as well, so this gets auto exposed?