This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Remove SameOperandsAndResultShape when redundant with ElementwiseMappable
ClosedPublic

Authored by silvas on Nov 12 2020, 5:24 PM.

Details

Summary

SameOperandsAndResultShape and ElementwiseMappable have similar
verification, but in general neither is strictly redundant with the
other.

Examples:

  • SameOperandsAndResultShape allows `"foo"(%0) : tensor<2xf32> -> tensor<?xf32>` but ElementwiseMappable does not.
  • ElementwiseMappable allows `select %scalar_pred, %true_tensor, %false_tensor` but SameOperandsAndResultShape does not.

SameOperandsAndResultShape is redundant with ElementwiseMappable when
we can prove that the mixed scalar/non-scalar case cannot happen. In
those situations, `ElementwiseMappable & SameOperandsAndResultShape ==
ElementwiseMappable` (see the sketch after the list below):

  • Ops with 1 operand: the case of mixed scalar and non-scalar operands cannot happen since there is only one operand.
  • When SameTypeOperands is also present, the mixed scalar/non-scalar operand case cannot happen.
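A rough sketch of the two cases, using hypothetical ops "foo.neg" and "foo.and" rather than ops touched by this patch:

// Single operand: ElementwiseMappable alone already forces the result shape
// to match the one operand, so SameOperandsAndResultShape adds nothing.
%0 = "foo.neg"(%t) : (tensor<2xf32>) -> tensor<2xf32>   // accepted
%1 = "foo.neg"(%t) : (tensor<2xf32>) -> tensor<?xf32>   // rejected by ElementwiseMappable alone

// With SameTypeOperands, all operands must have the same type, so the mixed
// scalar/non-scalar case never reaches the shape check.
%2 = "foo.and"(%a, %b) : (tensor<2xi1>, tensor<2xi1>) -> tensor<2xi1>   // accepted
%3 = "foo.and"(%s, %b) : (i1, tensor<2xi1>) -> tensor<2xi1>             // already rejected by SameTypeOperands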

Diff Detail

Event Timeline

silvas created this revision. Nov 12 2020, 5:24 PM
silvas requested review of this revision. Nov 12 2020, 5:24 PM

SameOperandsAndResultShape allows `"foo"(%0) : tensor<2xf32> -> tensor<?xf32>` but ElementwiseMappable does not.

Why isn't this allowed by ElementwiseMappable? This would be OK, right: "foo"(%0) : tensor<?xf32> -> tensor<?xf32>? Not allowing a "looser" output for an op hinders shape inference possibilities in general (we can't refine a type).

SameOperandsAndResultShape allows `"foo"(%0) : tensor<2xf32> -> tensor<?xf32>` but ElementwiseMappable does not.

Why isn't this allowed by ElementwiseMappable? This would be OK, right: "foo"(%0) : tensor<?xf32> -> tensor<?xf32>? Not allowing a "looser" output for an op hinders shape inference possibilities in general (we can't refine a type).

"foo"(%0) : tensor<?xf32> -> tensor<?xf32> is ok.

"foo"(%0) : tensor<2xf32> -> tensor<?xf32> is not ok. We specifically prohibit it so that folding works across scalar/vector/tensor. Otherwise we would have to potentially insert a cast when folding, which would make it no longer be a "fold".

From the rationale:

/// - 3. guarantees that folding can be done across scalars/vectors/tensors
///   with the same pattern, as otherwise lots of special handling of type
///   mismatches would be needed.

It shouldn't hinder shape inference because it has a completely trivial shape transfer function. It also leads to a nice separation of concerns, because you also don't have awkward situations like ?xf32 -> 2xf32 "injecting information" about the static shape of the output -- that is orthogonal, and we have tensor_cast to do that.
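A small sketch of that separation, assuming the std addf and tensor_cast ops as they existed at the time (the shapes are made up for illustration):

// The elementwise op has a trivial shape transfer function: it just carries
// the operand shape through.
%sum = addf %a, %b : tensor<?xf32>
// If a static shape is known from elsewhere, injecting it is a separate,
// explicit step.
%sum_static = tensor_cast %sum : tensor<?xf32> to tensor<2xf32>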

From the rationale:

/// - 3. guarantees that folding can be done across scalars/vectors/tensors
///   with the same pattern, as otherwise lots of special handling of type
///   mismatches would be needed.

I don't quite get what the issue is with this ? -> 2. Can you elaborate a bit? Do you have an example of a pattern, maybe?

It shouldn't hinder shape inference because it has a completely trivial shape transfer function.

It does in the sense that you can't "stop" there: you are forced to generate invalid IR and "hope" to visit this op, or, worse, you have to insert a cast before this op to preserve the ?

It also leads to a nice separation of concerns, because you also don't have awkward situations like ?xf32 -> 2xf32 "injecting information" about the static shape of the output -- that is orthogonal, and we have tensor_cast to do that.

Injecting information can be dealt with separately, but in general our experience in TensorFlow transformation has been that we quickly hit many verifier issues when being overly "rigid" about these.

From the rationale:

/// - 3. guarantees that folding can be done across scalars/vectors/tensors
///   with the same pattern, as otherwise lots of special handling of type
///   mismatches would be needed.

I don't quite get what the issue is with this ? -> 2. Can you elaborate a bit? Do you have an example of a pattern, maybe?

Consider xor(xor(a, b), b) -> a

%0 = xor %lhs, %rhs : (tensor<?xindex>, tensor<2xindex>) -> tensor<2xindex>
%1 = xor %0, %rhs : (tensor<2xindex>, tensor<2xindex>) -> tensor<2xindex>
"only_accepts_2xindex"(%1) : (tensor<2xindex>) -> ()

only_accepts_2xindex might be defined so that it only accepts tensor<2xindex>. If you apply the fold, you will replace %1 with %lhs, which will create invalid IR.
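Concretely, naively applying the fold to the example above would leave IR like this:

// %1 is replaced by %lhs, but %lhs has type tensor<?xindex> while the user
// requires tensor<2xindex>; the module no longer verifies.
"only_accepts_2xindex"(%lhs) : (tensor<?xindex>) -> ()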

It shouldn't hinder shape inference because it has a completely trivial shape transfer function.

It does in the sense that you can't "stop" there: you are forced to generate invalid IR and "hope" to visit this op, or, worse, you have to insert a cast before this op to preserve the ?

I expect most shape inference algorithms will keep the standard "DenseMap<Value, LatticeVal>" rather than doing on-the-fly updates? Generally, doing on-the-fly updates is not considered good practice for IR transformations.

Also, on-the-fly updates don't compose well. E.g. if you have an op that accepts tensor<?xindex> but rejects tensor<2xindex> (that's a real example from the shape dialect) or the reverse, you could create invalid IR. (see also: https://llvm.discourse.group/t/eliding-casts/1546/5)
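For instance, with a hypothetical op standing in for the shape-dialect case mentioned above:

// Hypothetical op that only accepts a dynamically shaped extent tensor.
"shape_like.consume"(%extents) : (tensor<?xindex>) -> ()
// Updating %extents in place to tensor<2xindex> would make this op fail
// verification, even though the refinement "added" information.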

It also leads to a nice separation of concerns, because you also don't have awkward situations like ?xf32 -> 2xf32 "injecting information" about the static shape of the output -- that is orthogonal, and we have tensor_cast to do that.

Injecting information can be dealt with separately, but in general our experience in TensorFlow transformation has been that we quickly hit many verifier issues when being overly "rigid" about these.

Again, I think that's a problem with that algorithm doing in-place updates rather than having a proper analysis + transform separation. It's the same issue that made LLVM's OldGVN so hard to maintain.

From the rationale:

/// - 3. guarantees that folding can be done across scalars/vectors/tensors
///   with the same pattern, as otherwise lots of special handling of type
///   mismatches would be needed.

I don't quite get what the issue is with this ? -> 2. Can you elaborate a bit? Do you have an example of a pattern, maybe?

Consider xor(xor(a, b), b) -> a

%0 = xor %lhs, %rhs : (tensor<?xindex>, tensor<2xindex>) -> tensor<2xindex>
%1 = xor %0, %rhs : (tensor<2xindex>, tensor<2xindex>) -> tensor<2xindex>
"only_accepts_2xindex"(%1) : (tensor<2xindex>) -> ()

only_accepts_2xindex might be defined so that it only accepts tensor<2xindex>. If you apply the fold, you will replace %1 with %lhs, which will create invalid IR.

Right, you can't fold without checking the type opaquely, but I don't see this necessarily as a problem: that is part of supporting dynamic shapes.

It does in the sense that you can't "stop" there: you are forced to generate invalid IR and "hope" to visit this op, or, worse, you have to insert a cast before this op to preserve the ?

I expect most shape inference algorithms will keep the standard "DenseMap<Value, LatticeVal>" rather than doing on-the-fly updates? Generally, doing on-the-fly updates is not considered good practice for IR transformations.

Sure, but think about other kinds of transformations, whether it is canonicalization or inlining or anything that can refine the shape in the process.
It seems awkward to me in general that an op would accept ? but it would be invalid IR to infer a static dimension.

silvas added a comment. Edited Nov 13 2020, 10:15 AM

Some comments inline, but one major point here is that, viewed as part of a bigger system, ElementwiseMappable ops are "TCP"-level, and I think most of the shape inference will be happening at the "TCF" level (analogous to the "tf" level). The reason for this is very simple: when lowering from TCF to TCP, we have to insert error handling / guard code, and possibly do multiversioning and other such things to deal with multiple cases (e.g. ops like aten.matmul or numpy.dot that do different things depending on the operand ranks).

Refining shapes at TCF level greatly reduces the code we need to emit. I don't expect that there will be very many shape refinement opportunities left at the TCP level, which is where ElementwiseMappable ops live.

I think at the TCF level, we are likely to formalize something like what I suggest in the "eliding casts" thread -- a way to describe that an operand can be legally refined to a different shape (otherwise we need to introduce a cast back to the original shape), which is a separate property from knowing the shape transfer function. This puts us on firmer ground for shape inference and by construction avoids issues like the ones you mention that TF shape inference currently runs into.

That is, I expect that our future shape inference passes will calculate a DenseMap<Value, ShapeLattice> generated by an analysis phase, and then have a transform phase that analyzes the legality of actually inserting the newly refined shapes into the IR, including potentially inserting casts around ops we can't trivially update (which includes public function boundaries, and situations where phi operands were inferred to be different types along different control flow branches).
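A sketch of what such a transform phase might emit, using hypothetical op and function names and tensor_cast for the boundary casts:

// The public function signature cannot be updated, so casts bridge the
// refined internal types back to the original boundary types.
func @public(%arg0: tensor<?xf32>) -> tensor<?xf32> {
  %0 = tensor_cast %arg0 : tensor<?xf32> to tensor<2xf32>
  %1 = "tcp.elementwise_op"(%0) : (tensor<2xf32>) -> tensor<2xf32>
  %2 = tensor_cast %1 : tensor<2xf32> to tensor<?xf32>
  return %2 : tensor<?xf32>
}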

It does in the sense that you can't "stop" there: you are forced to generate invalid IR and "hope" to visit this op, or, worse, you have to insert a cast before this op to preserve the ?

I expect most shape inference algorithms will keep the standard "DenseMap<Value, LatticeVal>" rather than doing on-the-fly updates? Generally, doing on-the-fly updates is not considered good practice for IR transformations.

Sure, but think about other kinds of transformations, whether it is canonicalization or inlining or anything that can refine the shape in the process.
It seems awkward to me in general that an op would accept ? but it would be invalid IR to infer a static dimension.

Do we have examples of where canonicalization or inlining blindly refines a shape without checking that the users can accept the new shape?

In general I agree that ops that accept tensor<?xf32> should probably also accept tensor<2xf32>. (though probably not calls/functions, since those check Type by pointer equality, and I think we want to keep that?)

Thus, I think it is plausible at this level to allow "foo" : tensor<2xf32> -> tensor<?xf32>. That is, the op can "lose shape information", so that folds can only refine the shape.

The case for supporting "foo" : tensor<?xf32> -> tensor<2xf32> is much harder to justify on that basis, because folds can then "de-refine" the shape. It seems totally plausible for mychip.fft to accept only 256xf32, and verifier-fail on ?xf32.
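For illustration, with the hypothetical mychip.fft from the sentence above:

// Verifier-accepted: the op demands exactly 256 samples.
%spec = "mychip.fft"(%samples) : (tensor<256xf32>) -> tensor<256xf32>
// A fold that "de-refines" %samples to tensor<?xf32> would make this invalid.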

In general I agree that ops that accept tensor<?xf32> should probably also accept tensor<2xf32>. (though probably not calls/functions, since those check Type by pointer equality, and I think we want to keep that?)

Calls are a good example of this: we struggled with this in TF as well, and I'd rather have the call operation be able to take tensor<2xf32> when the function takes tensor<?xf32> (lowering this is then ABI-dependent when handling the call operation).

In general I agree that ops that accept tensor<?xf32> should probably also accept tensor<2xf32>. (though probably not calls/functions, since those check Type by pointer equality, and I think we want to keep that?)

Calls are a good example of this: we struggled with this in TF as well, and I'd rather have the call operation be able to take tensor<2xf32> when the function takes tensor<?xf32> (lowering this is then ABI-dependent when handling the call operation).

That's an interesting direction.

Anyway, for now, the need for ensuring folds result in the same Type is a reality and that is why ElementwiseMappable is defined that way. Do you have any objections to landing this patch?

Anyway, for now, the need for ensuring folds result in the same Type is a reality and that is why ElementwiseMappable is defined that way.

Well, we agree that we can't replace a static shape with an unknown one, but you haven't explained why it is a good tradeoff to go this route instead of having fold correctly handle the types.

Anyway, for now, the need for ensuring folds result in the same Type is a reality and that is why ElementwiseMappable is defined that way.

Well, we agree that we can't replace a static shape with an unknown one, but you haven't explained why it is a good tradeoff to go this route instead of having fold correctly handle the types.

Fold cannot correctly handle the types because it would involve inserting casts. The alternative would be to write all folds as canonicalizations, or have the folds just fail in these situations. I'd rather avoid those.
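In the xor example above, a type-correct replacement would have to materialize a cast, which is no longer a pure fold -- a sketch:

// What replacing %1 with %lhs would actually require to stay type-correct:
%fixed = tensor_cast %lhs : tensor<?xindex> to tensor<2xindex>
"only_accepts_2xindex"(%fixed) : (tensor<2xindex>) -> ()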

The current behavior is stricter than the alternative, and we can always loosen it in the future if we see a need.

mehdi_amini accepted this revision. Nov 23 2020, 4:51 PM

(I am not convinced by your reasoning here)

This revision is now accepted and ready to land. Nov 23 2020, 4:51 PM