Initially, the padding transformation and the related operation were only used
to guarantee static shapes of subtensors in tiled operations. The
transformation would not insert the padding operation if the shapes were
already static, and the overall code generation would actively remove such
"noop" pads. However, this transformation can be also used to pack data into
smaller tensors and marshall them into faster memory, regardless of the size
mismatches. In context of expert-driven transformation, we should assume that,
if padding is requested, a potentially padded tensor must be always created.
Update the transformation accordingly. To do this, introduce an optional
packing attribute to the pad_tensor op that serves as an indication that
the padding is an intentional choice (as opposed to side effect of type
normalization) and should be left alone by cleanups.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Sorry I didn't see this earlier, but I don't think this direction makes sense. This op operates on tensors, which are value semantic, so terminology like "guaranteed to create a new tensor suitable for packing" simply don't make sense.
Having an attribute that prevents folding from happening gives me bad memories of e.g. tf.Identity being marked as having side effects to prevent folding. I would rather not go down that route.
I fail to see what doesn't make sense? Packing is done on tensor semantics too: this can be also called "tensor tiling" by opposition to "op tiling".
Having an attribute that prevents folding from happening gives me bad memories of e.g. tf.Identity being marked as having side effects to prevent folding. I would rather not go down that route.
I do not see the relation between the 2.
Marking tf.Identity as having side effects is clearly a hack misusing side effects for other purposes: the op still has no side effects.
Here, the semantics of the op are : it is illegitimate to fold this op. There is no discussion about side effects or any other interface.
This is much more similar to the "inbounds" attribute of vector.transfer ops: it is well-contained and part of the op semantics.
"the semantics of the op are : it is illegitimate to fold this op." - that is not a valid semantics. For example, if I prove that the output tensor and the input tensor have the same value, then SSA IR semantics + value semantics mean that it *must* be correct to replace the output with the input.
Specifically, it is impossible for these two functions to have different semantics (as long as linalg.pad_tensor is marked NoSideEffect) -- both return a value that is identical to their argument, and it is always legal to fold @with_pad to @without_pad per tensor value semantics:
func @with_pad(%arg0: tensor<5x6xf32>) -> tensor<5x6xf32> { %cst = constant 0.000000e+00 : f32 %0 = linalg.pad_tensor %arg0 packing low[0, 0] high[0, 0] { ^bb0(%arg1: index, %arg2: index): linalg.yield %cst : f32 } : tensor<5x6xf32> to tensor<5x6xf32> return %0 : tensor<5x6xf32> }
func @without_pad(%arg0: tensor<5x6xf32>) -> tensor<5x6xf32> { return %arg0 : tensor<5x6xf32> }
I share the same feeling as Sean: there is a confusion in the semantics definition of the operation and what should be "optimization hints". I'm not sure why the "guarantees" that are provided are semantically important here.
What would be a more correct phrasing for the semantics description in your opinion? We want the op to always result in a new value.
I don't really understand the argument about the operation having to be always folded away. This would effectively render impossible any clone or copy operation. Yet, there is a TensorCloneOp in IREE if I remember correctly and it isn't an always-folded noop. So there is clearly precedent and a use case for operations defining new values with same "content" as the existing values.
The additional attribute may indeed be interpreted as preventing some canonicalization wrt the semantics of the original op (or its version with the attribute unset). However, I argue that we just choose to define different canonical forms for the op depending on it having the attribute. Referring to the post our doc on canonicalization quotes in the opening paragraph - https://sunfishcode.github.io/blog/2018/10/22/Canonicalization.html - that describes how canonicalization and redundancy elimination get ambiguous: "<...> ultimately, in its purest form, canonicalization just focuses on removing unnecessary variation so that subsequent optimizations can be simpler", this change is exactly about removing, or rather shifting, variation to simplify subsequent optimization. There exist subsequent optimizations such as hoisting that are rendered significantly simpler by always having a fresh value that is always defined by pad_tensor.
I agree, I don't think this is a question about op semantics. It is about suppressing a transformation under certain circumstances. I think we can distinguish about which transformations are legal, and what transformations are desirable. Based on tensor value semantics and NoSideEffect and the definition of the content of the output of linalg.pad_tensor, we can conclude that the transformation "remove a pad when it is an identity" is always a legal transformation. I think if the attribute was simply named doNotFold or suppressFolds, with the meaning "advisory flag suggesting that folding the op is undesirable -- used internally by certain multi-step transformations to maintain certain invariants that would otherwise be broken by folding" then it would be a lot clearer.
There is precedent for things like this. e.g. llvm.ssa.copy intrinsic, used internally by PredicateInfo https://llvm.org/docs/LangRef.html#llvm-ssa-copy-intrinsic