This is an archive of the discontinued LLVM Phabricator instance.

Introduce `tensor.pack` and `tensor.unpack` operations
ClosedPublic

Authored by chelini on Nov 16 2022, 4:35 AM.

Details

Summary

Pack and Unpack return new tensors within which the individual elements
are reshuffled according to the packing specification. This has the
consequence of modifying the canonical order in which a given operator
(i.e., Matmul) accesses the individual elements. After bufferization,
this typically translates to increased access locality and cache
behavior improvement, e.g., eliminating cache line splitting.

Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>
Co-authored-by: Han-Chung Wang <hanchung@google.com>

RFC: https://discourse.llvm.org/t/rfc-tensor-pack-and-tensor-unpack/66408/1

Diff Detail

Event Timeline

chelini created this revision.Nov 16 2022, 4:35 AM
chelini requested review of this revision.Nov 16 2022, 4:35 AM
rengolin added inline comments.Nov 16 2022, 5:20 AM
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1771

Shouldn't we have a similar for unpack? getUnpackedType?

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3187

Merge ifs?

3211

why is this a lambda?

mlir/test/Dialect/Tensor/ops.mlir
354

Missing CHECK lines for the third test

Generally looks good, thanks @chelini !

mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1717

Can you rephrase and make it sound more from the point of view of the IR attached to the op?

The pack operation converts an `input` N-D tensor into a 2N-D tensor with tiled and packed layout.
The mandatory `inner_dims_pos` attribute specifies the order in which the original N dimensions are permuted to obtain the data order inside the tile.
The optional `outer_dims_pos` ...
The optional `padding_value` operand specifies a padding value at the boundary on non-perfectly divisible dimensions:
  - if absent: ... UB
  - if present: ...
1720

s/tiled loops/tiled data dimensions, there are no loops here

1737

s/outer loops/outer data dimensions, there are no loops here

1771

We usualy call this "inferXXXType" in other places.

1787

similar description to what I suggested above in shorter form.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3070

better name plz: isInvalidPackingPosSpecification ?

3116

use isDynamic plz, we want to remove leaky uses of the magic constant.

3167

avoid leaky magic values plz

3213

I could swear I had a factored out util that implemented a templated form of this .. try to find it ?

This revision is now accepted and ready to land.Nov 16 2022, 12:24 PM
mehdi_amini added inline comments.Nov 16 2022, 12:55 PM
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1674

Pure is reserved for operations that don't have any undefined behavior, this does not seem to be the case here.

1695

I don't think we should have member methods that looks like accessors but are actually "heavy processing". Better leave this to free functions (same everywhere else).

1751

Please use DenseI64ArrayAttr. We shouldn't use I64ArrayAttr anywhere moving forward I think.

1782

Typo: unpack

mlir/test/Dialect/Tensor/ops.mlir
328

Please use CHECK-LABEL

335

Please minimize the CHECK to the absolute minimum needed for what you intend to test.

mehdi_amini requested changes to this revision.Nov 16 2022, 12:55 PM
This revision now requires changes to proceed.Nov 16 2022, 12:55 PM
hanchung requested changes to this revision.Nov 16 2022, 2:24 PM
hanchung added inline comments.
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1717

note that we don't require the op to pack all the dimension. It is not always packing a N-D tensor into a 2N-D tensor. E.g., we can pack something like NHWC to NHWChw.

1720

+1, tiled data dimensions makes more sense to me. There are no loops.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3010–3012

Returns, and maybe we can format it a bit like

/// Returns ... is invalid when:
///   a) ..
///   b) ..
///   c) ..
3014

it is a redundant comment to me. I'd delete it.

3030

It also accepts equal case. How about renaming it to areAllInBound?

3075

s/less or equal than/less than or equal to

3170–3176

[optional] I'd use continue for having less indents. It can save one level of nesting. E.g.,

if (it == dimTileMapping.end())
  continue;
Optional<int64_t> cstTileValue = ...
if (!cstTileValue)
  continue;
if (...)
  return true;
3187

It's also used below. Maybe just declare a variable and merge the checks.

auto paddingValue = getPaddingValue();
if (paddingValue && ... ){ 
  ..
}
3211

I think it's worth for making it a method. We'll need interchange and undoInterchange for tiling implementation. We can add undoInterchange method when upstreaming tiling implementation.

FYI that here is the implementation used in IREE: https://github.com/iree-org/iree/blob/3625adf98f0b87c24a89f8d4101550c1ef1eea44/llvm-external-projects/iree-dialects/include/iree-dialects/Dialect/LinalgExt/Utils/Utils.h#L29-L50

RE Nicolas: I did not find a similar thing when prototyping it in IREE. Maybe I searched with bad keyword. The keyword I used is interchange, like rg --ignore-case 'interchange' **/*.h. :-(

mlir/test/Dialect/Tensor/ops.mlir
320

We should use // ----- to split tests.

I don't know why --split-input-file is not added in the test command (i.e., line 1), but we should add it at least for consistency. That's how we write the tests in this file.

mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1695

This is not something specific to this PR.

This is something we have more generally and predates the automatic generation of prefixed getXXX as accessors everywhere AFAIR.

Could you please start an RFC with a call for general cleanup and a proposal for properly naming these getters of derived information?
I don't think free functions is reasonable here, there is a prohibitive cognitive cost in finding those functions when not attached to the op directly.
Additionally, does that thinking also carry to interfaces?

chelini updated this revision to Diff 476065.Nov 17 2022, 2:55 AM
chelini marked 23 inline comments as done.

Use NoMemoryEffect instead of Pure as the operation may trigger UB.

Update pack and unpack documentation as suggested.

Switch to use DenseI64ArrayAttr for all attributes but static_inner_tiles.

Rename getPackedType to inferPackedType.

Rename isSmallerThan to areAllInBound.

Rename isInvalid to isInvalidPackingPosSpecification.

Avoid leaking magic constant.

Update tests and add some dynamic tests.

Add co-authors.

chelini edited the summary of this revision. (Show Details)Nov 17 2022, 2:56 AM
chelini added inline comments.Nov 17 2022, 2:56 AM
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1751

Update outer_dims_perm and inner_dims_pos to use DenseI64ArrayAttr. Moving static_inner_tiles is a bit more involved as we need to update parseDynamicIndexList which is used by other ops. I can follow-up with a PR if we want to use DenseI64ArrayAttr in the future.

rengolin added inline comments.Nov 17 2022, 3:24 AM
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1695

Definitely not for this PR. This is a much larger conversation, and whoever takes that task, will easily (and mechanically) convert these methods, too.

chelini updated this revision to Diff 476086.Nov 17 2022, 4:13 AM

Model UB behavior for pack and unpack

chelini updated this revision to Diff 476087.Nov 17 2022, 4:16 AM

Update comment.

chelini updated this revision to Diff 476101.Nov 17 2022, 5:55 AM

Drop lambda and use free function for interchange.

hanchung accepted this revision.Nov 17 2022, 9:51 AM

LGTM, just few nits!

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3195

nit: !paddingValue

3211

llvm style nit: do not use braces for single statement. I.e., change it to

for (...)
  vec[en.index() + offset] = ...

https://llvm.org/docs/CodingStandards.html#don-t-use-braces-on-simple-single-statement-bodies-of-if-else-loop-statements

3234–3235

I don't see the point of is available because there are no checks. Maybe we can just drop this comment. It's obvious to me because the code describes what's happening.

If you want to keep the comment, how about Swap outer tiled data dimensions.

mlir/test/Dialect/Tensor/ops.mlir
320

I meant add // ----- to the new tests. Sorry for the ambiguous comment. The change looks good to me, I was trying not add too much non-specific things to the revision. Any way, thanks for fixing the other parts in this file!

chelini updated this revision to Diff 476432.Nov 18 2022, 5:08 AM

Drop braces

Drop confusing comment

Address nit

chelini added inline comments.Nov 18 2022, 5:12 AM
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1695

@mehdi_amini other than this, do you further comments/suggestions on the PR?

rengolin accepted this revision.Nov 21 2022, 12:45 AM

LGTM too, with a nit.

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3076

you changed the comment but not user visible the error message

chelini updated this revision to Diff 476820.Nov 21 2022, 12:50 AM

Fix user visible comment.

chelini marked an inline comment as done.Nov 21 2022, 12:51 AM
nicolasvasilache accepted this revision.Nov 21 2022, 4:54 AM
nicolasvasilache added inline comments.
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1695

I sync'ed with @mehdi_amini offline and he is OOO for 2 weeks, I do not expect him to check this thread (but I may be wrong ..)
I'd favor not blocking this for 2 weeks and landing as is with followup post-commit review / improvements.

1751

+1 yes we need to gain that muscle memory indeed ..

Any comment/suggestion on the 2 issues I have been seeing re DenseXXArrayAttr in https://discourse.llvm.org/t/rfc-inconsistency-between-dynamic-and-static-attributes-i64-v-index/66612 ?

mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
26

plz double check ou really need all new includes.

chelini updated this revision to Diff 476897.Nov 21 2022, 7:25 AM
  • Rebase
  • inferPackedType was also handling the memref case as in IREE; we have the

operations working at both abstractions; this is unnecessary here. Remove logic
and drop TypeSwitch include.

chelini marked an inline comment as done.Nov 21 2022, 7:28 AM
chelini added inline comments.
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1751

I can follow up with this.

mehdi_amini added inline comments.Nov 21 2022, 11:20 AM
mlir/include/mlir/Dialect/Tensor/IR/TensorOps.td
1695

I do not expect him to check this thread (but I may be wrong ..)

:)

I'd favor not blocking this for 2 weeks and landing as is with followup post-commit review / improvements.

Seems reasonable, my comments aren't intrinsic about the direction this PR is going to fundamentally and is really about some "coding convention" issues that we'll address later.

mehdi_amini resigned from this revision.Nov 21 2022, 11:20 AM
This revision is now accepted and ready to land.Nov 21 2022, 11:20 AM
This revision was automatically updated to reflect the committed changes.
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3211
template <typename T, unsigned N>
void applyPermutationToVector(SmallVector<T, N> &inVec,
                              ArrayRef<int64_t> permutation) {

In IndexingUtils.cpp/h

hanchung added inline comments.Nov 23 2022, 1:46 PM
mlir/lib/Dialect/Tensor/IR/TensorOps.cpp
3211

got it, thanks!