Page MenuHomePhabricator

[mlir] Rewrite transfer_read as transfer_read + broadcast + transpose.
AbandonedPublic

Authored by pifon2a on Sep 30 2021, 6:52 AM.

Diff Detail

Event Timeline

pifon2a created this revision.Sep 30 2021, 6:52 AM
pifon2a requested review of this revision.Sep 30 2021, 6:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2021, 6:52 AM
ThomasRaoux added inline comments.Sep 30 2021, 9:24 AM
mlir/lib/Dialect/Vector/VectorOps.cpp
2738–2739

There may be higher dim in the tensor that are ignored. I think we should check the affine map here instead?
for instance what about this case:

%vread = vector.transfer_read %t[%c0, %c0, %c0, %c0], %cst {
   in_bounds = [true, true, true, true],
   permutation_map = affine_map<(d0, d1, d2, d3) -> (0, d3, d2, 0)>
  } : tensor<4x4x4x5xf32>, vector<8x5x4x2xf32>

You basically want the affine map to be a minor identity.

2766

you also need to handle potential Mask.

2769

I don't understand how you can be sure you'll end up with an identity map? What if some dimensions were missing in the original map like in this example:
%vread = vector.transfer_read %t[%c0, %c0, %c0], %cst {

 in_bounds = [true, true, true, true],
 permutation_map = affine_map<(d0, d1, d2) -> (0, d2, d1, 0)>
} : tensor<4x4x5xf32>, vector<8x5x4x2xf32>
mlir/test/Dialect/Vector/canonicalize.mlir
1039

Note that this case is already handled by this pattern:
https://github.com/llvm/llvm-project/blob/7362cc5ef50b5ebcbb11380ab13a179902c7b8be/mlir/lib/Dialect/Vector/VectorTransforms.cpp#L2814

If your pattern is a superset then we should remove that one.

nicolasvasilache requested changes to this revision.Sep 30 2021, 11:47 PM
nicolasvasilache added inline comments.
mlir/lib/Dialect/Vector/VectorOps.cpp
2717

typo: than

2725

Extracting the broadcast information LGTM but I am not sure why you also extract the permutation in 1 shot here.
Is there a reason to not keep this pattern orthogonal to populateVectorTransferPermutationMapLoweringPatterns?

I.e. I'd just drop the zeros from the permutation map to produce:

//  %vread = vector.transfer_read %t[%c0, %c0], %cst {
//    in_bounds = [true, true, true, true],
//    permutation_map = affine_map<(d0, d1) -> (d1, d0)>
//  } : tensor<4x5xf32>, vector<5x4xf32>
//  %1 = vector.broadcast %0
//    : vector<5x4xf32> to vector<8x2x5x4xf32>
//  %2 = vector.transpose %1, [0, 2, 3, 1]
//    : vector<8x2x5x4xf32> to vector<8x5x4x2xf32>

Then populateVectorTransferPermutationMapLoweringPatterns should kick in to drop the permutation.

I haven't tested the interaction specifically myself but experience says the more orthogonal the patterns are the least interference they exhibit.

This also makes me think that IIRC some f the transpose patterns also managed some small amount of broadcast; I'd drop the broadcast related part from the other pattterns and rely on composition + orthogonality.

2738

I am not sure why this constraint?

With the more I described above (i.e. this patterns just drops zeros and adds broadcast + transpose based on the zeros dropped), I think it would work more generally?

2749

Not yet looking at impl. details here until the previous points I made have been addressed or refuted.

Still I'd note that I'd expect to see something "more compositional and reusable" along the lines:

auto mapZero = readOp.permutation_map().select(b.getAffineConstantExpr(0)); // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d1, d2, d4)
auto mapNonZero = readOp.permutation_map().selectNot(b.getAffineConstantExpr(0)); // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d3, d5)
VectorType targetVectorType = Vector::get(.... applyPermutationMap(mapNonZero, originalVectorShape), ...);
transferRead_clone_with_permutation_map(mapNonZero.compose(permutation_map()))_and_result_type(targetVectorType);

SmallVector<int> broadcastShape = applyPermutationMap(mapZero, originalVectorShape);
broadcastShape.append(targetVectorType.shape());
VectorType broadcastVectorShape = Vector::get(.... broadcastShape, ...);
b.create<vector::BroadcastOp>(...);

// Finally transpose the vector in a similar fashion.

where select/selectNot are new AffineMap utilities that returns an affineMap that is a permutation can be composed with the original one to only keep the exprs you are interested in:

AffineMap AffineMap::select(AffineExpr); // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4)
AffineMap AffineMap::select(AffineExpr); // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4)

Any better name for select/selectNot or better utility functions that recover similar similar information are most welcome.

2766

I'd just bail on masks in a first impl.
We can transpose + extract from them in a followup CL as long as the TODO is here.

The inbounds attribute should be handled properly.

2769

I believe the route I proposed would handle all cases ?

This revision now requires changes to proceed.Sep 30 2021, 11:47 PM
pifon2a abandoned this revision.Oct 5 2021, 4:27 AM

Actually, I think the current TransferReadPermutationLowering and TransferOpReduceRank are good enough. I wish I knew about them before implementing this.