Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/Vector/VectorOps.cpp | ||
---|---|---|
2738–2739 | There may be higher dim in the tensor that are ignored. I think we should check the affine map here instead? %vread = vector.transfer_read %t[%c0, %c0, %c0, %c0], %cst { in_bounds = [true, true, true, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (0, d3, d2, 0)> } : tensor<4x4x4x5xf32>, vector<8x5x4x2xf32> You basically want the affine map to be a minor identity. | |
2766 | you also need to handle potential Mask. | |
2769 | I don't understand how you can be sure you'll end up with an identity map? What if some dimensions were missing in the original map like in this example: in_bounds = [true, true, true, true], permutation_map = affine_map<(d0, d1, d2) -> (0, d2, d1, 0)> } : tensor<4x4x5xf32>, vector<8x5x4x2xf32> | |
mlir/test/Dialect/Vector/canonicalize.mlir | ||
1039 | Note that this case is already handled by this pattern: If your pattern is a superset then we should remove that one. |
mlir/lib/Dialect/Vector/VectorOps.cpp | ||
---|---|---|
2717 | typo: than | |
2725 | Extracting the broadcast information LGTM but I am not sure why you also extract the permutation in 1 shot here. I.e. I'd just drop the zeros from the permutation map to produce: // %vread = vector.transfer_read %t[%c0, %c0], %cst { // in_bounds = [true, true, true, true], // permutation_map = affine_map<(d0, d1) -> (d1, d0)> // } : tensor<4x5xf32>, vector<5x4xf32> // %1 = vector.broadcast %0 // : vector<5x4xf32> to vector<8x2x5x4xf32> // %2 = vector.transpose %1, [0, 2, 3, 1] // : vector<8x2x5x4xf32> to vector<8x5x4x2xf32> Then populateVectorTransferPermutationMapLoweringPatterns should kick in to drop the permutation. I haven't tested the interaction specifically myself but experience says the more orthogonal the patterns are the least interference they exhibit. This also makes me think that IIRC some f the transpose patterns also managed some small amount of broadcast; I'd drop the broadcast related part from the other pattterns and rely on composition + orthogonality. | |
2738 | I am not sure why this constraint? With the more I described above (i.e. this patterns just drops zeros and adds broadcast + transpose based on the zeros dropped), I think it would work more generally? | |
2749 | Not yet looking at impl. details here until the previous points I made have been addressed or refuted. Still I'd note that I'd expect to see something "more compositional and reusable" along the lines: auto mapZero = readOp.permutation_map().select(b.getAffineConstantExpr(0)); // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d1, d2, d4) auto mapNonZero = readOp.permutation_map().selectNot(b.getAffineConstantExpr(0)); // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d3, d5) VectorType targetVectorType = Vector::get(.... applyPermutationMap(mapNonZero, originalVectorShape), ...); transferRead_clone_with_permutation_map(mapNonZero.compose(permutation_map()))_and_result_type(targetVectorType); SmallVector<int> broadcastShape = applyPermutationMap(mapZero, originalVectorShape); broadcastShape.append(targetVectorType.shape()); VectorType broadcastVectorShape = Vector::get(.... broadcastShape, ...); b.create<vector::BroadcastOp>(...); // Finally transpose the vector in a similar fashion. where select/selectNot are new AffineMap utilities that returns an affineMap that is a permutation can be composed with the original one to only keep the exprs you are interested in: AffineMap AffineMap::select(AffineExpr); // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4) AffineMap AffineMap::select(AffineExpr); // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4) Any better name for select/selectNot or better utility functions that recover similar similar information are most welcome. | |
2766 | I'd just bail on masks in a first impl. The inbounds attribute should be handled properly. | |
2769 | I believe the route I proposed would handle all cases ? |
Actually, I think the current TransferReadPermutationLowering and TransferOpReduceRank are good enough. I wish I knew about them before implementing this.
typo: than