# Details

# Diff Detail

- Repository
- rG LLVM Github Monorepo

### Event Timeline

mlir/lib/Dialect/Vector/VectorOps.cpp | ||
---|---|---|

2738–2739 | There may be higher dim in the tensor that are ignored. I think we should check the affine map here instead? %vread = vector.transfer_read %t[%c0, %c0, %c0, %c0], %cst { in_bounds = [true, true, true, true], permutation_map = affine_map<(d0, d1, d2, d3) -> (0, d3, d2, 0)> } : tensor<4x4x4x5xf32>, vector<8x5x4x2xf32> You basically want the affine map to be a minor identity. | |

2766 | you also need to handle potential Mask. | |

2769 | I don't understand how you can be sure you'll end up with an identity map? What if some dimensions were missing in the original map like in this example: in_bounds = [true, true, true, true], permutation_map = affine_map<(d0, d1, d2) -> (0, d2, d1, 0)> } : tensor<4x4x5xf32>, vector<8x5x4x2xf32> | |

mlir/test/Dialect/Vector/canonicalize.mlir | ||

1039 | Note that this case is already handled by this pattern: If your pattern is a superset then we should remove that one. |

mlir/lib/Dialect/Vector/VectorOps.cpp | ||
---|---|---|

2717 | typo: than | |

2725 | Extracting the broadcast information LGTM but I am not sure why you also extract the permutation in 1 shot here. I.e. I'd just drop the zeros from the permutation map to produce: // %vread = vector.transfer_read %t[%c0, %c0], %cst { // in_bounds = [true, true, true, true], // permutation_map = affine_map<(d0, d1) -> (d1, d0)> // } : tensor<4x5xf32>, vector<5x4xf32> // %1 = vector.broadcast %0 // : vector<5x4xf32> to vector<8x2x5x4xf32> // %2 = vector.transpose %1, [0, 2, 3, 1] // : vector<8x2x5x4xf32> to vector<8x5x4x2xf32> Then I haven't tested the interaction specifically myself but experience says the more orthogonal the patterns are the least interference they exhibit. This also makes me think that IIRC some f the transpose patterns also managed some small amount of broadcast; I'd drop the broadcast related part from the other pattterns and rely on composition + orthogonality. | |

2738 | I am not sure why this constraint? With the more I described above (i.e. this patterns just drops zeros and adds broadcast + transpose based on the zeros dropped), I think it would work more generally? | |

2749 | Not yet looking at impl. details here until the previous points I made have been addressed or refuted. Still I'd note that I'd expect to see something "more compositional and reusable" along the lines: auto mapZero = readOp.permutation_map().select(b.getAffineConstantExpr(0)); // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d1, d2, d4) auto mapNonZero = readOp.permutation_map().selectNot(b.getAffineConstantExpr(0)); // Given (i, j, k, l)[] -> (j + k, i, i, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d3, d5) VectorType targetVectorType = Vector::get(.... applyPermutationMap(mapNonZero, originalVectorShape), ...); transferRead_clone_with_permutation_map(mapNonZero.compose(permutation_map()))_and_result_type(targetVectorType); SmallVector<int> broadcastShape = applyPermutationMap(mapZero, originalVectorShape); broadcastShape.append(targetVectorType.shape()); VectorType broadcastVectorShape = Vector::get(.... broadcastShape, ...); b.create<vector::BroadcastOp>(...); // Finally transpose the vector in a similar fashion. where AffineMap AffineMap::select(AffineExpr); // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4) AffineMap AffineMap::select(AffineExpr); // Given (i, j, k, l)[] -> (i, i, j + k, l, i, i + j) and `i` returns (d0, d1, d2, d3, d4, d5)[] -> (d0, d1, d4) Any better name for | |

2766 | I'd just bail on masks in a first impl. The inbounds attribute should be handled properly. | |

2769 | I believe the route I proposed would handle all cases ? |

Actually, I think the current `TransferReadPermutationLowering` and `TransferOpReduceRank` are good enough. I wish I knew about them before implementing this.