This extends `transform.structured.tile_reduction_using_forall` to operations with multiple reduction dimensions as implied by the thread counts. This enables reduction splitting strategies for operations with higher dimensionality.
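
To make "as implied by the thread counts" concrete, here is a minimal standalone sketch of one way the selection could work. It is not the MLIR implementation: `IteratorKind` and `selectTiledReductionDims` are hypothetical names, and the sketch assumes that a thread count of 0 means "do not tile this dimension".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a loop iterator type; not the MLIR enum.
enum class IteratorKind { Parallel, Reduction };

// Returns the indices of reduction dimensions that have a non-zero thread
// count. Assumption: a thread count of 0 means the dimension is not tiled.
std::vector<unsigned>
selectTiledReductionDims(const std::vector<IteratorKind> &iterators,
                         const std::vector<int64_t> &numThreads) {
  std::vector<unsigned> dims;
  for (unsigned i = 0; i < iterators.size(); ++i) {
    // Dimensions past the end of numThreads are treated as untiled.
    int64_t threads = i < numThreads.size() ? numThreads[i] : 0;
    if (iterators[i] == IteratorKind::Reduction && threads != 0)
      dims.push_back(i);
  }
  return dims;
}
```

Under these assumptions, iterators (parallel, reduction, reduction) with thread counts (4, 0, 8) select only dimension 2 as a tiled reduction, while dimension 0 is left to be tiled as a parallel dimension.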

731–736 | Let me know if the explanation here makes sense. |

652 | Can we write this with `LinalgOp::getReductionDims` and a follow-up filter? |

652 | It's both getting the reduction dims and also identifying which thread counts in the |

Thanks for generalizing this transform!

725 | Can we extract this into a meaningfully named helper function? |

407 | it is weird to me that you only need to specify 2 entries in num_threads here, I would have expected you'd need It would be good to also have a |

407 | This is intentional, as I'm trying to tile parallel dimensions as well as reductions here. As far as I could tell, this was never explicitly prohibited by the pattern and I find it convenient to be able to tile both at the same time (and otherwise avoid nested foralls which interact poorly with distribution later on). The interleaved parallel case is a good idea though, will add a test for it. In terms of forcing rank to align, unlike the scf.for version of this pattern, additional tile sizes require corresponding entries in the mapping which restricts the mapping options for distribution. For example, now I need to distribute explicitly along |

407 | Re tiling parallel and reduction at once, this is a great idea indeed, thanks for pushing on this. |