According to the API contract, LinalgLoopDistributionOptions is expected to work only on parallel iterators: when getting processor information, only the loop ranges for parallel dimensions should be fed in. Right now, however, after generating scf.for loop nests we feed in *all* loops, including the ones materialized for reduction iterators. This can cause unexpected distribution of reduction dimensions. This commit fixes that.
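For reference, a minimal sketch of how a client might populate these options under that contract, assuming the callback-based procInfo interface where only parallel loop ranges are passed in. `getProcIdAndCount` is a hypothetical helper standing in for however processor ids are obtained (e.g. gpu.block_id / gpu.grid_dim); this is an illustration, not the exact upstream code.

```cpp
#include "mlir/Dialect/Linalg/Utils/Utils.h"
#include "mlir/IR/Builders.h"

using namespace mlir;

// Hypothetical helper: returns {procId, nbProcs} for a distributed dimension.
linalg::ProcInfo getProcIdAndCount(OpBuilder &b, Location loc, unsigned dim);

linalg::LinalgLoopDistributionOptions makeDistributionOptions() {
  linalg::LinalgLoopDistributionOptions options;
  options.procInfo = [](OpBuilder &b, Location loc,
                        ArrayRef<Range> parallelLoopRanges) {
    // With this fix, reduction loops never show up here, so every entry
    // produced below corresponds to a parallel dimension.
    SmallVector<linalg::ProcInfo, 2> procInfo;
    procInfo.reserve(parallelLoopRanges.size());
    for (unsigned dim = 0, e = parallelLoopRanges.size(); dim != e; ++dim)
      procInfo.push_back(getProcIdAndCount(b, loc, dim));
    return procInfo;
  };
  return options;
}
```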
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
I am not sure how this would interact with the case where some loops are not tiled. It would be good to try out examples from here where some loops are not generated as scf.parallel because the tile size is set to 0. In the same way, if the tile size is set to 0, then the loop won't be tiled and therefore won't be distributed.
Essentially this is trying to enforce the same contract of distribution that happens with scf.parallel. The proc_id values are used for every scf.parallel encountered in the generated tiled loop nest. So if the tile size was set to 0, the loop won't be lowered to an scf.parallel and therefore won't be distributed. That does not seem to be the same contract as above.
Actually, I take it back. I think this makes sense. Since the tile size is set to 0, the loop won't be generated, and then the non-parallel loops are filtered out.
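To illustrate the case discussed in this thread, here is a sketch assuming the usual LinalgTilingOptions builder methods: a tile size of 0 leaves that loop untiled, so no scf.parallel is materialized for it and it is never distributed; `makeDistributionOptions` refers to the hypothetical sketch above.

```cpp
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"

using namespace mlir;

linalg::LinalgTilingOptions makeTilingOptions() {
  return linalg::LinalgTilingOptions()
      // Loop 0 is left untiled (tile size 0), so it is not distributed.
      .setTileSizes({0, 8, 4})
      .setLoopType(linalg::LinalgTilingLoopType::ParallelLoops)
      .setDistributionOptions(makeDistributionOptions());
}
```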