scf.parallel is currently not a good fit for tiling on tensors.
Instead, provide a path to parallelism directly through scf.for.
For now, this transformation ignores the distribution scheme and always uses a block-cyclic mapping (where the block is the tile size).
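As a rough illustration of what a block-cyclic mapping with block size equal to the tile size means, here is a minimal Python sketch. It is not the code in this patch: the function name `block_cyclic_tiles` and its parameters are hypothetical, and it assumes a 1-D loop where processor `proc_id` starts at `lb + proc_id * tile_size` and advances by `num_procs * tile_size`.

```python
def block_cyclic_tiles(lb, ub, tile_size, proc_id, num_procs):
    """Return the [start, end) tiles that proc_id would own.

    Hypothetical helper for illustration only: each processor starts at
    its own tile offset and strides over the tiles owned by the others.
    """
    tiles = []
    start = lb + proc_id * tile_size      # first tile owned by this processor
    stride = num_procs * tile_size        # skip the tiles of all other processors
    while start < ub:
        # Clamp the last tile so it does not run past the upper bound.
        tiles.append((start, min(start + tile_size, ub)))
        start += stride
    return tiles


if __name__ == "__main__":
    # Iteration space [0, 10), tile size 2, two processors.
    for p in range(2):
        print(p, block_cyclic_tiles(0, 10, 2, p, 2))
```

With two processors, tiles alternate between them, so each processor touches every other tile of the iteration space.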
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The change is fairly straightforward, but I'm not sure what the issue with scf.parallel is. Is it the semantics of the op or the implementation of the distribution logic? If it is the latter, then maybe I can take a look there. The change looks fine as is, though.