Depends On D115263
By aligning the block size to a multiple of the inner loop iterations, the inner loops in parallel_compute_fn can later be unrolled and vectorized by LLVM when they have a small trip count. This gives up to a 2x speedup in multiple benchmarks.
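As an illustration only, the following C++ sketch shows the arithmetic behind this alignment for a flattened two-level loop nest of outer x innerIters iterations. It is not the code from AsyncParallelFor.cpp. The helper names (ceilDiv, alignedBlockSize, computeBlock) and the parameters maxBlocks and minTaskSize are hypothetical. The sketch also assumes that the inner loop trip count is known and that the total trip count is outer * innerIters. Rounding the block size up to a multiple of innerIters means every block starts and ends on an inner-row boundary. As a result, the inner loop always runs over the full range [0, innerIters) with the same bound in every block, which is the shape of loop the summary says LLVM can unroll and vectorize.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: ceiling division for positive integers.
static int64_t ceilDiv(int64_t a, int64_t b) { return (a + b - 1) / b; }

// Hypothetical sketch (not the MLIR pass): choose a block size for a
// flattened loop nest [0, outer) x [0, innerIters), assuming
// tripCount == outer * innerIters, so that every block covers whole inner rows.
static int64_t alignedBlockSize(int64_t tripCount, int64_t innerIters,
                                int64_t maxBlocks, int64_t minTaskSize) {
  // Unaligned block size: spread the iterations over at most maxBlocks
  // blocks, but keep at least minTaskSize and at most tripCount iterations.
  int64_t blockSize = ceilDiv(tripCount, maxBlocks);
  blockSize = std::max(blockSize, minTaskSize);
  blockSize = std::min(blockSize, tripCount);
  // Round up to a multiple of the inner trip count. Because tripCount is
  // itself a multiple of innerIters, the result never exceeds tripCount,
  // and a larger block size never produces more blocks than before.
  return ceilDiv(blockSize, innerIters) * innerIters;
}

// Hypothetical per-block body: with an aligned block size, begin and end are
// multiples of innerIters, so the inner loop runs over [0, innerIters) in
// every block. Incrementing out[i] stands in for the real loop body.
static void computeBlock(int64_t blockIndex, int64_t blockSize,
                         int64_t tripCount, int64_t innerIters,
                         std::vector<int64_t>& out) {
  int64_t begin = blockIndex * blockSize;
  int64_t end = std::min(begin + blockSize, tripCount);
  // Invariant guaranteed by alignedBlockSize: blocks never split a row.
  assert(begin % innerIters == 0 && end % innerIters == 0);
  for (int64_t row = begin / innerIters; row < end / innerIters; ++row)
    for (int64_t col = 0; col < innerIters; ++col)
      out[row * innerIters + col] += 1;
}
```

For example, consider a 10 x 4 loop nest (trip count 40) split across at most 6 blocks with a minimal task size of 1. The unaligned block size would be ceil(40 / 6) = 7, so most blocks would start in the middle of an inner row. Aligning it gives 8, which produces 5 blocks of exactly two inner rows each.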
Differential D115436
[mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations
Closed, Public. Authored by ezhulenev on Dec 9 2021, 2:52 AM.
Event Timeline
Dec 9 2021, 2:52 AM: Herald added subscribers: sdasgup3, wenzhicui, wrengr and 20 others.
Dec 9 2021, 6:44 AM: This revision is now accepted and ready to land.
Dec 9 2021, 6:51 AM: This revision was landed with ongoing or failed builds.
Closed by commit rG49ce40e9ab25: [mlir] AsyncParallelFor: align block size to be a multiple of inner loops… (authored by ezhulenev). This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 393144
mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp
mlir/test/Dialect/Async/async-parallel-for-compute-fn.mlir
mlir/test/Integration/Dialect/Async/CPU/test-async-parallel-for-2d.mlir