Authored by ezhulenev on Jun 24 2021, 5:27 AM.


[mlir:Async] Implement recursive async work splitting for scf.parallel operation (async-parallel-for pass)

Depends On D104780

Recursive work splitting, instead of sequential submission of async tasks, gives a ~20%-30% speedup in microbenchmarks.

Algorithm outline:

  1. Collapse scf.parallel dimensions into a single dimension
  2. Compute the block size for the parallel operations from the 1-D problem size
  3. Launch parallel tasks
  4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space
  5. Each parallel task computes the original parallel operation body using an scf.for loop nest
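
The outline above can be modeled roughly in plain Python. This is only an illustrative sketch, not the code the pass generates: the pass emits Async dialect operations, whereas this model uses a thread pool. The names `parallel_for`, `delinearize`, `target_blocks` and `max_workers` are hypothetical, and the block size heuristic (divide the 1-D size by a target block count) is an assumption made for the example. For brevity, the sketch delinearizes every flat index instead of reconstructing per-dimension bounds and running a loop nest.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def delinearize(index, dims):
    """Map a flat (row-major) index back to multi-dimensional coordinates."""
    coords = []
    for size in reversed(dims):
        coords.append(index % size)
        index //= size
    return tuple(reversed(coords))


def parallel_for(dims, body, target_blocks=8, max_workers=4):
    """Hypothetical model of the outline: collapse, block, split, execute."""
    # Step 1: collapse all dimensions into a single 1-D problem size.
    total = math.prod(dims)
    if total == 0:
        return

    # Step 2: derive a block size from the 1-D size (assumed heuristic).
    block_size = (total + target_blocks - 1) // target_blocks
    num_blocks = (total + block_size - 1) // block_size

    def run_block(block):
        # Steps 4-5: recover the original coordinates for this block and
        # run the body on each of them.
        first = block * block_size
        last = min(first + block_size, total)
        for flat in range(first, last):
            body(*delinearize(flat, dims))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:

        def dispatch(start, end):
            # Step 3: recursive splitting. Hand the upper half of the block
            # range to a new task, keep the lower half, and repeat until a
            # single block remains; then execute that block in this task.
            children = []
            while end - start > 1:
                mid = (start + end) // 2
                children.append(pool.submit(dispatch, mid, end))
                end = mid
            run_block(start)
            return children

        # Only the caller waits; tasks never block on each other, so the
        # pool cannot deadlock. result() also re-raises any body exception.
        pending = [pool.submit(dispatch, 0, num_blocks)]
        while pending:
            pending.extend(pending.pop().result())
```

For example, `parallel_for((3, 4, 5), body)` would call `body(i, j, k)` exactly once for each point of the 3x4x5 iteration space, in no particular order. In this model the work of launching tasks is itself spread across the tasks, rather than a single caller submitting every block in sequence.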

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D104850