This is an archive of the discontinued LLVM Phabricator instance.

[mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations
ClosedPublic

Authored by ezhulenev on Dec 9 2021, 2:52 AM.

Details

Summary

Depends On D115263

By aligning the block size to a multiple of the inner loop iterations in parallel_compute_fn, LLVM can later unroll and vectorize some of the inner loops that have small trip counts. Up to 2x speedup in multiple benchmarks.
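
The alignment described in the summary can be illustrated with a minimal standalone sketch. This is not the code from this revision: the helper names `innerIterations` and `alignBlockSize` are hypothetical, and the sketch assumes that "aligning" means rounding the block size up to the nearest multiple of the product of the statically known inner trip counts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the patch): product of the trip counts of
// the innermost loops, assumed here to be statically known and positive.
int64_t innerIterations(const std::vector<int64_t> &innerTripCounts) {
  int64_t product = 1;
  for (int64_t tripCount : innerTripCounts) {
    assert(tripCount > 0 && "trip counts must be positive");
    product *= tripCount;
  }
  return product;
}

// Hypothetical helper (not from the patch): round blockSize up to the
// nearest multiple of innerIters, so that a block always covers whole
// passes over the inner loops.
int64_t alignBlockSize(int64_t blockSize, int64_t innerIters) {
  assert(blockSize > 0 && innerIters > 0 && "sizes must be positive");
  return ((blockSize + innerIters - 1) / innerIters) * innerIters;
}
```

For example, with inner trip counts of 4 and 8, `innerIterations({4, 8})` is 32 and `alignBlockSize(100, 32)` is 128, so each block spans exactly four full passes over the inner loops. Under this assumption, the inner loops in each block keep constant bounds, which is what allows later unrolling and vectorization.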

Event Timeline

ezhulenev created this revision. Dec 9 2021, 2:52 AM
ezhulenev requested review of this revision. Dec 9 2021, 2:52 AM
ezhulenev edited the summary of this revision. Dec 9 2021, 2:53 AM
ezhulenev added a reviewer: bkramer.
ezhulenev updated this revision to Diff 393095. Dec 9 2021, 3:21 AM

Reset the number of unrollable loops if we decide not to unroll

This revision is now accepted and ready to land. Dec 9 2021, 6:44 AM
This revision was landed with ongoing or failed builds. Dec 9 2021, 6:51 AM
This revision was automatically updated to reflect the committed changes.