This is an archive of the discontinued LLVM Phabricator instance.

[mlir:Async] Change async-parallel-for block size/count calculation
ClosedPublic

Authored by ezhulenev on Jun 29 2021, 8:10 AM.

Details

Summary

Depends On D105037

Avoid creating too many tasks when the number of workers is large.

Diff Detail

Event Timeline

ezhulenev created this revision.Jun 29 2021, 8:10 AM
ezhulenev requested review of this revision.Jun 29 2021, 8:10 AM
ezhulenev edited the summary of this revision. (Show Details)Jun 29 2021, 8:15 AM
ezhulenev added a reviewer: herhut.
herhut added inline comments.Jun 29 2021, 8:52 AM
mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp
658

Can kMaxOversharding be removed?

663

Does this mean that for large number of threads, we do not use all of them? That is because we assume we have too many threads for the available memory bandwidth?

668

Should this be std::max?

ezhulenev updated this revision to Diff 355268.Jun 29 2021, 9:22 AM

Fix min/max bug

ezhulenev marked 2 inline comments as done.Jun 29 2021, 9:22 AM
ezhulenev added inline comments.
mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp
663

Yes. This is an ugly hack because of an absence of a good cost model stolen from Eigen: https://gitlab.com/libeigen/eigen/-/blob/master/unsupported/Eigen/CXX11/src/Tensor/TensorContractionThreadPool.h#L241-246.

Assumption is that it's rarely beneficial to use all 64+ threads for problem that reads/writes from/to memory, and that you likely have competing tasks that can use idle threads.

I plan to implement something better than this for lowering from linalg.

herhut accepted this revision.Jun 29 2021, 9:31 AM

Thanks for the explanation!

This revision is now accepted and ready to land.Jun 29 2021, 9:31 AM
This revision was automatically updated to reflect the committed changes.
ezhulenev marked an inline comment as done.

After this change kMaxOvershading is unused, causing warnings in the clang build:

/usr/bin/clang++ -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=0 -DMLIR_ROCM_CONVERSIONS_ENABLED=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/mlir/lib/Dialect/Async/Transforms -I/mnt/vss/_work/2/s/mlir/lib/Dialect/Async/Transforms -Iinclude -I/mnt/vss/_work/2/s/llvm/include -I/mnt/vss/_work/2/s/mlir/include -Itools/mlir/include -fPIC -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -Werror=global-constructors -g  -fno-exceptions -fno-rtti -std=c++14 -MD -MT tools/mlir/lib/Dialect/Async/Transforms/CMakeFiles/obj.MLIRAsyncTransforms.dir/AsyncParallelFor.cpp.o -MF tools/mlir/lib/Dialect/Async/Transforms/CMakeFiles/obj.MLIRAsyncTransforms.dir/AsyncParallelFor.cpp.o.d -o tools/mlir/lib/Dialect/Async/Transforms/CMakeFiles/obj.MLIRAsyncTransforms.dir/AsyncParallelFor.cpp.o -c /mnt/vss/_work/2/s/mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp
/mnt/vss/_work/2/s/mlir/lib/Dialect/Async/Transforms/AsyncParallelFor.cpp:116:28: error: unused variable 'kMaxOversharding' [-Werror,-Wunused-const-variable]
  static constexpr int32_t kMaxOversharding = 4;

If the variable is no longer needed, can we remove it?