This patch adds a benchmarking mode that measures kernel execution across varying numbers of threads per block and blocks per grid. The mode is enabled when the environment variable BENCHMARK_KERNEL is set.
In benchmarking mode, the patch also logs the benchmark results (kernel info, runtime parameters, and execution time) to the CSV file specified by the environment variable BENCHMARK_KERNEL_LOG.
This patch requires:
- https://reviews.llvm.org/D108528 for measuring elapsed time via CUDA events,
- https://reviews.llvm.org/D103770 for passing kernel features from the analysis stage to the runtime.
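
For illustration only, the benchmarking mode described above could be sketched roughly as follows. This is not the code from the patch: `BenchmarkResult`, `LaunchFn`, `benchmarkEnabled`, `sweep`, and `logResults` are hypothetical names, the CSV column order is an assumption, and `std::chrono` stands in for the CUDA event timing that D108528 provides. Only the two environment variable names come from the description.

```cpp
#include <chrono>
#include <cstdlib>
#include <fstream>
#include <functional>
#include <string>
#include <vector>

// Hypothetical record for one benchmark run; the field set is an assumption.
struct BenchmarkResult {
  std::string KernelName;
  int ThreadsPerBlock;
  int BlocksPerGrid;
  double Milliseconds;
};

// Benchmarking mode is on whenever BENCHMARK_KERNEL is set (to any value).
inline bool benchmarkEnabled() {
  return std::getenv("BENCHMARK_KERNEL") != nullptr;
}

// Hypothetical launch callback: receives (threadsPerBlock, blocksPerGrid) and
// is expected to run the kernel to completion before returning.
using LaunchFn = std::function<void(int, int)>;

// Run the kernel once per (threads, blocks) combination and time each run.
// Host-side wall-clock timing is used here as a stand-in for CUDA events.
std::vector<BenchmarkResult> sweep(const std::string &KernelName,
                                   const std::vector<int> &ThreadCounts,
                                   const std::vector<int> &BlockCounts,
                                   const LaunchFn &Launch) {
  std::vector<BenchmarkResult> Results;
  for (int Threads : ThreadCounts) {
    for (int Blocks : BlockCounts) {
      auto Start = std::chrono::steady_clock::now();
      Launch(Threads, Blocks);
      auto End = std::chrono::steady_clock::now();
      std::chrono::duration<double, std::milli> Elapsed = End - Start;
      Results.push_back({KernelName, Threads, Blocks, Elapsed.count()});
    }
  }
  return Results;
}

// Append one CSV row per result to the file named by BENCHMARK_KERNEL_LOG.
// Returns false if the variable is unset or the file cannot be written.
bool logResults(const std::vector<BenchmarkResult> &Results) {
  const char *Path = std::getenv("BENCHMARK_KERNEL_LOG");
  if (!Path)
    return false;
  std::ofstream Out(Path, std::ios::app);
  if (!Out)
    return false;
  for (const BenchmarkResult &R : Results)
    Out << R.KernelName << ',' << R.ThreadsPerBlock << ',' << R.BlocksPerGrid
        << ',' << R.Milliseconds << '\n';
  return static_cast<bool>(Out);
}
```

A caller would check `benchmarkEnabled()`, pass candidate thread and block counts to `sweep`, and hand the returned vector to `logResults`. The candidate values are left to the caller because the description does not state which ranges the patch explores.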