This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Benchmarking the execution of kernel with different runtime parameters
Draft, Public

Authored by ksidorov on Sep 16 2021, 7:01 AM.
This is a draft revision that has not yet been submitted for review.

Details

Reviewers
jdoerfert
Summary

This patch implements the benchmarking of kernel execution with varying values for the number of threads per block and blocks per grid. This functionality is activated if the environment variable BENCHMARK_KERNEL is set.
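As a rough illustration of the sweep described above, the following is a minimal sketch and not the code from this patch. The names `LaunchResult`, `isBenchmarkingEnabled`, and `sweepLaunchConfigs` are hypothetical, the candidate values are supplied by the caller, and the `Launch` callback is assumed to block until the kernel has finished so that wall-clock timing is meaningful.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical record of one timed launch; not the patch's data structure.
struct LaunchResult {
  int ThreadsPerBlock;
  int BlocksPerGrid;
  double Seconds;
};

// Benchmarking mode is on whenever BENCHMARK_KERNEL is set (to any value).
inline bool isBenchmarkingEnabled() {
  return std::getenv("BENCHMARK_KERNEL") != nullptr;
}

// Times Launch(threads, blocks) once for every combination of candidates.
// Assumption: Launch returns only after the kernel has completed.
inline std::vector<LaunchResult>
sweepLaunchConfigs(const std::vector<int> &ThreadCandidates,
                   const std::vector<int> &BlockCandidates,
                   const std::function<void(int, int)> &Launch) {
  std::vector<LaunchResult> Results;
  for (int Threads : ThreadCandidates) {
    for (int Blocks : BlockCandidates) {
      assert(Threads > 0 && Blocks > 0 && "launch dimensions must be positive");
      auto Start = std::chrono::steady_clock::now();
      Launch(Threads, Blocks);
      auto End = std::chrono::steady_clock::now();
      Results.push_back(
          {Threads, Blocks, std::chrono::duration<double>(End - Start).count()});
    }
  }
  return Results;
}
```

A caller would check `isBenchmarkingEnabled()` first and fall back to a single launch with the default parameters otherwise.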

In the benchmarking mode, the patch also implements the logging of the benchmark results (the kernel info, runtime parameters, and execution time) into the CSV file specified by the environment variable BENCHMARK_KERNEL_LOG.
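The logging step could look roughly like the sketch below. The column layout (kernel name, threads per block, blocks per grid, time in milliseconds) and the names `BenchmarkRecord`, `writeCsvRow`, and `appendToBenchmarkLog` are assumptions made for illustration; the patch's actual CSV format is not shown on this page.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical row layout; the real patch may log different fields.
struct BenchmarkRecord {
  std::string KernelName;
  int ThreadsPerBlock;
  int BlocksPerGrid;
  double Milliseconds;
};

// Writes one comma-separated row. No quoting is performed, so the kernel
// name is assumed not to contain commas.
inline void writeCsvRow(std::ostream &OS, const BenchmarkRecord &R) {
  assert(R.KernelName.find(',') == std::string::npos &&
         "names containing commas would need quoting");
  OS << R.KernelName << ',' << R.ThreadsPerBlock << ',' << R.BlocksPerGrid
     << ',' << R.Milliseconds << '\n';
}

// Appends a row to the file named by BENCHMARK_KERNEL_LOG.
// Returns false if the variable is unset or the file cannot be written.
inline bool appendToBenchmarkLog(const BenchmarkRecord &R) {
  const char *Path = std::getenv("BENCHMARK_KERNEL_LOG");
  if (!Path)
    return false;
  std::ofstream Log(Path, std::ios::app);
  if (!Log)
    return false;
  writeCsvRow(Log, R);
  return static_cast<bool>(Log);
}
```

Opening the file in append mode keeps the rows from earlier kernels and earlier runs, which is one possible design; truncating the file at startup would be an equally plausible choice.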

This patch requires:

Event Timeline

ksidorov created this revision. Sep 16 2021, 7:01 AM
jdoerfert retitled this revision from Benchmarking the execution of kernel with different runtime parameters to [OpenMP] Benchmarking the execution of kernel with different runtime parameters. Sep 16 2021, 7:02 AM
jdoerfert added a reviewer: jdoerfert.
ksidorov edited the summary of this revision. Sep 16 2021, 7:26 AM
jdoerfert added inline comments.
openmp/libomptarget/plugins/cuda/src/rtl.cpp
109

We eventually need a common header or TableGen file that defines the size and order of these. LLVM and consumers need to agree.

870

We can actually test this, so we need a runtime test that looks for these in the plugin output. Check other tests for runtime debug output; @jhuber6 can help if you need advice.

That said, this should probably be moved out of the CUDA plugin and into libomptarget: at least the functionality, maybe the entire logic. I'm unsure whether the plugins need to be aware of the features and of the logic to rerun a kernel. If they don't, everything should go into libomptarget; if they do, we move the extraction and scheduling logic into libomptarget and call it from here.

1256

This should go into a separate commit/review, I think, and it should go into a helper function.
I feel the launch code (which is what is in the else branch) should be a helper function too.
That way you can invoke it directly, or invoke the wrapper (the code above), which will invoke the launch code internally.
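
A minimal sketch of the structure suggested here, under stated assumptions: `LaunchConfig`, `LaunchFn`, `launchKernel`, `benchmarkKernel`, and `runTargetRegion` are hypothetical names, and the callback stands in for the real CUDA launch code in rtl.cpp, which is not reproduced.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical launch parameters.
struct LaunchConfig {
  int ThreadsPerBlock;
  int BlocksPerGrid;
};

// Stand-in for the real launch code; returns 0 on success.
using LaunchFn = std::function<int(const LaunchConfig &)>;

// Helper for the plain launch path (what currently sits in the else branch).
inline int launchKernel(const LaunchFn &Launch, const LaunchConfig &Config) {
  assert(Launch && "launch callback must be set");
  return Launch(Config);
}

// Benchmark wrapper: reruns the kernel through the same helper for each
// candidate configuration and stops at the first failure.
inline int benchmarkKernel(const LaunchFn &Launch,
                           const std::vector<LaunchConfig> &Candidates) {
  for (const LaunchConfig &C : Candidates)
    if (int RC = launchKernel(Launch, C))
      return RC;
  return 0;
}

// The call site only chooses between the wrapper and the helper.
inline int runTargetRegion(const LaunchFn &Launch, const LaunchConfig &Default,
                           bool Benchmarking,
                           const std::vector<LaunchConfig> &Candidates) {
  if (Benchmarking)
    return benchmarkKernel(Launch, Candidates);
  return launchKernel(Launch, Default);
}
```

With this split, the benchmarking wrapper has no launch logic of its own, so it could later move into libomptarget while the helper stays in the plugin.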