This is an archive of the discontinued LLVM Phabricator instance.

[libomptarget] [amdgpu] Foundation for OMPT target callback support
AbandonedPublic

Authored by dhruvachak on Nov 11 2021, 7:05 PM.

Details

Summary

OMPT callback support for target task, target data op, and target submit

Patch from John Mellor-Crummey <johnmc@rice.edu>
(With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>)

o Connect up libomp, libomptarget and AMDGPU device rtl
o Provide a callback class used to interact with a tool
o Update omp-tool.h.var from LLVM main to ensure it is fresh
o Update libomp sources for two purposes

  1. Adapt to new macros in omp-tools.h
  2. Add API used by libomptarget to extract task_data and target_task_data when needed to support OMPT EMI calls

o Integrate support for EMI and NONEMI target callbacks
o Target, data operation, and target submit callbacks added

Diff Detail

Event Timeline

dhruvachak created this revision.Nov 11 2021, 7:05 PM
dhruvachak requested review of this revision.Nov 11 2021, 7:05 PM
Herald added a project: Restricted Project. · View Herald Transcript

I mainly checked the libomp code and put some inline comments.

openmp/libomptarget/CMakeLists.txt
72–95

This code does not reproduce the libomp control flow for in-llvm-tree and standalone build.

To use cmake variables from openmp/runtime, just make them visible in parent scope:

set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)

see https://github.com/llvm/llvm-project/commit/3bc8ce5dd718beef0031bf4b070ac4026e6910d7#diff-01127988bc2b1599723b049e94ada3025063f4e0a7bac326aaf37b892a118fd1R332

openmp/runtime/src/ompt-general.cpp
504–509

This change seems unrelated and it is not obvious whether this change is correct.

protze.joachim added inline comments.Nov 12 2021, 7:24 AM
openmp/libomptarget/include/ompt_device_callbacks.h
64

Why are the functions virtual?

openmp/runtime/src/ompt-internal.h
16–37

These changes seem unrelated.

121–137

These are unrelated changes.

130

From __builtin_frame_address reference:

Calling this function with a nonzero argument can have unpredictable effects, including crashing the calling program. As a result, calls that are considered unsafe are diagnosed when the -Wframe-address option is in effect. Such calls should only be made in debugging situations.

How should we deal with this restriction?

openmp/runtime/src/ompt-specific.cpp
359

@AndreyChurbanov what do you think about the use of thread_local in libomp?

If a thread-local value is sufficient (which I doubt from OMPT perspective), this value could be stored in a new entry of ompt_thread_info_t:
__kmp_threads[__kmp_get_gtid()]->th.ompt_thread_info.target_data

openmp/runtime/src/ompt-specific.h
87–113

These are unrelated changes.

dhruvachak marked 8 inline comments as done.Nov 19 2021, 5:47 PM

Thanks, Joachim, for your comments. I removed the unrelated changes and made the cmake change in the next version.

openmp/libomptarget/CMakeLists.txt
72–95

Added the parent scope.

openmp/libomptarget/include/ompt_device_callbacks.h
64

I think the intent was to have plugin specific derived classes but that's not the case at the moment. So removed virtual.

openmp/runtime/src/ompt-internal.h
16–37

OMPT_FRAME_SET is called by ompt_set_frame_enter_internal which is part of target region support. ompt_set_frame_enter_internal is called by ompt_set_frame_enter which is used by ompt_callback.

130

Since it's unrelated, I removed it.

openmp/runtime/src/ompt-specific.cpp
359

Made the suggested change.

dhruvachak marked 5 inline comments as done.Nov 19 2021, 5:49 PM

Addressed feedback: made the suggested cmake change and removed unrelated changes

protze.joachim added inline comments.Nov 22 2021, 1:58 AM
openmp/runtime/src/ompt-general.cpp
512–520

I think, we need to reorder the conditions to make sure that target finalize is called even if the tool does not register a finalize callback. We should check the callback pointer in any case, even without OMPD support.

dhruvachak added inline comments.Nov 22 2021, 2:29 PM
openmp/runtime/src/ompt-general.cpp
512–520

Made the change in the next version. I think we should check for NULL-ness unconditionally, so removed the OMPD_SUPPORT ifdef.

Addressed feedback: libomptarget finalize should not depend on OMPD

Skimmed through this patch. Lots of code that looks like it was copy + paste + modify which is usually a bug farm. Quite a lot of global variables. No associated tests.

Lack of tests seems the most significant problem here.

openmp/libomptarget/include/ompt-connector.h
54

I don't understand this note

78

error handling missing here

openmp/libomptarget/include/omptarget.h
356–357

I thought these functions were ABI stable, i.e. we can't change them

openmp/libomptarget/plugins/amdgpu/src/ompt_callback.cpp
119

initialization order hazard here. When is this supposed to be constructed relative to the other globals?

openmp/libomptarget/src/ompt_callback.cpp
43

I can't remember if private inheritance of std containers is well defined or not. This class doesn't seem to use any of the members of std::list. What's going on here?

92

Why are these all global variables?

243

Can we factor this so copy&paste typos are less likely?

408

This sort of thing makes the library awkward to construct safely. I guess 102 is related to the other function tagged constructor, but we're better off exposing a single library_init function which is called by libomptarget at the appropriate point in time

openmp/runtime/src/ompt-specific.cpp
269

?

JonChesterfield added inline comments.Dec 7 2021, 1:14 PM
openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
43

This is an unguarded if statement, will interact badly with other if statements next to it in code. Macros like this are traditionally wrapped in do { } while (0) to avoid that syntax hazard

jmellorcrummey added inline comments.Dec 9 2021, 12:03 AM
openmp/libomptarget/include/ompt-connector.h
54

This class is used to communicate between an OMPT implementation in libomptarget and libomp. It is also used to communicate between an OMPT implementation in a device-specific plugin. The decision whether OMPT is enabled or not needs to be made when the library is loaded before any functions in the library are invoked. For that reason, this class is intended to be declared in the constructor for libomptarget or a plugin so that the decision about whether OMPT is supposed to be enabled is known before any interface function in the library is invoked.

78

If the dlsym doesn't return a non-zero value, then the OMPT interface is not enabled. The check for a zero value is in the connect method. If library_ompt_connect is NULL because dlsym returned NULL, then the code in the client library (the one calling connect) won't try to call this function to connect with its server library.

openmp/libomptarget/plugins/amdgpu/src/ompt_callback.cpp
119

The only requirement here is that this initialization must occur in a constructor so that libomp can use the pointer set up in lines 123 to initialize OMPT support when the call on line 130 is made so that OMPT support is either initialized or not before any entry point to libomptarget is invoked.

openmp/libomptarget/src/ompt_callback.cpp
92

These are global (but thread local) variables because they hold persistent state used by a thread operating in a target region.
When you enter a target region these variables are initialized. The operation id is adjusted every call to a target region operation. ompt_task_data and ompt_target_task_data are set when the region is entered and persist throughout a target region being performed by a thread.

Note that these variables are outside the ompt_interface class because they are manipulated by functions passed to methods such as ompt_device_callbacks.ompt_callback_target_data_op_emi.

243

The commonality is between begin/end pairs. What would you propose? Factoring out an EMI call into a function F that can be called with a few less arguments so that global variables like ompt_target_task_data, ompt_target_data don't need to be passed to be passed to F? I don't think this adds much simplification.

408

While this could be a global function called at the end of the other constructor in rtl.cpp, this would be the only reason any OMPT header has to be included in rtl.cpp.

The other option is to declare all of the priorities passed to constructors in one place and make the number for this constructor the successor to the other constructor, e.g.

#define RTL_CONSTRUCTOR_PRIORITY 101
#define OMPT_CONSTRUCTOR_PRIORITY (RTL_CONSTRUCTOR_PRIORITY + 1)

Using these defines means that no OMPT header file needs to be included in rtl.cpp. Or perhaps you'd prefer a constructor in its own file that calls the "constructor" functions in rtl.cpp and ompt_callback.cpp and the constructor can include headers for rtl.cpp and ompt_callback.cpp that define the functions that need to be called by the single constructor.

openmp/runtime/src/ompt-specific.cpp
269

I'm not sure what the question is.

The enter_frame_flags and exit_frame_flags used by the runtime entry points for inspecting frame objects provide information about how to locate the frame pointer of a procedure frame that called into or out of the runtime.

When setting up a lightweight task, these flags are cleared.

They will be set later to a logical or

either ompt_frame_runtime or ompt_frame_application

ored together with either ompt_frame_framepointer or ompt_frame_cfa

dhruvachak marked 4 inline comments as done.Dec 9 2021, 7:26 PM
dhruvachak added inline comments.
openmp/libomptarget/include/ompt-connector.h
78

I deleted the default constructor in the next version so as to prevent its accidental use in new code. This will ensure proper initialization, no other error handling required here.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
43

Fixed in the next version.

openmp/libomptarget/src/ompt_callback.cpp
43

Removed in the next version.

dhruvachak updated this revision to Diff 393368.Dec 9 2021, 7:30 PM
dhruvachak marked 2 inline comments as done.

Addressed feedback

Updated implementation status of target callbacks
Removed unused static var and unused function
Added do-while to macros
Removed unused inheritance from std::list
Deleted the default constructor to ensure proper initialization

I've added a bunch of reviewers from git shortlog -sn openmp to get more eyes on this as it's hard to guess who will be most interested.

I think a reasonable path forward here is to check the ompt hooks don't upset the existing machinery and then iterate in tree. @dhruvachak is committing this as-is to the amd internal rocm branch to get some data on that.

Added new tests for OMPT target callbacks for amdgpu

Guard ompt_init constructors with ifdef OMPT_SUPPORT, added comments to tests

I commented on the most severe issues I could find. Since I don't know enough about the OMPT inner workings it is mostly style and general code comments.
That said, the lack of (doxygen) documentation is not a great sign. There is not much being explained in the code either. I fear we are piling up more and more code less and less people understand without reasonable means for new people to get into it. For now, I'm fine with this but @dhruvachak you should consider spending some time soon documenting the methods and non-trivial code pieces.

openmp/libomptarget/include/ompt-connector.h
38

This is not something we should expose in global namespace especially as it is not used here and has a rather generic name.

56

Nit: Try to make comments proper sentences incl. punctuation and such.

93

I would have used the LLVM coding convention in these new parts, it's a missed opportunity IMHO.

Early exists should be preferred over indention, e.g., if (is_initialized) return;

We should at least remove ; after methods (here and elsewhere).

Clang format all the code.

openmp/libomptarget/include/ompt_device_callbacks.h
3

Nit: Style

openmp/libomptarget/include/omptarget.h
356–357

What Jon said. Also, this method is effectively deprecated anyway, the one below is used even though we should not explicitly push the trip count in the first place.

openmp/libomptarget/plugins/amdgpu/src/ompt_callback.cpp
128

These stray ; appear a few times. Will trigger warnings and should be removed.

openmp/libomptarget/plugins/amdgpu/src/rtl.cpp
1855

Clang format these things, consider OMPT_IF_ENABLED({ ... }).
Also all the others that span more than one line/statement.

openmp/libomptarget/src/device.cpp
502

These functions use LLVM coding style, please don't mix it with the C style other places use.

The pattern here is repetitive and I am unsure if it would not make it much easier to use a RAII object instead.

Comments apply to the rest of the file too.

openmp/libomptarget/src/ompt_callback.cpp
59

assertions should have messages, use nullptr for a pointer not 0.

121

what are those 0s? Use /* enter_frame */ nullptr, ...

381

Please no assert(0). In release builds this will silently do something while it arguably should not.
print the error, then explicitly abort, exit, trap, ...

423

technically this is not a reserved name. I doubt it will clash with the user names but it could for all we know. Any chance we can make it a reserved name instead? ompx_ omp_ (ompt_?) __libomptarget ...

openmp/libomptarget/src/ompt_callback.h
106

Yet another naming scheme?

openmp/libomptarget/src/omptarget.cpp
33

This seems something worth refactoring, third time I see it.

openmp/libomptarget/test/ompt/callbacks.h
72

Please no assert(0). In release builds this will silently do something while it arguably should not.
print the error, then explicitly abort, exit, trap, ...

126

Please no assert(0). In release builds this will silently do something while it arguably should not.
print the error, then explicitly abort, exit, trap, ...

openmp/runtime/src/ompt-internal.h
98

Is this something we need to keep stable?

openmp/runtime/src/ompt-specific.cpp
271

Here and above, stray ; again. Will cause warnings (=problems) eventually.

Addressed feedback.

Herald added a project: Restricted Project. · View Herald TranscriptApr 8 2022, 11:20 AM
dhruvachak marked 20 inline comments as done.Apr 8 2022, 11:36 AM

I addressed most of the feedback and updated this review. I will work on splitting up this patch into multiple pieces so that they can be reviewed more easily.

openmp/libomptarget/include/ompt_device_callbacks.h
3

clang-format makes this change and I left it alone.

jdoerfert added inline comments.Apr 8 2022, 11:51 AM
openmp/libomptarget/include/ompt_device_callbacks.h
3

Your line is too long. "Target independent OpenMP target RTL" can be shortened, maybe even replaced with something that explains what this file does. I have no idea what "Target independent OpenMP target RTL" means wrt. OMPT callbacks. My guess is this was copied from somewhere else.

openmp/libomptarget/src/device.cpp
500

For these RAIIs, you could put OMPT_IF_BUILT into the RAII code and make them a no-op if OMPT is not used. That way you don't need any macro here, the RAII would always be there and removed when build w/o OMPT.

536

No need for the extra indention.

562

same as above.

openmp/libomptarget/src/interface.cpp
130

same as above, I'll stop pointing it out now.

dhruvachak marked an inline comment as done.Apr 8 2022, 12:16 PM
dhruvachak added inline comments.
openmp/libomptarget/src/interface.cpp
130

Yes, I thought about the additional scope. But I was worried about future new code possibly messing up where exactly I want the RAII object dtor called.

dhruvachak abandoned this revision.Jun 20 2022, 4:46 PM

Abandoning this revision in favor of https://reviews.llvm.org/D127372 which is the top of a stack of reviews. Comments from this revision have been incorporated in that stack.