This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Avoid internal calls to external compiler interface (kmpc)
Needs ReviewPublic

Authored by protze.joachim on Nov 26 2020, 3:33 PM.

Details

Summary

Currently, OMPT works around the issue of internal calls to the kmpc interface, because we need to collect frame and return address on entry to the runtime.
This patch adds __kmp_aux_ variants for all internally called __kmpc_ functions and replaces all internal calls, so that only compiler generated code will call the external kmpc interface.

The KMP_ASSERT in the return address guard now successfully shows that no return address is temporarily stored when we try to store a new address on entering the runtime. I'll replace the assert by KMP_DEBUG_ASSERT.
In my adhoc tests with epcc benchmarks I didn't see performance regressions from this patch, but I recommend to run your own regression tests.

This refactoring will allow more consistent handling of OMPT frame and state information in the next step.

Diff Detail

Event Timeline

protze.joachim created this revision.Nov 26 2020, 3:33 PM
protze.joachim requested review of this revision.Nov 26 2020, 3:33 PM
protze.joachim added inline comments.Nov 26 2020, 3:46 PM
openmp/runtime/src/kmp.h
3765–3791

I was unsure, where these definitions should go. The functions are called in kmp_gsupport.cpp and kmp_tasking.cpp, the definition is still in kmp_tasking.cpp.

openmp/runtime/src/kmp_csupport.cpp
476–482

Where should the documentation go?

openmp/runtime/src/kmp_runtime.cpp
1515

kmpc_serialized_parallel only asserts valid gtid and stores the OMPT return address before calling kmp_serialized_parallel. So directly call __kmp_serialized_parallel.

Replace the KMP_ASSERT by KMP_DEBUG_ASSERT.
Fixing several corner cases

I'm not convinced. I didn't go over everything but I think we should discuss the goal and choices here first.

I checked the OmptReturnAddressGuard and it looks like it is a noop to store a return address if one is set already, right? (assuming same thread)
Could we unconditionally execute OMPT_STORE_RETURN_ADDRESS(gtid); instead to avoid the _aux stuff?

(Partially related, @AndreyChurbanov I'd argue the branch in OmptReturnAddressGuard is UNLIKELY as well, basically all that check ompt_enabled.enabled are?)

openmp/runtime/src/kmp_barrier.cpp
327 ↗(On Diff #308487)

This is unrelated, right?

openmp/runtime/src/kmp_cancel.cpp
30

No __forceinline please, also elsewhere.

I'm not convinced. I didn't go over everything but I think we should discuss the goal and choices here first.

I checked the OmptReturnAddressGuard and it looks like it is a noop to store a return address if one is set already, right? (assuming same thread)
Could we unconditionally execute OMPT_STORE_RETURN_ADDRESS(gtid); instead to avoid the _aux stuff?

The store only if not yet stored approach is a hack necessary to work around the issue that the compiler facing interface of the runtime is also called internally and therefore we cannot determine the caller.
This approach fails if the runtime misses to reset the stored value, when passing control back to the application:

  • when returning to the application (solved by the guards)
  • when calling into the application to execute outlined code

Especially for barriers, we need the return address in different runtime functions and also after potentially calling into the application many times.

With a clear separation of the external and internal interface, we can unconditionally push the return address onto the threadlocal stack when entering the runtime trough the external interface.
The bigger deal are the frame pointers. There the assertions effectively only apply with this separated, external use of the interface.

(Partially related, @AndreyChurbanov I'd argue the branch in OmptReturnAddressGuard is UNLIKELY as well, basically all that check ompt_enabled.enabled are?)

I agree, that we can put UNLIKELY there. When we tuned the functions for _if0 tasks with templates some years ago, I did some epcc benchmark experiments, which showed that putting UNLIKELY for every ompt_enabled.enabled does not always improve performance, but sometimes actually degrades performance.

openmp/runtime/src/kmp_cancel.cpp
30

There are already many __forceinline in this file. Some dating back to Jim Cownies initial commit.

I'm not convinced. I didn't go over everything but I think we should discuss the goal and choices here first.

I checked the OmptReturnAddressGuard and it looks like it is a noop to store a return address if one is set already, right? (assuming same thread)
Could we unconditionally execute OMPT_STORE_RETURN_ADDRESS(gtid); instead to avoid the _aux stuff?

The store only if not yet stored approach is a hack necessary to work around the issue that the compiler facing interface of the runtime is also called internally and therefore we cannot determine the caller.
This approach fails if the runtime misses to reset the stored value, when passing control back to the application:

  • when returning to the application (solved by the guards)
  • when calling into the application to execute outlined code

Especially for barriers, we need the return address in different runtime functions and also after potentially calling into the application many times.

With a clear separation of the external and internal interface, we can unconditionally push the return address onto the threadlocal stack when entering the runtime trough the external interface.
The bigger deal are the frame pointers. There the assertions effectively only apply with this separated, external use of the interface.

Hm,.. I doubt I understand the problem fully. Could you present it in our wednesday meeting at some point (in ~7.5h or a week)?
FWIW, one reason I'm hesitant is that this duplication has to be continued as new APIs come in.

(Partially related, @AndreyChurbanov I'd argue the branch in OmptReturnAddressGuard is UNLIKELY as well, basically all that check ompt_enabled.enabled are?)

I agree, that we can put UNLIKELY there. When we tuned the functions for _if0 tasks with templates some years ago, I did some epcc benchmark experiments, which showed that putting UNLIKELY for every ompt_enabled.enabled does not always improve performance, but sometimes actually degrades performance.

If it does on average I'd do it. In the long run the compiler should improve in its use of hints; at least I hope ;)

openmp/runtime/src/kmp_cancel.cpp
30

This is a MSCV extension, at the very least we should use something like described in the wikipedia article: https://en.wikipedia.org/wiki/Inline_function#Nonstandard_extensions

Preferably, we would not need to do any of it though, arguably the compiler should figure this out. What we should do instead is use (un)likely and hot/cold annotations, IMHO. [Providing information is good, assuming you know better means you should work on the inliner heuristic ;)]

I think the idea behind this change is:

  • There is an external interface. The compiler only emits calls to functions in this interface.
  • Functions within the library should never call a function from this external interface

If so, I'm strongly in favour. That design lends itself to wrapping the external interface for logging, debugging etc. It means the functions that do the work can be marked internal when building the library as bitcode, allowing other transforms. That it is useful for ompt is a consequence of the separation being useful elsewhere.

The change itself seems noisy for that, possibly because it mixes the refactor to avoid calls to the interface with the ompt changes.

protze.joachim updated this revision to Diff 310840.EditedDec 10 2020, 4:47 AM

As discussed in yesterdays call, I marked the use of all __kmpc_ functions as deprecated and moved the declaration of those __kmpc functions into a new header file.
I also removed most OMPT-related changes from this patch and will open a separate differential for those. Therefore several OMPT tests fail after this patch.

In the current state of the patch I still see several deprecation warnings, which I will clean up, if we get general agreement on the changes in this patch.

hbae added inline comments.Dec 16 2020, 3:14 PM
openmp/runtime/src/kmp_csupport.cpp
476–482

I do not know if someone is actively using doxygen output now, but if we want to keep current documentation, it should move to __kmpc_* functions.

openmp/runtime/src/kmp_gsupport.cpp
1285

It seems that we need instantiation of this function and other two templates used below with <false> input. Alternatively, we can define these three functions that do not contain OMPT code.

__kmp_omp_task_begin_if0(...)
__kmp_omp_task_complete_if0(...)
__kmp_omp_taskwait(...)

Thanks for pushing this into the new direction. Don't wait for me though.

openmp/runtime/src/kmp_cexternal.h
1

License and short explanation what this file is.

openmp/runtime/src/kmp_csupport.cpp
476–482

We just created the doxygen build target. The website is on its way, assume doxygen is a thing now, and yes the main docue goes to the public facing ones. We can (as we get to it) add /// \see __kmp_end_serialized_parallel to the aux ones (or something similar that creates a link.