This patch makes it possible for a performance tool that uses call stack unwinding to map implementation-level call stacks from master and worker threads into a unified global view. There are several components to this patch.
include/*/ompt.h.var
- Add a new enumeration type that indicates whether the code for a master task for a parallel region is invoked by the user program or the runtime system.
- Change the signature for OMPT parallel begin/end callbacks to indicate whether the master task will be invoked by the program or the runtime system. This enables a performance tool using call stack unwinding to handle these two cases differently. In particular, a profiler that uses call stack unwinding needs to know that, if the program invokes the master task, the call path prefix for the master task may differ from the one available within the begin/end callbacks.
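As a rough illustration of how a tool might consume this information, here is a minimal sketch. The names `invoker_t`, `region_info_t`, `tool_parallel_begin`, and `needs_prefix_fixup` are hypothetical and chosen only for this example; they are not the identifiers or callback signatures defined by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the new enumeration type. */
typedef enum invoker_e {
    invoker_program = 0, /* the user program calls the master task */
    invoker_runtime = 1  /* the runtime system calls the master task */
} invoker_t;

/* Hypothetical per-region record kept by the tool. */
typedef struct region_info_s {
    uint64_t  parallel_id;
    invoker_t invoker;
} region_info_t;

/* Hypothetical parallel-begin callback: remember who will invoke
 * the master task for this region. */
static void tool_parallel_begin(region_info_t *info,
                                uint64_t parallel_id,
                                invoker_t invoker)
{
    info->parallel_id = parallel_id;
    info->invoker = invoker;
}

/* When the program invokes the master task, the call path prefix seen
 * while unwinding from the master task may differ from the one seen
 * inside the begin/end callbacks, so the tool must reconcile the two. */
static bool needs_prefix_fixup(const region_info_t *info)
{
    return info->invoker == invoker_program;
}
```

A tool built along these lines would check `needs_prefix_fixup` when attributing master-task samples to the enclosing region's calling context.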
kmp.h
- Change the signature for __kmp_join_call to take an additional parameter indicating the fork_context type. This is needed to supply the OMPT parallel end callback with information about whether the compiler or the runtime invoked the master task for a parallel region.
kmp_csupport.c
- Ensure that the OMPT task frame field reenter_runtime_frame is properly set and cleared before and after calls to fork and join threads for a parallel region.
- Adjust the code for the new signature for __kmp_join_call.
- Adjust the OMPT parallel begin callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.
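The set-and-clear discipline for reenter_runtime_frame can be sketched as follows. This is a simplified model, not the runtime's code: `task_frame_t`, `fake_fork_and_join`, and `entry_point` are hypothetical names, and the address of a local variable stands in for the real frame address.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced task frame holding only the field discussed. */
typedef struct task_frame_s {
    void *reenter_runtime_frame;
} task_frame_t;

/* Records whether the field was set while the runtime was "inside"
 * the fork/join, which is when an unwinder would inspect it. */
static bool frame_seen_during_fork = false;

/* Hypothetical stand-in for the runtime's fork and join work. */
static void fake_fork_and_join(const task_frame_t *frame)
{
    frame_seen_during_fork = (frame->reenter_runtime_frame != NULL);
}

/* Hypothetical runtime entry point called from user code. */
static void entry_point(task_frame_t *frame)
{
    int marker = 0; /* its address approximates this frame's location */

    /* Set before entering the runtime so an unwinder can tell where
     * user code ends and runtime code begins. */
    frame->reenter_runtime_frame = (void *)&marker;

    fake_fork_and_join(frame);

    /* Clear after returning so no stale frame address is left behind. */
    frame->reenter_runtime_frame = NULL;
}
```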
kmp_gsupport.c
- Apply the analogous changes described for kmp_csupport.c to the GOMP interface.
- Add OMPT support for the GOMP combined parallel region + loop API to maintain the OMPT task frame field reenter_runtime_frame.
kmp_runtime.c
- Use the new information passed by __kmp_join_call to adjust the OMPT parallel end callback invocations to carry the extra parameter indicating whether the program or the runtime invokes the master task for a parallel region.
ompt_internal.h
- Use the flavor of the parallel region API (GNU or Intel) to determine who invokes the master task.
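A minimal sketch of such a mapping is shown below. The type and function names are hypothetical. The direction of the mapping is also an assumption: it supposes that GNU-style entry points return to compiler-generated code, which then calls the master task itself, whereas Intel-style entry points let the runtime call the outlined function.

```c
/* Hypothetical tag for the parallel region API flavor. */
typedef enum api_flavor_e {
    api_flavor_gnu,
    api_flavor_intel
} api_flavor_t;

/* Hypothetical stand-in for the invoker enumeration. */
typedef enum invoker_e {
    invoker_program = 0, /* the user program calls the master task */
    invoker_runtime = 1  /* the runtime system calls the master task */
} invoker_t;

/* Derive the invoker from the API flavor (assumed mapping, see above). */
static invoker_t invoker_for_flavor(api_flavor_t flavor)
{
    return (flavor == api_flavor_gnu) ? invoker_program : invoker_runtime;
}
```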
Indention