
[OpenMP] [OMPD] [1/6] Implementation of OMPD debugging library - libompd. Code changes in openmp/runtime to support libompd.
Needs Review · Public

Authored by Vigneshbalu on Apr 9 2021, 5:21 AM.



These patches propose the implementation of OMPD, a debugging interface to support debugging of OpenMP programs.

"libompd" acts as an interface to third-party tools, typically debuggers. The tool accesses the state of the OpenMP program through the library, and libompd reads or writes the state of an OpenMP program that has begun execution through the callback functions provided by the tool.
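The callback arrangement described above can be illustrated with a small model. This is only a sketch under stated assumptions: `ToolCallbacks`, `DebugLibrary`, and the symbol `hypothetical_nthreads` are invented for illustration and are not the types or symbols used in these patches. The callback names are loosely modelled on the memory-access and symbol-lookup callbacks that the OMPD specification asks the tool to provide.

```python
import struct


class ToolCallbacks:
    """Stand-in for the callback table a debugger hands to the library (hypothetical)."""

    def __init__(self, memory, symbols):
        self._memory = memory    # bytearray modelling the debuggee's address space
        self._symbols = symbols  # symbol name -> address

    def symbol_addr_lookup(self, name):
        return self._symbols[name]

    def read_memory(self, addr, nbytes):
        return bytes(self._memory[addr:addr + nbytes])

    def write_memory(self, addr, data):
        self._memory[addr:addr + len(data)] = data


class DebugLibrary:
    """Stand-in for the debugging library: all memory access goes through callbacks."""

    def __init__(self, callbacks):
        self._cb = callbacks

    def read_int32(self, symbol):
        addr = self._cb.symbol_addr_lookup(symbol)
        (value,) = struct.unpack("<i", self._cb.read_memory(addr, 4))
        return value

    def write_int32(self, symbol, value):
        addr = self._cb.symbol_addr_lookup(symbol)
        self._cb.write_memory(addr, struct.pack("<i", value))


# Simulated debuggee: a 4-byte little-endian integer stored at address 8.
memory = bytearray(16)
memory[8:12] = struct.pack("<i", 4)

lib = DebugLibrary(ToolCallbacks(memory, {"hypothetical_nthreads": 8}))
nthreads_before = lib.read_int32("hypothetical_nthreads")
lib.write_int32("hypothetical_nthreads", 2)
nthreads_after = lib.read_int32("hypothetical_nthreads")
```

The point of the design is that the library holds no direct reference to the debuggee: every read and write is delegated to the tool, which is the only party that knows how to reach the process.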

A brief talk regarding the upstreaming plans was provided a few months back in the LLVM-OpenMP committee meeting, the notes of which were provided at:

Most of the implementation has been contributed to by folks from RWTH-Aachen University, LLNL, Rice University, Perforce, AMD and others. (Might have missed out some of the contributors – apologies for that). It has been developed and maintained in

OMPD support is currently restricted to CPU targets on Linux.

OMPD APIs are implemented as per the OpenMP 5.0 standard.
These patches also include gdb-plugin code, a Python module that can be loaded into gdb. It provides OMPD-specific debugging commands once the "ompd init" command is invoked at the gdb prompt. More details are provided in the README in the gdb-plugin directory.

The CMake build system and directory structure are similar to those of openmp/runtime and openmp/libomptarget.
Testing is done in two ways: first by comparing the output of OMPT and OMPD, and second through gdb-plugin using llvm-lit and FileCheck.
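The first testing method, comparing what OMPT and OMPD report, could be sketched as follows. This is only an illustration: the report format, the keys, and `compare_reports` are hypothetical and are not taken from the test suite in these patches.

```python
def compare_reports(ompt_report, ompd_report):
    """Return (key, ompt_value, ompd_value) for every key on which the reports disagree."""
    mismatches = []
    for key in sorted(set(ompt_report) | set(ompd_report)):
        ompt_value = ompt_report.get(key)
        ompd_value = ompd_report.get(key)
        if ompt_value != ompd_value:
            mismatches.append((key, ompt_value, ompd_value))
    return mismatches


# Hypothetical values gathered from the two interfaces for the same program point.
ompt_report = {"thread 0 level": 1, "thread 0 team size": 4}
ompd_report = {"thread 0 level": 1, "thread 0 team size": 2}
mismatches = compare_reports(ompt_report, ompd_report)
```

A test in this style passes when the list of mismatches is empty; a key missing from one report shows up with `None` on that side.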

This implementation does not provide debugging support for offloading to CPUs or GPUs.

The breakdown of patches is below.
Note: Patches should be applied in the order below.

  1. Code changes in openmp/runtime to support libompd.
  2. TargetValue: Access OpenMP runtime state through callbacks provided by the tool.
  3. omp-debug: Implementation of OMPD APIs.
  4. omp-icv: OMPD Internal control variable handlers.
  5. gdb-plugin: Plugin code for gdb that leverages libompd to provide debugging support.
  6. libompd-tests: Testcases for libompd.

Another set of around 50 testcases will be pushed subsequently.

Please let me know if the patches need to be reorganized.

Diff Detail

Unit Tests: Failed

1,020 ms · x64 debian > libomp.ompt/parallel::max_active_levels_serialized.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/max_active_levels_serialized.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/max_active_levels_serialized.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/max_active_levels_serialized.c.tmp | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/max_active_levels_serialized.c.tmp.out | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/max_active_levels_serialized.c
1,160 ms · x64 debian > libomp.ompt/parallel::nested_lwt.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/nested_lwt.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/nested_lwt.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/nested_lwt.c.tmp | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/nested_lwt.c.tmp.out | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/nested_lwt.c
1,170 ms · x64 debian > libomp.ompt/parallel::nested_serialized.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/nested_serialized.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/nested_serialized.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/nested_serialized.c.tmp | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/nested_serialized.c.tmp.out | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/nested_serialized.c
730 ms · x64 debian > libomp.ompt/parallel::parallel_if0.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/parallel_if0.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/parallel_if0.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/parallel_if0.c.tmp | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/parallel_if0.c.tmp.out | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/parallel_if0.c
670 ms · x64 debian > libomp.ompt/parallel::serialized.c
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/serialized.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/serialized.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/serialized.c.tmp | tee /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/ompt/parallel/Output/serialized.c.tmp.out | /mnt/disks/ssd0/agent/llvm-project/build/./bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt/parallel/serialized.c

Event Timeline

Vigneshbalu created this revision. Apr 9 2021, 5:21 AM
Vigneshbalu requested review of this revision. Apr 9 2021, 5:21 AM
Herald added a project: Restricted Project.
The summary of this revision was edited. Apr 9 2021, 7:32 AM
Vigneshbalu changed the visibility from "Public (No Login Required)" to "Custom Policy". Apr 9 2021, 7:55 AM
The summary of this revision was edited. Apr 9 2021, 8:12 AM
Vigneshbalu changed the visibility from "Custom Policy" to "Public (No Login Required)". Apr 11 2021, 9:47 AM

Can you please upload a diff with full context, see (


Here and elsewhere, changes that are not guarded by OMPD_SUPPORT should, it seems to me, be committed separately.

Diff with full context

Vigneshbalu added inline comments. Sun, Apr 11, 11:48 PM

I have guarded other places, this change and "openmp/runtime/src/ompt-specific.cpp" are code movement that stems from .
I will create a separate review for this.

Removed the code that is not guarded by OMPD and created a separate review for it.

cchen added a subscriber: cchen. Wed, Apr 21, 2:07 PM

Addressed clang-tidy warnings.

hbae added inline comments. Tue, Apr 27, 5:11 PM

We need to use ompd_wait_id_t.


Why do we need a scheduling task for implicit tasks when the specification does not call this task scheduling?


We already have program points defined for OMPT callbacks.
Isn't it better to place the parallel begin/end breakpoints near the OMPT parallel begin/end?
Are there any parallel begin/end events that are not covered by the current OMPT parallel begin/end?


Can you add a comment explaining why it is 7?

jini.susan added inline comments. Wed, Apr 28, 5:08 AM

Thanks for this good catch! Will work on modifying this.


According to the spec, the binding of the current parallel handle differs between the OMPT parallel-begin event and the OMPD parallel-begin event.

For OMPD (Section 5.6.1):

The OpenMP implementation must execute ompd_bp_parallel_begin at every
parallel-begin event. At the point that the implementation reaches
ompd_bp_parallel_begin, the binding for ompd_get_curr_parallel_handle is the
parallel region that is beginning and the binding for ompd_get_curr_task_handle is the
task that encountered the parallel construct.


For OMPT:

Between a parallel-begin event and an implicit-task-begin event, a call to
ompt_get_parallel_info(0,...) may return information about the outer parallel team,
the new parallel team or an inconsistent state.

The OMPD runtime entry points in this implementation are invoked at points chosen to ensure that the binding required by the spec holds.

protze.joachim added inline comments. Wed, Apr 28, 7:33 AM

The OMPT parallel-begin is placed before the runtime gets a lock for creating the team. All information provided by the callback is available at this point. Other information like the number of threads is passed in the implicit-task-begin event.
As Jini pointed out, there is not enough information available for OMPD parallel-begin at that point.
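The ordering argument in the comments above can be made concrete with a toy model. It is a sketch only: `RuntimeModel` and its fields are hypothetical and do not correspond to data structures in openmp/runtime. It assumes, as described above, that the OMPT parallel-begin callback fires before the new team is created, while `ompd_bp_parallel_begin` must be reached only once the new region is the current one.

```python
class RuntimeModel:
    """Toy model of a fork: records which region is 'current' at each program point."""

    def __init__(self):
        self.team_stack = ["outer"]  # innermost parallel region last
        self.log = []

    def current_parallel(self):
        return self.team_stack[-1]

    def fork(self, new_team):
        # OMPT-style parallel-begin: dispatched before the new team exists,
        # so a query at this point still sees the enclosing region.
        self.log.append(("ompt_parallel_begin", self.current_parallel()))

        # Team creation (in a real runtime this happens under a lock).
        self.team_stack.append(new_team)

        # OMPD-style breakpoint: reached after team setup, so the current
        # parallel region is the one that is beginning.
        self.log.append(("ompd_bp_parallel_begin", self.current_parallel()))


runtime = RuntimeModel()
runtime.fork("inner")
```

In this model the two program points observe different current regions, which is why reusing the OMPT location for the OMPD breakpoint would not give the binding the OMPD text asks for.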


Please drop these lines. The current patches don't include any CUDA code.


Same here, drop the CUDA symbol.

Addressed the review comments.

Vigneshbalu marked 5 inline comments as done. Sun, May 2, 9:53 PM
Vigneshbalu added inline comments.

Thanks, Hansang. It was an unwanted variable, and I have removed it.




Thanks for the catch, Joachim.

protze.joachim added inline comments. Tue, May 11, 1:45 AM

I think the specification of ompd_get_scheduling_task_handle and ompd_get_generating_task_handle needs some refinement.

I think the background here is the following:

On return, the scheduling_task_handle argument points to a location that points to a handle for the
task that is still on the stack of execution on the same thread and was deferred in favor of executing
the selected task.

When reaching a taskwait, the execution of the encountering implicit task is deferred to execute some explicit tasks.
Similarly, for a parallel region, the execution of the encountering implicit task is deferred to execute the implicit task of the parallel region.

When thinking about a possible debugger workflow, it might help to cut the chain as you suggest and thereby implicitly tell the debugger that the implicit task has been reached and that it should use ompd_get_generating_task_handle to get the parent task on the stack.
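A debugger-side walk along these lines might look like the sketch below. It is a model, not OMPD code: the `Task` class and `tasks_on_stack` are hypothetical, and the `scheduling` and `generating` fields merely stand in for what ompd_get_scheduling_task_handle and ompd_get_generating_task_handle would return. It assumes the scheduling chain is cut at implicit tasks, and that the generating-task query is usable for implicit tasks, which the specification may not guarantee.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    implicit: bool
    scheduling: Optional["Task"] = None  # task deferred in favor of this one
    generating: Optional["Task"] = None  # task that encountered the construct


def tasks_on_stack(current):
    """List task names from the running task outwards.

    Explicit tasks are followed through their scheduling task; at an implicit
    task the scheduling chain is cut and the walk continues through the
    generating task instead.
    """
    chain = []
    task = current
    while task is not None:
        chain.append(task.name)
        task = task.generating if task.implicit else task.scheduling
    return chain


# The initial task encounters a parallel construct; the region's implicit task
# then reaches a taskwait and is deferred to run an explicit task.
initial = Task("initial", implicit=True)
region = Task("region-implicit", implicit=True, generating=initial)
explicit = Task("explicit", implicit=False, scheduling=region, generating=region)

stack = tasks_on_stack(explicit)
```

The walk visits the explicit task, then the implicit task of the region, then the initial task, and stops there because the initial task has no generating task in this model.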

For the function ompd_get_generating_task_handle, the following might be an issue:

The ompd_get_generating_task_handle function obtains a pointer to the task handle for the task that encountered the OpenMP task construct that generated the task represented by task_handle.

Since the parallel construct is not a task construct, it follows from this definition that the task encountering a parallel construct is not the generating task either. The glossary entry for generating task also refers to the parallel region and not the implicit task.

I think we should clarify the terms in the next tools subcommittee call.