This is an archive of the discontinued LLVM Phabricator instance.

[OMPT] Enable OMPT on 32-bit ARM machines
Needs RevisionPublic

Authored by pawosm01 on Jan 8 2018, 5:36 AM.

Details

Diff Detail

Repository
rOMP OpenMP

Event Timeline

pawosm01 created this revision.Jan 8 2018, 5:36 AM

I did my tests on 32-bit dual-core ARMv7 Cortex-A9 machine. Just in case I also tested it on 64-bit 96-cores ARMv8.1 machine to make sure there are no regressions.

I think I saw a few failures related to return addresses on my Raspberry Pi 2... I'll try to retest asap.

Hahnfeld requested changes to this revision.Jan 8 2018, 7:33 AM

I think I saw a few failures related to return addresses on my Raspberry Pi 2... I'll try to retest asap.

I'm seeing (with GCC 5.4.0):

Failing Tests (7):
    libomp :: ompt/parallel/dynamic_enough_threads.c
    libomp :: ompt/parallel/dynamic_not_enough_threads.c
    libomp :: ompt/synchronization/lock.c
    libomp :: ompt/synchronization/nest_lock.c
    libomp :: ompt/synchronization/test_lock.c
    libomp :: ompt/tasks/task_in_joinbarrier.c
    libomp :: ompt/tasks/untied_task.c

The lock tests seem to report a "wrong" return address and the others complain that ompt_event_parallel_begin doesn't have a reenter address...

This revision now requires changes to proceed.Jan 8 2018, 7:33 AM

I didn't even know these tests are allowed to be compiled with anything else than clang 6.0+...

The older gcc version the worse it gets, with 4.9.1-16ubuntu6 it's:

Failing Tests (9):
    libomp :: ompt/parallel/dynamic_enough_threads.c
    libomp :: ompt/parallel/dynamic_not_enough_threads.c
    libomp :: ompt/parallel/normal.c
    libomp :: ompt/parallel/not_enough_threads.c
    libomp :: ompt/synchronization/lock.c
    libomp :: ompt/synchronization/nest_lock.c
    libomp :: ompt/synchronization/test_lock.c
    libomp :: ompt/tasks/task_in_joinbarrier.c
    libomp :: ompt/tasks/untied_task.c

I didn't even know these tests are allowed to be compiled with anything else than clang 6.0+...

I remember seeing failures with Clang as well. We definitely need to take a look at these problems...

I didn't even know these tests are allowed to be compiled with anything else than clang 6.0+...

All ompt tests were tested with gcc-4+, clang-4+, icc-15+. If not flaged different (UNSUPPORTED or REQUIRED) the tests should be compatible with these compilers.

It's kmp_gsupport.cpp issue I presume, was it ever tested on any 32-bit platform? I'll try to look at it through the day.

It's kmp_gsupport.cpp issue I presume, was it ever tested on any 32-bit platform? I'll try to look at it through the day.

Yes, all tests pass on i386 AFAICS

It's kmp_gsupport.cpp issue I presume, was it ever tested on any 32-bit platform? I'll try to look at it through the day.

Yes, all tests pass on i386 AFAICS

Nah, looks like I didn't recreate the build directories and OMPT was still disabled. With OMPT enabled, I need to fix ompt_control (can you please submit the change separately so that we can backport the fix to release_60?) and a wrong cast in synchronization/lock.c (working on a fix) - all other tests are green with latest Clang and GCC with -m32 from x86_64.

Found that __builtin_frame_address(1) gives wrong answer in __kmp_api_GOMP_parallel (a.k.a KMP_API_NAME_GOMP_PARALLEL as in kmp_gsupport.cpp). Then I've found out that this won't work with gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60109
Status: RESOLVED WONTFIX
We can mark it UNSUPPORTED on arm32/gcc

Found that __builtin_frame_address(1) gives wrong answer in __kmp_api_GOMP_parallel (a.k.a KMP_API_NAME_GOMP_PARALLEL as in kmp_gsupport.cpp). Then I've found out that this won't work with gcc: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60109
Status: RESOLVED WONTFIX
We can mark it UNSUPPORTED on arm32/gcc

Sounds like the only thing we can do :-( if you add LIBOMP_ARCH as feature, you should be able to use something like XFAIL: arm32 && gcc according to https://llvm.org/docs/TestingGuide.html#constraining-test-execution

As I understand the issue, this is not a fault in the test code, but the value provided by parallel-begin is wrong? In that case, I suggest to pass NULL for this architecture/compiler.

Nah, doing architecture && compiler doesn't seem to work. I've got some better idea.

pawosm01 updated this revision to Diff 129116.Jan 9 2018, 10:05 AM
pawosm01 retitled this revision from [OMPT] Enable OMPT on 32-bit ARM machines and fix type mismatch in omp_control_tool implementation that makes it run incorrectly on 32-bit machines to [OMPT] Enable OMPT on 32-bit ARM machines.

I really like this message that proposed patch causes on ARM + gcc when someone's trying to set -DLIBOMP_OMPT_SUPPORT=ON:

CMake Error at runtime/cmake/LibompUtils.cmake:27 (message):
  LIBOMP: OpenMP Tools Interface requested but not available in this
  implementation
Call Stack (most recent call first):
  runtime/CMakeLists.txt:316 (libomp_error_say)

This is more clean solution than passing NULL which effects in less pleasant failures.

Nah, doing architecture && compiler doesn't seem to work. I've got some better idea.

You'd have to add the architecture as a feature in the lit configuration script.

I really like this message that proposed patch causes on ARM + gcc when someone's trying to set -DLIBOMP_OMPT_SUPPORT=ON:

CMake Error at runtime/cmake/LibompUtils.cmake:27 (message):
  LIBOMP: OpenMP Tools Interface requested but not available in this
  implementation
Call Stack (most recent call first):
  runtime/CMakeLists.txt:316 (libomp_error_say)

This is more clean solution than passing NULL which effects in less pleasant failures.

Does a Clang built runtime work for GCC? In particular, do all tests pass when building with Clang but adding -DOPENMP_TEST_C_COMPILER=gcc -DOPENMP_TEST_CXX_COMPILER=g++? If not, we should definitely think about returning NULL and what the spec says about this case.

Does a Clang built runtime work for GCC? In particular, do all tests pass when building with Clang but adding -DOPENMP_TEST_C_COMPILER=gcc -DOPENMP_TEST_CXX_COMPILER=g++?

No, Clang built runtime doesn't work for GCC:

********************
Failing Tests (13):
    libomp :: ompt/parallel/dynamic_enough_threads.c
    libomp :: ompt/parallel/dynamic_not_enough_threads.c
    libomp :: ompt/parallel/no_thread_num_clause.c
    libomp :: ompt/parallel/normal.c
    libomp :: ompt/parallel/not_enough_threads.c
    libomp :: ompt/synchronization/lock.c
    libomp :: ompt/synchronization/nest_lock.c
    libomp :: ompt/synchronization/test_lock.c
    libomp :: ompt/tasks/task_in_joinbarrier.c
    libomp :: ompt/tasks/untied_task.c
    libomp :: ompt/worksharing/for/dynamic.c
    libomp :: ompt/worksharing/for/guided.c
    libomp :: ompt/worksharing/for/runtime.c
Hahnfeld requested changes to this revision.Jan 10 2018, 12:16 AM

Does a Clang built runtime work for GCC? In particular, do all tests pass when building with Clang but adding -DOPENMP_TEST_C_COMPILER=gcc -DOPENMP_TEST_CXX_COMPILER=g++?

No, Clang built runtime doesn't work for GCC:

********************
Failing Tests (13):
    libomp :: ompt/parallel/dynamic_enough_threads.c
    libomp :: ompt/parallel/dynamic_not_enough_threads.c
    libomp :: ompt/parallel/no_thread_num_clause.c
    libomp :: ompt/parallel/normal.c
    libomp :: ompt/parallel/not_enough_threads.c
    libomp :: ompt/synchronization/lock.c
    libomp :: ompt/synchronization/nest_lock.c
    libomp :: ompt/synchronization/test_lock.c
    libomp :: ompt/tasks/task_in_joinbarrier.c
    libomp :: ompt/tasks/untied_task.c
    libomp :: ompt/worksharing/for/dynamic.c
    libomp :: ompt/worksharing/for/guided.c
    libomp :: ompt/worksharing/for/runtime.c

Quote from TR6:

In cases where attribution to source code is impossible or inappropriate, codeptr_ra may be NULL.

This revision now requires changes to proceed.Jan 10 2018, 12:16 AM
pawosm01 added a comment.EditedJan 10 2018, 8:19 AM

Guys, I can think of approach as follows, but fixing all relevant lit tests to make FileCheck directives aware of relevant architecture+compiler combinations is beyond my imagination:

diff --git a/runtime/src/kmp_gsupport.cpp b/runtime/src/kmp_gsupport.cpp
index 9fa05e6..d696d4a 100644
--- a/runtime/src/kmp_gsupport.cpp
+++ b/runtime/src/kmp_gsupport.cpp
@@ -15,8 +15,18 @@
 #include "kmp_atomic.h"

 #if OMPT_SUPPORT
-#include "ompt-specific.h"
+
+#if KMP_ARCH_ARM
+#ifndef OMPT_GET_FRAME_ADDRESS
+#error OMPT_GET_FRAME_ADDRESS should be defined after the inclusion of kmp.h
 #endif
+#undef OMPT_GET_FRAME_ADDRESS
+#define OMPT_GET_FRAME_ADDRESS(level) NULL
+#endif // KMP_ARCH_ARM
+
+#include "ompt-specific.h"
+
+#endif // OMPT_SUPPORT

 #ifdef __cplusplus
 extern "C" {
diff --git a/runtime/src/ompt-internal.h b/runtime/src/ompt-internal.h
index 6139e00..041fabf 100644
--- a/runtime/src/ompt-internal.h
+++ b/runtime/src/ompt-internal.h
@@ -93,7 +93,11 @@ void ompt_post_init(void);
 void ompt_fini(void);

 #define OMPT_GET_RETURN_ADDRESS(level) __builtin_return_address(level)
+#if KMP_ARCH_ARM && KMP_COMPILER_GCC
+#define OMPT_GET_FRAME_ADDRESS(level) NULL
+#else
 #define OMPT_GET_FRAME_ADDRESS(level) __builtin_frame_address(level)
+#endif

 int __kmp_control_tool(uint64_t command, uint64_t modifier, void *arg);

diff --git a/runtime/test/ompt/callback.h b/runtime/test/ompt/callback.h
index 3befe7c..559172d 100755
--- a/runtime/test/ompt/callback.h
+++ b/runtime/test/ompt/callback.h
@@ -61,9 +61,15 @@ static void print_ids(int level)
     printf("%" PRIu64 ": task level %d: parallel_id=%" PRIu64 ", task_id=%" PRIu64 ", frame=%p\n", ompt_get_thread_data()->value, level, exists_task ? parallel_data->value : 0, exists_task ? task_data->value : 0, frame);
 }

+#if KMP_ARCH_ARM && defined(__GNUC__) && !defined(__clang__)
+#define frame_address(level) NULL
+#else
+#define frame_address(level) __builtin_frame_address(level)
+#endif
+
 #define print_frame(level)\
 do {\
-  printf("%" PRIu64 ": __builtin_frame_address(%d)=%p\n", ompt_get_thread_data()->value, level, __builtin_frame_address(level));\
+  printf("%" PRIu64 ": __builtin_frame_address(%d)=%p\n", ompt_get_thread_data()->value, level, frame_address(level));\
 } while(0)

 // clang (version 5.0 and above) adds an intermediate function call with debug flag (-g)

I'm not sure why you need to treat KMP_COMPILER_GCC differently, it's the test compiler that is relevant, isn't it?

pawosm01 added a comment.EditedJan 10 2018, 1:40 PM

Ok, relevant line changed. I didn't notice that although kmp_platform.h (where KMP_ARCH_ARM is defined) is included from callback.h, kmp_os.h where KMP_COMPILER_GCC is defined is not included.
Also note that both clang++ and g++ define __GNUC__ hence full check for GNU C++ compiler looks as follows:

#if defined(__GNUC__) && !defined(__clang__)
#warning this is a GNU compiler
#else
#warning this is NOT a GNU compiler
#endif

Ok, relevant line changed. I didn't notice that although kmp_platform.h (where KMP_ARCH_ARM is defined) is included from callback.h, kmp_os.h where KMP_COMPILER_GCC is defined is not included.
Also note that both clang++ and g++ define __GNUC__ hence full check for GNU C++ compiler looks as follows:

#if defined(__GNUC__) && !defined(__clang__)
#warning this is a GNU compiler
#else
#warning this is NOT a GNU compiler
#endif

I meant to say that you shouldn't check for GCC at all: You said the library doesn't work either if built with Clang, so there is no point in restricting NULL to GCC.

pawosm01 added a comment.EditedJan 11 2018, 12:25 AM

Ahh, I didn't say it isn't working, but I also didn't explicitly said that: when libomp+OMPT is built with clang on 32-bit ARM, all of OMPT test cases pass iff they are compiled with clang.
The problem is on 32-bit ARM and GCC.

Library built with: Clang  | GCC
Testing with:              |
Clang               works  | fails
GCC                 fails  | fails
Library built with: Clang  | GCC
Testing with:              |
Clang               works  | fails
GCC                 fails  | fails

I think, this is how we understood that. But at the end this means, that a tool might see an strange address, i.e. neither the proper frame address nor NULL, for any of these failing combinations. As I understood Jonas, we should therefore return NULL as the frame address, independent of the compiler in use.

I can see three approaches:

  • Generally return NULL on arm32:

#define OMPT_GET_FRAME_ADDRESS(level) NULL

  • Generally return NULL on arm32, if level>0:

#define OMPT_GET_FRAME_ADDRESS(level) ((level==0)?__builtin_frame_address(level):NULL)

  • go into the runtime and use NULL for the cases, where the tests break.

The tricky part for testing any of these changes is how NULL is printed :(
I would feel better if we could provide something like (~(0L)) for the case where the frame address cannot be determined.

Ahh, I didn't say it isn't working, but I also didn't explicitly said that: when libomp+OMPT is built with clang on 32-bit ARM, all of OMPT test cases pass iff they are compiled with clang.
The problem is on 32-bit ARM and GCC.

Library built with: Clang  | GCC
Testing with:              |
Clang               works  | fails
GCC                 fails  | fails

I didn't get that the Clang interface doesn't work if compiled with GCC :-(

  • Generally return NULL on arm32, if level>0:

#define OMPT_GET_FRAME_ADDRESS(level) ((level==0)?__builtin_frame_address(level):NULL)

Why do these cases still exist, I though calling __builtin_frame_address with a level greater than 0 is undefined?

The tricky part for testing any of these changes is how NULL is printed :(
I would feel better if we could provide something like (~(0L)) for the case where the frame address cannot be determined.

The spec says the runtime should return NULL in that case, there isn't much we can about that (except changing the spec).

I suspect proposed change in kmp_gsupport.cpp might of confused you (as I check only for arch but not for the compiler). Note that gsupport (GOMP interface for libomp) must be compatible with what GCC-compiled programs do, otherwise compiling libomp with clang and running test cases compiled with GCC would lead to confusing inconsistency.

Ah, I think I understand your plan now.

A problem is, that the implementation of the gomp interface is inconsistent. Some of the functions call kmp entry points, some call the internal implementation.
In the kmp entry point function we cannot distinguish, whether we come from gomp or the application. Therefore we check for frame==NULL and assign the frame address if not set. This assumes that the gomp function already set the frame address if this was the way to enter the runtime.

This could actually already happen here: __builtin_frame_address(1) returns NULL in the gomp-parallel-begin-function (because the frame address could not be determined), then the value is assigned in the kmp-parallel-begin-function (this time the address can be determined, because it is within the runtime library).