Page MenuHomePhabricator

Hahnfeld (Jonas Hahnfeld)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 2 2015, 4:52 AM (229 w, 1 d)
RWTH Aachen University

Recent Activity

Thu, Aug 15

Hahnfeld accepted D65819: [Driver][Bundler] Improve bundling of object files..

LG, thanks for the changes.

Thu, Aug 15, 9:53 AM · Restricted Project, Restricted Project
Hahnfeld accepted D66296: [BUNDLER]Improve the test, NFC..

LG

Thu, Aug 15, 8:30 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Please submit the test changes unrelated to the code changes in a separate patch!

Thu, Aug 15, 8:01 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D66292: [OpenMP] Change initialization of __kmp_global.
Thu, Aug 15, 7:16 AM · Restricted Project
Hahnfeld created D66292: [OpenMP] Change initialization of __kmp_global.
Thu, Aug 15, 7:13 AM · Restricted Project
Hahnfeld added a comment to D65868: [OpenMP] Remove workaround for CMPXCHG.

Unfortunately, the Windows atomic code will not compile with this change. See in kmp_atomic.h the different atomic types for Windows. The complaints are regarding assignment operations for the complex types.

../../src/kmp_atomic.cpp(1325): error: class "__kmp_cmplx32_t" has no suitable assignment operator
  ATOMIC_CMPXCHG(cmplx4, add, kmp_cmplx32, 64, +, 8c, 7,
  ^
Thu, Aug 15, 7:07 AM · Restricted Project
Hahnfeld requested changes to D65819: [Driver][Bundler] Improve bundling of object files..

The code changes look good to me, but the test doesn't pass on x86. We've faced the same problem when clang-offload-bundler was initially committed and the current testing is the best we were able to do.

Thu, Aug 15, 6:54 AM · Restricted Project, Restricted Project
Hahnfeld removed a reviewer for D55892: [OpenMP] 'close' map-type-modifier code generation: Hahnfeld.

Can we close this after D65341 landed?

Thu, Aug 15, 6:30 AM · Restricted Project, Restricted Project
Hahnfeld committed rGd2ae0c4f443c: [OpenMP] Enable warning about "implicit fallthrough" (authored by Hahnfeld).
[OpenMP] Enable warning about "implicit fallthrough"
Thu, Aug 15, 6:28 AM
Hahnfeld committed rG4d77e50e6ede: [OpenMP] Remove 'unnecessary parentheses' (authored by Hahnfeld).
[OpenMP] Remove 'unnecessary parentheses'
Thu, Aug 15, 6:27 AM
Hahnfeld committed rGfb72a03f85dc: [OMPT] Resolve warnings because of ints in if conditions (authored by Hahnfeld).
[OMPT] Resolve warnings because of ints in if conditions
Thu, Aug 15, 6:26 AM
Hahnfeld committed rL369003: [OpenMP] Enable warning about "implicit fallthrough".
[OpenMP] Enable warning about "implicit fallthrough"
Thu, Aug 15, 6:26 AM
Hahnfeld closed D65871: [OpenMP] Enable warning about "implicit fallthrough".
Thu, Aug 15, 6:26 AM · Restricted Project, Restricted Project
Hahnfeld committed rL369002: [OpenMP] Remove 'unnecessary parentheses'.
[OpenMP] Remove 'unnecessary parentheses'
Thu, Aug 15, 6:26 AM
Hahnfeld closed D65870: [OpenMP] Remove 'unnecessary parentheses'.
Thu, Aug 15, 6:26 AM · Restricted Project, Restricted Project
Hahnfeld committed rL369001: [OMPT] Resolve warnings because of ints in if conditions.
[OMPT] Resolve warnings because of ints in if conditions
Thu, Aug 15, 6:26 AM
Hahnfeld closed D65869: [OMPT] Resolve warnings because of ints in if conditions.
Thu, Aug 15, 6:25 AM · Restricted Project, Restricted Project
Hahnfeld committed rGdc23c832f4f7: [OpenMP] Turn on -Wall compiler warnings by default (authored by Hahnfeld).
[OpenMP] Turn on -Wall compiler warnings by default
Thu, Aug 15, 6:12 AM
Hahnfeld committed rL368999: [OpenMP] Turn on -Wall compiler warnings by default.
[OpenMP] Turn on -Wall compiler warnings by default
Thu, Aug 15, 6:12 AM
Hahnfeld closed D65867: [RFC] [OpenMP] Turn on -Wall compiler warnings by default.
Thu, Aug 15, 6:12 AM · Restricted Project, Restricted Project

Wed, Aug 14

Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Okay, so I wasn't happy with the current explanations because I don't like "fixing" an issue without understanding the problem. Here's a small reproducer:

$  cat main.cpp 
#include "test.h"
Wed, Aug 14, 2:47 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65836: Factor architecture dependent code out of loop.cu.

Jon, please subscribe to openmp-commits so that commit emails get through immediately. Thanks!

Wed, Aug 14, 1:08 AM · Restricted Project, Restricted Project

Tue, Aug 13

Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Can you please again share what exactly is the problem, with a small example? I saw discussions on openmp-dev, but that project was huge, and above you were quoting a man page and hinted to global constructors.

I don't have a small reproducer, unfortunately, only the big one.
Here is the message from the user:

Hi,
I am revisiting this possible compiler bug in Clang 8.0.0.

In the source code I am developing, there's a global static variable, nest::sli_neuron::recordablesMap_ put in the BSS section and it is expected to be fully initialized by the time nest::sli_neuron::sli_neuron() gets called, however in a gdb session:

(gdb) p nest::sli_neuron::recordablesMap_
Python Exception <type 'exceptions.AttributeError'> 'gdb.Type' object has no attribute 'name':
$1 = {<std::map<Name, double (nest::sli_neuron::*)() const, std::less<Name>, std::allocator<std::pair<Name const, double (nest::sli_neuron::*)() const> > >> = std::map with 0 elements, _vptr$RecordablesMap = 0x0}

this doesn't happen when -fopenmp-targets is _not_ used, is it not trivial to come up a reproducer, thus I am sending a work-in-progress report hoping someone will shed some light on this.

Thanks,
Itaru.

Another one:

Hi,
I am seeing a link error shown below:

`.text.startup' referenced in section `.init_array.0' of /tmp/event_delivery_manager-02f392.o: defined in discarded section `.text.startup[_ZN4nest18DataSecondaryEventIdNS_16GapJunctionEventEE18supported_syn_ids_E]' of /tmp/event_delivery_manager-02f392.o
clang-10: error: linker command failed with exit code 1 (use -v to see invocation)

I am not sure how to tackle this as the part is referenced isn't what I am
working on. I am using the latest trunk Clang 10 at this moment.

Steps to reproduce:

Steps to produce:
$ git clone https://https://github.com/nest/nest-simulator/
$ mkdir /tmp/build/nest-clang-offload/
$ cd /tmp/build/nest-clang-offload/
$ cmake -DCMAKE_TOOLCHAIN_FILE=Platform/JURON_Clang -DCMAKE_INSTALL_PREFIX=/path/to/opt/nest-clang -Dwith-gsl=ON -Dwith-openmp=ON -Dwith-mpi=OFF -Dwith-python=OFF -Dwith-offload=ON /path/to/nest-simulator/
Tue, Aug 13, 12:32 PM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

I agree on a technical level that it's still a "correct" object, but not what I was looking for: The host object file will only be in the bundled section, so you cannot examine it without unbundling.

For example, with a small test file and clang -fopenmp -fopenmp-targets=x86_64 -c test.c I saw the following:

$ nm test.o                                       
0000000000000000 t .omp_offloading.requires_reg
0000000000000000 T test
                 U __tgt_register_requires

After applying this patch, the output is empty which might be a problem in certain cases.

Unfortunately, this is the only possible solution I see. There are already 2 reports that bundled objects does not work correctly after unbundling.

Tue, Aug 13, 11:52 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65819: [Driver][Bundler] Improve bundling of object files..
Tue, Aug 13, 11:41 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Ping.

Missed this. We still produce correct object files, so all the tools will work with this.

Tue, Aug 13, 11:37 AM · Restricted Project, Restricted Project
Hahnfeld updated the diff for D65871: [OpenMP] Enable warning about "implicit fallthrough".

Rebase.

Tue, Aug 13, 11:02 AM · Restricted Project, Restricted Project
Hahnfeld updated the diff for D65867: [RFC] [OpenMP] Turn on -Wall compiler warnings by default.

Change check_c_compiler_flag() to check_cxx_compiler_flag().

Tue, Aug 13, 11:00 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65867: [RFC] [OpenMP] Turn on -Wall compiler warnings by default.
Tue, Aug 13, 9:08 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Will this patch change the ability to consume a bundled object file without calling the unbundler? Using known ELF tools on the produced object files was an important design decision and IIRC was somewhat important for using build systems that are unaware of the bundled format.

Tue, Aug 13, 7:02 AM · Restricted Project, Restricted Project

Mon, Aug 12

Hahnfeld added inline comments to D65867: [RFC] [OpenMP] Turn on -Wall compiler warnings by default.
Mon, Aug 12, 11:55 AM · Restricted Project, Restricted Project

Sun, Aug 11

Hahnfeld updated the diff for D65868: [OpenMP] Remove workaround for CMPXCHG.

Remove compiler check for -Wno-uninitialized.

Sun, Aug 11, 4:17 AM · Restricted Project
Hahnfeld added inline comments to D65870: [OpenMP] Remove 'unnecessary parentheses'.
Sun, Aug 11, 4:09 AM · Restricted Project, Restricted Project
Hahnfeld updated the diff for D65870: [OpenMP] Remove 'unnecessary parentheses'.

Remove another unnecessary cast.

Sun, Aug 11, 4:09 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D63877: Avoid infinite loop with asan interception.

ok, it's probably a regex issue for the test, I'll check that.

Sun, Aug 11, 2:33 AM · Restricted Project, Restricted Project

Sat, Aug 10

Hahnfeld added a comment to D63877: Avoid infinite loop with asan interception.

This test is also failing for me on CentOS 7, Intel x86. Any clues what's wrong here?

Sat, Aug 10, 4:38 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D63877: Avoid infinite loop with asan interception.

This test is also failing for me on CentOS 7, Intel x86. Any clues what's wrong here?

Sat, Aug 10, 2:56 AM · Restricted Project, Restricted Project

Fri, Aug 9

Hahnfeld accepted D65340: [OpenMP][libomptarget] Add support for close map modifier.

LG after addressing the last minor nits (comment + cast + check).

Fri, Aug 9, 1:17 PM · Restricted Project, Restricted Project
Hahnfeld requested changes to D65340: [OpenMP][libomptarget] Add support for close map modifier.

Can you please rebase on top of D66019? Sorry, that'll probably give you some conflicts :-/

Fri, Aug 9, 12:30 PM · Restricted Project, Restricted Project
Hahnfeld committed rG7a0f2dc5a4c0: [libomptarget] Remove duplicate RTLRequiresFlags per device (authored by Hahnfeld).
[libomptarget] Remove duplicate RTLRequiresFlags per device
Fri, Aug 9, 12:22 PM
Hahnfeld committed rL368465: [libomptarget] Remove duplicate RTLRequiresFlags per device.
[libomptarget] Remove duplicate RTLRequiresFlags per device
Fri, Aug 9, 12:20 PM
Hahnfeld closed D66019: [libomptarget] Remove duplicate RTLRequiresFlags per device.
Fri, Aug 9, 12:20 PM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65340: [OpenMP][libomptarget] Add support for close map modifier.

@Hahnfeld
I would like to clarify something about tgt_register_requires call.

This call is special and in a compiler that supports handling of requires clauses (like unified_shared_memory clause) ( i.e. Clang 9.0 onwards), this function is called before any main function code. Here is a printout with LIBOMPTARGET_DEBUG=1 to realize the difference:

===== WHERE COMPILER PLACES CALL TO __tgt_register_requires() ========
Libomptarget --> Loading RTLs... <<<< THIS IS THE VERY FIRST PRINT FROM THE RUNTIME. The call to __tgt_register_requires is just before that.
Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...
Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!
Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory!
Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...
Target CUDA RTL --> Start initializing CUDA
Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!
Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 4 devices!
Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...
Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!
Libomptarget --> RTLs loaded!
Libomptarget --> Image 0x0000000010020080 is compatible with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> RTL 0x000000004747b3e0 has index 0!
Libomptarget --> Registering image 0x0000000010020080 with RTL libomptarget.rtl.ppc64.so!
Libomptarget --> Done registering entries!
===== MANUAL CALL TO __tgt_register_requires() IN MAIN (top of MAIN function!) ========

Manually calling __tgt_register_requires can never actually work correctly because any user space call to this function will be after the binary has been loaded and the flags already decided. The compiler just has to support the handling of the requires clauses.

Fri, Aug 9, 10:57 AM · Restricted Project, Restricted Project
Hahnfeld updated the summary of D66019: [libomptarget] Remove duplicate RTLRequiresFlags per device.
Fri, Aug 9, 10:53 AM · Restricted Project, Restricted Project
Hahnfeld created D66019: [libomptarget] Remove duplicate RTLRequiresFlags per device.
Fri, Aug 9, 10:53 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Fri, Aug 9, 7:34 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Fri, Aug 9, 3:44 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65340: [OpenMP][libomptarget] Add support for close map modifier.
Fri, Aug 9, 2:59 AM · Restricted Project, Restricted Project

Wed, Aug 7

Hahnfeld added inline comments to D65870: [OpenMP] Remove 'unnecessary parentheses'.
Wed, Aug 7, 12:09 PM · Restricted Project, Restricted Project
Hahnfeld accepted D65885: Cleanup unused variable..

Thanks, that's even easier than a pointer to NULL :)

Wed, Aug 7, 11:05 AM · Restricted Project, Restricted Project
Hahnfeld created D65871: [OpenMP] Enable warning about "implicit fallthrough".
Wed, Aug 7, 6:40 AM · Restricted Project, Restricted Project
Hahnfeld created D65870: [OpenMP] Remove 'unnecessary parentheses'.
Wed, Aug 7, 6:40 AM · Restricted Project, Restricted Project
Hahnfeld created D65869: [OMPT] Resolve warnings because of ints in if conditions.
Wed, Aug 7, 6:40 AM · Restricted Project, Restricted Project
Hahnfeld created D65868: [OpenMP] Remove workaround for CMPXCHG.
Wed, Aug 7, 6:40 AM · Restricted Project
Hahnfeld created D65867: [RFC] [OpenMP] Turn on -Wall compiler warnings by default.
Wed, Aug 7, 6:40 AM · Restricted Project, Restricted Project

Tue, Aug 6

Hahnfeld added a comment to D65819: [Driver][Bundler] Improve bundling of object files..

Can you detail what "incorrect linking" means? AFAIK the additional sections were just bloating the executable, but how do they affect correctness?

Tue, Aug 6, 1:57 PM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65340: [OpenMP][libomptarget] Add support for close map modifier.

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Additionally, I'd still like to see a small test with manual calls to __tgt_target_data_begin / _end if possible. I'm thinking of the following:

int *a = // malloc
int *device;

__tgt_target_data_begin(...); // with close modifier for a
#pragma omp target data use_device_ptr(a)
{
  device = a;
}
__tgt_target_data_end(...); // correspondingly to the begin call above

// check...

Instead of the nested target data region, you could also use a simple target region in the outer data region, and have a check for use_device_ptr in another test.

I don't mind adding such a test but calling these functions manually is very tedious and error prone and hard to maintain. Are there any benefits I'm missing to this apart from running this test in isolation?

Tue, Aug 6, 8:11 AM · Restricted Project, Restricted Project
Hahnfeld requested changes to D65340: [OpenMP][libomptarget] Add support for close map modifier.

The test will currently fail with older versions of Clang. It must at least be marked UNSUPPORTED for Clang versions older than what-will-be Clang 10.

Tue, Aug 6, 6:09 AM · Restricted Project, Restricted Project

Sat, Aug 3

Hahnfeld added a comment to D65285: [OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS.

Apologies, but I think my fix for -Wcast-qual is wrong and the code is still broken. Luckily, the latest Intel Compiler still warns about it which forced me to rethink what the pointers try to achieve.

Sat, Aug 3, 4:51 AM · Restricted Project, Restricted Project

Fri, Aug 2

Hahnfeld added inline comments to D65340: [OpenMP][libomptarget] Add support for close map modifier.
Fri, Aug 2, 1:19 PM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65340: [OpenMP][libomptarget] Add support for close map modifier.

I'd propose to mark the test close_modifier.c as unsupported with older compiler versions. Additionally, can you add a small, new test with manual calls to __tgt_target_data_begin / _end without relying on the compiler?

Fri, Aug 2, 2:24 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65340: [OpenMP][libomptarget] Add support for close map modifier.
Fri, Aug 2, 12:10 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.

Oh, but the tests again look add. Please run them through clang-format.

Fri, Aug 2, 12:02 AM · Restricted Project, Restricted Project

Thu, Aug 1

Hahnfeld accepted D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.

I'm fine with the last update.

Thu, Aug 1, 11:59 PM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D64375: [OpenMP][Docs] Provide implementation status details.
Thu, Aug 1, 11:55 PM · Restricted Project
Hahnfeld removed a reviewer for D65341: [OpenMP] Add support for close map modifier in Clang: Hahnfeld.
Thu, Aug 1, 9:38 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65340: [OpenMP][libomptarget] Add support for close map modifier.

I think this still needs some work.

Thu, Aug 1, 9:26 AM · Restricted Project, Restricted Project
Hahnfeld requested changes to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.

There's still no call to __tgt_register_requires in the two tests, so I guess they won't pass with older versions of Clang.

Thu, Aug 1, 9:15 AM · Restricted Project, Restricted Project

Wed, Jul 31

Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Wed, Jul 31, 9:56 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.

@Hahnfeld I cannot replace the usage of the requires pragma with the call to the registration function directly because the registration function is the first function that gets called so if I explicitely invoke it will only be the 2nd call to that function. A subsequent call with a different set of flags will lead to a mismatch in requires clauses error. (first implicit call is without unified_shared_memory and the second call is with unified_shared_memory).

Wed, Jul 31, 12:44 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

You need to declare them yourself in the test. That's not really elegant, but many other runtime tests already do.

Thanks for the tip! I can do that. Since I'm pretty new to the runtime, could you suggest a similar test that I should look into?

Wed, Jul 31, 12:06 AM · Restricted Project, Restricted Project

Tue, Jul 30

Hahnfeld added a comment to D60972: [OpenMP 5.0] libomptarget interface for declare mapper functions.

Since the mapper is not really implemented in this patch, if I add a test, it will be something like below:

__tgt_push_mapper_component(h, base0, begin0, size0, type0);
__tgt_push_mapper_component(h, base1, begin1, size1, type1);
auto total_size = __tgt_mapper_num_components(h);
printf("size=%d", total_size);
// CHECK: size=2

It seems to me this test is not meaningful. I can add a more meaningful test after all mapper patches are upstreamed.
Do you think we need a meaningless test like this now?

Yes, it's good to have a test, even a very elementary one. When full support for declare mapper is upstreamed we can revisit the test and extend it to check real-use scenarios.

I just realized that these functions (__tgt_push_mapper_component and __tgt_mapper_num_components) are not exposed to users (i.e., defined in omp.h), so the test proposed above is not possible.

I cannot imagine what test it can have for this patch now, so I think we can leave this patch testless. If you have an idea of test, please let me know.

Tue, Jul 30, 1:32 PM · Restricted Project, Restricted Project
Hahnfeld committed rG52b87ac32f57: [OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS (authored by Hahnfeld).
[OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS
Tue, Jul 30, 11:39 AM
Hahnfeld committed rL367343: [OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS.
[OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS
Tue, Jul 30, 11:36 AM
Hahnfeld closed D65285: [OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS.
Tue, Jul 30, 11:36 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Tue, Jul 30, 9:54 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Tue, Jul 30, 12:01 AM · Restricted Project, Restricted Project

Mon, Jul 29

Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Mon, Jul 29, 12:23 PM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Mon, Jul 29, 11:54 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Mon, Jul 29, 11:23 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Mon, Jul 29, 8:48 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Mon, Jul 29, 8:26 AM · Restricted Project, Restricted Project
Hahnfeld requested changes to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.

Oh, and there's still no test for use_device_ptr

Mon, Jul 29, 7:36 AM · Restricted Project, Restricted Project
Hahnfeld added inline comments to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.
Mon, Jul 29, 7:36 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65340: [OpenMP][libomptarget] Add support for close map modifier.

I think this mostly looks good, but depends on D65001, right?

Mon, Jul 29, 7:36 AM · Restricted Project, Restricted Project
Hahnfeld requested changes to D65341: [OpenMP] Add support for close map modifier in Clang.

There's already D55892 with a better set of tests, including target enter data / target exit data.

Mon, Jul 29, 7:36 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65285: [OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS.

I will contact ittnotify owners just in case. So that we might have lesser burden with possible future updates of this third-party code.

Mon, Jul 29, 7:36 AM · Restricted Project, Restricted Project

Thu, Jul 25

Hahnfeld committed rGbaeab1fc442e: [OpenMP] Fix build of stubs library, NFC. (authored by Hahnfeld).
[OpenMP] Fix build of stubs library, NFC.
Thu, Jul 25, 10:53 AM
Hahnfeld committed rL367041: [OpenMP] Fix build of stubs library, NFC..
[OpenMP] Fix build of stubs library, NFC.
Thu, Jul 25, 10:53 AM
Hahnfeld closed D65284: [OpenMP] Fix build of stubs library.
Thu, Jul 25, 10:52 AM · Restricted Project, Restricted Project
Hahnfeld added a comment to D65284: [OpenMP] Fix build of stubs library.

LGTM (I'd treat this as NFC :).

Thu, Jul 25, 10:51 AM · Restricted Project, Restricted Project
Hahnfeld created D65285: [OpenMP] Rename last file to cpp and remove LIBOMP_CFLAGS.
Thu, Jul 25, 8:55 AM · Restricted Project, Restricted Project
Hahnfeld created D65284: [OpenMP] Fix build of stubs library.
Thu, Jul 25, 8:55 AM · Restricted Project, Restricted Project
Hahnfeld committed rG2488ae9df155: [OpenMP] RISCV64 port (authored by Hahnfeld).
[OpenMP] RISCV64 port
Thu, Jul 25, 7:38 AM
Hahnfeld committed rL367021: [OpenMP] RISCV64 port.
[OpenMP] RISCV64 port
Thu, Jul 25, 7:38 AM
Hahnfeld closed D59880: [OpenMP] RISCV64 port.
Thu, Jul 25, 7:38 AM · Restricted Project
Hahnfeld added a comment to D59880: [OpenMP] RISCV64 port.

I've tested this patch with current upstream LLVM and Clang on a HiFive Unleashed board and all tests are passing, including OMPT. I don't have commit access yet, @Hahnfeld would you mind commiting this for me? Thank you!

Thu, Jul 25, 7:38 AM · Restricted Project
Hahnfeld added a comment to D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps.

@Hahnfeld I added several tests. Because these tests require unified memory to be supported by the underlying system I have added them as a new type of test: check-libomptarget-nvptx-unified (in addition to the check-libomptarget-nvptx one). These tests should only be run on platforms which support unified memory.

Thu, Jul 25, 6:32 AM · Restricted Project, Restricted Project

Jul 23 2019

Hahnfeld added a comment to D65112: [OPENMP][NVPTX]Make the test compatible with CUDA9+, NFC..

But simple removal does not help, actually. It still might produce incorrect results. When you removed the barrier, you introduced implicit threads divergence. Since cuda9+ threads are not executed in lock-step manner anymore (see https://devblogs.nvidia.com/using-cuda-warp-level-primitives/). It leads to the fact that the result you get is not stable and not guaranteed to be reproduced on other platforms by other users.
The runtime relies on the warp-synchronous model and threads in the warp after the critical region must be synchronized. It means, that we should not use barriers here but still need to synchronize threads within the warp. To synchronize the threads we must use __syncwarp(mask) function instead.
Currently, it is pure coincidence that the test passes. It happens just because the second parallel region requires much time to start and execute its body and the threads in the else branch have time to execute their code. But it is not guaranteed in Cuda9+.
To reveal this problem, just enclose the code in else branch (Count += omp_get_level(); // 6 * 1 = 6) under control of another #pragma omp critical.

#pragma omp critical
Count += omp_get_level(); // 6 * 1 = 6

It must produce the same result as before but it won't, most probably.

I still get the correct results. Do you have a test that you know to fail?

I get Expected count = 67 with the critical section in this test. It is on Power9 with Cuda9. Did you try to compile it at O3?

At first I didn't, but now the original test case with added critical in the else branch works with full optimization iff I completely remove the specialization CGOpenMPRuntimeNVPTX::emitCriticalRegion.

But again, it just masks the real problem but does not solve it. It is again just a pure coincidence that it returns the expected result.

for (int I = 0; I < 32; ++I) {
 if (omp_get_thread_num() == I) {
   #pragma omp critical
   Count += omp_get_level(); // 6 * 1 = 6
  }
}

Again, it will fail though it must return correct result.

Jul 23 2019, 12:29 PM · Restricted Project
Hahnfeld added a comment to D65112: [OPENMP][NVPTX]Make the test compatible with CUDA9+, NFC..

I can reproduce that this test hangs on our Volta GPUs and I debugged it briefly: The problem seems to be how Clang emits critical regions, more precisely that the generated code introduces a barrier. Effectively, this assumes that all threads pass via all critical regions the same number of times, making it a worksharing construct. Obviously, this is not true for the current code, only 10 threads are needed to execute the J loop and all other threads wait at the end of the kernel. If I manually remove the barrier(s) from the generated code, the executable finishes and prints the correct results.

Yep, this is what I'm going to fix in my next patches.

I think we should fix this first instead of relaxing a test that fails for something that is easy to fix.

But simple removal does not help, actually. It still might produce incorrect results. When you removed the barrier, you introduced implicit threads divergence. Since cuda9+ threads are not executed in lock-step manner anymore (see https://devblogs.nvidia.com/using-cuda-warp-level-primitives/). It leads to the fact that the result you get is not stable and not guaranteed to be reproduced on other platforms by other users.
The runtime relies on the warp-synchronous model and threads in the warp after the critical region must be synchronized. It means, that we should not use barriers here but still need to synchronize threads within the warp. To synchronize the threads we must use __syncwarp(mask) function instead.
Currently, it is pure coincidence that the test passes. It happens just because the second parallel region requires much time to start and execute its body and the threads in the else branch have time to execute their code. But it is not guaranteed in Cuda9+.
To reveal this problem, just enclose the code in else branch (Count += omp_get_level(); // 6 * 1 = 6) under control of another #pragma omp critical.

#pragma omp critical
Count += omp_get_level(); // 6 * 1 = 6

It must produce the same result as before but it won't, most probably.

I still get the correct results. Do you have a test that you know to fail?

I get Expected count = 67 with the critical section in this test. It is on Power9 with Cuda9. Did you try to compile it at O3?

Jul 23 2019, 11:11 AM · Restricted Project