User Details
- User Since
- Feb 16 2016, 1:22 PM (257 w, 2 d)
Tue, Jan 12
Also, since D94565 is merged now, this patch needs rebasing :)
One question is, why do we need two libraries? The only difference is, the static library contains omp_data.cu and the bitcode library doesn't. It's unclear why they were implemented in this way
Tue, Jan 5
Slight optimization - removed condition from the hot path.
Mon, Jan 4
Thanks for the changes! LGTM.
Mon, Dec 28
Wed, Dec 23
Nov 17 2020
Nov 6 2020
LGTM. Thanks!
Nov 5 2020
The link in the description for the clang patch is outdated (that patch has been abandoned), can you replace it with https://reviews.llvm.org/D84192? Thanks!
Oct 20 2020
Can we now abandon D75581?
Oct 16 2020
Oct 12 2020
Oct 5 2020
I just revisited this patch. It seems it's based on a very early implementation of the base memory manager patch (D81054). Can you rebase the patch so that we review it? Thanks!
Sep 29 2020
I think these API functions should also include the source location pointer from https://reviews.llvm.org/D87946. We need to consider renaming the *_issue and *_wait functions to extend the *_loc API the aforementioned patch is introducing. E.g. after D87946 the "current" data begin API function will be __tgt_target_data_begin_mapper_loc, so this patch should extend that name as __tgt_target_data_begin_mapper_loc_issue and __tgt_target_data_begin_mapper_loc_wait. Because both patches make changes to the API, I think it's better to wait until the former patch has been committed.
Sep 25 2020
LGTM as well.
Sep 23 2020
Sep 21 2020
Sep 18 2020
Sep 1 2020
Minor comments (about typos) to be taken into account in case of a future patch.
Aug 28 2020
Looks good!
Aug 17 2020
Aug 6 2020
Nice catch! Looks good.
Aug 5 2020
LGTM. Thanks!
Looks good.
Aug 4 2020
OK, now it makes sense. LGTM
I suppose the same must be applied to all cases where present is used. Without a modification like this, present just confirms that the starting address of the object is already mapped, although a size may be specified in the map clause.
#pragma omp target map(present: p[0:10]) // present will succeed even if only p[0] is mapped and p[1]-p[9] are not
So the question here is: should present apply to the size as well as the begin address? I would say yes.
So does the mapper function emit entries in reverse order upon exiting a target/target data region?
Aug 3 2020
Jul 29 2020
Jul 27 2020
Thanks, it makes sense. Can you submit a patch?
Any idea why this change broke some tests? And how does --crash fix the problem?
This looks much better now. I don't have any other comments. Since this patch is now essentially a clang-only patch, I'll let @ABataev accept it or post comments.
(Earlier I posted here an answer for another patch). Looks good, obviously it must be committed after the clang patch.
Jul 24 2020
After this patch was committed, I tried to run the following example:
#include <stdio.h>
So let's proceed with the patch.
Jul 23 2020
@lildmh: I think I've found a bug. I used declare_mapper_target.cpp. When we call the mapper function, it generates 3 components. The first 2 are identical and correspond to the parent struct. This is what MapperComponents looks like inside function target_data_mapper:
(gdb) print MapperComponents $1 = {Components = std::vector of length 3, capacity 4 = {{Base = 0x7fffffffb598, Begin = 0x7fffffffb598, Size = 8, Type = 32}, {Base = 0x7fffffffb598, Begin = 0x7fffffffb598, Size = 8, Type = 32}, {Base = 0x7fffffffb598, Begin = 0x62efd0, Size = 4096, Type = 562949953421331}}}
Mapping the parent struct twice is problematic. If we have more struct members and some of them are NOT pointers, then upon target_data_end libomptarget will check the parent struct's reference counter to determine whether the scalar member must be copied back to the host. If the reference counter is greater than 1, then the runtime will skip copying back the scalar. Mapping the parent struct two times in a row results in RefCount=2.
I tried the patch and indeed it fixes the problem with target_depend_nowait.cpp. I'll let someone else do the review though.
So is the test case that motivated this patch illegal OpenMP code?
#pragma omp target enter data map(alloc:i) #pragma omp target data map(present, alloc: i) { #pragma omp target exit data map(delete:i) // you cannot delete that object in the scope, illegal code? } // fails presence check here
What confuses me about this interpretation of the standard is the inconsistency at data exit. So if we have an explicit omp target exit data map(present...) then we should respect the "present" semantics, whereas when we have a scoped data exit:
#pragma omp target data map(present,...) { ... } // implicit "exit data" here
then "present" should be ignored.
Jul 22 2020
@lildmh I've got a question unrelated to the problem we are discussing here. I ran declare_mapper_target.cpp and when libomptarget calls the mapper function it prints the following:
Libomptarget --> __tgt_push_mapper_component(Handle=0x00007ffcebd0fb48) adds an entry (Base=0x00007ffcebd101e0, Begin=0x00007ffcebd101e0, Size=8, Type=0x20). Libomptarget --> __tgt_push_mapper_component(Handle=0x00007ffcebd0fb48) adds an entry (Base=0x00007ffcebd101e0, Begin=0x000000000231bfe0, Size=4096, Type=0x2000000000013)
Why is the second entry's MEMBER_OF field set to 2? It should be MEMBER_OF 1, since the pointer-pointee pair c.a[0:N] is part of struct c which is the first entry on the list.
OK, I suspect there is a race condition involving the CUDA plugin. If I compile the test on x86_64-pc-linux-gnu then I always get the correct result, no matter whether we print debug output or not.
My system was messed up and used libraries and compilers from different builds. Please ignore my previous message.
I tried to run declare_mapper_target.cpp on a Nvidia GPU. The problem occurs while loading the device image:
Target CUDA RTL --> Error returned from cuModuleLoadDataEx Target CUDA RTL --> CUDA error is: device kernel image is invalid
Sounds like a clang problem, I don't see why libomptarget could be the culprit here.
Sure, go ahead!
Jul 21 2020
OK, let's proceed with this version now and later on we can fine-tune the behavior.
Jul 20 2020
I'm having a look. The problem actually occurs after D67833 (the clang patch), not D68100. If you see at the backtrace in gdb, control never reaches a libomptarget entry point, things fail once we enter the outlined function for the parallel region via __kmp_invoke_microtask. I doubt it's a libomptarget issue.
Jul 17 2020
Jul 16 2020
abort() does not flush stdout, but here we are printing to stderr which is not buffered so there is no need to flush.
I can confirm the warnings are gone with this fix.
Jul 15 2020
OK, now it works. Thanks!
I tried to build clang with this patch and I get errors like:
CGOpenMPRuntime.cpp:9463:38: error: ‘OMPRTL___tgt_target_teams_nowait_mapper’ was not declared in this scope ? OMPRTL___tgt_target_teams_nowait_mapper
Jul 14 2020
No, it's not needed anymore. This patch bypasses the need to do that refactoring. Can you please abandon that revision?
Ping. If the patch lands toady or tomorrow, then we will meet the clang-11 deadline and include support for declare mapper.
Jul 9 2020
The patch looks good now, thanks for all the work!
Jul 8 2020
Jul 7 2020
Jul 6 2020
I tried to address our previous complaints about code duplication and came up with a scheme which results in a much shorter and cleaner diff with virtually no code duplication. Instead of refactoring code from taget_data_begin/end/update, I introduced a new internal function target_data_mapper which generates new arrays args_base, args, arg_sizes and arg_types for the custom mapper and calls target_data_begin/end/update again using the new arguments.
Looks good. Do we know why the test doesn't run without -fPIC?
Jul 2 2020
Like D82224, looks good.
Jul 1 2020
Jun 23 2020
Looks good now. Thanks!