The implicitly generated mappings for allocation/deallocation in mappers
runtime should be mapped as implicit, also no need to clear member_of
flag to avoid ref counter increment. Also, the ref counter should not be
incremented for the very first element that comes from the mapper
function.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Can we use the reproducer from the bug report, I want an outermost array section with objects that have a declare mapper.
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers_array.cpp | ||
---|---|---|
8 | Thanks for the fixes, Alexey. Are you planning on handling this case separately? |
Sure, will add.
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers_array.cpp | ||
---|---|---|
8 | Yes |
I tested the patch and still get wrong results (I modified the declare_mapper_nested_default_mappers_complex_structure.cpp test to use my verbose printf from bugzilla):
Modified Data Hierarchy: Object C (0x623e50, 2) Contents: Object B (0x623e80, 2) Contents: Object A (0x6243f0) Contents: data1 = 6439680 data2 = 0 Object A (0x6243f8) Contents: data1 = -784902216 data2 = 11062 Object B (0x623e90, 2) Contents: Object A (0x6249e0) Contents: data1 = 6441856 data2 = 0 Object A (0x6249e8) Contents: data1 = 6439904 data2 = 0 Object C (0x623e60, 2) Contents: Object B (0x623ef0, 2) Contents: Object A (0x624b90) Contents: data1 = 6438736 data2 = 0 Object A (0x624b98) Contents: data1 = 6441344 data2 = 0 Object B (0x623f00, 2) Contents: Object A (0x624c60) Contents: data1 = 6444032 data2 = 0 Object A (0x624c68) Contents: data1 = 6441856 data2 = 0
The values inside the target region look good, but things break when mapping back.
When I remove the teams distribute for and only leave #pragma omp target, I get a different result and the test mistakenly passes:
Modified Data Hierarchy: Object C (0x622e40, 2) Contents: Object B (0x622e70, 2) Contents: Object A (0x6233e0) Contents: data1 = 6433120 data2 = 0 Object A (0x6233e8) Contents: data1 = 11 data2 = 22 Object B (0x622e80, 2) Contents: Object A (0x6239d0) Contents: data1 = 6435792 data2 = 0 Object A (0x6239d8) Contents: data1 = 11 data2 = 22 Object C (0x622e50, 2) Contents: Object B (0x622ee0, 2) Contents: Object A (0x623b80) Contents: data1 = 6437312 data2 = 0 Object A (0x623b88) Contents: data1 = 11 data2 = 22 Object B (0x622ef0, 2) Contents: Object A (0x623c50) Contents: data1 = 6437744 data2 = 0 Object A (0x623c58) Contents: data1 = 11 data2 = 22 Testing for correctness... outer[1].arr[1].arr[1].data2 = 22.
I'm currently only testing x86_64 offloading.
The problem is in transferring from the device. Looks like we need not only to invert the list of data in one data transfer block but also invert the list of blocks. Trying to implement this.
The latest changes fix the issue. I tested with x86_64 and nvidia offloading.
The behavior and tests looks good to me, just one comment.
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers_complex_structure.cpp | ||
---|---|---|
124–127 | Do these asserts make sense with the assert in line 104? |
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers_complex_structure.cpp | ||
---|---|---|
124–127 | Ah, just forgot to remove these lines, thanks! |
Hmm, this app does not use mappers. Are you sure that this commit is the actual cause of the problem? Also, could you provide detailed debug log for the investigation (with LIBOMPTARGET_DEBUG set)?
I did debug something else and the clang-omp branch failed until I reverted this patch, at least that is what I thought. I will look again as soon as I find the time. For sure it run fine until very recently and we should make sure it continues to do so ;)
Definitely! That's why I'm asking for the debug log for the investigation. It requires CUDA but I don't have an access to the machine with cuda.
openmp/libomptarget/test/mapping/declare_mapper_nested_default_mappers_array.cpp | ||
---|---|---|
55 | Yes, looks so, missed it. |
Thanks for the fixes, Alexey. Are you planning on handling this case separately?