Page MenuHomePhabricator

protze.joachim (Joachim Protze)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 24 2015, 4:45 AM (299 w, 4 d)

Recent Activity

Today

protze.joachim updated the summary of D104633: [OpenMP][OMPT] Fix compile-time assertion in multiplex.h.
Mon, Jun 21, 3:55 AM · Restricted Project
protze.joachim requested review of D104633: [OpenMP][OMPT] Fix compile-time assertion in multiplex.h.
Mon, Jun 21, 3:53 AM · Restricted Project

Thu, Jun 17

protze.joachim added a comment to D104418: [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement.

Correct, but in the body of the task, BlockA[0] and BlockB[0] are only read. It's valid to mark them as in only.

Thu, Jun 17, 1:15 PM · Restricted Project
protze.joachim added a comment to D104418: [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement.

So is the race this is trying to fix between the host and the target, or between regions running on the target?

Passing data back to the host suggests the former, but that doesn't make sense, because the async operations are queued up within an async object that is not shared between host threads (and has a lexically scoped lifetime, iirc).

If it's a race between different target regions on the device, we should be able to resolve that with no host involvement at all.

It's a data race between host and plugin. https://bugs.llvm.org/show_bug.cgi?id=49940 has detailed analysis. Long story short, let's say we have two concurrent target regions T1 and T2 both reading memory region M. Assume T1 is scheduled to execute first. It then maps M to the device. The mapping consists of several steps:
0. Get the lock of mapping table.

  1. Look up into the mapping table to see if M is already mapped. - No.
  2. Allocate device memory for M and create a mapping entry for M.
  3. Issue data transfer.
  4. Release the lock.
  5. Do remaining things, such as more data transfer, or directly launch the kernel.

Everything seems fine. Now T2 starts to execute. It also maps M to the device. However, steps are slightly different.
0. Get the lock of mapping table.

  1. Look up into the mapping table to see if M is already mapped. - Yes.
  2. Since the mapping is already there, T2 will NOT issue data transfer. This can improve performance. It doesn't make sense to transfer same data for several times.
  3. Do remaining things, like Step 5 above.

It also looks good for this logic. However, there is a data race. Since we now have async data movement, Step 3 for T1 can only guarantee that movement of M is issued, but it cannot guarantee that after Step 3, the data is already on the device. For T2, its logic is, if the mapping is already in the table, it "assumes" that the data is already on the device, and therefore it doesn't issue data transfer. And then, bang! For the case that T2 is scheduled to execute by the GPU scheduler ahead of the data movement of M, the memory region on the device contains nothing from host. It is random.

So in one sentence, the root cause is the misalignment between assumption of mapping lookup logic and non-atomic data movement operations. It is not a problem for targets not supporting async data movement because everything can be guarded by the lock. Also, for those targets supporting async data movement, we don't want to use synchronous data movement here as well. The reason is obvious, for better performance. Data movement is extremely expensive, and multiple data movements can be potentially parallel, even for one single target region.

What I'm trying to do in this patch is to establish the dependency between the data movement and all other tasks using that data. We want to utilize the device side synchronization to do that by using event. The interesting thing for the event is, from host's perspective, "wait for an event" means just insert the event to the queue, and that's it. It is not like a barrier because it will not block the host. But yes, it is a barrier for the device scheduler for sure that all enqueued operations following the event can only start execution if the event is fulfilled.

Thu, Jun 17, 8:34 AM · Restricted Project
protze.joachim added a comment to D104418: [OpenMP][Offloading] Fixed data race in libomptarget caused by async data movement.

I must admit that I have no idea about the internals of libomptarget. But here is my high-level view on this issue:

Thu, Jun 17, 1:05 AM · Restricted Project

Wed, Jun 16

protze.joachim committed rGd2a7871b5e6a: [OpenMP][NFC] Add back suppression of warning (authored by protze.joachim).
[OpenMP][NFC] Add back suppression of warning
Wed, Jun 16, 1:16 AM
protze.joachim committed rGcff215565e93: [OpenMP] Remove unused variables from libomp code (authored by protze.joachim).
[OpenMP] Remove unused variables from libomp code
Wed, Jun 16, 12:37 AM
protze.joachim closed D104303: [OpenMP] Remove unused variables from libomp code.
Wed, Jun 16, 12:37 AM · Restricted Project

Tue, Jun 15

protze.joachim added a comment to D104303: [OpenMP] Remove unused variables from libomp code.

Were these seen from gcc warnings or a static analysis tool?

Tue, Jun 15, 2:17 PM · Restricted Project
protze.joachim requested review of D104303: [OpenMP] Remove unused variables from libomp code.
Tue, Jun 15, 8:20 AM · Restricted Project
protze.joachim added inline comments to D100997: [OpenMP] Refactor/Rework topology discovery code.
Tue, Jun 15, 7:52 AM · Restricted Project

Mon, Jun 14

protze.joachim added a comment to D102229: [libomptarget][nfc] Add hook to easily disable building amdgcn bclib.

It seems like a change in this patch introduced a dependency issue for fresh builds.

Mon, Jun 14, 1:28 PM · Restricted Project

Wed, Jun 9

protze.joachim committed rG639b3979310d: [OpenMP][Tools] Fix Archer handling of task dependencies (authored by protze.joachim).
[OpenMP][Tools] Fix Archer handling of task dependencies
Wed, Jun 9, 4:39 AM
protze.joachim committed rG08d8f1a958bd: [OpenMP][Tools] Cleanup memory pool used in Archer (authored by protze.joachim).
[OpenMP][Tools] Cleanup memory pool used in Archer
Wed, Jun 9, 4:39 AM
protze.joachim committed rG82e4e505315b: [OpenMP][Tools] Fix Archer for MACOS (authored by protze.joachim).
[OpenMP][Tools] Fix Archer for MACOS
Wed, Jun 9, 4:38 AM
protze.joachim closed D103608: [OpenMP][Tools] Fix Archer handling of task dependencies.
Wed, Jun 9, 4:38 AM · Restricted Project
protze.joachim closed D103606: [OpenMP][Tools] Cleanup memory pool used in Archer.
Wed, Jun 9, 4:38 AM · Restricted Project
protze.joachim closed D103607: [OpenMP][Tools] Fix Archer for MACOS.
Wed, Jun 9, 4:38 AM · Restricted Project
protze.joachim added inline comments to D103606: [OpenMP][Tools] Cleanup memory pool used in Archer.
Wed, Jun 9, 1:54 AM · Restricted Project
protze.joachim updated the diff for D103606: [OpenMP][Tools] Cleanup memory pool used in Archer.

Making DataPoolEntry non-virtual

Wed, Jun 9, 1:53 AM · Restricted Project

Tue, Jun 8

protze.joachim added inline comments to D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type.
Tue, Jun 8, 7:50 AM · Restricted Project, Restricted Project
protze.joachim added a comment to D103885: [clang] Suppress warnings for tautological comparison in generated macro code.

I don't think, this needs ifdefs. As I understand, a compiler is supposed to ignore unknown pragmas.
I fully agree that the suppression will not be compatible with other compilers.

Tue, Jun 8, 5:26 AM · Restricted Project, Restricted Project, Restricted Project
protze.joachim requested review of D103885: [clang] Suppress warnings for tautological comparison in generated macro code.
Tue, Jun 8, 4:05 AM · Restricted Project, Restricted Project, Restricted Project

Mon, Jun 7

protze.joachim added inline comments to D103607: [OpenMP][Tools] Fix Archer for MACOS.
Mon, Jun 7, 3:46 PM · Restricted Project
protze.joachim updated the diff for D103608: [OpenMP][Tools] Fix Archer handling of task dependencies.

Applied a more recent version of clang-format

Mon, Jun 7, 3:22 PM · Restricted Project
protze.joachim updated the diff for D103606: [OpenMP][Tools] Cleanup memory pool used in Archer.

Renamed class as suggested. Also applied a more recent version of clang-format.

Mon, Jun 7, 3:21 PM · Restricted Project
protze.joachim added inline comments to D103608: [OpenMP][Tools] Fix Archer handling of task dependencies.
Mon, Jun 7, 3:10 PM · Restricted Project
protze.joachim updated the diff for D103608: [OpenMP][Tools] Fix Archer handling of task dependencies.

Implement @Hahnfeld 's suggested changes

Mon, Jun 7, 3:10 PM · Restricted Project

Sun, Jun 6

protze.joachim added a comment to D103767: [OpenMP] Remove TSan annotations from libomp.

"libarcher.so, which is automatically loaded if an OpenMP application is built with TSan." Is this valid with gcc + libomp combination?

Sun, Jun 6, 11:43 AM · Restricted Project
protze.joachim requested review of D103767: [OpenMP] Remove TSan annotations from libomp.
Sun, Jun 6, 9:24 AM · Restricted Project
protze.joachim accepted D103711: [OpenMP] [host runtime] add .clang-tidy file.

I think, this reflects what we discussed in the OpenMP in LLVM meeting. Thanks for working on this!

Sun, Jun 6, 9:17 AM · Restricted Project
protze.joachim added inline comments to D103607: [OpenMP][Tools] Fix Archer for MACOS.
Sun, Jun 6, 9:14 AM · Restricted Project

Thu, Jun 3

protze.joachim requested review of D103608: [OpenMP][Tools] Fix Archer handling of task dependencies.
Thu, Jun 3, 3:54 AM · Restricted Project
protze.joachim requested review of D103607: [OpenMP][Tools] Fix Archer for MACOS.
Thu, Jun 3, 3:29 AM · Restricted Project
protze.joachim updated the summary of D103606: [OpenMP][Tools] Cleanup memory pool used in Archer.
Thu, Jun 3, 3:20 AM · Restricted Project
protze.joachim requested review of D103606: [OpenMP][Tools] Cleanup memory pool used in Archer.
Thu, Jun 3, 3:18 AM · Restricted Project

Wed, Jun 2

protze.joachim added a reviewer for D101082: [OpenMP] Fix deadlock for detachable task with child tasks: adurang.
Wed, Jun 2, 8:38 AM · Restricted Project
protze.joachim accepted D103464: [OpenMP] Use new task type/flag for taskwait depend events.

LGTM. Thanks!

Wed, Jun 2, 7:36 AM · Restricted Project

Tue, May 25

protze.joachim added inline comments to D100181: [OpenMP] [OMPD] [1/6] Implementation of OMPD debugging library - libompd. Code changes in openmp/runtime to support libompd..
Tue, May 25, 3:46 PM · Restricted Project

May 19 2021

protze.joachim added inline comments to D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type.
May 19 2021, 8:35 AM · Restricted Project, Restricted Project
protze.joachim updated the diff for D101082: [OpenMP] Fix deadlock for detachable task with child tasks.

The initial fix broke in some cases. I missed the difference between the incomplete and allocated task counters.
This updated patch should fix the issue, but I still run into starvation issues. I'll attach another reproducer in the bug.

May 19 2021, 8:32 AM · Restricted Project

May 18 2021

protze.joachim added inline comments to D97085: [OpenMP] libomp: implement OpenMP 5.1 inoutset task dependence type.
May 18 2021, 4:52 AM · Restricted Project, Restricted Project

May 12 2021

protze.joachim added inline comments to D102230: [libomptarget][amdgpu][nfc] Expand errorcheck macros.
May 12 2021, 7:05 AM · Restricted Project

May 11 2021

protze.joachim added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

We are getting build errors internally with this change. They are all related to libomptarget. Our internal build script uses gcc to build llvm.

One example:

[108/122] Generating target_impl.gfx700.bc
FAILED: projects/openmp/libomptarget/deviceRTLs/amdgcn/target_impl.gfx700.bc
cd /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/projects/openmp/libomptarget/deviceRTLs/amdgcn && /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/bin/clang-13 -xc++ -c -std=c++14 -ffreestanding -target amdgcn-amd-amdhsa -emit-llvm -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__AMDGCN__ -Xclang -target-cpu -Xclang gfx700 -fvisibility=default -Wno-unused-value -nogpulib -O2 -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/common/include -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs /home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip -o target_impl.gfx700.bc
/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:185:25: error: use of undeclared identifier '__UINT64_C'
  lo = (uint32_t)(val & UINT64_C(0x00000000FFFFFFFF));
                        ^
/home/wangwei/fbsource/fbcode/third-party2/gcc/9.x/centos7-native/3bed279/lib/gcc/x86_64-redhat-linux-gnu/9.x/include/stdint-gcc.h:254:21: note: expanded from macro 'UINT64_C'
#define UINT64_C(c) __UINT64_C(c)
                    ^
/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:186:26: error: use of undeclared identifier '__UINT64_C'
  hi = (uint32_t)((val & UINT64_C(0xFFFFFFFF00000000)) >> 32);
                         ^
/home/wangwei/fbsource/fbcode/third-party2/gcc/9.x/centos7-native/3bed279/lib/gcc/x86_64-redhat-linux-gnu/9.x/include/stdint-gcc.h:254:21: note: expanded from macro 'UINT64_C'
#define UINT64_C(c) __UINT64_C(c)
                    ^
2 errors generated.
May 11 2021, 2:12 AM · Restricted Project
protze.joachim added inline comments to D100181: [OpenMP] [OMPD] [1/6] Implementation of OMPD debugging library - libompd. Code changes in openmp/runtime to support libompd..
May 11 2021, 1:46 AM · Restricted Project
protze.joachim added a comment to D101882: [OpenMP] Fix hidden helper + affinity assignment.

Do we set any affinity for the hidden helper threads, or are they free floating?

The hidden helpers do get their affinity set as if they were normal worker threads.

e.g., with this patch and if KMP_AFFINITY=compact (and two hardware threads per core), then
regular gtid 0 is pinned to the first core
regular gtid 9 is pinned to the first core
regular gtid 10 is pinned to the second core
regular gtid 11 is pinned to the second core
regular gtid 12 is pinned to the third core
...
hidden helper gitd 1 is pinned to the first core
hidden helper gtid 2 is pinned to the second core
hidden helper gtid 3 is pinned to the second core
hidden helper gtid 4 is pinned to the third core
...
hidden helper gtid 8 is pinned to the fifth core

Is there a consensus on if we want them free-floating or not? I assume we do, but want to make sure.

May 11 2021, 12:52 AM · Restricted Project
protze.joachim accepted D101498: [OpenMP] Test unified shared memory tests only on systems that support it..

Ok, let's see what people with different test systems report.

May 11 2021, 12:43 AM · Restricted Project

May 7 2021

protze.joachim added inline comments to D101960: [openmp] Drop requirement on library path environment variables.
May 7 2021, 10:22 AM · Restricted Project, Restricted Project
protze.joachim accepted D102054: [AMDGPU][OpenMP] Disable tests when amdgpu-arch fails.

lgtm as well.

May 7 2021, 9:45 AM · Restricted Project

May 6 2021

protze.joachim added a comment to D102017: [libomptarget] Disable test bug49334 on amdgpu.

You can just use UNSUPPORTED.

May 6 2021, 3:46 PM · Restricted Project
protze.joachim added a comment to D101082: [OpenMP] Fix deadlock for detachable task with child tasks.

I still have issues with this patch and I think, it's not the right solution. I wonder, why __kmp_bottom_half_finish_proxy is necessary at all - see the inline comment.
If it's necessary, I tend to change the "imaginary children" code to td_incomplete_child_tasks |= 0x4000000 and td_incomplete_child_tasks &= !0x4000000. Then wait while (td_incomplete_child_tasks & 0x4000000) != 0

May 6 2021, 8:55 AM · Restricted Project
protze.joachim added a comment to rGc9ae3c5e1079: [openmp] Disable tests flaky on Debian.

I'm not convinced that disabling the tests is a good idea. While it might not be optimal that they don't trigger reliably, their failing shows that something is fundamentally broken in the runtime.

May 6 2021, 7:11 AM

May 5 2021

protze.joachim added a comment to D99656: [AMDGPU][OpenMP] Enable Libomptarget runtime tests.

Where's the logic to skip running these tests if ./amdgpu-arch fails?

May 5 2021, 11:25 AM · Restricted Project
protze.joachim updated subscribers of D101882: [OpenMP] Fix hidden helper + affinity assignment.

@ronlieb Thanks for testing! When I saw this patch, I thought that broken thread-binding might possibly cause the observed performance issue.

May 5 2021, 5:49 AM · Restricted Project
protze.joachim added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

For me, the main use-case for stand-alone builds is runtime code development. In that case, I use a ("nightly") build of LLVM and use that to compile and test stand-alone runtime builds.
Switching branches regularly results in recompiling a lot of code. With stand-alone runtime builds, this is restricted to OpenMP runtime code.

I don't really get why you would not rebase the runtime you are developing instead but as long as this doesn't block the patch it does not matter much.

May 5 2021, 5:13 AM · Restricted Project
protze.joachim updated subscribers of D101882: [OpenMP] Fix hidden helper + affinity assignment.

Do we set any affinity for the hidden helper threads, or are they free floating?

May 5 2021, 1:33 AM · Restricted Project

May 4 2021

protze.joachim added a comment to D96893: [OpenMP] libomp minor cleanup.

I temporarily disabled assertion until the bug with dependences is fixed.

May 4 2021, 7:38 AM · Restricted Project

Apr 30 2021

protze.joachim added a comment to D101498: [OpenMP] Test unified shared memory tests only on systems that support it..

The logic to identify whether certain features are supported for testing is in openmp/cmake/OpenMPTesting.cmake.

The file does not contain target-specific features, but "Set up testing infrastructure" (According to the comment in the top-level CMakeLists.txt). At the point when it is invoked, libomptarget has not been processed yet, making it impossible to determine target-dependent features.

Apr 30 2021, 7:14 AM · Restricted Project
protze.joachim added inline comments to D101265: [OpenMP][CMake] Use in-project clang as CUDA->IR compiler..
Apr 30 2021, 6:48 AM · Restricted Project, Restricted Project
protze.joachim accepted D99656: [AMDGPU][OpenMP] Enable Libomptarget runtime tests.

Thanks!
lgtm

Apr 30 2021, 6:33 AM · Restricted Project

Apr 29 2021

protze.joachim added a comment to D101326: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets (2/2).

https://reviews.llvm.org/D101498 will fix this by allowing to distinguish
platforms supporting unified shared memory from platforms that don't.
Do you think, that looking at the gpu generation is sufficient, or does the
host architecture matter as well?

Apr 29 2021, 12:48 PM · Restricted Project
protze.joachim added a comment to D101265: [OpenMP][CMake] Use in-project clang as CUDA->IR compiler..

If OpenMP is built via LLVM_ENABLE_PROJECTS, it is *intentional* to build OpenMP w/o using the clang in recent build.

Is this documented somewhere?

You could refer to https://llvm.org/docs/CMake.html for LLVM_ENABLE_PROJECTS and https://llvm.org/docs/BuildingADistribution.html for LLVM_ENABLE_RUNTIMES.

The first link does not mention what is intended to build with what compiler.
The second link is for creating a distributable package, which does not apply here.

The two links show the difference between LLVM_ENABLE_RUNTIMES and LLVM_ENABLE_PROJECTS. It does mention that projects in LLVM_ENABLE_RUNTIMES are built by the recent built clang.

Apr 29 2021, 3:57 AM · Restricted Project, Restricted Project
protze.joachim added inline comments to D101265: [OpenMP][CMake] Use in-project clang as CUDA->IR compiler..
Apr 29 2021, 3:44 AM · Restricted Project, Restricted Project
protze.joachim added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

For me, the main use-case for stand-alone builds is runtime code development. In that case, I use a ("nightly") build of LLVM and use that to compile and test stand-alone runtime builds.
Switching branches regularly results in recompiling a lot of code. With stand-alone runtime builds, this is restricted to OpenMP runtime code.

Apr 29 2021, 3:34 AM · Restricted Project
protze.joachim added a comment to D97329: [OpenMP] Fixed a crash when offloading to x86_64 with target nowait.

The test tends to fail for x86_64 offloading. When I reduce the BS to 64, the test also fails for nvptx offloading.

Apr 29 2021, 2:18 AM · Restricted Project
protze.joachim added a comment to D95976: [OpenMP] Simplify offloading parallel call codegen.

Please update the test with a NFC commit.

Apr 29 2021, 2:04 AM · Restricted Project, Restricted Project, Restricted Project
protze.joachim requested changes to D101498: [OpenMP] Test unified shared memory tests only on systems that support it..

The logic to identify whether certain features are supported for testing is in openmp/cmake/OpenMPTesting.cmake. Please add checks there to identify unified-shared-memory support.

Apr 29 2021, 12:37 AM · Restricted Project

Apr 28 2021

protze.joachim added inline comments to D100181: [OpenMP] [OMPD] [1/6] Implementation of OMPD debugging library - libompd. Code changes in openmp/runtime to support libompd..
Apr 28 2021, 7:33 AM · Restricted Project
protze.joachim added a comment to D99656: [AMDGPU][OpenMP] Enable Libomptarget runtime tests.

It might help in the future, if you add comments for each XFAIL you add, saying what feature is missing to make the test case work (like you did with printf).
-> This would naturally result in a separate XFAIL line for each offloading target.

Apr 28 2021, 7:21 AM · Restricted Project

Apr 27 2021

protze.joachim committed rG24f836e8fd6f: [OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2) (authored by protze.joachim).
[OpenMP][libomptarget] Separate lit tests for different offloading targets (2/2)
Apr 27 2021, 6:55 AM
protze.joachim closed D101326: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets (2/2).
Apr 27 2021, 6:54 AM · Restricted Project
protze.joachim updated the diff for D101326: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets (2/2).

Rebased to master, fixing one conflict.

Apr 27 2021, 3:50 AM · Restricted Project
protze.joachim added inline comments to D101326: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets (2/2).
Apr 27 2021, 3:38 AM · Restricted Project
protze.joachim updated the diff for D101326: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets (2/2).

I rerun the tests for x86_64 and nvptx and removed the XFAIL for all tests succeeding in repeated tests.

Apr 27 2021, 3:35 AM · Restricted Project
protze.joachim committed rGb845217b1dad: [OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2) (authored by protze.joachim).
[OpenMP][libomptarget] Separate lit tests for different offloading targets (1/2)
Apr 27 2021, 3:30 AM
protze.joachim closed D101315: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets.
Apr 27 2021, 3:30 AM · Restricted Project

Apr 26 2021

protze.joachim added a comment to D101266: [OpenMP][CMake] Pass --cuda-path to regression tests..

On our site, the dependencies are not necessarily found in the same directory when moving from build system to compute system. So, for production use I actually prefer LD_LIBRARY_PATH & CO, which are managed by the environment-modules system.

Apr 26 2021, 2:52 PM · Restricted Project
protze.joachim added a comment to D99656: [AMDGPU][OpenMP] Enable Libomptarget runtime tests.

With properly configured cmake and lit, it should be possible to add tests run separately for architectures. For e.g. it should be possible to create targets like check-libomptarget-amdgcn-amd-amdhsa which would run tests only for amdgcn and similar for others. These can be grouped under existing check-libomptarget.

Apr 26 2021, 2:35 PM · Restricted Project
protze.joachim updated the diff for D101315: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets.

I moved the RUN line changes into D101326, which updates all tests.

Apr 26 2021, 2:30 PM · Restricted Project
protze.joachim requested review of D101326: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets (2/2).
Apr 26 2021, 2:27 PM · Restricted Project
protze.joachim updated the diff for D101315: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets.

The majority of tests has 5 identical RUN lines, which just differ in the triple-suffix. These 5 identical lines fold into the generic RUN line like I showed for private_mappings.c
The code I added in lit.cfg effectively makes the generic RUN line an alias for the selected specific RUN line.
I'm not sure, what special handling for nvptx you have in mind.

Apr 26 2021, 1:18 PM · Restricted Project
protze.joachim requested review of D101315: [OpenMP][libomptarget][test] Separate lit tests for different offloading targets.
Apr 26 2021, 11:23 AM · Restricted Project
protze.joachim added a comment to D99656: [AMDGPU][OpenMP] Enable Libomptarget runtime tests.

I guess there's no XFAIL equivalent here? In that case we should probably leave off the RUN line for the tests that can't work, as they'll otherwise break the build once an amdgpu CI machine goes live

Apr 26 2021, 12:23 AM · Restricted Project

Apr 22 2021

protze.joachim requested review of D101082: [OpenMP] Fix deadlock for detachable task with child tasks.
Apr 22 2021, 10:05 AM · Restricted Project

Apr 21 2021

protze.joachim accepted D100673: [OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault..

The latest changes fix the issue. I tested with x86_64 and nvidia offloading.

Apr 21 2021, 7:15 AM · Restricted Project, Restricted Project

Apr 19 2021

protze.joachim requested changes to D100673: [OPENMP]Fix PR49698: OpenMP declare mapper causes segmentation fault..

I tested the patch and still get wrong results (I modified the declare_mapper_nested_default_mappers_complex_structure.cpp test to use my verbose printf from bugzilla):

Modified Data Hierarchy:
Object C (0x623e50, 2) Contents:
        Object B (0x623e80, 2) Contents:
                Object A (0x6243f0) Contents:
                        data1 = 6439680  data2 = 0
                Object A (0x6243f8) Contents:
                        data1 = -784902216  data2 = 11062
        Object B (0x623e90, 2) Contents:
                Object A (0x6249e0) Contents:
                        data1 = 6441856  data2 = 0
                Object A (0x6249e8) Contents:
                        data1 = 6439904  data2 = 0
Object C (0x623e60, 2) Contents:
        Object B (0x623ef0, 2) Contents:
                Object A (0x624b90) Contents:
                        data1 = 6438736  data2 = 0
                Object A (0x624b98) Contents:
                        data1 = 6441344  data2 = 0
        Object B (0x623f00, 2) Contents:
                Object A (0x624c60) Contents:
                        data1 = 6444032  data2 = 0
                Object A (0x624c68) Contents:
                        data1 = 6441856  data2 = 0
Apr 19 2021, 4:29 AM · Restricted Project, Restricted Project

Apr 14 2021

protze.joachim added inline comments to D99803: [openmp] Add OMPT initialization in libomptarget.
Apr 14 2021, 1:33 AM · Restricted Project
protze.joachim accepted D100414: [OpenMP] Fix printing routine for OMP_TOOL_VERBOSE_INIT.

LGTM. Thanks!

Apr 14 2021, 1:22 AM · Restricted Project

Apr 6 2021

protze.joachim added a comment to D99353: [driver] Make `clang` warn rather then error on `flang` options.

Btw, how important are these aliases for you?

Apr 6 2021, 2:13 PM · Restricted Project, Restricted Project

Mar 30 2021

protze.joachim abandoned D70412: [OpenMP][Tool] Runtime warning for missing TSan-option.

Phab also only lets me abandon the review

Mar 30 2021, 10:21 AM · Restricted Project

Mar 29 2021

protze.joachim added a comment to D99353: [driver] Make `clang` warn rather then error on `flang` options.

Thank you for working on this! It works for me.
As you mentioned in D95460, this makes the behavior more similar to gcc.

Mar 29 2021, 9:49 AM · Restricted Project, Restricted Project

Mar 25 2021

protze.joachim added a comment to D96893: [OpenMP] libomp minor cleanup.

I created https://bugs.llvm.org/show_bug.cgi?id=49723 to track this issue

Mar 25 2021, 4:33 AM · Restricted Project

Mar 18 2021

protze.joachim accepted D98838: [OpenMP] Fixed a crash in hidden helper thread.

LGTM after the latest update.

Mar 18 2021, 2:06 PM · Restricted Project
protze.joachim added a comment to D98838: [OpenMP] Fixed a crash in hidden helper thread.

Please add my reproducers as test cases.

Mar 18 2021, 1:02 PM · Restricted Project
protze.joachim added a comment to D98838: [OpenMP] Fixed a crash in hidden helper thread.

__kmp_hidden_helper_initialize() always initializes all hidden threads at once. Right? In this case, you modifications make sense.

Mar 18 2021, 12:48 PM · Restricted Project
protze.joachim added a comment to D98838: [OpenMP] Fixed a crash in hidden helper thread.

LIBOMP_NUM_HIDDEN_HELPER_THREADS=0 avoids the segfault/assertion for the two test cases I attached to the bugzilla issue. This kind of makes sense, as 0 hidden threads cannot create a hole in the __kmp_threads array.

Mar 18 2021, 12:30 PM · Restricted Project
protze.joachim added a comment to D98838: [OpenMP] Fixed a crash in hidden helper thread.

I filed https://bugs.llvm.org/show_bug.cgi?id=49631 to make release managers aware that we have a problem here.

Mar 18 2021, 10:35 AM · Restricted Project
protze.joachim added a comment to D95460: [flang][driver] Add forced form flags and -ffixed-line-length.

Hi Andrzej,

Mar 18 2021, 9:18 AM · Restricted Project, Restricted Project
protze.joachim added inline comments to D98838: [OpenMP] Fixed a crash in hidden helper thread.
Mar 18 2021, 7:11 AM · Restricted Project
protze.joachim added a comment to D77609: [OpenMP] Added the support for hidden helper task in RTL.

I think, the fundamental issue of this patch is, that it broke the implicit assumption, that entries in __kmp_threads are handed out contiguously. After spending quite some effort into trying to identify locations, where this implicit assumption is now broken, I think, much more effort is needed to identify all places which rely on this assumption and are now broken.

Mar 18 2021, 5:14 AM · Restricted Project