- User Since
- Sep 24 2015, 4:45 AM (299 w, 4 d)
Thu, Jun 17
I must admit that I have no idea about the internals of libomptarget. But here is my high-level view on this issue:
Wed, Jun 16
Tue, Jun 15
Mon, Jun 14
It seems like a change in this patch introduced a dependency issue for fresh builds.
Wed, Jun 9
Making DataPoolEntry non-virtual
Tue, Jun 8
I don't think this needs ifdefs. As I understand it, a compiler is supposed to ignore unknown pragmas.
I fully agree that the suppression will not be compatible with other compilers.
Mon, Jun 7
Applied a more recent version of clang-format
Renamed class as suggested. Also applied a more recent version of clang-format.
Implement @Hahnfeld 's suggested changes
Sun, Jun 6
I think this reflects what we discussed in the OpenMP in LLVM meeting. Thanks for working on this!
Thu, Jun 3
Wed, Jun 2
Tue, May 25
May 19 2021
The initial fix broke in some cases. I missed the difference between the incomplete and allocated task counters.
This updated patch should fix the issue, but I still run into starvation issues. I'll attach another reproducer in the bug.
May 18 2021
May 12 2021
May 11 2021
Ok, let's see what people with different test systems report.
May 7 2021
lgtm as well.
May 6 2021
You can just use UNSUPPORTED.
I still have issues with this patch and I think it's not the right solution. I wonder why __kmp_bottom_half_finish_proxy is necessary at all - see the inline comment.
If it is necessary, I would rather change the "imaginary children" code to td_incomplete_child_tasks |= 0x4000000 and td_incomplete_child_tasks &= ~0x4000000, then wait while (td_incomplete_child_tasks & 0x4000000) != 0.
I'm not convinced that disabling the tests is a good idea. While it might not be optimal that they don't trigger reliably, their failures show that something is fundamentally broken in the runtime.
May 5 2021
@ronlieb Thanks for testing! When I saw this patch, I thought that broken thread-binding might possibly cause the observed performance issue.
Do we set any affinity for the hidden helper threads, or are they free floating?
May 4 2021
Apr 30 2021
Apr 29 2021
https://reviews.llvm.org/D101498 will fix this by making it possible to distinguish platforms that support unified shared memory from platforms that don't.
Do you think that looking at the GPU generation is sufficient, or does the host architecture matter as well?
For me, the main use case for stand-alone builds is runtime code development. In that case, I take a ("nightly") build of LLVM and use it to compile and test stand-alone runtime builds.
Switching branches regularly results in recompiling a lot of code. With stand-alone runtime builds, recompilation is restricted to the OpenMP runtime code.
The test tends to fail for x86_64 offloading. When I reduce the BS to 64, the test also fails for nvptx offloading.
Please update the test with a NFC commit.
The logic to identify whether certain features are supported for testing is in openmp/cmake/OpenMPTesting.cmake. Please add checks there to identify unified-shared-memory support.
Apr 28 2021
It might help in the future if you added a comment for each XFAIL, saying which feature is missing for the test case to work (like you did with printf).
This would naturally result in a separate XFAIL line for each offloading target.
Apr 27 2021
Rebased to master, fixing one conflict.
I reran the tests for x86_64 and nvptx and removed the XFAIL from all tests that succeeded in the repeated runs.
Apr 26 2021
At our site, the dependencies are not necessarily found in the same directory when moving from the build system to the compute system. So, for production use I actually prefer LD_LIBRARY_PATH and friends, which are managed by the environment-modules system.
I moved the RUN line changes into D101326, which updates all tests.
The majority of tests have 5 nearly identical RUN lines, which differ only in the triple suffix. These 5 lines fold into the generic RUN line, as I showed for private_mappings.c.
The code I added in lit.cfg effectively makes the generic RUN line an alias for the selected specific RUN line.
I'm not sure what special handling for nvptx you have in mind.
Apr 22 2021
Apr 21 2021
The latest changes fix the issue. I tested with x86_64 and nvidia offloading.
Apr 19 2021
I tested the patch and still get wrong results (I modified the declare_mapper_nested_default_mappers_complex_structure.cpp test to use my verbose printf from bugzilla):
Modified Data Hierarchy:
  Object C (0x623e50, 2) Contents:
    Object B (0x623e80, 2) Contents:
      Object A (0x6243f0) Contents: data1 = 6439680 data2 = 0
      Object A (0x6243f8) Contents: data1 = -784902216 data2 = 11062
    Object B (0x623e90, 2) Contents:
      Object A (0x6249e0) Contents: data1 = 6441856 data2 = 0
      Object A (0x6249e8) Contents: data1 = 6439904 data2 = 0
  Object C (0x623e60, 2) Contents:
    Object B (0x623ef0, 2) Contents:
      Object A (0x624b90) Contents: data1 = 6438736 data2 = 0
      Object A (0x624b98) Contents: data1 = 6441344 data2 = 0
    Object B (0x623f00, 2) Contents:
      Object A (0x624c60) Contents: data1 = 6444032 data2 = 0
      Object A (0x624c68) Contents: data1 = 6441856 data2 = 0
Apr 14 2021
Apr 6 2021
Mar 30 2021
Phab also only lets me abandon the review
Mar 29 2021
Thank you for working on this! It works for me.
As you mentioned in D95460, this makes the behavior more similar to gcc.
Mar 25 2021
I created https://bugs.llvm.org/show_bug.cgi?id=49723 to track this issue
Mar 18 2021
LGTM after the latest update.
__kmp_hidden_helper_initialize() always initializes all hidden helper threads at once, right? In that case, your modifications make sense.
LIBOMP_NUM_HIDDEN_HELPER_THREADS=0 avoids the segfault/assertion for the two test cases I attached to the bugzilla issue. This kind of makes sense, as 0 hidden threads cannot create a hole in the __kmp_threads array.
I filed https://bugs.llvm.org/show_bug.cgi?id=49631 to make release managers aware that we have a problem here.
I think the fundamental issue with this patch is that it broke the implicit assumption that entries in __kmp_threads are handed out contiguously. After spending quite some effort trying to identify the locations where this assumption is now violated, I think much more effort is needed to find all the places that rely on it and are now broken.