Add info about completion of OpenMP 3.1 + support for some elements of OpenMP 4.0
Suggest we modify to:
OpenMP Support
Clang 3.7 fully supports OpenMP 3.1 and is reported to work on several platforms,
including x86, x86-64 and Power.
In addition to OpenMP 3.1, several important elements of version 4.0 of the
standard are supported as well:
- `omp simd`, `omp for simd` and `omp parallel for simd` pragmas
- atomic constructs
- `proc_bind` clause of `omp parallel` pragma
- `depend` clause of `omp task` pragma
- `omp cancel` and `omp cancellation point` pragmas
- `omp taskgroup` pragma
We plan to continue work on 4.0 for clang 3.8. Please see this link for up-to-date
`status <https://github.com/clang-omp/clang/wiki/Status-of-supported-OpenMP-constructs>`_.
Contributors to this work include AMD, Argonne National Lab., IBM, Intel, Texas Instruments, University of Houston and many others.
This looks fine to me. Please remember that this should be committed to the 3.7 branch, not to Clang trunk.
Regarding Michael Wong's suggestion:
- The addition of "We plan to continue work on 4.0 for clang 3.8. Please see this link for up-to-date status https://github.com/clang-omp/clang/wiki/Status-of-supported-OpenMP-constructs_." is useful information, but doesn't seem directly relevant for the release notes for Clang 3.7, because it's not information about the 3.7 release. I'm happy with this with or without that change.
- We have historically not included lists of contributors in our release notes. While attribution of this kind is important, a better place for it is probably LLVM's CREDITS.TXT or similar rather than here (we want these credits to remain for the lifetime of the project, not just for one release).
Can the note include a link to the documentation describing how to use OpenMP with Clang?
And speaking of that, what is the situation there? As far as I understand, the "switching CLANG_DEFAULT_OPENMP_RUNTIME to libomp" discussion is still not resolved. Does -fopenmp work out of the box now? -fopenmp=libomp? Where is the user supposed to get the runtime lib from? It's not currently part of the pre-built binaries that ship as part of the release.
docs/ReleaseNotes.rst, line 113:
Nit: "Now fully supported" and "reported to work on several platforms" reads slightly contradictory to me.
+Jonathan Peyton
We had a discussion on this on our Wednesday morning call. Jonathan Peyton
has added the cmake infrastructure to support this and is automating the
library tests to support the switch for libomp. Jonathan will be able to
correct me if I am wrong. This we hope will enable the setting of the
default switch of -fopenmp. If there are additional requirements, please
let us know. Thanks.
I would like to see that setting enabled first, before claiming full support in the release notes, though. If the release notes say it's fully supported, it needs to Just Work out of the box.
Also, I do want to point out that it's now pretty late for this kind of change in the 3.7 process. Not necessarily too late, but pretty late. It would be good to have a plan for the potential scenario where the default switch doesn't make it into this release: even if it's not on by default, how do we get the most value out of it in this release, can it be made easy for users to experiment with it and provide feedback, etc.
The completeness of the OpenMP 3.1 support in 3.7.0 branch can be seen on x86_64-apple-darwin by using it to run the ctest of OpenMP3.1_Validation test suite from http://web.cs.uh.edu/~hpctools/openmp...
Tested Directive | t | ct | ot | oct |
---|---|---|---|---|
has_openmp | 100 | 100 | 100 | 100 |
omp_atomic | 100 | 100 | 100 | 100 |
omp_barrier | 100 | 100 | 100 | 100 |
omp_critical | 100 | 100 | 100 | 100 |
omp_flush | 100 | 100 | 100 | 100 |
omp_for_firstprivate | 100 | 100 | 100 | 100 |
omp_for_lastprivate | 100 | 90 | 100 | 80 |
omp_for_ordered | 100 | 100 | 100 | 100 |
omp_for_private | 100 | 100 | 100 | 100 |
omp_for_reduction | 100 | 100 | 100 | 100 |
omp_for_schedule_dynamic | 100 | 100 | 100 | 100 |
omp_for_schedule_guided | 100 | 100 | 100 | 100 |
omp_for_schedule_static | 100 | 100 | 100 | 100 |
omp_for_nowait | 100 | 100 | 100 | 100 |
omp_get_num_threads | 100 | 100 | 100 | 100 |
omp_get_wtick | 100 | 100 | 100 | 100 |
omp_get_wtime | 100 | 100 | 100 | 100 |
omp_in_parallel | 100 | 100 | 100 | 100 |
omp_lock | 100 | 100 | 100 | 100 |
omp_master | 100 | 100 | 100 | 100 |
omp_nest_lock | 100 | 100 | 100 | 100 |
omp_parallel_copyin | 100 | 100 | 100 | 100 |
omp_parallel_for_firstprivate | 100 | 100 | 100 | 100 |
omp_parallel_for_lastprivate | 100 | 100 | 100 | 100 |
omp_parallel_for_ordered | 100 | 100 | 100 | 100 |
omp_parallel_for_private | 100 | 100 | 100 | 100 |
omp_parallel_for_reduction | 100 | 100 | 100 | 100 |
omp_parallel_num_threads | 100 | 100 | 100 | 100 |
omp_parallel_sections_firstprivate | 100 | 100 | 100 | 100 |
omp_parallel_sections_lastprivate | 100 | 100 | 100 | 100 |
omp_parallel_sections_private | 100 | 100 | 100 | 100 |
omp_parallel_sections_reduction | 100 | 100 | 100 | 85 |
omp_section_firstprivate | 100 | 100 | 100 | 100 |
omp_section_lastprivate | 100 | 100 | 100 | 100 |
omp_section_private | 100 | 100 | 100 | 100 |
omp_sections_reduction | 100 | 100 | 100 | 95 |
omp_sections_nowait | 100 | 100 | 100 | 100 |
omp_parallel_for_if | 100 | 100 | 100 | 100 |
omp_single_copyprivate | 100 | 100 | 100 | 100 |
omp_single_nowait | 100 | 100 | 100 | 100 |
omp_single_private | 100 | 100 | 100 | 100 |
omp_single | 100 | 100 | 100 | 100 |
omp_test_lock | 100 | 100 | 100 | 100 |
omp_test_nest_lock | 100 | 100 | 100 | 100 |
omp_threadprivate | 100 | 100 | - | - |
omp_parallel_default | 100 | 100 | 100 | 100 |
omp_parallel_shared | 100 | 100 | 100 | 100 |
omp_parallel_private | 100 | 100 | 100 | 100 |
omp_parallel_firstprivate | 100 | 100 | 100 | 100 |
omp_parallel_if | 100 | 100 | 100 | 100 |
omp_parallel_reduction | 100 | 100 | 100 | 100 |
omp_for_collapse | 100 | 100 | 100 | 100 |
omp_master_3 | 100 | 100 | 100 | 100 |
omp_task | 100 | 100 | 100 | 100 |
omp_task_if | 100 | 100 | 100 | 100 |
omp_task_untied | 0 | - | 0 | - |
omp_task_shared | 100 | 100 | 100 | 100 |
omp_task_private | 100 | 100 | 100 | 100 |
omp_task_firstprivate | 100 | 100 | 100 | 100 |
omp_taskwait | 100 | 100 | 100 | 100 |
omp_taskyield | 100 | 100 | 10 | - |
omp_task_final | 0 | - | 0 | - |

Summary:
S Number of tested Open MP constructs: 62
S Number of used tests: 123
S Number of failed tests: 5
S Number of successful tests: 118
S + from this were verified: 114
Normal tests:
N Number of failed tests: 2
N + from this fail compilation: 0
N + from this timed out 0
N Number of successful tests: 60
N + from this were verified: 59
Orphaned tests:
O Number of failed tests: 3
O + from this fail compilation: 0
O + from this timed out 0
O Number of successful tests: 58
O + from this were verified: 55
which compares very favorably to the results from using FSF gcc 5.2.0...
Tested Directive | t | ct | ot | oct |
---|---|---|---|---|
has_openmp | 100 | 100 | 100 | 100 |
omp_atomic | 100 | 60 | 100 | 35 |
omp_barrier | 100 | 100 | 100 | 100 |
omp_critical | 100 | 0 | 100 | 0 |
omp_flush | 100 | 0 | 100 | 0 |
omp_for_firstprivate | 100 | 100 | 100 | 100 |
omp_for_lastprivate | 100 | 100 | 100 | 95 |
omp_for_ordered | 100 | 100 | 100 | 100 |
omp_for_private | 100 | 100 | 100 | 100 |
omp_for_reduction | 100 | 100 | 100 | 100 |
omp_for_schedule_dynamic | 100 | 100 | 100 | 100 |
omp_for_schedule_guided | 100 | 100 | 100 | 100 |
omp_for_schedule_static | 100 | 100 | 100 | 100 |
omp_for_nowait | 100 | 100 | 100 | 100 |
omp_get_num_threads | 100 | 100 | 100 | 100 |
omp_get_wtick | 0 | - | 0 | - |
omp_get_wtime | 100 | 100 | 100 | 100 |
omp_in_parallel | 100 | 100 | 100 | 100 |
omp_lock | 100 | 55 | 100 | 50 |
omp_master | 100 | 100 | 100 | 100 |
omp_nest_lock | 100 | 40 | 100 | 25 |
omp_parallel_copyin | 100 | 100 | 100 | 100 |
omp_parallel_for_firstprivate | 100 | 100 | 100 | 100 |
omp_parallel_for_lastprivate | 100 | 100 | 100 | 100 |
omp_parallel_for_ordered | 100 | 100 | 100 | 100 |
omp_parallel_for_private | 100 | 100 | 100 | 100 |
omp_parallel_for_reduction | 100 | 100 | 100 | 100 |
omp_parallel_num_threads | 100 | 100 | 100 | 100 |
omp_parallel_sections_firstprivate | 100 | 100 | 100 | 100 |
omp_parallel_sections_lastprivate | 100 | 100 | 100 | 100 |
omp_parallel_sections_private | 100 | 100 | 100 | 100 |
omp_parallel_sections_reduction | 100 | 25 | 100 | 15 |
omp_section_firstprivate | 100 | 100 | 100 | 100 |
omp_section_lastprivate | 100 | 100 | 100 | 100 |
omp_section_private | 100 | 100 | 100 | 100 |
omp_sections_reduction | 100 | 30 | 100 | 5 |
omp_sections_nowait | 100 | 100 | 100 | 100 |
omp_parallel_for_if | 100 | 100 | 100 | 100 |
omp_single_copyprivate | 100 | 100 | 100 | 100 |
omp_single_nowait | 100 | 100 | 100 | 100 |
omp_single_private | 100 | 100 | 100 | 100 |
omp_single | 100 | 100 | 100 | 100 |
omp_test_lock | 100 | 60 | 100 | 45 |
omp_test_nest_lock | 100 | 60 | 100 | 40 |
omp_threadprivate | 100 | 100 | - | - |
omp_parallel_default | 100 | 100 | 100 | 100 |
omp_parallel_shared | 100 | 100 | 100 | 100 |
omp_parallel_private | 100 | 100 | 100 | 100 |
omp_parallel_firstprivate | 100 | 100 | 100 | 100 |
omp_parallel_if | 100 | 100 | 100 | 100 |
omp_parallel_reduction | 100 | 100 | 100 | 100 |
omp_for_collapse | 100 | 100 | 100 | 100 |
omp_master_3 | 100 | 100 | 100 | 100 |
omp_task | 100 | 100 | 100 | 100 |
omp_task_if | 100 | 100 | 100 | 100 |
omp_task_untied | 0 | - | 0 | - |
omp_task_shared | 100 | 100 | 100 | 100 |
omp_task_private | 100 | 100 | 100 | 100 |
omp_task_firstprivate | 100 | 100 | 100 | 100 |
omp_taskwait | 100 | 100 | 100 | 100 |
omp_taskyield | 100 | 45 | 10 | - |
omp_task_final | 0 | - | 0 | - |

Summary:
S Number of tested Open MP constructs: 62
S Number of used tests: 123
S Number of failed tests: 7
S Number of successful tests: 116
S + from this were verified: 96
Normal tests:
N Number of failed tests: 3
N + from this fail compilation: 0
N + from this timed out 0
N Number of successful tests: 59
N + from this were verified: 49
Orphaned tests:
O Number of failed tests: 4
O + from this fail compilation: 0
O + from this timed out 0
O Number of successful tests: 57
O + from this were verified: 47
For comparison, the results from the ctest of the OpenMP3.1_Validation test suite using the current -fopenmp=libgomp default in the 3.7.0 branch are very poor, as expected, since clang doesn't generate any OpenMP code in the libgomp case...
Tested Directive | t | ct | ot | oct |
---|---|---|---|---|
has_openmp | 0 | - | 0 | - |
omp_atomic | 100 | 0 | 100 | 0 |
omp_barrier | 0 | - | 0 | - |
omp_critical | 100 | 0 | 100 | 0 |
omp_flush | 0 | - | 0 | - |
omp_for_firstprivate | 100 | 0 | 100 | 0 |
omp_for_lastprivate | 100 | 0 | 100 | 0 |
omp_for_ordered | 100 | 0 | 100 | 0 |
omp_for_private | 100 | 0 | 100 | 0 |
omp_for_reduction | 100 | 0 | 100 | 0 |
omp_for_schedule_dynamic | 100 | 0 | 100 | 0 |
omp_for_schedule_guided | 0 | - | 0 | - |
omp_for_schedule_static | 0 | - | 0 | - |
omp_for_nowait | 0 | - | 0 | - |
omp_get_num_threads | 100 | 100 | 100 | 100 |
omp_get_wtick | 100 | 100 | 100 | 100 |
omp_get_wtime | 100 | 100 | 100 | 100 |
omp_in_parallel | 0 | - | 0 | - |
omp_lock | 100 | 0 | 100 | 0 |
omp_master | 100 | 0 | 100 | 0 |
omp_nest_lock | 100 | 0 | 100 | 0 |
omp_parallel_copyin | 100 | 0 | 100 | 0 |
omp_parallel_for_firstprivate | 100 | 0 | 100 | 0 |
omp_parallel_for_lastprivate | 100 | 0 | 100 | 0 |
omp_parallel_for_ordered | 100 | 0 | 100 | 0 |
omp_parallel_for_private | 100 | 0 | 100 | 0 |
omp_parallel_for_reduction | 100 | 0 | 100 | 0 |
omp_parallel_num_threads | 100 | 0 | 100 | 0 |
omp_parallel_sections_firstprivate | 100 | 0 | 100 | 0 |
omp_parallel_sections_lastprivate | 100 | 0 | 100 | 0 |
omp_parallel_sections_private | 100 | 100 | 100 | 100 |
omp_parallel_sections_reduction | 100 | 0 | 100 | 0 |
omp_section_firstprivate | 100 | 0 | 100 | 0 |
omp_section_lastprivate | 100 | 0 | 100 | 0 |
omp_section_private | 100 | 100 | 100 | 100 |
omp_sections_reduction | 100 | 0 | 100 | 0 |
omp_sections_nowait | 0 | - | 0 | - |
omp_parallel_for_if | 100 | 0 | 100 | 0 |
omp_single_copyprivate | 100 | 0 | 100 | 0 |
omp_single_nowait | 100 | 0 | 100 | 0 |
omp_single_private | 0 | - | 0 | - |
omp_single | 100 | 0 | 100 | 0 |
omp_test_lock | 100 | 0 | 100 | 0 |
omp_test_nest_lock | 100 | 0 | 100 | 0 |
omp_threadprivate | 100 | 0 | - | - |
omp_parallel_default | 100 | 0 | 100 | 0 |
omp_parallel_shared | 100 | 0 | 100 | 0 |
omp_parallel_private | 100 | 100 | 100 | 100 |
omp_parallel_firstprivate | 100 | 0 | 100 | 0 |
omp_parallel_if | 100 | 0 | 100 | 0 |
omp_parallel_reduction | 100 | 0 | 100 | 0 |
omp_for_collapse | 100 | 0 | 100 | 0 |
omp_master_3 | 100 | 0 | 100 | 0 |
omp_task | 0 | - | 0 | - |
omp_task_if | 100 | 0 | 100 | 0 |
omp_task_untied | 0 | - | 0 | - |
omp_task_shared | 100 | 0 | 100 | 0 |
omp_task_private | 100 | 100 | 100 | 100 |
omp_task_firstprivate | 0 | - | 0 | - |
omp_taskwait | 100 | 0 | 100 | 0 |
omp_taskyield | 0 | - | 0 | - |
omp_task_final | 0 | - | 0 | - |

Summary:
S Number of tested Open MP constructs: 62
S Number of used tests: 123
S Number of failed tests: 28
S Number of successful tests: 95
S + from this were verified: 14
Normal tests:
N Number of failed tests: 14
N + from this fail compilation: 0
N + from this timed out 0
N Number of successful tests: 48
N + from this were verified: 7
Orphaned tests:
O Number of failed tests: 14
O + from this fail compilation: 0
O + from this timed out 0
O Number of successful tests: 47
O + from this were verified: 7
Jack, I'm not trying to question the completeness of your implementation. My apologies if it was interpreted that way.
I'm just trying to make sure the release notes match the actual user experience.
This one works "out of the box" indeed (provided a user has runtime library available). As I see, Alexey updated his patch to reflect this.
The OpenMP runtime sources (along with build instructions) have been part of the LLVM release since 3.5. As I understand it, only the core clang + llvm compilers are supplied as pre-built binaries; the rest is in source code only.
Yours,
Andrey Bokhanko
Software Engineer
Intel Compiler Team
Intel
So is the default of -fopenmp=libgomp going to be left in place just for the 3.7.0 release or for all future 3.7.x maintenance releases? Frankly this decision to favor a non-functional OpenMP implementation over our own OpenMP library is baffling if the goal is to get widespread testing of this new feature.
Also, if we are going to leave the default for CLANG_DEFAULT_OPENMP_RUNTIME set to libgomp, wouldn't it be better to at least modify cfe-3.7.0.src/CMakeLists.txt so that the user could pass -DCLANG_DEFAULT_OPENMP_RUNTIME=libomp to override that default in their own builds of 3.7.0 rather than forcing them to invoke -fopenmp=libomp? Currently we lock them into this unless they manually edit the CMakeLists.txt.
Great.
The OpenMP runtime sources (along with build instructions) have been part of the LLVM release since 3.5. As I understand it, only the core clang + llvm compilers are supplied as pre-built binaries; the rest is in source code only.
compiler-rt, libc++ and other libraries which integrate nicely with the LLVM build are also part of the pre-built binaries, modulo platform support.
I have a patch at http://reviews.llvm.org/D11494 that would facilitate building the run-time as part of the release process and shipping it as a separate download on the release page. I think that would make it easier for users who wish to experiment with Clang's OpenMP support.
The maintenance releases only contain bug fixes. I don't think changing this flag would be in scope.
Frankly this decision to favor a non-functional OpenMP implementation over our own OpenMP library is baffling if the goal is to get widespread testing of this new feature.
Baffling or not, that is still the state on trunk, and I haven't seen any discussion or patches towards changing it. If nothing changes on trunk, there's nothing to consider for merging to 3.7.