Page MenuHomePhabricator

[OpenMP] Codegen aggregate for outlined function captures
AcceptedPublic

Authored by ggeorgakoudis on May 8 2021, 8:16 AM.

Details

Summary

Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Additional changes by Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
- Fixed opaque pointer miscompile.
- Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.
- Fixed nocapture attribute of kmpc_alloc_aggregate_arg.
- Added align attribute for call to kmpc_alloc_shared.

Diff Detail

Unit TestsFailed

TimeTest
3,500 mslibcxx CI Modules > llvm-libc++-shared-cfg-in.libcxx/algorithms/specialized_algorithms/special_mem_concepts::nothrow_sentinel_for.compile.pass.cpp
Script: -- : 'COMPILED WITH'; /home/libcxx-builder/.buildkite-agent/builds/63a50703aa00-1/llvm-project/libcxx-ci/install/bin/clang++ /home/libcxx-builder/.buildkite-agent/builds/63a50703aa00-1/llvm-project/libcxx-ci/libcxx/test/libcxx/algorithms/specialized.algorithms/special.mem.concepts/nothrow_sentinel_for.compile.pass.cpp --target=x86_64-unknown-linux-gnu -nostdinc++ -I /home/libcxx-builder/.buildkite-agent/builds/63a50703aa00-1/llvm-project/libcxx-ci/build/generic-modules/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/63a50703aa00-1/llvm-project/libcxx-ci/build/generic-modules/include/c++/v1 -I /home/libcxx-builder/.buildkite-agent/builds/63a50703aa00-1/llvm-project/libcxx-ci/libcxx/test/support -std=c++2b -fmodules -fcxx-modules -Werror -Wall -Wctad-maybe-unsupported -Wextra -Wshadow -Wundef -Wno-unused-command-line-argument -Wno-attributes -Wno-pessimizing-move -Wno-c++11-extensions -Wno-noexcept-type -Wno-atomic-alignment -Wno-user-defined-literals -Wno-tautological-compare -Wsign-compare -Wunused-variable -Wunused-parameter -Wunreachable-code -Wno-unused-local-typedef -D_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER -D_LIBCPP_ENABLE_EXPERIMENTAL -D_LIBCPP_DISABLE_AVAILABILITY -fcoroutines-ts -Werror=thread-safety -Wuser-defined-warnings -fsyntax-only

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
JonChesterfield added a comment.EditedDec 1 2021, 11:42 AM

This works approximately as well as trunk does for me, provided D114865 is also applied. My baseline is not totally solid but I think there's a credible chance this would pass the buildbot, provided D114865 went in first.

Ron reports two new failures with this applied,
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51781.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51982.c

My local sm_75 box with this patch applied (and otherwise a clean build) claims failures in

libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug51781.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49021.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug51781.c
jdoerfert accepted this revision.Dec 22 2021, 2:45 PM

Can we land this? AMD issues seems resolved.

[AMD Official Use Only]

@Singh, Pushpinder is this resolved?
You were most recently working on it.

Thx

pdhaliwal added a comment.EditedDec 22 2021, 7:32 PM

I am seeing a lot of failures on nvptx machine (sm_70, cuda11.4) with this patch,

libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49021.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49779.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug51781.c
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug51982.c
libomptarget :: nvptx64-nvidia-cuda :: unified_shared_memory/close_enter_exit.c
libomptarget :: nvptx64-nvidia-cuda :: unified_shared_memory/close_modifier.c
libomptarget :: nvptx64-nvidia-cuda :: unified_shared_memory/shared_update.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49021.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug51781.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: unified_shared_memory/close_enter_exit.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: unified_shared_memory/close_modifier.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: unified_shared_memory/shared_update.c

On amdgcn, these are the tests failing,

libomptarget :: amdgcn-amd-amdhsa :: offloading/bug49021.cpp
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51781.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51982.c
libomptarget :: amdgcn-amd-amdhsa-newRTL :: offloading/bug49021.cpp
libomptarget :: amdgcn-amd-amdhsa-newRTL :: offloading/bug51781.c

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2022, 12:05 PM

As discussed in https://github.com/llvm/llvm-project/issues/54654, this needs to be added for SPMDization with this patch. Not sure whether further handling is required.

diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
index c4736521e475..23cfa6fe5e27 100644

  • a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp

+++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -4260,6 +4260,7 @@ struct AAKernelInfoCallSite : AAKernelInfo {

case OMPRTL___kmpc_nvptx_parallel_reduce_nowait_v2:
case OMPRTL___kmpc_nvptx_teams_reduce_nowait_v2:
case OMPRTL___kmpc_nvptx_end_reduce_nowait:

+ case OMPRTL___kmpc_alloc_aggregate_arg:

  break;
case OMPRTL___kmpc_distribute_static_init_4:
case OMPRTL___kmpc_distribute_static_init_4u:

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Hey @dhruvachak. Unfortunately I can't find time lately to work on this patch. Would you like to take over?

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Hey @dhruvachak. Unfortunately I can't find time lately to work on this patch. Would you like to take over?

@ggeorgakoudis I rebased the sources on top of main and resolved conflicts in my local workspace. I haven't updated the clang/llvm tests. There are tests that fail on amdgpu that I am investigating. One example is https://github.com/llvm/llvm-project/issues/54654.

dhruvachak added inline comments.Apr 8 2022, 12:35 PM
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
937

NoCapture attributes for the parameters need to be removed. See https://github.com/llvm/llvm-project/issues/54654

reverse ping. Are there outstanding issues with this?

I rebased and resolved conflicts just now and got the compiler built. I did not update the tests, hence not updating this review. I see the following outstanding issues:

(1) make check-libomptarget produces a bunch of failures with the following compile-time assertion. So my rebased patch is not interacting correctly with opaque pointers. It is the same assertion for all the failures.
llvm-project/llvm/include/llvm/IR/Type.h:384: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.

(2) From earlier investigation a couple of months back, this patch uses device alloc and will fail if device allocation is not implemented (e.g. in main branch of amdgpu). Most of these failures are seen at -O0, OpenMPOpt is able to optimize them away at higher opt levels. Are we ok with these failures at -O0?

(3) There were a few issues found regarding SPDMization, NoCaptureAttrs, alignment that should be applied to this patch. I have those changes on a local branch.

Also, make sure to remove all deviceRTL files and probably reset the autogenerated tests to upstream (and re-generate) before you merge (or reupload).

I rebased and resolved conflicts just now and got the compiler built. I did not update the tests, hence not updating this review. I see the following outstanding issues:

(1) make check-libomptarget produces a bunch of failures with the following compile-time assertion. So my rebased patch is not interacting correctly with opaque pointers. It is the same assertion for all the failures.
llvm-project/llvm/include/llvm/IR/Type.h:384: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.

See my comment below. I think that's the issue.

(2) From earlier investigation a couple of months back, this patch uses device alloc and will fail if device allocation is not implemented (e.g. in main branch of amdgpu). Most of these failures are seen at -O0, OpenMPOpt is able to optimize them away at higher opt levels. Are we ok with these failures at -O0?

It used __kmpc_alloc_shared, which should in theory work with O0 (also for AMDGPU) but in practice might not, especially if it has to fallback to malloc. We are working on malloc support right now. This should not stop us. No reasonable code runs with O0 (on AMDGPU) right now.

(3) There were a few issues found regarding SPDMization, NoCaptureAttrs, alignment that should be applied to this patch. I have those changes on a local branch.

Apply them, I can look over everything again.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1200

This doesn't work anymore with opaque pointers, IIRC. We should remember the type and pass to this place.

Also, make sure to remove all deviceRTL files and probably reset the autogenerated tests to upstream (and re-generate) before you merge (or reupload).

I rebased and resolved conflicts just now and got the compiler built. I did not update the tests, hence not updating this review. I see the following outstanding issues:

(1) make check-libomptarget produces a bunch of failures with the following compile-time assertion. So my rebased patch is not interacting correctly with opaque pointers. It is the same assertion for all the failures.
llvm-project/llvm/include/llvm/IR/Type.h:384: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.

See my comment below. I think that's the issue.

(2) From earlier investigation a couple of months back, this patch uses device alloc and will fail if device allocation is not implemented (e.g. in main branch of amdgpu). Most of these failures are seen at -O0, OpenMPOpt is able to optimize them away at higher opt levels. Are we ok with these failures at -O0?

It used __kmpc_alloc_shared, which should in theory work with O0 (also for AMDGPU) but in practice might not, especially if it has to fallback to malloc. We are working on malloc support right now. This should not stop us. No reasonable code runs with O0 (on AMDGPU) right now.

(3) There were a few issues found regarding SPDMization, NoCaptureAttrs, alignment that should be applied to this patch. I have those changes on a local branch.

Apply them, I can look over everything again.

Yes, I will apply the changes, refresh the tests, and re-upload.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1200

Thanks. Changing this fixed the assertions.

dhruvachak added inline comments.Jul 8 2022, 11:31 AM
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
3238

This should be VoidTy now that GlobalArgs type has changed.

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Okay, I will convert those few manual OpenMP tests to autogen format.

How about the AST ones? Do they have to be manually updated? Example: clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c

jdoerfert added a comment.EditedJul 8 2022, 1:19 PM

>>! In D102107#3639615, @dhruvachak wrote:

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Okay, I will convert those few manual OpenMP tests to autogen format.

How about the AST ones? Do they have to be manually updated? Example: clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c

For these ones I have a script locally (attached) that need some manual doing but it helps:

1. run the ast dump and store the result (same as RUN line), e.g.,
  {F23722650} clang -cc1 -internal-isystem /data/build/llvm-project/lib/clang/13.0.0/include -nostdsysteminc -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump /data/src/llvm-project/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_4.cpp &> /tmp/ast
2. python3 ast_dump_2_check.py /tmp/ast CHECK
3. replace the check lines with the content of /tmp/ast.check

>>! In D102107#3639615, @dhruvachak wrote:

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Okay, I will convert those few manual OpenMP tests to autogen format.

How about the AST ones? Do they have to be manually updated? Example: clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c

For these ones I have a script locally (attached) that need some manual doing but it helps:

1. run the ast dump and store the result (same as RUN line), e.g.,
  {F23722650} clang -cc1 -internal-isystem /data/build/llvm-project/lib/clang/13.0.0/include -nostdsysteminc -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump /data/src/llvm-project/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_4.cpp &> /tmp/ast
2. python3 ast_dump_2_check.py /tmp/ast CHECK
3. replace the check lines with the content of /tmp/ast.check

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

So, generate check lines for new tests in a separate patch first.
For the AST ones, you need to take the run line of the test, not what I posted there. If it doesn't work, one needs to check why. Hard to diagnose and I don't remember if there is something else. Maybe you need to only include part of it?

jhuber6 added a comment.EditedJul 8 2022, 5:48 PM

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

Sometimes if update_cc_test_check.py -u ${test} doesn't work you either just need to run it twice so the line numbers get updated on the kernel functions, or you can try taking the command line directly from the top of the file and running it again with that instead of -u. A few options aren't handled properly via the update with -u and need to be run again completely.

dhruvachak updated this revision to Diff 443399.Jul 8 2022, 5:53 PM

Fixed opaque pointer miscompile.
Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.
Fixed nocapture attribute of kmpc_alloc_aggregate_arg,
Added align attribute for call to
kmpc_alloc_shared.
Updated (most) failing clang tests.

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

So, generate check lines for new tests in a separate patch first.

Not sure what you mean by new tests. make check-clang has a few failures on existing tests. I think all of them are regen issues. I will post the results.

For the AST ones, you need to take the run line of the test, not what I posted there. If it doesn't work, one needs to check why. Hard to diagnose and I don't remember if there is something else. Maybe you need to only include part of it?

Yes, I took the run line of the test. The regen worked OK, I removed the old CHECK lines and added the new ones. But make check-clang still flags it as a failure. As you said, we need to understand why.

dhruvachak added inline comments.Jul 8 2022, 6:02 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4366

@jdoerfert Is this enough to enable SPMDization or is further handling required?

dhruvachak added inline comments.Jul 8 2022, 6:04 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4366

Just to be clear, this change does allow SPMDization now but want to make sure nothing else is missing.

Results from "make check-clang":

Failed Tests (14):

Clang :: AST/ast-dump-openmp-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-distribute-parallel-for.c
Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for.c
Clang :: CodeGenCXX/observe-noexcept.cpp
Clang :: OpenMP/declare_variant_construct_codegen_1.c
Clang :: OpenMP/nvptx_lambda_pointer_capturing.cpp
Clang :: OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Clang :: OpenMP/remarks_parallel_in_target_state_machine.c
Clang :: OpenMP/target_in_reduction_codegen.cpp
Clang :: SemaCXX/static-assert.cpp
Clang :: utils/update_cc_test_checks/generated-funcs.test

Testing Time: 29.33s

Skipped          :     4
Unsupported      :  1478
Passed           : 29406
Expectedly Failed:    27
Failed           :    14

Need to check the following again.

clang/test/AST/ast-dump-openmp-distribute-parallel-for.c was regenerated and part of the patch but the test still fails. The other regenerated AST tests are not part of this patch, they seem to fail even after regen.

Need to regen CodeGenCXX, SemaCXX, and utils tests (3 total).

I tried converting the OpenMP manual CHECK tests to the autogen format. Some of them still fail as above, don't know why.

Need to know how to regen the OpenMP remarks tests.

make check-openmp passes on amdgpu. Need to check on nvptx.

Testing Time: 39.95s

Unsupported      : 143
Passed           : 563
Expectedly Failed:  14

[100%] Built target check-openmp
[100%] Built target check-openmp

It seems the buildbot didn't actually test this patch but an old one, still:

The checks for this tests are not updated:
target_teams_distribute_parallel_for_order_codegen.cpp
target_in_reduction_codegen.cpp
nvptx_lambda_capturing.cpp
nvptx_lambda_pointer_capturing.cpp

Similar to clang/test/OpenMP/declare_variant_construct_codegen_1.c, we should manually update the few fork calls in clang/test/OpenMP/declare_variant_construct_codegen_1.c.

Can you share the output of the AST dump tests and the new check lines, so what run produces and the file we give to Filechec to verify it.

clang/test/OpenMP/declare_variant_construct_codegen_1.c
1052

Something went wrong here. Might be easier to manually change the kmpc_forc_call line (should not be much more)

Can you share the output of the AST dump tests and the new check lines, so what run produces and the file we give to Filechec to verify it.

I looked at the AST test output and the CHECK lines more carefully. Turns out the full path was embedded in some of the CHECK lines causing the failures. I corrected those manually and those AST tests now pass. I will move on to the other failures.

Regenerated clang tests, make check-clang passes

Rebased on top of a recent commit. Both check-clang and check-openmp (on amdgpu) pass.

Testing Time: 30.73s

Skipped          :     4
Unsupported      :  1480
Passed           : 29554
Expectedly Failed:    27

[100%] Built target check-clang

On amdgpu:

Testing Time: 42.65s

Unsupported      : 145
Passed           : 570
Expectedly Failed:  14

[100%] Built target check-openmp
[100%] Built target check-openmp

@jdoerfert With this patch, additional remarks are being generated. Please check whether the new OMP121 remarks in the following tests are OK.

Clang :: OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Clang :: OpenMP/remarks_parallel_in_target_state_machine.c

All changes from my end are in. Please review.

@jdoerfert With this patch, additional remarks are being generated. Please check whether the new OMP121 remarks in the following tests are OK.

Clang :: OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Clang :: OpenMP/remarks_parallel_in_target_state_machine.c

Can you send me the device IR generated for these (-save-temps). I need to check what's happening and building the patch myself will take a while.

@jdoerfert Attached are the device IR files, generated with -save-temps.

Pointing out the recent changes at the corresponding source locations.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1217

Added align attribute for call to __kmpc_alloc_shared.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
253

CapturedVarsElemTypes added to handle opaque pointers.

clang/lib/CodeGen/CodeGenFunction.h
3353

CapturedVarsElemTypes introduced to handle opaque pointers.

llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
936

Fixed attribute of __kmpc_alloc_aggregate_arg,

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4366

Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.

jdoerfert accepted this revision.Aug 31 2022, 2:50 PM

LG, the new remarks need to be addressed in a follow up. Please test for them and make a TODO that they should be optimized away.

This revision was landed with ongoing or failed builds.Sep 14 2022, 5:55 PM
This revision was automatically updated to reflect the committed changes.

check-llvm fails bunch of test for me


Failed Tests (12):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

check-llvm fails bunch of test for me


Failed Tests (12):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

Thanks for reporting them. I need to update them.

I reverted this commit while I fix the failing tests.

dhruvachak reopened this revision.Sep 23 2022, 1:41 PM

This patch was reverted.

This revision is now accepted and ready to land.Sep 23 2022, 1:41 PM

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll
dhruvachak edited the summary of this revision. (Show Details)Sep 23 2022, 1:44 PM

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

@jdoerfert @jhuber6
I updated the LLVM tests except one, Transforms/OpenMP/spmdization_constant_prop.ll. There is no C source snippet in there. Can you help as to how to update it? Please review the diffs for all the updated LLVM tests as well.

In addition, the other 2 tests above fail even after updating. Looks like something is wrong. Can you help as to how to fix them?

Similar to clang tests, we are seeing remarks differences. We already decided to file an issue (after this patch lands) and look at them after-the-fact.

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

@jdoerfert @jhuber6
I updated the LLVM tests except one, Transforms/OpenMP/spmdization_constant_prop.ll. There is no C source snippet in there. Can you help as to how to update it? Please review the diffs for all the updated LLVM tests as well.

In addition, the other 2 tests above fail even after updating. Looks like something is wrong. Can you help as to how to fix them?

Similar to clang tests, we are seeing remarks differences. We already decided to file an issue (after this patch lands) and look at them after-the-fact.

Did you recreate the tests from the C snipped? That is probably not a good idea. We should modify the IR. If we start with C code we can't do it like this anyway. I mean:

  • the IR is totally different,
  • the debug info is missing,
  • lots of unrelated metadata,
  • part of the device runtime was merged in,
  • ...

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

@jdoerfert @jhuber6
I updated the LLVM tests except one, Transforms/OpenMP/spmdization_constant_prop.ll. There is no C source snippet in there. Can you help as to how to update it? Please review the diffs for all the updated LLVM tests as well.

In addition, the other 2 tests above fail even after updating. Looks like something is wrong. Can you help as to how to fix them?

Similar to clang tests, we are seeing remarks differences. We already decided to file an issue (after this patch lands) and look at them after-the-fact.

Did you recreate the tests from the C snipped? That is probably not a good idea. We should modify the IR. If we start with C code we can't do it like this anyway. I mean:

  • the IR is totally different,
  • the debug info is missing,
  • lots of unrelated metadata,
  • part of the device runtime was merged in,
  • ...

Yes, for the ones that have the C snippet, I re-created from that. Since the IR is quite different now, I thought this was the best way and less error-prone while generating the new IR.

Can you help update these tests by getting the patch locally?

These have the C snippet.

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

I think the following are updated correctly and pass.

LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll

I was not able to update the following, so it fails.

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll

I'm unlikely to get to it in the next 2 weeks (IWOMP and OpenMP F2F). What I would do is to take the new IR, the old IR, run instnamer on the new one. Then splice in the new parts into the old IR removing what was there wrt. parallel_51.

Hi, any chance this will be completed any time soon? We are very keen to resurrect our clang-based OpenMP offloading pipeline at https://github.com/devitocodes/devito :-)

@dhruvachak Do you still need help updating the LLVM tests?

@dhruvachak Do you still need help updating the LLVM tests?

If you go a few messages back, there are some llvm tests that @jdoerfert said were not updated properly. Can someone help update those tests properly?

@dhruvachak Do you still need help updating the LLVM tests?

If you go a few messages back, there are some llvm tests that @jdoerfert said were not updated properly. Can someone help update those tests properly?

The patch does not apply cleanly currently, please rebase it and I'll try to get it working locally.

dhruvachak edited the summary of this revision. (Show Details)

Rebased.

@jhuber6

Turns out a rebase on top of trunk had ~200 test conflicts. During my last update in Sep, I had resolved all of the clang test conflicts and failures, there were only llvm test failures.

At this point, I checked out commit 92bc3fb5 for all the failed tests (both clang and llvm tests) and then ran update_cc_test_checks.py on all of the auto-generated clang tests with the updated compiler. After this update, the test results look like the following:

make check-clang: Some of these may be new since the last iteration. But I believe most of them need some manual updates. I did not check all of the 7 tests below but I believe most of them are not autogenerated. I suggest looking at them after the llvm tests are regenerated properly.


Failed Tests (7):

Clang :: CodeGen/PowerPC/ppc64le-varargs-f128.c
Clang :: OpenMP/nvptx_target_printf_codegen.c
Clang :: OpenMP/parallel_copyin_combined_codegen.c
Clang :: OpenMP/target_globals_codegen.cpp
Clang :: OpenMP/target_map_codegen_hold.cpp
Clang :: OpenMP/task_target_device_codegen.c
Clang :: OpenMP/unroll_codegen_parallel_for_factor.cpp

Testing Time: 47.70s

Skipped          :     4
Unsupported      :  1490
Passed           : 30113
Expectedly Failed:    28
Failed           :     7

make check-llvm: Other than spmdization_constant_prop, I think the rest of them have a C code snippet. These are the ones to look at first. These tests are not updated in the current rebased version. You may see the updates I made to them from the previous commit.


Failed Tests (9):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

Testing Time: 58.80s

Skipped          :    59
Unsupported      : 19062
Passed           : 31988
Expectedly Failed:    69
Failed           :     9

make check-openmp: On amdgpu, this looks good.


Expectedly Failed Tests (12):

libomptarget :: amdgcn-amd-amdhsa :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/host_as_target.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: offloading/host_as_target.c

Testing Time: 145.83s

Unsupported      : 139
Passed           : 613
Expectedly Failed:  12

[100%] Built target check-openmp
[100%] Built target check-openmp

After rebasing on top of main today and regenerating all the auto-update clang tests, here are the test results. The AST tests have to be updated manually as Johannes mentioned earlier. I haven't looked at the other clang test failures.

The llvm tests need to be fixed, they have not been regenerated at this point. @jhuber6

make check-clang:


Failed Tests (16):

Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for.c
Clang :: CodeGen/PowerPC/ppc64le-varargs-f128.c
Clang :: OpenMP/irbuilder_safelen.cpp
Clang :: OpenMP/irbuilder_safelen_order_concurrent.cpp
Clang :: OpenMP/irbuilder_simd_aligned.cpp
Clang :: OpenMP/irbuilder_simdlen.cpp
Clang :: OpenMP/irbuilder_simdlen_safelen.cpp
Clang :: OpenMP/parallel_copyin_combined_codegen.c
Clang :: OpenMP/target_data_map_codegen_hold.cpp
Clang :: OpenMP/target_globals_codegen.cpp
Clang :: OpenMP/target_map_codegen_hold.cpp
Clang :: OpenMP/target_map_member_expr_codegen.cpp
Clang :: OpenMP/unroll_codegen_parallel_for_factor.cpp

Testing Time: 51.15s

Skipped          :     4
Unsupported      :  2776
Passed           : 30423
Expectedly Failed:    26
Failed           :    16

make check-llvm:


Failed Tests (13):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/nested_parallelism.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

Testing Time: 83.05s

Skipped          :    59
Unsupported      : 19442
Passed           : 32601
Expectedly Failed:    68
Failed           :    13

make check-openmp on amdgpu:


Expectedly Failed Tests (12):

libomptarget :: amdgcn-amd-amdhsa :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/host_as_target.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: offloading/host_as_target.c

Testing Time: 148.67s

Unsupported      : 141
Passed           : 676
Expectedly Failed:  12