This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Codegen aggregate for outlined function captures
AcceptedPublic

Authored by ggeorgakoudis on May 8 2021, 8:16 AM.

Details

Summary

Parallel regions are outlined as functions with capture variables explicitly generated as distinct parameters in the function's argument list. That complicates the fork_call interface in the OpenMP runtime: (1) the fork_call is variadic since there is a variable number of arguments to forward to the outlined function, (2) wrapping/unwrapping arguments happens in the OpenMP runtime, which is sub-optimal, has been a source of ABI bugs, and has a hardcoded limit (16) in the number of arguments, (3) forwarded arguments must cast to pointer types, which complicates debugging. This patch avoids those issues by aggregating captured arguments in a struct to pass to the fork_call.

Additional changes by Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>
- Fixed opaque pointer miscompile.
- Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.
- Fixed nocapture attribute of kmpc_alloc_aggregate_arg.
- Added align attribute for call to kmpc_alloc_shared.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
JonChesterfield added a comment.EditedJul 19 2021, 4:46 AM

@ronlieb bisected amdgpu crashing to this too, rocm 'veccopy' case tries to dereference 0. Might be the same failure mode as the above or a different one, the hsa error reporting is quite coarse grained.

Suggest we pull this and try to fix it up before reapplying

edit: I haven't looked through the patch in detail, but it seems plausible that we could diff IR before and after for the failing cases to narrow down the fix. Test update looks machine generated, was it a script that could be repeated after adjusting codegen?

Thanks for spotting. Test are auto-gened through update scripts so it should be easy to compare. I'll fix and ping.

ggeorgakoudis reopened this revision.Jul 27 2021, 11:04 AM
This revision is now accepted and ready to land.Jul 27 2021, 11:04 AM
jhuber6 added inline comments.Jul 27 2021, 11:05 AM
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1340

This needs to include the size of the accompanying push after D106496.

Rebase, update tests

Add previous, upper bound to aggregate for combined distributed directives.
Update tests.

Remove unnecessary comments.

ggeorgakoudis requested review of this revision.Sep 16 2021, 4:57 PM
jhuber6 accepted this revision.Sep 20 2021, 5:49 AM

LGTM, this passed the usual tests and I didn't see any leftover allocations.

This revision is now accepted and ready to land.Sep 20 2021, 5:49 AM

Fix for clang-tidy

Rebase and update tests

This revision was landed with ongoing or failed builds.Sep 21 2021, 10:51 AM
This revision was automatically updated to reflect the committed changes.

seeing buildbot failures after this patch landed https://lab.llvm.org/staging/#/builders/183/builds/1598

ggeorgakoudis added a comment.EditedSep 21 2021, 11:46 AM

seeing buildbot failures after this patch landed https://lab.llvm.org/staging/#/builders/183/builds/1598

Looking at it @ronlieb, thanks for reporting

seeing buildbot failures after this patch landed https://lab.llvm.org/staging/#/builders/183/builds/1598

This looks like another AMDGPU issue. The code in question doesn't do anything AMDGPU specific.
@ronlieb @JonChesterfield How to debug this?

Please revert the patch so our buildbot can resume greeness, and we can look into it with urgency today (me or Jon)
as it should be reproducible

@ronlieb can you apply this to amd-stg-open? If it breaks there we have a chance of trying a debugger on it. @dpalermo might be available again now.

@jdoerfert I debug stuff like this by inspection, guesswork and a DIY printf implementation that is itself not totally robust. Very occasionally the thing can be isolated as a unit test. If we're lucky a debug llvm + debug rocr build will be more verbose about what is going wrong.

Please revert the patch so our buildbot can resume greeness, and we can look into it with urgency today (me or Jon)
as it should be reproducible

Sounds good. @ggeorgakoudis let's revert and wait for input.

@pdhaliwal
i will pass the problem over to Pushpinder Singh who should be waking up soon.

George, thank you for reverting it. i can reproduce the issue on a local system.
building latest (revert present) passes.
Revert the revert and fails

one test that fails :
in build directory:
export LOC=pwd; cd $LOC/runtimes/runtimes-bins/openmp && /usr/bin/python3.8 $LOC/./bin/llvm-l
it -vv --show-unsupported --show-xfail -j 32 $LOC/runtimes/runtimes-bins/openmp/libomptarget/test/amdgcn-amd-amdhsa/mapping/declare_mapper_targe
t_data.cpp

command stderr:

[GPU Memory Error] Addr: 0x0 Reason: Page not present or supervisor privilege.
Memory access fault by GPU node-2 (Agent handle: 0x18ae1d0) on address (nil). Reason: Page not present or supervisor privilege.

It looks like from IR diff that this patch is adding use of kmpc_alloc_shared method. These methods likely won't work on AMDGPU as device malloc is not available. Not sure what could be done apart from marking those tests as XFAIL on amdgcn. :(

It looks like from IR diff that this patch is adding use of kmpc_alloc_shared method. These methods likely won't work on AMDGPU as device malloc is not available. Not sure what could be done apart from marking those tests as XFAIL on amdgcn. :(

That's a good theory. Could confirm by patching the amdgpu malloc to return 0xdeadbeef or similar instead of 0 and seeing if that number shows up in the invalid memory access error. If so there's two problems:
1/ malloc on the gpu can fail, so it would mean we're missing a check on the return code of malloc in the devicertl
2/ increased importance for getting malloc running on amdgpu
The openmp in rocm/aomp does have a malloc, so it would also be interesting to see if they run OK with this patch applied

I got this after changing __kmpc_impl_malloc to return 0xdeadbeef. So, this confirms that missing malloc implementation is the root cause.

Memory access fault by GPU node-4 (Agent handle: 0x1bc5000) on address 0xdeadb000. Reason: Page not present or supervisor privilege.

I got this after changing __kmpc_impl_malloc to return 0xdeadbeef. So, this confirms that missing malloc implementation is the root cause.

Memory access fault by GPU node-4 (Agent handle: 0x1bc5000) on address 0xdeadb000. Reason: Page not present or supervisor privilege.

Nice! In that case I think the way to go is to audit the (probably few) places where kmpc_impl_malloc are called and add a check for whether the return value is 0. With that in place we can reland this and get more graceful failure (at a guess we should fall back to the host when gpu memory is exhausted? or maybe just print a 'out of gpu heap memory' style message and abort, don't know).

I got this after changing __kmpc_impl_malloc to return 0xdeadbeef. So, this confirms that missing malloc implementation is the root cause.

Memory access fault by GPU node-4 (Agent handle: 0x1bc5000) on address 0xdeadb000. Reason: Page not present or supervisor privilege.

Nice! In that case I think the way to go is to audit the (probably few) places where kmpc_impl_malloc are called and add a check for whether the return value is 0. With that in place we can reland this and get more graceful failure (at a guess we should fall back to the host when gpu memory is exhausted? or maybe just print a 'out of gpu heap memory' style message and abort, don't know).

We should only fail to remove the kmpc_shared_alloc with O0. Since we need kmpc_shared_alloc for all non-trivial codes, they would always fail on AMDGPU. That said,
why is the shared memory stack not catching this. It's a 64 byte stack for the main thread and we are looking at at 24 byte allocation for declare_mapper_target.cpp.
Can you determine why first two conditionals in __kmpc_alloc_shared don't catch this and return proper memory?

ggeorgakoudis reopened this revision.Sep 28 2021, 10:01 AM
This revision is now accepted and ready to land.Sep 28 2021, 10:01 AM

Update memory allocation for aggregate argument.
Introduce runtime interface to allocate from local memory,
when in SPMD mode, or heap, when in generic.

Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2021, 10:25 AM

@pdhaliwal @JonChesterfield @ronlieb I updated the aggregate argument memory allocation to use an alloca instead of malloc'ing in SPMD mode, which should resolve your issue. Could someone please test the updated patch and give me feedback before landing?

[AMD Official Use Only]

Hi George,
I will do it now ...

i backed up to your reverted patch, and applied this one.
I see some new errors

libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target_data.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target_data_enter_exit.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target_update.cpp
libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp
libomptarget :: amdgcn-amd-amdhsa :: offloading/taskloop_offload_nowait.cpp

Dont know if Jon is around, so i will ask Singh @pdhaliwal if he can dig into it a bit more.

i backed up to your reverted patch, and applied this one.
I see some new errors

libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target_data.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target_data_enter_exit.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_target_update.cpp
libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_offloading_map.cpp
libomptarget :: amdgcn-amd-amdhsa :: offloading/taskloop_offload_nowait.cpp

Dont know if Jon is around, so i will ask Singh @pdhaliwal if he can dig into it a bit more.

Hmm, thanks @ronlieb, @pdhaliwal please let me know what fails.

Apologies for late reply. Most of the tests now do not try to call malloc, so no page fault errors. But all of them are producing wrong results. For e.g. declare_mapper_target.cpp produces Sum = 132608 with the patch applied. Similarly for other tests as well. So don't know what's happening yet.

Apologies for late reply. Most of the tests now do not try to call malloc, so no page fault errors. But all of them are producing wrong results. For e.g. declare_mapper_target.cpp produces Sum = 132608 with the patch applied. Similarly for other tests as well. So don't know what's happening yet.

Thanks @pdhaliwal. I get the right result on nvidia. Please let me know when you get to the bottom of it.

I modified the declare_mapper_target to print the contents of array after target region and found the following output:

2 3 4 5 6 7 8 9 10 11 Sum = 65

Program:

#include <cstdio>
#include <cstdlib>

#define NUM 10

int main() {
  int *c= new int[NUM];
  for (int i = 0; i < NUM; i++) {
    c[i] = 1;
  }
#pragma omp target teams distribute  parallel for map(tofrom: c[0:NUM])
  for (int i = 0; i < NUM; i++) {
    c[i]++;
  }
  int sum = 0;
  for (int i = 0; i < NUM; i++) {
    sum += c[i];
    printf("%d ", c[i]);
  }
  // CHECK: Sum = 2048
  printf("Sum = %d\n", sum);
  return 0;
}

Different variant of the same program is producing correct output,

#include <cstdio>
#include <cstdlib>

#define NUM 10

int main() {
  int *c= new int[NUM];
  for (int i = 0; i < NUM; i++) {
    c[i] = 1;
  }

  int *b = new int[NUM];
#pragma omp target teams distribute  parallel for map(tofrom: c[0:NUM], b[0:NUM])
  for (int i = 0; i < NUM; i++) {
    b[i] = c[i] + 1;
  }
  int sum = 0;
  for (int i = 0; i < NUM; i++) {
    sum += b[i];
    printf("%d ", b[i]);
  }
  // CHECK: Sum = 2048
  printf("Sum = %d\n", sum);
  return 0;
}

Output (this is the right answer):

2 2 2 2 2 2 2 2 2 2 Sum = 20

On internal amd-stg-open branch, this patch works fine, so issue is only with the trunk.
I compared the generated IR before and after applying this patch, I didn't see anything suspicious. (but can't be 100% sure).

I have created a patch (D111218) with fix for amdgcn. This is a temporary fix. I will still keep on looking into it until I find a real root cause.

Update the interface for allocating/sharing the struct aggregate
Simplify invoking tasks

ggeorgakoudis requested review of this revision.Nov 9 2021, 7:44 AM

TODO update tests

Update tests
Fix for attributes to kmpc_alloc_aggregate_arg
Do not emit allocations if there are no arguments in the aggregate

ABataev accepted this revision.Dec 1 2021, 9:26 AM

LG wit a nit

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1342

Enclose into braces too

This revision is now accepted and ready to land.Dec 1 2021, 9:26 AM

Rebase, address comment, update few tests

ggeorgakoudis marked 2 inline comments as done.Dec 1 2021, 11:36 AM
JonChesterfield added a comment.EditedDec 1 2021, 11:42 AM

This works approximately as well as trunk does for me, provided D114865 is also applied. My baseline is not totally solid but I think there's a credible chance this would pass the buildbot, provided D114865 went in first.

Ron reports two new failures with this applied,
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51781.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51982.c

My local sm_75 box with this patch applied (and otherwise a clean build) claims failures in

libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug51781.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49021.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug51781.c
jdoerfert accepted this revision.Dec 22 2021, 2:45 PM

Can we land this? AMD issues seems resolved.

[AMD Official Use Only]

@Singh, Pushpinder is this resolved?
You were most recently working on it.

Thx

pdhaliwal added a comment.EditedDec 22 2021, 7:32 PM

I am seeing a lot of failures on nvptx machine (sm_70, cuda11.4) with this patch,

libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49021.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug49779.cpp
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug51781.c
libomptarget :: nvptx64-nvidia-cuda :: offloading/bug51982.c
libomptarget :: nvptx64-nvidia-cuda :: unified_shared_memory/close_enter_exit.c
libomptarget :: nvptx64-nvidia-cuda :: unified_shared_memory/close_modifier.c
libomptarget :: nvptx64-nvidia-cuda :: unified_shared_memory/shared_update.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49021.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug49334.cpp
libomptarget :: nvptx64-nvidia-cuda-newRTL :: offloading/bug51781.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: unified_shared_memory/close_enter_exit.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: unified_shared_memory/close_modifier.c
libomptarget :: nvptx64-nvidia-cuda-newRTL :: unified_shared_memory/shared_update.c

On amdgcn, these are the tests failing,

libomptarget :: amdgcn-amd-amdhsa :: offloading/bug49021.cpp
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51781.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/bug51982.c
libomptarget :: amdgcn-amd-amdhsa-newRTL :: offloading/bug49021.cpp
libomptarget :: amdgcn-amd-amdhsa-newRTL :: offloading/bug51781.c

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2022, 12:05 PM

As discussed in https://github.com/llvm/llvm-project/issues/54654, this needs to be added for SPMDization with this patch. Not sure whether further handling is required.

diff --git a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
index c4736521e475..23cfa6fe5e27 100644

  • a/llvm/lib/Transforms/IPO/OpenMPOpt.cpp

+++ b/llvm/lib/Transforms/IPO/OpenMPOpt.cpp
@@ -4260,6 +4260,7 @@ struct AAKernelInfoCallSite : AAKernelInfo {

case OMPRTL___kmpc_nvptx_parallel_reduce_nowait_v2:
case OMPRTL___kmpc_nvptx_teams_reduce_nowait_v2:
case OMPRTL___kmpc_nvptx_end_reduce_nowait:

+ case OMPRTL___kmpc_alloc_aggregate_arg:

  break;
case OMPRTL___kmpc_distribute_static_init_4:
case OMPRTL___kmpc_distribute_static_init_4u:

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Hey @dhruvachak. Unfortunately I can't find time lately to work on this patch. Would you like to take over?

I added https://github.com/llvm/llvm-project/issues/54654 documenting what I found when testing this patch on amdgpu.

@ggeorgakoudis Can you please rebase this patch on top of main? Thanks.

Hey @dhruvachak. Unfortunately I can't find time lately to work on this patch. Would you like to take over?

@ggeorgakoudis I rebased the sources on top of main and resolved conflicts in my local workspace. I haven't updated the clang/llvm tests. There are tests that fail on amdgpu that I am investigating. One example is https://github.com/llvm/llvm-project/issues/54654.

dhruvachak added inline comments.Apr 8 2022, 12:35 PM
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
925

NoCapture attributes for the parameters need to be removed. See https://github.com/llvm/llvm-project/issues/54654

reverse ping. Are there outstanding issues with this?

I rebased and resolved conflicts just now and got the compiler built. I did not update the tests, hence not updating this review. I see the following outstanding issues:

(1) make check-libomptarget produces a bunch of failures with the following compile-time assertion. So my rebased patch is not interacting correctly with opaque pointers. It is the same assertion for all the failures.
llvm-project/llvm/include/llvm/IR/Type.h:384: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.

(2) From earlier investigation a couple of months back, this patch uses device alloc and will fail if device allocation is not implemented (e.g. in main branch of amdgpu). Most of these failures are seen at -O0, OpenMPOpt is able to optimize them away at higher opt levels. Are we ok with these failures at -O0?

(3) There were a few issues found regarding SPDMization, NoCaptureAttrs, alignment that should be applied to this patch. I have those changes on a local branch.

Also, make sure to remove all deviceRTL files and probably reset the autogenerated tests to upstream (and re-generate) before you merge (or reupload).

I rebased and resolved conflicts just now and got the compiler built. I did not update the tests, hence not updating this review. I see the following outstanding issues:

(1) make check-libomptarget produces a bunch of failures with the following compile-time assertion. So my rebased patch is not interacting correctly with opaque pointers. It is the same assertion for all the failures.
llvm-project/llvm/include/llvm/IR/Type.h:384: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.

See my comment below. I think that's the issue.

(2) From earlier investigation a couple of months back, this patch uses device alloc and will fail if device allocation is not implemented (e.g. in main branch of amdgpu). Most of these failures are seen at -O0, OpenMPOpt is able to optimize them away at higher opt levels. Are we ok with these failures at -O0?

It used __kmpc_alloc_shared, which should in theory work with O0 (also for AMDGPU) but in practice might not, especially if it has to fallback to malloc. We are working on malloc support right now. This should not stop us. No reasonable code runs with O0 (on AMDGPU) right now.

(3) There were a few issues found regarding SPDMization, NoCaptureAttrs, alignment that should be applied to this patch. I have those changes on a local branch.

Apply them, I can look over everything again.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1272

This doesn't work anymore with opaque pointers, IIRC. We should remember the type and pass to this place.

Also, make sure to remove all deviceRTL files and probably reset the autogenerated tests to upstream (and re-generate) before you merge (or reupload).

I rebased and resolved conflicts just now and got the compiler built. I did not update the tests, hence not updating this review. I see the following outstanding issues:

(1) make check-libomptarget produces a bunch of failures with the following compile-time assertion. So my rebased patch is not interacting correctly with opaque pointers. It is the same assertion for all the failures.
llvm-project/llvm/include/llvm/IR/Type.h:384: llvm::Type* llvm::Type::getNonOpaquePointerElementType() const: Assertion `NumContainedTys && "Attempting to get element type of opaque pointer"' failed.

See my comment below. I think that's the issue.

(2) From earlier investigation a couple of months back, this patch uses device alloc and will fail if device allocation is not implemented (e.g. in main branch of amdgpu). Most of these failures are seen at -O0, OpenMPOpt is able to optimize them away at higher opt levels. Are we ok with these failures at -O0?

It used __kmpc_alloc_shared, which should in theory work with O0 (also for AMDGPU) but in practice might not, especially if it has to fallback to malloc. We are working on malloc support right now. This should not stop us. No reasonable code runs with O0 (on AMDGPU) right now.

(3) There were a few issues found regarding SPDMization, NoCaptureAttrs, alignment that should be applied to this patch. I have those changes on a local branch.

Apply them, I can look over everything again.

Yes, I will apply the changes, refresh the tests, and re-upload.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1272

Thanks. Changing this fixed the assertions.

dhruvachak added inline comments.Jul 8 2022, 11:31 AM
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
3310

This should be VoidTy now that GlobalArgs type has changed.

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Okay, I will convert those few manual OpenMP tests to autogen format.

How about the AST ones? Do they have to be manually updated? Example: clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c

jdoerfert added a comment.EditedJul 8 2022, 1:19 PM

>>! In D102107#3639615, @dhruvachak wrote:

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Okay, I will convert those few manual OpenMP tests to autogen format.

How about the AST ones? Do they have to be manually updated? Example: clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c

For these ones I have a script locally (attached) that need some manual doing but it helps:

1. run the ast dump and store the result (same as RUN line), e.g.,
  {F23722650} clang -cc1 -internal-isystem /data/build/llvm-project/lib/clang/13.0.0/include -nostdsysteminc -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump /data/src/llvm-project/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_4.cpp &> /tmp/ast
2. python3 ast_dump_2_check.py /tmp/ast CHECK
3. replace the check lines with the content of /tmp/ast.check

>>! In D102107#3639615, @dhruvachak wrote:

Is there an llvm/utils script to update clang tests that have RUN lines at the top? An example is clang/test/OpenMP/debug_threadprivate_copyin.c.

You can create the run lines with the llvm/utils/update_cc_test_checks.py script but those tests have manual lines for now.
I usually run llvm/utils/update_cc_test_checks.py -u clang/test/OpenMP/*.{c,cpp} to update all autogenerated tests.

Okay, I will convert those few manual OpenMP tests to autogen format.

How about the AST ones? Do they have to be manually updated? Example: clang/test/AST/ast-dump-openmp-distribute-parallel-for-simd.c

For these ones I have a script locally (attached) that need some manual doing but it helps:

1. run the ast dump and store the result (same as RUN line), e.g.,
  {F23722650} clang -cc1 -internal-isystem /data/build/llvm-project/lib/clang/13.0.0/include -nostdsysteminc -triple x86_64-unknown-unknown -fopenmp -verify -ast-dump /data/src/llvm-project/clang/test/AST/ast-dump-openmp-begin-declare-variant_template_4.cpp &> /tmp/ast
2. python3 ast_dump_2_check.py /tmp/ast CHECK
3. replace the check lines with the content of /tmp/ast.check

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

So, generate check lines for new tests in a separate patch first.
For the AST ones, you need to take the run line of the test, not what I posted there. If it doesn't work, one needs to check why. Hard to diagnose and I don't remember if there is something else. Maybe you need to only include part of it?

jhuber6 added a comment.EditedJul 8 2022, 5:48 PM

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

Sometimes if update_cc_test_check.py -u ${test} doesn't work you either just need to run it twice so the line numbers get updated on the kernel functions, or you can try taking the command line directly from the top of the file and running it again with that instead of -u. A few options aren't handled properly via the update with -u and need to be run again completely.

dhruvachak updated this revision to Diff 443399.Jul 8 2022, 5:53 PM

Fixed opaque pointer miscompile.
Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.
Fixed nocapture attribute of kmpc_alloc_aggregate_arg,
Added align attribute for call to
kmpc_alloc_shared.
Updated (most) failing clang tests.

Thanks. I followed the above steps and regenerated a couple of the AST tests but they still fail. Perhaps I am missing some options?

I currently have a handful of clang test failures where regen did not work. I am going to update the patch, post the current test results, and we can figure out how to regen the rest before we land this patch.

So, generate check lines for new tests in a separate patch first.

Not sure what you mean by new tests. make check-clang has a few failures on existing tests. I think all of them are regen issues. I will post the results.

For the AST ones, you need to take the run line of the test, not what I posted there. If it doesn't work, one needs to check why. Hard to diagnose and I don't remember if there is something else. Maybe you need to only include part of it?

Yes, I took the run line of the test. The regen worked OK, I removed the old CHECK lines and added the new ones. But make check-clang still flags it as a failure. As you said, we need to understand why.

dhruvachak added inline comments.Jul 8 2022, 6:02 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4316

@jdoerfert Is this enough to enable SPMDization or is further handling required?

dhruvachak added inline comments.Jul 8 2022, 6:04 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4316

Just to be clear, this change does allow SPMDization now but want to make sure nothing else is missing.

Results from "make check-clang":

Failed Tests (14):

Clang :: AST/ast-dump-openmp-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-distribute-parallel-for.c
Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for.c
Clang :: CodeGenCXX/observe-noexcept.cpp
Clang :: OpenMP/declare_variant_construct_codegen_1.c
Clang :: OpenMP/nvptx_lambda_pointer_capturing.cpp
Clang :: OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Clang :: OpenMP/remarks_parallel_in_target_state_machine.c
Clang :: OpenMP/target_in_reduction_codegen.cpp
Clang :: SemaCXX/static-assert.cpp
Clang :: utils/update_cc_test_checks/generated-funcs.test

Testing Time: 29.33s

Skipped          :     4
Unsupported      :  1478
Passed           : 29406
Expectedly Failed:    27
Failed           :    14

Need to check the following again.

clang/test/AST/ast-dump-openmp-distribute-parallel-for.c was regenerated and part of the patch but the test still fails. The other regenerated AST tests are not part of this patch, they seem to fail even after regen.

Need to regen CodeGenCXX, SemaCXX, and utils tests (3 total).

I tried converting the OpenMP manual CHECK tests to the autogen format. Some of them still fail as above, don't know why.

Need to know how to regen the OpenMP remarks tests.

make check-openmp passes on amdgpu. Need to check on nvptx.

Testing Time: 39.95s

Unsupported      : 143
Passed           : 563
Expectedly Failed:  14

[100%] Built target check-openmp
[100%] Built target check-openmp

It seems the buildbot didn't actually test this patch but an old one, still:

The checks for this tests are not updated:
target_teams_distribute_parallel_for_order_codegen.cpp
target_in_reduction_codegen.cpp
nvptx_lambda_capturing.cpp
nvptx_lambda_pointer_capturing.cpp

Similar to clang/test/OpenMP/declare_variant_construct_codegen_1.c, we should manually update the few fork calls in clang/test/OpenMP/declare_variant_construct_codegen_1.c.

Can you share the output of the AST dump tests and the new check lines, so what run produces and the file we give to Filechec to verify it.

clang/test/OpenMP/declare_variant_construct_codegen_1.c
1052

Something went wrong here. Might be easier to manually change the kmpc_forc_call line (should not be much more)

Can you share the output of the AST dump tests and the new check lines, so what run produces and the file we give to Filechec to verify it.

I looked at the AST test output and the CHECK lines more carefully. Turns out the full path was embedded in some of the CHECK lines causing the failures. I corrected those manually and those AST tests now pass. I will move on to the other failures.

Regenerated clang tests, make check-clang passes

Rebased on top of a recent commit. Both check-clang and check-openmp (on amdgpu) pass.

Testing Time: 30.73s

Skipped          :     4
Unsupported      :  1480
Passed           : 29554
Expectedly Failed:    27

[100%] Built target check-clang

On amdgpu:

Testing Time: 42.65s

Unsupported      : 145
Passed           : 570
Expectedly Failed:  14

[100%] Built target check-openmp
[100%] Built target check-openmp

@jdoerfert With this patch, additional remarks are being generated. Please check whether the new OMP121 remarks in the following tests are OK.

Clang :: OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Clang :: OpenMP/remarks_parallel_in_target_state_machine.c

All changes from my end are in. Please review.

@jdoerfert With this patch, additional remarks are being generated. Please check whether the new OMP121 remarks in the following tests are OK.

Clang :: OpenMP/remarks_parallel_in_multiple_target_state_machines.c
Clang :: OpenMP/remarks_parallel_in_target_state_machine.c

Can you send me the device IR generated for these (-save-temps). I need to check what's happening and building the patch myself will take a while.

@jdoerfert Attached are the device IR files, generated with -save-temps.

Pointing out the recent changes at the corresponding source locations.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
1289

Added align attribute for call to __kmpc_alloc_shared.

clang/lib/CodeGen/CGOpenMPRuntimeGPU.h
263

CapturedVarsElemTypes added to handle opaque pointers.

clang/lib/CodeGen/CodeGenFunction.h
3352

CapturedVarsElemTypes introduced to handle opaque pointers.

llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
924

Fixed attribute of __kmpc_alloc_aggregate_arg,

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
4316

Added alloc_aggregate_arg entry point to OpenMPOpt SPMD list.

jdoerfert accepted this revision.Aug 31 2022, 2:50 PM

LG, the new remarks need to be addressed in a follow up. Please test for them and make a TODO that they should be optimized away.

This revision was landed with ongoing or failed builds.Sep 14 2022, 5:55 PM
This revision was automatically updated to reflect the committed changes.

check-llvm fails bunch of test for me


Failed Tests (12):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

check-llvm fails bunch of test for me


Failed Tests (12):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

Thanks for reporting them. I need to update them.

I reverted this commit while I fix the failing tests.

dhruvachak reopened this revision.Sep 23 2022, 1:41 PM

This patch was reverted.

This revision is now accepted and ready to land.Sep 23 2022, 1:41 PM

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll
dhruvachak edited the summary of this revision. (Show Details)Sep 23 2022, 1:44 PM

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

@jdoerfert @jhuber6
I updated the LLVM tests except one, Transforms/OpenMP/spmdization_constant_prop.ll. There is no C source snippet in there. Can you help as to how to update it? Please review the diffs for all the updated LLVM tests as well.

In addition, the other 2 tests above fail even after updating. Looks like something is wrong. Can you help as to how to fix them?

Similar to clang tests, we are seeing remarks differences. We already decided to file an issue (after this patch lands) and look at them after-the-fact.

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

@jdoerfert @jhuber6
I updated the LLVM tests except one, Transforms/OpenMP/spmdization_constant_prop.ll. There is no C source snippet in there. Can you help as to how to update it? Please review the diffs for all the updated LLVM tests as well.

In addition, the other 2 tests above fail even after updating. Looks like something is wrong. Can you help as to how to fix them?

Similar to clang tests, we are seeing remarks differences. We already decided to file an issue (after this patch lands) and look at them after-the-fact.

Did you recreate the tests from the C snipped? That is probably not a good idea. We should modify the IR. If we start with C code we can't do it like this anyway. I mean:

  • the IR is totally different,
  • the debug info is missing,
  • lots of unrelated metadata,
  • part of the device runtime was merged in,
  • ...

Updated llvm tests. The following 3 tests still fail:

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

@jdoerfert @jhuber6
I updated the LLVM tests except one, Transforms/OpenMP/spmdization_constant_prop.ll. There is no C source snippet in there. Can you help as to how to update it? Please review the diffs for all the updated LLVM tests as well.

In addition, the other 2 tests above fail even after updating. Looks like something is wrong. Can you help as to how to fix them?

Similar to clang tests, we are seeing remarks differences. We already decided to file an issue (after this patch lands) and look at them after-the-fact.

Did you recreate the tests from the C snipped? That is probably not a good idea. We should modify the IR. If we start with C code we can't do it like this anyway. I mean:

  • the IR is totally different,
  • the debug info is missing,
  • lots of unrelated metadata,
  • part of the device runtime was merged in,
  • ...

Yes, for the ones that have the C snippet, I re-created from that. Since the IR is quite different now, I thought this was the best way and less error-prone while generating the new IR.

Can you help update these tests by getting the patch locally?

These have the C snippet.

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

I think the following are updated correctly and pass.

LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll

I was not able to update the following, so it fails.

LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll

I'm unlikely to get to it in the next 2 weeks (IWOMP and OpenMP F2F). What I would do is to take the new IR, the old IR, run instnamer on the new one. Then splice in the new parts into the old IR removing what was there wrt. parallel_51.

Hi, any chance this will be completed any time soon? We are very keen to resurrect our clang-based OpenMP offloading pipeline at https://github.com/devitocodes/devito :-)

@dhruvachak Do you still need help updating the LLVM tests?

@dhruvachak Do you still need help updating the LLVM tests?

If you go a few messages back, there are some llvm tests that @jdoerfert said were not updated properly. Can someone help update those tests properly?

@dhruvachak Do you still need help updating the LLVM tests?

If you go a few messages back, there are some llvm tests that @jdoerfert said were not updated properly. Can someone help update those tests properly?

The patch does not apply cleanly currently, please rebase it and I'll try to get it working locally.

dhruvachak edited the summary of this revision. (Show Details)

Rebased.

@jhuber6

Turns out a rebase on top of trunk had ~200 test conflicts. During my last update in Sep, I had resolved all of the clang test conflicts and failures, there were only llvm test failures.

At this point, I checked out commit 92bc3fb5 for all the failed tests (both clang and llvm tests) and then ran update_cc_test_checks.py on all of the auto-generated clang tests with the updated compiler. After this update, the test results look like the following:

make check-clang: Some of these may be new since the last iteration. But I believe most of them need some manual updates. I did not check all of the 7 tests below but I believe most of them are not autogenerated. I suggest looking at them after the llvm tests are regenerated properly.


Failed Tests (7):

Clang :: CodeGen/PowerPC/ppc64le-varargs-f128.c
Clang :: OpenMP/nvptx_target_printf_codegen.c
Clang :: OpenMP/parallel_copyin_combined_codegen.c
Clang :: OpenMP/target_globals_codegen.cpp
Clang :: OpenMP/target_map_codegen_hold.cpp
Clang :: OpenMP/task_target_device_codegen.c
Clang :: OpenMP/unroll_codegen_parallel_for_factor.cpp

Testing Time: 47.70s

Skipped          :     4
Unsupported      :  1490
Passed           : 30113
Expectedly Failed:    28
Failed           :     7

make check-llvm: Other than spmdization_constant_prop, I think the rest of them have a C code snippet. These are the ones to look at first. These tests are not updated in the current rebased version. You may see the updates I made to them from the previous commit.


Failed Tests (9):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

Testing Time: 58.80s

Skipped          :    59
Unsupported      : 19062
Passed           : 31988
Expectedly Failed:    69
Failed           :     9

make check-openmp: On amdgpu, this looks good.


Expectedly Failed Tests (12):

libomptarget :: amdgcn-amd-amdhsa :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/host_as_target.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: offloading/host_as_target.c

Testing Time: 145.83s

Unsupported      : 139
Passed           : 613
Expectedly Failed:  12

[100%] Built target check-openmp
[100%] Built target check-openmp

After rebasing on top of main today and regenerating all the auto-update clang tests, here are the test results. The AST tests have to be updated manually as Johannes mentioned earlier. I haven't looked at the other clang test failures.

The llvm tests need to be fixed, they have not been regenerated at this point. @jhuber6

make check-clang:


Failed Tests (16):

Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c
Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for.c
Clang :: CodeGen/PowerPC/ppc64le-varargs-f128.c
Clang :: OpenMP/irbuilder_safelen.cpp
Clang :: OpenMP/irbuilder_safelen_order_concurrent.cpp
Clang :: OpenMP/irbuilder_simd_aligned.cpp
Clang :: OpenMP/irbuilder_simdlen.cpp
Clang :: OpenMP/irbuilder_simdlen_safelen.cpp
Clang :: OpenMP/parallel_copyin_combined_codegen.c
Clang :: OpenMP/target_data_map_codegen_hold.cpp
Clang :: OpenMP/target_globals_codegen.cpp
Clang :: OpenMP/target_map_codegen_hold.cpp
Clang :: OpenMP/target_map_member_expr_codegen.cpp
Clang :: OpenMP/unroll_codegen_parallel_for_factor.cpp

Testing Time: 51.15s

Skipped          :     4
Unsupported      :  2776
Passed           : 30423
Expectedly Failed:    26
Failed           :    16

make check-llvm:


Failed Tests (13):

LLVM :: Transforms/OpenMP/custom_state_machines.ll
LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
LLVM :: Transforms/OpenMP/nested_parallelism.ll
LLVM :: Transforms/OpenMP/parallel_level_fold.ll
LLVM :: Transforms/OpenMP/spmdization.ll
LLVM :: Transforms/OpenMP/spmdization_assumes.ll
LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
LLVM :: Transforms/OpenMP/spmdization_guarding.ll
LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
LLVM :: Transforms/OpenMP/spmdization_remarks.ll

Testing Time: 83.05s

Skipped          :    59
Unsupported      : 19442
Passed           : 32601
Expectedly Failed:    68
Failed           :    13

make check-openmp on amdgpu:


Expectedly Failed Tests (12):

libomptarget :: amdgcn-amd-amdhsa :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa :: offloading/host_as_target.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/data_member_ref.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_default_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/declare_mapper_nested_mappers.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/lambda_by_value.cpp
libomptarget :: amdgcn-amd-amdhsa-LTO :: mapping/ompx_hold/struct.c
libomptarget :: amdgcn-amd-amdhsa-LTO :: offloading/host_as_target.c

Testing Time: 148.67s

Unsupported      : 141
Passed           : 676
Expectedly Failed:  12

Rebased and updated tests.

I rebased the patch and regenerated the clang tests. I haven't regenerated the llvm tests. @jhuber6 @jdoerfert Please help regenerate the llvm tests. Several of the failing clang tests were regenerated earlier, they can perhaps be regenerated after the llvm tests are regenerated.

Here are the test results. check-openmp on amdgpu passes.

Failed Tests (13):
  Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for-simd.c
  Clang :: AST/ast-dump-openmp-target-teams-distribute-parallel-for.c
  Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for-simd.c
  Clang :: AST/ast-dump-openmp-teams-distribute-parallel-for.c
  Clang :: CodeGen/PowerPC/ppc64le-varargs-f128.c
  Clang :: Headers/amdgcn-openmp-device-math-complex.c
  Clang :: Headers/amdgcn-openmp-device-math-complex.cpp
  Clang :: Headers/amdgcn_openmp_device_math.c
  Clang :: Headers/openmp_device_math_isnan.cpp
  Clang :: OpenMP/nvptx_lambda_pointer_capturing.cpp
  Clang :: OpenMP/parallel_copyin_combined_codegen.c
  Clang :: OpenMP/target_globals_codegen.cpp
  Clang :: OpenMP/unroll_codegen_parallel_for_factor.cpp

Failed Tests (14):
  LLVM :: Transforms/OpenMP/custom_state_machines.ll
  LLVM :: Transforms/OpenMP/custom_state_machines_remarks.ll
  LLVM :: Transforms/OpenMP/get_hardware_num_threads_in_block_fold.ll
  LLVM :: Transforms/OpenMP/gpu_state_machine_function_ptr_replacement.ll
  LLVM :: Transforms/OpenMP/is_spmd_exec_mode_fold.ll
  LLVM :: Transforms/OpenMP/nested_parallelism.ll
  LLVM :: Transforms/OpenMP/parallel_level_fold.ll
  LLVM :: Transforms/OpenMP/spmdization.ll
  LLVM :: Transforms/OpenMP/spmdization_assumes.ll
  LLVM :: Transforms/OpenMP/spmdization_constant_prop.ll
  LLVM :: Transforms/OpenMP/spmdization_guarding.ll
  LLVM :: Transforms/OpenMP/spmdization_guarding_two_reaching_kernels.ll
  LLVM :: Transforms/OpenMP/spmdization_no_guarding_two_reaching_kernels.ll
  LLVM :: Transforms/OpenMP/spmdization_remarks.ll
jhuber6 updated this revision to Diff 510797.Apr 4 2023, 7:13 AM

Fixed the Clang tests. Haven't touched the LLVM ones because this breaks SPMDzation and state machine rewrites completely in those tests. Someone who knows what this patch changes should look into what needs to be updated to make those tests match whatever form SPDMzation expects now. Also for some bizarre reason this patch breaks adding alwaysinline on kmpc_parallel_51.

Fixed the Clang tests. Haven't touched the LLVM ones because this breaks SPMDzation and state machine rewrites completely in those tests. Someone who knows what this patch changes should look into what needs to be updated to make those tests match whatever form SPDMzation expects now. Also for some bizarre reason this patch breaks adding alwaysinline on kmpc_parallel_51.

I hadn't looked into whether OpenMPOpt was working ok, so did not realize this breakage. @jdoerfert If you get a chance, please look into this problem ^^.

I'm trying to pick up the context for this and D95976. Superficially it looks like lowering variadic functions in the compiler could be used to simplify quite a lot of this, @jdoerfert there's a comment from some time ago which suggests that this code path was originally a workaround for lack of variadics.

I'm currently debugging an IR pass that eliminates variadic calls in the hope of using that all the time on amdgpu. I think it could be adapted to patch these calls on the fly for nvptx as well if we added it to the openmp codegen pipeline, need to see whether the function pointer interacts well with the recent specialisation pass.