
[OpenMP][WIP] Initial support for `begin/end declare variant`
AbandonedPublic

Authored by jdoerfert on Dec 8 2019, 5:19 PM.

Details

Summary
NOTE: This is a WIP patch to foster a discussion. Please do consider that when browsing the code. Details will be discussed in individual commits once we agree on the overall model. This is also the reason why test coverage, documentation, TODOs, etc. are lacking.

This patch provides initial support for omp begin/end declare variant,
as defined in OpenMP technical report 8 (TR8).

A major purpose of this patch is to provide proper math.h/cmath support
for OpenMP target offloading. See PR42061, PR42798, PR42799.
The three tests included in this patch show that these bugs (should be)
fixed in this scheme.
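For readers unfamiliar with the construct, here is a minimal sketch of the intended math.h use case (an illustration only, not code from the patch; `__nv_sin` is the CUDA libdevice name and its use here is an assumption):

```c
#include <math.h>  /* provides the host declaration of sin */

/* Everything up to the matching `end declare variant` is only considered
   when the context selector matches, i.e., when compiling for a GPU. */
#pragma omp begin declare variant match(device = {kind(gpu)})
double sin(double __x) { return __nv_sin(__x); } /* GPU variant, same name */
#pragma omp end declare variant
```

When compiling for the host, the guarded definition is skipped and calls to `sin` bind to the system library as usual; on the device, the variant with the same name is preferred.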

In contrast to the declare variant handling we already have, this patch
makes use of the multi-version handling in clang. This is especially
useful as the variants have the same name as the base functions. We
should try to port all OpenMP variant handling to this scheme, see the
TODO in CodeGenModule for a proposed way towards this goal. Other than
that, we tried to reuse the existing multi-version and OpenMP variant
handling as much as possible.

NOTE: There are various TODOs that need fixing and switches that need additional cases.

Diff Detail

Event Timeline

jdoerfert added inline comments.Dec 8 2019, 5:28 PM
clang/lib/Parse/ParseOpenMP.cpp
1064

The diff is confusing here. I actually extracted some code into a helper function (ParseOMPDeclareVariantMatchClause on the right) which I can reuse in the begin/end handling. The code "deleted" here is below ParseOMPDeclareVariantMatchClause on the right.

jdoerfert updated this revision to Diff 232750.Dec 8 2019, 5:32 PM

Add (missing) include. (Worked locally just fine).

Build result: FAILURE -
Log files: console-log.txt, CMakeCache.txt

I read the spec and don't think that we need all this complex stuff for the implementation. You just need to check at the codegen phase whether the function must be emitted or not. We don't even need to move context checks from codegen, because with the current semantics all the checks can be safely performed at the codegen phase.

I read the spec and don't think that we need all this complex stuff for the implementation. You just need to check at the codegen phase whether the function must be emitted or not. We don't even need to move context checks from codegen, because with the current semantics all the checks can be safely performed at the codegen phase.

For better or worse we need this and it is actually a natural reuse of the multi-versioning code. We need this because:

  1. For the begin/end version we cannot even parse anything in a context that does not match at encounter time, e.g. the kind(fpga) context in clang/test/AST/ast-dump-openmp-begin-declare-variant.c.
  2. For the 5.0 version we cannot use the replaceAllUses approach currently implemented in tryEmitDeclareVariant as soon as we have the construct context selector trait. That means we will have to resolve the call target earlier anyway.

(FWIW, I wrote this part of the SPEC.)

ye-luo added a subscriber: ye-luo.Dec 8 2019, 7:29 PM

I read the spec and don't think that we need all this complex stuff for the implementation. You just need to check at the codegen phase whether the function must be emitted or not. We don't even need to move context checks from codegen, because with the current semantics all the checks can be safely performed at the codegen phase.

For better or worse we need this and it is actually a natural reuse of the multi-versioning code. We need this because:

  1. For the begin/end version we cannot even parse anything in a context that does not match at encounter time, e.g. the kind(fpga) context in clang/test/AST/ast-dump-openmp-begin-declare-variant.c.

Ok, here we can check the context and just skip everything between the begin/end pragmas, just as if something like #ifdef...#endif were seen.

  2. For the 5.0 version we cannot use the replaceAllUses approach currently implemented in tryEmitDeclareVariant as soon as we have the construct context selector trait. That means we will have to resolve the call target earlier anyway.

I thought about this. Here we need to use a slightly different method, but again everything can be resolved at the codegen phase; there is no need to resolve it at parsing/sema. Plus, this is a completely different problem and must be solved in a different patch.

(FWIW, I wrote this part of the SPEC.)

I read the spec and don't think that we need all this complex stuff for the implementation. You just need to check at the codegen phase whether the function must be emitted or not. We don't even need to move context checks from codegen, because with the current semantics all the checks can be safely performed at the codegen phase.

For better or worse we need this and it is actually a natural reuse of the multi-versioning code. We need this because:

  1. For the begin/end version we cannot even parse anything in a context that does not match at encounter time, e.g. the kind(fpga) context in clang/test/AST/ast-dump-openmp-begin-declare-variant.c.

Ok, here we can check the context and just skip everything between the begin/end pragmas, just as if something like #ifdef...#endif were seen.

Agreed.

  2. For the 5.0 version we cannot use the replaceAllUses approach currently implemented in tryEmitDeclareVariant as soon as we have the construct context selector trait. That means we will have to resolve the call target earlier anyway.

I thought about this. Here we need to use a slightly different method, but again everything can be resolved at the codegen phase; there is no need to resolve it at parsing/sema.

I doubt we can, let alone want to, do (basically) overload resolution during codegen.
Depending on what function we resolve to, we get different instantiations which require everything from semantic analysis to run on that code again, right?
Could you elaborate why we should not do all the overload resolution at the same time and with the same mechanism that is already present? I mean, SemaOverload deals with multi-versioning already.

Plus, this is a completely different problem and must be solved in a different patch.

As I noted in the very beginning of the commit message, this is not supposed to be committed like this but split into multiple patches. Let's not mix discussions here.

I read the spec and don't think that we need all this complex stuff for the implementation. You just need to check at the codegen phase whether the function must be emitted or not. We don't even need to move context checks from codegen, because with the current semantics all the checks can be safely performed at the codegen phase.

For better or worse we need this and it is actually a natural reuse of the multi-versioning code. We need this because:

  1. For the begin/end version we cannot even parse anything in a context that does not match at encounter time, e.g. the kind(fpga) context in clang/test/AST/ast-dump-openmp-begin-declare-variant.c.

Ok, here we can check the context and just skip everything between the begin/end pragmas, just as if something like #ifdef...#endif were seen.

Agreed.

  2. For the 5.0 version we cannot use the replaceAllUses approach currently implemented in tryEmitDeclareVariant as soon as we have the construct context selector trait. That means we will have to resolve the call target earlier anyway.

I thought about this. Here we need to use a slightly different method, but again everything can be resolved at the codegen phase; there is no need to resolve it at parsing/sema.

I doubt we can, let alone want to, do (basically) overload resolution during codegen.
Depending on what function we resolve to, we get different instantiations which require everything from semantic analysis to run on that code again, right?
Could you elaborate why we should not do all the overload resolution at the same time and with the same mechanism that is already present? I mean, SemaOverload deals with multi-versioning already.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

Plus, this is a completely different problem and must be solved in a different patch.

As I noted in the very beginning of the commit message, this is not supposed to be committed like this but split into multiple patches. Let's not mix discussions here.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

cchen added a subscriber: cchen.Dec 8 2019, 9:21 PM
hfinkel added inline comments.Dec 8 2019, 11:04 PM
clang/lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h
17

Should we use a more-specific selector and then get rid of this __NVPTX__ check?

clang/lib/Parse/ParseOpenMP.cpp
1505

Will this just inf-loop if the file ends?

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

The intent of this feature is to allow us to include the device-function headers and the system headers simultaneously, giving preference to the device functions when compiling for the device, thus fixing a number of outstanding math.h OpenMP offloading problems. This definitely means that we'll have multiple functions with the same name and we need to pick the right ones during overload resolution.

@jdoerfert , how does the ".ompvariant" work with external functions? I see the part of the spec which says, "The symbol name of a function definition that appears between a begin declare variant...", but, if we append this name to, for example, the names of functions present in the device math library, won't we have a problem with linking?

Great to see the fragile math.h stuff disappear.

I'm not sure about the CPU/GPU/other granularity. An OpenMP program with x86 as the host and target offload regions for amdgcn and for nvptx seems like a reasonable aspiration. Or for a couple of different generations from the same vendor.

More ambitiously, one might want a GPU to be the host, and offload kernels for I/O to an aarch64 "target".

We don't need to wire such combinations in up front, and I don't think they're excluded by this design. A future 'x86-64' variant would presumably be chosen over a 'cpu' variant when compiling for x86-64.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just something like #ifdef...#endif, why can't you just skip everything between begin/end if the context does not match?

JonChesterfield added inline comments.Dec 9 2019, 2:17 AM
clang/test/OpenMP/begin_declare_variant_codegen.cpp
72

The name mangling should probably append the device kind, e.g., _Z3foov.ompvariant.gpu

JonChesterfield added inline comments.Dec 9 2019, 2:25 AM
clang/lib/Headers/__clang_cuda_cmath.h
70

We could call __builtin_fpclassify for nvptx, e.g. from https://github.com/ROCm-Developer-Tools/aomp-extras/blob/0.7-6/aomp-device-libs/libm/src/libm-nvptx.cpp

int fpclassify(float __x) {
  return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x);
}
int fpclassify(double __x) {
  return __builtin_fpclassify(FP_NAN, FP_INFINITE, FP_NORMAL, FP_SUBNORMAL, FP_ZERO, __x);
}

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just something like #ifdef...#endif, why can't you just skip everything between begin/end if the context does not match?

The patch does this (see in ParseOpenMP.cpp where I asked about the potential inf-loop). But when the definitions are not skipped, then we have to worry about having multiple decls/defs of the same name and the overload priorities.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just something like #ifdef...#endif, why can't you just skip everything between begin/end if the context does not match?

The patch does this (see in ParseOpenMP.cpp where I asked about the potential inf-loop). But when the definitions are not skipped, then we have to worry about having multiple decls/defs of the same name and the overload priorities.

I would recommend to drop all this extra stuff from the patch and focus on the initial patch. We'll need something similar to multiversion in case of the construct context selectors, but at first we need to solve all the problems with the simple versions of the construct rather than try to solve all the problems in the world in one patch. It is almost impossible to review.

clang/lib/AST/StmtOpenMP.cpp
2243

I don't think this is the right place for this code. I will try to move it to the Basic directory in my patch.

clang/lib/Parse/ParseOpenMP.cpp
1505

It will.

jdoerfert marked 11 inline comments as done.Dec 9 2019, 8:04 AM
jdoerfert added inline comments.
clang/lib/AST/StmtOpenMP.cpp
2243

Sure. As noted in the TODOs, finding a place for this is needed.

clang/lib/Headers/__clang_cuda_cmath.h
70

Agreed. Assuming it works, I'll put the fpclassify code back in but only remove the todo and OPENMP macro.

clang/lib/Headers/openmp_wrappers/__clang_openmp_math_declares.h
17

For now, this is CUDA after all. I was going to revisit this once we know how the AMD solution looks (I guess via HIP).
That said, I'd put a pin on it for now.

(The kind(gpu) selector below is only because we don't have anything more specific and it matches all our GPU targets for now.)

clang/lib/Parse/ParseOpenMP.cpp
1505

We'll add a check and test.

clang/test/OpenMP/begin_declare_variant_codegen.cpp
72

There is already a TODO for that (I think CodeGenModule). Mangling right now is hardcoded and needs to be revisited :)

jdoerfert marked 5 inline comments as done.Dec 9 2019, 8:16 AM

@jdoerfert , how does the ".ompvariant" work with external functions? I see the part of the spec which says, "The symbol name of a function definition that appears between a begin declare variant...", but, if we append this name to, for example, the names of functions present in the device math library, won't we have a problem with linking?

We restricted it for now to function definitions so we don't need to define the mangling, as you cannot expect linking. (I did this to get it into TR8, since I figured it would already solve all our math.h problems.)
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that users are not allowed to use (I thought "." is one of them).

Great to see the fragile math.h stuff disappear.

I'm not sure about the CPU/GPU/other granularity. An OpenMP program with x86 as the host and target offload regions for amdgcn and for nvptx seems like a reasonable aspiration. Or for a couple of different generations from the same vendor.

More ambitiously, one might want a GPU to be the host, and offload kernels for I/O to an aarch64 "target".

We don't need to wire such combinations in up front, and I don't think they're excluded by this design. A future 'x86-64' variant would presumably be chosen over a 'cpu' variant when compiling for x86-64.

As I wrote in the inline comment somewhere, kind(gpu) is an artifact due to missing fine-grained context selectors. If that wasn't the core of your issue, please elaborate.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just something like #ifdef...#endif, why can't you just skip everything between begin/end if the context does not match?

The patch does this (see in ParseOpenMP.cpp where I asked about the potential inf-loop). But when the definitions are not skipped, then we have to worry about having multiple decls/defs of the same name and the overload priorities.

I would recommend to drop all this extra stuff from the patch and focus on the initial patch. We'll need something similar to multiversion in case of the construct context selectors, but at first we need to solve all the problems with the simple versions of the construct rather than try to solve all the problems in the world in one patch. It is almost impossible to review.

I agree with you to the point that this is not supposed to be reviewed. That's why I wrote that in the commit message. I did this so we can make sure the general path is clear and people (myself included) can see how/that it works.
I also agree that construct context selectors are very close to multi-versioned functions. That is why I said earlier we should move all variant handling into this scheme.

My plan:

  • We play around with this prototype now, make sure there are no major problems with it (so far it didn't seem so).
  • We split it up (This doesn't necessarily need to be only done by me, as that often slows down these processes).
  • We review the parts with proper test coverage, etc. and get it in.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just something like #ifdef...#endif, why can't you just skip everything between begin/end if the context does not match?

The patch does this (see in ParseOpenMP.cpp where I asked about the potential inf-loop). But when the definitions are not skipped, then we have to worry about having multiple decls/defs of the same name and the overload priorities.

I would recommend to drop all this extra stuff from the patch and focus on the initial patch. We'll need something similar to multiversion in case of the construct context selectors, but at first we need to solve all the problems with the simple versions of the construct rather than try to solve all the problems in the world in one patch. It is almost impossible to review.

I agree. We should split this into several patches (e.g., basic handling, skipping parsing for incompatible selectors, overload things). I think that @jdoerfert posted this so that people can see the high-level direction and provide feedback (including feedback on how to stage the functionality for review).

@jdoerfert , also, do we have tests that can go into the test suite / libomptarget regression tests demonstrating the collection of problems people have currently opened bugs on regarding math.h? I recall we still had problems with host code needing the long-double overloads, with constants from the system headers, etc.

@jdoerfert , how does the ".ompvariant" work with external functions? I see the part of the spec which says, "The symbol name of a function definition that appears between a begin declare variant...", but, if we append this name to, for example, the names of functions present in the device math library, won't we have a problem with linking?

We restricted it for now to function definitions so we don't need to define the mangling, as you cannot expect linking. (I did this to get it into TR8, since I figured it would already solve all our math.h problems.)
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that users are not allowed to use (I thought "." is one of them).

Great to see the fragile math.h stuff disappear.

I'm not sure about the CPU/GPU/other granularity. An OpenMP program with x86 as the host and target offload regions for amdgcn and for nvptx seems like a reasonable aspiration. Or for a couple of different generations from the same vendor.

More ambitiously, one might want a GPU to be the host, and offload kernels for I/O to an aarch64 "target".

We don't need to wire such combinations in up front, and I don't think they're excluded by this design. A future 'x86-64' variant would presumably be chosen over a 'cpu' variant when compiling for x86-64.

As I wrote in the inline comment somewhere, kind(gpu) is an artifact due to missing fine-grained context selectors. If that wasn't the core of your issue, please elaborate.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names, so we do not need to worry about overload resolution; the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. By definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just something like #ifdef...#endif, why can't you just skip everything between begin/end if the context does not match?

The patch does this (see in ParseOpenMP.cpp where I asked about the potential inf-loop). But when the definitions are not skipped, then we have to worry about having multiple decls/defs of the same name and the overload priorities.

I would recommend to drop all this extra stuff from the patch and focus on the initial patch. We'll need something similar to multiversion in case of the construct context selectors, but at first we need to solve all the problems with the simple versions of the construct rather than try to solve all the problems in the world in one patch. It is almost impossible to review.

I agree with you to the point that this is not supposed to be reviewed. That's why I wrote that in the commit message. I did this so we can make sure the general path is clear and people (myself included) can see how/that it works.
I also agree that construct context selectors are very close to multi-versioned functions. That is why I said earlier we should move all variant handling into this scheme.

I don't think we should do this. Something similar to multiversioning is required only for a small subset. Everything else can be implemented in a more straightforward and simple way. Plus, I'm not sure that we'll need full reuse of the multiversioning. Seems to me, we can implement codegen in a different way. Multiversioning is supported only by x86 in clang/LLVM. I think we can try to implement a more portable and universal scheme.

My plan:

  • We play around with this prototype now, make sure there are no major problems with it (so far it didn't seem so).
  • We split it up (This doesn't necessarily need to be only done by me, as that often slows down these processes).
  • We review the parts with proper test coverage, etc. and get it in.

@jdoerfert , how does the ".ompvariant" work with external functions? I see the part of the spec which says, "The symbol name of a function definition that appears between a begin declare variant...", but, if we append this name to, for example, the names of functions present in the device math library, won't we have a problem with linking?

We restricted it for now to function definitions so we don't need to define the mangling, as you cannot expect linking. (I did this to get it into TR8, since I figured it would already solve all our math.h problems.)
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that users are not allowed to use (I thought "." is one of them).

Okay, but how do we distinguish functions for which there is a declaration and we need the mangling because the user has provided a definition elsewhere, from those for which there is a declaration and we don't want mangling because we need to link to some system library?

jdoerfert updated this revision to Diff 232902.Dec 9 2019, 11:10 AM

Add one more test sin(long double), and fix some rebase issues

@jdoerfert , also, do we have tests that can go into the test suite / libomptarget regression tests demonstrating the collection of problems people have currently opened bugs on regarding math.h? I recall we still had problems with host code needing the long-double overloads, with constants from the system headers, etc.

The three tests I have in here already show that almost all of the known problems are solved by this (e.g., constants from the system headers). The rest can easily be added as lit tests. The test suite situation is evolving but far from being resolved. I would prefer not to mix these discussions and focus on lit tests with this patch (once split).

We restricted it for now to function definitions so we don't need to define the mangling, as you cannot expect linking. (I did this to get it into TR8, since I figured it would already solve all our math.h problems.)
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that users are not allowed to use (I thought "." is one of them).

Okay, but how do we distinguish functions for which there is a declaration and we need the mangling because the user has provided a definition elsewhere, from those for which there is a declaration and we don't want mangling because we need to link to some system library?

The idea is that declarations inside begin/end declare variant are supposed to be unaffected by the begin/end declare variant. That is, if you have declarations you cannot expect variant multi-versioning to happen. Having declarations inside or outside the begin/end declare variant is still fine if they all denote the same function.
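The rule above can be illustrated with a small sketch (hypothetical functions and bodies, syntax per TR8; only the definition participates in multi-versioning):

```c
double base(double);   /* ordinary declaration outside the region */

#pragma omp begin declare variant match(device = {kind(gpu)})
double base(double);                      /* declaration: not multi-versioned or
                                             renamed; denotes the same `base`
                                             as the one above, so it links to
                                             the system library as usual */
double work(double x) { return x + 1.0; } /* definition: mangled and treated
                                             as the GPU variant of `work` */
#pragma omp end declare variant
```

This is why the linking concern does not arise: only definitions get the variant mangling, and declarations keep their ordinary symbol names.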

I don't think we should do this. Something similar to multiversioning is required only for a small subset.

This is neither true nor relevant. It is not true because OpenMP 5.0 declare variant is so broken it cannot be used for what it was intended for. That means people (as, for example, we do for math) will inevitably use begin/end declare variant.

Everything else can be implemented in a more straightforward and simple way.

Having a single scheme is arguably simpler than maintaining multiple schemes. There is no additional overhead in using the more powerful and available multi-version scheme for everything.

Plus, I'm not sure that we'll need full reuse of the multiversioning. Seems to me, we can implement codegen in a different way.

Please provide actual details with statements like this. It is impossible to tell what you mean.

Multiversioning is supported only by x86 in clang/LLVM. I think we can try to implement a more portable and universal scheme.

This is not true, at least not from a conceptual standpoint. While cpu_supports and cpu_is multi-versioning is restricted to X86 (see supportsMultiVersioning in TargetInfo.h), the new kind of OpenMP multi-versioning is a portable and universal scheme (see the uses of supportsMultiVersioning).

...

We restricted it for now to function definitions so we don't need to define the mangling, as you cannot expect linking. (I did this to get it into TR8, since I figured it would already solve all our math.h problems.)
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that users are not allowed to use (I thought "." is one of them).

Okay, but how do we distinguish functions for which there is a declaration and we need the mangling because the user has provided a definition elsewhere, from those for which there is a declaration and we don't want mangling because we need to link to some system library?

The idea is that declarations inside begin/end declare variant are supposed to be unaffected by the begin/end declare variant. That is, if you have declarations you cannot expect variant multi-versioning to happen. Having declarations inside or outside the begin/end declare variant is still fine if they all denote the same function.

Thanks, now I understand. This seems like it will work.

Build result: fail - 60637 tests passed, 24 failed and 726 were skipped.

failed: Clang.CXX/dcl_dcl/basic_namespace/namespace_udecl/p11.cpp
failed: Clang.CXX/drs/dr5xx.cpp
failed: Clang.CXX/modules-ts/basic/basic_def_odr/p6/global-vs-module.cpp
failed: Clang.CXX/special/class_inhctor/p3.cpp
failed: Clang.Headers/nvptx_device_cmath_functions.c
failed: Clang.Headers/nvptx_device_cmath_functions.cpp
failed: Clang.Headers/nvptx_device_cmath_functions_cxx17.cpp
failed: Clang.Headers/nvptx_device_math_functions.c
failed: Clang.Headers/nvptx_device_math_functions.cpp
failed: Clang.Headers/nvptx_device_math_functions_cxx17.cpp
failed: Clang.OpenMP/declare_variant_ast_print.cpp
failed: Clang.OpenMP/declare_variant_device_kind_codegen.cpp
failed: Clang.OpenMP/declare_variant_implementation_vendor_codegen.cpp
failed: Clang.OpenMP/declare_variant_messages.c
failed: Clang.OpenMP/declare_variant_messages.cpp
failed: Clang.OpenMP/declare_variant_mixed_codegen.cpp
failed: Clang.OpenMP/math_codegen.cpp
failed: Clang.OpenMP/math_fp_macro.cpp
failed: Clang.OpenMP/nvptx_declare_variant_device_kind_codegen.cpp
failed: Clang.OpenMP/nvptx_declare_variant_implementation_vendor_codegen.cpp
failed: Clang.SemaCXX/attr-cpuspecific.cpp
failed: Clang.SemaCXX/attr-target-mv.cpp
failed: Clang.SemaCXX/friend.cpp
failed: Clang.SemaCXX/using-decl-1.cpp

Log files: console-log.txt, CMakeCache.txt

tra added a subscriber: tra.Dec 9 2019, 11:30 AM
tra added inline comments.
clang/lib/Headers/__clang_cuda_cmath.h
72

Please keep fpclassify in place. It's been available in this header for a long time and it *is* needed.

462

I think only #ifdef should be removed here. scalblnf itself should remain.

clang/lib/Headers/__clang_cuda_device_functions.h
1724

Ditto here. Only preprocessor statements should be removed.

Build result: fail - 60641 tests passed, 20 failed and 726 were skipped.

failed: Clang.CXX/dcl_dcl/basic_namespace/namespace_udecl/p11.cpp
failed: Clang.CXX/drs/dr5xx.cpp
failed: Clang.CXX/modules-ts/basic/basic_def_odr/p6/global-vs-module.cpp
failed: Clang.CXX/special/class_inhctor/p3.cpp
failed: Clang.Headers/nvptx_device_cmath_functions.c
failed: Clang.Headers/nvptx_device_math_functions.c
failed: Clang.OpenMP/declare_variant_ast_print.cpp
failed: Clang.OpenMP/declare_variant_device_kind_codegen.cpp
failed: Clang.OpenMP/declare_variant_implementation_vendor_codegen.cpp
failed: Clang.OpenMP/declare_variant_messages.c
failed: Clang.OpenMP/declare_variant_messages.cpp
failed: Clang.OpenMP/declare_variant_mixed_codegen.cpp
failed: Clang.OpenMP/math_codegen.cpp
failed: Clang.OpenMP/math_fp_macro.cpp
failed: Clang.OpenMP/nvptx_declare_variant_device_kind_codegen.cpp
failed: Clang.OpenMP/nvptx_declare_variant_implementation_vendor_codegen.cpp
failed: Clang.SemaCXX/attr-cpuspecific.cpp
failed: Clang.SemaCXX/attr-target-mv.cpp
failed: Clang.SemaCXX/friend.cpp
failed: Clang.SemaCXX/using-decl-1.cpp

Log files: console-log.txt, CMakeCache.txt

@jdoerfert , also, do we have tests that can go into the test suite / libomptarget regression tests demonstrating the collection of problems people have currently opened bugs on regarding math.h? I recall we still had problems with host code needing the long-double overloads, with constants from the system headers, etc.

The three tests I have in here show already that almost all of the known problems are solved by this (e.g. constants from the system headers). The rest can be easily added as lit tests. The test suite situation is evolving but far from being resolved. I would prefer not to mix these discussions and focus on lit tests with this patch (once split).

We restricted it for now to function definitions so we don't need to define the mangling, as you cannot expect linking. (I did this to get it into TR8 since I figured it would solve all our math.h problems already.)

I don't think we should do this. Something similar to multiversioning is required only for a small subset.

This is neither true, nor relevant. It is not true because OpenMP 5.0 declare variant is so broken it cannot be used for what it was intended for. That means people (as for example we for math) will inevitably use begin/end declare variant.

I rather doubt that it is so much broken. The fact, that you need some new construct to express some functionality does not mean that the previous one is incorrect. It is incomplete, maybe. But not broken. And even for begin/end stuff, multiversioning is only required for construct traits, for all other traits we can reuse the existing implementation.

Everything else can be implemented in a more straightforward and simple way.

Having a single scheme is arguably simpler than maintaining multiple schemes. There is no additional overhead in using the more powerful and available multi-version scheme for everything.

Plus, I'm not sure that we'll need full reuse of the multiversioning. Seems to me, we can implement codegen in a different way.

Please provide actual details with statements like this. It is impossible to tell what you mean.

Multiversioning is supported only by x86 in clang/LLVM. I think we can try to implement a more portable and universal scheme.

This is not true, at least not from a conceptual standpoint. While cpu_supports and cpu_is multi-versioning is restricted to X86, see supportsMultiVersioning in TargetInfo.h, the new kind of OpenMP multi-versioning is a portable and universal scheme (see the uses of supportsMultiVersioning)

jdoerfert updated this revision to Diff 232909.Dec 9 2019, 12:18 PM
jdoerfert marked 6 inline comments as done.

Undo math function removal (fpclassify & scalblnf), reorder includes (host
first). The latter is the "natural way" but also necessary because fpclassify
uses macros and we did not copy the complex cuda_runtime_wrapper include magic.
However, the sin(long double) is back if it is called in a function that has
a target region. This is an artifact unrelated to any of this (I would argue).
The problem is that we parse + type check *host* code surrounding the target
region when we compile for the target. This has various down sites and can
easily break without math involvement. Long story short, we need to fix this
later separately.

clang/lib/Headers/__clang_cuda_cmath.h
72

Done.

462

I misinterpreted the TODOs, here and above. That is why I removed code. Sorry for the noise.

clang/lib/Headers/__clang_cuda_device_functions.h
1724

Yeah, my bad.

This is neither true, nor relevant. It is not true because OpenMP 5.0 declare variant is so broken it cannot be used for what it was intended for. That means people (as for example we for math) will inevitably use begin/end declare variant.

I rather doubt that it is so much broken. The fact, that you need some new construct to express some functionality does not mean that the previous one is incorrect. It is incomplete, maybe. But not broken.

Broken in the sense that we (in the OpenMP accelerator subcommittee) don't think it can be used for what we envisioned it initially. It can be used for certain things though.

And even for begin/end stuff, multiversioning is only required for construct traits, for all other traits we can reuse the existing implementation.

Again, this is not the case. begin/end *always* caused multiple definitions with the same name. Even if we ignore that for a second, why should we not use the powerful infrastructure we have (=multi-versioning) that supports construct traits and not use it for the other traits? Or asked differently, why should we have a second codegen rewriting scheme?

Build result: fail - 60639 tests passed, 24 failed and 726 were skipped.

failed: Clang.CXX/dcl_dcl/basic_namespace/namespace_udecl/p11.cpp
failed: Clang.CXX/drs/dr5xx.cpp
failed: Clang.CXX/modules-ts/basic/basic_def_odr/p6/global-vs-module.cpp
failed: Clang.CXX/special/class_inhctor/p3.cpp
failed: Clang.Headers/nvptx_device_cmath_functions.c
failed: Clang.Headers/nvptx_device_cmath_functions.cpp
failed: Clang.Headers/nvptx_device_cmath_functions_cxx17.cpp
failed: Clang.Headers/nvptx_device_math_functions.c
failed: Clang.Headers/nvptx_device_math_functions.cpp
failed: Clang.Headers/nvptx_device_math_functions_cxx17.cpp
failed: Clang.OpenMP/declare_variant_ast_print.cpp
failed: Clang.OpenMP/declare_variant_device_kind_codegen.cpp
failed: Clang.OpenMP/declare_variant_implementation_vendor_codegen.cpp
failed: Clang.OpenMP/declare_variant_messages.c
failed: Clang.OpenMP/declare_variant_messages.cpp
failed: Clang.OpenMP/declare_variant_mixed_codegen.cpp
failed: Clang.OpenMP/math_codegen.cpp
failed: Clang.OpenMP/math_fp_macro.cpp
failed: Clang.OpenMP/nvptx_declare_variant_device_kind_codegen.cpp
failed: Clang.OpenMP/nvptx_declare_variant_implementation_vendor_codegen.cpp
failed: Clang.SemaCXX/attr-cpuspecific.cpp
failed: Clang.SemaCXX/attr-target-mv.cpp
failed: Clang.SemaCXX/friend.cpp
failed: Clang.SemaCXX/using-decl-1.cpp

Log files: console-log.txt, CMakeCache.txt

Not always. If we see that the context selector does not match, we can skip everything between begin/end. It means exactly what I said - multiversioning is needed only for construct because all other traits can be easily resolved at compile time. Generally speaking, there are 2 kinds of traits - global traits (like vendor, kind, isa, etc.), which can be resolved completely statically and do not need multiversioning, and local traits, like construct, which depend on the OpenMP directives and require something similar to the multiversioning.

The case where the code is skipped is easy, sure. However, if we "could easily resolve" the other case, we could have implemented an #ifdef solution for math.h/cmath. This was not the case and still is not. We basically populate the namespace with multiple versions of the same function (with the same name) and then select the appropriate one for each call site.

Instead of trying to argue why this is not needed for some cases, could you argue why we should have multiple schemes to resolve all types of variants? It seems you inherently assume the codegen patching scheme implemented right now is useful even if we need something else to complement it. I don't think so, thus there is little reason for me to distinguish between the types of variants that need multi-version support and the types that can be implemented with multi-versions but don't need it.

Because each particular problem requires its own solution and it is always a bad idea to use the microscope to hammer the nails.

While I see where you are coming from, I disagree. We have a generic framework available that we already need to use in some cases, there is no harm in using it for all cases. It would be different if we wouldn't need the generic framework at all, but that is not the case. All I ask is to literally share existing code, no additional complexity needed. Your suggestion will complicate the setup, duplicate logic, and make it overall harder to maintain and compose in the future. If you still disagree, please provide some arguments (and details) why we would benefit from your setup.

I have a different opinion. You can reuse the existing codegen for declare variant functions with global context selectors only. You just need to iterate through all the variants and choose the best one.
That's why you don't need the dispatching in your scheme. You're doing absolutely the same thing as the original declare variant implementation.

We cannot use multiversioning for the original declare variant construct since there is no multiversioning at all. We have a single function with many different aliasing functions, having different names. They are completely different functions. And I don't think it would be correct to add them as multiversion variants to the original function.

You're doing absolutely the same thing as the original declare variant implementation.

I don't think so, but if you do, why do you oppose this approach?

And I don't think it would be correct to add them as multiversiin variants to the original function.

Why wouldn't it be correct to pick the version through the overload resolution instead of the code generation?
How this could work is already described in the TODO (CodeGenModule.cpp):

// TODO: We should introduce function aliases for `omp declare variant`
//       directives such that we can treat them through the same overload
//       resolution scheme (via multi versioning) as `omp begin declare
//       variant` functions. For an `omp declare variant(VARIANT) ...`
//       that is attached to a BASE function we would create a global alias
//       VARIANT = BASE which will participate in the multi version overload
//       resolution. If picked, there is no need to emit them explicitly.

I still haven't understood why we cannot/should not reuse the existing multi-version support and instead duplicate the logic in some custom scheme.
We have this patch that shows how we can reuse the logic in Clang. It works on a per-call basis, so it will work for all context selectors (incl. construct).
If you think there is something conceptually not working, I'd like to hear about it. However, just saying "it wouldn't be correct" is not sufficient. You need to provide details about the situation, what you think would not work, and why.

jdoerfert marked an inline comment as done.Dec 9 2019, 5:53 PM
jdoerfert added inline comments.
clang/lib/Headers/__clang_cuda_math_forward_declares.h
41

I have to double check which abs declarations were here and which were not.

I explained already: declare variant cannot be represented as a multiversion function, for example.

rampitec removed a subscriber: rampitec.Dec 9 2019, 6:05 PM

@ABataev, can you please elaborate? It's not obvious to me that we cannot handle the existing declare variant with the same scheme (as @jdoerfert highlighted above). In general, I believe it's preferable to have one generic scheme and use it to handle all cases as opposed to continuing to use a more-limited scheme in addition to the generic scheme.

Explained already. The current version of declare variant cannot be represented as multiversion functions, because it is not one. We have a function that is an alias to other functions with different names. They just are not multiversion functions by definition.

I understand that they have different names. I don't see why that means they can't be added to the overload set as multi-version candidates if we add logic which does exactly that.

So, D71241 shows what declare variant (5.0) would look like if we implement it through SemaLookup. I will actually revisit this patch tomorrow as I might be able to make it even simpler. (D71241 saves ~250 lines and, from what I've seen in the tests, actually fixes things.)

Because this is exactly what I said: you want to reuse the existing solution for a completely different purpose just because you want to reuse it, even though semantically it has nothing to do with multiversioning. And I think it is a bad idea to break the semantics of the existing solution. It requires some additional changes, like merging of different functions with different names. And here I want to ask: why do you think it is better than my proposal to reuse the codegen for the already implemented declare variant stuff for the OpenMP multiversioned functions? It really requires less work, because you just need to add a loop over all variants and call the tryEmit... function.

You're doing absolutely the same thing as the original declare variant implementation.

I don't think so but if you do why do you oppose this approach?

And I don't think it would be correct to add them as multiversiin variants to the original function.

Why wouldn't it be correct to pick the version through the overload resolution instead of the code generation?
How this could work is already described in the TODO (CodeGenModule.cpp):

// TODO: We should introduce function aliases for `omp declare variant`
//       directives such that we can treat them through the same overload
//       resolution scheme (via multi versioning) as `omp begin declare
//       variant` functions. For an `omp declare variant(VARIANT) ...`
//       that is attached to a BASE function we would create a global alias
//       VARIANT = BASE which will participate in the multi version overload
//       resolution. If picked, there is no need to emit them explicitly.

I still haven't understood why we cannot/should not reuse the existing multi-version support and instead duplicate the logic in some custom scheme.
We have this patch that shows how we can reuse the logic in Clang. It works on a per-call basis, so it will work for all context selectors (incl. construct).
If you think there is something conceptually not working, I'd like to hear about it. However, just saying "it wouldn't be correct" is not sufficient. You need to provide details about the situation, what you think would not work, and why.

I explained already: declare variant cannot be represented as a multiversion function, for example.


@jdoerfert posted a prototype implementation in D71241, so we don't need to just have a theoretical discussion, but I'd like to address a high-level issue here:

Because this is exactly what I said: you want to reuse the existing solution for a completely different purpose just because you want reuse, even though semantically it has nothing to do with multiversioning.

This kind of comment really isn't appropriate. We're all experienced developers here, and no one is proposing to reuse code in an inappropriate manner "just because" or for any other reason. I ask you to reconsider your reasoning here for two reasons:

  1. "Reus[ing] the existing solution for a completely different purpose", which I'll classify as structural code reuse, is not necessarily bad. Structural code reuse, where you reuse code that has a similar structure but a different purpose from what you need, is often a useful impetus for the creation of new abstractions. The trade-off relevant here, in my experience, is against future structural divergence. In the future, is it likely that the abstraction will break down because the different purposes will tend to require the code structure to change in divergent ways? If so, that can be a good argument against code reuse.
  2. Your statement of differing purpose, that declare variant has "nothing to do with multiversioning", is not obviously true. Declare variant, as the spec says, "declares a specialized variant of a base function and specifies the context in which that specialized variant is used." Multiversioning, according to the GCC docs, makes it so that "you may specify multiple versions of a function, where each function is specialized for a specific target feature. At runtime, the appropriate version of the function is automatically executed depending on the characteristics of the execution platform." These two concepts do share some conceptual relationship.

It certainly seems fair to say that the AST representation desired for a call with runtime dispatch might be sufficiently different from a call resolved at compile time to make the code reuse inadvisable. However, the requirements of which I'm aware are: the representation should be unambiguous and faithful to the source and language structure, and also, the statically-resolvable callee should be referenced by the call site in the AST. As can be seen in this patch and the associated tests, both of these requirements are satisfied.

And I think it is a bad idea to break the semantics of the existing solution. It requires some additional changes, like merging of different functions with different names. And here I want to ask: why do you think it is better than my proposal to reuse the codegen for the already implemented declare variant stuff for the OpenMP multiversioned functions? It really requires less work, because you just need to add a loop over all variants and call the tryEmit... function.

When you say "semantics of the existing solution", do you mean the extent to which it satisfies the standard, non-standard user-visible behaviors of the existing implementation, or the internal structure of the implementation? It's certainly not a bad idea to change, or even replace, the current implementation if the result is better in some way (e.g., more general, supports more features, conceptually cleaner). The proposed solution seems to have a number of advantages over the current solution in codegen, and in addition, naturally handles the new features that we would like to support. Generally, resolution of static calls is something that should happen in Sema, not in CodeGen, in part so that static analysis tools can easily understand the call semantics. This new approach naturally provides this implementation property.


@jdoerfert posted a prototype implementation in D71241, so we don't need to just have a theoretical discussion, but I'd like to address a high-level issue here:

Because this is exactly what I said: you want to reuse the existing solution for a completely different purpose just because you want reuse, even though semantically it has nothing to do with multiversioning.

This kind of comment really isn't appropriate. We're all experienced developers here, and no one is proposing to reuse code in an inappropriate manner "just because" or for any other reason. I ask you to reconsider your reasoning here for two reasons:

  1. "Reus[ing] the existing solution for a completely different purpose", which I'll classify as structural code reuse, is not necessarily bad. Structural code reuse, where you reuse code that has a similar structure but a different purpose from what you need, is often a useful impetus for the creation of new abstractions. The trade-off relevant here, in my experience, is against future structural divergence. In the future, is it likely that the abstraction will break down because the different purposes will tend to require the code structure to change in divergent ways? If so, that can be a good argument against code reuse.

I agree that reusing is a good idea, but not in this case. I already wrote that Johannes reuses just a single feature of multiversioning: handling of multiple definitions for the same function. Nothing else. Everything else is new functionality to support the declare variant stuff.

  2. Your statement of differing purpose, that declare variant has "nothing to do with multiversioning", is not obviously true. Declare variant, as the spec says, "declares a specialized variant of a base function and specifies the context in which that specialized variant is used." Multiversioning, according to the GCC docs, makes it so that "you may specify multiple versions of a function, where each function is specialized for a specific target feature. At runtime, the appropriate version of the function is automatically executed depending on the characteristics of the execution platform." These two concepts do share some conceptual relationship.

It certainly seems fair to say that the AST representation desired for a call with runtime dispatch might be sufficiently different from a call resolved at compile time to make the code reuse inadvisable. However, the requirements of which I'm aware are: the representation should be unambiguous and faithful to the source and language structure, and also, the statically-resolvable callee should be referenced by the call site in the AST. As can be seen in this patch and the associated tests, both of these requirements are satisfied.

The current solution allows doing everything correctly and requires just a small rework. The actual selection of the function can (and must) be done at codegen. There is no benefit in choosing the correct variant function in Sema. Moreover, it leads to breaking the AST in the sense that the original function call is replaced by a new function call in the AST, and dump/printing works differently than the user expects.

And I think it is a bad idea to break the semantics of the existing solution. It requires some additional changes, like merging of different functions with different names. And here I want to ask: why do you think it is better than my proposal to reuse the codegen for the already implemented declare variant stuff for the OpenMP multiversioned functions? It really requires less work, because you just need to add a loop over all variants and call the tryEmit... function.

When you say "semantics of the existing solution", do you mean the extent to which it satisfies the standard, non-standard user-visible behaviors of the existing implementation, or the internal structure of the implementation? It's certainly not a bad idea to change, or even replace, the current implementation if the result is better in some way (e.g., more general, supports more features, conceptually cleaner). The proposed solution seems to have a number of advantages over the current solution in codegen, and in addition, naturally handles the new features that we would like to support. Generally, resolution of static calls is something that should happen in Sema, not in CodeGen, in part so that static analysis tools can easily understand the call semantics. This new approach naturally provides this implementation property.

There are no advantages at all. This solution has absolutely the same power as the existing one. And I see not a single reason why function resolution should happen in Sema. Moreover, there are real problems with the proposed solution's handling of the AST.

jdoerfert updated this revision to Diff 233232.Dec 10 2019, 4:58 PM

Consistent overload based solution.

Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2019, 4:58 PM
jdoerfert updated this revision to Diff 233234.Dec 10 2019, 5:02 PM

Diff against TOT

Build result: FAILURE - Could not check out parent git hash "9a3d576b08c13533597182498ba5e739924f892f". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt


jdoerfert updated this revision to Diff 233238.Dec 10 2019, 5:16 PM

Fix math problem

Build result: FAILURE - Could not check out parent git hash "9a3d576b08c13533597182498ba5e739924f892f". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

D75779 is the proper implementation of the OpenMP standard.

jdoerfert abandoned this revision.Mar 13 2020, 10:34 PM