This is an archive of the discontinued LLVM Phabricator instance.


Differential D71241

[OpenMP][WIP] Use overload centric declare variants
Abandoned · Public

Authored by jdoerfert on Dec 10 2019, 1:05 AM.

Details

Reviewers

kiranchandramohan
ABataev
RaviNarayanaswamy
gtbercea
grokos
sdmitriev
JonChesterfield
hfinkel
fghanim

Summary

Instead of going through a custom overload resolution twice, we can use
the existing one.

TODO:

The tests need updating: they checked for functions that shouldn't
  have been emitted, and the way they check is hard to update.
There is a TODO in the code we need to fix (see below).
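
For readers unfamiliar with the directive under discussion, the following is a minimal sketch of `declare variant` usage, not code from the patch. The names `base` and `hst` mirror the ones used later in this thread; the helper `call_base` is hypothetical. Which function actually runs depends on the compiler and on whether OpenMP is enabled, so the sketch only assumes that one of the two is called.

```cpp
#include <cassert>

// Variant intended for CPU devices; the name `hst` follows the thread's examples.
int hst() { return 1; }

// Base function. With OpenMP enabled (e.g. -fopenmp) and a CPU device context,
// calls to base() may be redirected to hst(). Without OpenMP support the
// pragma is ignored and base() itself runs.
#pragma omp declare variant(hst) match(device = {kind(cpu)})
int base() { return 2; }

// Hypothetical helper: returns the result of whichever function was selected.
int call_base() {
  int r = base();
  assert(r == 1 || r == 2); // either the variant or the base was called
  return r;
}
```

The disagreement in the thread below is about where the redirection from `base` to `hst` should happen (overload resolution in Sema versus code generation), not about this user-facing syntax.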

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time       Test
1,150 ms   Clang.OpenMP::Unknown Unit Message ("")

Event Timeline

jdoerfert created this revision. Dec 10 2019, 1:05 AM

Herald added a project: Restricted Project. Dec 10 2019, 1:05 AM

Herald added subscribers: s.egerton, guansong, bollu and 5 others.

jdoerfert mentioned this in D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`. Dec 10 2019, 1:07 AM

rampitec removed a subscriber: rampitec. Dec 10 2019, 1:12 AM

Build result: fail - 60666 tests passed, 1 failed and 726 were skipped.

failed: Clang.OpenMP/declare_variant_ast_print.cpp

Log files: console-log.txt, CMakeCache.txt

Harbormaster failed remote builds in B42185: Diff 233011! Dec 10 2019, 1:38 AM

You're merging different functions as multiversion variants. I don't think it is right to overcomplicate the semantics of multiversion functions just because you want to do it.

clang/lib/Sema/SemaOverload.cpp
9725	Implement all TODOs and compare the result with the size of the code where you just need to iterate through all the variants and call the existing functions to emit their aliases.

In D71241#1776798, @ABataev wrote:

You're merging different functions as multiversion variants. I don't think it is right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here. What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

clang/lib/Sema/SemaOverload.cpp
9725	We do not emit aliases at all with this approach. Emitting aliases does not work for the generic case, e.g., the construct selector trait.

In D71241#1777661, @jdoerfert wrote:

In D71241#1776798, @ABataev wrote:

You're merging different functions as multiversion variants. I don't think it is right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here.

You do this when you try to resolve the overloading, though it is absolutely not required. You can easily implement it at the codegen phase (it is implemented already, actually). You don't need to resolve the overloads, because Sema has already resolved them. What you need to do is select the correct version of the function, and that's it. If you have global traits only, you emit an alias. If you have local traits (like construct), you use the address of the best variant function directly. And there is no need to worry about templates, overload resolution, etc., plus handling of the corner cases and future changes.

In your solution, you're actually not using multiversioning at all; you use just one feature of multiversioning: handling of multiple definitions of the same function. Nothing else. I'm saying that it is better to slightly modify the codegen, because there you deal with C-like constructs and don't need to worry about most of the problematic C++ features. But you insist on moving all of this to Sema and overcomplicating things.

What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

Handling of templates, for example. Plus, mixing different functions (with different names). You have it when you try to resolve overloadings though, actually, we don't need to do it, we can easily do this at the codegen.

Also, check how -ast-print works with your solution. It returns a different result than expected because you transform the code too early. This is incorrect behavior.

clang/lib/Sema/SemaOverload.cpp
9725	Not directly. I know that it won't work for construct, for construct we'll need a little bit different approach but it is not very hard to implement.

In D71241#1777709, @ABataev wrote:

In D71241#1777661, @jdoerfert wrote:

In D71241#1776798, @ABataev wrote:

You're merging different functions as multiversion variants. I don't think it is right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here.

You do this when you try to resolve the overloading, though it is absolutely not required. You can easily implement it at the codegen phase (it is implemented already, actually). You don't need to resolve the overloads, because Sema has already resolved them. What you need to do is select the correct version of the function, and that's it. If you have global traits only, you emit an alias. If you have local traits (like construct), you use the address of the best variant function directly. And there is no need to worry about templates, overload resolution, etc., plus handling of the corner cases and future changes.

In your solution, you're actually not using multiversioning at all; you use just one feature of multiversioning: handling of multiple definitions of the same function. Nothing else. I'm saying that it is better to slightly modify the codegen, because there you deal with C-like constructs and don't need to worry about most of the problematic C++ features. But you insist on moving all of this to Sema and overcomplicating things.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

Handling of templates, for example. Plus, mixing different functions (with different names). You have it when you try to resolve overloadings though, actually, we don't need to do it, we can easily do this at the codegen.

Why is any of this complicated to you? The logic to do the overloading is 15 lines long and most of it is to determine the best of all versions. What about that is more complicated than having multiple patch points during code generation in which we try to modify existing IR but sometimes have to delay it and hope it'll work at the end.

Also, check how -ast-print works with your solution. It returns a different result than expected because you transform the code too early. This is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.
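
The "determine the best of all versions" step mentioned above can be pictured with a small, self-contained model. This is only a sketch under stated assumptions: `VariantCandidate`, its fields, and `pick_callee` are hypothetical names invented for illustration and are not Clang data structures or the code in this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one `declare variant` candidate. This is not a Clang
// data structure; the fields are assumptions made for illustration.
struct VariantCandidate {
  std::string name; // name of the variant function
  bool matches;     // does its context selector match the current context?
  int score;        // higher means a more specific match
};

// Returns the name of the best matching variant, or `base_name` if no
// candidate matches. On equal scores the earlier candidate is kept.
std::string pick_callee(const std::string &base_name,
                        const std::vector<VariantCandidate> &candidates) {
  assert(!base_name.empty());
  const VariantCandidate *best = nullptr;
  for (const VariantCandidate &c : candidates) {
    if (!c.matches)
      continue; // selector does not apply in this context
    if (!best || c.score > best->score)
      best = &c;
  }
  return best ? best->name : base_name;
}
```

In this model, a call such as `pick_callee("base", {{"hst", true, 1}, {"dev", false, 5}})` yields `"hst"`, because `dev` does not match the context even though its score is higher.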

In D71241#1777972, @jdoerfert wrote:

In D71241#1777709, @ABataev wrote:

In D71241#1777661, @jdoerfert wrote:

In D71241#1776798, @ABataev wrote:

You're merging different functions as multiversion variants. I don't think it is right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here.

You do this when you try to resolve the overloading, though it is absolutely not required. You can easily implement it at the codegen phase (it is implemented already, actually). You don't need to resolve the overloads, because Sema has already resolved them. What you need to do is select the correct version of the function, and that's it. If you have global traits only, you emit an alias. If you have local traits (like construct), you use the address of the best variant function directly. And there is no need to worry about templates, overload resolution, etc., plus handling of the corner cases and future changes.

In your solution, you're actually not using multiversioning at all; you use just one feature of multiversioning: handling of multiple definitions of the same function. Nothing else. I'm saying that it is better to slightly modify the codegen, because there you deal with C-like constructs and don't need to worry about most of the problematic C++ features. But you insist on moving all of this to Sema and overcomplicating things.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

Handling of templates, for example. Plus, mixing different functions (with different names). You have it when you try to resolve overloadings though, actually, we don't need to do it, we can easily do this at the codegen.

Why is any of this complicated to you? The logic to do the overloading is 15 lines long and most of it is to determine the best of all versions. What about that is more complicated than having multiple patch points during code generation in which we try to modify existing IR but sometimes have to delay it and hope it'll work at the end.

Without templates etc. Early resolution of the variant function leads to problems with AST at least. Plus, as I said, problems with handling complex features of C++.

Also, check how -ast-print works with your solution. It returns a different result than expected because you transform the code too early. This is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

In D71241#1778134, @ABataev wrote:

...

Also, check how -ast-print works with your solution. It returns a different result than expected because you transform the code too early. This is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Alexey, again, this kind of comment is not appropriate. We're all experienced developers here, and we all understand the importance of tooling support in Clang. We also serve developers who write tools using AST matchers and other Clang analysis facilities. Having the resolved callee represented in the AST for what looks like a static call from the base-language perspective makes a lot of sense from a tooling perspective. When performing static analysis on the code, forcing a tool to understand how OpenMP variant selectors work in order to perform inter-procedural static analysis is suboptimal in nearly all cases. It is also true that we might want the base callee represented in some way, but as that callee is never actually called, and one of the variants is always called at that call site, it is important that IPA propagate information into and out of the correct callee in order to produce the correct results. If we currently represent the base callee as the callee that will appear in the call graph, that's a bug: Clang's static analyzer will produce incorrect results.

If you know of specific tools that indeed depend on the current behavior to produce correct results, please provide us with details on what they're doing so that we understand the use cases. Regardless, we should prioritize correct-by-default functioning of AST-based call graphs and their associated static analysis.

In D71241#1778564, @hfinkel wrote:

In D71241#1778134, @ABataev wrote:

...

Also, check how -ast-print works with your solution. It returns a different result than expected because you transform the code too early. This is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Alexey, again, this kind of comment is not appropriate. We're all experienced developers here, and we all understand the importance of tooling support in Clang. We also serve developers who write tools using AST matchers and other Clang analysis facilities. Having the resolved callee represented in the AST for what looks like a static call from the base-language perspective makes a lot of sense from a tooling perspective. When performing static analysis on the code, forcing a tool to understand how OpenMP variant selectors work in order to perform inter-procedural static analysis is suboptimal in nearly all cases. It is also true that we might want the base callee represented in some way, but as that callee is never actually called, and one of the variants is always called at that call site, it is important that IPA propagate information into and out of the correct callee in order to produce the correct results. If we currently represent the base callee as the callee that will appear in the call graph, that's a bug: Clang's static analyzer will produce incorrect results.

If you know of specific tools that indeed depend on the current behavior to produce correct results, please provide us with details on what they're doing so that we understand the use cases. Regardless, we should prioritize correct-by-default functioning of AST-based call graphs and their associated static analysis.

What's not appropriate here?

In D71241#1778564, @hfinkel wrote:

In D71241#1778134, @ABataev wrote:

...

Also, check how -ast-print works with your solution. It returns a different result than expected because you transform the code too early. This is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Alexey, again, this kind of comment is not appropriate. We're all experienced developers here, and we all understand the importance of tooling support in Clang. We also serve developers who write tools using AST matchers and other Clang analysis facilities. Having the resolved callee represented in the AST for what looks like a static call from the base-language perspective makes a lot of sense from a tooling perspective. When performing static analysis on the code, forcing a tool to understand how OpenMP variant selectors work in order to perform inter-procedural static analysis is suboptimal in nearly all cases. It is also true that we might want the base callee represented in some way, but as that callee is never actually called, and one of the variants is always called at that call site, it is important that IPA propagate information into and out of the correct callee in order to produce the correct results. If we currently represent the base callee as the callee that will appear in the call graph, that's a bug: Clang's static analyzer will produce incorrect results.

If you know of specific tools that indeed depend on the current behavior to produce correct results, please provide us with details on what they're doing so that we understand the use cases. Regardless, we should prioritize correct-by-default functioning of AST-based call graphs and their associated static analysis.

This is a significant issue: you're modifying the original user code.
It may also cause trouble with serialization/deserialization. Plus, I just don't understand why rewriting the code for the user is good but keeping the original code is bad.

There may be another issue with this solution involving inline assembly. I’m not completely sure about it; I will try to investigate it tomorrow. I suggest discussing this solution with Richard Smith (or John McCall). If they are OK with this transformation of the AST, we can switch to this scheme.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try changing the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and so that users wonder less why something other than base is called.
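
The host-side behavior described above resembles the GNU `alias` function attribute. The following is a minimal sketch of that attribute, assuming an ELF target and a GNU-compatible compiler; it is not the code Clang emits for `declare variant`. The names `base` and `hst` follow the thread, while `same_address` and `check_forwarding` are hypothetical helpers. On non-ELF targets the sketch falls back to a plain forwarding wrapper, in which case the two addresses differ.

```cpp
#include <cassert>

// Variant function; `hst` and `base` follow the names used in the thread.
extern "C" int hst() { return 42; }

#if defined(__ELF__)
// Assumption: an ELF target and a GNU-compatible compiler. `base` becomes a
// second symbol name for the same code as `hst`.
extern "C" int base() __attribute__((alias("hst")));
#else
// Fallback for targets without alias support: a plain forwarding wrapper.
extern "C" int base() { return hst(); }
#endif

// Compares the two addresses at run time; the volatile pointers keep the
// compiler from folding the comparison at compile time.
bool same_address() {
  int (*volatile pb)() = base;
  int (*volatile ph)() = hst;
  return pb == ph;
}

// Calling through either name gives the same result.
void check_forwarding() { assert(base() == hst()); }
```

With the alias in place, both names refer to one function body, which is why taking the address of either name is expected to give the same pointer on the host.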

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try changing the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and so that users wonder less why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try changing the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and so that users wonder less why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device, base just inherits the body of the variant function, but this variant function can also be emitted if it is used independently.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.
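
The tag-dispatching analogy above can be made concrete with a short sketch. The context types `cpu_context` and `gpu_context` and the functions `compute_impl`, `compute`, and `demo` are hypothetical stand-ins chosen for illustration; they only show a callee being chosen by overload resolution at compile time, before any IR is generated.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for a context trait such as "device kind is cpu".
struct cpu_context : std::true_type {};
struct gpu_context : std::false_type {};

// Two implementations, selected by the type of the tag argument.
int compute_impl(std::true_type) { return 1; }  // "cpu" variant
int compute_impl(std::false_type) { return 2; } // fallback

// Overload resolution picks the implementation while the call is analyzed,
// so the chosen callee is already fixed in the AST.
template <typename Context> int compute() {
  return compute_impl(typename Context::type{});
}

// Small demonstration of both selections.
int demo() {
  assert(compute<cpu_context>() == 1);
  return compute<gpu_context>();
}
```

Here the choice between the two `compute_impl` overloads is made by the front end, which is the sense in which the comment above says prior art leans towards Sema.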

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

It is not quite so. Constexprs are not evaluated in sema. You can dump the ast and you will find all these constexprs in their original form. They are evaluated by the interpreter in place where it is required. But AST remains unaffected.

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try changing the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and so that users wonder less why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device, base just inherits the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.
The device function now exists twice, that is bad.
How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).
On the host the expression &base == &hst will evaluate to true with the alias.

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

In D71241#1780715, @jdoerfert wrote:

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try changing the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and so that users wonder less why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device, base just inherits the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that they are in the context of the originally called function base. This is the same behavior as that of GCC's alias attribute.

The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem, given that most of the function calls will be inlined and the function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe the result will be the same with your solution, because you will just replace base with hst during overload resolution.

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.
I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -c -fopenmp repro.cpp

In D71241#1781955, @ABataev wrote:

In D71241#1780715, @jdoerfert wrote:

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and to make users wonder less about why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that they are in the context of the originally called function base. This is the same behavior as that of the GNU GCC alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine-grained. That is also why the current code cannot handle the construct trait but an overload solution can.
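The distinction drawn here can be sketched in plain C++. This is an editorial illustration, not code from the patch: if the variant is substituted only at call sites, base and hst stay two distinct functions, so their addresses still differ, whereas an alias makes both names refer to one symbol. The flag `kHostVariantMatches` and the helpers `call_base` and `address_of_base` are hypothetical stand-ins for what overload resolution would decide at each use.

```cpp
// Hypothetical model of call-site-only variant selection.
int base() { return 0; }  // the function the user names in the call
int hst() { return 1; }   // the host variant

// Stand-in for the context match the compiler would evaluate.
constexpr bool kHostVariantMatches = true;

// Models one call of base(): only the callee is redirected.
int call_base() { return kHostVariantMatches ? hst() : base(); }

// Taking the address is not a call, so it still refers to base itself.
using Fn = int (*)();
Fn address_of_base() { return &base; }
```

Under these assumptions a call is dispatched to hst while `&base` keeps pointing at the original function, which is the finer granularity that an alias cannot express.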

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.
I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype
Try to compile as C++:
clang++ -c -fopenmp repro.cpp

You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

In D71241#1782157, @jdoerfert wrote:

In D71241#1781955, @ABataev wrote:

In D71241#1780715, @jdoerfert wrote:

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and to make users wonder less about why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that they are in the context of the originally called function base. This is the same behavior as that of the GNU GCC alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine-grained. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.
I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype
Try to compile as C++:
clang++ -c -fopenmp repro.cpp
You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.

In D71241#1782173, @ABataev wrote:

In D71241#1782157, @jdoerfert wrote:

In D71241#1781955, @ABataev wrote:

In D71241#1780715, @jdoerfert wrote:

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and to make users wonder less about why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that they are in the context of the originally called function base. This is the same behavior as that of the GNU GCC alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

Long story short, code generation (and debugging) is different in the existing approach as a workaround.

The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the existing approach as a workaround.

How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

The problem exists for C++ with short vs int as well: https://godbolt.org/z/wYB_pX

On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine-grained. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

The current approach *cannot* be used to implement construct without adding a *third* way to handle declare variants. So we have aliases (no 1), if supported by the architecture and if the trait is not construct. We have "function hijacking" (no 2), if aliases are not supported and if the trait is not construct. We will need something else if the trait is construct (no 3). The alternative is to handle *all of it* during overload resolution. If you still believe it is better to modify the IR in various ways *after* we created it, I'll move this discussion on the openmp-dev list to unblock it.
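The three cases listed above can be written down as a small decision function. This is only an editorial sketch of the argument, not code from clang: the enum `Mechanism` and the function `existing_approach` are hypothetical names, and "Other" stands for the not yet designed third mechanism.

```cpp
// Hypothetical encoding of the three handling paths described above.
enum class Mechanism {
  Alias,            // no 1: base is emitted as an alias of the variant
  BodyReplacement,  // no 2: "function hijacking", base gets the variant body
  Other             // no 3: something else, needed for the construct trait
};

// Which mechanism the existing approach would need for a given case.
constexpr Mechanism existing_approach(bool target_has_aliases,
                                      bool construct_trait) {
  return construct_trait      ? Mechanism::Other
         : target_has_aliases ? Mechanism::Alias
                              : Mechanism::BodyReplacement;
}

static_assert(existing_approach(true, false) == Mechanism::Alias,
              "alias when the target supports it");
static_assert(existing_approach(false, false) == Mechanism::BodyReplacement,
              "body replacement when aliases are unavailable");
```

The alternative argued for in this comment would collapse all three branches into a single decision made during overload resolution.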

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.
I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype
Try to compile as C++:
clang++ -c -fopenmp repro.cpp
You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?
Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.

I compiled it as cpp with both approaches just fine (ignoring the math errors due to the wrong order of things because I had the begin/end declare variant patch build as well).
No asm error on my side, did you run it or just assume it wouldn't work?
In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

In D71241#1782317, @jdoerfert wrote:

In D71241#1782173, @ABataev wrote:

In D71241#1782157, @jdoerfert wrote:

In D71241#1781955, @ABataev wrote:

In D71241#1780715, @jdoerfert wrote:

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and to make users wonder less about why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that they are in the context of the originally called function base. This is the same behavior as that of the GNU GCC alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

Long story short, code generation (and debugging) is different in the existing approach as a workaround.

The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the existing approach as a workaround.

How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

The problem exists for C++ with short vs int as well: https://godbolt.org/z/wYB_pX

It is a different problem, with a compatibility check in C++. I'll investigate it and fix it. I mean, it is better to improve the error message, the error itself is emitted correctly because in C++ types are compatible only if they are equal (i.e. the type is the same). Strict type checking.
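The strict type identity mentioned here can be checked directly in C++. This is an editorial sketch, not the code behind the godbolt link: the declarations `base_fn`, `variant_short` and `variant_int` are hypothetical, and the example only shows that function types differing in a parameter type (short vs int) are distinct types.

```cpp
#include <type_traits>

// Hypothetical declarations; they are only inspected, never called.
void base_fn(int);
void variant_short(short);
void variant_int(int);

// void(short) and void(int) are different function types in C++.
static_assert(!std::is_same<decltype(base_fn), decltype(variant_short)>::value,
              "short and int parameters give distinct function types");

// Only an identical signature yields the same function type.
static_assert(std::is_same<decltype(base_fn), decltype(variant_int)>::value,
              "identical signatures give the same function type");
```

Whether the OpenMP notion of a compatible variant should follow this strict identity is the point under discussion in the thread; the sketch only demonstrates the base-language behavior.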

On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine-grained. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

The current approach *cannot* be used to implement construct without adding a *third* way to handle declare variants. So we have aliases (no 1), if supported by the architecture and if the trait is not construct. We have "function hijacking" (no 2), if aliases are not supported and if the trait is not construct. We will need something else if the trait is construct (no 3). The alternative is to handle *all of it* during overload resolution. If you still believe it is better to modify the IR in various ways *after* we created it, I'll move this discussion on the openmp-dev list to unblock it.

Yes, I agree with this. But it is not intended to support construct traits since the very beginning, it is a solution only for global traits (like kind, isa, etc.) For construct the idea is to use the original function but choose this function at the codegen phase. It is much easier, no need to worry about templates and all other stuff, though I don't like it, because the user will see that instead of the base function the variant is called directly.

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.
I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype
Try to compile as C++:
clang++ -c -fopenmp repro.cpp
You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?
Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.
I compiled it as cpp with both approaches just fine (ignoring the math errors due to the wrong order of things because I had the begin/end declare variant patch build as well).
No asm error on my side, did you run it or just assume it wouldn't work?

Yes, I tried it with the original solution and your solution. The original works correctly, your patch leads to an error message about incorrect assembler instruction.

In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

Nope, the function is emitted but as an alias to the function with the correct assembler.

In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

Nope, the function is emitted but as an alias to the function with the correct assembler.

The function is marked as used but the current solution does not emit it. That is plain wrong. Even if it was not marked as used but externally visible you cannot *not emit it*. The alias solution is simply not working.

In D71241#1782317, @jdoerfert wrote:
In D71241#1782173, @ABataev wrote:

In D71241#1782157, @jdoerfert wrote:

In D71241#1781955, @ABataev wrote:

In D71241#1780715, @jdoerfert wrote:

In D71241#1779097, @ABataev wrote:

In D71241#1778963, @jdoerfert wrote:

In D71241#1778736, @ABataev wrote:

In D71241#1778717, @jdoerfert wrote:

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host, base is an alias for the hst function. On the device, base has the body of the dev function because NVPTX does not support function aliases (10+ supports it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep the original function names to improve debugging and to make users wonder less about why something other than base is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variant function is emitted and the base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function can also be emitted if it is used independently.

This is confusing:

Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that they are in the context of the originally called function base. This is the same behavior as that of the GNU GCC alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

Long story short, code generation (and debugging) is different in the existing approach as a workaround.

The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the existing approach as a workaround.

How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

The problem exists for C++ with short vs int as well: https://godbolt.org/z/wYB_pX

On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine-grained. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

The current approach *cannot* be used to implement construct without adding a *third* way to handle declare variants. So we have aliases (no 1), if supported by the architecture and if the trait is not construct. We have "function hijacking" (no 2), if aliases are not supported and if the trait is not construct. We will need something else if the trait is construct (no 3). The alternative is to handle *all of it* during overload resolution. If you still believe it is better to modify the IR in various ways *after* we created it, I'll move this discussion on the openmp-dev list to unblock it.
In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.
I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype
Try to compile as C++:
clang++ -c -fopenmp repro.cpp
You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?
Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.
I compiled it as cpp with both approaches just fine (ignoring the math errors due to the wrong order of things because I had the begin/end declare variant patch built as well).
No asm error on my side. Did you run it or just assume it wouldn't work?
In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

One note:

In D71241#1782365, @jdoerfert wrote:

In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

Nope, the function is emitted but as an alias to the function with the correct assembler.

The function is marked as used but the current solution does not emit it. That is plain wrong. Even if it was not marked as used but externally visible you cannot *not emit it*. The alias solution is simply not working.

Please, add Richard Smith and John McCall as reviewers. Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

In D71241#1782425, @JonChesterfield wrote:

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

In D71241#1782427, @ABataev wrote:

In D71241#1782425, @JonChesterfield wrote:

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

In D71241#1782430, @JonChesterfield wrote:

In D71241#1782427, @ABataev wrote:

In D71241#1782425, @JonChesterfield wrote:

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

In D71241#1782460, @JonChesterfield wrote:
https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source.
That's pretty convincing.

So let's actually look at the AST instead of just talking about it:

We take the asm.cpp example from @ABataev that, as I argued earlier, shows nicely why the alias solution does not work at all once we start thinking about linking things.

Now here is the code with calls to make it actually interesting:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

void foo() {
  cpu();
  wrong_asm();
}

In the current approach the AST of foo looks like this:

`-FunctionDecl 0x563baf0af958 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x563baf0afb38 <col:12, line:11:1>
    |-CallExpr 0x563baf0afa78 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x563baf0afa60 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x563baf0afa40 <col:3> 'void ()' lvalue Function 0x563baf0af458 'cpu' 'void ()'
    `-CallExpr 0x563baf0afb18 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x563baf0afb00 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x563baf0afae0 <col:3> 'void ()' lvalue Function 0x563baf0af668 'wrong_asm' 'void ()'

As you can see, there is no hint of the declare variant machinery that will eventually transform the wrong_asm call into a cpu call.

In the proposed scheme, the AST looks like this:

`-FunctionDecl 0x1e53398 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x1e53580 <col:12, line:11:1>
    |-CallExpr 0x1e534b8 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x1e534a0 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x1e53480 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()'
    `-CallExpr 0x1e53560 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x1e53548 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x1e53520 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()' (Function 0x1e530a8 'wrong_asm' 'void ()')

Here, both the original callee (wrong_asm) and the actual callee (cpu) are shown at the call site.

Why would we not want that?

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.
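As a rough illustration of the tag-dispatching analogy mentioned above, the following sketch resolves the callee through ordinary overload resolution. The tag types, `CurrentContext`, and `compute_impl` are hypothetical names invented for this example; they only mimic a context selector and are not part of OpenMP or Clang.

```cpp
#include <cassert>

// Hypothetical tags standing in for OpenMP context traits.
struct HostTag {};
struct DeviceTag {};

// Illustrative assumption: this translation unit is compiled for the host.
using CurrentContext = HostTag;

int compute_impl(HostTag) { return 10; }    // "variant" selected on the host
int compute_impl(DeviceTag) { return 20; }  // "variant" selected on a device

// The callee is fixed by overload resolution, before any IR exists, so
// every consumer of the AST sees the function that is really called.
inline int compute() { return compute_impl(CurrentContext{}); }

// Small self-check: the host overload is the one that gets picked.
inline void demo_tag_dispatch() { assert(compute() == 10); }
```

Switching `CurrentContext` to `DeviceTag` changes the selected overload without touching any call site, which is the property the analogy relies on.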

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).
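The subclass idea can be sketched with toy types. These are deliberately NOT Clang's real `CallExpr` or `FunctionDecl`; every type and the `navigate` helper below are assumptions made up to show the shape of the design: the resolved callee lives in the common node, and only variant calls pay for the extra pointer to the base function.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins; these are not Clang's classes.
struct FunctionDecl {
  std::string Name;
};

struct CallExpr {
  const FunctionDecl *Callee;  // the function that will actually be called
  explicit CallExpr(const FunctionDecl *C) : Callee(C) {}
};

// Hypothetical subclass that also remembers what the user wrote.
struct VariantCallExpr : CallExpr {
  const FunctionDecl *Base;  // the base function named in the source
  VariantCallExpr(const FunctionDecl *Variant, const FunctionDecl *B)
      : CallExpr(Variant), Base(B) {}
};

// Ordinary calls do not grow; only variant calls carry the base pointer.
static_assert(sizeof(CallExpr) < sizeof(VariantCallExpr),
              "only variant calls carry the base pointer");

// A tool can choose which of the two functions to show to the user.
inline const std::string &navigate(const VariantCallExpr &E, bool WantBase) {
  return WantBase ? E.Base->Name : E.Callee->Name;
}

// Small self-check mirroring the cpu / wrong_asm example from this thread.
inline void demo_variant_call() {
  FunctionDecl Cpu{"cpu"};
  FunctionDecl WrongAsm{"wrong_asm"};
  VariantCallExpr Call(&Cpu, &WrongAsm);
  assert(navigate(Call, /*WantBase=*/false) == "cpu");
  assert(navigate(Call, /*WantBase=*/true) == "wrong_asm");
}
```

With both pointers recorded, an AST printer could spell the base name while code generation and analyses follow the resolved callee.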

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

In D71241#1782460, @JonChesterfield wrote:
https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source.
That's pretty convincing.

No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.
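To make the CXXForRangeStmt example concrete, the sketch below puts the source form next to roughly what the language defines it to mean. The function names and the local names `Range`, `Begin`, and `End` are illustrative assumptions; the AST node keeps the first form and attaches the declarations of the second, rather than storing a fully desugared loop.

```cpp
#include <cassert>
#include <vector>

// Source form: what the user writes.
inline int sum_range_for(const std::vector<int> &V) {
  int Sum = 0;
  for (int X : V)
    Sum += X;
  return Sum;
}

// Semantic form: approximately the expansion the standard specifies.
inline int sum_desugared(const std::vector<int> &V) {
  int Sum = 0;
  auto &&Range = V;            // the range declaration
  auto Begin = Range.begin();  // the iterator declarations
  auto End = Range.end();
  for (; Begin != End; ++Begin) {
    int X = *Begin;            // the loop variable
    Sum += X;
  }
  return Sum;
}

// Small self-check: both forms compute the same result.
inline void demo_range_for() {
  const std::vector<int> V{1, 2, 3};
  assert(sum_range_for(V) == sum_desugared(V));
}
```

Both functions behave identically; the difference matters only to tools that need to know which form the programmer actually wrote.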

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

In D71241#1779779, @ABataev wrote:
Here is the example that does not work with the proposed solution but works with the existing one:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}
The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand what this example represents. Unused static functions are generally not emitted. In general, if we perform overload resolution in Sema and, thus, only use the variants that are selected, then others that are static won't even be emitted. Here you're forcing the wrong_asm function to be used, but if it's used on all devices (host and target), then it's used, and we need to deal with the inline asm regardless.

In D71241#1782504, @jdoerfert wrote:
In D71241#1782460, @JonChesterfield wrote:
https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source.
That's pretty convincing.
So let's actually look at the AST instead of just talking about it:

We take the asm.cpp example from @ABataev that, as I argued earlier, shows nicely why the alias solution does not work at all once we start thinking about linking things.

Now here is the code with calls to make it actually interesting:
static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

void foo() {
  cpu();
  wrong_asm();
}
In the current approach the AST of foo looks like this:
`-FunctionDecl 0x563baf0af958 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x563baf0afb38 <col:12, line:11:1>
    |-CallExpr 0x563baf0afa78 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x563baf0afa60 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x563baf0afa40 <col:3> 'void ()' lvalue Function 0x563baf0af458 'cpu' 'void ()'
    `-CallExpr 0x563baf0afb18 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x563baf0afb00 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x563baf0afae0 <col:3> 'void ()' lvalue Function 0x563baf0af668 'wrong_asm' 'void ()'
As you can see, there is no hint of the declare variant machinery that will eventually transform the wrong_asm call into a cpu call.

In the proposed scheme, the AST looks like this:
`-FunctionDecl 0x1e53398 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x1e53580 <col:12, line:11:1>
    |-CallExpr 0x1e534b8 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x1e534a0 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x1e53480 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()'
    `-CallExpr 0x1e53560 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x1e53548 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x1e53520 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()' (Function 0x1e530a8 'wrong_asm' 'void ()')
Here, both the original callee (wrong_asm) and the actual callee (cpu) are shown at the call site.

Why would we not want that?

You have the wrong idea about the AST representation. If something is not printed in the dump, it does not mean it does not exist in the AST.

In D71241#1782551, @hfinkel wrote:

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

In D71241#1782614, @ABataev wrote:

In D71241#1782551, @hfinkel wrote:

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

In D71241#1778671, @ABataev wrote:

There may be another issue with this solution involving inline assembly. I'm not completely sure about it and will try to investigate it tomorrow. I suggest discussing this solution with Richard Smith (or John McCall). If he/they are ok with this transformation of the AST, we can switch to this scheme.

@jdoerfert, please do add Richard and John to this thread. We should be kind to them, however, so please write a summary of the language feature including some examples showing usage, and please also summarize the current implementation strategy and the one being proposed, so that they don't need to read the OpenMP spec to figure out what the discussion is about.

In D71241#1782614, @ABataev wrote:

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

And that is a good thing. Even if you argue it is not, *only* in the Sema solution do the tools have *all* the information available to redirect to the base or variant function.

In D71241#1782612, @ABataev wrote:

In D71241#1782504, @jdoerfert wrote:

Here, both the original callee (wrong_asm) and the actual callee (cpu) are shown at the call site.

Why would we not want that?

You have the wrong idea about the AST representation. If something is not printed in the dump, it does not mean it does not exist in the AST.

This is plain insulting (again) and beside the point. I just showed you that the proposed solution has *all* the information in the AST available to be used by tools, codegen, ... I did so because you claimed that would not be the case, e.g. the AST would not represent the program faithfully. As you see, all the original information is available. However, you still refuse to acknowledge that and instead try to discredit me. I am tired of this kind of "discussion"; we went down this road before and, as it was back then, there is nothing to be gained. It is harmful for the community and it is insulting towards me.

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).
Take https://godbolt.org/z/Yi9Lht where the wrong function is called on the host (no, there is *no* alias hidden).
Take https://godbolt.org/z/2evvtN which shows that the alias solution is incompatible with linking.
Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

In D71241#1782648, @hfinkel wrote:

In D71241#1782614, @ABataev wrote:

In D71241#1782551, @hfinkel wrote:

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

In D71241#1782430, @JonChesterfield wrote:

In D71241#1782427, @ABataev wrote:

In D71241#1782425, @JonChesterfield wrote:

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

I'd like to add that what we're talking about is none of these things. We're not talking about "mutating" the AST at all. Neither are we inlining. We're talking about performing callee resolution when building the AST in the first place. This is exactly what we do in all other places where we do overload resolution.

This is different from other places where we perform overload resolution only in that the callee won't have the same name as the identifier used in the call expression. But that's okay - those are the semantics of the calls with OpenMP variants. You type one name, and the function that ends up being called has another name. But it's all static and part of the specified language semantics. Should we record the original "base" function? Yes. Should we represent it as the callee? No.
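To make the "record the base, but represent the variant as the callee" idea concrete, here is a minimal toy sketch. It is not Clang's AST: `ToyFunctionDecl`, `ToyCallExpr`, `ToyVariantCallExpr`, and both navigate helpers are hypothetical names invented for illustration, loosely following the subclass suggestion made earlier in the thread.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; these are NOT Clang's real classes.
struct ToyFunctionDecl {
  std::string name;
};

// An ordinary call: the callee is whatever resolution selected.
struct ToyCallExpr {
  const ToyFunctionDecl *callee;
  explicit ToyCallExpr(const ToyFunctionDecl *c) : callee(c) {
    assert(c && "a call must have a callee");
  }
  virtual ~ToyCallExpr() = default;
  const ToyFunctionDecl *getCallee() const { return callee; }
};

// Hypothetical subclass for variant calls: it additionally remembers the
// base function that was spelled in the source.
struct ToyVariantCallExpr : ToyCallExpr {
  const ToyFunctionDecl *base;
  ToyVariantCallExpr(const ToyFunctionDecl *resolved,
                     const ToyFunctionDecl *writtenBase)
      : ToyCallExpr(resolved), base(writtenBase) {}
  const ToyFunctionDecl *getBaseCallee() const { return base; }
};

// A generic tool only asks for the callee and gets the function that runs.
std::string genericNavigate(const ToyCallExpr &call) {
  return call.getCallee()->name;
}

// A variant-aware tool can opt in to the name the user actually wrote.
std::string variantAwareNavigate(const ToyCallExpr &call) {
  if (auto *v = dynamic_cast<const ToyVariantCallExpr *>(&call))
    return v->getBaseCallee()->name;
  return call.getCallee()->name;
}
```

In this sketch the extra pointer lives only in the subclass, so ordinary calls do not grow, which is the trade-off mentioned above for an uncommon case.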

In D71241#1782650, @jdoerfert wrote:

In D71241#1782614, @ABataev wrote:

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

And that is a good thing. Even if you argue it is not, *only* in the Sema solution do the tools have *all* the information available to redirect to the base or variant function.

In D71241#1782612, @ABataev wrote:

In D71241#1782504, @jdoerfert wrote:

Here, both the original callee (wrong_ast) and the actual callee cpu are shown at the call site.

Why would we not want that?

You have the wrong idea about AST representation. If something is not printed in the dump, it does not mean it does not exist in the AST.

This is plain insulting (again) and beside the point. I just showed you that the proposed solution has *all* the information in the AST available to be used by tools, codegen, ... I did so because you claimed that would not be the case, e.g. that the AST would not represent the program faithfully. As you see, all the original information is available. However, you still refuse to acknowledge that and instead try to discredit me. I am tired of this kind of "discussion"; we went down this road before and, as it was back then, there is nothing to be gained. It is harmful for the community and it is insulting towards me.

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

Take https://godbolt.org/z/Yi9Lht where the wrong function is called on the host (no, there is *no* hidden alias)

GlobalAlias can be emitted only for definitions. No definition for variant - no aliasing.

Take https://godbolt.org/z/2evvtN which shows that the alias solution is incompatible with linking.

Undefined behavior according to the standard.

Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

In D71241#1782652, @ABataev wrote:

In D71241#1782648, @hfinkel wrote:

In D71241#1782614, @ABataev wrote:

In D71241#1782551, @hfinkel wrote:

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles" seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. are all highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations; where we do those at all (e.g., constant folding, some devirtualization, and so on), they are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given that we have two implementations, each at a different point in the pipeline, it might be constructive for each of you to write down why you chose that point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

In D71241#1782668, @ABataev wrote:

...

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

Are the variants not permitted to be external functions?

In D71241#1782700, @hfinkel wrote:

In D71241#1782668, @ABataev wrote:

...

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

Are the variants not permitted to be external functions?

Allowed, of course. But the alias/body will be emitted only if the variant function is defined. Everything else is going to be resolved by the linker.

In D71241#1782670, @hfinkel wrote:

In D71241#1782652, @ABataev wrote:

In D71241#1782648, @hfinkel wrote:

In D71241#1782614, @ABataev wrote:

In D71241#1782551, @hfinkel wrote:

In D71241#1779168, @JonChesterfield wrote:

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles" seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. are all highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations; where we do those at all (e.g., constant folding, some devirtualization, and so on), they are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given that we have two implementations, each at a different point in the pipeline, it might be constructive for each of you to write down why you chose that point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote the wrong context matcher. You have a bug in the base function, which should be called by default everywhere but the host, and want to fix it. Etc.

I don't insist on the function redefinition solution. You want to replace functions - fine, but do this in codegen, not in the AST.

In D71241#1782703, @ABataev wrote:

...

Given that we have two implementations, each at a different point in the pipeline, it might be constructive for each of you to write down why you chose that point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote the wrong context matcher. You have a bug in the base function, which should be called by default everywhere but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

In D71241#1782742, @hfinkel wrote:

In D71241#1782703, @ABataev wrote:

...

Given that we have two implementations, each at a different point in the pipeline, it might be constructive for each of you to write down why you chose that point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote the wrong context matcher. You have a bug in the base function, which should be called by default everywhere but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry about the right decl. See D7097. If you see a reference to a function, before doing something with it, just call the member function getOpenMPDeclareVariantFunction() and you get the function that must actually be called here. The tool must do the same. That's how the tools work. They must be aware of special constructs/attributes.

In D71241#1782723, @ABataev wrote:

I don't insist on the function redefinition solution. You want to replace functions - fine, but do this in codegen, not in the AST.

Again, no one is replacing anything, and we're not mutating the AST. We're simply resolving the callee according to the language rules. That's something that should be done during AST construction.

It's like if I have this code:

template <int x>
int foo() { return 0; }

template <>
int foo<8>() { return 1; }

int main() {
  return foo<8>();
}

and you said that, in the AST, it should look like the unspecialized foo was being called, and that later, in CodeGen, something would happen to cause the correct specialization to be called. That clearly would not be considered an acceptable design.
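That the choice of specialization is fixed during semantic analysis, not later, can be checked with a small variation of the foo example. This is only a sketch: the `constexpr` qualifiers and the helper `call_specialized` are additions for illustration and are not part of the example as originally posted.

```cpp
#include <cassert>

// Same shape as the foo example, but constexpr so the selected callee can be
// observed at compile time.
template <int X>
constexpr int foo() { return 0; }

template <>
constexpr int foo<8>() { return 1; }

// Both checks are evaluated before any code is generated: the call foo<8>()
// already refers to the explicit specialization at this point.
static_assert(foo<8>() == 1, "explicit specialization is selected");
static_assert(foo<3>() == 0, "primary template is used otherwise");

// Hypothetical helper that repeats the same fact at run time.
int call_specialized() {
  int result = foo<8>();
  assert(result == 1);
  return result;
}
```

If the callee were only swapped in during code generation, the `static_assert` on `foo<8>()` could not hold, since it is evaluated without emitting any code.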

In D71241#1782779, @ABataev wrote:

In D71241#1782742, @hfinkel wrote:

In D71241#1782703, @ABataev wrote:

...

Given that we have two implementations, each at a different point in the pipeline, it might be constructive for each of you to write down why you chose that point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote the wrong context matcher. You have a bug in the base function, which should be called by default everywhere but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry about the right decl. See D7097. If you see a reference to a function, before doing something with it, just call the member function getOpenMPDeclareVariantFunction() and you get the function that must actually be called here. The tool must do the same. That's how the tools work. They must be aware of special constructs/attributes.

But then we'd need to add code to do that in Clang's static analyzer and in all other code trying to match caller/callee relationships, because the function provided in the AST would not be the function that will actually be called. Instead, we should make these tools correct by default and make the tools that want to do OpenMP-specific things be the ones that need to call the OpenMP-specific functions.
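The "correct by default" argument can be sketched with a toy call-graph builder. Nothing here is Clang API: `CallSite`, `VariantMap`, `buildCallGraph`, and `buildCallGraphWithHook` are hypothetical names, and the variant table stands in for whatever lookup a tool would otherwise have to perform itself.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model, not Clang: a call site is (caller, name stored as the callee).
using CallSite = std::pair<std::string, std::string>;

// Hypothetical table: base function name -> variant selected in this context.
using VariantMap = std::map<std::string, std::string>;

// A generic tool: it trusts whatever callee the call site stores.
std::vector<CallSite> buildCallGraph(const std::vector<CallSite> &sites) {
  return sites;
}

// What every tool would have to do if call sites stored the base function:
// remember to map each callee through the variant table first.
std::vector<CallSite> buildCallGraphWithHook(const std::vector<CallSite> &sites,
                                             const VariantMap &variants) {
  std::vector<CallSite> edges;
  for (const auto &site : sites) {
    assert(!site.second.empty() && "call site must name a callee");
    auto it = variants.find(site.second);
    edges.emplace_back(site.first,
                       it == variants.end() ? site.second : it->second);
  }
  return edges;
}
```

With call sites that already store the resolved callee, `buildCallGraph` alone yields the edge to `hst`; with call sites that store `base`, only the tool that remembers the extra lookup gets that edge.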

In D71241#1782812, @hfinkel wrote:

In D71241#1782779, @ABataev wrote:

In D71241#1782742, @hfinkel wrote:

In D71241#1782703, @ABataev wrote:

...

Given that we have two implementations, each at a different point in the pipeline, it might be constructive for each of you to write down why you chose that point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break the tools, not help them. It will definitely break clangd, for example. The user will try to navigate to the originally called function base and will instead be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navigate to this function. You click on it and, instead of the called base, you are redirected to hst because the AST has the link to the hst function in the expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote the wrong context matcher. You have a bug in the base function, which should be called by default everywhere but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry about the right decl. See D7097. If you see a reference to a function, before doing something with it, just call the member function getOpenMPDeclareVariantFunction() and you get the function that must actually be called here. The tool must do the same. That's how the tools work. They must be aware of special constructs/attributes.

But then we'd have to add code to do that in Clang's static analyzer and all other code trying to match caller/callee relationships, because the function provided in the AST would not be the function that will actually be called. Instead, we should make these tools correct by default and make tools wanting to do OpenMP-specific things be the tools that need to call the OpenMP-specific functions.

In general, yes, but not necessarily. We can teach existing functions like getBody(), isDefined() etc. about this feature and thus tools will get the correct function automatically.
But I suggest to discuss this with Richard Smith.

In D71241#1782846, @ABataev wrote:

But I suggest to discuss this with Richard Smith.

Is the appeal to authority necessary to resolve this? The last few posts by Hal look clear to me. Especially convincing is:

We're simply resolving the callee according to the language rules.

In D71241#1782898, @JonChesterfield wrote:

In D71241#1782846, @ABataev wrote:

But I suggest to discuss this with Richard Smith.

Is the appeal to authority necessary to resolve this?

Yes, necessary.

The last few posts by Hal look clear to me. Especially convincing is:

We're simply resolving the callee according to the language rules.

In D71241#1782668, @ABataev wrote:

In D71241#1782650, @jdoerfert wrote:

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

This is perfectly valid code and with the current scheme impossible to support.

Take https://godbolt.org/z/Yi9Lht where the wrong function is called on the host (no, there is *no* alias hidden).

GlobalAlias can be emitted only for definitions. No definition for variant - no aliasing.

Exactly, as above, this is a problem.

Take https://godbolt.org/z/2evvtN which shows that the alias solution is incompatible with linking.

Undefined behavior according to the standard.

I don't think so. If you do, please reference the rules this would violate.

Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

This can also not be implemented in the alias scheme.

In D71241#1782917, @ABataev wrote:

In D71241#1782898, @JonChesterfield wrote:

In D71241#1782846, @ABataev wrote:

But I suggest to discuss this with Richard Smith.

Is the appeal to authority necessary to resolve this?

Yes, necessary.

http://lists.llvm.org/pipermail/cfe-dev/2019-December/064101.html

In D71241#1782963, @jdoerfert wrote:

In D71241#1782668, @ABataev wrote:

In D71241#1782650, @jdoerfert wrote:

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

This is perfectly valid code and with the current scheme impossible to support.

Take https://godbolt.org/z/Yi9Lht where the wrong function is called on the host (no, there is *no* alias hidden).

GlobalAlias can be emitted only for definitions. No definition for variant - no aliasing.

Exactly, as above, this is a problem.

Take https://godbolt.org/z/2evvtN which shows that the alias solution is incompatible with linking.

Undefined behavior according to the standard.

I don't think so. If you do, please reference the rules this would violate.

Page 59, 25-27.

Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

This can also not be implemented in the alias scheme.

In D71241#1782586, @hfinkel wrote:
In D71241#1782460, @JonChesterfield wrote:
https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.
That's pretty convincing.
No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

You are not correct. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in the AST, though we could easily lower it to the real type. This is the design of the AST: keep it as close as possible to the original code and modify it only when absolutely necessary (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).

Constexprs are not lowered in AST. They are lowered in place where it is required. constexpr is just evaluated. It can be evaluated in Sema, if required, or in codegen, in the analysis tool. Check https://godbolt.org/z/wr9RFk as an example. You will see, the constexprs are not evaluated in AST, instead AST tries to do everything to keep them in their original form.

In D71241#1783444, @ABataev wrote:
In D71241#1782586, @hfinkel wrote:
In D71241#1782460, @JonChesterfield wrote:
https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.
That's pretty convincing.
No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.
You are not correct. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in the AST, though we could easily lower it to the real type. This is the design of the AST: keep it as close as possible to the original code and modify it only when absolutely necessary (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).

Yes, but our decltype representation is a semantically-accurate representation of the source constructs. Your CodeGen approach to variant selection leads to a semantically-inaccurate representation: it produces a CallExpr that has a DeclRefExpr to the wrong function declaration.

In any case, our decltype representation is fairly analogous to what I proposed above. DecltypeType derives from Type, and stores both the original expression and the underlying type. If we have an OpenMPVariantCallExpr, it would also store a reference to the base function definition, but just as desugar() returns the "resolved" regular type, OpenMPVariantCallExpr's getCallee() would return the resolved function that will be called.

Constexprs are not lowered in AST. They are lowered in place where it is required. constexpr is just evaluated. It can be evaluated in Sema, if required, or in codegen, in the analysis tool. Check https://godbolt.org/z/wr9RFk as an example. You will see, the constexprs are not evaluated in AST, instead AST tries to do everything to keep them in their original form.

I'm aware of how this works. If we see a need for a lazy evaluation strategy here we can certainly discuss that. And I agree that we should not drop all information about the base function. I'm simply saying that's not what getCallee() should return. However, what you're doing here is not analogous to the evaluation of constexprs. In short, keeping close to the original source does not justify misrepresenting the semantics.
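
As a rough illustration of the representation proposed in this comment, here is a minimal sketch. It is not Clang code: ToyFunctionDecl, ToyVariantCallExpr, and getBaseFunction() are hypothetical names invented for the example. Only the idea that getCallee() returns the resolved function while the base function stays reachable is taken from the discussion.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a function declaration; not Clang's FunctionDecl.
struct ToyFunctionDecl {
  std::string Name;
};

// Hypothetical call node that records both declarations, loosely analogous to
// a decltype type keeping the written expression and the underlying type.
class ToyVariantCallExpr {
  const ToyFunctionDecl *Base;     // function named in the source
  const ToyFunctionDecl *Resolved; // variant selected for this context

public:
  ToyVariantCallExpr(const ToyFunctionDecl *Base,
                     const ToyFunctionDecl *Resolved)
      : Base(Base), Resolved(Resolved) {
    assert(Base && Resolved && "both declarations are required");
  }

  // Generic tools asking for the callee get the function that really runs.
  const ToyFunctionDecl *getCallee() const { return Resolved; }

  // OpenMP-aware tools can still recover what the user wrote.
  const ToyFunctionDecl *getBaseFunction() const { return Base; }
};
```

In this model a tool that only knows getCallee() sees hst for a call written as base(), while a tool that cares about the spelling asks for the base function explicitly.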

In D71241#1783499, @hfinkel wrote:
In D71241#1783444, @ABataev wrote:
In D71241#1782586, @hfinkel wrote:
In D71241#1782460, @JonChesterfield wrote:
https://clang.llvm.org/docs/InternalsManual.html#the-ast-library
Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.
That's pretty convincing.
No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.
You are not correct. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in the AST, though we could easily lower it to the real type. This is the design of the AST: keep it as close as possible to the original code and modify it only when absolutely necessary (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).
Yes, but our decltype representation is a semantically-accurate representation of the source constructs. Your CodeGen approach to variant selection leads to a semantically-inaccurate representation: it produces a CallExpr that has a DeclRefExpr to the wrong function declaration.

In any case, our decltype representation is fairly analogous to what I proposed above. DecltypeType derives from Type, and stores both the original expression and the underlying type. If we have an OpenMPVariantCallExpr, it would also store a reference to the base function definition, but just as desugar() returns the "resolved" regular type, OpenMPVariantCallExpr's getCallee() would return the resolved function that will be called.

But we don't need it. We have all the required information in the AST. Each function has an associated list of attributes with the context traits and the variant function. When you process a CallExpr, just check the list of these attributes and return the address of the variant function instead of the original.
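
The attribute-based lookup described here can be sketched roughly as follows. This is a toy model under stated assumptions: ToyFunction, ToyVariantAttr, ToyDeviceKind, and resolveCallee() are hypothetical names and are not Clang's API, and the context is reduced to a single device kind.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical device kinds; a real context selector has many more traits.
enum class ToyDeviceKind { Host, GPU };

struct ToyFunction;

// Hypothetical model of one declare variant attribute on a base function.
struct ToyVariantAttr {
  ToyDeviceKind Match;        // context in which the variant applies
  const ToyFunction *Variant; // function to call in that context
};

struct ToyFunction {
  std::string Name;
  std::vector<ToyVariantAttr> Variants; // empty for ordinary functions
};

// Every consumer of a call has to run this lookup itself to learn which
// function is really called in the given compilation context.
inline const ToyFunction *resolveCallee(const ToyFunction &Base,
                                        ToyDeviceKind Context) {
  for (const ToyVariantAttr &Attr : Base.Variants) {
    assert(Attr.Variant && "variant attribute without a target function");
    if (Attr.Match == Context)
      return Attr.Variant;
  }
  return &Base; // no matching variant: the base function is called
}
```

With base carrying a host variant hst, resolveCallee(base, ToyDeviceKind::Host) yields hst and any other context yields base. The sketch also shows the cost raised earlier in the thread: each consumer of the call has to remember to perform this lookup.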

Constexprs are not lowered in AST. They are lowered in place where it is required. constexpr is just evaluated. It can be evaluated in Sema, if required, or in codegen, in the analysis tool. Check https://godbolt.org/z/wr9RFk as an example. You will see, the constexprs are not evaluated in AST, instead AST tries to do everything to keep them in their original form.

I'm aware of how this works. If we see a need for a lazy evaluation strategy here we can certainly discuss that. And I agree that we should not drop all information about the base function. I'm simply saying that's not what getCallee() should return. However, what you're doing here is not analogous to the evaluation of constexprs. In short, keeping close to the original source does not justify misrepresenting the semantics.

Let's wait for the answer from Richard Smith.

I don't think it's reasonable to stall progress on optimising openmp indefinitely. Would you accept a time out of a week, after which the majority vote carries it?

Edit: race condition, Hal sent an email while I was writing this.

In D71241#1783362, @ABataev wrote:

Take https://godbolt.org/z/2evvtN which shows that the alias solution is incompatible with linking.

Undefined behavior according to the standard.

I don't think so. If you do, please reference the rules this would violate.

Page 59, 25-27.

OpenMP 5.0, Page 59, 25-27:

• If the function has any declarations, then the declare variant directives for any declarations that have one must be equivalent. If the function definition has a declare variant, it must also be equivalent. Otherwise, the result is unspecified.

In the example (https://godbolt.org/z/2evvtN), all declare variant directives on declarations *that have one* are equivalent, since only one declaration has a declare variant. Since the function definition does not have a declare variant, the second restriction is also fulfilled.

To summarize, this example is broken in the current scheme and valid according to the spec.
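
The reading of the rule above can be checked with a small sketch. It is only an illustration: variantDirectivesEquivalent() and checkLinkingExample() are hypothetical helpers, "equivalent" is approximated by string equality, the directive text is a placeholder, and declarations without a directive are modeled as empty optionals (C++17).

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One entry per declaration (or definition) of the same function: the text
// of its declare variant directive, or std::nullopt if it has none.
using ToyDirectives = std::vector<std::optional<std::string>>;

// Assumption: "equivalent" is approximated by textual equality.
inline bool variantDirectivesEquivalent(const ToyDirectives &Decls) {
  const std::string *First = nullptr;
  for (const std::optional<std::string> &D : Decls) {
    if (!D)
      continue; // the rule only constrains declarations that have a directive
    if (!First)
      First = &*D;
    else if (*D != *First)
      return false;
  }
  return true;
}

// Shape of the example as described in the text: one declaration with a
// directive (placeholder text) and a definition without one.
inline void checkLinkingExample() {
  ToyDirectives Decls;
  Decls.push_back(std::string("variant(hst) match(device={kind(host)})"));
  Decls.push_back(std::nullopt);
  assert(variantDirectivesEquivalent(Decls));
}
```

Under this model the restriction is trivially satisfied when only one declaration carries a directive, which is the argument made above.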

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

For example, can it handle such cases as:
t1.c:

int hst();
#pragma omp declare variant(hst) match(device={kind(host)})
int base();

t2.c:

int main() {
  return base();
}

This is the correct C code, I assume. At least, clang compiles it.

Another problem:
t1.c:

int hst();
#pragma omp declare variant(hst) match(device={kind(host)})
int base();

t2.c:

int base() { return 0; }
int main() {
  return base();
}

According to the standard, this is valid code and the hst function must be called (though it is not correct, since in C/C++ each definition is a declaration and all restrictions applied to the declaration must be applied to the definition too).

Another possible problem might be with the templates:

template <typename T> T base() { return T(); }
int hst() { return 1; }

int main() {
  return base<int>();
}

#pragma omp declare variant(hst) match(device={kind(gpu)})
template<typename T>
T base();

int foo() {
  return base<int>();
}

sstefan1 added a subscriber: sstefan1.Dec 16 2019, 2:07 PM

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about? This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

For example, can it handle such cases as:
t1.c:
int hst();
#pragma omp declare variant(hst) match(device={kind(host)})
int base();
t2.c:
int main() {
  return base();
}
This is the correct C code, I assume. At least, clang compiles it.

Another problem:
t1.c:
int hst();
#pragma omp declare variant(hst) match(device={kind(host)})
int base();
t2.c:
int base() { return 0; }
int main() {
  return base();
}
According to the standard, this is valid code and the hst function must be called (though it is not correct, since in C/C++ each definition is a declaration and all restrictions applied to the declaration must be applied to the definition too).

The second example is valid code and it doesn't matter if the first example is.
The argument I used here https://reviews.llvm.org/D71241#1783711 holds again and the hst function *must not be called* in either case.
The standard is clear on that and that is one of the reasons the alias solution *does not work*.

Another possible problem might be with the templates:

template <typename T> T base() { return T(); }
int hst() { return 1; }

int main() {
  return base<int>();
}

#pragma omp declare variant(hst) match(device={kind(gpu)})
template<typename T>
T base();

int foo() {
  return base<int>();
}

It would be helpful if you explain the problem that might or might not arise. In the above code there is no target so it is unclear if foo would be emitted for the gpu or not (main is not emitted for the gpu for sure). Only when foo is emitted for the gpu, the call would be to hst, otherwise always to base.

ikitayama added a subscriber: ikitayama.Dec 16 2019, 5:53 PM

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

So far, I haven't seen a reason why we would need any new expression. If you think we need one, please explain to me why.

FWIW, the only open question I have is for the OpenMP committee: I'll have to figure out if we really want to restrict the variant selection to calls only. Anyway, we can implement it either way in this scheme.

If I get to it tomorrow, I'll split the patch and update it based on my newest version. I'll also write test cases for all the situations we get wrong right now (https://reviews.llvm.org/D71241#1782650).

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check whether MemberRefExpr can do this too, so that member functions are handled correctly.
And we need to wait for an opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to fix, and I want to be sure that we don't miss anything and that we provide full compatibility with both C and C++.

In D71241#1787571, @ABataev wrote:

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check whether MemberRefExpr can do this too, so that member functions are handled correctly.
And we need to wait for an opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to fix, and I want to be sure that we don't miss anything and that we provide full compatibility with both C and C++.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):

} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Hal, are we going to support something like this?

void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
void wrong_asm() {
  asm ("xxx");
}

Currently, there is an error when we try to emit the assembler output.

In D71241#1787888, @ABataev wrote:

Hal, are we going to support something like this?

I'm not Hal but I will answer anyway.

void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
void wrong_asm() {
  asm ("xxx");
}
Currently, there is an error when we try to emit the assembler output.

While it is unclear to me what you think should happen, an error pointing at "xxx" is to be expected without further information on how this is compiled.

In D71241#1787998, @jdoerfert wrote:
In D71241#1787888, @ABataev wrote:

Hal, are we going to support something like this?

I'm not Hal but I will answer anyway.
void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
void wrong_asm() {
  asm ("xxx");
}
Currently, there is an error when we try to emit the assembler output.
While it is unclear to me what you think should happen, an error pointing at "xxx" is to be expected without further information on how this is compiled.

Should we emit the function wrong_asm or not? If the program never uses it, because only cpu is called, then wrong_asm is not required and must not be emitted.

In D71241#1787652, @hfinkel wrote:
In D71241#1787571, @ABataev wrote:

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
We also need to wait for Richard Smith's opinion on the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to fix, and I want to be sure that we don't miss anything and that we provide full compatibility with both C and C++.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):
} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)
and in SemaExpr.cpp near line 2783, we have:
// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Could you specify what behavior you expect, or what the test cases would look like?

For the record:
OpenMP basically says, if you have a call to a (base)function that has variants with contexts that match at the call site, call the variant with the highest score. The variants are specified by a variant-func-id, which is a base language identifier or C++ template-id. For C++, the variant declaration is identified by *performing the base language lookup rules on the variant-func-id with arguments that correspond to the base function argument types*.
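
As a rough illustration of the rule summarized above, the following sketch picks the matching variant with the highest score and falls back to the base function when no context matches. VariantCandidate and selectCallee are hypothetical names invented for this example, not Clang or OpenMP APIs, and resolving equal scores by taking the first candidate is an assumption made only to keep the sketch short.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one `declare variant` candidate; not a Clang type.
struct VariantCandidate {
  std::string Name;    // the variant-func-id
  bool ContextMatches; // does the match(...) clause hold at the call site?
  int Score;           // score of the context selector
};

// Returns the name of the function a call to `Base` should resolve to:
// the matching variant with the highest score, or `Base` itself if no
// variant matches. Ties keep the first candidate (an assumption).
std::string selectCallee(const std::string &Base,
                         const std::vector<VariantCandidate> &Variants) {
  const VariantCandidate *Best = nullptr;
  for (const VariantCandidate &V : Variants) {
    if (!V.ContextMatches)
      continue; // non-matching contexts never participate
    if (!Best || V.Score > Best->Score)
      Best = &V;
  }
  return Best ? Best->Name : Base;
}
```

The point of the sketch is only that selection is a static, per-call-site decision, which is why it can be folded into ordinary overload resolution.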

In D71241#1788003, @jdoerfert wrote:
In D71241#1787652, @hfinkel wrote:
In D71241#1787571, @ABataev wrote:

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
We also need to wait for Richard Smith's opinion on the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to fix, and I want to be sure that we don't miss anything and that we provide full compatibility with both C and C++.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):
} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)
and in SemaExpr.cpp near line 2783, we have:
// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));
Could you specify what behavior you expect, or what the test cases would look like?

For the record:
OpenMP basically says, if you have a call to a (base)function that has variants with contexts that match at the call site, call the variant with the highest score. The variants are specified by a variant-func-id, which is a base language identifier or C++ template-id. For C++, the variant declaration is identified by *performing the base language lookup rules on the variant-func-id with arguments that correspond to the base function argument types*.

No need to worry about lookup in C++, ADL lookup is implemented already.
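
For readers less familiar with the term, here is a minimal plain C++ sketch of argument-dependent lookup (ADL), the base language rule referred to above. The namespace lib and the names Widget, process and callViaAdl are hypothetical and exist only for this example.

```cpp
#include <cassert>

namespace lib {
struct Widget {};
// A function that lives next to its argument type.
int process(Widget) { return 42; }
} // namespace lib

int callViaAdl() {
  lib::Widget W;
  // Unqualified call: `process` is not declared in the global namespace,
  // but ADL also searches namespace `lib` because the argument has type
  // lib::Widget, so lib::process is found.
  return process(W);
}
```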

In D71241#1788003, @jdoerfert wrote:

In D71241#1787652, @hfinkel wrote:

In D71241#1787571, @ABataev wrote:

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
We also need to wait for Richard Smith's opinion on the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to fix, and I want to be sure that we don't miss anything and that we provide full compatibility with both C and C++.

I've read through some of the relevant parts of the OpenMP 5.0 specification (but not the 5.1 specification), and it looks like this is the same kind of language-specific function resolution that we do in C++: name lookup finds one declaration, which we then statically resolve to a different declaration. As with the C++ case, it seems reasonable and useful to me to represent the statically-selected callee in the AST as the chosen declaration in the DeclRefExpr -- that will be the most useful thing for tooling, static analysis, and so on.

However, that seems to lose information in some cases. Consider this:

void f(int) {}

template<typename T> void g(T) {}

#pragma omp declare variant(f) match(implementation = {vendor(llvm)})
template<> void g(int) {}

void h() { g(0); }

Here, h() calls f(int). The approach in this patch will form a DeclRefExpr whose FoundDecl is the FunctionTemplateDecl g<T>, and whose resolved declaration is f(int), but that has no reference to g<int> (where the declare variant appears). That seems like it could be annoying for some tooling uses to deal with; there's no real way to get back to g<int> without redoing template argument deduction or similar.

One possibility to improve the representation would be to replace the existing NamedDecl* storage for FoundDecls with a PointerUnion<NamedDecl*, OpenMPFoundVariantDecl*>, where an OpenMPFoundVariantDecl is an ASTContext-allocated struct listing the original found declaration and the function with the declare variant pragma.
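
A self-contained sketch of what such a storage could look like, under stated assumptions: NamedDecl here is a stand-in struct rather than Clang's class, the field names OriginalFound and VariantBase are invented, and a small class with two pointer fields stands in for llvm::PointerUnion (which would pack the discriminator into the pointer instead).

```cpp
#include <cassert>

// Stand-in for Clang's declaration class; only identity matters here.
struct NamedDecl {
  const char *Name;
};

// Hypothetical record following the suggestion above: the declaration that
// name lookup found plus the function carrying the declare variant pragma.
struct OpenMPFoundVariantDecl {
  const NamedDecl *OriginalFound; // e.g. the template g<T>
  const NamedDecl *VariantBase;   // e.g. the specialization g<int>
};

// Holds either a plain found declaration or a found-variant record.
class FoundDeclStorage {
  const NamedDecl *Plain = nullptr;
  const OpenMPFoundVariantDecl *Variant = nullptr;

public:
  FoundDeclStorage(const NamedDecl *D) : Plain(D) {}
  FoundDeclStorage(const OpenMPFoundVariantDecl *V) : Variant(V) {}

  // What name lookup found, regardless of which alternative is stored.
  const NamedDecl *getFoundDecl() const {
    return Variant ? Variant->OriginalFound : Plain;
  }

  // The function with the pragma, or nullptr if no variant was involved.
  const NamedDecl *getVariantBase() const {
    return Variant ? Variant->VariantBase : nullptr;
  }
};
```

With this shape, existing callers that only ask for the found declaration keep working, while tooling that cares about the pragma can recover g<int> without redoing template argument deduction.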

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):
} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

This case is only concerned with type templates, so we don't need to worry about it.

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Could you specify what behavior you expect, or what the test cases would look like?

This code is handling a particular case of class member access in C++: if you name a member of class Base via a using-declaration in class Derived, we convert first to class Derived and then to class Base, and this is important for the case where (for example) Base is an ambiguous base class. Consider:

struct A {
    void f() {}
    int n;
};

struct B : A {
#pragma omp declare variant(A::f) match(implementation = {vendor(llvm)})
    void g() {}
};

struct C : A, B {};

void h(C *p) { p->g(); }

What I think should happen here (per the OpenMP rules, applied to this case in the most natural way they seem to be applicable) is that we first convert p to B* (the this type of B::g), and then convert it to A* (the this type of the selected variant) -- just like for the source language call p->A::f(). That's actually exactly what will happen if you make B::g be the found declaration of the member access and A::f be the resolved declaration, as this patch does. So I don't think we need changes there, assuming the case above works with this patch -- it seems to crash in code generation without this patch.
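
The two-step conversion can be shown in plain C++ without OpenMP. This sketch deliberately changes the hierarchy from the example above: a hypothetical class Other provides the second path to an A subobject, so that A is an ambiguous base of C without C naming A as a direct base. The helper names viaB and viaOther are also invented.

```cpp
#include <cassert>

struct A {
  int n = 0;
};
struct Other : A {};    // hypothetical second path to an A subobject
struct B : A {};
struct C : Other, B {}; // A is an ambiguous base of C

// Converting C* straight to A* is ill-formed because A is ambiguous:
//   A *bad = p;  // error: ambiguous conversion from C* to A*
// Going through B first selects the A subobject inside B.
A *viaB(C *p) {
  B *asB = p;  // step 1: convert to the class of the found declaration
  return asB;  // step 2: convert to the class of the resolved declaration
}

// The same two steps through the other path reach a different A subobject.
A *viaOther(C *p) {
  Other *asOther = p;
  return asOther;
}
```

Because the intermediate class decides which A subobject is reached, the conversion has to be formed where both declarations are known.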

This is also, I think, a fairly decisive argument for representing the variant selection in the AST rather than deferring it to CodeGen -- we want to form the implicit conversion of the object parameter to A* in Sema.

Incidentally, if you make A a virtual base class of B, we produce what seems to be an incorrect and confusing diagnostic:

<source>:7:29: error: conversion from pointer to member of class 'A' to pointer to member of class 'B' via virtual base 'A' is not allowed

Finally, to address the question about what AST fidelity means in this case: we certainly want the AST to represent the program as written. But that's not all: we want the AST to also represent the semantics of the program in a reasonably useful form. For a DeclRefExpr, it's more useful to directly point to the statically-chosen declaration than to expect the users of DeclRefExpr to find it for themselves, especially since DeclRefExpr already has a mechanism to track the syntactic form (the found declaration) separately from the semantically selected declaration.
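
The existing split between the declaration that was found and the declaration that was selected can be seen with an ordinary using-declaration. A small sketch in plain C++, with all names invented for illustration: the call D.f() names f through the using-declaration in Derived, but the function that runs is Base::f.

```cpp
#include <cassert>

struct Base {
  int f() { return 1; }
};

struct Derived : Base {
  using Base::f;           // makes Base::f findable as a member of Derived
  int f(int) { return 2; } // would hide Base::f without the using-declaration
};

// The name is found through Derived's using-declaration; Base::f is called.
int callNoArg(Derived &D) { return D.f(); }

// The name is found and resolved within Derived itself.
int callWithArg(Derived &D) { return D.f(0); }
```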

In D71241#1870218, @rsmith wrote:
In D71241#1788003, @jdoerfert wrote:

In D71241#1787652, @hfinkel wrote:

In D71241#1787571, @ABataev wrote:

In D71241#1787265, @hfinkel wrote:

In D71241#1786959, @jdoerfert wrote:

In D71241#1786530, @ABataev wrote:

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
We also need to wait for Richard Smith's opinion on the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to fix, and I want to be sure that we don't miss anything and that we provide full compatibility with both C and C++.

I've read through some of the relevant parts of the OpenMP 5.0 specification (but not the 5.1 specification), and it looks like this is the same kind of language-specific function resolution that we do in C++: name lookup finds one declaration, which we then statically resolve to a different declaration. As with the C++ case, it seems reasonable and useful to me to represent the statically-selected callee in the AST as the chosen declaration in the DeclRefExpr -- that will be the most useful thing for tooling, static analysis, and so on.

However, that seems to lose information in some cases. Consider this:
void f(int) {}

template<typename T> void g(T) {}

#pragma omp declare variant(f) match(implementation = {vendor(llvm)})
template<> void g(int) {}

void h() { g(0); }
Here, h() calls f(int). The approach in this patch will form a DeclRefExpr whose FoundDecl is the FunctionTemplateDecl g<T>, and whose resolved declaration is f(int), but that has no reference to g<int> (where the declare variant appears). That seems like it could be annoying for some tooling uses to deal with; there's no real way to get back to g<int> without redoing template argument deduction or similar.

One possibility to improve the representation would be to replace the existing NamedDecl* storage for FoundDecls with a PointerUnion<NamedDecl*, OpenMPFoundVariantDecl*>, where an OpenMPFoundVariantDecl is an ASTContext-allocated struct listing the original found declaration and the function with the declare variant pragma.

Hi Richard, thanks for your answer. I agree that this is the best option.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):
} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)
This case is only concerned with type templates, so we don't need to worry about it.
and in SemaExpr.cpp near line 2783, we have:
// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));
Could you specify what behavior you expect, or what the test cases would look like?
This code is handling a particular case of class member access in C++: if you name a member of class Base via a using-declaration in class Derived, we convert first to class Derived and then to class Base, and this is important for the case where (for example) Base is an ambiguous base class. Consider:
struct A {
    void f() {}
    int n;
};

struct B : A {
#pragma omp declare variant(A::f) match(implementation = {vendor(llvm)})
    void g() {}
};

struct C : A, B {};

void h(C *p) { p->g(); }
What I think should happen here (per the OpenMP rules, applied to this case in the most natural way they seem to be applicable) is that we first convert p to B* (the this type of B::g), and then convert it to A* (the this type of the selected variant) -- just like for the source language call p->A::f(). That's actually exactly what will happen if you make B::g be the found declaration of the member access and A::f be the resolved declaration, as this patch does. So I don't think we need changes there, assuming the case above works with this patch -- it seems to crash in code generation without this patch.

This is also, I think, a fairly decisive argument for representing the variant selection in the AST rather than deferring it to CodeGen -- we want to form the implicit conversion of the object parameter to A* in Sema.

Incidentally, if you make A a virtual base class of B, we produce what seems to be an incorrect and confusing diagnostic:
<source>:7:29: error: conversion from pointer to member of class 'A' to pointer to member of class 'B' via virtual base 'A' is not allowed
Finally, to address the question about what AST fidelity means in this case: we certainly want the AST to represent the program as written. But that's not all: we want the AST to also represent the semantics of the program in a reasonably useful form. For a DeclRefExpr, it's more useful to directly point to the statically-chosen declaration than to expect the users of DeclRefExpr to find it for themselves, especially since DeclRefExpr already has a mechanism to track the syntactic form (the found declaration) separately from the semantically selected declaration.

tra added a subscriber: tra.Mar 4 2020, 4:22 PM

D75779 is the proper implementation of the OpenMP standard.

Revision Contents

Path                                                                       Size
clang/include/clang/AST/StmtOpenMP.h                                       12 lines
clang/include/clang/Sema/Sema.h                                            6 lines
clang/lib/AST/Expr.cpp                                                     5 lines
clang/lib/AST/StmtOpenMP.cpp                                               263 lines
clang/lib/CodeGen/CGOpenMPRuntime.h                                        14 lines
clang/lib/CodeGen/CGOpenMPRuntime.cpp                                      328 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h                                   12 lines
clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp                                 13 lines
clang/lib/CodeGen/CodeGenModule.cpp                                        14 lines
clang/lib/Sema/                                                            1 line
clang/lib/Sema/                                                            5 lines
clang/lib/Sema/                                                            19 lines
clang/lib/Sema/                                                            18 lines
clang/lib/Sema/SemaTemplateInstantiateDecl.cpp                             91 lines
clang/test/OpenMP/declare_variant_ast_print.cpp                            3 lines
clang/test/OpenMP/declare_variant_device_kind_codegen.cpp                  15 lines
clang/test/OpenMP/declare_variant_implementation_vendor_codegen.cpp        15 lines
clang/test/OpenMP/declare_variant_mixed_codegen.cpp                        15 lines
clang/test/OpenMP/nvptx_declare_variant_device_kind_codegen.cpp            13 lines
clang/test/OpenMP/nvptx_declare_variant_implementation_vendor_codegen.cpp  13 lines

Diff 233011

clang/include/clang/AST/StmtOpenMP.h

...
public:
CreateEmpty(const ASTContext &C, unsigned NumClauses, unsigned CollapsedNum,		CreateEmpty(const ASTContext &C, unsigned NumClauses, unsigned CollapsedNum,
EmptyShell);		EmptyShell);

static bool classof(const Stmt *T) {		static bool classof(const Stmt *T) {
return T->getStmtClass() == OMPTargetTeamsDistributeSimdDirectiveClass;		return T->getStmtClass() == OMPTargetTeamsDistributeSimdDirectiveClass;
}		}
};		};

		class OMPDeclareVariantAttr;

		/// Helper to determine the best of two potential context matches. Note that
		/// nullptr are valid inputs but also valid outputs, e.g., if neither attribute
		/// describes a matching context.
		const OMPDeclareVariantAttr *
		getBetterOpenMPContextMatch(ASTContext &C, const OMPDeclareVariantAttr *LHSAttr,
		const OMPDeclareVariantAttr *RHSAttr);

		/// Return true if the context described by \p A matches.
		bool isOpenMPContextMatch(ASTContext &C, const OMPDeclareVariantAttr *A);

} // end namespace clang		} // end namespace clang

#endif		#endif

clang/include/clang/Sema/Sema.h

...
private:
/// \param Selector selects diagnostic message: 0 for type and 1 for		/// \param Selector selects diagnostic message: 0 for type and 1 for
/// declaration.		/// declaration.
/// \return true if the type or declaration is disabled.		/// \return true if the type or declaration is disabled.
template <typename T, typename DiagLocT, typename DiagInfoT, typename MapT>		template <typename T, typename DiagLocT, typename DiagInfoT, typename MapT>
bool checkOpenCLDisabledTypeOrDecl(T D, DiagLocT DiagLoc, DiagInfoT DiagInfo,		bool checkOpenCLDisabledTypeOrDecl(T D, DiagLocT DiagLoc, DiagInfoT DiagInfo,
MapT &Map, unsigned Selector = 0,		MapT &Map, unsigned Selector = 0,
SourceRange SrcRange = SourceRange());		SourceRange SrcRange = SourceRange());

/// Marks all the functions that might be required for the currently active
/// OpenMP context.
void markOpenMPDeclareVariantFuncsReferenced(SourceLocation Loc,
FunctionDecl *Func,
bool MightBeOdrUse);

public:		public:
/// Struct to store the context selectors info for declare variant directive.		/// Struct to store the context selectors info for declare variant directive.
using OMPCtxStringType = SmallString<8>;		using OMPCtxStringType = SmallString<8>;
using OMPCtxSelectorData =		using OMPCtxSelectorData =
OpenMPCtxSelectorData<SmallVector<OMPCtxStringType, 4>, ExprResult>;		OpenMPCtxSelectorData<SmallVector<OMPCtxStringType, 4>, ExprResult>;

/// Checks if the variant/multiversion functions are compatible.		/// Checks if the variant/multiversion functions are compatible.
bool areMultiversionVariantFunctionsCompatible(		bool areMultiversionVariantFunctionsCompatible(
...

clang/lib/AST/Expr.cpp

...
MemberExpr::MemberExpr(Expr *Base, bool IsArrow, SourceLocation OperatorLoc,
const DeclarationNameInfo &NameInfo, QualType T,		const DeclarationNameInfo &NameInfo, QualType T,
ExprValueKind VK, ExprObjectKind OK,		ExprValueKind VK, ExprObjectKind OK,
NonOdrUseReason NOUR)		NonOdrUseReason NOUR)
: Expr(MemberExprClass, T, VK, OK, Base->isTypeDependent(),		: Expr(MemberExprClass, T, VK, OK, Base->isTypeDependent(),
Base->isValueDependent(), Base->isInstantiationDependent(),		Base->isValueDependent(), Base->isInstantiationDependent(),
Base->containsUnexpandedParameterPack()),		Base->containsUnexpandedParameterPack()),
Base(Base), MemberDecl(MemberDecl), MemberDNLoc(NameInfo.getInfo()),		Base(Base), MemberDecl(MemberDecl), MemberDNLoc(NameInfo.getInfo()),
MemberLoc(NameInfo.getLoc()) {		MemberLoc(NameInfo.getLoc()) {
assert(!NameInfo.getName() ||		// TODO: Add an OMP declare variant attribute on the variant declaration.
MemberDecl->getDeclName() == NameInfo.getName());		// assert(!NameInfo.getName() || MemberDecl->hasAttr<OMPDeclareVariantAttr>()
		// || MemberDecl->getDeclName() == NameInfo.getName());
MemberExprBits.IsArrow = IsArrow;		MemberExprBits.IsArrow = IsArrow;
MemberExprBits.HasQualifierOrFoundDecl = false;		MemberExprBits.HasQualifierOrFoundDecl = false;
MemberExprBits.HasTemplateKWAndArgsInfo = false;		MemberExprBits.HasTemplateKWAndArgsInfo = false;
MemberExprBits.HadMultipleCandidates = false;		MemberExprBits.HadMultipleCandidates = false;
MemberExprBits.NonOdrUseReason = NOUR;		MemberExprBits.NonOdrUseReason = NOUR;
MemberExprBits.OperatorLoc = OperatorLoc;		MemberExprBits.OperatorLoc = OperatorLoc;
}		}

...

clang/lib/AST/StmtOpenMP.cpp

//===--- StmtOpenMP.cpp - Classes for OpenMP directives -------------------===//		//===--- StmtOpenMP.cpp - Classes for OpenMP directives -------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the subclesses of Stmt class declared in StmtOpenMP.h		// This file implements the subclesses of Stmt class declared in StmtOpenMP.h
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/AST/StmtOpenMP.h"		#include "clang/AST/StmtOpenMP.h"

#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
		#include "clang/AST/Attr.h"
		#include "llvm/ADT/SetOperations.h"

using namespace clang;		using namespace clang;
using namespace llvm::omp;		using namespace llvm::omp;

void OMPExecutableDirective::setClauses(ArrayRef<OMPClause *> Clauses) {		void OMPExecutableDirective::setClauses(ArrayRef<OMPClause *> Clauses) {
assert(Clauses.size() == getNumClauses() &&		assert(Clauses.size() == getNumClauses() &&
"Number of clauses is not the same as the preallocated buffer");		"Number of clauses is not the same as the preallocated buffer");
std::copy(Clauses.begin(), Clauses.end(), getClauses().begin());		std::copy(Clauses.begin(), Clauses.end(), getClauses().begin());
...
auto Size = llvm::alignTo(sizeof(OMPTargetTeamsDistributeSimdDirective),
alignof(OMPClause *));		alignof(OMPClause *));
void *Mem = C.Allocate(		void *Mem = C.Allocate(
Size + sizeof(OMPClause *) * NumClauses +		Size + sizeof(OMPClause *) * NumClauses +
sizeof(Stmt *) *		sizeof(Stmt *) *
numLoopChildren(CollapsedNum, OMPD_target_teams_distribute_simd));		numLoopChildren(CollapsedNum, OMPD_target_teams_distribute_simd));
return new (Mem)		return new (Mem)
OMPTargetTeamsDistributeSimdDirective(CollapsedNum, NumClauses);		OMPTargetTeamsDistributeSimdDirective(CollapsedNum, NumClauses);
}		}

		// TODO: We have various representations for the same data, it might help to
		// reuse some instead of converting them.
		// TODO: It is unclear where this checking code should live. It is used all over
		// the place and would probably fit best in OMPDeclareVariantAttr.
		using OMPContextSelectorData =
		OpenMPCtxSelectorData<ArrayRef<StringRef>, llvm::APSInt>;
		using CompleteOMPContextSelectorData = SmallVector<OMPContextSelectorData, 4>;

		/// Checks current context and returns true if it matches the context selector.
		template <OpenMPContextSelectorSetKind CtxSet, OpenMPContextSelectorKind Ctx,
		typename... Arguments>
		static bool checkContext(const OMPContextSelectorData &Data,
		Arguments... Params) {
		assert(Data.CtxSet != OMP_CTX_SET_unknown && Data.Ctx != OMP_CTX_unknown &&
		"Unknown context selector or context selector set.");
		return false;
		}

		/// Checks for implementation={vendor(<vendor>)} context selector.
		/// \returns true iff <vendor>="llvm", false otherwise.
		template <>
		bool checkContext<OMP_CTX_SET_implementation, OMP_CTX_vendor>(
		const OMPContextSelectorData &Data) {
		return llvm::all_of(Data.Names,
		[](StringRef S) { return !S.compare_lower("llvm"); });
		}

		/// Checks for device={kind(<kind>)} context selector.
		/// \returns true if <kind>="host" and compilation is for host.
		/// true if <kind>="nohost" and compilation is for device.
		/// true if <kind>="cpu" and compilation is for Arm, X86 or PPC CPU.
		/// true if <kind>="gpu" and compilation is for NVPTX or AMDGCN.
		/// false otherwise.
		template <>
		bool checkContext<OMP_CTX_SET_device, OMP_CTX_kind, const LangOptions &,
		const TargetInfo &>(const OMPContextSelectorData &Data,
		const LangOptions &LO,
		const TargetInfo &TI) {
		for (StringRef Name : Data.Names) {
		if (!Name.compare_lower("host")) {
		if (LO.OpenMPIsDevice)
		return false;
		continue;
		}
		if (!Name.compare_lower("nohost")) {
		if (!LO.OpenMPIsDevice)
		return false;
		continue;
		}
		switch (TI.getTriple().getArch()) {
		case llvm::Triple::arm:
		case llvm::Triple::armeb:
		case llvm::Triple::aarch64:
		case llvm::Triple::aarch64_be:
		case llvm::Triple::aarch64_32:
		case llvm::Triple::ppc:
		case llvm::Triple::ppc64:
		case llvm::Triple::ppc64le:
		case llvm::Triple::x86:
		case llvm::Triple::x86_64:
		if (Name.compare_lower("cpu"))
		return false;
		break;
		case llvm::Triple::amdgcn:
		case llvm::Triple::nvptx:
		case llvm::Triple::nvptx64:
		if (Name.compare_lower("gpu"))
		return false;
		break;
		case llvm::Triple::UnknownArch:
		case llvm::Triple::arc:
		case llvm::Triple::avr:
		case llvm::Triple::bpfel:
		case llvm::Triple::bpfeb:
		case llvm::Triple::hexagon:
		case llvm::Triple::mips:
		case llvm::Triple::mipsel:
		case llvm::Triple::mips64:
		case llvm::Triple::mips64el:
		case llvm::Triple::msp430:
		case llvm::Triple::r600:
		case llvm::Triple::riscv32:
		case llvm::Triple::riscv64:
		case llvm::Triple::sparc:
		case llvm::Triple::sparcv9:
		case llvm::Triple::sparcel:
		case llvm::Triple::systemz:
		case llvm::Triple::tce:
		case llvm::Triple::tcele:
		case llvm::Triple::thumb:
		case llvm::Triple::thumbeb:
		case llvm::Triple::xcore:
		case llvm::Triple::le32:
		case llvm::Triple::le64:
		case llvm::Triple::amdil:
		case llvm::Triple::amdil64:
		case llvm::Triple::hsail:
		case llvm::Triple::hsail64:
		case llvm::Triple::spir:
		case llvm::Triple::spir64:
		case llvm::Triple::kalimba:
		case llvm::Triple::shave:
		case llvm::Triple::lanai:
		case llvm::Triple::wasm32:
		case llvm::Triple::wasm64:
		case llvm::Triple::renderscript32:
		case llvm::Triple::renderscript64:
		return false;
		}
		}
		return true;
		}

		static CompleteOMPContextSelectorData
		translateAttrToContextSelectorData(ASTContext &C,
		const OMPDeclareVariantAttr *A) {
		CompleteOMPContextSelectorData Data;
		if (!A)
		return Data;
		for (unsigned I = 0, E = A->scores_size(); I < E; ++I) {
		Data.emplace_back();
		auto CtxSet = static_cast<OpenMPContextSelectorSetKind>(
		*std::next(A->ctxSelectorSets_begin(), I));
		auto Ctx = static_cast<OpenMPContextSelectorKind>(
		*std::next(A->ctxSelectors_begin(), I));
		Data.back().CtxSet = CtxSet;
		Data.back().Ctx = Ctx;
		const Expr *Score = *std::next(A->scores_begin(), I);
		Score->dump();
		Data.back().Score = Score->EvaluateKnownConstInt(C);
		switch (Ctx) {
		case OMP_CTX_vendor:
		assert(CtxSet == OMP_CTX_SET_implementation &&
		"Expected implementation context selector set.");
		Data.back().Names =
		llvm::makeArrayRef(A->implVendors_begin(), A->implVendors_end());
		break;
		case OMP_CTX_kind:
		assert(CtxSet == OMP_CTX_SET_device &&
		"Expected device context selector set.");
		Data.back().Names =
		llvm::makeArrayRef(A->deviceKinds_begin(), A->deviceKinds_end());
		break;
		case OMP_CTX_unknown:
		llvm_unreachable("Unknown context selector kind.");
		}
		}
		return Data;
		}

		static bool
		matchesOpenMPContextImpl(const CompleteOMPContextSelectorData &ContextData,
		const LangOptions &LO, const TargetInfo &TI) {
		for (const OMPContextSelectorData &Data : ContextData) {
		switch (Data.Ctx) {
		case OMP_CTX_vendor:
		assert(Data.CtxSet == OMP_CTX_SET_implementation &&
		"Expected implementation context selector set.");
		if (!checkContext<OMP_CTX_SET_implementation, OMP_CTX_vendor>(Data))
		return false;
		break;
		case OMP_CTX_kind:
		assert(Data.CtxSet == OMP_CTX_SET_device &&
		"Expected device context selector set.");
		if (!checkContext<OMP_CTX_SET_device, OMP_CTX_kind, const LangOptions &,
		const TargetInfo &>(Data, LO, TI))
		return false;
		break;
		case OMP_CTX_unknown:
		llvm_unreachable("Unknown context selector kind.");
		}
		}
		return true;
		}

		static bool isStrictSubset(const CompleteOMPContextSelectorData &LHS,
		const CompleteOMPContextSelectorData &RHS) {
		llvm::SmallDenseMap<std::pair<int, int>, llvm::StringSet<>, 4> RHSData;
		for (const OMPContextSelectorData &D : RHS) {
		auto &Pair = RHSData.FindAndConstruct(std::make_pair(D.CtxSet, D.Ctx));
		Pair.getSecond().insert(D.Names.begin(), D.Names.end());
		}
		bool AllSetsAreEqual = true;
		for (const OMPContextSelectorData &D : LHS) {
		auto It = RHSData.find(std::make_pair(D.CtxSet, D.Ctx));
		if (It == RHSData.end())
		return false;
		if (D.Names.size() > It->getSecond().size())
		return false;
		if (llvm::set_union(It->getSecond(), D.Names))
		return false;
		AllSetsAreEqual =
		AllSetsAreEqual && (D.Names.size() == It->getSecond().size());
		}

		return LHS.size() != RHS.size() || !AllSetsAreEqual;
		}
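
The strict-subset test above can be restated with standard containers. This is a minimal sketch under simplifying assumptions: `Selector` and `isStrictSubsetSketch` are hypothetical names, and `std::includes` replaces the patch's `llvm::set_union` call, which reports whether the right-hand set grew (that is, whether the left-hand side named a trait the right-hand side lacks).

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified selector: (set kind, selector kind) plus trait names.
struct Selector {
  int CtxSet;
  int Ctx;
  std::set<std::string> Names;
};

static bool isStrictSubsetSketch(const std::vector<Selector> &LHS,
                                 const std::vector<Selector> &RHS) {
  // Collect the RHS names per (set, selector) pair.
  std::map<std::pair<int, int>, std::set<std::string>> RHSData;
  for (const Selector &S : RHS)
    RHSData[{S.CtxSet, S.Ctx}].insert(S.Names.begin(), S.Names.end());

  bool AllSetsAreEqual = true;
  for (const Selector &S : LHS) {
    auto It = RHSData.find({S.CtxSet, S.Ctx});
    if (It == RHSData.end())
      return false; // LHS uses a selector that RHS does not have.
    const std::set<std::string> &R = It->second;
    if (!std::includes(R.begin(), R.end(), S.Names.begin(), S.Names.end()))
      return false; // LHS names a trait that RHS does not.
    AllSetsAreEqual = AllSetsAreEqual && S.Names.size() == R.size();
  }
  // A subset is strict if RHS has more selectors or more names somewhere.
  return LHS.size() != RHS.size() || !AllSetsAreEqual;
}
```

Two identical selector lists are therefore not strict subsets of each other, which is what lets the scoring code that follows fall through to a plain score comparison.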

		const OMPDeclareVariantAttr *
		clang::getBetterOpenMPContextMatch(ASTContext &C,
		const OMPDeclareVariantAttr *LHSAttr,
		const OMPDeclareVariantAttr *RHSAttr) {
		const CompleteOMPContextSelectorData LHS =
		translateAttrToContextSelectorData(C, LHSAttr);
		const CompleteOMPContextSelectorData RHS =
		translateAttrToContextSelectorData(C, RHSAttr);
		bool LHSMatch = LHSAttr && matchesOpenMPContextImpl(LHS, C.getLangOpts(),
		C.getTargetInfo());
		bool RHSMatch = RHSAttr && matchesOpenMPContextImpl(RHS, C.getLangOpts(),
		C.getTargetInfo());
		bool LHSisOK = LHSMatch && !LHSAttr->isInherited();
		bool RHSisOK = RHSMatch && !RHSAttr->isInherited();
		if (!LHSisOK && !RHSisOK)
		return nullptr;
		if (LHSisOK && !RHSisOK)
		return LHSAttr;
		if (!LHSisOK && RHSisOK)
		return RHSAttr;
		assert(LHSisOK && RHSisOK && "broken invariant");

		// Score is calculated as sum of all scores + 1.
		llvm::APSInt LHSScore(llvm::APInt(64, 1), /*isUnsigned=*/false);
		bool RHSIsSubsetOfLHS = isStrictSubset(RHS, LHS);
		if (RHSIsSubsetOfLHS) {
		LHSScore = llvm::APSInt::get(0);
		} else {
		for (const OMPContextSelectorData &Data : LHS) {
		if (Data.Score.getBitWidth() > LHSScore.getBitWidth()) {
		LHSScore = LHSScore.extend(Data.Score.getBitWidth()) + Data.Score;
		} else if (Data.Score.getBitWidth() < LHSScore.getBitWidth()) {
		LHSScore += Data.Score.extend(LHSScore.getBitWidth());
		} else {
		LHSScore += Data.Score;
		}
		}
		}
		llvm::APSInt RHSScore(llvm::APInt(64, 1), /*isUnsigned=*/false);
		if (!RHSIsSubsetOfLHS && isStrictSubset(LHS, RHS)) {
		RHSScore = llvm::APSInt::get(0);
		} else {
		for (const OMPContextSelectorData &Data : RHS) {
		if (Data.Score.getBitWidth() > RHSScore.getBitWidth()) {
		RHSScore = RHSScore.extend(Data.Score.getBitWidth()) + Data.Score;
		} else if (Data.Score.getBitWidth() < RHSScore.getBitWidth()) {
		RHSScore += Data.Score.extend(RHSScore.getBitWidth());
		} else {
		RHSScore += Data.Score;
		}
		}
		}
		return llvm::APSInt::compareValues(LHSScore, RHSScore) >= 0 ? LHSAttr
		: RHSAttr;
		}
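
The score comparison at the end of `getBetterOpenMPContextMatch` can be mirrored with plain integers. This is a sketch only: `Variant`, `totalScore`, and `pickBetter` are hypothetical names, `std::int64_t` stands in for `llvm::APSInt` (so the bit-width extension logic disappears), and the subset flags are passed in rather than computed.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical: a variant reduced to its per-selector scores.
struct Variant {
  std::vector<std::int64_t> Scores;
};

// Sum of all scores + 1, or 0 when the other side is a strict subset,
// mirroring the assignments to LHSScore and RHSScore in the patch.
static std::int64_t totalScore(const Variant &V, bool OtherIsStrictSubset) {
  if (OtherIsStrictSubset)
    return 0;
  return std::accumulate(V.Scores.begin(), V.Scores.end(), std::int64_t{1});
}

// Returns 0 if LHS is selected (ties included) and 1 if RHS is selected.
static int pickBetter(const Variant &LHS, const Variant &RHS,
                      bool RHSIsSubsetOfLHS, bool LHSIsSubsetOfRHS) {
  std::int64_t L = totalScore(LHS, RHSIsSubsetOfLHS);
  // As in the patch, the second subset test only applies if the first failed.
  std::int64_t R = totalScore(RHS, !RHSIsSubsetOfLHS && LHSIsSubsetOfRHS);
  return L >= R ? 0 : 1;
}
```

The `>=` means that equal scores resolve to the left-hand attribute, so the result depends on argument order when two variants tie.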

		bool clang::isOpenMPContextMatch(ASTContext &C,
		const OMPDeclareVariantAttr *A) {
		const CompleteOMPContextSelectorData Data =
		translateAttrToContextSelectorData(C, A);
		return matchesOpenMPContextImpl(Data, C.getLangOpts(), C.getTargetInfo());
		}

clang/lib/CodeGen/CGOpenMPRuntime.h

[274 lines not shown]	protected:
/// Check if the default location must be constant.		/// Check if the default location must be constant.
/// Default is false to support OMPT/OMPD.		/// Default is false to support OMPT/OMPD.
virtual bool isDefaultLocationConstant() const { return false; }		virtual bool isDefaultLocationConstant() const { return false; }

/// Returns additional flags that can be stored in reserved_2 field of the		/// Returns additional flags that can be stored in reserved_2 field of the
/// default location.		/// default location.
virtual unsigned getDefaultLocationReserved2Flags() const { return 0; }		virtual unsigned getDefaultLocationReserved2Flags() const { return 0; }

/// Tries to emit declare variant function for \p OldGD from \p NewGD.
/// \param OrigAddr LLVM IR value for \p OldGD.
/// \param IsForDefinition true, if requested emission for the definition of
/// \p OldGD.
/// \returns true, was able to emit a definition function for \p OldGD, which
/// points to \p NewGD.
virtual bool tryEmitDeclareVariant(const GlobalDecl &NewGD,
const GlobalDecl &OldGD,
llvm::GlobalValue *OrigAddr,
bool IsForDefinition);

/// Returns default flags for the barriers depending on the directive, for		/// Returns default flags for the barriers depending on the directive, for
/// which this barrier is going to be emitted.		/// which this barrier is going to be emitted.
static unsigned getDefaultFlagsForBarriers(OpenMPDirectiveKind Kind);		static unsigned getDefaultFlagsForBarriers(OpenMPDirectiveKind Kind);

/// Get the LLVM type for the critical name.		/// Get the LLVM type for the critical name.
llvm::ArrayType *getKmpCriticalNameTy() const {return KmpCriticalNameTy;}		llvm::ArrayType *getKmpCriticalNameTy() const {return KmpCriticalNameTy;}

/// Returns corresponding lock object for the specified critical region		/// Returns corresponding lock object for the specified critical region
[1,353 lines not shown]	public:

/// Checks if the variable has associated OMPAllocateDeclAttr attribute with		/// Checks if the variable has associated OMPAllocateDeclAttr attribute with
/// the predefined allocator and translates it into the corresponding address		/// the predefined allocator and translates it into the corresponding address
/// space.		/// space.
virtual bool hasAllocateAttributeForGlobalVar(const VarDecl *VD, LangAS &AS);		virtual bool hasAllocateAttributeForGlobalVar(const VarDecl *VD, LangAS &AS);

/// Return whether the unified_shared_memory has been specified.		/// Return whether the unified_shared_memory has been specified.
bool hasRequiresUnifiedSharedMemory() const;		bool hasRequiresUnifiedSharedMemory() const;

/// Emits the definition of the declare variant function.
virtual bool emitDeclareVariant(GlobalDecl GD, bool IsForDefinition);
};		};

/// Class supports emission of SIMD-only code.		/// Class supports emission of SIMD-only code.
class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {		class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
public:		public:
explicit CGOpenMPSIMDRuntime(CodeGenModule &CGM) : CGOpenMPRuntime(CGM) {}		explicit CGOpenMPSIMDRuntime(CodeGenModule &CGM) : CGOpenMPRuntime(CGM) {}
~CGOpenMPSIMDRuntime() override {}		~CGOpenMPSIMDRuntime() override {}

[567 lines not shown]

clang/lib/CodeGen/CGOpenMPRuntime.cpp

[1,261 lines not shown]	CGOpenMPRuntime::CGOpenMPRuntime(CodeGenModule &CGM, StringRef FirstSeparator,
RD->completeDefinition();		RD->completeDefinition();
IdentQTy = C.getRecordType(RD);		IdentQTy = C.getRecordType(RD);
IdentTy = CGM.getTypes().ConvertRecordDeclType(RD);		IdentTy = CGM.getTypes().ConvertRecordDeclType(RD);
KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8);		KmpCriticalNameTy = llvm::ArrayType::get(CGM.Int32Ty, /*NumElements*/ 8);

loadOffloadInfoMetadata();		loadOffloadInfoMetadata();
}		}

bool CGOpenMPRuntime::tryEmitDeclareVariant(const GlobalDecl &NewGD,
const GlobalDecl &OldGD,
llvm::GlobalValue *OrigAddr,
bool IsForDefinition) {
// Emit at least a definition for the aliasee if the address of the
// original function is requested.
if (IsForDefinition || OrigAddr)
(void)CGM.GetAddrOfGlobal(NewGD);
StringRef NewMangledName = CGM.getMangledName(NewGD);
llvm::GlobalValue *Addr = CGM.GetGlobalValue(NewMangledName);
if (Addr && !Addr->isDeclaration()) {
const auto *D = cast<FunctionDecl>(OldGD.getDecl());
const CGFunctionInfo &FI = CGM.getTypes().arrangeGlobalDeclaration(OldGD);
llvm::Type *DeclTy = CGM.getTypes().GetFunctionType(FI);

// Create a reference to the named value. This ensures that it is emitted
// if a deferred decl.
llvm::GlobalValue::LinkageTypes LT = CGM.getFunctionLinkage(OldGD);

// Create the new alias itself, but don't set a name yet.
auto *GA =
llvm::GlobalAlias::create(DeclTy, 0, LT, "", Addr, &CGM.getModule());

if (OrigAddr) {
assert(OrigAddr->isDeclaration() && "Expected declaration");

GA->takeName(OrigAddr);
OrigAddr->replaceAllUsesWith(
llvm::ConstantExpr::getBitCast(GA, OrigAddr->getType()));
OrigAddr->eraseFromParent();
} else {
GA->setName(CGM.getMangledName(OldGD));
}

// Set attributes which are particular to an alias; this is a
// specialization of the attributes which may be set on a global function.
if (D->hasAttr<WeakAttr>() || D->hasAttr<WeakRefAttr>() ||
D->isWeakImported())
GA->setLinkage(llvm::Function::WeakAnyLinkage);

CGM.SetCommonAttributes(OldGD, GA);
return true;
}
return false;
}

void CGOpenMPRuntime::clear() {		void CGOpenMPRuntime::clear() {
InternalVars.clear();		InternalVars.clear();
// Clean non-target variable declarations possibly used only in debug info.		// Clean non-target variable declarations possibly used only in debug info.
for (const auto &Data : EmittedNonTargetVariables) {		for (const auto &Data : EmittedNonTargetVariables) {
if (!Data.getValue().pointsToAliveValue())		if (!Data.getValue().pointsToAliveValue())
continue;		continue;
auto *GV = dyn_cast<llvm::GlobalVariable>(Data.getValue());		auto *GV = dyn_cast<llvm::GlobalVariable>(Data.getValue());
if (!GV)		if (!GV)
continue;		continue;
if (!GV->isDeclaration() || GV->getNumUses() > 0)		if (!GV->isDeclaration() || GV->getNumUses() > 0)
continue;		continue;
GV->eraseFromParent();		GV->eraseFromParent();
}		}
// Emit aliases for the deferred aliasees.
for (const auto &Pair : DeferredVariantFunction) {
StringRef MangledName = CGM.getMangledName(Pair.second.second);
llvm::GlobalValue *Addr = CGM.GetGlobalValue(MangledName);
// If not able to emit alias, just emit original declaration.
(void)tryEmitDeclareVariant(Pair.second.first, Pair.second.second, Addr,
/*IsForDefinition=*/false);
}
}		}

std::string CGOpenMPRuntime::getName(ArrayRef<StringRef> Parts) const {		std::string CGOpenMPRuntime::getName(ArrayRef<StringRef> Parts) const {
SmallString<128> Buffer;		SmallString<128> Buffer;
llvm::raw_svector_ostream OS(Buffer);		llvm::raw_svector_ostream OS(Buffer);
StringRef Sep = FirstSeparator;		StringRef Sep = FirstSeparator;
for (StringRef Part : Parts) {		for (StringRef Part : Parts) {
OS << Sep << Part;		OS << Sep << Part;
[9,694 lines not shown]	CGF.EHStack.pushCleanup<OMPAllocateCleanupTy>(NormalAndEHCleanup, FiniRTLFn,
llvm::makeArrayRef(FiniArgs));		llvm::makeArrayRef(FiniArgs));
Addr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(		Addr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
Addr,		Addr,
CGF.ConvertTypeForMem(CGM.getContext().getPointerType(CVD->getType())),		CGF.ConvertTypeForMem(CGM.getContext().getPointerType(CVD->getType())),
CVD->getName() + ".addr");		CVD->getName() + ".addr");
return Address(Addr, Align);		return Address(Addr, Align);
}		}

namespace {
using OMPContextSelectorData =
OpenMPCtxSelectorData<ArrayRef<StringRef>, llvm::APSInt>;
using CompleteOMPContextSelectorData = SmallVector<OMPContextSelectorData, 4>;
} // anonymous namespace

/// Checks current context and returns true if it matches the context selector.
template <OpenMPContextSelectorSetKind CtxSet, OpenMPContextSelectorKind Ctx,
typename... Arguments>
static bool checkContext(const OMPContextSelectorData &Data,
Arguments... Params) {
assert(Data.CtxSet != OMP_CTX_SET_unknown && Data.Ctx != OMP_CTX_unknown &&
"Unknown context selector or context selector set.");
return false;
}

/// Checks for implementation={vendor(<vendor>)} context selector.
/// \returns true iff <vendor>="llvm", false otherwise.
template <>
bool checkContext<OMP_CTX_SET_implementation, OMP_CTX_vendor>(
const OMPContextSelectorData &Data) {
return llvm::all_of(Data.Names,
[](StringRef S) { return !S.compare_lower("llvm"); });
}

/// Checks for device={kind(<kind>)} context selector.
/// \returns true if <kind>="host" and compilation is for host.
/// true if <kind>="nohost" and compilation is for device.
/// true if <kind>="cpu" and compilation is for Arm, X86 or PPC CPU.
/// true if <kind>="gpu" and compilation is for NVPTX or AMDGCN.
/// false otherwise.
template <>
bool checkContext<OMP_CTX_SET_device, OMP_CTX_kind, CodeGenModule &>(
const OMPContextSelectorData &Data, CodeGenModule &CGM) {
for (StringRef Name : Data.Names) {
if (!Name.compare_lower("host")) {
if (CGM.getLangOpts().OpenMPIsDevice)
return false;
continue;
}
if (!Name.compare_lower("nohost")) {
if (!CGM.getLangOpts().OpenMPIsDevice)
return false;
continue;
}
switch (CGM.getTriple().getArch()) {
case llvm::Triple::arm:
case llvm::Triple::armeb:
case llvm::Triple::aarch64:
case llvm::Triple::aarch64_be:
case llvm::Triple::aarch64_32:
case llvm::Triple::ppc:
case llvm::Triple::ppc64:
case llvm::Triple::ppc64le:
case llvm::Triple::x86:
case llvm::Triple::x86_64:
if (Name.compare_lower("cpu"))
return false;
break;
case llvm::Triple::amdgcn:
case llvm::Triple::nvptx:
case llvm::Triple::nvptx64:
if (Name.compare_lower("gpu"))
return false;
break;
case llvm::Triple::UnknownArch:
case llvm::Triple::arc:
case llvm::Triple::avr:
case llvm::Triple::bpfel:
case llvm::Triple::bpfeb:
case llvm::Triple::hexagon:
case llvm::Triple::mips:
case llvm::Triple::mipsel:
case llvm::Triple::mips64:
case llvm::Triple::mips64el:
case llvm::Triple::msp430:
case llvm::Triple::r600:
case llvm::Triple::riscv32:
case llvm::Triple::riscv64:
case llvm::Triple::sparc:
case llvm::Triple::sparcv9:
case llvm::Triple::sparcel:
case llvm::Triple::systemz:
case llvm::Triple::tce:
case llvm::Triple::tcele:
case llvm::Triple::thumb:
case llvm::Triple::thumbeb:
case llvm::Triple::xcore:
case llvm::Triple::le32:
case llvm::Triple::le64:
case llvm::Triple::amdil:
case llvm::Triple::amdil64:
case llvm::Triple::hsail:
case llvm::Triple::hsail64:
case llvm::Triple::spir:
case llvm::Triple::spir64:
case llvm::Triple::kalimba:
case llvm::Triple::shave:
case llvm::Triple::lanai:
case llvm::Triple::wasm32:
case llvm::Triple::wasm64:
case llvm::Triple::renderscript32:
case llvm::Triple::renderscript64:
return false;
}
}
return true;
}

bool matchesContext(CodeGenModule &CGM,
const CompleteOMPContextSelectorData &ContextData) {
for (const OMPContextSelectorData &Data : ContextData) {
switch (Data.Ctx) {
case OMP_CTX_vendor:
assert(Data.CtxSet == OMP_CTX_SET_implementation &&
"Expected implementation context selector set.");
if (!checkContext<OMP_CTX_SET_implementation, OMP_CTX_vendor>(Data))
return false;
break;
case OMP_CTX_kind:
assert(Data.CtxSet == OMP_CTX_SET_device &&
"Expected device context selector set.");
if (!checkContext<OMP_CTX_SET_device, OMP_CTX_kind, CodeGenModule &>(Data,
CGM))
return false;
break;
case OMP_CTX_unknown:
llvm_unreachable("Unknown context selector kind.");
}
}
return true;
}

static CompleteOMPContextSelectorData
translateAttrToContextSelectorData(ASTContext &C,
const OMPDeclareVariantAttr *A) {
CompleteOMPContextSelectorData Data;
for (unsigned I = 0, E = A->scores_size(); I < E; ++I) {
Data.emplace_back();
auto CtxSet = static_cast<OpenMPContextSelectorSetKind>(
*std::next(A->ctxSelectorSets_begin(), I));
auto Ctx = static_cast<OpenMPContextSelectorKind>(
*std::next(A->ctxSelectors_begin(), I));
Data.back().CtxSet = CtxSet;
Data.back().Ctx = Ctx;
const Expr *Score = *std::next(A->scores_begin(), I);
Data.back().Score = Score->EvaluateKnownConstInt(C);
switch (Ctx) {
case OMP_CTX_vendor:
assert(CtxSet == OMP_CTX_SET_implementation &&
"Expected implementation context selector set.");
Data.back().Names =
llvm::makeArrayRef(A->implVendors_begin(), A->implVendors_end());
break;
case OMP_CTX_kind:
assert(CtxSet == OMP_CTX_SET_device &&
"Expected device context selector set.");
Data.back().Names =
llvm::makeArrayRef(A->deviceKinds_begin(), A->deviceKinds_end());
break;
case OMP_CTX_unknown:
llvm_unreachable("Unknown context selector kind.");
}
}
return Data;
}

static bool isStrictSubset(const CompleteOMPContextSelectorData &LHS,
const CompleteOMPContextSelectorData &RHS) {
llvm::SmallDenseMap<std::pair<int, int>, llvm::StringSet<>, 4> RHSData;
for (const OMPContextSelectorData &D : RHS) {
auto &Pair = RHSData.FindAndConstruct(std::make_pair(D.CtxSet, D.Ctx));
Pair.getSecond().insert(D.Names.begin(), D.Names.end());
}
bool AllSetsAreEqual = true;
for (const OMPContextSelectorData &D : LHS) {
auto It = RHSData.find(std::make_pair(D.CtxSet, D.Ctx));
if (It == RHSData.end())
return false;
if (D.Names.size() > It->getSecond().size())
return false;
if (llvm::set_union(It->getSecond(), D.Names))
return false;
AllSetsAreEqual =
AllSetsAreEqual && (D.Names.size() == It->getSecond().size());
}

return LHS.size() != RHS.size() || !AllSetsAreEqual;
}

static bool greaterCtxScore(const CompleteOMPContextSelectorData &LHS,
const CompleteOMPContextSelectorData &RHS) {
// Score is calculated as sum of all scores + 1.
llvm::APSInt LHSScore(llvm::APInt(64, 1), /*isUnsigned=*/false);
bool RHSIsSubsetOfLHS = isStrictSubset(RHS, LHS);
if (RHSIsSubsetOfLHS) {
LHSScore = llvm::APSInt::get(0);
} else {
for (const OMPContextSelectorData &Data : LHS) {
if (Data.Score.getBitWidth() > LHSScore.getBitWidth()) {
LHSScore = LHSScore.extend(Data.Score.getBitWidth()) + Data.Score;
} else if (Data.Score.getBitWidth() < LHSScore.getBitWidth()) {
LHSScore += Data.Score.extend(LHSScore.getBitWidth());
} else {
LHSScore += Data.Score;
}
}
}
llvm::APSInt RHSScore(llvm::APInt(64, 1), /*isUnsigned=*/false);
if (!RHSIsSubsetOfLHS && isStrictSubset(LHS, RHS)) {
RHSScore = llvm::APSInt::get(0);
} else {
for (const OMPContextSelectorData &Data : RHS) {
if (Data.Score.getBitWidth() > RHSScore.getBitWidth()) {
RHSScore = RHSScore.extend(Data.Score.getBitWidth()) + Data.Score;
} else if (Data.Score.getBitWidth() < RHSScore.getBitWidth()) {
RHSScore += Data.Score.extend(RHSScore.getBitWidth());
} else {
RHSScore += Data.Score;
}
}
}
return llvm::APSInt::compareValues(LHSScore, RHSScore) >= 0;
}

/// Finds the variant function that matches current context with its context
/// selector.
static const FunctionDecl *getDeclareVariantFunction(CodeGenModule &CGM,
const FunctionDecl *FD) {
if (!FD->hasAttrs() || !FD->hasAttr<OMPDeclareVariantAttr>())
return FD;
// Iterate through all DeclareVariant attributes and check context selectors.
const OMPDeclareVariantAttr *TopMostAttr = nullptr;
CompleteOMPContextSelectorData TopMostData;
for (const auto *A : FD->specific_attrs<OMPDeclareVariantAttr>()) {
CompleteOMPContextSelectorData Data =
translateAttrToContextSelectorData(CGM.getContext(), A);
if (!matchesContext(CGM, Data))
continue;
// If the attribute matches the context, find the attribute with the highest
// score.
if (!TopMostAttr || !greaterCtxScore(TopMostData, Data)) {
TopMostAttr = A;
TopMostData.swap(Data);
}
}
if (!TopMostAttr)
return FD;
return cast<FunctionDecl>(
cast<DeclRefExpr>(TopMostAttr->getVariantFuncRef()->IgnoreParenImpCasts())
->getDecl());
}

bool CGOpenMPRuntime::emitDeclareVariant(GlobalDecl GD, bool IsForDefinition) {
const auto *D = cast<FunctionDecl>(GD.getDecl());
// If the original function is defined already, use its definition.
StringRef MangledName = CGM.getMangledName(GD);
llvm::GlobalValue *Orig = CGM.GetGlobalValue(MangledName);
if (Orig && !Orig->isDeclaration())
return false;
const FunctionDecl *NewFD = getDeclareVariantFunction(CGM, D);
// Emit original function if it does not have declare variant attribute or the
// context does not match.
if (NewFD == D)
return false;
GlobalDecl NewGD = GD.getWithDecl(NewFD);
if (tryEmitDeclareVariant(NewGD, GD, Orig, IsForDefinition)) {
DeferredVariantFunction.erase(D);
return true;
}
DeferredVariantFunction.insert(std::make_pair(D, std::make_pair(NewGD, GD)));
return true;
}
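
For orientation, the removed `emitDeclareVariant` path handled source code of roughly the following shape. This is an illustrative sketch, not taken from the patch or its tests: `compute` and `computeGpu` are hypothetical names, and the directive follows the OpenMP 5.0 `declare variant` syntax.

```cpp
#include <cassert>

// Hypothetical specialized implementation, intended for GPU device builds.
int computeGpu() { return 2; }

// Base function. The directive asks the compiler to substitute computeGpu()
// for compute() wherever the device kind is "gpu".
#pragma omp declare variant(computeGpu) match(device = {kind(gpu)})
int compute() { return 1; }

// Call site: which function runs is decided by the compiler, not by this code.
int callCompute() { return compute(); }
```

On a host CPU build, or with OpenMP disabled so that the pragma is ignored, the selector does not match and the call resolves to the base function. The removed code performed the substitution late, in CodeGen, by emitting the base symbol as an alias of the variant; the patch summary proposes resolving it through the existing overload resolution instead.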

llvm::Function *CGOpenMPSIMDRuntime::emitParallelOutlinedFunction(		llvm::Function *CGOpenMPSIMDRuntime::emitParallelOutlinedFunction(
const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,		const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,
OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {		OpenMPDirectiveKind InnermostKind, const RegionCodeGenTy &CodeGen) {
llvm_unreachable("Not supported in SIMD-only mode");		llvm_unreachable("Not supported in SIMD-only mode");
}		}

llvm::Function *CGOpenMPSIMDRuntime::emitTeamsOutlinedFunction(		llvm::Function *CGOpenMPSIMDRuntime::emitTeamsOutlinedFunction(
const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,		const OMPExecutableDirective &D, const VarDecl *ThreadIDVar,
[284 lines not shown]

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.h

[187 lines not shown]	protected:
bool isDefaultLocationConstant() const override { return true; }		bool isDefaultLocationConstant() const override { return true; }

/// Returns additional flags that can be stored in reserved_2 field of the		/// Returns additional flags that can be stored in reserved_2 field of the
/// default location.		/// default location.
/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +		/// For NVPTX target contains data about SPMD/Non-SPMD execution mode +
/// Full/Lightweight runtime mode. Used for better optimization.		/// Full/Lightweight runtime mode. Used for better optimization.
unsigned getDefaultLocationReserved2Flags() const override;		unsigned getDefaultLocationReserved2Flags() const override;

/// Tries to emit declare variant function for \p OldGD from \p NewGD.
/// \param OrigAddr LLVM IR value for \p OldGD.
/// \param IsForDefinition true, if requested emission for the definition of
/// \p OldGD.
/// \returns true, was able to emit a definition function for \p OldGD, which
/// points to \p NewGD.
/// NVPTX backend does not support global aliases, so just use the function,
/// emitted for \p NewGD instead of \p OldGD.
bool tryEmitDeclareVariant(const GlobalDecl &NewGD, const GlobalDecl &OldGD,
llvm::GlobalValue *OrigAddr,
bool IsForDefinition) override;

public:		public:
explicit CGOpenMPRuntimeNVPTX(CodeGenModule &CGM);		explicit CGOpenMPRuntimeNVPTX(CodeGenModule &CGM);
void clear() override;		void clear() override;

/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32		/// Emit call to void __kmpc_push_proc_bind(ident_t *loc, kmp_int32
/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.		/// global_tid, int proc_bind) to generate code for 'proc_bind' clause.
virtual void emitProcBindClause(CodeGenFunction &CGF,		virtual void emitProcBindClause(CodeGenFunction &CGF,
OpenMPProcBindClauseKind ProcBind,		OpenMPProcBindClauseKind ProcBind,
[287 lines not shown]

clang/lib/CodeGen/CGOpenMPRuntimeNVPTX.cpp

[1,908 lines not shown]	case EM_NonSPMD:
assert(requiresFullRuntime() && "Expected full runtime.");		assert(requiresFullRuntime() && "Expected full runtime.");
return (~KMP_IDENT_SPMD_MODE) & (~KMP_IDENT_SIMPLE_RT_MODE);		return (~KMP_IDENT_SPMD_MODE) & (~KMP_IDENT_SIMPLE_RT_MODE);
case EM_Unknown:		case EM_Unknown:
return UndefinedMode;		return UndefinedMode;
}		}
llvm_unreachable("Unknown flags are requested.");		llvm_unreachable("Unknown flags are requested.");
}		}

bool CGOpenMPRuntimeNVPTX::tryEmitDeclareVariant(const GlobalDecl &NewGD,
const GlobalDecl &OldGD,
llvm::GlobalValue *OrigAddr,
bool IsForDefinition) {
// Emit the function in OldGD with the body from NewGD, if NewGD is defined.
auto *NewFD = cast<FunctionDecl>(NewGD.getDecl());
if (NewFD->isDefined()) {
CGM.emitOpenMPDeviceFunctionRedefinition(OldGD, NewGD, OrigAddr);
return true;
}
return false;
}

CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)		CGOpenMPRuntimeNVPTX::CGOpenMPRuntimeNVPTX(CodeGenModule &CGM)
: CGOpenMPRuntime(CGM, "_", "$") {		: CGOpenMPRuntime(CGM, "_", "$") {
if (!CGM.getLangOpts().OpenMPIsDevice)		if (!CGM.getLangOpts().OpenMPIsDevice)
llvm_unreachable("OpenMP NVPTX can only handle device code.");		llvm_unreachable("OpenMP NVPTX can only handle device code.");
}		}

void CGOpenMPRuntimeNVPTX::emitProcBindClause(CodeGenFunction &CGF,		void CGOpenMPRuntimeNVPTX::emitProcBindClause(CodeGenFunction &CGF,
OpenMPProcBindClauseKind ProcBind,		OpenMPProcBindClauseKind ProcBind,
[3,283 lines not shown]

clang/lib/CodeGen/CodeGenModule.cpp

[2,528 lines not shown]	void CodeGenModule::EmitGlobal(GlobalDecl GD) {
// function. If the global must always be emitted, do it eagerly if possible		// function. If the global must always be emitted, do it eagerly if possible
// to benefit from cache locality.		// to benefit from cache locality.
if (MustBeEmitted(Global) && MayBeEmittedEagerly(Global)) {		if (MustBeEmitted(Global) && MayBeEmittedEagerly(Global)) {
// Emit the definition if it can't be deferred.		// Emit the definition if it can't be deferred.
EmitGlobalDefinition(GD);		EmitGlobalDefinition(GD);
return;		return;
}		}

// Check if this must be emitted as declare variant.
if (LangOpts.OpenMP && isa<FunctionDecl>(Global) && OpenMPRuntime &&
OpenMPRuntime->emitDeclareVariant(GD, /*IsForDefinition=*/false))
return;

// If we're deferring emission of a C++ variable with an		// If we're deferring emission of a C++ variable with an
// initializer, remember the order in which it appeared in the file.		// initializer, remember the order in which it appeared in the file.
if (getLangOpts().CPlusPlus && isa<VarDecl>(Global) &&		if (getLangOpts().CPlusPlus && isa<VarDecl>(Global) &&
cast<VarDecl>(Global)->hasInit()) {		cast<VarDecl>(Global)->hasInit()) {
DelayedCXXInitPosition[Global] = CXXGlobalInits.size();		DelayedCXXInitPosition[Global] = CXXGlobalInits.size();
CXXGlobalInits.push_back(nullptr);		CXXGlobalInits.push_back(nullptr);
}		}

[547 lines not shown]	if (getLangOpts().OpenMPIsDevice && OpenMPRuntime &&
GDDef = GlobalDecl(CD, GD.getCtorType());		GDDef = GlobalDecl(CD, GD.getCtorType());
else if (const auto *DD = dyn_cast<CXXDestructorDecl>(FDDef))		else if (const auto *DD = dyn_cast<CXXDestructorDecl>(FDDef))
GDDef = GlobalDecl(DD, GD.getDtorType());		GDDef = GlobalDecl(DD, GD.getDtorType());
else		else
GDDef = GlobalDecl(FDDef);		GDDef = GlobalDecl(FDDef);
EmitGlobal(GDDef);		EmitGlobal(GDDef);
}		}
}		}
// Check if this must be emitted as declare variant and emit reference to
// the declare variant function.
if (LangOpts.OpenMP && OpenMPRuntime)
(void)OpenMPRuntime->emitDeclareVariant(GD, /*IsForDefinition=*/true);

if (FD->isMultiVersion()) {		if (FD->isMultiVersion()) {
const auto *TA = FD->getAttr<TargetAttr>();		const auto *TA = FD->getAttr<TargetAttr>();
if (TA && TA->isDefaultVersion())		if (TA && TA->isDefaultVersion())
UpdateMultiVersionNames(GD, FD);		UpdateMultiVersionNames(GD, FD);
if (!IsForDefinition)		if (!IsForDefinition)
return GetOrCreateMultiVersionResolver(GD, Ty, FD);		return GetOrCreateMultiVersionResolver(GD, Ty, FD);
}		}
[1,262 lines not shown]	void CodeGenModule::HandleCXXStaticMemberVarInstantiation(VarDecl *VD) {
if (VD->getDefinition() && TSK == TSK_ExplicitInstantiationDefinition)		if (VD->getDefinition() && TSK == TSK_ExplicitInstantiationDefinition)
GetAddrOfGlobalVar(VD);		GetAddrOfGlobalVar(VD);

EmitTopLevelDecl(VD);		EmitTopLevelDecl(VD);
}		}

void CodeGenModule::EmitGlobalFunctionDefinition(GlobalDecl GD,		void CodeGenModule::EmitGlobalFunctionDefinition(GlobalDecl GD,
llvm::GlobalValue *GV) {		llvm::GlobalValue *GV) {
// Check if this must be emitted as declare variant.
if (LangOpts.OpenMP && OpenMPRuntime &&
OpenMPRuntime->emitDeclareVariant(GD, /*IsForDefinition=*/true))
return;

const auto *D = cast<FunctionDecl>(GD.getDecl());		const auto *D = cast<FunctionDecl>(GD.getDecl());

// Compute the function info and LLVM type.		// Compute the function info and LLVM type.
const CGFunctionInfo &FI = getTypes().arrangeGlobalDeclaration(GD);		const CGFunctionInfo &FI = getTypes().arrangeGlobalDeclaration(GD);
llvm::FunctionType *Ty = getTypes().GetFunctionType(FI);		llvm::FunctionType *Ty = getTypes().GetFunctionType(FI);

// Get or create the prototype for the function.		// Get or create the prototype for the function.
if (!GV || (GV->getType()->getElementType() != Ty))		if (!GV || (GV->getType()->getElementType() != Ty))
[1,529 lines not shown]

clang/lib/Sema/SemaExpr.cpp

[15,636 lines not shown]	if (OdrUse == OdrUseContext::Used && !Func->isUsed(/*CheckUsedAttr=*/false)) {
// parameter types to be complete. Check that now.		// parameter types to be complete. Check that now.
if (funcHasParameterSizeMangling(*this, Func))		if (funcHasParameterSizeMangling(*this, Func))
CheckCompleteParameterTypesForMangler(*this, Func, Loc);		CheckCompleteParameterTypesForMangler(*this, Func, Loc);

Func->markUsed(Context);		Func->markUsed(Context);
}		}

if (LangOpts.OpenMP) {		if (LangOpts.OpenMP) {
markOpenMPDeclareVariantFuncsReferenced(Loc, Func, MightBeOdrUse);
if (LangOpts.OpenMPIsDevice)		if (LangOpts.OpenMPIsDevice)
checkOpenMPDeviceFunction(Loc, Func);		checkOpenMPDeviceFunction(Loc, Func);
else		else
checkOpenMPHostFunction(Loc, Func);		checkOpenMPHostFunction(Loc, Func);
}		}
}		}

/// Directly mark a variable odr-used. Given a choice, prefer to use		/// Directly mark a variable odr-used. Given a choice, prefer to use
[2,402 lines not shown]

clang/lib/Sema/SemaLookup.cpp

[323 lines not shown]
}		}

bool LookupResult::sanity() const {		bool LookupResult::sanity() const {
// This function is never called by NDEBUG builds.		// This function is never called by NDEBUG builds.
assert(ResultKind != NotFound || Decls.size() == 0);		assert(ResultKind != NotFound || Decls.size() == 0);
assert(ResultKind != Found || Decls.size() == 1);		assert(ResultKind != Found || Decls.size() == 1);
assert(ResultKind != FoundOverloaded || Decls.size() > 1 ||		assert(ResultKind != FoundOverloaded || Decls.size() > 1 ||
(Decls.size() == 1 &&		(Decls.size() == 1 &&
		(*begin())->getUnderlyingDecl()->getAttr<OMPDeclareVariantAttr>()) ||
		(Decls.size() == 1 &&
isa<FunctionTemplateDecl>((*begin())->getUnderlyingDecl())));		isa<FunctionTemplateDecl>((*begin())->getUnderlyingDecl())));
assert(ResultKind != FoundUnresolvedValue || sanityCheckUnresolved());		assert(ResultKind != FoundUnresolvedValue || sanityCheckUnresolved());
assert(ResultKind != Ambiguous || Decls.size() > 1 ||		assert(ResultKind != Ambiguous || Decls.size() > 1 ||
(Decls.size() == 1 && (Ambiguity == AmbiguousBaseSubobjects ||		(Decls.size() == 1 && (Ambiguity == AmbiguousBaseSubobjects ||
Ambiguity == AmbiguousBaseSubobjectTypes)));		Ambiguity == AmbiguousBaseSubobjectTypes)));
assert((Paths != nullptr) == (ResultKind == Ambiguous &&		assert((Paths != nullptr) == (ResultKind == Ambiguous &&
(Ambiguity == AmbiguousBaseSubobjectTypes ||		(Ambiguity == AmbiguousBaseSubobjectTypes ||
Ambiguity == AmbiguousBaseSubobjects)));		Ambiguity == AmbiguousBaseSubobjects)));
▲ Show 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	void LookupResult::resolveKind() {
// If there's a single decl, we need to examine it to decide what		// If there's a single decl, we need to examine it to decide what
// kind of lookup this is.		// kind of lookup this is.
if (N == 1) {		if (N == 1) {
NamedDecl *D = (*Decls.begin())->getUnderlyingDecl();		NamedDecl *D = (*Decls.begin())->getUnderlyingDecl();
if (isa<FunctionTemplateDecl>(D))		if (isa<FunctionTemplateDecl>(D))
ResultKind = FoundOverloaded;		ResultKind = FoundOverloaded;
else if (isa<UnresolvedUsingValueDecl>(D))		else if (isa<UnresolvedUsingValueDecl>(D))
ResultKind = FoundUnresolvedValue;		ResultKind = FoundUnresolvedValue;
		else if (auto *OMPVariantAttr = D->getAttr<OMPDeclareVariantAttr>())
		if (OMPVariantAttr->getVariantFuncRef())
		ResultKind = FoundOverloaded;
return;		return;
}		}

// Don't do any extra resolution if we've already resolved as ambiguous.		// Don't do any extra resolution if we've already resolved as ambiguous.
if (ResultKind == Ambiguous) return;		if (ResultKind == Ambiguous) return;

llvm::SmallDenseMap<NamedDecl*, unsigned, 16> Unique;		llvm::SmallDenseMap<NamedDecl*, unsigned, 16> Unique;
llvm::SmallDenseMap<QualType, unsigned, 16> UniqueTypes;		llvm::SmallDenseMap<QualType, unsigned, 16> UniqueTypes;
▲ Show 20 Lines • Show All 5,019 Lines • Show Last 20 Lines

clang/lib/Sema/SemaOpenMP.cpp

Show First 20 Lines • Show All 5,454 Lines • ▼ Show 20 Lines	auto *NewAttr = OMPDeclareVariantAttr::CreateImplicit(
Context, VariantRef, CtxScores.begin(), CtxScores.size(),		Context, VariantRef, CtxScores.begin(), CtxScores.size(),
CtxSets.begin(), CtxSets.size(), Ctxs.begin(), Ctxs.size(),		CtxSets.begin(), CtxSets.size(), Ctxs.begin(), Ctxs.size(),
ImplVendors.begin(), ImplVendors.size(), DeviceKinds.begin(),		ImplVendors.begin(), ImplVendors.size(), DeviceKinds.begin(),
DeviceKinds.size(), SR);		DeviceKinds.size(), SR);
FD->addAttr(NewAttr);		FD->addAttr(NewAttr);
}		}
}		}

void Sema::markOpenMPDeclareVariantFuncsReferenced(SourceLocation Loc,
FunctionDecl *Func,
bool MightBeOdrUse) {
assert(LangOpts.OpenMP && "Expected OpenMP mode.");

if (!Func->isDependentContext() && Func->hasAttrs()) {
for (OMPDeclareVariantAttr *A :
Func->specific_attrs<OMPDeclareVariantAttr>()) {
// TODO: add checks for active OpenMP context where possible.
Expr *VariantRef = A->getVariantFuncRef();
auto *DRE = cast<DeclRefExpr>(VariantRef->IgnoreParenImpCasts());
auto *F = cast<FunctionDecl>(DRE->getDecl());
if (!F->isDefined() && F->isTemplateInstantiation())
InstantiateFunctionDefinition(Loc, F->getFirstDecl());
MarkFunctionReferenced(Loc, F, MightBeOdrUse);
}
}
}

StmtResult Sema::ActOnOpenMPParallelDirective(ArrayRef<OMPClause *> Clauses,		StmtResult Sema::ActOnOpenMPParallelDirective(ArrayRef<OMPClause *> Clauses,
Stmt *AStmt,		Stmt *AStmt,
SourceLocation StartLoc,		SourceLocation StartLoc,
SourceLocation EndLoc) {		SourceLocation EndLoc) {
if (!AStmt)		if (!AStmt)
return StmtError();		return StmtError();

auto *CS = cast<CapturedStmt>(AStmt);		auto *CS = cast<CapturedStmt>(AStmt);
▲ Show 20 Lines • Show All 11,554 Lines • Show Last 20 Lines

clang/lib/Sema/SemaOverload.cpp

//===--- SemaOverload.cpp - C++ Overloading -------------------------------===//		//===--- SemaOverload.cpp - C++ Overloading -------------------------------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file provides Sema routines for C++ overloading.		// This file provides Sema routines for C++ overloading.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/Sema/Overload.h"		#include "clang/Sema/Overload.h"

#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
#include "clang/AST/CXXInheritance.h"		#include "clang/AST/CXXInheritance.h"
#include "clang/AST/DeclObjC.h"		#include "clang/AST/DeclObjC.h"
#include "clang/AST/Expr.h"		#include "clang/AST/Expr.h"
#include "clang/AST/ExprCXX.h"		#include "clang/AST/ExprCXX.h"
#include "clang/AST/ExprObjC.h"		#include "clang/AST/ExprObjC.h"
		#include "clang/AST/StmtOpenMP.h"
#include "clang/AST/TypeOrdering.h"		#include "clang/AST/TypeOrdering.h"
#include "clang/Basic/Diagnostic.h"		#include "clang/Basic/Diagnostic.h"
#include "clang/Basic/DiagnosticOptions.h"		#include "clang/Basic/DiagnosticOptions.h"
#include "clang/Basic/PartialDiagnostic.h"		#include "clang/Basic/PartialDiagnostic.h"
#include "clang/Basic/TargetInfo.h"		#include "clang/Basic/TargetInfo.h"
#include "clang/Sema/Initialization.h"		#include "clang/Sema/Initialization.h"
#include "clang/Sema/Lookup.h"		#include "clang/Sema/Lookup.h"
#include "clang/Sema/SemaInternal.h"		#include "clang/Sema/SemaInternal.h"
▲ Show 20 Lines • Show All 9,675 Lines • ▼ Show 20 Lines	OverloadCandidateSet::BestViableFunction(Sema &S, SourceLocation Loc,
// Best is the best viable function.		// Best is the best viable function.
if (Best->Function && Best->Function->isDeleted())		if (Best->Function && Best->Function->isDeleted())
return OR_Deleted;		return OR_Deleted;

if (!EquivalentCands.empty())		if (!EquivalentCands.empty())
S.diagnoseEquivalentInternalLinkageDeclarations(Loc, Best->Function,		S.diagnoseEquivalentInternalLinkageDeclarations(Loc, Best->Function,
EquivalentCands);		EquivalentCands);

		FunctionDecl *FD = Best->Function;
		if (!FD || !FD->hasAttrs() || !FD->hasAttr<OMPDeclareVariantAttr>())
		return OR_Success;

		// Iterate through all DeclareVariant attributes and check context selectors.
		const OMPDeclareVariantAttr *BestVariant = nullptr;
		for (const auto *A : FD->specific_attrs<OMPDeclareVariantAttr>())
		BestVariant =
		getBetterOpenMPContextMatch(S.getASTContext(), BestVariant, A);
		if (!BestVariant || !BestVariant->getVariantFuncRef())
		return OR_Success;

		// TODO: Handle template instantiation
		ABataev: Implement all the TODOs and compare the result with the size of the code where you just need to iterate through all the variants and call the existing functions to emit their aliases.
		jdoerfert (author): We do not emit aliases at all with this approach. Emitting aliases does not work for the generic case, e.g., the construct selector trait.
		ABataev: Not directly. I know that it won't work for construct; for construct we'll need a slightly different approach, but it is not very hard to implement.
		Best->Function = cast<FunctionDecl>(
		cast<DeclRefExpr>(BestVariant->getVariantFuncRef()->IgnoreParenImpCasts())
		->getDecl());
return OR_Success;		return OR_Success;
}		}
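
The hunk above folds over every `OMPDeclareVariantAttr` on the winning candidate and then swaps `Best->Function` for the best-matching variant. A rough, compiler-independent sketch of that fold follows. It is not Clang code: `Variant`, `pickBetter`, and `resolveCall` are hypothetical names, and the rule used here (a matching variant with a strictly higher score wins) is only an assumption standing in for whatever `getBetterOpenMPContextMatch` actually does.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for one `declare variant` attribute: the variant's
// name, whether its context selector matches, and its score.
struct Variant {
  std::string Name;
  bool Matches;
  int Score;
};

// Hypothetical analogue of getBetterOpenMPContextMatch (assumed semantics):
// keep the current best unless the candidate matches and scores strictly
// higher, or there is no best yet.
const Variant *pickBetter(const Variant *Best, const Variant *Cand) {
  if (!Cand->Matches)
    return Best;
  if (!Best || Cand->Score > Best->Score)
    return Cand;
  return Best;
}

// Resolve a call to Base: fold over its variants and redirect to the best
// match, or fall back to Base itself when no variant matches.
std::string resolveCall(const std::string &Base,
                        const std::vector<Variant> &Variants) {
  const Variant *Best = nullptr;
  for (const Variant &V : Variants)
    Best = pickBetter(Best, &V);
  return Best ? Best->Name : Base;
}
```

In this model the base function is never rewritten or aliased; only the callee chosen at the call site changes, which is the distinction the inline discussion above turns on.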

namespace {		namespace {

enum OverloadCandidateKind {		enum OverloadCandidateKind {
oc_function,		oc_function,
oc_method,		oc_method,
▲ Show 20 Lines • Show All 4,787 Lines • Show Last 20 Lines

clang/lib/Sema/SemaTemplateInstantiateDecl.cpp

Show First 20 Lines • Show All 342 Lines • ▼ Show 20 Lines	static void instantiateOMPDeclareSimdDeclAttr(
}		}
LinModifiers.append(Attr.modifiers_begin(), Attr.modifiers_end());		LinModifiers.append(Attr.modifiers_begin(), Attr.modifiers_end());
(void)S.ActOnOpenMPDeclareSimdDirective(		(void)S.ActOnOpenMPDeclareSimdDirective(
S.ConvertDeclToDeclGroup(New), Attr.getBranchState(), Simdlen.get(),		S.ConvertDeclToDeclGroup(New), Attr.getBranchState(), Simdlen.get(),
Uniforms, Aligneds, Alignments, Linears, LinModifiers, Steps,		Uniforms, Aligneds, Alignments, Linears, LinModifiers, Steps,
Attr.getRange());		Attr.getRange());
}		}

/// Instantiation of 'declare variant' attribute and its arguments.
static void instantiateOMPDeclareVariantAttr(
Sema &S, const MultiLevelTemplateArgumentList &TemplateArgs,
const OMPDeclareVariantAttr &Attr, Decl *New) {
// Allow 'this' in clauses with varlists.
if (auto *FTD = dyn_cast<FunctionTemplateDecl>(New))
New = FTD->getTemplatedDecl();
auto *FD = cast<FunctionDecl>(New);
auto *ThisContext = dyn_cast_or_null<CXXRecordDecl>(FD->getDeclContext());

auto &&SubstExpr = [FD, ThisContext, &S, &TemplateArgs](Expr *E) {
if (auto *DRE = dyn_cast<DeclRefExpr>(E->IgnoreParenImpCasts()))
if (auto *PVD = dyn_cast<ParmVarDecl>(DRE->getDecl())) {
Sema::ContextRAII SavedContext(S, FD);
LocalInstantiationScope Local(S);
if (FD->getNumParams() > PVD->getFunctionScopeIndex())
Local.InstantiatedLocal(
PVD, FD->getParamDecl(PVD->getFunctionScopeIndex()));
return S.SubstExpr(E, TemplateArgs);
}
Sema::CXXThisScopeRAII ThisScope(S, ThisContext, Qualifiers(),
FD->isCXXInstanceMember());
return S.SubstExpr(E, TemplateArgs);
};

// Substitute a single OpenMP clause, which is a potentially-evaluated
// full-expression.
auto &&Subst = [&SubstExpr, &S](Expr *E) {
EnterExpressionEvaluationContext Evaluated(
S, Sema::ExpressionEvaluationContext::PotentiallyEvaluated);
ExprResult Res = SubstExpr(E);
if (Res.isInvalid())
return Res;
return S.ActOnFinishFullExpr(Res.get(), false);
};

ExprResult VariantFuncRef;
if (Expr *E = Attr.getVariantFuncRef())
VariantFuncRef = Subst(E);

// Check function/variant ref.
Optional<std::pair<FunctionDecl *, Expr *>> DeclVarData =
S.checkOpenMPDeclareVariantFunction(
S.ConvertDeclToDeclGroup(New), VariantFuncRef.get(), Attr.getRange());
if (!DeclVarData)
return;
SmallVector<Sema::OMPCtxSelectorData, 4> Data;
for (unsigned I = 0, E = Attr.scores_size(); I < E; ++I) {
ExprResult Score;
if (Expr *E = *std::next(Attr.scores_begin(), I))
Score = Subst(E);
// Instantiate the attribute.
auto CtxSet = static_cast<OpenMPContextSelectorSetKind>(
*std::next(Attr.ctxSelectorSets_begin(), I));
auto Ctx = static_cast<OpenMPContextSelectorKind>(
*std::next(Attr.ctxSelectors_begin(), I));
switch (CtxSet) {
case OMP_CTX_SET_implementation:
switch (Ctx) {
case OMP_CTX_vendor:
Data.emplace_back(CtxSet, Ctx, Score, Attr.implVendors());
break;
case OMP_CTX_kind:
case OMP_CTX_unknown:
llvm_unreachable("Unexpected context selector kind.");
}
break;
case OMP_CTX_SET_device:
switch (Ctx) {
case OMP_CTX_kind:
Data.emplace_back(CtxSet, Ctx, Score, Attr.deviceKinds());
break;
case OMP_CTX_vendor:
case OMP_CTX_unknown:
llvm_unreachable("Unexpected context selector kind.");
}
break;
case OMP_CTX_SET_unknown:
llvm_unreachable("Unexpected context selector set kind.");
}
}
S.ActOnOpenMPDeclareVariantDirective(DeclVarData.getValue().first,
DeclVarData.getValue().second,
Attr.getRange(), Data);
}

static void instantiateDependentAMDGPUFlatWorkGroupSizeAttr(		static void instantiateDependentAMDGPUFlatWorkGroupSizeAttr(
Sema &S, const MultiLevelTemplateArgumentList &TemplateArgs,		Sema &S, const MultiLevelTemplateArgumentList &TemplateArgs,
const AMDGPUFlatWorkGroupSizeAttr &Attr, Decl *New) {		const AMDGPUFlatWorkGroupSizeAttr &Attr, Decl *New) {
// Both min and max expression are constant expressions.		// Both min and max expression are constant expressions.
EnterExpressionEvaluationContext Unevaluated(		EnterExpressionEvaluationContext Unevaluated(
S, Sema::ExpressionEvaluationContext::ConstantEvaluated);		S, Sema::ExpressionEvaluationContext::ConstantEvaluated);

ExprResult Result = S.SubstExpr(Attr.getMin(), TemplateArgs);		ExprResult Result = S.SubstExpr(Attr.getMin(), TemplateArgs);
▲ Show 20 Lines • Show All 141 Lines • ▼ Show 20 Lines	if (const auto *Mode = dyn_cast<ModeAttr>(TmplAttr)) {
continue;		continue;
}		}

if (const auto *OMPAttr = dyn_cast<OMPDeclareSimdDeclAttr>(TmplAttr)) {		if (const auto *OMPAttr = dyn_cast<OMPDeclareSimdDeclAttr>(TmplAttr)) {
instantiateOMPDeclareSimdDeclAttr(*this, TemplateArgs, *OMPAttr, New);		instantiateOMPDeclareSimdDeclAttr(*this, TemplateArgs, *OMPAttr, New);
continue;		continue;
}		}

if (const auto *OMPAttr = dyn_cast<OMPDeclareVariantAttr>(TmplAttr)) {
instantiateOMPDeclareVariantAttr(*this, TemplateArgs, *OMPAttr, New);
continue;
}

if (const auto *AMDGPUFlatWorkGroupSize =		if (const auto *AMDGPUFlatWorkGroupSize =
dyn_cast<AMDGPUFlatWorkGroupSizeAttr>(TmplAttr)) {		dyn_cast<AMDGPUFlatWorkGroupSizeAttr>(TmplAttr)) {
instantiateDependentAMDGPUFlatWorkGroupSizeAttr(		instantiateDependentAMDGPUFlatWorkGroupSizeAttr(
*this, TemplateArgs, *AMDGPUFlatWorkGroupSize, New);		*this, TemplateArgs, *AMDGPUFlatWorkGroupSize, New);
}		}

if (const auto *AMDGPUFlatWorkGroupSize =		if (const auto *AMDGPUFlatWorkGroupSize =
dyn_cast<AMDGPUWavesPerEUAttr>(TmplAttr)) {		dyn_cast<AMDGPUWavesPerEUAttr>(TmplAttr)) {
▲ Show 20 Lines • Show All 5,128 Lines • Show Last 20 Lines

clang/test/OpenMP/declare_variant_ast_print.cpp

	Show All 34 Lines
	#pragma omp declare variant(foofoo <T>) match(xxx = {})			#pragma omp declare variant(foofoo <T>) match(xxx = {})
	#pragma omp declare variant(foofoo <T>) match(xxx = {vvv})			#pragma omp declare variant(foofoo <T>) match(xxx = {vvv})
	#pragma omp declare variant(foofoo <T>) match(user = {score(<expr>) : condition(<expr>)})			#pragma omp declare variant(foofoo <T>) match(user = {score(<expr>) : condition(<expr>)})
	#pragma omp declare variant(foofoo <T>) match(user = {score(<expr>) : condition(<expr>)})			#pragma omp declare variant(foofoo <T>) match(user = {score(<expr>) : condition(<expr>)})
	#pragma omp declare variant(foofoo <T>) match(user = {condition(<expr>)})			#pragma omp declare variant(foofoo <T>) match(user = {condition(<expr>)})
	#pragma omp declare variant(foofoo <T>) match(user = {condition(<expr>)})			#pragma omp declare variant(foofoo <T>) match(user = {condition(<expr>)})
	#pragma omp declare variant(foofoo <T>) match(implementation={vendor(llvm)},device={kind(cpu)})			#pragma omp declare variant(foofoo <T>) match(implementation={vendor(llvm)},device={kind(cpu)})
	#pragma omp declare variant(foofoo <T>) match(implementation={vendor(unknown)})			#pragma omp declare variant(foofoo <T>) match(implementation={vendor(unknown)})
				// TODO: Handle template instantiation
	#pragma omp declare variant(foofoo <T>) match(implementation={vendor(score(C+5): ibm, xxx, ibm)},device={kind(cpu,host)})			#pragma omp declare variant(foofoo <T>) match(implementation={vendor(score(C+5): ibm, xxx, ibm)},device={kind(cpu,host)})
	template <typename T, int C>			template <typename T, int C>
	T barbar();			T barbar();

	// CHECK: #pragma omp declare variant(foofoo<int>) match(implementation={vendor(score(3 + 5):ibm, xxx)},device={kind(cpu, host)})			// CHECK: #pragma omp declare variant(foofoo<int>) match(implementation={vendor(score(3 + 5):ibm, xxx)},device={kind(cpu, host)})
	// CHECK-NEXT: #pragma omp declare variant(foofoo<int>) match(implementation={vendor(score(0):unknown)})			// CHECK-NEXT: #pragma omp declare variant(foofoo<int>) match(implementation={vendor(score(0):unknown)})
	// CHECK-NEXT: #pragma omp declare variant(foofoo<int>) match(implementation={vendor(score(0):llvm)},device={kind(cpu)})			// CHECK-NEXT: #pragma omp declare variant(foofoo<int>) match(implementation={vendor(score(0):llvm)},device={kind(cpu)})
	// CHECK-NEXT: template<> int barbar<int, 3>();			// CHECK-NEXT: template<> int barbar<int, 3>();

	// CHECK-NEXT: int baz() {			// CHECK-NEXT: int baz() {
	// CHECK-NEXT: return barbar<int, 3>();			// CHECK-NEXT: return foofoo<int, 3>();
	// CHECK-NEXT: }			// CHECK-NEXT: }
	int baz() {			int baz() {
	return barbar<int, 3>();			return barbar<int, 3>();
	}			}

	// CHECK: template <class C> void h_ref(C hp, C hp2, C hq, C lin) {			// CHECK: template <class C> void h_ref(C hp, C hp2, C hq, C lin) {
	// CHECK-NEXT: }			// CHECK-NEXT: }
	// CHECK-NEXT: template<> void h_ref<double>(double hp, double hp2, double hq, double lin) {			// CHECK-NEXT: template<> void h_ref<double>(double hp, double hp2, double hq, double lin) {
	▲ Show 20 Lines • Show All 149 Lines • Show Last 20 Lines

clang/test/OpenMP/declare_variant_device_kind_codegen.cpp

	Show All 35 Lines
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -fopenmp-targets=ppc64le-unknown-linux -emit-llvm-bc %s -o %t-host.bc -DNOHOST			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -fopenmp-targets=ppc64le-unknown-linux -emit-llvm-bc %s -o %t-host.bc -DNOHOST
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - -DNOHOST | FileCheck %s			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -o - -DNOHOST | FileCheck %s
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -emit-pch -o %t -DNOHOST			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -emit-pch -o %t -DNOHOST
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -include-pch %t -o - -DNOHOST | FileCheck %s			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple ppc64le-unknown-linux -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-host.bc -include-pch %t -o - -DNOHOST | FileCheck %s

	// expected-no-diagnostics			// expected-no-diagnostics

	// CHECK-NOT: ret i32 {{1|4|81|84}}			// CHECK-NOT: ret i32 {{1|4|81|84}}
	// CHECK-DAG: @_Z3barv = {{.*}}alias i32 (), i32 ()* @_Z3foov
	// CHECK-DAG: @_ZN16SpecSpecialFuncs6MethodEv = {{.*}}alias i32 (%struct.SpecSpecialFuncs*), i32 (%struct.SpecSpecialFuncs*)* @_ZN16SpecSpecialFuncs7method_Ev
	// CHECK-DAG: @_ZN16SpecSpecialFuncs6methodEv = linkonce_odr {{.*}}alias i32 (%struct.SpecSpecialFuncs*), i32 (%struct.SpecSpecialFuncs*)* @_ZN16SpecSpecialFuncs7method_Ev
	// CHECK-DAG: @_ZN12SpecialFuncs6methodEv = linkonce_odr {{.*}}alias i32 (%struct.SpecialFuncs*), i32 (%struct.SpecialFuncs*)* @_ZN12SpecialFuncs7method_Ev
	// CHECK-DAG: @_Z5prio_v = {{.*}}alias i32 (), i32 ()* @_Z5prio1v
	// CHECK-DAG: @_ZL6prio1_v = internal alias i32 (), i32 ()* @_ZL5prio2v
	// CHECK-DAG: @_Z4callv = {{.*}}alias i32 (), i32 ()* @_Z4testv
	// CHECK-DAG: @_ZL9stat_usedv = internal alias i32 (), i32 ()* @_ZL10stat_used_v
	// CHECK-DAG: @_ZN12SpecialFuncs6MethodEv = {{.*}}alias i32 (%struct.SpecialFuncs*), i32 (%struct.SpecialFuncs*)* @_ZN12SpecialFuncs7method_Ev
	// CHECK-DAG: @fn_linkage = {{.*}}alias i32 (), i32 ()* @_Z18fn_linkage_variantv
	// CHECK-DAG: @_Z11fn_linkage1v = {{.*}}alias i32 (), i32 ()* @fn_linkage_variant1
	// CHECK-DAG: declare {{.*}}i32 @_Z5bazzzv()			// CHECK-DAG: declare {{.*}}i32 @_Z5bazzzv()
	// CHECK-DAG: declare {{.*}}i32 @_Z3bazv()			// CHECK-DAG: define {{.*}}i32 @_Z3bazv()
	// CHECK-DAG: ret i32 2			// CHECK-DAG: ret i32 2
	// CHECK-DAG: ret i32 3			// CHECK-DAG: ret i32 3
	// CHECK-DAG: ret i32 5			// CHECK-DAG: ret i32 5
	// CHECK-DAG: ret i32 6			// CHECK-DAG: ret i32 6
	// CHECK-DAG: ret i32 7			// CHECK-DAG: ret i32 7
	// CHECK-DAG: ret i32 82			// CHECK-DAG: ret i32 82
	// CHECK-DAG: ret i32 83			// CHECK-DAG: ret i32 83
	// CHECK-DAG: ret i32 85			// CHECK-DAG: ret i32 85
	// CHECK-DAG: ret i32 86			// CHECK-DAG: ret i32 86
	// CHECK-DAG: ret i32 87			// CHECK-DAG: ret i32 87
	// CHECK-NOT: ret i32 {{1|4|81|84}}			// CHECK-NOT: ret i32 {{4|81|84}}

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	#pragma omp declare target			#pragma omp declare target
	#ifdef HOST			#ifdef HOST
	#define CORRECT host			#define CORRECT host
	#define SUBSET host, cpu			#define SUBSET host, cpu
	▲ Show 20 Lines • Show All 112 Lines • Show Last 20 Lines

clang/test/OpenMP/declare_variant_implementation_vendor_codegen.cpp

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple %itanium_abi_triple -emit-llvm %s -fexceptions -fcxx-exceptions -o - -fsanitize-address-use-after-scope | FileCheck %s			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple %itanium_abi_triple -emit-llvm %s -fexceptions -fcxx-exceptions -o - -fsanitize-address-use-after-scope | FileCheck %s
	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple %itanium_abi_triple -fexceptions -fcxx-exceptions -emit-pch -o %t -fopenmp-version=50 %s			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple %itanium_abi_triple -fexceptions -fcxx-exceptions -emit-pch -o %t -fopenmp-version=50 %s
	// RUN: %clang_cc1 -fopenmp -x c++ -triple %itanium_abi_triple -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - -fopenmp-version=50 | FileCheck %s			// RUN: %clang_cc1 -fopenmp -x c++ -triple %itanium_abi_triple -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - -fopenmp-version=50 | FileCheck %s
	// expected-no-diagnostics			// expected-no-diagnostics

	// CHECK-NOT: ret i32 {{1|4|81|84}}			// CHECK-NOT: ret i32 {{1|4|81|84}}
	// CHECK-DAG: @_Z3barv = {{.*}}alias i32 (), i32 ()* @_Z3foov
	// CHECK-DAG: @_ZN16SpecSpecialFuncs6MethodEv = {{.*}}alias i32 (%struct.SpecSpecialFuncs*), i32 (%struct.SpecSpecialFuncs*)* @_ZN16SpecSpecialFuncs7method_Ev
	// CHECK-DAG: @_ZN16SpecSpecialFuncs6methodEv = linkonce_odr {{.*}}alias i32 (%struct.SpecSpecialFuncs*), i32 (%struct.SpecSpecialFuncs*)* @_ZN16SpecSpecialFuncs7method_Ev
	// CHECK-DAG: @_ZN12SpecialFuncs6methodEv = linkonce_odr {{.*}}alias i32 (%struct.SpecialFuncs*), i32 (%struct.SpecialFuncs*)* @_ZN12SpecialFuncs7method_Ev
	// CHECK-DAG: @_Z5prio_v = {{.*}}alias i32 (), i32 ()* @_Z5prio1v
	// CHECK-DAG: @_ZL6prio1_v = internal alias i32 (), i32 ()* @_ZL5prio2v
	// CHECK-DAG: @_Z4callv = {{.*}}alias i32 (), i32 ()* @_Z4testv
	// CHECK-DAG: @_ZL9stat_usedv = internal alias i32 (), i32 ()* @_ZL10stat_used_v
	// CHECK-DAG: @_ZN12SpecialFuncs6MethodEv = {{.*}}alias i32 (%struct.SpecialFuncs*), i32 (%struct.SpecialFuncs*)* @_ZN12SpecialFuncs7method_Ev
	// CHECK-DAG: @fn_linkage = {{.*}}alias i32 (), i32 ()* @_Z18fn_linkage_variantv
	// CHECK-DAG: @_Z11fn_linkage1v = {{.*}}alias i32 (), i32 ()* @fn_linkage_variant1
	// CHECK-DAG: declare {{.*}}i32 @_Z5bazzzv()			// CHECK-DAG: declare {{.*}}i32 @_Z5bazzzv()
	// CHECK-DAG: declare {{.*}}i32 @_Z3bazv()			// CHECK-DAG: define {{.*}}i32 @_Z3bazv()
	// CHECK-DAG: ret i32 2			// CHECK-DAG: ret i32 2
	// CHECK-DAG: ret i32 3			// CHECK-DAG: ret i32 3
	// CHECK-DAG: ret i32 5			// CHECK-DAG: ret i32 5
	// CHECK-DAG: ret i32 6			// CHECK-DAG: ret i32 6
	// CHECK-DAG: ret i32 7			// CHECK-DAG: ret i32 7
	// CHECK-DAG: ret i32 82			// CHECK-DAG: ret i32 82
	// CHECK-DAG: ret i32 83			// CHECK-DAG: ret i32 83
	// CHECK-DAG: ret i32 85			// CHECK-DAG: ret i32 85
	// CHECK-DAG: ret i32 86			// CHECK-DAG: ret i32 86
	// CHECK-DAG: ret i32 87			// CHECK-DAG: ret i32 87
	// CHECK-NOT: ret i32 {{1|4|81|84}}			// CHECK-NOT: ret i32 {{4|81|84}}

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	int foo() { return 2; }			int foo() { return 2; }

	#pragma omp declare variant(foo) match(implementation = {vendor(llvm)})			#pragma omp declare variant(foo) match(implementation = {vendor(llvm)})
	int bar() { return 1; }			int bar() { return 1; }
	▲ Show 20 Lines • Show All 94 Lines • Show Last 20 Lines

clang/test/OpenMP/declare_variant_mixed_codegen.cpp

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux -emit-llvm %s -fexceptions -fcxx-exceptions -o - -fsanitize-address-use-after-scope | FileCheck %s			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-linux -emit-llvm %s -fexceptions -fcxx-exceptions -o - -fsanitize-address-use-after-scope | FileCheck %s
	// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -emit-pch -o %t -fopenmp-version=50 %s			// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -emit-pch -o %t -fopenmp-version=50 %s
	// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - -fopenmp-version=50 | FileCheck %s			// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-linux -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - -fopenmp-version=50 | FileCheck %s
	// expected-no-diagnostics			// expected-no-diagnostics

	// CHECK-NOT: ret i32 {{1|4|81|84}}			// CHECK-NOT: ret i32 {{1|4|81|84}}
	// CHECK-DAG: @_Z3barv = {{.*}}alias i32 (), i32 ()* @_Z3foov
	// CHECK-DAG: @_ZN16SpecSpecialFuncs6MethodEv = {{.*}}alias i32 (%struct.SpecSpecialFuncs*), i32 (%struct.SpecSpecialFuncs*)* @_ZN16SpecSpecialFuncs7method_Ev
	// CHECK-DAG: @_ZN16SpecSpecialFuncs6methodEv = linkonce_odr {{.*}}alias i32 (%struct.SpecSpecialFuncs*), i32 (%struct.SpecSpecialFuncs*)* @_ZN16SpecSpecialFuncs7method_Ev
	// CHECK-DAG: @_ZN12SpecialFuncs6methodEv = linkonce_odr {{.*}}alias i32 (%struct.SpecialFuncs*), i32 (%struct.SpecialFuncs*)* @_ZN12SpecialFuncs7method_Ev
	// CHECK-DAG: @_Z5prio_v = {{.*}}alias i32 (), i32 ()* @_Z5prio1v
	// CHECK-DAG: @_ZL6prio1_v = internal alias i32 (), i32 ()* @_ZL5prio2v
	// CHECK-DAG: @_Z4callv = {{.*}}alias i32 (), i32 ()* @_Z4testv
	// CHECK-DAG: @_ZL9stat_usedv = internal alias i32 (), i32 ()* @_ZL10stat_used_v
	// CHECK-DAG: @_ZN12SpecialFuncs6MethodEv = {{.*}}alias i32 (%struct.SpecialFuncs*), i32 (%struct.SpecialFuncs*)* @_ZN12SpecialFuncs7method_Ev
	// CHECK-DAG: @fn_linkage = {{.*}}alias i32 (), i32 ()* @_Z18fn_linkage_variantv
	// CHECK-DAG: @_Z11fn_linkage1v = {{.*}}alias i32 (), i32 ()* @fn_linkage_variant1
	// CHECK-DAG: declare {{.*}}i32 @_Z5bazzzv()			// CHECK-DAG: declare {{.*}}i32 @_Z5bazzzv()
	// CHECK-DAG: declare {{.*}}i32 @_Z3bazv()			// CHECK-DAG: define {{.*}}i32 @_Z3bazv()
	// CHECK-DAG: ret i32 2			// CHECK-DAG: ret i32 2
	// CHECK-DAG: ret i32 3			// CHECK-DAG: ret i32 3
	// CHECK-DAG: ret i32 5			// CHECK-DAG: ret i32 5
	// CHECK-DAG: ret i32 6			// CHECK-DAG: ret i32 6
	// CHECK-DAG: ret i32 7			// CHECK-DAG: ret i32 7
	// CHECK-DAG: ret i32 82			// CHECK-DAG: ret i32 82
	// CHECK-DAG: ret i32 83			// CHECK-DAG: ret i32 83
	// CHECK-DAG: ret i32 85			// CHECK-DAG: ret i32 85
	// CHECK-DAG: ret i32 86			// CHECK-DAG: ret i32 86
	// CHECK-DAG: ret i32 87			// CHECK-DAG: ret i32 87
	// CHECK-NOT: ret i32 {{1|4|81|84}}			// CHECK-NOT: ret i32 {{4|81|84}}

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	int foo() { return 2; }			int foo() { return 2; }

	#pragma omp declare variant(foo) match(implementation = {vendor(llvm)}, device={kind(cpu)})			#pragma omp declare variant(foo) match(implementation = {vendor(llvm)}, device={kind(cpu)})
	int bar() { return 1; }			int bar() { return 1; }
	▲ Show 20 Lines • Show All 99 Lines • Show Last 20 Lines

clang/test/OpenMP/nvptx_declare_variant_device_kind_codegen.cpp

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=50 -DGPU			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=50 -DGPU
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -fopenmp-version=50 -DGPU | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -fopenmp-version=50 -DGPU | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t -fopenmp-version=50 -DGPU			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t -fopenmp-version=50 -DGPU
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - -fopenmp-version=50 -DGPU | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - -fopenmp-version=50 -DGPU | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=50 -DNOHOST			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=50 -DNOHOST
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -fopenmp-version=50 -DNOHOST | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -fopenmp-version=50 -DNOHOST | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t -fopenmp-version=50 -DNOHOST			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t -fopenmp-version=50 -DNOHOST
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - -fopenmp-version=50 -DNOHOST | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - -fopenmp-version=50 -DNOHOST | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'
	// expected-no-diagnostics			// expected-no-diagnostics

	// CHECK-NOT: ret i32 {{1|81|84}}			// CHECK-NOT: ret i32 {{1|81|84}}
	// CHECK-DAG: define {{.*}}i32 @_Z3barv()			// CHECK-DAG: define {{.*}}i32 @_Z3barv()
	// CHECK-DAG: define {{.*}}i32 @_ZN16SpecSpecialFuncs6MethodEv(%struct.SpecSpecialFuncs* %{{.+}})			// CHECK-DAG: define {{.*}}i32 @_ZN16SpecSpecialFuncs6MethodEv(%struct.SpecSpecialFuncs* %{{.+}})
	// CHECK-DAG: define {{.*}}i32 @_ZN12SpecialFuncs6MethodEv(%struct.SpecialFuncs* %{{.+}})			// CHECK-DAG: define {{.*}}i32 @_ZN12SpecialFuncs6MethodEv(%struct.SpecialFuncs* %{{.+}})
	// CHECK-DAG: define linkonce_odr {{.*}}i32 @_ZN16SpecSpecialFuncs6methodEv(%struct.SpecSpecialFuncs* %{{.+}})
	// CHECK-DAG: define linkonce_odr {{.*}}i32 @_ZN12SpecialFuncs6methodEv(%struct.SpecialFuncs* %{{.+}})
	// CHECK-DAG: define {{.*}}i32 @_Z5prio_v()			// CHECK-DAG: define {{.*}}i32 @_Z5prio_v()
	// CHECK-DAG: define internal i32 @_ZL6prio1_v()
	// CHECK-DAG: define {{.*}}i32 @_Z4callv()			// CHECK-DAG: define {{.*}}i32 @_Z4callv()
	// CHECK-DAG: define internal i32 @_ZL9stat_usedv()
	// CHECK-DAG: define {{.*}}i32 @fn_linkage()			// CHECK-DAG: define {{.*}}i32 @fn_linkage()
	// CHECK-DAG: define {{.*}}i32 @_Z11fn_linkage1v()			// CHECK-DAG: define {{.*}}i32 @_Z11fn_linkage1v()

	// CHECK-DAG: ret i32 2			// CHECK-DAG: ret i32 2
	// CHECK-DAG: ret i32 3			// CHECK-DAG: ret i32 3
	// CHECK-DAG: ret i32 4
	// CHECK-DAG: ret i32 5
	// CHECK-DAG: ret i32 6			// CHECK-DAG: ret i32 6
	// CHECK-DAG: ret i32 7			// CHECK-DAG: ret i32 7
	// CHECK-DAG: ret i32 82
	// CHECK-DAG: ret i32 83			// CHECK-DAG: ret i32 83
	// CHECK-DAG: ret i32 85
	// CHECK-DAG: ret i32 86
	// CHECK-DAG: ret i32 87			// CHECK-DAG: ret i32 87

	// Outputs for function members			// Outputs for function members
	// CHECK-DAG: ret i32 6			// CHECK-NOT: ret i32 {{81|84}}
	// CHECK-DAG: ret i32 7
	// CHECK-NOT: ret i32 {{1|81|84}}

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	#ifdef GPU			#ifdef GPU
	#define CORRECT gpu			#define CORRECT gpu
	#define SUBSET nohost, gpu			#define SUBSET nohost, gpu
	#define WRONG cpu, gpu			#define WRONG cpu, gpu

clang/test/OpenMP/nvptx_declare_variant_implementation_vendor_codegen.cpp

	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=50			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple powerpc64le-unknown-unknown -fopenmp-targets=nvptx64-nvidia-cuda -emit-llvm-bc %s -o %t-ppc-host.bc -fopenmp-version=50
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -fopenmp-version=50 | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -o - -fopenmp-version=50 | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t -fopenmp-version=50			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -emit-pch -o %t -fopenmp-version=50
	// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - -fopenmp-version=50 | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'			// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple nvptx64-unknown-unknown -aux-triple powerpc64le-unknown-unknown -emit-llvm %s -fopenmp-is-device -fopenmp-host-ir-file-path %t-ppc-host.bc -include-pch %t -o - -fopenmp-version=50 | FileCheck %s --implicit-check-not='ret i32 {{1|81|84}}'
	// expected-no-diagnostics			// expected-no-diagnostics

	// CHECK-NOT: ret i32 {{1|81|84}}			// CHECK-NOT: ret i32 {{1|81|84}}
	// CHECK-DAG: define {{.*}}i32 @_Z3barv()			// CHECK-DAG: define {{.*}}i32 @_Z3barv()
	// CHECK-DAG: define {{.*}}i32 @_ZN16SpecSpecialFuncs6MethodEv(%struct.SpecSpecialFuncs* %{{.+}})			// CHECK-DAG: define {{.*}}i32 @_ZN16SpecSpecialFuncs6MethodEv(%struct.SpecSpecialFuncs* %{{.+}})
	// CHECK-DAG: define {{.*}}i32 @_ZN12SpecialFuncs6MethodEv(%struct.SpecialFuncs* %{{.+}})			// CHECK-DAG: define {{.*}}i32 @_ZN12SpecialFuncs6MethodEv(%struct.SpecialFuncs* %{{.+}})
	// CHECK-DAG: define linkonce_odr {{.*}}i32 @_ZN16SpecSpecialFuncs6methodEv(%struct.SpecSpecialFuncs* %{{.+}})
	// CHECK-DAG: define linkonce_odr {{.*}}i32 @_ZN12SpecialFuncs6methodEv(%struct.SpecialFuncs* %{{.+}})
	// CHECK-DAG: define {{.*}}i32 @_Z5prio_v()			// CHECK-DAG: define {{.*}}i32 @_Z5prio_v()
	// CHECK-DAG: define internal i32 @_ZL6prio1_v()
	// CHECK-DAG: define {{.*}}i32 @_Z4callv()			// CHECK-DAG: define {{.*}}i32 @_Z4callv()
	// CHECK-DAG: define internal i32 @_ZL9stat_usedv()
	// CHECK-DAG: define {{.*}}i32 @fn_linkage()			// CHECK-DAG: define {{.*}}i32 @fn_linkage()
	// CHECK-DAG: define {{.*}}i32 @_Z11fn_linkage1v()			// CHECK-DAG: define {{.*}}i32 @_Z11fn_linkage1v()

	// CHECK-DAG: ret i32 2			// CHECK-DAG: ret i32 2
	// CHECK-DAG: ret i32 3			// CHECK-DAG: ret i32 3
	// CHECK-DAG: ret i32 4
	// CHECK-DAG: ret i32 5
	// CHECK-DAG: ret i32 6			// CHECK-DAG: ret i32 6
	// CHECK-DAG: ret i32 7			// CHECK-DAG: ret i32 7
	// CHECK-DAG: ret i32 82
	// CHECK-DAG: ret i32 83			// CHECK-DAG: ret i32 83
	// CHECK-DAG: ret i32 85
	// CHECK-DAG: ret i32 86
	// CHECK-DAG: ret i32 87			// CHECK-DAG: ret i32 87

	// Outputs for function members			// Outputs for function members
	// CHECK-DAG: ret i32 6			// CHECK-NOT: ret i32 {{81|84}}
	// CHECK-DAG: ret i32 7
	// CHECK-NOT: ret i32 {{1|81|84}}

	#ifndef HEADER			#ifndef HEADER
	#define HEADER			#define HEADER

	int foo() { return 2; }			int foo() { return 2; }
	int bazzz();			int bazzz();
	int test();			int test();
	static int stat_unused_();			static int stat_unused_();
