Page MenuHomePhabricator

[OpenMP][WIP] Use overload centric declare variants
AbandonedPublic

Authored by jdoerfert on Dec 10 2019, 1:05 AM.

Details

Summary

Instead of going through a custom overload resolution twice, we can use
the existing one.

TODO:

The tests need updating, they checked for functions that shouldn't
  have been emitted and the way they check is hard to update.
There is a TODO in the code we need to fix (see below).

Diff Detail

Unit TestsFailed

TimeTest
1,150 msClang.OpenMP::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/lib/clang/10.0.0/include -nostdsysteminc -verify -fopenmp -x c++ -std=c++14 -fexceptions -fcxx-exceptions /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/OpenMP/declare_variant_ast_print.cpp -ast-print -o - -Wno-source-uses-openmp -Wno-openmp-clauses | /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/build/bin/FileCheck /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/clang/test/OpenMP/declare_variant_ast_print.cpp

Event Timeline

jdoerfert created this revision.Dec 10 2019, 1:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2019, 1:05 AM
rampitec removed a subscriber: rampitec.Dec 10 2019, 1:12 AM

Build result: fail - 60666 tests passed, 1 failed and 726 were skipped.

failed: Clang.OpenMP/declare_variant_ast_print.cpp

Log files: console-log.txt, CMakeCache.txt

You're merging different functions as multiversiin variants. I don't think this right to overcomplicate the semantics of multiversion functions just because you want to do it.

clang/lib/Sema/SemaOverload.cpp
9725

Implement all todos and check it with the size of the code where you just need to iterate through all the va4iants and call the existing functions to emit their aliases.

jdoerfert marked 2 inline comments as done.Dec 10 2019, 9:06 AM

You're merging different functions as multiversiin variants. I don't think this right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here. What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

clang/lib/Sema/SemaOverload.cpp
9725

We do not emit aliases at all with this approach. Emitting aliases does not work for the generic case, e.g., the construct selector trait.

You're merging different functions as multiversiin variants. I don't think this right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here.

You do this when tries to resolve the overloading though it is absolutely not required. You can easily implement it at the codegen phase (it is implemented already, actually). Because you don't need to resolve the overloads, it is resolved already by sema. hat you need to do is to select the correct version of the function and that's it. If you have global traits only, you emit alias. If you have local traits (like construct), you use the address of the best variant function directly. And no need to worry about templates, overloading resolution etc. Plus handling for the corner cases and future changes.

In your solution, you're actually not using mutiversioning at all, you use just one feature from the multiversioning - handling of multiple definitions of the same function. Nothing else. I'm saying that it is better to modify slightly the codegen because there you have to deal with the C-like constrcuts, where you don't need to worry about most of the problematic c++ features. But you insist on moving of all this stuff to Sema and overcomplicate the things.

What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

Handling of templates, for example. Plus, mixing different functions (with different names). You have it when you try to resolve overloadings though, actually, we don't need to do it, we can easily do this at the codegen.

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

clang/lib/Sema/SemaOverload.cpp
9725

Not directly. I know that it won't work for construct, for construct we'll need a little bit different approach but it is not very hard to implement.

jdoerfert marked an inline comment as done.Dec 10 2019, 11:19 AM

You're merging different functions as multiversiin variants. I don't think this right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here.

You do this when tries to resolve the overloading though it is absolutely not required. You can easily implement it at the codegen phase (it is implemented already, actually). Because you don't need to resolve the overloads, it is resolved already by sema. hat you need to do is to select the correct version of the function and that's it. If you have global traits only, you emit alias. If you have local traits (like construct), you use the address of the best variant function directly. And no need to worry about templates, overloading resolution etc. Plus handling for the corner cases and future changes.

In your solution, you're actually not using mutiversioning at all, you use just one feature from the multiversioning - handling of multiple definitions of the same function. Nothing else. I'm saying that it is better to modify slightly the codegen because there you have to deal with the C-like constrcuts, where you don't need to worry about most of the problematic c++ features. But you insist on moving of all this stuff to Sema and overcomplicate the things.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

Handling of templates, for example. Plus, mixing different functions (with different names). You have it when you try to resolve overloadings though, actually, we don't need to do it, we can easily do this at the codegen.

Why is any of this complicated to you? The logic to do the overloading is 15 lines long and most of it is to determine the best of all versions. What about that is more complicated than having multiple patch points during code generation in which we try to modify existing IR but sometimes have to delay it and hope it'll work at the end.

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're merging different functions as multiversiin variants. I don't think this right to overcomplicate the semantics of multiversion functions just because you want to do it.

I am actually not doing that here.

You do this when tries to resolve the overloading though it is absolutely not required. You can easily implement it at the codegen phase (it is implemented already, actually). Because you don't need to resolve the overloads, it is resolved already by sema. hat you need to do is to select the correct version of the function and that's it. If you have global traits only, you emit alias. If you have local traits (like construct), you use the address of the best variant function directly. And no need to worry about templates, overloading resolution etc. Plus handling for the corner cases and future changes.

In your solution, you're actually not using mutiversioning at all, you use just one feature from the multiversioning - handling of multiple definitions of the same function. Nothing else. I'm saying that it is better to modify slightly the codegen because there you have to deal with the C-like constrcuts, where you don't need to worry about most of the problematic c++ features. But you insist on moving of all this stuff to Sema and overcomplicate the things.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

What over complication do you mean exactly? Especially because this patch does not touch multi-version functions at all I'm confused by your comment.

Handling of templates, for example. Plus, mixing different functions (with different names). You have it when you try to resolve overloadings though, actually, we don't need to do it, we can easily do this at the codegen.

Why is any of this complicated to you? The logic to do the overloading is 15 lines long and most of it is to determine the best of all versions. What about that is more complicated than having multiple patch points during code generation in which we try to modify existing IR but sometimes have to delay it and hope it'll work at the end.

Without templates etc. Early resolution of the variant function leads to problems with AST at least. Plus, as I said, problems with handling complex features of C++.

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

...

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Alexey, again, this kind of comment is not appropriate. We're all experienced developers here, and we all understand the importance of tooling support in Clang. We also serve developers who write tools using AST matchers and other Clang analysis facilities. Having the resolved callee represented in the AST for what looks like a static call from the base-language perspective makes a lot of sense from a tooling perspective. When performing static analysis on the code, forcing a tool to understand how OpenMP variant selectors work in order to perform inter-procedural static analysis is suboptimal in nearly all cases. It is also true that we might want the base callee represented in some way, but as that callee is never actually called, and one of the variants is always called at that call site, it is important that IPA propagate information into and out of the correct callee in order to produce the correct results. If we currently represent the base callee as the callee that will appear in the call graph, that's a bug: Clang's static analyzer will produce incorrect results.

If you know of specific tools that indeed depend on the current behavior to produce correct results, please provide us with details on what they're doing so that we understand the use cases. Regardless, we should prioritize correct-by-default functioning of AST-based call graphs and their associated static analysis.

...

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Alexey, again, this kind of comment is not appropriate. We're all experienced developers here, and we all understand the importance of tooling support in Clang. We also serve developers who write tools using AST matchers and other Clang analysis facilities. Having the resolved callee represented in the AST for what looks like a static call from the base-language perspective makes a lot of sense from a tooling perspective. When performing static analysis on the code, forcing a tool to understand how OpenMP variant selectors work in order to perform inter-procedural static analysis is suboptimal in nearly all cases. It is also true that we might want the base callee represented in some way, but as that callee is never actually called, and one of the variants is always called at that call site, it is important that IPA propagate information into and out of the correct callee in order to produce the correct results. If we currently represent the base callee as the callee that will appear in the call graph, that's a bug: Clang's static analyzer will produce incorrect results.

If you know of specific tools that indeed depend on the current behavior to produce correct results, please provide us with details on what they're doing so that we understand the use cases. Regardless, we should prioritize correct-by-default functioning of AST-based call graphs and their associated static analysis.

What's not appropriate here?

...

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Alexey, again, this kind of comment is not appropriate. We're all experienced developers here, and we all understand the importance of tooling support in Clang. We also serve developers who write tools using AST matchers and other Clang analysis facilities. Having the resolved callee represented in the AST for what looks like a static call from the base-language perspective makes a lot of sense from a tooling perspective. When performing static analysis on the code, forcing a tool to understand how OpenMP variant selectors work in order to perform inter-procedural static analysis is suboptimal in nearly all cases. It is also true that we might want the base callee represented in some way, but as that callee is never actually called, and one of the variants is always called at that call site, it is important that IPA propagate information into and out of the correct callee in order to produce the correct results. If we currently represent the base callee as the callee that will appear in the call graph, that's a bug: Clang's static analyzer will produce incorrect results.

If you know of specific tools that indeed depend on the current behavior to produce correct results, please provide us with details on what they're doing so that we understand the use cases. Regardless, we should prioritize correct-by-default functioning of AST-based call graphs and their associated static analysis.

  1. This is significant issue that you're modifiy the original user code.
  2. It may have some troubles with serialization/deserialization probably. Plus, I just don't understand why it is good to rewrite the code for the user but to keep the original code is bad.

There can be another one issue with this solution with inline assembly. I’m not completely sure about it, will try to investigate it tomorrow. I suggest to discuss this solution with Richard Smith (or John McCall). If he/they are ok with this transformation of the AST, we can switch to this scheme.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

ABataev added a comment.EditedDec 10 2019, 5:30 PM

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

It is not quite so. Constexprs are not evaluated in sema. You can dump the ast and you will find all these constexprs in their original form. They are evaluated by the interpreter in place where it is required. But AST remains unaffected.

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.
  2. The device function now exists twice, that is bad.
  3. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).
  4. On the host the expression &base == &hst will evaluate to true with the alias.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that he is the context of originally called function base. It is the same behavior just like the behavior of the GNU gcc alias attribute.

  1. The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

  1. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

  1. On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -с -fopenmp repro.cpp

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that he is the context of originally called function base. It is the same behavior just like the behavior of the GNU gcc alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

  1. The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

  1. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

  1. On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine granular. That is also why the current code cannot handle the construct trait but an overload solution can.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -с -fopenmp repro.cpp

You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that he is the context of originally called function base. It is the same behavior just like the behavior of the GNU gcc alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

  1. The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

  1. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

  1. On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine granular. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -с -fopenmp repro.cpp

You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that he is the context of originally called function base. It is the same behavior just like the behavior of the GNU gcc alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

Long story short, code generation (and debugging) is different in the existing approach as a workaround.

  1. The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the exisiting approach as a workaround.

  1. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

The problem exists for C++ with short vs int as well: https://godbolt.org/z/wYB_pX

  1. On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine granular. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

The current approach *cannot* be used to implement construct without adding a *third* way to handle declare variants. So we have aliases (no 1), if supported by the architecture and if the trait is not construct. We have "function hijacking" (no 2), if aliases are not supported and if the trait is not construct. We will need something else if the trait is construct (no 3). The alternative is to handle *all of it* during overload resolution. If you still believe it is better to modify the IR in various ways *after* we created it, I'll move this discussion on the openmp-dev list to unblock it.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -с -fopenmp repro.cpp

You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.

I compiled it as cpp with both approaches just fine (ignoring the math errors due to the wrong order of things because I had the begin/end declare variant patch build as well).
No asm error on my side, did you run it or just assume it woulnd't work?
In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

ABataev added a comment.EditedDec 12 2019, 11:54 AM

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that he is the context of originally called function base. It is the same behavior just like the behavior of the GNU gcc alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

Long story short, code generation (and debugging) is different in the existing approach as a workaround.

  1. The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the exisiting approach as a workaround.

  1. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

The problem exists for C++ with short vs int as well: https://godbolt.org/z/wYB_pX

It is a different problem, with a compatibility check in C++. I'll investigate it and fix it. I mean, it is better to improve the error message, the error itself is emitted correctly because in C++ types are compatible only if they are equal (i.e. the type is the same). Strict type checking.

  1. On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine granular. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

The current approach *cannot* be used to implement construct without adding a *third* way to handle declare variants. So we have aliases (no 1), if supported by the architecture and if the trait is not construct. We have "function hijacking" (no 2), if aliases are not supported and if the trait is not construct. We will need something else if the trait is construct (no 3). The alternative is to handle *all of it* during overload resolution. If you still believe it is better to modify the IR in various ways *after* we created it, I'll move this discussion on the openmp-dev list to unblock it.

Yes, I agree with this. But it is not intended to support construct traits since the very beginning, it is a solution only for global traits (like kind, isa, etc.) For construct the idea is to use the original function but choose this function at the codegen phase. It is much easier, no need to worry about templates and all other stuff, though I don't like it, because the user will see that instead of the base function the variant is called directly.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -с -fopenmp repro.cpp

You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.

I compiled it as cpp with both approaches just fine (ignoring the math errors due to the wrong order of things because I had the begin/end declare variant patch build as well).
No asm error on my side, did you run it or just assume it woulnd't work?

Yes, I tried it with the original solution and your solution. The original works correctly, your patch leads to an error message about incorrect assembler instruction.

In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

Nope, the function is emitted but as an alias to the function with the correct assembler.

jdoerfert added a comment.EditedDec 12 2019, 12:09 PM

In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

Nope, the function is emitted but as an alias to the function with the correct assembler.

The function is marked as used but the current solution does not emit it. That is plain wrong. Even if it was not marked as used but externally visible you cannot *not emit it*. The alias solution is simply not working.

There is no evidence that this is more complicated. By all measures, this is less complicated (see also below). It is also actually doing the right thing when it comes to code emission. Take https://godbolt.org/z/sJiP3B for example. The calls are wrong and the definition of base is missing.

How did you measure it? I have a completely different opinion. Also, tried to reproduce the problem locally, could not reproduce. It seems to me, the output of the test misses several important things. You can check it yourself. tgt_target_teams() call uses @.offload_maptypes global var but it is not defined.

Here is the link with the globals not hidden: https://godbolt.org/z/5etB5S
The base function is called both times but should not be called at all. What is your local output and why does it differ?

On the host base is an alias for the hst function. On the device base has the body of dev function because NVPTX does nit support function aliases (10+ suppprts it, but LLVM does not support it yet). Try to change the bodies of dev and hst and you will see.

I tried to keep original function names to improve debugging and make users less wonder why instead of base something else is called.

How does that work with linking? Another translation unit can call both the base and hst/dev function, right? I mean they both need to be present.

If hst or dev are needed, they are emitted too, independently. On the host, the variamt function is emitted and base function is set to be an alias of the variant function. On the device we just inherit the body of the variant function, but this variant function also can be emitted, if used independently.

This is confusing:

  1. Especially for debugging we should do the same on host and device.

We do the same on the host and on the device. The user will see that he is the context of originally called function base. It is the same behavior just like the behavior of the GNU gcc alias attribute.

No we do not. You said it yourself just in the last comment: one time you emit an alias (host), one time you "emit code into the base function" (device).
That is not the same thing, especially if you look at debug information, stack frames, ...

Yes, formally it is not the same. But only because NVPTX does not support function aliasing.

Long story short, code generation (and debugging) is different in the existing approach as a workaround.

  1. The device function now exists twice, that is bad.

It is not so bad, actually. In your solution, you have almost the same situation. Plus, it is not a big problem taking into account that most of the function calls will be inlined and function will be eliminated. As soon as we have support for global aliases in LLVM/NVPTX, we can improve it.

It is bad to duplicate code for no reason. The SemaOverload solution has not "almost the same situation", as it does not replace the base function body with the variant function body.

Again, just a workaround for NVPTX problem with function aliases.

Long story short, we duplicate code in the exisiting approach as a workaround.

  1. How is this supposed to work with type corrections? I mean the variant needs to be compatible but not necessarily of the same type, right? https://godbolt.org/z/QAXCuv we just produce a cryptic error (locally it crashes for me afterwards).

Actually, these functions are not compatible. We're missing some extra checks in Sema, will add them later.

That is not necessarily clear to me. The standard does not say the types need to be equal but compatible.

Compatible in terms of the base language. In C, these functions are not compatible. Try to declare the function base as int base(float) and then try to redeclare it as int base(int). You will get the error because the types are not compatible. For example, int () and int(int) are compatible though are not equal.

The problem exists for C++ with short vs int as well: https://godbolt.org/z/wYB_pX

  1. On the host the expression &base == &hst will evaluate to true with the alias.

And what is the problem with this? I believe, with your solution the result will be the same because you will just replace base with the hst upon overloading resolution.

But I can do it for calls only. Aliases are not as fine granular. That is also why the current code cannot handle the construct trait but an overload solution can.

I see no problem here.

The current approach *cannot* be used to implement construct without adding a *third* way to handle declare variants. So we have aliases (no 1), if supported by the architecture and if the trait is not construct. We have "function hijacking" (no 2), if aliases are not supported and if the trait is not construct. We will need something else if the trait is construct (no 3). The alternative is to handle *all of it* during overload resolution. If you still believe it is better to modify the IR in various ways *after* we created it, I'll move this discussion on the openmp-dev list to unblock it.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand this example. What is the expected outcome here (I get the error below from ToT clang). Why is that not achievable by a SemaOverload solution?
asm.c:4:35: error: function with '#pragma omp declare variant' must have a prototype

Try to compile as C++:

clang++ -с -fopenmp repro.cpp

You did not answer any of the questions here. If you ignore my comments this is not working.
What is the expected outcome here? Why is that not achievable by a SemaOverload solution?

Expected result - the code is compiled successfully. Your solution will produce the error message for incorrect asm. SemaOverload won't help you with this. Assume, you have a base function with the target-specific code for the host and a variant with the target-specific code on the device.

I compiled it as cpp with both approaches just fine (ignoring the math errors due to the wrong order of things because I had the begin/end declare variant patch build as well).
No asm error on my side, did you run it or just assume it woulnd't work?
In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

One note:

In fact, the current solution will disregard the used attribute here and not emit the function, which is bad. The Sema solution will dispatch the right calls and honor the used attribute properly.

Nope, the function is emitted but as an alias to the function with the correct assembler.

The function is marked as used but the current solution does not emit it. That is plain wrong. Even if it was not marked as used but externally visible you cannot *not emit it*. The alias solution is simply not working.

Please, add Richard Smith and John McCall as reviewers. Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

So let's actually look at the AST instead of just talking about it:

We take the asm.cpp example from @ABataev that, as I argued earlier, shows nicely why the alias solution does not work at all once we start thinking about linking things.

Now here is the code with calls to make it actually interesting:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

void foo() {
  cpu();
  wrong_asm();
}

In the current approach the as of foo looks like this:

`-FunctionDecl 0x563baf0af958 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x563baf0afb38 <col:12, line:11:1>
    |-CallExpr 0x563baf0afa78 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x563baf0afa60 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x563baf0afa40 <col:3> 'void ()' lvalue Function 0x563baf0af458 'cpu' 'void ()'
    `-CallExpr 0x563baf0afb18 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x563baf0afb00 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x563baf0afae0 <col:3> 'void ()' lvalue Function 0x563baf0af668 'wrong_asm' 'void ()'

As you might see, you don't see any hint of the declare variant stuff that will eventually transform the wrong_asm call into a cpu call.

In the proposed scheme, the AST looks like this:

`-FunctionDecl 0x1e53398 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x1e53580 <col:12, line:11:1>
    |-CallExpr 0x1e534b8 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x1e534a0 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x1e53480 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()'
    `-CallExpr 0x1e53560 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x1e53548 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x1e53520 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()' (Function 0x1e530a8 'wrong_asm' 'void ()')

Here, both the original callee (wrong_ast) and the actual callee cpu are shown at the call site.

Why would we not want that?

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

I don't understand that this example represents. Unused static functions are generally not emitted. In general, if we perform overload resolution in Sema and, thus, only use the variants that are selected, then others that are static won't even be emitted. Here you're forcing the wrong_asm function to be used, but if it's used on all devices (host and target), then it's used, and we need to deal with the inline asm regardless.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

So let's actually look at the AST instead of just talking about it:

We take the asm.cpp example from @ABataev that, as I argued earlier, shows nicely why the alias solution does not work at all once we start thinking about linking things.

Now here is the code with calls to make it actually interesting:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

void foo() {
  cpu();
  wrong_asm();
}

In the current approach the as of foo looks like this:

`-FunctionDecl 0x563baf0af958 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x563baf0afb38 <col:12, line:11:1>
    |-CallExpr 0x563baf0afa78 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x563baf0afa60 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x563baf0afa40 <col:3> 'void ()' lvalue Function 0x563baf0af458 'cpu' 'void ()'
    `-CallExpr 0x563baf0afb18 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x563baf0afb00 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x563baf0afae0 <col:3> 'void ()' lvalue Function 0x563baf0af668 'wrong_asm' 'void ()'

As you might see, you don't see any hint of the declare variant stuff that will eventually transform the wrong_asm call into a cpu call.

In the proposed scheme, the AST looks like this:

`-FunctionDecl 0x1e53398 <line:8:1, line:11:1> line:8:6 foo 'void ()'
  `-CompoundStmt 0x1e53580 <col:12, line:11:1>
    |-CallExpr 0x1e534b8 <line:9:3, col:7> 'void'
    | `-ImplicitCastExpr 0x1e534a0 <col:3> 'void (*)()' <FunctionToPointerDecay>
    |   `-DeclRefExpr 0x1e53480 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()'
    `-CallExpr 0x1e53560 <line:10:3, col:13> 'void'
      `-ImplicitCastExpr 0x1e53548 <col:3> 'void (*)()' <FunctionToPointerDecay>
        `-DeclRefExpr 0x1e53520 <col:3> 'void ()' lvalue Function 0x1e52e98 'cpu' 'void ()' (Function 0x1e530a8 'wrong_asm' 'void ()')

Here, both the original callee (wrong_ast) and the actual callee cpu are shown at the call site.

Why would we not want that?

You have wron idea about AST representation. If something is not printed in dump, it does not mean it does nit exist in AST.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

There can be another one issue with this solution with inline assembly. I’m not completely sure about it, will try to investigate it tomorrow. I suggest to discuss this solution with Richard Smith (or John McCall). If he/they are ok with this transformation of the AST, we can switch to this scheme.

@jdoerfert , please do add Richard and John to this thread. We should be kind to them, however, and please write a summary of the language feature including some examples showing usage, and please also summarize the current implementation strategy and the one being proposed, so that they don't need to read the OpenMP spec to figure out what the discussion is about.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

And that is a good thing. Even if you argue it is not, *only* in the Sema solution the tools have *all* the information available to redirect to the base or variant function.

Here, both the original callee (wrong_ast) and the actual callee cpu are shown at the call site.

Why would we not want that?

You have wron idea about AST representation. If something is not printed in dump, it does not mean it does nit exist in AST.

This is plain insulting (again) and beyond the point. I just shown you that the proposed solution has *all* the information in the AST available to be used by tools, codegen, ... I did so because you claimed that would not be the case, e.g. the AST would not represent the program faithfully. As you see, all the original information is available. However, you still refuse to acknowledge that and instead try to discredit me. I am tired of this kind of "discussion", we went down this road before and, as it was back then, there is nothing to be gained. It is harmful for the community and it is insulting towards me.


While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).
  • Take https://godbolt.org/z/Yi9Lht where the wrong function is called on the host (no there is *no* alias hidden)
  • Take https://godbolt.org/z/2evvtN which shows that the alias solution is incompatible with linking.
  • Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

I'd like to add that what we're talking about is none of these things. We're not talking about "mutating" the AST at all. Neither are we inlining. We're talking about performing callee resolution when building the AST in the first place. This is exactly what we do in all other places where to do overload resolution.

This is different from other places where we perform overload resolution only in that the callee won't have the same name as the identifier used in the call expression. But that's okay - those are the semantics of the calls with OpenMP variants. You type one name, and the function that ends up being called has another name. But it's all static and part of the specified language semantics. Should we record the original "base" function? Yes. Should we represent it as the callee? No.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

And that is a good thing. Even if you argue it is not, *only* in the Sema solution the tools have *all* the information available to redirect to the base or variant function.

Here, both the original callee (wrong_ast) and the actual callee cpu are shown at the call site.

Why would we not want that?

You have wron idea about AST representation. If something is not printed in dump, it does not mean it does nit exist in AST.

This is plain insulting (again) and beyond the point. I just shown you that the proposed solution has *all* the information in the AST available to be used by tools, codegen, ... I did so because you claimed that would not be the case, e.g. the AST would not represent the program faithfully. As you see, all the original information is available. However, you still refuse to acknowledge that and instead try to discredit me. I am tired of this kind of "discussion", we went down this road before and, as it was back then, there is nothing to be gained. It is harmful for the community and it is insulting towards me.


While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

GlobalAlias can be emitted only for definitions. No definition for variant - no aliasing.

Undefined behavior according to the standard.

  • Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

...

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

Are the variants not permitted to be external functions?

...

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

Are the variants not permitted to be external functions?

Allowed, of course. But the alias/body will be emitted only if variant function is defined. Everyhing else is going to be resolved by the linker.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

I don't insist on function redefinition solution. You want to replace functions - fine, but do this at the codegen, not in AST.

...

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

...

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry for the right decl. See D7097. If you see a refernce for function, before doing something with it, just call member function getOpenMPDeclareVariantFunction() and you get the function that must be actually called here. The tool must do the same. That's hiw the tools work. They must be aware of special costructs/attributes.

I don't insist on function redefinition solution. You want to replace functions - fine, but do this at the codegen, not in AST.

Again, no one is replacing anything, and we're not mutating the AST. We're simply resolving the callee according to the language rules. That's something that should be done during AST construction.

It's like if I have this code:

template <int x>
int foo() { return 0; }

template <>
int foo<8>() { return 1; }

int main() {
  return foo<8>();
}

and you said that, in the AST, it should look like the unspecialized foo was being called. And then later, in CodeGen, something happened in order to cause the correct specialization would be called. That clearly would not be considered an acceptable design.

...

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry for the right decl. See D7097. If you see a refernce for function, before doing something with it, just call member function getOpenMPDeclareVariantFunction() and you get the function that must be actually called here. The tool must do the same. That's hiw the tools work. They must be aware of special costructs/attributes.

But then we'd to add code to do that in Clang's static analyzer and all other code trying to match caller/callee relationships. The function provided in the AST being not the function that will actually be called. Instead, we should make these tools correct by default and make tools wanting to to OpenMP-specific things be the tools that need to call the OpenMP-specific functions.

ABataev added a comment.EditedDec 12 2019, 5:03 PM

...

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry for the right decl. See D7097. If you see a refernce for function, before doing something with it, just call member function getOpenMPDeclareVariantFunction() and you get the function that must be actually called here. The tool must do the same. That's hiw the tools work. They must be aware of special costructs/attributes.

But then we'd to add code to do that in Clang's static analyzer and all other code trying to match caller/callee relationships. The function provided in the AST being not the function that will actually be called. Instead, we should make these tools correct by default and make tools wanting to to OpenMP-specific things be the tools that need to call the OpenMP-specific functions.

In general, yes, but not necessarily. We can teach existing functions like getBody(), isDefined() etc. about this feature and thus tools will get the correct function automatically.
But I suggest to discuss this with Richard Smith.

But I suggest to discuss this with Richard Smith.

Is the appeal to authority necessary to resolve this? The last few posts by Hal look clear to me. Especially convincing is:

We're simply resolving the callee according to the language rules.

But I suggest to discuss this with Richard Smith.

Is the appeal to authority necessary to resolve this?

Yes, necessary.

The last few posts by Hal look clear to me. Especially convincing is:

We're simply resolving the callee according to the language rules.

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

This is perfectly valid code and with the current scheme impossible to support.

GlobalAlias can be emitted only for definitions. No definition for variant - no aliasing.

Exactly, as above, this is a problem.

Undefined behavior according to the standard.

I don't think so. If you do, please reference the rules this would violate.

  • Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

This can also not be implemented in the alias scheme.

But I suggest to discuss this with Richard Smith.

Is the appeal to authority necessary to resolve this?

Yes, necessary.

http://lists.llvm.org/pipermail/cfe-dev/2019-December/064101.html

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

This is perfectly valid code and with the current scheme impossible to support.

GlobalAlias can be emitted only for definitions. No definition for variant - no aliasing.

Exactly, as above, this is a problem.

Undefined behavior according to the standard.

I don't think so. If you do, please reference the rules this would violate.

Page 59, 25-27.

  • Take the construct context selector and the begin/end declare variant construct which both cannot be implemented with aliases.

This can also not be implemented in the alias scheme.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

You are not corret. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in AST, though we could easily lower it to the real type. This is the design of AST - keep it as much as possible close to the original code and modify it only if it is absolutely impossible (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).

Constexprs are not lowered in AST. They are lowered in place where it is required. constexpr is just evaluated. It can be evaluated in Sema, if required, or in codegen, in the analysis tool. Check https://godbolt.org/z/wr9RFk as an example. You will see, the constexprs are not evaluated in AST, instead AST tries to do everything to keep them in their original form.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

You are not corret. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in AST, though we could easily lower it to the real type. This is the design of AST - keep it as much as possible close to the original code and modify it only if it is absolutely impossible (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).

Yes, but our decltype representation is a semantically-accurate representation of the source constructs. Your CodeGen approach to variant selection leads to a semantically-inaccurate representation: it produces a CallExpr that has a DeclRefExpr to the wrong function declaration.

In any case, our decltype representation is fairly analogous to what I proposed above. DecltypeType derives from Type, and stores both the original expression and the underlying type. If we have an OpenMPVariantCallExpr, it would also store a deference to the base function definition, but like desugar() returns the "resolved" regular type, OpenMPVariantCallExpr's getCallee() would return the resolved function that will be called.

Constexprs are not lowered in AST. They are lowered in place where it is required. constexpr is just evaluated. It can be evaluated in Sema, if required, or in codegen, in the analysis tool. Check https://godbolt.org/z/wr9RFk as an example. You will see, the constexprs are not evaluated in AST, instead AST tries to do everything to keep them in their original form.

I'm aware of how this works. If we see a need for a lazy evaluation strategy here we can certainly discuss that. And I agree that we should not drop all information about the base function. I'm simply saying that's not what getCallee() should return. However, what you're doing here is not analogous to the evaluation of constexprs. In short, keeping close to the original source does justify misrepresenting the semantics.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

You are not corret. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in AST, though we could easily lower it to the real type. This is the design of AST - keep it as much as possible close to the original code and modify it only if it is absolutely impossible (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).

Yes, but our decltype representation is a semantically-accurate representation of the source constructs. Your CodeGen approach to variant selection leads to a semantically-inaccurate representation: it produces a CallExpr that has a DeclRefExpr to the wrong function declaration.

In any case, our decltype representation is fairly analogous to what I proposed above. DecltypeType derives from Type, and stores both the original expression and the underlying type. If we have an OpenMPVariantCallExpr, it would also store a deference to the base function definition, but like desugar() returns the "resolved" regular type, OpenMPVariantCallExpr's getCallee() would return the resolved function that will be called.

But we don't need it. We have all the required information in the AST tree. Each function has associated list of attributes with the context traits and variant function. When you process CallExpr, just check the list of these attributes and return the address of the variant function instead of the original.

Constexprs are not lowered in AST. They are lowered in place where it is required. constexpr is just evaluated. It can be evaluated in Sema, if required, or in codegen, in the analysis tool. Check https://godbolt.org/z/wr9RFk as an example. You will see, the constexprs are not evaluated in AST, instead AST tries to do everything to keep them in their original form.

I'm aware of how this works. If we see a need for a lazy evaluation strategy here we can certainly discuss that. And I agree that we should not drop all information about the base function. I'm simply saying that's not what getCallee() should return. However, what you're doing here is not analogous to the evaluation of constexprs. In short, keeping close to the original source does justify misrepresenting the semantics.

Let's wait the answer from Richard Smith.

JonChesterfield added a comment.EditedDec 13 2019, 7:10 AM

I don't think it's reasonable to stall progress on optimising openmp indefinitely. Would you accept a time out of a week, after which the majority vote carries it?

Edit: race condition, Hal sent an email while I was writing this.

jdoerfert added a comment.EditedDec 13 2019, 8:42 AM

Undefined behavior according to the standard.

I don't think so. If you do, please reference the rules this would violate.

Page 59, 25-27.

OpenMP 5.0, Page 59, 25-27:

• If the function has any declarations, then the declare variant directives for any declarations that have one must be equivalent. If the function definition has a declare variant, it must also be equivalent. Otherwise, the result is unspecified.

In the example (https://godbolt.org/z/2evvtN), all declare variant directives on declarations *that have one* are equivalent, since only one declaration has a declare variant. Since the function definition does not have a declare variant, the second restriction is also fulfilled.

To summarize, this example is broken in the current scheme and valid according to the spec.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

For example, can it handle such cases as:
t1.c:

int hst();
#pragma omp declare variant(hst) match(device={kind(host)})
int base();

t2.c:

int main() {
  return base();
}

This is the correct C code, I assume. At least, clang compiles it.

Another problem:
t1.c:

int hst();
#pragma omp declare varint(hst) match(device={kind(host)})
int base();

t2.c:

int base() { return 0; }
int main() {
  return base();
}

According to the standard, this is valid code and hst function must be called (though it is not correct since in C/C++ each definition is a declaration and all restriction applied to the declaration must be applied to the definition too).

Another one possible problem might be with the templates:

template <typename T> T base() { return T(); }
int hst() { return 1; }

int main() {
  return base<int>();
}

#pragma omp declare variant(hst) match(device={kind(gpu)})
template<typename T>
T base();

int foo() {
  return base<int>();
}

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about? This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

For example, can it handle such cases as:
t1.c:

int hst();
#pragma omp declare variant(hst) match(device={kind(host)})
int base();

t2.c:

int main() {
  return base();
}

This is the correct C code, I assume. At least, clang compiles it.

Another problem:
t1.c:

int hst();
#pragma omp declare varint(hst) match(device={kind(host)})
int base();

t2.c:

int base() { return 0; }
int main() {
  return base();
}

According to the standard, this is valid code and hst function must be called (though it is not correct since in C/C++ each definition is a declaration and all restriction applied to the declaration must be applied to the definition too).

The second example is valid code and it doens't matter if the first example is.
The argument I used here https://reviews.llvm.org/D71241#1783711 holds again and the hst function *must not be called* in either case.
The standard is clear on that and that is one of the reason the alias solution *does not work*.

Another one possible problem might be with the templates:

template <typename T> T base() { return T(); }
int hst() { return 1; }

int main() {
  return base<int>();
}

#pragma omp declare variant(hst) match(device={kind(gpu)})
template<typename T>
T base();

int foo() {
  return base<int>();
}

It would be helpful if you explain the problem that might or might not arise. In the above code there is no target so it is unclear if foo would be emitted for the gpu or not (main is not emitted for the gpu for sure). Only when foo is emitted for the gpu, the call would be to hst, otherwise always to base.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

So far, I haven't seen a reason why we would need any new expression. If you think we need one, please explain to me why.

FWIW. The only open question I have is with the OpenMP committee, I'll have to figure out if we really want to restrict the variant selection to calls only. Anyway, we can implement it either way in this scheme.

If I get to it tomorrow, I'll split the patch and update it based on my newest version. I'll also write test cases for all the situations we get wrong right now (https://reviews.llvm.org/D71241#1782650).

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):

} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Hal, are we going to support something like this?

void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
void wrong_asm() {
  asm ("xxx");
}

Currently, there is an error when we try to emit the assembler output.

Hal, are we going to support something like this?

I'm not Hal but I will answer anyway.

void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
void wrong_asm() {
  asm ("xxx");
}

Currently, there is an error when we try to emit the assembler output.

While it is unclear to me what you think should happen, an error pointing at "xxx" is to be expected without further information on how this is compiled.

Hal, are we going to support something like this?

I'm not Hal but I will answer anyway.

void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
void wrong_asm() {
  asm ("xxx");
}

Currently, there is an error when we try to emit the assembler output.

While it is unclear to me what you think should happen, an error pointing at "xxx" is to be expected without further information on how this is compiled.

Shall we emit function wrong_asm or not? If we're not going to use it in our program, only cpu must be used, wrong_asm is not required and must not be emitted.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):

} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Could you specify what behavior you expect, or what the test casees would look like?

For the record:
OpenMP basically says, if you have a call to a (base)function that has variants with contexts that match at the call site, call the variant with the highest score. The variants are specified by a variant-func-id, which is a base language identifier or C++ template-id. For C++, the variant declaration is identified by *performing the base language lookup rules on the variant-func-id with arguments that correspond to the base function argument types*.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):

} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Could you specify what behavior you expect, or what the test casees would look like?

For the record:
OpenMP basically says, if you have a call to a (base)function that has variants with contexts that match at the call site, call the variant with the highest score. The variants are specified by a variant-func-id, which is a base language identifier or C++ template-id. For C++, the variant declaration is identified by *performing the base language lookup rules on the variant-func-id with arguments that correspond to the base function argument types*.

No need to worry about lookup in C++, ADL lookup is implemented already.

rsmith added a subscriber: rsmith.Feb 11 2020, 11:08 AM

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

I've read through some of the relevant parts of the OpenMP 5.0 specification (but not the 5.1 specification), and it looks like this is the same kind of language-specific function resolution that we do in C++: name lookup finds one declaration, which we then statically resolve to a different declaration. As with the C++ case, it seems reasonable and useful to me to represent the statically-selected callee in the AST as the chosen declaration in the DeclRefExpr -- that will be the most useful thing for tooling, static analysis, and so on.

However, that seems to lose information in some cases. Consider this:

void f(int) {}

template<typename T> void g(T) {}

#pragma omp declare variant(f) match(implementation = {vendor(llvm)})
template<> void g(int) {}

void h() { g(0); }

Here, h() calls f(int). The approach in this patch will form a DeclRefExpr whose FoundDecl is the FunctionTemplateDecl g<T>, and whose resolved declaration is f(int), but that has no reference to g<int> (where the declare variant appears). That seems like it could be annoying for some tooling uses to deal with; there's no real way to get back to g<int> without redoing template argument deduction or similar.

One possibility to improve the representation would be to replace the existing NamedDecl* storage for FoundDecls with a PointerUnion<NamedDecl*, OpenMPFoundVariantDecl*>, where a OpenMPFoundVariantDecl is an ASTContext-allocated struct listing the original found declaration and the function with the declare variant pragma.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):

} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

This case is only concerned with type templates, so we don't need to worry about it.

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Could you specify what behavior you expect, or what the test casees would look like?

This code is handling a particular case of class member access in C++: if you name a member of class Base via a using-declaration in class Derived, we convert first to class Derived and then to class Base, and this is important for the case where (for example) Base is an ambiguous base class. Consider:

struct A {
    void f() {}
    int n;
};

struct B : A {
#pragma omp declare variant(A::f) match(implementation = {vendor(llvm)})
    void g() {}
};

struct C : A, B {};

void h(C *p) { p->g(); }

What I think should happen here (per the OpenMP rules, applied to this case in the most natural way they seem to be applicable) is that we first convert p to B* (the this type of B::g), and then convert it to A* (the this type of the selected variant) -- just like for the source language call p->A::f(). That's actually exactly what will happen if you make B::g be the found declaration of the member access and A::f be the resolved declaration, as this patch does. So I don't think we need changes there, assuming the case above works with this patch -- it seems to crash in code generation without this patch.

This is also, I think, a fairly decisive argument for representing the variant selection in the AST rather than deferring it to CodeGen -- we want to form the implicit conversion of the object parameter to A* in Sema.

Incidentally, if you make A a virtual base class of B, we produce what seems to be an incorrect and confusing diagnostic:

<source>:7:29: error: conversion from pointer to member of class 'A' to pointer to member of class 'B' via virtual base 'A' is not allowed

Finally, to address the question about what AST fidelity means in this case: we certainly want the AST to represent the program as written. But that's not all: we want the AST to also represent the semantics of the program in a reasonably useful form. For a DeclRefExpr, it's more useful to directly point to the statically-chosen declaration than to expect the users of DeclRefExpr to find it for themselves, especially since DeclRefExpr already has a mechanism to track the syntactic form (the found declaration) separately from the semantically selected declaration.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

I've read through some of the relevant parts of the OpenMP 5.0 specification (but not the 5.1 specification), and it looks like this is the same kind of language-specific function resolution that we do in C++: name lookup finds one declaration, which we then statically resolve to a different declaration. As with the C++ case, it seems reasonable and useful to me to represent the statically-selected callee in the AST as the chosen declaration in the DeclRefExpr -- that will be the most useful thing for tooling, static analysis, and so on.

However, that seems to lose information in some cases. Consider this:

void f(int) {}

template<typename T> void g(T) {}

#pragma omp declare variant(f) match(implementation = {vendor(llvm)})
template<> void g(int) {}

void h() { g(0); }

Here, h() calls f(int). The approach in this patch will form a DeclRefExpr whose FoundDecl is the FunctionTemplateDecl g<T>, and whose resolved declaration is f(int), but that has no reference to g<int> (where the declare variant appears). That seems like it could be annoying for some tooling uses to deal with; there's no real way to get back to g<int> without redoing template argument deduction or similar.

One possibility to improve the representation would be to replace the existing NamedDecl* storage for FoundDecls with a PointerUnion<NamedDecl*, OpenMPFoundVariantDecl*>, where a OpenMPFoundVariantDecl is an ASTContext-allocated struct listing the original found declaration and the function with the declare variant pragma.

Hi Richard, thanks for your answer. I agree that this is the best option.

We do need to be careful here. For cases with FoundDecl != Decl, I think that the typo-correction cases look like they probably work, but there are a few places where we make semantic decisions based on the mismatch, such as:

In SemaTemplate.cpp below line 512, we have (this is in C++03-specific code):

} else if (!Found.isSuppressingDiagnostics()) {
  //   - if the name found is a class template, it must refer to the same
  //     entity as the one found in the class of the object expression,
  //     otherwise the program is ill-formed.
  if (!Found.isSingleResult() ||
      getAsTemplateNameDecl(Found.getFoundDecl())->getCanonicalDecl() !=
          OuterTemplate->getCanonicalDecl()) {
    Diag(Found.getNameLoc(),
         diag::ext_nested_name_member_ref_lookup_ambiguous)

This case is only concerned with type templates, so we don't need to worry about it.

and in SemaExpr.cpp near line 2783, we have:

// If we actually found the member through a using declaration, cast
// down to the using declaration's type.
//
// Pointer equality is fine here because only one declaration of a
// class ever has member declarations.
if (FoundDecl->getDeclContext() != Member->getDeclContext()) {
  assert(isa<UsingShadowDecl>(FoundDecl));
  QualType URecordType = Context.getTypeDeclType(
                         cast<CXXRecordDecl>(FoundDecl->getDeclContext()));

Could you specify what behavior you expect, or what the test casees would look like?

This code is handling a particular case of class member access in C++: if you name a member of class Base via a using-declaration in class Derived, we convert first to class Derived and then to class Base, and this is important for the case where (for example) Base is an ambiguous base class. Consider:

struct A {
    void f() {}
    int n;
};

struct B : A {
#pragma omp declare variant(A::f) match(implementation = {vendor(llvm)})
    void g() {}
};

struct C : A, B {};

void h(C *p) { p->g(); }

What I think should happen here (per the OpenMP rules, applied to this case in the most natural way they seem to be applicable) is that we first convert p to B* (the this type of B::g), and then convert it to A* (the this type of the selected variant) -- just like for the source language call p->A::f(). That's actually exactly what will happen if you make B::g be the found declaration of the member access and A::f be the resolved declaration, as this patch does. So I don't think we need changes there, assuming the case above works with this patch -- it seems to crash in code generation without this patch.

This is also, I think, a fairly decisive argument for representing the variant selection in the AST rather than deferring it to CodeGen -- we want to form the implicit conversion of the object parameter to A* in Sema.

Incidentally, if you make A a virtual base class of B, we produce what seems to be an incorrect and confusing diagnostic:

<source>:7:29: error: conversion from pointer to member of class 'A' to pointer to member of class 'B' via virtual base 'A' is not allowed

Finally, to address the question about what AST fidelity means in this case: we certainly want the AST to represent the program as written. But that's not all: we want the AST to also represent the semantics of the program in a reasonably useful form. For a DeclRefExpr, it's more useful to directly point to the statically-chosen declaration than to expect the users of DeclRefExpr to find it for themselves, especially since DeclRefExpr already has a mechanism to track the syntactic form (the found declaration) separately from the semantically selected declaration.

tra added a subscriber: tra.Mar 4 2020, 4:22 PM
jdoerfert abandoned this revision.Mar 13 2020, 10:33 PM

D75779 is the proper implementation of the OpenMP standard.