This is an archive of the discontinued LLVM Phabricator instance.

Differential D70172

[CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese
ClosedPublic

Authored by yaxunl on Nov 13 2019, 4:35 AM.

Details

Reviewers

tra
rjmccall
rsmith
jdoerfert
ABataev

Commits

rGb670ab7b6b3d: recommit 1b978ddba05c [CUDA][HIP][OpenMP] Emit deferred diagnostics by a post…
rG1b978ddba05c: [CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese

Summary

This patch removes the explicit call graph for CUDA/HIP/OpenMP deferred diagnostics that was built during parsing,
since it is error-prone due to incomplete information about function declarations during parsing. Instead,
this patch does a post-parsing AST traversal and emits deferred diagnostics based on the use graph implicitly
generated during the traversal.

Event Timeline

yaxunl created this revision. Nov 13 2019, 4:35 AM

Calling @rnk for Windows know-how.

clang/test/SemaCUDA/deleting-dtor.cu
45–46 ↗	(On Diff #229058)	Nit: I think it should be `requires deleting dtor to be emitted` or `requires that deleting dtor is emitted`

This seems like the wrong approach; @rsmith should take a look.

yaxunl removed 1 blocking reviewer(s): rsmith. Nov 13 2019, 11:22 AM

sorry I think I misunderstood the meaning of "blocking" so I put it back.

Are we sure using both Itanium and MS C++ ABIs at the same time is really the best way forward here? What are the constraints on CUDA that require the Itanium ABI? I'm sure there are real reasons you can't just use the MS ABI as is, but I'm curious what they are. Was there some RFC or design showing that this is the right way forward?

I wonder if it would be more productive to add new, more expansive attributes, similar to __attribute__((ms_struct)), that tag class or function decls as MS or Itanium C++ ABI. CUDA could then leverage this as needed, and it would be much easier to construct test cases for MS/Itanium interop. This is an expansion in scope, but it seems like it could be generally useful, and if we're already going to enter the crazy world of multiple C++ ABIs in a single TU, we might as well bite the bullet and do it in a way that isn't specific to CUDA.

In D70172#1745998, @rnk wrote:

Are we sure using both Itanium and MS C++ ABIs at the same time is really the best way forward here? What are the constraints on CUDA that require the Itanium ABI? I'm sure there are real reasons you can't just use the MS ABI as is, but I'm curious what they are. Was there some RFC or design showing that this is the right way forward?

I wonder if it would be more productive to add new, more expansive attributes, similar to __attribute__((ms_struct)), that tag class or function decls as MS or Itanium C++ ABI. CUDA could then leverage this as needed, and it would be much easier to construct test cases for MS/Itanium interop. This is an expansion in scope, but it seems like it could be generally useful, and if we're already going to enter the crazy world of multiple C++ ABIs in a single TU, we might as well bite the bullet and do it in a way that isn't specific to CUDA.

We are not using the Itanium ABI when we do host compilation of CUDA/HIP on Windows. During host compilation on Windows only the MS C++ ABI is used.

This issue is not due to mixing MS ABI with Itanium ABI.

This issue arises from the delayed diagnostics for CUDA/HIP. Basically, we do not want to emit certain diagnostics (e.g. an error in inline assembly code) in __host__ __device__ functions, to avoid clutter. We only want to emit such diagnostics once we are certain these functions will be emitted in IR.

To implement this, clang maintains a call graph. For each reference to a function, clang checks the current context. If it is an evaluated context and it is a function, clang treats the referenced function as the callee and the context as the caller. Clang checks whether the caller is known to be emitted (i.e. it has a body and external linkage). If not, clang adds this caller/callee pair to the call graph. If the caller is known to be emitted, clang checks whether the callee is known to be emitted. If so, it does nothing. If the callee is not known to be emitted, clang removes it and all its callees from the call graph and emits the delayed diagnostics associated with them.

Note that a caller is added to the call graph only if it is not known to be emitted. Therefore clang asserts that a callee that is known to be emitted is not in the call graph.
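
For readers following the thread, the bookkeeping described above can be sketched as a toy model. This is not Clang's code: `DeferredDiagModel`, `noteReference`, `markEmitted`, and `runEagerDemo` are hypothetical names, and functions are reduced to plain strings. It only illustrates the rule that edges are recorded for callers not yet known to be emitted, and that reaching a callee from an emitted caller flushes the diagnostics of everything reachable from it.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of the eager scheme; every name here is hypothetical.
struct DeferredDiagModel {
  std::set<std::string> knownEmitted;                        // functions certain to reach IR
  std::map<std::string, std::vector<std::string>> deferred;  // function -> pending messages
  std::map<std::string, std::set<std::string>> callees;      // edges from callers not known emitted
  std::vector<std::string> emitted;                          // diagnostics actually reported

  // Called for each reference to `callee` seen while checking `caller`.
  void noteReference(const std::string &caller, const std::string &callee) {
    if (!knownEmitted.count(caller)) {
      callees[caller].insert(callee);  // cannot decide yet, remember the edge
      return;
    }
    if (!knownEmitted.count(callee))
      markEmitted(callee);
  }

  // Flush `fn` and everything reachable from it, removing it from the graph.
  void markEmitted(const std::string &fn) {
    if (!knownEmitted.insert(fn).second)
      return;  // already handled; also guards against cycles
    for (const std::string &msg : deferred[fn])
      emitted.push_back(fn + ": " + msg);
    deferred.erase(fn);
    std::set<std::string> pending = callees[fn];  // copy before erasing
    callees.erase(fn);  // emitted functions never stay in the graph
    for (const std::string &c : pending)
      markEmitted(c);
  }
};

// g calls f before anything is known to be emitted; main is emitted and calls g.
std::vector<std::string> runEagerDemo() {
  DeferredDiagModel m;
  m.deferred["f"] = {"unknown register name 'r0' in asm"};
  m.noteReference("g", "f");     // records the edge g -> f
  m.knownEmitted.insert("main");
  m.noteReference("main", "g");  // flushes g, then f
  return m.emitted;
}
```

In this sketch the diagnostic in `f` is reported only once `main` references `g`. The assertion discussed in this review corresponds to the invariant that a function in `knownEmitted` never appears as a key of `callees`.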

On Windows, when the vtable is known to be emitted for a class, clang does a body check for the dtor of the class. It makes the dtor the context, then checks the dtor. I think this emulates a deleting dtor calling a normal dtor. This happens only if the dtor is not defined, since otherwise the dtor has already been checked. Since the dtor is not defined yet, it is not known to be emitted and is put into the call graph. Later on, if the dtor is defined, it is checked again. This time it is known to be emitted, clang finds that it is in the call graph, and the assert fails.

So the issue is that clang incorrectly assumes the dtor is not known to be emitted in the first check and puts it in the call graph. To fix that, a map is added to Sema to tell clang that it is checking a deleting dtor, which is supposed to be emitted even if it is not defined.

remove unnecessary states added to Sema.

clang/test/SemaCUDA/deleting-dtor.cu
45–46 ↗	(On Diff #229058)	fixed

In D70172#1746451, @yaxunl wrote:

We are not using Itanium ABI when we do host compilation of CUDA/HIP on windows. During the host compilation on windows only MS C++ ABI is used.

This issue is not due to mixing MS ABI with Itanium ABI.
...

I think I might have understood all that.

Really, the problem is that, in C++, there are many kinds of special members created by the compiler that are not modeled in the AST. Deleting destructors are a good example. If we consistently used GlobalDecl throughout Sema, then we would be able to separate marking the deleting destructor referenced from marking the base destructor referenced, and this code would be easier to understand.

However, given the way things stand, your new approach seems like a reasonable way of detecting the case of referencing the deleting dtor here. So from my perspective, this is fine. @rjmccall, assuming that Richard doesn't have time to give any input, do you still think this needs his review?

Richard is definitely our main expert in the implicit synthesis of special members. It seems to me that if we need the destructor declaration at some point, we should be forcing it to exist at that point.

In D70172#1772140, @rjmccall wrote:

Richard is definitely our main expert in the implicit synthesis of special members. It seems to me that if we need the destructor declaration at some point, we should be forcing it to exist at that point.

In AST there are no separate decls for deleting dtors and complete object dtors. In AST there are only complete object dtors. In codegen when clang emits the definition of a deleting dtor, clang uses GlobalDecl with Dtor_Deleting. However AST does not have that.

Since a deleting dtor is supposed to call a complete object dtor, clang needs to check the complete object dtor in the context of the deleting dtor. Since deleting dtor is synthesized in codegen and does not have a body, clang manually pushed the decl of the complete object dtor as context and checks the same complete object dtor.

One may consider using GlobalDecl to differentiate the complete object dtor and the deleting dtor in the AST. However, that would require replacing Decl with GlobalDecl in many places in Sema, which seems like overkill.

Fortunately, we could identify the deleting dtor by context without using GlobalDecl.

There are two cases:

1. There is no definition of the complete object dtor.

When clang checks a dtor, the caller is the dtor itself, and the caller has no definition, this can only happen when clang is checking the deleting dtor. Clang should just assume the dtor is emitted. Since the dtor has no definition, no deferred diagnostics are emitted. Clang just adds a dtor->dtor edge to the call graph. No deferred diagnostics arise from the dtor, since the deleting dtor only calls the complete object dtor and deallocation functions, which are not supposed to cause diagnostics.

Later, if the dtor is called in other functions and checked, the caller is not the dtor itself, so it is treated as a normal function, i.e., whether it is emitted is determined by whether it has a definition. Since the deleting dtor does not have extra deferred diagnostics compared with the complete object dtor, there is no need to differentiate whether the callee is the deleting dtor or the complete object dtor.

If the complete object dtor is defined, its callees and the deferred diagnostics in its body are recorded as for normal functions. If the complete object dtor or deleting dtor is called by other functions, the deferred diagnostics of the complete object dtor are emitted.

2. There is a definition of the complete object dtor.

Clang will not check the deleting dtor. In this case the complete object dtor is checked as a normal function. As discussed in case 1, the deleting dtor should result in the same deferred diagnostics as the complete object dtor, so there is no need to differentiate calls to the deleting dtor from calls to the complete object dtor.

I thought you were saying that the destructor decl hadn't been created yet, but I see now that you're saying something more subtle.

CurContext is set to the destructor because the standard says in [class.dtor]p13:

At the point of definition of a virtual destructor (including an implicit definition), the non-array deallocation function is determined as if for the expression `delete this` appearing in a non-virtual destructor of the destructor’s class.

Which is to say that, semantically, the context is as if it were within the destructor, to the extent that this affects access control and so on.

I can see why this causes problems for your call graph (really a use graph), since it's a use in the apparent context of the destructor at a point where the destructor is not being defined. A similar thing happens with default arguments, but because we don't consider uses from default arguments to be true ODR-uses until the default argument is used, that probably doesn't cause problems for you.

I don't think the destructor -> deallocation function edge is actually interesting for your use graph. It'd be more appropriate to treat the deallocation function as used by the v-table than by the destructor; I don't know whether you make any attempt to model v-tables as nodes in your use graph. You might consider finding a simple way to suppress adding this edge, like just not adding edges from a destructor that's not currently being defined (D->willHaveBody()).

With all that said, maintaining a use graph for all the functions you might emit in the entire translation unit seems very expensive and brittle. Have you considered doing this walk in a final pass? You could just build up a set of all the functions you know you're going to emit and then walk their bodies looking for uses of lazy-emitted entities. If we don't already have a function that calls a callback for every declaration ODR-used by a function body, we should.

This doesn't look quite right to me. I don't think we should treat the delete this; for a destructor as being emitted-for-device in any translation unit in which the vtable is marked used. (For example, if in your testcase MSEmitDeletingDtor::CFileStream::CFileStream() were a __host__ function, I think you'd still diagnose, but presumably shouldn't do so, because the vtable -- and therefore CFileStream::operator delete -- is never referenced / emitted for the device.) Instead, I think we should treat the delete this; as being emitted in any translation unit in which the vtable itself is emitted-for-device. Presumably, this means you will need to model / track usage of the vtable itself in your "call graph".

Herald added a subscriber: herhut. Jan 8 2020, 12:26 PM

In D70172#1809571, @rjmccall wrote:
I thought you were saying that the destructor decl hadn't been created yet, but I see now that you're saying something more subtle.

...

The deferred diagnostic mechanism is shared between CUDA/HIP and OpenMP. The diagnostic messages depend not only on the callee but also on the caller, so the caller information needs to be kept. Also, if a caller is to be emitted, all the deferred diagnostics associated with its direct or indirect callees need to be emitted. Therefore a call graph is needed for this mechanism.

If we ignore the dtor->deallocation edge in the call graph, we may miss diagnostics, e.g.

static __device__ __host__ void f(__m256i *p) {
  __asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
                 : "r0"); // MS-error{{unknown register name 'r0' in asm}}
}
struct CFileStream {
  void operator delete(void *p) {
    f(0);  // MS-note{{called by 'operator delete'}}
  }
  CFileStream();
  virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
};

struct CMultiFileStream {
  CFileStream m_fileStream;
  ~CMultiFileStream();
};

// This causes vtable emitted so that deleting dtor is emitted for MS.
CFileStream::CFileStream() {}

This assumes the host compilation is on Windows.

Here f() is a host device function that is not known to be emitted, so the inline assembly error results in a delayed diagnostic. When f() is checked in the delete operator body, a 'delete operator -> f' edge is added to the call graph since f() is not known to be emitted.

Since CFileStream::CFileStream is defined, clang marks the vtbl as emitted and does an explicit dtor check even though the dtor is not defined. Clang knows that this dtor check is for the deleting dtor and marks the delete operator as referenced, which causes a 'dtor -> delete operator' edge to be added to the call graph. Then clang marks the dtor as referenced. Since the deleting dtor will be emitted together with the vtbl, clang should assume the dtor is to be emitted. Clang then finds the callees 'delete operator' and f(), and emits the delayed diagnostics associated with them.

If we do not add the 'dtor -> delete operator' edge to the call graph, the diagnostic msg in f() will not be emitted.

Add tests for device compilation.

Add a test when both vtbl and deleting dtor are emitted with diagnostic due to delete operator.

In D70172#1812533, @yaxunl wrote:
...
If we do not add 'dtor -> delete operator' edge to the call graph, the diagnostic msg in f() will not be emitted.

Most uses of the destructor do not use the delete operator, though, and therefore should not trigger the diagnostics in f to be emitted. And this really doesn't require a fully-realized use graph; you could very easily track the current use stack when making a later pass over the entities used.

Also I agree with Richard that you really need the v-table to be a node in your use graph/stack.

In D70172#1810665, @rsmith wrote:

This doesn't look quite right to me. I don't think we should treat the delete this; for a destructor as being emitted-for-device in any translation unit in which the vtable is marked used. (For example, if in your testcase MSEmitDeletingDtor::CFileStream::CFileStream() were a __host__ function, I think you'd still diagnose, but presumably shouldn't do so, because the vtable -- and therefore CFileStream::operator delete -- is never referenced / emitted for the device.) Instead, I think we should treat the delete this; as being emitted in any translation unit in which the vtable itself is emitted-for-device. Presumably, this means you will need to model / track usage of the vtable itself in your "call graph".

A user declared ctor/dtor by default is __host__.

Let's consider this testcase:

static __device__ __host__ void f(__m256i *p) {
  __asm__ volatile("vmovaps  %0, %%ymm0" ::"m"(*(__m256i *)p)
                 : "r0"); // MS-error{{unknown register name 'r0' in asm}}
}
struct CFileStream {
  void operator delete(void *p) {
    f(0);  // MS-note{{called by 'operator delete'}}
  }
  CFileStream();
  virtual ~CFileStream();  // MS-note{{called by '~CFileStream'}}
};

struct CMultiFileStream {
  CFileStream m_fileStream;
  ~CMultiFileStream();
};

// This causes vtable emitted so that deleting dtor is emitted for MS.
CFileStream::CFileStream() {}

In host compilation, the vtbl is emitted, which causes the dtor to be emitted; the dtor in turn calls f(), so the diagnostic msg is emitted.

In device compilation, the vtbl is not emitted, so the dtor is not emitted and the diagnostic msg in f() is not emitted.

We only need an entity in the call graph if that entity can be called by other entities. Here the vtbl is always at the top level of the 'call graph', so it does not need to be in the call graph.

In D70172#1812631, @rjmccall wrote:

Most uses of the destructor do not use the delete operator, though, and therefore should not trigger the diagnostics in f to be emitted. And this really doesn't require a fully-realized use graph; you could very easily track the current use stack when making a later pass over the entities used.

The call graph is not for this specific situation. A call graph is needed because of the transitive nature of the deferred diagnostic message. That is, if any direct or indirect caller is emitted, the diagnostic msg needs to be emitted.

The deferred diagnostic msg is recorded when parsing a function body. At that time we do not know which function will directly or indirectly call it. How do we keep a use stack?

When we parse other function bodies, we only know the direct callee. Since we do not know whether this function indirectly calls the function with deferred diagnostics, we have to keep a record of all the caller/callee edges.

In D70172#1812664, @yaxunl wrote:

In D70172#1812631, @rjmccall wrote:

Most uses of the destructor do not use the delete operator, though, and therefore should not trigger the diagnostics in f to be emitted. And this really doesn't require a fully-realized use graph; you could very easily track the current use stack when making a later pass over the entities used.

The call graph is not for this specific situation. A call graph is needed because of the transitive nature of the deferred diagnostic message. That is, if any direct or indirect caller is emitted, the diagnostic msg needs to be emitted.

One of the points that Richard and I have been trying to make is that this really isn't specifically about *calls*, it's about *uses*. You only want to emit diagnostics associated with an entity if you actually have to emit that entity, and whether you emit an entity has nothing to do with what places might *call* it, but rather what places *use* it and therefore force it to be emitted. This is fortunate because call graphs are inherently imperfect because of indirect calls, but use graphs are totally reliable. It's also fortunate because it means you can piggy-back on all of the existing logic that Sema has for tracking ODR uses.

Richard and I are also pointing out that Sema has to treat the v-table as its own separate thing when tracking ODR uses, and so you need to as well. You want to emit diagnostics associated with a virtual function if you're emitting code that either (1) directly uses the function (e.g. by calling x->A::foo()) or (2) directly uses a v-table containing the function. You can't rely on Sema's normal ODR-use tracking for *either* of these, because Sema might have observed a use in code that you don't actually have to emit, e.g. host code if you're compiling for the device. That is, a v-table is only a "root" for virtual functions if you actually have to emit that v-table, and you can't know that without tracking v-tables in your use graph.

The deferred diagnostic msg is recorded when parsing a function body. At that time we do not know which function will directly or indirectly call it. How do we keep a use stack?

The "use stack" idea would apply if you switched from eagerly creating the entire use graph to instead just making a late pass that walked function bodies. If you walk function bodies depth-first, starting from a true root and gathering all the ODR-used entities to be recursively walked, then you can maintain a stack of what entities you're currently walking, and that stack is a use-path that explains why you need to emit the current function.

It should be straightforward to build a function that walks over the entities used by a function body and calls a callback by just extracting it out of the code in MarkDeclarationsUsedInExpr.
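
The late-pass idea can be illustrated with a small, self-contained sketch. It is not Clang code: `Entity`, `Program`, `walkFromRoot`, `joinPath`, and `runWalkDemo` are hypothetical, entities are plain strings, and the v-table is modeled as an ordinary node that uses the destructor and the deallocation function, as suggested earlier in this thread. The walk is depth-first from a root that is known to be emitted, and the stack of entities currently being walked serves as the use-path for any diagnostic found.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a declaration with a body.
struct Entity {
  std::vector<std::string> uses;           // entities ODR-used by the body
  std::vector<std::string> deferredDiags;  // diagnostics recorded while parsing
};
using Program = std::map<std::string, Entity>;
using Report =
    std::function<void(const std::vector<std::string> &, const std::string &)>;

// Depth-first walk from one emitted root; `report` receives the use-path.
void walkFromRoot(const Program &prog, const std::string &root,
                  const Report &report) {
  std::vector<std::string> useStack;
  std::set<std::string> visited;
  std::function<void(const std::string &)> visit =
      [&](const std::string &name) {
        if (!visited.insert(name).second)
          return;  // each entity is walked once, via its first use-path
        auto it = prog.find(name);
        if (it == prog.end())
          return;  // no body available, nothing to walk
        useStack.push_back(name);
        for (const std::string &msg : it->second.deferredDiags)
          report(useStack, msg);
        for (const std::string &used : it->second.uses)
          visit(used);
        useStack.pop_back();
      };
  visit(root);
}

std::string joinPath(const std::vector<std::string> &path) {
  std::string out;
  for (std::size_t i = 0; i < path.size(); ++i) {
    if (i != 0)
      out += " -> ";
    out += path[i];
  }
  return out;
}

std::vector<std::string> runWalkDemo() {
  Program prog;
  prog["vtable for CFileStream"].uses = {"~CFileStream", "operator delete"};
  prog["~CFileStream"];  // present, but its body uses nothing
  prog["operator delete"].uses = {"f"};
  prog["f"].deferredDiags = {"unknown register name 'r0' in asm"};
  prog["unusedHostDevice"].deferredDiags = {"never reported"};

  std::vector<std::string> reports;
  walkFromRoot(prog, "vtable for CFileStream",
               [&](const std::vector<std::string> &path,
                   const std::string &msg) {
                 reports.push_back(joinPath(path) + ": " + msg);
               });
  return reports;
}
```

Because nothing is recorded until the walk, there is no graph to keep consistent during parsing. The diagnostic on `unusedHostDevice` is never reported because no emitted root reaches it.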

Remove the call graph and do a final AST traversal per John's comments.

Herald added a reviewer: jdoerfert. Jan 27 2020, 8:01 PM

Herald added a subscriber: guansong.

bader added a subscriber: bader. Jan 28 2020, 9:18 AM

Fznamznon added a subscriber: Fznamznon. Jan 28 2020, 11:17 PM

yaxunl added a reviewer: ABataev. Jan 30 2020, 8:35 AM

In D70172#1812749, @rjmccall wrote:

...
The "use stack" idea would apply if you switched from eagerly creating the entire use graph to instead just making a late pass that walked function bodies. If you walk function bodies depth-first, starting from a true root and gathering all the ODR-used entities to be recursively walked, then you can maintain a stack of what entities you're currently walking, and that stack is a use-path that explains why you need to emit the current function.

It should be straightforward to build a function that walks over the entities used by a function body and calls a callback by just extracting it out of the code in MarkDeclarationsUsedInExpr.

I updated the patch to remove the explicit call graph and use an AST traversal instead. Since this patch is big, is it OK to leave the tracking of the vtable to a future patch? This patch is sufficient to fix the assertion seen on Windows. Thanks.

rjmccall added inline comments. Jan 30 2020, 10:17 AM

clang/lib/Sema/SemaExpr.cpp
17183	Is there any way to share most of the visitation logic here with the visitor we use in `MarkDeclarationsUsedInExpr`? Maybe make a `UsedDeclVisitor` CRTP class that calls a "asImpl().visitUsedDecl(SourceLocation Loc, Decl *D)" in the right places?

Revised per John's comments.

rjmccall added inline comments. Feb 3 2020, 3:10 PM

clang/lib/Sema/SemaExpr.cpp
17127	This should inherit from `EvaluatedExprVisitor<Derived>`, or else calls from `EvaluatedExprVisitor` and above won't dispatch all the way down to the subclass. This will allow subclasses to do node-specific logic, like your subclass's handling of `InOMPDeviceContext` or `EvaluatedExprMarker`'s need to do custom things with local variables, DREs, and MEs. Please also define this in a header; it doesn't need to be file-specific. I guess it needs a `Sema &` because of the call to `LookupDestructor`, so `lib/Sema` is probably the right place for that header.
17152	Let's not have both a `visitDeclRefExpr` and a `VisitDeclRefExpr`, distinguished only by capitalization.
17158	Please have all these call sites call `asImpl().visitUsedDecl` directly, and then don't define it in this class.
17195	This should be in your OMP-specific subclass.
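
The CRTP shape requested in these comments can be illustrated on a toy expression tree. This sketch does not use Clang's AST or `EvaluatedExprVisitor`. `Expr`, `ToyUsedDeclVisitor`, `UseCollector`, `SkippingCollector`, and `makeSample` are hypothetical names. It shows the two points raised: the base class reports each used declaration through `asImpl().visitUsedDecl(...)` without defining that hook itself, and it recurses through `asImpl()` so a derived class can intercept sub-expressions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy expression node; requires C++17 for a vector of an incomplete type.
struct Expr {
  std::string usedDecl;  // non-empty if this node refers to a declaration
  int loc = 0;           // stand-in for a SourceLocation
  std::vector<Expr> children;
};

template <typename Derived>
class ToyUsedDeclVisitor {
public:
  Derived &asImpl() { return *static_cast<Derived *>(this); }

  void visit(const Expr &e) {
    if (!e.usedDecl.empty())
      asImpl().visitUsedDecl(e.loc, e.usedDecl);  // hook lives only in Derived
    for (const Expr &child : e.children)
      asImpl().visit(child);  // restart through the derived class
  }
};

// Records every used declaration together with its location.
class UseCollector : public ToyUsedDeclVisitor<UseCollector> {
public:
  std::vector<std::string> seen;
  void visitUsedDecl(int loc, const std::string &decl) {
    seen.push_back(decl + "@" + std::to_string(loc));
  }
};

// Node-specific logic in a subclass: skip a whole subtree.
class SkippingCollector : public ToyUsedDeclVisitor<SkippingCollector> {
public:
  std::vector<std::string> seen;
  void visit(const Expr &e) {
    if (e.usedDecl == "hostOnly")
      return;  // neither this node nor its children are reported
    ToyUsedDeclVisitor<SkippingCollector>::visit(e);
  }
  void visitUsedDecl(int, const std::string &decl) { seen.push_back(decl); }
};

// root -> { f@1, hostOnly@2 -> { g@3 } }
Expr makeSample() {
  Expr f;
  f.usedDecl = "f";
  f.loc = 1;
  Expr g;
  g.usedDecl = "g";
  g.loc = 3;
  Expr host;
  host.usedDecl = "hostOnly";
  host.loc = 2;
  host.children.push_back(g);
  Expr root;
  root.children.push_back(f);
  root.children.push_back(host);
  return root;
}
```

If the base class recursed with a plain `visit(child)` instead of `asImpl().visit(child)`, `SkippingCollector` would only get a chance to intercept the root node and would still report `hostOnly` and `g`.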

Revised per John's comments.

rjmccall added inline comments. Feb 3 2020, 8:46 PM

clang/lib/Sema/SemaExpr.cpp
17254	Thanks, this looks a lot better. Should this be moved to SemaOpenMP.cpp (and renamed to be OpenMP-specific), or do you think it's going to be useful in other modes?
clang/lib/Sema/UsedDeclVisitor.h
1 ↗	(On Diff #242251)	Please fix this line.

rjmccall added inline comments.Feb 3 2020, 8:46 PM

clang/lib/Sema/UsedDeclVisitor.h
9 ↗	(On Diff #242251)	"a CRTP class which visits all the declarations that are ODR-used by an expression or statement."
65 ↗	(On Diff #242251)	It's generally best to `asImpl()` when restarting on a sub-expression like this, just in case the derived class wants to do something there. Same thing in `VisitCXXBindTemporaryExpr`.

revised by John's comments.

clang/lib/Sema/SemaExpr.cpp
17254	It is not just for OpenMP. Deferred diagnostics are also emitted by CUDA/HIP.

One minor request, but otherwise LGTM; feel free to commit with that change.

clang/lib/Sema/SemaExpr.cpp
17254	Okay. Can it go in Sema.cpp next to the other overload of `emitDeferredDiags`, then? There isn't really much purpose to it being in this file.

yaxunl marked 2 inline comments as done.Feb 4 2020, 8:08 AM

yaxunl added inline comments.

clang/lib/Sema/SemaExpr.cpp
17254	will do when committing. thanks.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 16 2020, 7:47 PM

Closed by commit rG1b978ddba05c: [CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese (authored by yaxunl). · Explain Why

This revision was automatically updated to reflect the committed changes.

yaxunl marked an inline comment as done.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2020, 7:47 PM

one header is missing and breaks the build

clang/lib/Sema/Sema.cpp
14	this file is missing and breaks the build

MaskRay added a subscriber: MaskRay.Feb 16 2020, 8:36 PM

MaskRay added inline comments.

clang/lib/Sema/Sema.cpp
14	Fixed by c7fa409bcadaf4ddba1862b2e52349e0ab03d1b4

MaskRay mentioned this in rGc7fa409bcada: [CUDA][HIP][OpenMP] Add lib/Sema/UsedDeclVisitor.h after D70172.Feb 16 2020, 8:40 PM

Fznamznon added inline comments.Feb 17 2020, 8:17 AM

clang/lib/Sema/Sema.cpp
1441	This particular change causes duplication of deferred diagnostics. Consider the following example (please correct me if I'm doing something wrong, I'm not an expert in OpenMP):

int foobar1() { throw 1; } // error is expected here

// let's try to use foobar1 in the code where exceptions aren't allowed
#pragma omp declare target
int (*B)() = &foobar1;
#pragma omp end declare target

// and in some other place let's use foobar1 in device code again
#pragma omp declare target
int a = foobar1();
#pragma omp end declare target

Then diagnostic for `foobar1` will be duplicated for each use of `foobar1` under `target` directive. I first experienced this behavior not with OpenMP, so I suppose reproducer can be done for each programming model which uses deferred diagnostics.

yaxunl marked an inline comment as done.Feb 17 2020, 10:27 AM

yaxunl added inline comments.

clang/lib/Sema/Sema.cpp
1441	The change is intentional so that each call chain causing the diagnostic can be identified. The drawback is that it is more verbose. I can change this behavior so that the diagnostic will be emitted only for the first call chain that causes the diagnostic, if less verbose diagnostics are preferred.
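Editorial sketch of the two reporting behaviors discussed in this exchange: one diagnostic per use chain, or only the first chain. It is a toy model under stated assumptions. `ChainPolicy` and `renderDeferred` are hypothetical names, and Clang does not expose such a switch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical policy switch, for illustration only.
enum class ChainPolicy { EveryChain, FirstChainOnly };

// Returns the lines a user would see for one deferred diagnostic in `Fn`
// that is reachable through each entry of `Users`.
std::vector<std::string> renderDeferred(const std::string &Fn,
                                        const std::string &Message,
                                        const std::vector<std::string> &Users,
                                        ChainPolicy Policy) {
  assert(!Users.empty() && "nothing is reported unless something uses Fn");
  std::vector<std::string> Lines;
  for (const std::string &User : Users) {
    Lines.push_back("error: " + Message + " (in '" + Fn + "')");
    Lines.push_back("note: '" + Fn + "' is used by '" + User + "'");
    if (Policy == ChainPolicy::FirstChainOnly)
      break; // less verbose: stop after the first chain
  }
  return Lines;
}
```

With users `B` and `a` from the example above, `EveryChain` yields two error/note pairs for `foobar1` and `FirstChainOnly` yields one.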

This seems to result in triggering the assertion at clang/lib/CodeGen/CGExpr.cpp:2626 when compiling mlir/lib/Transforms/AffineDataCopyGeneration.cpp with a clang built with assertions on (clean build at e8e078c just before this change, broken at this change, assert triggering at the build-fix commit).

https://buildkite.com/mlir/mlir-core/builds/2792#a54fb239-718b-4f0b-a309-f83e46ceb252

In D70172#1879481, @jpienaar wrote:

This seems to result in triggering clang/lib/CodeGen/CGExpr.cpp:2626 when compiling mlir/lib/Transforms/AffineDataCopyGeneration.cpp with clang build with assertions on (clean build at e8e078c just before this change, broken at this, assert triggering at build fix commit).

https://buildkite.com/mlir/mlir-core/builds/2792#a54fb239-718b-4f0b-a309-f83e46ceb252

Seems reasonable to revert if there's a testcase that they can get from rebuilding llvm with mlir enabled.

yaxunl mentioned this in rG36f480f22c25: Revert "[CUDA][HIP][OpenMP] Add lib/Sema/UsedDeclVisitor.h after D70172".Feb 18 2020, 11:48 AM

erichkeane added a subscriber: erichkeane.Feb 19 2020, 8:46 AM

erichkeane added inline comments.

clang/lib/Sema/Sema.cpp
1486	Note that when recommitting this (if you choose to), this needs to also handle NamespaceDecl. We're a downstream and discovered that this doesn't properly handle functions or records handled in a namespace. It can be implemented identically to TranslationUnitDecl.

rjmccall added inline comments.Feb 19 2020, 10:31 AM

clang/lib/Sema/Sema.cpp
1486	Wait, what? We shouldn't be doing this for TranslationUnitDecl either. I don't even know how we're "using" a TranslationUnitDecl, but neither this case nor the case for `NamespaceDecl` should be recursively using every declaration declared inside it. If there's a declaration in a namespace that's being used, it should be getting visited as part of the actual use of it. The logic for `RecordDecl` has the same problem.

erichkeane added inline comments.Feb 19 2020, 10:44 AM

clang/lib/Sema/Sema.cpp
1486	Despite the name, this seems to be more of a home-written ast walking class. The entry point is the 'translation unit' which seems to walk through everything in an attempt to find all the functions (including those that are 'marked' as used by an attribute). You'll see the FunctionDecl section makes this assumption as well (not necessarily that we got to a function via a call). IMO, this approach is strange, and we should register entry points in some manner (functions marked as emitted to the device in some fashion), then just follow its call-graph (via the clang::CallGraph?) to emit all of these functions. It seemed really odd to see this approach here, but it seemed well reviewed by the time I noticed it (via a downstream bug) so I figured I'd lost my chance to disagree with the approach.

rjmccall added inline comments.Feb 19 2020, 10:56 AM

clang/lib/Sema/Sema.cpp
1486	Sure, but `visitUsedDecl` isn't the right place to be entering the walk. `visitUsedDecl` is supposed to be the callback from the walk. If they need to walk all the global declarations to find kernels instead of tracking the kernels as they're encountered (which would be a much better approach), it should be done as a separate function. I just missed this in the review.

Seems to me, it causes some other issues. See https://bugs.llvm.org/show_bug.cgi?id=44948 for example

In D70172#1883567, @ABataev wrote:

Seems to me, it causes some other issues. See https://bugs.llvm.org/show_bug.cgi?id=44948 for example

I will fix that bug.

yaxunl marked an inline comment as done.Feb 19 2020, 2:05 PM

yaxunl added inline comments.

clang/lib/Sema/Sema.cpp
1486	The deferred diagnostics could be initiated by non-kernel functions or even host functions. Let's consider a device code library where no kernels are defined. A device function is emitted, which calls a host device function which has a deferred diagnostic. All device functions that are emitted need to be checked. The same goes for host functions that are emitted, which may call a host device function which has a deferred diagnostic. Also, not just function calls need to be checked. A function's address may be taken and then called through a function pointer. Therefore any reference to a function needs to be followed. In the case of OpenMP, the initialization of a global function pointer which refers to a function may trigger a deferred diagnostic. There are tests for that.

rjmccall added inline comments.Feb 19 2020, 3:32 PM

clang/lib/Sema/Sema.cpp
1486	Right, I get that emitting deferred diagnostics for a declaration D needs to trigger any deferred diagnostics in declarations used by D, recursively. You essentially have a graph of lazily-emitted declarations (which may or may not have deferred diagnostics) and a number of eagerly-emitted "root" declarations with use-edges leading into that graph. Any declaration that's reachable from a root will need to be emitted and so needs to have any deferred diagnostics emitted as well. My question is why you're finding these roots with a retroactive walk of the entire translation unit instead of either building a list of roots as you go or (better yet) building a list of lazily-emitted declarations that are used by those roots. You can unambiguously identify at the point of declaration whether an entity will be eagerly or lazily emitted, right? If you just store those initial edges into the lazily-emitted declarations graph and then initiate the recursive walk from them at the end of the translation unit, you'll only end up walking declarations that are actually relevant to your compilation, so you'll have much better locality and (if this matters to you) you'll naturally work a lot better with PCH and modules.
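Editorial sketch of the root-driven walk described in the comment above: record the eagerly-emitted roots, then follow use edges from them and emit deferred diagnostics only for declarations that are actually reachable. It is a toy model, not Sema code. `UseGraph` and `emitReachableDiags` are hypothetical names, and declarations are modeled as strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct UseGraph {
  // Use edges: declaration -> declarations it references.
  std::map<std::string, std::vector<std::string>> Uses;
  // Declarations carrying a deferred diagnostic, with its message.
  std::map<std::string, std::string> Deferred;
};

// Walks the graph from the recorded roots and returns the deferred
// diagnostics of every reachable declaration, each at most once.
std::vector<std::string> emitReachableDiags(const UseGraph &G,
                                            const std::vector<std::string> &Roots) {
  std::vector<std::string> Emitted;
  std::set<std::string> Visited;
  // Reverse so the first root is processed first.
  std::vector<std::string> Worklist(Roots.rbegin(), Roots.rend());
  while (!Worklist.empty()) {
    std::string D = Worklist.back();
    Worklist.pop_back();
    if (!Visited.insert(D).second)
      continue; // already handled
    auto DiagIt = G.Deferred.find(D);
    if (DiagIt != G.Deferred.end())
      Emitted.push_back(D + ": " + DiagIt->second);
    auto UseIt = G.Uses.find(D);
    if (UseIt != G.Uses.end())
      for (const std::string &Used : UseIt->second)
        Worklist.push_back(Used);
  }
  // Each declaration is visited once, so it reports at most once.
  assert(Emitted.size() <= G.Deferred.size());
  return Emitted;
}
```

Declarations that no root reaches are never touched, which is the locality benefit the comment points out. A global such as a function pointer marked for device emission would be recorded as a root alongside emitted functions.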

yaxunl marked an inline comment as done.Feb 20 2020, 8:57 AM

yaxunl added inline comments.

clang/lib/Sema/Sema.cpp
1486	I will try the approach you suggested. Basically I will record the emitted functions and variables during parsing and use them as the starting point for the final traversal. This should work for CUDA/HIP. However it may be tricky for OpenMP since the emission of some entities depends on pragmas. Still, it may be doable. If I encounter difficulty I will come back for discussion. I will post the change for review. Thanks.

bader added inline comments.Feb 20 2020, 9:06 AM

clang/lib/Sema/Sema.cpp
1486	FYI: SYCL is also using deferred diagnostics engine to emit device side diagnostics, although this part hasn't been up-streamed yet, but we are tracking changes in this area. SYCL support implementation should be quite similar to CUDA/HIP.

I tried recording functions to be emitted during normal parsing and using it as starting point for the final traversal. It is quite promising. I only get one lit test failure for OpenMP:

int foobar2();

#pragma omp declare target
int (*B)() = &foobar2;
#pragma omp end declare target

int foobar2() { throw 1; } // expected-error {{cannot use 'throw' with exceptions disabled}}

In this case, the emission state of foobar2 cannot be determined by itself. It can only be determined to be emitted through variable B. Therefore, I also need to record variables that are potentially emitted.

In D70172#1894036, @yaxunl wrote:
I tried recording functions to be emitted during normal parsing and using it as starting point for the final traversal. It is quite promising. I only get one lit test failure for OpenMP:
int foobar2();

#pragma omp declare target
int (*B)() = &foobar2;
#pragma omp end declare target

int foobar2() { throw 1; } // expected-error {{cannot use 'throw' with exceptions disabled}}
In this case, the emission state of foobar2 cannot be determined by itself. It can only be determined to be emitted through variable B. Therefore, I also need to record variables that are potentially emitted.

Okay. Sounds like you have some common cause with https://reviews.llvm.org/D71227, then. Pinging @hliao.

Also, we cannot remove traversing of RecordDecl and CapturedDecl encountered in function body since we have OpenMP test like this:

int main() {
#pragma omp target
  {
    t1(0);
  }
  return 0;
}

This results in a kernel function embedded in a captured record decl in AST. We have to drill into the record decl to get the kernel and the function called by it.

I still hit an assertion when running check-mlir with the built clang. The reduced testcase is:

class A {
public:
  int foo();
};

static A a;

struct B {
  B(int x = a.foo());
};

void test() {
  B x;
}

The assertion I got is:

clang: /home/yaxunl/git/llvm/llvm/tools/clang/lib/CodeGen/CGExpr.cpp:2628: clang::CodeGen::LValue clang::CodeGen::CodeGenFunction::EmitDeclRefLValue(const clang::DeclRefExpr *): Assertion `(ND->isUsed(false) || !isa<VarDecl>(ND) || E->isNonOdrUse() || !E->getLocation().isValid()) && "Should not use decl without marking it used!"' failed.
Stack dump:


 #0 0x000000000258c614 PrintStackTraceSignalHandler(void*) (/home/yaxunl/git/llvm/assert/bin/clang+0x258c614)
 #1 0x000000000258a1ae llvm::sys::RunSignalHandlers() (/home/yaxunl/git/llvm/assert/bin/clang+0x258a1ae)
 #2 0x000000000258b7a2 llvm::sys::CleanupOnSignal(unsigned long) (/home/yaxunl/git/llvm/assert/bin/clang+0x258b7a2)
 #3 0x000000000251d0c3 (anonymous namespace)::CrashRecoveryContextImpl::HandleCrash(int, unsigned long) (/home/yaxunl/git/llvm/assert/bin/clang+0x251d0c3)
 #4 0x000000000251d1fc CrashRecoverySignalHandler(int) (/home/yaxunl/git/llvm/assert/bin/clang+0x251d1fc)
 #5 0x00007f0dde3bf390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390)
 #6 0x00007f0ddcf29428 raise /build/glibc-LK5gWL/glibc-2.23/signal/../sysdeps/unix/sysv/linux/raise.c:54:0
 #7 0x00007f0ddcf2b02a abort /build/glibc-LK5gWL/glibc-2.23/stdlib/abort.c:91:0
 #8 0x00007f0ddcf21bd7 __assert_fail_base /build/glibc-LK5gWL/glibc-2.23/assert/assert.c:92:0
 #9 0x00007f0ddcf21c82 (/lib/x86_64-linux-gnu/libc.so.6+0x2dc82)
#10 0x0000000002a1a5df clang::CodeGen::CodeGenFunction::EmitDeclRefLValue(clang::DeclRefExpr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a1a5df)
#11 0x0000000002a0dfb6 clang::CodeGen::CodeGenFunction::EmitLValue(clang::Expr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a0dfb6)
#12 0x0000000002a39973 clang::CodeGen::CodeGenFunction::EmitCXXMemberOrOperatorMemberCallExpr(clang::CallExpr const*, clang::CXXMethodDecl const*, clang::CodeGen::ReturnValueSlot, bool, clang::NestedNameSpecifier*, bool, clang::Expr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a39973)
#13 0x0000000002a389b9 clang::CodeGen::CodeGenFunction::EmitCXXMemberCallExpr(clang::CXXMemberCallExpr const*, clang::CodeGen::ReturnValueSlot) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a389b9)
#14 0x0000000002a28f95 clang::CodeGen::CodeGenFunction::EmitCallExpr(clang::CallExpr const*, clang::CodeGen::ReturnValueSlot) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a28f95)
#15 0x0000000002a5be29 (anonymous namespace)::ScalarExprEmitter::VisitCallExpr(clang::CallExpr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a5be29)
#16 0x0000000002a55b19 clang::StmtVisitorBase<std::add_pointer, (anonymous namespace)::ScalarExprEmitter, llvm::Value*>::Visit(clang::Stmt*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a55b19)
#17 0x0000000002a4b615 clang::CodeGen::CodeGenFunction::EmitScalarExpr(clang::Expr const*, bool) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a4b615)
#18 0x0000000002a0da30 clang::CodeGen::CodeGenFunction::EmitAnyExpr(clang::Expr const*, clang::CodeGen::AggValueSlot, bool) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a0da30)
#19 0x0000000002a0edde clang::CodeGen::CodeGenFunction::EmitAnyExprToTemp(clang::Expr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a0edde)
#20 0x00000000029cdd6b clang::CodeGen::CodeGenFunction::EmitCallArg(clang::CodeGen::CallArgList&, clang::Expr const*, clang::QualType) (/home/yaxunl/git/llvm/assert/bin/clang+0x29cdd6b)
#21 0x00000000029ccc41 clang::CodeGen::CodeGenFunction::EmitCallArgs(clang::CodeGen::CallArgList&, llvm::ArrayRef<clang::QualType>, llvm::iterator_range<clang::Stmt::CastIterator<clang::Expr, clang::Expr const* const, clang::Stmt const* const> >, clang::CodeGen::CodeGenFunction::AbstractCallee, unsigned int, clang::CodeGen::CodeGenFunction::EvaluationOrder) (/home/yaxunl/git/llvm/assert/bin/clang+0x29ccc41)
#22 0x00000000028d8e7b void clang::CodeGen::CodeGenFunction::EmitCallArgs<clang::FunctionProtoType>(clang::CodeGen::CallArgList&, clang::FunctionProtoType const*, llvm::iterator_range<clang::Stmt::CastIterator<clang::Expr, clang::Expr const* const, clang::Stmt const* const> >, clang::CodeGen::CodeGenFunction::AbstractCallee, unsigned int, clang::CodeGen::CodeGenFunction::EvaluationOrder) (/home/yaxunl/git/llvm/assert/bin/clang+0x28d8e7b)
#23 0x00000000029de431 clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall(clang::CXXConstructorDecl const*, clang::CXXCtorType, bool, bool, clang::CodeGen::AggValueSlot, clang::CXXConstructExpr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x29de431)
#24 0x0000000002a3b84e clang::CodeGen::CodeGenFunction::EmitCXXConstructExpr(clang::CXXConstructExpr const*, clang::CodeGen::AggValueSlot) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a3b84e)
#25 0x0000000002a32a8b (anonymous namespace)::AggExprEmitter::VisitCXXConstructExpr(clang::CXXConstructExpr const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a32a8b)
#26 0x0000000002a2d44f clang::CodeGen::CodeGenFunction::EmitAggExpr(clang::Expr const*, clang::CodeGen::AggValueSlot) (/home/yaxunl/git/llvm/assert/bin/clang+0x2a2d44f)
#27 0x00000000029f96fc clang::CodeGen::CodeGenFunction::EmitExprAsInit(clang::Expr const*, clang::ValueDecl const*, clang::CodeGen::LValue, bool) (/home/yaxunl/git/llvm/assert/bin/clang+0x29f96fc)
#28 0x00000000029f68d9 clang::CodeGen::CodeGenFunction::EmitAutoVarInit(clang::CodeGen::CodeGenFunction::AutoVarEmission const&) (/home/yaxunl/git/llvm/assert/bin/clang+0x29f68d9)
#29 0x00000000029f1ca5 clang::CodeGen::CodeGenFunction::EmitVarDecl(clang::VarDecl const&) (/home/yaxunl/git/llvm/assert/bin/clang+0x29f1ca5)
#30 0x00000000029f1935 clang::CodeGen::CodeGenFunction::EmitDecl(clang::Decl const&) (/home/yaxunl/git/llvm/assert/bin/clang+0x29f1935)
#31 0x00000000027e07fb clang::CodeGen::CodeGenFunction::EmitDeclStmt(clang::DeclStmt const&) (/home/yaxunl/git/llvm/assert/bin/clang+0x27e07fb)
#32 0x00000000027d7a4c clang::CodeGen::CodeGenFunction::EmitSimpleStmt(clang::Stmt const*) (/home/yaxunl/git/llvm/assert/bin/clang+0x27d7a4c)
#33 0x00000000027d66cb clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) (/home/yaxunl/git/llvm/assert/bin/clang+0x27d66cb)
#34 0x00000000027e15f0 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/home/yaxunl/git/llvm/assert/bin/clang+0x27e15f0)
#35 0x000000000282ffb6 clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/home/yaxunl/git/llvm/assert/bin/clang+0x282ffb6)
#36 0x000000000284dc52 clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/home/yaxunl/git/llvm/assert/bin/clang+0x284dc52)
#37 0x0000000002845cc7 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2845cc7)
#38 0x0000000002852271 clang::CodeGen::CodeGenModule::EmitTopLevelDecl(clang::Decl*) (/home/yaxunl/git/llvm/assert/bin/clang+0x2852271)

It is weird since this is neither an OpenMP nor a CUDA program and there are no deferred diags involved.

It seems my change somehow caused some decl to miss its used flag.

Do not traverse the whole CU. Record potentially emitted functions and variables in the normal parsing and traverse them instead.

Also fixed bug 44948 and regression in check-mlir.

Ping

rnk removed a subscriber: rnk.Mar 6 2020, 2:14 PM

rjmccall added inline comments.Mar 7 2020, 11:57 AM

clang/include/clang/Sema/Sema.h
1432	This needs to be saved and restored in modules / PCH.
clang/lib/Sema/Sema.cpp
1444	Hmm. I know this is existing code, but I just realized something. I think it's okay to not emit the notes on every diagnostic, but you might want to emit them on the first diagnostic from a function instead of after the last. If the real bug is that the program is using something it's not supposed to use, and there are enough errors in that function to reach the error limit, then the diagnostics emitter will associate these notes with a diagnostic it's suppressing and so presumably suppress them as well, leaving the user with no way to find this information.
1466	This needs to trigger if you use a variable with delayed diagnostics, too, right? When you add these methods to `UsedDeclVisitor`, you'll be able to remove them here.
1484	Should this also go in the base `UsedDeclVisitor`? I'm less sure about that because the captured statement is really always a part of the enclosing function, right? Should the delay mechanism just be looking through local sub-contexts instead?
clang/lib/Sema/UsedDeclVisitor.h
21 ↗	(On Diff #247275)	Could you add this in a separate patch?
30 ↗	(On Diff #247275)	There should definitely be cases in here for every expression that uses a declaration, including both `DeclRefExpr` and `MemberExpr`. Those might be overridden in subclasses, but that's their business; the default behavior should be to visit every used decl.
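Editorial sketch of the note-placement concern raised for line 1444 above: if notes share the fate of the error they follow, call-chain notes attached to the last error in a function are lost once the error limit is reached, while notes attached to the first error survive. This is a toy model under that assumption, not Clang's diagnostics engine. `ToyDiag` and `emitWithLimit` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct ToyDiag {
  bool IsNote;
  std::string Text;
};

// Toy emitter: errors beyond ErrorLimit are dropped, and a note is shown
// only if the error immediately preceding it was shown.
std::vector<std::string> emitWithLimit(const std::vector<ToyDiag> &Diags,
                                       unsigned ErrorLimit) {
  assert(ErrorLimit > 0 && "a zero limit would hide everything in this model");
  std::vector<std::string> Shown;
  unsigned ErrorsSeen = 0;
  bool ParentShown = false;
  for (const ToyDiag &D : Diags) {
    if (!D.IsNote) {
      ParentShown = ErrorsSeen < ErrorLimit;
      ++ErrorsSeen;
    }
    if (ParentShown)
      Shown.push_back((D.IsNote ? "note: " : "error: ") + D.Text);
  }
  return Shown;
}
```

With three errors and a limit of two, a note placed after the third error is dropped, whereas the same note placed after the first error is kept.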

yaxunl marked 23 inline comments as done.Mar 16 2020, 6:01 PM

yaxunl added inline comments.

clang/include/clang/Sema/Sema.h
1432	done
clang/lib/Sema/Sema.cpp
1441	The change is intentional: it reports all use chains that result in deferred diagnostics. Otherwise the user may fix one issue and then see another, instead of seeing all of the issues in one compilation.
1444	done
1466	fixed
1484	Yes, this one should also go to UsedDeclVisitor, since this statement causes a RecordDecl to be generated which includes a FunctionDecl for a kernel; therefore this RecordDecl needs to be visited as a used decl. I am not sure if other sub-contexts have the same effect. If so, I think they need to be handled case by case.
clang/lib/Sema/UsedDeclVisitor.h
21 ↗	(On Diff #247275)	extracted to https://reviews.llvm.org/D76262
30 ↗	(On Diff #247275)	done

revised by John's comments.

rjmccall added inline comments.Mar 16 2020, 7:30 PM

clang/lib/Sema/Sema.cpp
1486	Okay, thank you. Do you still need all the cases in here for records, templates, and so on? It looks to me like you should always end up here with exactly the variables and functions that are being used, and you should never need to make special efforts to e.g. visit all the specializations of a template or visit all the methods of a class.
1505	Can there also be deferred diagnostics associated with this initializer?
clang/lib/Sema/SemaDecl.cpp
12229	`DeclsToCheckForDeferredDiags` is basically a set of declarations that you know to have to emit, right? It doesn't seem right to be adding every variable with an initializer to that set — especially because I'm pretty sure this function gets called for literally every variable with an initializer, including local variables. Presumably you only need to do this for global variables that you're definitely going to emit in the current mode.

revised by John's comments.

clang/lib/Sema/Sema.cpp
1486	I can remove handling of templates and records. However I have to keep the handling of CapturedDecl. It is generated from code like

void t1(int r) {}

int main() {
#pragma omp target
  {
    t1(0);
  }
  return 0;
}

And it is like a function decl embedded in function main, e.g.

-FunctionDecl 0x86f7c70 <line:8:1, line:15:1> line:8:5 main 'int ()'
 `-CompoundStmt 0x873c3f8 <col:12, line:15:1>
   |-OMPTargetDirective 0x873c3a0 <line:9:1, col:19>
   | `-CapturedStmt 0x873c378 <line:10:3, line:13:3>
   |   `-CapturedDecl 0x873bd18 <<invalid sloc>> <invalid sloc> nothrow
   |     |-CapturedStmt 0x873c350 <line:10:3, line:13:3>
   |     | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
   |     |   |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
   |     |   | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
   |     |   |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void ()(int)' <FunctionToPointerDecay>
   |     |   |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 0x86f7b18 't1' 'void (int)'
   |     |   |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
   |     |   `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) const restrict'
   |     |-AlwaysInlineAttr 0x873c040 <<invalid sloc>> Implicit __forceinline
   |     |-ImplicitParamDecl 0x873bda0 <col:1> col:1 implicit .global_tid. 'const int'
   |     |-ImplicitParamDecl 0x873be08 <col:1> col:1 implicit .part_id. 'const int const restrict'
   |     |-ImplicitParamDecl 0x873be70 <col:1> col:1 implicit .privates. 'void const restrict'
   |     |-ImplicitParamDecl 0x873bed8 <col:1> col:1 implicit .copy_fn. 'void (const restrict)(void const restrict, ...)'
   |     |-ImplicitParamDecl 0x873bf40 <col:1> col:1 implicit .task_t. 'void const'
   |     |-ImplicitParamDecl 0x873bfd8 <col:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) const restrict'
   |     |-RecordDecl 0x873c098 <col:1> col:1 implicit struct definition
   |     | `-CapturedRecordAttr 0x873c140 <<invalid sloc>> Implicit
   |     `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
   |       |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
   |       | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
   |       |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void ()(int)' <FunctionToPointerDecay>
   |       |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 0x86f7b18 't1' 'void (int)'
   |       |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
   |       `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) const restrict'
   `-ReturnStmt 0x873c3e8 <line:14:3, col:10>
     `-IntegerLiteral 0x873c3c8 <col:10> 'int' 0

If I do not handle it, I will not be able to reach the call of t1().
1505	Yes. A global variable may be marked by omp declare target directive to be emitted on device. If the global var is initialized with the address of a function, the function will be emitted on device. If the device function calls a host device function which contains a deferred diag, that diag will be emitted. This can only be known after everything is parsed.
clang/lib/Sema/SemaDecl.cpp
12229	Yes we only need to check global variables. Fixed.

rjmccall added inline comments.Mar 18 2020, 10:37 AM

clang/lib/Sema/Sema.cpp
1486	Sure, although I wonder if it might be more reasonable to just make UsedDeclVisitor walk into `CapturedDecl`s (and `BlockDecl`s) when it sees the corresponding statements/expressions. Unlike other declaration references, those are never "cross-references"; they're just local code tied to a declaration for representational reasons.
1505	I meant directly with the initializer. Is there a way today to defer a diagnostic that you would emit while processing an initializer expression? If so, this needs to trigger that.

yaxunl marked 4 inline comments as done.Mar 18 2020, 1:43 PM

yaxunl added inline comments.

clang/lib/Sema/Sema.cpp
1486	done
1505	I don't think the initializer itself (without a declare target directive) will cause a deferred diagnostic, since it does not change the emission state of any function.

revised by John's comments

This looks good, assuming there's either no issue with the lazy emission of variables or that you just intend to tackle that later.

clang/lib/Sema/Sema.cpp
1505	Okay, so if I'm getting this right: only functions are emitted lazily, and variables have to be marked specially in order to get emitted on the device, so there's no need to defer diagnostics within variable initializations because we always know at the time of processing the variable where it will be emitted?

yaxunl marked 2 inline comments as done.Mar 18 2020, 3:18 PM

yaxunl added inline comments.

clang/lib/Sema/Sema.cpp
1505	right.

Hi @yaxunl! I'm working on upgrading a large codebase from LLVM-9 to LLVM-12. I noticed an average 10% compilation speed regression that seems to be caused by this change. We use Clang modules and historically provide the -fopenmp compiler flag by default. The problem seems to be that compiling and importing modules is now slower, with the generated module size increased by 2X. The llvm-bcanalyzer tool shows that it's dominated by DECLS_TO_CHECK_FOR_DEFERRED_DIAGS. If I understand it right, your change is only relevant when target offloading is used. I inspected all of our #pragma omp directives and can confirm that we don't use it.

I see that most of this code is gated by the OpenMP flag. I wonder if there is a finer-grained way to enable OpenMP parallel code generation without target offloading? Would it make sense to extend this code to check if -fopenmp-targets is set before recording DECLS_TO_CHECK_FOR_DEFERRED_DIAGS?

Note, this was measured with @weiwang's https://reviews.llvm.org/D101793.

Herald added a subscriber: sstefan1. · View Herald TranscriptAug 24 2021, 6:34 PM

In D70172#2964118, @sugak wrote:

Hi @yaxunl! I'm working on upgrading a large codebase from LLVM-9 to LLVM-12. I noticed an average 10% compilation speed regression that seems to be caused by this change. We use Clang modules and historically provide the -fopenmp compiler flag by default. The problem seems to be that compiling and importing modules is now slower, with the generated module size increased by 2X. The llvm-bcanalyzer tool shows that it's dominated by DECLS_TO_CHECK_FOR_DEFERRED_DIAGS. If I understand it right, your change is only relevant when target offloading is used. I inspected all of our #pragma omp directives and can confirm that we don't use it.

I see that most of this code is gated by the OpenMP flag. I wonder if there is a finer-grained way to enable OpenMP parallel code generation without target offloading? Would it make sense to extend this code to check if -fopenmp-targets is set before recording DECLS_TO_CHECK_FOR_DEFERRED_DIAGS?

Note, this was measured with @weiwang's https://reviews.llvm.org/D101793.

@yaxunl We did an internal measurement by not adding decls into deferred diags, and that resolves the build regression. Wonder if we can have a special case for emitting diag as they are encountered when everything is on host side.

wenlei added a subscriber: wenlei.Aug 30 2021, 11:49 PM

This patch seems to cause a new crash, details are at https://bugs.llvm.org/show_bug.cgi?id=52250.

In D70172#3077696, @hokein wrote:

This patch seems to cause a new crash, details are at https://bugs.llvm.org/show_bug.cgi?id=52250.

I will take a look. Thanks.

Revision Contents

Path	Size
clang/include/clang/Sema/Sema.h	46 lines
clang/lib/Sema/ (5 files)	82, 19, 18, 146, and 156 lines
clang/test/OpenMP/declare_target_messages.cpp	12 lines
clang/test/OpenMP/nvptx_target_exceptions_messages.cpp	4 lines
clang/test/SemaCUDA/bad-calls-on-same-line.cu	4 lines
clang/test/SemaCUDA/call-device-fn-from-host.cu	4 lines
clang/test/SemaCUDA/call-host-fn-from-device.cu	4 lines
clang/test/SemaCUDA/openmp-target.cu	4 lines
clang/test/SemaCUDA/trace-through-global.cu	2 lines

Diff 240760

clang/include/clang/Sema/Sema.h

... 1,420 lines not shown ...	public:
  /// Calls \c Lexer::getLocForEndOfToken()
  SourceLocation getLocForEndOfToken(SourceLocation Loc, unsigned Offset = 0);

  /// Retrieve the module loader associated with the preprocessor.
  ModuleLoader &getModuleLoader() const;

  void emitAndClearUnusedLocalTypedefWarnings();

+ // Emit all deferred diagnostics.
+ void emitDeferredDiags();
+ // Emit any deferred diagnostics for FD and erase them from the map in which
+ // they're stored.
	rjmccall: This needs to be saved and restored in modules / PCH.
	yaxunl: done
+ void emitDeferredDiags(FunctionDecl *FD, bool ShowCallStack);

  enum TUFragmentKind {
  /// The global module fragment, between 'module;' and a module-declaration.
  Global,
  /// A normal translation unit fragment. For a non-module unit, this is the
  /// entire translation unit. Otherwise, it runs from the module-declaration
  /// to the private-module-fragment (if any) or the end of the TU (if not).
  Normal,
  /// The private module fragment, between 'module :private;' and the end of
[2,171 lines not shown]	public:
/// Status of the function emission on the CUDA/HIP/OpenMP host/device attrs.		/// Status of the function emission on the CUDA/HIP/OpenMP host/device attrs.
enum class FunctionEmissionStatus {		enum class FunctionEmissionStatus {
Emitted,		Emitted,
CUDADiscarded, // Discarded due to CUDA/HIP hostness		CUDADiscarded, // Discarded due to CUDA/HIP hostness
OMPDiscarded, // Discarded due to OpenMP hostness		OMPDiscarded, // Discarded due to OpenMP hostness
TemplateDiscarded, // Discarded due to uninstantiated templates		TemplateDiscarded, // Discarded due to uninstantiated templates
Unknown,		Unknown,
};		};
FunctionEmissionStatus getEmissionStatus(FunctionDecl *Decl);		FunctionEmissionStatus getEmissionStatus(FunctionDecl *Decl,
		bool Final = false);

// Whether the callee should be ignored in CUDA/HIP/OpenMP host/device check.		// Whether the callee should be ignored in CUDA/HIP/OpenMP host/device check.
bool shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee);		bool shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee);

void ArgumentDependentLookup(DeclarationName Name, SourceLocation Loc,		void ArgumentDependentLookup(DeclarationName Name, SourceLocation Loc,
ArrayRef<Expr *> Args, ADLResult &Functions);		ArrayRef<Expr *> Args, ADLResult &Functions);

void LookupVisibleDecls(Scope *S, LookupNameKind Kind,		void LookupVisibleDecls(Scope *S, LookupNameKind Kind,
[5,881 lines not shown]	private:
int getNumberOfConstructScopes(unsigned Level) const;		int getNumberOfConstructScopes(unsigned Level) const;

/// Push new OpenMP function region for non-capturing function.		/// Push new OpenMP function region for non-capturing function.
void pushOpenMPFunctionRegion();		void pushOpenMPFunctionRegion();

/// Pop OpenMP function region for non-capturing function.		/// Pop OpenMP function region for non-capturing function.
void popOpenMPFunctionRegion(const sema::FunctionScopeInfo *OldFSI);		void popOpenMPFunctionRegion(const sema::FunctionScopeInfo *OldFSI);

/// Check whether we're allowed to call Callee from the current function.
void checkOpenMPDeviceFunction(SourceLocation Loc, FunctionDecl *Callee,
bool CheckForDelayedContext = true);

/// Check whether we're allowed to call Callee from the current function.
void checkOpenMPHostFunction(SourceLocation Loc, FunctionDecl *Callee,
bool CheckCaller = true);

/// Check if the expression is allowed to be used in expressions for the		/// Check if the expression is allowed to be used in expressions for the
/// OpenMP devices.		/// OpenMP devices.
void checkOpenMPDeviceExpr(const Expr *E);		void checkOpenMPDeviceExpr(const Expr *E);

/// Finishes analysis of the deferred functions calls that may be declared as
/// host/nohost during device/host compilation.
void finalizeOpenMPDelayedAnalysis();

/// Checks if a type or a declaration is disabled due to the owning extension		/// Checks if a type or a declaration is disabled due to the owning extension
/// being disabled, and emits diagnostic messages if it is disabled.		/// being disabled, and emits diagnostic messages if it is disabled.
/// \param D type or declaration to be checked.		/// \param D type or declaration to be checked.
/// \param DiagLoc source location for the diagnostic message.		/// \param DiagLoc source location for the diagnostic message.
/// \param DiagInfo information to be emitted for the diagnostic message.		/// \param DiagInfo information to be emitted for the diagnostic message.
/// \param SrcRange source range of the declaration.		/// \param SrcRange source range of the declaration.
/// \param Map maps type or declaration to the extensions.		/// \param Map maps type or declaration to the extensions.
/// \param Selector selects diagnostic message: 0 for type and 1 for		/// \param Selector selects diagnostic message: 0 for type and 1 for
[168 lines not shown]	public:
/// Called on correct id-expression from the '#pragma omp declare target'.		/// Called on correct id-expression from the '#pragma omp declare target'.
void ActOnOpenMPDeclareTargetName(NamedDecl *ND, SourceLocation Loc,		void ActOnOpenMPDeclareTargetName(NamedDecl *ND, SourceLocation Loc,
OMPDeclareTargetDeclAttr::MapTypeTy MT,		OMPDeclareTargetDeclAttr::MapTypeTy MT,
OMPDeclareTargetDeclAttr::DevTypeTy DT);		OMPDeclareTargetDeclAttr::DevTypeTy DT);
/// Check declaration inside target region.		/// Check declaration inside target region.
void		void
checkDeclIsAllowedInOpenMPTarget(Expr *E, Decl *D,		checkDeclIsAllowedInOpenMPTarget(Expr *E, Decl *D,
SourceLocation IdLoc = SourceLocation());		SourceLocation IdLoc = SourceLocation());
		/// Finishes analysis of the deferred functions calls that may be declared as
		/// host/nohost during device/host compilation.
		void finalizeOpenMPDelayedAnalysis(const FunctionDecl *Caller,
		const FunctionDecl *Callee,
		SourceLocation Loc);
/// Return true inside OpenMP declare target region.		/// Return true inside OpenMP declare target region.
bool isInOpenMPDeclareTargetContext() const {		bool isInOpenMPDeclareTargetContext() const {
return DeclareTargetNestingLevel > 0;		return DeclareTargetNestingLevel > 0;
}		}
/// Return true inside OpenMP target region.		/// Return true inside OpenMP target region.
bool isInOpenMPTargetExecutionDirective() const;		bool isInOpenMPTargetExecutionDirective() const;

/// Return the number of captured regions created for an OpenMP directive.		/// Return the number of captured regions created for an OpenMP directive.
[1,314 lines not shown]	public:
/// known-emitted callers (plus the location of the call).		/// known-emitted callers (plus the location of the call).
///		///
/// Functions that we can tell a priori must be emitted aren't added to this		/// Functions that we can tell a priori must be emitted aren't added to this
/// map.		/// map.
llvm::DenseMap</* Callee = */ CanonicalDeclPtr<FunctionDecl>,		llvm::DenseMap</* Callee = */ CanonicalDeclPtr<FunctionDecl>,
/* Caller = */ FunctionDeclAndLoc>		/* Caller = */ FunctionDeclAndLoc>
DeviceKnownEmittedFns;		DeviceKnownEmittedFns;

/// A partial call graph maintained during CUDA/OpenMP device code compilation
/// to support deferred diagnostics.
///
/// Functions are only added here if, at the time they're considered, they are
/// not known-emitted. As soon as we discover that a function is
/// known-emitted, we remove it and everything it transitively calls from this
/// set and add those functions to DeviceKnownEmittedFns.
llvm::DenseMap</* Caller = */ CanonicalDeclPtr<FunctionDecl>,
/* Callees = */ llvm::MapVector<CanonicalDeclPtr<FunctionDecl>,
SourceLocation>>
DeviceCallGraph;

/// Diagnostic builder for CUDA/OpenMP devices errors which may or may not be		/// Diagnostic builder for CUDA/OpenMP devices errors which may or may not be
/// deferred.		/// deferred.
///		///
/// In CUDA, there exist constructs (e.g. variable-length arrays, try/catch)		/// In CUDA, there exist constructs (e.g. variable-length arrays, try/catch)
/// which are not allowed to appear inside __device__ functions and are		/// which are not allowed to appear inside __device__ functions and are
/// allowed to appear in __host__ __device__ functions only if the host+device		/// allowed to appear in __host__ __device__ functions only if the host+device
/// function is never codegen'ed.		/// function is never codegen'ed.
///		///
[58 lines not shown]	private:
bool ShowCallStack;		bool ShowCallStack;

// Invariant: At most one of these Optionals has a value.		// Invariant: At most one of these Optionals has a value.
// FIXME: Switch these to a Variant once that exists.		// FIXME: Switch these to a Variant once that exists.
llvm::Optional<SemaDiagnosticBuilder> ImmediateDiag;		llvm::Optional<SemaDiagnosticBuilder> ImmediateDiag;
llvm::Optional<unsigned> PartialDiagId;		llvm::Optional<unsigned> PartialDiagId;
};		};

/// Indicate that this function (and thus everything it transtively calls)
/// will be codegen'ed, and emit any deferred diagnostics on this function and
/// its (transitive) callees.
void markKnownEmitted(
Sema &S, FunctionDecl *OrigCaller, FunctionDecl *OrigCallee,
SourceLocation OrigLoc,
const llvm::function_ref<bool(Sema &, FunctionDecl *)> IsKnownEmitted);

/// Creates a DeviceDiagBuilder that emits the diagnostic if the current context		/// Creates a DeviceDiagBuilder that emits the diagnostic if the current context
/// is "used as device code".		/// is "used as device code".
///		///
/// - If CurContext is a __host__ function, does not emit any diagnostics.		/// - If CurContext is a __host__ function, does not emit any diagnostics.
/// - If CurContext is a __device__ or __global__ function, emits the		/// - If CurContext is a __device__ or __global__ function, emits the
/// diagnostics immediately.		/// diagnostics immediately.
/// - If CurContext is a __host__ __device__ function and we are compiling for		/// - If CurContext is a __host__ __device__ function and we are compiling for
/// the device, creates a diagnostic which is emitted if and when we realize		/// the device, creates a diagnostic which is emitted if and when we realize
[852 lines not shown]

clang/lib/Sema/Sema.cpp

//===--- Sema.cpp - AST Builder and Semantic Analysis Implementation ------===//		//===--- Sema.cpp - AST Builder and Semantic Analysis Implementation ------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file implements the actions class which performs semantic analysis and		// This file implements the actions class which performs semantic analysis and
// builds an AST out of a parse stream.		// builds an AST out of a parse stream.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/AST/ASTContext.h"		#include "clang/AST/ASTContext.h"
		hliao: this file is missing and breaks the build
		MaskRay: Fixed by c7fa409bcadaf4ddba1862b2e52349e0ab03d1b4
#include "clang/AST/ASTDiagnostic.h"		#include "clang/AST/ASTDiagnostic.h"
#include "clang/AST/DeclCXX.h"		#include "clang/AST/DeclCXX.h"
#include "clang/AST/DeclFriend.h"		#include "clang/AST/DeclFriend.h"
#include "clang/AST/DeclObjC.h"		#include "clang/AST/DeclObjC.h"
#include "clang/AST/Expr.h"		#include "clang/AST/Expr.h"
#include "clang/AST/ExprCXX.h"		#include "clang/AST/ExprCXX.h"
#include "clang/AST/PrettyDeclStackTrace.h"		#include "clang/AST/PrettyDeclStackTrace.h"
#include "clang/AST/StmtCXX.h"		#include "clang/AST/StmtCXX.h"
[900 lines not shown]	PendingInstantiations.insert(PendingInstantiations.begin(),
Pending.begin(), Pending.end());		Pending.begin(), Pending.end());
}		}

{		{
llvm::TimeTraceScope TimeScope("PerformPendingInstantiations");		llvm::TimeTraceScope TimeScope("PerformPendingInstantiations");
PerformPendingInstantiations();		PerformPendingInstantiations();
}		}

// Finalize analysis of OpenMP-specific constructs.		emitDeferredDiags();
if (LangOpts.OpenMP)
finalizeOpenMPDelayedAnalysis();

assert(LateParsedInstantiations.empty() &&		assert(LateParsedInstantiations.empty() &&
"end of TU template instantiation should not create more "		"end of TU template instantiation should not create more "
"late-parsed templates");		"late-parsed templates");

// Report diagnostics for uncorrected delayed typos. Ideally all of them		// Report diagnostics for uncorrected delayed typos. Ideally all of them
// should have been corrected by that time, but it is very hard to cover all		// should have been corrected by that time, but it is very hard to cover all
// cases in practice.		// cases in practice.
[477 lines not shown]	while (FnIt != S.DeviceKnownEmittedFns.end()) {
Builder.setForceEmit();		Builder.setForceEmit();

FnIt = S.DeviceKnownEmittedFns.find(FnIt->second.FD);		FnIt = S.DeviceKnownEmittedFns.find(FnIt->second.FD);
}		}
}		}

// Emit any deferred diagnostics for FD and erase them from the map in which		// Emit any deferred diagnostics for FD and erase them from the map in which
// they're stored.		// they're stored.
static void emitDeferredDiags(Sema &S, FunctionDecl *FD, bool ShowCallStack) {		void Sema::emitDeferredDiags(FunctionDecl *FD, bool ShowCallStack) {
auto It = S.DeviceDeferredDiags.find(FD);		auto It = DeviceDeferredDiags.find(FD);
if (It == S.DeviceDeferredDiags.end())		if (It == DeviceDeferredDiags.end())
return;		return;
bool HasWarningOrError = false;		bool HasWarningOrError = false;
for (PartialDiagnosticAt &PDAt : It->second) {		for (PartialDiagnosticAt &PDAt : It->second) {
const SourceLocation &Loc = PDAt.first;		const SourceLocation &Loc = PDAt.first;
const PartialDiagnostic &PD = PDAt.second;		const PartialDiagnostic &PD = PDAt.second;
HasWarningOrError |= S.getDiagnostics().getDiagnosticLevel(		HasWarningOrError |= getDiagnostics().getDiagnosticLevel(
PD.getDiagID(), Loc) >= DiagnosticsEngine::Warning;		PD.getDiagID(), Loc) >= DiagnosticsEngine::Warning;
DiagnosticBuilder Builder(S.Diags.Report(Loc, PD.getDiagID()));		DiagnosticBuilder Builder(Diags.Report(Loc, PD.getDiagID()));
Builder.setForceEmit();		Builder.setForceEmit();
PD.Emit(Builder);		PD.Emit(Builder);
}		}
S.DeviceDeferredDiags.erase(It);
Fznamznon: This particular change causes duplication of deferred diagnostics. Consider the following example (please correct me if I'm doing something wrong, I'm not an expert in OpenMP):

    int foobar1() { throw 1; } // error is expected here

    // let's try to use foobar1 in the code where exceptions aren't allowed
    #pragma omp declare target
    int (*B)() = &foobar1;
    #pragma omp end declare target

    // and in some other place let's use foobar1 in device code again
    #pragma omp declare target
    int a = foobar1();
    #pragma omp end declare target

Then the diagnostic for `foobar1` will be duplicated for each use of `foobar1` under the `target` directive. I first experienced this behavior outside OpenMP, so I suppose a reproducer can be written for each programming model that uses deferred diagnostics.
yaxunl (Author): The change is intentional so that each call chain causing the diagnostic can be identified. The drawback is that it is more verbose. I can change this behavior so that the diagnostic is emitted only for the first call chain that causes it, if less verbose diagnostics are preferred.
yaxunl (Author): The change is intentional: it reports all use chains that result in deferred diagnostics. Otherwise the user may fix one issue and then see another, instead of seeing all of the issues in one compilation.
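The trade-off discussed in this thread can be sketched in isolation. The following is a minimal toy model, not the patch's code: `DeferredDiagMap`, `DeferredDiagEmitter`, and `OncePerFunction` are hypothetical names standing in for `Sema::DeviceDeferredDiags` and the emission logic. It only shows how keeping the map entry produces one report per use chain, while remembering already-reported functions produces a single report.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for Sema::DeviceDeferredDiags:
// function name -> messages deferred for that function.
using DeferredDiagMap = std::map<std::string, std::vector<std::string>>;

struct DeferredDiagEmitter {
  const DeferredDiagMap &Deferred;
  bool OncePerFunction;                  // true: report each function once
  std::set<std::string> AlreadyReported; // functions already diagnosed
  std::vector<std::string> Output;       // collected diagnostic text

  DeferredDiagEmitter(const DeferredDiagMap &D, bool Once)
      : Deferred(D), OncePerFunction(Once) {}

  // Called once for every use of Fn that is reached from emitted code.
  void emitFor(const std::string &Fn, const std::string &UsedFrom) {
    assert(!Fn.empty() && "a function name is required");
    auto It = Deferred.find(Fn);
    if (It == Deferred.end())
      return; // nothing was deferred for this function
    // In once-per-function mode, skip functions we have already reported.
    if (OncePerFunction && !AlreadyReported.insert(Fn).second)
      return;
    for (const std::string &Msg : It->second)
      Output.push_back(Fn + ": " + Msg + " (used from " + UsedFrom + ")");
  }
};
```

With the reproducer above, calling `emitFor("foobar1", "B")` and then `emitFor("foobar1", "a")` yields two reports when `OncePerFunction` is false and one when it is true.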

// FIXME: Should this be called after every warning/error emitted in the loop		// FIXME: Should this be called after every warning/error emitted in the loop
// above, instead of just once per function? That would be consistent with		// above, instead of just once per function? That would be consistent with
// how we handle immediate errors, but it also seems like a bit much.		// how we handle immediate errors, but it also seems like a bit much.
if (HasWarningOrError && ShowCallStack)		if (HasWarningOrError && ShowCallStack)
emitCallStackNotes(S, FD);		emitCallStackNotes(*this, FD);
		rjmccall: Hmm. I know this is existing code, but I just realized something. I think it's okay to not emit the notes on every diagnostic, but you might want to emit them on the first diagnostic from a function instead of after the last. If the real bug is that the program is using something it's not supposed to use, and there are enough errors in that function to reach the error limit, then the diagnostics emitter will associate these notes with a diagnostic it's suppressing and so presumably suppress them as well, leaving the user with no way to find this information.
		yaxunl (Author): done
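The concern about note placement can be illustrated with a toy model. This sketch assumes, as the comment above presumes, that an engine which hits its error limit suppresses both the error and any notes attached to it. `ToyEngine`, `NotePlacement`, and `emitDeferred` are hypothetical names and do not correspond to Clang's `DiagnosticsEngine` API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy diagnostics engine: errors past ErrorLimit are suppressed, and a note
// that follows a suppressed error is suppressed with it (an assumption).
struct ToyEngine {
  unsigned ErrorLimit;
  unsigned NumErrors = 0;
  bool LastSuppressed = false;
  std::vector<std::string> Out;

  explicit ToyEngine(unsigned Limit) : ErrorLimit(Limit) {}

  void error(const std::string &Msg) {
    if (NumErrors >= ErrorLimit) {
      LastSuppressed = true;
      return;
    }
    ++NumErrors;
    LastSuppressed = false;
    Out.push_back("error: " + Msg);
  }

  void note(const std::string &Msg) {
    if (!LastSuppressed)
      Out.push_back("note: " + Msg);
  }
};

enum class NotePlacement { AfterFirst, AfterLast };

// Emit all deferred errors of one function and attach the call-stack note
// either to the first or to the last of them.
void emitDeferred(ToyEngine &E, const std::vector<std::string> &Diags,
                  const std::string &CallStackNote, NotePlacement P) {
  assert(!Diags.empty() && "expected at least one deferred diagnostic");
  for (std::size_t I = 0; I < Diags.size(); ++I) {
    E.error(Diags[I]);
    bool AttachHere = (P == NotePlacement::AfterFirst && I == 0) ||
                      (P == NotePlacement::AfterLast && I + 1 == Diags.size());
    if (AttachHere)
      E.note(CallStackNote);
  }
}
```

With an error limit of 2 and three deferred errors, `AfterLast` loses the note because it follows a suppressed error, while `AfterFirst` keeps it.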
}		}

// In CUDA, there are some constructs which may appear in semantically-valid		// In CUDA, there are some constructs which may appear in semantically-valid
// code, but trigger errors if we ever generate code for the function in which		// code, but trigger errors if we ever generate code for the function in which
// they appear. Essentially every construct you're not allowed to use on the		// they appear. Essentially every construct you're not allowed to use on the
// device falls into this category, because you are allowed to use these		// device falls into this category, because you are allowed to use these
// constructs in a __host__ __device__ function, but only if that function is		// constructs in a __host__ __device__ function, but only if that function is
// never codegen'ed on the device.		// never codegen'ed on the device.
//		//
// To handle semantic checking for these constructs, we keep track of the set of		// To handle semantic checking for these constructs, we keep track of the set of
// functions we know will be emitted, either because we could tell a priori that		// functions we know will be emitted, either because we could tell a priori that
// they would be emitted, or because they were transitively called by a		// they would be emitted, or because they were transitively called by a
// known-emitted function.		// known-emitted function.
//		//
// We also keep a partial call graph of which not-known-emitted functions call		// We also keep a partial call graph of which not-known-emitted functions call
// which other not-known-emitted functions.		// which other not-known-emitted functions.
//		//
// When we see something which is illegal if the current function is emitted		// When we see something which is illegal if the current function is emitted
// (usually by way of CUDADiagIfDeviceCode, CUDADiagIfHostCode, or		// (usually by way of CUDADiagIfDeviceCode, CUDADiagIfHostCode, or
// CheckCUDACall), we first check if the current function is known-emitted. If		// CheckCUDACall), we first check if the current function is known-emitted. If
// so, we immediately output the diagnostic.		// so, we immediately output the diagnostic.
//		//
		rjmccall: This needs to trigger if you use a variable with delayed diagnostics, too, right? When you add these methods to `UsedDeclVisitor`, you'll be able to remove them here.
		yaxunl (Author): fixed
// Otherwise, we "defer" the diagnostic. It sits in Sema::DeviceDeferredDiags		// Otherwise, we "defer" the diagnostic. It sits in Sema::DeviceDeferredDiags
// until we discover that the function is known-emitted, at which point we take		// until we discover that the function is known-emitted, at which point we take
// it out of this map and emit the diagnostic.		// it out of this map and emit the diagnostic.

Sema::DeviceDiagBuilder::DeviceDiagBuilder(Kind K, SourceLocation Loc,		Sema::DeviceDiagBuilder::DeviceDiagBuilder(Kind K, SourceLocation Loc,
unsigned DiagID, FunctionDecl *Fn,		unsigned DiagID, FunctionDecl *Fn,
Sema &S)		Sema &S)
: S(S), Loc(Loc), DiagID(DiagID), Fn(Fn),		: S(S), Loc(Loc), DiagID(DiagID), Fn(Fn),
ShowCallStack(K == K_ImmediateWithCallStack || K == K_Deferred) {		ShowCallStack(K == K_ImmediateWithCallStack || K == K_Deferred) {
switch (K) {		switch (K) {
case K_Nop:		case K_Nop:
break;		break;
case K_Immediate:		case K_Immediate:
case K_ImmediateWithCallStack:		case K_ImmediateWithCallStack:
ImmediateDiag.emplace(S.Diag(Loc, DiagID));		ImmediateDiag.emplace(S.Diag(Loc, DiagID));
break;		break;
case K_Deferred:		case K_Deferred:
assert(Fn && "Must have a function to attach the deferred diag to.");		assert(Fn && "Must have a function to attach the deferred diag to.");
		rjmccall: Should this also go in the base `UsedDeclVisitor`? I'm less sure about that because the captured statement is really always a part of the enclosing function, right? Should the delay mechanism just be looking through local sub-contexts instead?
		yaxunl (Author): Yes, this one should also go into UsedDeclVisitor, since this statement causes a RecordDecl to be generated which includes a FunctionDecl for a kernel; therefore this RecordDecl needs to be visited as a used decl. I am not sure whether other sub-contexts have the same effect. If so, I think they need to be handled case by case.
auto &Diags = S.DeviceDeferredDiags[Fn];		auto &Diags = S.DeviceDeferredDiags[Fn];
PartialDiagId.emplace(Diags.size());		PartialDiagId.emplace(Diags.size());
		erichkeane: Note that when recommitting this (if you choose to), this needs to also handle NamespaceDecl. We're a downstream and discovered that this doesn't properly handle functions or records handled in a namespace. It can be implemented identically to TranslationUnitDecl.
		rjmccall: Wait, what? We shouldn't be doing this for TranslationUnitDecl either. I don't even know how we're "using" a TranslationUnitDecl, but neither this case nor the case for `NamespaceDecl` should be recursively using every declaration declared inside it. If there's a declaration in a namespace that's being used, it should be getting visited as part of the actual use of it. The logic for `RecordDecl` has the same problem.
		erichkeane: Despite the name, this seems to be more of a home-written AST-walking class. The entry point is the 'translation unit', which seems to walk through everything in an attempt to find all the functions (including those that are 'marked' as used by an attribute). You'll see the FunctionDecl section makes this assumption as well (not necessarily that we got to a function via a call). IMO, this approach is strange, and we should register entry points in some manner (functions marked as emitted to the device in some fashion), then just follow its call graph (via the clang::CallGraph?) to emit all of these functions. It seemed really odd to see this approach here, but it seemed well reviewed by the time I noticed it (via a downstream bug), so I figured I'd lost my chance to disagree with the approach.
		rjmccall: Sure, but `visitUsedDecl` isn't the right place to be entering the walk. `visitUsedDecl` is supposed to be the callback from the walk. If they need to walk all the global declarations to find kernels instead of tracking the kernels as they're encountered (which would be a much better approach), it should be done as a separate function. I just missed this in the review.
		yaxunl (Author): The deferred diagnostics could be initiated by non-kernel functions or even host functions. Let's consider a device code library where no kernels are defined. A device function is emitted, which calls a host device function which has a deferred diagnostic. All device functions that are emitted need to be checked. The same goes for host functions that are emitted, which may call a host device function which has a deferred diagnostic. Also, not just function calls need to be checked. A function address may be taken and then called through a function pointer. Therefore any reference to a function needs to be followed. In the case of OpenMP, the initialization of a global function pointer which refers to a function may trigger a deferred diagnostic. There are tests for that.
		rjmccall: Right, I get that emitting deferred diagnostics for a declaration D needs to trigger any deferred diagnostics in declarations used by D, recursively. You essentially have a graph of lazily-emitted declarations (which may or may not have deferred diagnostics) and a number of eagerly-emitted "root" declarations with use-edges leading into that graph. Any declaration that's reachable from a root will need to be emitted and so needs to have any deferred diagnostics emitted as well. My question is why you're finding these roots with a retroactive walk of the entire translation unit instead of either building a list of roots as you go or (better yet) building a list of lazily-emitted declarations that are used by those roots. You can unambiguously identify at the point of declaration whether an entity will be eagerly or lazily emitted, right? If you just store those initial edges into the lazily-emitted declarations graph and then initiate the recursive walk from them at the end of the translation unit, you'll only end up walking declarations that are actually relevant to your compilation, so you'll have much better locality and (if this matters to you) you'll naturally work a lot better with PCH and modules.
		yaxunl (Author): I will try the approach you suggested. Basically I will record the emitted functions and variables during parsing and use them as the starting point for the final traversal. This should work for CUDA/HIP. However, it may be tricky for OpenMP since the emission of some entities depends on pragmas. Still, it may be doable. If I encounter difficulty I will come back for discussion. I will post the change for review. Thanks.
		bader: FYI: SYCL is also using the deferred diagnostics engine to emit device-side diagnostics. This part hasn't been upstreamed yet, but we are tracking changes in this area. The SYCL support implementation should be quite similar to CUDA/HIP.
		rjmccall: Okay, thank you. Do you still need all the cases in here for records, templates, and so on? It looks to me like you should always end up here with exactly the variables and functions that are being used, and you should never need to make special efforts to e.g. visit all the specializations of a template or visit all the methods of a class.
		yaxunl (Author): I can remove handling of templates and records. However I have to keep the handling of CapturedDecl. It is generated from code like

    void t1(int r) {}

    int main() {
    #pragma omp target
      {
        t1(0);
      }
      return 0;
    }

And it is like a function decl embedded in function main, e.g.

    -FunctionDecl 0x86f7c70 <line:8:1, line:15:1> line:8:5 main 'int ()'
      `-CompoundStmt 0x873c3f8 <col:12, line:15:1>
        |-OMPTargetDirective 0x873c3a0 <line:9:1, col:19>
        | `-CapturedStmt 0x873c378 <line:10:3, line:13:3>
        |   `-CapturedDecl 0x873bd18 <<invalid sloc>> <invalid sloc> nothrow
        |     |-CapturedStmt 0x873c350 <line:10:3, line:13:3>
        |     | `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
        |     |   |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
        |     |   | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
        |     |   |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' <FunctionToPointerDecay>
        |     |   |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 0x86f7b18 't1' 'void (int)'
        |     |   |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
        |     |   `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
        |     |-AlwaysInlineAttr 0x873c040 <<invalid sloc>> Implicit __forceinline
        |     |-ImplicitParamDecl 0x873bda0 <col:1> col:1 implicit .global_tid. 'const int'
        |     |-ImplicitParamDecl 0x873be08 <col:1> col:1 implicit .part_id. 'const int *const restrict'
        |     |-ImplicitParamDecl 0x873be70 <col:1> col:1 implicit .privates. 'void *const restrict'
        |     |-ImplicitParamDecl 0x873bed8 <col:1> col:1 implicit .copy_fn. 'void (*const restrict)(void *const restrict, ...)'
        |     |-ImplicitParamDecl 0x873bf40 <col:1> col:1 implicit .task_t. 'void *const'
        |     |-ImplicitParamDecl 0x873bfd8 <col:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
        |     |-RecordDecl 0x873c098 <col:1> col:1 implicit struct definition
        |     | `-CapturedRecordAttr 0x873c140 <<invalid sloc>> Implicit
        |     `-CapturedDecl 0x873c198 <<invalid sloc>> <invalid sloc> nothrow
        |       |-CompoundStmt 0x873c338 <line:10:3, line:13:3>
        |       | `-CallExpr 0x873c310 <line:12:5, col:9> 'void'
        |       |   |-ImplicitCastExpr 0x873c2f8 <col:5> 'void (*)(int)' <FunctionToPointerDecay>
        |       |   | `-DeclRefExpr 0x873c290 <col:5> 'void (int)' Function 0x86f7b18 't1' 'void (int)'
        |       |   `-IntegerLiteral 0x873c2b0 <col:8> 'int' 0
        |       `-ImplicitParamDecl 0x873c228 <line:9:1> col:1 implicit __context 'struct (anonymous at nvptx_va_arg_delayed_diags2.c:9:1) *const restrict'
        `-ReturnStmt 0x873c3e8 <line:14:3, col:10>
          `-IntegerLiteral 0x873c3c8 <col:10> 'int' 0

If I do not handle it, I will not be able to reach the call of t1().
		rjmccall: Sure, although I wonder if it might be more reasonable to just make UsedDeclVisitor walk into `CapturedDecl`s (and `BlockDecl`s) when it sees the corresponding statements/expressions. Unlike other declaration references, those are never "cross-references"; they're just local code tied to a declaration for representational reasons.
		yaxunl (Author): done
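The root-based walk proposed in this thread can be sketched as plain graph reachability. This is only an illustration under stated assumptions: declarations are modeled as strings, `UseGraph` and `reachableFromRoots` are hypothetical names, and nothing here reflects how the patch or Clang actually stores use edges.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical use graph: declaration -> declarations it uses
// (calls, address-of, references from initializers).
using UseGraph = std::map<std::string, std::vector<std::string>>;

// Visit every declaration reachable from the eagerly-emitted roots, once
// each, in depth-first order. Only these declarations would need their
// deferred diagnostics emitted; unreachable ones are never touched.
std::vector<std::string>
reachableFromRoots(const UseGraph &Uses,
                   const std::vector<std::string> &Roots) {
  std::set<std::string> Seen;
  std::vector<std::string> Order;
  // Push in reverse so that roots are popped in their original order.
  std::vector<std::string> Worklist(Roots.rbegin(), Roots.rend());
  while (!Worklist.empty()) {
    std::string D = Worklist.back();
    Worklist.pop_back();
    if (!Seen.insert(D).second)
      continue; // already visited through another use chain
    Order.push_back(D);
    auto It = Uses.find(D);
    if (It == Uses.end())
      continue; // D uses nothing we track
    for (auto R = It->second.rbegin(); R != It->second.rend(); ++R)
      Worklist.push_back(*R);
  }
  assert(Order.size() == Seen.size());
  return Order;
}
```

Given roots recorded during parsing, a lazily-emitted function that no root reaches never appears in the result, which is the locality benefit described above.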
Diags.emplace_back(Loc, S.PDiag(DiagID));		Diags.emplace_back(Loc, S.PDiag(DiagID));
break;		break;
}		}
}		}

Sema::DeviceDiagBuilder::DeviceDiagBuilder(DeviceDiagBuilder &&D)		Sema::DeviceDiagBuilder::DeviceDiagBuilder(DeviceDiagBuilder &&D)
: S(D.S), Loc(D.Loc), DiagID(D.DiagID), Fn(D.Fn),		: S(D.S), Loc(D.Loc), DiagID(D.DiagID), Fn(D.Fn),
ShowCallStack(D.ShowCallStack), ImmediateDiag(D.ImmediateDiag),		ShowCallStack(D.ShowCallStack), ImmediateDiag(D.ImmediateDiag),
PartialDiagId(D.PartialDiagId) {		PartialDiagId(D.PartialDiagId) {
// Clean the previous diagnostics.		// Clean the previous diagnostics.
D.ShowCallStack = false;		D.ShowCallStack = false;
D.ImmediateDiag.reset();		D.ImmediateDiag.reset();
D.PartialDiagId.reset();		D.PartialDiagId.reset();
}		}

Sema::DeviceDiagBuilder::~DeviceDiagBuilder() {		Sema::DeviceDiagBuilder::~DeviceDiagBuilder() {
if (ImmediateDiag) {		if (ImmediateDiag) {
// Emit our diagnostic and, if it was a warning or error, output a callstack		// Emit our diagnostic and, if it was a warning or error, output a callstack
// if Fn isn't a priori known-emitted.		// if Fn isn't a priori known-emitted.
		rjmccall: Can there also be deferred diagnostics associated with this initializer?
		yaxunl (Author): Yes. A global variable may be marked by an omp declare target directive to be emitted on device. If the global var is initialized with the address of a function, the function will be emitted on device. If the device function calls a host device function which contains a deferred diag, that diag will be emitted. This can only be known after everything is parsed.
		rjmccall: I meant directly with the initializer. Is there a way today to defer a diagnostic that you would emit while processing an initializer expression? If so, this needs to trigger that.
		yaxunl (Author): I don't think the initializer itself (without a declare target directive) will cause a deferred diagnostic, since it does not change the emission states of functions.
		rjmccall: Okay, so if I'm getting this right: only functions are emitted lazily, and variables have to be marked specially in order to get emitted on the device, so there's no need to defer diagnostics within variable initializations because we always know at the time of processing the variable where it will be emitted?
		yaxunl (Author): right.
bool IsWarningOrError = S.getDiagnostics().getDiagnosticLevel(		bool IsWarningOrError = S.getDiagnostics().getDiagnosticLevel(
DiagID, Loc) >= DiagnosticsEngine::Warning;		DiagID, Loc) >= DiagnosticsEngine::Warning;
ImmediateDiag.reset(); // Emit the immediate diag.		ImmediateDiag.reset(); // Emit the immediate diag.
if (IsWarningOrError && ShowCallStack)		if (IsWarningOrError && ShowCallStack)
emitCallStackNotes(S, Fn);		emitCallStackNotes(S, Fn);
} else {		} else {
assert((!PartialDiagId || ShowCallStack) &&		assert((!PartialDiagId || ShowCallStack) &&
"Must always show call stack for deferred diags.");		"Must always show call stack for deferred diags.");
}		}
}		}

// Indicate that this function (and thus everything it transitively calls) will
// be codegen'ed, and emit any deferred diagnostics on this function and its
// (transitive) callees.
void Sema::markKnownEmitted(
Sema &S, FunctionDecl *OrigCaller, FunctionDecl *OrigCallee,
SourceLocation OrigLoc,
const llvm::function_ref<bool(Sema &, FunctionDecl *)> IsKnownEmitted) {
// Nothing to do if we already know that FD is emitted.
if (IsKnownEmitted(S, OrigCallee)) {
assert(!S.DeviceCallGraph.count(OrigCallee));
return;
}

// We've just discovered that OrigCallee is known-emitted. Walk our call
// graph to see what else we can now discover also must be emitted.

struct CallInfo {
FunctionDecl *Caller;
FunctionDecl *Callee;
SourceLocation Loc;
};
llvm::SmallVector<CallInfo, 4> Worklist = {{OrigCaller, OrigCallee, OrigLoc}};
llvm::SmallSet<CanonicalDeclPtr<FunctionDecl>, 4> Seen;
Seen.insert(OrigCallee);
while (!Worklist.empty()) {
CallInfo C = Worklist.pop_back_val();
assert(!IsKnownEmitted(S, C.Callee) &&
"Worklist should not contain known-emitted functions.");
S.DeviceKnownEmittedFns[C.Callee] = {C.Caller, C.Loc};
emitDeferredDiags(S, C.Callee, C.Caller);

// If this is a template instantiation, explore its callgraph as well:
// Non-dependent calls are part of the template's callgraph, while dependent
// calls are part of the instantiation's call graph.
if (auto *Templ = C.Callee->getPrimaryTemplate()) {
FunctionDecl *TemplFD = Templ->getAsFunction();
if (!Seen.count(TemplFD) && !S.DeviceKnownEmittedFns.count(TemplFD)) {
Seen.insert(TemplFD);
Worklist.push_back(
{/* Caller = */ C.Caller, /* Callee = */ TemplFD, C.Loc});
}
}

// Add all functions called by Callee to our worklist.
auto CGIt = S.DeviceCallGraph.find(C.Callee);
if (CGIt == S.DeviceCallGraph.end())
continue;

for (std::pair<CanonicalDeclPtr<FunctionDecl>, SourceLocation> FDLoc :
CGIt->second) {
FunctionDecl *NewCallee = FDLoc.first;
SourceLocation CallLoc = FDLoc.second;
if (Seen.count(NewCallee) || IsKnownEmitted(S, NewCallee))
continue;
Seen.insert(NewCallee);
Worklist.push_back(
{/* Caller = */ C.Callee, /* Callee = */ NewCallee, CallLoc});
}

// C.Callee is now known-emitted, so we no longer need to maintain its list
// of callees in DeviceCallGraph.
S.DeviceCallGraph.erase(CGIt);
}
}
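
Editor's sketch (not part of the patch): the removed `markKnownEmitted` is a worklist walk over a call graph recorded during parsing. The toy below illustrates that shape only. Function names are plain strings, and `ToyCallGraph`, `Callees`, `KnownEmitted` and `DiagOrder` are hypothetical stand-ins for `DeviceCallGraph`, `DeviceKnownEmittedFns` and the deferred-diagnostic flush; template handling and source locations are omitted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in: a function is identified by its name.
using Fn = std::string;

struct ToyCallGraph {
  std::map<Fn, std::vector<Fn>> Callees; // edges recorded while "parsing"
  std::set<Fn> KnownEmitted;             // functions known to be emitted
  std::vector<Fn> DiagOrder;             // order in which diags would be flushed

  // Mark Root as known-emitted and propagate to everything it reaches.
  void markKnownEmitted(const Fn &Root) {
    if (KnownEmitted.count(Root))
      return; // nothing new to discover
    std::vector<Fn> Worklist{Root};
    std::set<Fn> Seen{Root};
    while (!Worklist.empty()) {
      Fn Cur = Worklist.back();
      Worklist.pop_back();
      KnownEmitted.insert(Cur);
      DiagOrder.push_back(Cur); // this is where deferred diags would be emitted
      auto It = Callees.find(Cur);
      if (It == Callees.end())
        continue;
      for (const Fn &Next : It->second) {
        if (Seen.count(Next) || KnownEmitted.count(Next))
          continue;
        Seen.insert(Next);
        Worklist.push_back(Next);
      }
      // Cur is now known-emitted, so its edge list is no longer needed.
      Callees.erase(It);
    }
  }
};
```

The weakness the summary points at is visible here: the result depends on which edges were already in `Callees` when the walk ran, and during parsing that information can be incomplete.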

Sema::DeviceDiagBuilder Sema::targetDiag(SourceLocation Loc, unsigned DiagID) {		Sema::DeviceDiagBuilder Sema::targetDiag(SourceLocation Loc, unsigned DiagID) {
if (LangOpts.OpenMP)		if (LangOpts.OpenMP)
return LangOpts.OpenMPIsDevice ? diagIfOpenMPDeviceCode(Loc, DiagID)		return LangOpts.OpenMPIsDevice ? diagIfOpenMPDeviceCode(Loc, DiagID)
: diagIfOpenMPHostCode(Loc, DiagID);		: diagIfOpenMPHostCode(Loc, DiagID);
if (getLangOpts().CUDA)		if (getLangOpts().CUDA)
return getLangOpts().CUDAIsDevice ? CUDADiagIfDeviceCode(Loc, DiagID)		return getLangOpts().CUDAIsDevice ? CUDADiagIfDeviceCode(Loc, DiagID)
: CUDADiagIfHostCode(Loc, DiagID);		: CUDADiagIfHostCode(Loc, DiagID);
return DeviceDiagBuilder(DeviceDiagBuilder::K_Immediate, Loc, DiagID,		return DeviceDiagBuilder(DeviceDiagBuilder::K_Immediate, Loc, DiagID,
▲ Show 20 Lines • Show All 727 Lines • Show Last 20 Lines

clang/lib/Sema/SemaCUDA.cpp

Show First 20 Lines • Show All 668 Lines • ▼ Show 20 Lines	bool Sema::CheckCUDACall(SourceLocation Loc, FunctionDecl *Callee) {
FunctionDecl *Caller = dyn_cast<FunctionDecl>(CurContext);		FunctionDecl *Caller = dyn_cast<FunctionDecl>(CurContext);
if (!Caller)		if (!Caller)
return true;		return true;

// If the caller is known-emitted, mark the callee as known-emitted.		// If the caller is known-emitted, mark the callee as known-emitted.
// Otherwise, mark the call in our call graph so we can traverse it later.		// Otherwise, mark the call in our call graph so we can traverse it later.
bool CallerKnownEmitted =		bool CallerKnownEmitted =
getEmissionStatus(Caller) == FunctionEmissionStatus::Emitted;		getEmissionStatus(Caller) == FunctionEmissionStatus::Emitted;
if (CallerKnownEmitted) {
// Host-side references to a __global__ function refer to the stub, so the
// function itself is never emitted and therefore should not be marked.
if (!shouldIgnoreInHostDeviceCheck(Callee))
markKnownEmitted(
*this, Caller, Callee, Loc, [](Sema &S, FunctionDecl *FD) {
return S.getEmissionStatus(FD) == FunctionEmissionStatus::Emitted;
});
} else {
// If we have
// host fn calls kernel fn calls host+device,
// the HD function does not get instantiated on the host. We model this by
// omitting the call to the kernel from the callgraph. This ensures
// that, when compiling for host, only HD functions actually called from the
// host get marked as known-emitted.
if (!shouldIgnoreInHostDeviceCheck(Callee))
DeviceCallGraph[Caller].insert({Callee, Loc});
}

DeviceDiagBuilder::Kind DiagKind = [this, Caller, Callee,		DeviceDiagBuilder::Kind DiagKind = [this, Caller, Callee,
CallerKnownEmitted] {		CallerKnownEmitted] {
switch (IdentifyCUDAPreference(Caller, Callee)) {		switch (IdentifyCUDAPreference(Caller, Callee)) {
case CFP_Never:		case CFP_Never:
return DeviceDiagBuilder::K_Immediate;		return DeviceDiagBuilder::K_Immediate;
case CFP_WrongSide:		case CFP_WrongSide:
assert(Caller && "WrongSide calls require a non-null caller");		assert(Caller && "WrongSide calls require a non-null caller");
// If we know the caller will be emitted, we know this wrong-side call		// If we know the caller will be emitted, we know this wrong-side call
▲ Show 20 Lines • Show All 104 Lines • Show Last 20 Lines

clang/lib/Sema/SemaDecl.cpp


	Show First 20 Lines • Show All 4,430 Lines • ▼ Show 20 Lines
	//	//
	LLVM_FALLTHROUGH;	LLVM_FALLTHROUGH;

	case VarDecl::DeclarationOnly:	case VarDecl::DeclarationOnly:
	// It's only a declaration.	// It's only a declaration.

	// Block scope. C99 6.7p7: If an identifier for an object is	// Block scope. C99 6.7p7: If an identifier for an object is
	// declared with no linkage (C99 6.2.2p6), the type for the	// declared with no linkage (C99 6.2.2p6), the type for the
	// object shall be complete.	// object shall be complete.
		rjmccall: `DeclsToCheckForDeferredDiags` is basically a set of declarations that you know you have to emit, right? It doesn't seem right to be adding every variable with an initializer to that set, especially because I'm pretty sure this function gets called for literally every variable with an initializer, including local variables. Presumably you only need to do this for global variables that you're definitely going to emit in the current mode.
		yaxunl (Author): Yes, we only need to check global variables. Fixed.
	if (!Type->isDependentType() && Var->isLocalVarDecl() &&	if (!Type->isDependentType() && Var->isLocalVarDecl() &&
	!Var->hasLinkage() && !Var->isInvalidDecl() &&	!Var->hasLinkage() && !Var->isInvalidDecl() &&
	RequireCompleteType(Var->getLocation(), Type,	RequireCompleteType(Var->getLocation(), Type,
	diag::err_typecheck_decl_incomplete_type))	diag::err_typecheck_decl_incomplete_type))
	Var->setInvalidDecl();	Var->setInvalidDecl();

	// Make sure that the type is not abstract.	// Make sure that the type is not abstract.
	if (!Type->isDependentType() && !Var->isInvalidDecl() &&	if (!Type->isDependentType() && !Var->isInvalidDecl() &&
	▲ Show 20 Lines • Show All 5,544 Lines • ▼ Show 20 Lines
	std::pair<IdentifierInfo*,WeakInfo>(AliasName, W));	std::pair<IdentifierInfo*,WeakInfo>(AliasName, W));
	}	}
	}	}

	Decl *Sema::getObjCDeclContext() const {	Decl *Sema::getObjCDeclContext() const {
	return (dyn_cast_or_null<ObjCContainerDecl>(CurContext));	return (dyn_cast_or_null<ObjCContainerDecl>(CurContext));
	}	}

	Sema::FunctionEmissionStatus Sema::getEmissionStatus(FunctionDecl *FD) {	Sema::FunctionEmissionStatus Sema::getEmissionStatus(FunctionDecl *FD,
		bool Final) {
	// Templates are emitted when they're instantiated.	// Templates are emitted when they're instantiated.
	if (FD->isDependentContext())	if (FD->isDependentContext())
	return FunctionEmissionStatus::TemplateDiscarded;	return FunctionEmissionStatus::TemplateDiscarded;

	FunctionEmissionStatus OMPES = FunctionEmissionStatus::Unknown;	FunctionEmissionStatus OMPES = FunctionEmissionStatus::Unknown;
	if (LangOpts.OpenMPIsDevice) {	if (LangOpts.OpenMPIsDevice) {
	Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =	Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =
	OMPDeclareTargetDeclAttr::getDeviceType(FD->getCanonicalDecl());	OMPDeclareTargetDeclAttr::getDeviceType(FD->getCanonicalDecl());
	if (DevTy.hasValue()) {	if (DevTy.hasValue()) {
	if (*DevTy == OMPDeclareTargetDeclAttr::DT_Host)	if (*DevTy == OMPDeclareTargetDeclAttr::DT_Host)
	OMPES = FunctionEmissionStatus::OMPDiscarded;	OMPES = FunctionEmissionStatus::OMPDiscarded;
	else if (DeviceKnownEmittedFns.count(FD) > 0)	else if (*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost ||
		*DevTy == OMPDeclareTargetDeclAttr::DT_Any) {
	OMPES = FunctionEmissionStatus::Emitted;	OMPES = FunctionEmissionStatus::Emitted;
		}
	}	}
	} else if (LangOpts.OpenMP) {	} else if (LangOpts.OpenMP) {
	// In OpenMP 4.5 all the functions are host functions.	// In OpenMP 4.5 all the functions are host functions.
	if (LangOpts.OpenMP <= 45) {	if (LangOpts.OpenMP <= 45) {
	OMPES = FunctionEmissionStatus::Emitted;	OMPES = FunctionEmissionStatus::Emitted;
	} else {	} else {
	Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =	Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =
	OMPDeclareTargetDeclAttr::getDeviceType(FD->getCanonicalDecl());	OMPDeclareTargetDeclAttr::getDeviceType(FD->getCanonicalDecl());
	// In OpenMP 5.0 or above, DevTy may be changed later by	// In OpenMP 5.0 or above, DevTy may be changed later by
	// #pragma omp declare target to() device_type(). Therefore DevTy	// #pragma omp declare target to() device_type(). Therefore DevTy
	// having no value does not imply host. The emission status will be	// having no value does not imply host. The emission status will be
	// checked again at the end of compilation unit.	// checked again at the end of compilation unit.
	if (DevTy.hasValue()) {	if (DevTy.hasValue()) {
	if (*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost) {	if (*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost) {
	OMPES = FunctionEmissionStatus::OMPDiscarded;	OMPES = FunctionEmissionStatus::OMPDiscarded;
	} else if (DeviceKnownEmittedFns.count(FD) > 0) {	} else if (*DevTy == OMPDeclareTargetDeclAttr::DT_Host ||
		*DevTy == OMPDeclareTargetDeclAttr::DT_Any)
	OMPES = FunctionEmissionStatus::Emitted;	OMPES = FunctionEmissionStatus::Emitted;
	}	} else if (Final)
	}	OMPES = FunctionEmissionStatus::Emitted;
	}	}
	}	}
	if (OMPES == FunctionEmissionStatus::OMPDiscarded ||	if (OMPES == FunctionEmissionStatus::OMPDiscarded ||
	(OMPES == FunctionEmissionStatus::Emitted && !LangOpts.CUDA))	(OMPES == FunctionEmissionStatus::Emitted && !LangOpts.CUDA))
	return OMPES;	return OMPES;

	if (LangOpts.CUDA) {	if (LangOpts.CUDA) {
	// When compiling for device, host functions are never emitted. Similarly,	// When compiling for device, host functions are never emitted. Similarly,
	Show All 18 Lines
	if (Def &&	if (Def &&
	!isDiscardableGVALinkage(getASTContext().GetGVALinkageForFunction(Def))	!isDiscardableGVALinkage(getASTContext().GetGVALinkageForFunction(Def))
	&& (!LangOpts.OpenMP || OMPES == FunctionEmissionStatus::Emitted))	&& (!LangOpts.OpenMP || OMPES == FunctionEmissionStatus::Emitted))
	return FunctionEmissionStatus::Emitted;	return FunctionEmissionStatus::Emitted;
	}	}

	// Otherwise, the function is known-emitted if it's in our set of	// Otherwise, the function is known-emitted if it's in our set of
	// known-emitted functions.	// known-emitted functions.
	return (DeviceKnownEmittedFns.count(FD) > 0)	return FunctionEmissionStatus::Unknown;
	? FunctionEmissionStatus::Emitted
	: FunctionEmissionStatus::Unknown;
	}	}
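
Editor's sketch (not part of the patch): the new `Final` parameter matters in the OpenMP 5.0+ host branch, where a function without a `device_type` is `Unknown` during parsing but is treated as emitted once the translation unit is complete. The function below models only that branch as shown in the right-hand column. `DevType`, `Status` and `hostEmissionStatus` are hypothetical names, and the CUDA and linkage checks that follow in the real function are left out.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirrors of OMPDeclareTargetDeclAttr::DevTypeTy and
// Sema::FunctionEmissionStatus, reduced to the values used here.
enum class DevType { Host, NoHost, Any };
enum class Status { Unknown, Emitted, OMPDiscarded };

// Host-side compilation with OpenMP >= 5.0 only (assumption of this sketch).
Status hostEmissionStatus(std::optional<DevType> DevTy, bool Final) {
  if (DevTy) {
    if (*DevTy == DevType::NoHost)
      return Status::OMPDiscarded; // device-only function, dropped on the host
    return Status::Emitted;        // DT_Host or DT_Any
  }
  // No device_type yet: a later "declare target" could still change it, so
  // the answer is only definite once parsing has finished (Final == true).
  return Final ? Status::Emitted : Status::Unknown;
}
```

Calling it with `std::nullopt` and `Final == false` corresponds to a query made while parsing; the post-parsing traversal passes `Final == true`.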

	bool Sema::shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee) {	bool Sema::shouldIgnoreInHostDeviceCheck(FunctionDecl *Callee) {
	// Host-side references to a __global__ function refer to the stub, so the	// Host-side references to a __global__ function refer to the stub, so the
	// function itself is never emitted and therefore should not be marked.	// function itself is never emitted and therefore should not be marked.
	// If we have host fn calls kernel fn calls host+device, the HD function	// If we have host fn calls kernel fn calls host+device, the HD function
	// does not get instantiated on the host. We model this by omitting at the	// does not get instantiated on the host. We model this by omitting at the
	// call to the kernel from the callgraph. This ensures that, when compiling	// call to the kernel from the callgraph. This ensures that, when compiling
	// for host, only HD functions actually called from the host get marked as	// for host, only HD functions actually called from the host get marked as
	// known-emitted.	// known-emitted.
	return LangOpts.CUDA && !LangOpts.CUDAIsDevice &&	return LangOpts.CUDA && !LangOpts.CUDAIsDevice &&
	IdentifyCUDATarget(Callee) == CFT_Global;	IdentifyCUDATarget(Callee) == CFT_Global;
	}	}
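
Editor's sketch (not part of the patch): the predicate above skips `__global__` callees only when compiling CUDA for the host, because host code references the kernel stub rather than the kernel body. A stand-alone restatement, with `ToyLangOpts` and `ToyTarget` as hypothetical stand-ins for `LangOptions` and the result of `IdentifyCUDATarget`:

```cpp
#include <cassert>

// Hypothetical reductions of the real Clang types.
struct ToyLangOpts {
  bool CUDA;
  bool CUDAIsDevice;
};
enum class ToyTarget { Host, Device, HostDevice, Global };

// True when a use of Callee should not propagate "known-emitted" status.
bool toyShouldIgnoreInHostDeviceCheck(const ToyLangOpts &LO, ToyTarget Callee) {
  return LO.CUDA && !LO.CUDAIsDevice && Callee == ToyTarget::Global;
}
```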

clang/lib/Sema/SemaExpr.cpp


	Show First 20 Lines • Show All 9,991 Lines • ▼ Show 20 Lines
	// pack into the name. Computing the size of the parameters requires the			// pack into the name. Computing the size of the parameters requires the
	// parameter types to be complete. Check that now.			// parameter types to be complete. Check that now.
	if (funcHasParameterSizeMangling(*this, Func))			if (funcHasParameterSizeMangling(*this, Func))
	CheckCompleteParameterTypesForMangler(*this, Func, Loc);			CheckCompleteParameterTypesForMangler(*this, Func, Loc);

	Func->markUsed(Context);			Func->markUsed(Context);
	}			}

	if (LangOpts.OpenMP) {			if (LangOpts.OpenMP)
	markOpenMPDeclareVariantFuncsReferenced(Loc, Func, MightBeOdrUse);			markOpenMPDeclareVariantFuncsReferenced(Loc, Func, MightBeOdrUse);
	if (LangOpts.OpenMPIsDevice)
	checkOpenMPDeviceFunction(Loc, Func);
	else
	checkOpenMPHostFunction(Loc, Func);
	}
	}			}

	/// Directly mark a variable odr-used. Given a choice, prefer to use			/// Directly mark a variable odr-used. Given a choice, prefer to use
	/// MarkVariableReferenced since it does additional checks and then			/// MarkVariableReferenced since it does additional checks and then
	/// calls MarkVarDeclODRUsed.			/// calls MarkVarDeclODRUsed.
	/// If the variable must be captured:			/// If the variable must be captured:
	/// - if FunctionScopeIndexToStopAt is null, capture it in the CurContext			/// - if FunctionScopeIndexToStopAt is null, capture it in the CurContext
	/// - else capture it in the DeclContext that maps to the			/// - else capture it in the DeclContext that maps to the
	▲ Show 20 Lines • Show All 1,381 Lines • ▼ Show 20 Lines

	public:			public:
	typedef EvaluatedExprVisitor<EvaluatedExprMarker> Inherited;			typedef EvaluatedExprVisitor<EvaluatedExprMarker> Inherited;

	EvaluatedExprMarker(Sema &S, bool SkipLocalVariables)			EvaluatedExprMarker(Sema &S, bool SkipLocalVariables)
	: Inherited(S.Context), S(S), SkipLocalVariables(SkipLocalVariables) { }			: Inherited(S.Context), S(S), SkipLocalVariables(SkipLocalVariables) { }

	void VisitDeclRefExpr(DeclRefExpr *E) {			void VisitDeclRefExpr(DeclRefExpr *E) {
	// If we were asked not to visit local variables, don't.			// If we were asked not to visit local variables, don't.
				rjmccall: This should inherit from `EvaluatedExprVisitor<Derived>`, or else calls from `EvaluatedExprVisitor` and above won't dispatch all the way down to the subclass. This will allow subclasses to do node-specific logic, like your subclass's handling of `InOMPDeviceContext` or `EvaluatedExprMarker`'s need to do custom things with local variables, DREs, and MEs. Please also define this in a header; it doesn't need to be file-specific. I guess it needs a `Sema &` because of the call to `LookupDestructor`, so `lib/Sema` is probably the right place for that header.
	if (SkipLocalVariables) {			if (SkipLocalVariables) {
	if (VarDecl *VD = dyn_cast<VarDecl>(E->getDecl()))			if (VarDecl *VD = dyn_cast<VarDecl>(E->getDecl()))
	if (VD->hasLocalStorage())			if (VD->hasLocalStorage())
	return;			return;
	}			}

	S.MarkDeclRefReferenced(E);			S.MarkDeclRefReferenced(E);
	}			}

	void VisitMemberExpr(MemberExpr *E) {			void VisitMemberExpr(MemberExpr *E) {
	S.MarkMemberReferenced(E);			S.MarkMemberReferenced(E);
	Inherited::VisitMemberExpr(E);			Inherited::VisitMemberExpr(E);
	}			}

	void VisitCXXBindTemporaryExpr(CXXBindTemporaryExpr *E) {			void VisitCXXBindTemporaryExpr(CXXBindTemporaryExpr *E) {
	S.MarkFunctionReferenced(			S.MarkFunctionReferenced(
	E->getBeginLoc(),			E->getBeginLoc(),
	const_cast<CXXDestructorDecl *>(E->getTemporary()->getDestructor()));			const_cast<CXXDestructorDecl *>(E->getTemporary()->getDestructor()));
	Visit(E->getSubExpr());			Visit(E->getSubExpr());
	}			}

	void VisitCXXNewExpr(CXXNewExpr *E) {			void VisitCXXNewExpr(CXXNewExpr *E) {
	if (E->getOperatorNew())			if (E->getOperatorNew())
	S.MarkFunctionReferenced(E->getBeginLoc(), E->getOperatorNew());			S.MarkFunctionReferenced(E->getBeginLoc(), E->getOperatorNew());
	if (E->getOperatorDelete())			if (E->getOperatorDelete())
				rjmccall: Let's not have both a `visitDeclRefExpr` and a `VisitDeclRefExpr`, distinguished only by capitalization.
	S.MarkFunctionReferenced(E->getBeginLoc(), E->getOperatorDelete());			S.MarkFunctionReferenced(E->getBeginLoc(), E->getOperatorDelete());
	Inherited::VisitCXXNewExpr(E);			Inherited::VisitCXXNewExpr(E);
	}			}

	void VisitCXXDeleteExpr(CXXDeleteExpr *E) {			void VisitCXXDeleteExpr(CXXDeleteExpr *E) {
	if (E->getOperatorDelete())			if (E->getOperatorDelete())
				rjmccall: Please have all these call sites call `asImpl().visitUsedDecl` directly, and then don't define it in this class.
	S.MarkFunctionReferenced(E->getBeginLoc(), E->getOperatorDelete());			S.MarkFunctionReferenced(E->getBeginLoc(), E->getOperatorDelete());
	QualType Destroyed = S.Context.getBaseElementType(E->getDestroyedType());			QualType Destroyed = S.Context.getBaseElementType(E->getDestroyedType());
	if (const RecordType *DestroyedRec = Destroyed->getAs<RecordType>()) {			if (const RecordType *DestroyedRec = Destroyed->getAs<RecordType>()) {
	CXXRecordDecl *Record = cast<CXXRecordDecl>(DestroyedRec->getDecl());			CXXRecordDecl *Record = cast<CXXRecordDecl>(DestroyedRec->getDecl());
	S.MarkFunctionReferenced(E->getBeginLoc(), S.LookupDestructor(Record));			S.MarkFunctionReferenced(E->getBeginLoc(), S.LookupDestructor(Record));
	}			}

	Inherited::VisitCXXDeleteExpr(E);			Inherited::VisitCXXDeleteExpr(E);
	}			}

	void VisitCXXConstructExpr(CXXConstructExpr *E) {			void VisitCXXConstructExpr(CXXConstructExpr *E) {
	S.MarkFunctionReferenced(E->getBeginLoc(), E->getConstructor());			S.MarkFunctionReferenced(E->getBeginLoc(), E->getConstructor());
	Inherited::VisitCXXConstructExpr(E);			Inherited::VisitCXXConstructExpr(E);
	}			}

	void VisitCXXDefaultArgExpr(CXXDefaultArgExpr *E) {			void VisitCXXDefaultArgExpr(CXXDefaultArgExpr *E) {
	Visit(E->getExpr());			Visit(E->getExpr());
	}			}
	};			};
	}
				/// Helper class that emits deferred diagnostic messages if an entity directly
				/// or indirectly using the function that causes the deferred diagnostic
				/// messages is known to be emitted.
				class DeferredDiagnosticsEmitter
				: public EvaluatedExprVisitor<DeferredDiagnosticsEmitter> {
				rjmccall: Is there any way to share most of the visitation logic here with the visitor we use in `MarkDeclarationsUsedInExpr`? Maybe make a `UsedDeclVisitor` CRTP class that calls a "asImpl().visitUsedDecl(SourceLocation Loc, Decl *D)" in the right places?
				Sema &S;
				llvm::SmallSet<CanonicalDeclPtr<Decl>, 4> Visited;
				llvm::SmallVector<CanonicalDeclPtr<FunctionDecl>, 4> UseStack;
				bool ShouldEmit;
				unsigned InOMPDeviceContext;

				public:
				typedef EvaluatedExprVisitor<DeferredDiagnosticsEmitter> Inherited;

				DeferredDiagnosticsEmitter(Sema &S)
				: Inherited(S.Context), S(S), ShouldEmit(false),
				InOMPDeviceContext(0) {}
				rjmccall: This should be in your OMP-specific subclass.

				void VisitDeclRefExpr(DeclRefExpr *E) {
				if (FunctionDecl *FD = dyn_cast<FunctionDecl>(E->getDecl())) {
				VisitDecl(E->getLocation(), FD);
				}
				}

				void VisitMemberExpr(MemberExpr *E) {
				if (FunctionDecl *FD = dyn_cast<FunctionDecl>(E->getMemberDecl()))
				VisitDecl(E->getMemberLoc(), FD);
				Inherited::VisitMemberExpr(E);
				}

				void VisitCXXBindTemporaryExpr(CXXBindTemporaryExpr *E) {
				VisitDecl(E->getBeginLoc(), const_cast<CXXDestructorDecl *>(
				E->getTemporary()->getDestructor()));
				Visit(E->getSubExpr());
				}

				void VisitCXXNewExpr(CXXNewExpr *E) {
				if (E->getOperatorNew()) {
				VisitDecl(E->getBeginLoc(), E->getOperatorNew());
				}
				if (E->getOperatorDelete()) {
				VisitDecl(E->getBeginLoc(), E->getOperatorDelete());
				}
				Inherited::VisitCXXNewExpr(E);
				}

				void VisitCXXDeleteExpr(CXXDeleteExpr *E) {
				if (E->getOperatorDelete()) {
				VisitDecl(E->getBeginLoc(), E->getOperatorDelete());
				}
				QualType Destroyed = S.Context.getBaseElementType(E->getDestroyedType());
				if (const RecordType *DestroyedRec = Destroyed->getAs<RecordType>()) {
				CXXRecordDecl *Record = cast<CXXRecordDecl>(DestroyedRec->getDecl());
				VisitDecl(E->getBeginLoc(), S.LookupDestructor(Record));
				}
				Inherited::VisitCXXDeleteExpr(E);
				}

				void VisitCXXConstructExpr(CXXConstructExpr *E) {
				VisitDecl(E->getBeginLoc(), E->getConstructor());
				Inherited::VisitCXXConstructExpr(E);
				}

				void VisitCXXDefaultArgExpr(CXXDefaultArgExpr *E) { Visit(E->getExpr()); }

				void VisitOMPTargetDirective(OMPTargetDirective *Node) {
				++InOMPDeviceContext;
				Inherited::VisitOMPTargetDirective(Node);
				--InOMPDeviceContext;
				}

				void VisitCapturedStmt(CapturedStmt *Node) {
				VisitDecl(Node->getBeginLoc(), Node->getCapturedDecl());
				Inherited::VisitCapturedStmt(Node);
				}

				rjmccall: Thanks, this looks a lot better. Should this be moved to SemaOpenMP.cpp (and renamed to be OpenMP-specific), or do you think it's going to be useful in other modes?
				yaxunl (Author): It is not just for OpenMP. Deferred diagnostics are also emitted by CUDA/HIP.
				rjmccall: Okay. Can it go in Sema.cpp next to the other overload of `emitDeferredDiags`, then? There isn't really much purpose to it being in this file.
				yaxunl (Author): Will do when committing. Thanks.
				void VisitDecl(SourceLocation Loc, Decl *D) {
				if (auto *TD = dyn_cast<TranslationUnitDecl>(D)) {
				for (auto *DD : TD->decls()) {
				VisitDecl(Loc, DD);
				}
				} else if (auto *FTD = dyn_cast<FunctionTemplateDecl>(D)) {
				for (auto *DD : FTD->specializations()) {
				VisitDecl(Loc, DD);
				}
				} else if (auto *FD = dyn_cast<FunctionDecl>(D)) {
				FunctionDecl *Caller = UseStack.empty() ? nullptr : UseStack.back();
				auto IsKnownEmitted = S.getEmissionStatus(FD, /*Final=*/true) ==
				Sema::FunctionEmissionStatus::Emitted;
				if (!Caller)
				ShouldEmit = IsKnownEmitted;
				if ((!ShouldEmit && !S.getLangOpts().OpenMP && !Caller) ||
				S.shouldIgnoreInHostDeviceCheck(FD) || Visited.count(D))
				return;
				// Finalize analysis of OpenMP-specific constructs.
				if (Caller && S.LangOpts.OpenMP && UseStack.size() == 1)
				S.finalizeOpenMPDelayedAnalysis(Caller, FD, Loc);
				if (Caller)
				S.DeviceKnownEmittedFns[FD] = {Caller, Loc};
				if (ShouldEmit || InOMPDeviceContext)
				S.emitDeferredDiags(FD, Caller);
				Visited.insert(D);
				UseStack.push_back(FD);
				if (auto *S = FD->getBody()) {
				Visit(S);
				}
				UseStack.pop_back();
				Visited.erase(D);
				} else if (auto *RD = dyn_cast<RecordDecl>(D)) {
				for (auto *DD : RD->decls()) {
				VisitDecl(Loc, DD);
				}
				} else if (auto *CD = dyn_cast<CapturedDecl>(D)) {
				if (auto *S = CD->getBody()) {
				Visit(S);
				}
				} else if (auto *VD = dyn_cast<VarDecl>(D)) {
				if (auto *Init = VD->getInit()) {
				auto DevTy = OMPDeclareTargetDeclAttr::getDeviceType(VD);
				if (DevTy && (*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost ||
				*DevTy == OMPDeclareTargetDeclAttr::DT_Any))
				++InOMPDeviceContext;
				Visit(Init);
				--InOMPDeviceContext;
				}
				}
				}
				};
				} // namespace
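
Editor's sketch (not part of the patch): the inline comments above ask for a shared CRTP base, `UsedDeclVisitor`, that calls `asImpl().visitUsedDecl(...)` wherever a declaration is used, so the "mark referenced" visitor and the deferred-diagnostics visitor differ only in that one hook. The toy below shows the dispatch pattern on a made-up expression tree. `ToyExpr`, `ToyUsedDeclVisitor`, `ToyMarker` and `ToyDiagEmitter` are hypothetical and do not reflect the interface of the class that was eventually committed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical expression node: optionally uses one declaration by name.
struct ToyExpr {
  std::string UsedDecl;          // empty if this node uses no declaration
  std::vector<ToyExpr> Children; // sub-expressions
};

// CRTP base: owns the traversal, delegates "what to do with a used decl".
template <typename Derived> class ToyUsedDeclVisitor {
public:
  Derived &asImpl() { return *static_cast<Derived *>(this); }

  void visit(const ToyExpr &E) {
    if (!E.UsedDecl.empty())
      asImpl().visitUsedDecl(E.UsedDecl); // static dispatch to the subclass
    for (const ToyExpr &Child : E.Children)
      asImpl().visit(Child);
  }
};

// Plays the role of EvaluatedExprMarker: records what was referenced.
class ToyMarker : public ToyUsedDeclVisitor<ToyMarker> {
public:
  std::vector<std::string> Marked;
  void visitUsedDecl(const std::string &D) { Marked.push_back(D); }
};

// Plays the role of DeferredDiagnosticsEmitter: counts flush points.
class ToyDiagEmitter : public ToyUsedDeclVisitor<ToyDiagEmitter> {
public:
  int Flushed = 0;
  void visitUsedDecl(const std::string &) { ++Flushed; }
};
```

Because the base calls through `asImpl()`, a subclass that also overrides `visit` for a specific node kind would still be reached from the base traversal, which is the dispatch problem the first inline comment describes.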

				void Sema::emitDeferredDiags() {
				if (DeviceDeferredDiags.empty() && !LangOpts.OpenMP)
				return;

				DeferredDiagnosticsEmitter(*this).VisitDecl(
				SourceLocation(), Context.getTranslationUnitDecl());
				}
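
Editor's sketch (not part of the patch): `VisitDecl` above walks from each top-level function, decides at the root whether anything should be emitted, and uses a visited set that is cleared on the way back up so recursion terminates while other paths can still reach the same function. The toy below mimics the non-OpenMP path only. `ToyFn`, `ToyEmitter` and their members are hypothetical, and flushing each function's diagnostics at most once (the `Flushed` set) is a simplification of this sketch, not something taken from the patch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical function record built after "parsing" is complete.
struct ToyFn {
  bool KnownEmitted;               // e.g. a kernel or an externally visible fn
  std::vector<std::string> Diags;  // deferred diagnostics attached to it
  std::vector<std::string> Uses;   // functions referenced from its body
};

class ToyEmitter {
  const std::map<std::string, ToyFn> &Fns;
  std::set<std::string> OnPath;       // like Visited: cuts recursion cycles
  std::set<std::string> Flushed;      // sketch-only: avoid duplicate output
  std::vector<std::string> UseStack;  // current chain of users
  bool ShouldEmit = false;

public:
  std::vector<std::string> Emitted;

  explicit ToyEmitter(const std::map<std::string, ToyFn> &Fns) : Fns(Fns) {}

  void visitFn(const std::string &Name) {
    auto It = Fns.find(Name);
    if (It == Fns.end())
      return;
    const ToyFn &F = It->second;
    bool IsRoot = UseStack.empty();
    if (IsRoot)
      ShouldEmit = F.KnownEmitted; // decided once per top-level walk
    if ((!ShouldEmit && IsRoot) || OnPath.count(Name))
      return;
    if (ShouldEmit && Flushed.insert(Name).second)
      for (const std::string &D : F.Diags)
        Emitted.push_back(Name + ": " + D);
    OnPath.insert(Name);
    UseStack.push_back(Name);
    for (const std::string &U : F.Uses)
      visitFn(U);
    UseStack.pop_back();
    OnPath.erase(Name);
  }

  // Counterpart of visiting the TranslationUnitDecl.
  void run() {
    for (const auto &Entry : Fns)
      visitFn(Entry.first);
  }
};
```

A host-device function that carries a deferred diagnostic is reported only if some known-emitted root reaches it; the same function left unused stays silent.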

	/// Mark any declarations that appear within this expression or any			/// Mark any declarations that appear within this expression or any
	/// potentially-evaluated subexpressions as "referenced".			/// potentially-evaluated subexpressions as "referenced".
	///			///
	/// \param SkipLocalVariables If true, don't mark local variables as			/// \param SkipLocalVariables If true, don't mark local variables as
	/// 'referenced'.			/// 'referenced'.
	void Sema::MarkDeclarationsReferencedInExpr(Expr *E,			void Sema::MarkDeclarationsReferencedInExpr(Expr *E,
	bool SkipLocalVariables) {			bool SkipLocalVariables) {
	▲ Show 20 Lines • Show All 953 Lines • Show Last 20 Lines

clang/lib/Sema/SemaOpenMP.cpp


Show First 20 Lines • Show All 1,679 Lines • ▼ Show 20 Lines	Sema::DeviceDiagBuilder Sema::diagIfOpenMPHostCode(SourceLocation Loc,
case FunctionEmissionStatus::CUDADiscarded:		case FunctionEmissionStatus::CUDADiscarded:
Kind = DeviceDiagBuilder::K_Nop;		Kind = DeviceDiagBuilder::K_Nop;
break;		break;
}		}

return DeviceDiagBuilder(Kind, Loc, DiagID, getCurFunctionDecl(), *this);		return DeviceDiagBuilder(Kind, Loc, DiagID, getCurFunctionDecl(), *this);
}		}

void Sema::checkOpenMPDeviceFunction(SourceLocation Loc, FunctionDecl *Callee,
bool CheckForDelayedContext) {
assert(LangOpts.OpenMP && LangOpts.OpenMPIsDevice &&
"Expected OpenMP device compilation.");
assert(Callee && "Callee may not be null.");
Callee = Callee->getMostRecentDecl();
FunctionDecl *Caller = getCurFunctionDecl();

// Host-only functions are not available on the device.
if (Caller) {
FunctionEmissionStatus CallerS = getEmissionStatus(Caller);
FunctionEmissionStatus CalleeS = getEmissionStatus(Callee);
assert(CallerS != FunctionEmissionStatus::CUDADiscarded &&
CalleeS != FunctionEmissionStatus::CUDADiscarded &&
"CUDADiscarded unexpected in OpenMP device function check");
if ((CallerS == FunctionEmissionStatus::Emitted ||
(!isOpenMPDeviceDelayedContext(*this) &&
CallerS == FunctionEmissionStatus::Unknown)) &&
CalleeS == FunctionEmissionStatus::OMPDiscarded) {
StringRef HostDevTy = getOpenMPSimpleClauseTypeName(
OMPC_device_type, OMPC_DEVICE_TYPE_host);
Diag(Loc, diag::err_omp_wrong_device_function_call) << HostDevTy << 0;
Diag(Callee->getAttr<OMPDeclareTargetDeclAttr>()->getLocation(),
diag::note_omp_marked_device_type_here)
<< HostDevTy;
return;
}
}
// If the caller is known-emitted, mark the callee as known-emitted.
// Otherwise, mark the call in our call graph so we can traverse it later.
if ((CheckForDelayedContext && !isOpenMPDeviceDelayedContext(*this)) ||
(!Caller && !CheckForDelayedContext) ||
(Caller && getEmissionStatus(Caller) == FunctionEmissionStatus::Emitted))
markKnownEmitted(*this, Caller, Callee, Loc,
[CheckForDelayedContext](Sema &S, FunctionDecl *FD) {
return CheckForDelayedContext &&
S.getEmissionStatus(FD) ==
FunctionEmissionStatus::Emitted;
});
else if (Caller)
DeviceCallGraph[Caller].insert({Callee, Loc});
}

void Sema::checkOpenMPHostFunction(SourceLocation Loc, FunctionDecl *Callee,
bool CheckCaller) {
assert(LangOpts.OpenMP && !LangOpts.OpenMPIsDevice &&
"Expected OpenMP host compilation.");
assert(Callee && "Callee may not be null.");
Callee = Callee->getMostRecentDecl();
FunctionDecl *Caller = getCurFunctionDecl();

// device only function are not available on the host.
if (Caller) {
FunctionEmissionStatus CallerS = getEmissionStatus(Caller);
FunctionEmissionStatus CalleeS = getEmissionStatus(Callee);
assert(
(LangOpts.CUDA || (CallerS != FunctionEmissionStatus::CUDADiscarded &&
CalleeS != FunctionEmissionStatus::CUDADiscarded)) &&
"CUDADiscarded unexpected in OpenMP host function check");
if (CallerS == FunctionEmissionStatus::Emitted &&
CalleeS == FunctionEmissionStatus::OMPDiscarded) {
StringRef NoHostDevTy = getOpenMPSimpleClauseTypeName(
OMPC_device_type, OMPC_DEVICE_TYPE_nohost);
Diag(Loc, diag::err_omp_wrong_device_function_call) << NoHostDevTy << 1;
Diag(Callee->getAttr<OMPDeclareTargetDeclAttr>()->getLocation(),
diag::note_omp_marked_device_type_here)
<< NoHostDevTy;
return;
}
}
// If the caller is known-emitted, mark the callee as known-emitted.
// Otherwise, mark the call in our call graph so we can traverse it later.
if (!shouldIgnoreInHostDeviceCheck(Callee)) {
if ((!CheckCaller && !Caller) ||
(Caller &&
getEmissionStatus(Caller) == FunctionEmissionStatus::Emitted))
markKnownEmitted(
*this, Caller, Callee, Loc, [CheckCaller](Sema &S, FunctionDecl *FD) {
return CheckCaller &&
S.getEmissionStatus(FD) == FunctionEmissionStatus::Emitted;
});
else if (Caller)
DeviceCallGraph[Caller].insert({Callee, Loc});
}
}

void Sema::checkOpenMPDeviceExpr(const Expr *E) {		void Sema::checkOpenMPDeviceExpr(const Expr *E) {
assert(getLangOpts().OpenMP && getLangOpts().OpenMPIsDevice &&		assert(getLangOpts().OpenMP && getLangOpts().OpenMPIsDevice &&
"OpenMP device compilation mode is expected.");		"OpenMP device compilation mode is expected.");
QualType Ty = E->getType();		QualType Ty = E->getType();
if ((Ty->isFloat16Type() && !Context.getTargetInfo().hasFloat16Type()) ||		if ((Ty->isFloat16Type() && !Context.getTargetInfo().hasFloat16Type()) ||
((Ty->isFloat128Type() ||		((Ty->isFloat128Type() ||
(Ty->isRealFloatingType() && Context.getTypeSize(Ty) == 128)) &&		(Ty->isRealFloatingType() && Context.getTypeSize(Ty) == 128)) &&
!Context.getTargetInfo().hasFloat128Type()) ||		!Context.getTargetInfo().hasFloat128Type()) ||
[377 lines not shown]	bool Sema::isOpenMPTargetCapturedDecl(const ValueDecl *D,
const auto *VD = dyn_cast<VarDecl>(D);		const auto *VD = dyn_cast<VarDecl>(D);
return VD && !VD->hasLocalStorage() &&		return VD && !VD->hasLocalStorage() &&
DSAStack->hasExplicitDirective(isOpenMPTargetExecutionDirective,		DSAStack->hasExplicitDirective(isOpenMPTargetExecutionDirective,
Level);		Level);
}		}

void Sema::DestroyDataSharingAttributesStack() { delete DSAStack; }		void Sema::DestroyDataSharingAttributesStack() { delete DSAStack; }

void Sema::finalizeOpenMPDelayedAnalysis() {		void Sema::finalizeOpenMPDelayedAnalysis(const FunctionDecl *Caller,
		const FunctionDecl *Callee,
		SourceLocation Loc) {
assert(LangOpts.OpenMP && "Expected OpenMP compilation mode.");		assert(LangOpts.OpenMP && "Expected OpenMP compilation mode.");
// Diagnose implicit declare target functions and their callees.
for (const auto &CallerCallees : DeviceCallGraph) {
Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =		Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =
OMPDeclareTargetDeclAttr::getDeviceType(		OMPDeclareTargetDeclAttr::getDeviceType(Caller->getMostRecentDecl());
CallerCallees.getFirst()->getMostRecentDecl());
// Ignore host functions during device analyzis.		// Ignore host functions during device analyzis.
if (LangOpts.OpenMPIsDevice && DevTy &&		if (LangOpts.OpenMPIsDevice && DevTy &&
*DevTy == OMPDeclareTargetDeclAttr::DT_Host)		*DevTy == OMPDeclareTargetDeclAttr::DT_Host)
continue;		return;
// Ignore nohost functions during host analyzis.		// Ignore nohost functions during host analyzis.
if (!LangOpts.OpenMPIsDevice && DevTy &&		if (!LangOpts.OpenMPIsDevice && DevTy &&
*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost)		*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost)
continue;		return;
for (const std::pair<CanonicalDeclPtr<FunctionDecl>, SourceLocation>		const FunctionDecl *FD = Callee->getMostRecentDecl();
&Callee : CallerCallees.getSecond()) {		DevTy = OMPDeclareTargetDeclAttr::getDeviceType(FD);
const FunctionDecl *FD = Callee.first->getMostRecentDecl();
Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =
OMPDeclareTargetDeclAttr::getDeviceType(FD);
if (LangOpts.OpenMPIsDevice && DevTy &&		if (LangOpts.OpenMPIsDevice && DevTy &&
*DevTy == OMPDeclareTargetDeclAttr::DT_Host) {		*DevTy == OMPDeclareTargetDeclAttr::DT_Host) {
// Diagnose host function called during device codegen.		// Diagnose host function called during device codegen.
StringRef HostDevTy = getOpenMPSimpleClauseTypeName(		StringRef HostDevTy =
OMPC_device_type, OMPC_DEVICE_TYPE_host);		getOpenMPSimpleClauseTypeName(OMPC_device_type, OMPC_DEVICE_TYPE_host);
Diag(Callee.second, diag::err_omp_wrong_device_function_call)		Diag(Loc, diag::err_omp_wrong_device_function_call) << HostDevTy << 0;
<< HostDevTy << 0;
Diag(FD->getAttr<OMPDeclareTargetDeclAttr>()->getLocation(),		Diag(FD->getAttr<OMPDeclareTargetDeclAttr>()->getLocation(),
diag::note_omp_marked_device_type_here)		diag::note_omp_marked_device_type_here)
<< HostDevTy;		<< HostDevTy;
continue;		return;
}		}
if (!LangOpts.OpenMPIsDevice && DevTy &&		if (!LangOpts.OpenMPIsDevice && DevTy &&
*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost) {		*DevTy == OMPDeclareTargetDeclAttr::DT_NoHost) {
// Diagnose nohost function called during host codegen.		// Diagnose nohost function called during host codegen.
StringRef NoHostDevTy = getOpenMPSimpleClauseTypeName(		StringRef NoHostDevTy = getOpenMPSimpleClauseTypeName(
OMPC_device_type, OMPC_DEVICE_TYPE_nohost);		OMPC_device_type, OMPC_DEVICE_TYPE_nohost);
Diag(Callee.second, diag::err_omp_wrong_device_function_call)		Diag(Loc, diag::err_omp_wrong_device_function_call) << NoHostDevTy << 1;
<< NoHostDevTy << 1;
Diag(FD->getAttr<OMPDeclareTargetDeclAttr>()->getLocation(),		Diag(FD->getAttr<OMPDeclareTargetDeclAttr>()->getLocation(),
diag::note_omp_marked_device_type_here)		diag::note_omp_marked_device_type_here)
<< NoHostDevTy;		<< NoHostDevTy;
continue;
}
}
}		}
}		}

void Sema::StartOpenMPDSABlock(OpenMPDirectiveKind DKind,		void Sema::StartOpenMPDSABlock(OpenMPDirectiveKind DKind,
const DeclarationNameInfo &DirName,		const DeclarationNameInfo &DirName,
Scope *CurScope, SourceLocation Loc) {		Scope *CurScope, SourceLocation Loc) {
DSAStack->push(DKind, DirName, CurScope, Loc);		DSAStack->push(DKind, DirName, CurScope, Loc);
PushExpressionEvaluationContext(		PushExpressionEvaluationContext(
ExpressionEvaluationContext::PotentiallyEvaluated);		ExpressionEvaluationContext::PotentiallyEvaluated);
[14,729 lines not shown]	void Sema::checkDeclIsAllowedInOpenMPTarget(Expr *E, Decl *D,
if (auto *FD = dyn_cast<FunctionDecl>(D)) {		if (auto *FD = dyn_cast<FunctionDecl>(D)) {
llvm::Optional<OMPDeclareTargetDeclAttr::MapTypeTy> Res =		llvm::Optional<OMPDeclareTargetDeclAttr::MapTypeTy> Res =
OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(FD);		OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(FD);
if (IdLoc.isValid() && Res && *Res == OMPDeclareTargetDeclAttr::MT_Link) {		if (IdLoc.isValid() && Res && *Res == OMPDeclareTargetDeclAttr::MT_Link) {
Diag(IdLoc, diag::err_omp_function_in_link_clause);		Diag(IdLoc, diag::err_omp_function_in_link_clause);
Diag(FD->getLocation(), diag::note_defined_here) << FD;		Diag(FD->getLocation(), diag::note_defined_here) << FD;
return;		return;
}		}
// Mark the function as must be emitted for the device.
Optional<OMPDeclareTargetDeclAttr::DevTypeTy> DevTy =
OMPDeclareTargetDeclAttr::getDeviceType(FD);
if (LangOpts.OpenMPIsDevice && Res.hasValue() && IdLoc.isValid() &&
*DevTy != OMPDeclareTargetDeclAttr::DT_Host)
checkOpenMPDeviceFunction(IdLoc, FD, /*CheckForDelayedContext=*/false);
if (!LangOpts.OpenMPIsDevice && Res.hasValue() && IdLoc.isValid() &&
*DevTy != OMPDeclareTargetDeclAttr::DT_NoHost)
checkOpenMPHostFunction(IdLoc, FD, /*CheckCaller=*/false);
}		}
if (auto *VD = dyn_cast<ValueDecl>(D)) {		if (auto *VD = dyn_cast<ValueDecl>(D)) {
// Problem if any with var declared with incomplete type will be reported		// Problem if any with var declared with incomplete type will be reported
// as normal, so no need to check it here.		// as normal, so no need to check it here.
if ((E || !VD->getType()->isIncompleteType()) &&		if ((E || !VD->getType()->isIncompleteType()) &&
!checkValueDeclInTarget(SL, SR, *this, DSAStack, VD))		!checkValueDeclInTarget(SL, SR, *this, DSAStack, VD))
return;		return;
if (!E && !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD)) {		if (!E && !OMPDeclareTargetDeclAttr::isDeclareTargetDeclaration(VD)) {
[318 lines not shown]
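
In the right-hand version above, finalizeOpenMPDelayedAnalysis no longer iterates over DeviceCallGraph; it is handed one (Caller, Callee, Loc) triple at a time. The following is a minimal standalone sketch of that shape, assuming a toy use graph walked after parsing. Program, Use, DevType, checkEdge and emitDeferred are hypothetical names invented for illustration and are not Clang API; whether the real traversal descends into a callee after diagnosing it is not shown in this diff, so stopping there is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are Clang types.
enum class DevType { Any, Host, NoHost };

struct Use {
  std::string Callee;
  int Line; // stand-in for a SourceLocation
};

struct Program {
  std::map<std::string, DevType> DevTypes;        // function -> device_type
  std::map<std::string, std::vector<Use>> Uses;   // caller -> uses in its body
};

// Per-edge check, mirroring the (Caller, Callee, Loc) shape of the new
// finalizeOpenMPDelayedAnalysis. Returns true if a diagnostic was recorded.
bool checkEdge(const Program &P, bool IsDevice, const std::string &Caller,
               const Use &U, std::vector<std::string> &Diags) {
  assert(P.DevTypes.count(Caller) && "unknown caller");
  assert(P.DevTypes.count(U.Callee) && "unknown callee");
  DevType CallerTy = P.DevTypes.at(Caller);
  // Ignore host callers during device analysis and nohost callers on the host.
  if (IsDevice && CallerTy == DevType::Host)
    return false;
  if (!IsDevice && CallerTy == DevType::NoHost)
    return false;
  DevType CalleeTy = P.DevTypes.at(U.Callee);
  std::string Where = "line " + std::to_string(U.Line) + ": ";
  if (IsDevice && CalleeTy == DevType::Host) {
    Diags.push_back(Where +
        "function with 'device_type(host)' is not available on device");
    return true;
  }
  if (!IsDevice && CalleeTy == DevType::NoHost) {
    Diags.push_back(Where +
        "function with 'device_type(nohost)' is not available on host");
    return true;
  }
  return false;
}

// Post-parse traversal: start from functions known to be emitted and visit
// each reachable function once, checking every use edge on the way.
void emitDeferred(const Program &P, bool IsDevice,
                  const std::vector<std::string> &Roots,
                  std::vector<std::string> &Diags) {
  std::set<std::string> Visited;
  std::vector<std::string> Work(Roots.begin(), Roots.end());
  while (!Work.empty()) {
    std::string Caller = Work.back();
    Work.pop_back();
    if (!Visited.insert(Caller).second)
      continue; // already traversed
    auto It = P.Uses.find(Caller);
    if (It == P.Uses.end())
      continue; // no recorded uses in this body
    for (const Use &U : It->second) {
      // Assumption of this sketch: do not descend into a diagnosed callee.
      if (!checkEdge(P, IsDevice, Caller, U, Diags))
        Work.push_back(U.Callee);
    }
  }
}
```

Because the graph is only walked once every declaration has been seen, each edge is checked against the final device_type of both ends, which is the property the explicit parse-time call graph could not guarantee.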

clang/test/OpenMP/declare_target_messages.cpp

	[156 lines not shown]

	#pragma omp declare target link(S) // expected-error {{'S' used in declare target directive is not a variable or a function name}}			#pragma omp declare target link(S) // expected-error {{'S' used in declare target directive is not a variable or a function name}}

	#pragma omp declare target (x, x) // expected-error {{'x' appears multiple times in clauses on the same declare target directive}}			#pragma omp declare target (x, x) // expected-error {{'x' appears multiple times in clauses on the same declare target directive}}
	#pragma omp declare target to(x) to(x) // expected-error {{'x' appears multiple times in clauses on the same declare target directive}}			#pragma omp declare target to(x) to(x) // expected-error {{'x' appears multiple times in clauses on the same declare target directive}}
	#pragma omp declare target link(x) // expected-error {{'x' must not appear in both clauses 'to' and 'link'}}			#pragma omp declare target link(x) // expected-error {{'x' must not appear in both clauses 'to' and 'link'}}

	void bazz() {}			void bazz() {}
	#pragma omp declare target to(bazz) device_type(nohost) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}} host5-note {{marked as 'device_type(nohost)' here}}			#pragma omp declare target to(bazz) device_type(nohost) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}} host5-note 3{{marked as 'device_type(nohost)' here}}
	void bazzz() {bazz();}			void bazzz() {bazz();}
	#pragma omp declare target to(bazzz) device_type(nohost) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}}			#pragma omp declare target to(bazzz) device_type(nohost) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}}
	void any() {bazz();} // host5-error {{function with 'device_type(nohost)' is not available on host}}			void any() {bazz();} // host5-error {{function with 'device_type(nohost)' is not available on host}}
	void host1() {bazz();}			void host1() {bazz();} // host5-error {{function with 'device_type(nohost)' is not available on host}}
	#pragma omp declare target to(host1) device_type(host) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}} dev5-note 2 {{marked as 'device_type(host)' here}}			#pragma omp declare target to(host1) device_type(host) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}} dev5-note 4 {{marked as 'device_type(host)' here}}
	void host2() {bazz();}			void host2() {bazz();} //host5-error {{function with 'device_type(nohost)' is not available on host}}
	#pragma omp declare target to(host2)			#pragma omp declare target to(host2)
	void device() {host1();}			void device() {host1();} // dev5-error {{function with 'device_type(host)' is not available on device}}
	#pragma omp declare target to(device) device_type(nohost) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}} host5-note 2 {{marked as 'device_type(nohost)' here}}			#pragma omp declare target to(device) device_type(nohost) // omp45-error {{unexpected 'device_type' clause, only 'to' or 'link' clauses expected}} host5-note 2 {{marked as 'device_type(nohost)' here}}
	void host3() {host1();}			void host3() {host1();} // dev5-error {{function with 'device_type(host)' is not available on device}}
	#pragma omp declare target to(host3)			#pragma omp declare target to(host3)

	#pragma omp declare target			#pragma omp declare target
	void any1() {any();}			void any1() {any();}
	void any2() {host1();} // dev5-error {{function with 'device_type(host)' is not available on device}}			void any2() {host1();} // dev5-error {{function with 'device_type(host)' is not available on device}}
	void any3() {device();} // host5-error {{function with 'device_type(nohost)' is not available on host}}			void any3() {device();} // host5-error {{function with 'device_type(nohost)' is not available on host}}
	void any4() {any2();}			void any4() {any2();}
	#pragma omp end declare target			#pragma omp end declare target

	void any5() {any();}			void any5() {any();}
	void any6() {host1();} // dev5-error {{function with 'device_type(host)' is not available on device}}			void any6() {host1();} // dev5-error {{function with 'device_type(host)' is not available on device}}
	void any7() {device();} // host5-error {{function with 'device_type(nohost)' is not available on host}}			void any7() {device();} // host5-error {{function with 'device_type(nohost)' is not available on host}}
	void any8() {any2();}			void any8() {any2();}

	#pragma omp declare target // expected-error {{expected '#pragma omp end declare target'}} expected-note {{to match this '#pragma omp declare target'}}			#pragma omp declare target // expected-error {{expected '#pragma omp end declare target'}} expected-note {{to match this '#pragma omp declare target'}}

clang/test/OpenMP/nvptx_target_exceptions_messages.cpp

	[32 lines not shown]
	};			};

	int foo() { return 0; }			int foo() { return 0; }
	int b = 15;			int b = 15;
	int d;			int d;
	#pragma omp end declare target			#pragma omp end declare target
	int c;			int c;

	int bar() { return 1 + foo() + bar() + baz1() + baz2(); }			int bar() { return 1 + foo() + bar() + baz1() + baz2(); } // expected-note {{called by 'bar'}}

	int maini1() {			int maini1() {
	int a;			int a;
	static long aa = 32;			static long aa = 32;
	try {			try {
	#pragma omp target map(tofrom \			#pragma omp target map(tofrom \
	: a, b)			: a, b)
	{			{
	S s(a);			S s(a);
	static long aaa = 23;			static long aaa = 23;
	a = foo() + bar() + b + c + d + aa + aaa + FA<int>();			a = foo() + bar() + b + c + d + aa + aaa + FA<int>(); // expected-note{{called by 'maini1'}}
	if (!a)			if (!a)
	throw "Error"; // expected-error {{cannot use 'throw' with exceptions disabled}}			throw "Error"; // expected-error {{cannot use 'throw' with exceptions disabled}}
	}			}
	} catch(...) {			} catch(...) {
	}			}
	return baz4();			return baz4();
	}			}

	[24 lines not shown]

clang/test/SemaCUDA/bad-calls-on-same-line.cu

	[27 lines not shown]
	template <typename T>			template <typename T>
	inline __host__ __device__ void hd() {			inline __host__ __device__ void hd() {
	Selector<T>().f();			Selector<T>().f();
	// expected-error@-1 2 {{reference to __device__ function}}			// expected-error@-1 2 {{reference to __device__ function}}
	}			}

	void host_fn() {			void host_fn() {
	hd<int>();			hd<int>();
	hd<double>(); // expected-note {{function template specialization 'hd<double>'}}			hd<double>();
	// expected-note@-1 {{called by 'host_fn'}}			// expected-note@-1 {{called by 'host_fn'}}
	hd<float>(); // expected-note {{function template specialization 'hd<float>'}}			hd<float>();
	// expected-note@-1 {{called by 'host_fn'}}			// expected-note@-1 {{called by 'host_fn'}}
	}			}

clang/test/SemaCUDA/call-device-fn-from-host.cu

	// RUN: %clang_cc1 %s --std=c++11 -triple x86_64-unknown-linux -emit-llvm -o - \			// RUN: %clang_cc1 %s --std=c++11 -triple x86_64-unknown-linux -emit-llvm -o - \
	// RUN: -verify -verify-ignore-unexpected=note			// RUN: -verify -verify-ignore-unexpected=note
	// RUN: %clang_cc1 %s --std=c++11 -triple x86_64-unknown-linux -emit-llvm -o - \			// RUN: %clang_cc1 %s --std=c++11 -triple x86_64-unknown-linux -emit-llvm -o - \
	// RUN: -verify -verify-ignore-unexpected=note -fopenmp			// RUN: -verify=expected,omp -verify-ignore-unexpected=note -fopenmp

	// Note: This test won't work with -fsyntax-only, because some of these errors			// Note: This test won't work with -fsyntax-only, because some of these errors
	// are emitted during codegen.			// are emitted during codegen.

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

	__device__ void device_fn() {}			__device__ void device_fn() {}
	// expected-note@-1 5 {{'device_fn' declared here}}			// expected-note@-1 5 {{'device_fn' declared here}}
	[21 lines not shown]
	};			};

	__host__ __device__ void T::hd3() {			__host__ __device__ void T::hd3() {
	device_fn();			device_fn();
	// expected-error@-1 {{reference to __device__ function 'device_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __device__ function 'device_fn' in __host__ __device__ function}}
	}			}

	template <typename T> __host__ __device__ void hd2() { device_fn(); }			template <typename T> __host__ __device__ void hd2() { device_fn(); }
	// expected-error@-1 2 {{reference to __device__ function 'device_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __device__ function 'device_fn' in __host__ __device__ function}}
	void host_fn() { hd2<int>(); }			void host_fn() { hd2<int>(); }

	__host__ __device__ void hd() { device_fn(); }			__host__ __device__ void hd() { device_fn(); }
	// expected-error@-1 {{reference to __device__ function 'device_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __device__ function 'device_fn' in __host__ __device__ function}}

	// No error because this is never instantiated.			// No error because this is never instantiated.
	template <typename T> __host__ __device__ void hd3() { device_fn(); }			template <typename T> __host__ __device__ void hd3() { device_fn(); }

	[49 lines not shown]

clang/test/SemaCUDA/call-host-fn-from-device.cu

	[50 lines not shown]
	};			};

	__host__ __device__ void T::hd3() {			__host__ __device__ void T::hd3() {
	host_fn();			host_fn();
	// expected-error@-1 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}
	}			}

	template <typename T> __host__ __device__ void hd2() { host_fn(); }			template <typename T> __host__ __device__ void hd2() { host_fn(); }
	// expected-error@-1 2 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}
	__global__ void kernel() { hd2<int>(); }			__global__ void kernel() { hd2<int>(); }

	__host__ __device__ void hd() { host_fn(); }			__host__ __device__ void hd() { host_fn(); }
	// expected-error@-1 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}

	template <typename T> __host__ __device__ void hd3() { host_fn(); }			template <typename T> __host__ __device__ void hd3() { host_fn(); }
	// expected-error@-1 2 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}			// expected-error@-1 {{reference to __host__ function 'host_fn' in __host__ __device__ function}}
	__device__ void device_fn() { hd3<int>(); }			__device__ void device_fn() { hd3<int>(); }

	// No error because this is never instantiated.			// No error because this is never instantiated.
	template <typename T> __host__ __device__ void hd4() { host_fn(); }			template <typename T> __host__ __device__ void hd4() { host_fn(); }

	__host__ __device__ void local_var() {			__host__ __device__ void local_var() {
	S s;			S s;
	// expected-error@-1 {{reference to __host__ function 'S' in __host__ __device__ function}}			// expected-error@-1 {{reference to __host__ function 'S' in __host__ __device__ function}}
	[66 lines not shown]

clang/test/SemaCUDA/openmp-target.cu

	[10 lines not shown]
	__device__ void cu_devf();			__device__ void cu_devf();
	#endif			#endif

	void bazz() {}			void bazz() {}
	#pragma omp declare target to(bazz) device_type(nohost)			#pragma omp declare target to(bazz) device_type(nohost)
	void bazzz() {bazz();}			void bazzz() {bazz();}
	#pragma omp declare target to(bazzz) device_type(nohost)			#pragma omp declare target to(bazzz) device_type(nohost)
	void any() {bazz();} // expected-error {{function with 'device_type(nohost)' is not available on host}}			void any() {bazz();} // expected-error {{function with 'device_type(nohost)' is not available on host}}
	void host1() {bazz();}			void host1() {bazz();} // expected-error {{function with 'device_type(nohost)' is not available on host}}
	#pragma omp declare target to(host1) device_type(host)			#pragma omp declare target to(host1) device_type(host)
	void host2() {bazz();}			void host2() {bazz();} // expected-error {{function with 'device_type(nohost)' is not available on host}}
	#pragma omp declare target to(host2)			#pragma omp declare target to(host2)
	void device() {host1();}			void device() {host1();}
	#pragma omp declare target to(device) device_type(nohost)			#pragma omp declare target to(device) device_type(nohost)
	void host3() {host1();}			void host3() {host1();}
	#pragma omp declare target to(host3)			#pragma omp declare target to(host3)

	#pragma omp declare target			#pragma omp declare target
	void any1() {any();}			void any1() {any();}
	[14 lines not shown]

clang/test/SemaCUDA/trace-through-global.cu

	[32 lines not shown]

	template <typename T>			template <typename T>
	void launch_kernel() {			void launch_kernel() {
	kernel<<<0, 0>>>(T());			kernel<<<0, 0>>>(T());

	// Notice that these two diagnostics are different: Because the call to hd1			// Notice that these two diagnostics are different: Because the call to hd1
	// is not dependent on T, the call to hd1 comes from 'launch_kernel', while			// is not dependent on T, the call to hd1 comes from 'launch_kernel', while
	// the call to hd3, being dependent, comes from 'launch_kernel<int>'.			// the call to hd3, being dependent, comes from 'launch_kernel<int>'.
	hd1(); // expected-note {{called by 'launch_kernel'}}			hd1(); // expected-note {{called by 'launch_kernel<int>'}}
	hd3(T()); // expected-note {{called by 'launch_kernel<int>'}}			hd3(T()); // expected-note {{called by 'launch_kernel<int>'}}
	}			}

	void host_fn() {			void host_fn() {
	launch_kernel<int>();			launch_kernel<int>();
	// expected-note@-1 2 {{called by 'host_fn'}}			// expected-note@-1 2 {{called by 'host_fn'}}
	}			}


Revision Contents

Diff 240760
