This is an archive of the discontinued LLVM Phabricator instance.

[C++20][Modules] Fix incorrect visibilities in implementation units.
Changes PlannedPublic

Authored by iains on Mar 13 2023, 10:43 AM.

Details

Reviewers
ChuanqiXu
Group Reviewers
Restricted Project
Summary

Some of the testcases committed during the merging of modules contain
FIXMEs concerning the visibility of declarations with internal-linkage in
module interface units as seen from module implementation units.

This is, in part, caused by an oversight in the utility routine that determines
whether a module is part of the current TU or not.

The lookup definition for visible was also too wide, failing to account
cases where a module was visible, but some declarations from it were not.

This patch has two components:
1/ Checking for visibility of imported decls and excluding thsose with
internal linkage (when from a different TU) or from a private module
fragment. The logic for this has to accommodate differences for ADL when
the lookup is for a templated entity that is defined in a different TU
from the importer.

2/ Updating the diagnostics in reponse to this, in particular dealing
better with the case that the user attempts to use a module-local linkage
item in an importing TU (we now provide a fixit to indicate that the item
needs to be exported to be used). When we search for possible "typos",
which we do in response to failing to find an entity, we do so allowing
finds of 'hidden' names. Before we offer names found this as possible
typo fixes, we need to check that they would be viable if not hidden.

There are a number of basically mechanical changes to tests to accommodate
the fixed rejection of relevant declarations and the improved diagnostic
messages.

Diff Detail

Event Timeline

iains created this revision.Mar 13 2023, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2023, 10:43 AM
Herald added a subscriber: ChuanqiXu. · View Herald Transcript

notes:
a) I need to update the commit log message so that it formats properly in the browser
b) since we are now seeing performance complaints against modules, and some of these routines are used many times (sometimes several time per name), I have tried to structure tests so that the cheapest ones execute first - and likewise the utility routines.

iains published this revision for review.Mar 14 2023, 12:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 12:32 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Yeah, it is indeed a problem that Sema::isModuleUnitOfCurrentTU doesn't work for implementation units. And I have been thinking about this for a while. My thought is that the root cause may be that we shouldn't push the module interface to Sema::ModuleScopes() for the implementation unit. (We need some other small refactoring). I feel this is more natural and consistent. I thought to take this one but we didn't use implementation units in the downstream and there is no related issue reports. So I didn't start to work on it... If you are not hurry, I'd like to take this. Otherwise I'd suggest you to try the above method I mentioned.

BTW, I suggest we file an issue first if we want to do something. So that we can avoid solving the same problem.

clang/lib/Sema/SemaModule.cpp
307–308

This can be a separate patch.

iains added a comment.Mar 14 2023, 1:59 AM

Yeah, it is indeed a problem that Sema::isModuleUnitOfCurrentTU doesn't work for implementation units. And I have been thinking about this for a while. My thought is that the root cause may be that we shouldn't push the module interface to Sema::ModuleScopes() for the implementation unit. (We need some other small refactoring).

I think it is not small refactoring - we make extensive use of "getCurrentModule()" to determine if we are processing a module and the interface to know which module the implementation belongs to.

If we add the implicitly imported interface to "imports" instead of the module scope we will need to make sure that we deal with all these cases where we rely on the interface module for this information.

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

I feel this is more natural and consistent.

Yes, if we were starting from 0, that is so - but we have an implementation with long history - we would need to see the impact. [I am not against making this cleaner, but concerned about the amount of change].

I thought to take this one but we didn't use implementation units in the downstream and there is no related issue reports.

I was wondering whether some of the reports filed naming duplicate and clashing symbols could be related, but I did not analyse.

So I didn't start to work on it... If you are not hurry, I'd like to take this. Otherwise I'd suggest you to try the above method I mentioned.

I was working on this because it was affecting my P1815 stuff - but I can use this patch as a temporary solution.

BTW, I suggest we file an issue first if we want to do something. So that we can avoid solving the same problem.

There are (I think your) FIXMEs in the testcases (but I do not know if there is a corresponding PR).

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

Yeah, it is indeed a problem that Sema::isModuleUnitOfCurrentTU doesn't work for implementation units. And I have been thinking about this for a while. My thought is that the root cause may be that we shouldn't push the module interface to Sema::ModuleScopes() for the implementation unit. (We need some other small refactoring).

I think it is not small refactoring - we make extensive use of "getCurrentModule()" to determine if we are processing a module and the interface to know which module the implementation belongs to.

If we add the implicitly imported interface to "imports" instead of the module scope we will need to make sure that we deal with all these cases where we rely on the interface module for this information.

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

Yeah, understood. I had a mental model but I am not sure if it can work very well as I expected. I'll try to give it a try recently.

I feel this is more natural and consistent.

Yes, if we were starting from 0, that is so - but we have an implementation with long history - we would need to see the impact. [I am not against making this cleaner, but concerned about the amount of change].

I thought to take this one but we didn't use implementation units in the downstream and there is no related issue reports.

I was wondering whether some of the reports filed naming duplicate and clashing symbols could be related, but I did not analyse.

So I didn't start to work on it... If you are not hurry, I'd like to take this. Otherwise I'd suggest you to try the above method I mentioned.

I was working on this because it was affecting my P1815 stuff - but I can use this patch as a temporary solution.

Thanks!

BTW, I suggest we file an issue first if we want to do something. So that we can avoid solving the same problem.

There are (I think your) FIXMEs in the testcases (but I do not know if there is a corresponding PR).

Oh sorry. My bad clearly. Once upon time, someone told me it is a bad idea to add FIXME in the test case... Now I understand.

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

I'll try to make it.

iains added a comment.Mar 14 2023, 2:33 AM

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

I'll try to make it.

Perhaps then, we can split out the changes to Sema::isModuleUnitOfCurrentTU and see how much of the functionality is still working (with the intent that the full fix will come along later).

So if this is split into two

  1. the fix to Sema::isModuleUnitOfCurrentTU (which you will be updating) and
  2. the uses of that to fix the lookups and diagnostics?

(I can use both as a temporary fix .. but 2 could be applied with the existing isModuleUnitOfCurrentTU implementation).

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

I'll try to make it.

Perhaps then, we can split out the changes to Sema::isModuleUnitOfCurrentTU and see how much of the functionality is still working (with the intent that the full fix will come along later).

So if this is split into two

  1. the fix to Sema::isModuleUnitOfCurrentTU (which you will be updating) and
  2. the uses of that to fix the lookups and diagnostics?

(I can use both as a temporary fix .. but 2 could be applied with the existing isModuleUnitOfCurrentTU implementation).

I feel it may be better to wait for me to have an initial experiment. On the one hand, if we can make it relatively quickly, we can save some time. On the other hand, now I understand that removing codes is much harder than adding codes... So let's wait for some time. The next release is in the end of July, so we have plenty time.

iains added a comment.Mar 14 2023, 2:01 PM

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

I have changed my mind here.
It seems that we are getting confusing decl context entries without having a proper module for the implementation (even though we will never write such a module to a PCM, since it is not importable).

I was wondering if it would be possible to avoid having separate values for PartitionInterface/Implementation (since they both have a BMI - it is only a question of whether that can be re-exported). We would have to add one bit to the PCM to deal with that, so that we might as well extend the ModuleKind enum to 4 bits.

So .. please wait for one or two days, I will try to refresh the implementation patch (D126959) and see if that resolves some of the issues I am seeing with P1815

(this does not alter the situation that you are going to revisit the isModuleUnitOfCurrentTU if that is needed - but it might not be; since having a separate module for the impl. will avoid some of the ambiguities anyway).

Additional Note: we also have a situation in the lookups for dependent names during template instantiation that the visibility of decls can need to be assessed in a different module context from the current module. So that we might also need to have a areTheseModulesFromTheSameTU(Module *A. Module *B)

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

I have changed my mind here.
It seems that we are getting confusing decl context entries without having a proper module for the implementation (even though we will never write such a module to a PCM, since it is not importable).

I was wondering if it would be possible to avoid having separate values for PartitionInterface/Implementation (since they both have a BMI - it is only a question of whether that can be re-exported). We would have to add one bit to the PCM to deal with that, so that we might as well extend the ModuleKind enum to 4 bits.

So .. please wait for one or two days, I will try to refresh the implementation patch (D126959) and see if that resolves some of the issues I am seeing with P1815

(this does not alter the situation that you are going to revisit the isModuleUnitOfCurrentTU if that is needed - but it might not be; since having a separate module for the impl. will avoid some of the ambiguities anyway).

Additional Note: we also have a situation in the lookups for dependent names during template instantiation that the visibility of decls can need to be assessed in a different module context from the current module. So that we might also need to have a areTheseModulesFromTheSameTU(Module *A. Module *B)

Yeah. What I had in mind is to remove ModulePartitionInterface and ModulePartitionImplementation in ModuleKinds and rename ModuleInterfaceUnit to NamedModules. And we can introduce 2 bits for IsPartition and IsImplementation. Then we can introduce ImplementationUnit without any loss.

I filed another issue (https://github.com/llvm/llvm-project/issues/61427) to reflect [basic.lookup.general]p2.3. So there are multiple issues. Let's handle them in multiple patches too.

iains updated this revision to Diff 507977.Mar 24 2023, 12:19 AM
iains edited the summary of this revision. (Show Details)

rebased, split the changes to module and private linkage out,

this needs to be refactored to collect the test functions into either
module.h and / or sema.h

iains planned changes to this revision.Mar 24 2023, 12:21 AM

although this patch does handle the cases needed, it really needs refactoring.
posting here since we are both working in this area and this might be useful input.

Just took a quick look.

clang/lib/Sema/SemaLookup.cpp
2081

We should check header units here.

2100

Let's not use isFromASTFile(). It is a low level API without higher level semantics. I think it is good enough to check the module of ND.

iains updated this revision to Diff 510388.Apr 2 2023, 8:44 PM

rebased

iains marked 2 inline comments as done.Apr 2 2023, 8:48 PM

I updated this because I am going to push the latest version of the P1815 patch and that depends on the lookup changes.

clang/lib/Sema/SemaLookup.cpp
2081

The point of checking module map modules was to avoid affecting clang modules with the change in semantics.

At the moment, I am not sure why we could exclude lookup from finding decls in an imported header unit?

2100

lookup is very heavily used; the purpose of the isFromAST() check is to short-circuit the more expensive checks when we know that a decl must be in the same TU (i.e. it is not from an AST file).

If we can find a suitable inexpensive check that has better semantics, I am fine to change this,

ChuanqiXu added inline comments.Apr 2 2023, 9:47 PM
clang/lib/Sema/SemaLookup.cpp
2081

At the moment, I am not sure why we could exclude lookup from finding decls in an imported header unit?

We're not excluding decls from an imported header units. We're trying to include the decls from the headers units. Since we've allowed decls with internal linkage to appear to header units. We must need to allow decls with internal linkage in header units to be found here. Otherwise, the following example may fail:

// foo.h
static int a = 43;

// m.cppm
export module m;
import "foo.h";
use of a...

BTW, given the semantics of header units and clang modules are pretty similar, we should use isHeaderModule in most cases.

2100

It looks good enough to me to check that ND lives in a module. And from what I profiled, the lookup process is not hot really.

iains marked 2 inline comments as done.Apr 2 2023, 9:56 PM
iains added inline comments.
clang/lib/Sema/SemaLookup.cpp
2081

ah yes, that special case, I should have remembered.
OK I will adjust this at the next iteration (I think you mean isHeaderLikeModule())

Did another quick look. And I feel the title and the summary of the page is not consistent with the patch itself. I think it is better to split this to make the change to focus on the entities with internal linkage. I don't know what's wrong with the module linkage things. Maybe we can file an issue ahead in next time.

clang/include/clang/Basic/DiagnosticSemaKinds.td
11228

The 'private' here makes me to think about module private fragment while it is not true. I prefer to refactor it to something like "it is not exported".

clang/include/clang/Sema/Sema.h
2370

While I am not a native speaker, I feel isSameModuleWithCurrentTU may be a better name.

clang/lib/Sema/SemaLookup.cpp
1784

It looks better to me if we can insert codes here

3912–3936

What's the intention for the change? And why is the current behavior bad without this?

4515

What if we don't touch the typo part? I am still confusing.

5819–5830

I feel like this can be another change. I'm a little bit confused since I feel the patch did multiple things at the same time again..

ChuanqiXu added inline comments.Apr 4 2023, 1:47 AM
clang/lib/Sema/SemaLookup.cpp
3912–3936

What's the intention for the change? And why is the current behavior bad without this?

3912–3936

Oh, I understand why I feel the code is not good since the decl with internal linkage or module linkage shouldn't be visible. So even if there are problems, we should handle them elsewhere.

iains updated this revision to Diff 511327.Apr 6 2023, 2:08 AM

rebased, addressed review comments (patch still needs refactoring)

iains added a comment.Apr 6 2023, 2:14 AM

in the end, we have to deal with the following cases:

  1. decls in the same TU (including GMF and PMF)
  2. decls in the current named module (excluding GMF)
  3. decls on an instantiation path (which can include internal-linkage entities)
  4. decls that are needed by an exported entity from a *differennt" named module [e.g. a module-linkage entity that is used by some exported content].
iains marked 2 inline comments as done.Apr 6 2023, 2:34 AM
iains added inline comments.
clang/include/clang/Sema/Sema.h
2370

actually, as noted, both cases are needed. So I have made that clear by splitting the function into two.

clang/lib/Sema/SemaLookup.cpp
2100

OK, we can always revisit performance later,

5819–5830

sure, we can refactor; the current patch is a placeholder to allow work to continue on p1815.

iains updated this revision to Diff 529852.Jun 9 2023, 1:02 AM
iains marked 2 inline comments as done.

rebased and adjusted for upstream changes

iains added a comment.Jun 9 2023, 1:36 AM

about the comments that the patch seems to do multiple things; I do not think we can fix the lookup without touching the typo-fixes since it uses the same underlying machinery - if we do not adjust the typo fixes, we will regress diagnostics.

iains updated this revision to Diff 530786.Jun 12 2023, 11:27 PM

rebased, added testcase for Issue 61601.

bruno added a subscriber: bruno.Jun 15 2023, 6:29 AM

if we do not adjust the typo fixes, we will regress diagnostics.

What the kind of diagnostics will be regressed? I mean, it looks weird to me that we suggest typo fixes from hidden names.

clang/include/clang/Basic/DiagnosticSemaKinds.td
11228

let's try rewording private to invisible?

clang/include/clang/Sema/Sema.h
2370

Let's use !DeclBase::isInAnotherModuleUnit() instead now.

2372–2380

nit: I prefer this to be a freestanding function in Module.h. This looks slightly not good within Sema.

clang/lib/Sema/SemaLookup.cpp
3912–3936

Could we tried to address this? The change smells not so good.

iains marked 2 inline comments as done.Jun 19 2023, 1:08 AM

if we do not adjust the typo fixes, we will regress diagnostics.

What the kind of diagnostics will be regressed? I mean, it looks weird to me that we suggest typo fixes from hidden names.

Because when we do typo checking, we relax the visibility conditions so that we can see some decl that might be hidden or misspelled - then we can say

"you need to import module XX before using YY",

or

"did you mean ZZ"

(I would be happy if we did not need to do this in this patch, but not sure how we can work around it).

clang/include/clang/Basic/DiagnosticSemaKinds.td
11228

I will reword.

clang/include/clang/Sema/Sema.h
2370

'isInAnotherModuleUnit' is not precise - for this work we need to have :

"is in another module unit of the same named module"
"is in another module unit of a different named module"

so I have made two functions to do these two jobs (although they are in Sema now, we can move them either to decl or module - as suggested later)

2372–2380

what about moving all the module tu-related tests to decl.cpp/h (or maybe to module.cpp/h)?

clang/lib/Sema/SemaLookup.cpp
1784

I am not sure exactly what you mean by "insert codes"?

3912–3936

I am not sure what you mean here - I would like us to get this lookup stuff fixed for 17, so will be working on it when back in the office (traveling today)

There is a different behaviour between cases where the entry is from an named module (but not the current one) and a different TU of the same named module.

if we do not adjust the typo fixes, we will regress diagnostics.

What the kind of diagnostics will be regressed? I mean, it looks weird to me that we suggest typo fixes from hidden names.

Because when we do typo checking, we relax the visibility conditions so that we can see some decl that might be hidden or misspelled - then we can say

"you need to import module XX before using YY",

or

"did you mean ZZ"

(I would be happy if we did not need to do this in this patch, but not sure how we can work around it).

It is OK to see the misspelled decl. But it looks weird to me to see the hidden decls. I think such regression should be called correction.

clang/include/clang/Sema/Sema.h
2370

Sounds not bad to me.

2372–2380

I prefer moving these to Module.h since its operand is Module instead of Decl.

clang/lib/Sema/SemaLookup.cpp
1784

Oh, this is a little bit historical. I mean it looks better to do the main job of the patch (correcting the visibility of internal linkage entities) if possible.

3912–3936

I mean the we should put the similar things together as much as possible. It looks that the codes are trying to correcting the visibilities returned from isVisible(). But this sounds not good. Since it implies that the result from isVisible() is not valid.

There is a different behaviour between cases where the entry is from an named module (but not the current one) and a different TU of the same named module.

Maybe we can tried to solve this by adding other isVisible interfaces.

I would like us to get this lookup stuff fixed for 17, so will be working on it when back in the office (traveling today)

Yeah, but we still have one month. And even if we didn't get it in time. (Given the size of the patch) It is still OK to back port this before the first week of September. So we can be more relaxed.

iains marked 3 inline comments as done.Jun 19 2023, 1:36 AM

I will look at the rest of the comments once back in the office.

if we do not adjust the typo fixes, we will regress diagnostics.

What the kind of diagnostics will be regressed? I mean, it looks weird to me that we suggest typo fixes from hidden names.

Because when we do typo checking, we relax the visibility conditions so that we can see some decl that might be hidden or misspelled - then we can say

"you need to import module XX before using YY",

or

"did you mean ZZ"

(I would be happy if we did not need to do this in this patch, but not sure how we can work around it).

It is OK to see the misspelled decl. But it looks weird to me to see the hidden decls. I think such regression should be called correction.

It is a case that we have supported; the user puts in a use of a decl but forgets to import the module exporting it (I agree it is not _exactly_ a "typo" in terms of names, but the diagnostics counts it in the same way)

clang/lib/Sema/SemaLookup.cpp
1784

OK. I agree that is ideal; it remains to be seen if it is feasible.

3912–3936

I mean the we should put the similar things together as much as possible. It looks that the codes are trying to correcting the visibilities returned from isVisible(). But this sounds not good. Since it implies that the result from isVisible() is not valid.

Yes, indeed, I agree - if we were starting from scratch, we would do all this in the lower-level 'isVisible' - but since that is used in different ways by the higher levels, it seemed quite tricky to change.

There is a different behaviour between cases where the entry is from an named module (but not the current one) and a different TU of the same named module.

Maybe we can tried to solve this by adding other isVisible interfaces.

maybe that might work, I believe that when I looked before it was going to be quite complex, I will look again.

It is a case that we have supported; the user puts in a use of a decl but forgets to import the module exporting it (I agree it is not _exactly_ a "typo" in terms of names, but the diagnostics counts it in the same way)

I got your point. But I prefer to implement an all-visible mode (not available for users) for such situations. And I still think it is not problem for the special case. Since this is patch is working for internal decls.

iains marked an inline comment as done.Jun 19 2023, 2:14 AM

It is a case that we have supported; the user puts in a use of a decl but forgets to import the module exporting it (I agree it is not _exactly_ a "typo" in terms of names, but the diagnostics counts it in the same way)

I got your point. But I prefer to implement an all-visible mode (not available for users) for such situations. And I still think it is not problem for the special case. Since this is patch is working for internal decls.

Yes, but AFAIR we also support the user naming an internal entity and then say "you must export PP before using it".
(I am sure that these changes were needed to avoid regressing existing diagnostics).

I have some ideas now - will thin about them during travel.

shafik added a subscriber: shafik.Jun 21 2023, 7:55 PM
shafik added inline comments.
clang/include/clang/Basic/DiagnosticSemaKinds.td
11227

I only see declaration and definition cases covered for this diagnostic in the tests.

11231

I only see the declaration case covered for this diagnostic. Please add tests to cover the other cases.

11235

I only see the declaration and definition case covered in the tests we should cover all possible diagnostics. Especially in new code, bugs often come about on code we don't cover.

clang/lib/Sema/SemaLookup.cpp
4522

Fix to conform w/ bugprone-argument-comment

4535
4621
shafik added a reviewer: Restricted Project.Jun 21 2023, 7:56 PM

Adding clang-language-wg for more visibility.

iains updated this revision to Diff 536457.Jun 30 2023, 3:29 PM

just rebased to support p1815 work

iains updated this revision to Diff 536524.Jul 1 2023, 4:14 AM

rebased, fixed some format issues; again to support p1815 work only.

iains planned changes to this revision.Jul 1 2023, 4:16 AM

changes are needed to address review comments - but this revision is needed as a parent to other work (so might need to be rebased from time to time)