Page MenuHomePhabricator

[C++20][Modules] Fix incorrect visibilities in implementation units.
Needs ReviewPublic

Authored by iains on Mar 13 2023, 10:43 AM.

Details

Reviewers
ChuanqiXu
Summary

Some of the testcases committed during the merging of modules contain
FIXMEs concerning the visibility of declarations with internal-linkage in
module interface units as seen from module implementation units.

This is, in part, caused by an oversight in the utility routine that determines
whether a module is part of the current TU or not.

The lookup definition for visible was also too wide, failing to account
cases where a module was visible, but some declarations from it were not.

This patch has two components:
1/ Checking for visibility of imported decls and excluding thsose with
internal linkage (when from a different TU) or from a private module
fragment. The logic for this has to accommodate differences for ADL when
the lookup is for a templated entity that is defined in a different TU
from the importer.

2/ Updating the diagnostics in reponse to this, in particular dealing
better with the case that the user attempts to use a module-local linkage
item in an importing TU (we now provide a fixit to indicate that the item
needs to be exported to be used). When we search for possible "typos",
which we do in response to failing to find an entity, we do so allowing
finds of 'hidden' names. Before we offer names found this as possible
typo fixes, we need to check that they would be viable if not hidden.

There are a number of basically mechanical changes to tests to accommodate
the fixed rejection of relevant declarations and the improved diagnostic
messages.

Diff Detail

Event Timeline

iains created this revision.Mar 13 2023, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 13 2023, 10:43 AM
Herald added a subscriber: ChuanqiXu. · View Herald Transcript

notes:
a) I need to update the commit log message so that it formats properly in the browser
b) since we are now seeing performance complaints against modules, and some of these routines are used many times (sometimes several time per name), I have tried to structure tests so that the cheapest ones execute first - and likewise the utility routines.

iains published this revision for review.Mar 14 2023, 12:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 14 2023, 12:32 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Yeah, it is indeed a problem that Sema::isModuleUnitOfCurrentTU doesn't work for implementation units. And I have been thinking about this for a while. My thought is that the root cause may be that we shouldn't push the module interface to Sema::ModuleScopes() for the implementation unit. (We need some other small refactoring). I feel this is more natural and consistent. I thought to take this one but we didn't use implementation units in the downstream and there is no related issue reports. So I didn't start to work on it... If you are not hurry, I'd like to take this. Otherwise I'd suggest you to try the above method I mentioned.

BTW, I suggest we file an issue first if we want to do something. So that we can avoid solving the same problem.

clang/lib/Sema/SemaModule.cpp
304–305 ↗(On Diff #504748)

This can be a separate patch.

iains added a comment.Mar 14 2023, 1:59 AM

Yeah, it is indeed a problem that Sema::isModuleUnitOfCurrentTU doesn't work for implementation units. And I have been thinking about this for a while. My thought is that the root cause may be that we shouldn't push the module interface to Sema::ModuleScopes() for the implementation unit. (We need some other small refactoring).

I think it is not small refactoring - we make extensive use of "getCurrentModule()" to determine if we are processing a module and the interface to know which module the implementation belongs to.

If we add the implicitly imported interface to "imports" instead of the module scope we will need to make sure that we deal with all these cases where we rely on the interface module for this information.

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

I feel this is more natural and consistent.

Yes, if we were starting from 0, that is so - but we have an implementation with long history - we would need to see the impact. [I am not against making this cleaner, but concerned about the amount of change].

I thought to take this one but we didn't use implementation units in the downstream and there is no related issue reports.

I was wondering whether some of the reports filed naming duplicate and clashing symbols could be related, but I did not analyse.

So I didn't start to work on it... If you are not hurry, I'd like to take this. Otherwise I'd suggest you to try the above method I mentioned.

I was working on this because it was affecting my P1815 stuff - but I can use this patch as a temporary solution.

BTW, I suggest we file an issue first if we want to do something. So that we can avoid solving the same problem.

There are (I think your) FIXMEs in the testcases (but I do not know if there is a corresponding PR).

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

Yeah, it is indeed a problem that Sema::isModuleUnitOfCurrentTU doesn't work for implementation units. And I have been thinking about this for a while. My thought is that the root cause may be that we shouldn't push the module interface to Sema::ModuleScopes() for the implementation unit. (We need some other small refactoring).

I think it is not small refactoring - we make extensive use of "getCurrentModule()" to determine if we are processing a module and the interface to know which module the implementation belongs to.

If we add the implicitly imported interface to "imports" instead of the module scope we will need to make sure that we deal with all these cases where we rely on the interface module for this information.

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

Yeah, understood. I had a mental model but I am not sure if it can work very well as I expected. I'll try to give it a try recently.

I feel this is more natural and consistent.

Yes, if we were starting from 0, that is so - but we have an implementation with long history - we would need to see the impact. [I am not against making this cleaner, but concerned about the amount of change].

I thought to take this one but we didn't use implementation units in the downstream and there is no related issue reports.

I was wondering whether some of the reports filed naming duplicate and clashing symbols could be related, but I did not analyse.

So I didn't start to work on it... If you are not hurry, I'd like to take this. Otherwise I'd suggest you to try the above method I mentioned.

I was working on this because it was affecting my P1815 stuff - but I can use this patch as a temporary solution.

Thanks!

BTW, I suggest we file an issue first if we want to do something. So that we can avoid solving the same problem.

There are (I think your) FIXMEs in the testcases (but I do not know if there is a corresponding PR).

Oh sorry. My bad clearly. Once upon time, someone told me it is a bad idea to add FIXME in the test case... Now I understand.

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

I'll try to make it.

iains added a comment.Mar 14 2023, 2:33 AM

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

I'll try to make it.

Perhaps then, we can split out the changes to Sema::isModuleUnitOfCurrentTU and see how much of the functionality is still working (with the intent that the full fix will come along later).

So if this is split into two

  1. the fix to Sema::isModuleUnitOfCurrentTU (which you will be updating) and
  2. the uses of that to fix the lookups and diagnostics?

(I can use both as a temporary fix .. but 2 could be applied with the existing isModuleUnitOfCurrentTU implementation).

The checks for internal-linkage symbols and the improvements to diagnostics do not depend on how Sema::isModuleUnitOfCurrentTU is implemented, so that I think we could fix these problems and then deal with the refactoring later.

In either case, I do not have much time to work more on this right now.

I'll try to make it.

Perhaps then, we can split out the changes to Sema::isModuleUnitOfCurrentTU and see how much of the functionality is still working (with the intent that the full fix will come along later).

So if this is split into two

  1. the fix to Sema::isModuleUnitOfCurrentTU (which you will be updating) and
  2. the uses of that to fix the lookups and diagnostics?

(I can use both as a temporary fix .. but 2 could be applied with the existing isModuleUnitOfCurrentTU implementation).

I feel it may be better to wait for me to have an initial experiment. On the one hand, if we can make it relatively quickly, we can save some time. On the other hand, now I understand that removing codes is much harder than adding codes... So let's wait for some time. The next release is in the end of July, so we have plenty time.

iains added a comment.Mar 14 2023, 2:01 PM

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

I have changed my mind here.
It seems that we are getting confusing decl context entries without having a proper module for the implementation (even though we will never write such a module to a PCM, since it is not importable).

I was wondering if it would be possible to avoid having separate values for PartitionInterface/Implementation (since they both have a BMI - it is only a question of whether that can be re-exported). We would have to add one bit to the PCM to deal with that, so that we might as well extend the ModuleKind enum to 4 bits.

So .. please wait for one or two days, I will try to refresh the implementation patch (D126959) and see if that resolves some of the issues I am seeing with P1815

(this does not alter the situation that you are going to revisit the isModuleUnitOfCurrentTU if that is needed - but it might not be; since having a separate module for the impl. will avoid some of the ambiguities anyway).

Additional Note: we also have a situation in the lookups for dependent names during template instantiation that the visibility of decls can need to be assessed in a different module context from the current module. So that we might also need to have a areTheseModulesFromTheSameTU(Module *A. Module *B)

I was thinking at one stage to add an 'Implementation' module Kind, but at the moment I do not think it is worth extending the size of the ModuleKind enum bit field for this (since we now have used up all the entries with the new 'ExplicitGlobalModuleFragment' case). That also has a large impact also for relatively few uses.

I have changed my mind here.
It seems that we are getting confusing decl context entries without having a proper module for the implementation (even though we will never write such a module to a PCM, since it is not importable).

I was wondering if it would be possible to avoid having separate values for PartitionInterface/Implementation (since they both have a BMI - it is only a question of whether that can be re-exported). We would have to add one bit to the PCM to deal with that, so that we might as well extend the ModuleKind enum to 4 bits.

So .. please wait for one or two days, I will try to refresh the implementation patch (D126959) and see if that resolves some of the issues I am seeing with P1815

(this does not alter the situation that you are going to revisit the isModuleUnitOfCurrentTU if that is needed - but it might not be; since having a separate module for the impl. will avoid some of the ambiguities anyway).

Additional Note: we also have a situation in the lookups for dependent names during template instantiation that the visibility of decls can need to be assessed in a different module context from the current module. So that we might also need to have a areTheseModulesFromTheSameTU(Module *A. Module *B)

Yeah. What I had in mind is to remove ModulePartitionInterface and ModulePartitionImplementation in ModuleKinds and rename ModuleInterfaceUnit to NamedModules. And we can introduce 2 bits for IsPartition and IsImplementation. Then we can introduce ImplementationUnit without any loss.

I filed another issue (https://github.com/llvm/llvm-project/issues/61427) to reflect [basic.lookup.general]p2.3. So there are multiple issues. Let's handle them in multiple patches too.

iains updated this revision to Diff 507977.Mar 24 2023, 12:19 AM
iains edited the summary of this revision. (Show Details)

rebased, split the changes to module and private linkage out,

this needs to be refactored to collect the test functions into either
module.h and / or sema.h

iains planned changes to this revision.Mar 24 2023, 12:21 AM

although this patch does handle the cases needed, it really needs refactoring.
posting here since we are both working in this area and this might be useful input.

Just took a quick look.

clang/lib/Sema/SemaLookup.cpp
2082

We should check header units here.

2101

Let's not use isFromASTFile(). It is a low level API without higher level semantics. I think it is good enough to check the module of ND.

iains updated this revision to Diff 510388.Apr 2 2023, 8:44 PM

rebased

iains marked 2 inline comments as done.Apr 2 2023, 8:48 PM

I updated this because I am going to push the latest version of the P1815 patch and that depends on the lookup changes.

clang/lib/Sema/SemaLookup.cpp
2082

The point of checking module map modules was to avoid affecting clang modules with the change in semantics.

At the moment, I am not sure why we could exclude lookup from finding decls in an imported header unit?

2101

lookup is very heavily used; the purpose of the isFromAST() check is to short-circuit the more expensive checks when we know that a decl must be in the same TU (i.e. it is not from an AST file).

If we can find a suitable inexpensive check that has better semantics, I am fine to change this,

ChuanqiXu added inline comments.Apr 2 2023, 9:47 PM
clang/lib/Sema/SemaLookup.cpp
2082

At the moment, I am not sure why we could exclude lookup from finding decls in an imported header unit?

We're not excluding decls from an imported header units. We're trying to include the decls from the headers units. Since we've allowed decls with internal linkage to appear to header units. We must need to allow decls with internal linkage in header units to be found here. Otherwise, the following example may fail:

// foo.h
static int a = 43;

// m.cppm
export module m;
import "foo.h";
use of a...

BTW, given the semantics of header units and clang modules are pretty similar, we should use isHeaderModule in most cases.

2101

It looks good enough to me to check that ND lives in a module. And from what I profiled, the lookup process is not hot really.

iains marked 2 inline comments as done.Apr 2 2023, 9:56 PM
iains added inline comments.
clang/lib/Sema/SemaLookup.cpp
2082

ah yes, that special case, I should have remembered.
OK I will adjust this at the next iteration (I think you mean isHeaderLikeModule())

Did another quick look. And I feel the title and the summary of the page is not consistent with the patch itself. I think it is better to split this to make the change to focus on the entities with internal linkage. I don't know what's wrong with the module linkage things. Maybe we can file an issue ahead in next time.

clang/include/clang/Basic/DiagnosticSemaKinds.td
11181

The 'private' here makes me to think about module private fragment while it is not true. I prefer to refactor it to something like "it is not exported".

clang/include/clang/Sema/Sema.h
2359

While I am not a native speaker, I feel isSameModuleWithCurrentTU may be a better name.

clang/lib/Sema/SemaLookup.cpp
1782

It looks better to me if we can insert codes here

3912–3936

What's the intention for the change? And why is the current behavior bad without this?

4518

What if we don't touch the typo part? I am still confusing.

5822–5833

I feel like this can be another change. I'm a little bit confused since I feel the patch did multiple things at the same time again..

ChuanqiXu added inline comments.Apr 4 2023, 1:47 AM
clang/lib/Sema/SemaLookup.cpp
3912–3936

What's the intention for the change? And why is the current behavior bad without this?

3912–3936

Oh, I understand why I feel the code is not good since the decl with internal linkage or module linkage shouldn't be visible. So even if there are problems, we should handle them elsewhere.

iains updated this revision to Diff 511327.Apr 6 2023, 2:08 AM

rebased, addressed review comments (patch still needs refactoring)

iains added a comment.Apr 6 2023, 2:14 AM

in the end, we have to deal with the following cases:

  1. decls in the same TU (including GMF and PMF)
  2. decls in the current named module (excluding GMF)
  3. decls on an instantiation path (which can include internal-linkage entities)
  4. decls that are needed by an exported entity from a *differennt" named module [e.g. a module-linkage entity that is used by some exported content].
iains marked 2 inline comments as done.Apr 6 2023, 2:34 AM
iains added inline comments.
clang/include/clang/Sema/Sema.h
2359

actually, as noted, both cases are needed. So I have made that clear by splitting the function into two.

clang/lib/Sema/SemaLookup.cpp
2101

OK, we can always revisit performance later,

5822–5833

sure, we can refactor; the current patch is a placeholder to allow work to continue on p1815.