This is an archive of the discontinued LLVM Phabricator instance.

[C++20] [Modules] Judge isInCurrentModule currently
ClosedPublic

Authored by ChuanqiXu on Apr 14 2022, 11:39 PM.

Details

Reviewers
iains
rsmith
Group Reviewers
Restricted Project
Commits
rGce2257d69fd0: [C++20] [Modules] Judge current module correctly
Summary

Now the implementation would accept following code:

//--- impl.cppm
module M:impl;
class A {};

//--- M.cppm
export module M;
import :impl;

//--- Use.cpp
import M;
void test() {
    A a; // Expected error. A is not visible here.
}

which is clearly wrong. The root cause is the implementation of isInCurrentModule would return true if the module is a partition! So in the above example, although Use.cpp is not a module unit, isInCurrentModule would still return true when the compiler tries to see if the owning module of A is the current module. I believe this is an oversight. And we could fix it simply by comparing the primary module name.

Diff Detail

Event Timeline

ChuanqiXu created this revision.Apr 14 2022, 11:39 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2022, 11:39 PM
Herald added a subscriber: dexonsmith. · View Herald Transcript
ChuanqiXu requested review of this revision.Apr 14 2022, 11:39 PM
ChuanqiXu planned changes to this revision.Apr 15 2022, 12:15 AM
iains added inline comments.Apr 18 2022, 6:42 AM
clang/include/clang/Basic/Module.h
537–543

I do not understand the purpose of this change, we already iterated over making the function more efficient.

For C++20 modules there cannot be more than one parent - so that the nested call to getTopLevelModule(), via getTopLevelModuleName() is redundant.

Is there some other case that needs special handling?
(If so then I think we should guard this at a higher level)

clang/lib/Sema/SemaLookup.cpp
1572–1573

... of course, "correctness before optimisation", but this test function seems like it would benefit from some refactoring as suggested below.

it seems a bit unfortunate that we are placing a string comparison in the "acceptable decl" lookup path, which might be made many times (the previous use of the string compare was in making the module decl - which only happens once in a TU).

  • Sema has visibility of the import graph of modules (via DirectModuleImports and Exports).
  • We also know what the module parse state is (FirstDecl, GMF etc)
  • If were are not parsing a module (i.e. LangOpts.CurrentModule.empty()) the we could return false early?
  • if ModuleScopes is empty we know we are not (yet) parsing a module
  • Once we are parsing a module then we can test ModuleScopes.back() to find the current module

there are only two users of isInCurrentModule() - so maybe we refactor could make more use of the higher level state information - and / or cache information on the parents of partitions to avoid the complexity of parsing strings each time.

what do you think?

clang/test/Modules/cxx20-10-1-ex2.cpp
62–69

@rsmith

so, I wonder if this is intended, it seems to defeat the mechanisms to avoid circular dependencies.

TU1:
module M : Impl; // this is said not to generate a circular dependency because M : Impl does not implicitly import M

import M; // but now we've done it manually ... and if that module is implicitly exported to importers of the Impl. module interface....
....

TU2:

export module M;
import : Impl;

// ... then boom! we have a circular dep.

Address comments:

  • Add Cache in isInCurrentModule to avoid too many string comparisons.
ChuanqiXu added inline comments.Apr 18 2022, 11:57 PM
clang/include/clang/Basic/Module.h
537–543

The primary purpose of the change is that we need to get the primary module name for getLangOpts().CurrentModule. So we need a static member function which takes a StringRef.

Another point might be the correctness. For example, for the private module fragment, the current implementation would return the right result while the original wouldn't. (Although we didn't use private module fragment much...)

For the most case, I admit the current implementation would call more function a little more. But I think it is negligible.

clang/lib/Sema/SemaLookup.cpp
1572–1573

Yeah, string comparison is relatively expensive and we should avoid that. I added a search cache here and I thought it should handle the most cases.

For the solution you described, I am not sure if I understood correctly. It looks like a quick return path if we found we are not parsing a module? If yes, I think the current check could do similar works.

clang/test/Modules/cxx20-10-1-ex2.cpp
62–69

@rsmith should be the right one to answer this.

Personally, I feel the case you described should be avoided. And I found it is hard to right cyclic module dependent code in current fashion. Since when I compile TU1, the compiler suggests that it failed to found M. And when I compile TU2, the compiler suggests that it failed to found M:impl... So it looks like we could avoid worrying this now. Maybe the build system could give better diagnostic message in the future. I think this might not be a big problem.

iains added inline comments.Apr 19 2022, 2:10 AM
clang/include/clang/Basic/Module.h
537–543

(Maybe I am worrying too much - but it does seem that we have quite some complexity in an operation that we expect to repeat many times per TU).


The primary purpose of the change is that we need to get the primary module name for getLangOpts().CurrentModule. So we need a static member function which takes a StringRef.

That is, of course, fine (and we are stuck in the case that we are building a partition, with the fact that the primary module is not available). It seems that there's not a lot we can do there.

When we have module, we could cache the result of the split, tho - so have a "PrimaryModuleName" entry in each module and populate it on the first use. We are still stuck with the string comparison - but at least we could avoid repeating the split?
As I noted, the reason we did not bother to do this in the first use was that that use was only once per TU).

When we are building a consumer of a module, on the other hand, we *do* have the complete module import graph.

Another point might be the correctness. For example, for the private module fragment, the current implementation would return the right result while the original wouldn't. (Although we didn't use private module fragment much...)

Of course, we must fix correctness issues ..
Since there is only GMF and PMF, perhaps we could reasonably treat those specially if (isGlobalModule()) .. and/or if (isPrivateModule())?

For the most case, I admit the current implementation would call more function a little more. But I think it is negligible.

It is hard to judge at the moment - perhaps we should carry on with what we have and then instrument some reasonable body of code?

clang/lib/Sema/SemaLookup.cpp
1572–1573

That looks good.

On the actual test itself:

1/
we have the check of (M->isGlobalModule() && !M->Parent)

So that answers "this decl is part of the global module", since that fires when we are parsing the GMF. What will we answer,
if M->isGlobalModule() && M->Parent ?
it seems that with the changes above we will now answer that the decl is part of the Parent - not part of the GM?

2/
When we are actually parsing the PMF we have a similar case, no?
(so that there is no Parent pointer set yet, and I think that means that we would need to examine ModuleScopes to find the Parent module?)

clang/test/Modules/cxx20-10-1-ex2.cpp
62–69

Right, one cannot actually build the faulty module (actually, in the same way as one cannot build example 3 from section 10.3 of the std).

So probably my concern is unfounded - it's just something the user will have to figure out - we are not required to diagnose the situation (and that would be hard to do for the compiler anyway).

ChuanqiXu updated this revision to Diff 423572.Apr 19 2022, 3:02 AM

Address comments.

ChuanqiXu added inline comments.Apr 19 2022, 3:15 AM
clang/include/clang/Basic/Module.h
537–543

(Maybe I am worrying too much - but it does seem that we have quite some complexity in an operation that we expect to repeat many times per TU).

Never mind! This is the meaning of reviewing.

Of course, we must fix correctness issues ..

Since there is only GMF and PMF, perhaps we could reasonably treat those specially if (isGlobalModule()) .. and/or if (isPrivateModule())?

Done.

After consideration, I think what you here makes sense. Especially now Module::getPrimaryModuleInterfaceName is only used for Sema.getLangOpts().CurrentModule. So it might be better to handle this in Sema or LangOptions.

549–550

I thought to add an assertion here. But I feel like it is not so necessary and friendly.

clang/lib/Sema/SemaLookup.cpp
1572–1573

1/
we have the check of (M->isGlobalModule() && !M->Parent)

So that answers "this decl is part of the global module", since that fires when we are parsing the GMF. What will we answer,
if M->isGlobalModule() && M->Parent ?
it seems that with the changes above we will now answer that the decl is part of the Parent - not part of the GM?

Good Catcha! Now in this revision, the function would return false for the case of M->isGlobalModule() && M->Parent. Another point I found is that the name of the function is not quite right. Since global module fragment belongs to global module, it wouldn't be within the current module. So I chose a new name isUsableModule for it from the wording of [module.global.frag]p1. I tried to call it isVisibleModule but I found there is already one... I feel isUsableModule is not perfectly good... Any suggestion?

2/
When we are actually parsing the PMF we have a similar case, no?

There might not be a problem. Since PMF belongs to a named module technically. So the current implementation would handle it correctly.

clang/test/Modules/cxx20-10-1-ex2.cpp
62–69

Agreed. This should be the work for build systems.

ChuanqiXu updated this revision to Diff 423579.Apr 19 2022, 3:28 AM

Rename Sema::CurrentModuleUnitsCache to Sema::UsableModuleUnitsCache

clang/lib/Sema/SemaLookup.cpp
1581

I thought to add a comment for this. But I feel like it might be too wordy...

iains added a comment.Apr 19 2022, 3:40 AM

it would be good to add tests for GMF and PMF cases?

clang/include/clang/Basic/Module.h
537–543

Especially now Module::getPrimaryModuleInterfaceName is only used for Sema.getLangOpts().CurrentModule. So it might be better to handle this in Sema or LangOptions.

Yeah, perhaps cache that split in Sema - adding another LangOption does not feel quite right (unless we find we need this information more widely - but I didn't see that yet).

542–543

if we find this is repeated too many times, we could cache it in the module as PrimaryModuleName.

clang/lib/Sema/SemaLookup.cpp
1572–1573

1/
we have the check of (M->isGlobalModule() && !M->Parent)

So that answers "this decl is part of the global module", since that fires when we are parsing the GMF. What will we answer,
if M->isGlobalModule() && M->Parent ?
it seems that with the changes above we will now answer that the decl is part of the Parent - not part of the GM?

Good Catcha! Now in this revision, the function would return false for the case of M->isGlobalModule() && M->Parent. Another point I found is that the name of the function is not quite right. Since global module fragment belongs to global module, it wouldn't be within the current module. So I chose a new name isUsableModule for it from the wording of [module.global.frag]p1. I tried to call it isVisibleModule but I found there is already one... I feel isUsableModule is not perfectly good... Any suggestion?

actually that seems a reasonable name for now - I am beginning to feel that once we have things basically working - the whole module visibility stack could be refactored ...

2/
When we are actually parsing the PMF we have a similar case, no?

There might not be a problem. Since PMF belongs to a named module technically. So the current implementation would handle it correctly.

Yes, but....

a) When we are actually parsing the PMF, there will be no 'Parent' pointer set, so that we would have to find the parent some other way, right?
(perhaps it is not too complex - since we should already have diagnosed a bad parse situation if we see a :private module line in the wrong place - so that all we need to be able to do is to locate the current module - that has to be the top level one by the rules of PMF)

https://eel.is/c++draft/module#private.frag-1

b) The contents of the PMF are not 'part of the module' reachable to importers of the parent module - so it seems that we only have to consider then when the TU is the current top level module interface.

https://eel.is/c++draft/module#reach-3

1572–1573

1/
we have the check of (M->isGlobalModule() && !M->Parent)

So that answers "this decl is part of the global module", since that fires when we are parsing the GMF. What will we answer,
if M->isGlobalModule() && M->Parent ?
it seems that with the changes above we will now answer that the decl is part of the Parent - not part of the GM?

Good Catcha! Now in this revision, the function would return false for the case of M->isGlobalModule() && M->Parent. Another point I found is that the name of the function is not quite right. Since global module fragment belongs to global module, it wouldn't be within the current module. So I chose a new name isUsableModule for it from the wording of [module.global.frag]p1. I tried to call it isVisibleModule but I found there is already one... I feel isUsableModule is not perfectly good... Any suggestion?

2/
When we are actually parsing the PMF we have a similar case, no?

There might not be a problem. Since PMF belongs to a named module technically. So the current implementation would handle it correctly.

ChuanqiXu updated this revision to Diff 423797.Apr 19 2022, 7:54 PM

Address comments:

  • Add tests for GMF
  • Avoid to compare string if M is private module fragment.
ChuanqiXu added inline comments.Apr 19 2022, 8:03 PM
clang/include/clang/Basic/Module.h
537–543

Yeah, it makes sense to add a new method in Sema once it is used more widely.

542–543

I am worrying the benefit is not worth for the cost. It requires to change both Serialization and Deserialization to add a new member in Module. The scale would be larger and the benefit wouldn't be significant.

clang/lib/Sema/SemaLookup.cpp
1572–1573

I am beginning to feel that once we have things basically working - the whole module visibility stack could be refactored ...

Completely agreed.

When we are actually parsing the PMF, there will be no 'Parent' pointer set, so that we would have to find the parent some other way, right?

This is not right. When we are parsing the PMF, we already created its parent. So we could assign the Parent field. This is different from GMF.

The contents of the PMF are not 'part of the module' reachable to importers of the parent module - so it seems that we only have to consider then when the TU is the current top level module interface.

Yeah, I think the newest revision addresses the comment.

1572–1573

It looks like there is anything missing?

iains accepted this revision.Apr 20 2022, 3:41 AM

thanks for multiple iterations!

I think maybe you are using a too old clang-format?
it seems that clang-format >= llvm-14 removes spaces around module partition colons ... so A : Part==>A:Part and : Impl => :Impl.
IMO this is the correct formatting (since it follows the examples in the standard).

Anyway, this is most likely the reason for the clang format fails on the CI.

This now LGTM - but please fix the formatting in the test cases.

clang/lib/Sema/SemaLookup.cpp
1572–1573

When we are actually parsing the PMF, there will be no 'Parent' pointer set, so that we would have to find the parent some other way, right?

This is not right. When we are parsing the PMF, we already created its parent. So we could assign the Parent field. This is different from GMF.

Indeed; I missed that the Parent is set on the creation for the PMF.

This revision is now accepted and ready to land.Apr 20 2022, 3:41 AM

thanks for multiple iterations!

Thanks for the reviewing!

I think maybe you are using a too old clang-format?
it seems that clang-format >= llvm-14 removes spaces around module partition colons ... so A : Part==>A:Part and : Impl => :Impl.
IMO this is the correct formatting (since it follows the examples in the standard).

Anyway, this is most likely the reason for the clang format fails on the CI.

This now LGTM - but please fix the formatting in the test cases.

Yeah... It looks like I did't set clang-format to the newest version. Thanks for reminding. I would do it when committing the patch.

This revision was landed with ongoing or failed builds.Apr 20 2022, 8:18 PM
This revision was automatically updated to reflect the committed changes.