This is an archive of the discontinued LLVM Phabricator instance.

[C++20] [Modules] Don't make internal entities visible to lookups from other units
Changes PlannedPublic

Authored by ChuanqiXu on Mar 22 2023, 2:44 AM.

Details

Reviewers
iains
Summary

Close https://github.com/llvm/llvm-project/issues/61601

I tried to insert checks in the front of LookupResult::isAvailableForLookup() and other similar places in SemaLookup. But it can't address the above issue correctly.
So I think it may be best to make the entities to be hidden from the lookup from the beginning.

@iains I know this may interfere your current work. So I ping you to wish this one won't bring troubles to you. And I'm wondering if we can implement P1815 simpler by similar methods. (I tried this can't implement P1815 precisely)

Diff Detail

Event Timeline

ChuanqiXu created this revision.Mar 22 2023, 2:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2023, 2:44 AM
ChuanqiXu requested review of this revision.Mar 22 2023, 2:44 AM
ChuanqiXu added a reviewer: iains.
iains added a comment.Mar 22 2023, 3:49 AM

I agree it is complex

  • I think it needs more than is in this patch
  • and there are different rules applied during lookup for typos (which I have had to handle specially, otherwise we get useless suggestions)

Actually, I have patches in my local tree for handling P1815 that do get the behaviour to be correct - but those patches need to be refactored to make the tests into utility functions.

We need to work together to solve this problem - or we will both be trying to do the same things :)

  • I will repost the implementation module patch later today (UK time) which I think is the first part.
  • Then we need to rework the lookup to handle the rules - perhaps that will be a combination of these patches (I do not know yet).

Trying to avoid duplicate/wasted work

Yeah, I feel the current patch is good/innocent personally. But I agree it is pretty frustrating to do duplicated work. So I would love to wait for your P1815 patch to make the decision later.

iains added inline comments.Mar 29 2023, 10:05 AM
clang/lib/Serialization/ASTReader.cpp
7789–7810

It would be good to deal with this during the deserialisation, however I am not sure that we have enough information to know what rules to apply at this point:

  • If we are not in C++20 modules we want to do nothing
  • If we are in a named module then entities with module-linkage that are in the same named module are visible (but not ones from a different named module)
  • In most cases, entities with internal linkage are not 'visible' outside the TU in which they are declared, however when we are instantiating a template, entities that are visible in some TU on the instantiation path become visible (they will probably be 'exposures' per P1815, but that does not prevent them from being found).
  • In the case of typo fixes .. I think we actually need to apply the same rules to found names (since there is no point in offering a typo fix that is actually unusable).

If we can implement those rules here, then that would be most efficient.

  • OTOH, I wonder if eliminating the decls this early will degrade diagnostics e.g. "we found X but it is private to module Y"?
ChuanqiXu added inline comments.Mar 31 2023, 1:25 AM
clang/lib/Serialization/ASTReader.cpp
7789–7810

It would be good to deal with this during the deserialisation

Yeah, I think so too! But the current patch is not good to me now. Since I feel if we don't load decls within internal linkage in deserializer, we may implement P1815 automatically in some level (If I understand things right). So the current patch is not so good. I have a rough idea and I want to find some time to implement it in April.

If we are in a named module then entities with module-linkage that are in the same named module are visible (but not ones from a different named module)

I think we don't need to handle this specially now. Since I feel the current reachability mechanism works good. So we can only care about the entities within internal linkage (and entities in private module fragment).

In most cases, entities with internal linkage are not 'visible' outside the TU in which they are declared, however when we are instantiating a template, entities that are visible in some TU on the instantiation path become visible (they will probably be 'exposures' per P1815, but that does not prevent them from being found).

In my mind, if we can avoid to load entities with internal linkage for module users, then the template instantiation contains entities with internal linkage in other TUs will fail naturally. The only downside is that the error message may be a little bit more confusing. But I think it is acceptable.

iains added inline comments.Mar 31 2023, 1:35 AM
clang/lib/Serialization/ASTReader.cpp
7789–7810

It would be good to deal with this during the deserialisation

Yeah, I think so too! But the current patch is not good to me now. Since I feel if we don't load decls within internal linkage in deserializer, we may implement P1815 automatically in some level (If I understand things right). So the current patch is not so good. I have a rough idea and I want to find some time to implement it in April.

Please also take a look at the WIP in D145965 because (although it needs refactoring) it does actually appear to implement the correct semantics.
(I do not know if I will have time in the next week or two to refactor it)

If we are in a named module then entities with module-linkage that are in the same named module are visible (but not ones from a different named module)

I think we don't need to handle this specially now. Since I feel the current reachability mechanism works good. So we can only care about the entities within internal linkage (and entities in private module fragment).

In most cases, entities with internal linkage are not 'visible' outside the TU in which they are declared, however when we are instantiating a template, entities that are visible in some TU on the instantiation path become visible (they will probably be 'exposures' per P1815, but that does not prevent them from being found).

In my mind, if we can avoid to load entities with internal linkage for module users, then the template instantiation contains entities with internal linkage in other TUs will fail naturally. The only downside is that the error message may be a little bit more confusing. But I think it is acceptable.

I think that if we eliminate the local entities found on the instantiation path we will not be able to pass https://eel.is/c++draft/basic.link#19 TU2
"h(N::A{}); // error: overload set contains TU-local N​::​adl(int)"

Please do not spend too much effort on p1815 since I have a great deal of WIP on this and we would definitely be duplicating effort - the WIP is under D140881 (which I will update in the next few days to include the coverage based on the fixes to visibility in D145965 [of course, it does not matter how we finally fix the visibility issues - but I need something to base the p1815 testing on].

ChuanqiXu planned changes to this revision.Mar 31 2023, 1:48 AM
ChuanqiXu added inline comments.
clang/lib/Serialization/ASTReader.cpp
7789–7810

I think that if we eliminate the local entities found on the instantiation path we will not be able to pass https://eel.is/c++draft/basic.link#19 TU2
"h(N::A{}); // error: overload set contains TU-local N​::​adl(int)"

Oh, great insight!

Please do not spend too much effort on p1815 since I have a great deal of WIP on this and we would definitely be duplicating effort - the WIP is under D140881 (which I will update in the next few days to include the coverage based on the fixes to visibility in D145965 [of course, it does not matter how we finally fix the visibility issues - but I need something to base the p1815 testing on].

Yeah, this was and is what I think. I misunderstood your comment at the first : ) I meant to continue this after your P1815 work gets done. Let me mark this as 'plan changed' to avoid further confusion.

@iains The clang17 is going to be branched in the end of July. And I feel we can't get P1815 in time. How do you feel the original ideas to not load decls with internal linkage in other module units? It is indeed not good since that would make us not be able to pass https://eel.is/c++draft/basic.link#19 TU2 as you mentioned. But I feel it would be much better than now. At least currently https://eel.is/c++draft/basic.link#19 TU2 can't pass too. And we can revert this functionality after we get mature solution.
It is relatively annoying since I've seen multiple issue report about this. And this can be solved by the simple solution to not load internal decls from other module units simply. So I feel it is a user-friendly solution.

iains added a comment.Jun 9 2023, 12:57 AM

Actually, P1815 is almost complete (95%+ in the current patch and i am currently looking at the lambda) however P1815 is ultimately blocked by this (lookup) issue; That is, P1815 is not a solution to the lookup problem - for it to work lookup has to be fixed.

I also made a WIP patch to deal with fixing the lookup issue (so that I could make progress on p1815); https://reviews.llvm.org/D145965

Somehow, we need to combine these two patches into a solution - I believe that the logic in the patch I produced does the correct thing - but we might want to move the checks from Sema => Decl.

Let me see if I can summarise the requirement .. I believe we need to determin the following cases:

  • two decls are in the same TU
  • two decls are in the same named module, but it is not the current one
  • two decls are in the same named module and it is the current one

    during instantiation
  • a decl is found in the instantiation path (which means that, in this case, an internal linkage decl can become visible to lookup).

So .. IMO we should (as I think you want to as well) see if we can get the lookup issues solved first.

[I will refresh my lookup and P1815 patches in a few minutes]

iains added a comment.Jun 9 2023, 1:00 AM

BTW, I am generally very uncomfortable with making the deserialisation process do more of the Sema work.

I think we should treat serialisation / de-serialisation as a fixed task which is only concerned with those two functions, any combining or elision etc. should be done by the consumer (but I know there is already work being done in the de-serialisation, which is unfortunate, it seems a layering issue to me).

We are talking highly related but different things. My first point is on the product/user level.

I think the current status is:

  • Clang17 is going to be branched and there is a significant issue we haven't solved yet.
  • We have a potential workaround for the issue. But the workaround is incorrect for some not-working-yet cases.
  • Also the workaround would affect some other important and big feature implementations (your works). And if we decide to wait for such implementations, we're highly possible not able to get up for clang17 to fix the issues. One the one hand, we need spend a lot of time to test and review it. On the other hand, I am not sure if I would have enough time since I plan to work on clangd for module recently (issue link here if you're interested: https://github.com/clangd/clangd/issues/1293). Also it is not wise to land a big patch before the release.

So here the first decision we need to decide is: should we land the workaround before the branch of clang17?


BTW, I am generally very uncomfortable with making the deserialisation process do more of the Sema work.

I think we should treat serialisation / de-serialisation as a fixed task which is only concerned with those two functions, any combining or elision etc. should be done by the consumer (but I know there is already work being done in the de-serialisation, which is unfortunate, it seems a layering issue to me).

Agreed. I won't do the same decision if we had the chance to implement a new compiler. But for current clang, I don't feel it is a practical decision to refactor it by the complexities and the resources we have.

iains added a comment.Jun 9 2023, 1:27 AM

We are talking highly related but different things. My first point is on the product/user level.

I think the current status is:

  • Clang17 is going to be branched and there is a significant issue we haven't solved yet.
  • We have a potential workaround for the issue. But the workaround is incorrect for some not-working-yet cases.
  • Also the workaround would affect some other important and big feature implementations (your works). And if we decide to wait for such implementations, we're highly possible not able to get up for clang17 to fix the issues. One the one hand, we need spend a lot of time to test and review it.

We have two draft patches for this issue: (this and https://reviews.llvm.org/D145965) - the question is can we combine them to make a complete solution (instead of a workaround that has to be re-visited).

Also it is not wise to land a big patch before the release.

indeed; do you think tha the lookup patch will be too big (or were you referring to P1815 - which I agree is going to be too big).

So here the first decision we need to decide is: should we land the workaround before the branch of clang17?

It would be good to do so - but we need to make sure that the workaround does not break any of the cases (esp. the ADL in instantiation).

It would be good to do so - but we need to make sure that the workaround does not break any of the cases (esp. the ADL in instantiation).

Yeah, this is the problem. It looks like my solution in brain would break the ADL example you gave. But I am not sure if your lookup patch could solve https://github.com/llvm/llvm-project/issues/61807.

indeed; do you think that the lookup patch will be too big (or were you referring to P1815 - which I agree is going to be too big).

I mean P1815. The size of your lookup patch looks acceptable.

iains added a comment.Jun 9 2023, 2:07 AM

It would be good to do so - but we need to make sure that the workaround does not break any of the cases (esp. the ADL in instantiation).

Yeah, this is the problem. It looks like my solution in brain would break the ADL example you gave. But I am not sure if your lookup patch could solve https://github.com/llvm/llvm-project/issues/61807.

Hopefully, we can combine them to resolve both the issue and the lookup blocker.

Hopefully, we can combine them to resolve both the issue and the lookup blocker.

Yes. Then it is OK for me to delay the workaround.

Hopefully, we can combine them to resolve both the issue and the lookup blocker.

(didn't start to look at the code yet)

I feel like we can't solve https://github.com/llvm/llvm-project/issues/61601 without touching the deserializer.

// Part1.cppm
export module TheMod:Part1;
static int loog = 1;

// Part2.cppm
export module TheMod:Part2;
static int loog = 2;

// TheMod.cppm
export module TheMod;
export import :Part1;
export import :Part2;

static int loog = 3;
export int V = loog;

I feel the compiler will still emit the ODR violation issue.

BTW, there is a case that I am not sure if it is valid about module linkage.

// Part1.cppm
export module TheMod:Part1;
int loog = 1;

// Part2.cppm
export module TheMod:Part2;
int loog = 2;

// TheMod.cppm
export module TheMod;
export import :Part1;
export import :Part2;

int loog = 3;
export int V = loog;

Here all the loog have modules linkage. And is this case an ODR violation or not?

Hopefully, we can combine them to resolve both the issue and the lookup blocker.

(didn't start to look at the code yet)

I feel like we can't solve https://github.com/llvm/llvm-project/issues/61601 without touching the deserializer.

// Part1.cppm
export module TheMod:Part1;
static int loog = 1;

// Part2.cppm
export module TheMod:Part2;
static int loog = 2;

// TheMod.cppm
export module TheMod;
export import :Part1;
export import :Part2;

static int loog = 3;
export int V = loog;

I feel the compiler will still emit the ODR violation issue.

I think that D145965 does fix this?

Hopefully, we can combine them to resolve both the issue and the lookup blocker.

(didn't start to look at the code yet)

I feel like we can't solve https://github.com/llvm/llvm-project/issues/61601 without touching the deserializer.

// Part1.cppm
export module TheMod:Part1;
static int loog = 1;

// Part2.cppm
export module TheMod:Part2;
static int loog = 2;

// TheMod.cppm
export module TheMod;
export import :Part1;
export import :Part2;

static int loog = 3;
export int V = loog;

I feel the compiler will still emit the ODR violation issue.

I think that D145965 does fix this?

I haven't tested it yet. But if it did, it is better to add it as a test case.

BTW, there is a case that I am not sure if it is valid about module linkage.

// Part1.cppm
export module TheMod:Part1;
int loog = 1;

// Part2.cppm
export module TheMod:Part2;
int loog = 2;

// TheMod.cppm
export module TheMod;
export import :Part1;
export import :Part2;

int loog = 3;
export int V = loog;

Here all the loog have modules linkage. And is this case an ODR violation or not?

Hopefully, we can combine them to resolve both the issue and the lookup blocker.

(didn't start to look at the code yet)

I feel like we can't solve https://github.com/llvm/llvm-project/issues/61601 without touching the deserializer.

// Part1.cppm
export module TheMod:Part1;
static int loog = 1;

// Part2.cppm
export module TheMod:Part2;
static int loog = 2;

// TheMod.cppm
export module TheMod;
export import :Part1;
export import :Part2;

static int loog = 3;
export int V = loog;

I feel the compiler will still emit the ODR violation issue.

I think that D145965 does fix this?

I haven't tested it yet. But if it did, it is better to add it as a test case.

Done (please note that the position on D145965 is that it is a proof of principle; it might well benefit from moving some of the code to module.{h, cpp})