This is an archive of the discontinued LLVM Phabricator instance.

[C++20] Diagnose invalid and reserved module names
ClosedPublic

Authored by aaron.ballman on Oct 28 2022, 7:52 AM.

Details

Summary

[module.unit]p1 specifies that module and import are invalid components of a module name, that module names cannot contain reserved identifiers, and that std followed by zero or more digits is reserved.

The first issue (module and import pseudo-keywords) requires a diagnostic, the second issue (use of reserved identifiers) does not require a diagnostic. We diagnose both the same -- the code is ill-formed unless the module declaration is in a system header. This allows STL implementations to use the reserved module names while preventing users from stealing them out from under us.

Diff Detail

Event Timeline

aaron.ballman created this revision.Oct 28 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 7:52 AM
aaron.ballman requested review of this revision.Oct 28 2022, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2022, 7:52 AM
aaron.ballman retitled this revision from [C++20] Diagnosed invalid and reserved module names to [C++20] Diagnose invalid and reserved module names.Oct 28 2022, 7:53 AM
cor3ntin added inline comments.
clang/lib/Sema/SemaModule.cpp
288
293

I think module in a standard header is still invalid. We could not do the check in a system header at all

cor3ntin added inline comments.Oct 28 2022, 8:02 AM
clang/test/Modules/reserved-names.cpp
37 ↗(On Diff #471550)

you should add a test for __test

I'll leave it to the modules experts to decide whether they're happy with this, but I had a drive-by.

Also, I see none of the tests validate 'import', just 'export'. Based on the standard, BOTH are illegal, right :D

clang/lib/Sema/SemaModule.cpp
279

Urgh, could you make this an enum?

I'll leave it to the modules experts to decide whether they're happy with this, but I had a drive-by.

Also, I see none of the tests validate 'import', just 'export'. Based on the standard, BOTH are illegal, right :D

export module export is perfectly fine conforming

aaron.ballman marked 2 inline comments as done.Oct 28 2022, 8:13 AM

I'll leave it to the modules experts to decide whether they're happy with this, but I had a drive-by.

Also, I see none of the tests validate 'import', just 'export'. Based on the standard, BOTH are illegal, right :D

Heh, Erich is joking about my malicious reading of the standard where module-name has blanket wording for what is reserved, but that same grammar production is used with importing modules. e.g., a malicious reading of the standard could make you think import module std; is also invalid because it's using a reserved module name.

export module export is perfectly fine conforming

Nope, that's not going to compile because export is a real keyword instead of an identifier masquerading as a keyword like module and import.

clang/lib/Sema/SemaModule.cpp
279

Absolutely.

288

lol but my way is so much more complicated, so it must be more right... ;-)

293

Good catch, that's a think-o on my part!

erichkeane added inline comments.Oct 28 2022, 8:20 AM
clang/lib/Sema/SemaModule.cpp
288

Is this logic right? By my reading of the standard,

export module stdFoo.bar << fine
export module std00Foo.bar << fine

as neither 'consist of std followed by zero or more digits'.

This patch looks like it'll only allow the 1st, but not the 2nd.

clang/test/Modules/reserved-names.cpp
30 ↗(On Diff #471550)

SHOULD this fail?

This is NOT a module name beginning with an identifier-consisting-of-std followed by 0 or more digits.

The 'containing' part applies to just the reserved identifiers.

cor3ntin added inline comments.Oct 28 2022, 8:23 AM
clang/lib/Sema/SemaModule.cpp
288

I've been wondering the same, I think the standard is a bit confusing in that regard.
std00Foo is "std followed by 0 or more digit".

erichkeane added inline comments.Oct 28 2022, 8:25 AM
clang/lib/Sema/SemaModule.cpp
288

But it isn't 'consisting' of that, it is consisting of std, 0 or more digits, then 'Foo'.

aaron.ballman marked 3 inline comments as done.

Updated based on review feedback and reflection.

I think the standards wording here is a bit unclear as to what's intended and I had misunderstood the wording around which part of the path needs to be checked. Specifically:

All module-names either beginning with an identifier consisting of std followed by zero or more digits or containing a reserved identifier ([lex.name]) are reserved and shall not be specified in a module-declaration; no diagnostic is required.

I took this to mean all identifier components of a module name should check for std followed by zero or more digits, but I now believe it means only the first component in the path and it intends to only catch use of std[0-9]+ and not .*std[0-9]+.*. e.g., std.foo is reserved while foo.std and std12Three are not. I've changed the patch accordingly while addressing the other review feedback.

cor3ntin added inline comments.Oct 28 2022, 11:49 AM
clang/lib/Sema/SemaModule.cpp
272–279

Did you consider making this a function?
I think you could just check Path[0] for reserved stdxxx syntax, and then all parts for reserved/import/module. Might simplify the code a bit.

clang/test/Modules/reserved-names-2.cpp
7

should_fail? Did you mean the opposite?

aaron.ballman marked 2 inline comments as done.

Updating based on review feedback.

clang/lib/Sema/SemaModule.cpp
272–279

I'm not opposed.

clang/test/Modules/reserved-names-2.cpp
7

Indeed I do, good catch!

Thx.

clang/test/Modules/reserved-names-1.cpp
42

import?

cor3ntin accepted this revision.Oct 28 2022, 1:49 PM

Modulo the import typo. I'm happy with this.

This revision is now accepted and ready to land.Oct 28 2022, 1:49 PM

Are malformed imports an issue or are they already covered? There is limit test coverage for import. Am I missing something?

import import;
import module;
tschuett added inline comments.Oct 29 2022, 5:30 AM
clang/lib/Sema/SemaModule.cpp
148

Returns

I think the standards wording here is a bit unclear as to what's intended and I had misunderstood the wording around which part of the path needs to be checked. Specifically:

All module-names either beginning with an identifier consisting of std followed by zero or more digits or containing a reserved identifier ([lex.name]) are reserved and shall not be specified in a module-declaration; no diagnostic is required.

I took this to mean all identifier components of a module name should check for std followed by zero or more digits, but I now believe it means only the first component in the path and it intends to only catch use of std[0-9]+ and not .*std[0-9]+.*. e.g., std.foo is reserved while foo.std and std12Three are not. I've changed the patch accordingly while addressing the other review feedback.

Oh, yeah, the wording here looks confusing. And I agree with your understanding.

clang/lib/Sema/SemaModule.cpp
282

std modules should be irreverent with system headers; The intuition of the wording should be that the users can't declare modules like std or std.compat to avoid possible conflicting. The approach I imaged may be add a new compilation flags (call it -fstd-modules) now. And if the compiler found a std module declaration without -fstd-modules, emit an error.

For now, I think we can skip the check for -fstd-modules and add it back when we starts to support std modules actually.

clang/test/Modules/reserved-names-1.cpp
23–24

or diagnose about export module std.foo.

cor3ntin added inline comments.Oct 31 2022, 3:52 AM
clang/lib/Sema/SemaModule.cpp
282

The idea is that standard modules are built from system directories... it seems a better heuristic than adding a flag for the purpose of 1 diagnostics ( maybe some other system library could in theory export std with no warning, but I'm not super worried about that being a concern in practice)

erichkeane added inline comments.Oct 31 2022, 6:15 AM
clang/docs/ReleaseNotes.rst
347

So question: ARE we diagnosing all 'use' of invalid/reserved module names? So the idea is you cannot import:
std.concepts? Or are we diagnosing export only?

ChuanqiXu added inline comments.Oct 31 2022, 6:22 AM
clang/lib/Sema/SemaModule.cpp
282

The idea is that standard modules are built from system directories...

This is not true. For example, if someday libc++ supports std modules, then we need to build the std modules in our working directory, which is not system directories. And ideally, we would distribute and install module file in the system directories. But it is irreverent with the path of the source file.

aaron.ballman added inline comments.Oct 31 2022, 6:30 AM
clang/lib/Sema/SemaModule.cpp
282

then we need to build the std modules in our working directory, which is not system directories.

-isystem, pragmas, and linemarkers are all ways around that -- I don't think we need a feature flag for this, unless I'm misunderstanding something.

aaron.ballman marked 3 inline comments as done.Oct 31 2022, 6:45 AM

Are malformed imports an issue or are they already covered? There is limit test coverage for import. Am I missing something?

import import;
import module;

It's unclear to me whether the constraints also apply to module imports or not. The wording for the constraints is in the section about module units and the section about module imports doesn't say anything further. Certainly we want to allow import std; instead of telling the user about the use of a reserved identifier, but it's less clear to me about import module; -- if the user writes that, they'll either get an error about not being able to find a module by that name, or they'd import the module (but how did they produce that module in the first place?).

clang/docs/ReleaseNotes.rst
347

Export only for the moment. I'll update the release note accordingly.

clang/lib/Sema/SemaModule.cpp
148

Good catch!

aaron.ballman marked 2 inline comments as done.

Updated based on review feedback:

  • Fixed a typo
  • Clarified the release note
  • Added a test case for std.foo being reserved
ChuanqiXu added inline comments.Oct 31 2022, 6:55 AM
clang/lib/Sema/SemaModule.cpp
282

Although it may be a little bit nit picker, the module unit which declares the std modules won't be header. It is a module unit. So it requires we implement std modules by wrapping linemarkers around the std modules declaration, which looks a little bit overkill.

And another point is that maybe we need to introduce another feature flags to implement std modules. Although I tried to implement std modules within the current features, I can't prove we can implement std modules in that way in the end of the day.

Let me add some more words. The standards require us to implement std modules without deprecating the system headers. This strategy leads the direction to "implement the components in the original headers and control their visibility in the std module unit".
It may look like:

//--- std.cppm
module;
#include <algorithm>
...
export module std;

Then how can control the visibility? In my original experiments, I use the style:

//--- std.cppm
module;
#include <algorithm>
...
export module std;
export namespace std {
    using std::sort;
}

but this doesn't always work. For example, we can't using a in-class friend definition. And there are more reasons, the unreachable declarations in the global module fragment (the section from module; to export module [module_name]) can be discarded to reduce the size of the module file. And the reachable rules are complex. But the simple story is that it is highly possible the a lot of necessary declarations in global module fragment in the above snippet would be discarded so that the user can't use std modules correctly. I mean, we may need a special feature flag for it. And the method with system headers looks not good and semantics are not so consistency.

ChuanqiXu added inline comments.Oct 31 2022, 6:56 AM
clang/test/Modules/reserved-names-1.cpp
26

It looks like the diagnostic message should be 'std.foo is a ...'

aaron.ballman marked an inline comment as done.Oct 31 2022, 6:59 AM
aaron.ballman added inline comments.
clang/test/Modules/reserved-names-1.cpp
26

We tell the user which part of the path name is reserved instead of giving them the whole path name. That helps for cases like: export module aaron.wrote.this.awesome.module; If there's confusion from this, we could probably change the diagnostic to be 'this' is a reserved identifier within module name 'aaron.wrote.this.awesome.module', but because you can only declare one module at a time, that seemed like overkill.

ChuanqiXu added inline comments.Oct 31 2022, 7:02 AM
clang/test/Modules/reserved-names-1.cpp
26

Got it. Makes sense.

erichkeane added inline comments.Oct 31 2022, 7:04 AM
clang/lib/Sema/SemaModule.cpp
282

IMO, any such additional flag (say -isystem-module) should ALSO use the isInSystemHeader infrastructure. I suspect nearly every place we use isInSystemHeader we also mean to exclude a system-module as well.

I think that any such flag can/should be added later as you figure out how it should be specified/work. That said, when you do so, it should either also feed isInSystemHeader, or basically every use of isInSystemHeader should ALSO changed to use the new flag as well

aaron.ballman marked an inline comment as done.Oct 31 2022, 7:12 AM
aaron.ballman added inline comments.
clang/lib/Sema/SemaModule.cpp
282

I think that any such flag can/should be added later as you figure out how it should be specified/work. That said, when you do so, it should either also feed isInSystemHeader, or basically every use of isInSystemHeader should ALSO changed to use the new flag as well

+1, that's my thinking as well.

ChuanqiXu added inline comments.Oct 31 2022, 7:24 AM
clang/lib/Sema/SemaModule.cpp
282

The main confusion part to me is that why we connect std modules with system paths? I know implementors can work around the check like the tests did. But what's the point? I know every header of libcxx contains:

#ifndef _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER
#  pragma GCC system_header
#endif

but it is for the compatibility with GCC. And it looks not so meaningful to force the implementation of modules to keep such contraints.

clang/test/Modules/reserved-names-1.cpp
34

We lack a test for foo.std;

aaron.ballman marked an inline comment as done.Oct 31 2022, 7:36 AM
aaron.ballman added inline comments.
clang/lib/Sema/SemaModule.cpp
282

The main confusion part to me is that why we connect std modules with system paths?

We consider the system paths to be "special" in that they can do things "user" paths cannot do. I think we want to keep that model for modules because of how successful it has been for includes. (e.g., don't suggest fixits in a system module but do suggest them for user modules).

clang/test/Modules/reserved-names-1.cpp
34

reserved-named-2.cpp has that test (it uses std0 instead of std). Is that sufficient?

ChuanqiXu accepted this revision.Oct 31 2022, 7:45 AM

LGTM if we add a test for foo.std.

clang/docs/ReleaseNotes.rst
349

It reads odd about system header. But I can't get better name now (I guess we don't have one now). I think we need to update the StandardCPlusCplusModules.rst documents in a new section. I'll take it.

clang/lib/Sema/SemaModule.cpp
282

OK, I got it and it won't be a problem we can't workaround.

clang/test/Modules/reserved-names-1.cpp
34

I feel better with std

philnik added inline comments.
clang/lib/Sema/SemaModule.cpp
282

IIUC this would prevent the library from handling the std module the same as a user module, right? AFAIK the actual use of _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER is to enable warnings in the headers for development, which would not work with the modules with this patch, or am I misunderstanding something? Is there a reason this isn't a warning that's an error by default? That would allow the library to disable it and still serve the same purpose.

aaron.ballman marked 9 inline comments as done.Oct 31 2022, 10:03 AM
aaron.ballman added inline comments.
clang/lib/Sema/SemaModule.cpp
282

AFAIK the actual use of _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER is to enable warnings in the headers for development, which would not work with the modules with this patch, or am I misunderstanding something?

Why would the library want a diagnostic telling them they're using a reserved identifier as a module name?

Is there a reason this isn't a warning that's an error by default? That would allow the library to disable it and still serve the same purpose.

It also allows users to produce modules with reserved identifiers. It's an error that can't be downgraded specifically because I don't think we want our implementation to give arbitrary users that ability.

aaron.ballman marked 2 inline comments as done.

Updated based on review feedback.

Added an additional test case, reworked the comments in one of the tests as well.

aaron.ballman marked 2 inline comments as done.Oct 31 2022, 10:09 AM
aaron.ballman added inline comments.
clang/test/Modules/reserved-names-1.cpp
34

Okay, I added an additional test.

philnik added inline comments.Oct 31 2022, 10:41 AM
clang/lib/Sema/SemaModule.cpp
282

AFAIK the actual use of _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER is to enable warnings in the headers for development, which would not work with the modules with this patch, or am I misunderstanding something?

Why would the library want a diagnostic telling them they're using a reserved identifier as a module name?

I don't mean specifically this error, I mean more generally that other warnings should be generated from std modules. Treating the headers as system headers disables most warnings, which is the reason libc++ treat them as normal headers in the tests.

Is there a reason this isn't a warning that's an error by default? That would allow the library to disable it and still serve the same purpose.

It also allows users to produce modules with reserved identifiers. It's an error that can't be downgraded specifically because I don't think we want our implementation to give arbitrary users that ability.

I think there should be some way to enable normal warnings in the special modules, since it makes the life of library developers a lot easier. I don't care whether that's through disabling a warning or some special sauce to enable warnings from the std module, but there should be some way.

aaron.ballman added inline comments.Oct 31 2022, 10:45 AM
clang/lib/Sema/SemaModule.cpp
282

I don't mean specifically this error, I mean more generally that other warnings should be generated from std modules. Treating the headers as system headers disables most warnings, which is the reason libc++ treat them as normal headers in the tests.

Ahhh, thank you, that makes a lot more sense to me. :-D

I think there should be some way to enable normal warnings in the special modules, since it makes the life of library developers a lot easier. I don't care whether that's through disabling a warning or some special sauce to enable warnings from the std module, but there should be some way.

Just like we have -Wsystem-headers, I would expect we'd have something similar for modules (or reuse it, perhaps with a different name, for both headers and modules).

philnik added inline comments.Oct 31 2022, 11:37 AM
clang/lib/Sema/SemaModule.cpp
282

-Wsystem-headers doesn't work because that enables warnings in all system headers, but we only want the warnings from the system library that we write, i.e. libc++. Or can you somehow control in which system headers warnings are emitted?

aaron.ballman added inline comments.Oct 31 2022, 11:49 AM
clang/lib/Sema/SemaModule.cpp
282

-Wsystem-headers doesn't work because that enables warnings in all system headers, but we only want the warnings from the system library that we write, i.e. libc++

You're correct that it enabled warnings in all system headers if used from the command line, but you can use pragmas to control which warnings are or are not enabled on a file by file basis.

However, maybe I'm not understanding your concerns properly, so let me back up a step. With this patch, trying to create a module with a reserved name fails with an error unless the source file with the module declaration is in a file considered to be a "system header" (whether through -isystem, a pragma, linemarkers, etc). For libc++, I was thinking this should not be an issue because libc++ has _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER to control whether the file is considered a system header or not, so when exporting modules you have to ensure that macro is not defined. If you want to test the behavior as though the user was the one declaring that exported module... well, they get an error because they're not allowed to define that module, so I'm not certain what value there is in testing that specific situation. Am I misunderstanding something?

philnik added inline comments.Nov 1 2022, 8:34 AM
clang/lib/Sema/SemaModule.cpp
282

My understanding right now is that libc++ would have to build the module as a system header. You want this, so users have to try really hard to produce a module with a reserved identifier, which makes sense. Because of that, warnings generated inside the module would be suppressed though.
Essentially, my concern is that we don't get all the warnings we want when testing anything inside the std module. To get warnings from the std module we could use -Wsystem-headers. The problem there is that it would also produce warnings in headers we have no control over, which means we can't use it.

aaron.ballman added inline comments.Nov 1 2022, 9:18 AM
clang/lib/Sema/SemaModule.cpp
282

Ah, thank you for clarifying! I think we're considering the libc++ implementation strategy from different angles. One way to do it is as you said -- build the module as a system header. But I was thinking libc++ would do something a bit more novel (and maybe that's a bad idea!):

# __LINE__ __FILE__ 1 3 // Enter system header
export module std;
# __LINE__ __FILE__ 2 3 // Leave system header

and leave the rest of the module code "outside" of the faked up system header block of linemarker directives.

philnik added inline comments.Nov 1 2022, 9:59 AM
clang/lib/Sema/SemaModule.cpp
282

I guess it's # <line> <file> <mode?> <???>? That seems to me quite hacky. Maybe it would be nicer if we add an attribute like [[clang::system_module]] that would allow reserved identifiers, and maybe even ensure that it's actually one. Though that might be overkill, and would again allow people to more easily create modules with reserved identifiers. It would make the code a lot nicer to read though.

aaron.ballman added inline comments.Nov 1 2022, 10:51 AM
clang/lib/Sema/SemaModule.cpp
282

I guess it's # <line> <file> <mode?> <???>? That seems to me quite hacky.

Yeah, this is a pretty arcane feature but it's been around for ages. GCC documents this feature (Clang does not and we absolutely should): https://gcc.gnu.org/onlinedocs/gcc-12.2.0/cpp/Preprocessor-Output.html#Preprocessor-Output

It would make the code a lot nicer to read though.

It would, but we have to balance protecting this reserved space from the compiler while we can against readability of (already generally unreadable, IMHO) STL headers. But then again, linemarkers allow the user to do this themselves in the same manner as an attribute would. Between the two, I think linemarkers are far less likely to be used by users just trying to get their code to compile without worrying about the details of what they're doing.

philnik added inline comments.Nov 1 2022, 11:20 AM
clang/lib/Sema/SemaModule.cpp
282

I guess it's # <line> <file> <mode?> <???>? That seems to me quite hacky.

Yeah, this is a pretty arcane feature but it's been around for ages. GCC documents this feature (Clang does not and we absolutely should): https://gcc.gnu.org/onlinedocs/gcc-12.2.0/cpp/Preprocessor-Output.html#Preprocessor-Output

Thanks for the link!

It would make the code a lot nicer to read though.

It would, but we have to balance protecting this reserved space from the compiler while we can against readability of (already generally unreadable, IMHO) STL headers. But then again, linemarkers allow the user to do this themselves in the same manner as an attribute would. Between the two, I think linemarkers are far less likely to be used by users just trying to get their code to compile without worrying about the details of what they're doing.

Yeah, I also thought that it might be a bit too easy to add an attribute.
It would also be possible to just have an internal header like

__export_module_std
#pragma GCC system_header
export module std;

right? That wouldn't be great, but it would be a lot easier to understand than linemarkers.
As a side note: Anything that we only have to make available through modules could be a lot more readable, since the preprocessor wouldn't be a problem anymore. That won't happen soon, but I can dream.

Anyways, since we can enable warnings inside modules I don't have any concerns here anymore. Thanks for being so patient with me!

I believe we now have consensus on the patch. I'll land it in a few days in case any reviewers want to weigh in now that the discussion has stabilized.

clang/lib/Sema/SemaModule.cpp
282

It would also be possible to just have an internal header like ... right?

Yes, that should work I believe. And yes, that would be more direct than using linemarkers.

Anyways, since we can enable warnings inside modules I don't have any concerns here anymore. Thanks for being so patient with me!

Thank you for the great discussion! If we find we need an attribute/something more here in the future. we can certainly revisit once we have a concrete problem to solve.

This revision was landed with ongoing or failed builds.Nov 3 2022, 5:30 AM
This revision was automatically updated to reflect the committed changes.