This is an archive of the discontinued LLVM Phabricator instance.

[Modules] More descriptive diagnostics for misplaced import directive
ClosedPublic

Authored by sepavloff on Aug 7 2015, 1:18 PM.

Details

Summary

If an import directive was put into wrong context, the error message was obscure,
complaining on misbalanced braces. To get more descriptive messages, annotation
tokens related to modules are processed where they must not be seen.

Diff Detail

Repository
rL LLVM

Event Timeline

sepavloff updated this revision to Diff 31539.Aug 7 2015, 1:18 PM
sepavloff retitled this revision from to [Modules] More descriptive diagnostics for misplaced import directive.
sepavloff updated this object.
sepavloff added a subscriber: cfe-commits.
rsmith added a subscriber: rsmith.Aug 7 2015, 1:54 PM

This doesn't recover from the errors properly any more (leaving junk behind in the 'current module' stack), and somewhat breaks the original purpose of these checks, which is to detect and diagnose missing close braces. Here's what I suggest:

  1. Keep the isEofOrEom calls as-is.
  2. Diagnose the unexpected EOM tokens as you do now (but customize the diagnostics in the case of a module_end token).
  3. Once you've diagnosed an EOM token, don't consume it; instead update the annotation so you know not to diagnose it again, and bail out of parsing the current construct.

Then, when we hit an EOM, we'll diagnose it once at the innermost level, bail out all the way to the top level, and then handle it properly.

include/clang/Sema/Sema.h
1781–1784 ↗(On Diff #31539)

I don't think this is the right interface. The calls from the parser expect an error to be issued, and don't do the right thing if the module import is valid, so this should instead be called something more like diagnoseMisplacedModuleImport, and should either assert if the current context allows module imports or not perform the check.

lib/Parse/ParseDeclCXX.cpp
2856–2863 ↗(On Diff #31539)

Please factor this out into a bool TryParseMisplacedModuleImport() function.

2857–2858 ↗(On Diff #31539)

For module_end, we probably still want to give the "missing '}'" diagnostic, because that's almost certainly the problem if we reach the end of a module while still within a function or namespace. Maybe pass a flag into checkModuleImportContext to say whether you're entering or leaving the module?

2861–2862 ↗(On Diff #31539)

This is not a good way to recover from the problem: for that, we should probably do what we used to do, and bail out to the top level.

sepavloff updated this revision to Diff 31831.Aug 11 2015, 10:48 AM

Fixed module boundary treatment.

This version must fix problems in the treatment of annot_modulbegin and
annot_module_end. Error recover for the token annot_module_include found
in wrong context can be made as previously, just by ignoring it. To handle
token 'annot_module_end' parser bails out to upper level, where the module
end can be processed, it will complain on missing '}' etc. As for
'annot_module_start', now parser tries to recover by skipping all content
till matching 'annot_module_end'.

Also made changes according to reviewer's notes.

rsmith added inline comments.Aug 11 2015, 11:36 AM
include/clang/Basic/DiagnosticParseKinds.td
1018–1019 ↗(On Diff #31831)

These diagnostics are phrased from the point of view of the implementation, and should instead be phrased from the point of view of the user.

The first diagnostic would be more useful if it said something like "missing '}' at end of module". That's pretty close to the diagnostic we already produce; maybe instead of changing the behavior of this case you could improve BalancedDelimiterTracker to use a custom diagnostic for the case where the missing closing delimiter is instead an end-of-module token?

The second diagnostic seems like an improvement, but from the user's perspective, this wasn't the start of a submodule, it was a module import / #include. Maybe just use diagnoseMisplacedModuleImport here?

lib/Parse/ParseDeclCXX.cpp
3075 ↗(On Diff #31831)

This shouldn't be named try* if it has a precondition that the current token is an EoM token. (The try form, if there is one, should check for that token itself.)

lib/Parse/Parser.cpp
1999 ↗(On Diff #31831)

This will result in you producing two errors "unexpected end of module" and "missing }" for each construct that is still open at the end of the module. I don't think that's what we want.

2003 ↗(On Diff #31831)

This is liable to produce bad follow-on diagnostics; I don't think this is a reasonable way to recover. I can see a few feasible options here:

  1. Emit a fatal error when this happens (suppressing further diagnostics)
  2. Break out to the top level of parsing and resume from there (that is, assume that a modular header expects to be included at the top level and that the user didn't screw up their module map)
  3. Enter the module and carry on parsing from here

My preference would be either (1) or (2). But either way, we should diagnose the still-open bracketing constructs, because the problem is likely to be a missing close brace at the end of an unrelated header file.

2018 ↗(On Diff #31831)

Again, this will recover very poorly. In this case, our best bet seems to be to recover by importing the module.

silvas added a subscriber: silvas.Aug 12 2015, 12:08 AM
silvas added inline comments.
lib/Parse/Parser.cpp
2003 ↗(On Diff #31831)

I strongly prefer (1). In all cases I have recorded in my notes when modularizing, the error: expected '}' diagnostic indicated one of two things:

  • that a header needed to be marked textual in the module map.
  • that a #include had to be moved to the top of the file (this case was likely behaving unexpectedly in the non-modules case and "happened to work").

For the sake of our users, it is probably best to immediately fatal error and suggest an alternative (the suggestions can be a separate patch; my recommendations are below).

I believe a user is most likely to run into this diagnostic when developing an initial set of module maps, so our diagnostic should be tailored to that audience. These users may have little experience with modules when they encounter this diagnostic, so a notes suggesting a likely remedy helps them develop confidence by providing authoritative advice (and avoiding a round trip to the documentation).

My fine-grained recommendations from experience are:

  • #include inside extern "C": always meant to be modular, always the fix is to move it out of the braced block.
  • #include inside namespace: always meant to be modular, always the fix is to move it out of the braced block.
  • #include inside class: always textual (e.g. bringing in methods like TableGen .inc files; sometimes similar ".inc" files are manually written and not autogenerated. I have observed e.g. "classfoo_methods.h" files; another instance of this is stamping out ".def" files)
  • #include inside array initializer: always textual. Typically ".def" files
  • #include inside function: always textual. Typically ".def" files (e.g. generate a switch)

".inl" files are theoretically a problem inside namespaces, but I have not observed this issue in practice. In my experience code taking the effort to have ".inl" files is quite clean and avoids such textual dependencies. The issues I have seen in practice with ".inl" files usually end up coming out through a different part of clang. Improving that is for a separate patch though.

sepavloff updated this revision to Diff 32085.Aug 13 2015, 12:06 PM

Updated patch

Thanks to all for fruitful discussion!

The new version tries to addresses review notes. It differs from the previous version in:

  • Method tryParseMisplacedModuleImport now depends only on current token and is used in loop condition. Probably a better name for it is something like skipMisplacedModuleImport.
  • BalancedDelimiterTracker is updated so that it prints different message if module ended before closing bracket has been seen.
  • If annot_module_begin is seen where it must not, recovery depends on context. Inside a namespace the module is imported as a recovery, a note suggesting to move import to top level is issued. Inside a class of a compound statement fatal error issued, there is no way to handle this case at upper level.
silvas added inline comments.Aug 21 2015, 2:15 PM
test/Modules/auto-module-import.m
89 ↗(On Diff #32085)

I think what this means is marking the *header* as textual. So I would suggest the wording "consider marking the header as textual".

Ping.

Thanks,
--Serge

sepavloff updated this revision to Diff 34673.Sep 14 2015, 8:12 AM

Updated diagnostics according to Sean's notes ('textual module' -> 'textual header')

Sorry for the delay. I'm generally happy with the direction of this patch; I think this will substantially improve how we diagnose this set of problems.

include/clang/Basic/DiagnosticParseKinds.td
1016 ↗(On Diff #34673)

I would drop the "the" in this error.

include/clang/Basic/DiagnosticSemaKinds.td
7774 ↗(On Diff #34673)

We'll already have said something like:

error: import of module 'foo' appears within class X
note: class X begins here

There are two likely situations here: either the header was intended to be included textually, or the user forgot a } or similar and didn't expect for the import to be inside the class. Either way, this note is not useful (either they thought it *was* at the top level, or the note is not applicable).

How about dropping this note? The "begins here" note seems to have already covered the interesting case.

7775 ↗(On Diff #34673)

Textual headers are not the common case, so "consider" seems too strong; how about weakening this to something like:

"if this header is intended to be included textually, mark it 'textual' in the module map"
lib/Parse/ParseDeclCXX.cpp
213 ↗(On Diff #34673)

Move this try... call to the end of the && here and elsewhere; it's much more likely that we'll hit an r_brace.

lib/Parse/Parser.cpp
2023–2024 ↗(On Diff #34673)

This is the opposite of how all the other try functions behave: they return true if they consumed the thing they were asked to consume.

2025–2027 ↗(On Diff #34673)

Refactor this as follows:

Rename this to parseMisplacedModuleImport. Add an inline tryParseMisplacedModuleImport to Parser.h, which checks if the current token is an EOM token. If it is, it calls this function. If it is not, it returns immediately.

The purpose of having these try functions with thin wrappers in the .h file is so that we can inline the cheap "are we looking at the right kind of token" check (and allow it to be combined with other nearby checks on the token kind).

lib/Sema/SemaDecl.cpp
14361–14366 ↗(On Diff #34673)

It looks like this will currently produce repeated diagnostics for an import within multiple namespaces:

module.map:

module Foo { module X { header "foo1.h" } module Y { header "foo2.h" } }

foo1.h:

namespace A {
  namespace B {
    #include "foo2.h"
  }
}

I believe this will diagnose:

  1. import of "Foo.Y" within "B"
  2. expected } at end of "B"
  3. import of "Foo.Y" within "A"
  4. expected } at end of "A"

I think we should treat this case as fatal too.

14372 ↗(On Diff #34673)

I think we should only suggest changing the module map if our current module and the module of the included file are within the same top-level module. It would be unfortunate for us to suggest modifying, say, the libc module map if we see some end user's code including a C library header in the wrong context.

You can check M->getTopLevelModuleName() against S.getLangOpts().CurrentModule and S.getLangOpts().ImplementationOfModule to see whether it's a submodule of the current module.

silvas requested changes to this revision.Sep 16 2015, 5:09 PM
silvas added a reviewer: silvas.
silvas added inline comments.
include/clang/Basic/DiagnosticSemaKinds.td
7774 ↗(On Diff #34673)

I think this makes sense. error: ... import ... appears within ... seems like sufficient information about the nature of the error.

lib/Sema/SemaDecl.cpp
14372 ↗(On Diff #34673)

From my experience, your suggested heuristic is insufficient; we definitely need to cover the case that crosses top-level modules (or is from a .cpp file to a module it imports); actually, that is the only case that I have run into in practice.

In this patch, I think the right thing is to just not emit note_module_need_top_level nor note_module_need_textual. A separate patch can use a flag -Wintial-modules-migration (or whatever) to emit these notes a bit more aggressively (the flag would be off by default).

test/Modules/misplaced.cpp
1–12 ↗(On Diff #34673)

This test does not cover the appears within class case.

This revision now requires changes to proceed.Sep 16 2015, 5:09 PM
sepavloff updated this revision to Diff 35115.Sep 18 2015, 11:16 AM
sepavloff edited edge metadata.

Updated patch.

Thanks for helpful notes!

There are similar cases in Objective C. Should they be implemented in this patch or separate patch is OK?

include/clang/Basic/DiagnosticParseKinds.td
1016 ↗(On Diff #34673)

Done.

include/clang/Basic/DiagnosticSemaKinds.td
7774 ↗(On Diff #34673)

The message is removed.

7775 ↗(On Diff #34673)

This message is also removed, as Sean proposed.

lib/Parse/ParseDeclCXX.cpp
213 ↗(On Diff #34673)

Moved.

lib/Parse/Parser.cpp
2023–2024 ↗(On Diff #34673)

Meaning of the return value is inverted.

2025–2027 ↗(On Diff #34673)

Thank you for explanation. Fixed.

lib/Sema/SemaDecl.cpp
14361–14366 ↗(On Diff #34673)

Yes, the messages were as you described. Now import inside a namespace is also a fatal error. Test misplaced-4.cpp checks this case.

14372 ↗(On Diff #34673)

As Sean proposed, these are not emitted.

test/Modules/misplaced.cpp
1–12 ↗(On Diff #34673)

This case was checked in malformed.cpp but for the sake of consistency separate test was added.

rsmith accepted this revision.Sep 18 2015, 11:35 AM
rsmith added a reviewer: rsmith.
rsmith added inline comments.
include/clang/Parse/Parser.h
2560–2562 ↗(On Diff #35115)

Use Tok.isOneOf here.

A separate patch for the ObjC cases is fine.

This revision was automatically updated to reflect the committed changes.