This is an archive of the discontinued LLVM Phabricator instance.

[Preamble] Stop circular inclusion of main file when building preamble
ClosedPublic

Authored by nik on Oct 30 2018, 7:12 AM.

Details

Summary

If a header file was processed for the second time, we could end up with
a wrong conditional stack and skipped ranges:

In the particular example, if the header guard is evaluated the second
time and it is decided to skip the conditional block, the corresponding
"#endif" is never seen since the preamble does not include it and we end
up in the Tok.is(tok::eof) case with a wrong conditional stack.

Detect the circular inclusion, emit a diagnostic and stop processing the
inclusion.

Diff Detail

Event Timeline

nik created this revision.Oct 30 2018, 7:12 AM
nik added a subscriber: yvvan.Oct 30 2018, 8:08 AM

Why does resetting the conditional stack is the right thing to do here?
I can see how it can hide the problem, but can't come up with with a consistent model to handle the fact that the file contents were trimmed.

Failing to build the preamble in that case seems like the best thing we could do. WDYT?

nik added a comment.Oct 31 2018, 4:22 AM

Why does resetting the conditional stack is the right thing to do here?

Because this case can be detected and handled without loosing the benefits of the preamble.

Failing to build the preamble in that case seems like the best thing we could do. WDYT?

See above - we would not have the benefits of the preamble anymore.

I can see how it can hide the problem, but can't come up with with a consistent model to handle the fact that the file contents were trimmed.

If we are in preamble-generation-mode and the main file #includes itself, then it should see the full file content of the file, not the truncated preamble version of it.
Would this be more consistent?

Because this case can be detected and handled without loosing the benefits of the preamble.

The cases where recursive includes make sense are incredibly rare in practice. Most of the time, recursively including the same file is unintentional should be considered a bug and fixed.
Producing a warning that the file is being recursively included (along with an include chain that caused a recursive inclusion) seems to provide more benefit than preamble, especially when this is unintentional (I would argue that's the majority of the cases).

If we are in preamble-generation-mode and the main file #includes itself, then it should see the full file content of the file, not the truncated preamble version of it.
Would this be more consistent?

Yes, but I can't see an easy way to make it happen. And I don't think we want a complex fix for something bizzare that very rarely happens in practice.

@ilya-biryukov
As far as I understand the problem the same thing happens when you are in the header a.h which includes b.h and b.h includes a.h at the same time. So you get this recursion indirectly and very often because that's why include guards are there.

@ilya-biryukov
As far as I understand the problem the same thing happens when you are in the header a.h which includes b.h and b.h includes a.h at the same time. So you get this recursion indirectly and very often because that's why include guards are there.

Cycles in include graphs are almost always unintentional and must be broken apart, there are very few cases where you actually want this.
E.g. in your example, since 'b.h' includes 'a.h' it needs something from 'a.h'. But if you include 'b.h' while processing the preamble of 'a.h', the declarations inside 'a.h' are not yet visible and that would result in (at times obscure and hard to read) compiler errors when parsing 'b.h'.
(Assuming a.h and b.h are normal headers and the use-case does not involve doing various preprocessor tricks)

In those cases it usually works by accident (e.g. when 'b.h' included 'a.h', but never actually used anything declared in 'a.h').

nik added a comment.Nov 1 2018, 2:09 AM

I've only the minimal example at hand right know - I'm waiting for feedback about the real world case.

nik added a comment.Nov 16 2018, 6:45 AM

I still don't have feedback for a real world case except "unintentional #include". Unfortunately, in real world cases the cyclic include might be not obvious at all.

@ilya: As far as I understand you prefer to make the preamble generation rather fail as long as we don't have more information. How do you suggest to implement this? I see that Clang->getPreprocessor().addPPCallbacks() is called in PrecompiledPreamble::Build(). Adding a custom PPCallbacks there that overrides "void InclusionDirective()" might be enough to detect the cyclic #include and to set a flag that is checked before PrecompiledPreamble::Build() returns - BuildPreambleError could get a new enumerator. Is there a better way to do this? For example, it would be desirable to not only observe this case, but then to also stop/abort/skip all the further parsing that is done for the preamble only as it's pointless then.

In D53866#1301086, @nik wrote:

I still don't have feedback for a real world case except "unintentional #include". Unfortunately, in real world cases the cyclic include might be not obvious at all.
@ilya: As far as I understand you prefer to make the preamble generation rather fail as long as we don't have more information. How do you suggest to implement this? I see that Clang->getPreprocessor().addPPCallbacks() is called in PrecompiledPreamble::Build(). Adding a custom PPCallbacks there that overrides "void InclusionDirective()" might be enough to detect the cyclic #include and to set a flag that is checked before PrecompiledPreamble::Build() returns - BuildPreambleError could get a new enumerator. Is there a better way to do this? For example, it would be desirable to not only observe this case, but then to also stop/abort/skip all the further parsing that is done for the preamble only as it's pointless then.

Maybe produce a fatal error in the preprocessor? That seems to be the simplest option: the preprocessor is aware it's building the preamble and there's definitely some logic to produce fatal errors in other cases (include not found).
A fatal error currently aborts other stages of the compilation pipeline and producing one would make the run of the compiler that tries to produce the preamble fail, giving the empty preamble as a result.

nik updated this revision to Diff 175210.Nov 26 2018, 1:56 AM

Maybe produce a fatal error in the preprocessor? That seems to be the simplest option: the preprocessor is aware it's building the preamble and there's definitely some logic to produce fatal errors in other cases (include not found).
A fatal error currently aborts other stages of the compilation pipeline and producing one would make the run of the compiler that tries to produce the preamble fail, giving the empty preamble as a result.

Done. I'm using diag::err_pp_error_opening_file as introducing an new artificial diagnostic error that the user will never see feels wrong.

Note that a preamble is generated for fatal errors like an unresolved #include and I think that's fine. As such, I need a way to propagate the error up to PrecompilePreambleConsumer to avoid writing the preamble. I've done that with an extra flag in Preprocessor. An alternative might be to put it into PreambleConditionalStackStore (as state? and rename that class to something more general?) - what do you think?

ilya-biryukov added inline comments.Dec 4 2018, 12:53 AM
include/clang/Lex/Preprocessor.h
391 ↗(On Diff #175210)

There's a mechanism to handle preamble with errors, see PreprocessorOpts::AllowPCHWithCompilerErrors.
Maybe rely on it and avoid adding a different one?

lib/Lex/PPDirectives.cpp
1883

Maybe add a new error or find a more appropriate existing one to reuse?
"Error opening file", especially without any arguments does not provide enough context to figure out what went wrong.

nik marked 2 inline comments as done.Dec 5 2018, 7:27 AM
nik added inline comments.
include/clang/Lex/Preprocessor.h
391 ↗(On Diff #175210)

I'm not sure how to make use of AllowPCHWithCompilerErrors. It's clearly meant as an input/readonly option - do you suggest setting that one to "false" in case we detect the cyclic include (and later checking for it...)? That feels a bit hacky. Have you meant something else?

lib/Lex/PPDirectives.cpp
1883

Maybe add a new error or find a more appropriate existing one to reuse?

As stated above, introducing a new diagnostic that the user will never face feels wrong, but that's just a preference. If you prefer a dedicated diagnostics, that's fine for me.

Checking the existing ones, there are not many fatal errors to choose from:

err_pp_file_not_found
err_pp_through_header_not_found
err_pp_through_header_not_seen
err_pp_pragma_hdrstop_not_seen
err_pp_error_opening_file

The last one looks the best for me.

"Error opening file", especially without any arguments does not provide enough context to figure out what went wrong.

I've added some arguments which might be useful for debugging.

nik updated this revision to Diff 176826.Dec 5 2018, 7:27 AM

Addressed comments.

nik retitled this revision from [Preamble] Fix preamble for circular #includes to [Preamble] Stop generating preamble for circular #includes.Dec 5 2018, 7:29 AM
nik edited the summary of this revision. (Show Details)
ilya-biryukov added inline comments.Dec 5 2018, 10:55 AM
include/clang/Lex/Preprocessor.h
391 ↗(On Diff #175210)

We emit an error, so the preamble will not be emitted.
Unless the users specify AllowPCHWithCompilerErrors, in which case they've signed up for this anyway.

I propose to not add the new flag at all. Would that work?

lib/Lex/PPDirectives.cpp
1883

Could we create a new error for this case?

1884

I thought the convention is to not capitalize the first word of a message, otherwise it would be a bit inconsistent with other errors:

cannot open file 'foo.cpp:  Main file can't include itself for preamble

also a small nit: maybe rephrase to main file cannot be included recursively for preamble
and another small nit: we probably want to remove the full stop at the end.

nik marked 2 inline comments as done and an inline comment as not done.Dec 6 2018, 2:25 AM
nik added inline comments.
include/clang/Lex/Preprocessor.h
391 ↗(On Diff #175210)

As stated further above, no.

That's because for the libclang/c-index-test case, AllowPCHWithCompilerErrors=true - see clang_parseTranslationUnit_Impl. As such, the preamble will be generated/emitted as the second early return in PCHGenerator::HandleTranslationUnit is never hit.

nik updated this revision to Diff 176941.Dec 6 2018, 2:26 AM

Added a dedicated diagnostic.

ilya-biryukov added inline comments.Dec 7 2018, 6:09 AM
include/clang/Basic/DiagnosticLexKinds.td
430

NIT: maybe rephrase for preamble to when building a preamble? "for preamble" might be a bit confusing.

include/clang/Lex/Preprocessor.h
391 ↗(On Diff #175210)

if AllowPCHWithCompilerErrors=true is set to true, why not simply pretend the include was not found? That would still generate the preamble, but users have the error message to see that something went wrong.

c-index-test produces the errors, so we can check the error is present in the output.

nik marked 2 inline comments as done.Feb 14 2019, 3:59 AM
nik added inline comments.
include/clang/Lex/Preprocessor.h
391 ↗(On Diff #175210)

if AllowPCHWithCompilerErrors=true is set to true, why not simply pretend the include was not found?

Sounds good, especially that the preamble is still generated.

That would still generate the preamble, but users have the error message to see that something went wrong.

The vanilla diagnostic in this case might be a bit confusing, I kept the introduced diagnostics as it carries more information.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2019, 3:59 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
nik updated this revision to Diff 186813.Feb 14 2019, 3:59 AM

Addressed comments.

nik marked an inline comment as done.Feb 14 2019, 4:01 AM
nik added inline comments.
lib/Basic/SourceManager.cpp
1594

I've added the nullptr check here as LibclangReparseTest.ReparseWithModule fails/crashes otherwise.

nik retitled this revision from [Preamble] Stop generating preamble for circular #includes to [Preamble] Stop circular inclusion of main file when building preamble.Feb 14 2019, 4:06 AM
nik edited the summary of this revision. (Show Details)
nik added a comment.Apr 18 2019, 7:03 AM

Ping. Ilya?

nik updated this revision to Diff 198654.May 8 2019, 7:04 AM
nik edited the summary of this revision. (Show Details)

Rebased for current trunk.

If I miss something obvious, please tell me. Otherwise I'm waiting.

Sorry for losing this.
Neat change, minimal and focused, thanks!

Just wanted to clarify why we need the change in SourceManager.cpp, will LGTM as soon as we resolve this

lib/Basic/SourceManager.cpp
1594

Can this actually be`null`?

nik marked an inline comment as done.May 9 2019, 5:26 AM
nik added inline comments.
lib/Basic/SourceManager.cpp
1594

The comment for OrigEntry states that it might be null:

/// Reference to the file entry representing this ContentCache.
///
/// This reference does not own the FileEntry object.
///
/// It is possible for this to be NULL if the ContentCache encapsulates
/// an imaginary text buffer.
const FileEntry *OrigEntry;

Note also that further down in this function, a null check for MainContentCache->OrigEntry is done, so I believe that this was just forgotten here (since there is also no assert) and we are the first one running into this with the introduced call to SourceMgr.translateFile(File).

I've tried to understand why we end up with a nullptr here, but this goes beyond me in the time I've taken for this. What I've observed is that module code path (LibclangReparseTest.ReparseWithModule vs LibclangReparseTest.Reparse) creates at least a MemoryBuffer more (in ModuleMap::parseModuleMapFile; possibly the one referred with "imaginary text buffer" from the comment) and I suspect this to be somehow related with the OrigEntry nullptr.

nik updated this revision to Diff 198799.May 9 2019, 5:28 AM

Moved the MainFile / MainContentCache->OrigEntry check a bit further up, for
consistency with the same test further down in SourceManager::translateFile().

ilya-biryukov added inline comments.May 9 2019, 5:55 AM
lib/Basic/SourceManager.cpp
1594

Should we check for equality of MainContentCache->ContentsEntry in that case? I guess this is what should be set for "imaginary" files.

nik marked an inline comment as done.May 9 2019, 7:10 AM
nik added inline comments.
lib/Basic/SourceManager.cpp
1594

Does not help in this particular case as ContentsEntry is also a nullptr. All the members of the ContentsCache seem to be either nullptr or 0 as that particular point of execution.

How to continue?

a) Accept as is.
b) Ask someone knowledgeable for whom this could be a no-brainer that could respond hopefully soon. Any candidate?
c) Or should I dig deeper? This can take quite some time as this area is new for me.

ilya-biryukov accepted this revision.May 9 2019, 8:10 AM
ilya-biryukov marked an inline comment as done.

LGTM. See the nit about naming of an error, though

include/clang/Basic/DiagnosticLexKinds.td
429

NIT: Maybe change to err_pp_including_mainfile_in_preamble?

lib/Basic/SourceManager.cpp
1594

Wow, I guess I don't understand this code deep enough.
Your change looks ok, I'm happy to LGTM this. But if you're actually interested in digging deeper, that would be super-useful.

Based on what you're describe, it feels like "imaginary" main file works by accident rather than being properly handled in this function.

BTW, thanks for moving the check up, it definitely looks nicer this way.

This revision is now accepted and ready to land.May 9 2019, 8:10 AM
nik updated this revision to Diff 198997.May 10 2019, 3:20 AM
nik marked an inline comment as done and an inline comment as not done.

Renamed to err_pp_including_mainfile_in_preamble.

nik closed this revision.May 10 2019, 7:01 AM

Huch, I forgot to add "Differential Revision: <URL>" to the commit message, so I'll close this manually once I know how to add the svn revision number to this. https://llvm.org/docs/Phabricator.html states:

In the web UI, under “Leap Into Action” put the SVN revision number in the Comment, set the Action to “Close Revision” and click Submit.

Where is the "Lean Into Action" thingy? Can't find it.

I guess using Edit Related Object -> Edit Commits should do the trick.
I'm not sure what the "Lean Into Action" is either.

@nik here's an unusual but real-world example that triggers this.
https://github.com/google/highway compiles the same source multiple times (with different macros set) for generating code for multiple SIMD instruction sets.
The main source file sets a macro to its filename and includes a "foreach_target.h" which includes the file identified by that macro.

I understand this is complicated for analysis, so foreach_target.h does nothing if it detects that an IDE/analyzer are parsing. This relies on predefined macros such as
CDT_PARSER INTELLISENSE Q_CREATOR_RUN. Is there such a macro for clangd?