This is an archive of the discontinued LLVM Phabricator instance.

[clang] Detect header loops
Needs ReviewPublic

Authored by urnathan on Sep 26 2022, 9:59 AM.

Details

Reviewers
aaron.ballman
erichkeane
Group Reviewers
Restricted Project
Summary

The aim is to warn on cycles in header-files, which causes problems
with clang modules (and can result in confusing errors), and makes the
headers not C++20 header-units. But wait, how can this happen?
Consider:

// Hdr foo.h
.#ifndef FOO_H
.#define FOO_H
// understand this is more likely to be transitive through some set of
// intermediate headers
.#include "foo.h"
// Body of header
.#endif

This preprocesses just fine, as by the time we recursively include the
header, it has already defined the protection macro (the compiler
won't know it's an idempotent header at that point, because it will
not yet have observed that the whole header is inside the #ifndef).
You only get a compilation problem if one of the transitive headers
makes reference to an entity that is declared in the body of the
header.

It turns out to be stunningly easy to create such silent header loops,
and they are devilishly hard to detect. One has to preprocess the
file and examine the #line markings. Except, that's not robust as if
the header is either #imported or #pragma once'd, then the recursive
show the problem. That's why this warning is emitted *before* the
idempotency optimization check.

This adds a dense set to Preprocessor, and uses that to record
FileEntry's as include files are entered. Just before determining
whether to enter a #include, we check this set and warn, if a loop is
found. This means we can warn on circular #imports and #pragma once
headers too (which are just as problematic):

In file included from .../llvm/me/hdr-cyc/clang/test/Preprocessor/warn-loop-main.c:4:
In file included from .../llvm/me/hdr-cyc/clang/test/Preprocessor/warn-loop-macro-2.h:3:
.../llvm/me/hdr-cyc/clang/test/Preprocessor/warn-loop-macro-2a.h:4:10: warning: #include cycle [-Winclude-cycle]

In order to detect multiple loops with the same outer header, we add a
flag to PreprocessorLexer, to record whether circularity was detected
upon creation. We only remove the associated FileEntry from the map
at the outermost level. (You'll see the test case has two loops
involving warn_loop_macro_2.h.)

The boost headers are full of loops. Some are real, some are not
actual loops due to anticoherence in their controlling conditionals.

Diff Detail

Event Timeline

urnathan created this revision.Sep 26 2022, 9:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2022, 9:59 AM
urnathan requested review of this revision.Sep 26 2022, 9:59 AM
urnathan edited the summary of this revision. (Show Details)

I find myself wondering if there is a good way to instead 'search' for the 'pragma once' key, or ControllingMacroID/ControllingMacro (See HeaderFileInfo), and see if that already is defined/exists?

Alternatively, we can probably search through the list of "Lexers" for the file. See PPLexerChange.cpp ~line 50 for something that does similar work, it searches back through the IncludeMacroStack for a FileLexer.

Alternatively, I see that Lexer has isFirstTimeLexingFile, which probably does about 95% of what you want (it stores "IsFirstTimeLexingFile"). This is perhaps the most promising?

clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #462953)

Do we want some sort of 'note' so we can print breadcrumbs?

Ah, I should have investigated how it goes to find an existing lexer, it didn't occur to me that there already was include stack walking code.

clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #462953)

the breadcrumbs are in the form of the 'included from ...' location information -- or do you think something more is needed?

erichkeane added inline comments.Sep 27 2022, 9:47 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #462953)

Nope, if we have those, I'm a happy camper. It wasn't clear from the emit code that it happened, but I should have realized the Lexer diagnostics always did that.

urnathan updated this revision to Diff 463579.Sep 28 2022, 8:56 AM
urnathan retitled this revision from [clang] WIP: detect header loops to [clang] Detect header loops.
urnathan edited the summary of this revision. (Show Details)

this updates adds a map to record the active headers -- searching up the include stack is going to be O(N) per include, so not the best approach.

Hmm... I'm not against this, but I'm wondering if the O(N) is problematic, compared to the overhead of a set? "N" is going to be 'small', at least relatively, right? I can't imagine "N" gets to be more than ~200 at worst case in anything but the most absurd/contrived workloads? What do others think? Or am I way off in my expectation here?

Finally, is there value to NOT doing the lookup if the warning is disabled?

clang/lib/Lex/PPLexerChange.cpp
107

The iterator is somewhat costly (at least 2 pointers?) and since we're not accessing them, copying them into the structured binding object probably isn't valuable?

aaron.ballman added inline comments.
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #463579)

This diagnostic doesn't really seem like it's going to help the user to know what's wrong with their code or how to fix it. (Also, the notes @erichkeane was hoping were emitted don't seem to be, at least according to the new test cases.) I think we need to give users a bit more guidance here.

324 ↗(On Diff #463579)

We have a reasonable amount of evidence that off-by-default warnings remain off for basically everyone. Can we do anything to enable this diagnostic by default? e.g., would it make sense to enable it by default when trying to build a module but otherwise silence it?

clang/include/clang/Lex/Preprocessor.h
865 ↗(On Diff #463579)
clang/test/Preprocessor/warn-loop-macro-1.h
3 ↗(On Diff #463579)

For example, as a user, I would look at this diagnostic and assume the compiler is wrong -- "There's no cycle there, the header guard macro protects me against it being a cycle." is not an unreasonable way to view this code.

erichkeane added inline comments.Sep 30 2022, 10:11 AM
clang/test/Preprocessor/warn-loop-macro-1.h
3 ↗(On Diff #463579)

I believe our 'include breadcrumbs notes' are printed differently:

[ekeane1@scsel-clx-24 build]$ ./bin/clang -cc1 c.cpp
In file included from c.cpp:1:
In file included from ./b.h:1:
./a.h:1:2: error: "FOO"
#error "FOO"
^
1 error generated.

aaron.ballman added inline comments.Sep 30 2022, 10:17 AM
clang/test/Preprocessor/warn-loop-macro-1.h
3 ↗(On Diff #463579)

Oh wow, I forgot about the fact that we don't use actual notes there! But even still, won't the breadcrumbs in this case be:

In file included from warn-loop-main.c:3
warn-loop-macro-1.h:4: warning: #include cycle
#include "warn-loop-macro-1.h"
^

That's not really clear (at least to me) due to the header guard protecting against a recursive include.

urnathan updated this revision to Diff 465006.Oct 4 2022, 7:30 AM
urnathan updated this revision to Diff 465007.
urnathan marked an inline comment as done.
urnathan marked 2 inline comments as done.Oct 4 2022, 7:36 AM
urnathan added inline comments.
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #463579)

The test infrastructire ignores the 'included from ' chain, which is emitted. What do you suggest?

324 ↗(On Diff #463579)

That's probably a good idea -- I didnt want to put it unconditionally on, because it triggers all over the testsuite due to intentional self-inclusion

clang/lib/Lex/PPLexerChange.cpp
107

I'd be very disappointed it the optimizer can't see through that and elide the copies here. (but I had misremembered how structured bindings are not references to the temporary by default anyway)

aaron.ballman added inline comments.Oct 4 2022, 9:05 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #463579)

Even posting the "included from" doesn't help clarify what's wrong with the code though, if I'm understanding the tests properly. My concern is with code like:

// self.h
#ifndef GUARD
#define GUARD

#include "self.h" // expected-warning {{#include cycle}}
#endif

(This is effectively the same test as warn-loop-macro-1.h.) There is no include *cycle* ("a series of events that are regularly repeated in the same order.") in this code -- the header is included for the first time, GUARD is not defined, then GUARD is defined and the header is included for the second time. This time GUARD is defined and no further cycles into self.h occurs.

But I think I'm seeing now what you're trying to get at (correct me if I'm wrong): when processing an include stack, opening the same header file twice is a problem (regardless of the contents of the header or how you got into the same file twice)? If that's accurate, would it make more sense to diagnose this as:

including the same header (%0) more than once in an include stack causes %select{incorrect behavior of Clang modules|the header to not be a C++ module header unit}1

or something along those lines? Basically, I'd like to avoid saying "cycle" because that starts to sound like "you have potentially infinite recursion in the include stack" and I'd like to add information about what's actually wrong instead of pointing out what the code does and hoping the user knows why that's bad.

philnik added a subscriber: philnik.Oct 4 2022, 9:23 AM
philnik added inline comments.
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #463579)

FWIW in libc++ we have a script to check that we don't include the header more than once in a stack and also call it a cycle, so there is some precedent of calling it an include cycle, even thought it's technically not a cycle.

urnathan added inline comments.Oct 11 2022, 5:03 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #463579)

Aaron, correct, the problem is with clang-modules/c++ header units. The loop means that the header file does not expand to the same sequence of tokens at every inclusion. This is particularly problematic with module maps, because the compiler will automatically start turning a header into a clang-module upon the first #include -- which might not be at the outermost level, (if it starts with a header at some other position in the loop). Then when it starts with this header, it'll think it already has converted and loaded it. One ends up with very obscure failure modes (such as link errors concerning missing template instantations).

I wonder if (in addition to allow detecting this in general), the warning should be enabled by default when building a header-unit/clang-module and one reincludes a header corresponding to a module currently being built. That's the same check, because there can be a stack of modules being built, but I think we need some additional checking to avoid triggering on 'textual headers' from the module map.

IMHO it is still a cycle, it's a closed loop that executes exactly once :) I can see the confusion though. I mean, 'for (auto i = 0; i < 1; i++) {}' is still a loop, right? just not an unbounded one.

Anyway, I think your diagagnostic text is better, thanks!

323 ↗(On Diff #463579)

philnik, that's good to know.

aaron.ballman added inline comments.Oct 12 2022, 8:24 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
323 ↗(On Diff #463579)

Thank you for the help understanding the root of what we're diagnosing! That's fascinating and something I'm really glad we're getting a diagnostic for! I'm more strongly in favor of this being an on-by-default warning that only diagnoses when trying to build a module (if possible).

@aaron.ballman @urnathan What is the status of this PR?

@aaron.ballman @urnathan What is the status of this PR?

I'd kind of forgotten about it. Did get a query from, I think, boost about how I detected the loops I reported -- which was by hacking the compiler and a bit of scripting. But it seems there might be users desiring the knowledge of this.

And personally I noticed that if you write your headers as:

#ifndef MACRO
// body of header goes here
#define MACRO
#endif // MACRO

the current technology shouts at you -- pity that idiom is not a thing :(