This is an archive of the discontinued LLVM Phabricator instance.

Serialize PragmaAssumeNonNullLoc to support preambles
ClosedPublic

Authored by dgoldman on Mar 21 2022, 1:50 PM.

Details

Summary

Previously, if a #pragma clang assume_nonnull begin was at the
end of a premable with a #pragma clang assume_nonnull end at the
end of the main file, clang would diagnose an unterminated begin in
the preamble and an unbalanced end in the main file.

With this change, those errors no longer occur and the case above is
now properly handled. I've added a corresponding test to clangd,
which makes use of preambles, in order to verify this works as
expected.

Diff Detail

Event Timeline

dgoldman created this revision.Mar 21 2022, 1:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2022, 1:50 PM
dgoldman requested review of this revision.Mar 21 2022, 1:50 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 21 2022, 1:50 PM

LMK if there are other folks more suited to review this, I'm not sure...

sammccall added inline comments.Mar 22 2022, 6:43 AM
clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp
749

This test seems to have a lot of extraneous elements.

The following seems sufficient, with no header:

#pragma clang assume_nonnull begin
void foo(int *x);
#pragma clang assume_nonnull end
749

this should be a clang test, rather than a clangd test (or in addition)

probably under clang/PCH/, of the form

// RUN: %clang_cc1 -emit-pch ...
// RUN: %clang_cc1 -include-pch -fsyntax-only -verify -ast-dump ...
// and optionally without PCH:
// RUN: %clang_cc1 -fsyntax-only -verify -ast-dump ...

Then you can verify both the lack of diagnostics and the AST that gets generated.

776

you verify no diagnostics, but likely also want to verify that the parameter has the nonnull attribute.

This is probably simple enough, e.g

ParsedAST AST = TU.build();
ParmVarDecl *X = cast<FunctionDecl>(findDecl(AST, "foo")).getParamDecl(0);
ASSERT_TRUE(X->hasAttr<NonnullAttr>());

(or by matching the AST dump)

clang/include/clang/Lex/PreprocessorOptions.h
131–132

nit: "Similarly, we track an unterminated #pragma assume_nonnull"?

Currently it's not completely clear what the connection is.

clang/lib/Lex/PPLexerChange.cpp
430–431

unless we're hitting _the end_ of the preamble

This is a special case though, pull it out of the one-sentence summary (as the other special case is)

434

It's not valid *anywhere* we exit a file when building a preamble, only when we fall off the end of the preamble region itself within the main file.

435

what's the PredefinesFileID special case about?

clang/lib/Serialization/ASTWriter.cpp
2303

nit: #pragma assume_nonnull

2304

this "might" is doing a lot of spooky action at a distance.

At minimum we should assert this is a preamble, right? Or move this block into the if() below?

dgoldman updated this revision to Diff 417315.Mar 22 2022, 9:07 AM
dgoldman marked 9 inline comments as done.

Fixes for review

clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp
749

Hmm I tried this earlier, maybe I did it wrong, but I don't think it works since it doesn't set the GeneratePreamble bit

clang/lib/Lex/PPLexerChange.cpp
430–431

Done, but isn't it still 2 places - generating the preamble and actually using it both need the exception?

435

See ExitedFromPredefinesFile below, this is how I check if we're falling off the end of the preamble region into the main file (same as done for the conditional stack), LMK if there's a better way.

clang/lib/Serialization/ASTWriter.cpp
2304

Yeah that makes added, added an assert.

sammccall added inline comments.Mar 23 2022, 7:07 AM
clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp
749

(please don't mark comment threads done unless they've been addressed)

You're right, regular PCHes are not preambles and so don't handle these cases.
Probably the best place then is something using c-index-test similar to clang/test/Index/preamble-conditionals.cpp

755

nit: not necessary, this pragma isn't tied to objc

clang/lib/Lex/PPLexerChange.cpp
435

I don't understand how this checks if we're falling off the preamble region, and the code around ExitedFromPredefinesFile doesn't clarify this for me. Can you explain?

The if (ExitedFromPredefinesFile) appears to be handling the logical *insertion* of preamble PP events when *consuming* a preamble, which is not what you're trying to do here.

The condition here is of the form "if we have an open pragma, and we're not generating a preamble, and some other stuff, then diagnose". So if the baseline case is "diagnose" and the preamble case is an exception, the "other stuff" clauses don't limit the preamble exception, they add extra exceptions!

clang/lib/Serialization/ASTWriter.cpp
2303

it should be -> we should be
the preamble -> a preamble
?

dgoldman updated this revision to Diff 417623.Mar 23 2022, 8:08 AM
dgoldman marked 2 inline comments as done.

Minor fixes

dgoldman marked an inline comment as done.Mar 23 2022, 8:17 AM
dgoldman added inline comments.
clang/lib/Lex/PPLexerChange.cpp
435

!(CurLexer && CurLexer->getFileID() == PredefinesFileID) makes sure that if we're consuming the preamble, we don't emit the warning. AFAICT this is the way to tell that the preamble is terminated, since the current file ID being processed is the predefines file and the file is now terminated as of this method call (since !isEndOfMacro && !(CurLexer && CurLexer->Is_PragmaLexer)). !this->PPOpts->GeneratePreamble makes sure that if we're generating the preamble, we don't emit the warning. We need to special case both cases otherwise we get an error when generating the preamble or when we load the preamble before even processing the rest of the main file.

Does that makes sense?

sammccall added inline comments.Mar 23 2022, 8:46 AM
clang/lib/Lex/PPLexerChange.cpp
435

!(CurLexer && CurLexer->getFileID() == PredefinesFileID) makes sure that if we're consuming the preamble, we don't emit the warning

  1. if we're consuming the preamble, what exactly is the sequence of events that would otherwise lead us to emit the warning?
  2. what if we're in the predefines file for some other reason instead?

I'm hoping you'll explain to me what you think the predefines file is, and what its relationship to the preamble is, and so why this condition is correct :-)

My understanding (which is pretty shaky!) is that the predefines file is a kind of catchall used to inject text into the preprocessor that doesn't appear in the source file - that it contains definitions of builtin integer types, macros defined by -D on the command line, and so on. If it has any relationship to the preamble, it's something subtle that's to do with the relative order in which the preprocessor sees entities like the predefines, the preprocesor, and the main file.
So according to my understanding, interpreting "reaching EOF in the predefines file" as "consuming a preamble" is either wrong or something very subtle requiring a significant comment.

The code you refer to below is doing something very different: it's saying that if we have a preamble, then reaching the end of the predefines file is the *trigger* to inject state from it!

!this->PPOpts->GeneratePreamble makes sure that if we're generating the preamble, we don't emit the warning

Sure, but that doesn't sound correct at all! A preamble mostly consists of parsing a lot of headers, and if any of those headers have an unpaired pragma, we should be warning on that at EOF. It's only if we hit the "pretend" EOF from the truncated main file that we want to suppress the warning.

sammccall added inline comments.Mar 23 2022, 8:48 AM
clang/lib/Lex/PPLexerChange.cpp
435

the preprocessor sees entities like the predefines, the preprocesor, and the main file.

Oops... sees the predefines, any injected events from the preamble, and the main file.

dgoldman added inline comments.Mar 23 2022, 1:00 PM
clang/lib/Lex/PPLexerChange.cpp
435

I think that definition is correct, but given that there is only one predefines file, when it ends, it must go into the main file, no? And if we have a pragma assume nonnull loc, it shouldn't be from the clang builtins, it should be from the preamble (I'd certainly hope!)

re: headers, I think there might be a misunderstanding, as soon as an #include is seen the pragma is ended here: https://github.com/llvm/llvm-project/blob/7631c366c8589dda488cb7ff1df26cc134002208/clang/lib/Lex/PPDirectives.cpp#L2006, so if PragmaAssumeNonNullLoc is valid here we haven't had any includes since the pragma was seen.

sammccall added inline comments.Mar 23 2022, 2:12 PM
clang/lib/Lex/PPLexerChange.cpp
435

I think that definition is correct, but given that there is only one predefines file, when it ends, it must go into the main file, no?

I'm sorry, but i really can't follow what you're asking, or how it relates to the original problem, or what the problem even is. If *you're* not sure, it's probably worth tracing execution of the preprocessor and describing the precise sequence of events that cause a problem. And then trying to come up with a *direct* way of solving them.

Here's my *guess* as to what's going on (but please check!):

  • the assume_nonnull state is very sensitive to shifts between files - it's supposed to be off before reaching EOF and before #include. (This is why it can appear to be file-local syntax but be implemented as a global flag).
  • therefore the AssumeNonnullLoc is not just some property of the preprocessor, it's a transient state that should be set at precisely the right point in the program. However you're setting it at the very beginning, as a side effect of ReadASTBlock().
  • when consuming a preamble, the preprocessor simulates a structure like:
// AssumeNonnullLoc is being set here
#include <predefines> // implicit
#inject pp conditional stack
// AssumeNonnnullLoc *should* be set here
// now continue with the post-preamble part of main file
  • because AssumeNonnullLoc is set too early, the PP events in between these locations mess with it. Fortunately that is probably only the predefines, although that's a brittle assumption to make. And we don't run into problems when entering the predefines file, because HandleHeaderIncludeOrImport isn't called, we just enter the file directly. However you do run into issues when exiting that file again, and so a special case works around those.

If this is right, it's not a good solution. There are other observable effects of setting AssumeNonnullLoc in the wrong place. All the predefines will be parsed in "assume nonnull mode". And if any files are #included from the predefines file, then the enter/exit events will cause problems. Predefines may contain #include with the -include and -imacros clang flags, and due to clangd's preamble patching.

In this case the right fix is to set AssumeNonnullLoc at the correct point in the file: triggered by exiting the predefines, along with the PP conditional stack. The way to do this is probably to have a second property on preprocessor which stores the loc that will be restored when the trigger occurs. (Similar to how PP distinguishes between the *current* conditional stack and the replayable one set by ASTReader when it calls PP.setReplayablePreambleConditionalStack(ConditionalStack, SkipInfo);

re: headers, I think there might be a misunderstanding, as soon as an #include is seen the pragma is ended here

The scenario is:

We are building a preamble from main.c
main.c:

#include "foo.h"
int x;

foo.h:

 #pragma clang assume_nonnull begin
void foo(int*);
// missing end pragma

We should issue a diagnostic when we hit the end of foo.h, but with the code you have here it will be supposed instead. (There's no #include in scope of the pragma here, so I don't think that's relevant)

dgoldman updated this revision to Diff 417976.Mar 24 2022, 10:25 AM
dgoldman marked an inline comment as done.

Properly handle generation and restoring

dgoldman added inline comments.Mar 24 2022, 11:00 AM
clang/lib/Lex/PPLexerChange.cpp
435

🤦 Thanks, that example is super helpful, not sure why I wasn't able to understand it earlier. Yeah, this doesn't properly handle EOF for headers during preamble generation and we should only set the assume_nonnull loc only when transitioning, not earlier. Like you said, we need to basically mirror the replay/record logic for the preamble conditional stack for it to work properly. I went ahead and implemented the approach that you mentioned to restore the value when exiting from the predefines as well as added a test to clangd for the edge case you gave. PTAL, I kept the storage separate but I wonder if it would be better to merge in with the existing replay logic.

dgoldman updated this revision to Diff 418001.Mar 24 2022, 11:47 AM

Minor lint fixes

dgoldman updated this revision to Diff 418004.Mar 24 2022, 11:51 AM

if/else formatting

dgoldman updated this revision to Diff 418030.Mar 24 2022, 1:19 PM

Another formatting fix

sammccall accepted this revision.Mar 25 2022, 6:31 AM

Thanks, this looks really solid now!

clang/lib/Lex/PPLexerChange.cpp
430–431

revert comment change? this is still specific to assume_nonnull

This revision is now accepted and ready to land.Mar 25 2022, 6:31 AM
dgoldman updated this revision to Diff 418301.Mar 25 2022, 12:43 PM

Revert comment change

dgoldman marked an inline comment as done.Mar 25 2022, 12:55 PM

(Just checking you've seen this is accepted and you can land when you're ready!)

This revision was automatically updated to reflect the committed changes.