This is an archive of the discontinued LLVM Phabricator instance.

Differential D37491

[Preamble] Fixed preamble breaking with BOM presence (and particularly, fluctuating BOM presence)
Closed, Public

Authored by cameron314 on Sep 5 2017, 12:56 PM.

Details

Reviewers

ilya-biryukov

Commits

rG84fd064ef98b: [PCH] Fixed preamble breaking with BOM presence (and particularly, fluctuating…
rC313796: [PCH] Fixed preamble breaking with BOM presence (and particularly, fluctuating…
rL313796: [PCH] Fixed preamble breaking with BOM presence (and particularly, fluctuating…

Summary

This patch fixes preamble skipping when the preamble region includes a byte order mark (BOM). Previously, parsing would fail if preamble PCH generation was enabled and a BOM was present.

This also fixes preamble invalidation when a BOM appears or disappears. This may seem to be an obscure edge case, but it happens regularly with IDEs that pass buffer overrides that never (or always) have a BOM, yet the underlying file from the initial parse that generated a PCH might (or might not) have a BOM.

I've included a test case for these scenarios.

Note: This depends on the test infrastructure introduced in D37474.
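
To make the "BOM appears or disappears" scenario concrete, here is a minimal sketch of detecting a UTF-8 BOM and comparing two buffers modulo the BOM. The helper names (`utf8BomLength`, `stripUtf8Bom`, `sameSourceIgnoringBom`) are hypothetical and are not clang APIs; the only fact assumed is that a UTF-8 BOM is the byte sequence EF BB BF.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not a clang API): number of leading bytes occupied by
// a UTF-8 byte order mark, i.e. 3 if the buffer starts with EF BB BF, else 0.
inline std::size_t utf8BomLength(std::string_view Buffer) {
  constexpr std::string_view Bom = "\xEF\xBB\xBF";
  return Buffer.substr(0, Bom.size()) == Bom ? Bom.size() : 0;
}

// Returns the buffer contents with any leading BOM dropped.
inline std::string_view stripUtf8Bom(std::string_view Buffer) {
  const std::size_t N = utf8BomLength(Buffer);
  assert(N <= Buffer.size() && "BOM length cannot exceed the buffer");
  return Buffer.substr(N);
}

// An on-disk file with a BOM and an IDE override without one hold the same
// source text if they match once the BOM is ignored.
inline bool sameSourceIgnoringBom(std::string_view A, std::string_view B) {
  return stripUtf8Bom(A) == stripUtf8Bom(B);
}
```

With these helpers, an on-disk file starting with `"\xEF\xBB\xBF"` and an editor override of the same text without it compare equal under `sameSourceIgnoringBom`, even though their raw bytes (and therefore all byte offsets) differ.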

Diff Detail

Repository: rL LLVM

Event Timeline

cameron314 created this revision. Sep 5 2017, 12:56 PM

How are various preprocessor offsets (and SourceLocation offsets) calculated? Do they account for BOM presence and ignore it?
Are there potential problems we may run into because of the changing offsets? Could we add tests checking changing the offsets does not matter?
Should we add checks that BOM was removed or added, but not changed? I would not expect preamble to be reusable "as is" if BOM (and therefore, input encoding) changed.

include/clang/Frontend/PrecompiledPreamble.h
102 ↗	(On Diff #113898)	Let's leave this class's interface immutable. It is used concurrently in clangd and having a mutable method like this would break the code. Passing new `PreambleBounds` to `AddImplicitPreamble` and setting the offsets accordingly would do the trick, leave the interface immutable and make the fact that offsets might change more evident.
191 ↗	(On Diff #113898)	Let's store original `PreambleBounds` instead of `PreambleEndsAtStartOfLine` and `PreambleOffset`. It would make the code easier to read.
include/clang/Lex/Lexer.h
50 ↗	(On Diff #113898)	Maybe pick a name that clearly states that it's a `BOM` size? Or add a comment indicating that it's a `BOM` offset.
639 ↗	(On Diff #113898)	Maybe leave the old name? Doesn't `SkipBytes` capture the new semantics just as well?
lib/Frontend/PrecompiledPreamble.cpp
195 ↗	(On Diff #113898)	Could you inline usages of this function and remove it?
unittests/Frontend/PchPreambleTest.cpp
190 ↗	(On Diff #113898)	We're not really testing that preamble was reused. Maybe return a flag from `ASTUnit::Reparse` to indicate if preamble was reused and check it here?

Thanks for the response!

How are various preprocessor offsets (and SourceLocation offsets) calculated? Do they account for BOM presence and ignore it?

Everything is in byte offsets; the SourceLocation after the BOM is not the same as before the BOM. The lexer automatically skips the BOM at the beginning of the file if it sees one (Lexer::InitLexer), and everything else works normally after that. The start of the first line is after the BOM, if any, which means it doesn't affect line/column numbers.
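
The point that byte offsets shift with the BOM while line/column numbers do not can be illustrated with a toy model. This is only a sketch under the assumption stated above (the first line starts after the BOM); `lineAndColumn` is a made-up function, not how clang's SourceManager computes positions.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Toy model, not clang code: map a byte offset to a 1-based (line, column)
// pair, treating a leading UTF-8 BOM as lying before the start of line 1.
inline std::pair<unsigned, unsigned> lineAndColumn(std::string_view Buffer,
                                                   std::size_t Offset) {
  assert(Offset <= Buffer.size());
  // The first line starts after the BOM, if there is one.
  std::size_t LineStart = Buffer.substr(0, 3) == "\xEF\xBB\xBF" ? 3 : 0;
  assert(Offset >= LineStart && "offset points into the BOM");
  unsigned Line = 1;
  for (std::size_t I = LineStart; I < Offset; ++I) {
    if (Buffer[I] == '\n') {
      ++Line;
      LineStart = I + 1;
    }
  }
  return {Line, static_cast<unsigned>(Offset - LineStart) + 1};
}
```

For the same text with and without a BOM, a given character sits at byte offsets that differ by 3, yet `lineAndColumn` reports the same line and column for both.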

Are there potential problems we may run into because of the changing offsets? Could we add tests checking changing the offsets does not matter?

That's a good point; I've looked into it and the PCH for the preamble is parsed using just the buffer slice that contains the preamble, excluding any BOM. That means that when we resume parsing later on a main buffer with a BOM, the SourceLocations within the preamble itself will be off. However, normally this doesn't matter since the only things in the preamble are preprocessor directives, whose positions are very rarely used. (I should note at this point that we've been using a variant of this patch in production for a few years without any problem.) So, we have two choices: Either parse the preamble with the BOM and throw out the preamble/PCH when the BOM presence changes from the main buffer, or slice the buffer when using a preamble PCH so that it never has a BOM during parsing. I'm leaning towards the second option, since it's a little cleaner and lets the preamble be reused more easily; the only downside is that an external consumer would not be able to use any absolute offsets from the AST (note that line/column offsets would be identical) in the original buffer if it has a BOM -- but in any case, absolute offsets are usually useless without the buffer itself, which if obtained from clang would always be the correct buffer.

Should we add checks that BOM was removed or added, but not changed? I would not expect preamble to be reusable "as is" if BOM (and therefore, input encoding) changed.

I'm not sure I understand this point. Clang only understands UTF-8; the BOM is either present or not, but the encoding never changes. (And the BOM itself is always the same byte sequence too.) It has no impact on the file contents.

include/clang/Frontend/PrecompiledPreamble.h
102 ↗	(On Diff #113898)	Fair point, I'll change this.
191 ↗	(On Diff #113898)	Again, good point, I'll change this.
include/clang/Lex/Lexer.h
50 ↗	(On Diff #113898)	I can see how this might be confusing. I'll add a comment.
639 ↗	(On Diff #113898)	`SkipBytes` moves relative to the current position, but the lexer skips the BOM implicitly on construction; I don't want to skip it twice. `SetByteOffset` is absolute, which makes it simple and clear to use without having to reason about implicit past state.
lib/Frontend/PrecompiledPreamble.cpp
195 ↗	(On Diff #113898)	I could; I think it makes sense to leave the wrapper, though, since the `ASTUnit` deals with the `PrecompiledPreamble` at its level of abstraction, and the `PrecompiledPreamble` deals with the lexer at its level of abstraction.
unittests/Frontend/PchPreambleTest.cpp
190 ↗	(On Diff #113898)	We are; if it wasn't reused, the header would have been opened again and the last assert on `GetFileReadCount` below would fail.
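
The relative-versus-absolute distinction drawn in the `SkipBytes`/`SetByteOffset` reply above can be sketched with a toy cursor. `ToyCursor` is purely illustrative and is not clang's `Lexer`; it only models a position in a buffer plus an implicit 3-byte BOM skip on construction, which is the behaviour described in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Toy cursor, not clang's Lexer: it models only a buffer position and the
// implicit BOM skip that happens when the cursor is constructed.
class ToyCursor {
public:
  ToyCursor(std::size_t BufferSize, bool HasBom)
      : Size(BufferSize), Pos(HasBom ? 3 : 0) {
    assert(Pos <= Size && "buffer too small to contain a BOM");
  }

  // Relative move: the result depends on where the cursor already is, so a
  // BOM skipped at construction is effectively counted a second time.
  void skipBytes(std::size_t Bytes) { Pos = std::min(Pos + Bytes, Size); }

  // Absolute move: the result is independent of any earlier implicit skip.
  void setByteOffset(std::size_t Offset) { Pos = std::min(Offset, Size); }

  std::size_t position() const { return Pos; }

private:
  std::size_t Size;
  std::size_t Pos;
};
```

If a preamble is 20 bytes long measured from the start of the buffer (BOM included), the relative call lands at byte 23 on a buffer with a BOM, while the absolute call lands at byte 20 regardless.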

In D37491#862160, @cameron314 wrote:

Are there potential problems we may run into because of the changing offsets? Could we add tests checking changing the offsets does not matter?

That's a good point; I've looked into it and the PCH for the preamble is parsed using just the buffer slice that contains the preamble, excluding any BOM. That means that when we resume parsing later on a main buffer with a BOM, the SourceLocations within the preamble itself will be off. However, normally this doesn't matter since the only things in the preamble are preprocessor directives, whose positions are very rarely used. (I should note at this point that we've been using a variant of this patch in production for a few years without any problem.) So, we have two choices: Either parse the preamble with the BOM and throw out the preamble/PCH when the BOM presence changes from the main buffer, or slice the buffer when using a preamble PCH so that it never has a BOM during parsing. I'm leaning towards the second option, since it's a little cleaner and lets the preamble be reused more easily; the only downside is that an external consumer would not be able to use any absolute offsets from the AST (note that line/column offsets would be identical) in the original buffer if it has a BOM -- but in any case, absolute offsets are usually useless without the buffer itself, which if obtained from clang would always be the correct buffer.

Maybe there's a third option: removing the BOM from the buffer before passing it to clang?
Could you elaborate on your use-case a little more? Is there no way to consistently always pass buffers either with or without BOM?

Out of the two options you mention, discarding the preamble on BOM changes seems like an easy option that is both correct and won't make a difference in performance, since the BOM rarely changes.
Looking at your use-case, it sounds like you'll only have 1 extra reparse of the preamble, which is probably fine.

Should we add checks that BOM was removed or added, but not changed? I would not expect preamble to be reusable "as is" if BOM (and therefore, input encoding) changed.

I'm not sure I understand this point. Clang only understands UTF-8; the BOM is either present or not, but the encoding never changes. (And the BOM itself is always the same byte sequence too.) It has no impact on the file contents.

Sure, it's not something clang supports; it's an edge case where clang receives "malformed" input. Does the lexer only skip a UTF-8 BOM, but not other versions of the BOM?
But you're right, it's highly unlikely anything will break in that case.

unittests/Frontend/PchPreambleTest.cpp
190 ↗	(On Diff #113898)	Missed that, thanks. Looks good. Maybe add a comment explicitly noting that?

Maybe there's a third option: removing the BOM from the buffer before passing it to clang?
Could you elaborate on your use-case a little more? Is there no way to consistently always pass buffers either with or without BOM?
Out of the two options you mention, discarding the preamble on BOM changes seems like an easy option that is both correct and won't make a difference in performance, since the BOM rarely changes.
Looking at your use-case, it sounds like you'll only have 1 extra reparse of the preamble, which is probably fine.

In my particular use case, when the file is remapped by the IDE, there's never a BOM. But when it's not remapped, the real file may or may not have a BOM. Since the file goes back and forth between mapped and unmapped depending on whether it's saved, the BOM presence can change quite frequently, and we don't really have control over it (the BOM can change on disk too). This is a common use case for anyone integrating clang/libclang into an IDE; the rarity of UTF-8 BOMs on platforms other than Windows probably obscured this until now.

I think since we can handle the changing BOM presence in the preamble gracefully, we should. I'll draft a patch that does the slicing correctly so that the offsets are always valid.

Sure, it's not something clang supports; it's an edge case where clang receives "malformed" input. Does the lexer only skip a UTF-8 BOM, but not other versions of the BOM?
But you're right, it's highly unlikely anything will break in that case.

Ah, I see. Yes, the lexer only skips a UTF-8 BOM, but I seem to recall seeing some code that detects BOMs in other encodings and emits an error (in the driver, possibly?).

unittests/Frontend/PchPreambleTest.cpp
190 ↗	(On Diff #113898)	Sure, will do.

Here's an updated patch. The code required to make it work is much simpler when the BOM is simply ignored :-)

Parsing errors on preamble additions and removals are definitely bad and should be fixed.
But I would argue that the right approach is to invalidate the preamble and rebuild it on BOM changes.

The current fix in ASTUnit just hides an error in the underlying APIs. For example, all other clients of PrecompiledPreamble are still broken.

lib/Frontend/ASTUnit.cpp
1262 ↗	(On Diff #114228)	It seems that having only this chunk would fix your issue. Everything else is just a non-functional refactoring, maybe let's focus on that part (and tests) in this review and send the rest as a separate change? To keep refactoring and functional changes logically separate.

Will this fix PR25023 and PR21144?

In D37491#864649, @erikjv wrote:

Will this fix PR25023 and PR21144?

PR25023 should be fixed by this change. It is essentially a repro of the same bug.
Could we add a c-index-test-based test here to make sure we addressed that particular use-case?

The state of PR21144 won't be affected, as this change does not touch the code invoked during normal compilation without preambles.
If PR21144 is fixed in a way that would make SourceLocations the same regardless if BOM was present or not, we might have a better guarantee that nothing will break in case we want to reuse preamble between BOM/non-BOM versions.

It seems there's other users of PrecompiledPreamble that would have to be fixed, yes. If we go with my original fix of taking into account the BOM in the preamble bounds, there's no way of reusing the PCH when the BOM appears/disappears. I still maintain this is a common use case for IDE-type clients. This type of performance bug is very hard to track down.

@erikjv: Yes, I think this will fix PR25023.
PR21144 is unrelated; clang uses UTF-8 byte offsets instead of logical-character offsets for column numbers, which makes sense to me.

lib/Frontend/ASTUnit.cpp
1262 ↗	(On Diff #114228)	Yes, with this form of the fix, the other changes are mostly cosmetic. I could simply revert them, it's not worth the hassle of submitting another patch.

Alright, I've changed the patch so that the preamble takes into account the BOM presence and is invalidated when it changes. This automatically fixes all clients of PrecompiledPreamble, and ensures that all SourceLocations are always consistent when using a PCH generated from the preamble.

I think this should do the trick!
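
The invalidation rule described here (the preamble includes the BOM, so a change in BOM presence makes the cached preamble unusable) can be sketched as a simple byte comparison. `ToyPreamble` and `buildToyPreamble` are hypothetical names; the real `PrecompiledPreamble` checks considerably more than this, so treat it as an illustration of the idea only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Toy stand-in for a cached preamble; names are illustrative and not taken
// from clang. The stored bytes include the BOM when one was present.
struct ToyPreamble {
  std::string Bytes;

  // The cached result can be reused only if the new buffer starts with
  // exactly the same bytes, so a BOM appearing or disappearing invalidates it.
  bool canReuseFor(std::string_view NewBuffer) const {
    return NewBuffer.substr(0, Bytes.size()) == Bytes;
  }
};

// Captures the first Size bytes of the buffer (BOM included) as the preamble.
inline ToyPreamble buildToyPreamble(std::string_view Buffer, std::size_t Size) {
  assert(Size <= Buffer.size());
  return ToyPreamble{std::string(Buffer.substr(0, Size))};
}
```

A preamble built from a buffer with a BOM is rejected for the same text without one, which forces a rebuild rather than reusing a PCH whose offsets no longer match the buffer.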

See my comments about removing StartOffset field, but other than that looks good.

include/clang/Lex/Lexer.h
52 ↗	(On Diff #115104)	We could simplify it further by removing `StartOffset`, leaving only `Size`. If you look at the code, it always uses `StartOffset + Size` now, which is effectively size with BOM. What do you think?
lib/Frontend/PrecompiledPreamble.cpp
227 ↗	(On Diff #115104)	Maybe store BOM bytes in `PreambleBytes` too? Would that allow us to get rid of the `StartOffset` field (see other comment)?

This revision is now accepted and ready to land. Sep 14 2017, 7:42 AM

cameron314 added inline comments. Sep 14 2017, 9:24 AM

include/clang/Lex/Lexer.h
52 ↗	(On Diff #115104)	Yeah, I thought of that, but it's still nice to have the two separated. ASTUnit.cpp, for example, checks if `Size != 0` to determine if there's a preamble (without considering the offset). This isn't in the diff because it was already like that.

ilya-biryukov added inline comments. Sep 14 2017, 9:40 AM

include/clang/Lex/Lexer.h
52 ↗	(On Diff #115104)	Could we simply return `Size = 0` from `ComputePreambleBounds` if we simply skipped BOM and the preamble itself is empty? My concern is that currently it's very easy to forget adding `StartOffset` and simply use `Size` when writing code that uses `PreambleBounds`. If we only have `Size`, probability of mistakes is much lower.

cameron314 added inline comments. Sep 14 2017, 12:40 PM

include/clang/Lex/Lexer.h
52 ↗	(On Diff #115104)	Alright, sold. There's already other places that use the size without checking the offset, it turns out.
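
The outcome of this exchange (a single `Size` that already includes the BOM, and `Size = 0` when only a BOM was skipped) might look roughly like the sketch below. `ToyBounds` and `computeToyBounds` are invented for illustration, and the "preamble" rule used here (leading lines that start with `#`) is a deliberate oversimplification of what `Lexer::ComputePreamble` does.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy bounds type mirroring the Size-only shape discussed above.
struct ToyBounds {
  std::size_t Size; // bytes from the start of the buffer, BOM included
};

// Toy preamble scan, far simpler than clang's: the preamble is the run of
// leading lines that start with '#'. If nothing but a BOM was skipped, the
// preamble is reported as empty (Size == 0).
inline ToyBounds computeToyBounds(std::string_view Buffer) {
  const std::size_t BomLen = Buffer.substr(0, 3) == "\xEF\xBB\xBF" ? 3 : 0;
  std::size_t End = BomLen;
  while (End < Buffer.size() && Buffer[End] == '#') {
    const std::size_t Newline = Buffer.find('\n', End);
    End = Newline == std::string_view::npos ? Buffer.size() : Newline + 1;
  }
  assert(End <= Buffer.size());
  return ToyBounds{End == BomLen ? 0 : End};
}
```

Because callers only ever see `Size`, code that checks `Size != 0` to decide whether a preamble exists, or that slices the first `Size` bytes of the buffer, cannot forget to add a separate start offset.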

Final diff. Test passes!

Closed by commit rL313796: [PCH] Fixed preamble breaking with BOM presence (and particularly, fluctuating… (authored by cameron314). Sep 20 2017, 12:05 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

cfe/

trunk/

include/

clang/

Frontend/

PrecompiledPreamble.h

15 lines

Lex/

Lexer.h

27 lines

PreprocessorOptions.h

4 lines

lib/

Frontend/

FrontendActions.cpp

2 lines

PrecompiledPreamble.cpp

3 lines

Lex/

Lexer.cpp

14 lines

Preprocessor.cpp

6 lines

unittests/

Frontend/

PCHPreambleTest.cpp

44 lines

Diff 116049

cfe/trunk/include/clang/Frontend/PrecompiledPreamble.h

	Show All 30 Lines
	class FileSystem;			class FileSystem;
	}			}

	class CompilerInstance;			class CompilerInstance;
	class CompilerInvocation;			class CompilerInvocation;
	class DeclGroupRef;			class DeclGroupRef;
	class PCHContainerOperations;			class PCHContainerOperations;

	/// A size of the preamble and a flag required by
	/// PreprocessorOptions::PrecompiledPreambleBytes.
	struct PreambleBounds {
	PreambleBounds(unsigned Size, bool PreambleEndsAtStartOfLine)
	: Size(Size), PreambleEndsAtStartOfLine(PreambleEndsAtStartOfLine) {}

	/// \brief Size of the preamble in bytes.
	unsigned Size;
	/// \brief Whether the preamble ends at the start of a new line.
	///
	/// Used to inform the lexer as to whether it's starting at the beginning of
	/// a line after skipping the preamble.
	bool PreambleEndsAtStartOfLine;
	};

	/// \brief Runs lexer to compute suggested preamble bounds.			/// \brief Runs lexer to compute suggested preamble bounds.
	PreambleBounds ComputePreambleBounds(const LangOptions &LangOpts,			PreambleBounds ComputePreambleBounds(const LangOptions &LangOpts,
	llvm::MemoryBuffer *Buffer,			llvm::MemoryBuffer *Buffer,
	unsigned MaxLines);			unsigned MaxLines);

	class PreambleCallbacks;			class PreambleCallbacks;

	/// A class holding a PCH and all information to check whether it is valid to			/// A class holding a PCH and all information to check whether it is valid to
	▲ Show 20 Lines • Show All 190 Lines • Show Last 20 Lines

cfe/trunk/include/clang/Lex/Lexer.h

Show All 33 Lines	enum ConflictMarkerKind {
/// A normal or diff3 conflict marker, initiated by at least 7 "<"s,		/// A normal or diff3 conflict marker, initiated by at least 7 "<"s,
/// separated by at least 7 "="s or "\|"s, and terminated by at least 7 ">"s.		/// separated by at least 7 "="s or "\|"s, and terminated by at least 7 ">"s.
CMK_Normal,		CMK_Normal,
/// A Perforce-style conflict marker, initiated by 4 ">"s,		/// A Perforce-style conflict marker, initiated by 4 ">"s,
/// separated by 4 "="s, and terminated by 4 "<"s.		/// separated by 4 "="s, and terminated by 4 "<"s.
CMK_Perforce		CMK_Perforce
};		};

		/// Describes the bounds (start, size) of the preamble and a flag required by
		/// PreprocessorOptions::PrecompiledPreambleBytes.
		/// The preamble includes the BOM, if any.
		struct PreambleBounds {
		PreambleBounds(unsigned Size, bool PreambleEndsAtStartOfLine)
		: Size(Size),
		PreambleEndsAtStartOfLine(PreambleEndsAtStartOfLine) {}

		/// \brief Size of the preamble in bytes.
		unsigned Size;
		/// \brief Whether the preamble ends at the start of a new line.
		///
		/// Used to inform the lexer as to whether it's starting at the beginning of
		/// a line after skipping the preamble.
		bool PreambleEndsAtStartOfLine;
		};

/// Lexer - This provides a simple interface that turns a text buffer into a		/// Lexer - This provides a simple interface that turns a text buffer into a
/// stream of tokens. This provides no support for file reading or buffering,		/// stream of tokens. This provides no support for file reading or buffering,
/// or buffering/seeking of tokens, only forward lexing is supported. It relies		/// or buffering/seeking of tokens, only forward lexing is supported. It relies
/// on the specified Preprocessor object to handle preprocessor directives, etc.		/// on the specified Preprocessor object to handle preprocessor directives, etc.
class Lexer : public PreprocessorLexer {		class Lexer : public PreprocessorLexer {
void anchor() override;		void anchor() override;

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
▲ Show 20 Lines • Show All 388 Lines • ▼ Show 20 Lines	public:
/// a potential prefix header.		/// a potential prefix header.
///		///
/// \param Buffer The memory buffer containing the file's contents.		/// \param Buffer The memory buffer containing the file's contents.
///		///
/// \param MaxLines If non-zero, restrict the length of the preamble		/// \param MaxLines If non-zero, restrict the length of the preamble
/// to fewer than this number of lines.		/// to fewer than this number of lines.
///		///
/// \returns The offset into the file where the preamble ends and the rest		/// \returns The offset into the file where the preamble ends and the rest
/// of the file begins along with a boolean value indicating whether		/// of the file begins along with a boolean value indicating whether
/// the preamble ends at the beginning of a new line.		/// the preamble ends at the beginning of a new line.
static std::pair<unsigned, bool> ComputePreamble(StringRef Buffer,		static PreambleBounds ComputePreamble(StringRef Buffer,
const LangOptions &LangOpts,		const LangOptions &LangOpts,
unsigned MaxLines = 0);		unsigned MaxLines = 0);

/// \brief Checks that the given token is the first token that occurs after		/// \brief Checks that the given token is the first token that occurs after
/// the given location (this excludes comments and whitespace). Returns the		/// the given location (this excludes comments and whitespace). Returns the
/// location immediately after the specified token. If the token is not found		/// location immediately after the specified token. If the token is not found
/// or the location is inside a macro, the returned source location will be		/// or the location is inside a macro, the returned source location will be
/// invalid.		/// invalid.
static SourceLocation findLocationAfterToken(SourceLocation loc,		static SourceLocation findLocationAfterToken(SourceLocation loc,
tok::TokenKind TKind,		tok::TokenKind TKind,
▲ Show 20 Lines • Show All 154 Lines • ▼ Show 20 Lines	private:
/// getCharAndSizeSlowNoWarn - Same as getCharAndSizeSlow, but never emits a		/// getCharAndSizeSlowNoWarn - Same as getCharAndSizeSlow, but never emits a
/// diagnostic.		/// diagnostic.
static char getCharAndSizeSlowNoWarn(const char *Ptr, unsigned &Size,		static char getCharAndSizeSlowNoWarn(const char *Ptr, unsigned &Size,
const LangOptions &LangOpts);		const LangOptions &LangOpts);

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Other lexer functions.		// Other lexer functions.

void SkipBytes(unsigned Bytes, bool StartOfLine);		void SetByteOffset(unsigned Offset, bool StartOfLine);

void PropagateLineStartLeadingSpaceInfo(Token &Result);		void PropagateLineStartLeadingSpaceInfo(Token &Result);

const char *LexUDSuffix(Token &Result, const char *CurPtr,		const char *LexUDSuffix(Token &Result, const char *CurPtr,
bool IsStringLiteral);		bool IsStringLiteral);

// Helper functions to lex the remainder of a token of the specific type.		// Helper functions to lex the remainder of a token of the specific type.
bool LexIdentifier (Token &Result, const char *CurPtr);		bool LexIdentifier (Token &Result, const char *CurPtr);
▲ Show 20 Lines • Show All 67 Lines • Show Last 20 Lines

cfe/trunk/include/clang/Lex/PreprocessorOptions.h

Show First 20 Lines • Show All 154 Lines • ▼ Show 20 Lines	public:
/// build it again.		/// build it again.
std::shared_ptr<FailedModulesSet> FailedModules;		std::shared_ptr<FailedModulesSet> FailedModules;

public:		public:
PreprocessorOptions() : UsePredefines(true), DetailedRecord(false),		PreprocessorOptions() : UsePredefines(true), DetailedRecord(false),
DisablePCHValidation(false),		DisablePCHValidation(false),
AllowPCHWithCompilerErrors(false),		AllowPCHWithCompilerErrors(false),
DumpDeserializedPCHDecls(false),		DumpDeserializedPCHDecls(false),
PrecompiledPreambleBytes(0, true),		PrecompiledPreambleBytes(0, false),
GeneratePreamble(false),		GeneratePreamble(false),
RemappedFilesKeepOriginalName(true),		RemappedFilesKeepOriginalName(true),
RetainRemappedFileBuffers(false),		RetainRemappedFileBuffers(false),
ObjCXXARCStandardLibrary(ARCXX_nolib) { }		ObjCXXARCStandardLibrary(ARCXX_nolib) { }

void addMacroDef(StringRef Name) { Macros.emplace_back(Name, false); }		void addMacroDef(StringRef Name) { Macros.emplace_back(Name, false); }
void addMacroUndef(StringRef Name) { Macros.emplace_back(Name, true); }		void addMacroUndef(StringRef Name) { Macros.emplace_back(Name, true); }
void addRemappedFile(StringRef From, StringRef To) {		void addRemappedFile(StringRef From, StringRef To) {
Show All 18 Lines	void resetNonModularOptions() {
DumpDeserializedPCHDecls = false;		DumpDeserializedPCHDecls = false;
ImplicitPCHInclude.clear();		ImplicitPCHInclude.clear();
ImplicitPTHInclude.clear();		ImplicitPTHInclude.clear();
TokenCache.clear();		TokenCache.clear();
SingleFileParseMode = false;		SingleFileParseMode = false;
LexEditorPlaceholders = true;		LexEditorPlaceholders = true;
RetainRemappedFileBuffers = true;		RetainRemappedFileBuffers = true;
PrecompiledPreambleBytes.first = 0;		PrecompiledPreambleBytes.first = 0;
PrecompiledPreambleBytes.second = 0;		PrecompiledPreambleBytes.second = false;
}		}
};		};

} // end namespace clang		} // end namespace clang

#endif		#endif

cfe/trunk/lib/Frontend/FrontendActions.cpp

Show First 20 Lines • Show All 585 Lines • ▼ Show 20 Lines	void PrintPreambleAction::ExecuteAction() {
// We don't expect to find any #include directives in a preprocessed input.		// We don't expect to find any #include directives in a preprocessed input.
if (getCurrentFileKind().isPreprocessed())		if (getCurrentFileKind().isPreprocessed())
return;		return;

CompilerInstance &CI = getCompilerInstance();		CompilerInstance &CI = getCompilerInstance();
auto Buffer = CI.getFileManager().getBufferForFile(getCurrentFile());		auto Buffer = CI.getFileManager().getBufferForFile(getCurrentFile());
if (Buffer) {		if (Buffer) {
unsigned Preamble =		unsigned Preamble =
Lexer::ComputePreamble((*Buffer)->getBuffer(), CI.getLangOpts()).first;		Lexer::ComputePreamble((*Buffer)->getBuffer(), CI.getLangOpts()).Size;
llvm::outs().write((*Buffer)->getBufferStart(), Preamble);		llvm::outs().write((*Buffer)->getBufferStart(), Preamble);
}		}
}		}

cfe/trunk/lib/Frontend/PrecompiledPreamble.cpp

Show First 20 Lines • Show All 189 Lines • ▼ Show 20 Lines	template <class T> bool moveOnNoError(llvm::ErrorOr<T> Val, T &Output) {
return true;		return true;
}		}

} // namespace		} // namespace

PreambleBounds clang::ComputePreambleBounds(const LangOptions &LangOpts,		PreambleBounds clang::ComputePreambleBounds(const LangOptions &LangOpts,
llvm::MemoryBuffer *Buffer,		llvm::MemoryBuffer *Buffer,
unsigned MaxLines) {		unsigned MaxLines) {
auto Pre = Lexer::ComputePreamble(Buffer->getBuffer(), LangOpts, MaxLines);		return Lexer::ComputePreamble(Buffer->getBuffer(), LangOpts, MaxLines);
return PreambleBounds(Pre.first, Pre.second);
}		}

llvm::ErrorOr<PrecompiledPreamble> PrecompiledPreamble::Build(		llvm::ErrorOr<PrecompiledPreamble> PrecompiledPreamble::Build(
const CompilerInvocation &Invocation,		const CompilerInvocation &Invocation,
const llvm::MemoryBuffer *MainFileBuffer, PreambleBounds Bounds,		const llvm::MemoryBuffer *MainFileBuffer, PreambleBounds Bounds,
DiagnosticsEngine &Diagnostics, IntrusiveRefCntPtr<vfs::FileSystem> VFS,		DiagnosticsEngine &Diagnostics, IntrusiveRefCntPtr<vfs::FileSystem> VFS,
std::shared_ptr<PCHContainerOperations> PCHContainerOps,		std::shared_ptr<PCHContainerOperations> PCHContainerOps,
PreambleCallbacks &Callbacks) {		PreambleCallbacks &Callbacks) {
▲ Show 20 Lines • Show All 364 Lines • Show Last 20 Lines

cfe/trunk/lib/Lex/Lexer.cpp

Show First 20 Lines • Show All 546 Lines • ▼ Show 20 Lines	namespace {

enum PreambleDirectiveKind {		enum PreambleDirectiveKind {
PDK_Skipped,		PDK_Skipped,
PDK_Unknown		PDK_Unknown
};		};

} // end anonymous namespace		} // end anonymous namespace

std::pair<unsigned, bool> Lexer::ComputePreamble(StringRef Buffer,		PreambleBounds Lexer::ComputePreamble(StringRef Buffer,
const LangOptions &LangOpts,		const LangOptions &LangOpts,
unsigned MaxLines) {		unsigned MaxLines) {
// Create a lexer starting at the beginning of the file. Note that we use a		// Create a lexer starting at the beginning of the file. Note that we use a
// "fake" file source location at offset 1 so that the lexer will track our		// "fake" file source location at offset 1 so that the lexer will track our
// position within the file.		// position within the file.
const unsigned StartOffset = 1;		const unsigned StartOffset = 1;
SourceLocation FileLoc = SourceLocation::getFromRawEncoding(StartOffset);		SourceLocation FileLoc = SourceLocation::getFromRawEncoding(StartOffset);
Lexer TheLexer(FileLoc, LangOpts, Buffer.begin(), Buffer.begin(),		Lexer TheLexer(FileLoc, LangOpts, Buffer.begin(), Buffer.begin(),
Buffer.end());		Buffer.end());
TheLexer.SetCommentRetentionState(true);		TheLexer.SetCommentRetentionState(true);
▲ Show 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	PreambleBounds Lexer::ComputePreamble(StringRef Buffer,
} while (true);		} while (true);

SourceLocation End;		SourceLocation End;
if (ActiveCommentLoc.isValid())		if (ActiveCommentLoc.isValid())
End = ActiveCommentLoc; // don't truncate a decl comment.		End = ActiveCommentLoc; // don't truncate a decl comment.
else		else
End = TheTok.getLocation();		End = TheTok.getLocation();

return std::make_pair(End.getRawEncoding() - StartLoc.getRawEncoding(),		return PreambleBounds(End.getRawEncoding() - FileLoc.getRawEncoding(),
TheTok.isAtStartOfLine());		TheTok.isAtStartOfLine());
}		}

/// AdvanceToTokenCharacter - Given a location that specifies the start of a		/// AdvanceToTokenCharacter - Given a location that specifies the start of a
/// token, return a new location that specifies a character within the token.		/// token, return a new location that specifies a character within the token.
SourceLocation Lexer::AdvanceToTokenCharacter(SourceLocation TokStart,		SourceLocation Lexer::AdvanceToTokenCharacter(SourceLocation TokStart,
unsigned CharNo,		unsigned CharNo,
const SourceManager &SM,		const SourceManager &SM,
▲ Show 20 Lines • Show All 689 Lines • ▼ Show 20 Lines	Slash:
++Size;		++Size;
return *Ptr;		return *Ptr;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Helper methods for lexing.		// Helper methods for lexing.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// \brief Routine that indiscriminately skips bytes in the source file.		/// \brief Routine that indiscriminately sets the offset into the source file.
void Lexer::SkipBytes(unsigned Bytes, bool StartOfLine) {		void Lexer::SetByteOffset(unsigned Offset, bool StartOfLine) {
BufferPtr += Bytes;		BufferPtr = BufferStart + Offset;
if (BufferPtr > BufferEnd)		if (BufferPtr > BufferEnd)
BufferPtr = BufferEnd;		BufferPtr = BufferEnd;
// FIXME: What exactly does the StartOfLine bit mean? There are two		// FIXME: What exactly does the StartOfLine bit mean? There are two
// possible meanings for the "start" of the line: the first token on the		// possible meanings for the "start" of the line: the first token on the
// unexpanded line, or the first token on the expanded line.		// unexpanded line, or the first token on the expanded line.
IsAtStartOfLine = StartOfLine;		IsAtStartOfLine = StartOfLine;
IsAtPhysicalStartOfLine = StartOfLine;		IsAtPhysicalStartOfLine = StartOfLine;
}		}
▲ Show 20 Lines • Show All 2,354 Lines • Show Last 20 Lines

cfe/trunk/lib/Lex/Preprocessor.cpp

Show First 20 Lines • Show All 510 Lines • ▼ Show 20 Lines	void Preprocessor::EnterMainSourceFile() {
// a main file.		// a main file.
if (!SourceMgr.isLoadedFileID(MainFileID)) {		if (!SourceMgr.isLoadedFileID(MainFileID)) {
// Enter the main file source buffer.		// Enter the main file source buffer.
EnterSourceFile(MainFileID, nullptr, SourceLocation());		EnterSourceFile(MainFileID, nullptr, SourceLocation());

// If we've been asked to skip bytes in the main file (e.g., as part of a		// If we've been asked to skip bytes in the main file (e.g., as part of a
// precompiled preamble), do so now.		// precompiled preamble), do so now.
if (SkipMainFilePreamble.first > 0)		if (SkipMainFilePreamble.first > 0)
CurLexer->SkipBytes(SkipMainFilePreamble.first,		CurLexer->SetByteOffset(SkipMainFilePreamble.first,
SkipMainFilePreamble.second);		SkipMainFilePreamble.second);

// Tell the header info that the main file was entered. If the file is later		// Tell the header info that the main file was entered. If the file is later
// #imported, it won't be re-entered.		// #imported, it won't be re-entered.
if (const FileEntry *FE = SourceMgr.getFileEntryForID(MainFileID))		if (const FileEntry *FE = SourceMgr.getFileEntryForID(MainFileID))
HeaderInfo.IncrementIncludeCount(FE);		HeaderInfo.IncrementIncludeCount(FE);
}		}

// Preprocess Predefines to populate the initial preprocessor state.		// Preprocess Predefines to populate the initial preprocessor state.
std::unique_ptr<llvm::MemoryBuffer> SB =		std::unique_ptr<llvm::MemoryBuffer> SB =
▲ Show 20 Lines • Show All 432 Lines • Show Last 20 Lines

cfe/trunk/unittests/Frontend/PCHPreambleTest.cpp

Show First 20 Lines • Show All 147 Lines • ▼ Show 20 Lines	TEST_F(PCHPreambleTest, ReparseWithOverriddenFileDoesNotInvalidatePreamble) {

ASSERT_TRUE(ReparseAST(AST));		ASSERT_TRUE(ReparseAST(AST));

ASSERT_NE(initialCounts[0], GetFileReadCount(MainName));		ASSERT_NE(initialCounts[0], GetFileReadCount(MainName));
ASSERT_EQ(initialCounts[1], GetFileReadCount(Header1));		ASSERT_EQ(initialCounts[1], GetFileReadCount(Header1));
ASSERT_EQ(initialCounts[2], GetFileReadCount(Header2));		ASSERT_EQ(initialCounts[2], GetFileReadCount(Header2));
}		}

		TEST_F(PCHPreambleTest, ParseWithBom) {
		std::string Header = "//./header.h";
		std::string Main = "//./main.cpp";
		AddFile(Header, "int random() { return 4; }");
		AddFile(Main,
		"\xef\xbb\xbf"
		"#include \"//./header.h\"\n"
		"int main() { return random() -2; }");

		std::unique_ptr<ASTUnit> AST(ParseAST(Main));
		ASSERT_TRUE(AST.get());
		ASSERT_FALSE(AST->getDiagnostics().hasErrorOccurred());

		unsigned HeaderReadCount = GetFileReadCount(Header);

		ASSERT_TRUE(ReparseAST(AST));
		ASSERT_FALSE(AST->getDiagnostics().hasErrorOccurred());

		// Check preamble PCH was really reused
		ASSERT_EQ(HeaderReadCount, GetFileReadCount(Header));

		// Remove BOM
		RemapFile(Main,
		"#include \"//./header.h\"\n"
		"int main() { return random() -2; }");

		ASSERT_TRUE(ReparseAST(AST));
		ASSERT_FALSE(AST->getDiagnostics().hasErrorOccurred());

		ASSERT_LE(HeaderReadCount, GetFileReadCount(Header));
		HeaderReadCount = GetFileReadCount(Header);

		// Add BOM back
		RemapFile(Main,
		"\xef\xbb\xbf"
		"#include \"//./header.h\"\n"
		"int main() { return random() -2; }");

		ASSERT_TRUE(ReparseAST(AST));
		ASSERT_FALSE(AST->getDiagnostics().hasErrorOccurred());

		ASSERT_LE(HeaderReadCount, GetFileReadCount(Header));
		}

} // anonymous namespace		} // anonymous namespace
