This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Fix unicode handling, using UTF-16 where LSP requires it.
ClosedPublic

Authored by sammccall on Apr 24 2018, 4:26 PM.

Details

Summary

The Language Server Protocol unfortunately mandates that locations in files
be represented by line/column pairs, where the "column" is actually an index
into the UTF-16-encoded text of the line.
(This is because VSCode is written in JavaScript, which is UTF-16-native).

Internally clangd treats source files as UTF-8, the One True Encoding, and
generally deals with byte offsets (though there are exceptions).

Before this patch, conversions between offsets and LSP Position pretended
that Position.character was UTF-8 bytes, which is only true for ASCII lines.
Now we examine the text to convert correctly (but don't actually need to
transcode it, due to some nice details of the encodings).
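
The encoding detail being relied on: every 1-, 2-, or 3-byte UTF-8 sequence encodes a codepoint in the Basic Multilingual Plane (one UTF-16 code unit), while every 4-byte sequence encodes a supplementary codepoint (a surrogate pair, i.e. two code units). A minimal sketch of the counting trick, using a hypothetical utf16Length helper rather than the actual SourceCode.cpp code (which iterates codepoints through a callback):

    #include <string_view>

    // Sketch only (hypothetical helper): the UTF-16 width of each codepoint
    // can be read straight off its UTF-8 lead byte, so no transcoding is
    // needed.
    static size_t utf16Length(std::string_view UTF8) {
      size_t Units = 0;
      for (unsigned char C : UTF8) {
        if ((C & 0xC0) == 0x80)
          continue;                   // continuation byte, already counted
        Units += (C >= 0xF0) ? 2 : 1; // 4-byte lead byte => surrogate pair
      }
      return Units;
    }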

The updated functions in SourceCode are the blessed way to interact with
the Position.character field, and anything else is likely to be wrong.
So I also updated the other accesses:

  • CodeComplete needs a "clang-style" line/column, with column in UTF-8 bytes. This is now converted via Position -> offset -> clang line/column (a new function is added to SourceCode.h for the second conversion; see the sketch after this list).
  • getBeginningOfIdentifier skipped backwards in UTF-16 space, which will behave badly when it splits a surrogate pair. Skipping backwards in UTF-8 coordinates gives the lexer a fighting chance of getting this right. While here, I clarified(?) the logic comments, fixed a bug with identifiers containing digits, simplified the signature slightly, and added a test.
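
A rough sketch of the conversion chain from the first bullet. The helper name and signature below are assumptions for illustration, not necessarily what the patch adds to SourceCode.h; the point is that clang-style columns are 1-based and count UTF-8 bytes:

    #include <string_view>
    #include <utility>

    // Step 1 (existing code): LSP Position {0-based line, UTF-16 column}
    // -> byte offset into the UTF-8 buffer.
    // Step 2 (hypothetical helper, sketched here): byte offset ->
    // clang-style 1-based line/column, where the column counts UTF-8 bytes.
    static std::pair<unsigned, unsigned>
    offsetToClangLineColumn(std::string_view Code, size_t Offset) {
      unsigned Line = 1, Column = 1;
      for (size_t I = 0; I < Offset && I < Code.size(); ++I) {
        if (Code[I] == '\n') {
          ++Line;
          Column = 1;
        } else {
          ++Column; // every byte advances the column, even continuation bytes
        }
      }
      return {Line, Column};
    }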

This seems likely to cause problems with editors that have the same bug, and
treat the protocol as if columns are UTF-8 bytes. But we can find and fix those.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision. Apr 24 2018, 4:26 PM
sammccall updated this revision to Diff 143836. Apr 24 2018, 4:27 PM

clang-format

sammccall updated this revision to Diff 143838. Apr 24 2018, 4:34 PM

Remove some debugging junk, tweak a comment.

hokein accepted this revision. Apr 25 2018, 2:29 AM

Cool, the code looks good to me (just a few nits), thanks for the descriptive comments!

This seems likely to cause problems with editors that have the same bug, and
treat the protocol as if columns are UTF-8 bytes. But we can find and fix those.

VSCode is fine I think, but we need to fix our internal ycm vim integration.

clangd/SourceCode.cpp
25 ↗(On Diff #143838)

Can we make it static?

The callback type is function<int, int>; is the reason for using a template here mainly to save some keystrokes?

53 ↗(On Diff #143838)

nit: consider naming the parameter U16Units?

72 ↗(On Diff #143838)

Maybe add an assert to ensure iterateCodepoints always returns false (reaches the end of U8)?

137 ↗(On Diff #143838)

nit: it took me a while to understand what the sub-expression Code.substr(LocInfo.second - ColumnInBytes, ColumnInBytes) does; maybe abstract it out with a descriptive name? Also, it is not straightforward to understand what LocInfo.second is without navigating to getDecomposedSpellingLoc.

This revision is now accepted and ready to land. Apr 25 2018, 2:29 AM
sammccall marked 3 inline comments as done. Apr 27 2018, 4:33 AM

Thanks!

clangd/SourceCode.cpp
25 ↗(On Diff #143838)

Added static.

The difference between using a template vs std::function for a lambda is compile-time vs run-time polymorphism: invoking std::function is a virtual call and (AFAIK) compilers don't manage to inline it well.
With the template, we get one copy of the function for each callsite, with the lambda inlined.

Not sure the performance is a big deal here, but this code is at least plausibly hot I guess? And I think there's very little readability cost to using the template in this case.
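
For illustration only (hypothetical names, not the clangd code), the two shapes being compared:

    #include <functional>

    // Illustration: type-erased callback. Each invocation goes through
    // std::function's indirection, which compilers rarely inline well.
    static void forEachErased(int N, const std::function<void(int)> &CB) {
      for (int I = 0; I < N; ++I)
        CB(I);
    }

    // Illustration: templated callback. Each callsite instantiates its own
    // copy of the loop with the lambda's body visible to the optimizer, so
    // it can be inlined.
    template <typename Callback>
    static void forEachTemplated(int N, Callback CB) {
      for (int I = 0; I < N; ++I)
        CB(I);
    }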

72 ↗(On Diff #143838)

I'm not sure there's enough value in this one: it's clear from the local code that this isn't possible, and it doesn't seem likely that a bug would manifest this way (aborting early even though our function returns false, *and returning true* from iterateCodepoints).

The small cost is added noise - I think this needs a new variable, an assert, and a suppression of the "unused variable" warning.

137 ↗(On Diff #143838)

Called this LineSoFar and decomposed LocInfo into named variables.

This revision was automatically updated to reflect the committed changes.
sammccall marked 2 inline comments as done.
benhamilton added inline comments.
clang-tools-extra/trunk/clangd/SourceCode.cpp
38

This is user input, right? Have we actually checked for valid UTF-8, or do we just assume it's valid?

If not, it seems like an assertion is not the right check; we should instead reject it when we're reading the input.

sammccall added inline comments. Apr 30 2018, 1:42 AM
clang-tools-extra/trunk/clangd/SourceCode.cpp
38

Yeah, I wasn't sure about this; offline discussion tentatively concluded we wanted an assert, but I'm happy to switch to something else.

We don't validate the code on the way in, so strings are "bytes of presumed-UTF8". This is usually not a big pain actually. But we could/should certainly make the JSON parser validate the UTF-8. (If we want to go this route, D45753 should be resolved first).

There are two ways the assertion could fire: the code is invalid UTF-8, or there's a bug in the Unicode logic here. I thought the latter was more likely, at least in the short term :) and this is the least invasive way to catch it. And if a developer build (assert-enabled) crashes because an editor feeds it invalid bytes, that's probably better than doing nothing (though not as good as catching the error earlier).

benhamilton added inline comments. Apr 30 2018, 8:57 AM
clang-tools-extra/trunk/clangd/SourceCode.cpp
38

The problem with not validating is that it's easy to cause OOB memory access (and thus security issues) if someone crafts malicious UTF-8 and makes us read off the end of a string.

We should be clear about the status of all strings in the API documentation.

sammccall added inline comments. Apr 30 2018, 9:16 AM
clang-tools-extra/trunk/clangd/SourceCode.cpp
38

You still have to find/write a UTF-8 decoder that doesn't check bounds, which is (hopefully!) the harder part of writing that bug :-)
But I agree in principle; there are more subtle attacks too, like the overlong sequence C0 8A, which is invalid but which non-validating decoders will treat as a newline.
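
For context, a minimal standalone sketch (not the JSON-library patch) of the shortest-form check a validating decoder needs in order to reject inputs like that one:

    #include <cstdint>
    #include <string_view>

    // Decode one UTF-8 sequence starting at I; reject the malformed inputs a
    // naive decoder would accept, including overlong forms like "\xC0\x8A".
    static bool decodeOneChecked(std::string_view S, size_t &I, uint32_t &CP) {
      if (I >= S.size())
        return false;
      uint8_t B0 = S[I];
      int Len;
      if (B0 < 0x80)              { CP = B0;        Len = 1; }
      else if ((B0 >> 5) == 0x06) { CP = B0 & 0x1F; Len = 2; }
      else if ((B0 >> 4) == 0x0E) { CP = B0 & 0x0F; Len = 3; }
      else if ((B0 >> 3) == 0x1E) { CP = B0 & 0x07; Len = 4; }
      else
        return false;             // stray continuation byte or invalid lead
      if (I + Len > S.size())
        return false;             // truncated sequence (the bounds check)
      for (int K = 1; K < Len; ++K) {
        uint8_t B = S[I + K];
        if ((B & 0xC0) != 0x80)
          return false;           // not a continuation byte
        CP = (CP << 6) | (B & 0x3F);
      }
      static const uint32_t MinCP[] = {0, 0x80, 0x800, 0x10000};
      if (CP < MinCP[Len - 1])
        return false;             // overlong: C0 8A would decode to 0x0A
      if ((CP >= 0xD800 && CP <= 0xDFFF) || CP > 0x10FFFF)
        return false;             // UTF-16 surrogate or out of Unicode range
      I += Len;
      return true;
    }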

I have a nearly-finished patch to add real validation to the JSON library, I'll copy you on it.

benhamilton added inline comments. Apr 30 2018, 9:21 AM
clang-tools-extra/trunk/clangd/SourceCode.cpp
38

Seems like this particular decoder isn't checking bounds, eh? ;)

If NDEBUG is set, it will happily set UTF8Length to however many leading 1s there are (valid or not) and pass that to CB(UTF8Length). It's true that the current callbacks passed in won't directly turn that into an OOB memory access, but they will end up returning an invalid UTF-16 code unit length from positionToOffset(), so who knows what that will end up doing.

Thanks, I'm always happy to review Unicode stuff.

sammccall added inline comments. Apr 30 2018, 9:25 AM
clang-tools-extra/trunk/clangd/SourceCode.cpp
38

Ah, yeah - the implicit context here is that we don't do anything with UTF-16 other than send it back to the client. So this is garbage in, garbage out. Definitely not ideal.