This is an archive of the discontinued LLVM Phabricator instance.

Differential D33765

Show correct column nr. when multi-byte utf8 chars are used.
Needs Review
Public

Authored by erikjv on Jun 1 2017, 3:14 AM.

Details

Reviewers

bkramer
klimek

Summary

Previously, the column number in a diagnostic was the byte position in
the line. This gave incorrect column numbers whenever the line contained
a multi-byte UTF-8 character before the reported location. This change
corrects for those multi-byte characters and for zero-width diacritic
marks.

This fixes PR21144.
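
To illustrate the problem described in the summary, here is a minimal, self-contained sketch of how a byte column can differ from a code-point column. It is not the code from this revision: the helper name codePointColumn is hypothetical, and the sketch assumes the line is valid UTF-8 and ignores zero-width and double-width characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of clang: converts a 1-based byte column into
// a 1-based code-point column by counting the UTF-8 lead bytes before it.
// Assumes `line` is valid UTF-8.
unsigned codePointColumn(const std::string &line, unsigned byteColumn) {
  unsigned column = 1;
  for (std::size_t i = 0; i + 1 < byteColumn && i < line.size(); ++i) {
    // Continuation bytes have the form 10xxxxxx and do not start a character.
    if ((static_cast<unsigned char>(line[i]) & 0xC0) != 0x80)
      ++column;
  }
  return column;
}
```

For a line consisting of four spaces followed by "ideëen" << foo; the << starts at byte column 15, because ë occupies two bytes, while the sketch maps that position to column 14.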

Event Timeline

erikjv created this revision. Jun 1 2017, 3:14 AM

Correctly counting columns is a bit more complicated than that... for example, consider what happens if you replace ideëen with idez̈en. See https://stackoverflow.com/questions/3634627/how-to-know-the-preferred-display-width-in-columns-of-unicode-characters.
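
The idez̈en case can be sketched as follows. This is a simplified illustration under stated assumptions, not a full width computation: the input is assumed to be well-formed UTF-8, only the Combining Diacritical Marks block (U+0300..U+036F) is treated as zero-width, double-width characters are not handled, and the names decodeOne, isZeroWidth and displayColumn are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes the UTF-8 sequence starting at s[i] and
// advances i past it. Assumes well-formed input; performs no validation.
static char32_t decodeOne(const std::string &s, std::size_t &i) {
  unsigned char b = s[i++];
  if (b < 0x80)
    return b;
  int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
  char32_t cp = b & (0x3F >> extra); // payload bits of the lead byte
  while (extra-- > 0 && i < s.size())
    cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
  return cp;
}

// Assumption: only U+0300..U+036F counts as zero-width in this sketch.
// Real width tables cover many more ranges.
static bool isZeroWidth(char32_t cp) { return cp >= 0x0300 && cp <= 0x036F; }

// Returns the 1-based display column of the character that starts at the
// 0-based byte offset `byteOffset` in `line`.
unsigned displayColumn(const std::string &line, std::size_t byteOffset) {
  unsigned column = 1;
  std::size_t i = 0;
  while (i < byteOffset && i < line.size()) {
    if (!isZeroWidth(decodeOne(line, i)))
      ++column;
  }
  return column;
}
```

With this sketch, the precomposed ë (U+00EB, two bytes) and the sequence z followed by U+0308 (three bytes in total) each advance the column by one, so the << after either string literal lands on the same column as in the plain ASCII line.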

erikjv updated this revision to Diff 117660. Oct 4 2017, 5:28 AM

erikjv edited the summary of this revision.

yvvan added a subscriber: yvvan. Oct 25 2017, 11:38 PM

I didn't really search for it before, but it looks like LLVM already has a routine for computing column widths? See llvm::sys::unicode::columnWidthUTF8.

There are some tools which parse clang diagnostic output; we might need a flag to control this. Not sure who would know about that?

lib/Basic/SourceManager.cpp
Line 1501 (on Diff #117660): Instead of adding a parameter to getColumnNumber, it would probably make sense to just make this caller correct the column number afterwards.

I moved all the code into TextDiagnostic, so all other interfaces still get byte offsets.

Still worried about the effect on tools which parse clang diagnostics... please send a message to cfe-dev. Hopefully we'll get responses there.
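
The compatibility concern can be made concrete with a small sketch of a hypothetical consumer. Nothing here comes from an actual tool: DiagLoc, parseDiagLoc and byteAtColumn are illustrative names, and the assumption is that the tool reads the file:line:column: prefix and then uses the column as a 1-based byte index into the source line.

```cpp
#include <regex>
#include <string>

// Hypothetical record for the location prefix of a diagnostic line.
struct DiagLoc {
  std::string file;
  unsigned line;
  unsigned column;
};

// Parses a leading "file:line:column: " prefix. Returns false if the text
// does not start with such a prefix.
bool parseDiagLoc(const std::string &text, DiagLoc &out) {
  static const std::regex pattern(R"(^(.+?):(\d+):(\d+): )");
  std::smatch m;
  if (!std::regex_search(text, m, pattern))
    return false;
  out.file = m[1].str();
  out.line = static_cast<unsigned>(std::stoul(m[2].str()));
  out.column = static_cast<unsigned>(std::stoul(m[3].str()));
  return true;
}

// A consumer that interprets the column as a 1-based byte index.
// Returns '\0' when the column is out of range.
char byteAtColumn(const std::string &sourceLine, unsigned column) {
  if (column < 1 || column > sourceLine.size())
    return '\0';
  return sourceLine[column - 1];
}
```

For a source line of four spaces followed by "ideëen" << foo; such a consumer finds the < at byte column 15. If the diagnostic reports a display column of 14 instead, the same lookup lands on the space before the operator, which is the kind of mismatch a flag would let tools avoid.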

Godin added a subscriber: Godin. May 22 2018, 8:29 AM

lelf added a subscriber: lelf. May 19 2019, 7:07 PM

Revision Contents

Path                              Size
lib/Frontend/TextDiagnostic.cpp   23 lines
test/Misc/diag-utf8.cpp           10 lines

Diff 124903

lib/Frontend/TextDiagnostic.cpp

@@ 13 unchanged lines hidden @@
 #include "clang/Basic/SourceManager.h"
 #include "clang/Lex/Lexer.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/Support/ConvertUTF.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Locale.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Support/Unicode.h"
 #include "llvm/Support/raw_ostream.h"
 #include <algorithm>

 using namespace clang;

 static const enum raw_ostream::Colors noteColor =
   raw_ostream::BLACK;
 static const enum raw_ostream::Colors remarkColor =
@@ 783 unchanged lines hidden @@ void TextDiagnostic::emitDiagnosticLoc(FullSourceLoc Loc, PresumedLoc PLoc,
   case DiagnosticOptions::Clang: OS << ':' << LineNo; break;
   case DiagnosticOptions::MSVC: OS << '(' << LineNo; break;
   case DiagnosticOptions::Vi: OS << " +" << LineNo; break;
   }

   if (DiagOpts->ShowColumn)
     // Compute the column number.
     if (unsigned ColNo = PLoc.getColumn()) {
+      // Correct the column number for multi-byte UTF-8 code-points.
+      bool Invalid = false;
+      StringRef BufData = Loc.getBufferData(&Invalid);
+      if (!Invalid) {
+        const char *BufStart = BufData.data();
+        const char *BufEnd = BufStart + BufData.size();
+
+        // Decompose the location into a FID/Offset pair.
+        std::pair<FileID, unsigned> LocInfo = Loc.getDecomposedLoc();
+        FileID FID = LocInfo.first;
+        const SourceManager &SM = Loc.getManager();
+        const char *LineStart =
+            BufStart +
+            SM.getDecomposedLoc(SM.translateLineCol(FID, LineNo, 1)).second;
+        if (LineStart + ColNo < BufEnd) {
+          StringRef SourceLine(LineStart, ColNo);
+          int CorrectedColNo = llvm::sys::unicode::columnWidthUTF8(SourceLine);
+          if (CorrectedColNo != -1)
+            ColNo = unsigned(CorrectedColNo);
+        }
+      }
+
       if (DiagOpts->getFormat() == DiagnosticOptions::MSVC) {
         OS << ',';
         // Visual Studio 2010 or earlier expects column number to be off by one
         if (LangOpts.MSCompatibilityVersion &&
             !LangOpts.isCompatibleWithMSVC(LangOptions::MSVC2012))
           ColNo--;
       } else
         OS << ':';
@@ 524 unchanged lines hidden @@

test/Misc/diag-utf8.cpp

This file was added.

// RUN: not %clang_cc1 -fsyntax-only %s 2>&1 | FileCheck %s

struct Foo { int member; };

void f(Foo foo)
{
    "ideeen" << foo; // CHECK: {{.*[/\\]}}diag-utf8.cpp:7:14: error: invalid operands to binary expression ('const char *' and 'Foo')
    "ideëen" << foo; // CHECK: {{.*[/\\]}}diag-utf8.cpp:8:14: error: invalid operands to binary expression ('const char *' and 'Foo')
    "idez̈en" << foo; // CHECK: {{.*[/\\]}}diag-utf8.cpp:9:14: error: invalid operands to binary expression ('const char *' and 'Foo')
}