
Details

Reviewers

dblaikie

Commits

rG78e949159d10: [Demangle][Rust] Print special namespaces

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tmiasko created this revision. May 4 2021, 2:02 AM

Herald added subscribers: JDevlieghere, hiraditya. May 4 2021, 2:02 AM

tmiasko requested review of this revision. May 4 2021, 2:02 AM

Herald added a project: Restricted Project. May 4 2021, 2:02 AM

Herald added a subscriber: llvm-commits.

tschuett added a subscriber: tschuett. May 4 2021, 2:17 AM

Harbormaster completed remote builds in B102484: Diff 342674. May 4 2021, 2:35 AM

Any chance this could be subdivided further to implement only one grammar rule or the like? It'd make it much easier to inspect the test quality with smaller increments like that.

llvm/include/llvm/Demangle/RustDemangle.h
62–63	We tend not to use 'const' on locals, and maybe especially not on parameters (since it's even more readily confused with const-ref on parameters).
llvm/lib/Demangle/RustDemangle.cpp
209–250	This whole thing is alphabetic except for 'p' - maybe put it in order? Also - maybe use a const char* array indexed on `C - 'a'`? Not necessary by any means, though.
269–298	Perhaps the recursion system could be a scoped device, making it easier to add and less error-prone in case of early returns, etc.? Something like:

    Recurse R(Error, RecursionLimit); // or just take *this and befriend for access to Error and RecursionLimit
    if (!R)
      return;
    // Recurse's dtor decrements the RecursionLimit
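A minimal sketch of the scoped device suggested above, written as an RAII guard. The names `RecursionGuard` and `ToyParser` are hypothetical and not part of the patch; the guard assumes the same error flag, current level, and limit that `demanglePath` checks by hand. (The landed diff instead keeps the limit check inline and uses `SwapAndRestore` for the increment and decrement.)

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical RAII take on the suggested "Recurse" device (illustrative
// names, not from the patch). Construction checks the limit and bumps the
// depth; the destructor undoes the bump on every return path.
class RecursionGuard {
public:
  RecursionGuard(bool &Error, std::size_t &LevelRef, std::size_t MaxLevel)
      : Level(LevelRef), Entered(false) {
    if (Error || Level >= MaxLevel) {
      Error = true; // same outcome as the hand-written check in demanglePath
      return;
    }
    ++Level;
    Entered = true;
  }
  ~RecursionGuard() {
    if (Entered) {
      assert(Level > 0 && "recursion depth underflow");
      --Level;
    }
  }
  RecursionGuard(const RecursionGuard &) = delete;
  RecursionGuard &operator=(const RecursionGuard &) = delete;

  // False when the limit was hit or an error was already pending.
  explicit operator bool() const { return Entered; }

private:
  std::size_t &Level;
  bool Entered;
};

// Hypothetical toy parser that recurses once per leading '('. Without the
// guard, each early return below would have to remember to decrement Level.
struct ToyParser {
  bool Error = false;
  std::size_t Level = 0, MaxLevel = 3, Deepest = 0;

  void parse(const char *&P) {
    RecursionGuard R(Error, Level, MaxLevel);
    if (!R)
      return;
    if (Level > Deepest)
      Deepest = Level;
    if (*P != '(')
      return;
    ++P;
    parse(P);
  }
};
```

Because the destructor runs on every exit path, the early `return` in `parse` cannot leave `Level` incremented, which is the error-proneness the comment points at.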

Limit changes to M / X / Y path productions
Use SwapAndRestore for recursion limit, printing mode, and parser position.
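The switch to `SwapAndRestore` follows the same save/assign/restore idea. As an illustration only, here is a hypothetical `ScopedRestore` template with that behaviour; it is not the `llvm::itanium_demangle::SwapAndRestore` that the header imports (whose exact interface may differ), and `Cursor` exists only to show the parser-position use case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a SwapAndRestore-style helper (not LLVM's class):
// it installs a new value for the lifetime of the object and puts the old
// value back when the scope ends, including on early returns.
template <class T> class ScopedRestore {
public:
  ScopedRestore(T &Ref, T NewValue) : Target(Ref), Saved(std::move(Ref)) {
    Target = std::move(NewValue);
  }
  ~ScopedRestore() { Target = std::move(Saved); }
  ScopedRestore(const ScopedRestore &) = delete;
  ScopedRestore &operator=(const ScopedRestore &) = delete;

private:
  T &Target;
  T Saved;
};

// Hypothetical cursor using the helper for a parser position: look ahead as
// far as needed, then rewind automatically when the function returns.
struct Cursor {
  std::string Input;
  std::size_t Position = 0;

  // Counts the digits at the current position without consuming them.
  std::size_t countDigitsAhead() {
    assert(Position <= Input.size());
    ScopedRestore<std::size_t> SavePosition(Position, Position);
    std::size_t N = 0;
    while (Position < Input.size() && Input[Position] >= '0' &&
           Input[Position] <= '9') {
      ++Position;
      ++N;
    }
    return N; // SavePosition's destructor rewinds Position here
  }
};
```

Used the way the patch uses it for the recursion level, `ScopedRestore<std::size_t> SaveLevel(Level, Level + 1);` bumps the level for the current scope only.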

Update description

Harbormaster completed remote builds in B103009: Diff 343425. May 6 2021, 9:16 AM

Could this be cut down further to be paths OR named types, rather than both? (I know this seems fussy, like a pain, and I'm sorry about that. But if it gets down to the smallest thing possible, like adding only one named type and its associated tests, I'm hoping it'll be super easy to review, verify the tests, and understand how the grammar is implemented. As it stands, it does take me a while to understand each test case: which grammar rules are being tested relative to which rules are being implemented in the patch, etc.)

Print special namespaces

In D101821#2745300, @dblaikie wrote:

Could this be cut down further to be paths OR named types, rather than both?

I reduced the changes further still. To introduce further path productions I need types, and the only type implemented so far is a named type, i.e., a path. The grammar limits how far I can break this down, but I will see what I can do.

Harbormaster completed remote builds in B103259: Diff 343762. May 7 2021, 3:13 PM

dblaikie added inline comments. May 7 2021, 5:18 PM

llvm/test/Demangle/rust.test
14–25	Do these benefit from being inside two namespaces? I think it'd be great if features were tested in as much isolation as possible, with maybe a single test demonstrating that a feature works in a particular context (which is really testing the outer production, and should use the simplest possible instance of the inner production). There's also a rather large and, so far as I can tell, arbitrary disambiguator ('s21hi0yVfW1J'). Is there a particular reason for it being present, and for having such a long/complicated value? I.e., maybe these tests could be:

    _RNCC5crate0
    _RNCC5crates_0
    _RNCNCC5crates_00
    _RNCNCC5crates_01g

I'm still thinking some kind of comment might be useful to describe these, though I'm not sure exactly what, e.g.:

    _R N [ C [ C 5crate ] 0 ]
    _R N [ C [ C 5crate ] [ s _0 ] ]
    _R N [ C [ N [ C [ C 5crate ] [ s _0 ] ] ] 0 ]

In any case, could you simplify the test cases to make them as narrow and legible/obvious as possible? Maybe add some comments about the invalid/mangled cases, for instance where the invalidity is interesting? Hopefully, though, once they're simplified way down, the invalidity will be clearer.

Simplify test cases

llvm/test/Demangle/rust.test
14–25	I used rustc to generate the test cases previously. The extra nesting is necessary at the language level. The long disambiguator `s21hi0yVfW1J` is a crate hash: together with the crate name it forms a globally unique identifier for the crate (c++filt shows or omits such hashes depending on the `-i` flag, and I would like to implement something similar later on). Neither aspect is strictly necessary for testing, so I reduced the test cases, but they no longer correspond to valid Rust programs.
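Since the crate hash is just a large disambiguator, it may help to see how one decodes. The sketch below follows the encoding documented in the patch's comments: base-62 digits `0-9a-zA-Z`, a terminating `_`, values offset by one, and a second offset of one for the optional `s` form. `parseBase62` and `parseDisambiguator` are hypothetical free-function versions, not the LLVM member functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// <base-62-number> = {<0-9a-zA-Z>} "_", offset by one so that "_" is 0,
// "0_" is 1, "1_" is 2, and so on. Advances Pos past the terminator.
// Returns false on a bad digit, a missing "_", or uint64_t overflow.
bool parseBase62(const std::string &S, std::size_t &Pos, uint64_t &Out) {
  assert(Pos <= S.size() && "position past end of input");
  const uint64_t Max = std::numeric_limits<uint64_t>::max();
  if (Pos < S.size() && S[Pos] == '_') {
    ++Pos;
    Out = 0;
    return true;
  }
  uint64_t Value = 0;
  while (true) {
    if (Pos >= S.size())
      return false; // ran out of input before the "_" terminator
    char C = S[Pos++];
    uint64_t Digit;
    if (C == '_')
      break;
    if (C >= '0' && C <= '9')
      Digit = C - '0';
    else if (C >= 'a' && C <= 'z')
      Digit = 10 + (C - 'a');
    else if (C >= 'A' && C <= 'Z')
      Digit = 36 + (C - 'A');
    else
      return false;
    if (Value > (Max - Digit) / 62)
      return false; // Value * 62 + Digit would overflow
    Value = Value * 62 + Digit;
  }
  if (Value == Max)
    return false;
  Out = Value + 1;
  return true;
}

// Optional disambiguator ("s" <base-62-number>): 0 when the "s" is absent,
// otherwise the decoded number plus one, mirroring the patch's
// parseOptionalBase62Number('s'). A crate hash is a large instance of this.
bool parseDisambiguator(const std::string &S, std::size_t &Pos, uint64_t &Out) {
  if (Pos >= S.size() || S[Pos] != 's') {
    Out = 0;
    return true;
  }
  ++Pos;
  uint64_t N = 0;
  if (!parseBase62(S, Pos, N) || N == std::numeric_limits<uint64_t>::max())
    return false;
  Out = N + 1;
  return true;
}
```

Worked through by hand: `s8_` decodes to 8 + 1 = 9 as a base-62 number and to 9 + 1 = 10 as a disambiguator, which is why `_RNZC5crates8_5ident` is expected to print `crate::{Z:ident#10}`.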

Harbormaster completed remote builds in B103302: Diff 343817. May 8 2021, 1:11 AM

Looks great - thanks for sticking with me/helping make this easier for me to follow.

llvm/test/Demangle/rust.test
33–34	Might be worth consistently using certain easy-to-identify identifiers? I at least find the tests easier to understand when the identifiers are readily... identifiable. 'crate' is easy to spot in the other examples, for instance, but single-letter identifiers like these ('a' and 'c') make an example a bit harder to read: it's harder to see where the user-provided names start and end, and which parts aren't from a user identifier. (Similarly below in the invalid namespace test.) I guess in the first few examples the manglings were short/simple enough for me to figure out which component came from what, but as the examples get more complicated they get harder to follow, and clearer, longer, more obvious names would make it a bit easier.

This revision is now accepted and ready to land. May 8 2021, 7:52 PM

Use easy to distinguish identifiers in test cases

Could you commit this for me? Thanks.

Harbormaster completed remote builds in B103381: Diff 343908. May 9 2021, 5:06 AM

This revision was landed with ongoing or failed builds. May 9 2021, 3:46 PM

Closed by commit rG78e949159d10: [Demangle][Rust] Print special namespaces (authored by tmiasko, committed by dblaikie).

This revision was automatically updated to reflect the committed changes.

dblaikie added a commit: rG78e949159d10: [Demangle][Rust] Print special namespaces.

tmiasko mentioned this in D99981: [demangler] Support the new Rust mangling scheme (v0). Jun 23 2021, 3:07 PM

Diff 343937

llvm/include/llvm/Demangle/RustDemangle.h

 //===--- RustDemangle.h ------------------------------------------ C++ --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//

 #ifndef LLVM_DEMANGLE_RUSTDEMANGLE_H
 #define LLVM_DEMANGLE_RUSTDEMANGLE_H

 #include "llvm/Demangle/DemangleConfig.h"
 #include "llvm/Demangle/StringView.h"
 #include "llvm/Demangle/Utility.h"
+#include <cstdint>

 namespace llvm {
 namespace rust_demangle {

 using llvm::itanium_demangle::OutputStream;
 using llvm::itanium_demangle::StringView;
+using llvm::itanium_demangle::SwapAndRestore;

 struct Identifier {
   StringView Name;
   bool Punycode;

   bool empty() const { return Name.empty(); }
 };

[... 18 unchanged lines ...]
 public:
   Demangler(size_t MaxRecursionLevel = 500);

   bool demangle(StringView MangledName);

 private:
   void demanglePath();

   Identifier parseIdentifier();
-  void parseOptionalBase62Number(char Tag);
+  uint64_t parseOptionalBase62Number(char Tag);
   uint64_t parseBase62Number();
   uint64_t parseDecimalNumber();

+  void print(char C) {
+    if (Error)
+      return;
+
+    Output += C;
+  }
+
   void print(StringView S) {
     if (Error)
       return;

     Output += S;
   }

+  void printDecimalNumber(uint64_t N) {
+    if (Error)
+      return;
+
+    Output << N;
+  }
+
   char look() const {
     if (Error || Position >= Input.size())
       return 0;

     return Input[Position];
   }

   char consume() {
[... 45 unchanged lines ...]

llvm/lib/Demangle/RustDemangle.cpp

[... 126 unchanged lines ...]
 //      | "S"      // shim
 //      | <A-Z>    // other special namespaces
 //      | <a-z>    // internal namespaces
 void Demangler::demanglePath() {
   if (Error || RecursionLevel >= MaxRecursionLevel) {
     Error = true;
     return;
   }
-  RecursionLevel += 1;
+  SwapAndRestore<size_t> SaveRecursionLevel(RecursionLevel, RecursionLevel + 1);

   switch (consume()) {
   case 'C': {
     parseOptionalBase62Number('s');
     Identifier Ident = parseIdentifier();
     print(Ident.Name);
     break;
   }
   case 'N': {
     char NS = consume();
     if (!isLower(NS) && !isUpper(NS)) {
       Error = true;
       break;
     }
     demanglePath();

-    parseOptionalBase62Number('s');
+    uint64_t Disambiguator = parseOptionalBase62Number('s');
     Identifier Ident = parseIdentifier();

-    if (!Ident.empty()) {
-      // FIXME print special namespaces:
-      // * "C" closures
-      // * "S" shim
-      print("::");
-      print(Ident.Name);
-    }
+    if (isUpper(NS)) {
+      // Special namespaces
+      print("::{");
+      if (NS == 'C')
+        print("closure");
+      else if (NS == 'S')
+        print("shim");
+      else
+        print(NS);
+      if (!Ident.empty()) {
+        print(":");
+        print(Ident.Name);
+      }
+      print('#');
+      printDecimalNumber(Disambiguator);
+      print('}');
+    } else {
+      // Implementation internal namespaces.
+      if (!Ident.empty()) {
+        print("::");
+        print(Ident.Name);
+      }
+    }
     break;
   }
   default:
     // FIXME parse remaining productions.
     Error = true;
     break;
   }

-  RecursionLevel -= 1;
 }

 // <undisambiguated-identifier> = ["u"] <decimal-number> ["_"] <bytes>
 Identifier Demangler::parseIdentifier() {
   bool Punycode = consumeIf('u');
   uint64_t Bytes = parseDecimalNumber();

   // Underscore resolves the ambiguity when identifier starts with a decimal
   // digit or another underscore.
   consumeIf('_');

   if (Error || Bytes > Input.size() - Position) {
     Error = true;
     return {};
   }
   StringView S = Input.substr(Position, Bytes);
   Position += Bytes;

   if (!std::all_of(S.begin(), S.end(), isValid)) {
     Error = true;
     return {};
   }

   return {S, Punycode};
 }

 // Parses optional base 62 number. The presence of a number is determined using
-// Tag.
-void Demangler::parseOptionalBase62Number(char Tag) {
-  // Parsing result is currently unused.
-  if (consumeIf(Tag))
-    parseBase62Number();
+// Tag. Returns 0 when tag is absent and parsed value + 1 otherwise.
+uint64_t Demangler::parseOptionalBase62Number(char Tag) {
+  if (!consumeIf(Tag))
+    return 0;
+
+  uint64_t N = parseBase62Number();
+  if (Error || !addAssign(N, 1))
+    return 0;
+
+  return N;
 }

 // Parses base 62 number with <0-9a-zA-Z> as digits. Number is terminated by
 // "_". All values are offset by 1, so that "_" encodes 0, "0_" encodes 1,
 // "1_" encodes 2, etc.
 //
 // <base-62-number> = {<0-9a-zA-Z>} "_"
 uint64_t Demangler::parseBase62Number() {
   if (consumeIf('_'))
     return 0;

   uint64_t Value = 0;

   while (true) {
     uint64_t Digit;
     char C = consume();

     if (C == '_') {
       break;
     } else if (isDigit(C)) {
       Digit = C - '0';
     } else if (isLower(C)) {
       Digit = 10 + (C - 'a');
     } else if (isUpper(C)) {
       Digit = 10 + 26 + (C - 'A');
     } else {
       Error = true;
       return 0;
     }

     if (!mulAssign(Value, 62))
       return 0;

     if (!addAssign(Value, Digit))
       return 0;
   }

   if (!addAssign(Value, 1))
     return 0;

   return Value;
 }

 // Parses a decimal number that had been encoded without any leading zeros.
 //
 // <decimal-number> = "0"
 //                  | <1-9> {<0-9>}
 uint64_t Demangler::parseDecimalNumber() {
   char C = look();
   if (!isDigit(C)) {
     Error = true;
     return 0;
   }

   if (C == '0') {
     consume();
     return 0;
   }

   uint64_t Value = 0;

   while (isDigit(look())) {
     if (!mulAssign(Value, 10)) {
       Error = true;
       return 0;
     }

     uint64_t D = consume() - '0';
     if (!addAssign(Value, D))
       return 0;
   }

   return Value;
 }
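To make the new printing rules easy to check against the test cases that follow, here is a self-contained sketch of the same logic. It is a hypothetical re-implementation (`MiniRustParser` and `demangleSubset` are illustrative names, not LLVM APIs) that covers only the `C` and `N` path productions, ASCII identifiers without punycode, and a fixed depth limit of 500 mirroring the default `MaxRecursionLevel`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch, not the LLVM code: only the "C" and "N" <path>
// productions of Rust v0, no punycode, special namespaces as in this patch.
struct MiniRustParser {
  std::string In, Out;
  std::size_t Pos = 2, Depth = 0; // Pos starts just after the "_R" prefix
  bool Error = false;
  static bool isDigit(char C) { return C >= '0' && C <= '9'; }
  static bool isLower(char C) { return C >= 'a' && C <= 'z'; }
  static bool isUpper(char C) { return C >= 'A' && C <= 'Z'; }
  static bool isIdent(char C) { return isDigit(C) || isLower(C) || isUpper(C) || C == '_'; }
  char peek() const { return Pos < In.size() ? In[Pos] : '\0'; }
  char next() { return Pos < In.size() ? In[Pos++] : '\0'; }
  uint64_t fail() { Error = true; return 0; }
  // <base-62-number>: "_" is 0, otherwise the digits' value plus 1.
  uint64_t base62() {
    if (peek() == '_') { ++Pos; return 0; }
    uint64_t V = 0;
    for (char C = next(); C != '_'; C = next()) {
      uint64_t D;
      if (isDigit(C)) D = C - '0';
      else if (isLower(C)) D = 10 + (C - 'a');
      else if (isUpper(C)) D = 36 + (C - 'A');
      else return fail(); // bad digit, or input ended before the "_"
      if (V > (UINT64_MAX - D) / 62) return fail(); // V * 62 + D overflows
      V = V * 62 + D;
    }
    return V == UINT64_MAX ? fail() : V + 1;
  }
  // Optional "s" <base-62-number>: 0 when absent, else parsed value + 1.
  uint64_t disambiguator() {
    if (peek() != 's') return 0;
    ++Pos;
    uint64_t N = base62();
    return (Error || N == UINT64_MAX) ? fail() : N + 1;
  }
  // <decimal-number> ["_"] <bytes>, with bytes restricted to [0-9a-zA-Z_].
  std::string identifier() {
    if (Error || !isDigit(peek())) { fail(); return ""; }
    uint64_t Len = 0;
    if (peek() == '0') ++Pos; // "0" is a complete length on its own
    else while (isDigit(peek())) {
      uint64_t D = next() - '0';
      if (Len > (UINT64_MAX - D) / 10) { fail(); return ""; } // overflow
      Len = Len * 10 + D;
    }
    if (peek() == '_') ++Pos; // separator before bytes starting with [0-9_]
    if (Len > In.size() - Pos) { fail(); return ""; }
    std::string Name = In.substr(Pos, Len);
    Pos += Len;
    if (!std::all_of(Name.begin(), Name.end(), isIdent)) { fail(); return ""; }
    return Name;
  }
  // "C" [<disambiguator>] <name> | "N" <ns> <path> [<disambiguator>] <name>
  void path() {
    if (Error || Depth >= 500) { fail(); return; }
    ++Depth;
    char Tag = next();
    if (Tag == 'C') {
      disambiguator(); // the crate disambiguator (hash) is not printed
      Out += identifier();
    } else if (Tag == 'N') {
      char NS = next();
      if (!isLower(NS) && !isUpper(NS)) fail();
      path();
      uint64_t Dis = disambiguator();
      std::string Name = identifier();
      if (!Error && isUpper(NS)) { // special namespace: ::{kind[:name]#N}
        if (NS == 'C') Out += "::{closure";
        else if (NS == 'S') Out += "::{shim";
        else Out += std::string("::{") + NS;
        if (!Name.empty()) Out += ":" + Name;
        Out += "#" + std::to_string(Dis) + "}";
      } else if (!Error && !Name.empty()) { // internal namespace: ::name
        Out += "::" + Name;
      }
    } else {
      fail(); // other <path> productions are out of scope for this sketch
    }
    --Depth;
  }
};

// Returns true and sets Result on success; Result is untouched on failure.
bool demangleSubset(const std::string &Mangled, std::string &Result) {
  if (Mangled.compare(0, 2, "_R") != 0) return false;
  MiniRustParser P;
  P.In = Mangled;
  P.path();
  assert(P.Depth == 0 && "each path() call must undo its depth increment");
  if (P.Error || P.Pos != P.In.size()) return false;
  Result = P.Out;
  return true;
}
```

For example, with this sketch `_RNCC5crates_3foo` comes out as `crate::{closure:foo#1}` and `_RN_C5crate4main` is rejected, in line with the CHECK lines in `rust.test`.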

llvm/test/Demangle/rust.test

 RUN: llvm-cxxfilt -n < %s | FileCheck --match-full-lines %s

 CHECK: a::main
 _RNvC1a4main

 CHECK: hello::rust
 _RNvCshGpAVYOtgW1_5hello4rust

 CHECK: a::b::c
 _RNvNvC1a1b1c

+; Closure namespace
+
+CHECK: crate::{closure#0}
+_RNCC5crate0
+
+CHECK: crate::{closure#1}
+_RNCC5crates_0
+
+CHECK: crate::{closure:foo#0}
+_RNCC5crate3foo
+
+CHECK: crate::{closure:foo#1}
+_RNCC5crates_3foo
+
+; Shim namespace
+
+CHECK: crate::{shim:reify#0}
+_RNSC5crate5reify
+
+; Unrecognized special namespace
+
+CHECK: crate::{Z:ident#10}
+_RNZC5crates8_5ident
+
 ; Invalid mangled characters

 CHECK: _RNvC2a.1c
 _RNvC2a.1c

 CHECK: _RNvC2a$1c
 _RNvC2a$1c

+; Invalid namespace (not in [a-zA-Z]).
+
+CHECK: _RN_C5crate4main
+_RN_C5crate4main
+
 ; Invalid identifier length (UINT64_MAX + 3, which happens to be ok after a wraparound).

 CHECK: _RNvC2ab18446744073709551618xy
 _RNvC2ab18446744073709551618xy

 ; Mangling scheme includes an optional encoding version. When present it would
 ; indicate an encoding we don't support yet. Check that it is rejected:

[... 16 unchanged lines ...]

This is an archive of the discontinued LLVM Phabricator instance.

[Demangle][Rust] Print special namespaces
Closed, Public
