This is an archive of the discontinued LLVM Phabricator instance.

Differential D31451

New C++ function name parsing logic
Closed, Public

Authored by eugene on Mar 28 2017, 8:36 PM.


Details

Reviewers

zturner
jingham
labath

Commits

rGa633ee6e4a02: New C++ function name parsing logic (Resubmit)
rG699a748893d6: New C++ function name parsing logic
rLLDB299721: New C++ function name parsing logic (Resubmit)
rLLDB299374: New C++ function name parsing logic
rL299721: New C++ function name parsing logic (Resubmit)
rL299374: New C++ function name parsing logic

Summary

The current implementation of CPlusPlusLanguage::MethodName::Parse() doesn't come anywhere close to covering the full extent of possible function declarations.
It causes incorrect behavior in avoid-stepping and sometimes messes up the printing of thread backtraces.

This change implements more methodical parsing logic based on the clang lexer and a simple recursive parser.

Examples:
void std::vector<Class, std::allocator<Class> >::_M_emplace_back_aux<Class const&>(Class const&)
void (*&std::_Any_data::_M_access<void (*)()>())()
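To make the failure mode concrete, here is a minimal sketch (not the code from this revision) of a heuristic split that finds the last balanced pair of parentheses and then cuts at the last `::`. The names `NaiveSplit`, `FindMatchingOpenParen` and `NaiveSplitName` are hypothetical and exist only for this illustration. The sketch works for a plain method name but yields a nonsensical basename for the second example above, because the real argument list is not the last pair of parentheses.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type; field names follow the MethodName members.
struct NaiveSplit {
  std::string_view context;
  std::string_view basename;
  std::string_view arguments;
  std::string_view qualifiers;
};

// Walks backwards from close_pos and returns the index of the '(' that
// balances the ')' at close_pos, or npos if there is none.
inline std::size_t FindMatchingOpenParen(std::string_view s,
                                         std::size_t close_pos) {
  int depth = 0;
  for (std::size_t i = close_pos + 1; i-- > 0;) {
    if (s[i] == ')')
      ++depth;
    else if (s[i] == '(' && --depth == 0)
      return i;
  }
  return std::string_view::npos;
}

// Assumes the argument list is the last balanced "(...)" in the name and
// that everything before it is "context::basename".
inline bool NaiveSplitName(std::string_view full, NaiveSplit &out) {
  const std::size_t close = full.rfind(')');
  if (close == std::string_view::npos)
    return false;
  const std::size_t open = FindMatchingOpenParen(full, close);
  if (open == std::string_view::npos || open == 0)
    return false;

  out.arguments = full.substr(open, close - open + 1);
  out.qualifiers = full.substr(close + 1);
  while (!out.qualifiers.empty() && out.qualifiers.front() == ' ')
    out.qualifiers.remove_prefix(1);

  const std::string_view head = full.substr(0, open);
  const std::size_t sep = head.rfind("::");
  if (sep == std::string_view::npos) {
    out.context = std::string_view();
    out.basename = head;
  } else {
    out.context = head.substr(0, sep);
    out.basename = head.substr(sep + 2);
  }
  return true;
}
```

For `lldb::SBTarget::GetBreakpointAtIndex(unsigned int) const` this gives the context `lldb::SBTarget` and the basename `GetBreakpointAtIndex`. For `void (*&std::_Any_data::_M_access<void (*)()>())()` it reports `()` as the arguments and `_M_access<void (*)()>())` as the basename, which is the kind of result a token-based parser is meant to avoid.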

Diff Detail

Event Timeline

eugene created this revision. Mar 28 2017, 8:36 PM

eugene created this object with visibility "eugene (Eugene Zemtsov)".

eugene created this object with edit policy "eugene (Eugene Zemtsov)".

Improve template method parsing accuracy.

Herald added a subscriber: mgorny. Mar 29 2017, 7:01 PM

I can't help but feel that this could have been done in a simpler way, but then again, some of the cases you have dug up are quite tricky.
I think we should do some performance measurements to see whether this needs more optimising or it's fine as is.

I propose the following benchmark:
bin/lldb bin/clang

make sure clang is statically linked

breakpoint set --name non_exisiting_function

Clang needs to be statically linked so we can access all its symbols without running it (which would skew the benchmark) -- this is the reason we cannot use lldb itself as most of its symbols are in liblldb.

Setting the breakpoint on a nonexistent function avoids us timing the breakpoint setting machinery, while still getting every symbol in the executable parsed.

If the performance is OK, I am quite happy with this, apart from some stylistic nits.

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
210	How about the following api: if (auto function = CPlusPlusNameParser::ParseAsFunctionDefinition(m_full.GetStringRef())) { ...
267	Same here
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
63	Wouldn't it be better to change the type of m_next_token_index to size_t?
189	Is this really the case for the types we are interested in? I would have hoped that this would get simplified to `std::enable_if<true, bool>` before it reaches us?
573	Could this be written as: `m_text.take_front(end_pos).drop_front(start_pos)`?
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h
34	I think we don't put m_ on fields of dumb structs that are meant to be accessed directly.
95	Please make the type move-only. Otherwise you will have a fun time debugging accidental copies. (You already have one, although it is benign now)
unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
105	Would defining operator<< for std::ostream and StringRef (local to this test) enable you to get rid of these? I've run into this before but was never annoyed enough to actually do it. :/
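The `take_front(end_pos).drop_front(start_pos)` suggestion for line 573 above selects the half-open range [start_pos, end_pos) by trimming from both ends. The original expression at that line is not shown on this page, so the following is only a sketch of the equivalence, written against `std::string_view` rather than `llvm::StringRef`; `SliceByLength` and `SliceByTrimming` are hypothetical helper names. Both assume `start_pos <= end_pos <= text.size()`.

```cpp
#include <cstddef>
#include <string_view>

// Offset-plus-length form.
inline std::string_view SliceByLength(std::string_view text,
                                      std::size_t start_pos,
                                      std::size_t end_pos) {
  return text.substr(start_pos, end_pos - start_pos);
}

// Chained form: keep the first end_pos characters, then drop the first
// start_pos of those. Mirrors take_front(end_pos).drop_front(start_pos).
inline std::string_view SliceByTrimming(std::string_view text,
                                        std::size_t start_pos,
                                        std::size_t end_pos) {
  std::string_view front = text.substr(0, end_pos);
  front.remove_prefix(start_pos); // undefined if start_pos > front.size()
  return front;
}
```

The chained form avoids the `end_pos - start_pos` subtraction, which is one less place to get an off-by-one wrong.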

Addressing code-review comments.
Most notable change: MethodName::Parse() tries a simple version of the name parser before invoking the full power of CPlusPlusNameParser. It really helps with the perf.

I think we should do some performance measurements to see whether this needs more optimising or it's fine as is.

I propose the following benchmark:
bin/lldb bin/clang
make sure clang is statically linked
breakpoint set --name non_exisiting_function

Setting the breakpoint on a nonexistent function avoids us timing the breakpoint setting machinery, while still getting every symbol in the executable parsed.

On the clang breakpoint benchmark you proposed, it is hard to notice much of a difference. clang binary has about 500k functions.
With new parser it takes about 11s to try to set a breakpoint, on the old one it's about 10s. (release version of LLDB, debug static version of clang)

I decided to use a hybrid approach, when we use old parsing code for simple cases and call new parser only when it fails.
80% of clang functions are simple enough that we don't really need the new parser, so it helped to bring clang breakpoint test back to 10s.
I think it's reasonable to assume that similar distribution is true for most programs, and most of their functions can be parsed with the old code.

I don't think we can really improve performance of a new parser without giving up on clang::Lexer, and I'm reluctant to do it.
So I propose to keep hybrid approach and call a new parser only for complicated cases.
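The hybrid approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the diff: `ParsedName`, `SlowParser` and `ParseHybrid` are hypothetical names, the input is taken to be the part of the name before the argument list, and the slow parser is passed in as a callable so the sketch stays self-contained. Only `IsTrivialBasename` follows something in the revision, namely the regular expression `^~?([A-Za-z_][A-Za-z_0-9]*)$` quoted in its comment.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

struct ParsedName {
  std::string_view context;
  std::string_view basename;
};

// Accepts names matching ^~?[A-Za-z_][A-Za-z_0-9]*$ without using a regex.
inline bool IsTrivialBasename(std::string_view s) {
  if (!s.empty() && s.front() == '~')
    s.remove_prefix(1);
  if (s.empty())
    return false;
  if (!std::isalpha(static_cast<unsigned char>(s[0])) && s[0] != '_')
    return false;
  for (char c : s)
    if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_')
      return false;
  return true;
}

// Stand-in for the expensive, lexer-based parser.
using SlowParser = std::function<std::optional<ParsedName>(std::string_view)>;

// Try the cheap split first; hand the name to the slow parser only when the
// cheap result does not look like a plain identifier.
inline std::optional<ParsedName> ParseHybrid(std::string_view head,
                                             const SlowParser &slow_parser) {
  ParsedName fast;
  const std::size_t sep = head.rfind("::");
  if (sep == std::string_view::npos) {
    fast.basename = head;
  } else {
    fast.context = head.substr(0, sep);
    fast.basename = head.substr(sep + 2);
  }
  if (IsTrivialBasename(fast.basename))
    return fast;
  return slow_parser(head);
}
```

With this shape, a name such as `A::B::fun` never reaches the slow parser, while `ns::foo<int>` does because its basename contains template arguments.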

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
210	If you don't mind I'll leave it as it is. I understand that it's very tempting to have two simple functions ParseAsFunctionDefinition and ParseAsFullName instead of a class, but I can imagine calling second one if first one fails, and in this case it'll be good that parser doesn't need to tokenize string all over again.
267	see above
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
63	Unsigned types are dangerous. I prefer to avoid them wherever I can. As far as I remember many members of the C++ standard committee publicly regretted the fact that .size() returns an unsigned type.
189	Yes. I was very surprised as well. But apparently the compiler is sometimes lazy and doesn't simplify everything :( Running `nm -C --defined-only clang | grep std::enable_if` gives a few symbols like this: `std::enable_if<(10u)<(64), bool>::type llvm::isUInt<10u>(unsigned long)`
573	Thanks! I still have a lot to learn about llvm classes.
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h
95	Good catch! Thanks!
unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp
105	I tried to do it, but it didn't work for me after a few tries. I decided to just convert it to std::string, which solves the same problem.

In D31451#714850, @eugene wrote:

I did some micro-benchmarking and on average the new parser is ~3 times slower than the old one. (new parser - ~200k strings/s, old parser - ~700k strings/s)
clang::Lexer appears to be the slowest part of it.

On the clang breakpoint benchmark you proposed, it is hard to notice much of a difference. clang binary has about 500k functions.
With new parser it takes about 11s to try to set a breakpoint, on the old one it's about 10s. (release version of LLDB, debug static version of clang)

I decided to use a hybrid approach, when we use old parsing code for simple cases and call new parser only when it fails.
80% of clang functions are simple enough that we don't really need the new parser, so it helped to bring clang breakpoint test back to 10s.
I think it's reasonable to assume that similar distribution is true for most programs, and most of their functions can be parsed with the old code.

I don't think we can really improve performance of a new parser without giving up on clang::Lexer, and I'm reluctant to do it.
So I propose to keep hybrid approach and call a new parser only for complicated cases.

I like that idea. Let's go with that.

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp
210	Ok, that makes sense -- I didn't expect that would actually work. I suppose one can always write `CPlusPlusNameParser(foo).ParseAsFunctionDefinition()` when he wants to make it a one-liner and doesn't care about tokenization reuse.
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
19	Are these necessary? You seem to prefix every occurrence of Optional and None anyway...
63	Well... they are most dangerous when you try to combine them with signed types, which is exactly what you are doing now... :) It's also contrary to how things are done in other parts of the code base and goes against the principle of having as few nonsensical values for your variables as possible. So, I still think this (and all other variables you use for token indexes) should be size_t.
258	You don't need to go `ConstString` here. If you wanted to avoid strlen computation, just make this `constexpr StringLiteral`.
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h
95	Please also delete the assignment operator. Having the copy constructor deleted and the assignment operator working is confusing.
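The move-only request for line 95 above amounts to deleting both copy operations while keeping the move operations. The class at that line is not shown on this page, so `TokenCursor` below is a hypothetical stand-in used only to show the pattern.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical move-only type holding a token index.
class TokenCursor {
public:
  explicit TokenCursor(std::size_t index) : m_index(index) {}

  // Copying is disabled for both construction and assignment, so an
  // accidental copy is a compile error instead of a silent duplicate.
  TokenCursor(const TokenCursor &) = delete;
  TokenCursor &operator=(const TokenCursor &) = delete;

  // Moving stays available.
  TokenCursor(TokenCursor &&other) noexcept : m_index(other.m_index) {}
  TokenCursor &operator=(TokenCursor &&other) noexcept {
    m_index = other.m_index;
    return *this;
  }

  std::size_t index() const { return m_index; }

private:
  std::size_t m_index;
};

static_assert(!std::is_copy_constructible<TokenCursor>::value,
              "copy construction must be disabled");
static_assert(!std::is_copy_assignable<TokenCursor>::value,
              "copy assignment must be disabled");
static_assert(std::is_move_constructible<TokenCursor>::value,
              "move construction must stay enabled");
```

The `static_assert` lines document the intent and fail the build if someone later re-enables copying.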

One note on benchmarking: I did some performance profiling on LLDB using a similar approach to what Pavel suggested and if I remember correctly only ~10% of the time was spent on C++ name parsing (~15% was C++ demangling, ~50% was debug_info parsing, and the rest was fairly well distributed). Because of this I think some targeted micro benchmark will be much more useful to measure the performance of this code than an end-to-end test, as an e2e test would have a low signal-to-noise ratio.
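A targeted micro-benchmark of the kind suggested here could look roughly like the sketch below. It is an assumption-laden illustration: `MeasureNamesPerSecond` is a hypothetical helper, and the parser under test is passed in as a callable rather than being the real LLDB parser.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Runs `parse` over every name `repetitions` times and returns the observed
// throughput in names per second (0.0 if the clock did not advance).
inline double MeasureNamesPerSecond(
    const std::vector<std::string> &names,
    const std::function<void(const std::string &)> &parse,
    std::size_t repetitions) {
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t rep = 0; rep < repetitions; ++rep)
    for (const std::string &name : names)
      parse(name);
  const auto end = std::chrono::steady_clock::now();

  const std::chrono::duration<double> elapsed = end - start;
  const double total =
      static_cast<double>(names.size()) * static_cast<double>(repetitions);
  return elapsed.count() > 0.0 ? total / elapsed.count() : 0.0;
}
```

Feeding the same list of demangled names to the old and the new parser gives two throughput figures that can be compared directly, without the noise from demangling and debug_info parsing.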

Addressing review comments

eugene updated this revision to Diff 93694. Mar 31 2017, 12:52 PM

eugene marked an inline comment as done.

In D31451#715649, @tberghammer wrote:

Because of this I think some targeted micro benchmark will be much more useful to measure the performance of this code than an end-to-end test, as an e2e test would have a low signal-to-noise ratio.

I did some micro-benchmarking and on average the new parser is ~3 times slower than the old one. (new parser - ~200k strings/s, old parser - ~700k strings/s)
clang::Lexer appears to be the slowest part of it.
I mitigate this performance loss, by calling simplified parsing code for simple cases and calling new parser only when the old one fails.

source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp
19	Well, I used None. Now I use Optional as well.

Thank you

In D31451#715664, @eugene wrote:

In D31451#715649, @tberghammer wrote:

Because of this I think some targeted micro benchmark will be much more useful to measure the performance of this code than an end-to-end test, as an e2e test would have a low signal-to-noise ratio.

I did some micro-benchmarking and on average the new parser is ~3 times slower than the old one. (new parser - ~200k strings/s, old parser - ~700k strings/s)
clang::Lexer appears to be the slowest part of it.
I mitigate this performance loss, by calling simplified parsing code for simple cases and calling new parser only when the old one fails.

It was pretty clear that the new parser will be slower than the old one, even if I couldn't tell whether it would be 2x or 20x. That's why I wanted a macro benchmark to see whether that matters on the grand scale of things. If you say that 10% of time is name parsing, then we definitely don't want to make that 30%, which means the decision to use two parsers was correct.

This revision is now accepted and ready to land. Mar 31 2017, 1:58 PM

Closed by commit rL299374: New C++ function name parsing logic (authored by eugene). Apr 3 2017, 12:12 PM

This revision was automatically updated to reflect the committed changes.

eugene mentioned this in rL299721: New C++ function name parsing logic (Resubmit). Apr 6 2017, 3:48 PM

I'm sorry, I don't have time to actually review the code here for correctness... But can you make sure that this also rejects a two or three field selector, not just "selector:" but "selector:otherField:"? That seems sufficiently different that you might get the one : but not the two : form right. You could test 3 & more colons, but at that point it's probably overkill.

This code is making debugging of large C++ apps so slow it is unusable...

Herald added subscribers: llvm-commits, hintonda. Jan 24 2018, 10:23 AM

This is part of the problem, but not the entire thing.. We had a mangled name:
_ZNK3shk6detail17CallbackPublisherIZNS_5ThrowERKNSt15__exception_ptr13exception_ptrEEUlOT_E_E9SubscribeINS0_9ConcatMapINS0_18CallbackSubscriberIZNS_6GetAllIiNS1_IZZNS_9ConcatMapIZNS_6ConcatIJNS1_IZZNS_3MapIZZNS_7IfEmptyIS9_EEDaS7_ENKUlS6_E_clINS1_IZZNS_4TakeIiEESI_S7_ENKUlS6_E_clINS1_IZZNS_6FilterIZNS_9ElementAtEmEUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZZNSL_ImEESI_S7_ENKUlS6_E_clINS1_IZNS_4FromINS0_22InfiniteRangeContainerIiEEEESI_S7_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESI_S6_EUlS7_E_EESI_S7_ENKUlS6_E_clIS14_EESI_S6_EUlS7_E_EERNS1_IZZNSH_IS9_EESI_S7_ENKSK_IS14_EESI_S6_EUlS7_E0_EEEEESI_DpOT_EUlS7_E_EESI_S7_ENKUlS6_E_clINS1_IZNS_5StartIJZNS_4JustIJS19_S1C_EEESI_S1F_EUlvE_ZNS1K_IJS19_S1C_EEESI_S1F_EUlvE0_EEESI_S1F_EUlS7_E_EEEESI_S6_EUlS7_E_EEEESt6vectorIS6_SaIS6_EERKT0_NS_12ElementCountEbEUlS7_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlOS3_E_ZNSD_IiS1Q_EES1T_S1W_S1X_bEUlvE_EES1G_S1O_E25ConcatMapValuesSubscriberEEEDaS7_

It demangles to something that is 72MB in size!!! So yes, this code would take a while to parse all of that. So we need a way to deal with large C++ strings more effectively without having slow performance.

Greg, this name is amazing. My c++filt refuses to demangle it. We can probably give up on parsing C++ names if they're longer than 1000 characters or something.

The funny thing is this is only 981 characters long... Not sure what the right cutoff would be...

This code is processing demangled names. Since you say (I could not get my demangler to process it either) the symbol demangles to a multi-megabyte name, we can probably make the cutoff even longer than 1000 bytes.

OTOH, if we abort demangling of such names in the first place, then this code will not even get used.

In D31451#987537, @labath wrote:

This code is processing demangled names. Since you say (I could not get my demangler to process it either) the symbol demangles to a multi-megabyte name, we can probably make the cutoff even longer than 1000 bytes.

Problem is we don't have a cutoff when using the libc++ demangler. There is no such feature. c++filt just calls into the system demangler (abi::__cxa_demangle(mangled_name, NULL, NULL, NULL)). We seem to now try our fast demangler, and then fall back to "llvm::itaniumDemangle(...)".

We could switch over to using llvm::itaniumDemangle() all the time and then we could modify this call to have a max length. I believe the code inside llvm::itaniumDemangle is currently an exact local copy of abi::__cxa_demangle(), so it would involve some maintenance as they try to keep in sync with abi::__cxa_demangle() if we modify it...

OTOH, if we abort demangling of such names in the first place, then this code will not even get used.

The main question is how do we know what we should demangle and what we shouldn't _before_ we try to demangle it. No easy answer to that.

Revision Contents

Path	Size
source/Plugins/Language/CPlusPlus/CMakeLists.txt	1 line
source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.h	19 lines
source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp	171 lines
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h	176 lines
source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp	614 lines
unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp	124 lines

Diff 93693

source/Plugins/Language/CPlusPlus/CMakeLists.txt

	add_lldb_library(lldbPluginCPlusPlusLanguage PLUGIN			add_lldb_library(lldbPluginCPlusPlusLanguage PLUGIN
	BlockPointer.cpp			BlockPointer.cpp
	CPlusPlusLanguage.cpp			CPlusPlusLanguage.cpp
				CPlusPlusNameParser.cpp
	CxxStringTypes.cpp			CxxStringTypes.cpp
	LibCxx.cpp			LibCxx.cpp
	LibCxxAtomic.cpp			LibCxxAtomic.cpp
	LibCxxInitializerList.cpp			LibCxxInitializerList.cpp
	LibCxxList.cpp			LibCxxList.cpp
	LibCxxMap.cpp			LibCxxMap.cpp
	LibCxxUnorderedMap.cpp			LibCxxUnorderedMap.cpp
	LibCxxVector.cpp			LibCxxVector.cpp
	Show All 14 Lines

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.h

Show All 23 Lines
#include "lldb/lldb-private.h"		#include "lldb/lldb-private.h"

namespace lldb_private {		namespace lldb_private {

class CPlusPlusLanguage : public Language {		class CPlusPlusLanguage : public Language {
public:		public:
class MethodName {		class MethodName {
public:		public:
enum Type {
eTypeInvalid,
eTypeUnknownMethod,
eTypeClassMethod,
eTypeInstanceMethod
};

MethodName()		MethodName()
: m_full(), m_basename(), m_context(), m_arguments(), m_qualifiers(),		: m_full(), m_basename(), m_context(), m_arguments(), m_qualifiers(),
m_type(eTypeInvalid), m_parsed(false), m_parse_error(false) {}		m_parsed(false), m_parse_error(false) {}

MethodName(const ConstString &s)		MethodName(const ConstString &s)
: m_full(s), m_basename(), m_context(), m_arguments(), m_qualifiers(),		: m_full(s), m_basename(), m_context(), m_arguments(), m_qualifiers(),
m_type(eTypeInvalid), m_parsed(false), m_parse_error(false) {}		m_parsed(false), m_parse_error(false) {}

void Clear();		void Clear();

bool IsValid() {		bool IsValid() {
if (!m_parsed)		if (!m_parsed)
Parse();		Parse();
if (m_parse_error)		if (m_parse_error)
return false;		return false;
if (m_type == eTypeInvalid)
return false;
return (bool)m_full;		return (bool)m_full;
}		}

Type GetType() const { return m_type; }

const ConstString &GetFullName() const { return m_full; }		const ConstString &GetFullName() const { return m_full; }

std::string GetScopeQualifiedName();		std::string GetScopeQualifiedName();

llvm::StringRef GetBasename();		llvm::StringRef GetBasename();

llvm::StringRef GetContext();		llvm::StringRef GetContext();

llvm::StringRef GetArguments();		llvm::StringRef GetArguments();

llvm::StringRef GetQualifiers();		llvm::StringRef GetQualifiers();

protected:		protected:
void Parse();		void Parse();
		bool TrySimplifiedParse();

ConstString m_full; // Full name:		ConstString m_full; // Full name:
// "lldb::SBTarget::GetBreakpointAtIndex(unsigned int)		// "lldb::SBTarget::GetBreakpointAtIndex(unsigned int)
// const"		// const"
llvm::StringRef m_basename; // Basename: "GetBreakpointAtIndex"		llvm::StringRef m_basename; // Basename: "GetBreakpointAtIndex"
llvm::StringRef m_context; // Decl context: "lldb::SBTarget"		llvm::StringRef m_context; // Decl context: "lldb::SBTarget"
llvm::StringRef m_arguments; // Arguments: "(unsigned int)"		llvm::StringRef m_arguments; // Arguments: "(unsigned int)"
llvm::StringRef m_qualifiers; // Qualifiers: "const"		llvm::StringRef m_qualifiers; // Qualifiers: "const"
Type m_type;
bool m_parsed;		bool m_parsed;
bool m_parse_error;		bool m_parse_error;
};		};

CPlusPlusLanguage() = default;		CPlusPlusLanguage() = default;

~CPlusPlusLanguage() override = default;		~CPlusPlusLanguage() override = default;

Show All 24 Lines	public:

// Extract C++ context and identifier from a string using heuristic matching		// Extract C++ context and identifier from a string using heuristic matching
// (as opposed to		// (as opposed to
// CPlusPlusLanguage::MethodName which has to have a fully qualified C++ name		// CPlusPlusLanguage::MethodName which has to have a fully qualified C++ name
// with parens and arguments.		// with parens and arguments.
// If the name is a lone C identifier (e.g. C) or a qualified C identifier		// If the name is a lone C identifier (e.g. C) or a qualified C identifier
// (e.g. A::B::C) it will return true,		// (e.g. A::B::C) it will return true,
// and identifier will be the identifier (C and C respectively) and the		// and identifier will be the identifier (C and C respectively) and the
// context will be "" and "A::B::" respectively.		// context will be "" and "A::B" respectively.
// If the name fails the heuristic matching for a qualified or unqualified		// If the name fails the heuristic matching for a qualified or unqualified
// C/C++ identifier, then it will return false		// C/C++ identifier, then it will return false
// and identifier and context will be unchanged.		// and identifier and context will be unchanged.

static bool ExtractContextAndIdentifier(const char *name,		static bool ExtractContextAndIdentifier(const char *name,
llvm::StringRef &context,		llvm::StringRef &context,
llvm::StringRef &identifier);		llvm::StringRef &identifier);

Show All 29 Lines

source/Plugins/Language/CPlusPlus/CPlusPlusLanguage.cpp

Show All 15 Lines
// C++ Includes		// C++ Includes
#include <functional>		#include <functional>
#include <memory>		#include <memory>
#include <mutex>		#include <mutex>
#include <set>		#include <set>

// Other libraries and framework includes		// Other libraries and framework includes
#include "llvm/ADT/StringRef.h"		#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Threading.h"

// Project includes		// Project includes
#include "lldb/Core/PluginManager.h"		#include "lldb/Core/PluginManager.h"
#include "lldb/Core/UniqueCStringMap.h"		#include "lldb/Core/UniqueCStringMap.h"
#include "lldb/DataFormatters/CXXFunctionPointer.h"		#include "lldb/DataFormatters/CXXFunctionPointer.h"
#include "lldb/DataFormatters/DataVisualization.h"		#include "lldb/DataFormatters/DataVisualization.h"
#include "lldb/DataFormatters/FormattersHelpers.h"		#include "lldb/DataFormatters/FormattersHelpers.h"
#include "lldb/DataFormatters/VectorType.h"		#include "lldb/DataFormatters/VectorType.h"
#include "lldb/Utility/ConstString.h"		#include "lldb/Utility/ConstString.h"
#include "lldb/Utility/FastDemangle.h"		#include "lldb/Utility/FastDemangle.h"
#include "lldb/Utility/Log.h"		#include "lldb/Utility/Log.h"
#include "lldb/Utility/RegularExpression.h"		#include "lldb/Utility/RegularExpression.h"

#include "BlockPointer.h"		#include "BlockPointer.h"
		#include "CPlusPlusNameParser.h"
#include "CxxStringTypes.h"		#include "CxxStringTypes.h"
#include "LibCxx.h"		#include "LibCxx.h"
#include "LibCxxAtomic.h"		#include "LibCxxAtomic.h"
#include "LibStdcpp.h"		#include "LibStdcpp.h"

using namespace lldb;		using namespace lldb;
using namespace lldb_private;		using namespace lldb_private;
using namespace lldb_private::formatters;		using namespace lldb_private::formatters;
Show All 33 Lines
}		}

void CPlusPlusLanguage::MethodName::Clear() {		void CPlusPlusLanguage::MethodName::Clear() {
m_full.Clear();		m_full.Clear();
m_basename = llvm::StringRef();		m_basename = llvm::StringRef();
m_context = llvm::StringRef();		m_context = llvm::StringRef();
m_arguments = llvm::StringRef();		m_arguments = llvm::StringRef();
m_qualifiers = llvm::StringRef();		m_qualifiers = llvm::StringRef();
m_type = eTypeInvalid;
m_parsed = false;		m_parsed = false;
m_parse_error = false;		m_parse_error = false;
}		}

bool ReverseFindMatchingChars(const llvm::StringRef &s,		static bool ReverseFindMatchingChars(const llvm::StringRef &s,
const llvm::StringRef &left_right_chars,		const llvm::StringRef &left_right_chars,
size_t &left_pos, size_t &right_pos,		size_t &left_pos, size_t &right_pos,
size_t pos = llvm::StringRef::npos) {		size_t pos = llvm::StringRef::npos) {
assert(left_right_chars.size() == 2);		assert(left_right_chars.size() == 2);
left_pos = llvm::StringRef::npos;		left_pos = llvm::StringRef::npos;
const char left_char = left_right_chars[0];		const char left_char = left_right_chars[0];
const char right_char = left_right_chars[1];		const char right_char = left_right_chars[1];
pos = s.find_last_of(left_right_chars, pos);		pos = s.find_last_of(left_right_chars, pos);
if (pos == llvm::StringRef::npos || s[pos] == left_char)		if (pos == llvm::StringRef::npos || s[pos] == left_char)
return false;		return false;
right_pos = pos;		right_pos = pos;
Show All 9 Lines	if (s[pos] == left_char) {
}		}
} else if (s[pos] == right_char) {		} else if (s[pos] == right_char) {
++depth;		++depth;
}		}
}		}
return false;		return false;
}		}

static bool IsValidBasename(const llvm::StringRef &basename) {		static bool IsTrivialBasename(const llvm::StringRef &basename) {
// Check that the basename matches with the following regular expression or is		// Check that the basename matches with the following regular expression
// an operator name:		// "^~?([A-Za-z_][A-Za-z_0-9]*)$"
// "^~?([A-Za-z_][A-Za-z_0-9]*)(<.*>)?$"
// We are using a hand written implementation because it is significantly more		// We are using a hand written implementation because it is significantly more
// efficient then		// efficient then
// using the general purpose regular expression library.		// using the general purpose regular expression library.
size_t idx = 0;		size_t idx = 0;
if (basename.size() > 0 && basename[0] == '~')		if (basename.size() > 0 && basename[0] == '~')
idx = 1;		idx = 1;

if (basename.size() <= idx)		if (basename.size() <= idx)
Show All 10 Lines	if (!std::isalnum(basename[idx]) && basename[idx] != '_')
break;		break;
++idx;		++idx;
}		}

// We processed all characters. It is a vaild basename.		// We processed all characters. It is a vaild basename.
if (idx == basename.size())		if (idx == basename.size())
return true;		return true;

// Check for basename with template arguments
// TODO: Improve the quality of the validation with validating the template
// arguments
if (basename[idx] == '<' && basename.back() == '>')
return true;

// Check if the basename is a vaild C++ operator name
if (!basename.startswith("operator"))
return false;		return false;

static RegularExpression g_operator_regex(
llvm::StringRef("^(operator)( "
"?)([A-Za-z_][A-Za-z_0-9]*|\\(\\)|"
"\\[\\]|[\\^<>=!\\/"
"*+-]+)(<.*>)?(\\[\\])?$"));
std::string basename_str(basename.str());
return g_operator_regex.Execute(basename_str, nullptr);
}		}

void CPlusPlusLanguage::MethodName::Parse() {		bool CPlusPlusLanguage::MethodName::TrySimplifiedParse() {
if (!m_parsed && m_full) {		// This method tries to parse simple method definitions
// ConstString mangled;		// which are presumably most comman in user programs.
// m_full.GetMangledCounterpart(mangled);		// Definitions that can be parsed by this function don't have return types
// printf ("\n parsing = '%s'\n", m_full.GetCString());		// and templates in the name.
// if (mangled)		// A::B::C::fun(std::vector<T> &) const
// printf (" mangled = '%s'\n", mangled.GetCString());
m_parse_error = false;
m_parsed = true;
llvm::StringRef full(m_full.GetCString());

size_t arg_start, arg_end;		size_t arg_start, arg_end;
		llvm::StringRef full(m_full.GetCString());
llvm::StringRef parens("()", 2);		llvm::StringRef parens("()", 2);
if (ReverseFindMatchingChars(full, parens, arg_start, arg_end)) {		if (ReverseFindMatchingChars(full, parens, arg_start, arg_end)) {
m_arguments = full.substr(arg_start, arg_end - arg_start + 1);		m_arguments = full.substr(arg_start, arg_end - arg_start + 1);
if (arg_end + 1 < full.size())		if (arg_end + 1 < full.size())
m_qualifiers = full.substr(arg_end + 1);		m_qualifiers = full.substr(arg_end + 1).ltrim();
if (arg_start > 0) {
		if (arg_start == 0)
		return false;
size_t basename_end = arg_start;		size_t basename_end = arg_start;
size_t context_start = 0;		size_t context_start = 0;
size_t context_end = llvm::StringRef::npos;		size_t context_end = full.rfind(':', basename_end);
if (basename_end > 0 && full[basename_end - 1] == '>') {
// TODO: handle template junk...
// Templated function
size_t template_start, template_end;
llvm::StringRef lt_gt("<>", 2);
if (ReverseFindMatchingChars(full, lt_gt, template_start,
template_end, basename_end)) {
// Check for templated functions that include return type like:
// 'void foo<Int>()'
context_start = full.rfind(' ', template_start);
if (context_start == llvm::StringRef::npos)
context_start = 0;
else
++context_start;

context_end = full.rfind(':', template_start);
if (context_end == llvm::StringRef::npos ||
context_end < context_start)
context_end = context_start;
} else {
context_end = full.rfind(':', basename_end);
}
} else if (context_end == llvm::StringRef::npos) {
context_end = full.rfind(':', basename_end);
}

if (context_end == llvm::StringRef::npos)		if (context_end == llvm::StringRef::npos)
m_basename = full.substr(0, basename_end);		m_basename = full.substr(0, basename_end);
else {		else {
if (context_start < context_end)		if (context_start < context_end)
m_context =		m_context = full.substr(context_start, context_end - 1 - context_start);
full.substr(context_start, context_end - 1 - context_start);
const size_t basename_begin = context_end + 1;		const size_t basename_begin = context_end + 1;
m_basename =		m_basename = full.substr(basename_begin, basename_end - basename_begin);
full.substr(basename_begin, basename_end - basename_begin);
}
m_type = eTypeUnknownMethod;
} else {
m_parse_error = true;
return;
}		}

if (!IsValidBasename(m_basename)) {		if (IsTrivialBasename(m_basename)) {
		return true;
		} else {
// The C++ basename doesn't match our regular expressions so this can't		// The C++ basename doesn't match our regular expressions so this can't
// be a valid C++ method, clear everything out and indicate an error		// be a valid C++ method, clear everything out and indicate an error
m_context = llvm::StringRef();		m_context = llvm::StringRef();
m_basename = llvm::StringRef();		m_basename = llvm::StringRef();
m_arguments = llvm::StringRef();		m_arguments = llvm::StringRef();
m_qualifiers = llvm::StringRef();		m_qualifiers = llvm::StringRef();
m_parse_error = true;		return false;
}		}
		}
		return false;
		}

		void CPlusPlusLanguage::MethodName::Parse() {
		if (!m_parsed && m_full) {
		if (TrySimplifiedParse()) {
		m_parse_error = false;
		} else {
		CPlusPlusNameParser parser(m_full.GetStringRef());
		if (auto function = parser.ParseAsFunctionDefinition()) {
		m_basename = function.getValue().name.basename;
		m_context = function.getValue().name.context;
		m_arguments = function.getValue().arguments;
		m_qualifiers = function.getValue().qualifiers;
		m_parse_error = false;
} else {		} else {
m_parse_error = true;		m_parse_error = true;
}		}
}		}
		m_parsed = true;
		}
}		}

llvm::StringRef CPlusPlusLanguage::MethodName::GetBasename() {		llvm::StringRef CPlusPlusLanguage::MethodName::GetBasename() {
if (!m_parsed)		if (!m_parsed)
Parse();		Parse();
return m_basename;		return m_basename;
}		}

Show All 13 Lines	llvm::StringRef CPlusPlusLanguage::MethodName::GetQualifiers() {
if (!m_parsed)		if (!m_parsed)
Parse();		Parse();
return m_qualifiers;		return m_qualifiers;
}		}

std::string CPlusPlusLanguage::MethodName::GetScopeQualifiedName() {		std::string CPlusPlusLanguage::MethodName::GetScopeQualifiedName() {
if (!m_parsed)		if (!m_parsed)
Parse();		Parse();
if (m_basename.empty() || m_context.empty())		if (m_context.empty())
return std::string();		return m_basename;

std::string res;		std::string res;
res += m_context;		res += m_context;
res += "::";		res += "::";
res += m_basename;		res += m_basename;

return res;		return res;
}		}

bool CPlusPlusLanguage::IsCPPMangledName(const char *name) {		bool CPlusPlusLanguage::IsCPPMangledName(const char *name) {
// FIXME, we should really run through all the known C++ Language plugins and		// FIXME, we should really run through all the known C++ Language plugins and
// ask each one if		// ask each one if
// this is a C++ mangled name, but we can put that off till there is actually		// this is a C++ mangled name, but we can put that off till there is actually
// more than one		// more than one
// we care about.		// we care about.

return (name != nullptr && name[0] == '_' && name[1] == 'Z');		return (name != nullptr && name[0] == '_' && name[1] == 'Z');
}		}

bool CPlusPlusLanguage::ExtractContextAndIdentifier(		bool CPlusPlusLanguage::ExtractContextAndIdentifier(
const char *name, llvm::StringRef &context, llvm::StringRef &identifier) {		const char *name, llvm::StringRef &context, llvm::StringRef &identifier) {
static RegularExpression g_basename_regex(llvm::StringRef(		CPlusPlusNameParser parser(name);
"^(([A-Za-z_][A-Za-z_0-9]*::)*)(~?[A-Za-z_~][A-Za-z_0-9]*)$"));		if (auto full_name = parser.ParseAsFullName()) {
RegularExpression::Match match(4);		identifier = full_name.getValue().basename;
		labath: Same here
		eugene: see above
if (g_basename_regex.Execute(llvm::StringRef::withNullAsEmpty(name),		context = full_name.getValue().context;
&match)) {
match.GetMatchAtIndex(name, 1, context);
match.GetMatchAtIndex(name, 3, identifier);
return true;		return true;
}		}
return false;		return false;
}		}
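For context on why the regex on the left was replaced, the following is a minimal standalone sketch. It assumes `std::regex` in place of LLDB's `RegularExpression` wrapper and uses the removed pattern written with its `*` quantifiers; the helper name `SplitWithOldRegex` is hypothetical. It suggests that the pattern only accepts plain `a::b::c` identifiers and rejects anything containing template brackets or `(anonymous namespace)`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper, not part of the patch: applies the pattern from the
// removed code path. Group 1 is the context (including the trailing "::"),
// group 3 is the identifier.
bool SplitWithOldRegex(const std::string &name, std::string &context,
                       std::string &identifier) {
  static const std::regex kBasenameRegex(
      "^(([A-Za-z_][A-Za-z_0-9]*::)*)(~?[A-Za-z_~][A-Za-z_0-9]*)$");
  std::smatch match;
  if (!std::regex_match(name, match, kBasenameRegex))
    return false; // e.g. names with '<', '(' or spaces never match
  context = match[1].str();
  identifier = match[3].str();
  return true;
}
```

With this sketch, `foo::bar` splits into a context and an identifier, while `std::vector<int>::push_back` is rejected outright, which is the kind of input the token-based parser on the right is meant to handle.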

class CPPRuntimeEquivalents {		class CPPRuntimeEquivalents {
public:		public:
CPPRuntimeEquivalents() {		CPPRuntimeEquivalents() {
(853 lines not shown)

source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.h

This file was added.

				//===-- CPlusPlusNameParser.h ------------------------------------ C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#ifndef liblldb_CPlusPlusNameParser_h_
				#define liblldb_CPlusPlusNameParser_h_

				// C Includes
				// C++ Includes

				// Other libraries and framework includes
				#include "clang/Lex/Lexer.h"
				#include "llvm/ADT/Optional.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/ADT/StringRef.h"

				// Project includes
				#include "lldb/Utility/ConstString.h"
				#include "lldb/lldb-private.h"

				namespace lldb_private {

				// Helps to validate and obtain various parts of C++ definitions.
				class CPlusPlusNameParser {
				public:
				CPlusPlusNameParser(llvm::StringRef text) : m_text(text) { ExtractTokens(); }

				struct ParsedName {
				llvm::StringRef basename;
				labath: I think we don't put m_ for fields of dumb structs that are meant to be accessed directly.
				llvm::StringRef context;
				};

				struct ParsedFunction {
				ParsedName name;
				llvm::StringRef arguments;
				llvm::StringRef qualifiers;
				};

				// Treats given text as a function definition and parses it.
				// Function definition might or might not have a return type and this should
				// change parsing result.
				// Examples:
				// main(int, char const*)
				// T fun(int, bool)
				// std::vector<int>::push_back(int)
				// int& map<int, pair<short, int>>::operator[](short) const
				// int (*get_function(const char *))()
				llvm::Optional<ParsedFunction> ParseAsFunctionDefinition();

				// Treats given text as a potentially nested name of C++ entity (function,
				// class, field) and parses it.
				// Examples:
				// main
				// fun
				// std::vector<int>::push_back
				// map<int, pair<short, int>>::operator[]
				// func<C>(int, C&)::nested_class::method
				llvm::Optional<ParsedName> ParseAsFullName();

				private:
				// A C++ definition to parse.
				llvm::StringRef m_text;
				// Tokens extracted from m_text.
				llvm::SmallVector<clang::Token, 30> m_tokens;
				// Index of the next token to look at from m_tokens.
				size_t m_next_token_index = 0;

				// Half-open range [begin_index, end_index) of token indexes into m_tokens.
				struct Range {
				size_t begin_index = 0;
				size_t end_index = 0;

				Range() {}
				Range(size_t begin, size_t end) : begin_index(begin), end_index(end) {
				assert(end >= begin);
				}

				size_t size() const { return end_index - begin_index; }

				bool empty() const { return size() == 0; }
				};

				struct ParsedNameRanges {
				Range basename_range;
				Range context_range;
				};

				// Bookmark automatically restores parsing position (m_next_token_index)
				// when destructed unless it's manually removed with Remove().
				class Bookmark {
				labath: Please make the type move-only. Otherwise you will have a fun time debugging accidental copies. (You already have one, although it is benign now)
				eugene: Good catch! Thanks!
				labath: Please also delete the assignment operator. Having the copy constructor deleted and the assignment operator working is confusing.
				public:
				Bookmark(size_t &position)
				: m_position(position), m_position_value(position) {}
				Bookmark(const Bookmark &) = delete;
				Bookmark(Bookmark &&b)
				: m_position(b.m_position), m_position_value(b.m_position_value),
				m_restore(b.m_restore) {
				b.Remove();
				}
				Bookmark &operator=(Bookmark &&) = delete;
				Bookmark &operator=(const Bookmark &) = delete;

				void Remove() { m_restore = false; }
				size_t GetSavedPosition() { return m_position_value; }
				~Bookmark() {
				if (m_restore) {
				m_position = m_position_value;
				}
				}

				private:
				size_t &m_position;
				size_t m_position_value;
				bool m_restore = true;
				};

				bool HasMoreTokens();
				void Advance();
				void TakeBack();
				bool ConsumeToken(clang::tok::TokenKind kind);
				template <typename... Ts> bool ConsumeToken(Ts... kinds);
				Bookmark SetBookmark();
				size_t GetCurrentPosition();
				clang::Token &Peek();
				bool ConsumeBrackets(clang::tok::TokenKind left, clang::tok::TokenKind right);

				llvm::Optional<ParsedFunction> ParseFunctionImpl(bool expect_return_type);

				// Parses functions returning function pointers 'string (*f(int x))(float y)'
				llvm::Optional<ParsedFunction> ParseFuncPtr(bool expect_return_type);

				// Consumes function arguments enclosed within '(' ... ')'
				bool ConsumeArguments();

				// Consumes template arguments enclosed within '<' ... '>'
				bool ConsumeTemplateArgs();

				// Consumes '(anonymous namespace)'
				bool ConsumeAnonymousNamespace();

				// Consumes operator declaration like 'operator *' or 'operator delete []'
				bool ConsumeOperator();

				// Skips 'const' and 'volatile'
				void SkipTypeQualifiers();

				// Skips 'const', 'volatile', '&', '&&' in the end of the function.
				void SkipFunctionQualifiers();

				// Consumes built-in types like 'int' or 'unsigned long long int'
				bool ConsumeBuiltinType();

				// Skips pointers, references and their qualifiers like 'const * const &'
				void SkipPtrsAndRefs();

				// Consumes things like 'const * const &'
				bool ConsumePtrsAndRefs();

				// Consumes full type name like 'Namespace::Class<int>::Method()::InnerClass'
				bool ConsumeTypename();

				llvm::Optional<ParsedNameRanges> ParseFullNameImpl();
				llvm::StringRef GetTextForRange(const Range &range);

				// Populate m_tokens by calling clang lexer on m_text.
				void ExtractTokens();
				};

				} // namespace lldb_private

				#endif // liblldb_CPlusPlusNameParser_h_
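The Bookmark class in this header is a scope guard for backtracking: every Consume*/Parse* helper takes a bookmark on entry and calls Remove() only on success, so any early `return false` rewinds the token cursor. The following is a minimal standalone sketch of that pattern under simplifying assumptions; `PositionGuard` and `ConsumeExactly` are hypothetical names, and the move constructor from the patch is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for Bookmark: remembers a cursor value and restores it
// on destruction unless Remove() was called.
class PositionGuard {
public:
  explicit PositionGuard(std::size_t &position)
      : m_position(position), m_saved(position) {}
  PositionGuard(const PositionGuard &) = delete;
  PositionGuard &operator=(const PositionGuard &) = delete;

  void Remove() { m_restore = false; }
  ~PositionGuard() {
    if (m_restore)
      m_position = m_saved;
  }

private:
  std::size_t &m_position;
  std::size_t m_saved;
  bool m_restore = true;
};

// Hypothetical toy parser step: consumes exactly `count` items from a stream
// of `total` items, or leaves the cursor untouched when that is impossible.
bool ConsumeExactly(std::size_t &cursor, std::size_t total, std::size_t count) {
  PositionGuard guard(cursor);
  for (std::size_t i = 0; i < count; ++i) {
    if (cursor >= total)
      return false; // the guard rewinds the cursor on this path
    ++cursor;
  }
  guard.Remove(); // success: keep the advanced cursor
  return true;
}
```

The design choice is that failure paths need no explicit cleanup; only the single success path has to opt out of the rewind.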

source/Plugins/Language/CPlusPlus/CPlusPlusNameParser.cpp

This file was added.

				//===-- CPlusPlusNameParser.cpp ---------------------------------- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//

				#include "CPlusPlusNameParser.h"

				#include "clang/Basic/IdentifierTable.h"
				#include "llvm/ADT/StringMap.h"
				#include "llvm/Support/Threading.h"

				using namespace lldb;
				using namespace lldb_private;
				using llvm::Optional;
				using llvm::None;
				labath: Are these necessary? You seem to prefix every occurrence of Optional and None anyway...
				eugene: Well, I used None. Now I use Optional as well.
				using ParsedFunction = lldb_private::CPlusPlusNameParser::ParsedFunction;
				using ParsedName = lldb_private::CPlusPlusNameParser::ParsedName;
				namespace tok = clang::tok;

				Optional<ParsedFunction> CPlusPlusNameParser::ParseAsFunctionDefinition() {
				m_next_token_index = 0;
				Optional<ParsedFunction> result(None);

				// Try to parse the name as function without a return type specified
				// e.g. main(int, char*[])
				{
				Bookmark start_position = SetBookmark();
				result = ParseFunctionImpl(false);
				if (result && !HasMoreTokens())
				return result;
				}

				// Try to parse the name as function with function pointer return type
				// e.g. void (*get_func(const char*))()
				result = ParseFuncPtr(true);
				if (result)
				return result;

				// Finally try to parse the name as a function with non-function return type
				// e.g. int main(int, char*[])
				result = ParseFunctionImpl(true);
				return result;
				}

				Optional<ParsedName> CPlusPlusNameParser::ParseAsFullName() {
				m_next_token_index = 0;
				Optional<ParsedNameRanges> name_ranges = ParseFullNameImpl();
				if (!name_ranges)
				return None;
				ParsedName result;
				result.basename = GetTextForRange(name_ranges.getValue().basename_range);
				result.context = GetTextForRange(name_ranges.getValue().context_range);
				return result;
				}

				bool CPlusPlusNameParser::HasMoreTokens() {
				return m_next_token_index < m_tokens.size();
				}

				labath: Wouldn't it be better to change the type of m_next_token_index to size_t?
				eugene: Unsigned types are dangerous. I prefer to avoid them wherever I can. As far as I remember, many members of the C++ standard committee publicly regretted the fact that .size() returns an unsigned type.
				labath: Well... they are most dangerous when you try to combine them with signed types, which is exactly what you are doing now... :) It's also contrary to how things are done in other parts of the code base and goes against the principle of having as few nonsensical values for your variables as possible. So, I still think this (and all other variables you use for token indexes) should be size_t.
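The point about mixing signed and unsigned values can be illustrated with a small standalone sketch. The helper names below are hypothetical and not part of the patch; the explicit cast only spells out the conversion that the compiler would otherwise apply implicitly when an `int` index is compared against `size()` on common platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring HasMoreTokens() with a *signed* index.
// The int is converted to an unsigned type before the comparison, so a
// negative index wraps around to a huge value instead of comparing as small.
bool HasMoreSigned(int next_index, const std::vector<int> &tokens) {
  return static_cast<std::size_t>(next_index) < tokens.size();
}

// Same check with an unsigned index: there is no negative value to wrap,
// so the index type itself rules out one class of nonsensical states.
bool HasMoreUnsigned(std::size_t next_index, const std::vector<int> &tokens) {
  return next_index < tokens.size();
}
```

With three tokens, `HasMoreSigned(-1, tokens)` is false even though -1 looks smaller than 3, which is the kind of surprise the review comment is warning about.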
				void CPlusPlusNameParser::Advance() { ++m_next_token_index; }

				void CPlusPlusNameParser::TakeBack() { --m_next_token_index; }

				bool CPlusPlusNameParser::ConsumeToken(tok::TokenKind kind) {
				if (!HasMoreTokens())
				return false;

				if (!Peek().is(kind))
				return false;

				Advance();
				return true;
				}

				template <typename... Ts> bool CPlusPlusNameParser::ConsumeToken(Ts... kinds) {
				if (!HasMoreTokens())
				return false;

				if (!Peek().isOneOf(kinds...))
				return false;

				Advance();
				return true;
				}

				CPlusPlusNameParser::Bookmark CPlusPlusNameParser::SetBookmark() {
				return Bookmark(m_next_token_index);
				}

				size_t CPlusPlusNameParser::GetCurrentPosition() { return m_next_token_index; }

				clang::Token &CPlusPlusNameParser::Peek() {
				assert(HasMoreTokens());
				return m_tokens[m_next_token_index];
				}

				Optional<ParsedFunction>
				CPlusPlusNameParser::ParseFunctionImpl(bool expect_return_type) {
				Bookmark start_position = SetBookmark();
				if (expect_return_type) {
				// Consume return type if it's expected.
				if (!ConsumeTypename())
				return None;
				}

				auto maybe_name = ParseFullNameImpl();
				if (!maybe_name) {
				return None;
				}

				size_t argument_start = GetCurrentPosition();
				if (!ConsumeArguments()) {
				return None;
				}

				size_t qualifiers_start = GetCurrentPosition();
				SkipFunctionQualifiers();
				size_t end_position = GetCurrentPosition();

				ParsedFunction result;
				result.name.basename = GetTextForRange(maybe_name.getValue().basename_range);
				result.name.context = GetTextForRange(maybe_name.getValue().context_range);
				result.arguments = GetTextForRange(Range(argument_start, qualifiers_start));
				result.qualifiers = GetTextForRange(Range(qualifiers_start, end_position));
				start_position.Remove();
				return result;
				}

				Optional<ParsedFunction>
				CPlusPlusNameParser::ParseFuncPtr(bool expect_return_type) {
				Bookmark start_position = SetBookmark();
				if (expect_return_type) {
				// Consume return type.
				if (!ConsumeTypename())
				return None;
				}

				if (!ConsumeToken(tok::l_paren))
				return None;
				if (!ConsumePtrsAndRefs())
				return None;

				{
				Bookmark before_inner_function_pos = SetBookmark();
				auto maybe_inner_function_name = ParseFunctionImpl(false);
				if (maybe_inner_function_name)
				if (ConsumeToken(tok::r_paren))
				if (ConsumeArguments()) {
				SkipFunctionQualifiers();
				start_position.Remove();
				before_inner_function_pos.Remove();
				return maybe_inner_function_name;
				}
				}

				auto maybe_inner_function_ptr_name = ParseFuncPtr(false);
				if (maybe_inner_function_ptr_name)
				if (ConsumeToken(tok::r_paren))
				if (ConsumeArguments()) {
				SkipFunctionQualifiers();
				start_position.Remove();
				return maybe_inner_function_ptr_name;
				}
				return None;
				}

				bool CPlusPlusNameParser::ConsumeArguments() {
				return ConsumeBrackets(tok::l_paren, tok::r_paren);
				}

				bool CPlusPlusNameParser::ConsumeTemplateArgs() {
				Bookmark start_position = SetBookmark();
				if (!HasMoreTokens() || Peek().getKind() != tok::less)
				return false;
				Advance();

				// Consuming template arguments is a bit trickier than consuming function
				// arguments, because '<' '>' brackets are not always trivially balanced.
				// In some rare cases tokens '<' and '>' can appear inside template arguments
				// as arithmetic or shift operators not as template brackets.
				// Examples: std::enable_if<(10u)<(64), bool>
				// f<A<operator<(X,Y)::Subclass>>
				// Fortunately, the compiler encloses really ambiguous uses of '>'
				// within '()' brackets.
				int template_counter = 1;
				labath: Is this really the case for the types we are interested in? I would have hoped that this would get simplified to `std::enable_if<true, bool>` before it reaches us?
				eugene: Yes. I was very surprised as well. But apparently the compiler is sometimes lazy and doesn't simplify everything :( Running `nm -C --defined-only clang | grep std::enable_if` gives a few symbols like this: `std::enable_if<(10u)<(64), bool>::type llvm::isUInt<10u>(unsigned long)`
				bool can_open_template = false;
				while (HasMoreTokens() && template_counter > 0) {
				tok::TokenKind kind = Peek().getKind();
				switch (kind) {
				case tok::greatergreater:
				template_counter -= 2;
				can_open_template = false;
				Advance();
				break;
				case tok::greater:
				--template_counter;
				can_open_template = false;
				Advance();
				break;
				case tok::less:
				// '<' is an attempt to open a subtemplate.
				// Check if the parser is at a point where that is actually possible,
				// otherwise it's just a part of an expression like 'sizeof(T)<(10)'.
				// No need to do the same for '>' because the compiler makes sure
				// that '>' is always surrounded by brackets to avoid ambiguity.
				if (can_open_template)
				++template_counter;
				can_open_template = false;
				Advance();
				break;
				case tok::kw_operator: // C++ operator overloading.
				if (!ConsumeOperator())
				return false;
				can_open_template = true;
				break;
				case tok::raw_identifier:
				can_open_template = true;
				Advance();
				break;
				case tok::l_square:
				if (!ConsumeBrackets(tok::l_square, tok::r_square))
				return false;
				can_open_template = false;
				break;
				case tok::l_paren:
				if (!ConsumeArguments())
				return false;
				can_open_template = false;
				break;
				default:
				can_open_template = false;
				Advance();
				break;
				}
				}

				assert(template_counter >= 0);
				if (template_counter > 0) {
				return false;
				}
				start_position.Remove();
				return true;
				}
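The bracket-matching idea in ConsumeTemplateArgs() can be sketched at character level without clang tokens. This is a simplified illustration, not the patch's implementation: `SkipTemplateArgs` is a hypothetical name, operators and square brackets are not handled, and "directly follows an identifier character" stands in for the `can_open_template` token check.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the index just past the '>' that closes the '<' at `pos`, or
// std::string::npos if the brackets never balance. Parenthesized groups are
// skipped whole, and a '<' only opens a nested template when it directly
// follows an identifier character.
std::size_t SkipTemplateArgs(const std::string &text, std::size_t pos) {
  if (pos >= text.size() || text[pos] != '<')
    return std::string::npos;
  int depth = 1;
  bool can_open_template = false;
  std::size_t i = pos + 1;
  while (i < text.size() && depth > 0) {
    const char c = text[i];
    if (c == '(') {
      // Skip a balanced '(' ... ')' group without looking inside.
      int parens = 1;
      ++i;
      while (i < text.size() && parens > 0) {
        if (text[i] == '(')
          ++parens;
        else if (text[i] == ')')
          --parens;
        ++i;
      }
      if (parens > 0)
        return std::string::npos;
      can_open_template = false;
      continue;
    }
    if (c == '>') {
      --depth;
    } else if (c == '<' && can_open_template) {
      ++depth;
    }
    can_open_template =
        std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    ++i;
  }
  return depth == 0 ? i : std::string::npos;
}
```

For `std::enable_if<(10u)<(64), bool>::type` the second '<' follows ')' rather than an identifier, so it is treated as a comparison and the argument list still closes at the first '>'.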

				bool CPlusPlusNameParser::ConsumeAnonymousNamespace() {
				Bookmark start_position = SetBookmark();
				if (!ConsumeToken(tok::l_paren)) {
				return false;
				}
				static ConstString g_anonymous("anonymous");
				if (HasMoreTokens() && Peek().is(tok::raw_identifier) &&
				Peek().getRawIdentifier() == g_anonymous.GetStringRef()) {
				Advance();
				} else {
				labath: You don't need to go `ConstString` here. If you wanted to avoid strlen computation, just make this `constexpr StringLiteral`.
				return false;
				}

				if (!ConsumeToken(tok::kw_namespace)) {
				return false;
				}

				if (!ConsumeToken(tok::r_paren)) {
				return false;
				}
				start_position.Remove();
				return true;
				}

				bool CPlusPlusNameParser::ConsumeBrackets(tok::TokenKind left,
				tok::TokenKind right) {
				Bookmark start_position = SetBookmark();
				if (!HasMoreTokens() || Peek().getKind() != left)
				return false;
				Advance();

				int counter = 1;
				while (HasMoreTokens() && counter > 0) {
				tok::TokenKind kind = Peek().getKind();
				if (kind == right)
				--counter;
				else if (kind == left)
				++counter;
				Advance();
				}

				assert(counter >= 0);
				if (counter > 0) {
				return false;
				}
				start_position.Remove();
				return true;
				}

				bool CPlusPlusNameParser::ConsumeOperator() {
				Bookmark start_position = SetBookmark();
				if (!ConsumeToken(tok::kw_operator))
				return false;

				if (!HasMoreTokens()) {
				return false;
				}

				const auto &token = Peek();
				switch (token.getKind()) {
				case tok::kw_new:
				case tok::kw_delete:
				// This is 'new' or 'delete' operators.
				Advance();
				// Check for array new/delete.
				if (HasMoreTokens() && Peek().is(tok::l_square)) {
				// Consume the '[' and ']'.
				if (!ConsumeBrackets(tok::l_square, tok::r_square))
				return false;
				}
				break;

				#define OVERLOADED_OPERATOR(Name, Spelling, Token, Unary, Binary, MemberOnly) \
				case tok::Token: \
				Advance(); \
				break;
				#define OVERLOADED_OPERATOR_MULTI(Name, Spelling, Unary, Binary, MemberOnly)
				#include "clang/Basic/OperatorKinds.def"
				#undef OVERLOADED_OPERATOR
				#undef OVERLOADED_OPERATOR_MULTI

				case tok::l_paren:
				// Call operator: consume '(' ... ')'.
				if (ConsumeBrackets(tok::l_paren, tok::r_paren))
				break;
				return false;

				case tok::l_square:
				// This is a [] operator.
				// Consume the '[' and ']'.
				if (ConsumeBrackets(tok::l_square, tok::r_square))
				break;
				return false;

				default:
				// This might be a cast operator.
				if (ConsumeTypename())
				break;
				return false;
				}
				start_position.Remove();
				return true;
				}

				void CPlusPlusNameParser::SkipTypeQualifiers() {
				while (ConsumeToken(tok::kw_const, tok::kw_volatile))
				;
				}

				void CPlusPlusNameParser::SkipFunctionQualifiers() {
				while (ConsumeToken(tok::kw_const, tok::kw_volatile, tok::amp, tok::ampamp))
				;
				}

				bool CPlusPlusNameParser::ConsumeBuiltinType() {
				bool result = false;
				bool continue_parsing = true;
				// Built-in types can be made of a few keywords
				// like 'unsigned long long int'. This function
				// consumes all built-in type keywords without
				// checking if they make sense like 'unsigned char void'.
				while (continue_parsing && HasMoreTokens()) {
				switch (Peek().getKind()) {
				case tok::kw_short:
				case tok::kw_long:
				case tok::kw___int64:
				case tok::kw___int128:
				case tok::kw_signed:
				case tok::kw_unsigned:
				case tok::kw_void:
				case tok::kw_char:
				case tok::kw_int:
				case tok::kw_half:
				case tok::kw_float:
				case tok::kw_double:
				case tok::kw___float128:
				case tok::kw_wchar_t:
				case tok::kw_bool:
				case tok::kw_char16_t:
				case tok::kw_char32_t:
				result = true;
				Advance();
				break;
				default:
				continue_parsing = false;
				break;
				}
				}
				return result;
				}

				void CPlusPlusNameParser::SkipPtrsAndRefs() {
				// Ignoring result.
				ConsumePtrsAndRefs();
				}

				bool CPlusPlusNameParser::ConsumePtrsAndRefs() {
				bool found = false;
				SkipTypeQualifiers();
				while (ConsumeToken(tok::star, tok::amp, tok::ampamp, tok::kw_const,
				tok::kw_volatile)) {
				found = true;
				SkipTypeQualifiers();
				}
				return found;
				}

				bool CPlusPlusNameParser::ConsumeTypename() {
				Bookmark start_position = SetBookmark();
				SkipTypeQualifiers();
				if (!ConsumeBuiltinType()) {
				if (!ParseFullNameImpl())
				return false;
				}
				SkipPtrsAndRefs();
				start_position.Remove();
				return true;
				}

				Optional<CPlusPlusNameParser::ParsedNameRanges>
				CPlusPlusNameParser::ParseFullNameImpl() {
				// Name parsing state machine.
				enum class State {
				Beginning, // start of the name
				AfterTwoColons, // right after ::
				AfterIdentifier, // right after alphanumerical identifier ([a-z0-9_]+)
				AfterTemplate, // right after template brackets (<something>)
				AfterOperator, // right after name of C++ operator
				};

				Bookmark start_position = SetBookmark();
				State state = State::Beginning;
				bool continue_parsing = true;
				Optional<size_t> last_coloncolon_position = None;

				while (continue_parsing && HasMoreTokens()) {
				const auto &token = Peek();
				switch (token.getKind()) {
				case tok::raw_identifier: // Just a name.
				if (state != State::Beginning && state != State::AfterTwoColons) {
				continue_parsing = false;
				break;
				}
				Advance();
				state = State::AfterIdentifier;
				break;
				case tok::l_paren: {
				if (state == State::Beginning || state == State::AfterTwoColons) {
				// (anonymous namespace)
				if (ConsumeAnonymousNamespace()) {
				state = State::AfterIdentifier;
				break;
				}
				}

				// Type declared inside a function 'func()::Type'
				if (state != State::AfterIdentifier && state != State::AfterTemplate &&
				state != State::AfterOperator) {
				continue_parsing = false;
				break;
				}
				Bookmark l_paren_position = SetBookmark();
				// Consume the '(' ... ') [const]'.
				if (!ConsumeArguments()) {
				continue_parsing = false;
				break;
				}
				SkipFunctionQualifiers();

				// Consume '::'
				size_t coloncolon_position = GetCurrentPosition();
				if (!ConsumeToken(tok::coloncolon)) {
				continue_parsing = false;
				break;
				}
				l_paren_position.Remove();
				last_coloncolon_position = coloncolon_position;
				state = State::AfterTwoColons;
				break;
				}
				case tok::coloncolon: // Type nesting delimiter.
				if (state != State::Beginning && state != State::AfterIdentifier &&
				state != State::AfterTemplate) {
				continue_parsing = false;
				break;
				}
				last_coloncolon_position = GetCurrentPosition();
				Advance();
				state = State::AfterTwoColons;
				break;
				case tok::less: // Template brackets.
				if (state != State::AfterIdentifier && state != State::AfterOperator) {
				continue_parsing = false;
				break;
				}
				if (!ConsumeTemplateArgs()) {
				continue_parsing = false;
				break;
				}
				state = State::AfterTemplate;
				break;
				case tok::kw_operator: // C++ operator overloading.
				if (state != State::Beginning && state != State::AfterTwoColons) {
				continue_parsing = false;
				break;
				}
				if (!ConsumeOperator()) {
				continue_parsing = false;
				break;
				}
				state = State::AfterOperator;
				break;
				case tok::tilde: // Destructor.
				if (state != State::Beginning && state != State::AfterTwoColons) {
				continue_parsing = false;
				break;
				}
				Advance();
				if (ConsumeToken(tok::raw_identifier)) {
				state = State::AfterIdentifier;
				} else {
				TakeBack();
				continue_parsing = false;
				}
				break;
				default:
				continue_parsing = false;
				break;
				}
				}

				if (state == State::AfterIdentifier || state == State::AfterOperator ||
				state == State::AfterTemplate) {
				ParsedNameRanges result;
				if (last_coloncolon_position) {
				result.context_range = Range(start_position.GetSavedPosition(),
				last_coloncolon_position.getValue());
				result.basename_range =
				Range(last_coloncolon_position.getValue() + 1, GetCurrentPosition());
				} else {
				result.basename_range =
				Range(start_position.GetSavedPosition(), GetCurrentPosition());
				}
				start_position.Remove();
				return result;
				} else {
				return None;
				}
				}

				llvm::StringRef CPlusPlusNameParser::GetTextForRange(const Range &range) {
				if (range.empty())
				return llvm::StringRef();
				assert(range.begin_index < range.end_index);
				assert(range.begin_index < m_tokens.size());
				assert(range.end_index <= m_tokens.size());
				clang::Token &first_token = m_tokens[range.begin_index];
				clang::Token &last_token = m_tokens[range.end_index - 1];
				clang::SourceLocation start_loc = first_token.getLocation();
				clang::SourceLocation end_loc = last_token.getLocation();
				unsigned start_pos = start_loc.getRawEncoding();
				unsigned end_pos = end_loc.getRawEncoding() + last_token.getLength();
				return m_text.take_front(end_pos).drop_front(start_pos);
				}

				labath: Could this be written as: `m_text.take_front(end_pos).drop_front(start_pos)`?
				eugene: Thanks! I still have a lot to learn about llvm classes.
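For readers unfamiliar with `llvm::StringRef`, the suggested `take_front(end_pos).drop_front(start_pos)` keeps the first `end_pos` characters and then drops the first `start_pos` of those. The following is a rough equivalent using `std::string`; `SliceByOffsets` is a hypothetical name and, unlike StringRef, this version copies the text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for
//   m_text.take_front(end_pos).drop_front(start_pos)
// Keeps the first end_pos characters, then drops the first start_pos of them.
// Assumes start_pos <= end_pos <= text.size().
std::string SliceByOffsets(const std::string &text, std::size_t start_pos,
                           std::size_t end_pos) {
  return text.substr(0, end_pos).substr(start_pos);
}
```

Here `start_pos` plays the role of the first token's offset and `end_pos` the offset just past the last token, as computed in GetTextForRange().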
				static const clang::LangOptions &GetLangOptions() {
				static clang::LangOptions g_options;
				static llvm::once_flag g_once_flag;
				llvm::call_once(g_once_flag, []() {
				g_options.LineComment = true;
				g_options.C99 = true;
				g_options.C11 = true;
				g_options.CPlusPlus = true;
				g_options.CPlusPlus11 = true;
				g_options.CPlusPlus14 = true;
				g_options.CPlusPlus1z = true;
				});
				return g_options;
				}

				static const llvm::StringMap<tok::TokenKind> &GetKeywordsMap() {
				static llvm::StringMap<tok::TokenKind> g_map{
				#define KEYWORD(Name, Flags) {llvm::StringRef(#Name), tok::kw_##Name},
				#include "clang/Basic/TokenKinds.def"
				#undef KEYWORD
				};
				return g_map;
				}

				void CPlusPlusNameParser::ExtractTokens() {
				clang::Lexer lexer(clang::SourceLocation(), GetLangOptions(), m_text.data(),
				m_text.data(), m_text.data() + m_text.size());
				const auto &kw_map = GetKeywordsMap();
				clang::Token token;
				for (lexer.LexFromRawLexer(token); !token.is(clang::tok::eof);
				lexer.LexFromRawLexer(token)) {
				if (token.is(clang::tok::raw_identifier)) {
				auto it = kw_map.find(token.getRawIdentifier());
				if (it != kw_map.end()) {
				token.setKind(it->getValue());
				}
				}

				m_tokens.push_back(token);
				}
				}
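ExtractTokens() re-tags identifiers that come back from the raw lexer as `raw_identifier` by looking their spelling up in a keyword table. A minimal standalone sketch of that lookup step follows; the `Kind` enum, `KeywordTable` and `Classify` are hypothetical names, and the table holds only a few spellings instead of everything in TokenKinds.def.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy token kinds; the patch uses clang::tok::TokenKind instead.
enum class Kind { RawIdentifier, KwConst, KwOperator, KwUnsigned };

// Hypothetical miniature of GetKeywordsMap(): a few spellings only.
const std::map<std::string, Kind> &KeywordTable() {
  static const std::map<std::string, Kind> table = {
      {"const", Kind::KwConst},
      {"operator", Kind::KwOperator},
      {"unsigned", Kind::KwUnsigned},
  };
  return table;
}

// Mirrors the loop body in ExtractTokens(): an identifier is re-tagged when
// its spelling is a known keyword, otherwise it stays a raw identifier.
Kind Classify(const std::string &spelling) {
  const auto it = KeywordTable().find(spelling);
  return it != KeywordTable().end() ? it->second : Kind::RawIdentifier;
}
```

This is what lets later stages switch on kinds such as `tok::kw_operator` or `tok::kw_const` rather than comparing strings at every step.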

unittests/Language/CPlusPlus/CPlusPlusLanguageTest.cpp

	//===-- CPlusPlusLanguageTest.cpp -------------------------------- C++ --===//			//===-- CPlusPlusLanguageTest.cpp -------------------------------- C++ --===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "gtest/gtest.h"			#include "gtest/gtest.h"

	#include "Plugins/Language/CPlusPlus/CPlusPlusLanguage.h"			#include "Plugins/Language/CPlusPlus/CPlusPlusLanguage.h"

	using namespace lldb_private;			using namespace lldb_private;

	TEST(CPlusPlusLanguage, MethodName) {			TEST(CPlusPlusLanguage, MethodNameParsing) {
	struct TestCase {			struct TestCase {
	std::string input;			std::string input;
	std::string context, basename, arguments, qualifiers, scope_qualified_name;			std::string context, basename, arguments, qualifiers, scope_qualified_name;
	};			};

	TestCase test_cases[] = {			TestCase test_cases[] = {
	{"foo::bar(baz)", "foo", "bar", "(baz)", "", "foo::bar"},			{"main(int, char []) ", "", "main", "(int, char [])", "", "main"},
				{"foo::bar(baz) const", "foo", "bar", "(baz)", "const", "foo::bar"},
				{"foo::~bar(baz)", "foo", "~bar", "(baz)", "", "foo::~bar"},
				{"a::b::c::d(e,f)", "a::b::c", "d", "(e,f)", "", "a::b::c::d"},
				{"void f(int)", "", "f", "(int)", "", "f"},

				// Operators
	{"std::basic_ostream<char, std::char_traits<char> >& "			{"std::basic_ostream<char, std::char_traits<char> >& "
	"std::operator<<<std::char_traits<char> >"			"std::operator<<<std::char_traits<char> >"
	"(std::basic_ostream<char, std::char_traits<char> >&, char const*)",			"(std::basic_ostream<char, std::char_traits<char> >&, char const*)",
	"std", "operator<<<std::char_traits<char> >",			"std", "operator<<<std::char_traits<char> >",
	"(std::basic_ostream<char, std::char_traits<char> >&, char const*)", "",			"(std::basic_ostream<char, std::char_traits<char> >&, char const*)", "",
	"std::operator<<<std::char_traits<char> >"}};			"std::operator<<<std::char_traits<char> >"},
				{"operator delete[](void*, clang::ASTContext const&, unsigned long)", "",
				"operator delete[]", "(void*, clang::ASTContext const&, unsigned long)",
				"", "operator delete[]"},
				{"llvm::Optional<clang::PostInitializer>::operator bool() const",
				"llvm::Optional<clang::PostInitializer>", "operator bool", "()", "const",
				"llvm::Optional<clang::PostInitializer>::operator bool"},
				{"(anonymous namespace)::FactManager::operator[](unsigned short)",
				"(anonymous namespace)::FactManager", "operator[]", "(unsigned short)",
				"", "(anonymous namespace)::FactManager::operator[]"},
				{"const int& std::map<int, pair<short, int>>::operator[](short) const",
				"std::map<int, pair<short, int>>", "operator[]", "(short)", "const",
				"std::map<int, pair<short, int>>::operator[]"},
				{"CompareInsn::operator()(llvm::StringRef, InsnMatchEntry const&)",
				"CompareInsn", "operator()", "(llvm::StringRef, InsnMatchEntry const&)",
				"", "CompareInsn::operator()"},
				{"llvm::Optional<llvm::MCFixupKind>::operator*() const &",
				"llvm::Optional<llvm::MCFixupKind>", "operator*", "()", "const &",
				"llvm::Optional<llvm::MCFixupKind>::operator*"},
				// Internal classes
				{"operator<<(Cls, Cls)::Subclass::function()",
				"operator<<(Cls, Cls)::Subclass", "function", "()", "",
				"operator<<(Cls, Cls)::Subclass::function"},
				{"SAEC::checkFunction(context&) const::CallBack::CallBack(int)",
				"SAEC::checkFunction(context&) const::CallBack", "CallBack", "(int)", "",
				"SAEC::checkFunction(context&) const::CallBack::CallBack"},
				// Anonymous namespace
				{"XX::(anonymous namespace)::anon_class::anon_func() const",
				"XX::(anonymous namespace)::anon_class", "anon_func", "()", "const",
				"XX::(anonymous namespace)::anon_class::anon_func"},

				// Function pointers
				{"string (*f(vector<int>&&))(float)", "", "f", "(vector<int>&&)", "",
				"f"},
				{"void (*&std::_Any_data::_M_access<void (*)()>())()", "std::_Any_data",
				"_M_access<void (*)()>", "()", "",
				"std::_Any_data::_M_access<void (*)()>"},
				{"void (*(*(*(*(*(*(*(* const&func1(int))())())())())())())())()", "",
				"func1", "(int)", "", "func1"},

				// Templates
				{"void llvm::PM<llvm::Module, llvm::AM<llvm::Module>>::"
				"addPass<llvm::VP>(llvm::VP)",
				"llvm::PM<llvm::Module, llvm::AM<llvm::Module>>", "addPass<llvm::VP>",
				"(llvm::VP)", "",
				"llvm::PM<llvm::Module, llvm::AM<llvm::Module>>::"
				"addPass<llvm::VP>"},
				{"void std::vector<Class, std::allocator<Class> >"
				"::_M_emplace_back_aux<Class const&>(Class const&)",
				"std::vector<Class, std::allocator<Class> >",
				"_M_emplace_back_aux<Class const&>", "(Class const&)", "",
				"std::vector<Class, std::allocator<Class> >::"
				"_M_emplace_back_aux<Class const&>"},
				{"unsigned long llvm::countTrailingOnes<unsigned int>"
				"(unsigned int, llvm::ZeroBehavior)",
				"llvm", "countTrailingOnes<unsigned int>",
				"(unsigned int, llvm::ZeroBehavior)", "",
				"llvm::countTrailingOnes<unsigned int>"},
				{"std::enable_if<(10u)<(64), bool>::type llvm::isUInt<10u>(unsigned "
				"long)",
				"llvm", "isUInt<10u>", "(unsigned long)", "", "llvm::isUInt<10u>"},
				{"f<A<operator<(X,Y)::Subclass>, sizeof(B)<sizeof(C)>()", "",
				"f<A<operator<(X,Y)::Subclass>, sizeof(B)<sizeof(C)>", "()", "",
				"f<A<operator<(X,Y)::Subclass>, sizeof(B)<sizeof(C)>"}};

	for (const auto &test : test_cases) {			for (const auto &test : test_cases) {
	CPlusPlusLanguage::MethodName method(ConstString(test.input));			CPlusPlusLanguage::MethodName method(ConstString(test.input));
	EXPECT_TRUE(method.IsValid());			EXPECT_TRUE(method.IsValid()) << test.input;
	EXPECT_EQ(test.context, method.GetContext());			if (method.IsValid()) {
	EXPECT_EQ(test.basename, method.GetBasename());			EXPECT_EQ(test.context, method.GetContext().str());
	EXPECT_EQ(test.arguments, method.GetArguments());			EXPECT_EQ(test.basename, method.GetBasename().str());
	EXPECT_EQ(test.qualifiers, method.GetQualifiers());			EXPECT_EQ(test.arguments, method.GetArguments().str());
				labath: Would defining operator<< for std::ostream and StringRef (local to this test) enable you to get rid of these? I've run into this before but was never annoyed enough to actually do it. :/
				eugene: I tried to do it, but it didn't work for me after a few tries. I decided to just convert it to std::string, which solves the same problem.
				EXPECT_EQ(test.qualifiers, method.GetQualifiers().str());
	EXPECT_EQ(test.scope_qualified_name, method.GetScopeQualifiedName());			EXPECT_EQ(test.scope_qualified_name, method.GetScopeQualifiedName());
	}			}
	}			}
				}
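
The inline discussion in this test asks whether a stream insertion operator could replace the `.str()` conversions. The sketch below illustrates only the general mechanism and is not code from this revision. `StrView` is a hypothetical stand-in for `llvm::StringRef`, and `Describe` is a hypothetical helper that mimics how a test framework might render a value in a failure message. Such overloads are normally found through argument-dependent lookup, so an overload for the real `llvm::StringRef` would presumably have to be visible in namespace `llvm`. That is an assumption, not something the review confirms.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for llvm::StringRef: a non-owning pointer plus length.
struct StrView {
  const char *data;
  std::size_t size;
};

// Stream insertion overload. It writes exactly `size` characters, so the
// view does not need to be null-terminated.
std::ostream &operator<<(std::ostream &os, const StrView &s) {
  return os.write(s.data, static_cast<std::streamsize>(s.size));
}

// Hypothetical helper: renders a value the way a failure message might.
std::string Describe(const StrView &s) {
  std::ostringstream os;
  os << s;
  return os.str();
}
```

Converting to `std::string`, as the author chose to do, avoids the lookup question entirely at the cost of one copy per assertion.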

				TEST(CPlusPlusLanguage, ExtractContextAndIdentifier) {
				struct TestCase {
				std::string input;
				std::string context, basename;
				};

				TestCase test_cases[] = {
				{"main", "", "main"},
				{"foo01::bar", "foo01", "bar"},
				{"foo::~bar", "foo", "~bar"},
				{"std::vector<int>::push_back", "std::vector<int>", "push_back"},
				{"operator<<(Cls, Cls)::Subclass::function",
				"operator<<(Cls, Cls)::Subclass", "function"},
				{"std::vector<Class, std::allocator<Class>>"
				"::_M_emplace_back_aux<Class const&>",
				"std::vector<Class, std::allocator<Class>>",
				"_M_emplace_back_aux<Class const&>"}};

				llvm::StringRef context, basename;
				for (const auto &test : test_cases) {
				EXPECT_TRUE(CPlusPlusLanguage::ExtractContextAndIdentifier(
				test.input.c_str(), context, basename));
				EXPECT_EQ(test.context, context.str());
				EXPECT_EQ(test.basename, basename.str());
				}

				EXPECT_FALSE(CPlusPlusLanguage::ExtractContextAndIdentifier("void", context,
				basename));
				EXPECT_FALSE(
				CPlusPlusLanguage::ExtractContextAndIdentifier("321", context, basename));
				EXPECT_FALSE(
				CPlusPlusLanguage::ExtractContextAndIdentifier("", context, basename));
				}