This is an archive of the discontinued LLVM Phabricator instance.

Differential D43892

[YAML] speed up isNumber by doing regex matching less often
Needs Review, Public

Authored by pelikan on Feb 28 2018, 12:39 PM.

Details

Reviewers

dberris
rnk
zturner

Summary

Processing 2 GB XRay traces with "llvm-xray convert -output-format=yaml"
currently takes 1.5+ hours on my machine. When the YAML is finally printed,
profiling shows huge amounts of time spent in YAML's needQuotes and isNumber,
doing regex matching. That shouldn't be necessary for arbitrary input;
running the regex only when the string could plausibly be a float brings
the run time down to 40 minutes. More CPU savings to come.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 15561
Build 15561: arc lint + arc unit

Event Timeline

pelikan created this revision. Feb 28 2018, 12:39 PM

Harbormaster completed remote builds in B15541: Diff 136363. Feb 28 2018, 12:41 PM

pelikan mentioned this in D43896: [XRay] cache symbolized function names for a repeatedly queried function ID. Feb 28 2018, 1:02 PM

Adding Reid and Zach who might be able to give more insight here.

include/llvm/Support/YAMLTraits.h
479–481	Does it make sense to make the `Regex` object `static` and `const` so that we only compile/initialise it once?

sync

include/llvm/Support/YAMLTraits.h
479–481	I was too lazy to look into the Regex implementation to check whether it actually makes sense. I bet the current version is faster though.

Harbormaster completed remote builds in B15561: Diff 136453. Feb 28 2018, 6:09 PM

dberris added inline comments. Feb 28 2018, 6:23 PM

include/llvm/Support/YAMLTraits.h
479–481	Why would this be faster than compiling it once, the first time it's needed? If a Regex object can be re-used, I suspect that compiling it once and re-using it every time it's needed would be much faster than doing a linear search first.
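The compile-once idea can be sketched as follows. This is only an illustration under stated assumptions: `std::regex` stands in for `llvm::Regex` so the snippet is self-contained, and `matchesFloat` is a hypothetical name, not a function in YAMLTraits.h. Whether `llvm::Regex` can be shared this way is exactly the question raised in the thread.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: std::regex is used in place of llvm::Regex purely to
// keep the sketch self-contained.
inline bool matchesFloat(const std::string &S) {
  // A function-local static is constructed (and the pattern compiled) on the
  // first call only; since C++11 that initialization is thread-safe.
  static const std::regex FloatMatcher(
      R"(^(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$)");
  return std::regex_match(S, FloatMatcher);
}
```

With this shape, every call after the first pays only for matching, not for compiling the pattern.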

zturner added inline comments. Mar 1 2018, 10:20 AM

include/llvm/Support/YAMLTraits.h
479–481	Agree, this will make the case where it's not a number faster, but that's the uncommon case. The case where it is a number will become slower. I agree with Dean's suggestion of compiling the regex only once, but we can still do an early out if it's obviously not a number. However, instead of checking the entire string, how about just checking the first character?
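A minimal sketch of the first-character early-out combined with a regex that is compiled once. As an assumption for illustration, `std::regex` again stands in for `llvm::Regex`, and both function names are hypothetical. Every string the float pattern accepts starts with a digit or a dot, so anything else can be rejected in constant time.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if C can begin a string accepted by the float
// pattern below (a decimal digit or '.').
inline bool couldStartFloat(char C) {
  return (C >= '0' && C <= '9') || C == '.';
}

// Hypothetical stand-in for the regex step of isNumber.
inline bool matchesFloatWithEarlyOut(const std::string &S) {
  // Constant-time rejection: look at one character, not the whole string.
  if (S.empty() || !couldStartFloat(S.front()))
    return false;

  // Compiled on first use only.
  static const std::regex FloatMatcher(
      R"(^(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$)");
  return std::regex_match(S, FloatMatcher);
}
```

Strings that do begin with a digit or dot still go through the regex, so the numeric case costs one extra character comparison rather than a full scan.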

That regex is relatively simple. If we're worried about performance, maybe we should just roll it out into a series of find_first_(not_)of/consume_front/... statements
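One way the regex could be unrolled by hand is sketched below. It uses `std::string` and `find_first_not_of` rather than `StringRef` so that it compiles on its own. `skipDigits` and `matchesFloatByHand` are hypothetical names, and this is not the code from the revision.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the first non-digit at or after
// Pos, or S.size() if the digits run to the end of the string.
inline std::size_t skipDigits(const std::string &S, std::size_t Pos) {
  std::size_t End = S.find_first_not_of("0123456789", Pos);
  return End == std::string::npos ? S.size() : End;
}

// Hand-written equivalent of
//   ^(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?$
inline bool matchesFloatByHand(const std::string &S) {
  const std::size_t N = S.size();
  std::size_t Pos = 0;

  if (Pos < N && S[Pos] == '.') {
    // First alternative: a dot followed by at least one digit.
    std::size_t End = skipDigits(S, Pos + 1);
    if (End == Pos + 1)
      return false;
    Pos = End;
  } else {
    // Second alternative: at least one digit, then an optional dot and
    // optional fraction digits.
    std::size_t End = skipDigits(S, Pos);
    if (End == Pos)
      return false;
    Pos = End;
    if (Pos < N && S[Pos] == '.')
      Pos = skipDigits(S, Pos + 1);
  }

  // Optional exponent: e or E, optional sign, at least one digit.
  if (Pos < N && (S[Pos] == 'e' || S[Pos] == 'E')) {
    ++Pos;
    if (Pos < N && (S[Pos] == '-' || S[Pos] == '+'))
      ++Pos;
    std::size_t End = skipDigits(S, Pos);
    if (End == Pos)
      return false;
    Pos = End;
  }

  // The whole string must have been consumed.
  return Pos == N;
}
```

The two alternatives begin with different characters (a dot versus a digit), so a single left-to-right pass needs no backtracking.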

Revision Contents

Path	Size
include/llvm/Support/YAMLTraits.h	10 lines

Diff 136453

include/llvm/Support/YAMLTraits.h

 inline bool isNumber(StringRef S) {
   static const char DecChars[] = "0123456789";
   if (S.find_first_not_of(DecChars) == StringRef::npos)
     return true;

   if (S.equals(".inf") || S.equals(".Inf") || S.equals(".INF"))
     return true;

-  Regex FloatMatcher("^(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)([eE][-+]?[0-9]+)?$");
-  if (FloatMatcher.match(S))
-    return true;
+  // Don't waste time regex matching on anything obviously non-numeric.
+  static const char FloatChars[] = "+-0123456789.eE";
+  if (S.find_first_not_of(FloatChars) != StringRef::npos)
+    return false;

-  return false;
+  Regex FloatMatcher("^(\\.[0-9]+|[0-9]+(\\.[0-9]*)?)([eE][-+]?[0-9]+)?$");
+  return FloatMatcher.match(S);
 }

 inline bool isNumeric(StringRef S) {
   if ((S.front() == '-' || S.front() == '+') && isNumber(S.drop_front()))
     return true;

   if (isNumber(S))
     return true;
