This is an archive of the discontinued LLVM Phabricator instance.

Differential D68054

Regex: Add static convenience functions for "match" and "sub"
Needs Review · Public

Authored by nlguillemot on Sep 25 2019, 2:40 PM.

Details

Reviewers

thopre

Summary

There are many cases where a Regex object is used only once: It is created,
used in a single call to match() or sub(), then thrown away. This was
done in various places in LLVM using the following idiom:

Regex(Pattern).match(String)

The problem with this idiom is that if the compilation of the Regex fails,
then match() gets called with an invalid regex pattern. This scenario is
currently handled by checking inside match() if the pattern compiled successfully.
This gives match() the double-duty of handling match errors and handling regex
compilation errors. To move away from match() having this double-duty,
we created an alternative to the idiom above as follows:

Regex::match(Pattern, String)

This static member function version of the idiom behaves like syntactic sugar
for the idiom from earlier, but it checks that the regex compiled without
error before calling match().
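A minimal sketch of the two idioms follows. It assumes a hypothetical `SketchRegex` class that wraps `std::regex` in place of `llvm::Regex`, so it compiles without LLVM headers. `std::regex` uses the ECMAScript grammar and reports errors by throwing, so only the API shape is modelled here, not LLVM's POSIX engine or its messages. The member `match()` quietly returns false for an invalid pattern, while the static `match(Pattern, String)` checks validity before matching.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for llvm::Regex, built on std::regex (ECMAScript
// grammar) rather than LLVM's POSIX engine. Only the API shape is modelled.
class SketchRegex {
public:
  explicit SketchRegex(const std::string &Pattern) {
    try {
      Re = std::regex(Pattern);
      Valid = true;
    } catch (const std::regex_error &) {
      Valid = false; // Remember the failure instead of propagating it.
    }
  }

  bool isValid() const { return Valid; }

  // Member match(): an invalid pattern simply yields false, so "invalid
  // regex" and "no match" look the same to the caller.
  bool match(const std::string &Str) const {
    if (!Valid)
      return false;
    return std::regex_search(Str, Re); // Unanchored search.
  }

  // Static convenience: compile a temporary, check validity, then match.
  static bool match(const std::string &Pattern, const std::string &Str) {
    SketchRegex Tmp(Pattern);
    if (!Tmp.isValid())
      return false; // Never call the member match() on an invalid regex.
    return Tmp.match(Str);
  }

private:
  std::regex Re;
  bool Valid = false;
};
```

The member/static pair can share the name `match` because the parameter lists differ, which is the same arrangement the Regex.h part of this diff uses.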

If we consistently check the validity of the regex explicitly before calling match(),
we can make successful compilation a precondition of match().
However, code in other projects (e.g. clang) calls match() without
checking whether the regex compiled, so that code must first be updated
to use the match() API more strictly before we can require a validly
compiled regex as a precondition for match(). Updating these uses and making
the API stricter is left as future work.

A similar static convenience function was added for sub(). The Regex constructor
was also extended to optionally report a compilation error through an out-parameter,
without requiring a subsequent call to isValid(), for convenience and code reuse.
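The optional error out-parameter on the constructor, together with a static `sub()`, can be sketched as below. This assumes a hypothetical `SketchRegex` built on `std::regex` rather than `llvm::Regex`: the error text comes from `std::regex_error::what()`, and `std::regex_replace` uses `$&`/`$1` format syntax instead of the `\0`-style backreferences that `Regex::sub()` accepts.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical sketch on top of std::regex (not llvm::Regex).
class SketchRegex {
public:
  // If Error is non-null, it receives a non-empty message on failure and is
  // cleared on success, so callers need not call isValid() separately.
  explicit SketchRegex(const std::string &Pattern,
                       std::string *Error = nullptr) {
    try {
      Re = std::regex(Pattern);
      Valid = true;
      if (Error)
        Error->clear();
    } catch (const std::regex_error &E) {
      Valid = false;
      if (Error)
        *Error = E.what();
    }
  }

  bool isValid() const { return Valid; }

  // Replace the first match of the regex in Str with Repl (ECMAScript format).
  std::string sub(const std::string &Repl, const std::string &Str) const {
    if (!Valid)
      return Str;
    return std::regex_replace(Str, Re, Repl,
                              std::regex_constants::format_first_only);
  }

  // Static convenience: returns Str unchanged if the pattern is invalid,
  // without ever calling the member sub() on it.
  static std::string sub(const std::string &Pattern, const std::string &Repl,
                         const std::string &Str, std::string *Error = nullptr) {
    SketchRegex Tmp(Pattern, Error);
    if (!Tmp.isValid())
      return Str;
    return Tmp.sub(Repl, Str);
  }

private:
  std::regex Re;
  bool Valid = false;
};
```

Because the constructor clears a non-null `Error` on success, a caller can reuse one string across several constructions without resetting it by hand.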

Uses of the previous idiom in LLVM were updated to the new style.

Diff Detail

Event Timeline

nlguillemot created this revision. Sep 25 2019, 2:40 PM

Herald added a project: Restricted Project. Sep 25 2019, 2:40 PM

Herald added subscribers: llvm-commits, kristina.

nlguillemot marked 3 inline comments as done. Sep 25 2019, 2:46 PM

nlguillemot added inline comments.

tools/sancov/sancov.cpp
127 ↗	(On Diff #221833)	Missed making these ones `const` in the previous commit, so updated them in this one.
unittests/Support/RegexTest.cpp
20	This typedef might seem kind of redundant, but it's recommended by the googletest guide: https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#death-test-naming
179	This style of conditionally executing death tests was copied from `AlignmentTest.cpp`

thopre added inline comments. Sep 26 2019, 1:42 AM

include/llvm/Support/Regex.h
90–92	While redundant, I'd add "which returns an error if the regex is invalid or no match is found" to be more explicit
120–122	Same as above.
tools/sancov/sancov.cpp
127 ↗	(On Diff #221833)	I know it's a small change, but I'd rather you split it out into a separate patch, which I'll gladly approve. This allows easier reverting or cherry-picking.
unittests/Support/RegexTest.cpp
20	Thanks for bringing this to my attention, I was not aware of it.
193–196	I'd swap the two tests as you did in the Constructor test, because that shows calling the wrapper on a valid regex clears Error, whereas here it only shows that it does not set it.
199–202	Likewise.

Added more comments to static match and static sub to clarify the return value and the value of Error.
Removed the updates of "static Regex" -> "static const Regex", to do them in a separate future patch instead.
Switched the order of test lines in the "ConvenienceFunctions" test.

nlguillemot marked 3 inline comments as done. Sep 26 2019, 10:14 AM

nlguillemot added inline comments.

include/llvm/Support/Regex.h
90–92	I expanded the description. Thoughts?

nlguillemot marked an inline comment as done. Sep 26 2019, 10:14 AM

nlguillemot marked an inline comment as done. Sep 26 2019, 10:33 AM

nlguillemot added inline comments.

tools/sancov/sancov.cpp
127 ↗	(On Diff #221833)	Moved to a separate patch here: https://reviews.llvm.org/D68091

LGTM

This revision is now accepted and ready to land. Sep 27 2019, 1:01 AM

The following clang unit tests fail with your patch:

Format/./FormatTests/FormatTest.FunctionAnnotations
Format/./FormatTests/FormatTest.UnderstandsFunctionRefQualification

Can you have a look?

This revision now requires changes to proceed. Sep 27 2019, 4:06 AM

In D68054#1685676, @thopre wrote:

(...)

Oops, I didn't even think of checking if clang works. I started taking a look at it, and here's what I think we should do:

1. We should do another Regex const-correctness patch, this time for clang's uses of Regex.
2. We need to decide how to deal with the fact that clang depends on match()/sub() lazily reporting compile errors:

Option A) Make a patch for llvm and clang, to land before this one, that adds explicit isValid() checks to all uses of match()/sub() with "untrusted" inputs.

Option B) Keep the previous behavior and return false instead of asserting inside match(), for the sake of backwards compatibility.

With option A, we don't compromise on one of the original goals of this patch, and we make the API of match()/sub() stricter.
With option B, we are less likely to break code in other projects in the LLVM ecosystem.

I think step (1) is a win-win, but I'm not sure about step (2). I'm tempted to conservatively go with the backwards-compatible option B, but if we judge that the potential impact of changing the API with option A is acceptable, we could take that risk anyway.
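The two options differ mainly at call sites. A rough sketch, assuming two hypothetical free functions on top of `std::regex` (neither exists in LLVM): `matchLenient()` behaves like option B, folding an invalid pattern into "no match", and `matchChecked()` models the explicit validity check that option A would require callers to perform.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Option B style (hypothetical helper): an invalid pattern is folded into
// "no match", so callers that never check validity keep working.
bool matchLenient(const std::string &Pattern, const std::string &Str) {
  try {
    return std::regex_search(Str, std::regex(Pattern));
  } catch (const std::regex_error &) {
    return false;
  }
}

// Option A style (hypothetical helper): the caller must handle an invalid
// pattern separately from a failed match.
enum class MatchResult { Match, NoMatch, InvalidPattern };

MatchResult matchChecked(const std::string &Pattern, const std::string &Str,
                         std::string &Error) {
  std::regex Re;
  try {
    Re = std::regex(Pattern);
  } catch (const std::regex_error &E) {
    Error = E.what(); // Report why compilation failed.
    return MatchResult::InvalidPattern;
  }
  Error.clear();
  return std::regex_search(Str, Re) ? MatchResult::Match
                                    : MatchResult::NoMatch;
}
```

Option B keeps existing callers working unchanged, at the cost of letting an invalid pattern pass for a non-match; option A surfaces the error but requires touching every call site that handles untrusted patterns.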

Thoughts?

Opened a review that just increases the const-correctness of Regex in clang: https://reviews.llvm.org/D68155

This is the first of two steps I suggested in my previous comment.

In D68054#1686276, @nlguillemot wrote:

(...)

I'm quite new to the LLVM community, but my understanding is that the general approach is not to worry about external dependencies. On the other hand, the benefit is small, and doing the check again in match and sub is cheap. We can still have those convenience functions to avoid using a temporary variable, so I'd tend to agree with you and go for option B. Update the patch to keep the error checks in sub and match and I'll approve it.

Removed the assert inside match() that made it crash when match() is called with a regex pattern that isn't successfully compiled. Also removed the "death test" unit tests that tested that this assert was triggering.

In D68054#1750788, @nlguillemot wrote:

Removed the assert inside match() that made it crash when match() is called with a regex pattern that isn't successfully compiled. Also removed the "death test" unit tests that tested that this assert was triggering.

Can you reword the description along the lines of this new API being a convenience for when one does not care about distinguishing an invalid regex from a failed match?

nlguillemot edited the summary of this revision. Nov 19 2019, 9:57 AM

In D68054#1751415, @thopre wrote:

(...)

Did an editing pass on the review summary to reflect the new scope of the patch, and noted what is intended as future work.

Not committing to remove the ability to call match/sub on an invalid regex implies that the new API must be justified independently of the extra check that match/sub need to perform to preserve the current behaviour. In fact, the only motivation for this patch I could think of was consistency with other languages; I was finding it a hard sell and was about to apologise for the extra work I made you do.

However, I overlooked one benefit of removing the ability to match an invalid regex: robustness. Any case where a call to match/sub guards code other than an error path is likely to behave wrongly if the regex is invalid. So I think it would be safer to use a separate API for combined regex creation + match/sub, as in the previous revision of this patch, and to disallow the existing approach, so that users are mindful when they do not perform separate checks.

What do you think? If you agree, that would mean reverting to the previous version of this patch, with my apologies for requesting the current version.
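The robustness concern is easiest to see with a guard whose result is negated. In the sketch below, `lenientMatch()` and `keepUnexcluded()` are hypothetical helpers built on `std::regex`, with `lenientMatch()` returning false for an invalid pattern the way match() does today. A typo in the exclusion pattern then silently disables the filter instead of producing an error.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical lenient match: an invalid pattern behaves like "no match".
bool lenientMatch(const std::string &Pattern, const std::string &Str) {
  try {
    return std::regex_search(Str, std::regex(Pattern));
  } catch (const std::regex_error &) {
    return false;
  }
}

// Hypothetical filter that drops names matching an exclusion pattern.
// If ExcludePattern is invalid, nothing is excluded and no error surfaces.
std::vector<std::string> keepUnexcluded(const std::vector<std::string> &Names,
                                        const std::string &ExcludePattern) {
  std::vector<std::string> Kept;
  for (const std::string &N : Names)
    if (!lenientMatch(ExcludePattern, N))
      Kept.push_back(N);
  return Kept;
}
```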

In D68054#1755059, @thopre wrote:

(...)

If I understand correctly, you're suggesting we re-add the assert inside match() that requires the regex to be compiled successfully as a precondition? If so, I agree that the API should be strict. The problem with the assert() was that it affected other projects like clang, but now that there's the monorepo I feel more confident that we can safely make a breaking change by fixing any uses in the other projects in the monorepo.

If we want to re-add the assert(), I would suggest that we first commit this patch to lay the groundwork, then create a separate patch that adds the assert() back in. Isolating the patch with the API breakage would make it easier to revert if necessary.

By the way, I'm going to be on vacation from tomorrow until December 3rd, so I might not be able to respond to comments in a timely way. Apologies in advance.

How about the following commit message:

There are many cases where a Regex object is used only once: It is created,
used in a single call to match() or sub(), then thrown away. This was
done in various places in LLVM using the following idiom:

Regex(Pattern).match(String)

The problem with this idiom is that invalid patterns result in a match
failure, which can lead to unexpected behavior if the return value of
match() is used as a condition for an if statement. To force developers
to be mindful of this aspect, an assert is added to match() to check
that the regex is valid, and a new idiom is created as follows for
cases where the pattern is known to be valid:

Regex::match(Pattern, String)

This new idiom is documented as returning false when the pattern is invalid.
Code using the old idiom is thus updated to use the new idiom.
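The split described in this message, an asserting member `match()` next to a static `match()` documented to return false for invalid patterns, could look roughly like this. `SketchRegex` is a hypothetical wrapper around `std::regex` standing in for `llvm::Regex`; the assertion only fires in builds without `NDEBUG`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for llvm::Regex, built on std::regex.
class SketchRegex {
public:
  explicit SketchRegex(const std::string &Pattern) {
    try {
      Re = std::regex(Pattern);
      Valid = true;
    } catch (const std::regex_error &) {
      Valid = false;
    }
  }

  bool isValid() const { return Valid; }

  // Precondition: the regex compiled successfully.
  bool match(const std::string &Str) const {
    assert(Valid && "match() called on an invalid regex");
    return std::regex_search(Str, Re);
  }

  // Documented to return false when the pattern is invalid; the member
  // match() is only reached for a valid regex thanks to short-circuiting.
  static bool match(const std::string &Pattern, const std::string &Str) {
    SketchRegex Tmp(Pattern);
    return Tmp.isValid() && Tmp.match(Str);
  }

private:
  std::regex Re;
  bool Valid = false;
};
```

Calling the member `match()` on `SketchRegex("(foo")` would abort in an assertions-enabled build, which is the kind of behaviour change the thread suggests landing separately so that it is easy to revert.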

In D68054#1801066, @thopre wrote:

How about the following commit message:

(...)

I would suggest an amendment to this part:

To force developers to be mindful of this aspect, an assert is added to match() to check
that the regex is valid, and a new idiom is created as follows for
cases where the pattern is known to be valid:

As-is, this patch doesn't assert inside match(), since this makes the API more backwards compatible. The wording of the commit message should be updated to match this.

It was originally a bit tricky to track down and update all users of the API, but the monorepo makes that a lot easier. If we brought back the older version of the commit that *did* assert inside match(), and updated all affected users of the API (e.g. clang) before committing, I wouldn't be opposed.

In D68054#1806142, @nlguillemot wrote:

(...)

Yes, I think it should be done; otherwise the change brings no benefit beyond API familiarity for users of other languages (since it's not more concise and does not help catch errors).

Revision Contents

Path                                       Size
include/llvm/Support/Regex.h               40 lines
lib/Support/FileCheck.cpp                  2 lines
lib/Support/Regex.cpp                      67 lines
lib/Transforms/Utils/SymbolRewriter.cpp    3 lines
tools/llvm-cov/CoverageFilters.cpp         4 lines
unittests/Support/RegexTest.cpp            24 lines

Diff 229946

include/llvm/Support/Regex.h

(41 unchanged lines not shown; context: `enum {`)
BasicRegex=4		BasicRegex=4
};		};

Regex();		Regex();
/// Compiles the given regular expression \p Regex.		/// Compiles the given regular expression \p Regex.
///		///
/// \param Regex - referenced string is no longer needed after this		/// \param Regex - referenced string is no longer needed after this
/// constructor does finish. Only its compiled form is kept stored.		/// constructor does finish. Only its compiled form is kept stored.
Regex(StringRef Regex, unsigned Flags = NoFlags);		///
		/// \param Error - If non-null, any errors in regex compilation will be
		/// recorded as a non-empty string. If there is no error, it will be an
		/// empty string.
		Regex(StringRef Regex, unsigned Flags = NoFlags,
		std::string *Error = nullptr);
Regex(const Regex &) = delete;		Regex(const Regex &) = delete;
Regex &operator=(Regex regex) {		Regex &operator=(Regex regex) {
std::swap(preg, regex.preg);		std::swap(preg, regex.preg);
std::swap(error, regex.error);		std::swap(error, regex.error);
return *this;		return *this;
}		}
Regex(Regex &&regex);		Regex(Regex &&regex);
~Regex();		~Regex();
(16 unchanged lines not shown; context: `public:`)
///		///
/// \param Error - If non-null, any errors in the matching will be recorded		/// \param Error - If non-null, any errors in the matching will be recorded
/// as a non-empty string. If there is no error, it will be an empty string.		/// as a non-empty string. If there is no error, it will be an empty string.
///		///
/// This returns true on a successful match.		/// This returns true on a successful match.
bool match(StringRef String, SmallVectorImpl<StringRef> *Matches = nullptr,		bool match(StringRef String, SmallVectorImpl<StringRef> *Matches = nullptr,
std::string *Error = nullptr) const;		std::string *Error = nullptr) const;

		/// Syntactical sugar to create a temporary Regex and call match() on it.
		///
		/// Assuming no regex compilation errors, equivalent to the following:
		///
		/// Regex(RegexPattern, Flags, Error).match(String, Matches, Error)
		///
		/// However, unlike the above, this doesn't call match() if the constructor
		/// reports an error.
		///
		/// This returns true only when both the regex is valid and the match is
		/// also successful. If \p Error is non-null, it will be set to a non-empty
		/// string if the regex is invalid or if an error happened during match().
		/// If there is no error, it will be an empty string.
		static bool match(StringRef RegexPattern, StringRef String,
		SmallVectorImpl<StringRef> *Matches = nullptr,
		unsigned Flags = NoFlags, std::string *Error = nullptr);

/// sub - Return the result of replacing the first match of the regex in		/// sub - Return the result of replacing the first match of the regex in
/// \p String with the \p Repl string. Backreferences like "\0" in the		/// \p String with the \p Repl string. Backreferences like "\0" in the
/// replacement string are replaced with the appropriate match substring.		/// replacement string are replaced with the appropriate match substring.
///		///
/// Note that the replacement string has backslash escaping performed on		/// Note that the replacement string has backslash escaping performed on
/// it. Invalid backreferences are ignored (replaced by empty strings).		/// it. Invalid backreferences are ignored (replaced by empty strings).
///		///
/// \param Error If non-null, any errors in the substitution (invalid		/// \param Error If non-null, any errors in the substitution (invalid
/// backreferences, trailing backslashes) will be recorded as a non-empty		/// backreferences, trailing backslashes) will be recorded as a non-empty
/// string. If there is no error, it will be an empty string.		/// string. If there is no error, it will be an empty string.
std::string sub(StringRef Repl, StringRef String,		std::string sub(StringRef Repl, StringRef String,
std::string *Error = nullptr) const;		std::string *Error = nullptr) const;

		/// Syntactical sugar to create a temporary Regex and call sub() on it.
		///
		/// Assuming no regex compilation errors, equivalent to the following:
		///
		/// Regex(RegexPattern, Flags, Error).sub(Repl, String, Error)
		///
		/// However, unlike the above, this doesn't call sub() if the constructor
		/// reports an error.
		///
		/// If \p Error is non-null, it will be set to a non-empty string if the
		/// regex is invalid or if an error happened during sub(). If there is no
		/// error, it will be an empty string.
		static std::string sub(StringRef RegexPattern, StringRef Repl,
		StringRef String, unsigned Flags = NoFlags,
		std::string *Error = nullptr);

/// If this function returns true, ^Str$ is an extended regular		/// If this function returns true, ^Str$ is an extended regular
/// expression that matches Str and only Str.		/// expression that matches Str and only Str.
static bool isLiteralERE(StringRef Str);		static bool isLiteralERE(StringRef Str);

/// Turn String into a regex by escaping its special characters.		/// Turn String into a regex by escaping its special characters.
static std::string escape(StringRef String);		static std::string escape(StringRef String);

private:		private:
struct llvm_regex *preg;		struct llvm_regex *preg;
int error;		int error;
};		};
}		}

#endif // LLVM_SUPPORT_REGEX_H		#endif // LLVM_SUPPORT_REGEX_H

lib/Support/FileCheck.cpp

(650 unchanged lines not shown; context: `for (const auto &Substitution : Substitutions) {`)
InsertOffset += Value->size();		InsertOffset += Value->size();
}		}

// Match the newly constructed regex.		// Match the newly constructed regex.
RegExToMatch = TmpStr;		RegExToMatch = TmpStr;
}		}

SmallVector<StringRef, 4> MatchInfo;		SmallVector<StringRef, 4> MatchInfo;
if (!Regex(RegExToMatch, Regex::Newline).match(Buffer, &MatchInfo))		if (!Regex::match(RegExToMatch, Buffer, &MatchInfo, Regex::Newline))
return make_error<FileCheckNotFoundError>();		return make_error<FileCheckNotFoundError>();

// Successful regex match.		// Successful regex match.
assert(!MatchInfo.empty() && "Didn't get any match");		assert(!MatchInfo.empty() && "Didn't get any match");
StringRef FullMatch = MatchInfo[0];		StringRef FullMatch = MatchInfo[0];

// If this defines any string variables, remember their values.		// If this defines any string variables, remember their values.
for (const auto &VariableDef : VariableDefs) {		for (const auto &VariableDef : VariableDefs) {
(1,312 unchanged lines not shown)

lib/Support/Regex.cpp

(17 unchanged lines not shown)

// Important this comes last because it defines "_REGEX_H_". At least on		// Important this comes last because it defines "_REGEX_H_". At least on
// Darwin, if included before any header that (transitively) includes		// Darwin, if included before any header that (transitively) includes
// xlocale.h, this will cause trouble, because of missing regex-related types.		// xlocale.h, this will cause trouble, because of missing regex-related types.
#include "regex_impl.h"		#include "regex_impl.h"

using namespace llvm;		using namespace llvm;

		namespace {

		/// Utility to convert a regex error code into a human-readable string.
		void RegexErrorToString(int error, struct llvm_regex *preg,
		std::string &Error) {
		size_t len = llvm_regerror(error, preg, nullptr, 0);

		Error.resize(len - 1);
		llvm_regerror(error, preg, &Error[0], len);
		}

		} // namespace

Regex::Regex() : preg(nullptr), error(REG_BADPAT) {}		Regex::Regex() : preg(nullptr), error(REG_BADPAT) {}

Regex::Regex(StringRef regex, unsigned Flags) {		Regex::Regex(StringRef regex, unsigned Flags, std::string *Error) {
unsigned flags = 0;		unsigned flags = 0;
preg = new llvm_regex();		preg = new llvm_regex();
preg->re_endp = regex.end();		preg->re_endp = regex.end();
if (Flags & IgnoreCase)		if (Flags & IgnoreCase)
flags |= REG_ICASE;		flags |= REG_ICASE;
if (Flags & Newline)		if (Flags & Newline)
flags |= REG_NEWLINE;		flags |= REG_NEWLINE;
if (!(Flags & BasicRegex))		if (!(Flags & BasicRegex))
flags |= REG_EXTENDED;		flags |= REG_EXTENDED;
error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);		error = llvm_regcomp(preg, regex.data(), flags|REG_PEND);

		// Log regex compilation error into Error string if it is available.
		if (Error) {
		if (error) {
		RegexErrorToString(error, preg, *Error);
		} else {
		if (!Error->empty())
		Error->clear();
		}
		}
}		}

Regex::Regex(Regex &&regex) {		Regex::Regex(Regex &&regex) {
preg = regex.preg;		preg = regex.preg;
error = regex.error;		error = regex.error;
regex.preg = nullptr;		regex.preg = nullptr;
regex.error = REG_BADPAT;		regex.error = REG_BADPAT;
}		}

Regex::~Regex() {		Regex::~Regex() {
if (preg) {		if (preg) {
llvm_regfree(preg);		llvm_regfree(preg);
delete preg;		delete preg;
}		}
}		}

namespace {

/// Utility to convert a regex error code into a human-readable string.
void RegexErrorToString(int error, struct llvm_regex *preg,
std::string &Error) {
size_t len = llvm_regerror(error, preg, nullptr, 0);

Error.resize(len - 1);
llvm_regerror(error, preg, &Error[0], len);
}

} // namespace

bool Regex::isValid(std::string &Error) const {		bool Regex::isValid(std::string &Error) const {
if (!error)		if (!error)
return true;		return true;

RegexErrorToString(error, preg, Error);		RegexErrorToString(error, preg, Error);
return false;		return false;
}		}

/// getNumMatches - In a valid regex, return the number of parenthesized		/// getNumMatches - In a valid regex, return the number of parenthesized
/// matches it contains.		/// matches it contains.
unsigned Regex::getNumMatches() const {		unsigned Regex::getNumMatches() const {
return preg->re_nsub;		return preg->re_nsub;
}		}

bool Regex::match(StringRef String, SmallVectorImpl<StringRef> *Matches,		bool Regex::match(StringRef String, SmallVectorImpl<StringRef> *Matches,
std::string *Error) const {		std::string *Error) const {
// Reset error, if given.		// Reset error, if given.
if (Error && !Error->empty())		if (Error && !Error->empty())
*Error = "";		Error->clear();

// Check if the regex itself didn't successfully compile.		// Check if the regex itself didn't successfully compile.
if (Error ? !isValid(*Error) : !isValid())		if (Error ? !isValid(*Error) : !isValid())
return false;		return false;

unsigned nmatch = Matches ? preg->re_nsub+1 : 0;		unsigned nmatch = Matches ? preg->re_nsub+1 : 0;

// pmatch needs to have at least one element.		// pmatch needs to have at least one element.
(29 unchanged lines not shown; context: `for (unsigned i = 0; i != nmatch; ++i) {`)
Matches->push_back(StringRef(String.data()+pm[i].rm_so,		Matches->push_back(StringRef(String.data()+pm[i].rm_so,
pm[i].rm_eo-pm[i].rm_so));		pm[i].rm_eo-pm[i].rm_so));
}		}
}		}

return true;		return true;
}		}

		bool Regex::match(StringRef RegexPattern, StringRef String,
		SmallVectorImpl<StringRef> *Matches, unsigned Flags,
		std::string *Error) {
		// Compile the single-use regex.
		Regex TmpRegex(RegexPattern, Flags, Error);

		// Bail out if there were regex compile errors.
		if (!TmpRegex.isValid())
		return false;

		// Do the single-use match itself.
		return TmpRegex.match(String, Matches, Error);
		}

std::string Regex::sub(StringRef Repl, StringRef String,		std::string Regex::sub(StringRef Repl, StringRef String,
std::string *Error) const {		std::string *Error) const {
SmallVector<StringRef, 8> Matches;		SmallVector<StringRef, 8> Matches;

// Return the input if there was no match.		// Return the input if there was no match.
if (!match(String, &Matches, Error))		if (!match(String, &Matches, Error))
return String;		return String;

(57 unchanged lines not shown; context: `std::string Regex::sub(StringRef Repl, StringRef String,`)
}		}

// And finally the suffix.		// And finally the suffix.
Res += StringRef(Matches[0].end(), String.end() - Matches[0].end());		Res += StringRef(Matches[0].end(), String.end() - Matches[0].end());

return Res;		return Res;
}		}

		std::string Regex::sub(StringRef RegexPattern, StringRef Repl, StringRef String,
		unsigned Flags, std::string *Error) {
		// Compile the single-use regex.
		Regex TmpRegex(RegexPattern, Flags, Error);

		// Bail out if there were regex compile errors.
		if (!TmpRegex.isValid())
		return String;

		// Do the single-use sub itself.
		return TmpRegex.sub(Repl, String, Error);
		}

// These are the special characters matched in functions like "p_ere_exp".		// These are the special characters matched in functions like "p_ere_exp".
static const char RegexMetachars[] = "()^$|*+?.[]\\{}";		static const char RegexMetachars[] = "()^$|*+?.[]\\{}";

bool Regex::isLiteralERE(StringRef Str) {		bool Regex::isLiteralERE(StringRef Str) {
// Check for regex metacharacters. This list was derived from our regex		// Check for regex metacharacters. This list was derived from our regex
// implementation in regcomp.c and double checked against the POSIX extended		// implementation in regcomp.c and double checked against the POSIX extended
// regular expression specification.		// regular expression specification.
return Str.find_first_of(RegexMetachars) == StringRef::npos;		return Str.find_first_of(RegexMetachars) == StringRef::npos;
(12 unchanged lines not shown)

lib/Transforms/Utils/SymbolRewriter.cpp

(173 unchanged lines not shown; context: `template <RewriteDescriptor::Type DT, typename ValueType,`)
iterator_range<typename iplist<ValueType>::iterator>		iterator_range<typename iplist<ValueType>::iterator>
(Module::*Iterator)()>		(Module::*Iterator)()>
bool PatternRewriteDescriptor<DT, ValueType, Get, Iterator>::		bool PatternRewriteDescriptor<DT, ValueType, Get, Iterator>::
performOnModule(Module &M) {		performOnModule(Module &M) {
bool Changed = false;		bool Changed = false;
for (auto &C : (M.*Iterator)()) {		for (auto &C : (M.*Iterator)()) {
std::string Error;		std::string Error;

std::string Name = Regex(Pattern).sub(Transform, C.getName(), &Error);		std::string Name =
		Regex::sub(Pattern, Transform, C.getName(), Regex::NoFlags, &Error);
if (!Error.empty())		if (!Error.empty())
report_fatal_error("unable to transforn " + C.getName() + " in " +		report_fatal_error("unable to transforn " + C.getName() + " in " +
M.getModuleIdentifier() + ": " + Error);		M.getModuleIdentifier() + ": " + Error);

if (C.getName() == Name)		if (C.getName() == Name)
continue;		continue;

if (GlobalObject *GO = dyn_cast<GlobalObject>(&C))		if (GlobalObject *GO = dyn_cast<GlobalObject>(&C))
(394 unchanged lines not shown)

tools/llvm-cov/CoverageFilters.cpp

(20 unchanged lines not shown; context: `bool NameCoverageFilter::matches(`)
const coverage::FunctionRecord &Function) const {		const coverage::FunctionRecord &Function) const {
StringRef FuncName = Function.Name;		StringRef FuncName = Function.Name;
return FuncName.find(Name) != StringRef::npos;		return FuncName.find(Name) != StringRef::npos;
}		}

bool NameRegexCoverageFilter::matches(		bool NameRegexCoverageFilter::matches(
const coverage::CoverageMapping &,		const coverage::CoverageMapping &,
const coverage::FunctionRecord &Function) const {		const coverage::FunctionRecord &Function) const {
return llvm::Regex(Regex).match(Function.Name);		return llvm::Regex::match(Regex, Function.Name);
}		}

bool NameRegexCoverageFilter::matchesFilename(StringRef Filename) const {		bool NameRegexCoverageFilter::matchesFilename(StringRef Filename) const {
return llvm::Regex(Regex).match(Filename);		return llvm::Regex::match(Regex, Filename);
}		}

bool NameWhitelistCoverageFilter::matches(		bool NameWhitelistCoverageFilter::matches(
const coverage::CoverageMapping &,		const coverage::CoverageMapping &,
const coverage::FunctionRecord &Function) const {		const coverage::FunctionRecord &Function) const {
return Whitelist.inSection("llvmcov", "whitelist_fun", Function.Name);		return Whitelist.inSection("llvmcov", "whitelist_fun", Function.Name);
}		}

(44 unchanged lines not shown)

unittests/Support/RegexTest.cpp

(11 unchanged lines not shown)
#include <cstring>		#include <cstring>

using namespace llvm;		using namespace llvm;
namespace {		namespace {

class RegexTest : public ::testing::Test {		class RegexTest : public ::testing::Test {
};		};

TEST_F(RegexTest, Basics) {		TEST_F(RegexTest, Basics) {
		nlguillemotAuthorUnsubmitted Done Reply Inline Actions This typedef might seem kind of redundant, but it's recommended by the googletest guide: https://github.com/google/googletest/blob/master/googletest/docs/advanced.md#death-test-naming nlguillemot: This typedef might seem kind of redundant, but it's recommended by the googletest guide: https…
		thopreUnsubmitted Not Done Reply Inline Actions Thanks for bringing this to my attention, I was not aware of it. thopre: Thanks for bringing this to my attention, I was not aware of it.
Regex r1("^[0-9]+$");		Regex r1("^[0-9]+$");
EXPECT_TRUE(r1.match("916"));		EXPECT_TRUE(r1.match("916"));
EXPECT_TRUE(r1.match("9"));		EXPECT_TRUE(r1.match("9"));
EXPECT_FALSE(r1.match("9a"));		EXPECT_FALSE(r1.match("9a"));

SmallVector<StringRef, 1> Matches;		SmallVector<StringRef, 1> Matches;
Regex r2("[0-9]+");		Regex r2("[0-9]+");
EXPECT_TRUE(r2.match("aa216b", &Matches));		EXPECT_TRUE(r2.match("aa216b", &Matches));
▲ Show 20 Lines • Show All 105 Lines • ▼ Show 20 Lines
 TEST_F(RegexTest, IsValid) {
   std::string Error;
   EXPECT_FALSE(Regex("(foo").isValid(Error));
   EXPECT_EQ("parentheses not balanced", Error);
   EXPECT_FALSE(Regex("a[b-").isValid(Error));
   EXPECT_EQ("invalid character range", Error);
 }

+TEST_F(RegexTest, ConstructorError) {
+  std::string Error;
+  Regex r1("(foo", Regex::NoFlags, &Error);
+  EXPECT_EQ("parentheses not balanced", Error);
+  Regex r2("foo", Regex::NoFlags, &Error);
+  EXPECT_TRUE(Error.empty());
+}

 TEST_F(RegexTest, MoveConstruct) {
   Regex r1("^[0-9]+$");
   Regex r2(std::move(r1));
   EXPECT_TRUE(r2.match("916"));
 }

 TEST_F(RegexTest, MoveAssign) {
   Regex r1("^[0-9]+$");
[... unchanged lines omitted; the lines below continue TEST_F(RegexTest, NoArgConstructor) ...]
   EXPECT_TRUE(r1.isValid(Error));
 }

 TEST_F(RegexTest, MatchInvalid) {
   Regex r1;
   std::string Error;
   EXPECT_FALSE(r1.isValid(Error));
   EXPECT_FALSE(r1.match("X"));
 }
Inline comment by nlguillemot (author, Done): This style of conditionally executing death tests was copied from `AlignmentTest.cpp`.

 // https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=3727
 TEST_F(RegexTest, OssFuzz3727Regression) {
   // Wrap in a StringRef so the NUL byte doesn't terminate the string
   Regex r(StringRef("[[[=GS\x00[=][", 10));
   std::string Error;
   EXPECT_FALSE(r.isValid(Error));
 }

+TEST_F(RegexTest, ConvenienceFunctions) {
+  std::string Error;
+
+  // static Regex::match
+  EXPECT_FALSE(Regex::match("(foo", "foo", nullptr, Regex::NoFlags, &Error));
+  EXPECT_EQ("parentheses not balanced", Error);
+  EXPECT_TRUE(Regex::match("^[0-9]+$", "916", nullptr, Regex::NoFlags, &Error));
+  EXPECT_TRUE(Error.empty());
Inline comment by thopre (Done): I'd swap the two tests as you did in the ConstructorError test, because that order shows that calling the wrapper on a valid regex clears Error, while here it only shows that it does not set it.
+
+  // static Regex::sub
+  EXPECT_EQ("aber", Regex::sub("a[b-", "d", "aber", Regex::NoFlags, &Error));
+  EXPECT_EQ("invalid character range", Error);
+  EXPECT_EQ("NUM", Regex::sub("[0-9]+", "NUM", "1234", Regex::NoFlags, &Error));
+  EXPECT_TRUE(Error.empty());
Inline comment by thopre (Done): Likewise.
+}

 }
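The ConstructorError test expects the `Regex` constructor to report a compilation failure through an optional `Error` out-parameter and to leave `Error` empty for a valid pattern, even when it held an earlier message. A minimal sketch of that contract, assuming `std::regex` as the engine; `CheckedRegex` is a hypothetical class, not `llvm::Regex`. Its `match()` still returns false for an invalid pattern, matching what the MatchInvalid test requires of the current API; the summary leaves turning validity into a hard precondition as future work.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical stand-in for a regex class whose constructor reports compile
// errors through an optional out-parameter. std::regex replaces llvm::Regex.
class CheckedRegex {
public:
  explicit CheckedRegex(const std::string &Pattern,
                        std::string *Error = nullptr) {
    try {
      Compiled.assign(Pattern);
      Valid = true;
      if (Error)
        Error->clear(); // valid pattern: clear any error left by earlier calls
    } catch (const std::regex_error &E) {
      if (Error)
        *Error = E.what(); // invalid pattern: report why compilation failed
    }
  }

  bool isValid() const { return Valid; }

  // Mirrors the current API: an invalid regex simply fails to match.
  bool match(const std::string &String) const {
    if (!Valid)
      return false;
    return std::regex_search(String, Compiled);
  }

private:
  std::regex Compiled; // left default-constructed if compilation failed
  bool Valid = false;
};
```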

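The static `sub` helper exercised in the ConvenienceFunctions test returns its input unchanged and sets `Error` when the pattern does not compile, and clears `Error` when it does. The sketch below models that with `std::regex`; `subOnce` is a hypothetical name. Replacing only the first match, and returning the input unchanged when nothing matches, are modelled on `llvm::Regex::sub`. Unlike `llvm::Regex::sub`, this version inserts `Repl` literally, without expanding backreferences such as `\1`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper in the spirit of the proposed static Regex::sub.
// Assumptions: std::regex stands in for llvm::Regex, and Repl is literal text.
std::string subOnce(const std::string &Pattern, const std::string &Repl,
                    const std::string &String, std::string *Error = nullptr) {
  std::regex Compiled;
  try {
    Compiled.assign(Pattern);
  } catch (const std::regex_error &E) {
    if (Error)
      *Error = E.what();
    return String; // invalid pattern: hand back the input unchanged
  }
  if (Error)
    Error->clear(); // valid pattern: no error to report
  std::smatch M;
  if (!std::regex_search(String, M, Compiled))
    return String; // no match: nothing to substitute
  // Replace only the first match, keeping the text around it.
  return M.prefix().str() + Repl + M.suffix().str();
}
```

For example, `subOnce("[0-9]+", "N", "ab12cd34")` rewrites only the first digit run and yields `"abNcd34"`.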
Diff 229946

include/llvm/Support/Regex.h

lib/Support/FileCheck.cpp

lib/Support/Regex.cpp

lib/Transforms/Utils/SymbolRewriter.cpp

tools/llvm-cov/CoverageFilters.cpp

unittests/Support/RegexTest.cpp
