This is an archive of the discontinued LLVM Phabricator instance.

[LLVM/Support] - Disallow match() method of llvm::Regex to change object state.
AbandonedPublic

Authored by grimar on Aug 31 2016, 2:39 PM.

Details

Summary

match() method previously was able to change the error filed of a object.
That looked as excessive because pattern is compiled in constructor,
so to the moment of match() call should be already known if Regex state is valid or not.

match() returns a bool flag what should be enough to know was there some fail during the call or not.

This change allows to make match() to be const what is useful if we have constant reference to Regex
and want to call match().

Diff Detail

Event Timeline

grimar updated this revision to Diff 69904.Aug 31 2016, 2:39 PM
grimar retitled this revision from to [LLVM/Support] - Disallow match() method of llvm::Regex to change object state..
grimar updated this object.
grimar added reviewers: ruiu, rafael, davide, dblaikie.
grimar added subscribers: llvm-commits, grimar.
dblaikie edited edge metadata.Aug 31 2016, 3:59 PM

I'm not as comfortable with this change - have you looked around to see if any callers are actually checking the error after calling Match?

It seems like a client could have a problem in either case (if Regex says "no match" when it's really "ran out of memory" then maybe the code would behave incorrectly - assuming a match wasn't present when it may actually have been present).

I'm not as comfortable with this change - have you looked around to see if any callers are actually checking the error after calling Match?

Yes, sure. Callers in llvm check only returned bool value from what I saw. Most common scenarios btw is to create temporarily Regex in place and call match() like:

if (Regex(...).match()) {...

So they check only returned bool and do not care why it is false, which is fine in my opinion, as I don't expect match() to return false because of something except the "no match found".

It seems like a client could have a problem in either case (if Regex says "no match" when it's really "ran out of memory" then maybe the code would behave incorrectly - assuming a match wasn't present when it may actually have been present).

Even if we would like to handle OOM error (though we dont do that in llvm generally), there is no good way to do that atm. The same can be applied for any specific error, because
the only API Regex provides for diagnostics is bool isValid(std::string &Error), so callers can only compare returned string with "out of memory", what does not look as usable API.

So what I want to say is that I believe it is not supposed to use error member currently for anything else then text error reporting.
And callers do that earlier than match() call (if they do):

Regex RegEx(ExtractRegExpGlobals[i]);
if (!RegEx.isValid(Error)) {
  errs() << argv[0] << ": '" << ExtractRegExpGlobals[i] << "' "
    "invalid regex: " << Error;
}

So since at the moment of match() call it should be already known if regex was valid or not, and nobody seems to use IsValid() after match(), I think it is correct and useful cleanup change.

grimar added inline comments.Sep 1 2016, 1:23 AM
include/llvm/Support/Regex.h
77

btw, code in Regex.h and .cpp is not clang-formatted.
In this patch, formatting can affect on this line, should I format it ?

grimar updated this revision to Diff 69958.Sep 1 2016, 1:50 AM
grimar edited edge metadata.
  • clang-formatted.
  • fixed last minute mistype in condition.
grimar updated this revision to Diff 69961.Sep 1 2016, 2:28 AM
  • Simplified.
grimar updated this revision to Diff 69980.Sep 1 2016, 3:55 AM
  • Fixed the same mistype again. Sorry :/
grimar abandoned this revision.Sep 2 2016, 2:19 PM

Change was intended to use in lld, but it is no more needed atm.