This is an archive of the discontinued LLVM Phabricator instance.

Regex: Make "match" and "sub" const member functions
ClosedPublic

Authored by nlguillemot on Sep 5 2019, 2:59 PM.

Details

Summary

The Regex "match" and "sub" member functions were previously not "const"
because they wrote to the "error" member variable. This commit removes those
assignments, and instead assumes that the validity of the regex is already
known after the initial compilation of the regular expression. As a result, these
member functions were possible to make "const". This makes it easier to do things like
pre-compile Regexes up-front, and makes "match" and "sub" thread-safe.
The error status is now returned as an optional output, which also makes the API of
"match" and "sub" more consistent with each other.

Also, some uses of Regex that could be refactored to be const were made const.

Diff Detail

Event Timeline

nlguillemot created this revision.Sep 5 2019, 2:59 PM
thopre added inline comments.Sep 6 2019, 2:19 AM
include/llvm/Support/Regex.h
76–77

Could you mention that any error will be cleared by the function?

lib/Support/Regex.cpp
88–90

I think I'd prefer the API to enforce the user to check compile error when creating the regex and just have an assert here. match should only report errors in matching IMHO.

nlguillemot marked 2 inline comments as done.Sep 6 2019, 9:41 AM
nlguillemot added inline comments.
include/llvm/Support/Regex.h
76–77

The error is cleared in both match and sub, so I'll update the comment for both of those.

lib/Support/Regex.cpp
88–90

I completely agree in principle, though right now that would cause some API breakage.
For example, the following usage in lib/Transforms/Utils/SymbolRewriter.cpp:

std::string Name = Regex(Pattern).sub(Transform, C.getName(), &Error);
if (!Error.empty())
  report_fatal_error("unable to transforn " + C.getName() + " in " +
                     M.getModuleIdentifier() + ": " + Error);

I guess we could make the change anyways and update all the call sites. Maybe good for a follow-up patch?

Added note in the comments for match and sub to explicitly note that the Error string is cleared when there is no error.

nlguillemot marked an inline comment as done.Sep 6 2019, 9:48 AM
thopre added inline comments.Sep 6 2019, 2:26 PM
lib/Support/Regex.cpp
88–90

Yes, doing it in a separate would be nice. Ideally update call sites in a first patch and then apply this patch on top with those asserts I suggested.

nlguillemot marked an inline comment as done.Sep 6 2019, 2:47 PM
nlguillemot added inline comments.
lib/Support/Regex.cpp
88–90

I think it would be nice if there was an alternative way to cleanly implement the "Create a Regex and immediately throw it away" idiom as used in the example in SymbolRewriter.cpp above.

For example, Python's API can be used both ways:

import re
# "one-shot" regex
re.match(some_pattern, some_string)
# pre-compiled regex
compiled = re.compile(some_pattern)
compiled.match(some_string)

Maybe we should have something similar, where the whole process of creating the Regex object and throwing it away is wrapped in one function call. It could maybe be implemented by static member functions where the first parameter is the regex pattern. That would also make the refactoring more easy for cases like the above.

By the way, I don't have commit access to LLVM yet. If the patch looks good to you so far, could you commit it on my behalf?

thopre added inline comments.Sep 9 2019, 2:14 AM
lib/Support/Regex.cpp
88–90

Indeed, sounds like a good idea. Yes I can commit the patch for you once it is approved.

Best regards.

nlguillemot marked an inline comment as done.Sep 10 2019, 1:27 PM
nlguillemot added inline comments.
lib/Support/Regex.cpp
88–90

Thanks for helping me with the commit. Is it ok if I switch around the reviewers to make you the reviewer? There was no particular reason for the choice of the initial reviewer list other than the recent activity in the file, so I think your review feedback is probably enough.

thopre added inline comments.Sep 11 2019, 1:57 AM
lib/Support/Regex.cpp
88–90

That's usually the right way to go about it since someone who did a change in an area is likely to be interested in what's going on in that area in the future. You can add me to the reviewer list yes, but no need to remove anyone I think.

friendly reminder ping

thopre accepted this revision.Sep 23 2019, 2:51 PM

friendly reminder ping

My bad, I misunderstood you and thought you'd do a patch to implement the "Create a Regex and immediately throw it away" idiom and rebase that one on top of it.

This revision is now accepted and ready to land.Sep 23 2019, 2:51 PM

friendly reminder ping

My bad, I misunderstood you and thought you'd do a patch to implement the "Create a Regex and immediately throw it away" idiom and rebase that one on top of it.

I could do that. By "rebase that one on top of it", do you mean to include the additional changes in this same review?

friendly reminder ping

My bad, I misunderstood you and thought you'd do a patch to implement the "Create a Regex and immediately throw it away" idiom and rebase that one on top of it.

I could do that. By "rebase that one on top of it", do you mean to include the additional changes in this same review?

I meant make a patch series/stack where the first one adds the new idiom and the second is this one where the matching only checks for matching errors. But it's ok to do it the other way around, I'll commit this now once I've finished a build+check-all and you can add the changes for the new idiom on top of it.

Best regards.

This revision was automatically updated to reflect the committed changes.