This is an archive of the discontinued LLVM Phabricator instance.

[ASTMatchers] Enhanced support for matchers taking Regex arguments
ClosedPublic

Authored by njames93 on Jun 28 2020, 12:38 AM.

Details

Summary

Added new Macros AST(_POLYMORPHIC)_MATCHER_REGEX(_OVERLOAD) that define a matchers that take a regular expression string and optionally regular expression flags. This lets users match against nodes while ignoring the case without having to manually use [Aa] or [A-Fa-f] in their regex. The other point this addresses is in the current state, matchers that use regular expressions have to compile them for each node they try to match on, Now the regular expression is compiled once when you define the matcher and used for every node that it tries to match against. If there is an error while compiling the regular expression an error will be logged to stderr showing the bad regex string and the reason it couldn't be compiled. The old behaviour of this was down to the Matcher implementation and some would assert, whereas others just would never match. Support for this has been added to the documentation script as well. Support for this has been added to dynamic matchers ensuring functionality is the same between the 2 use cases.

Diff Detail

Event Timeline

njames93 created this revision.Jun 28 2020, 12:38 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJun 28 2020, 12:38 AM
njames93 updated this revision to Diff 273957.Jun 28 2020, 10:52 AM

Include matcher name in error message when failing to build a regex

njames93 updated this revision to Diff 273960.Jun 28 2020, 11:47 AM

Moved the code to build and verify the regex out of the macro

aaron.ballman added inline comments.Jun 30 2020, 6:36 AM
clang/include/clang/ASTMatchers/ASTMatchersInternal.h
1952

Is there a reason this needs to be a shared pointer as opposed to a unique pointer that gets moved into the class owning it?

clang/lib/ASTMatchers/Dynamic/Marshallers.cpp
126

This makes me think we want to do a case insensitive comparison rather than an exact case match. Personally, I would always write NewLine as the string literal, but I could easily see someone writing Newline to match the spelling of the RegEx flag and being surprised it doesn't work. WDYT?

clang/lib/ASTMatchers/Dynamic/Marshallers.h
745

I may be misunderstanding this, but it seems suspicious that getNumArgs() returns 0 but getArgKinds() suggests at least one argument. Is that intentional?

770

Are you missing a return here, after issuing the error?

clang/unittests/ASTMatchers/Dynamic/ParserTest.cpp
379

Should we have a failure for something like IgnoreCase | NoFlags ?

njames93 marked 7 inline comments as done.Jun 30 2020, 8:15 AM
njames93 added inline comments.
clang/include/clang/ASTMatchers/ASTMatchersInternal.h
1952

It has to be shared ownership for the polymorphic matchers to enable reuse of PolymorphicMatcherWithParam1. When a matcher or type T is required from a PolymorphicMatcherWithParam1 it creates a copy of the Param. As you can't copy from a unique_ptr, a shared_ptr makes sense.

clang/lib/ASTMatchers/Dynamic/Marshallers.cpp
126

That could work, or I implement the ArgTypeTraits::getBestGuess to handle small issues like that. Would be a little trickier than the other cases as It would need to support |. WDYT?

clang/lib/ASTMatchers/Dynamic/Marshallers.h
745

It was my understanding that when isVariadic() returns true, getNumArgs() is not used. I could change it to return 1, but that wouldn't be any more correct than it is now, nor would it have any functional change.

770

Yes I am

clang/unittests/ASTMatchers/Dynamic/ParserTest.cpp
379

I'm against that, its perfectly legal in non dynamic matchers land to write matchesName("[ABC]*", IgnoreCase | NoFlags) Even if it is a little redundant.

njames93 updated this revision to Diff 274493.Jun 30 2020, 8:15 AM
njames93 marked 2 inline comments as done.

Fix missing return statement

aaron.ballman added inline comments.Jun 30 2020, 9:51 AM
clang/include/clang/ASTMatchers/ASTMatchersInternal.h
1952

Drat, but reasonable, thanks for the explanation.

clang/lib/ASTMatchers/Dynamic/Marshallers.cpp
126

Hmm, I don't have a firm opinion but am slightly leaning towards implementing getBestGuess and using the more strict spellings based purely on personal preference.

clang/lib/ASTMatchers/Dynamic/Marshallers.h
745

Ah, so the real issue is that getNumArgs() is pure virtual, which forces us to ask these sort of silly questions. Good to know, I'm fine with 0, I was just surprised.

clang/unittests/ASTMatchers/Dynamic/ParserTest.cpp
379

Okay, fine by me. I only brought it up because it seems like a very confused user and we have the opportunity to diagnose (due to the dynamic nature here as opposed to the static nature of C++ using bitwise OR directly).

njames93 updated this revision to Diff 274591.Jun 30 2020, 12:51 PM

Added best guess support for parsing RegexFlags

njames93 marked 5 inline comments as done.Jun 30 2020, 1:21 PM
aaron.ballman added inline comments.Jul 1 2020, 3:51 AM
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp
123

const auto & (same below)

157

I would probably simplify this (untested):

SmallVector<StringRef, 4> Split;
llvm::Optional<llvm::Regex::RegexFlags> Flag;

Flags.split(Split, '|', -1, false);
for (StringRef OrFlag : Split) {
  if (llvm::Optional<llvm::Regex::RegexFlags> NextFlag =
            getRegexFlag(OrFlag.trim())) {
    Flag = Flag.getValueOr(llvm::Regex::NoFlags) | *NextFlag;
  }
}
return Flag;

Similar below.

njames93 updated this revision to Diff 274846.Jul 1 2020, 9:28 AM

Small refactor based on comments

njames93 marked 5 inline comments as done.Jul 1 2020, 2:46 PM
njames93 added inline comments.
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp
123

Foiled by my own check ;)

This revision is now accepted and ready to land.Jul 2 2020, 4:49 AM
This revision was automatically updated to reflect the committed changes.
njames93 marked an inline comment as done.