Added new Macros AST(_POLYMORPHIC)_MATCHER_REGEX(_OVERLOAD) that define a matchers that take a regular expression string and optionally regular expression flags. This lets users match against nodes while ignoring the case without having to manually use [Aa] or [A-Fa-f] in their regex. The other point this addresses is in the current state, matchers that use regular expressions have to compile them for each node they try to match on, Now the regular expression is compiled once when you define the matcher and used for every node that it tries to match against. If there is an error while compiling the regular expression an error will be logged to stderr showing the bad regex string and the reason it couldn't be compiled. The old behaviour of this was down to the Matcher implementation and some would assert, whereas others just would never match. Support for this has been added to the documentation script as well. Support for this has been added to dynamic matchers ensuring functionality is the same between the 2 use cases.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
clang/include/clang/ASTMatchers/ASTMatchersInternal.h | ||
---|---|---|
1952 ↗ | (On Diff #273960) | Is there a reason this needs to be a shared pointer as opposed to a unique pointer that gets moved into the class owning it? |
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp | ||
126 | This makes me think we want to do a case insensitive comparison rather than an exact case match. Personally, I would always write NewLine as the string literal, but I could easily see someone writing Newline to match the spelling of the RegEx flag and being surprised it doesn't work. WDYT? | |
clang/lib/ASTMatchers/Dynamic/Marshallers.h | ||
745 | I may be misunderstanding this, but it seems suspicious that getNumArgs() returns 0 but getArgKinds() suggests at least one argument. Is that intentional? | |
770 | Are you missing a return here, after issuing the error? | |
clang/unittests/ASTMatchers/Dynamic/ParserTest.cpp | ||
379 | Should we have a failure for something like IgnoreCase | NoFlags ? |
clang/include/clang/ASTMatchers/ASTMatchersInternal.h | ||
---|---|---|
1952 ↗ | (On Diff #273960) | It has to be shared ownership for the polymorphic matchers to enable reuse of PolymorphicMatcherWithParam1. When a matcher or type T is required from a PolymorphicMatcherWithParam1 it creates a copy of the Param. As you can't copy from a unique_ptr, a shared_ptr makes sense. |
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp | ||
126 | That could work, or I implement the ArgTypeTraits::getBestGuess to handle small issues like that. Would be a little trickier than the other cases as It would need to support |. WDYT? | |
clang/lib/ASTMatchers/Dynamic/Marshallers.h | ||
745 | It was my understanding that when isVariadic() returns true, getNumArgs() is not used. I could change it to return 1, but that wouldn't be any more correct than it is now, nor would it have any functional change. | |
770 | Yes I am | |
clang/unittests/ASTMatchers/Dynamic/ParserTest.cpp | ||
379 | I'm against that, its perfectly legal in non dynamic matchers land to write matchesName("[ABC]*", IgnoreCase | NoFlags) Even if it is a little redundant. |
clang/include/clang/ASTMatchers/ASTMatchersInternal.h | ||
---|---|---|
1952 ↗ | (On Diff #273960) | Drat, but reasonable, thanks for the explanation. |
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp | ||
126 | Hmm, I don't have a firm opinion but am slightly leaning towards implementing getBestGuess and using the more strict spellings based purely on personal preference. | |
clang/lib/ASTMatchers/Dynamic/Marshallers.h | ||
745 | Ah, so the real issue is that getNumArgs() is pure virtual, which forces us to ask these sort of silly questions. Good to know, I'm fine with 0, I was just surprised. | |
clang/unittests/ASTMatchers/Dynamic/ParserTest.cpp | ||
379 | Okay, fine by me. I only brought it up because it seems like a very confused user and we have the opportunity to diagnose (due to the dynamic nature here as opposed to the static nature of C++ using bitwise OR directly). |
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp | ||
---|---|---|
123 | const auto & (same below) | |
157 | I would probably simplify this (untested): SmallVector<StringRef, 4> Split; llvm::Optional<llvm::Regex::RegexFlags> Flag; Flags.split(Split, '|', -1, false); for (StringRef OrFlag : Split) { if (llvm::Optional<llvm::Regex::RegexFlags> NextFlag = getRegexFlag(OrFlag.trim())) { Flag = Flag.getValueOr(llvm::Regex::NoFlags) | *NextFlag; } } return Flag; Similar below. |
clang/lib/ASTMatchers/Dynamic/Marshallers.cpp | ||
---|---|---|
123 | Foiled by my own check ;) |
I may be misunderstanding this, but it seems suspicious that getNumArgs() returns 0 but getArgKinds() suggests at least one argument. Is that intentional?