This is an archive of the discontinued LLVM Phabricator instance.

[lit] Add the ability to parse regexes in Lit boolean expressions
ClosedPublic

Authored by ldionne on Jun 18 2021, 2:59 PM.

Details

Summary

This patch augments Lit with the ability to parse regular expressions
in boolean expressions. This includes REQUIRES:, XFAIL:, UNSUPPORTED:,
and all other special Lit markup that evaluates to a boolean expression.

Regular expressions can be specified by enclosing them in {{...}},
similarly to how FileCheck handles such regular expressions. The regular
expression can either be on its own, or it can be part of an identifier.
For example, a match expression like {{.+}}-apple-darwin{{.+}} would match
the following variables:

x86_64-apple-darwin20.0
arm64-apple-darwin20.0
arm64-apple-darwin22.0
etc...

In the long term, this could be used to remove the need to handle the
target triple specially when parsing boolean expressions.

Diff Detail

Event Timeline

ldionne created this revision.Jun 18 2021, 2:59 PM
ldionne requested review of this revision.Jun 18 2021, 2:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 18 2021, 2:59 PM
ldionne added inline comments.Jun 18 2021, 3:03 PM
llvm/utils/lit/lit/BooleanExpression.py
12–18

I'm not an EBNF expert, but the intent here was to describe that one can alternate identifier and {{regex}} inside what used to be just an identifier. This is to allow things like abc{{regex1}}def{{regex2}}ghi.

https://reviews.llvm.org/D104747 is an example of how libc++, libc++abi and libunwind can take advantage of this in their test suites.

ldionne updated this revision to Diff 353976.Jun 23 2021, 7:51 AM

Fix issue with grouping, add more tests (in particular, for interaction with --show-used-features)

Gentle ping to reviewers. Does anyone see an issue with adding this? Can anyone think of a subtle (and bad) interaction with existing Lit behavior?

This seems like a worthwhile feature. Thanks for working on it. It's not a part of lit I have much experience with, so hopefully someone else will comment too.

I've commented some on the implementation. I haven't looked at the tests much yet.

llvm/utils/lit/lit/BooleanExpression.py
18

I think the following renaming would make this easier to understand:

  • regex -> braced_regex: It's not just a plain regular expression.
  • any-regex -> python_regex: It's written in python's regular expression language.
114

isIdentifier seems misnamed if it accepts {{.

The logic should probably be something like either not isIdentifier and not isRegexOpen or not isIdentiferOrRegexOpen.

115

Shouldn't this mention {{ as a possibility?

ldionne marked 3 inline comments as done.Jun 28 2021, 9:49 AM
ldionne added inline comments.
llvm/utils/lit/lit/BooleanExpression.py
114

I agree that isIdentifier is misnamed, I'll fix that.

I also agree that it should be something like not isIdentifier and not isRegexOpen, however this implementation does not track the fact that we're parsing the content of a regular expression -- it doesn't know that it's inside a {{. Instead, the tokenization pattern was augmented to treat anything with {{<whatever>}} as a token of its own - that was by far the simplest way I could find to implement it. So instead, I believe the fix is to simply rename isIdentifier to isMatchExpression, and to acknowledge that we now have a new leaf in the grammar, and that isMatchExpression basically allows us to detect that.

115

I'll say expected '!' or '(' or match-expression instead, LMK if that's not satisfying.

ldionne updated this revision to Diff 354940.Jun 28 2021, 9:50 AM
ldionne marked 2 inline comments as done.

Address review comments. Thanks for reviewing!

There should be some user-level documentation about this new feature. It looks like the appropriate place is:

https://llvm.org/docs/TestingGuide.html#constraining-test-execution

llvm/utils/lit/lit/BooleanExpression.py
114

That seems reasonable. Thanks for addressing it.

115

While that does reflect the implementation, I think expected '!', '(', '{{', or identifier would be more meaningful to a user who isn't familiar with the internal grammar symbol names. As far as they know, match-expression could be, for example, the start symbol for the whole grammar. In contrast, {{ and identifier are probably clear enough to any user.

184

Should there be some minimal test somewhere (maybe here) ensuring that a regex is handled as a literal string when matching triples? I think that means it cannot ever match (due to {{ and }}), so maybe it's not a very interesting behavior, but we still might want to be aware if we accidentally change it.

ldionne updated this revision to Diff 355250.Jun 29 2021, 8:06 AM
ldionne marked 2 inline comments as done.

Address review comments

There should be some user-level documentation about this new feature. It looks like the appropriate place is:

Thanks! Indeed, I wanted to write some documentation but I couldn't find where to add it. I'll do that.

llvm/utils/lit/lit/BooleanExpression.py
115

Ok, I agree, that seems better. Changed.

184

Hmm, I agree, I had not considered that. Added two tests: one that a triple doesn't match even if the regex would match, and one where the triple does match the regex when it is treated literally.

The second test is not something we can even encounter in real life, as you say, because triples can't contain special characters like {{, but I think the test still has value since it pins down the behavior.

Note: In the future, I would love to remove any notion of a triple in Lit, since I think the special substring handling was its only purpose. If we ever do that, this whole test will become moot.

ldionne updated this revision to Diff 355253.Jun 29 2021, 8:09 AM

Rephrase error message containing too many or's

jdenny accepted this revision.Jun 29 2021, 8:42 AM

LGTM.

I apologize if this is frustrating, but can you wait a couple of days to commit? I'd like to give other reviewers just a bit more time to comment in case we overlooked something.

llvm/docs/TestingGuide.rst
465–468 ↗(On Diff #355253)

Thanks for adding this.

Tiny nit: The second bullet is a different animal than the other two. I'd put the other two next to each other or integrate them. But it's no big deal if you prefer it as is.

llvm/utils/lit/lit/BooleanExpression.py
184

That all makes sense to me. Thanks.

This revision is now accepted and ready to land.Jun 29 2021, 8:42 AM
ldionne marked an inline comment as done.Jun 29 2021, 12:19 PM

LGTM.

I apologize if this is frustrating, but can you wait a couple of days to commit? I'd like to give other reviewers just a bit more time to comment in case we overlooked something.

Thanks a lot for the review! Waiting for a bit is not frustrating - I fully understand how everybody's busy and I myself often let reviews go under my radar for a while. I'll wait until the end of the week to commit this.

llvm/docs/TestingGuide.rst
465–468 ↗(On Diff #355253)

Hmm, yes, I agree with you. I integrated both into one paragraph, and I documented the fact that string comparison is case sensitive while I was at it.

Please let me know if the new formulation doesn't suit you.

ldionne updated this revision to Diff 355329.Jun 29 2021, 12:19 PM
ldionne marked an inline comment as done.

Reformulate documentation.

Also, as a side note - should we perhaps create a Lit review group that people can add themselves to, and that is added as a blocking reviewer on any review that touches llvm/utils/lit? I think that might be useful, as I'm never certain of who should be included on those reviews. We have that for libc++ and it's working out pretty well.

Thanks a lot for the review! Waiting for a bit is not frustrating - I fully understand how everybody's busy and I myself often let reviews go under my radar for a while. I'll wait until the end of the week to commit this.

Thanks for understanding, and for the patch!

Also, as a side note - should we perhaps create a Lit review group that people can add themselves to, and that is added as a blocking reviewer on any review that touches llvm/utils/lit? I think that might be useful, as I'm never certain of who should be included on those reviews. We have that for libc++ and it's working out pretty well.

I've seen these groups but haven't investigated how they work. Is the idea that someone from that group must accept or the patch is marked as blocked?

llvm/docs/TestingGuide.rst
465–468 ↗(On Diff #355253)

LGTM. Thanks.

I've seen these groups but haven't investigated how they work. Is the idea that someone from that group must accept or the patch is marked as blocked?

Yes, basically the review is marked as blocked until anyone from the group gives a green check-mark. It's also possible to make the group a non-blocking reviewer (or even just a subscriber). But having that helps making sure that everybody in the group will be pinged when a review is posted.

I've seen these groups but haven't investigated how they work. Is the idea that someone from that group must accept or the patch is marked as blocked?

Yes, basically the review is marked as blocked until anyone from the group gives a green check-mark. It's also possible to make the group a non-blocking reviewer (or even just a subscriber). But having that helps making sure that everybody in the group will be pinged when a review is posted.

Makes sense to me. FileCheck could probably use this too. In either case, whether we need a blocking group isn't clear to me yet, so I'll have to defer to others on that point.

yln accepted this revision.Jun 29 2021, 4:55 PM

Pretty cool, looks useful! :)
Thanks to you both @ldionne and @jdenny!

I'll commit this now since it seems it has gotten the required attention, after all. If anyone gets here and thinks there's an issue, please ping the review and I can make changes in another commit or revert the patch, depending on what the issue is.