This is an archive of the discontinued LLVM Phabricator instance.

[lit] Implement DEFINE and REDEFINE directives
ClosedPublic

Authored by jdenny on Aug 23 2022, 3:44 PM.

Details

Summary

These directives define per-test lit substitutions. The concept was
discussed at
https://discourse.llvm.org/t/iterating-lit-run-lines/62596/10.

For example, the following directives can be inserted into a test file
to define %{cflags} and %{fcflags} substitutions with empty
initial values, which serve as the parameters of another newly defined
%{check} substitution:

// DEFINE: %{cflags} =
// DEFINE: %{fcflags} =

// DEFINE: %{check} = %clang_cc1 %{cflags} -emit-llvm -o - %s | \ 
// DEFINE:            FileCheck %{fcflags} %s

The following directives then redefine the parameters before each use
of %{check}:

// REDEFINE: %{cflags} = -foo
// REDEFINE: %{fcflags} = -check-prefix=FOO
// RUN: %{check}

// REDEFINE: %{cflags} = -bar
// REDEFINE: %{fcflags} = -check-prefix=BAR
// RUN: %{check}

Of course, %{check} would typically be more elaborate, increasing
the benefit of the reuse.

One issue is that the strings DEFINE: and REDEFINE: already appear
in 5 tests. This patch adjusts those tests not to use those strings.
Our prediction is that, in the vast majority of cases, if a test
author mistakenly uses one of those strings for another purpose, the
text appearing after the string will not happen to have the syntax
required for these directives. Thus, the test author will discover
the mistake immediately when lit reports the syntax error.

This patch also expands the documentation on existing lit substitution
behavior.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I haven't got the time to review this right now, but I wanted to say a big thank you for working on this. I've wanted the ability to specify my own substitutions within a test for a long time now. I experimented with a CHECK-DEFINE pattern for FileCheck, since I only needed it in the context of FileCheck check patterns, but this will do the job perfectly: we can use it in conjunction with FileCheck's -D option for crazy shenanigans like reusing the same string in both positive and negative checks, or swapping around the meaning of certain variables (by naming them differently). As a bonus of this design, it also could be used to rerun the same FileCheck invocation, but with different sets of prefixes, without having to repeat everything else.

One question I do have: why do we need both DEFINE and REDEFINE? I personally would think that just DEFINE would be enough, with it implicitly replacing any existing substitution. Or is it something to do with substitution order?

ellis added a subscriber: ellis.Aug 24 2022, 7:50 AM

I haven't got the time to review this right now, but I wanted to say a big thank you for working on this.

Sure. Thanks for the quick response.

I experimented with a CHECK-DEFINE pattern for FileCheck, since I only needed it in the context of FileCheck check patterns, but this will do the job perfectly:

Makes sense.

One question I do have: why do we need both DEFINE and REDEFINE? I personally would think that just DEFINE would be enough, with it implicitly replacing any existing substitution. Or is it something to do with substitution order?

I agree that DEFINE would be enough for the core functionality. However, I find that debugging complex substitution expansions can sometimes be painful. I split DEFINE into two directives in order to improve readability and facilitate debugging, as follows.

I work with test suites where lit configuration files define substitutions in terms of other substitutions. Let's say a lit configuration file defines a substitution foo such that its value contains a substitution bar. Now let's say you have a test file that uses foo in a RUN line, and you don't realize that foo uses bar or that bar is a thing at all. However, bar seems like a nice name for a per-test substitution that you want, so you define bar using DEFINE. If DEFINE doesn't complain that bar is already defined, you've corrupted the RUN line, and you'll be lucky if the RUN line fails so you have the opportunity to debug and eventually figure out your mistake. On the other hand, if DEFINE does complain, you will immediately discover your mistake.

Another scenario is where you intend to redefine a substitution but mistype it. Such things do happen: think of all the ways people have tried to diagnose typos in FileCheck prefixes. Again, hopefully the next RUN line will fail and you'll eventually figure out your mistake... or maybe it won't fail given that the intended substitution's stale value didn't produce a failure in previous RUN lines. On the other hand, if REDEFINE complains that the substitution is not already defined, you will immediately discover your mistake.

Finally, I find that substitution expansion order can be confusing at times. This patch tries to address that confusion with a definition-before-reference rule. However, if the only directive is DEFINE, then how do you tell where the original definitions are? Sure, you could look for the first DEFINE for a substitution, but what if the first DEFINE in the file isn't the first definition because there's a definition in the lit configuration file? I think it's easier to understand expansion order if DEFINE always sets substitution expansion order and REDEFINE never does.

Thanks for implementing this! The design looks good to me.
I think it is powerful enough for my impending use case which is quite straightforward to reason about, without making the lit domain specific language too complex.

One question I do have: why do we need both DEFINE and REDEFINE? I personally would think that just DEFINE would be enough, with it implicitly replacing any existing substitution. Or is it something to do with substitution order?

I agree that DEFINE would be enough for the core functionality. However, I find that debugging complex substitution expansions can sometimes be painful. I split DEFINE into two directives in order to improve readability and facilitate debugging, as follows.

Having the DEFINE/REDEFINE distinction looks good to me. If a substitution is defined in multiple places, knowing where they are redefined helps (like variable declarations and assignments. Python/Ruby made it difficult.)

llvm/utils/lit/tests/Inputs/shtest-define/errors/assignment/before-name.txt
9

delete blank line

llvm/utils/lit/tests/Inputs/shtest-define/errors/assignment/braces-empty.txt
2

space after DEFINE:

Thanks for implementing this! The design looks good to me.
I think it is powerful enough for my impending use case which is quite straightforward to reason about, without making the lit domain specific language too complex.

Great! For others, that discussion is at https://discourse.llvm.org/t/lit-run-a-run-line-multiple-times-with-different-replacements/64932/15.

Having the DEFINE/REDEFINE distinction looks good to me. If a substitution is defined in multiple places, knowing where they are redefined helps (like variable declarations and assignments.

Good analogy. Perhaps I can work it into the documentation.

Thanks for the review. I'll apply your suggestions on the next update.

jdenny updated this revision to Diff 457446.Sep 1 2022, 5:10 PM
jdenny edited the summary of this revision. (Show Details)
  • Fixed 5 existing tests that use the strings DEFINE: and REDEFINE:.
  • Applied reviewer suggestions in tests.
  • Added declaration/assignment analogy to docs.
  • Rebased.
jdenny marked 2 inline comments as done.Sep 1 2022, 5:11 PM

Just a drive through review from me. Thanks for working on this!

llvm/docs/TestingGuide.rst
624–627

Why delete "This can be used in tests with multiple RUN lines, which reference test file's line numbers"?

696–698

What happens if these remain undefined? I couldn't find this documented or tested.

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-define-bad-redefine.txt
14

FIXME

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-define-redefine.txt
14

FIXME

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-define-run.txt
10

FIXME

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-define.txt
9

FIXME

llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
48

FIXME

Just a drive through review from me. Thanks for working on this!

Thanks for the review.

llvm/docs/TestingGuide.rst
624–627

It seems misleading to me: these substitutions can be used even if there's only one RUN line. Beyond that issue, I'm not sure what the sentence says that the first sentence doesn't already say more clearly.

If I've misunderstood the sentence and lost an important point, please let me know. However, if we keep it, it might need to be generalized to make sense for the new directives.

696–698

If, for example, %{triple} is left undefined, then the REDEFINE: directive for %{triple} below will report an error. This error is documented under the REDEFINE: documentation entry below, and llvm/utils/lit/tests/Inputs/shtest-define/errors/defined-check/redefine-none.txt tests that behavior. Is that what you're looking for?

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-define-redefine.txt
14

Ah, I thought I had fixed all these. Will do in my next update. Thanks.

I've done a bit of a review. Not looked at the testing yet, and I can't say to have fully grasped the details of everything that's going on in the python at this point.

llvm/docs/TestingGuide.rst
901

Nit: I believe it should be "use-case" (hyphenated).

llvm/utils/lit/lit/TestRunner.py
1259

Nit: I'd probably put blank lines between functions, as in some cases, it's easy to miss that you've entered a new function.

1270–1272

How come this only applies to substitutions and not commands?

1280

Is there a test to show that you can have substitutions containing = characters in the value? (I skimmed briefly, but there's a lot of tests, so I may not have spotted it).

1301

Not sure if it's worth caring about, but normally _ would be a valid leading character in an identifier in C and Python, so not permitting it here might cause a little confusion.

1304

"must must" -> "must" (and presumably fix a corresponding test).

1564

dir is a built-in python command, so I'd pick a different name. I also find it a confusing name anyway, because I'd expand it to "directory" not "directive". Any reason not to just name it "directive"?

1571

Nit

1591

Nit: double space after period is inconsistent with the rest of this comment.

1719

Nit: whilst moving this, might as well add the missing trailing full stop.

1836–1838

Ditto comments re. dir name.

1846

Ditto.

jdenny updated this revision to Diff 458018.Sep 5 2022, 8:48 AM
jdenny marked 15 inline comments as done.

Apply reviewer suggestions:

  • Accepted leading _ for name in %{name}. Test what chars are accepted in name.
  • Test that = is accepted in a value.
  • Fix various typographical and naming issues.
jdenny added a comment.Sep 5 2022, 8:48 AM

I've done a bit of a review. Not looked at the testing yet, and I can't say to have fully grasped the details of everything that's going on in the python at this point.

Thanks for the review so far.

llvm/docs/TestingGuide.rst
901

Interesting. I didn't recall it was ever written that way. https://www.merriam-webster.com/dictionary/use%20case claims that's less common. https://en.wikipedia.org/wiki/Use_case spells it without the hyphen. A couple greps in llvm-project.git:

$ git grep -i 'use-case' | wc
     47     509    5139
$ git grep -i 'use\s*case' | wc
    276    3310   30237

The hyphen makes sense to me if the phrase is being used as an adjective, but here's it a noun.

llvm/utils/lit/lit/TestRunner.py
1270–1272

_handleCommand already does the rstrip, so the effect is the same. Would it help if I moved that logic into CommandDirective?

1280

Good point. Added one.

1301

For the leading character, I was following the FileCheck prefix documentation. It turns out that doesn't match the FileCheck prefix implementation.

OK, I've extended it to accept the leading _, and I've added a test that all the above characters are indeed accepted.

1304

Good catch. Fixed.

The associated tests check the diagnostic only up to the word "malformed", so there's nothing to update. That was my thinking when I wrote them: don't maintain the verbose description of the syntax repeatedly in the test suite. I'm generally not very consistent about how much of a diagnostic to replicate in a test suite, so let me know if you think I should include a different amount in this case.

1564

Good point. Done.

awarzynski added inline comments.Sep 5 2022, 11:07 AM
llvm/docs/TestingGuide.rst
624–627

these substitutions can be used even if there's only one RUN line.

That's true, but do they make sense if there's only one RUN line? I don't think so 🤔 . I couldn't find a single example in tree though :( I wrote one myself:

one two three


; RUN: head -%(line+8) %s | tail +%(line+8) | FileCheck %s --check-prefix=BOTTOM
; RUN: head -%(line+7) %s | tail +%(line+7) | FileCheck %s --check-prefix=BOTTOM
; RUN: head -%(line-5) %s | tail +%(line-5) | FileCheck %s --check-prefix=TOP

; TOP: one two three
; BOTTOM: five six seven


five six seven

To me that sentence helps understand the rationale for having %(line+<number>)/%(line-<number>) (as opposed to plain %(line)). Without it I'm just not sure what those substitutions are for.

696–698

Is that what you're looking for?

No :) By "undefined" I meant "empty". My bad, sorry! Basically, I had this scenario in mind:

; DEFINE: %{triple} =
; DEFINE: %{cflags} =
; DEFINE: %{fc-prefix} =

 ; DEFINE: %{check} =                                                  \
 ; DEFINE:   %clang_cc1 -verify -fopenmp -fopenmp-version=51 %{cflags} \
 ; DEFINE:              -triple %{triple} -emit-llvm -o - %s |         \
 ; DEFINE:     FileCheck -check-prefix=%{fc-prefix} %s

What should happen here?

If, for example, %{triple} is left undefined, then the REDEFINE: directive for %{triple} below will report an error.

So this is allowed:

; DEFINE: %{triple} =
; REDEFINE: %{triple} =

and this is not allowed (assuming that there's no DEFINE: %{triple} in any of the local config files):

;  REDEFINE: %{triple} =

Makes sense!

llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
15
22–23

Why not?

28–29

And I guess that such usage should be discouraged?

33–34

Why not? And what's the difference between %{global:multiple-once-exact} and <%{global:multiple-once-exact}>?

jdenny updated this revision to Diff 458063.Sep 5 2022, 4:11 PM
jdenny marked 5 inline comments as done.

Applied reviewer suggestions:

  • Rewrote examples to avoid confusion over unusable initial substitution values.
  • Improved some comments in the test suite.
  • Fixed a typo.
llvm/docs/TestingGuide.rst
624–627

I have not found examples of line number substitutions either. I even grepped through the LLVM history.

To me, saying line number substitutions don't make sense unless there are multiple RUN lines suggests that the RUN lines would refer to each other's line numbers. When is that useful?

I can imagine a RUN line referring to another part of the file, such as source code that is expected to produce a diagnostic including a line number. Only one RUN line is required for that to make sense. For example:

<text>foo</txt>
# RUN: tool-under-test %s 2>&1 | diag-check-tool --line-number=%(line-1) 'error: unmatched <text>'

But we normally use FileCheck or clang's -verify to verify output, and they have their own line number support, so the above use case probably doesn't come up in practice.

Does the example from your comment above make sense with a single RUN line? I think so. In that example, the line number substitutions are used to refer to other parts of the test file, but not specifically to the other RUN lines. One RUN line is enough for them to be useful. Did I misunderstand the intent?

If you feel the sentence in question describes a real use case, I might just add the phrase "For example," before it so at least it doesn't imply that's the only use case.

(By the way, I do show an example of using non-relative line number substitutions in D133172.)

696–698
; DEFINE: %{triple} =
; DEFINE: %{cflags} =
; DEFINE: %{fc-prefix} =

 ; DEFINE: %{check} =                                                  \
 ; DEFINE:   %clang_cc1 -verify -fopenmp -fopenmp-version=51 %{cflags} \
 ; DEFINE:              -triple %{triple} -emit-llvm -o - %s |         \
 ; DEFINE:     FileCheck -check-prefix=%{fc-prefix} %s

What should happen here?

I think I get the point now: the initial values for %{triple} and %{fc-prefix} are unusable, so they must be redefined before %{check} is used in a RUN line, or the RUN line will malfunction I see how that quirk is confusing for an introductory example, so I've rewritten the example so that empty values work fine. I've also made it so every use of %{check} redefines every parameter substitution rather than just some, and I think that makes the example easier to read too. Let me know if it's still confusing.

If, for example, %{triple} is left undefined, then the REDEFINE: directive for %{triple} below will report an error.

So this is allowed:

; DEFINE: %{triple} =
; REDEFINE: %{triple} =

and this is not allowed (assuming that there's no DEFINE: %{triple} in any of the local config files):

;  REDEFINE: %{triple} =

Makes sense!

Right.

llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
22–23

I extended the comments here to explain. TestingGuide.rst documents the restriction generally.

28–29

I would say so. If you use %{global:multiple-exact} somewhere, which value do you expect it to have?

33–34

Why not?

I extended the comments here to explain. TestingGuide.rst documents the restriction generally.

And what's the difference between %{global:multiple-once-exact} and <%{global:multiple-once-exact}>?

I'm not sure I understand the question. One pattern simply contains the other. Can you rephrase?

jdenny added inline comments.Sep 5 2022, 4:29 PM
llvm/utils/lit/tests/Inputs/shtest-define/examples/param-subst.txt
2

I mean to keep this example in sync with TestingGuide.rst, at least until the patch lands. Hopefully I'll get to that tomorrow.

jdenny updated this revision to Diff 458273.Sep 6 2022, 2:05 PM

Updated the example in the test suite to match the one in the documentation.

Thanks for addressing my comments! Here's a few more from me.

I've only skimmed through your changes in "TestRunner.py". I will try to take another look later. Unless @jhenderson approves in the meantime :)

llvm/docs/TestingGuide.rst
624–627

If you feel the sentence in question describes a real use case, I might just add the phrase "For example," before it so at least it doesn't imply that's the only use case.

I don't feel too strongly about this. To me it sounds like we are both unsure about the rationale for having %(line) there in the first place. For this reason, I would suggest either leaving that sentence intact or removing support for %(line) altogether (after having this proposed and accepted on Discourse). However, ....

(By the way, I do show an example of using non-relative line number substitutions in D133172.)

... there is going to be an in-tree example soon :)

I'm happy with whatever you decide here. Personally I'd consider removing %(line), but that's beyond the scope of this patch!

696–698

I've rewritten the example so that empty values work fine.

Nice, this is much clearer now. Thank you!

llvm/utils/lit/lit/TestRunner.py
1221

Docstring? Same for other classes/method.

1247

ScriptDirective.add_continuation takes only 3 arguments.

1315

[nit] You could use f-strings instead.

llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
28–29

My point is that this particular example could suggest that this usage is fine. I would add a comment explaining that this is bad practice and is only used here to facilitate testing.

jdenny updated this revision to Diff 458525.Sep 7 2022, 12:16 PM
jdenny marked 9 inline comments as done.

Applied reviewer suggestions:

  • Restored some existing %(line) documentation.
  • Added python docstrings.
  • Fixed parameter list on add_continuation, which was inconsistent between ScriptDirective and derived classes.
  • Switched to python f-strings.
  • Clarified that some substitutions in test code are discouraged.
llvm/docs/TestingGuide.rst
624–627

If you feel the sentence in question describes a real use case, I might just add the phrase "For example," before it so at least it doesn't imply that's the only use case.

I don't feel too strongly about this. To me it sounds like we are both unsure about the rationale for having %(line) there in the first place.

Agreed.

For this reason, I would suggest either leaving that sentence intact

I've restored the sentence with "For example".

or removing support for %(line) altogether (after having this proposed and accepted on Discourse). However, ....

(By the way, I do show an example of using non-relative line number substitutions in D133172.)

... there is going to be an in-tree example soon :)

I'm happy with whatever you decide here. Personally I'd consider removing %(line), but that's beyond the scope of this patch!

I don't want to remove %(line). I think the use case presented in D133172 has value. I currently use %(line) in that way in my downstream work. I imagine that usage will find its way upstream after these patches land.

Thanks for the discussion. Maybe someone who knows the history will see this and answer our questions.

696–698

Great. Thanks for the feedback. I like the result better too.

llvm/utils/lit/lit/TestRunner.py
1221

Done.

1247

Thanks for catching that. Fixed.

1315

I wasn't aware f-strings. Nice. I've changed all code modified/added by this patch to use those instead of format. Thanks.

llvm/utils/lit/tests/Inputs/shtest-define/lit.cfg
28–29

Good point. I've added a comment before all three of these sets of confusing definitions.

Thanks for adding comments in the Python script, that helps a lot! I've only skimmed through for now and have one small comment about the overall approach. I will try to take a look again later today.

llvm/utils/lit/lit/TestRunner.py
1223–1224

Through this class you are effectively re-introducing the notion of "script directives" in LIT, but this class is only meant for RUN, DEFINE and REDEFINE, right? What about XFAIL, REQUIRES or UNSUPPORTED?

If this is only meant for a sub-set of LIT directives, I suggest:

  • renaming the class as e.g. OrderedClassDirective (so that it is clear that this does not represent any LIT directive)
  • adding an assert in the constructor to make sure that self.keyword is one of RUN, DEFINE or REDEFINE.

An assert will document the code in a way that cannot be ignored by developers who decide to extend this class later ;-)

jdenny added inline comments.Sep 8 2022, 11:08 AM
llvm/utils/lit/lit/TestRunner.py
1223–1224

Through this class you are effectively re-introducing the notion of "script directives" in LIT, but this class is only meant for RUN, DEFINE and REDEFINE, right? What about XFAIL, REQUIRES or UNSUPPORTED?

Thanks for asking this. I agree the terminology here is confusing.

Even without this patch, "script" is overloaded to mean *at least* three different things in TestRunner.py:

  1. The original integrated test script that has all the directives. For example, parseIntegratedTestScriptCommands.
  2. The script containing the RUN directives extracted from 1. For example, the script variable in _parseKeywords, the require_script parameter to _parseKeywords and parseIntegratedTestScript, the script parameter to applySubstitutions.
  3. The shell script produced after expanding substitutions in 2. For example, executeScriptInternal, executeScript.

With this patch, 2 also includes DEFINE/REDEFINE directives. I chose ScriptDirective to refer to directives in 2 because that's the usage of "script" I saw most often in the code I was modifying.

Should we try to eliminate this overload generally in TestRunner.py? Here are some possible names:

  1. IntegratedTestScript
  2. UnexpandedScript
  3. ShellScript

Maybe there are better names. What do you think?

If this is only meant for a sub-set of LIT directives, I suggest:

  • renaming the class as e.g. OrderedClassDirective (so that it is clear that this does not represent any LIT directive)

I'd like to pick something that will fit the name we give script 2 above. If script 2 is OrderedScript, then OrderedDirective works. However, script 3 is also ordered. Script 1 is ordered even though the order doesn't matter for some parts. I think we need something more distinct. Whatever names we choose, I'll be sure to improve my python documentation accordingly (e.g., don't emphasize order so much).

  • adding an assert in the constructor to make sure that self.keyword is one of RUN, DEFINE or REDEFINE.

An assert will document the code in a way that cannot be ignored by developers who decide to extend this class later ;-)

Do you want asserts for the CommandDirective and SubstDirective constructors as well? Of course, SubstDirective.adjust_substitutions already asserts about keywords because its behavior is already known to vary based on the keyword.

awarzynski added inline comments.Sep 9 2022, 5:16 AM
llvm/utils/lit/lit/TestRunner.py
1223–1224

GENERAL DISCUSSION

Thanks for asking this. I agree the terminology here is confusing.

This is way deeper than I expected, thanks for the thorough explanation!

Should we try to eliminate this overload generally in TestRunner.py?

Yes, but probably in a different patch. This one is already quite long. Below are some of my thoughts based on your explanation.

Here are some possible names:

IntegratedTestScript
UnexpandedScript
ShellScript

Maybe there are better names. What do you think?

All in all, I would consider doing some refactoring it in two steps (based on my understanding, which might be incorrect):

  1. Clearly define what LIT directives are and what directives are supported (i.e. RUN, XFAIL, REQUIRES ...) . Ideally, every class that manipulates them should clearly define what directives it will process. In the case of ScriptDirective, it's RUN, DEFINE and REDEFINE, right?
  2. Hopefully, Step 1. will be a good source of inspiration for renaming things. That's when I would try to fix what "script" means in various classes.

I'm suggesting Step 1. as from your description it sounds like "script" in TestRunner.py refers to the command that follows a directive. At least in it's most general form. That wasn't clear to me at all.

As for Step 2., how about the following classification:

  • RawScript (base class),
  • ExpandableScript (specialization of RawScript, with substitutions either to be expanded or already expanded).

RENAMING

The refactoring discussed above should happen in a separate patch IMO. In this patch I would just focus on renaming ScriptDirective.

I'd like to pick something that will fit the name we give script 2 above. If script 2 is OrderedScript, then OrderedDirective works. However, script 3 is also ordered.

So what's unique about script 2 then? From your description, it sounds like OrderedScriptWithSubstitutions. Or something similar.

Do you want asserts for the CommandDirective and SubstDirective constructors as well?

Yes, it sounds like these should be only used with specific directives and IMO that should be very clear from the implementation.

Of course, SubstDirective.adjust_substitutions already asserts about keywords

True, but that's quite far from object creation. To me it sounds like SubstDirective should only work for specific directives. You can be strict about it and enforce it during object creation (which sends very clear message to class users and/or other developers) or defer it until that assumption becomes vital. Both approaches will work. I'm merely suggesting to express the intent more explicitly.

1388

I would move this assert to the constructor. In here you can add something like this instead:

if self.keyword != 'REDEFINE:':
  raise ValueError(f"Unsupported directive {self.keyword})")
jdenny updated this revision to Diff 459111.Sep 9 2022, 10:24 AM
jdenny marked 3 inline comments as done.
  • ScriptDirective -> ExpandableScriptDirective, and updated python docs accordingly.
  • Fixed and tested support for custom keywords in the case of DEFINE: and REDEFINE:
llvm/utils/lit/lit/TestRunner.py
1223–1224

GENERAL DISCUSSION

Should we try to eliminate this overload generally in TestRunner.py?

Yes, but probably in a different patch.

Agreed.

All in all, I would consider doing some refactoring it in two steps (based on my understanding, which might be incorrect):

I agree that steps like these would improve TestRunner.py. Implementing them myself is more than I want to sign up for right now, but I would be happy to review patches. I would like to finish the DEFINE/REDEFINE series first though.

In the case of ScriptDirective, it's RUN, DEFINE and REDEFINE, right?

Right.

it sounds like "script" in TestRunner.py refers to the command that follows a directive. At least in it's most general form. That wasn't clear to me at all.

I might be missing your point here, but that description doesn't match my view. The first two script forms I identified (the original test file, and the one with only RUN/DEFINE/REDEFINE directives) both have to distinguish directives by the directive keyword. Only the third script form has just shell commands and thus no directive keywords.

As for Step 2., how about the following classification:

  • RawScript (base class),

Might be fine. We could discuss more in a separate patch to implement the refactoring.

  • ExpandableScript (specialization of RawScript, with substitutions either to be expanded or already expanded).

ExpandableScript works for me.

RENAMING

The refactoring discussed above should happen in a separate patch IMO. In this patch I would just focus on renaming ScriptDirective.

Agreed.

I'd like to pick something that will fit the name we give script 2 above. If script 2 is OrderedScript, then OrderedDirective works. However, script 3 is also ordered.

So what's unique about script 2 then? From your description, it sounds like OrderedScriptWithSubstitutions. Or something similar.

I need to think more about whether the "ordered" property is really the right emphasis here. In my mind, any "script" is an ordered sequence, and google just showed me definitions of "script" that agree, so OrderedScript sounds redundant. Moreover, maybe directives not included in this script form will one day (or already do?) care about order in some way I haven't imagined.

You suggested ExpandableScript above. A consistent name for the directive would then be ExpandableScriptDirective. I think that succinctly captures which stage of the transformation this script and its directives represents. I went ahead and implemented that and updated the python documentation to reflect that name.

Maybe there are better names. I think we both have suggested there's a larger discussion and mostly orthogonal patch to work on, and the naming could be revisited then. Is that ok for now?

Do you want asserts for the CommandDirective and SubstDirective constructors as well?

Yes, it sounds like these should be only used with specific directives and IMO that should be very clear from the implementation.

I tried, but a test failure reminded me that some test suites define their own keywords in place of RUN:. See llvm/utils/lit/tests/unit/TestRunner.py. What if someone wants to do that for DEFINE: or REDEFINE:? I'm not sure whether anyone ever will, but it would be more consistent to permit it.

In other words, we need to go the other way: less strictness about keywords.

I implemented that by extending SubstDirective with a new field indicating whether it defines or redefines. Now, adjust_substitutions doesn't need to check the directive's keyword at all.

I also minimally extended llvm/utils/lit/tests/unit/TestRunner.py to check that custom keywords for DEFINE: and REDEFINE: work as expected. I could probably test much more here, but I haven't found much evidence that keyword customization is used much, so I'm not sure exhaustive testing is worthwhile.

1388

I've removed the assert entirely, as discussed in the other comment.

jdenny updated this revision to Diff 459124.Sep 9 2022, 10:54 AM

Made a small improvement to the testing I just added.

jdenny updated this revision to Diff 459227.Sep 9 2022, 4:24 PM
jdenny marked an inline comment as done.

Addressed a forgotten reviewer comment: coded the handling of whitespace after \ more consistently. Tested it more carefully.

llvm/utils/lit/lit/TestRunner.py
1270–1272

This comment was originally on the python documentation in SubstDirective.needs_continuation.

I've now addressed it by moving the rstrip call from _handleCommand to CommandDirective so that handling of whitespace stripping is consistently enscapsulated in the SubstDirective and CommandDirective classes. I've moved the python documentation to the parent class's needs_continuation.

Implementing them myself is more than I want to sign up for right now

Totally understandable!

it sounds like "script" in TestRunner.py refers to the command that follows a directive. At least in it's most general form. That wasn't clear to me at all.

I might be missing your point here, but that description doesn't match my view.

Sounds like our mental models are slightly different. This is the breakdown that derived from our discussion:

; RUN: <script>

<test input>

; <expected output>

To me, RUN: <script> is a "LIT directive" followed by a "shell script" (that may or may not be expandable). I feel that refactoring TestRunner.py to be strict about this distinction would make a lot of things easier. But perhaps I'm oversimplifying this? Also, another patch, another time :)

What if someone wants to do that for DEFINE: or REDEFINE:? I'm not sure whether anyone ever will, but it would be more consistent to permit it.

That sounds like a bit "exotic" use case that complicates a fair few things here, but I agree that consistency is key. Thanks for trying!

Having read and understood the code (thanks for bearing with me and all the extra context), I've skimmed through the documentation again. I've made a couple of minor suggestions, but nothing serious. It reads really well for me. Basically, I'm suggesting keeping the discussion on recursiveExpansionLimit and how it affects DEFINE and REDEFINE separate (perhaps a paragraph on "advanced usage" or "fine details").

Tl;Dr
I won't have any more comments after this. This is a very well written, thoroughly tested and documented patch. Many thanks for all the hard work! It's been a rather long discussion and I really appreciate you seeing this through. This is going to make our testing infrastructure much more powerful and I suspect that LLVM folks will be adopting this in their testing soon :)

llvm/docs/TestingGuide.rst
680

Hm, in practice it's "lit.cfg.py" and "lit.local.cfg.py", but all LIT docs seem to drop the extension 🤔 .

898

[nit] I'd be tempted to add "(e.g. lit.cfg or lit.local.cfg)" To avoid any potential ambiguity.

902–906

"confusing" and "insufficient" are a bit ambiguous and vague. Sometimes less is more and I would skip this point. Instead, I would mention above (after "There's a simple way to remember the required definition order in a test file: define a substitution before you refer to it.") that "This behavior can be overridden by specifying recursiveExpansionLimit). An example would also be super helpful! You could use the contents of "recursiveExpansionLimit.txt".

907–909
923–924

[nit] I would remove this part to keep this short and to the point

Sounds like our mental models are slightly different. This is the breakdown that derived from our discussion:

It sounds like this is just a difference in the way we're using the terminology. Clear terminology is indeed important, but...

Also, another patch, another time :)

.... yes, I think it's reasonable to defer.

Basically, I'm suggesting keeping the discussion on recursiveExpansionLimit and how it affects DEFINE and REDEFINE separate (perhaps a paragraph on "advanced usage" or "fine details").

Thanks for those comments. I'll pursue them soon.

I won't have any more comments after this.

But feel free if something comes up. Once you've hit accept, I'll likely stall for about a week anyway to give other reviewers a chance to catch up.

This is a very well written, thoroughly tested and documented patch. Many thanks for all the hard work! It's been a rather long discussion and I really appreciate you seeing this through. This is going to make our testing infrastructure much more powerful and I suspect that LLVM folks will be adopting this in their testing soon :)

Thanks for saying so, and thank you for your review, which has definitely improved the patch. I appreciate thorough reviews, especially on patches affecting LLVM's testing infrastructure as they can easily impact all LLVM developers.

llvm/docs/TestingGuide.rst
680

Hm, in practice it's "lit.cfg.py" and "lit.local.cfg.py", but all LIT docs seem to drop the extension 🤔 .

All four are used. Lit even permits some customization (see --config-prefix). So these are just common examples.

680

You suggested dropping "test/" here. I assume "test/" was meant to hint that lit.cfg is typically at the root directory of a test suite. Maybe we should add a corresponding hint for lit.local.cfg, as in my above suggestion. Does that help?

923–924

By "this part", do you mean "even if `recursiveExpansionLimit` is not used" or the whole sentence starting with "Thus"?

awarzynski added inline comments.Sep 11 2022, 12:34 PM
llvm/docs/TestingGuide.rst
680

In general, I feel that adding "test/" might be a bit confusing. What about test sub-directories like "<project-root-dir>/test/Lowering/" in which case "Lowering/lit.local.cfg" could be more accurate than "test/lit.local.cfg". That's why I would drop "test/" in "test/lit.local.cfg". If you do want to keep it, then ...

Maybe we should add a corresponding hint for lit.local.cfg, as in my above suggestion.

+ 1

923–924

Yes, "even if `recursiveExpansionLimit`". I was hoping that Phabricator would highlight it.

jdenny updated this revision to Diff 459397.Sep 11 2022, 9:05 PM
jdenny marked 6 inline comments as done.

Addressed reviewer concerns:

  • Relocated and expanded recursiveExpansionLimit documentation with an example in terms of DEFINE/REDEFINE.
  • Trimmed other discussions of recursiveExpansionLimit.
  • Drop "test/" from "(e.g., test/lit.cfg or lit.local.cfg)".
llvm/docs/TestingGuide.rst
898

There are now several mentions of lit configuration files between here and the original description, which has "(e.g., lit.cfg or lit.local.cfg)". To me, it seems arbitrary to give examples at this particular mention. See how it reads to you now. If you still feel it would be helpful, I'll add it.

902–906

Instead, I would mention above (after "There's a simple way to remember the required definition order in a test file: define a substitution before you refer to it.") that "This behavior can be overridden by specifying recursiveExpansionLimit).

Because this bulleted list is meant to be a general description of substitution behavior, I feel it's better to keep this point here than to complicate the introductory example with it. However, I did abbreviate the description and append it to the bullet that describes the default single pass over the substitution list, where it seems like a natural fit.

An example would also be super helpful! You could use the contents of "recursiveExpansionLimit.txt".

Done. I'm using that to replace the original recursiveExpansionLimit documentation.

awarzynski accepted this revision.Sep 12 2022, 2:44 AM

LGTM!

Great job, thank you!

This revision is now accepted and ready to land.Sep 12 2022, 2:44 AM

I've run out of time to look at this further for now, but hopefully should be able to go through the rest of the testing later in the week.

llvm/docs/TestingGuide.rst
693–694

The part from "as well as" doesn't read well for me. I think it's probably clearer to rephrase as "... to define %{cflags} and %{fcflags} substitutions with empty initial values, which serve as the parameters of another newly-defined %{check} substitution:". WDYT?

776–777

On first read, I thought the reference to patterns containing %{name} meant something like %{foo} couldn't be referenced with a definition/redefinition if %{afooa} existed. Is it actually meant to be referring to something like %{a%{foo}a}? If so, I feel like an example here could be useful. If not (i.e. %{foo} can't be used if e.g. %{afooa} is defined), I find this to be a problem, as I think it is not unreasonable to have e.g. `%{check} and %{check2} in the same substitution set.

796–797

Would it be clearer here to refer to the config files, since I believe that's the only way to specify an "illegal" pattern? Something like "However, substitution patterns defined in configuration files are not restricted to this form...".

822–823

I'm not sure the bit about "against the advice" makes a huge amount of sense, since the previous section explained the situation and what happens, I thought? I'd just omit the bit between the commas, personally.

832–833

I'd suggest replacing the first "Because" with something else, e.g. "Since", to avoid the double because in this sentence.

857
llvm/utils/lit/lit/TestRunner.py
1277

Are there any cases where this isn't a RUN directive? Similarly, are there any cases where a RUN directive isn't this? If not to both, I'd consider renaming to RunDirective so that it marries up with the actual name used in the test files.

1302

I'd suggest this slight addition.

1329

I think this is strictly more correct, which could be important in some edge cases, I guess.

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-run-redefine.txt
4–5

There's nothing particularly wrong with this, but why the two run lines in this case (unlike the define case)?

jdenny updated this revision to Diff 459478.Sep 12 2022, 9:00 AM
jdenny marked 10 inline comments as done.

Addressed reviewer comments:

  • Applied various wording improvements in the documentation.
  • Adjusted a test for consistency.
jdenny marked an inline comment as done and an inline comment as not done.Sep 12 2022, 9:05 AM
llvm/docs/TestingGuide.rst
776–777

On first read, I thought the reference to patterns containing %{name} meant something like %{foo} couldn't be referenced with a definition/redefinition if %{afooa} existed. Is it actually meant to be referring to something like %{a%{foo}a}?

The full %{name} is the substitution's pattern, so the latter is what I meant.

If so, I feel like an example here could be useful.

Done.

796–797

The other possibility I was thinking of is a substitution always defined by lit itself. For example, it's best not to define a substitution whose pattern starts with %s.

That's relevant to the first part of the bullet (why braces are required here). However, as far as I know, it's not currently relevant to the second part (I know of no existing lit-defined substitution whose pattern would overlap with the braced form). Lit evolves constantly, so that could change. For example, what if someone thinks %{name([0-9]*)} is a reasonable pattern (where parens are capturing)?

I've expanded "not defined by these directives" to mention both sources of substitutions. Let me know if that's not clear enough.

822–823

I'm not sure the bit about "against the advice" makes a huge amount of sense, since the previous section explained the situation and what happens, I thought? I'd just omit the bit between the commas, personally.

822–823

Done.

I was trying to make sure the reader knows this example is not the recommended way to write directives. Instead, I've added a comment next to the directives, which should help anyone skimming the documentation for examples and not reading the text.

llvm/utils/lit/lit/TestRunner.py
1277

Are there any cases where this isn't a RUN directive?

Yes. See MY_RUN in llvm/utils/lit/tests/Inputs/testrunner-custom-parsers/test.txt and llvm/utils/lit/tests/unit/TestRunner.py.

I chose "CommandDirective" for consistency with _handleCommand and ParserKind.COMMAND.

Similarly, are there any cases where a RUN directive isn't this?

I don't think so.

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-run-redefine.txt
4–5

I've forgotten. I think it was a lazy way to make sure this check is performed when continuation lines are themselves continued.

For now, I've made them consistent. I'll think about the right way to check that case separately.

jdenny marked an inline comment as done.Sep 12 2022, 9:16 AM
jdenny added inline comments.
llvm/docs/TestingGuide.rst
768

Ah, I forgot brackets are special in patterns. I'll escape them.

jdenny marked an inline comment as not done.Sep 12 2022, 9:17 AM

Mostly just comment nitpicks from me. I don't really have time to think about test coverage, so that's about it from me, aside from any follow-up comments.

llvm/utils/lit/lit/TestRunner.py
1277

To clarify, I meant, any cases other than if someone specifies a different name for the "RUN" directive explicitly. Given that that is a specific overriding of the default behaviour, I think my suggestion still stands, but up to you (I don't feel strongly about it).

llvm/utils/lit/tests/Inputs/shtest-define/expansion-order.txt
2

Not sure I follow the bit after the colon? Are you saying that %{global:greeting} is the parameter for %{global:echo}? If so, I'd just move the name to between "the" and "parameter" earlier in the line.

llvm/utils/lit/tests/Inputs/shtest-define/ws-and-continuations.txt
2

Just wondering if it's worth a test case with two adjacent substitutions without separating whitespace? I.e. something like %{foo}%{bar}? I'd expect both to expand, with no separating whitespace added. Not sure where the test case would belong.

9

Nit: whitespace is usually all one word, right?

19

In this comment you're being inconsistent :)

69

What's with the "..."?

llvm/utils/lit/tests/shtest-define.py
12–13

Nit: a slightly more natural flow would be to have the record-test substitution after the run-test one, since that is the order they're used in.

jdenny updated this revision to Diff 460800.Sep 16 2022, 9:44 AM
jdenny marked 7 inline comments as done.

Addressed remaining comments:

  • Fixed an improperly escaped pattern in the docs.
  • Extended tests a little, polished their comments, etc.

I also extended ws-and-continuations.txt with a couple checks that somehow ended up in D133172 but make sense here.

llvm/utils/lit/lit/TestRunner.py
1277

I'm not sure if it matters to your point, but MY_RUN doesn't override RUN. It's an additional keyword parsed as a COMMAND.

In any case, I feel more comfortable using the term Command because that's the precedent already set by _handleCommand and ParserKind.COMMAND. If we want to rethink that terminology, I feel like that should be a separate patch.

llvm/utils/lit/tests/Inputs/shtest-define/errors/continuation/unterminated-run-redefine.txt
4–5

I decided it's not worthwhile to roughly double the number of unterminated tests in order to check every combination of conditions. Instead, I added only two to offer some assurance that unterminated continuation lines are checked for the new directives added by this patch:

  • unterminated-define-continuation.txt
  • unterminated-redefine-continuation.txt
llvm/utils/lit/tests/Inputs/shtest-define/expansion-order.txt
2

Done.

llvm/utils/lit/tests/Inputs/shtest-define/ws-and-continuations.txt
2

I've added a test to this file.

9

I've seen it written both ways, but I should at least be consistent within the same code. Thanks.

19

Oops. :-)

69

I've reworded.

llvm/utils/lit/tests/shtest-define.py
12–13

Done.

jhenderson accepted this revision.Sep 20 2022, 12:43 AM

LGTM, thanks!

@jhenderson: Thanks for the review!

All: There has been no discussion so far on the directive name collisions, as described in the review summary:

One issue is that the strings DEFINE: and REDEFINE: already appear in 5 tests. This patch adjusts those tests not to use those strings. Alternatively, maybe lit keywords from now on should start with LIT- to avoid collisions. For this patch, we might use LIT-DEF: and LIT-REDEF:.

If I don't hear any opinions on this within a couple days, I'll go ahead and land the patch with the current names (DEFINE: and REDEFINE:). Thanks.

@jhenderson: Thanks for the review!

All: There has been no discussion so far on the directive name collisions, as described in the review summary:

One issue is that the strings DEFINE: and REDEFINE: already appear in 5 tests. This patch adjusts those tests not to use those strings. Alternatively, maybe lit keywords from now on should start with LIT- to avoid collisions. For this patch, we might use LIT-DEF: and LIT-REDEF:.

If I don't hear any opinions on this within a couple days, I'll go ahead and land the patch with the current names (DEFINE: and REDEFINE:). Thanks.

My personal opinion is that it's fine as long as there is pretty much zero risk of a downstream user running into spurious passes due to the change from a FileCheck directive to a lit command. In 99% of cases, I think it'll likely fail due to an invalid definition, so I think we're good, personally. (Plus strictly, LIT-DEF could already be in use downstream anyway).

My personal opinion is that it's fine as long as there is pretty much zero risk of a downstream user running into spurious passes due to the change from a FileCheck directive to a lit command. In 99% of cases, I think it'll likely fail due to an invalid definition, so I think we're good, personally. (Plus strictly, LIT-DEF could already be in use downstream anyway).

I agree, but I wasn't confident. Thanks for giving a second opinion.

OK, I'll try to land tomorrow provided no one else comments on the patch.

jdenny edited the summary of this revision. (Show Details)Sep 21 2022, 8:33 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 21 2022, 8:33 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

It was a small battle to get the tests past the windows bots, but they eventually succeeded:

https://lab.llvm.org/buildbot/#/builders/216/builds/10125

I pushed the following additional patches while trying to fix the tests for windows:

b9735db6464cf0bb4f60077b9f5495c0fa372ecf adds a fixme and workaround for the issue that echo is doubling backslashes under windows. I don't think that's specific to the DEFINE/REDEFINE implementation, and my hunch is that it's not a critical issue. It would take me a while to produce a windows build, so I won't investigate that further anytime soon. If someone else wants to, please feel free.

If anything looks objectionable, I can amend further or revert the whole series.

I've skimmed through - these fixes make sense to me. Thank you for the quick summary!

Thanks for reviewing!