This is an archive of the discontinued LLVM Phabricator instance.

[lit] Add %{for-each-file} substitution
Needs ReviewPublic

Authored by Endill on May 18 2023, 3:40 AM.

Details

Summary

This substitution effectively duplicated enclosing RUN command for each file in the given path. Accepts wildcards.

In order to keep implementation simple, only a single such substitution per command is supported. It is possible to support multiple such substitution via cartesian product, if deemed necessary.

Use case for this is upcoming refactoring of C++ DR tests structure to avoid tests interfering with each other. Currently, tests are grouped by 100 per file, and have a RUN command per C++ Standard revision (for a total of 7 with the recent addition of c++2c mode). The plan is to rely heavily on split-file to invoke the compiler individually for each test, but avoid duplicating RUN commands for each of them. More details in Discourse thread.

Diff Detail

Event Timeline

Endill created this revision.May 18 2023, 3:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2023, 3:40 AM
Herald added a subscriber: delcypher. · View Herald Transcript
Endill requested review of this revision.May 18 2023, 3:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2023, 3:40 AM

Adding a few more reviewers because it's always a challenge to get someone to sign off on lit changes. :-)

I don't have enough Python knowledge to review this well, but I support the idea. This makes it easier to do a split-file without repeating a bunch of RUN lines, which I think is a nice win in terms of expressivity for our DR tests.

rnk added inline comments.May 18 2023, 12:38 PM
llvm/utils/lit/lit/TestRunner.py
1654

Surely there must be a simpler way to do this. One could turn the match iterator into a list (list(match_iter)) and raise an error if the length is more than 1.

1663

typo absolute

1692

Depending on whether we've done a for each file substitution, this != will now do very different things. I recommend making the return type of processLine uniform one way or another, probably to always return a list of strings.

1717–1723

I recommend returning a list for simplicity, and always extending.

Endill updated this revision to Diff 523911.May 19 2023, 1:20 PM
Endill edited the summary of this revision. (Show Details)

Trying to work with python iterators the correct (lazy) way is just too cumbersome. Replace it with something more conventional.
Also add a link to related Discourse thread in the description.

Endill updated this revision to Diff 523998.May 20 2023, 12:22 AM
  • Fixed typo
  • Get substituteForEachFile() out of processLine() to avoid dealing with recursion limits

Thank you for the review!

llvm/utils/lit/lit/TestRunner.py
1654

Done!

1663

Fixed

1692

I haven't found an easy way to make this algorithm work with processLine() that could generate additional directives. Given that %{for-each-file} is not expected to contribute anything that could be substituted (I don't see substitutions coming from file paths), I decided to opt out of processLine() entirely, and substitute %{for-each-file} as the last step.

1717–1723

substituteForEachFile() always returns a list now.

I think the code looks good, but I would like to see more consensus that we want to add this feature to the lit macro substitution language. I have to run right now, but who else has a stake in the lit language? +@dblaikie @MaskRay

Can you add some tests? Check out git log -- llvm/utils/lit/tests/ for recent changes.
D132513 is a recent major feature (DEFINE and REDEFINE). Reading its discussions may give some clue about how to add tests.

Can you add some tests? Check out git log -- llvm/utils/lit/tests/ for recent changes.
D132513 is a recent major feature (DEFINE and REDEFINE). Reading its discussions may give some clue about how to add tests.

+1 to this. I'm struggling to envision how this will look in the lit file, which is in turn preventing me from properly providing my own feedback on the feature.`

llvm/utils/lit/lit/TestRunner.py
1532

If you haven't already, please make sure to run black to format your new python code.

1651–1652

Nit: here and with the other error messages, it probably makes sense to follow the standard LLVM coding standard for warnings and errors (no leading capital or trailing full stop).

Endill updated this revision to Diff 528117.Jun 3 2023, 10:02 AM
  • Add tests
  • Fix broken check for relative paths
  • Emit an error when %{for-each-file} doesn't match anything
  • Run formatter
  • Fix style of error messages
Endill marked 2 inline comments as done.Jun 3 2023, 10:11 AM

+1 to this. I'm struggling to envision how this will look in the lit file, which is in turn preventing me from properly providing my own feedback on the feature.`

In C++ defect report tests, we're going to use split-file to split single e.g. dr1xx.cpp into 100 smaller files, and get each of them through every of the (currently) 7 RUN directives specified in dr1xx.cpp (one per C++ revision). Before and after comparison here: https://gist.github.com/Endilll/1d8035d42bdf0dc73eaaf8a8c343bf8c/revisions?diff=unified. I hope that clarifies the intended usage.

So I definitely support the overall goal of this patch, but I'm not convinced by the syntax, if I'm honest. It seems somewhat unintuitive that the %{for-each-file ...} syntax impacts the whole line it appears on, yet its position is important.

I don't have an exact syntax in mind, but it feels to me like a new lit directive might be more appropriate to indicate that the line should be run multiple times in some manner. It would probably need to involve a new substitution too, admittedly, which indicates where to add the file name in the command. Something along the lines of RUN-ALL-FILES-%t: cat %f. Sorry, I don't really have the time right now to give a more thought out example that dictates how to specify the file location.

llvm/docs/CommandGuide/lit.rst
559

It's completely unclear what "Accepts wildcards" means in this context. I wonder if an example might be appropriate.

So I definitely support the overall goal of this patch, but I'm not convinced by the syntax, if I'm honest. It seems somewhat unintuitive that the %{for-each-file ...} syntax impacts the whole line it appears on, yet its position is important.

I don't have an exact syntax in mind, but it feels to me like a new lit directive might be more appropriate to indicate that the line should be run multiple times in some manner. It would probably need to involve a new substitution too, admittedly, which indicates where to add the file name in the command. Something along the lines of RUN-ALL-FILES-%t: cat %f. Sorry, I don't really have the time right now to give a more thought out example that dictates how to specify the file location.

FWIW, I'm not certain I agree with something long the lines of a new RUN syntax. The reason I favor the current %{for-each-file} approach is that it works well for more complex scenarios and it's extensible. The RUN line approach looks really reasonable when the RUN line is simple, but DR testing will often have fairly complex RUN lines that I'm not certain would be improved by a RUN-ALL-FILES (for example, complex -verify prefixes). But more interestingly, the %{for-each-file} approach means we can do a cartesian product of options on a RUN line, which is both terrifying and powerful. e.g., // RUN: %clang_cc1 %{for-each-file %S} -include %{for-each-file Inputs/} to generate a RUN line that's the cross product of all the files and all the includes. We don't support that today in the current patch, but it's possible to support in the future if needed. (This may or may not even generalize to something more powerful using DEFINE and REDEFINE syntax to define "sets" of arbitrary command line options (not just files) that get for-eached on a single RUN line. But I'm not suggesting we go that far with the feature right now, by any means.)

That said, I'm sympathetic that it's a bit weird that a random replacement somewhere in the RUN line means the whole RUN line is duplicated. But I think this functionality will be used in a limited enough way that it won't be overly confusing for long (people who work on tests likely to use this function are most often not writing one-off tests but instead working on a bunch of tests in the same area).

WDYT?

Your point about the potential for cartesian products is fair, and mentioning things like DEFINE and REDEFINE made me wonder whether we could simply have a special lit suffix (or prefix), a bit like FileCheck suffixes, that simply tells lit to expand foreach directives. In that way, a user can see that the line will be potentially invoked multiple times without having to spot the substitution in the line Something like RUN-FOREACH (which could later be extrapolated as DEFINE-FOREACH etc).

FWIW, I can see a lot of value for "for each" style syntax in various tests I've written, although it's not always about the input file, and may even be about a sequence of invocations. However, if we start following that chain of thought, we end up making lit directives a Turing-complete language (after all, if you want a generic "FOR", you'll probably quickly want "IF" etc to handle special cases), and I'm not convinced we want to fall down that hole.

Your point about the potential for cartesian products is fair, and mentioning things like DEFINE and REDEFINE made me wonder whether we could simply have a special lit suffix (or prefix), a bit like FileCheck suffixes, that simply tells lit to expand foreach directives. In that way, a user can see that the line will be potentially invoked multiple times without having to spot the substitution in the line Something like RUN-FOREACH (which could later be extrapolated as DEFINE-FOREACH etc).

FWIW, I can see a lot of value for "for each" style syntax in various tests I've written, although it's not always about the input file, and may even be about a sequence of invocations. However, if we start following that chain of thought, we end up making lit directives a Turing-complete language (after all, if you want a generic "FOR", you'll probably quickly want "IF" etc to handle special cases), and I'm not convinced we want to fall down that hole.

We can do it the following way:

// RUN: rm -rf %t && mkdir %t
// RUN: split-file --allow-comments --add-file-extension=cpp %s %t
// DEFINE-FOR-EACH-FILE: %{file} = %t
// RUN: %clang_cc1 -std=c++11 %{file} -verify -fexceptions -fcxx-exceptions -pedantic-errors
// RUN: %clang_cc1 -std=c++11 %{file} -verify -fexceptions -fcxx-exceptions -pedantic-errors
// RUN: %clang_cc1 -std=c++14 %{file} -verify -fexceptions -fcxx-exceptions -pedantic-errors
// RUN: %clang_cc1 -std=c++17 %{file} -verify -fexceptions -fcxx-exceptions -pedantic-errors
// RUN: %clang_cc1 -std=c++20 %{file} -verify -fexceptions -fcxx-exceptions -pedantic-errors
// RUN: %clang_cc1 -std=c++23 %{file} -verify -fexceptions -fcxx-exceptions -pedantic-errors

From there, we can disallow RUN lines with FOR-EACH defines on syntax level, requiring something along the lines of RUN-FOR-EACH instead. Though I don't have a strong opinion whether we should.
It also opens a way to DEFINE-FOR-EACH with list of arbitrary strings, which Aaron mentioned. Seems useful, but I'm not ready yet to advocate that we need this for DR testing, even for language modes.

I understand your concern about moving towards Turing-completeness, but we're not changing anything for majority of tests that are fine with handful of RUN directives. Those Turing-complete features are going to replace dozens of RUN lines (I can image us running codegen DR tests for many triples in each language mode), and I don't see simplicity of lit serving us well in those important use cases.

llvm/docs/CommandGuide/lit.rst
559

Would it be more clear if I explicitly mention asterisk and double asterisk as allowed in path?

For-each is one of the major use cases originally targeted by DEFINE/REDEFINE.  In the C++ DR scenario, the data set to iterate (the split-out files) is already harcoded in the lit test file.  Without any new lit features, I believe you can use DEFINE/REDEFINE now to iterate that data set as follows:

// RUN: split-file %s %t
// DEFINE: %{file} =
// DEFINE: %{run} = \
// DEFINE:     cmd1 %t/%{file} && \
// DEFINE:     cmd2 && cmd3 && cmd4 

//--- dr100.cpp
// REDEFINE: %{file} = dr100.cpp
// RUN: %{run}
contents of dr100.cpp 

//--- dr101.cpp
// REDEFINE: %{file} = dr101.cpp
// RUN: %{run}
contents of dr101.cpp

Especially from the perspective of lit maintenance, that solution seems simpler to me than inventing additional for-each mechanisms, such as %{for-each-file}.  Moreover, it seems more flexible.  For example, some split-out files could be visited by a different command list, or some could be left out of the iteration entirely in order to serve some sort of auxiliary role in the test.

If anyone is bothered by the repetition of each file name, then perhaps someone should invent a built-in lit substitution that expands to the most recent split-out file name.  Maybe %{split-file-name}. In each RUN line, it should expand after all custom substitutions.  Then you could have:

// RUN: split-file %s %t
// DEFINE: %{run} = \
// DEFINE:     cmd1 %t/%{split-file-name} && \
// DEFINE:     cmd2 && cmd3 && cmd4 

//--- dr100.cpp
// RUN: %{run}
contents of dr100.cpp 

//--- dr101.cpp
// RUN: %{run}
contents of dr101.cpp

For-each is one of the major use cases originally targeted by DEFINE/REDEFINE.  In the C++ DR scenario, the data set to iterate (the split-out files) is already harcoded in the lit test file.  Without any new lit features, I believe you can use DEFINE/REDEFINE now to iterate that data set as follows:

// RUN: split-file %s %t
// DEFINE: %{file} =
// DEFINE: %{run} = \
// DEFINE:     cmd1 %t/%{file} && \
// DEFINE:     cmd2 && cmd3 && cmd4 

//--- dr100.cpp
// REDEFINE: %{file} = dr100.cpp
// RUN: %{run}
contents of dr100.cpp 

//--- dr101.cpp
// REDEFINE: %{file} = dr101.cpp
// RUN: %{run}
contents of dr101.cpp

Especially from the perspective of lit maintenance, that solution seems simpler to me than inventing additional for-each mechanisms, such as %{for-each-file}.

I don't think we're inventing additional iteration mechanisms here, because manual loop unrolling is anything but an iteration mechanism.

Moreover, it seems more flexible.  For example, some split-out files could be visited by a different command list, or some could be left out of the iteration entirely in order to serve some sort of auxiliary role in the test.

When we need such flexibility, we move the test out of drNxx.cpp into separate file. It's been working for us well for quite a while, so we're unlikely to ask lit to support such flexibility on its side.

Anyway, I went ahead and transformed dr5xx.cpp according to your ideas: https://gist.github.com/Endilll/d0099df11e532fb2a7ef7ee34f3cbf21/revisions?diff=split
Line count went up from ≈1k to ≈1.5k.
Note that total C++ CWG DR count approaches 3000, so this file is about 3% of full test suite we're going to have at some point.
We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better. To be fully honest, it doesn't work today, but chaining compiler invocations would mean to give up on that entirely.

If anyone is bothered by the repetition of each file name, then perhaps someone should invent a built-in lit substitution that expands to the most recent split-out file name.  Maybe %{split-file-name}. In each RUN line, it should expand after all custom substitutions.  Then you could have:

// RUN: split-file %s %t
// DEFINE: %{run} = \
// DEFINE:     cmd1 %t/%{split-file-name} && \
// DEFINE:     cmd2 && cmd3 && cmd4 

//--- dr100.cpp
// RUN: %{run}
contents of dr100.cpp 

//--- dr101.cpp
// RUN: %{run}
contents of dr101.cpp

I'm sympathetic to the intent, but I don't think we should introduce interactions between split-file and lit.

For-each is one of the major use cases originally targeted by DEFINE/REDEFINE.  In the C++ DR scenario, the data set to iterate (the split-out files) is already harcoded in the lit test file.  Without any new lit features, I believe you can use DEFINE/REDEFINE now to iterate that data set as follows:

// RUN: split-file %s %t
// DEFINE: %{file} =
// DEFINE: %{run} = \
// DEFINE:     cmd1 %t/%{file} && \
// DEFINE:     cmd2 && cmd3 && cmd4 

//--- dr100.cpp
// REDEFINE: %{file} = dr100.cpp
// RUN: %{run}
contents of dr100.cpp 

//--- dr101.cpp
// REDEFINE: %{file} = dr101.cpp
// RUN: %{run}
contents of dr101.cpp

Especially from the perspective of lit maintenance, that solution seems simpler to me than inventing additional for-each mechanisms, such as %{for-each-file}.

I don't think we're inventing additional iteration mechanisms here, because manual loop unrolling is anything but an iteration mechanism.

Moreover, it seems more flexible.  For example, some split-out files could be visited by a different command list, or some could be left out of the iteration entirely in order to serve some sort of auxiliary role in the test.

When we need such flexibility, we move the test out of drNxx.cpp into separate file. It's been working for us well for quite a while, so we're unlikely to ask lit to support such flexibility on its side.

Anyway, I went ahead and transformed dr5xx.cpp according to your ideas: https://gist.github.com/Endilll/d0099df11e532fb2a7ef7ee34f3cbf21/revisions?diff=split
Line count went up from ≈1k to ≈1.5k.
Note that total C++ CWG DR count approaches 3000, so this file is about 3% of full test suite we're going to have at some point.
We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better. To be fully honest, it doesn't work today, but chaining compiler invocations would mean to give up on that entirely.

If anyone is bothered by the repetition of each file name, then perhaps someone should invent a built-in lit substitution that expands to the most recent split-out file name.  Maybe %{split-file-name}. In each RUN line, it should expand after all custom substitutions.  Then you could have:

The pain to run a test with many -std= values has some discussion on https://discourse.llvm.org/t/lit-run-a-run-line-multiple-times-with-different-replacements/64932
I do feel that a Cartesian product feature can be useful but also that it can be over-powered to be misused...

I don't think we're inventing additional iteration mechanisms here, because manual loop unrolling is anything but an iteration mechanism.

The DEFINE/REDEFINE solution I proposed is not manual loop unrolling. The test author manually specifies the loop body only once.

Anyway, I went ahead and transformed dr5xx.cpp according to your ideas: https://gist.github.com/Endilll/d0099df11e532fb2a7ef7ee34f3cbf21/revisions?diff=split
Line count went up from ≈1k to ≈1.5k.

For others skimming these comments, that high line count is because of the avoidance of &&, as discussed below. And then it nearly is manual loop unrolling.

We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better.

Can you explain that a bit more or point to a reference? I find lit -vv makes it pretty easy to understand what commands are executed and what output goes with each.

I'm sympathetic to the intent, but I don't think we should introduce interactions between split-file and lit.

Why? My recollection is that split-file was designed to be used with lit. Anyway, that suggestion is not critical to the solution. It just saves one line and a small repetition.

We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better.

Can you explain that a bit more or point to a reference? I find lit -vv makes it pretty easy to understand what commands are executed and what output goes with each.

Using && means that an earlier process failure (nonzero return) causes later commands to not be run at all, right?

We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better.

Can you explain that a bit more or point to a reference? I find lit -vv makes it pretty easy to understand what commands are executed and what output goes with each.

Using && means that an earlier process failure (nonzero return) causes later commands to not be run at all, right?

Yes. Lit does that anyway when commands appear on separate RUN lines.

We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better.

Can you explain that a bit more or point to a reference? I find lit -vv makes it pretty easy to understand what commands are executed and what output goes with each.

Using && means that an earlier process failure (nonzero return) causes later commands to not be run at all, right?

Yes. Lit does that anyway when commands appear on separate RUN lines.

Ah, I didn't know lit was already doing that, thanks!

My primary concern is how easy it will be for someone to extend these tests. The current proposed approach is quite easy: take a RUN line, copy and paste it, add your new option and it all just works. With using DEFINE and REDEFINE, you're forced to think about how lit works a *lot* more and it looks like you have to write a lot more RUN lines overall, at least as best I can tell. @jdenny, it might be helpful if you were to take the existing dr5xx.cpp file and show us what you believe it would end up looking like with your approach. I can understand what it will look like with the approach @Endill is suggesting, but I'm not certain I've seen a correct demonstration of how your approach works (and I don't want to judge it unfairly).

My primary concern is how easy it will be for someone to extend these tests.

Under my proposal, you just extend a %{run} definition. You don't have to modify the individual split-out files again. Moreover, if you're using the same commands in many lit test files, you could define %{run} in a lit config file.

@jdenny, it might be helpful if you were to take the existing dr5xx.cpp file and show us what you believe it would end up looking like with your approach.

https://gist.github.com/jdenny-ornl/82aae9c3706c88fedebc8c44ac2a6694

We're also can't chain compiler invocations via &&: RUN: %{run98} && %{run11} && %{run14} && %{run17} && %{run20} && %{run23}, because having dedicated stdout log per each language mode makes DR test debugging workflow much better.

Can you explain that a bit more or point to a reference? I find lit -vv makes it pretty easy to understand what commands are executed and what output goes with each.

One of the most common pitfalls is to forget to guard a test with #if __cplusplus. I commented one of those in dr13xx.cpp, and changed order of language modes to avoid special case of first command failing. That's what I currently get from ninja check-clang-cxx-drs (and would get from a buildbot, for that matter):

******************** TEST 'Clang :: CXX/drs/dr13xx.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++17 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
: 'RUN: at line 2';   /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++14 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
: 'RUN: at line 3';   /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++11 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
: 'RUN: at line 4';   /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++98 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
--
Exit Code: 1

Command Output (stderr):
--
error: 'error' diagnostics seen but not expected: 
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 212: a type specifier is required for all declarations
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 212: no member named 'g' in 'dr1330::B<dr1330::P>'
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 214: a type specifier is required for all declarations
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 214: no member named 'g' in 'dr1330::B<dr1330::Q>'
4 errors generated.

--

********************

Thank you for mentioning -vv, because it resolves that for local development at least. Surprisingly, it helps even with chained compiler invocations:

******************** TEST 'Clang :: CXX/drs/dr13xx.cpp' FAILED ********************
Script:
--
: 'RUN: at line 1';   /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++17 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors && /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++14 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors && /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++98 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors && /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++11 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
--
Exit Code: 1

Command Output (stderr):
--
+ : 'RUN: at line 1'
+ /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++17 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
+ /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++14 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
+ /home/user/endill/llvm-project/build-linux/bin/clang -cc1 -internal-isystem /home/user/endill/llvm-project/build-linux/lib/clang/17/include -nostdsysteminc -std=c++98 /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp -verify -fexceptions -fcxx-exceptions -pedantic-errors
error: 'error' diagnostics seen but not expected: 
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 209: a type specifier is required for all declarations
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 209: no member named 'g' in 'dr1330::B<dr1330::P>'
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 211: a type specifier is required for all declarations
  File /home/user/endill/llvm-project/clang/test/CXX/drs/dr13xx.cpp Line 211: no member named 'g' in 'dr1330::B<dr1330::Q>'
4 errors generated.

--

********************

Irrespective of what happens to this patch, I'd like failed DR test to be diagnosed in -vv mode by default. It's very helpful, and lowers the bar for newcomers (we advertise DR tests as an activity for them).

https://gist.github.com/jdenny-ornl/82aae9c3706c88fedebc8c44ac2a6694

So we're looking at 20% more noise in a large test suite. I'm not happy with that either.

https://gist.github.com/jdenny-ornl/82aae9c3706c88fedebc8c44ac2a6694

So we're looking at 20% more noise in a large test suite. I'm not happy with that either.

When comparing the character counts for the solution using %{for-each-file} and the solution above using DEFINE/REDEFINE, I see a 9% increase for the latter.

But what does that number really mean anyway? It boils down to the following. When formatting the DR tests as lit tests, the %{for-each-file} solution requires adding the following header to each DR test:

//--- dr500: dup 372

where the DEFINE/REDEFINE solution requires adding the following header instead:

//--- dr500: dup 372
// REDEFINE: %{file} = dr500.cpp
// RUN: %{run}

As far as I can tell, neither version introduces any significant maintenance burden to the DR test. The former is just more concise.. But does that difference justify maintaining %{for-each-file} in lit?

https://gist.github.com/jdenny-ornl/82aae9c3706c88fedebc8c44ac2a6694

So we're looking at 20% more noise in a large test suite. I'm not happy with that either.

When comparing the character counts for the solution using %{for-each-file} and the solution above using DEFINE/REDEFINE, I see a 9% increase for the latter.

But what does that number really mean anyway?

This matters for DR tests because we put ~100 tests in every file. There are several thousand DR tests in C++ and an additional 500 or so in C, so this adds a fair amount of noise to these test files.

It boils down to the following. When formatting the DR tests as lit tests, the %{for-each-file} solution requires adding the following header to each DR test:

//--- dr500: dup 372

where the DEFINE/REDEFINE solution requires adding the following header instead:

//--- dr500: dup 372
// REDEFINE: %{file} = dr500.cpp
// RUN: %{run}

The DR tests already have the comment for generation of the cxx status page, now we're talking about adding two additional lines of markup for each test and those two lines add zero information the user cares about. What's more, we have to repeat information that can get out of sync (if the //--- dr500: part doesn't match the %{file} = dr500.cpp part, the test doesn't run properly) and we're requiring people to know there's this hidden connection between the comment and the REDEFINE line. And if you forget to add the REDEFINE line, the test will still run and pass, but can have potentially unexpected results (as far as our DR testing is concerned) due to Clang's diagnostic display heuristics, which is the entire reason we're discussing this for-each-file feature in the first place. Using for-each-file has its own issues (it's new and novel), but once the RUN lines are written, the only thing the test developer needs to write is the special comment they already had to write anyway for the DR status page to be generated.

As far as I can tell, neither version introduces any significant maintenance burden to the DR test. The former is just more concise.. But does that difference justify maintaining %{for-each-file} in lit?

IMO, yes. I think I'm now convinced that DEFINE/REDEFINE, while powerful and potentially plausible as a feature to solve this problem, is not really the right approach for DR testing. While I'm usually hesitant to add new lit functionality (I think those additions are powerful but make it harder for newcomers to the project), DR testing is a pretty specialized use case that has different needs than other testing. I think it's reasonable for both features to coexist; they both have plenty of utility but different tradeoffs.

jdenny added a reviewer: yln.Jun 20 2023, 10:19 AM

IMO, yes.

The vote stands at 2 to 1 in favor of %{for-each-file} as opposed to using the existing DEFINE/REDEFINE functionality. I don't have any additional information to present, so we should let others weigh in on this point.

If some form of %{for-each-file} is the chosen path forward, I hope @jhenderson's suggestion for a different syntax will be pursued. Replicating the enclosing RUN line is a surprising side effect for a substitution.

IMO, yes.

The vote stands at 2 to 1 in favor of %{for-each-file} as opposed to using the existing DEFINE/REDEFINE functionality. I don't have any additional information to present, so we should let others weigh in on this point.

If some form of %{for-each-file} is the chosen path forward, I hope @jhenderson's suggestion for a different syntax will be pursued. Replicating the enclosing RUN line is a surprising side effect for a substitution.

If we can agree that lit should provide first-class iteration facilities, I have no troubles reimplementing this patch according to my earlier suggestion, which is in line with @jhenderson's suggestion as I understand it. If we find generic (non-file) iteration facilities useful for DR testing, I can contribute them as well later.

What about the following instead?

// RUN: split-file %s %t
// PYTHON: for file in os.listdir(lit.substs['%t']):
// PYTHON:   lit.run('%clang_cc1 -std=c++98 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++11 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++14 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++17 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++20 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++23 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)

In other words, instead of incrementally designing, debating, and maintaining more and more lit features to try to address every use case in the most ideal manner, eventually ending up with a full blown scripting language in lit, why don't we invest that effort in connecting to an existing scripting language? Python seems like the obvious choice. This would open so many more doors than one more lit control structure.

What about the following instead?

// RUN: split-file %s %t
// PYTHON: for file in os.listdir(lit.substs['%t']):
// PYTHON:   lit.run('%clang_cc1 -std=c++98 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++11 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++14 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++17 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++20 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++23 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)

In other words, instead of incrementally designing, debating, and maintaining more and more lit features to try to address every use case in the most ideal manner, eventually ending up with a full blown scripting language in lit, why don't we invest that effort in connecting to an existing scripting language? Python seems like the obvious choice. This would open so many more doors than one more lit control structure.

I've been watching on the discussion a bit over the last few days, and soemthing along these lines is what I'm inclined to think would make sense. As this is a somewhat specialised case. I don't think having a substitution that could cause potential confusion (it expands the whole line and repeats it, rather than replacing a thing in situ only) is the way to go, but we can achieve the same thing via python. You don't even need @jdenny's addition to lit either: %python is an existing substitution to the python executable, allowing you to just execute any old script, which could either be a separate file, something inline in the RUN command, or a file "appended" to the test file and split off via split-file. Untested example:

// RUN: split-file %s %t
// RUN: %python doTest.py %t %clang_cc1

//--- doTest.py
import os
import sys

for f in os.listdir(sys.argv[1]):
  if f.endswith('.py'):
    continue
  for std in ['c++98', 'c++11', 'c++14', 'c++17', 'c++20', 'c++23']:
    subprocess.run(sys.argv[2] + ' -std=' + std + ' -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + f)

// ... all the files for expansion by split-file ...

If you made the python script a separate file listed alongside the tests, you could relatively easily adapt it to work with all the different test cases you need. The basic rule would be that a thing that would be substituted by lit instead becomes an input argument into the python script. If you want additional diagnostics about which case failed, you just need to print some markers in the script at the appropriate points (e.g. at the start of the loop, indicating the file being used etc).

A minor potential benefit of this beyond @jdenny's suggestion is that you could execute the python script directly without using lit to aid with debugging, once you have a test that failed. This would avoid things like needing to re-split the file etc/you could delete all the uninteresting inputs to focus on one specific one etc.

What about the following instead?

// RUN: split-file %s %t
// PYTHON: for file in os.listdir(lit.substs['%t']):
// PYTHON:   lit.run('%clang_cc1 -std=c++98 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++11 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++14 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++17 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++20 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
// PYTHON:   lit.run('%clang_cc1 -std=c++23 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)

In other words, instead of incrementally designing, debating, and maintaining more and more lit features to try to address every use case in the most ideal manner, eventually ending up with a full blown scripting language in lit, why don't we invest that effort in connecting to an existing scripting language? Python seems like the obvious choice. This would open so many more doors than one more lit control structure.

That's an excellent suggestion! Is any aspect of it a real lit API?

I've been watching on the discussion a bit over the last few days, and soemthing along these lines is what I'm inclined to think would make sense. As this is a somewhat specialised case. I don't think having a substitution that could cause potential confusion (it expands the whole line and repeats it, rather than replacing a thing in situ only) is the way to go, but we can achieve the same thing via python. You don't even need @jdenny's addition to lit either: %python is an existing substitution to the python executable, allowing you to just execute any old script, which could either be a separate file, something inline in the RUN command, or a file "appended" to the test file and split off via split-file. Untested example:

// RUN: split-file %s %t
// RUN: %python doTest.py %t %clang_cc1

//--- doTest.py
import os
import sys

for f in os.listdir(sys.argv[1]):
  if f.endswith('.py'):
    continue
  for std in ['c++98', 'c++11', 'c++14', 'c++17', 'c++20', 'c++23']:
    subprocess.run(sys.argv[2] + ' -std=' + std + ' -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + f)

// ... all the files for expansion by split-file ...

I did some real testing. Here's how it looks like: https://gist.github.com/Endilll/e2e4ace431489aa019682b96c331d928
Note that it requires D151320 to work.

If you made the python script a separate file listed alongside the tests, you could relatively easily adapt it to work with all the different test cases you need. The basic rule would be that a thing that would be substituted by lit instead becomes an input argument into the python script. If you want additional diagnostics about which case failed, you just need to print some markers in the script at the appropriate points (e.g. at the start of the loop, indicating the file being used etc).

We also would have to properly handle return codes from subprocesses. Then we shouldn't forget to forward stderr to stdout, otherwise we could miss some output. All in all, I don't think tests should duplicate test runner stuff inside lit that's been proven and working for us already.

A minor potential benefit of this beyond @jdenny's suggestion is that you could execute the python script directly without using lit to aid with debugging, once you have a test that failed. This would avoid things like needing to re-split the file etc/you could delete all the uninteresting inputs to focus on one specific one etc.

I understand where you come from, but it's hard to beat debugging experience Compiler Explorer provides for DR testing. Compiler invocation is sufficient in common cases, and when crazy stuff happens, something this simple is not expected to be too helpful.

Thank you for pointing out how well lit and split-file can play together! I'm in favor of both Joel's PYTHON directive, and a form along the following lines:

// RUN: rm -rf %t
// RUN: split-file --leading-lines %s %t
// RUN: %lit-generate-directives %t/run_dr6xx.py

//--- run_dr6xx.py
for file in os.listdir(lit.substs['%t']):
  lit.run('%clang_cc1 -std=c++98 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
  lit.run('%clang_cc1 -std=c++11 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
  lit.run('%clang_cc1 -std=c++14 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
  lit.run('%clang_cc1 -std=c++17 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
  lit.run('%clang_cc1 -std=c++20 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)
  lit.run('%clang_cc1 -std=c++23 -verify -fexceptions -fcxx-exceptions -pedantic-errors ' + file)

//--- dr600.cpp
...

%lit-generate-directives is useful for "ordinary" DR tests that share same compiler flags. Eliminating differences in flags is on my list for DR tests. But there are also at least CodeGen tests which definitely can't use ordinary set of flags. Inline PYTHON directive could make more sense for them.

I've been watching on the discussion a bit over the last few days, and soemthing along these lines is what I'm inclined to think would make sense.

That's an excellent suggestion!

Great, we're converging!

Is any aspect of it a real lit API?

Not that I'm aware of. Someone would have to implement it. Until then, the external/split-file python ideas that you and @jhenderson are discussing seem like a reasonable direction for the C++ DR testing. I don't yet understand what your %lit-generate-directives does, but I don't have time to investigate right now.

A few more thoughts on the PYTHON: directive in case someone pursues it....

I imagine it could eventually replace some other existing and proposed lit features as well. Lit's existing %if x could become PYTHON: if lit.features('x'):. Python functions could replace parameterized substitutions (discussed in the DEFINE/REDEFINE documentation) and function-like substitutions (which I started working on previously). In general, I think using python instead of lit features in this way would lower the learning curve for lit users, increase the flexibility, and lower the maintenance burden for lit developers.

Also, if RUN: cmd is treated as an alias for PYTHON: lit.run(cmd_escaped_for_python), then it should be easier to annotate existing lit tests with PYTHON: if cond: directives and similar.

I've been watching on the discussion a bit over the last few days, and soemthing along these lines is what I'm inclined to think would make sense.

That's an excellent suggestion!

Great, we're converging!

Is any aspect of it a real lit API?

Not that I'm aware of. Someone would have to implement it. Until then, the external/split-file python ideas that you and @jhenderson are discussing seem like a reasonable direction for the C++ DR testing.

Honestly, I'd rather start implementing PYTHON directive rather than spend efforts on a stop-gap measure that requires reinventing test runners.

I don't yet understand what your %lit-generate-directives does, but I don't have time to investigate right now.

A few more thoughts on the PYTHON: directive in case someone pursues it....

I imagine it could eventually replace some other existing and proposed lit features as well. Lit's existing %if x could become PYTHON: if lit.features('x'):. Python functions could replace parameterized substitutions (discussed in the DEFINE/REDEFINE documentation) and function-like substitutions (which I started working on previously). In general, I think using python instead of lit features in this way would lower the learning curve for lit users, increase the flexibility, and lower the maintenance burden for lit developers.

Also, if RUN: cmd is treated as an alias for PYTHON: lit.run(cmd_escaped_for_python), then it should be easier to annotate existing lit tests with PYTHON: if cond: directives and similar.

I don't yet understand what your %lit-generate-directives does, but I don't have time to investigate right now.

In a nutshell, It tells lit to treat contents of a .py file as inline PYTHON directives.

Also, if RUN: cmd is treated as an alias for PYTHON: lit.run(cmd_escaped_for_python), then it should be easier to annotate existing lit tests with PYTHON: if cond: directives and similar.

Can you elaborate on a second half of this sentence?

Also, if RUN: cmd is treated as an alias for PYTHON: lit.run(cmd_escaped_for_python), then it should be easier to annotate existing lit tests with PYTHON: if cond: directives and similar.

Can you elaborate on a second half of this sentence?

I was probably overselling it. I just meant that, if you already have something like

// RUN: use-non-portable-feature-foo args0
// RUN: use-non-portable-feature-foo args1
// RUN: use-non-portable-feature-foo args2

it might be more convenient to extend it to

// PYTHON: if lit.features('foo'):
//    RUN:   use-non-portable-feature-foo args0
//    RUN:   use-non-portable-feature-foo args1
//    RUN:   use-non-portable-feature-foo args2

than to

// PYTHON: if lit.features('foo'):
// PYTHON:   lit.run('use-non-portable-feature-foo args0')
// PYTHON:   lit.run('use-non-portable-feature-foo args1')
// PYTHON:   lit.run('use-non-portable-feature-foo args2')

especially if the command lines have characters that require quoting for python. You can imagine similar cases where you extend a test with a foreach loop.

Someone might argue that mixing RUN/PYTHON directives in that way is actually harmful in some respects, so feel free to ignore this idea for now.

The reason I favor the current %{for-each-file} approach is that it works well for more complex scenarios and it's extensible.

I actually had the exact opposite reaction: I was not able to see how this would compose. For example, can you nest a command that uses %{for-each-file} within a command that uses %{for-each-file}?

In other words, instead of incrementally designing, debating, and maintaining more and more lit features to try to address every use case in the most ideal manner, eventually ending up with a full blown scripting language in lit, why don't we invest that effort in connecting to an existing scripting language? Python seems like the obvious choice. This would open so many more doors than one more lit control structure.

I think this should be at the top of our minds.

I just noticed this thread and skimmed through it. I'd like to offer libc++'s experience to solve a similar issue. We have several tests that are pretty repetitive and we started generating them automatically using python scripts. To do this, we added a new kind of test called .gen.py to libc++'s own custom test format. This kind of test is basically a Python script that generates other tests based on arbitrary Python code. So what we do is something like this:

# Test that we can include each header in a TU while using modules.

# RUN: %{python} %s

for header in SOME-LIST-OF-HEADERS:
  print(f"""
//--- {header}.pass.cpp
// RUN: %{cxx} %s %{flags} %{compile_flags} -fmodules -fcxx-modules -fmodules-cache-path=%t -fsyntax-only
#include <{header}>
""")

The real thing is actually more complicated than this, but this is the gist of it. The result of this .gen.py test is actually many sub-tests, one for each header.pass.cpp file output by this script. Each test looks like this (e.g. for <thread>):

//--- thread.pass.cpp
// RUN: %{cxx} %s %{flags} %{compile_flags} -fmodules -fcxx-modules -fmodules-cache-path=%t -fsyntax-only
#include <thread>

The benefits of this approach are:

  1. This is extremely extensible. We're writing arbitrary Python code to generate the tests, so if somebody needs additional custom logic when generating their test, they can go as crazy as they want without impacting Lit at all.
  2. This interacts nicely with the rest of Lit because each file is then a self-contained test with its own RUN command. When it fails, you can see exactly what failed in that test and in that test only, as-if you never actually had all the tests written in the same file in the first place.

The downside is that (in this case at least) there's a bit more upfront complexity required for writing the test, since you have to write it in Python. I think this would probably be easy to improve with something like the PYTHON directive (which I wasn't aware of until recently) or something else -- libc++'s test suite can't use most of the tools that Clang can use, like split-files. What you folks are trying to achieve is slightly different because most of your DR test is not generated, it's just normal C++, and the only part that gets generated is the RUN lines. However, I think a similar approach might work nicely if tweaked to your needs. In particular, I would argue that if the solution to this problem requires adding anything specific to Lit (as opposed to a general capability), then the design of that solution might be slightly off.

Anyway, I don't have a large stake in Lit or in the Clang test suite, but I wanted to share libc++'s positive experience with solving a similar problem using a general approach.

Thank you for taking a look at this thread!

The reason I favor the current %{for-each-file} approach is that it works well for more complex scenarios and it's extensible.

I actually had the exact opposite reaction: I was not able to see how this would compose. For example, can you nest a command that uses %{for-each-file} within a command that uses %{for-each-file}?

If I understand it correctly, your scenario should result into Cartesian product of two %{for-each-file}. In this patch I left a placeholder for that part of implementation.
But there's no point discussing that idea anymore, I believe. We converged on having PYTHON directive and lit exposing a Python API for applying substitutions and adding directives.

Thank you for sharing your experience with libc++ test suite! It is possible to apply your approach for DR tests, but I'm not keen to add a new layer of abstraction for tests that need to generate just RUN directives, keeping test cases the same.

jdenny added a comment.Jul 5 2023, 9:16 AM

Thanks for commenting. It's good to hear more perspectives and experiences.

We have several tests that are pretty repetitive and we started generating them automatically using python scripts. To do this, we added a new kind of test called .gen.py to libc++'s own custom test format. This kind of test is basically a Python script that generates other tests based on arbitrary Python code.

Interesting approach. I'd like to understand the motivation a bit better, in part to understand how a PYTHON: directive might help. Why do you generate tests using lit RUN: lines in the .gen.py custom test format files? That is, my first guess at how to do this would have been to encode calls to python scripts in cmake as part of the test suite build process. That seems simpler, but maybe I missed something.

(Aside: Instead of the BLOCKLIT that appears in many of these tests, why not use lit's END. directive?)

The downside is that (in this case at least) there's a bit more upfront complexity required for writing the test, since you have to write it in Python. I think this would probably be easy to improve with something like the PYTHON directive (which I wasn't aware of until recently)

In case it isn't clear, the PYTHON: directive doesn't exist yet, but @Endill and I are discussing a prototype.

Thanks for commenting. It's good to hear more perspectives and experiences.

We have several tests that are pretty repetitive and we started generating them automatically using python scripts. To do this, we added a new kind of test called .gen.py to libc++'s own custom test format. This kind of test is basically a Python script that generates other tests based on arbitrary Python code.

Interesting approach. I'd like to understand the motivation a bit better, in part to understand how a PYTHON: directive might help. Why do you generate tests using lit RUN: lines in the .gen.py custom test format files? That is, my first guess at how to do this would have been to encode calls to python scripts in cmake as part of the test suite build process. That seems simpler, but maybe I missed something.

I guess there's no real reason, it's just that all libc++ tests are basically ShTests and it seemed simpler for me to keep them that way. So we write a ShTest that generates a ShTest and it just works.

(Aside: Instead of the BLOCKLIT that appears in many of these tests, why not use lit's END. directive?)

I wasn't aware of END.!

ldionne resigned from this revision.Aug 31 2023, 8:59 AM
ldionne added a subscriber: ldionne.