This is an archive of the discontinued LLVM Phabricator instance.

[lit] Implement PYTHON directive and config.prologue
Needs ReviewPublic

Authored by jdenny on Jul 11 2023, 10:17 AM.

Details

Summary

NOTE: This patch should be applied after PR #68496, whose branch is jdenny-ornl:lit-python-script-fatal and whose final commit is currently 909a48a4d1dd04b1841d5e6953797001a560c432.

For example, you can put this in a lit test:

// PYTHON: def check(cflags, fcprefix):
// PYTHON:     lit.run(f"""
// PYTHON:         %clang_cc1 -verify -fopenmp -fopenmp-version=51
// PYTHON:                    {cflags} -emit-llvm -o - %s |
// PYTHON:           FileCheck -check-prefix={fcprefix} %s
// PYTHON:     """)
// PYTHON:
// PYTHON: check("-triple x86_64-apple-darwin10.6.0 -fopenmp-simd", "SIMD")
// PYTHON: check("-triple x86_64-apple-darwin10.6.0", "NO-SIMD")
// PYTHON: check("-triple x86_64-unknown-linux-gnu", "NO-SIMD")

where lit.run(cmd) behaves like RUN: cmd.

To define check at the start of every test file in a test directory, you can move it to a lit.prologue.py file and, in a lit configuration file, set config.prologue to that file.

The idea of a PYTHON directive was proposed at https://reviews.llvm.org/D150856#4436879 (but some details have evolved since then). The rationale is that we are continuing to incrementally evolve lit RUN lines toward a full blown scripting language. A simpler alternative is to just use an existing scripting language, and python seems like the obvious choice. That will lower the learning curve for lit users, quickly and significantly increase the flexibility of lit RUN lines, and lower the burden for lit developers and reviewers.

Toward those goals, PYTHON directives can also eventually replace some other existing and proposed lit features. Lit's existing %if x substitutions can become PYTHON: if lit.has('x'):. Python functions can replace parameterized substitutions and my proposed function-like substitutions.

For further discussion and examples, see this patch's additions to llvm/docs/TestingGuide.rst.

Diff Detail

Event Timeline

jdenny created this revision.Jul 11 2023, 10:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2023, 10:17 AM
jdenny requested review of this revision.Jul 11 2023, 10:17 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
Herald added a reviewer: Restricted Project. · View Herald Transcript

Are you also planning to adjust the update_tests.py scripts to support this syntax? Might be a good motivation to have them use the lit parser to extract commands rather than regexes.

Are you also planning to adjust the update_tests.py scripts to support this syntax?

Thanks, I hadn't thought of that. Is that something you would like to use? Regardless of use cases, I'm inclined to wait to see the reaction to PYTHON directives generally.

Might be a good motivation to have them use the lit parser to extract commands rather than regexes.

Sounds reasonable to me, but I haven't spent much time with those scripts in a while.

@ldionne: It looks like this patch has revealed an existing bug in libcxx/test/libcxx/selftest/sh.cpp/empty.sh.cpp, but I'm not sure how best to handle it. Can you help?

That test intends to check the case when no RUN line is specified, but its comments accidentally have an embedded RUN line:

// XFAIL: *

// Make sure the test DOES NOT pass if it has no 'RUN:' steps

As a result, the actual lit error is a shell parse error not an error about missing RUN lines. From the pre-commit CI fails above:

Script:
--
%dbg(RUN: at line 11) ' steps
--
Exit Code: 1

Command Output (stdout):
--
shell parser error on: ": 'RUN: at line 11';  ' steps"

--

The XFAIL directive turns that into an expected test fail. That's why the test has not caused the test suite to fail before this patch. After this patch, a shell parse error produces an unresolved test status not a failed test status, so the test suite now fails. That seems like a good side effect of this patch as it catches this malformed test.

I tried to fix the test so it exercises what it was meant to: no RUN lines. To do so, I removed the : after RUN. Both before and after this patch, that produces an unresolved test status, and again that seems like the right lit behavior for a malformed test. However, again, that means the test suite now fails. How would you like to handle that?

Thank you for coming up with the idea, and then implementing it!
Unfortunately, I can't review your implementation from lit standpoint, but it looks solid judging by documentation.

llvm/docs/TestingGuide.rst
317–318

I think we can provide an example for this case as well:

from itertools import product

stds = (98, 11, 14, 17, 20, 23)
triples = (
  'x86_64-apple-darwin10.6.0',
  'x86_64-unknown-linux-gnu'
)
for std, triple in product(stds, triples):
  lit.run(f'%clang_cc1 -std=c++{std} -triple {triple} %s')

But it could be better to hold it until we come up with a variant of run that runs in parallel with other runs, in order to avoid the issue you're describing.

377

I suggest to provide complete example, because description is somewhat difficult to follow, especially for people not too well-versed in Python.

llvm/utils/lit/lit/TestRunner.py
2431–2432

I don't see why we shouldn't provide such an API to PYTHON directives, but it's out of scope of this patch.

jdenny updated this revision to Diff 541248.Jul 17 2023, 3:05 PM

Addressed reviewer comment: make import example clearer.

Made punctuation spacing more consistent.

Thank you for coming up with the idea, and then implementing it!
Unfortunately, I can't review your implementation from lit standpoint, but it looks solid judging by documentation.

Thanks for reviewing and for the helpful conversations that led here.

llvm/docs/TestingGuide.rst
317–318

Agreed. Let's wait until we have a better handle on such cases.

377

Thanks for that feedback. Please let me know if the new version is still confusing.

llvm/utils/lit/lit/TestRunner.py
2431–2432

I agree we should consider that later.

One reason is we need to think about the interface. (For example, I'm currently inclined to have separate define and redefine functions for safety, but others might disagree.)

Another reason is that, if PYTHON directives can change substitutions, it would not generally be possible to expand substitutions in RUN lines appearing after PYTHON directives until those PYTHON directives have been executed. That might impact the "Script:" discussion under D154984, so let's settle that discussion first.

@ldionne: It looks like this patch has revealed an existing bug in libcxx/test/libcxx/selftest/sh.cpp/empty.sh.cpp, but I'm not sure how best to handle it. Can you help?

[...]

I tried to fix the test so it exercises what it was meant to: no RUN lines. To do so, I removed the : after RUN. Both before and after this patch, that produces an unresolved test status, and again that seems like the right lit behavior for a malformed test. However, again, that means the test suite now fails. How would you like to handle that?

Thanks for the heads up and the detailed explanation. It looks like there's no reliable way to test what I was trying to test, and that test isn't *that* important anyway. I'll remove it since it was broken in the first place, so just rebase your patch. Sorry for the trouble.

Endill added inline comments.Jul 17 2023, 7:51 PM
llvm/utils/lit/lit/TestRunner.py
2431–2432

For example, I'm currently inclined to have separate define and redefine functions for safety

I (again) missed that it's DEFINE and REDEFINE we're talking about here. Sorry for that. As I expressed earlier in our private conversation, I hope PYTHON to subsume REDEFINE, so we wouldn't need to think hard about exposing it to PYTHON directive. But DEFINE is a different story. I agree that we should wait for that discussion to settle.

Thank you for implementing this feature! It may simplify some of the 133 rg -l -g '!flang' '%python ' tests we have today.
flang/ has 700+ tests. They may be helped by lit.prologue.py

I have to offer a dissenting voice here...

Lit shell tests are not written in Python or C++, but in a constrained language specifically for testing.
This has advantages and some disadvantages: e.g. it's easy to see concretely what a lit test does, but can be hard to understand its intent. Assertions are quite readable, but brittle. Output is not always easy to read but fits into a linear sequence of RUN lines.
Personally I think we should be less reliant on lit tests.

But mashing python into lit doesn't produce a coherent result.
Almost all lit features have a python equivalent: CHECK vs check(), DEFINEs vs variables, %if vs if etc. Now there are always several ways to do the same thing, it's unclear when each is best, and the inconsistency will make it hard to read & maintain tests across the repo in various dialects, which many people have to do.
It's possible to process lit shell tests with alternate implementations (without python!), I know of at least two...
Python is a serious general-purpose programming language with tools available, but none of these tools work with python embedded in lit tests (not even syntax highlighting!).

The lit test runner supports having multiple types of tests (this is how we run gtest + shell tests).
Could the tests you're targeting be written as actual python files, with an appropriate library of operations? (Some of this library could of course be shared with the shell test implementation).

pirama added a subscriber: pirama.Jul 24 2023, 11:35 AM

I have to offer a dissenting voice here...

Thanks for explaining your view.

Almost all lit features have a python equivalent: CHECK vs check(),

How would a check() work?

DEFINEs vs variables, %if vs if etc.

My hope was that people would gradually migrate away from lit constructs like %if and REDEFINE, which some people find ugly to use and complex in lit's implementation. I do think DEFINE and simple lit substitutions will be around for a long time because they're useful with simple RUN lines.

Now there are always several ways to do the same thing, it's unclear when each is best, and the inconsistency will make it hard to read & maintain tests across the repo in various dialects, which many people have to do.

Maybe, but most tools/languages eventually grow that issue. An intended advantage is that people would be reading python instead of new add-ons for lit's own scripting language, which LLVM developers (understandably) keep wanting to propose. That is, a lit with PYTHON should make tests easier to read and should make lit easier to maintain than a lit with its own evolving scripting language.

It's possible to process lit shell tests with alternate implementations (without python!), I know of at least two...

I don't think I've heard about this. Can you explain more?

Python is a serious general-purpose programming language with tools available, but none of these tools work with python embedded in lit tests (not even syntax highlighting!).

That sounds like a nice integration to work on... instead of continuing to grow lit's own scripting language.

The lit test runner supports having multiple types of tests (this is how we run gtest + shell tests).
Could the tests you're targeting be written as actual python files, with an appropriate library of operations? (Some of this library could of course be shared with the shell test implementation).

Are you suggesting that, instead of embedding python inside C/Fortran/IR/whatever, people should embed C/Fortran/IR/whatever inside python? Either way, libraries can be developed to minimize the complexity of python in the individual tests. However, I think the latter would create greater inconsistency among lit tests than the former: just to add a loop to a shell command in an existing lit test, you would have to reorganize the entire test.

Could the tests you're targeting be written as actual python files, with an appropriate library of operations? (Some of this library could of course be shared with the shell test implementation).

I think it would accurate to say that the idea this patch implements was born out of use case I have for C++ defect report test suite. As I see it, restructuring that test suite as actual Python files will introduce more noise to the tests than other options. We would also lose C++ syntax highlighting, which would make tests even less readable in majority of contexts.

I have to offer a dissenting voice here...

I am also unsure whether this patch is the right direction for LIT. One major advantage of LIT is that it is "constrained" and hence:

  • the number of "idioms" is rather limited, so it's relatively easy to switch between sub-projects and still be able to parse/contribute new tests,
  • while "creativity" is limited (it's not Python), it allows us to keep the tests relatively coherent.

While new LIT features make it resemble Python more and more, it is still quite far from Python itself. Also, there's a difference between "adding _a_ new feature" vs "allowing arbitrary Python code". But I appreciate that LIT has it's limitations and we need to evolve it.

That is, a lit with PYTHON should make tests easier to read and should make lit easier to maintain than a lit with its own evolving scripting language.

I am not sure. It will give developers more power, but with extra power comes extra responsibility :) Also, this "power" will require people to understand LIT internals better, right? That wouldn't be a bad thing, but I just want to highlight that there are some advantages of having a dedicated scripting language (which doesn't require one to understand how LIT is implemented).

Just chiming in here to give my 2p. lit already has the ability to execute python scripts (call %python), but any such scripts are not directly tied into the lit, so running them etc won't produce output that fits with the lit diagnostics, as I understand it. As I understand it, this patch is essentially just an improvement of the "execute arbitrary python script" option. I'm in favour of it, but if it isn't already clear, I think it would be worth outlining in this patch why simply calling an external python script (which could actually be embedded within the file) is insufficient for the motivating case.

Almost all lit features have a python equivalent: CHECK vs check(),

How would a check() work?

Sorry, this was an incomplete thought I should have revised.
I'm expecting we'll end up with some way of writing assertions directly within python, rather than just calling out to shell commands that verify themselves.
(If this isn't added deliberately, then I'd expect it'll turn up anyway via raise or lit.run("false") or something)

DEFINEs vs variables, %if vs if etc.

My hope was that people would gradually migrate away from lit constructs like %if and REDEFINE, which some people find ugly to use and complex in lit's implementation.

I think gradual migration is a recipe for getting stuck in a halfway state. I agree with the ugliness and complexity, and wish these features had also not been added. The mixture is yet more complicated.

Maybe, but most tools/languages eventually grow that issue. An intended advantage is that people would be reading python instead of new add-ons for lit's own scripting language, which LLVM developers (understandably) keep wanting to propose. That is, a lit with PYTHON should make tests easier to read and should make lit easier to maintain than a lit with its own evolving scripting language.

The difficulty in extending lit cleanly is a good reason for an alternative, and I think python is a good choice. (just not embedded in lit)

It's possible to process lit shell tests with alternate implementations (without python!), I know of at least two...

I don't think I've heard about this. Can you explain more?

Google uses a custom lit test runner to run LLVM tests as part of CI.
This has different performance characteristics and fits better into the infrastructure (each test is run as a separate sandboxed action in a distributed build/test environment).
This is possible because lit is a spec as much as a limitation (though recent feature additions have strained this).
The main limitation is that the config definition & discovery scheme is unsuitable for multiple reasons, so config needs to be configured separately.

The current iteration of this system implements a shell test interpreter as a Go binary (gtests are run in a different way).
The previous iteration mangled tests into shell scripts through regexps.

This is downstream and LLVM doesn't have any obligation to support any of it, but the ability to be integrated into different environments is a useful LLVM feature, and having clear interfaces for infra allows it to evolve in ways other than bolting on more features.

Python is a serious general-purpose programming language with tools available, but none of these tools work with python embedded in lit tests (not even syntax highlighting!).

That sounds like a nice integration to work on... instead of continuing to grow lit's own scripting language.

Agreed. Realistically, none of the standard tools for editing python code will ever work if the code is prefixed with PYTHON: lines and lives in *.cpp files, though.

The lit test runner supports having multiple types of tests (this is how we run gtest + shell tests).
Could the tests you're targeting be written as actual python files, with an appropriate library of operations? (Some of this library could of course be shared with the shell test implementation).

Are you suggesting that, instead of embedding python inside C/Fortran/IR/whatever, people should embed C/Fortran/IR/whatever inside python?

I do like the trick where a single lit test file provides multiple kinds of inputs (e.g. to lit + clang + FileCheck), but it is hard to reason about, and would not like to see it used when the test contains control flow.
I'm not sure whether it's best to have C etc inputs as string literals, or as separate files.

String literals are workable, because python has multiline raw strings (vs PYTHON: prefix), and the embedded C programs tend to be self-contained (vs interacting with e.g. lit).
So readability & tools are a bit better as python-in-lit, and the need for tools isn't so strong.
This is the approach we mostly used in clangd (e.g. llvm-project/clang-tools-extra/clangd/unittests/FindTargetTests.cpp)

Separate files for the C++ inputs are in some ways a more principled option, and tools work well, but more hassle to organize and navigate.

Either way, libraries can be developed to minimize the complexity of python in the individual tests. However, I think the latter would create greater inconsistency among lit tests than the former: just to add a loop to a shell command in an existing lit test, you would have to reorganize the entire test.

FWIW, this seems like a feature to me: you have to choose between terse/cryptic/self-referential lit test syntax, and powerful logic. (Hiding the logic in libraries makes it both better and worse!)
Having to rewrite the test is work, but starting with spaghetti and adding structure *without* rewriting may be worse...

Could the tests you're targeting be written as actual python files, with an appropriate library of operations? (Some of this library could of course be shared with the shell test implementation).

I think it would accurate to say that the idea this patch implements was born out of use case I have for C++ defect report test suite.

Definitely. Other feedback/experience for DEFINE, REDEFINE, function substitutions, and %if also contributed.

I am also unsure whether this patch is the right direction for LIT.

Thanks for chiming in.

One major advantage of LIT is that it is "constrained" and hence:

  • the number of "idioms" is rather limited, so it's relatively easy to switch between sub-projects and still be able to parse/contribute new tests,
  • while "creativity" is limited (it's not Python), it allows us to keep the tests relatively coherent.

I haven't found that to be true when test suites invent their own custom lit test "formats". I admit I haven't investigated their usage generally throughout LLVM. However, when I've come across one, I have found it to be a significant paradigm shift, and I have had to study the new format's implementation to grasp what the custom rules are. In my estimation, the difficulty of understanding a lit shell test with PYTHON when you're used to understanding a lit shell test without PYTHON would be much smaller in comparison. (To be clear, I'm not arguing for avoiding custom lit test formats or replacing them with PYTHON. I'm just comparing the difficulty of understanding PYTHON with variations that already appear in lit test suites.)

That is, a lit with PYTHON should make tests easier to read and should make lit easier to maintain than a lit with its own evolving scripting language.

I am not sure. It will give developers more power, but with extra power comes extra responsibility :) Also, this "power" will require people to understand LIT internals better, right?

What lit internals do you mean? Do you mean it enables tests to directly access functions and symbols within lit's implementation? This patch specifically avoids that. Access to lit-provided features is via a carefully defined lit object in python. Or do you mean it requires test authors to learn various complex subtleties of new user-visible lit features? (I think people sometimes refer to those subtleties as lit internals.) That's exactly a problem this patch is trying to avoid by having people use a widely used and well documented language like python instead of new lit scripting language features.

That wouldn't be a bad thing, but I just want to highlight that there are some advantages of having a dedicated scripting language (which doesn't require one to understand how LIT is implemented).

I previously thought that too. In practice, I don't think it's working out that way.

We tend to start off with a simple and familiar idea we're used to in other languages (recent examples: variables/macros, functions, if-then-else, for-each). Then we try to force that into lit's existing shell language, and we find there are many lit-specific subtleties to handle, debate, and ultimately teach. Then we start using the feature and find it doesn't adapt well to other use cases. Then people say they regret it. If we had a dedicated dream team for developing lit features, maybe we could design the perfect set of lit features that give LLVM developers just what they need without adding too much power. In the real world, I think it's more likely we'll keep evolving a messy collection of features that many people dislike. Instead, I say let's use python and get back to developing compilers.

I'm expecting we'll end up with some way of writing assertions directly within python, rather than just calling out to shell commands that verify themselves.
(If this isn't added deliberately, then I'd expect it'll turn up anyway via raise or lit.run("false") or something)

In this patch's current implementation, python compile failures (similar to RUN line shell parse errors) produce "unresolved" tests, and python execution failures (similar to RUN line non-zero exit status) produce "failed" tests. Yes, test authors could use raise or some lit.run fallacy to induce test failures. I don't see why that's bad if the check they want to perform is easier to write in python.

I think gradual migration is a recipe for getting stuck in a halfway state.

We could migrate lit and all llvm upstream test suites at the same time. However, I think it would be better to give people time to adjust and make sure there are no surprises. Moreover, I don't think the halfway state in this case is so dire. My hope is that PYTHON will stop the development of new lit scripting language features so we don't find ourselves with more features to maintain and possibly migrate away from. Then we would gradually remove the few that have already landed unless we find a good reason to keep them.

The difficulty in extending lit cleanly is a good reason for an alternative, and I think python is a good choice. (just not embedded in lit)

So, as @jhenderson suggested, I think this is the issue we should focus discussion on right now. That is, why embed python (using PYTHON) rather than just call a separate python script? Why embed multiple shell commands (using RUN) rather than just call a separate shell script? In my opinion, test authors should do what makes their tests easiest to understand while avoiding repeated code. Sometimes that means making test code visible directly in the individual test file, and sometimes that means encapsulating it in another file.

It's possible to process lit shell tests with alternate implementations (without python!), I know of at least two...

I don't think I've heard about this. Can you explain more?

Google uses a custom lit test runner to run LLVM tests as part of CI.

Thanks for explaining that.

This is downstream and LLVM doesn't have any obligation to support any of it, but the ability to be integrated into different environments is a useful LLVM feature, and having clear interfaces for infra allows it to evolve in ways other than bolting on more features.

I think it's too much to ask upstream lit developers to avoid adding new features indefinitely because it causes trouble for downstream users.

Realistically, none of the standard tools for editing python code will ever work if the code is prefixed with PYTHON: lines and lives in *.cpp files, though.

I agree that tools that support standard python would require extensions or shims. I don't have experience with python tools to know how hard that is. Then again, people might find it's really not worthwhile. Complex python will likely end up in a python module that is simply called from a PYTHON line. That would look a lot like what people already do with %python in RUN. But PYTHON gives people a convenient, well integrated way to keep their python local to the test when that makes the test easier to read and maintain.

I do like the trick where a single lit test file provides multiple kinds of inputs (e.g. to lit + clang + FileCheck), but it is hard to reason about, and would not like to see it used when the test contains control flow.

Why is control flow the cutoff?

So readability & tools are a bit better as python-in-lit, and the need for tools isn't so strong.

I'm not sure I'm following. Which approach does "python-in-lit" refer to here?

Complex python will likely end up in a python module that is simply called from a PYTHON line. That would look a lot like what people already do with %python in RUN. But PYTHON gives people a convenient, well integrated way to keep their python local to the test when that makes the test easier to read and maintain.

I'd like to double this. I've already prepared a bit of python code to generate -verify prefixes that we need for DR testing, which I need in every test. config.prologue gives me an opportunity to move it to a separate file. We can even strive to have 0 explicit RUN lines in DR tests, generating everything in Python code defined in a single file. This way we wouldn't have to duplicate list of C++ language modes 27 times per each test file.

This helps establishing an environment where Clang contributors can focus on writing DR tests simply by following an example, without bothering with RUN lines. It also saves time, because there will be less to teach and less to iterate over during code review.

As a bottom line, there is a problem to solve for at least DR tests. I attempted to solve it in a less ambitious way in https://reviews.llvm.org/D150856. There you can also find detailed discussion of the problem I'm trying to solve. If this patch is going to be rejected, I'd like opponents to give me a different path forward.

I definitely understand the desire to keep lit as simple and focused as we can, and from that perspective, adding support for a full scripting language seems like a bad idea. However, lit also has to meet our testing needs and in its current form, we're missing a direct way to support our uses for defect report testing in C and C++. Because there are thousands of defect reports in C++ alone, we group the tests together in blocks of roughly 100 tests within a single file. However, we need to run those tests semi-hermetically because we do not want an earlier test failure to impact the diagnostics for later tests (this causes us to miss diagnostics that would otherwise be issued and can hide compiler or testing bugs). Additionally, each of these tests needs to be run in all of the various language modes we support and we'd like it to be easy to add new language standard modes to the set without having to update each test file directly. The proposed support in this patch will help solve those problems in a way that I think is pretty straightforward yet powerful enough to be used to solve testing problems for other projects as well (such as the issue flang is having that @MaskRay pointed out).

lit already has the ability to execute python scripts (call %python),

I forgot about that, thanks for the reminder! This means that "python super powers" that I was concerned about earlier are already available to us. Please ignore my earlier concerns regarding that part.

I think it would be worth outlining in this patch why simply calling an external python script (which could actually be embedded within the file) is insufficient for the motivating case.

+1 Also, is this change meant to replace %python? That would be very nice and would also help maintain consistency within our tests. Otherwise, we will have "yet another way" to run Python.

What lit internals do you mean? Do you mean it enables tests to directly access functions and symbols within lit's implementation? This patch specifically avoids that. Access to lit-provided features is via a carefully defined lit object in python.

@jdenny, thanks for clarifying. I assumed (incorrectly) that the intention was to open-up all of LIT's internals to people writing tests. That's not the case. My bad for not reading the summary beforehand.

The proposed support in this patch will help solve those problems in a way that I think is pretty straightforward yet powerful enough to be used to solve testing problems for other projects as well (such as the issue flang is having that @MaskRay pointed out).

IMHO, we should refrain from making assumptions about any potential refactoring that _might_ happen following this change. Unless there are volunteers to implement them. I am referring specifically to Flang:

  • updating ~700 tests is not a small task,
  • those tests were ported from Bash to Python (via %python) to enable better testing on Windows,
  • I am not aware of any limitation of the current approach (with %python) that active Flang developers would like to have solved,
  • those tests don't use FileCheck (previous discussion on that) or any other "typical" LLVM infra/tooling (i.e. there's relatively little alignment with other parts of LLVM).

It was one of my colleagues who ported those Flang tests from Bash to Python - that's how I know the context.

jdenny updated this revision to Diff 547037.Aug 3 2023, 4:00 PM
jdenny edited the summary of this revision. (Show Details)
jdenny removed a reviewer: Restricted Project.

Rebased. Extended docs with comparison to using %python, as requested.

jdenny added a comment.Aug 3 2023, 4:02 PM

Thanks for the additional comments from everyone.

It looks like there's no reliable way to test what I was trying to test, and that test isn't *that* important anyway. I'll remove it since it was broken in the first place, so just rebase your patch. Sorry for the trouble.

No worries, and thanks for the fix. I finally got around to the rebase.

I think it would be worth outlining in this patch why simply calling an external python script (which could actually be embedded within the file) is insufficient for the motivating case.

+1

I just extended TestingGuide.rst. Let me know if it's not what you are looking for.

Also, is this change meant to replace %python? That would be very nice and would also help maintain consistency within our tests. Otherwise, we will have "yet another way" to run Python.

I'm not sure we can prevent every subproject from defining its own %python lit substitution in its test suite configuration. I'd like to think many would eventually choose the PYTHON directive instead.

jdenny added a reviewer: Restricted Project.Aug 3 2023, 4:04 PM
jdenny added a comment.Aug 3 2023, 4:06 PM

This patch will eventually need to be updated for the changes in D156954, but I haven't gotten there yet.

jdenny updated this revision to Diff 547994.Aug 7 2023, 4:46 PM

Rebase onto D156954 and update to its execution trace style.

Thanks for the additional comments from everyone.

It looks like there's no reliable way to test what I was trying to test, and that test isn't *that* important anyway. I'll remove it since it was broken in the first place, so just rebase your patch. Sorry for the trouble.

No worries, and thanks for the fix. I finally got around to the rebase.

I think it would be worth outlining in this patch why simply calling an external python script (which could actually be embedded within the file) is insufficient for the motivating case.

+1

I just extended TestingGuide.rst. Let me know if it's not what you are looking for.

Thanks. In my case at least, I was just referring to enhancing the commit description or even just posting a review comment with the explanation. Enhancing the documentation works for me though.

llvm/docs/TestingGuide.rst
348
jdenny updated this revision to Diff 548237.Aug 8 2023, 8:18 AM

Rebase onto recent D156954 changes to address libcxx pre-commit CI failure.

In docs, apply suggested typo correction, and wordsmith recent text a little.

jdenny marked 2 inline comments as done.Aug 8 2023, 8:20 AM
jdenny added a comment.Aug 8 2023, 8:23 AM

I think it would be worth outlining in this patch why simply calling an external python script (which could actually be embedded within the file) is insufficient for the motivating case.

+1

I just extended TestingGuide.rst. Let me know if it's not what you are looking for.

Thanks. In my case at least, I was just referring to enhancing the commit description or even just posting a review comment with the explanation. Enhancing the documentation works for me though.

Thanks for taking a look (and the typo correction). Hopefully having it in the docs will help if this point continues to be raised, and it might help users choose between %python and PYTHON.

jdenny updated this revision to Diff 548634.Aug 9 2023, 8:19 AM

Rebase onto recent D156954 changes to address another libcxx pre-commit CI failure.

jdenny updated this revision to Diff 553795.Aug 27 2023, 8:26 AM

Replace <lit test> with the actual test file name when compiling PYTHON lines. This info helps the python interpreter produce more informative diagnostics, including not only the test file name but also sometimes full PYTHON lines quoted from the test file. After a future patch, I expect it to also help when setting breakpoints while using a python debugger on lit test files.

Also, set __file__ (normally defined in python scripts) to the test file name.

jdenny updated this revision to Diff 556911.Sep 17 2023, 7:10 AM
jdenny edited the summary of this revision. (Show Details)

Rebased onto PR #66408, which includes recent changes from main. Within TestRunner.py, this rebase required moving runShScript and PythonDirectiveLitAPI into runOnce to make the per-run --per-test-coverage file index visible to lit.run(cmd) calls. Extended tests to cover that case.

Applied python formatting suggestions from darker.

jdenny added a comment.Oct 2 2023, 6:25 PM

Now that the major prerequisites have landed, it would be good to resume our discussion of the PYTHON directive. My recollection is that most people who commented were happy with the concept.

@awarzynski I recall you had some concerns and that we addressed some. What is your current opinion?

@awarzynski I recall you had some concerns and that we addressed some. What is your current opinion?

Sorry for delay, I tend to focus on GitHub these days.

I'm OK with this change. Based on the feedback, this is clearly desirable by various folks. And great deal of effort has gone into preparing LIT for this PR, so I wouldn't want to block it. While I'm still concerned about extending LIT like this, there's an option for people _not to use_ the new directive.

As always with this sort of changes, I suggest "advertising" on Discourse to make sure that people are aware what's coming. Also, is any other sub-project beyond libcxx intending to use this?

Endill added a comment.Oct 4 2023, 4:40 AM

Also, is any other sub-project beyond libcxx intending to use this?

Some of my work on Clang C++ DR test suite is blocked on this (https://reviews.llvm.org/D148433). I've been preparing bits and pieces that play along with this feature for quite a while.

jdenny added a comment.EditedOct 5 2023, 12:57 PM

Sorry for delay, I tend to focus on GitHub these days.

No worries. I'm a bit distracted at the moment anyway.

I'm OK with this change. Based on the feedback, this is clearly desirable by various folks. And great deal of effort has gone into preparing LIT for this PR, so I wouldn't want to block it.

Thanks for your understanding.

While I'm still concerned about extending LIT like this, there's an option for people _not to use_ the new directive.

For people who are not interested in the PYTHON directive, I hope they will at least benefit from the debugger option that D158954 builds on top for the sake of lit tests (whether or not those tests actually use PYTHON directives explicitly). After all the recent discussion of copying and pasting shell code from lit's output to a terminal window and hoping that terminal's shell processes them correctly, and after my own experience trying to replicate environment variables and investigating lit substitution issues, I imagine a proper debugger that can step through RUN lines and examine/adjust state would be a welcome improvement.

As always with this sort of changes, I suggest "advertising" on Discourse to make sure that people are aware what's coming.

As you suggested, this new feature is opt-in. It is not meant to change behavior for existing lit tests. For that reason, I think it's less important to write a general RFC before landing than in the case of, for example, the recent lit debug output changes. I'm inclined to land this one first (after a technical review) and advertise afterward so that folks like @Endill can move forward. That would also permit others to more easily try out the new feature once we do advertise. We can always tweak things afterward as the need arises. Is that a reasonable plan?

jdenny updated this revision to Diff 557640.Oct 7 2023, 1:26 PM
jdenny edited the summary of this revision. (Show Details)

Rebased onto PR #68496, which includes the version of PR #67898 that landed. Thus, this patch no longer includes changes from those PRs.

Fixed an omission of -triple in the example appearing in the commit log and in TestingGuide.rst.

jdenny edited the summary of this revision. (Show Details)Oct 7 2023, 1:33 PM
jdenny removed a reviewer: Restricted Project.