Add a new --order option to choose between available test orders:
the default "smart" order, predictable "lexical" order or "random"
order. Default to using lexical order and one job in the lit test
suite.
Details
Diff Detail
Event Timeline
Thanks for working on this!
The patch doesn't spell it consistently: sometimes --lexical and sometimes --lexical-order.
Shouldn't we put --lexical-order directly into lit's test suite's %{lit} substitution? That way, if another of lit's current or future tests is susceptible to this problem, no one has to waste time trying to reproduce the race and figure out what's happening. Blindly using %{lit} in lit's test suite would just work.
As far as I know, reorder.py is the only test in lit's test suite that requires the reordering. Either it can use a %{lit-reordered} substitution that drops --lexical-order, or it can use a new command-line option that selects the default order and overrides any previous --lexical-order or --shuffle.
llvm/utils/lit/tests/shtest-shell.py | ||
---|---|---|
10–11 | Any reason to keep this? It might be worthwhile to grep for lit_test_times, if you didn't already. |
Sorry, I've forgotten how I named it :-D.
Shouldn't we put --lexical-order directly into lit's test suite's %{lit} substitution? That way, if another of lit's current or future tests is susceptible to this problem, no one has to waste time trying to reproduce the race and figure out what's happening. Blindly using %{lit} in lit's test suite would just work.
I don't mind that. However, should we also put -j 1 then?
As far as I know, reorder.py is the only test in lit's test suite that requires the reordering. Either it can use a %{lit-reordered} substitution that drops --lexical-order, or it can use a new command-line option that selects the default order and overrides any previous --lexical-order or --shuffle.
I suppose I could do that, and then replace the mutually exclusive group approach with 'last option wins'.
llvm/utils/lit/tests/shtest-shell.py | ||
---|---|---|
10–11 | I was grepping for order FIXMEs ;-). Fixed now. |
Add --default-order. Last option specified takes precedence. Include --lexical-order in %{lit} substitution.
Thanks for continuing to work on all this.
Good point. I haven't looked to see whether the default value of -j is ever expected in lit's test suite. Anyway, maybe that should be a different patch to make reviewing/debugging easier.
Please don't forget this.
llvm/utils/lit/lit/cl_arguments.py | ||
---|---|---|
154 | Instead of --default-order, I'd prefer something that describes the order. I'm not sure of the best name, especially given that it's always possible we'll think of other tweaks to the default order in the future. Perhaps something like --auto-order or --smart-order to at least suggest it's something clever. What do you think? | |
llvm/utils/lit/tests/lit.cfg | ||
64 | I think it would be helpful to extend the above comments to explain why --lexical-order is needed. | |
llvm/utils/lit/tests/reorder.py | ||
4 | As far as I can tell, we now have no test that --default-order is actually the default. Sorry, I didn't think of that when I made that suggestion. If we ever broke that behavior (for example, someone could reorganize the arg specification in a subtly broken way), I wonder how long it would be before anyone realized that test suites started taking much longer than necessary. What if we have something like a %{lit-no-order-opt} that doesn't specify any order options? reorder.py could then run lit twice, once to ensure --default-order is indeed the default, and once to ensure it overrides --lexical-order and --shuffle when specified afterward. I guess that means --default-order isn't so useful in lit's own test suite. It's just another feature to test. However, as I mentioned in D107427, I'd like to use it in clang/test/utils too for testing test-suite-generation scripts. |
llvm/utils/lit/tests/reorder.py | ||
---|---|---|
4 |
Well, that actually makes no sense. I want --lexical-order there not --default-order. Hmm. I'm not opposed to keeping something like --default-order for completeness, but I'm now having trouble seeing a real use case. What do others think? |
llvm/utils/lit/tests/reorder.py | ||
---|---|---|
4 | To be honest, I don't really see a use case for testing which order is the default. After all, setting the default order is a matter of changing the parameter to default=… A test for this sounds like asking people to change the default in two places. |
llvm/utils/lit/tests/reorder.py | ||
---|---|---|
4 | I don't agree with the argument that a behavior whose current implementation looks simple doesn't need a regression test, and I've previously found that the apparent simplicity of an argparse config can be deceptive. If we're going to implement these options, then I think we should test them fully, especially that the intended default behavior is actually the default. If the default ever changes accidentally, it could quietly affect many people. Moreover, testing it is straight-forward: we just need another lit substitution to use in reorder.py. After further thought, I do think having a different option for each possible order is likely to prove useful, so I vote we keep --default-order (or whatever we end up calling it). |
Thanks for working on this! :)
llvm/utils/lit/lit/cl_arguments.py | ||
---|---|---|
14 | While we are at it, can we also give DEFAULT a more descriptive name here? Order predicate: (not t.previous_failure, -t.previous_elapsed, t.getFullName())
Since putting all of the above in an enum name is quite a mouthful, I am advocating for smart as suggested by Joel. Together with my other comment (see below) the help text could then be: This would align the names used for options and names in the code. | |
154 | I would also prefer descriptive names instead of "default". If we already put the effort i to improve ordering, how about: --order=(aaa|bbb|...) and make the old ones aliases for the new ones: --shuffle --> --order=random. I think argparse spells out all possible options and even shows which one is the default when printing the help: --order=(smart|aaa|bbb|...) [default: smart] So it shows the user the available options, which option is the default, and the explanation of these options is right next to it. |
llvm/utils/lit/lit/cl_arguments.py | ||
---|---|---|
154 | If it wasn't clear, I am advocating for using parser.add_argument('--order', choices=['smart', aaa', 'bbb']) here. |
Switched to choice-based --order= option. Renamed 'default' to 'smart'. Added a new lit subst not to include order args. Made %{lit} include -j1 as well. Added the comment.
I think this addresses all the comments.
I have also removed the enum since it seemed to make little sense now... but I have a new idea, so I might reintroduce it in a minute ;-).
Restore the TestOrder enum. It doesn't change much logically but gives some abstraction that'll hopefully make uncaught typos less likely in the future.
--order is nice.
Also, the docs should be updated: https://llvm.org/docs/CommandGuide/lit.html
This is still needed.
Hey, I don't know if this has anything to do with this change.
But I see the "llvm-clang-x86_64-sie-win" buildbot (https://lab.llvm.org/buildbot/#/builders/216) failing the reorder.py testcase randomly every 10-20 builds.
See https://lab.llvm.org/buildbot/#/builders/216 for the list of recent builds.
I didn't look deeply into the root cause here, but thought it would be good to ping here in case anyone has an idea to get the bot more stable.
I'm the owner of that buildbot, but unfortunately have not been able to get the failure to reproduce outside of the buildbot initiated build environment, so I haven't been able to really look into the issue. Any ideas how to possibly debug and/or fix the issue would be greatly appreciated as I've been a bit stuck on how to get it more stable!
It's kind of concerning that the test (reorder.py) writes to the source-controlled input file. I'm not sure if using a temporary directory to save intermediate files (like most other tests do) would fix the issue.
Alternatively, as the last resort, I'd suggest temporarily inserting some extra RUN lines in the test to help debug the random failure.
While we are at it, can we also give DEFAULT a more descriptive name here?
Order predicate: (not t.previous_failure, -t.previous_elapsed, t.getFullName())
Since putting all of the above in an enum name is quite a mouthful, I am advocating for smart as suggested by Joel.
Together with my other comment (see below) the help text could then be:
--order=(smart|aaa|bbb|...) [default: smart]
This would align the names used for options and names in the code.