Page MenuHomePhabricator

[lit] Add an option to print all features used in tests
ClosedPublic

Authored by ldionne on Apr 21 2020, 2:04 PM.

Details

Summary

Lit test suites can tend to accumulate annotations that are not necessarily
relevant as time goes by, for example XFAILS on old compilers or platforms.
To help spot old annotations that can be cleaned up, it can be useful to
look at all features used inside a test suite.

This commit adds a new option '--show-features' that prints all the
features used in XFAIL, REQUIRES and UNSUPPORTED of all tests that
are discovered.

Diff Detail

Event Timeline

ldionne created this revision.Apr 21 2020, 2:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 21 2020, 2:04 PM

As explained in the commit message, the main goal of this feature is to allow cleaning up Lit test suites that accumulate old markup. This allowed me to remove several unused features from the libc++ test suite, and also to find out some mistakes (e.g. some tests were requiring C++14 instead of c++14, and were never run). While this may seem trivial for a small test suite, it's actually very useful to have this programmatic tool for large test suites like libc++'s.

yln added a comment.Apr 21 2020, 3:24 PM

I like this. Thanks for working on it!

Please check my assumptions, about features:
Available, but unused -> cruft
Not available, but used -> okay (e.g., test disabled because not right platform/architecture)

After adding all my comments above, I am assuming the workflow is as follows:
First run with --show-suites to see available features, then run --show-features to see used/requested features and cleanup the ones which are available but unused.

How about making this even easier and more discoverable by adding it directly to --show-suites to encourage people to clean things up?

Append "(unused)" marker to available, but unused feature to draw attention that they could be removed; or maybe even add a separate line for "unused" features.
Add another line for "unavailable" features.

Available features: darwin, win95 (unused)
Unavailable/requested features: linux

or

Available features: darwin
Unavailable features: linux
Unused features: win95

What do you think?

llvm/utils/lit/lit/Test.py
372

Is parsed a dict? If so, then I think we get a KeyError here; or the or [] part is redundant.

374

feature_keywords = ['UNSUPPORTED', ..]
and then do some mapping over that: keywords -> dict lookup -> chain -> tokenize -> chain

377

Prefer being dry over performance, remove if expr != '*'

llvm/utils/lit/lit/TestRunner.py
1462

KeyError or redundant or []?

llvm/utils/lit/lit/cl_arguments.py
160

Maybe name --show-used-features?

161

Show features used by selected tests; or use discovered_tests below

llvm/utils/lit/lit/main.py
86

If the main use case for this is identifying obsolete features that are defined, but never used then we could move this up and use discovered_tests. Or is it also useful to ask the question, which features do these selected/filtered tests use?

135–136

Import at top is better; remove

ldionne updated this revision to Diff 259281.Apr 22 2020, 7:14 AM
ldionne marked 14 inline comments as done.

Address review comments.

Also reuse BooleanExpression.isIdentifier to avoid code duplication.

In D78589#1995591, @yln wrote:

I like this. Thanks for working on it!

Please check my assumptions, about features:
Available, but unused -> cruft
Not available, but used -> okay (e.g., test disabled because not right platform/architecture)

Yes, that's correct. There's also some features that are used in the test suite but never made available from the configuration, and those are cruft too. They are usually harder to spot because they can be created programmatically from the configuration, but those are the ones that are (usually) worth cleaning up.

After adding all my comments above, I am assuming the workflow is as follows:
First run with --show-suites to see available features, then run --show-features to see used/requested features and cleanup the ones which are available but unused.

How about making this even easier and more discoverable by adding it directly to --show-suites to encourage people to clean things up?

Append "(unused)" marker to available, but unused feature to draw attention that they could be removed; or maybe even add a separate line for "unused" features.
Add another line for "unavailable" features.

Available features: darwin, win95 (unused)
Unavailable/requested features: linux

or

Available features: darwin
Unavailable features: linux
Unused features: win95

What do you think?

I thought it made a lot of sense so I tried it out. I added a fake win95 feature somewhere in the libc++ test suite, made the Lit changes you suggested, and ran --show-suites:

-- Test Suites --
  libc++ - 6353 tests
    Source Root: [...]
    Exec Root  : [...]
    Available Features: apple-clang-11 apple-clang-11.0 apple-clang-11.0.0 -faligned-allocation -fsized-deallocation apple-clang availability=macosx availability=macosx10.15 c++2a darwin diagnose-if-support fcoroutines-ts fdelayed-template-parsing has-fblocks has-fobjc-arc libc++ libcpp-no-concepts locale.cs_CZ.ISO8859-2 locale.en_US.UTF-8 locale.fr_CA.ISO8859-1 locale.fr_FR.UTF-8 locale.ru_RU.UTF-8 locale.zh_CN.UTF-8 long_tests modules-support objective-c++ thread-safety
    Unavailable Features: -fmodules -fno-rtti LIBCXX-WINDOWS-FIXME apple-clang-10 apple-clang-10.0 apple-clang-10.0.0 apple-clang-9 apple-clang-9.0 apple-clang-9.1 asan availability=macosx10.10 availability=macosx10.11 availability=macosx10.12 availability=macosx10.13 availability=macosx10.14 availability=macosx10.9 c++03 c++11 c++14 c++17 c++98 c++filesystem-disabled clang clang-4 clang-4.0 clang-5 clang-5.0 clang-6 clang-6.0 clang-7 clang-7.0 clang-8 clang-8.0 clang-9 dylib-has-no-bad_any_cast dylib-has-no-bad_optional_access dylib-has-no-bad_variant_access dylib-has-no-shared_mutex gcc gcc-5 gcc-5.1 gcc-5.2 gcc-6 gcc-7 gcc-8 gcc-9 libatomic libcpp-has-no-global-filesystem-namespace libcpp-has-no-monotonic-clock libcpp-has-no-stdin libcpp-has-no-stdout libcpp-has-no-thread-unsafe-c-functions libcpp-has-no-threads libcpp-has-thread-api-external libcpp-has-thread-api-pthread libcpp-no-deduction-guides libcpp-no-exceptions libcpp-no-if-constexpr libcpp-no-structured-bindings libcpp-no-vcruntime libcxx_gdb libstdc++ linux linux-gnu msan msvc netbsd newlib sanitizer-new-delete suse-linux-enterprise-server-11 system-windows template-cost-testing ubsan verify-support win95 windows with_system_cxx_lib=macosx with_system_cxx_lib=macosx10.10 with_system_cxx_lib=macosx10.11 with_system_cxx_lib=macosx10.12 with_system_cxx_lib=macosx10.13 with_system_cxx_lib=macosx10.14 with_system_cxx_lib=macosx10.15 with_system_cxx_lib=macosx10.9
    Unused Features: apple-clang-11 apple-clang-11.0 apple-clang-11.0.0 availability=macosx
    Available Substitutions: [...]

Apologies for the large output, however I think this is the key to knowing whether it's worth doing this or not. My experience so far has been that cleaning up features is a fairly manual task -- one has to look at features individually and determine what to do with them. That's true both for features that are mentioned in the test suite (that we might want to clean up) and those set by the configuration (that we might want to clean up too).

Looking at the above output, I make the following observations:

  1. There's a lot of output because there's a lot of features (both set and used)
  2. The fake win95 feature is difficult to spot amongst all the Unavailable Features
  3. The Unused Features are actually not really that useful. Even though apple-clang-11 is set but not used by the suite right now, it could be used in the future. It's just set programmatically and it's not a crufty feature.

For these reasons, I believe this output is actually more complicated than a simple output of all the features used by the test suite, and that's not super helpful. So my preference would be to stick with a command-line option that does exactly one thing, i.e. output the features used by the test suite. One can then use that to guide a manual investigation of what to do with them.

However, if you feel strongly that the above output is useful, my changes are stashed so I can do it without too much trouble.

llvm/utils/lit/lit/Test.py
372

It's a dict, but the parser's values are None if they haven't parsed anything. So the keys could be e.g. {'UNSUPPORTED:' : None}

377

It doesn't work when expr == *, that's why I added it :-). The problem is that '*' is not a valid value for a BooleanExpression, however '*' is still used in XFAIL: because it has a special meaning there (see https://github.com/llvm/llvm-project/blob/master/llvm/utils/lit/tests/Inputs/shtest-format/requires-star.txt and https://github.com/llvm/llvm-project/blob/master/llvm/utils/lit/lit/Test.py#L285).

llvm/utils/lit/lit/TestRunner.py
1377–1379

I'm open to better name suggestions.

1462

'RUN:' is "special" in that we know it's always going to be a (maybe empty) list. I'll add or [] for consistency though, cause I agree it's non-trivial to know that (and it's arguably an implementation detail).

llvm/utils/lit/lit/cl_arguments.py
161

Done. However, I do think it would make more sense for all of them to use the filtered_tests instead, don't you think? The argument is that it's simply more general: no filter is equivalent to discovered_tests, but it also allows filtering if one wants.

Should I create a follow-up patch?

llvm/utils/lit/lit/main.py
86

Moved above to discovered_tests.

yln added a comment.Apr 24 2020, 4:12 PM

Thanks for being thorough and explaining your reasoning so well. I now agree that "Unavailable Features" introduces too much noise to simply add it to the existing option without the user asking for it.

Two more quick questions:
We present features as a sum of all used features, however they are specific to configs/test suites. Could we group them by test suite? Maybe a good way to do this would be to extend the print_discovered function. This way we can also combine output from the "show and exit" options.

Currently we have "Available Features" in --show-suites, which can be either used or unused. When the user also asks for --show-used-features then maybe it would still be helpful to somehow mark the unused ones without introducing too much visual noise (e.g., with an asterisk + footnote). This would help with "de-crufting" in at least one direction.

llvm/utils/lit/lit/Test.py
377

Got it! Thanks for explaining.

llvm/utils/lit/lit/cl_arguments.py
161

That makes sense. Feel free to send a patch.

ychen added a subscriber: ychen.Apr 27 2020, 10:57 AM

Sorry for the delay in replying to this -- I got pulled into other things.

In D78589#2002960, @yln wrote:

Thanks for being thorough and explaining your reasoning so well. I now agree that "Unavailable Features" introduces too much noise to simply add it to the existing option without the user asking for it.

Two more quick questions:

  1. We present features as a sum of all used features, however they are specific to configs/test suites. Could we group them by test suite? Maybe a good way to do this would be to extend the print_discovered function. This way we can also combine output from the "show and exit" options.

I actually see the features as being whatever features are used in the test files discovered by Lit -- not as a per test-suite thing. I actually don't think it makes a lot of sense to run Lit with multiple test suites at once (for example --param are shared across all test suites which is weird), but that's a different story.

  1. Currently we have "Available Features" in --show-suites, which can be either used or unused. When the user also asks for --show-used-features then maybe it would still be helpful to somehow mark the unused ones without introducing too much visual noise (e.g., with an asterisk + footnote). This would help with "de-crufting" in at least one direction.

When we use --show-used-features, "Available Features" are not printed. Are you saying when one uses both --show-suites and --show-used-features? IMO, that's adding complexity for fairly little benefit -- I doubt anyone would even notice that feature. I like thinking that Lit can be used as a swiss army knife for extracting information about a test suite (or a bunch of tests), and then combined with a bit of Bash scripting or whatever to obtain the desired information.

Unless you feel strongly about points above, I'd like to move forward with this to avoid rotting of this patch.

ldionne updated this revision to Diff 266963.May 28 2020, 11:23 AM

Rebase onto master

yln accepted this revision.May 28 2020, 2:52 PM

Okay, thanks!

This revision is now accepted and ready to land.May 28 2020, 2:52 PM
This revision was automatically updated to reflect the committed changes.