Page MenuHomePhabricator

[test] Add a utility for dumping all tests in the dotest suite
Needs ReviewPublic

Authored by labath on Apr 24 2018, 5:49 AM.

Details

Summary

The utility iterates through the tests in the suite, much like dotest
does, and prints all tests it found. So that it can set up the
environment properly, it needs to be called with a single argument - the
path to the lldb executable.

Right now it's not wired up to anything, but the idea is to call this
from the lldbtest lit format, to enable running tests with a finer
granularity.

The main reason I do this as a separate utility (and not inside the test format)
is that importing our tests requires a lot of fiddling with the import path and
has a bunch of side effects (e.g., loading liblldb into the address space), and
I did not want to expose the main lit process to this.

Event Timeline

labath created this revision.Apr 24 2018, 5:49 AM
JDevlieghere added inline comments.Apr 24 2018, 6:01 AM
lit/Suite/lldbtestdumper.py
27

A comment describing what and why this is necessary might be nice.

40

Any objection to parsing the arg in main and passing it as an argument?

Does it print just the names of the .py files, or does it print the actual test classes and methods in the Class.TestName format, evaluate XFAIL and SKIPs, etc?

I'm also not entirely clear why set_up_environment is even necessary. Nothing below the call to set_up_environment even seems to depend on the environment being set up.

Does it print just the names of the .py files, or does it print the actual test classes and methods in the Class.TestName format, evaluate XFAIL and SKIPs, etc?

It prints everything: test file (including the relative path in the test suite), test class name and test method name.

It does not try to evaluate skips or xfails, but that was never the intention. The idea is to feed this information to lit (as it needs to know about the set of all tests), which can then run them with a bigger (smaller?) granularity. The interpretation of the test results (skip,fail, ...) will still be done by the lit test format, only now it could do it at a test-method level instead of file-level.

I'm also not entirely clear why set_up_environment is even necessary. Nothing below the call to set_up_environment even seems to depend on the environment being set up.

In the main function I call unittest2.defaultTestLoader.loadTestsFromName(base), which deep down does an __import__(base). We need the environment of the import statement to be reasonably close to the the environment that dotest provides so that the import will succeed.

Does it print just the names of the .py files, or does it print the actual test classes and methods in the Class.TestName format, evaluate XFAIL and SKIPs, etc?

It prints everything: test file (including the relative path in the test suite), test class name and test method name.

It does not try to evaluate skips or xfails, but that was never the intention. The idea is to feed this information to lit (as it needs to know about the set of all tests), which can then run them with a bigger (smaller?) granularity. The interpretation of the test results (skip,fail, ...) will still be done by the lit test format, only now it could do it at a test-method level instead of file-level.

I thought the intention was going to be parallelize at file-granularity rather than method granularity, since the whole point of grouping tests together into classes is that they can share similar set up and tear down which may be expensive to perform multiple times. If we're going to parallelize at method-granularity, I would rather have one .py file per method.

Does it print just the names of the .py files, or does it print the actual test classes and methods in the Class.TestName format, evaluate XFAIL and SKIPs, etc?

It prints everything: test file (including the relative path in the test suite), test class name and test method name.

It does not try to evaluate skips or xfails, but that was never the intention. The idea is to feed this information to lit (as it needs to know about the set of all tests), which can then run them with a bigger (smaller?) granularity. The interpretation of the test results (skip,fail, ...) will still be done by the lit test format, only now it could do it at a test-method level instead of file-level.

I thought the intention was going to be parallelize at file-granularity rather than method granularity, since the whole point of grouping tests together into classes is that they can share similar set up and tear down which may be expensive to perform multiple times. If we're going to parallelize at method-granularity, I would rather have one .py file per method.

Note that there's currently no precedent (that i'm aware of anwyay) in LLVM or any of its subprojects for splitting the running of a single file into multiple parallel jobs. All of LLVM's other projects parallelize at file granularity, and so it's up to to the person writing tests to ensure that the file granularity matches that with which they want tests to be parallelized at.

That doesn't mean it's not possible (as you've shown here), but it adds some additional complexity and I'm not sure it's worth it.

There are currently two use cases I can come up with for grouping lots of tests together in a single class.

  1. Sharing common set up and tear down code that is expensive.
  1. Sharing common utility functions that all test cases need to use.

For the first case, we don't *want* to parallelize at a finer granularity because it will cause extra work to be done.

And for the second case, we can solve this by having a file in a directory named util.py and writing from . import util at the top of each test case file. Then have the lit config skip files named util.py in its discovery.

Actually, for the first case, I'm thinking we can extend lit to support somethign like per-directory set up. Like if there is a file present called lit.init.cfg, then this file contains some code that is run once per directory before any of the tests. Then we could still split up the test cases into multiple ones and parallelize finer.

I thought the intention was going to be parallelize at file-granularity rather than method granularity, since the whole point of grouping tests together into classes is that they can share similar set up and tear down which may be expensive to perform multiple times. If we're going to parallelize at method-granularity, I would rather have one .py file per method.

Hm... good question. I don't think any of our current tests take advantage of this possibility (sharing set-up code). Well.. except if you don't count bringing up the entire dotest machinery as set-up code (which does take a non-zero amount of time, but hopefully that can be brought down once we move everything to lit).

However, maybe this does need more discussion.

FWIW, I think that we shouldn't enforce the one-test-method-per-file completely rigidly. It may make sense to separate out some of the things, but I think that our tests often need some similar utility functions, which are too specific to put in some generic library, and it's easier to share these if they are in the same file. Having multiple tests in the same file also reduces the boiler plate and avoids another round of modifying each test file. (I think even lit acknowledges the usefulness of having logically separate tests in the same file (CHECK-LABEL)).

I thought the intention was going to be parallelize at file-granularity rather than method granularity, since the whole point of grouping tests together into classes is that they can share similar set up and tear down which may be expensive to perform multiple times. If we're going to parallelize at method-granularity, I would rather have one .py file per method.

Hm... good question. I don't think any of our current tests take advantage of this possibility (sharing set-up code). Well.. except if you don't count bringing up the entire dotest machinery as set-up code (which does take a non-zero amount of time, but hopefully that can be brought down once we move everything to lit).

However, maybe this does need more discussion.

FWIW, I think that we shouldn't enforce the one-test-method-per-file completely rigidly. It may make sense to separate out some of the things, but I think that our tests often need some similar utility functions, which are too specific to put in some generic library, and it's easier to share these if they are in the same file. Having multiple tests in the same file also reduces the boiler plate and avoids another round of modifying each test file. (I think even lit acknowledges the usefulness of having logically separate tests in the same file (CHECK-LABEL)).

I agree we shouldn't enforce one test per file. I just think it's may be worth only parallelizing at the file level. If you want two tests to be able to run in parallel, perhaps we should say that those tests must be in different files (perhaps even in the same directory).

LLVM's lit / FileCheck tests often also have multiple tests in one file. But all tests in the same file run sequentially.

Note that there's currently no precedent (that i'm aware of anwyay) in LLVM or any of its subprojects for splitting the running of a single file into multiple parallel jobs. All of LLVM's other projects parallelize at file granularity, and so it's up to to the person writing tests to ensure that the file granularity matches that with which they want tests to be parallelized at.

There's always gtest. Regardless of whether you consider a .cpp file or the built exectable to be a "test file", it will generally contain a number of tests. And arguably, our test suite is more similar to the gtest suite than the traditional lit (ShTest) suites (for one, it hijacks a third party unit testing library, and then tries to make it run under lit). And we run all gtest tests individually (I've often wondered what kind of performance impact that has, but they seem to be running fast enough anyway...).

For what it's worth, paralelization is not my motivation here. There some tests which run disproportionately long, and this will speed them up, but that may be offset by the fact we need to start more work this way (if anyone is interested I can try to do some measurements). My main reason for doing this is so that we can have better mapping of test result states. As it stands now, we only have two possible results for a test: passed or failed. Lit has more results than that, and they roughly match the existing dotest states (the different treatment of XFAILs is the biggest difference), so it should be possible to represent them with more fidelity. However, right now it's not possible to translate them reasonably, as a "single test" can result in any number of fails/xfails/skips/... (or no results at all).

That doesn't mean it's not possible (as you've shown here), but it adds some additional complexity and I'm not sure it's worth it.

It adds complexity, but not much imho. I was actually pleasantly surprised at how easy it was to pull this off. The entire implementation is 78 LOC right now. The changes to the test format will probably push it over a 100, but not by much.

That said, I am not interested in forcing this onto everyone. I did it because it seemed like a nice thing to have and I was curious if it is doable (easily). However, if there's no interest for it, then I can just abandon it..

zturner added a subscriber: labath.Apr 25 2018, 8:13 AM

Well let’s see what Davide and Adrian think. I’m more of an outsider these
days so consider my perspective an llvm-centric one, which would sometimes
(but not always) be the best for lldb

I've given this some more though as I see valid points and concerns on both sides of the argument. I'm leaning towards running test cases in parallel because it offloads more work to lit at negligible cost (running make and launching the inferior are the biggest time consumers).

Other advantages are that:

  • It would also make it at lot easier to rerun a single test case because lit is in a better position to inform us how to reproduce a failure.
  • It would give a more granular overview of what passed, failed or skipped. The cmake bot on GreenDragon runs the test suite with lit and that means that failures are only reported at file level, which imho is to coarse.

If the majority of active contributors think this is the right direction then don't let me hold it up. I mostly just wanted to raise the concern, but if you've evaluated the pros and cons and made a decision, I'm fine with that :)

Thank you for the support Jonas. I've quickly prototyped a patch (D47265) of how would this integrate with lit. The nice part about it is that it is very small. If you consider the inline test refactor it even has negative LOC. It would definitely have negative LOC impact if this would allow us to remote test driver parts from dotest.

The not-so-nice part about it is that is has more impact on test running time than I expected. For me the time to run the tests goes from 1m38s to 2m30s. I am not extremely worried about that as that is still pretty fast for me (*), but I don't know how that scales, so it could be that this makes a significant difference for people with slower machines. I encourage you to give it a try (you need to apply it together with this patch), and let me know what you think.

(*) I should also point out that the competing solution of putting every test into it's own file would likely have similar impact, as this is mostly due to fixed startup cost of spinning up a python +lldb process for the test. I also hope that some of this slowdown can be reclaimed as we garbage collect parts of dotest that would become unneeded after this.

zturner added a comment.EditedMay 23 2018, 9:01 AM

FWIW, I think the single biggest improvement one could make to the LLDB test suite runtime is to compile all inferiors up front as part of the CMake step. If you run the test suite twice every inferior is unnecessarily compiled twice.

I think that would shave off up to 75% of the test suite runtime.

FWIW, I think the single biggest improvement one could make to the LLDB test suite runtime is to compile all inferiors up front as part of the CMake step. If you run the test suite twice every inferior is unnecessarily compiled twice.

I'm a big proponent of moving as much logic as possible into the configuration stage, but a common use case (at least for us) is testing a single build with a different compiler/configuration.

I think that would shave off up to 75% of the test suite runtime.

FWIW, I think the single biggest improvement one could make to the LLDB test suite runtime is to compile all inferiors up front as part of the CMake step. If you run the test suite twice every inferior is unnecessarily compiled twice.

I'm a big proponent of moving as much logic as possible into the configuration stage, but a common use case (at least for us) is testing a single build with a different compiler/configuration.

Then the configure / CMake stage could build all inferiors with every possible configuration you want to test, each one writing to a different output directory. Sorta like how in LLVM you can specify LLVM_TARGETS_TO_BUILD=X86,ARM,... you could specify something like LLDB_TEST_INFERIOR_CONFIGS=gcc-x64;gcc-x86;clang-arm64.

Obviously this is a big change and a lot of work, but I think it would make the test suite run in under 30 seconds

FWIW, I think the single biggest improvement one could make to the LLDB test suite runtime is to compile all inferiors up front as part of the CMake step. If you run the test suite twice every inferior is unnecessarily compiled twice.

I'm a big proponent of moving as much logic as possible into the configuration stage, but a common use case (at least for us) is testing a single build with a different compiler/configuration.

Then the configure / CMake stage could build all inferiors with every possible configuration you want to test, each one writing to a different output directory. Sorta like how in LLVM you can specify LLVM_TARGETS_TO_BUILD=X86,ARM,... you could specify something like LLDB_TEST_INFERIOR_CONFIGS=gcc-x64;gcc-x86;clang-arm64.

Obviously this is a big change and a lot of work, but I think it would make the test suite run in under 30 seconds

I don't think that moving this to the configure stage is the right choice. I'm also skeptical about your claim about the saved time (are you talking about Windows?).

In my opinion the configuration should configure what is being built, not what is being tested. I don't want to have to lock down at configure time which tests I will be running. As an analogy, this is similar to how LLVM has the in-tree regression tests but also the external test-suite that can be run with endless permutations of options, to test one specific build of LLVM, without having to reconfigure/rebuild.

FWIW, I think the single biggest improvement one could make to the LLDB test suite runtime is to compile all inferiors up front as part of the CMake step. If you run the test suite twice every inferior is unnecessarily compiled twice.

I'm a big proponent of moving as much logic as possible into the configuration stage, but a common use case (at least for us) is testing a single build with a different compiler/configuration.

Then the configure / CMake stage could build all inferiors with every possible configuration you want to test, each one writing to a different output directory. Sorta like how in LLVM you can specify LLVM_TARGETS_TO_BUILD=X86,ARM,... you could specify something like LLDB_TEST_INFERIOR_CONFIGS=gcc-x64;gcc-x86;clang-arm64.

Obviously this is a big change and a lot of work, but I think it would make the test suite run in under 30 seconds

I don't think that moving this to the configure stage is the right choice. I'm also skeptical about your claim about the saved time (are you talking about Windows?).

In my opinion the configuration should configure what is being built, not what is being tested. I don't want to have to lock down at configure time which tests I will be running. As an analogy, this is similar to how LLVM has the in-tree regression tests but also the external test-suite that can be run with endless permutations of options, to test one specific build of LLVM, without having to reconfigure/rebuild.

This is a bit of a digression from this patch, but no I wasn't talking about specific to Windows. The amount of time required to compile several thousand inferiors is quite significant, especially since it is completely redundant and shouldn't be changing between each run of the test suite. So it's unnecessary.

Also just because it could be done during the CMake configure step doesn't mean it couldn't *also* be done via some external script. Similar to how if you run ninja check-lldb it runs with the default set of options, but you can still run dotest.py manually to pass additional flags. So building inferiors as part of the configure step does not necessitate locking down at configure time which tests you will be running.

FWIW, I think the single biggest improvement one could make to the LLDB test suite runtime is to compile all inferiors up front as part of the CMake step. If you run the test suite twice every inferior is unnecessarily compiled twice.

I'm a big proponent of moving as much logic as possible into the configuration stage, but a common use case (at least for us) is testing a single build with a different compiler/configuration.

Then the configure / CMake stage could build all inferiors with every possible configuration you want to test, each one writing to a different output directory. Sorta like how in LLVM you can specify LLVM_TARGETS_TO_BUILD=X86,ARM,... you could specify something like LLDB_TEST_INFERIOR_CONFIGS=gcc-x64;gcc-x86;clang-arm64.

Obviously this is a big change and a lot of work, but I think it would make the test suite run in under 30 seconds

I don't think that moving this to the configure stage is the right choice. I'm also skeptical about your claim about the saved time (are you talking about Windows?).

In my opinion the configuration should configure what is being built, not what is being tested. I don't want to have to lock down at configure time which tests I will be running. As an analogy, this is similar to how LLVM has the in-tree regression tests but also the external test-suite that can be run with endless permutations of options, to test one specific build of LLVM, without having to reconfigure/rebuild.

This is a bit of a digression from this patch,

Agreed :-)

but no I wasn't talking about specific to Windows. The amount of time required to compile several thousand inferiors is quite significant, especially since it is completely redundant and shouldn't be changing between each run of the test suite. So it's unnecessary.

I suppose you are looking at this from the perspective of an LLDB developer, where rebuilding the tests after changing something in LLDB seems redundant. I was thinking about this from the perspective of our bots, and there LLVM changes much much faster than LLDB, so between two bot runs the only thing that changes is the compiler, which forces you to rebuild all the source-based tests anyway.

... but let's postpone this discussion for another day. As you said it's a diversion for this review.

Also just because it could be done during the CMake configure step doesn't mean it couldn't *also* be done via some external script. Similar to how if you run ninja check-lldb it runs with the default set of options, but you can still run dotest.py manually to pass additional flags. So building inferiors as part of the configure step does not necessitate locking down at configure time which tests you will be running.