This is an archive of the discontinued LLVM Phabricator instance.

RFC/WIP: Have lit run the lldb test suite
Abandoned · Public

Authored by JDevlieghere on Apr 3 2018, 10:24 AM.

Details

Summary

With lldb-dotest checked in, this is the next step in allowing us to run the LLDB test suite with lit. I've converted a single test to give an idea of what I have in mind.

My goal is for this to live next to lldb-dotest until we're convinced that this is the way to go, at which point we can start moving functionality from dotest to lit. For example, with the current approach, when a test is not supported it will show up as PASS in lit, because dotest returned exit code 0.
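For illustration, a converted test in this scheme might look roughly like the hypothetical wrapper below; the %dotest substitution and the test name are assumptions for the sake of the example, not the actual contents of this diff:

# TestFooBar.test -- hypothetical ShTest wrapper around one dotest test file
# RUN: %dotest -p TestFooBar.py %p

lit treats each such file as a single test and decides PASS/FAIL purely from dotest's exit code, which is exactly the limitation described above.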

Diff Detail

Event Timeline

JDevlieghere created this revision. Apr 3 2018, 10:24 AM

Can you add another test or two? It's a little complicated to see what's going on here, but from your description, it makes sense.
I'm not particularly worried right now to distinguish between UNSUPPORTED and PASS. In practice, it doesn't matter (at least for the transition phase).

Can you add another test or two? It's a little complicated to see what's going on here, but from your description, it makes sense.
I'm not particularly worried right now to distinguish between UNSUPPORTED and PASS. In practice, it doesn't matter (at least for the transition phase).

Sure thing, let me update the diff with some more examples!

So this is basically replacing the parallel test-driver functionality of dotest with lit, with dotest only used to invoke one test at a time. This way we (as the LLVM project) can avoid maintaining two test drivers implemented in Python. That clearly sounds like the right direction forward!

I wonder if there is anything more automatic we could do instead of requiring a .test file in every directory. For example: could we have lit recognize .py files and stick the RUN line into the py files?

JDevlieghere added a comment. Edited Apr 3 2018, 10:36 AM

So this is basically replacing the parallel test-driver functionality of dotest with lit, with dotest only used to invoke one test at a time. This way we (as the LLVM project) can avoid maintaining two test drivers implemented in Python. That clearly sounds like the right direction forward!

Exactly :-)

I wonder if there is anything more automatic we could do instead of requiring a .test file in every directory. For example: could we have lit recognize .py files and stick the RUN line into the py files?

That's an excellent idea, let me give that a shot!

  • Add a few more examples, as per Davide's request.
  • Add a run line to the Python file.

Because the .py suffix is currently specified in the root of the TestSuite directory, we end up with a bunch of unresolved tests. One alternative is having a local config in each subdirectory, but that kind of defeats the point of moving the run line into the Python files. Either way we'll have to come up with some kind of solution, because not all .py files are test cases. Let me know what you think!

I don't think this is going in a good direction TBH.

You are building another layer on top of everything, whereas I think we should be cutting layers out. Besides the issues already pointed out (not being able to differentiate PASS/XFAIL/SKIP, not all .py files being test files), I see one more problem: a single file does not contain a single test -- some of our test files have dozens of tests, and this would bunch them together.

I think the solution here is to invent a new lit test format, instead of trying to fit our very square tests into the round ShTest boxes. Of the existing test formats, I think that actually the googletest format is closest to what we need here, and that's because it supports the notion of a "test" being different from a "file" -- gtest executables typically contain dozens if not hundreds of tests, and yet the googletest format is able to recognize each one individually. The only difference is that instead of running something like "my_google_test_exec --gtest_list_tests" you would use some Python introspection to figure out the list of tests within a file.

Besides this, having our own test format would allow us to resolve the other problems of this approach as well:

  • since it's the test format that determines the result of the test, it would be trivial to come up with some sort of a protocol (or reuse an existing one) to notify lit of the full range of test results (pass, fail, xfail, unsupported)
  • the test format could know that a "test file" is everything ending in ".py" and starting with "Test" (which are exactly the rules we follow now), so no special or new conventions would be needed.
  • it would give us full isolation between individual test methods in a file, while still having the convenience of being able to factor out common code into utility functions

I know this is a bit more up-front work, but I think this will result in a much nicer final product, and will allow us to remove a lot more code more easily (maybe even all of unittest2 eventually).

(I apologise for the rashness of my response, I can go into this in more detail tomorrow).

I don't think this is going in a good direction TBH.

You are building another layer on top of everything, whereas I think we should be cutting layers out. Besides the issues already pointed out (not being able to differentiate PASS/XFAIL/SKIP, not all .py files being test files), I see one more problem: a single file does not contain a single test -- some of our test files have dozens of tests, and this would bunch them together.

I completely agree and removing the driver logic from dotest would contribute to that goal, no?

I think the solution here is to invent a new lit test format, instead of trying to fit our very square tests into the round ShTest boxes. Of the existing test formats, I think that actually the googletest format is closest to what we need here, and that's because it supports the notion of a "test" being different from a "file" -- gtest executables typically contain dozens if not hundreds of tests, and yet the googletest format is able to recognize each one individually. The only difference is that instead of running something like "my_google_test_exec --gtest_list_tests" you would use some Python introspection to figure out the list of tests within a file.

Great, I wasn't aware that there was a dedicated googletest format. If it's a better fit then we should definitely consider using something like that.

Besides this, having our own test format would allow us to resolve the other problems of this approach as well:

  • since it's the test format that determines the result of the test, it would be trivial to come up with some sort of a protocol (or reuse an existing one) to notify lit of the full range of test results (pass, fail, xfail, unsupported)
  • the test format could know that a "test file" is everything ending in ".py" and starting with "Test" (which are exactly the rules we follow now), so no special or new conventions would be needed.
  • it would give us full isolation between individual test methods in a file, while still having the convenience of being able to factor out common code into utility functions

If we come up with our own test format, would we be able to reuse the current dotest.py output?

I know this is a bit more up-front work, but I think this will result in a much nicer final product, and will allow us to remove a lot more code more easily (maybe even all of unittest2 eventually).

That's totally warranted if it helps in the long term.

(I apologise for the rashness of my response, I can go into this in more detail tomorrow).

No worries, I didn't have that impression at all. I appreciate the constructive feedback!

labath added a comment. Apr 4 2018, 3:46 AM

I don't think this is going in a good direction TBH.

You are building another layer on top of everything, whereas I think we should be cutting layers out. Besides the issues already pointed out (not being able to differentiate PASS/XFAIL/SKIP, not all .py files being test files), I see one more problem: a single file does not contain a single test -- some of our test files have dozens of tests, and this would bunch them together.

I completely agree and removing the driver logic from dotest would contribute to that goal, no?

Maybe we could achieve that this way, but this seems like a strange way of achieving it. Maybe I just don't know what next steps you have planned for this.

I think the solution here is to invent a new lit test format, instead of trying to fit our very square tests into the round ShTest boxes. Of the existing test formats, I think that actually the googletest format is closest to what we need here, and that's because it supports the notion of a "test" being different from a "file" -- gtest executables typically contain dozens if not hundreds of tests, and yet the googletest format is able to recognize each one individually. The only difference is that instead of running something like "my_google_test_exec --gtest_list_tests" you would use some Python introspection to figure out the list of tests within a file.

Great, I wasn't aware that there was a dedicated googletest format. If it's a better fit then we should definitely consider using something like that.

Just to be clear. I doubt we will be able to reuse any of the existing googletest code, but I don't think that matters as the entirety of googletest support code in lit (llvm/utils/lit/lit/formats/googletest.py) is 150 lines of code, and I don't expect ours to be much longer.
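As a very rough sketch (assumed class and attribute names, not actual lldb or lit code), the skeleton of such a format only needs the two methods lit cares about; the rest of this thread is essentially about how to fill in their bodies:

import lit.formats


class LLDBDotestFormat(lit.formats.TestFormat):
    """Hypothetical lit test format: one lit test per dotest test method."""

    def __init__(self, dotest_cmd):
        # Base dotest.py invocation (interpreter, script, common arguments).
        self.dotest_cmd = dotest_cmd

    def getTestsInDirectory(self, testSuite, path_in_suite, litConfig, localConfig):
        # Would run an enumeration utility (sketched further down) and yield
        # one lit.Test.Test object per "file.class.method" line it prints.
        raise NotImplementedError

    def execute(self, test, litConfig):
        # Would invoke dotest.py for a single test method and translate the
        # result into a lit result code (see the adapted execute() below).
        raise NotImplementedError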

Besides this, having our own test format would allow us to resolve the other problems of this approach as well:

  • since it's the test format that determines the result of the test, it would be trivial to come up with some sort of a protocol (or reuse an existing one) to notify lit of the full range of test results (pass, fail, xfail, unsupported)
  • the test format could know that a "test file" is everything ending in ".py" and starting with "Test" (which are exactly the rules we follow now), so no special or new conventions would be needed.
  • it would give us full isolation between individual test methods in a file, while still having the convenience of being able to factor out common code into utility functions

If we come up with our own test format, would we be able to reuse the current dotest.py output?

I'm not sure what you mean by this, but I'm pretty sure the answer is yes. :)

If you're talking about the textual output, then we could do the exact same thing as googletest is doing. googletest does something like this (execute() in googletest.py):

out, err, exitCode = lit.util.executeCommand(
        [testPath, '--gtest_filter=' + testName],
        env=test.config.environment,
        timeout=litConfig.maxIndividualTestTime)
if exitCode:
    return lit.Test.FAIL, out + err

The only thing we'd need to change is the command we execute.
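Adapted for dotest, that could look roughly like the sketch below; self.dotest_cmd and the per-test file_name/class_name/method_name attributes are assumptions (discovery would have to attach them), and the '-p'/'-f' mapping anticipates the invocation proposed further down in this thread:

import lit.Test
import lit.formats
import lit.util


class LLDBDotestFormat(lit.formats.TestFormat):
    # Only the execute() half of the hypothetical format is shown here.

    def execute(self, test, litConfig):
        # self.dotest_cmd holds the common arguments (possibly filled in by
        # CMake through lit.site.cfg); discovery attached the file, class and
        # method names to the test object.
        cmd = self.dotest_cmd + ['-p', test.file_name,
                                 '-f', test.class_name + '.' + test.method_name]
        out, err, exitCode = lit.util.executeCommand(
            cmd,
            env=test.config.environment,
            timeout=litConfig.maxIndividualTestTime)
        if exitCode:
            return lit.Test.FAIL, out + err
        return lit.Test.PASS, ''

Reporting XFAIL or UNSUPPORTED would additionally require either distinct exit codes from dotest or scraping its output, which is the open question discussed below.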

Adding Zachary as he's familiar with lit internals.

I'll try to elaborate more on the approach I had in mind. I would split my approach into a couple of steps.

  1. Write a tool which will dump out the list of all tests in the test suite. This could either be a separate Python file, or a special flag to dotest.py, whichever is easier. Its implementation would basically be like:
for each test subdir:
  for each file in subdir:
    if file matches Test*.py:
      test = __import__(file)
      for each class in test:
        for each method in class:
          if method matches "test*":
            print("%s.%s.%s" % (file, class.__name__, method.__name__))

The only tricky part here is the __import__ line. I think we'll need to do some fiddling with sys.path or whatever to get the test files to import correctly. We'll probably need to look at how unittest2 imports them to get it right. (A rough sketch of such a utility is included after the list of steps below.)

This fiddling is also the reason I think this should be a separate utility. Since the "test discovery" phase happens in the context of the main lit process, which is shared with all other test suites, we should try to avoid messing with its state too much.

  2. Have the new test format invoke this utility in its discovery function (getTestsInDirectory)
  3. Invoke something like: ['dotest.py'] + common_args + ['-p', TestFile, '-f', testclass + '.' + testmethod] in the `execute()` function

The thing to resolve here is how to translate the result of that command into a test state: do we want to use just the exit code (which would give us PASS/FAIL states only), scrape the command output to get the rest of the states (XFAIL, SKIP, ...), or do something else?

I am deliberately bypassing lldb-dotest here, as I think it becomes unnecessary in this setup (so we already remove one layer) -- the common args can be hardcoded by cmake into lit.site.cfg which lit will read for us. And since now lit is the one executing the test, you don't have to worry about remembering all the arguments because you can just pass the test name to lit, and it will figure them out for you (we can still print the full dotest line we execute to help debugging).

  4. Start ripping out all the multiprocessing/driver logic from dotest.
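As mentioned above, here is a rough, self-contained sketch of the enumeration utility from step 1. The Test*.py naming rule and the "file.class.method" output format come from this thread; the function name, the os.walk traversal and the minimal sys.path handling are assumptions, and real lldb test files would still need the import fiddling described above in order to load at all:

import fnmatch
import importlib
import inspect
import os
import sys
import unittest


def enumerate_tests(test_root):
    """Print one 'file.class.method' line per test method under test_root."""
    for dirpath, _, filenames in os.walk(test_root):
        for filename in fnmatch.filter(filenames, 'Test*.py'):
            module_name = filename[:-3]
            # Minimal import hack; the real tool would have to mimic whatever
            # sys.path setup unittest2/dotest.py performs for lldb tests.
            sys.path.insert(0, dirpath)
            try:
                module = importlib.import_module(module_name)
            finally:
                sys.path.pop(0)
            for _, cls in inspect.getmembers(module, inspect.isclass):
                if not (issubclass(cls, unittest.TestCase)
                        and cls.__module__ == module_name):
                    continue
                for method_name, _ in inspect.getmembers(cls, callable):
                    if method_name.startswith('test'):
                        print('%s.%s.%s' % (module_name, cls.__name__, method_name))


if __name__ == '__main__':
    enumerate_tests(sys.argv[1])

The format's getTestsInDirectory would then turn each printed line into one lit test.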

Alright, I'm convinced this is the way to go.

  • For (1) I'll see if I can get some inspiration from the visit logic in dotest.py. I guess the functionality is similar. I agree on doing this in a separate tool, especially if we want to remove functionality from dotest in the long run.
  • For the lit stuff I'll wait on input from Zachary, but it doesn't seem to be too hard either.
  • No objections to skipping lldb-dotest, it was only meant to support the whole lit thing. If we don't need it, all the better; it has already caused me some minor headaches :-)

I'll leave this diff open for the central discussion until I have diffs for the sub-problems.

labath added a comment. Apr 4 2018, 5:02 AM

Alright, I'm convinced this is the way to go.

  • For (1) I'll see if I can get some inspiration from the visit logic in dotest.py. I guess the functionality is similar. I agree on doing this in a separate tool, especially if we want to remove functionality from dotest in the long run.

I'm not aware of any code which would do the full visit. We have some code in dosep.py (find_test_files_in_dir_tree), but this one only does file traversal; it does not look inside the files for classes, test methods and such.

The real visiting happens inside unittest2 (loader.py). The code there seems very complicated, but I think that's because it tries to support various things we don't use, so /I hope/ that our visiting logic can be simpler.

OTOH, looking at loader.py, it seems to me it may actually be possible to let it do the enumeration for us, if we stroke it the right way.

I haven’t had time to look at this in detail yet, but when I originally had this idea I thought we would use lit’s discovery mechanism to find all .py files, and then invoke them using dotest.py in single-process mode with a path to a specific file.

Why do we need run lines?

JDevlieghere added a comment. Edited Apr 4 2018, 8:09 AM

I haven’t had time to look at this in detail yet, but when I originally had this idea I thought we would use lit’s discovery mechanism to find all .py files, and then invoke them using dotest.py in single-process mode with a path to a specific file.

Assuming we can work around the problem of not every .py file being a test (by filtering on the Test prefix), would there be a way to differentiate the different tests within a single file?

Why do we need run lines?

With a custom test format these would no longer be needed.

Preferably lit would take care of as much as possible. I think Zachary’s idea makes sense as an incremental step. If we think of one Python file as a googletest executable, it makes sense to return a list of tests for every Python file for “v2”. I think running the different variants as separate tests is going to be “v3” and will require quite a bit more work.

Actually, this (v3) should happen pretty much automatically, as the test variant is encoded in the test method name. We have a Python metaclass which does this replication for us, and as far as the rest of the world is concerned, they are just separate tests.

So if you just normally enumerate all methods in a test class, you will naturally get every variant in its own test.
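To make the replication concrete (a simplified stand-in with made-up variant names, not lldb's actual metaclass), a metaclass along these lines clones each test method once per debug-info variant, so plain method enumeration already sees one test per variant:

import unittest

# Illustrative variant names only; lldb computes the real set elsewhere.
VARIANTS = ['dwarf', 'dsym']


class ReplicateVariants(type):
    """Simplified stand-in for lldb's variant-replicating metaclass."""

    def __new__(mcs, name, bases, attrs):
        new_attrs = dict(attrs)
        for attr_name, attr in attrs.items():
            if attr_name.startswith('test') and callable(attr):
                del new_attrs[attr_name]
                for variant in VARIANTS:
                    # The real metaclass also arranges for each clone to build
                    # with its own variant; here we only clone the method.
                    new_attrs['%s_%s' % (attr_name, variant)] = attr
        return type.__new__(mcs, name, bases, new_attrs)


class ExampleTest(unittest.TestCase, metaclass=ReplicateVariants):
    def test_something(self):
        pass


# Enumerating ExampleTest's methods now yields test_something_dwarf and
# test_something_dsym as two separate tests.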

Preferably lit would take care of as much as possible. I think Zachary’s idea makes sense as an incremental step. If we think of one Python file as a googletest executable, it makes sense to return a list of tests for every Python file for “v2”. I think running the different variants as separate tests is going to be “v3” and will require quite a bit more work.

Actually, this (v3) should happen pretty much automatically, as the test variant is encoded in the test method name. We have a Python metaclass which does this replication for us, and as far as the rest of the world is concerned, they are just separate tests.

So if you just normally enumerate all methods in a test class, you will naturally get every variant in its own test.

Yeah, but there are some issues with that, such as wanting a bit of common setup for each directory so that each variant doesn't have to do the same expensive thing over and over.

Anyway, we should discuss this when it's time to actually implement that since I think there are some interesting cases to consider.

Please see D45332 for the lldb test format and D45333 for how to use that to run the tests.

JDevlieghere abandoned this revision. Apr 11 2018, 8:49 AM