Page MenuHomePhabricator

[WIP][libc++] Add an executor that bundles tests for deferred execution
Needs ReviewPublic

Authored by danalbert on Apr 7 2020, 8:56 AM.

Details

Reviewers
ldionne
Group Reviewers
Restricted Project
Summary

Instead of actually running tests, this executor instead bundles the
executable and its dependencies into a tarball, and then creates a
self-contained script that can be used to run that test later.

The script contains the tarball of dependencies embedded inside itself,
so one can simply copy the script anywhere and execute it, provided the
machine and the executable have compatible architectures. This is an
alternative to SSH executors, which run tests remotely as lit executes
instead of deferring that execution. Deferring execution allows doing
batch copies and executes, which can be much faster than SSHing to a
remote machine for each test.

However, there are still some problems with this approach as of right now:

  1. Lit doesn't have a notion of "deferred execution", so tests that should actually run a command will be marked as PASS even though they might fail when the test is actually run. Simmilarly, if the test is expected to XFAIL at runtime, lit will think that the test XPASSed if the bundling step succeeds.
  2. Similarly, when running the tests on the target machine, any XFAIL test that (correctly) fails will kind-of be reported as a failure, since the executable will return a non-zero code.
  3. When running a ShTest that has multiple RUN commands (all prefixed by %{exec}), the bundle will be re-created for each line instead of having a single bundle for all RUN lines. This basically doesn't work at all.

I believe there are solutions to all of the above, but solving them
properly will require some thinking and some changes to lit. Here are
some ideas:

  • Lit could have a notion of HOST-RUN: and TARGET-RUN: lines. The idea is that Lit wouldn't even try running TARGET-RUN: lines on the host, and if any such line exists, the test would be reported as having a new status HOST-PASS or TARGET-DEFERRED or whatever. That new test status would represent that Lit can't know for sure yet whether the test will pass on the target. That would solve (1).
  • In addition to the above, we could solve (2) by running Lit on the target in a mode where only the TARGET-RUN: lines are executed. Unfortunately, this requires the target to support running Lit, which is not always reasonable (imagine small embedded devices).
  • Lit could support multi-line RUN: lines, where it would basically create a small script and consider it like a single RUN line. We could then use the %{exec} substitution on such a multi-line RUN instead of having multiple calls to %{exec}. This would solve (3). Alternatively, Lit could have builtin support for running HOST-RUN and TARGET-RUN lines specially.

Diff Detail

Event Timeline

ldionne created this revision.Apr 7 2020, 8:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 7 2020, 8:56 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript

This review is a WIP for solving remote execution of libc++ tests with lit better than we currently do. See the review summary for the initial discussion.

Adding a bunch of folks that might be interested by the discussion -- feel free to unsubscribe if you have no interest!

Also, just to explain how I'm currently running the tests with this patch applied:

rsync -am --include='*.bundle' --include='*/' --exclude='*' $PWD/build $PWD/all-tests

for bundle in $(find all-tests -name '*.bundle'); do
    if ! sh ${bundle}; then
        echo ${bundle} failed
    fi
done

I'm currently running them locally, but if the tests were compiled for some device, I could just as easily copy the bundles over and execute them with a single ssh command.

yln added a comment.Apr 8 2020, 4:34 PM

This is very exciting! I would definitely want to adopt this for compiler-rt tests. I am sure the Swift folks would be interested in this as well.

My unorganized thoughts:
I agree that we should teach lit the notion of "deferred execution" in some form or another to make this work. We essentially need to be able to break up tests into separate steps: setup (compile, generate inputs on host), execute (run on device), assert (inspect outputs on host) and then change execution to do thing stage-by-stage: setup, copy to device, run on device, copy back results, finish tests; instead of test-by-test. I think most of compiler-rt tests would be amendable to this without changes.

Lit could have a notion of HOST-RUN: and TARGET-RUN: lines.

Did you consider inferring this via the presence of %{exec}? In compiler-rt, we have %run, which is empty if host==target, but does simctl spawn for the iOS simulator.

In addition to the above, we could solve (2) by running Lit on the target in a mode where only the TARGET-RUN: lines are executed. Unfortunately, this requires the target to support running Lit, which is not always reasonable (imagine small embedded devices).

I think we should keep the execute step on the target as "dumb" as possible, i.e., no running lit on the target. I would restrict the target to do what's specified via %{exec}: execute the copied binary and produce outputs.

Lit could support multi-line RUN: lines, where it would basically create a small script and consider it like a single RUN line. We could then use the %{exec} substitution on such a multi-line RUN instead of having multiple calls to %{exec}. This would solve (3). Alternatively, Lit could have builtin support for running HOST-RUN and TARGET-RUN lines specially.

Did you consider to generate one bundle per test including all of the %{exec} lines, instead one bundle per %{exec} line?

ldionne updated this revision to Diff 342879.May 4 2021, 3:09 PM

Rebase onto main and on top of the from-scratch config.

Dan Albert and I spoke today and he may be interested in picking up this work,
which I've had to put on the side for lack of time.

Re problems 1 and 2 (xfail handling), my idea what that we'd add --xfail to the bundler interface and delegate to the test runner to handle that. For Android I need to fit these tests into tradefed, so our bundler would work by creating the test config files for tradefed and would let it handle xfails. We don't have much reason to interact with LIT again after the tests are built. This more or less matches how we've been running libc++ tests for years, though with some improvements (xfail info is currently manually tracked rather than being given by LIT). I've been carrying a local patch that adds a --build-only flag which just ignores xfail annotations and defers all the execution and interpretation to our test runner.

@ldionne do you have an example for 3? Shell tests were hardly a thing the last time I spent any real time here so I'm not sure why multiple run lines would be used in a test.

One other thought is that it would be beneficial for us if we could build one bundle rather than a bundle per test, but that's probably impractical.

danalbert commandeered this revision.May 12 2021, 5:24 PM
danalbert added a reviewer: ldionne.
danalbert updated this revision to Diff 344991.May 12 2021, 5:28 PM

Initial pass at my suggestion of deferring XFAIL handling to the bundler. This is incomplete because we'll need to differentiate XFAIL-BUILD and XFAIL-RUN in tests (running this as-is makes any test that is XFAIL because it doesn't _build_ turn to FAIL, but things that fail to _run_ are handled correctly), but it's enough to get feedback on the idea.

danalbert updated this revision to Diff 345307.May 13 2021, 3:50 PM

Adds XFAIL-BUILD to deal with the deferred-run cases that fail to build.

Fix remaining tests for the deferred test configuration.

krisb added a subscriber: krisb.May 30 2021, 2:49 PM
phosek added a subscriber: phosek.Thu, Jul 8, 10:59 AM
bcain added a subscriber: bcain.Fri, Jul 16, 3:47 PM