This is an archive of the discontinued LLVM Phabricator instance.

[lit] Enable the use of custom user-defined lit commands
Needs RevisionPublic

Authored by zturner on Nov 19 2018, 3:35 PM.

Details

Summary

Currently lit supports running shell commands through the use of the RUN: prefix. This patch allows individual test suites to install their own run handlers that can do things other than run shell commands. RUN commands still work as they always do, just that now if a different kind of command appears it will be appropriately sequenced along with the run command.

The commands the user installs can execute arbitrary Python code. As such, they can in theory write directly to stdout or stderr, but a well-behaved command should return its stdout and stderr from the function so that this can be reported to the user in a manner consistent with output from RUN lines.

The motivating use case for this is being able to provide a richer and more powerful syntax by which to compile test programs in LLDB tests. Currently everything is based off of substitutions and explicitly shell commands, but this is problematic when you get into interesting compilation scenarios.

For example, one could imagine wanting to write a test that tested the behavior of the debugger with optimized code. Each driver has different sets of flags that control the optimization behavior.

Another example is in cross-compilation scenarios. Certain types of PDB tests don't need to run a process, so the tests can be run anywhere, but they need to be linked with special flags to avoid pulling in system libraries.

We can try to make substitutions for all of these cases, but it will quickly become unwieldy and you will end up with a command line like: RUN: %cxx %debug %opt %norun, and this still isn't as flexible as you'd like.

With this patch, we could (in theory) do the compilation directly from Python. Instead of a shell command like above, we could write something like:

COMPILE: source=%p/Inputs/foo.cpp \
COMPILE:     mode=debug \
COMPILE:     opt=none \
COMPILE:     link=no \
COMPILE:     output=%t.o \
COMPILE:     clean=yes

and let the function figure out how best to do this for each platform. This is similar in spirit to how lldb's dotest.py already works with its platform specific builders, but the mechanism here is general enough that it can be used for anything a test suite wants, not just compiling.

Diff Detail

Event Timeline

zturner created this revision.Nov 19 2018, 3:35 PM
zturner edited the summary of this revision. (Show Details)Nov 19 2018, 3:36 PM
zturner edited the summary of this revision. (Show Details)
zturner edited the summary of this revision. (Show Details)
zturner added a subscriber: llvm-commits.
stella.stamenova accepted this revision.Nov 19 2018, 3:43 PM

LGTM as long as the tests are all still passing :)

This revision is now accepted and ready to land.Nov 19 2018, 3:43 PM
rnk added a comment.Nov 20 2018, 4:04 PM

I think the code to implement this is fine, but before we add this complexity to lit, I just wanted to know if other folks who work on the LLDB test suite are on board and want to use this approach to abstract over building apps for different targets. I see @stella.stamenova is, but I wanted to hear from other people involved in the LLDB lit test suite stuff. The %compile %debug %opt %s lit substitution approach is limiting, but do people feel strongly that this is much better?

llvm/utils/lit/tests/Inputs/shtest-keyword-command/keyword_helper.py
2

Oh, the joys of multiprocessing and pickling...

zturner added a reviewer: vsk.Nov 20 2018, 4:19 PM

+vsk. One thing I need input from LLDB people on is whether this seems like a good "general direction" for writing lit tests. Does wrapping compilation and linker commands behind a python function seem like it would address these use cases?

Can we think of any other use cases for custom commands like this?

vsk added a subscriber: filcab.Nov 20 2018, 6:31 PM

For compiling/linking, I think we can get by using lit substitutions to fill in platform-specific options? iOS testing for Swift is done this way (both on-device and simulator), as is testing for the profiling runtime. Dan and @filcab are more active in the area of sanitizer runtime testing, so they may have more informed opinions to share about how well that model works.

I’m not sure what we’d use custom lit commands for beyond compiling tests.

I think that something like this would go a long way towards solving the problems with lit tests we're having in lldb.

However, the part that is not clear to me is whether it is actually necessary to modify lit (shtest) to achieve this. It seems to me an equivalent effect to the command from the motivating example could be achieved via something like

RUN: %compile --source=%p/Inputs/foo.cpp --mode=debug --opt=none --link=no --output=%t.o --clean=yes

where %compile expands to a python script living somewhere in the lldb repository. This script could do the same thing that the implementation of COMPILE: would do, except it would be done in a separate process.

The only downside of that I see is the extra process will incur some overhead, slowing down the testing, but I am not sure if we care about that (or if it would even be measurable). OTOH, the benefits are:

  • decreased complexity of lit
  • decreased level of surprise of developers seeing new lit commands
  • easier reproducibility of tests when debugging (just copy paste the %compile run-line to rebuild the executable)

I think that something like this would go a long way towards solving the problems with lit tests we're having in lldb.

However, the part that is not clear to me is whether it is actually necessary to modify lit (shtest) to achieve this. It seems to me an equivalent effect to the command from the motivating example could be achieved via something like

RUN: %compile --source=%p/Inputs/foo.cpp --mode=debug --opt=none --link=no --output=%t.o --clean=yes

where %compile expands to a python script living somewhere in the lldb repository. This script could do the same thing that the implementation of COMPILE: would do, except it would be done in a separate process.

The only downside of that I see is the extra process will incur some overhead, slowing down the testing, but I am not sure if we care about that (or if it would even be measurable). OTOH, the benefits are:

  • decreased complexity of lit
  • decreased level of surprise of developers seeing new lit commands
  • easier reproducibility of tests when debugging (just copy paste the %compile run-line to rebuild the executable)

I did consider this, and I'm still open to the possibility of doing things this way. Two reasons I chose this route instead are:

  1. We have a lot of setup that runs in lit before we ever get to this point, and a builder could re-use this. For example, the environment, any additional lit configuration parameters specified on the command line, etc. We could of course pass these in to the compile.py script via hidden arguments, so this isn't a total blocker, it was just something I thought of.
  1. We could re-purpose this machinery for other uses. For example, I could imagine re-writing many lldb inline tests in terms of a custom command prefix. For example, here's test/functionalities/data-formatter/dump_dynamic/main.cpp:
class Base {
public:
  Base () = default;
  virtual int func() { return 1; }
  virtual ~Base() = default;
};

class Derived : public Base {
private:
  int m_derived_data;
public:
  Derived () : Base(), m_derived_data(0x0fedbeef) {}
  virtual ~Derived() = default;
  virtual int func() { return m_derived_data; }
};

int main (int argc, char const *argv[])
{
  Base *base = new Derived();
    return 0; //% stream = lldb.SBStream()
    //% base = self.frame().FindVariable("base")
    //% base.SetPreferDynamicValue(lldb.eDynamicDontRunTarget)
    //% base.GetDescription(stream)
    //% if self.TraceOn(): print(stream.GetData())
    //% self.assertTrue(stream.GetData().startswith("(Derived *"))
}

I could imagine writing this as:

class Base {
public:
  Base () = default;
  virtual int func() { return 1; }
  virtual ~Base() = default;
};

class Derived : public Base {
private:
  int m_derived_data;
public:
  Derived () : Base(), m_derived_data(0x0fedbeef) {}
  virtual ~Derived() = default;
  virtual int func() { return m_derived_data; }
};

int main (int argc, char const *argv[])
{
  Base *base = new Derived();
    return 0;
}

//SCRIPT: stream = lldb.SBStream()
//SCRIPT: base = self.frame().FindVariable("base")
//SCRIPT: base.SetPreferDynamicValue(lldb.eDynamicDontRunTarget)
//SCRIPT: base.GetDescription(stream)
//EXPECT: stream.GetData().startswith("(Derived *"))

where the lldb module is loaded in-process similar to how it is with dotest.py. (I do wonder if all lldbinline tests could actually be convereted to lit / FileCheck tests right now, today, using an lldbinit file such as:

script stream = lldb.SBStream()
script base = self.frame().FindVariable("base")
script base.SetPreferDynamicValue(lldb.eDynamicDontRunTarget)
script base.GetDescription(stream)
script stream.GetData()

and then FileCheck'ing that, but I haven't tried and I haven't investigated every single lldbinline test to see if they would all fit into this model.

In any case, the point being that being able to run python code in-process opens up a lot of interesting possibilities, considering that's how all the dotest tests are written. Whether we need that flexibility is open for discussion though. Like I said, i'm willing to give the external script a try if people think we should try a more conservative approach first.

I'd go with the "conservative" approach first. The idea of having lldb loaded inside a lit process does not excite me. One of the problems we have with dotest is that when lldb crashes during the test, it takes a part of the test driver with it which causes some tests to be skipped and the complicates the reporting of the result of the crashed test. It's not as bad right now, as there is still the main process left to report some kind of an error (back in the days when tests were run sequentially in a single process, the entire test suite would just stop), but I still think it would be nice to avoid these issues in the new framework.

aprantl requested changes to this revision.Nov 27 2018, 8:57 AM

Currently lit supports running shell commands through the use of the RUN: prefix. This patch allows individual test suites to install their own run handlers that can do things other than run shell commands. RUN commands still work as they always do, just that now if a different kind of command appears it will be appropriately sequenced along with the run command.

I'm not convinced that this is the best direction to evolve the LLDB testsuite to. Let me know if I'm missing something; I'm willing to be convinced otherwise :-)

It sounds like the problem you want to solve is having a more flexible build system for tests, and the ability to run python code as part of a tests. That is exactly the feature set that dotest.py provides. Tests are written in fully flexible Python, and in order to compile inferiors, we can fan out to a dedicated build system that is really good at compiling programs, namely make.

I don't see how any of your stated goals couldn't be achieved within the existing Makefile.rules. Establishing a second, parallel way of doing something similar would only serve to bifurcate the test infrastructure and make maintenance a lot harder in the future. It also makes the system more difficult to explain to new developers.

Specifically:

The commands the user installs can execute arbitrary Python code.

dotest.py already does that. Currently, we are using lit.py as a test scheduler and dotest.py as an LLDB-specific test harness. I think that's reasonable design.

As such, they can in theory write directly to stdout or stderr, but a well-behaved command should return its stdout and stderr from the function so that this can be reported to the user in a manner consistent with output from RUN lines.

The motivating use case for this is being able to provide a richer and more powerful syntax by which to compile test programs in LLDB tests. Currently everything is based off of substitutions and explicitly shell commands, but this is problematic when you get into interesting compilation scenarios.

I disagree with this statement. Building tests is done in an explicit, portable build system: make. I don't think it is a good idea to *also* add all the complexity of the dotest.py tests to the lit-based tests. Lit-based tests are very useful for certain (specifically non-interactive) use-cases, but if you need more build system support, or need to more complex test logic, I'd rather use make+dotest.py.

For example, one could imagine wanting to write a test that tested the behavior of the debugger with optimized code. Each driver has different sets of flags that control the optimization behavior.

This is mostly a solved problem with our Makefile system.

Another example is in cross-compilation scenarios. Certain types of PDB tests don't need to run a process, so the tests can be run anywhere, but they need to be linked with special flags to avoid pulling in system libraries.

We can try to make substitutions for all of these cases, but it will quickly become unwieldy and you will end up with a command line like: RUN: %cxx %debug %opt %norun, and this still isn't as flexible as you'd like.

With this patch, we could (in theory) do the compilation directly from Python. Instead of a shell command like above, we could write something like:

COMPILE: source=%p/Inputs/foo.cpp \
COMPILE: mode=debug \
COMPILE: opt=none \
COMPILE: link=no \
COMPILE: output=%t.o \
COMPILE: clean=yes
and let the function figure out how best to do this for each platform. This is similar in spirit to how LLDB's dotest.py already works with its platform specific builders, but the mechanism here is general enough that it can be used for anything a test suite wants, not just compiling.

In the dotest tests you generally don't need to write explicit build commands at all. All the platform-specific build logic is implemented once in Makefile.rules and the individual tests merely specify what source files need to be built and whether you want o build a binary or a shared library.

This revision now requires changes to proceed.Nov 27 2018, 8:57 AM
labath resigned from this revision.Aug 9 2019, 2:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2019, 2:03 AM
JDevlieghere resigned from this revision.Aug 19 2019, 12:36 PM