[CMake] Add support for unittests that have input files
ClosedPublic

Authored by zturner on Fri, Aug 31, 3:35 PM.

Details

Summary

This patch allows a target to call a function add_llvm_unittest_inputs() with a list of files. This list of files will be copied from the source directory to the binary directory. Then, it adds a function to SupportHelpers that computes the input directory folder, which a unittest can then use to construct a path to the inputs folder.

LLDB has had this for a while, but now I need it in LLVM. Since we have more than one user now, having this in LLVM makes sense, and as a followup we can port LLDB's uses over to the new LLVM code.

Diff Detail

Repository
rL LLVM
zturner created this revision.Fri, Aug 31, 3:35 PM
rnk added inline comments.Tue, Sep 4, 11:35 AM
llvm/cmake/modules/AddLLVM.cmake
1126 ↗(On Diff #163601)

These post build commands will only run when the unit test target rebuilds, which means changes to the inputs won't be reflected in the copied files if you edit an input, rebuild, and re-run the test.

Would it be better to embed a path to the Inputs directory in the source directory? That's how lit tests work; they don't copy inputs to the output directory. I don't see why unit tests need to be independent from the source directory.

zturner added inline comments.Tue, Sep 4, 11:38 AM
llvm/cmake/modules/AddLLVM.cmake
1126 ↗(On Diff #163601)

How do you envision this working? What path would be embedded exactly? Obviously we can't hardcode anything as different peoples' machines are configured differently. lit tests have the advantage that they have a lit.cfg file, so it can use the top-level lit.cfg to find the source directory at runtime. Unittests have no such analogue that I'm aware of.

zturner added inline comments.Tue, Sep 4, 11:39 AM
llvm/cmake/modules/AddLLVM.cmake
1126 ↗(On Diff #163601)

FWIW, I think you can set the DEPENDS CMake property on this custom command so that if you change the input it will re-run. I didn't do that here (unittest inputs tend to be binaries that change almost never), but it seems possible.

rnk added inline comments.Wed, Sep 5, 1:58 PM
llvm/cmake/modules/AddLLVM.cmake
1126 ↗(On Diff #163601)

The alternative I came up with over lunch was to have CMake write some kind of file next to the .exe, like unittest_input_path.txt, and then have getInputFileDirectory() read that file and return the path inside it. That could point to the source directory. It might also do a sanity check to ensure the directory exists.

If the DEPENDS trick does what we want, sounds good, but I think this text file is going to be simpler. No macro definitions, no calls to configure_file, just runtime things.

zturner updated this revision to Diff 164111.Wed, Sep 5, 2:38 PM

Made a custom target which is responsible for copying inputs, and set it as a dependency of the unittest executable target. This should ensure that even if the input file changes, it makes it into the output directory.

zturner updated this revision to Diff 164115.Wed, Sep 5, 3:26 PM

Updated to use the configure_file approach. See if you like this better.

rnk accepted this revision.Wed, Sep 5, 4:02 PM

Thanks, I do think this will be simpler in the long run. :)

llvm/lib/Testing/Support/SupportHelpers.cpp
29 ↗(On Diff #164115)

Can you do something like EXPECT_TRUE(llvm::sys::path::is_directory(Result)) << "unit test inputs directory does not exist"; so that tests die earlier in more obvious ways when they call this and the directory is missing?

This revision is now accepted and ready to land.Wed, Sep 5, 4:02 PM
This revision was automatically updated to reflect the committed changes.
MaskRay added a subscriber: MaskRay.Wed, Sep 5, 5:54 PM

For -DBUILD_SHARED_LIBS=on users,

% ninja libLLVMTestingSupport.so
............. -Wl,-z,defs  ..........
ld.lld: error: undefined symbol: TestMainArgv0
>>> referenced by SupportHelpers.cpp
>>>               lib/Testing/Support/CMakeFiles/LLVMTestingSupport.dir/SupportHelpers.cpp.o:(llvm::unittest::getInputFileDirectory())
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

Now libLLVMTestingSupport.so references a symbol in utils/unittest/UnitTestMain/TestMain.cpp

zturner added a subscriber: beanz.Wed, Sep 5, 6:16 PM

I think LLVMTestingSupport should never be a shared lib, what do you think?

I can’t fix this until tomorrow but if you need an immediate workaround you
can pass TestMainArgv0 to the function instead. Could you please check that
in for me?

Sorry, didn't see this earlier. Writing a llvm.srcdir.txt next to every single unit test binary seems like a pretty roundabout way of doing this. How about instead passing the src dir in a runtime flag in http://llvm-cs.pcc.me.uk/utils/lit/lit/formats/googletest.py#107 if the lit config asks for it? Then you don't need to touch the disk to get the src dir path, it's not passed to _all_ binaries, and cmake doesn't have to write a file to disk for every test binary.

Also also, opening a file from a unit test kind of makes it not a unit test, since those are supposed to not hit the disk etc. Maybe that test wants to be its own binary instead that's called from a lit script, instead of being a unit test?

rnk added a comment.Thu, Sep 6, 11:17 AM

Sorry, didn't see this earlier. Writing a llvm.srcdir.txt next to every single unit test binary seems like a pretty roundabout way of doing this. How about instead passing the src dir in a runtime flag in http://llvm-cs.pcc.me.uk/utils/lit/lit/formats/googletest.py#107 if the lit config asks for it? Then you don't need to touch the disk to get the src dir path, it's not passed to _all_ binaries, and cmake doesn't have to write a file to disk for every test binary.

To avoid CMake writing so many files, we could make this opt-in. A unittest could pass an option to add_unittest or call another cmake function to configure the text file. Or, CMake could test for the presence of $srcdir/Inputs before writing the text file.

I don't think we want to get lit to pass a flag to gtest binaries. Gtest binaries have the nice property that you just run them and they work. It would be a shame to lose that.

Also also, opening a file from a unit test kind of makes it not a unit test, since those are supposed to not hit the disk etc. Maybe that test wants to be its own binary instead that's called from a lit script, instead of being a unit test?

We may call these "unit" tests, but they're really gtests, and developers everywhere use gtest for more than just unit tests. We already have tests that create temp files. I don't see why reaching back into the source directory for inputs is too heavyweight for us.


What are you most concerned about, files littering the build directory, cmake runtime, simplicity, or something else?

Also there’s not really that many of these files, and they’re very small.
Making it opt-in seems more troue than it’s worth because there’s not going
to be a measurable impact of writing the files.

It would be nice to have a standalone executable that can be lit-run but
it’s a lot of work for little gain in some cases. Fwiw this pattern is also
prevalent in lldb with minidump tests. There too it would be nice to have a
dedicated test executable, but again it’s a lot of work. In this particular
case, it’s even questionable how appropriate that would be, because I’m
really testing an implementation detail, which is what unit tests are for),
but it requires an input file to exercise. Mocking up a “fake” pdb file
class is another option, but that’s also a lot of work and the solution
here is pretty straightforward and “just works”

I think LLVMTestingSupport should never be a shared lib, what do you think?

I don't know much about this target... just came across the build issue (ninja without arguments builds all (or most) targets). Pushed a quick fix rL341580 and waiting for your proper one :)

I can’t fix this until tomorrow but if you need an immediate workaround you
can pass TestMainArgv0 to the function instead. Could you please check that
in for me?

We may call these "unit" tests, but they're really gtests, and developers everywhere use gtest for more than just unit tests. We already have tests that create temp files. I don't see why reaching back into the source directory for inputs is too heavyweight for us.

That's definitely true at Google because it's the only hammer we have there. In LLVM land, we have lit, and gtests really tend to be unit tests there.

Also there’s not really that many of these files, and they’re very small.
Making it opt-in seems more troue than it’s worth because there’s not going
to be a measurable impact of writing the files.

It's one per test binary, so about 60. I agree that writing 60 files won't have a big time cost, but writing 60 files because a single gtest wants to access a file in the source tree seems questionable to me.

Furthermore, it's not a move that has good follow-up moves:

  • If this doesn't get widely adopted (as he commit message hopes: "and we just encourage people to use this sparingly") then we have this weird magical setup that's used by one file.
  • If it does get widely adopted, then now we have lots of tests reading stuff from the disk, where writing a binary running under lit would almost certainly lead to a better long term design.

So I'd encourage you to think about the question "if this CL here wasn't a possible approach, how could I still do what I want?" and do that instead.

Fwiw this pattern is also prevalent in lldb with minidump tests.

I'm not sure if "lldb has it, so it's a good pattern" is necessarily a good argument.

Doesn’t mean it’s a good or bad pattern, just that “only 1 file will use
it” is already false, because we can make lldb use it as a followup and
that will get many more.

I did think about alternative solutions, If this CL wasn’t here, the
alternatives i came up would be to either not write this test, or spend
several days/weeks writing a tool whose only purpose was to make it
possible to write this test.

ormris added a subscriber: ormris.Thu, Sep 6, 2:19 PM

This is causing issues for our Visual Studio builds. CMake doesn't know which configuration will be used at build time, so we end up with "llvm.srcdir.txt" placed in directory called "$(Configuration)". Any thoughts on a fix?

Let me look at it. I have VS installed so I can repro it. Gimme about 30
minutes and I'll update with a path forward.

Doesn’t mean it’s a good or bad pattern, just that “only 1 file will use
it” is already false, because we can make lldb use it as a followup and
that will get many more.

I did think about alternative solutions, If this CL wasn’t here, the
alternatives i came up would be to either not write this test, or spend
several days/weeks writing a tool whose only purpose was to make it
possible to write this test.

If you're dead set on this approach, please do make it opt in. Something like:

function(add_file_reading_unittest test_suite test_name)
  set(LLVM_UNITTEST_SOURCE_DIR ${CMAKE_CURRENT_SOURCE_DIR})
  configure_file(
    ${LLVM_MAIN_SRC_DIR}/unittests/unittest.cfg.in
    ${CMAKE_CURRENT_BINARY_DIR}/${CMAKE_CFG_INTDIR}/llvm.srcdir.txt)
  add_unittests(test_suite test_name)
endfunction()

...and then make DemangleTests call that function.

These aren't demangle tests fwiw. But that just complicates the cmake
logic, and I don't see the benefit of doing so. Writing 3k of data to the
disk is hardly going to be a noticeable part of the configure step.

These aren't demangle tests fwiw. But that just complicates the cmake
logic, and I don't see the benefit of doing so. Writing 3k of data to the
disk is hardly going to be a noticeable part of the configure step.

The benefits are:

  1. Unit tests reading inputs are bad form; doing that should require an explicit opt-in.
  1. While writing 3k of data to disk isn't causing measurable perf issues, writing a text file next to every test binary just because one test binary needs it is still plain ugly.

And not doing it is super easy to do. I don't understand the pushback here.

These aren't demangle tests fwiw. But that just complicates the cmake
logic, and I don't see the benefit of doing so. Writing 3k of data to the
disk is hardly going to be a noticeable part of the configure step.

The benefits are:

  1. Unit tests reading inputs are bad form; doing that should require an explicit opt-in.
  2. While writing 3k of data to disk isn't causing measurable perf issues, writing a text file next to every test binary just because one test binary needs it is still plain ugly.

    And not doing it is super easy to do. I don't understand the pushback here.

^ zturner: Ping on this.

Unrelatedly, another problem with the current design is that llvm.srcdir.txt must contain an absolute path, which makes it impossible to build on one machine and then copy artifacts and test inputs to another machine and run tests on another machine. DebugInfoPDBTests is the only test in all llvm, lld, clang tests that has this restriction. Requiring some defined fixed pwd for unit tests with inputs would address this issue as well. (As would finding a way to not require reading a file off disk for this test.)

All of the lit tests have this restriction, is this something you would do
with unit tests but not lit tests?

Unrelatedly, another problem with the current design is that llvm.srcdir.txt must contain an absolute path, which makes it impossible to build on one machine and then copy artifacts and test inputs to another machine and run tests on another machine. DebugInfoPDBTests is the only test in all llvm, lld, clang tests that has this restriction. Requiring some defined fixed pwd for unit tests with inputs would address this issue as well. (As would finding a way to not require reading a file off disk for this test.)

This issue could be fixed by treating the path in llvm.srcdir.txt as relative to the test executable (if it's relative).

All of the lit tests have this restriction, is this something you would do
with unit tests but not lit tests?

If you set things up so that the paths passed to lit.site.cfg.py.in (@LLVM_SOURCE_DIR@ etc) are relative and are made absolute at test time instead of at cmake time, this currently works. But treating the contents of llvm.srcdir.txt as executable-relative paths does make things work in that setup, so I have a workaround for that usecase.

I sent you https://reviews.llvm.org/D52095 for making the file-reading opt-in as I requested.