This is an archive of the discontinued LLVM Phabricator instance.

[lit] Use sharding for GoogleTest format
ClosedPublic

Authored by ychen on Mar 22 2022, 12:04 PM.

Details

Summary

This greatly improves lit unit test performance, especially on Windows. The gain comes from launching one gtest executable for many subtests instead of one executable per subtest (the current behavior).

The shards are executed by the test runner, and the results are stored in the
JSON format supported by GoogleTest. Later, in the test reporting stage, all
test results are read back from the JSON files to produce the test summary
and reports.

On my Win10 desktop, before this patch: check-clang-unit: 177s, check-llvm-unit: 38s; after this patch: check-clang-unit: 37s, check-llvm-unit: 11s.
On my Linux machine, before this patch: check-clang-unit: 46s, check-llvm-unit: 8s; after this patch: check-clang-unit: 7s, check-llvm-unit: 4s.

Diff Detail

Event Timeline

ychen created this revision.Mar 22 2022, 12:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 12:04 PM
ychen requested review of this revision.Mar 22 2022, 12:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 12:04 PM
ychen updated this revision to Diff 417416.Mar 22 2022, 3:40 PM
  • Fix two bugs on Windows.
ychen edited the summary of this revision. (Show Details)Mar 22 2022, 3:57 PM
ychen added a comment.Mar 22 2022, 4:37 PM

Do we support using upstream googletest instead of the in-tree copy for the unit test?

llvm/utils/lit/lit/TestingConfig.py
31

This is the replacement for --filter and --filter-not for unit tests. Since GTEST_FILTER takes a different pattern format than --filter/--filter-not, passing --filter and --filter-not arguments for unit tests through GTEST_FILTER could be confusing to the lit user. IMHO, it would be clearer to just use GTEST_FILTER. The drawback is sacrificing a bit of consistency. Please let me know if you think otherwise.

As for --xfail/--xfail-not, the tests for D106022 did not cover unit tests. I could add more tests and the sharding handling in a follow-up patch, considering this patch is already big.

ychen edited the summary of this revision. (Show Details)Mar 22 2022, 4:42 PM

Do we support using upstream googletest instead of the in-tree copy for the unit test?

LLVM unit tests use the in-tree gtest. If you need more recent functionality, you should look at updating the in-tree copy.

What's with all the deleted files?

llvm/utils/lit/lit/formats/googletest.py
40

Why 512?

llvm/utils/lit/lit/main.py
138
141
144

Just do len(idxs) inline

152–153

This seems redundant given the way gtests is created.

155–156

Don't use file as a variable name since it's a built-in python name.

160

Since the term appears to be SUPPRESSED anyway...

173

restcode -> resultCode or returnCode, right?

Do we support using upstream googletest instead of the in-tree copy for the unit test?

LLVM unit tests use the in-tree gtest. If you need more recent functionality, you should look at updating the in-tree copy.

What's with all the deleted files?

Hmm, I was asking about the upstream googletest because of the deleted tests. They were introduced by D18606 to handle upstream googletest --gtest_list_tests output. This is not needed for sharding, since --gtest_list_tests is only used to estimate the number of sub-testcases, which in turn is used to compute the number of shards; correctness is preserved either way.

llvm/utils/lit/lit/formats/googletest.py
40

Why 512?

Admittedly, I chose this with LLVM lit testing in mind. I'm not sure it is a good number for general use of lit. The criteria are that the number should be as big as possible to keep the number of shards small (so the overhead stays small), while the number of shards should still be big enough to keep all cores busy and not hurt scheduling.
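To make the trade-off concrete, here is a hedged Python sketch of the kind of calculation being described; the helper name and the exact rounding are assumptions about intent, not the patch's code.

import math


def num_shards(num_tests, num_workers, max_tests_per_shard=512):
    # Keep shards large (few shards, little per-process overhead) but make
    # sure there are at least as many shards as workers to keep cores busy.
    return max(num_workers, math.ceil(num_tests / max_tests_per_shard))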

ychen updated this revision to Diff 417697.Mar 23 2022, 11:22 AM
  • Address James's comments.
ychen marked 7 inline comments as done.Mar 23 2022, 11:22 AM

Do we support using upstream googletest instead of the in-tree copy for the unit test?

The Clang/LLVM unittests use an imported copy of googletest (with some modifications, I believe mainly to allow using LLVM's various string types with gtest macros). However, I think I have seen patches intended to allow lit to work cleanly with upstream googletest. That functionality should be preserved, even though LLVM itself doesn't make use of it.

Do we support using upstream googletest instead of the in-tree copy for the unit test?

The Clang/LLVM unittests use an imported copy of googletest (with some modifications, I believe mainly to allow using LLVM's various string types with gtest macros). However, I think I have seen patches intended to allow lit to work cleanly with upstream googletest. That functionality should be preserved, even though LLVM itself doesn't make use of it.

Agreed. I think they're preserved. The deleted tests are meant for a piece of deleted code.

Re. performance, I can see the benefit if you just run the unit tests, but is that the normal use-case? My suspicion is that most people just run check-llvm or check-clang, which includes the regular lit tests, and therefore already ends up with things split across the available cores. Do you have any figures for that case?

llvm/utils/lit/lit/main.py
144–145

Doesn't this invalidate the indexes after the first one? If we need to delete items 2 and 4 in the list, after deleting item 2, item 4 will be at index 3 now.

152

Don't use file as a variable name.

My suspicion is that most people just run check-llvm or check-clang, which includes the regular lit tests, and therefore already ends up with things split across the available cores.

Yes, but.

The way lit works now, it runs each unittest program once per test function (plus one, an initial run to retrieve the list of test functions without actually running them). If a unittest has 500 test functions, lit will run the unittest program 500 times, running one test function each time. The total number of test functions is some thousands, across all unittest programs. Even on Linux, the process-creation overhead adds up. Yes, all those processes are distributed across the available cores, but you're still talking many thousands of processes.

By taking advantage of gtest's sharding feature, within each test program we essentially divide the total number of tests-per-program by the number of cores, and run each test program once per core. So, the number of processes created goes from (number-of-test-functions-across-all-unittest-programs) to (number-of-cores X number-of-test-programs). Actually in some cases fewer, because there are a handful of unittest programs that have a very small number of test functions, but for ballparking this is the right calculation. @ychen can correct me if I'm misunderstanding what's going on.

I believe he tried running each unittest program exactly once, but some of them take significant time, and the net wall-time reduction wasn't as good. gtest's sharding seemed like a good compromise.
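For readers unfamiliar with gtest's sharding protocol, here is a minimal Python sketch of the mechanism described above; the function and the idea of launching every shard at once are illustrative only (lit actually schedules each shard through its own worker pool), but GTEST_TOTAL_SHARDS, GTEST_SHARD_INDEX, and GTEST_OUTPUT are the real gtest environment variables involved.

import os
import subprocess


def run_shards(gtest_binary, num_shards, json_dir):
    # Launch the same binary once per shard; gtest partitions the test
    # functions among the shards based on the two env vars below.
    procs = []
    for shard in range(num_shards):
        env = os.environ.copy()
        env["GTEST_TOTAL_SHARDS"] = str(num_shards)
        env["GTEST_SHARD_INDEX"] = str(shard)
        # Ask gtest to write a machine-readable report for this shard.
        env["GTEST_OUTPUT"] = "json:%s/shard-%d.json" % (json_dir, shard)
        procs.append(subprocess.Popen([gtest_binary], env=env))
    return [p.wait() for p in procs]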

ychen edited the summary of this revision. (Show Details)Mar 24 2022, 8:40 AM
ychen updated this revision to Diff 417949.Mar 24 2022, 9:13 AM
ychen marked an inline comment as done.
  • Rename file to f
llvm/utils/lit/lit/main.py
144–145

The -i keeps the index correct. To delete items 2 and 4 in the list, idxs is [2, 4]; the del statements end up being del tests[2-0] and del tests[4-1].
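A standalone illustration of this idiom with hypothetical data (not the patch's code):

# Each earlier deletion shifts the remaining elements left by one, so
# subtract the number of deletions already performed from each index.
tests = ["a", "b", "c", "d", "e"]
idxs = [2, 4]  # drop "c" and "e"
for i, idx in enumerate(idxs):
    del tests[idx - i]
assert tests == ["a", "b", "d"]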

ychen added a comment.EditedMar 24 2022, 9:20 AM

My suspicion is that most people just run check-llvm or check-clang, which includes the regular lit tests, and therefore already ends up with things split across the available cores.

Yes, but.

The way lit works now, it runs each unittest program once per test function (plus one, an initial run to retrieve the list of test functions without actually running them). If a unittest has 500 test functions, lit will run the unittest program 500 times, running one test function each time. The total number of test functions is some thousands, across all unittest programs. Even on Linux, the process-creation overhead adds up. Yes, all those processes are distributed across the available cores, but you're still talking many thousands of processes.

By taking advantage of gtest's sharding feature, within each test program we essentially divide the total number of tests-per-program by the number of cores, and run each test program once per core. So, the number of processes created goes from (number-of-test-functions-across-all-unittest-programs) to (number-of-cores X number-of-test-programs). Actually in some cases fewer, because there are a handful of unittest programs that have a very small number of test functions, but for ballparking this is the right calculation. @ychen can correct me if I'm misunderstanding what's going on.

I believe he tried running each unittest program exactly once, but some of them take significant time, and the net wall-time reduction wasn't as good. gtest's sharding seemed like a good compromise.

Thanks for the explanation, @probinson. That's exactly where the performance gain comes from. I've updated the patch description to explain this.

My knowledge of this area is relatively limited, so I don't feel like I have a good enough understanding of the intricacies to progress this review further. Thanks for the explanations as to why this approach is most appropriate. The real cost sounds like it's due to process spawning overhead primarily (and any startup costs within the program itself). That does make me wonder whether there's a natural way of threading things within the test programs themselves, but I suspect not as I doubt the unit tests are thread-safe. Consequently, this approach seems sound, but hopefully others with knowledge in this area might be able to chime in?

llvm/utils/lit/lit/main.py
144–145

Ah, I misread that as -1 not -i. Thanks!

ychen updated this revision to Diff 418925.Mar 29 2022, 10:37 AM
  • Fixed a bug on Windows

My knowledge of this area is relatively limited, so I don't feel like I have a good enough understanding of the intricacies to progress this review further. Thanks for the explanations as to why this approach is most appropriate. The real cost sounds like it's due to process spawning overhead primarily (and any startup costs within the program itself). That does make me wonder whether there's a natural way of threading things within the test programs themselves, but I suspect not as I doubt the unit tests are thread-safe. Consequently, this approach seems sound, but hopefully others with knowledge in this area might be able to chime in?

@jhenderson Thanks for taking a look. Adding a few potentially interested reviewers working on Windows (@rnk, @mstorsjo, @compnerd).

rnk added a comment.Mar 29 2022, 11:19 AM

I don't have time to review the code, but this seems like a good direction.

What happens if the unit test shard crashes without writing the final JSON file? Does lit handle that? Is or can it be tested, and if not, what gives you confidence that it will work?

I don't have time to review the code, but this seems like a good direction.

What happens if the unit test shard crashes without writing the final JSON file? Does lit handle that? Is or can it be tested, and if not, what gives you confidence that it will work?

It would trigger an assertion failure: assert os.path.exists(test.gtest_json_file). The only way this could happen is if the copy of GoogleTest in LLVM has crashing bugs that are triggered by new tests or new test environments (existing ones, if any, should surface when this patch lands). I think both cases could be detected and handled reliably.

rnk added a comment.Mar 29 2022, 12:51 PM

It would trigger an assertion failure: assert os.path.exists(test.gtest_json_file). The only way this could happen is if the copy of GoogleTest in LLVM has crashing bugs that are triggered by new tests or new test environments (existing ones, if any, should surface when this patch lands). I think both cases could be detected and handled reliably.

I think I don't agree that the only way this could happen is if there are gtest bugs. One segfault in the middle of a test suite (on Linux, no SEH or signal handler recovery mechanism) will bring down the entire gtest shard process.

llvm/utils/lit/lit/formats/googletest.py
148–149

I don't think it makes sense to use assert to check for things that are not expected invariants. If an exception is desired, IMO it's better to use if ...: raise .... Alternatively, it makes sense to return a FAIL test status in this case.
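A hedged sketch of the alternative being suggested, returning an explicit failure instead of asserting when the shard's JSON report is missing; lit.Test.Result, lit.Test.FAIL and lit.Test.PASS are lit's real result API, while the helper name is illustrative (test.gtest_json_file is the attribute mentioned above).

import os

import lit.Test


def read_shard_result(test):
    # Report a plain FAIL rather than tripping an assert inside lit when the
    # shard process died before it could write its JSON report.
    if not os.path.exists(test.gtest_json_file):
        msg = "shard JSON output does not exist: %s" % test.gtest_json_file
        return lit.Test.Result(lit.Test.FAIL, msg)
    # ...otherwise parse the JSON and build the real per-test results...
    return lit.Test.Result(lit.Test.PASS)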

160

This command is different from the sharded command. It is possible to construct tests that fail when run together but pass when run in isolation, so this command may not always reproduce failures observable with lit. I'm not sure how to signal that to the user.

rnk added a comment.Mar 29 2022, 12:52 PM

I should add, I think it's worth adding a test that shows what happens when a shard fails. This is doable, just have the fake gtest process exit without producing a json file.

I should add, I think it's worth adding a test that shows what happens when a shard fails. This is doable, just have the fake gtest process exit without producing a json file.

Sounds good. I'll add the handling for crashes with a test.

probinson added inline comments.Mar 29 2022, 2:25 PM
llvm/utils/lit/lit/formats/googletest.py
160

Currently, *all* unittest functions are run in isolation under lit, but would be run in the same process by someone who just runs the program interactively with no arguments. It would be best to report exactly what lit did, but I'm not super worried about weird test-function interactions.

ychen updated this revision to Diff 419306.Mar 30 2022, 6:51 PM
ychen marked an inline comment as done.
  • return test failure for shard crash (added a new test case)
  • print the shard command if any tests in the shard fail.
llvm/utils/lit/lit/formats/googletest.py
160

I've added a header with the shard's reproducer script when there are any test failures in the shard. The googletest-format test below checks that output. @rnk @probinson WDYT?

rnk accepted this revision.Mar 31 2022, 10:24 AM

Thanks, I'm happy, but please let other reviewers comment.

This revision is now accepted and ready to land.Mar 31 2022, 10:24 AM

I really like these results but I don't feel confident enough about Python to give a final okay.

yln accepted this revision.Apr 1 2022, 2:35 PM

LGTM, with one small comment, thanks!

yln added inline comments.Apr 1 2022, 2:47 PM
llvm/utils/lit/lit/main.py
111

Can we move this function out into a gtest-specific file and make it return new (filtered & changed) lists, leaving the passed-in ones intact? Something like:

selected_tests, discovered_tests = gtest.post_process_results(selected_tests, discovered_tests)

Thanks!

ychen updated this revision to Diff 419871.Apr 1 2022, 3:19 PM
  • move post processing method to class GoogleTest
ychen updated this revision to Diff 419872.Apr 1 2022, 3:20 PM
  • format
ychen marked an inline comment as done.Apr 1 2022, 3:21 PM
ychen added inline comments.
llvm/utils/lit/lit/main.py
111

Yep, done.

yln added inline comments.Apr 1 2022, 3:22 PM
llvm/utils/lit/lit/formats/googletest.py
204

This will still change the passed-in lists, right?

ychen marked an inline comment as done.Apr 1 2022, 3:28 PM
ychen added inline comments.
llvm/utils/lit/lit/formats/googletest.py
204

That's right. It is by reference.

This revision was landed with ongoing or failed builds.Apr 3 2022, 7:47 PM
This revision was automatically updated to reflect the committed changes.
fhahn added a subscriber: fhahn.Apr 4 2022, 2:21 AM

Not sure if it is related, but it looks like the output strings for failed unit tests regressed over the last day; it's now something like Clangd Unit Tests :: ./ClangdTests/24/133 instead of the test name.

thakis added a subscriber: thakis.Apr 4 2022, 5:36 AM

check-llvm has been failing on my bots since this landed: http://45.33.8.238/linux/72775/step_12.txt

Does the error make sense to you?

thakis added a comment.Apr 4 2022, 5:37 AM

(Also fails on my win bot with the same error: http://45.33.8.238/win/55601/step_11.txt

Passes fine on my mac bot, though!)

thakis added a comment.Apr 4 2022, 5:42 AM

I don't see it discussed above: An advantage of the previous scheme is that it guarantees that there are no order dependencies between tests and that each test passes in isolation.

The speedup is nice, but it's also somewhat small relative to total check-llvm / check-clang time. Is that worth sacrificing guaranteed independent tests for?

thakis added a comment.Apr 4 2022, 5:45 AM

(Also fails on my win bot with the same error: http://45.33.8.238/win/55601/step_11.txt

Passes fine on my mac bot, though!)

Aha, I think I know what this is: MLAnalysisTests is an old, stale binary that just happens to still be there due to an old build (these bots do incremental builds). I _think_ this change means that all future stale binaries shouldn't have this problem and that this is a one-time thing, not an ongoing, systemic problem for incremental builds every time a test binary is added or removed. So I'll just manually remove this one stale binary from the bots.

thakis added a comment.Apr 4 2022, 6:48 AM

I don't see it discussed above: An advantage of the previous scheme is that it guarantees that there are no order dependencies between tests and that each test passes in isolation.

The speedup is nice, but it's also somewhat small relative to total check-llvm / check-clang time. Is that worth sacrificing guaranteed independent tests for?

FWIW, I don't see a speedup on Win on my bot.
Before: http://45.33.8.238/win/55580/summary.html
After: http://45.33.8.238/win/55605/summary.html

If anything, check-clang got 30s slower.

On linux, check-clang is 16 seconds faster (out of ~5 min total bot cycle time, so a 5.3% speedup) but none of the other test suites changed:
Before: http://45.33.8.238/linux/72772/summary.html
After: http://45.33.8.238/linux/72807/summary.html

Do you see bigger wins on other bots?

When using result_db output from lit we are seeing

Traceback (most recent call last):
  File "/b/s/w/ir/x/w/staging/llvm_build/./bin/llvm-lit", line 50, in <module>
    main(builtin_parameters)
  File "/b/s/w/ir/x/w/llvm-llvm-project/llvm/utils/lit/lit/main.py", line 121, in main
    report.write_results(tests_for_report, elapsed)
  File "/b/s/w/ir/x/w/llvm-llvm-project/llvm/utils/lit/lit/reports.py", line 208, in write_results
    gen_resultdb_test_entry(
  File "/b/s/w/ir/x/w/llvm-llvm-project/llvm/utils/lit/lit/reports.py", line 163, in gen_resultdb_test_entry
    'start_time': datetime.datetime.fromtimestamp(start_time).isoformat() + 'Z',
TypeError: an integer is required (got type NoneType)

https://github.com/llvm/llvm-project/blob/main/llvm/utils/lit/lit/reports.py#L163

ychen added a comment.Apr 4 2022, 10:17 AM

Not sure if it is related, but it looks like the output strings for failed unit tests regressed over the last day; it's now something like Clangd Unit Tests :: ./ClangdTests/24/133 instead of the test name.

That's expected. 24 is the shard index; 133 is the total number of shards. The failure message prints the failed test name and the reproducer script for both the shard and the individual test. If anything is missing, please let me know.

ychen added a comment.Apr 4 2022, 10:30 AM

I don't see it discussed above: An advantage of the previous scheme is that it guarantees that there are no order dependencies between tests and that each test passes in isolation.

The speedup is nice, but it's also somewhat small relative to total check-llvm / check-clang time. Is that worth sacrificing guaranteed independent tests for?

FWIW, I don't see a speedup on Win on my bot.
Before: http://45.33.8.238/win/55580/summary.html
After: http://45.33.8.238/win/55605/summary.html

If anything, check-clang got 30s slower.

Probably there is some one-time cost involved?

http://45.33.8.238/win/55614/summary.html did not regress which is expected since these runs do not include unit tests.

On linux, check-clang is 16 seconds faster (out of ~5 min total bot cycle time, so a 5.3% speedup) but none of the other test suites changed:
Before: http://45.33.8.238/linux/72772/summary.html
After: http://45.33.8.238/linux/72807/summary.html

Do you see bigger wins on other bots?

Yeah, like
before (~270) https://lab.llvm.org/buildbot/#/builders/216/builds/2323/steps/7/logs/stdio
after (~240) https://lab.llvm.org/buildbot/#/builders/216/builds/2356/steps/7/logs/stdio

The time fluctuates to varying degrees across bots. Bot 216 is relatively stable, and the speedup is what I would expect for a Windows machine.

ychen added a comment.Apr 4 2022, 10:33 AM

When using result_db output from lit we are seeing

Traceback (most recent call last):
  File "/b/s/w/ir/x/w/staging/llvm_build/./bin/llvm-lit", line 50, in <module>
    main(builtin_parameters)
  File "/b/s/w/ir/x/w/llvm-llvm-project/llvm/utils/lit/lit/main.py", line 121, in main
    report.write_results(tests_for_report, elapsed)
  File "/b/s/w/ir/x/w/llvm-llvm-project/llvm/utils/lit/lit/reports.py", line 208, in write_results
    gen_resultdb_test_entry(
  File "/b/s/w/ir/x/w/llvm-llvm-project/llvm/utils/lit/lit/reports.py", line 163, in gen_resultdb_test_entry
    'start_time': datetime.datetime.fromtimestamp(start_time).isoformat() + 'Z',
TypeError: an integer is required (got type NoneType)

https://github.com/llvm/llvm-project/blob/main/llvm/utils/lit/lit/reports.py#L163

I see you've reverted the patch. Thanks. It seems some output formats depend on start_time which is not synthesized for the shards, and strangely the lit tests for these formats did not trigger. I'll fix it and reland.

I see you've reverted the patch. Thanks. It seems some output formats depend on start_time which is not synthesized for the shards, and strangely the lit tests for these formats did not trigger. I'll fix it and reland.

Thanks! Sorry about that. It just keeps our bots red and possibly masks other problems

ychen added a comment.Apr 4 2022, 10:40 AM

All x86 tests passed but non-x86 tests failed with

Traceback (most recent call last):
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm_build0/./bin/llvm-lit", line 45, in <module>
    main(builtin_parameters)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/main.py", line 45, in main
    opts.indirectlyRunCheck)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/discovery.py", line 279, in find_tests_for_inputs
    local_config_cache, indirectlyRunCheck)[1])
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/discovery.py", line 200, in getTestsInSuite
    litConfig, lc):
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 50, in getTestsInDirectory
    num_tests = self.get_num_tests(execpath, localConfig)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 32, in get_num_tests
    out, _, exitCode = lit.util.executeCommand(cmd, env=localConfig.environment)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/util.py", line 342, in executeCommand
    env=env, close_fds=kUseCloseFDs)
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_aarch64_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-aarch64-Test'

I suspect the bot has some setup issue. How do I find out who is the bot owner?

yln added inline comments.Apr 4 2022, 10:50 AM
llvm/utils/lit/lit/formats/googletest.py
204

I meant: please change this function so it builds new lists and leaves the passed-in ones unchanged. Thanks!

yln reopened this revision.Apr 4 2022, 10:52 AM
This revision is now accepted and ready to land.Apr 4 2022, 10:52 AM

All x86 tests passed but non-x86 tests failed with

Traceback (most recent call last):
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm_build0/./bin/llvm-lit", line 45, in <module>
    main(builtin_parameters)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/main.py", line 45, in main
    opts.indirectlyRunCheck)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/discovery.py", line 279, in find_tests_for_inputs
    local_config_cache, indirectlyRunCheck)[1])
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/discovery.py", line 200, in getTestsInSuite
    litConfig, lc):
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 50, in getTestsInDirectory
    num_tests = self.get_num_tests(execpath, localConfig)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 32, in get_num_tests
    out, _, exitCode = lit.util.executeCommand(cmd, env=localConfig.environment)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/util.py", line 342, in executeCommand
    env=env, close_fds=kUseCloseFDs)
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_aarch64_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-aarch64-Test'

I suspect the bot has some setup issue. How do I find out who is the bot owner?

new code probably missed COMPILER_RT_EMULATOR

ychen added a comment.Apr 4 2022, 1:29 PM

All x86 tests passed but non-x86 tests failed with

Traceback (most recent call last):
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm_build0/./bin/llvm-lit", line 45, in <module>
    main(builtin_parameters)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/main.py", line 45, in main
    opts.indirectlyRunCheck)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/discovery.py", line 279, in find_tests_for_inputs
    local_config_cache, indirectlyRunCheck)[1])
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/discovery.py", line 200, in getTestsInSuite
    litConfig, lc):
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 50, in getTestsInDirectory
    num_tests = self.get_num_tests(execpath, localConfig)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/formats/googletest.py", line 32, in get_num_tests
    out, _, exitCode = lit.util.executeCommand(cmd, env=localConfig.environment)
  File "/b/sanitizer-x86_64-linux-qemu/build/llvm-project/llvm/utils/lit/lit/util.py", line 342, in executeCommand
    env=env, close_fds=kUseCloseFDs)
  File "/usr/lib/python3.7/subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/subprocess.py", line 1522, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: '/b/sanitizer-x86_64-linux-qemu/build/llvm_build2_aarch64_qemu/lib/scudo/standalone/tests/./ScudoUnitTest-aarch64-Test'

I suspect the bot has some setup issue. How do I find out who is the bot owner?

new code probably missed COMPILER_RT_EMULATOR

Thanks @vitalybuka. I'll update the patch here.

vitalybuka added inline comments.Apr 4 2022, 1:34 PM
llvm/utils/lit/lit/formats/googletest.py
38

prepareCmd is gone?

ychen added inline comments.Apr 4 2022, 1:36 PM
llvm/utils/lit/lit/formats/googletest.py
38

yep. I'll put it back with a test.

thakis added a comment.Apr 4 2022, 4:58 PM

http://45.33.8.238/win/55614/summary.html did not regress which is expected since these runs do not include unit tests.

check-llvm includes llvm's unit tests, check-lld includes lld's unit tests, check-clang includes clang's unit tests. That build does include unit tests.

If you compare the stdout:
http://45.33.8.238/linux/72772/step_7.txt
http://45.33.8.238/linux/72807/step_7.txt

then one says "30362 tests" while the other says "16250 tests" (which I think is expected since we now count number of unit test processes instead of number of tests?), so it does seem to have an effect. It's just not (much) faster.

before (~270) https://lab.llvm.org/buildbot/#/builders/216/builds/2323/steps/7/logs/stdio
after (~240) https://lab.llvm.org/buildbot/#/builders/216/builds/2356/steps/7/logs/stdio

The time fluctuates to various degrees for bots. bot 216 is relatively stable and the speedup is what I would expect for a windows machine.

Even on that bot, it's a 30s improvement with a 9-15 min cycle time, so a 3-5% speedup. Is that worth giving up the property that we're guaranteed to have no ordering dependencies between tests?

ychen added a comment.Apr 4 2022, 5:11 PM

http://45.33.8.238/win/55614/summary.html did not regress which is expected since these runs do not include unit tests.

check-llvm includes llvm's unit tests, check-lld includes lld's unit tests, check-clang includes clang's unit tests. That build does include unit tests.

If you compare the stdout:
http://45.33.8.238/linux/72772/step_7.txt
http://45.33.8.238/linux/72807/step_7.txt

then one says "30362 tests" while the other says "16250 tests" (which I think is expected since we now count number of unit test processes instead of number of tests?), so it does seem to have an effect. It's just not (much) faster.

before (~270) https://lab.llvm.org/buildbot/#/builders/216/builds/2323/steps/7/logs/stdio
after (~240) https://lab.llvm.org/buildbot/#/builders/216/builds/2356/steps/7/logs/stdio

The time fluctuates to various degrees for bots. bot 216 is relatively stable and the speedup is what I would expect for a windows machine.

Even on that bot, it's a 30s improvement with a 9-15 min cycle time, so a 3-5% speedup. Is that worth giving up the property that we're guaranteed to have no ordering dependencies between tests?

https://github.com/google/googletest/blob/main/docs/advanced.md says "Remember that the test order is undefined, so your code can't depend on a test preceding or following another." And https://github.com/google/googletest/blob/main/docs/advanced.md#shuffling-the-tests seems to suggest random order is the default. If we're paranoid, I could pass GTEST_SHUFFLE all the time. WDYT?
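A sketch of what that wiring could look like; the helper is hypothetical, but GTEST_SHUFFLE and GTEST_RANDOM_SEED are real gtest environment variables, and recording the seed is what lets a shuffled failure be replayed deterministically.

import random


def shuffle_env(order):
    # Map lit's ordering choice onto gtest's shuffle env vars; keep the seed
    # so a failing shard can be rerun in exactly the same order.
    env = {}
    if order == "random":
        env["GTEST_SHUFFLE"] = "1"
        env["GTEST_RANDOM_SEED"] = str(random.randint(1, 99999))
    else:
        env["GTEST_SHUFFLE"] = "0"
    return env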

ychen added a comment.Apr 4 2022, 5:18 PM

http://45.33.8.238/win/55614/summary.html did not regress which is expected since these runs do not include unit tests.

check-llvm includes llvm's unit tests, check-lld includes lld's unit tests, check-clang includes clang's unit tests. That build does include unit tests.

If you compare the stdout:
http://45.33.8.238/linux/72772/step_7.txt
http://45.33.8.238/linux/72807/step_7.txt

Agreed that it does not help much for this bot. It is pretty fast already. TBH, I think the patch helps local development more than CI, especially for people working on Windows.

then one says "30362 tests" while the other says "16250 tests" (which I think is expected since we now count number of unit test processes instead of number of tests?), so it does seem to have an effect. It's just not (much) faster.

before (~270) https://lab.llvm.org/buildbot/#/builders/216/builds/2323/steps/7/logs/stdio
after (~240) https://lab.llvm.org/buildbot/#/builders/216/builds/2356/steps/7/logs/stdio

The time fluctuates to various degrees for bots. bot 216 is relatively stable and the speedup is what I would expect for a windows machine.

Even on that bot, it's a 30s improvement with a 9-15 min cycle time, so a 3-5% speedup. Is that worth giving up the property that we're guaranteed to have no ordering dependencies between tests?

ychen updated this revision to Diff 420681.Apr 5 2022, 6:42 PM
ychen marked 2 inline comments as done.

Thanks, everyone for the valuable feedback. I believe I addressed all of them in the updated patch.

  • @abrachet: make sure to synthesize result.pid and result.start, which are used by --resultdb-output and --time-trace-output (with a test case).
  • @vitalybuka: make sure prepareCmd is used during the test discovery phase (added a test case for the GTest wrapper command).
  • @thakis: hook up lit --order=random to GTEST_SHUFFLE=1 and, when a shard fails, include GTEST_RANDOM_SEED in the reproducer script (this is checked in the updated test).
  • @yln: compose new lists instead of modifying existing lists.
  • formatting
ychen requested review of this revision.Apr 5 2022, 6:45 PM
abrachet accepted this revision.Apr 6 2022, 10:14 AM

Thanks, everyone for the valuable feedback. I believe I addressed all of them in the updated patch.

  • @abrachet: make sure to synthesize result.pid and result.start, which are used by --resultdb-output and --time-trace-output (with a test case).

I checked this out locally and it works for --resultdb-output. Thanks. Hopefully others can say if the fixes worked for them too before resubmitting.

This revision is now accepted and ready to land.Apr 6 2022, 10:14 AM
ormris added a subscriber: ormris.Apr 11 2022, 10:29 AM
This revision was landed with ongoing or failed builds.Apr 12 2022, 2:51 PM
This revision was automatically updated to reflect the committed changes.

I think this change broke libc++ tests. _executeScriptInternal in libcxx/utils/libcxx/test/dsl.py does not pass an argument for LitConfig's new order parameter:

rprichard@cashew:/x/llvm-upstream/out$ cmake -GNinja -S ../llvm-project/runtimes -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" -DLIBCXXABI_USE_LLVM_UNWINDER=ON
...
rprichard@cashew:/x/llvm-upstream/out$ ninja
...
rprichard@cashew:/x/llvm-upstream/out$ ninja check-cxx
[0/1] Running libcxx tests
llvm-lit: /x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/newconfig.py:23: note: Applied 'add Lit feature target=x86_64-unknown-linux-gnu' as a result of parameter 'target_triple=x86_64-unknown-linux-gnu'
llvm-lit: /x/llvm-upstream/llvm-project/llvm/utils/lit/lit/TestingConfig.py:103: fatal: unable to parse config file '/x/llvm-upstream/out/libcxx/test/lit.site.cfg', traceback: Traceback (most recent call last):
  File "/x/llvm-upstream/llvm-project/llvm/utils/lit/lit/TestingConfig.py", line 92, in load_from_path
    exec(compile(data, path, 'exec'), cfg_globals, None)
  File "/x/llvm-upstream/out/libcxx/test/lit.site.cfg", line 21, in <module>
    libcxx.test.newconfig.configure(
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/newconfig.py", line 22, in configure
    action.applyTo(config)
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 400, in applyTo
    if hasCompileFlag(config, flag):
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 64, in f
    cache[cacheKey] = function(config, *args, **kwargs)
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 203, in hasCompileFlag
    out, err, exitCode, timeoutInfo = _executeScriptInternal(test, [
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 80, in _executeScriptInternal
    litConfig = lit.LitConfig.LitConfig(
TypeError: __init__() missing 1 required positional argument: 'order'

FAILED: libcxx/test/CMakeFiles/check-cxx /x/llvm-upstream/out/libcxx/test/CMakeFiles/check-cxx 
cd /x/llvm-upstream/out/libcxx/test && /usr/bin/python3.9 /x/llvm-upstream/out/bin/llvm-lit -sv --show-xfail --show-unsupported /x/llvm-upstream/out/libcxx/test
ninja: build stopped: subcommand failed.
ychen added a comment.Apr 12 2022, 3:43 PM

I think this change broke libc++ tests. _executeScriptInternal in libcxx/utils/libcxx/test/dsl.py does not pass an argument for LitConfig's new order parameter:

rprichard@cashew:/x/llvm-upstream/out$ cmake -GNinja -S ../llvm-project/runtimes -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi;libunwind" -DLIBCXXABI_USE_LLVM_UNWINDER=ON
...
rprichard@cashew:/x/llvm-upstream/out$ ninja
...
rprichard@cashew:/x/llvm-upstream/out$ ninja check-cxx
[0/1] Running libcxx tests
llvm-lit: /x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/newconfig.py:23: note: Applied 'add Lit feature target=x86_64-unknown-linux-gnu' as a result of parameter 'target_triple=x86_64-unknown-linux-gnu'
llvm-lit: /x/llvm-upstream/llvm-project/llvm/utils/lit/lit/TestingConfig.py:103: fatal: unable to parse config file '/x/llvm-upstream/out/libcxx/test/lit.site.cfg', traceback: Traceback (most recent call last):
  File "/x/llvm-upstream/llvm-project/llvm/utils/lit/lit/TestingConfig.py", line 92, in load_from_path
    exec(compile(data, path, 'exec'), cfg_globals, None)
  File "/x/llvm-upstream/out/libcxx/test/lit.site.cfg", line 21, in <module>
    libcxx.test.newconfig.configure(
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/newconfig.py", line 22, in configure
    action.applyTo(config)
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 400, in applyTo
    if hasCompileFlag(config, flag):
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 64, in f
    cache[cacheKey] = function(config, *args, **kwargs)
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 203, in hasCompileFlag
    out, err, exitCode, timeoutInfo = _executeScriptInternal(test, [
  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 80, in _executeScriptInternal
    litConfig = lit.LitConfig.LitConfig(
TypeError: __init__() missing 1 required positional argument: 'order'

FAILED: libcxx/test/CMakeFiles/check-cxx /x/llvm-upstream/out/libcxx/test/CMakeFiles/check-cxx 
cd /x/llvm-upstream/out/libcxx/test && /usr/bin/python3.9 /x/llvm-upstream/out/bin/llvm-lit -sv --show-xfail --show-unsupported /x/llvm-upstream/out/libcxx/test
ninja: build stopped: subcommand failed.

Fixed in 81b51b61f849fa91bdf5d069591

  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 80, in _executeScriptInternal
    litConfig = lit.LitConfig.LitConfig(
TypeError: __init__() missing 1 required positional argument: 'order'

Fixed in 81b51b61f849fa91bdf5d069591

Thanks, looks fixed to me.

  File "/x/llvm-upstream/llvm-project/libcxx/utils/libcxx/test/dsl.py", line 80, in _executeScriptInternal
    litConfig = lit.LitConfig.LitConfig(
TypeError: __init__() missing 1 required positional argument: 'order'

Fixed in 81b51b61f849fa91bdf5d069591

Thanks, looks fixed to me.

Yup, thanks. And sorry @ychen for this piece of tech debt that we carry around in libc++. I'll try removing it in the future when/if I get some time.

vitalybuka added inline comments.Apr 19 2022, 7:10 PM
llvm/utils/lit/lit/formats/googletest.py
114

Why does it use env variables when gtest has command line flags for that? Flags seem more portable and easier to use with emulators.

131

'Script' used to be a convenient one-liner which could be copy/pasted into a shell for debugging; now it's not.
Also, GTEST_OUTPUT and GTEST_COLOR are not needed for that purpose.

ychen added inline comments.Apr 19 2022, 8:53 PM
llvm/utils/lit/lit/formats/googletest.py
131

GTEST_SHARD_INDEX/GTEST_TOTAL_SHARDS only have env var versions. To be consistent, I made them all env vars. This script is for the failing shard (there is another one for the individual tests down below). If an error can only be reproduced this way, there are hidden dependencies among tests that are not expected. In practice, I think this script is used less than the one below.

One option is to output them in one line. At least this is user-friendly on Linux. Does that sound good to you?
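A sketch of how the one-line form could be composed (illustrative helper; which variables to keep is exactly what is being discussed here):

def one_line_repro(env_vars, executable):
    # Fold the per-shard environment into a single copy/paste-able shell
    # line instead of printing one variable per line.
    assignments = " ".join("%s=%s" % (k, v) for k, v in sorted(env_vars.items()))
    return "%s %s" % (assignments, executable)

# e.g. "GTEST_SHARD_INDEX=69 GTEST_TOTAL_SHARDS=83 ./ToolingTests"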

vitalybuka added inline comments.Apr 19 2022, 10:57 PM
llvm/utils/lit/lit/formats/googletest.py
131

I am sorry, you are correct, those only have env var versions.
A one-liner would be nice.
We should probably drop GTEST_COLOR, GTEST_SHUFFLE and GTEST_OUTPUT in that case to make it shorter.

After this change, failure details from unittests are no longer printed, either locally or on bots: https://lab.llvm.org/buildbot/#/builders/109/builds/37071
(Assuming this is a bug that can be fixed)

Also the summary of "failed tests" at the end is now not useful:

Failed Tests (6):
  Clangd Unit Tests :: ./ClangdTests/114/134
  Clangd Unit Tests :: ./ClangdTests/115/134
  Clangd Unit Tests :: ./ClangdTests/128/134
  Clangd Unit Tests :: ./ClangdTests/42/134
  Clangd Unit Tests :: ./ClangdTests/78/134
  Clangd Unit Tests :: ./ClangdTests/87/134

This is critical information (e.g. to pick the simplest test to start debugging) and having to scan through the test outputs to try to find the actual failing test names is a large regression.
Is there a plan to address this?

After this change, failure details from unittests are no longer printed, either locally or on bots: https://lab.llvm.org/buildbot/#/builders/109/builds/37071
(Assuming this is a bug that can be fixed)

Having read the change more carefully, it sounds like it's *expected* that a crashing unittest no longer prints a stack trace on stdout.

I think this is an unacceptable regression. Crashes/asserts are a common way that tests fail and the stack trace from a debug build is how we investigate them.
The rerun commands are hard to find, require manually rebuilding to iterate on, and don't filter output to the test in question as lit did (prior to this change).
Moreover, it's not possible to manually rerun the test on a bot, and many times the crash trace/assertion messages from buildbots have been *critical* to debugging failures that only occur in certain buildbot configurations (e.g. platforms I don't have access to).
Finally, some crashes are rare/nondeterministic, and so rerunning is not always a good option.

I'm not familiar enough with this part of lit to suggest solutions here.
But can this be reverted or made an off-by-default option until it can provide equivalent information & convenience?

After this change, failure details from unittests are no longer printed, either locally or on bots: https://lab.llvm.org/buildbot/#/builders/109/builds/37071
(Assuming this is a bug that can be fixed)

Having read the change more carefully, it sounds like it's *expected* that a crashing unittest no longer prints a stack trace on stdout.

I think this is an unacceptable regression. Crashes/asserts are a common way that tests fail and the stack trace from a debug build is how we investigate them.
The rerun commands are hard to find, require manually rebuilding to iterate on, and don't filter output to the test in question as lit did (prior to this change).
Moreover, it's not possible to manually rerun the test on a bot, and many times the crash trace/assertion messages from buildbots have been *critical* to debugging failures that only occur in certain buildbot configurations (e.g. platforms I don't have access to).
Finally, some crashes are rare/nondeterministic, and so rerunning is not always a good option.

I'm not familiar enough with this part of lit to suggest solutions here.
But can this be reverted or made an off-by-default option until it can provide equivalent information & convenience?

Thanks for the feedback. I think there is a way to address this. Give me a day or two.

I'm not familiar enough with this part of lit to suggest solutions here.
But can this be reverted or made an off-by-default option until it can provide equivalent information & convenience?

Thanks for the feedback. I think there is a way to address this. Give me a day or two.

Thanks for looking into it.
And the test speed improvement is impressive and fantastic, thanks for working on it!

But can this be reverted or made an off-by-default option until it can provide equivalent information & convenience?

I appreciate the speed improvement on Windows, but I'd like to second @sammccall's request.
Currently I'm intermittently getting the output below from my CI, and it's too opaque.

Script(shard):
--
GTEST_COLOR=no
GTEST_SHUFFLE=0
GTEST_TOTAL_SHARDS=83
GTEST_SHARD_INDEX=69
GTEST_OUTPUT=json:/path/to/clang/unittests/Tooling/./ToolingTests-Clang-Unit-54664-69-83.json
/path/to/clang/unittests/Tooling/./ToolingTests
--
shard JSON output does not exist: /path/to/clang/unittests/Tooling/./ToolingTests-Clang-Unit-54664-69-83.json

Something, somewhere is sometimes, somehow causing a crash but the information the test system provides is not helpful to figuring out what that is.
Given the choice between faster opaque tests and slower, more helpful ones I'd like the slower ones back please.

But can this be reverted or made an off-by-default option until it can provide equivalent information & convenience?

I appreciate the speed improvement on Windows, but I'd like to second @sammccall's request.
Currently I'm intermittently getting the output below from my CI, and it's too opaque.

Script(shard):
--
GTEST_COLOR=no
GTEST_SHUFFLE=0
GTEST_TOTAL_SHARDS=83
GTEST_SHARD_INDEX=69
GTEST_OUTPUT=json:/path/to/clang/unittests/Tooling/./ToolingTests-Clang-Unit-54664-69-83.json
/path/to/clang/unittests/Tooling/./ToolingTests
--
shard JSON output does not exist: /path/to/clang/unittests/Tooling/./ToolingTests-Clang-Unit-54664-69-83.json

Something, somewhere is sometimes, somehow causing a crash but the information the test system provides is not helpful to figuring out what that is.
Given the choice between faster opaque tests and slower, more helpful ones I'd like the slower ones back please.

The patch to address this has just been committed (d3efa577f549760a42ba68bda95d1e). I'm still waiting for the CI results. Looks good so far. :-)

That's working perfectly for me. Great improvement, thank you very much!

This has changed test totals reported on the buildbot dashboard, so I would be interested in a comment on that.

As others have mentioned, some tests are now reported as a kind of summary. For example I used to get 1332 lines of "ADTTests" like:

LLVM-Unit :: ADT/./ADTTests/AnyTest.BadAnyCast

and now get three lines in total:

LLVM-Unit :: ADT/./ADTTests/0/3
LLVM-Unit :: ADT/./ADTTests/1/3
LLVM-Unit :: ADT/./ADTTests/2/3
(3-core VM)

Before this change, the XCore buildbot totals on the buildbot staging dashboard matched the totals at the end of the "check-all" log:

https://lab.llvm.org/staging/#/builders/145/builds/5438

But after this change, the dashboard totals do not match the totals at the end of the log. The passed total is down by 20,000 and "skipped" (skipped Google unit tests) is not reported at all:

https://lab.llvm.org/staging/#/builders/145/builds/5439

I think the buildbot dashboard report counts log lines, rather than just displaying the totals at the end of the log.

A green buildbot with the change is clang-ppc64le-linux-multistage on main buildbot master:

https://lab.llvm.org/buildbot/#/builders/121/builds/18347
https://lab.llvm.org/buildbot/#/builders/121/builds/18350

Again the number of "expected passes" does not match the end of the log, "skipped" is not on the dashboard, and the log has numbers instead of test names.

(I tried the follow-up commit but it does not give test names of passing tests.)

Is this all intended and/or expected? What do you think about the buildbot totals?

Thanks,
Nigel

ychen added a comment.May 4 2022, 10:20 AM

This has changed test totals reported on the buildbot dashboard, so I would be interested in a comment on that.

As others have mentioned, some tests are now reported as a kind of summary. For example I used to get 1332 lines of "ADTTests" like:

LLVM-Unit :: ADT/./ADTTests/AnyTest.BadAnyCast

and now get three lines in total:

LLVM-Unit :: ADT/./ADTTests/0/3
LLVM-Unit :: ADT/./ADTTests/1/3
LLVM-Unit :: ADT/./ADTTests/2/3
(3-core VM)

Before this change, the XCore buildbot totals on the buildbot staging dashboard matched the totals at the end of the "check-all" log:

https://lab.llvm.org/staging/#/builders/145/builds/5438

But after this change, the dashboard totals do not match the totals at the end of the log. The passed total is down by 20,000 and "skipped" (skipped Google unit tests) is not reported at all:

https://lab.llvm.org/staging/#/builders/145/builds/5439

I think the buildbot dashboard report counts log lines, rather than just displaying the totals at the end of the log.

A green buildbot with the change is clang-ppc64le-linux-multistage on main buildbot master:

https://lab.llvm.org/buildbot/#/builders/121/builds/18347
https://lab.llvm.org/buildbot/#/builders/121/builds/18350

Again the number of "expected passes" does not match the end of the log, "skipped" is not on the dashboard, and the log has numbers instead of test names.

(I tried the follow-up commit but it does not give test names of passing tests.)

Is this all intended and/or expected? What do you think about the buildbot totals?

Thanks,
Nigel

This is all intended. The buildbot totals are counting the number of shards across all unit tests. The summary at the end of the log is counting the number of test cases, like before. I think it is possible to let the buildbot show the summary, but I'd like to understand the use case before doing that. Are you blocked in some way due to the mismatch?

The buildbot totals are counting the number of shards across all unit tests. The summary at the end of the log is counting the number of test cases like before.

lit used to report (or include in the report) the number of individual test functions across all the unittests. Now I believe it is reporting the number of shards, instead? I should think the code in lit that parses googletest output should be able to extract the number of tests executed by each shard (and if not, it could be upgraded to do so). That would recover the number that people have been expecting to see (both directly from lit, from buildbots, and possibly from metrics recorded elsewhere).

I wouldn't think it was a big deal, except that total number is in the tens of thousands. I'm sure it is very surprising to people to have that huge drop.
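For context, the JSON report each shard writes already carries its totals, so a hedged sketch of the extraction being suggested might look like this; the "tests" field follows gtest's JSON report schema, while the helper itself is illustrative.

import json


def count_tests_in_shard(json_path):
    # Read the GTEST_OUTPUT=json report a shard wrote and return how many
    # individual test functions it actually ran.
    with open(json_path) as f:
        report = json.load(f)
    return report.get("tests", 0)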

ychen added a comment.May 4 2022, 11:47 AM

The buildbot totals are counting the number of shards across all unit tests. The summary at the end of the log is counting the number of test cases like before.

lit used to report (or include in the report) the number of individual test functions across all the unittests. Now I believe it is reporting the number of shards, instead? I should think the code in lit that parses googletest output should be able to extract the number of tests executed by each shard (and if not, it could be upgraded to do so). That would recover the number that people have been expecting to see (both directly from lit, from buildbots, and possibly from metrics recorded elsewhere).

Yeah, I attempted this before posting the patch. It is not trivial. There is one shared results vector across all jobs in the worker pool, and the lit pool is multiprocess rather than multithreaded. Extracting the results inside a shard earlier would incur nontrivial IPC overhead, and from my initial attempt the required source changes are pretty intrusive, so I ended up deferring the shard result extraction until the test reporting phase (which is after the pool execution is done).

Are you blocked in some way due to the mismatch?

No, not blocked at all.

thakis added a comment.EditedMay 20 2022, 6:08 PM

Another effect of this: If a unit test hangs forever, killing the process now no longer prints the hanging test name in the results. (Example: http://45.33.8.238/linux/76575/step_9.txt , where clangd unit tests ran for a very long time, until I killed that process.) I'm guessing it's https://github.com/llvm/llvm-project/issues/48455 since we have this known, rare, hang in those tests -- but if I didn't know about that issue, I'd be stumped.

What's the recommendation for finding out which test a test process is hanging in now?

ychen added a comment.May 20 2022, 6:17 PM

Another effect of this: If a unit test hangs forever, killing the process now no longer prints the hanging test name in the results. (Example: http://45.33.8.238/linux/76575/step_9.txt , where clangd unit tests ran for a very long time, until I killed that process.) I'm guessing it's https://github.com/llvm/llvm-project/issues/48455 since we have this known, rare, hang in those tests -- but if I didn't know about that issue, I'd be stumped.

What's the recommendation for finding out which test a test process is hanging in now?

The last test is TUSchedulerTests.Cancellation in the shard. Do you mean it is not TUSchedulerTests.Cancellation that hangs?

thakis added a comment.EditedMay 23 2022, 6:46 AM

The last test is TUSchedulerTests.Cancellation in the shard. Do you mean it is not TUSchedulerTests.Cancellation that hangs?

I'm just bad at reading. Thanks, that _is_ easy to see :) Sorry for the noise.

Not sure when it started, but it is likely related to sharding.

ninja check-all exits with 0 if only sharded tests fail.

LIT_FILTER=Unit ninja check-all || echo 1

FAIL: LLVM-Unit :: Analysis/./AnalysisTests/29/157 (1459 of 6195)
******************** TEST 'LLVM-Unit :: Analysis/./AnalysisTests/29/157' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/usr/local/google/home/vitalybuka/tmp/bot/llvm_build_asan/unittests/Analysis/./AnalysisTests-LLVM-Unit-1704385-29-157.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=157 GTEST_SHARD_INDEX=29 /usr/local/google/home/vitalybuka/tmp/bot/llvm_build_asan/unittests/Analysis/./AnalysisTests
--


********************

Testing Time: 21.39s
  Excluded   : 62250
  Skipped    :    38
  Unsupported:     4
  Passed     : 24625
vvv:~/tmp/bot/llvm_build_asan$

This is the stage2/asan check of https://lab.llvm.org/buildbot/#/builders/5/builds/26144.
The bot has been broken for a while, but we didn't see that.

Also, there is no output for the failure; previously we had something like:

GTEST_OUTPUT=json:/usr/local/google/home/vitalybuka/tmp/bot/llvm_build_asan/unittests/Analysis/./AnalysisTests-LLVM-Unit-1704385-29-157.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=157 GTEST_SHARD_INDEX=29 /usr/local/google/home/vitalybuka/tmp/bot/llvm_build_asan/unittests/Analysis/./AnalysisTests
Note: This is test shard 30 of 157.
[==========] Running 4 tests from 4 test suites.
[----------] Global test environment set-up.
[----------] 1 test from IsPotentiallyReachableTest
[ RUN      ] IsPotentiallyReachableTest.SimpleLoop1
[       OK ] IsPotentiallyReachableTest.SimpleLoop1 (0 ms)
[----------] 1 test from IsPotentiallyReachableTest (0 ms total)

[----------] 1 test from IRSimilarityIdentifier
[ RUN      ] IRSimilarityIdentifier.IntrinsicCommutative
[       OK ] IRSimilarityIdentifier.IntrinsicCommutative (1 ms)
[----------] 1 test from IRSimilarityIdentifier (1 ms total)

[----------] 1 test from ScalarEvolutionsTest
[ RUN      ] ScalarEvolutionsTest.ComputeMaxTripCountFromMultiDemArray
[       OK ] ScalarEvolutionsTest.ComputeMaxTripCountFromMultiDemArray (1 ms)
[----------] 1 test from ScalarEvolutionsTest (1 ms total)

[----------] 1 test from ComputeKnownBitsTest
[ RUN      ] ComputeKnownBitsTest.ComputeKnownBitsAddWithRangeNoOverlap
[       OK ] ComputeKnownBitsTest.ComputeKnownBitsAddWithRangeNoOverlap (0 ms)
[----------] 1 test from ComputeKnownBitsTest (0 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 4 test suites ran. (3 ms total)
[  PASSED  ] 4 tests.

=================================================================
==1767424==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 80 byte(s) in 1 object(s) allocated from:
    #0 0xc9f0b7 in operator new(unsigned long) /usr/local/google/home/vitalybuka/tmp/bot/llvm-project/compiler-rt/lib/asan/asan_new_delete.cpp:95:3
    #1 0xd1e63f in (anonymous namespace)::IsPotentiallyReachableTest::ExpectPath(bool)::IsPotentiallyReachableTestPass::initialize() /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/unittests/Analysis/CFGTest.cpp:80:25
    #2 0xd1c5df in (anonymous namespace)::IsPotentiallyReachableTest::ExpectPath(bool) /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/unittests/Analysis/CFGTest.cpp:117:29
    #3 0x230b53c in HandleExceptionsInMethodIfSupported<testing::Test, void> /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #4 0x230b53c in testing::Test::Run() /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2508:5
    #5 0x230fbab in testing::TestInfo::Run() /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2684:11
    #6 0x231103f in testing::TestSuite::Run() /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2816:28
    #7 0x233d3ea in testing::internal::UnitTestImpl::RunAllTests() /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:5338:44
    #8 0x233be4c in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #9 0x233be4c in testing::UnitTest::Run() /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4925:10
    #10 0x22eea08 in RUN_ALL_TESTS /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2472:46
    #11 0x22eea08 in main /usr/local/google/home/vitalybuka/src/llvm.git/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:55:10
    #12 0x7f5876dd37fc in __libc_start_main csu/../csu/libc-start.c:332:16

SUMMARY: AddressSanitizer: 80 byte(s) leaked in 1 allocation(s).

Also, there is no output for the failure; previously we had something like:

I've also seen this recently when trying to debug a tsan failure: we see that the test shard fails, but no failure details (and so no idea which test *case* to associate with).

https://lab.llvm.org/buildbot/#/builders/131/builds/30528/steps/6/logs/stdio

I also don't know whether it's this patch or a more recent regression, as I haven't had cause to investigate a sanitizer failure in a while.

Also, there is no output for the failure; previously we had something like:

I've also seen this recently when trying to debug a tsan failure: we see that the test shard fails, but no failure details (and so no idea which test *case* to associate with).

https://lab.llvm.org/buildbot/#/builders/131/builds/30528/steps/6/logs/stdio

I also don't know whether it's this patch or a more recent regression, as I haven't had cause to investigate a sanitizer failure in a while.

@vitalybuka Thanks for the link. I'll take a look today.

ychen added a comment.EditedJul 15 2022, 12:33 PM

Also, there is no output for the failure; previously we had something like:

I've also seen this recently when trying to debug a tsan failure: we see that the test shard fails, but no failure details (and so no idea which test *case* to associate with).

https://lab.llvm.org/buildbot/#/builders/131/builds/30528/steps/6/logs/stdio

I also don't know whether it's this patch or a more recent regression, as I haven't had cause to investigate a sanitizer failure in a while.

@vitalybuka Thanks for the link. I'll take a look today.

Should've been fixed by 4cd1c96d375aaaf8b48f0fddc8bd0a6619ff061c

ychen added a comment.EditedJul 15 2022, 5:51 PM

Also, there is no output for the failure; previously we had something like:

I've also seen this recently when trying to debug a tsan failure: we see that the test shard fails, but no failure details (and so no idea which test *case* to associate with).

https://lab.llvm.org/buildbot/#/builders/131/builds/30528/steps/6/logs/stdio

I also don't know whether it's this patch or a more recent regression, as I haven't had cause to investigate a sanitizer failure in a while.

@sammccall Looks like a data race in [ RUN ] TUSchedulerTests.PreambleThrottle (https://lab.llvm.org/buildbot/#/builders/131/builds/30569/steps/6/logs/FAIL__Clangd_Unit_Tests__18).

yln added a comment.Aug 17 2022, 4:31 PM

@ychen Is there a knob to disable the sharding for unit tests?

ychen added a comment.Aug 17 2022, 4:43 PM

@ychen Is there a knob to disable the sharding for unit tests?

Not at the moment. Two approaches are possible: 1. (easy) force a shard size of 1 so tests do not affect each other; 2. (more involved) add a knob to restore the original behavior. I hope we only choose 2 as a last resort.

llvm/utils/lit/tests/googletest-timeout.py