This is an archive of the discontinued LLVM Phabricator instance.

[lit][test-suite] - Allow 1 test to report multiple individual test results
AbandonedPublic

Authored by homerdin on Nov 15 2017, 7:16 AM.

Details

Reviewers
hfinkel
MatzeB
Summary

Hello,

Based on the suggestion from @MatzeB, I've added logic to allow a Result object to contain nested Result objects for reporting microbenchmarks. I have also changed the json output to report these tests individually. This allows tests to report separate results when desired.
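The nested-Result idea can be sketched roughly as below. This is a hypothetical simplification for illustration, not lit's actual Result API; the class and function names (`Result`, `add_micro_result`, `flatten`) are assumptions made for this sketch.

```python
# Hypothetical sketch of a lit-style Result that can hold nested
# per-microbenchmark results; names and shapes are illustrative only.

class Result:
    def __init__(self, code, output="", elapsed=None):
        self.code = code
        self.output = output
        self.elapsed = elapsed
        self.metrics = {}        # e.g. {"exec_time": 14.1476}
        self.micro_results = {}  # micro-test name -> nested Result

    def add_metric(self, name, value):
        self.metrics[name] = value

    def add_micro_result(self, name, result):
        # Attach one sub-benchmark's Result under its own name.
        self.micro_results[name] = result


def flatten(test_name, result):
    """Yield (name, result) pairs: one per micro-test, then the parent.

    Produces names of the form "suite :: path/bench.BM_Foo.test",
    matching the json output shown below.
    """
    base, _sep, _rest = test_name.rpartition(".test")
    for micro_name, micro in result.micro_results.items():
        yield ("%s.%s.test" % (base, micro_name), micro)
    yield (test_name, result)


if __name__ == "__main__":
    parent = Result("PASS")
    parent.add_metric("compile_time", 0.8284)
    micro = Result("PASS")
    micro.add_metric("exec_time", 14.1476)
    parent.add_micro_result("BM_RDTSCP_Cost", micro)
    for name, r in flatten("suite :: XRay/retref-bench.test", parent):
        print(name, r.metrics)
```

Flattening at json-emission time keeps the in-memory representation hierarchical while consumers still see one flat entry per micro-test.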

Do we need to add a way to store the unit of measurement? The exec_time values shown below are the mean cpu_time in usec.
I've altered the microbenchmark in the test-suite to exercise this, and we get the following outputs:

lit cmd line output

-- Testing: 1 tests, 1 threads --
PASS: test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test (1 of 1)
********** TEST 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' RESULTS **********
compile_time: 0.8284 
hash: "daec0d37414da26fdae56f0d9bfa2cc0" 
iterations: 10 
link_time: 0.0587 
**********
*** MICRO-TEST 'BM_RDTSCP_Cost' RESULTS ***
    exec_time:  14.1476
    std_dev:  0.0003
*** MICRO-TEST 'BM_ReturnInstrumentedPatched' RESULTS ***
    exec_time:  12.3221
    std_dev:  0.0002
*** MICRO-TEST 'BM_ReturnInstrumentedPatchedThenUnpatched' RESULTS ***
    exec_time:  2.2820
    std_dev:  0.0001
*** MICRO-TEST 'BM_ReturnInstrumentedPatchedWithLogHandler' RESULTS ***
    exec_time:  42.8986
    std_dev:  0.0008
*** MICRO-TEST 'BM_ReturnInstrumentedUnPatched' RESULTS ***
    exec_time:  3.4562
    std_dev:  0.0050
*** MICRO-TEST 'BM_ReturnNeverInstrumented' RESULTS ***
    exec_time:  1.8256
    std_dev:  0.0001
**********
Testing Time: 43.79s
  Expected Passes    : 1

lit json output

{
  "__version__": [
    0,
    6,
    0
  ],
  "elapsed": 43.78972411155701,
  "tests": [
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 3.45623,
        "std_dev": 0.00501622
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedUnPatched.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 2.28196,
        "std_dev": 6.62184e-05
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedPatchedThenUnpatched.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 42.8986,
        "std_dev": 0.000839534
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedPatchedWithLogHandler.test",
      "output": ""
    },
    { 
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 14.1476,
        "std_dev": 0.000291057
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_RDTSCP_Cost.test",
      "output": ""
    },
    { 
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 1.82558,
        "std_dev": 5.89162e-05
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnNeverInstrumented.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 12.3221,
        "std_dev": 0.000165266
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedPatched.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elapsed": 43.70928406715393,
      "metrics": {
        "compile_time": 0.8284,
        "hash": "daec0d37414da26fdae56f0d9bfa2cc0",
        "iterations": 10,
        "link_time": 0.0587
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test",
      "output": "\n/home/bhomerding/build/test-suite-micro2/MicroBenchmarks/XRay/ReturnReference/retref-bench --benchmark_repetitions=10 --benchmark_format=csv --benchmark_report_aggregates_only=true > /home/bhomerding/build/test-suite-micro2/MicroBenchmarks/XRay/ReturnReference/Output/retref-bench.test.bench.csv"
    }
  ]
}

Diff Detail

Event Timeline

homerdin created this revision. Nov 15 2017, 7:16 AM
MatzeB edited edge metadata. Edited Nov 15 2017, 11:06 AM

I'm surprised that lit lets you sneak a dictionary into the reported metrics.

In general though I'd rather see us change the lit tool so we can report 1 test in the lit output for each sub-benchmark of a microbenchmark executable.

That way we would not end up with two classes of tools: the ones that understand the microbenchmark metric dictionary and the ones that don't. There are some uses outside of LNT around (test-suite/utils/compare.py is getting more popular, and some buildbots look at lit's xunit output).

@MatzeB Thanks for your feedback; it's good to learn what some other use cases are. Sorry about the slow response. I've looked into adding this functionality to the lit tool. It looks like we would have to have worker_run_one_test() return a collection of test objects, or possibly turn it into a generator, and then provide some logic to add the extra tests to the main thread's test list. This would have the side effect of producing more results than test discovery had found, e.g. "PASS: test-suite :: xyz.test (5 of 3)".
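The generator approach described above could look roughly like this. This is a minimal sketch under stated assumptions: the function names (`run_one_test`, `run_all`) and the tuple-based result shape are invented for illustration and do not reflect lit's real worker API.

```python
# Hypothetical sketch: a per-test worker yields one result per
# sub-benchmark instead of returning a single result object.

def run_one_test(test_name, sub_benchmarks):
    """Yield one (name, code) result per sub-benchmark, then the parent."""
    base = test_name[:-len(".test")] if test_name.endswith(".test") else test_name
    for sub in sub_benchmarks:
        yield ("%s.%s.test" % (base, sub), "PASS")
    yield (test_name, "PASS")


def run_all(discovered):
    """Collect results from every discovered test into one flat list."""
    results = []
    for name, subs in discovered:
        # One discovered test can expand into several reported results,
        # which is how "(5 of 3)"-style counts could arise.
        results.extend(run_one_test(name, subs))
    return results


if __name__ == "__main__":
    for code, name in ((c, n) for n, c in
                       run_all([("suite :: bench.test", ["BM_X", "BM_Y"])])):
        print(code, name)
```

The main thread would then extend its test list with the extra results as each generator drains, which is where the mismatch against the discovery count comes from.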

Do you know if the --max-tests option is used, and in what context? (Max number of executables to run, or max number of results desired?)

Does this sound like a reasonable change? Would it be invasive to change test discovery to have tests say how many results they will return, and then make worker threads responsible for slices of the list?

Thanks again.

homerdin updated this revision to Diff 127571. Dec 19 2017, 11:12 AM
homerdin retitled this revision from [RFC][LNT][test-suite] - Allow 1 test to report multiple individual test results to [lit][test-suite] - Allow 1 test to report multiple individual test results.
homerdin edited the summary of this revision. (Show Details)
homerdin updated this revision to Diff 131320. Jan 24 2018, 11:33 AM

@MatzeB I've included the changes needed to the existing microbenchmark support to allow for microbenchmarks to collect individual results. This is primarily a change to the lit Result class to allow nested Result objects.

I don't see any tests being added to lit's test suite. Please add some. This would also help me and others understand what you're trying to do. You're also touching code that isn't part of lit. This should probably be split into two patches: one for the changes to lit and one for the code that actually uses those changes.

homerdin abandoned this revision. Feb 14 2018, 2:46 PM

I am abandoning this revision and splitting it up into smaller ones.

Changes to lit to support microbenchmarks:
https://reviews.llvm.org/D43314

Changes to microbenchmark litsupport plugin and XRay microbenchmarks:
https://reviews.llvm.org/D43316

Also a link to the LCALS (Livermore Compiler Analysis Loop Suite) revision I wish to add using these changes:
https://reviews.llvm.org/D43319