This is an archive of the discontinued LLVM Phabricator instance.

[XRay] [test-suite] Add LNT support to retref-bench benchmarks.
Abandoned · Public

Authored by eizan on Sep 3 2017, 4:05 AM.

Details

Reviewers
dberris
MatzeB
Summary

This might be quite a nasty hack, so I'd like to solicit some feedback
on the best course to take with it. I've symlinked the benchmark lib
and XRay microbenchmark sources into LNTBased/XRay. When the test is run, cmake
is invoked and the retref-bench binary is built. When that binary runs, its
output is parsed and the appropriate TestSamples objects are returned.

This is to accomplish a TODO item in https://reviews.llvm.org/D32272#794759 :

"In the future teach LNT to understand the Google Benchmark output."

Event Timeline

eizan created this revision.Sep 3 2017, 4:05 AM
hfinkel added a subscriber: hfinkel.Sep 3 2017, 7:11 AM

Thanks for working on this! As a general note, I plan to add other microbenchmarks (and to transition at least MultiSource/Benchmarks/TSVC into this form). Can we take the code in LNTBased/XRay/TestModule and move it into, say, LNTBased/lib/microbenchmarks.py? We can have LNTBased/Microbenchmarks/XRay, and then we can just iterate over all top-level directories in LNTBased/Microbenchmarks. The name 'XRay' is currently hard-coded in LNTBased/XRay/TestModule, but that could be a parameter taken from the directory name.
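
A rough sketch of what the shared helpers could look like under the layout suggested above (LNTBased/lib/microbenchmarks.py plus LNTBased/Microbenchmarks/<name>/TestModule); the function names and paths are assumptions for illustration, not part of the patch.

# Illustrative only; paths and helper names are assumptions.
import os


def benchmark_name_from_dir(test_module_path):
    # Derive the benchmark name (e.g. 'XRay') from the directory that
    # contains the TestModule file.
    return os.path.basename(os.path.dirname(os.path.abspath(test_module_path)))


def list_microbenchmark_dirs(test_suite_root):
    # Iterate over all top-level directories under LNTBased/Microbenchmarks.
    base = os.path.join(test_suite_root, 'LNTBased', 'Microbenchmarks')
    return [os.path.join(base, d) for d in sorted(os.listdir(base))
            if os.path.isdir(os.path.join(base, d))]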

LNTBased/XRay/TestModule
35 (On Diff #113683)

Please don't fix the number of iterations (i.e., --benchmark_repetitions=10). A major advantage of Google's benchmark library is its ability to dynamically vary the number of iterations in favor of determining accurately the per-iteration time. We should let it do that.

47 (On Diff #113683)

Please add a comment here with the columns so that it's clear what's happening:

  name,iterations,real_time,cpu_time,time_unit,...
eizan updated this revision to Diff 114518.Sep 10 2017, 5:49 AM
  • Move running of benchmark & parsing output into lib/microbenchmark.py
  • Move XRay dir into MicroBenchmarks and fix up paths
  • Move building of XRay benchmarks into microbenchmark lib
  • Pass ARCH to build_microbenchmark invocation of cmake
dberris edited edge metadata.Sep 17 2017, 11:47 PM

Let's wait for @MatzeB's feedback on this.

LNTBased/MicroBenchmarks/XRay/CMakeLists.txt
7

You probably want to move this into the MicroBenchmarks/CMakeLists.txt file so it gets added if any of the microbenchmarks are built. Then from here you can just add a dependency to the executable in addition to the target_link_libraries call.

This should address the multiple-build problem you were describing (i.e. for each of the different microbenchmark runs).

MatzeB edited edge metadata.Sep 18 2017, 3:47 PM

First: This looks ok and can be committed as is if you want.

However, IMO using the LNTBased system here is not the best choice; I'd rather try writing a litsupport plugin. LNTBased is little used and IMO less flexible than the cmake+lit plugin approach.

LNTBased/MicroBenchmarks/XRay/CMakeLists.txt
1–13

I don't see why you need a separate CMakeLists.txt here?

I agree, having this in litsupport seems better (and also solves the duplication/symlink problem). I imagine that the plugin could append a function to metric_collectors that reads the CSV and reports the associated timing data. @MatzeB, is there documentation on the litsupport plugin interface anywhere? (if all we have is what's in the README.md, we should write some more).

That's all there is right now. I'll write some paragraphs on how to get started in the README file and make sure the interfaces have better documentation comments.

I played around with this yesterday and this should work to collect the Google Benchmark output:

litsupport/microbenchmark.py

'''Test module to collect google benchmark results.'''
from litsupport import shellcommand
from litsupport import testplan
import csv
import lit.Test


def _mutateCommandLine(context, commandline):
    cmd = shellcommand.parse(commandline)
    cmd.arguments.append("--benchmark_format=csv")
    # We need stdout ourselves to get the benchmark csv data.
    if cmd.stdout is not None:
        raise Exception("Rerouting stdout not allowed for microbenchmarks")
    benchfile = context.tmpBase + '.bench.csv'
    cmd.stdout = benchfile
    context.microbenchfiles.append(benchfile)

    return cmd.toCommandline()


def _mutateScript(context, script):
    return testplan.mutateScript(context, script, _mutateCommandLine)


def _collectMicrobenchmarkTime(context, microbenchfiles):
    result = 0.0
    for f in microbenchfiles:
        with open(f) as inp:
            lines = csv.reader(inp)
            # First line: "name,iterations,real_time,cpu_time,time_unit..."
            for line in lines:
                if line[0] == 'name':
                    continue
                # Note that we cannot create new tests here, so for now we
                # just add up all the numbers.
                result += float(line[3])
    return {'exec_time': lit.Test.toMetricValue(result/1000000.)}


def mutatePlan(context, plan):
    context.microbenchfiles = []
    plan.runscript = _mutateScript(context, plan.runscript)
    plan.metric_collectors.append(
        lambda context: _collectMicrobenchmarkTime(context,
                                                   context.microbenchfiles)
    )

I found two problems with this at the moment:

  • The list of litsupport plugins is currently global. Thus enabling this microbenchmarking plugin will make the non-microbenchmarking tests fail. It should be easy enough to change this to a local setting so that lit.local.cfg files can override the plugin list (just have to convince myself that this is a good feature to have :).
  • As we discussed before: lit currently needs to have 1 file to represent each test. This means we cannot produce results for 10 "benchmarks" inside a gtest executable unless we also produce 10 .test files for it.

Fixing the 2nd point is probably harder; on the other hand, the workaround of simply creating a .test file for each benchmark is not too bad (the current XRay cmake already does exactly this, I think).
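
A sketch of what that workaround could look like, assuming a hypothetical generator script (not part of this patch) that uses Google Benchmark's --benchmark_list_tests and --benchmark_filter flags; the exact .test directives expected by litsupport may differ.

# Hypothetical generator: one .test file per benchmark inside the executable.
import os
import subprocess


def write_test_files(bench_exe, out_dir):
    # --benchmark_list_tests=true prints one benchmark name per line.
    names = subprocess.check_output(
        [bench_exe, '--benchmark_list_tests=true']).decode().splitlines()
    for name in names:
        test_file = os.path.join(out_dir, name.replace('/', '.') + '.test')
        with open(test_file, 'w') as out:
            # Each generated test runs only the matching benchmark.
            out.write('RUN: %s --benchmark_filter=^%s$\n' % (bench_exe, name))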

I found two problems with this at the moment:

  • The list of litsupport plugins is currently global. Thus enabling this microbenchmarking plugin will make the non-microbenchmarking tests fail. It should be easy enough to change this to a local setting so that lit.local.cfg files can override the plugin list (just have to convince myself that this is a good feature to have :).

What's the design space here? Would it be better to be able to name plugins in the .test file?

  • As we discussed before: lit currently needs to have 1 file to represent each test. This means we cannot produce results for 10 "benchmarks" inside a gtest executable unless we also produce 10 .test files for it.

Fixing the 2nd point is probably harder; on the other hand the workaround of simply creating a .test file for each benchmark is not too bad (the current XRay cmake already does exactly this I think).

Do we need to fix this, or can we have the different benchmarks within the executable simply produce lots of different metrics for the same test? It seems like we need the ability to track multiple metrics per test anyway. Maybe this implies updates to LNT so that the interface does something reasonable?
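
A rough illustration of that alternative, reusing the pattern from the collector above; whether downstream tooling (LNT) would accept free-form per-benchmark metric names is exactly the open question here.

# Variant of _collectMicrobenchmarkTime: one metric per benchmark name
# instead of a single summed exec_time (illustrative sketch only).
import csv
import lit.Test


def _collectPerBenchmarkTimes(context, microbenchfiles):
    metrics = {}
    for f in microbenchfiles:
        with open(f) as inp:
            for line in csv.reader(inp):
                if line[0] == 'name':
                    continue
                # Column 3 is cpu_time; same unit conversion as above.
                metrics[line[0]] = lit.Test.toMetricValue(
                    float(line[3]) / 1000000.)
    return metrics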

I found two problems with this at the moment:

  • The list of litsupport plugins is currently global. Thus enabling this microbenchmarking plugin will make the non-microbenchmarking tests fail. It should be easy enough to change this to a local setting so that lit.local.cfg files can override the plugin list (just have to convince myself that this is a good feature to have :).

What's the design space here? Would it be better to be able to name plugins in the .test file?

  • The majority of the plugins depend on the user's needs and should be chosen by cmake/cmake flags. This is what happens today.
  • We could modify the plugin list further depending on the test. lit.local.cfg would apply new settings for a directory and all subdirectories. This would be in line with what we already do to influence behavior between SingleSource/MultiSource/External (the compiletime files end up with different names and the reference files must/mustn't contain the exit XX line at the end). Enhancing this with a way to request additional plugins for a subdirectory would feel most natural to me.
  • It wouldn't be harder to do it per .test file, but I'm not convinced we need it at that granularity, and it seems less discoverable to me than a setting in lit.local.cfg.

I'll write some paragraphs on how to get started in the README file and make sure the interfaces have better documentation comments.

I improved the documentation in r313711 and added sections about test modules and how to develop new ones in litsupport/README.md.

  • The list of litsupport plugins is currently global. Thus enabling this microbenchmarking plugin will make the non-microbenchmarking tests fail. It should be easy enough to change this to a local setting so that lit.local.cfg files can override the plugin list (just have to convince myself that this is a good feature to have :).

I just refactored the litsupport code in r314239; config.test_modules is now evaluated for every test instead of globally, which allows a lit.local.cfg to modify the plugin list and change the plugins used for a subdirectory.
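
For a microbenchmark subdirectory, that could presumably look like the following lit.local.cfg; the module name 'microbenchmark' is assumed to match the litsupport/microbenchmark.py file shown earlier.

# Hypothetical lit.local.cfg: enable the microbenchmark test module for this
# directory and its subdirectories in addition to the configured defaults.
config.test_modules = list(config.test_modules) + ['microbenchmark']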

Ping @eizan -- has this landed? Is there something else we need to do here?

eizan abandoned this revision.Nov 16 2017, 2:26 PM

I'm going to abandon this change as there doesn't appear to be a pressing need for it right now.

See https://reviews.llvm.org/D38496 for similar functionality to integrate the Google Benchmark output with LNT.