This is an archive of the discontinued LLVM Phabricator instance.

[XRay] [test-suite] Add litsupport module for microbenchmarks
ClosedPublic

Authored by eizan on Oct 3 2017, 5:22 AM.

Details

Reviewers
dberris
MatzeB
Summary
  • add microbenchmark litsupport plugin
  • use microbenchmark litsupport plugin for XRay benchmarks

The litsupport module was originally written by @MatzeB and provided in https://reviews.llvm.org/D37421; this version includes some minor edits to the test output name string (note: s/exec_time/microbenchmark_time_ns/).
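For readers unfamiliar with litsupport plugins, the core job of such a module is to parse the benchmark binary's machine-readable output and turn it into lit metrics. The sketch below is illustrative only: it assumes google-benchmark-style CSV output, and the helper names and the aggregation into a single microbenchmark_time_ns number are assumptions made for the example, not the exact code in this patch.

# Illustrative sketch only -- not the patch itself.  Parses
# --benchmark_format=csv output and reduces it to one metric, matching
# the s/exec_time/microbenchmark_time_ns/ note above.
import csv

def parse_benchmark_csv(path):
    """Return {benchmark_name: cpu_time} parsed from a CSV results file."""
    timings = {}
    with open(path) as f:
        for row in csv.DictReader(f):
            name = row.get('name')
            cpu_time = row.get('cpu_time')
            if name and cpu_time:
                timings[name] = float(cpu_time)
    return timings

def summarize(timings):
    # Collapse the per-benchmark timings into a single number per test;
    # whether that is the right model is debated in the comments below.
    return {'microbenchmark_time_ns': sum(timings.values())}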

Event Timeline

eizan created this revision. Oct 3 2017, 5:22 AM
eizan edited the summary of this revision. Oct 3 2017, 5:24 AM
eizan edited the summary of this revision.
hfinkel added a subscriber: hfinkel. Oct 3 2017, 9:55 AM
hfinkel added inline comments.
litsupport/modules/microbenchmark.py
34

The summary might make sense in some contexts, but why not just return multiple metrics? I'd really like to move to a model where we collect all of the microbenchmark timings individually.

MatzeB added inline comments. Oct 3 2017, 10:30 AM
MicroBenchmarks/XRay/lit.local.cfg
2

I think we need some more logic here, like this:

test_modules = config.test_modules
if 'run' in test_modules:
    # Insert the microbenchmark module behind 'run'
    test_modules.insert(test_modules.index('run')+1, 'microbenchmark')
    # Timeit results are not useful for microbenchmarks
    if 'timeit' in test_modules:
        test_modules.remove('timeit')
  • I think we should add the microbenchmark module behind 'run' so that 'remote' or 'run_under' can come later and modify the command line, including the redirections and flags, for the microbenchmarks.
  • It's probably best to remove the 'timeit' module, as its number isn't really useful for microbenchmarks.
litsupport/modules/microbenchmark.py
34

I'd like to see individual timings too, but I think putting the benchmark name as a prefix on the metric name is not the right direction. LNT unfortunately has a fixed schema for the metrics that are recorded. The LNT UI and home-brewed tools are probably designed around well-known metrics too (some information, such as whether bigger or smaller is better, is derived from the metric name today).

I think adding something to lit that allows reporting multiple benchmarks/tests from a single test file can't be that hard.

hfinkel added inline comments. Oct 3 2017, 10:43 AM
litsupport/modules/microbenchmark.py
34

I think adding something to lit that allows reporting multiple benchmarks/tests from a single test file can't be that hard.

I don't object to doing this, but it seems like multiple metrics per test is a direction in which we need to go anyway, no? I expect that we will need to update the LNT UI regardless of what we do (otherwise we could easily end up with a single huge list that makes the interface clunky). What are we going to do with all of the binary-size information that we collect?

MatzeB added inline comments. Oct 3 2017, 10:54 AM
litsupport/modules/microbenchmark.py
34

I don't object to doing this, but it seems like multiple metrics per test is a direction in which we need to go anyway, no? I expect that we will need to update the LNT UI regardless of what we do (otherwise we could easily end up with a single huge list that makes the interface clunky).

Sure, and we already have multiple metrics per test of course; I think it's compile time, exec_time, hash, and size today. However, given that this is somewhat two-dimensional data today, with one dimension along test names and one along the metrics, I really think the "subbenchmarks" should end up on the test-name dimension and not the metrics one.
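To make the two layouts concrete, here is an illustrative sketch of the same timings recorded on the metrics dimension versus on the test-name dimension (the test and benchmark names are invented for the example):

# Illustrative only: one test with per-benchmark metrics bolted on ...
on_metrics_dimension = {
    'MicroBenchmarks/XRay/retref-bench': {
        'exec_time': 0.42,
        'BM_ReturnInstrumented.microbenchmark_time_ns': 112.0,
        'BM_ReturnNeverInstrumented.microbenchmark_time_ns': 3.1,
    },
}

# ... versus one "subbenchmark" per test name, each with well-known metrics.
on_test_name_dimension = {
    'MicroBenchmarks/XRay/retref-bench/BM_ReturnInstrumented': {
        'microbenchmark_time_ns': 112.0,
    },
    'MicroBenchmarks/XRay/retref-bench/BM_ReturnNeverInstrumented': {
        'microbenchmark_time_ns': 3.1,
    },
}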

What are we going to do with all of the binary-size information that we collect?

LNT throws all the detailed code-size metrics away and only keeps size.__text, I think, which is probably macOS-only :-(

Though if you run the benchmarks yourself, you can simply skip LNT and view the results with something like test-suite/utils/compare.py.
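(For reference: compare.py typically consumes the JSON result files that lit writes via its -o/--output option, e.g. utils/compare.py baseline.json patched.json; the file names here are just placeholders.)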

hfinkel added inline comments. Oct 3 2017, 2:55 PM
litsupport/modules/microbenchmark.py
34

Sure, and we already have multiple metrics per test of course; I think it's compile time, exec_time, hash, and size today. However, given that this is somewhat two-dimensional data today, with one dimension along test names and one along the metrics, I really think the "subbenchmarks" should end up on the test-name dimension and not the metrics one.

Okay. At some point we may want something more hierarchical, but we can also cross that bridge if/when we come to it.

LNT throws all the detailed code-size metrics away and only keeps size.__text, I think, which is probably macOS-only :-(

Yes. And that's certainly something else that can be improved.

eizan updated this revision to Diff 117648. Oct 4 2017, 3:05 AM

Add the microbenchmark module after 'run' and remove 'timeit' from XRay/lit.local.cfg

eizan marked an inline comment as done. Oct 4 2017, 3:06 AM
eizan retitled this revision from [XRay] [test-suite] Add litsupport for microbenchmarks to [XRay] [test-suite] Add litsupport module for microbenchmarks.
MatzeB accepted this revision. Oct 4 2017, 11:03 AM

Thanks, LGTM

This revision is now accepted and ready to land. Oct 4 2017, 11:03 AM
dberris accepted this revision. Oct 4 2017, 7:52 PM

LGTM, thanks @eizan!