These changes allow a Result object to contain nested Result objects in order to support microbenchmarks. Currently lit is restricted to reporting one result object per test; this change adds support for tests that want to report individual timings for individual kernels.
This revision is the result of the discussions in https://reviews.llvm.org/D32272#794759, https://reviews.llvm.org/D37421#f8003b27, and https://reviews.llvm.org/D38496. It separates out the changes proposed in https://reviews.llvm.org/D40077.
With this change I will be adding LCALS (Livermore Compiler Analysis Loop Suite), a collection of loop kernels, to the LLVM test suite using the Google Benchmark library (https://reviews.llvm.org/D43319).
Previously, microbenchmarks were handled by using macros to group microbenchmarks together and build many executables while still getting a grouped timing (MultiSource/TSVC). Recently the Google Benchmark library was added to the test suite and used via a litsupport plugin. However, the one-test-one-result limitation restricted its use to passing a runtime option that runs only one microbenchmark, with several hand-written tests (MicroBenchmarks/XRay); this runs the same executable many times, once per hand-written test. I will update the litsupport plugin to use the new functionality (https://reviews.llvm.org/D43316).
These changes allow lit to report micro test results when desired, so that many precise timing results can be obtained from a single run of one test executable.
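As a rough sketch of the idea (the class and method names here are illustrative, not the actual lit.Test API), a top-level result simply carries a list of micro results, each with its own name and metrics:

```python
# Minimal sketch of nested test results; names are illustrative,
# not the actual lit.Test API.

class MicroResult:
    """One micro test (e.g. a single benchmark kernel) with its metrics."""
    def __init__(self, name, metrics):
        self.name = name          # e.g. "BM_TRAP_INT_RAW/44217"
        self.metrics = metrics    # e.g. {"exec_time": 540.962}

class Result:
    """A top-level test result that may carry nested micro results."""
    def __init__(self, code, metrics=None):
        self.code = code
        self.metrics = metrics or {}
        self.micro_results = []

    def add_micro_result(self, micro):
        self.micro_results.append(micro)

# One run of one test executable yields one Result with many nested timings.
result = Result("PASS", {"compile_time": 6.9610, "link_time": 0.0426})
result.add_micro_result(MicroResult("BM_TRAP_INT_RAW/44217",
                                    {"exec_time": 540.962}))
result.add_micro_result(MicroResult("BM_IF_QUAD_RAW/171",
                                    {"exec_time": 2.6995}))

for micro in result.micro_results:
    print(micro.name, micro.metrics["exec_time"])
```

When reporting, each micro result can then be emitted under a combined name such as `<parent test>:<micro test>`, which is the shape seen in the JSON output below.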
Example Output from LCALS:
Terminal:
PASS: test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test (1 of 1)
********** TEST 'test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test' RESULTS **********
MicroBenchmarks: 12
compile_time: 6.9610
hash: "5075a3ae907cf9631cdc4cf8401cbfb3"
link_time: 0.0426
**********
*** MICRO-TEST: BM_IF_QUAD_RAW/171
exec_time: 2.6995
*** MICRO-TEST: BM_IF_QUAD_RAW/44217
exec_time: 698.8880
*** MICRO-TEST: BM_IF_QUAD_RAW/5001
exec_time: 78.9838
*** MICRO-TEST: BM_INIT3_RAW/171
exec_time: 0.2248
*** MICRO-TEST: BM_INIT3_RAW/44217
exec_time: 168.0970
*** MICRO-TEST: BM_INIT3_RAW/5001
exec_time: 15.1119
*** MICRO-TEST: BM_MULADDSUB_RAW/171
exec_time: 0.4491
*** MICRO-TEST: BM_MULADDSUB_RAW/44217
exec_time: 169.6760
*** MICRO-TEST: BM_MULADDSUB_RAW/5001
exec_time: 16.1443
*** MICRO-TEST: BM_TRAP_INT_RAW/171
exec_time: 2.0922
*** MICRO-TEST: BM_TRAP_INT_RAW/44217
exec_time: 540.9620
*** MICRO-TEST: BM_TRAP_INT_RAW/5001
exec_time: 61.1846

Partial JSON output:
"tests": [
{
"code": "PASS",
"elapsed": null,
"metrics": {
"exec_time": 540.962
},
"name": "test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test:BM_TRAP_INT_RAW/44217",
"output": ""
},
{
"code": "PASS",
"elapsed": null,
"metrics": {
"exec_time": 2.6995
},
"name": "test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test:BM_IF_QUAD_RAW/171",
"output": ""
},
{
"code": "PASS",
"elapsed": null,
"metrics": {
"exec_time": 169.676
},
"name": "test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test:BM_MULADDSUB_RAW/44217",
"output": ""
},
{
"code": "PASS",
"elapsed": null,
"metrics": {
"exec_time": 168.097
},
"name": "test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test:BM_INIT3_RAW/44217",
"output": ""
},
{
"code": "PASS",
"elapsed": null,
"metrics": {
"exec_time": 16.1443
},
"name": "test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test:BM_MULADDSUB_RAW/5001",
"output": ""
},
{
"code": "PASS",
"elapsed": null,
"metrics": {
"exec_time": 61.1846
},
"name": "test-suite :: MicroBenchmarks/LCALS/SubsetBRawLoops/lcalsBRaw.test:BM_TRAP_INT_RAW/5001",
"output": ""
},
IIRC Python 2.x and 3.x behave differently regarding .items().
Do we force Python 3 for lit?
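For reference, the difference in question: in Python 2, dict.items() returns a concrete list, while in Python 3 it returns a dict view; code that only iterates over the result, or wraps it in list() or sorted(), behaves the same under both versions. A small sketch:

```python
# Example metrics dictionary, as a micro result might carry.
d = {"exec_time": 540.962, "compile_time": 6.961}

# Python 2: d.items() is a list. Python 3: a dict view.
# sorted() (or list()) produces a concrete, ordered list in both,
# so iteration code stays portable across the two versions.
for key, value in sorted(d.items()):
    print(key, value)
```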