This is an archive of the discontinued LLVM Phabricator instance.


Differential D40077

[lit][test-suite] - Allow 1 test to report multiple individual test results
Abandoned · Public

Authored by homerdin on Nov 15 2017, 7:16 AM.

Details

Reviewers

hfinkel
MatzeB

Summary

Hello,

Based on the suggestion from @MatzeB, I've added the logic to allow a Result object to have nested Result objects to report microbenchmarks. I have also made the change to report these tests individually in the json output. This will allow tests to report separate results when desired.

Do we need to add a way to store the unit of measurement? The exec_time values shown below are the mean cpu_time in microseconds (usec).
I've altered the XRay microbenchmarks in the test-suite to exercise this, and we get the following output:

lit cmd line output

-- Testing: 1 tests, 1 threads --
PASS: test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test (1 of 1)
********** TEST 'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test' RESULTS **********
compile_time: 0.8284 
hash: "daec0d37414da26fdae56f0d9bfa2cc0" 
iterations: 10 
link_time: 0.0587 
**********
*** MICRO-TEST 'BM_RDTSCP_Cost RESULTS ***
    exec_time:  14.1476 
    std_dev:  0.0003 
*** MICRO-TEST 'BM_ReturnInstrumentedPatched RESULTS ***
    exec_time:  12.3221 
    std_dev:  0.0002 
*** MICRO-TEST 'BM_ReturnInstrumentedPatchedThenUnpatched RESULTS ***
    exec_time:  2.2820 
    std_dev:  0.0001 
*** MICRO-TEST 'BM_ReturnInstrumentedPatchedWithLogHandler RESULTS ***
    exec_time:  42.8986 
    std_dev:  0.0008 
*** MICRO-TEST 'BM_ReturnInstrumentedUnPatched RESULTS ***
    exec_time:  3.4562 
    std_dev:  0.0050 
*** MICRO-TEST 'BM_ReturnNeverInstrumented RESULTS ***
    exec_time:  1.8256 
    std_dev:  0.0001 
**********
Testing Time: 43.79s
  Expected Passes    : 1

lit json output

{
  "__version__": [
    0,
    6,
    0
  ],
  "elapsed": 43.78972411155701,
  "tests": [
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 3.45623,
        "std_dev": 0.00501622
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedUnPatched.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 2.28196,
        "std_dev": 6.62184e-05
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedPatchedThenUnpatched.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 42.8986,
        "std_dev": 0.000839534
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedPatchedWithLogHandler.test",
      "output": ""
    },
    { 
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 14.1476,
        "std_dev": 0.000291057
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_RDTSCP_Cost.test",
      "output": ""
    },
    { 
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 1.82558,
        "std_dev": 5.89162e-05
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnNeverInstrumented.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elasped": null,
      "metrics": {
        "exec_time": 12.3221,
        "std_dev": 0.000165266
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnInstrumentedPatched.test",
      "output": ""
    },
    {
      "code": "PASS",
      "elapsed": 43.70928406715393,
      "metrics": {
        "compile_time": 0.8284,
        "hash": "daec0d37414da26fdae56f0d9bfa2cc0",
        "iterations": 10,
        "link_time": 0.0587
      },
      "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test",
      "output": "\n/home/bhomerding/build/test-suite-micro2/MicroBenchmarks/XRay/ReturnReference/retref-bench --benchmark_repetitions=10 --benchmark_format=csv --benchmark_report_aggregates_only=true > /home/bhomerding/build/test-suite-micro2/MicroBenchmarks/XRay/ReturnReference/Output/retref-bench.test.bench.csv"
    }
  ]
}
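Because each micro-benchmark appears as its own entry in the JSON, a consumer can match entries back to their parent test by name alone. The following is a minimal sketch of that idea, assuming the naming scheme seen in the output above (a parent named <stem>.test and micro entries named <stem>.<benchmark>.test). The helper group_micro_results is hypothetical and is not part of lit, LNT, or compare.py; it also assumes benchmark names contain no dots.

```python
import json
from collections import defaultdict


def group_micro_results(report):
    """Group '<stem>.<micro>.test' entries under their '<stem>.test' parent.

    `report` is the dictionary loaded from lit's JSON output. This helper is
    an illustration only and assumes micro-benchmark names contain no '.'.
    """
    tests = report["tests"]
    parent_names = {t["name"] for t in tests}
    grouped = defaultdict(dict)
    for t in tests:
        name = t["name"]
        if not name.endswith(".test"):
            continue
        stem = name[:-len(".test")]
        # 'dir/retref-bench.BM_X' -> ('dir/retref-bench', '.', 'BM_X')
        head, sep, micro = stem.rpartition(".")
        parent = head + ".test"
        if sep and parent in parent_names and parent != name:
            grouped[parent][micro] = t.get("metrics", {})
    return dict(grouped)


# Two entries copied from the JSON output above (fields trimmed).
example = json.loads("""
{"tests": [
  {"code": "PASS",
   "metrics": {"exec_time": 1.82558, "std_dev": 5.89162e-05},
   "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.BM_ReturnNeverInstrumented.test"},
  {"code": "PASS",
   "metrics": {"compile_time": 0.8284, "link_time": 0.0587},
   "name": "test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test"}
]}
""")
grouped_example = group_micro_results(example)
```

Tools that do not care about the grouping can ignore it and treat every entry as an ordinary test, which is the point of reporting the results individually.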

Event Timeline

homerdin created this revision. Nov 15 2017, 7:16 AM

Herald added a subscriber: mgorny. Nov 15 2017, 7:16 AM

I'm surprised that lit lets you sneak a dictionary into the reported metrics.

In general though I'd rather see us change the lit tool so we can report 1 test in the lit output for each sub-benchmark of a microbenchmark executable.

That way we would not end up with two classes of tools: those that understand the microbenchmark metric dictionary and those that don't. There are some uses outside of LNT (test-suite/utils/compare.py is getting more popular, and some buildbots look at lit's xunit output).

@MatzeB Thanks for your feedback; it's good to learn what some of the other use cases are. Sorry about the slow response. I've looked into adding this functionality to the lit tool. It looks like worker_run_one_test() would have to return a collection of test objects, or possibly be turned into a generator. We would then need some logic to add the extra tests to the main thread's test list. This would have the side effect of producing more results than test discovery had found, e.g. PASS: test-suite :: xyz.test (5 of 3).

Do you know if the --max-tests option is used, and in what context (maximum number of executables to run, or maximum number of results desired)?
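The counting side effect described here can be sketched in a few lines. This is only an illustration of the generator idea under stated assumptions: expand_results and progress_lines are hypothetical stand-ins, not lit's worker_run_one_test() or its progress display, and the run callback simply returns a made-up mapping of micro-benchmark names to result codes.

```python
def expand_results(test_name, run):
    """Hypothetical generator-style worker: yield one (name, code) pair per
    micro-benchmark, followed by the parent test itself."""
    sub_results = run(test_name)  # mapping of micro-benchmark name -> code
    stem = test_name[:-len(".test")]
    for micro_name, code in sorted(sub_results.items()):
        yield "%s.%s.test" % (stem, micro_name), code
    yield test_name, "PASS"


def progress_lines(discovered_tests, run):
    """Format progress lines the way the comment above describes them."""
    lines = []
    completed = 0
    for test_name in discovered_tests:
        for result_name, code in expand_results(test_name, run):
            completed += 1
            # The denominator still comes from test discovery, so the
            # numerator can overtake it once a test expands.
            lines.append("%s: %s (%d of %d)" % (
                code, result_name, completed, len(discovered_tests)))
    return lines


demo_lines = progress_lines(
    ["xyz.test"], lambda name: {"BM_A": "PASS", "BM_B": "PASS"})
```

With one discovered test that expands into two micro-benchmarks, the final line reads "(3 of 1)", which is the mismatch the comment is asking about.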

Does this sound like a reasonable change? Would it be invasive to change the test discovery to have tests say how many results they will return and then make worker threads responsible for slices of the list?

Thanks again.

homerdin updated this revision to Diff 127571. Dec 19 2017, 11:12 AM

homerdin retitled this revision from [RFC][LNT][test-suite] - Allow 1 test to report multiple individual test results to [lit][test-suite] - Allow 1 test to report multiple individual test results.

homerdin edited the summary of this revision.

Ping @MatzeB

@MatzeB I've included the changes needed to the existing microbenchmark support to allow for microbenchmarks to collect individual results. This is primarily a change to the lit Result class to allow nested Result objects.

Herald added subscribers: delcypher, hintonda. Jan 24 2018, 11:33 AM

I don't see any tests being added to lit's test suite. Please add some. This would also help me and others understand what you're trying to do. You're also touching code that isn't part of lit. This should probably be split into two patches: one for the changes to lit and one for the code that actually uses those changes.

I am abandoning this revision and splitting it up into smaller ones.

Changes to lit to support microbenchmarks:
https://reviews.llvm.org/D43314

Changes to microbenchmark litsupport plugin and XRay microbenchmarks:
https://reviews.llvm.org/D43316

Also a link to the LCALS (Livermore Compiler Analysis Loop Suite) revision I wish to add using these changes:
https://reviews.llvm.org/D43319

Revision Contents

Path                                                   Size
MicroBenchmarks/XRay/FDRMode/CMakeLists.txt            15 lines
MicroBenchmarks/XRay/ReturnReference/CMakeLists.txt    15 lines
litsupport/modules/microbenchmark.py                   22 lines
litsupport/testplan.py                                 4 lines
utils/lit/lit/Test.py                                  20 lines
utils/lit/lit/main.py                                  28 lines

Diff 131320

MicroBenchmarks/XRay/FDRMode/CMakeLists.txt

 check_cxx_compiler_flag(-fxray-instrument COMPILER_HAS_FXRAY_INSTRUMENT)
 if(ARCH STREQUAL "x86" AND COMPILER_HAS_FXRAY_INSTRUMENT)
   file(COPY lit.local.cfg DESTINATION ${CMAKE_CURRENT_BINARY_DIR})

   list(APPEND CPPFLAGS -std=c++11 -Wl,--gc-sections -fxray-instrument)
   list(APPEND LDFLAGS -fxray-instrument)
-  llvm_test_run(--benchmark_filter=dummy_skip_ignore)
+  llvm_test_run(--benchmark_repetitions=10)
   llvm_test_executable(fdrmode-bench fdrmode-bench.cc)
   target_link_libraries(fdrmode-bench benchmark)
-
-  file(COPY fdrmode-bench_BM_XRayFDRMultiThreaded_1_thread.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY fdrmode-bench_BM_XRayFDRMultiThreaded_2_thread.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY fdrmode-bench_BM_XRayFDRMultiThreaded_4_thread.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY fdrmode-bench_BM_XRayFDRMultiThreaded_8_thread.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY fdrmode-bench_BM_XRayFDRMultiThreaded_16_thread.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY fdrmode-bench_BM_XRayFDRMultiThreaded_32_thread.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
 endif()

MicroBenchmarks/XRay/ReturnReference/CMakeLists.txt

 check_cxx_compiler_flag(-fxray-instrument COMPILER_HAS_FXRAY_INSTRUMENT)
 if(ARCH STREQUAL "x86" AND COMPILER_HAS_FXRAY_INSTRUMENT)
   file(COPY lit.local.cfg DESTINATION ${CMAKE_CURRENT_BINARY_DIR})

   list(APPEND CPPFLAGS -std=c++11 -Wl,--gc-sections -fxray-instrument)
   list(APPEND LDFLAGS -fxray-instrument)
-  llvm_test_run(--benchmark_filter=dummy_skip_ignore)
+  llvm_test_run(--benchmark_repetitions=10)
   llvm_test_executable(retref-bench retref-bench.cc)
   target_link_libraries(retref-bench benchmark)
-
-  file(COPY retref-bench_BM_ReturnNeverInstrumented.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY retref-bench_BM_ReturnInstrumentedUnPatched.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY retref-bench_BM_ReturnInstrumentedPatchedThenUnpatched.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY retref-bench_BM_ReturnInstrumentedPatched.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY retref-bench_BM_RDTSCP_Cost.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
-  file(COPY retref-bench_BM_ReturnInstrumentedPatchedWithLogHandler.test
-       DESTINATION ${CMAKE_CURRENT_BINARY_DIR})
 endif()

litsupport/modules/microbenchmark.py

 '''Test module to collect google benchmark results.'''
 from litsupport import shellcommand
 from litsupport import testplan
 import csv
 import lit.Test


 def _mutateCommandLine(context, commandline):
     cmd = shellcommand.parse(commandline)
     cmd.arguments.append("--benchmark_format=csv")
+    cmd.arguments.append("--benchmark_report_aggregates_only=true")
     # We need stdout outself to get the benchmark csv data.
     if cmd.stdout is not None:
         raise Exception("Rerouting stdout not allowed for microbenchmarks")
     benchfile = context.tmpBase + '.bench.csv'
     cmd.stdout = benchfile
     context.microbenchfiles.append(benchfile)

     return cmd.toCommandline()


 def _mutateScript(context, script):
     return testplan.mutateScript(context, script, _mutateCommandLine)


 def _collectMicrobenchmarkTime(context, microbenchfiles):
-    result = 0.0
     for f in microbenchfiles:
         with open(f) as inp:
             lines = csv.reader(inp)
             # First line: "name,iterations,real_time,cpu_time,time_unit..."
             for line in lines:
                 if line[0] == 'name':
                     continue
-                # Note that we cannot create new tests here, so for now we just
-                # add up all the numbers here.
-                result += float(line[3])
-    return {'microbenchmark_time_ns': lit.Test.toMetricValue(result)}
+                # Name for MicroBenchmark
+                name = line[0][:-5]
+                # Create Result object with PASS
+                microBenchmark = lit.Test.Result(lit.Test.PASS)
+
+                # Use Mean as Reported Time. Index 3 is cpu_time
+                microBenchmark.addMetric('exec_time', lit.Test.toMetricValue(float(line[3])))
+                microBenchmark.addMetric('iterations', lit.Test.toMetricValue(int(line[1])))
+                medianLine = next(lines)
+                stdDevLine = next(lines)
+                microBenchmark.addMetric('std_dev', lit.Test.toMetricValue(float(stdDevLine[3])))
+
+                # Add Micro Result
+                context.micro_results[name] = microBenchmark
+
+    return ({'MicroBenchmarks': lit.Test.toMetricValue(len(context.micro_results))})


 def mutatePlan(context, plan):
     context.microbenchfiles = []
     plan.runscript = _mutateScript(context, plan.runscript)
     plan.metric_collectors.append(
         lambda context: _collectMicrobenchmarkTime(context,
                                                    context.microbenchfiles)
     )
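The collector above relies on row order: it strips the last five characters of the first row's name and then reads the next two rows as median and standard deviation. As a point of comparison, here is a small self-contained sketch that matches the aggregate rows by name suffix instead and also keeps the time_unit column, which bears on the unit-of-measurement question in the summary. It assumes the aggregate rows are named <benchmark>_mean, <benchmark>_median and <benchmark>_stddev, and that the header matches the one quoted in the code comment. The function collect_aggregates and the sample numbers are made up for illustration and are not part of the patch.

```python
import csv
import io

# Assumed aggregate-row suffixes; the patch itself only relies on "_mean"
# being five characters long and on the order of the following rows.
SUFFIXES = ("_mean", "_median", "_stddev")


def collect_aggregates(csv_text):
    """Return {benchmark: {'mean': ..., 'median': ..., 'stddev': ...,
    'time_unit': ...}} using the cpu_time column of each aggregate row."""
    results = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for suffix in SUFFIXES:
            if row["name"].endswith(suffix):
                bench = row["name"][:-len(suffix)]
                entry = results.setdefault(bench, {})
                entry[suffix[1:]] = float(row["cpu_time"])
                entry["time_unit"] = row["time_unit"]
                break
    return results


# Illustrative input only; these numbers are not measurements.
SAMPLE_CSV = (
    "name,iterations,real_time,cpu_time,time_unit\n"
    "BM_A_mean,100,2.0,1.5,ns\n"
    "BM_A_median,100,2.0,1.4,ns\n"
    "BM_A_stddev,100,0.1,0.2,ns\n"
)
sample_aggregates = collect_aggregates(SAMPLE_CSV)
```

Matching by suffix trades a little extra code for not depending on how many aggregate rows a given benchmark library version emits.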

litsupport/testplan.py

@@ 137 lines not shown @@ def _executePlan(context, plan):
     return lit.Test.PASS


 def executePlanTestResult(context, testplan):
     """Convenience function to invoke _executePlan() and construct a
     lit.test.Result() object for the results."""
     context.result_output = ""
     context.result_metrics = {}
+    context.micro_results = {}

     result_code = _executePlan(context, testplan)

     # Build test result object
     result = lit.Test.Result(result_code, context.result_output)
     for key, value in context.result_metrics.items():
         result.addMetric(key, value)
+    for key, value in context.micro_results.items():
+        result.addMicroResult(key, value)

     return result


 def check_output(commandline, *aargs, **dargs):
     """Wrapper around subprocess.check_output that logs the command."""
     logging.info(" ".join(commandline))
     return subprocess.check_output(commandline, *aargs, **dargs)


 def check_call(commandline, *aargs, **dargs):
     """Wrapper around subprocess.check_call that logs the command."""
     logging.info(" ".join(commandline))
     return subprocess.check_call(commandline, *aargs, **dargs)

utils/lit/lit/Test.py

@@ 129 lines not shown @@ def __init__(self, code, output='', elapsed=None):
         # The result code.
         self.code = code
         # The test output.
         self.output = output
         # The wall timing to execute the test, if timing.
         self.elapsed = elapsed
         # The metrics reported by this test.
         self.metrics = {}
+        # The micro-test results reported by this test.
+        self.microResults = {}

     def addMetric(self, name, value):
         """
         addMetric(name, value)

         Attach a test metric to the test result, with the given name and list of
         values. It is an error to attempt to attach the metrics with the same
         name multiple times.

         Each value must be an instance of a MetricValue subclass.
         """
         if name in self.metrics:
             raise ValueError("result already includes metrics for %r" % (
                     name,))
         if not isinstance(value, MetricValue):
             raise TypeError("unexpected metric value: %r" % (value,))
         self.metrics[name] = value

+    def addMicroResult(self, name, microResult):
+        """
+        addMicroResult(microResult)
+
+        Attach a micro-test result to the test result, with the given name and
+        result. It is an error to attempt to attach a micro-test with the
+        same name multiple times.
+
+        Each micro-test result must be an instance of the Result class.
+        """
+        if name in self.microResults:
+            raise ValueError("Result already includes microResult for %r" % (
+                    name,))
+        if not isinstance(microResult, Result):
+            raise TypeError("unexpected MicroResult value %r" % (microResult,))
+        self.microResults[name] = microResult
+
+
 # Test classes.

 class TestSuite:
     """TestSuite - Information on a group of tests.

     A test suite groups together a set of logically related tests.
     """

@@ 199 lines not shown @@
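The contract in the addMicroResult docstring (unique names, values must be Result instances) can be exercised without lit installed. The sketch below uses a stand-in class, SketchResult, which mirrors only the fields relevant here; it is an assumption-laden illustration and not lit's Result class.

```python
class SketchResult(object):
    """Stand-in for a result object that can hold nested micro-results.
    Hypothetical; mirrors only the behaviour described in the docstring."""

    def __init__(self, code, output=''):
        self.code = code
        self.output = output
        self.metrics = {}
        self.microResults = {}

    def addMicroResult(self, name, microResult):
        # Reject duplicate names, as the docstring in the patch requires.
        if name in self.microResults:
            raise ValueError("Result already includes microResult for %r" % (
                name,))
        # Only nested result objects are accepted.
        if not isinstance(microResult, SketchResult):
            raise TypeError("unexpected MicroResult value %r" % (microResult,))
        self.microResults[name] = microResult


def try_add(result, name, value):
    """Return the name of the exception raised by addMicroResult, or None."""
    try:
        result.addMicroResult(name, value)
    except (ValueError, TypeError) as exc:
        return type(exc).__name__
    return None


parent_result = SketchResult('PASS')
first_attempt = try_add(parent_result, 'BM_A', SketchResult('PASS'))
duplicate_attempt = try_add(parent_result, 'BM_A', SketchResult('PASS'))
wrong_type_attempt = try_add(parent_result, 'BM_B', {'exec_time': 1.0})
```

Checks of this shape are roughly what a unit test for the real method would need to cover: a successful attach, a duplicate name, and a value of the wrong type.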

utils/lit/lit/main.py

@@ 75 lines not shown @@ def update(self, test):
         if test.result.metrics:
             print("%s TEST '%s' RESULTS %s" % ('*'*10, test.getFullName(),
                                                '*'*10))
             items = sorted(test.result.metrics.items())
             for metric_name, value in items:
                 print('%s: %s ' % (metric_name, value.format()))
             print("*" * 10)

+        # Report micro-tests, if present
+        if test.result.microResults:
+            items = sorted(test.result.microResults.items())
+            for micro_test_name, micro_test in items:
+                print("%s MICRO-TEST '%s RESULTS %s" %
+                      ('*'*3, micro_test_name, '*'*3))
+
+                for metric_name, value in micro_test.metrics.items():
+                    print('    %s:  %s ' % (metric_name, value.format()))
+            print("*" * 10)
+
         # Ensure the output is flushed.
         sys.stdout.flush()

 def write_test_results(run, lit_config, testing_time, output_path):
     try:
         import json
     except ImportError:
         lit_config.fatal('test output unsupported with Python 2.5')
@@ 16 lines not shown @@ for test in run.tests:
             'elapsed' : test.result.elapsed }

         # Add test metrics, if present.
         if test.result.metrics:
             test_data['metrics'] = metrics_data = {}
             for key, value in test.result.metrics.items():
                 metrics_data[key] = value.todata()

+        # Report micro-tests separately, if present
+        if test.result.microResults:
+            for key, micro_test in test.result.microResults.items():
+                micro_full_name = test.getFullName()[:-4] + key + ".test"
+
+                micro_test_data = {
+                    'name' : micro_full_name,
+                    'code' : micro_test.code.name,
+                    'output' : micro_test.output,
+                    'elapsed' : micro_test.elapsed }
+                if micro_test.metrics:
+                    micro_test_data['metrics'] = micro_metrics_data = {}
+                    for key, value in micro_test.metrics.items():
+                        micro_metrics_data[key] = value.todata()
+
+                tests_data.append(micro_test_data)
+
         tests_data.append(test_data)

     # Write the output.
     f = open(output_path, 'w')
     try:
         json.dump(data, f, indent=2, sort_keys=True)
         f.write('\n')
     finally:
@@ 482 lines not shown @@
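The name derivation in write_test_results is easy to misread: slicing off the last four characters of the full name removes "test" but keeps the trailing dot, so appending the key and ".test" gives <stem>.<key>.test. A minimal sketch of the flattening step using plain dictionaries follows. The function flatten_result and its argument layout are hypothetical, and the inner loop uses its own variable names to avoid rebinding the outer key.

```python
import json


def flatten_result(full_name, code, metrics, micro_results):
    """Return JSON-ready entries: one per micro-result, then the parent.

    `micro_results` maps a micro-benchmark name to a dict with 'code' and
    'metrics'. Hypothetical layout, for illustration only.
    """
    entries = []
    for micro_name, micro in sorted(micro_results.items()):
        entry = {
            # 'retref-bench.test'[:-4] == 'retref-bench.' (dot is kept)
            'name': full_name[:-4] + micro_name + '.test',
            'code': micro['code'],
        }
        if micro['metrics']:
            entry['metrics'] = {
                metric_name: metric_value
                for metric_name, metric_value in micro['metrics'].items()
            }
        entries.append(entry)
    entries.append({'name': full_name, 'code': code, 'metrics': dict(metrics)})
    return entries


# Values taken from the summary's output for one benchmark.
flat_entries = flatten_result(
    'test-suite :: MicroBenchmarks/XRay/ReturnReference/retref-bench.test',
    'PASS',
    {'compile_time': 0.8284, 'link_time': 0.0587},
    {'BM_ReturnNeverInstrumented': {
        'code': 'PASS',
        'metrics': {'exec_time': 1.82558, 'std_dev': 5.89162e-05}}},
)
flat_json = json.dumps(flat_entries, indent=2, sort_keys=True)
```

Emitting the micro entries before the parent mirrors the order in the JSON shown in the summary, although consumers should not need to depend on it.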