This is an archive of the discontinued LLVM Phabricator instance.


Differential D144101

[test-suite] Increase the --filter-short threshold
Closed, Public

Authored by SjoerdMeijer on Feb 15 2023, 6:14 AM.

Details

Reviewers

fhahn
arichardson
MatzeB
aemerson
rengolin

Commits

rTf36619ce1b38: compare.py: increase --filter-short threshold, and accept optional argument

Summary

This proposes to increase the --filter-short threshold from 0.6 to 1.0 seconds. The LLVM test-suite is quite noisy, and this is one mitigation: if we filter out all apps with a runtime of less than a second, the variation between runs is much smaller and the results are less noisy. I appreciate this replaces one arbitrary number with another arbitrary number, but a runtime of at least 1 second for a benchmark seemed reasonable to me.

As a result, fewer results/benchmarks will be reported. On the system I tested on, this means that for the MultiSource benchmarks the following apps will be excluded in addition to the ones already excluded by --filter-short:

Program                                         exec_time
Benchmarks/DOE-ProxyApps-C/CoMD/CoMD            0.92
Benchmarks...VC/Expansion-flt/Expansion-flt     0.84
Applications/viterbi/viterbi                    0.78
Benchmarks...ataFlow-flt/GlobalDataFlow-flt     0.76
Benchmarks/Trimaran/enc-rc4/enc-rc4             0.74
Benchmarks/BitBench/five11/five11               0.97
Benchmarks/Prolangs-C++/life/life               0.65
Benchmarks...VC/Symbolics-flt/Symbolics-flt     0.73
Benchmarks...Fhourstones-3.1/fhourstones3.1     0.69
Benchmarks/VersaBench/dbms/dbms                 0.85
Benchmarks.../DOE-ProxyApps-C++/CLAMR/CLAMR     0.81
Benchmarks...roxyApps-C/SimpleMOC/SimpleMOC     0.88
Benchmarks...alencing-dbl/Equivalencing-dbl     0.63
Benchmarks/nbench/nbench                        0.89
Benchmarks/ASCI_Purple/SMG2000/smg2000          0.94
Benchmarks...aran/netbench-crc/netbench-crc     0.66
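
To illustrate the effect of the threshold change, here is a minimal sketch in plain Python. It is not the code from compare.py: the helper `keep_at_or_above` is hypothetical, and the data is just four rows copied from the table above. It uses the same `>=` comparison as `filter_short`.

```python
# exec_time values in seconds, copied from four rows of the table above.
exec_times = {
    "Benchmarks/DOE-ProxyApps-C/CoMD/CoMD": 0.92,
    "Applications/viterbi/viterbi": 0.78,
    "Benchmarks/Prolangs-C++/life/life": 0.65,
    "Benchmarks/nbench/nbench": 0.89,
}


def keep_at_or_above(times, threshold):
    """Hypothetical helper: keep programs whose exec_time is >= threshold."""
    return {name: t for name, t in times.items() if t >= threshold}


old = keep_at_or_above(exec_times, 0.6)  # previous threshold
new = keep_at_or_above(exec_times, 1.0)  # proposed threshold

# Programs that pass the old threshold but not the new one.
newly_excluded = sorted(set(old) - set(new))
print(newly_excluded)
```

All four sample programs pass the 0.6 s threshold and are dropped at 1.0 s, which matches their presence in the table.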

Diff Detail

Repository: rT test-suite

Event Timeline

SjoerdMeijer created this revision. (Feb 15 2023, 6:14 AM)

Herald added a project: Restricted Project. (Feb 15 2023, 6:14 AM)

Herald added a subscriber: StephenFan.

SjoerdMeijer requested review of this revision. (Feb 15 2023, 6:14 AM)

Harbormaster completed remote builds in B213884: Diff 497657. (Feb 15 2023, 6:15 AM)

SjoerdMeijer added a reviewer: rengolin. (Feb 16 2023, 5:39 AM)

It would probably be even better to make the threshold an argument (perhaps pass the threshold optionally like --filter-short=N), with the default being 1.0?

In D144101#4174508, @fhahn wrote:

It would probably be even better to make the threshold an argument (perhaps pass the threshold optionally like --filter-short=N), with the default being 1.0?

I like this idea. Where would the argument come from in the buildbots? CMake?

In D144101#4174512, @rengolin wrote:

In D144101#4174508, @fhahn wrote:

It would probably be even better to make the threshold an argument (perhaps pass the threshold optionally like --filter-short=N), with the default being 1.0?

I like this idea. Where would the argument come from in the buildbots? CMake?

I think the script is mostly (only?) used for local analysis of the results.json files generated by LNT/running the benchmarks using llvm-lit, so I would expect this to be provided by the person running the script in most cases. I might be missing some potential use cases though.

Thanks for the comments, I also like this idea!
I will prepare a new revision to support this.

Option --filter-short now accepts an optional argument, which defaults to 1.0s.
Some special care is needed when this optional argument is omitted: the script then needs to recognise that the FILE argument is not the optional argument to --filter-short, as also explained in the code comments.
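
The approach described above can be sketched with argparse alone. This is a hedged, standalone illustration rather than the code that landed: the `parse` function is hypothetical, the `files` positional with `nargs='+'` is an assumption based on the `FILE [FILE ...]` usage string, and the sketch catches `ValueError` specifically.

```python
import argparse

DEFAULT_THRESHOLD = 1.0  # default taken from the revision summary


def parse(argv):
    """Hypothetical helper: return (filter_requested, threshold, files)."""
    parser = argparse.ArgumentParser()
    # Assumption: input files are a positional argument taking one or more values.
    parser.add_argument('files', metavar='FILE', nargs='+')
    # nargs='?' lets --filter-short take zero or one value.
    parser.add_argument('--filter-short', nargs='?', dest='filter_short',
                        default=None)
    config = parser.parse_args(argv)

    threshold = DEFAULT_THRESHOLD
    if config.filter_short is not None:
        try:
            threshold = float(config.filter_short)
        except ValueError:
            # The value was really a FILE, so put it back at the front.
            config.files.insert(0, config.filter_short)
    return config.filter_short is not None, threshold, config.files


print(parse(['--filter-short', '0.5', 'a.json']))
print(parse(['--filter-short', 'a.json', 'b.json']))
```

In this sketch, a bare `--filter-short` placed last on the command line yields `None`, because argparse substitutes `const` (which defaults to `None`) when an option with `nargs='?'` is present without a value. Passing an explicit `const` would be one way to distinguish that case.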

Harbormaster completed remote builds in B218346: Diff 503703. (Mar 9 2023, 2:24 AM)

rengolin added inline comments. (Mar 9 2023, 3:32 AM)

utils/compare.py
342	what happens to `filter_short_threshold` on exception? Is it a stable behaviour?

SjoerdMeijer added inline comments. (Mar 9 2023, 3:47 AM)

utils/compare.py
342	It defaults to `filter_short_threshold = 1.0` on line 329, so that should always be defined (but not always used). I think that makes it stable behaviour, but I am not a Python programmer, so might be missing something. Let me know what you think, I could do some more python research on try/except behaviour.

This looks good to me. @fhahn ?

utils/compare.py
342	Yeah, just worried the assignment won't occur before `float()` succeeds. If you pass "foobar" as the option and it's still `1.0`, then that's good enough. :)
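
The behaviour asked about in this thread can be checked with a small standalone sketch. `threshold_from` is a hypothetical helper, not part of compare.py, and it catches `ValueError` rather than using a bare `except:`. When `float()` raises, the assignment on that line never happens, so the name keeps the value bound before the `try`.

```python
def threshold_from(option, default=1.0):
    """Hypothetical helper: float(option) if it parses, else the default."""
    threshold = default
    try:
        # float() is evaluated first; if it raises, the assignment is
        # skipped and `threshold` keeps the value bound above.
        threshold = float(option)
    except ValueError:
        pass
    return threshold


print(threshold_from("foobar"))
print(threshold_from("0.25"))
```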

Thanks for your help and reviews! I will commit this tomorrow if there are no further comments.

This revision was not accepted when it landed; it landed in state Needs Review. (Mar 20 2023, 6:11 AM)

This revision was landed with ongoing or failed builds.

Closed by commit rTf36619ce1b38: compare.py: increase --filter-short threshold, and accept optional argument (authored by SjoerdMeijer).

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rTf36619ce1b38: compare.py: increase --filter-short threshold, and accept optional argument.

Revision Contents

Path                Size
utils/compare.py    27 lines

Diff 506560

utils/compare.py

[148 unchanged lines not shown; context: for metric in metrics:]
         gm_diff = stats.gmean(relative.dropna()) - 1.0
         gm[(metric, 'diff')] = gm_diff
     gm.Program = GEOMEAN_ROW
     return pd.concat([dataout, gm])

 def filter_failed(data, key='Exec'):
     return data.loc[data[key] == "pass"]

-def filter_short(data, key='Exec_Time', threshold=0.6):
+def filter_short(data, threshold, key='Exec_Time'):
     return data.loc[data[key] >= threshold]

 def filter_same_hash(data, key='hash'):
     assert key in data.columns
     assert data.index.get_level_values(0).nunique() > 1

     return data.groupby(level=1).filter(lambda x: x[key].nunique() != 1)

[122 unchanged lines not shown; context: def main():]
     parser.add_argument('-f', '--full', action='store_true')
     parser.add_argument('-m', '--metric', action='append', dest='metrics',
                         default=[])
     parser.add_argument('--nodiff', action='store_false', dest='show_diff',
                         default=None)
     parser.add_argument('--diff', action='store_true', dest='show_diff')
     parser.add_argument('--absolute-diff', action='store_true',
                         help='Use an absolute instead of a relative difference')
-    parser.add_argument('--filter-short', action='store_true',
-                        dest='filter_short')
+    parser.add_argument('--filter-short', nargs='?',
+                        dest='filter_short', default=None,
+                        help="Filter benchmarks with execution times less than N seconds (default 1.0s)")
     parser.add_argument('--no-filter-failed', action='store_false',
                         dest='filter_failed', default=True)
     parser.add_argument('--filter-hash', action='store_true',
                         dest='filter_hash', default=False)
     parser.add_argument('--filter-blacklist',
                         dest='filter_blacklist', default=None)
     parser.add_argument('--merge-average', action='store_const',
                         dest='merge_function', const=pd.DataFrame.mean,
[11 unchanged lines not shown; context: parser.add_argument('--minimal-names', action='store_true',]
                         dest='minimal_names', default=False)
     parser.add_argument('--no-abs-sort', action='store_true',
                         dest='no_abs_sort', default=False, help="Don't use abs() when sorting results")
     config = parser.parse_args()

     if config.show_diff is None:
         config.show_diff = len(config.files) > 1

+    # If only --filter-short is provided, i.e. its optional argument is
+    # omitted, we default to threshold of 1 second to filter out apps and
+    # results with a execution time less than that.
+    filter_short_threshold = 1.0
+
+    # If the optional argument to --filter-short is omitted, we need to take
+    # care of this case and command line:
+    # --filter-short FILE [FILE ...]
+    # I.e., we need to recognise that FILE is not the optional argument to
+    # --filter-short. The way we do this, is to try converting the option value
+    # to a float, and if that fails, we insert it back into the files list (in
+    # the first position).
+    if config.filter_short is not None:
+        try:
+            filter_short_threshold = float(config.filter_short)
+        except:
+            config.files.insert(0, config.filter_short)
+
     # Read inputs
     files = config.files
     if "vs" in files:
         split = files.index("vs")
         lhs = files[0:split]
         rhs = files[split+1:]

         # Filter minimum of lhs and rhs
[35 unchanged lines not shown; context: def main():]
     initial_size = len(proggroup.indices)
     print("Tests: %s" % (initial_size,))
     if config.filter_failed and hasattr(data, 'Exec'):
         newdata = filter_failed(data)
         print_filter_stats("Failed", data, newdata)
         newdata = newdata.drop('Exec', 1)
         data = newdata
     if config.filter_short:
-        newdata = filter_short(data, metric)
+        newdata = filter_short(data, filter_short_threshold, metric)
         print_filter_stats("Short Running", data, newdata)
         data = newdata
     if config.filter_hash and 'hash' in data.columns and \
        data.index.get_level_values(0).nunique() > 1:
         newdata = filter_same_hash(data)
         print_filter_stats("Same hash", data, newdata)
         data = newdata
     if config.filter_blacklist:
[35 unchanged lines not shown]