This is an archive of the discontinued LLVM Phabricator instance.

Differential D121908

utils/compare.py: Fix support for multiple metrics
Closed · Public

Authored by arichardson on Mar 17 2022, 5:39 AM.

Details

Reviewers

fhahn
MatzeB
aemerson

Commits

rT0954371820c4: utils/compare.py: Fix support for multiple metrics

Summary

In addition, this retains the pandas multi-index to improve the
visualization. We also transform the program column in-place now since
it appears that DataFrame.to_string() ignores the formatter argument for
all columns that have dtype=object.
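To illustrate what retaining the MultiIndex means, here is a minimal sketch using hypothetical toy data: two programs and two runs named baseline and new, with illustrative timings only. Calling unstack(level=0) on results indexed by (run, program) produces (metric, run) column tuples. describe() keeps those tuples, so the summary tables below can show the metric names as headers.

```python
import pandas as pd

# Hypothetical long-form results: the row index is (run, program) and
# there is one column per metric. Values are illustrative only.
index = pd.MultiIndex.from_tuples(
    [("baseline", "Perm"), ("baseline", "Bubblesort"),
     ("new", "Perm"), ("new", "Bubblesort")],
    names=["run", "Program"])
data = pd.DataFrame({"compile_time": [0.07, 0.07, 0.08, 0.08],
                     "link_time": [0.08, 0.10, 0.11, 0.10]}, index=index)

# unstack(level=0) moves the run names into the columns, so every column
# label becomes a (metric, run) tuple.
wide = data.unstack(level=0)
print(list(wide.columns))
# [('compile_time', 'baseline'), ('compile_time', 'new'),
#  ('link_time', 'baseline'), ('link_time', 'new')]

# describe() keeps the same MultiIndex, so the summary stays grouped by metric.
summary = wide.describe()
print(summary.loc["mean"])
```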

Before: utils/compare.py a.json vs b.json --lhs-name=baseline --rhs-name=new -m compile_time -m link_time --nodiff

Program                                        baseline  new    baseline  new
-C++/Shootout-C++-lists                         0.00      0.00   0.09      0.08
-C++/Shootout-C++-objinst                       0.00      0.00   0.07      0.08
...
         baseline         new    baseline         new
count  315.000000  315.000000  315.000000  315.000000
mean   3.543040    3.531691    0.076500    0.078680
...

As can be seen here, the metric is missing from the column headers, so it is not
clear which column belongs to which metric when multiple metrics are used.

With this change we now see the metrics, and they are sorted by largest diff:

Program                                       compile_time        link_time
                                              baseline     new    baseline  new
SingleSource/Benchmarks/Stanford/Perm           0.07         0.08   0.08      0.11
SingleSour...Benchmarks/Stanford/Bubblesort     0.07         0.08   0.10      0.10
...
      compile_time               link_time
l/r       baseline         new    baseline         new
count  315.000000   315.000000  315.000000  315.000000
mean   3.543040     3.531691    0.076500    0.078680
...
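The --nodiff output above depends on removing every diff column at once. The following is a minimal sketch of that idea, using a hypothetical toy DataFrame with (metric, run) columns. One (metric, 'diff') column is added per metric, mirroring the loop in main() in the diff below. DataFrame.drop(labels='diff', level=1, axis=1) then removes all of those columns by matching on the inner column level.

```python
import pandas as pd

# Hypothetical two-run results with (metric, run) columns.
columns = pd.MultiIndex.from_product(
    [["compile_time", "link_time"], ["baseline", "new"]])
d = pd.DataFrame([[0.07, 0.08, 0.08, 0.11],
                  [0.07, 0.08, 0.10, 0.10]],
                 index=["Perm", "Bubblesort"], columns=columns)

# One relative-difference column per metric, keyed by (metric, 'diff').
for metric in d.columns.levels[0]:
    d[(metric, "diff")] = d[(metric, "new")] / d[(metric, "baseline")] - 1.0

# --nodiff: remove every 'diff' column at once by matching on level 1.
nodiff = d.drop(labels="diff", level=1, axis=1)
print(list(nodiff.columns))
```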

More importantly (and my original motivation for this change), the script currently
fails to run when passing multiple metrics without --nodiff. For example,
utils/compare.py a.json vs b.json -m compile_time -m link_time results in:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
I believe this used to work and regressed in 0ee69b891c3b167474212eb996b78c23f82bc13d
(https://reviews.llvm.org/D57828), as the exception is raised in add_geomean_row().
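The following sketch shows a plausible reason why flattening the columns breaks with several metrics. It uses toy data with a hypothetical program name. The code removed by this patch renamed each (metric, run) column to just its run name, so two metrics produce the labels baseline, new, baseline, new. Operations that reindex on the columns require unique labels, as the InvalidIndexError message states. The MultiIndex tuples, by contrast, stay unique.

```python
import pandas as pd

# Hypothetical two-run, two-metric data for a single program.
index = pd.MultiIndex.from_tuples(
    [("baseline", "Perm"), ("new", "Perm")], names=["run", "Program"])
data = pd.DataFrame({"compile_time": [0.07, 0.08],
                     "link_time": [0.08, 0.11]}, index=index)
wide = data.unstack(level=0)

# The old code kept only the run name of each (metric, run) column.
flat = pd.Index([(c[1] if c[1] else c[0]) for c in wide.columns.values])
print(list(flat))              # ['baseline', 'new', 'baseline', 'new']
print(flat.is_unique)          # False: the labels collide across metrics
print(wide.columns.is_unique)  # True: the MultiIndex tuples stay distinct
```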

After:

Program                                       compile_time              link_time
                                              baseline     new    diff  baseline  new    diff
SingleSource/Benchmarks/Stanford/Perm           0.07         0.08 26.2%   0.08      0.11  34.4%
SingleSour...Benchmarks/Stanford/Bubblesort     0.07         0.08 16.9%   0.10      0.10   4.0%
...
                           Geomean difference                     -0.4%                    2.9%
      compile_time                           link_time
l/r       baseline         new        diff    baseline         new        diff
count  315.000000   315.000000  283.000000  315.000000  315.000000  315.000000
mean   3.543040     3.531691   -0.002523    0.076500    0.078680    0.041652
...
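In the patch, the per-metric geomean row shown above is computed in four steps. Zero baseline values are replaced with NaN, the new values are divided by the baseline, NaNs are dropped, and the result is scipy.stats.gmean(...) - 1.0. Below is a minimal sketch of the same arithmetic. It assumes a hypothetical helper named geomean_difference and uses the mean of logarithms with NumPy instead of SciPy. For positive ratios, the two approaches are mathematically equivalent.

```python
import numpy as np
import pandas as pd


def geomean_difference(values0: pd.Series, values1: pd.Series) -> float:
    """Geometric mean of values1 / values0, minus 1 (hypothetical helper)."""
    # A zero baseline would give an infinite ratio; turn it into NaN instead
    # (the patch uses values0.replace({0: float('NaN')}) for the same purpose).
    values0 = values0.where(values0 != 0)
    relative = (values1 / values0).dropna()
    # exp(mean(log(x))) is the geometric mean, like scipy.stats.gmean.
    return float(np.exp(np.log(relative).mean())) - 1.0


base = pd.Series([1.0, 2.0, 0.0], index=["a", "b", "c"])
new = pd.Series([2.0, 2.0, 5.0], index=["a", "b", "c"])
print(f"{geomean_difference(base, new):.1%}")  # 41.4% (program "c" is skipped)
```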

Listing multiple metrics with a single file was also broken:
utils/compare.py a.json -m compile_time -m link_time results in an exception
while executing d['$sortkey'] = d[sortkey]:
ValueError: Expected a 1D array, got an array with shape (315, 2).
This problem has most likely existed since the initial commit that added this
script, but with this change we can now handle multiple metrics by using the newer
pandas d.sort_values(by=(metrics[0], sortkey), ascending=False) instead of
adding a new column and sorting on that. We now get a table sorted by the
first metric:

Program                                       compile_time link_time
                                              results      results
ultiSource...Benchmarks/7zip/7zip-benchmark   109.56         0.22
ultiSource/Benchmarks/Bullet/bullet            73.55         0.17
...
      compile_time   link_time
run        results     results
count  315.000000   315.000000
mean   3.543040     0.076500
...
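The sorting change can be sketched on hypothetical toy data, with one row per program and (metric, run) columns. The program names and values are illustrative only. sort_values() accepts a column tuple as by=. Its key= argument, available since pandas 1.1, applies the absolute value during sorting, so no temporary $sortkey column needs to be stored. Only the first metric determines the order.

```python
import pandas as pd

# Hypothetical per-program results with (metric, run) columns.
columns = pd.MultiIndex.from_product(
    [["compile_time", "link_time"], ["baseline", "new", "diff"]])
d = pd.DataFrame(
    [[0.07, 0.08, 0.262, 0.08, 0.11, 0.344],
     [0.07, 0.08, 0.169, 0.10, 0.10, 0.040],
     [0.10, 0.07, -0.300, 0.09, 0.09, 0.000]],
    index=["Perm", "Bubblesort", "Queens"], columns=columns)

metrics = d.columns.levels[0]
sortkey = "diff"

# Largest absolute change in the first metric first (key= needs pandas >= 1.1).
by_abs = d.sort_values(by=(metrics[0], sortkey), key=pd.Series.abs,
                       ascending=False)
# Plain descending order of the signed value.
by_value = d.sort_values(by=(metrics[0], sortkey), ascending=False)

print(list(by_abs.index))    # ['Queens', 'Perm', 'Bubblesort']
print(list(by_value.index))  # ['Perm', 'Bubblesort', 'Queens']
```

When sorting by absolute value, the large negative change for Queens comes first. When sorting by the signed value, Queens comes last.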

Diff Detail

Repository

rT test-suite

Build Status

Buildable 154821
Build 229985: arc lint + arc unit

Event Timeline

arichardson created this revision. Mar 17 2022, 5:39 AM

Herald added a project: Restricted Project. Mar 17 2022, 5:39 AM

arichardson requested review of this revision. Mar 17 2022, 5:39 AM

Harbormaster completed remote builds in B154821: Diff 416152. Mar 17 2022, 5:39 AM

Ensure the full terminal width is used for the d.describe() call.

Harbormaster completed remote builds in B156262: Diff 418192. Mar 25 2022, 5:31 AM

ping?

Changes make sense to me, and a quick sanity check of the script seemed fine.

Thanks, LGTM

Inline comment on utils/compare.py, lines 217–218:
Seems like you fixed the `TODO` and can remove it?

This revision is now accepted and ready to land. Apr 20 2022, 10:13 AM

Closed by commit rT0954371820c4: utils/compare.py: Fix support for multiple metrics (authored by arichardson). Apr 20 2022, 10:36 AM

This revision was automatically updated to reflect the committed changes.

arichardson mentioned this in rTbb77bd4067c7: utils/compare.py: Add a main() function.

arichardson marked an inline comment as done.

arichardson added a commit: rT0954371820c4: utils/compare.py: Fix support for multiple metrics.

Revision Contents

Path                Size
utils/compare.py    84 lines

Diff 416152

utils/compare.py

#!/usr/bin/env python		#!/usr/bin/env python
"""Tool to filter, organize, compare and display benchmarking results. Usefull		"""Tool to filter, organize, compare and display benchmarking results. Usefull
for smaller datasets. It works great with a few dozen runs it is not designed to		for smaller datasets. It works great with a few dozen runs it is not designed to
deal with hundreds.		deal with hundreds.
Requires the pandas library to be installed."""		Requires the pandas library to be installed."""
from __future__ import print_function		from __future__ import print_function

import pandas as pd		import pandas as pd
from scipy import stats		from scipy import stats
import sys		import sys
import os.path		import os.path
import re		import re
import numbers		import numbers
import argparse		import argparse

		GEOMEAN_ROW = 'Geomean difference'

def read_lit_json(filename):		def read_lit_json(filename):
import json		import json
jsondata = json.load(open(filename))		jsondata = json.load(open(filename))
columns = []		columns = []
columnindexes = {}		columnindexes = {}
names = set()		names = set()
info_columns = ['hash']		info_columns = ['hash']
# Pass1: Figure out metrics (= the column index)		# Pass1: Figure out metrics (= the column index)
(91 unchanged lines not shown; the following lines are inside def get_values(values):)
if 'diff' in values.columns:		if 'diff' in values.columns:
values = values[[c for c in values.columns if c != 'diff']]		values = values[[c for c in values.columns if c != 'diff']]
has_two_runs = len(values.columns) == 2		has_two_runs = len(values.columns) == 2
if has_two_runs:		if has_two_runs:
return (values.iloc[:,0], values.iloc[:,1])		return (values.iloc[:,0], values.iloc[:,1])
else:		else:
return (values.min(axis=1), values.max(axis=1))		return (values.min(axis=1), values.max(axis=1))

def add_diff_column(values, absolute_diff=False):		def add_diff_column(metric, values, absolute_diff=False):
values0, values1 = get_values(values)		values0, values1 = get_values(values[metric])
# Quotient or absolute difference?		# Quotient or absolute difference?
if absolute_diff:		if absolute_diff:
values['diff'] = values1 - values0		values[(metric, 'diff')] = values1 - values0
else:		else:
values['diff'] = values1 / values0		values[(metric, 'diff')] = (values1 / values0) - 1.0
values['diff'] -= 1.0
return values		return values

def add_geomean_row(data, dataout):		def add_geomean_row(metrics, data, dataout):
"""		"""
Normalize values1 over values0, compute geomean difference and add a		Normalize values1 over values0, compute geomean difference and add a
summary row to dataout.		summary row to dataout.
"""		"""
values0, values1 = get_values(data)		gm = pd.DataFrame(index=[GEOMEAN_ROW], columns=dataout.columns,
		dtype='float64')
		for metric in metrics:
		values0, values1 = get_values(data[metric])
		# Avoid infinite values in the diff and instead use NaN, as otherwise
		# the computation of the geometric mean will fail.
		values0 = values0.replace({0: float('NaN')})
relative = values1 / values0		relative = values1 / values0
gm_diff = stats.gmean(relative) - 1.0		gm_diff = stats.gmean(relative.dropna()) - 1.0
		gm[(metric, 'diff')] = gm_diff
gm_row = {c: '' for c in dataout.columns}		gm.Program = GEOMEAN_ROW
gm_row['diff'] = gm_diff		return pd.concat([dataout, gm])
series = pd.Series(gm_row, name='Geomean difference')
return dataout.append(series)

def filter_failed(data, key='Exec'):		def filter_failed(data, key='Exec'):
return data.loc[data[key] == "pass"]		return data.loc[data[key] == "pass"]

def filter_short(data, key='Exec_Time', threshold=0.6):		def filter_short(data, key='Exec_Time', threshold=0.6):
return data.loc[data[key] >= threshold]		return data.loc[data[key] >= threshold]

def filter_same_hash(data, key='hash'):		def filter_same_hash(data, key='hash'):
(50 unchanged lines not shown)
def format_diff(value):		def format_diff(value):
if not isinstance(value, numbers.Integral):		if not isinstance(value, numbers.Integral):
return "%4.1f%%" % (value * 100.)		return "%4.1f%%" % (value * 100.)
else:		else:
return "%-5d" % value		return "%-5d" % value

def print_result(d, limit_output=True, shorten_names=True, minimal_names=False,		def print_result(d, limit_output=True, shorten_names=True, minimal_names=False,
show_diff_column=True, sortkey='diff', sort_by_abs=True):		show_diff_column=True, sortkey='diff', sort_by_abs=True):
# sort (TODO: is there a more elegant way than create+drop a column?)		# sort (TODO: is there a more elegant way than create+drop a column?)
		metrics = d.columns.levels[0]
		MatzeB (inline comment, marked done): Seems like you fixed the `TODO` and can remove it?
if sort_by_abs:		if sort_by_abs:
d['$sortkey'] = d[sortkey].abs()		d = d.sort_values(by=(metrics[0], sortkey), key=pd.Series.abs, ascending=False)
else:		else:
d['$sortkey'] = d[sortkey]		d = d.sort_values(by=(metrics[0], sortkey), ascending=False)

		# Ensure that the columns are grouped by metric (rather than having the
		# diffs at the end of the line).
		d = d.reindex(columns=d.columns.levels[0], level=0)

d = d.sort_values("$sortkey", ascending=False)
del d['$sortkey']
if not show_diff_column:		if not show_diff_column:
del d['diff']		# Remove all diff columns (using level=1 since level 0 is the metric).
		d.drop(labels='diff', level=1, axis=1, inplace=True)
dataout = d		dataout = d
if limit_output:		if limit_output:
# Take 15 topmost elements		# Take 15 topmost elements
dataout = dataout.head(15)		dataout = dataout.head(15)

if show_diff_column:
dataout = add_geomean_row(d, dataout)

# Turn index into a column so we can format it...
dataout.insert(0, 'Program', dataout.index)

formatters = dict()		formatters = dict()
formatters['diff'] = format_diff		for m in metrics:
		formatters[(m, 'diff')] = format_diff
		# Turn index into a column so we can format it...
		formatted_program = dataout.index.to_series()
if shorten_names:		if shorten_names:
drop_prefix, drop_suffix = determine_common_prefix_suffix(dataout.Program)
def format_name(name, common_prefix, common_suffix):		def format_name(name, common_prefix, common_suffix):
name = name[common_prefix:]		name = name[common_prefix:]
if common_suffix > 0:		if common_suffix > 0:
name = name[:-common_suffix]		name = name[:-common_suffix]
return "%-45s" % truncate(name, 10, 30)		return "%-45s" % truncate(name, 10, 30)


def strip_name_fully(name):		def strip_name_fully(name):
name = name.split('/')[-1]		name = name.split('/')[-1]
if name.endswith('.test'):		if name.endswith('.test'):
name = name[:-5]		name = name[:-5]
return name		return name

		# The to_string formatters argument appears to be ignored for
		# dtype=object, so transform the program column manually.
if minimal_names:		if minimal_names:
formatters['Program'] = strip_name_fully		formatted_program = formatted_program.map(strip_name_fully)
else:		else:
formatters['Program'] = lambda name: format_name(name, drop_prefix, drop_suffix)		drop_prefix, drop_suffix = determine_common_prefix_suffix(formatted_program)
		formatted_program = formatted_program.map(lambda name: format_name(name, drop_prefix, drop_suffix))
		dataout.insert(0, 'Program', formatted_program)
		# Add the geometric mean row after we have formatted the program names
		# as it will otherwise interfere with common prefix/suffix computation.
		if show_diff_column:
		dataout = add_geomean_row(metrics, d, dataout)

def float_format(x):		def float_format(x):
if x == '':		if x == '':
return ''		return ''
return "%6.2f" % (x,)		return "%6.2f" % (x,)

pd.set_option("display.max_colwidth", 0)		pd.set_option("display.max_colwidth", 0)
out = dataout.to_string(index=False, justify='left',		# Print an empty value instead of NaN (for the geomean row).
		out = dataout.to_string(index=False, justify='left', na_rep='',
float_format=float_format, formatters=formatters)		float_format=float_format, formatters=formatters)
print(out)		print(out)
print(d.describe())		print(d.describe())

def main():		def main():
parser = argparse.ArgumentParser(prog='compare.py')		parser = argparse.ArgumentParser(prog='compare.py')
parser.add_argument('-a', '--all', action='store_true')		parser.add_argument('-a', '--all', action='store_true')
parser.add_argument('-f', '--full', action='store_true')		parser.add_argument('-f', '--full', action='store_true')
(101 unchanged lines not shown; the following lines are inside if final_size != initial_size:)
print("Remaining: %d" % (final_size,))		print("Remaining: %d" % (final_size,))

# Reduce / add columns		# Reduce / add columns
print("Metric: %s" % (",".join(metrics),))		print("Metric: %s" % (",".join(metrics),))
if len(metrics) > 0:		if len(metrics) > 0:
data = data[metrics]		data = data[metrics]

data = data.unstack(level=0)		data = data.unstack(level=0)
# unstack() gave us a complicated multiindex for the columns, simplify
# things by renaming to a simple index.
data.columns = [(c[1] if c[1] else c[0]) for c in data.columns.values]

data = add_diff_column(data)		for metric in data.columns.levels[0]:
		data = add_diff_column(metric, data)

sortkey = 'diff'		sortkey = 'diff'
		# TODO: should we still be sorting by diff even if the diff is hidden?
if len(config.files) == 1:		if len(config.files) == 1:
sortkey = data.columns[0]		sortkey = data.columns.levels[1][0]

# Print data		# Print data
print("")		print("")
shorten_names = not config.full		shorten_names = not config.full
limit_output = (not config.all) and (not config.full)		limit_output = (not config.all) and (not config.full)
print_result(data, limit_output, shorten_names, config.minimal_names, config.show_diff, sortkey, config.no_abs_sort)		print_result(data, limit_output, shorten_names, config.minimal_names, config.show_diff, sortkey, config.no_abs_sort)


if __name__ == "__main__":		if __name__ == "__main__":
main()		main()