In addition, this retains the pandas multi-index, which improves the
visualization. We also transform the program column in-place now, since
it appears that DataFrame.to_string() ignores the formatters argument
for columns that have dtype=object.
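For illustration, a minimal sketch of that kind of workaround; the
'Program' column name and the shorten() helper are assumptions for the
example, not the script's exact code:

  import pandas as pd

  def shorten(name, limit=42):
      # Elide the middle of long program paths so the table stays
      # narrow, keeping the head and the tail of the path.
      if len(name) <= limit:
          return name
      return name[:10] + '...' + name[-(limit - 13):]

  def render(d: pd.DataFrame) -> str:
      # Passing formatters={'Program': shorten} to to_string() appeared
      # to have no effect on this dtype=object column, so transform the
      # column in-place before rendering instead.
      d['Program'] = d['Program'].apply(shorten)
      return d.to_string(index=False)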
Before: utils/compare.py a.json vs b.json --lhs-name=baseline \
            --rhs-name=new -m compile_time -m link_time --nodiff

  Program                    baseline  new   baseline  new
  -C++/Shootout-C++-lists        0.00  0.00      0.09  0.08
  -C++/Shootout-C++-objinst      0.00  0.00      0.07  0.08
  ...
           baseline         new    baseline         new
  count  315.000000  315.000000  315.000000  315.000000
  mean     3.543040    3.531691    0.076500    0.078680
  ...
As can be seen here, the metric names are missing from the column
headers, so it's not clear which column is which when using multiple
metrics.
With this change we now see the metrics in the headers, and the rows
are sorted by largest diff:
  Program                                      compile_time     link_time
                                               baseline  new    baseline  new
  SingleSource/Benchmarks/Stanford/Perm            0.07  0.08       0.08  0.11
  SingleSour...Benchmarks/Stanford/Bubblesort      0.07  0.08       0.10  0.10
  ...
         compile_time              link_time
  l/r        baseline         new    baseline         new
  count    315.000000  315.000000  315.000000  315.000000
  mean       3.543040    3.531691    0.076500    0.078680
  ...
More importantly (my original motivation for this change), the script currently
fails to run when passing multiple metrics without --nodiff. For example,
utils/compare.py a.json vs b.json -m compile_time -m link_time results in:

  pandas.errors.InvalidIndexError: Reindexing only valid with uniquely
  valued Index objects
I believe this used to work and regressed in 0ee69b891c3b167474212eb996b78c23f82bc13d
(https://reviews.llvm.org/D57828), as the exception is raised in
add_geomean_row().
After:

  Program                                      compile_time           link_time
                                               baseline  new   diff   baseline  new   diff
  SingleSource/Benchmarks/Stanford/Perm            0.07  0.08  26.2%      0.08  0.11  34.4%
  SingleSour...Benchmarks/Stanford/Bubblesort      0.07  0.08  16.9%      0.10  0.10   4.0%
  ...
  Geomean difference                                           -0.4%                   2.9%
         compile_time                          link_time
  l/r        baseline         new        diff    baseline         new        diff
  count    315.000000  315.000000  283.000000  315.000000  315.000000  315.000000
  mean       3.543040    3.531691   -0.002523    0.076500    0.078680    0.041652
  ...
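The fix follows from the column layout above: once the metric level is
kept, every column label is unique again and a summary row can be
appended. A hedged sketch of the idea, not compare.py's exact code; it
assumes the table already carries (metric, 'diff') columns and uses a
hand-rolled geomean:

  import numpy as np
  import pandas as pd

  def add_geomean_row(d: pd.DataFrame) -> pd.DataFrame:
      # With only the run level as column labels, two metrics produce
      # the duplicates (baseline, new, baseline, new), and assigning a
      # new row raises:
      #   InvalidIndexError: Reindexing only valid with uniquely
      #   valued Index objects
      # With the full (metric, run) MultiIndex every label is unique,
      # so the assignment below succeeds.
      row = pd.Series(index=d.columns, dtype=float)
      for metric in d.columns.levels[0]:
          ratio = d[(metric, 'new')] / d[(metric, 'baseline')]
          row[(metric, 'diff')] = np.exp(np.log(ratio.dropna()).mean()) - 1.0
      d.loc['Geomean difference'] = row
      return d

Note that dropna() also explains why the diff count (283) can be lower
than the baseline/new counts (315) in the summary above.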
Listing multiple metrics with a single file was also broken:
utils/compare.py a.json -m compile_time -m link_time raises an
exception while executing d['$sortkey'] = d[sortkey]:

  ValueError: Expected a 1D array, got an array with shape (315, 2)
This problem has most likely existed since the initial commit that
added this script, but with this change we can now handle multiple
metrics by using the newer pandas
d.sort_values(by=(metrics[0], sortkey), ascending=False) instead of
adding a new column and sorting on that. We now get a table sorted by
the first metric:
  Program                                      compile_time  link_time
                                                    results    results
  ultiSource...Benchmarks/7zip/7zip-benchmark        109.56       0.22
  ultiSource/Benchmarks/Bullet/bullet                 73.55       0.17
  ...
         compile_time   link_time
  run         results     results
  count    315.000000  315.000000
  mean       3.543040    0.076500
  ...
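A sketch contrasting the two approaches, to make the failure mode
concrete; metrics and sortkey mirror the script's variables, the rest
is illustrative:

  import pandas as pd

  def sort_table(d: pd.DataFrame, metrics, sortkey='results'):
      # Old approach: copy the sort column into a scratch column and
      # sort on that. With two metrics, d[sortkey] matches the sort
      # column under *both* metrics, a (315, 2) array that cannot be
      # stuffed into the single scratch column:
      #   d['$sortkey'] = d[sortkey]   # ValueError with >1 metric
      #
      # New approach: a (metric, run) tuple names exactly one column
      # of the MultiIndex, so pandas can sort on it directly.
      return d.sort_values(by=(metrics[0], sortkey), ascending=False)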
Seems like you fixed the TODO and can remove it?