This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Adding LCALS (Livermore Compiler Analysis Loop Suite) loop kernels to test suite.
ClosedPublic

Authored by homerdin on Feb 14 2018, 2:41 PM.

Details

Summary

These changes depend on the changes proposed in https://reviews.llvm.org/D43314 and https://reviews.llvm.org/D43316.

This adds parts of LCALS as Google Benchmark microbenchmarks to the test suite. The loop suite is partitioned into 3 subsets based on their origins.

From README-LCALS_instructions.txt:
LCALS (“Livermore Compiler Analysis Loop Suite”) is a collection of loop kernels based, in part, on historical “Livermore Loops” benchmarks (See the 1986 technical report: “The Livermore Fortran Kernels: A Computer Test of the Numerical Performance Range”, by Frank H. McMahon, UCRL-53745.).

  • Subset A: Loops representative of those found in application codes. They are implemented in source files named runA<variant>Loops.cxx.
  • Subset B: Basic loops that help to illustrate compiler optimization issues. They are implemented in source files named runB<variant>Loops.cxx.
  • Subset C: Loops extracted from "Livermore Loops coded in C" developed by Steve Langer, which were derived from the Fortran version by Frank McMahon. They are implemented in source files named runC<variant>Loops.cxx.

Being added are Google Benchmark versions of the Raw and ForeachLambda variants at the 3 problem sizes (a sketch of the two variant styles follows the list below).

  • SubsetALambdaLoops: 18 tests (6 loops x 3 sizes)
  • SubsetARawLoops: 18 tests (6 loops x 3 sizes)
  • SubsetBLambdaLoops: 12 tests (4 loops x 3 sizes)
  • SubsetBRawLoops: 12 tests (4 loops x 3 sizes)
  • SubsetCLambdaLoops: 60 tests (20 loops x 3 sizes)
  • SubsetCRawLoops: 60 tests (20 loops x 3 sizes)
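
For context, here is a minimal sketch of how one kernel looks in its Raw form versus its ForeachLambda form when wrapped as a Google Benchmark and registered at three sizes. The kernel body and data setup here are simplified placeholders, not the actual LCALS sources, which live in the run<Subset><Variant>Loops.cxx files:

// Minimal sketch with a hypothetical kernel body; the real LCALS kernels use
// the suite's own data initialization and loop lengths.
#include <benchmark/benchmark.h>
#include <vector>

// "Raw" variant: the kernel body is written as a plain indexed loop.
static void BM_PRESSURE_CALC_RAW(benchmark::State &state) {
  const size_t len = static_cast<size_t>(state.range(0));
  std::vector<double> bvc(len, 1.0), compression(len, 0.5);
  const double cls = 0.1;
  for (auto _ : state) {
    for (size_t i = 0; i < len; ++i)
      bvc[i] = cls * (compression[i] + 1.0);
    benchmark::DoNotOptimize(bvc.data());
    benchmark::ClobberMemory();
  }
}

// "ForeachLambda" variant: the same body is passed as a lambda to a
// forall-style traversal template, exercising the compiler's ability to
// optimize through the abstraction.
template <typename Body>
inline void forall(size_t begin, size_t end, Body body) {
  for (size_t i = begin; i < end; ++i)
    body(i);
}

static void BM_PRESSURE_CALC_LAMBDA(benchmark::State &state) {
  const size_t len = static_cast<size_t>(state.range(0));
  std::vector<double> bvc(len, 1.0), compression(len, 0.5);
  const double cls = 0.1;
  for (auto _ : state) {
    forall(0, len, [&](size_t i) { bvc[i] = cls * (compression[i] + 1.0); });
    benchmark::DoNotOptimize(bvc.data());
    benchmark::ClobberMemory();
  }
}

// Each variant is registered at three problem sizes, which is where the
// "loops x 3 sizes" test counts above come from.
BENCHMARK(BM_PRESSURE_CALC_RAW)->Arg(1 << 8)->Arg(1 << 12)->Arg(1 << 16);
BENCHMARK(BM_PRESSURE_CALC_LAMBDA)->Arg(1 << 8)->Arg(1 << 12)->Arg(1 << 16);

BENCHMARK_MAIN();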

When run on Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz (Broadwell):

  • SubsetA takes around 22 seconds (18 tests reported).
  • SubsetB takes around 16 seconds (12 tests reported).
  • SubsetC takes around 61 seconds (60 tests reported).

The machine being used should not affect the runtime of the tests much, since the benchmark library adjusts the iteration count up or down to fill its measurement window. There are several ways I can adjust the overall run time, but since each executable reports multiple results, I was unsure what the expectation would be.
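
If shortening the runs is preferred, Google Benchmark itself exposes a couple of knobs. A minimal sketch (hypothetical kernel, not the LCALS code) of lowering the per-benchmark measurement time or pinning the iteration count:

#include <benchmark/benchmark.h>

static void BM_HYDRO_1D(benchmark::State &state) {
  double x = 0.0;
  for (auto _ : state)
    benchmark::DoNotOptimize(x += 1.0);
}

// Lower the minimum measurement time per benchmark; the library normally
// picks an iteration count large enough to fill this window.
BENCHMARK(BM_HYDRO_1D)->MinTime(0.1);

// Alternatively, ->Iterations(N) pins the iteration count so run time no
// longer adapts to the machine, and the --benchmark_min_time command line
// flag adjusts the window globally without touching the sources.

BENCHMARK_MAIN();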

Diff Detail

Event Timeline

homerdin created this revision.Feb 14 2018, 2:41 PM
homerdin updated this revision to Diff 134438.Feb 15 2018, 8:35 AM

Include full context

MatzeB accepted this revision.Feb 22 2018, 4:50 PM
  • You should double check whether you really need to repeat the lit.local.cfg changes in each directory.
  • I haven't actually tried compiling/running the benchmark but the changes LGTM.
MicroBenchmarks/LCALS/SubsetCRawLoops/lit.local.cfg
1–7 ↗(On Diff #134438)

I think it's enough to have this at the toplevel /MicroBenchmarks directory; you shouldn't need to repeat it for each subdirectory, should you?

This revision is now accepted and ready to land.Feb 22 2018, 4:50 PM

Yeah, just having the lit.local.cfg in /MicroBenchmarks will work. See inline.

MicroBenchmarks/LCALS/SubsetCRawLoops/lit.local.cfg
1–7 ↗(On Diff #134438)

Tested it and you're right: everything works with it in the /MicroBenchmarks directory as long as the copies in the subdirectories are removed.

microbenchmark.py raises an exception if both are present. So I added the lit.local.cfg in /MicroBenchmarks in https://reviews.llvm.org/D43316 so that nothing breaks in the XRay tests. It would be better to commit that change first anyway, because without it these tests report one big number instead of per-benchmark results.

# We need stdout ourselves to get the benchmark CSV data.
if cmd.stdout is not None:
    raise Exception("Rerouting stdout not allowed for microbenchmarks")
homerdin updated this revision to Diff 135692.Feb 23 2018, 1:55 PM

Updated to remove the lit.local.cfg files. I realized I had added the add_subdirectory(LCALS) in https://reviews.llvm.org/D43316 alongside the lit.local.cfg.

This revision was automatically updated to reflect the committed changes.
MicroBenchmarks/LCALS/LCALSSuite.hxx