This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Add SPEC CPU 2017
ClosedPublic

Authored by Meinersbur on Aug 14 2017, 3:06 PM.

Details

Summary

Add CMakeLists.txt files for SPEC CPU 2017.

All C/C++ benchmarks are included, grouped by benchmark suite (SPECrate 2017 Integer, SPECspeed 2017
Integer, SPECrate 2017 Floating Point, SPECspeed 2017 Floating Point).

SPEC's changes to some of the benchmarks (at least perlbench) require them to be invoked with a relative path (to find its working directory, the benchmark prepends a path to basename(argv[0])). When the executable is referred to by a relative path, litsupport needs to take preceding cd commands into account, otherwise it won't find the executable.

Some reference files in SPEC are stored with CRLF line endings (511.povray_r) while the benchmark generates LF endings, at least on UNIX. fpcmp got stuck in an endless loop in this case.

Useful additions for the future:

Some mixed-language projects of SPEC CPU 2017 have most of their sources written in C/C++ (e.g. cactuBSSN). A useful addition might be to try compiling the Fortran parts with gfortran.

The only platform I tried is x86_64. I can try to make it work with -m32 as well. Support for other platforms has to be added by people who can test on those.

Most 5xx_r/6xx_s benchmark pairs are (nearly) identical and/or only have different test inputs. One might save compile and run time by running only one of them.

Many _s flavors have support for OpenMP. It is unconditionally disabled at the moment, but one might add an option to compile them with OpenMP (a sketch of such an option follows this list).

The new CMakeLists.txt files support multiple run types (test, train, ref) in one build, but test-suite's cmake system doesn't allow this yet.
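To illustrate the OpenMP switch suggested above, a minimal sketch. The option name TEST_SUITE_SPEC2017_OPENMP is hypothetical and not part of this patch; SPEC_SUPPRESS_OPENMP is the macro SPEC's sources use to guard their serial code paths.

```cmake
# Hypothetical option; not part of the patch under review.
option(TEST_SUITE_SPEC2017_OPENMP
       "Build the SPECspeed (6xx_s) benchmarks with OpenMP" OFF)

if(TEST_SUITE_SPEC2017_OPENMP)
  # FindOpenMP ships with CMake; OpenMP_C_FLAGS is its result variable.
  find_package(OpenMP REQUIRED)
  add_compile_options(${OpenMP_C_FLAGS})
else()
  # Matches what the patch currently does unconditionally.
  add_definitions(-DSPEC_SUPPRESS_OPENMP)
endif()
```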

Diff Detail

Repository
rL LLVM

Event Timeline

Meinersbur created this revision.Aug 14 2017, 3:06 PM
MatzeB edited edge metadata.Aug 14 2017, 4:19 PM
  • I'd suggest removing some of the comments in the CMakeLists.txt files:
    • The links to the benchmark descriptions add little value to the cmake code itself (some of them are also wrong).
    • The COMP: and LINK: and test execution lines were probably a big help for you while bringing up the cmake files. But keeping them around now seems strange: for example, I don't care that your compiler was installed in /root/build/llvm/release/bin/, and the flags actually used deviate from the flags present in the COMP/LINK lines. So I'd expect them to add more confusion than to be helpful.
External/SPEC/CINT2017rate/505.mcf_r/CMakeLists.txt
1 ↗(On Diff #111077)

wrong link

External/SPEC/CINT2017speed/605.mcf_s/CMakeLists.txt
1 ↗(On Diff #111077)

wrong link?

External/SPEC/CMakeLists.txt
1–22 ↗(On Diff #111077)

It would be good if you could keep the include(External)/llvm_externals_find logic inside the subdirectories. The motivation for that is that people like running parts of the test-suite independently. As an example, we currently have 4 bots running on our side that set TEST_SUITE_SUBDIRS=External/SPEC/CINT2000, External/SPEC/CINT2006, etc.
The problem with that approach is that cmake will not include the intermediate CMakeLists.txt files. So you won't see any effects of External/CMakeLists.txt or External/SPEC/CMakeLists.txt; you will just see External/SPEC/CINT2000/CMakeLists.txt etc. We therefore have to design this in a way that the intermediate cmake files only use add_subdirectory() commands and do not set variables or otherwise modify state.

Putting common code into something like SpecCPU2017.cmake and using that from all subdirectories is the way to get there; you just have to do it for everything, including the llvm_externals_find call.
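A minimal sketch of that layout, assuming the llvm_externals_find signature used for SPEC2000/SPEC2006 elsewhere in the test-suite; the variable and directory names are illustrative:

```cmake
# External/SPEC/CINT2017rate/CMakeLists.txt -- a self-contained leaf directory.
# TEST_SUITE_SUBDIRS may point cmake straight at this file, so it cannot rely
# on External/CMakeLists.txt or External/SPEC/CMakeLists.txt having run first.
include(${CMAKE_CURRENT_SOURCE_DIR}/../SpecCPU2017.cmake)

llvm_externals_find(TEST_SUITE_SPEC2017_ROOT "speccpu2017" "SPEC CPU2017")

if(TEST_SUITE_SPEC2017_ROOT)
  add_subdirectory(500.perlbench_r)
  add_subdirectory(505.mcf_r)
  # ... remaining SPECrate 2017 Integer benchmarks ...
endif()
```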

External/SPEC/SpecCPU2017.cmake
170 ↗(On Diff #111077)

I think it would be a good idea to add a spec_ prefix (or _spec suffix) to the name, so people know immediately that this is not a generic macro but contains logic designed for the spec benchmarks. The same applies to add_include_dirs, set_sources, run_test.

cmake/modules/TestFile.cmake
37–40 ↗(On Diff #111077)

The WORKDIR feature seems unnecessary to me. Couldn't you prepend the workdir to the arguments of the fpcmp/diff command instead of adding this? Even if there is a reason to change the working directory, I think users should just prepend their command with a cd xxx ; sequence. (I know llvm_test_run has the feature too, but that has more to do with the fact that people do not specify the executable for llvm_test_run, so there is no easy way to prepend something to the executable.)
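To make the two alternatives concrete, a sketch; llvm_test_verify is the test-suite's existing helper, while ${RUN_DIR}, ${FPCMP} and the tolerance are placeholders:

```cmake
# Feature added by this patch: a dedicated WORKDIR argument.
llvm_test_verify(WORKDIR ${RUN_DIR}
  ${FPCMP} -r 0.001 ref.out test.out)

# Suggested alternative: express the directory change in the command itself,
# which litsupport's shell parsing already has to understand anyway.
llvm_test_verify(cd ${RUN_DIR} \; ${FPCMP} -r 0.001 ref.out test.out)
```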

litsupport/shellcommand.py
149 ↗(On Diff #111077)

Modifying cmd.executable feels wrong; the value should rather be pulled out into a variable and that variable modified.

tools/fpcmp.c
249–264 ↗(On Diff #111077)

I think this should go into a separate review. I also think we shouldn't make \r skipping the default behavior, and should rather require the user to enable it with an option.

Sorry, pressed submit by accident. This review is not finished (I'll add more things later; I also have yet to get my hands on the benchmark sources to try it all out in practice). It lacks the most important part at the beginning:

Hi Michael,
thanks a lot for working on this! This looks like a great start and is really appreciated.

MatzeB added inline comments.Aug 14 2017, 4:37 PM
External/SPEC/SpecCPU2017.cmake
119–122 ↗(On Diff #111077)

I think the ORIGIN_PATH logic would be better expressed by having a separate macro to call instead of setting a variable that then changes all the behavior of this macro.

123–148 ↗(On Diff #111077)

This is a bit subjective, but I'd lean towards splitting this into two macros: one for test file creation and one for defining the test executable. While this is an extra line for each benchmark, I think it is easier to understand what is going on and allows you to choose meaningful macro names over an uninformative benchmark_done name...
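A sketch of the shape being suggested; the macro names are hypothetical and the bodies are elided, since the point is the two separately named entry points:

```cmake
# Hypothetical names; the patch under review bundles both steps into a
# single cpu2017_benchmark_done() macro.
macro(speccpu2017_add_executable)
  # ... define the benchmark target from the collected sources ...
endmacro()

macro(speccpu2017_add_test_files)
  # ... emit the .test files for the configured run type (test/train/ref) ...
endmacro()
```

Each benchmark's CMakeLists.txt would then end with these two calls instead of one.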

150–156 ↗(On Diff #111077)

Does this mean some benchmarks come with a custom tool to validate the results? I guess the current approach will fail in cross-compilation settings. I don't have an easy solution for that though, and we may want to address this in followup patches instead.

158–163 ↗(On Diff #111077)

COPY_TO_RUNDIR is another case where a separate macro would work better than setting a variable to influence the behavior here.

Thank you for the review!

  • I'd suggest removing some of the comments in the CMakeLists.txt files:
    • The links to the benchmark descriptions add little value to the cmake code itself (some of them are also wrong).

Some links include hints about how the benchmarks are to be compiled. For instance, perlbench requires -fno-strict-aliasing, which explains why the option was added. gcc requires either -fgnu89-inline or -z muldefs passed to the linker. Hence the links contain information on why the options are set as they are.
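For illustration, such a flag and its rationale could look roughly like this in a benchmark's CMakeLists.txt (a sketch, not the exact contents of the patch; the URL follows SPEC's published documentation layout):

```cmake
# In 500.perlbench_r/CMakeLists.txt -- the linked benchmark description
# explains that Perl's interpreter is not strict-aliasing clean:
# https://www.spec.org/cpu2017/Docs/benchmarks/500.perlbench_r.html
add_compile_options(-fno-strict-aliasing)

# In the gcc benchmark's CMakeLists.txt -- GNU89 inline semantics are needed
# (the alternative is passing -z muldefs to the linker):
add_compile_options(-fgnu89-inline)
```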

On the other hand, I don't see why the comments are problematic, even if they add only little value. That's at least some value.

Please respond whether I should still remove them.

Of course I will correct the wrong links.

  • The COMP: and LINK: and test execution lines were probably a big help for you while bringing up the cmake files. But keeping them around now seems strange: for example, I don't care that your compiler was installed in /root/build/llvm/release/bin/, and the flags actually used deviate from the flags present in the COMP/LINK lines. So I'd expect them to add more confusion than to be helpful.

I agree, and will remove them in the next update. I wanted to remove them anyway, but at the moment, when I run the ref datasets, I hit lit's default timeout for xz_s and imagick_s on my computer. I will only keep them until I know that all the verifications pass.

External/SPEC/CMakeLists.txt
1–22 ↗(On Diff #111077)

That's a good idea.

Btw, SPEC2006/SPEC2000 use llvm_externals_find in their CINT200x/CFP200x directories, such that one cannot configure only an individual benchmark. It also prints the "Expected SPEC200x version 1.y" message multiple times, which annoys me a bit (I don't have access to the latest versions).

Could we make "lnt runtest test-suite --only-test" one day use TEST_SUITE_SUBDIRS such that it configures only that particular directory instead of the entire test-suite?

External/SPEC/SpecCPU2017.cmake
119–122 ↗(On Diff #111077)

Could you outline how you think this should be structured?

The "problem" is that the ORIGIN_PATH must be already known in cpu2017_benchmark which is uses to "inherit" its source and data path from.

I could specify the origin benchmarks in both calls, but that information is redundant.
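A sketch of the two variants under discussion; macro names as in the patch, while the paths and the exact way the origin is passed are illustrative:

```cmake
# Current scheme: the origin must be known when cpu2017_benchmark() runs,
# because source and data paths are inherited from it, and it also changes
# what cpu2017_benchmark_done() does later.
set(ORIGIN_PATH ${CMAKE_CURRENT_SOURCE_DIR}/../../CINT2017rate/505.mcf_r)
cpu2017_benchmark()
# ... per-benchmark flags and sources ...
cpu2017_benchmark_done()

# Redundant alternative: pass the origin explicitly to both macros, e.g.
# cpu2017_benchmark(ORIGIN_PATH ...) and cpu2017_benchmark_done(ORIGIN_PATH ...).
```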

150–156 ↗(On Diff #111077)

Yes, each benchmark that outputs an image has its own image_validator tool. I did not compare their sources; maybe they are identical. runspec, however, even compiles the 5xx_r and 6xx_s variants separately.

cmake/modules/TestFile.cmake
37–40 ↗(On Diff #111077)

The reason to change the workdir is that some of spec's verifiers (image_validator) put the input argument into the output. If it is invoked with an absolute path, the comparison with the reference output, which does not include the absolute path, will fail.

I found the WORKDIR parameter useful, and cmake's add_custom_target itself features a WORKING_DIRECTORY argument, so I would argue it makes sense for consistency; callers also do not have to think about e.g. whether calling cd is platform-neutral.

However, it is not a big thing so I will remove it in the update.

Thank you for the review!

  • I'd suggest removing some of the comments in the CMakeLists.txt files:
    • The links to the benchmark descriptions add little value to the cmake code itself (some of them are also wrong).

Some links include hints about how the benchmarks are to be compiled. For instance, perlbench requires -fno-strict-aliasing, which explains why the option was added. gcc requires either -fgnu89-inline or -z muldefs passed to the linker. Hence the links contain information on why the options are set as they are.

On the other hand, I don't see why the comments are problematic, even if they add only little value. That's at least some value.

They are not problematic. I mainly wanted to confirm that you indeed find them useful. Leave them in.

Please respond whether I should still remove them.

Of course I will correct the wrong links.

  • The COMP: and LINK: and test execution lines were probably a big help for you while bringing up the cmake files. But keeping them around now seems strange: for example, I don't care that your compiler was installed in /root/build/llvm/release/bin/, and the flags actually used deviate from the flags present in the COMP/LINK lines. So I'd expect them to add more confusion than to be helpful.

I agree, and will remove them in the next update. I wanted to remove them anyway, but at the moment, when I run the ref datasets, I hit lit's default timeout for xz_s and imagick_s on my computer. I will only keep them until I know that all the verifications pass.

External/SPEC/CMakeLists.txt
1–22 ↗(On Diff #111077)

Yes, the current situation is far from perfect; it only works one level up, not for individual benchmarks. I mainly wanted to point out why things are organized the way they are for SPEC.

It also prints the "Expected SPEC200x version 1.y" message multiple times, which annoys me a bit (I don't have access to the latest versions).

I never realized that, as I had the latest version; patches welcome. You could probably track in a variable like SPEC2000_VERSION_WARNING_DISPLAYED whether we already printed such a message.
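A minimal sketch of that guard; the property name comes from the suggestion above, and ${SPEC2000_VERSION} is a placeholder for however the detected version is stored:

```cmake
get_property(warned GLOBAL PROPERTY SPEC2000_VERSION_WARNING_DISPLAYED)
if(NOT warned)
  message(WARNING "Expected SPEC2000 version 1.y, found ${SPEC2000_VERSION}")
  # A global property (unlike a cached variable) resets on every cmake run,
  # so the warning still appears once per configure.
  set_property(GLOBAL PROPERTY SPEC2000_VERSION_WARNING_DISPLAYED TRUE)
endif()
```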

Could we make "lnt runtest test-suite --only-test" one day use TEST_SUITE_SUBDIRS such that it configures only that particular directory instead of the entire test-suite?

Several notes here:

  • You can use lnt --cmake-define TEST_SUITE_SUBDIRS=xxx today; we do that in some of our (internal) bots and it works fine.
  • SingleSource benchmarks cannot be selected just with directory names; I don't see how we can enforce or check that intermediate CMakeLists.txt do not change global state.
  • Also see https://reviews.llvm.org/D34132 for my take/opinion on lnt runtest test-suite.
External/SPEC/SpecCPU2017.cmake
119–122 ↗(On Diff #111077)

This is mostly a design "smell": in my experience there are often better solutions possible if functions/macros have overly generic/uninformative names. Note that this is just a hunch and not always true, so I'm mainly asking questions to have you at least think about it :)

That said, even if you cannot eliminate the ORIGIN_PATH from the early cpu2017_benchmark for now, you could still add a macro like cpu2017_include_origin() to be used instead of cpu2017_benchmark_done() in the ORIGIN_PATH cases.
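A sketch of that wrapper; it merely encapsulates the existing behavior, but gives callers a self-describing entry point (the body is elided and the signature is assumed, not from the patch):

```cmake
# Self-describing replacement for cpu2017_benchmark_done() in the ORIGIN_PATH
# cases; ORIGIN_PATH itself would still be set before cpu2017_benchmark(), as
# the early call needs it to inherit source and data paths.
macro(cpu2017_include_origin)
  # ... copy/link the origin benchmark's inputs into the run directory ...
  cpu2017_benchmark_done()
endmacro()
```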

150–156 ↗(On Diff #111077)

As for followup patches here:

Note that there is llvm_add_host_executable in tools/CMakeLists.txt, which I consider a hack that happens to work well enough for timeit.c/fpcmp.c but will fall short if people start doing more complicated stuff in their host tools that requires C flags, linker flags, etc.

Maybe we should move that into cmake/modules and use it here too, though moving it there also feels like legitimizing this hack.
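For reference, a purely illustrative sketch of what a shared host-tool helper in cmake/modules could look like; this is not the actual llvm_add_host_executable implementation, and HOST_CC and the helper name are assumptions:

```cmake
# cmake/modules/Host.cmake (hypothetical). Build with the host compiler so
# the tool stays runnable even when the benchmarks are cross-compiled.
set(HOST_CC cc CACHE STRING "C compiler used for host-side helper tools")

function(spec_add_host_executable name source)
  add_custom_command(OUTPUT ${name}
    COMMAND ${HOST_CC} ${CMAKE_CURRENT_SOURCE_DIR}/${source} -o ${name}
    DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${source}
    COMMENT "Building host tool ${name}")
  add_custom_target(host-${name} ALL DEPENDS ${name})
endfunction()
```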

cmake/modules/TestFile.cmake
37–40 ↗(On Diff #111077)

I don't feel strongly about it, but note that looking up whether a WORKING_DIRECTORY feature exists is probably just as complicated as determining whether cd xxx will work. You probably end up just scouting for examples/precedents in both cases. And since we have the cd feature working anyway, this feels like just introducing a 2nd syntax.

kristof.beyls added inline comments.Aug 16 2017, 12:15 AM
External/SPEC/SpecCPU2017.cmake
150–156 ↗(On Diff #111077)

I guess this benchmark-specific image_validator is very close in intended use to fpcmp.c.
Therefore, my gut feeling is that it'd probably be best if the fpcmp and image_validator binaries were built using the same mechanism, even if that means extending a current hack.
Wouldn't 1 hack be preferable to 2 different hacks, so that when the hack needs to be fixed, only one location needs fixing?

  • Remove runspec log comments
  • Split cpu2017_benchmark/cpu2017_benchmark_done into multiple macros
  • Prefix all macros with speccpu2017_
  • Move and use llvm_add_host_executable in(to) Host.cmake
  • Fix 644.nab_s output file
  • Use LF file endings
MatzeB accepted this revision.Aug 17 2017, 11:56 AM

I'm still waiting to get my hands on the benchmark to try this out myself.

But this is coming along nicely and looks good to me. Thanks!

External/SPEC/CFP2017rate/519.lbm_r/CMakeLists.txt
9 ↗(On Diff #111500)

Most parts of the test-suite use list(APPEND LDFLAGS -lm) for this (that's not better, but maybe more consistent)

This revision is now accepted and ready to land.Aug 17 2017, 11:56 AM

One more remark: the 'ref' dataset of 638.imagick_s took between 2 and 3 hours on all the computers I tried it on. Results submitted to SPEC are in a similar range. Unfortunately timeit.py has a 7200-second (2-hour) limit hardcoded (for --limit-cpu and --timeout). I had to remove that limit to get a successful run.

Can we make the limit configurable or set higher?

External/SPEC/CFP2017rate/519.lbm_r/CMakeLists.txt
9 ↗(On Diff #111500)

I assumed that CPPFLAGS, CFLAGS and LDFLAGS are there only because of the automatic upgrade from the Makefiles. The SPEC2006 benchmarks also mostly use include_directories/add_definitions instead of C(PP)FLAGS. I stayed consistent with that.

The handling of LDFLAGS in llvm_test_executable is, mmmh... ugly. I also need the flags applied to the image_validator and ldecod from x264. link_libraries was practical here because it also applies to these executables.
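To make the two styles concrete (a sketch; ${PROG} and ${Source} follow the test-suite's usual variable names):

```cmake
# Style used by most of the test-suite:
list(APPEND LDFLAGS -lm)
llvm_test_executable(${PROG} ${Source})

# Style used in this patch: link_libraries() applies to every target defined
# afterwards in the directory, so the helper executables (image_validator,
# ldecod) pick up the math library as well.
link_libraries(m)
```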

One more remark: the 'ref' dataset of 638.imagick_s took between 2 and 3 hours on all the computers I tried it on. Results submitted to SPEC are in a similar range. Unfortunately timeit.py has a 7200-second (2-hour) limit hardcoded (for --limit-cpu and --timeout). I had to remove that limit to get a successful run.

Can we make the limit configurable or set higher?

Let's make it configurable. There isn't any hard-coded value that will work well in all situations.

One more remark: the 'ref' dataset of 638.imagick_s took between 2 and 3 hours on all the computers I tried it on. Results submitted to SPEC are in a similar range. Unfortunately timeit.py has a 7200-second (2-hour) limit hardcoded (for --limit-cpu and --timeout). I had to remove that limit to get a successful run.

Can we make the limit configurable or set higher?

Let's make it configurable. There isn't any hard-coded value that will work well in all situations.

Yes. I think the easiest way to implement this would be to read an option from lit.cfg (config.xxx inside the litsupport code), so that benchmarks with special needs can place a lit.local.cfg in their directory to change this setting.
I think it would also be best not to make this setting a fixed value but a multiplier. So you would specify that 638.imagick_s gets 3x the amount of time of other benchmarks. The nice thing about specifying this in relative terms is that someone benchmarking on really slow (or fast) devices can change the default/base timeout without needing to adapt all the per-benchmark timeouts as well.

This revision was automatically updated to reflect the committed changes.