This is an archive of the discontinued LLVM Phabricator instance.

test-suite: Remove lemon benchmark
ClosedPublic

Authored by MatzeB on Jun 21 2016, 7:27 PM.

Details

Summary

This benchmark is unsuitable as a compiler benchmark:

  • It forks itself hundreds of times per second (in order to reset to a clean state after an iteration). This makes it more a test for the operating system than of code produced by the compiler. This also leads to performance noise depending on the OS.
  • Profiling the code (of a single run without fork) shows 60-70% of the time is spend in fputs and fputc which is just testing your libc and OS again.

This test is often showing huge performance jumps for me and is a waste of
time when you have to check whether compiler changes have led to any (real)
regressions.

I hereby propose it for deletion from the test-suite.

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB updated this revision to Diff 61490.Jun 21 2016, 7:27 PM
MatzeB retitled this revision from to test-suite: Remove lemon benchmark.
MatzeB updated this object.
MatzeB set the repository for this revision to rL LLVM.
MatzeB added a subscriber: llvm-commits.

Thanks for the analysis!

Interestingly, on the systems I look at most often (linux AArch64 and AArch32), I don't see lemon produce too much noise.
Anyway, with your analysis, it's clear that lemon doesn't really test quality of compiler code generation, so I agree not to run it as a benchmark.

However, it may still make sense to keep this in the test-suite as a correctness test, in which case it should be just listed under "if(NOT TEST_SUITE_BENCHMARKING_ONLY)"?

Thanks,

Kristof

cmatthews edited edge metadata.Jun 22 2016, 11:12 AM

Thanks for the analysis!

Interestingly, on the systems I look at most often (linux AArch64 and AArch32), I don't see lemon produce too much noise.
Anyway, with your analysis, it's clear that lemon doesn't really test quality of compiler code generation, so I agree not to run it as a benchmark.

Forking is a more heavy weight operation on Darwin.

However, it may still make sense to keep this in the test-suite as a correctness test, in which case it should be just listed under "if(NOT TEST_SUITE_BENCHMARKING_ONLY)"?

Since lemon is a parser, maybe it is okay to totally remove since we have some other benchmarks that represent that class of programs?

The runtime is sub-second everywhere where I looked, so I think it is fine to keep it in if we think there is any chance that it would detect a miscompile.

Thanks,

Kristof

Also, just for reference: on our test machines I checked today, the noise on lemon is ~2% on arm and 3.5% on x86_64. Not near the worst offender in terms of noise.

However, it may still make sense to keep this in the test-suite as a correctness test, in which case it should be just listed under "if(NOT TEST_SUITE_BENCHMARKING_ONLY)"?

Since lemon is a parser, maybe it is okay to totally remove since we have some other benchmarks that represent that class of programs?

The runtime is sub-second everywhere where I looked, so I think it is fine to keep it in if we think there is any chance that it would detect a miscompile.

Right. I don't have a strong opinion on whether to keep this in non-benchmarking mode only, or to remove the test altogether.
Both options are fine for me.

LGTM.

lemon is mostly stable for me as well, but from time to time it has huge outliers (just speculating but maybe iOS throttles too much forking in some circumstances and that benchmark is at the edge of such throttling...)

  • Matthias
rengolin edited edge metadata.Jun 22 2016, 11:24 AM

Maybe deletion from the benchmark set instead of at all?

Since this is a compiler test, we should remove any OS related noise (forks, etc) and reduce the printf need by accumulating the results and only printing at the end.

I've done a lot of those small things in the past and they work well with the test-suite.

cheers,
--renato

Given the mixed feedback I will only remove lemon from the BENCHMARKING_ONLY set.

This revision was automatically updated to reflect the committed changes.

Thanks Matthias,

I've created https://llvm.org/bugs/show_bug.cgi?id=28279 to make sure we don't forget. I really don't think we should have that kind of test in the test-suite anyway.

cheers,
--renato