This is an archive of the discontinued LLVM Phabricator instance.

test-suite: Remove lemon benchmark
ClosedPublic

Authored by MatzeB on Jun 21 2016, 7:27 PM.

Download Raw Diff

Details

Reviewers

kristof.beyls
rengolin
cmatthews

Commits

rOLDT273530: Remove lemon from BENCHMARKING_ONLY set.
rL273530: Remove lemon from BENCHMARKING_ONLY set.

Summary

This benchmark is unsuitable as a compiler benchmark:

It forks itself hundreds of times per second (in order to reset to a clean state after an iteration). This makes it more a test for the operating system than of code produced by the compiler. This also leads to performance noise depending on the OS.
Profiling the code (of a single run without fork) shows 60-70% of the time is spend in fputs and fputc which is just testing your libc and OS again.

This test is often showing huge performance jumps for me and is a waste of
time when you have to check whether compiler changes have led to any (real)
regressions.

I hereby propose it for deletion from the test-suite.

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB updated this revision to Diff 61490.Jun 21 2016, 7:27 PM

MatzeB retitled this revision from to test-suite: Remove lemon benchmark.

MatzeB updated this object.

MatzeB added reviewers: rengolin, cmatthews, kristof.beyls.

MatzeB set the repository for this revision to rL LLVM.

MatzeB added a subscriber: llvm-commits.

Herald added a subscriber: mcrosier. · View Herald TranscriptJun 21 2016, 7:27 PM

Thanks for the analysis!

Interestingly, on the systems I look at most often (linux AArch64 and AArch32), I don't see lemon produce too much noise.
Anyway, with your analysis, it's clear that lemon doesn't really test quality of compiler code generation, so I agree not to run it as a benchmark.

However, it may still make sense to keep this in the test-suite as a correctness test, in which case it should be just listed under "if(NOT TEST_SUITE_BENCHMARKING_ONLY)"?

Thanks,

Kristof

In D21593#464278, @kristof.beyls wrote:

Thanks for the analysis!

Interestingly, on the systems I look at most often (linux AArch64 and AArch32), I don't see lemon produce too much noise.
Anyway, with your analysis, it's clear that lemon doesn't really test quality of compiler code generation, so I agree not to run it as a benchmark.

Forking is a more heavy weight operation on Darwin.

However, it may still make sense to keep this in the test-suite as a correctness test, in which case it should be just listed under "if(NOT TEST_SUITE_BENCHMARKING_ONLY)"?

Since lemon is a parser, maybe it is okay to totally remove since we have some other benchmarks that represent that class of programs?

The runtime is sub-second everywhere where I looked, so I think it is fine to keep it in if we think there is any chance that it would detect a miscompile.

Thanks,

Kristof

Also, just for reference: on our test machines I checked today, the noise on lemon is ~2% on arm and 3.5% on x86_64. Not near the worst offender in terms of noise.

However, it may still make sense to keep this in the test-suite as a correctness test, in which case it should be just listed under "if(NOT TEST_SUITE_BENCHMARKING_ONLY)"?

Since lemon is a parser, maybe it is okay to totally remove since we have some other benchmarks that represent that class of programs?

The runtime is sub-second everywhere where I looked, so I think it is fine to keep it in if we think there is any chance that it would detect a miscompile.

Right. I don't have a strong opinion on whether to keep this in non-benchmarking mode only, or to remove the test altogether.
Both options are fine for me.

LGTM.

lemon is mostly stable for me as well, but from time to time it has huge outliers (just speculating but maybe iOS throttles too much forking in some circumstances and that benchmark is at the edge of such throttling...)

Matthias

Maybe deletion from the benchmark set instead of at all?

Since this is a compiler test, we should remove any OS related noise (forks, etc) and reduce the printf need by accumulating the results and only printing at the end.

I've done a lot of those small things in the past and they work well with the test-suite.

cheers,
--renato

Given the mixed feedback I will only remove lemon from the BENCHMARKING_ONLY set.

Closed by commit rL273530: Remove lemon from BENCHMARKING_ONLY set. (authored by matze). · Explain WhyJun 22 2016, 8:14 PM

This revision was automatically updated to reflect the committed changes.

Thanks Matthias,

I've created https://llvm.org/bugs/show_bug.cgi?id=28279 to make sure we don't forget. I really don't think we should have that kind of test in the test-suite anyway.

cheers,
--renato

Revision Contents

Path

Size

test-suite/

trunk/

MultiSource/

Applications/

Makefile

1 line

lemon/

CMakeLists.txt

16 lines

Diff 61637

test-suite/trunk/MultiSource/Applications/Makefile

	Show All 39 Lines
	# XCore does not support fork().			# XCore does not support fork().
	# lemon removed pending general method for running tests multiple times.			# lemon removed pending general method for running tests multiple times.
	PARALLEL_DIRS := $(filter-out lemon, $(PARALLEL_DIRS))			PARALLEL_DIRS := $(filter-out lemon, $(PARALLEL_DIRS))
	endif			endif

	ifdef BENCHMARKING_ONLY			ifdef BENCHMARKING_ONLY
	PARALLEL_DIRS := $(filter-out Burg, $(PARALLEL_DIRS))			PARALLEL_DIRS := $(filter-out Burg, $(PARALLEL_DIRS))
	PARALLEL_DIRS := $(filter-out treecc, $(PARALLEL_DIRS))			PARALLEL_DIRS := $(filter-out treecc, $(PARALLEL_DIRS))
				PARALLEL_DIRS := $(filter-out lemon, $(PARALLEL_DIRS))
	endif			endif

	include $(LEVEL)/Makefile.programs			include $(LEVEL)/Makefile.programs

test-suite/trunk/MultiSource/Applications/lemon/CMakeLists.txt

				if(NOT TEST_SUITE_BENCHMARKING_ONLY)
	set(Source lemon.c)			set(Source lemon.c)
	set(PROG lemon)			set(PROG lemon)
	set(RUN_OPTIONS ${CMAKE_CURRENT_SOURCE_DIR}/parse.y ${CMAKE_CURRENT_SOURCE_DIR}/example1.y ${CMAKE_CURRENT_SOURCE_DIR}/example2.y ${CMAKE_CURRENT_SOURCE_DIR}/example3.y ${CMAKE_CURRENT_SOURCE_DIR}/example4.y ${CMAKE_CURRENT_SOURCE_DIR}/example5.y ${CMAKE_CURRENT_SOURCE_DIR}/lighttpd_configparser.y ${CMAKE_CURRENT_SOURCE_DIR}/lighttpd_mod_ssi_exprparser.y ${CMAKE_CURRENT_SOURCE_DIR}/wireshark_dtd_grammar.lemon ${CMAKE_CURRENT_SOURCE_DIR}/wireshark_grammar.lemon ${CMAKE_CURRENT_SOURCE_DIR}/wireshark_mate_grammar.lemon ${CMAKE_CURRENT_SOURCE_DIR}/xapian_queryparser.lemony ${CMAKE_CURRENT_SOURCE_DIR}/ecmascript.y)			set(RUN_OPTIONS ${CMAKE_CURRENT_SOURCE_DIR}/parse.y ${CMAKE_CURRENT_SOURCE_DIR}/example1.y ${CMAKE_CURRENT_SOURCE_DIR}/example2.y ${CMAKE_CURRENT_SOURCE_DIR}/example3.y ${CMAKE_CURRENT_SOURCE_DIR}/example4.y ${CMAKE_CURRENT_SOURCE_DIR}/example5.y ${CMAKE_CURRENT_SOURCE_DIR}/lighttpd_configparser.y ${CMAKE_CURRENT_SOURCE_DIR}/lighttpd_mod_ssi_exprparser.y ${CMAKE_CURRENT_SOURCE_DIR}/wireshark_dtd_grammar.lemon ${CMAKE_CURRENT_SOURCE_DIR}/wireshark_grammar.lemon ${CMAKE_CURRENT_SOURCE_DIR}/wireshark_mate_grammar.lemon ${CMAKE_CURRENT_SOURCE_DIR}/xapian_queryparser.lemony ${CMAKE_CURRENT_SOURCE_DIR}/ecmascript.y)
	set(HASH_PROGRAM_OUTPUT 1)			set(HASH_PROGRAM_OUTPUT 1)
	set(RUNTIMELIMIT 1500)			set(RUNTIMELIMIT 1500)
	set(DIFFPROG ${PROGDIR}/DiffOutput.sh "diff ")			set(DIFFPROG ${PROGDIR}/DiffOutput.sh "diff ")
	llvm_multisource()			llvm_multisource()
				endif()