This is an archive of the discontinued LLVM Phabricator instance.

nbench: Eliminate timing and re-running related code
ClosedPublic

Authored by MatzeB on Jun 21 2016, 6:42 PM.

Details

Reviewers

bkramer
cmatthews

Commits

rOLDT273529: nbench: Eliminate timing and re-running related code
rL273529: nbench: Eliminate timing and re-running related code

Summary

nbench is by far the noisiest benchmark in the test-suite for me. This
seems to be caused by the benchmark running a variable number of
iterations internally, based on its own time measurements, even though the
original commit to the test-suite claims this was disabled.

This commit short-circuits all the re-running logic and removes
all time measurement code just to be sure.
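
To illustrate the failure mode described in the summary, here is a minimal sketch (not nbench's actual code) contrasting a loop that calibrates itself against a clock with a loop that runs a fixed count. The names `kernel`, `run_calibrated`, `run_fixed`, and the two simulated clocks are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one benchmark kernel; not taken from nbench. */
static unsigned long work_done;
void kernel(void) { ++work_done; }

/* Simulated clocks (assumptions for this demo): each call advances time by
   a fixed step, modelling a machine where one kernel call costs 1 tick and
   a machine where it costs 10 ticks. */
unsigned long fast_machine_clock(void) { static unsigned long t; t += 1;  return t; }
unsigned long slow_machine_clock(void) { static unsigned long t; t += 10; return t; }

/* Self-calibrating style: repeat until at least min_ticks have elapsed.
   The iteration count, and so the work done, depends on the clock. */
unsigned long run_calibrated(unsigned long (*now)(void), unsigned long min_ticks)
{
    unsigned long start, n = 0;
    assert(now != NULL);
    start = now();
    do {
        kernel();
        ++n;
    } while (now() - start < min_ticks);
    return n;
}

/* Fixed style: the iteration count is decided by the caller up front. */
unsigned long run_fixed(unsigned long iterations)
{
    unsigned long n;
    for (n = 0; n < iterations; ++n)
        kernel();
    return n;
}
```

With these simulated clocks, `run_calibrated(fast_machine_clock, 100)` performs 100 iterations and `run_calibrated(slow_machine_clock, 100)` performs 10, whereas `run_fixed(5)` always performs 5. On real hardware the clock also varies from run to run, which is one plausible source of the noise reported here.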

Diff Detail

Repository: rL LLVM

Event Timeline

MatzeB updated this revision to Diff 61486. Jun 21 2016, 6:42 PM

MatzeB retitled this revision to nbench: Eliminate timing and re-running related code.

MatzeB updated this object.

MatzeB added a reviewer: bkramer.

MatzeB set the repository for this revision to rL LLVM.

MatzeB added subscribers: llvm-commits, cmatthews, kristof.beyls.

Herald added a subscriber: mcrosier. Jun 21 2016, 6:42 PM

I also see nbench consistently being noisy.
The benchmarks in the test-suite should on every run do the same amount of work when invoked in the same way, so indeed, it's broken that nbench changes the number of iterations it runs the way it does at the moment. If such functionality would be needed, it would need to be moved to the test-suite driver level, not be implemented in individual programs.

That being said, while we're fiddling with the number of iterations this benchmark runs, could we reduce this by a factor of about 100, so it runs in between 0.1 and 1 seconds on most systems?
I've started doing analysis on what would be needed to make the test-suite in benchmark mode run an order of magnitude more quickly, so that with the same amount of experimentation time, we could get an order of magnitude more samples to do statistical analysis.
My initial experiments on x86_64 and AArch64 systems indicate that as long as a benchmark runs for at least 0.1 seconds, noise isn't larger than if it runs for much longer.
Making all programs in the test-suite run between 0.1 and 1 second would make the test-suite run probably between 1 and 2 orders of magnitude faster.
In these experiments, I see nbench taking between 0.5% and 3% of the total test-suite running time. So if it's little work to reduce the number of iterations in nbench as part of this work, we at least already have this test-suite speedup in the bank.

Thanks!

Kristof

MultiSource/Benchmarks/nbench/nbench0.c
Line 68 (on Diff #61486): It seems this iter variable is unused, as you declare another iter variable in for-loop scope below?

LGTM.

This revision is now accepted and ready to land. Jun 22 2016, 11:16 AM

In D21590#464282, @kristof.beyls wrote:

That being said, while we're fiddling with the number of iterations this benchmark runs, could we reduce this by a factor of about 100, so it runs in between 0.1 and 1 seconds on most systems?

I will introduce an N_ITERATIONS define and reduce it from 5 to 1, which brings the runtime down to 1s on my desktop machine.
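
The overridable define mentioned in this reply can be sketched as follows. `run_test` and `run_selected_test` are hypothetical names used only here; in the committed diff the loop body calls `(*funcpointer[i])()` instead.

```c
/* Default iteration count. A build can override it without editing the
   source, for example with the compiler option -DN_ITERATIONS=5
   (accepted by GCC and Clang). */
#ifndef N_ITERATIONS
#define N_ITERATIONS 1
#endif

/* Hypothetical stand-in for one nbench test function. */
static int calls;
void run_test(void) { ++calls; }

/* Runs the test a fixed number of times and returns the total number of
   calls made so far. */
int run_selected_test(void)
{
    int iter;
    for (iter = 0; iter < N_ITERATIONS; ++iter)
        run_test();
    return calls;
}
```

Because the count is a compile-time constant, every invocation of the resulting binary does the same amount of work, which is the property requested earlier in this review.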

Closed by commit rL273529: nbench: Eliminate timing and re-running related code (authored by matze). Jun 22 2016, 8:14 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
test-suite/trunk/MultiSource/Benchmarks/nbench/nbench0.c   12 lines
test-suite/trunk/MultiSource/Benchmarks/nbench/sysspec.h   1 line
test-suite/trunk/MultiSource/Benchmarks/nbench/sysspec.c   79 lines

Diff 61636

test-suite/trunk/MultiSource/Benchmarks/nbench/nbench0.c

[... 49 lines not shown ...]
 #include <ctype.h>
 #include <string.h>
 #include <time.h>
 #include <math.h>
 #include "nmglobal.h"
 #include "nbench0.h"
 #include "hardware.h"

+#ifndef N_ITERATIONS
+#define N_ITERATIONS 1
+#endif
+
 /*************
 ** main **
 *************/
 #ifdef MAC
 void main(void)
 #else
 int main(int argc, char *argv[])
 #endif
 {
 int i; /* Index */
+int iter;
 time_t time_and_date; /* Self-explanatory */
 struct tm *loctime;
 double bmean; /* Benchmark mean */
 double bstdev; /* Benchmark stdev */
 double lx_memindex; /* Linux memory index (mainly integer operations)*/
 double lx_intindex; /* Linux integer index */
 double lx_fpindex; /* Linux floating-point index */
 double intindex; /* Integer index */
[... 162 lines not shown ...]
 output_string("--------------------:------------------:-------------:------------\n");
 #endif

 for(i=0;i<NUMTESTS;i++)
 {
 if(tests_to_do[i])
 { sprintf(buffer,"%s :",ftestnames[i]);
 output_string(buffer);
+#if 0
 if (0!=bench_with_confidence(i,
 &bmean,
 &bstdev,
 &bnumrun)){
-#if 0
 output_string("\n** WARNING: The current test result is NOT 95 % statistically certain.\n");
 output_string("** WARNING: The variation among the individual results is too large.\n");
 output_string(" :");
+}
 #endif
+for (iter = 0; iter < N_ITERATIONS; ++iter) {
+(*funcpointer[i])();
 }
 #ifdef LINUX
 sprintf(buffer," %15.5g : %9.2f : %9.2f\n",
 bmean,bmean/bindex[i],bmean/lx_bindex[i]);
 #else
 sprintf(buffer," Iterations/sec.: %13.2f Index: %6.2f\n",
 /*bmean,bmean/bindex[i],*/ 0.0, 0.0);
 #endif
 output_string(buffer);
[... 914 lines not shown ...]

test-suite/trunk/MultiSource/Benchmarks/nbench/sysspec.h

[... 22 lines not shown ...]
 ** this code.
 */

 /*
 ** Standard includes
 */
 #include <stdlib.h>
 #include <stdio.h>
-#include <time.h>
 #include <string.h>

 #include "nmglobal.h"

 #if 0 /*!defined(__APPLE__)*/
 #include <malloc.h>
 #endif

[... 129 lines not shown ...]

test-suite/trunk/MultiSource/Benchmarks/nbench/sysspec.c

[... 775 lines not shown ...]

 /****************************
 ** StartStopwatch
 ** Starts a software stopwatch. Returns the first value of
 ** the stopwatch in ticks.
 */
 unsigned long StartStopwatch()
 {
-#ifdef MACTIMEMGR
-/*
-** For Mac code warrior, use timer. In this case, what we return is really
-** a dummy value.
-*/
-InsTime((QElemPtr)&myTMTask);
-PrimeTime((QElemPtr)&myTMTask,-MacHSTdelay);
-return((unsigned long)1);
-#else
-#ifdef WIN31TIMER
-/*
-** Win 3.x timer returns a DWORD, which we coax into a long.
-*/
-_Call16(lpfn,"p",&win31tinfo);
-return((unsigned long)win31tinfo.dwmsSinceStart);
-#else
-return((unsigned long)clock());
-#endif
-#endif
+/* no timing code in this version. */
+return 0;
 }

 /****************************
 ** StopStopwatch
 ** Stops the software stopwatch. Expects as an input argument
 ** the stopwatch start time.
 */
 unsigned long StopStopwatch(unsigned long startticks)
 {
-#ifdef MACTIMEMGR
-/*
-** For Mac code warrior...ignore startticks. Return val. in microseconds
-*/
-RmvTime((QElemPtr)&myTMTask);
-return((unsigned long)(MacHSTdelay+myTMTask.tmCount-MacHSTohead));
-#else
-#ifdef WIN31TIMER
-_Call16(lpfn,"p",&win31tinfo);
-return((unsigned long)win31tinfo.dwmsSinceStart-startticks);
-#else
-return((unsigned long)clock()-startticks);
-#endif
-#endif
+/* no timing code in this version. */
+return 1000;
 }

 /****************************
 ** TicksToSecs
 ** Converts ticks to seconds. Converts ticks to integer
 ** seconds, discarding any fractional amount.
 */
 unsigned long TicksToSecs(unsigned long tickamount)
 {
-#ifdef CLOCKWCT
-return((unsigned long)(tickamount/CLK_TCK));
-#endif
-
-#ifdef MACTIMEMGR
-/* +++ MAC time manager version (using timer in microseconds) +++ */
-return((unsigned long)(tickamount/1000000));
-#endif
-
-#ifdef CLOCKWCPS
-/* Everybody else */
-return((unsigned long)(tickamount/CLOCKS_PER_SEC));
-#endif
-
-#ifdef WIN31TIMER
-/* Each tick is 840 nanoseconds */
-return((unsigned long)(tickamount/1000L));
-#endif
-
+/* no timing code in this version. */
+return 0;
 }

 /****************************
 ** TicksToFracSecs
 ** Converts ticks to fractional seconds. In other words,
 ** this returns the exact conversion from ticks to
 ** seconds.
 */
 double TicksToFracSecs(unsigned long tickamount)
 {
-#ifdef CLOCKWCT
-return((double)tickamount/(double)CLK_TCK);
-#endif
-
-#ifdef MACTIMEMGR
-/* +++ MAC time manager version +++ */
-return((double)tickamount/(double)1000000);
-#endif
-
-#ifdef CLOCKWCPS
-/* Everybody else */
-return((double)tickamount/(double)CLOCKS_PER_SEC);
-#endif
-
-#ifdef WIN31TIMER
-/* Using 840 nanosecond ticks */
-return((double)tickamount/(double)1000);
-#endif
+/* no timing code in this version. */
+return 0;
 }
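
As a small check of what the stubbed stopwatch functions in this diff imply, the sketch below copies the four replacement bodies and measures an "interval" with them. `measure_elapsed_ticks` is a hypothetical helper added for illustration and is not a function in nbench; the explicit `void` parameter list and the `(void)` casts are likewise additions to keep the sketch warning-free.

```c
/* Stub bodies as introduced by this diff: no clock is consulted. */
unsigned long StartStopwatch(void) { return 0; }
unsigned long StopStopwatch(unsigned long startticks) { (void)startticks; return 1000; }
unsigned long TicksToSecs(unsigned long tickamount) { (void)tickamount; return 0; }
double TicksToFracSecs(unsigned long tickamount) { (void)tickamount; return 0; }

/* Hypothetical helper: times a callback the way a caller of the stopwatch
   API might. With the stubs above the result is always 1000 ticks,
   however long fn actually takes. */
unsigned long measure_elapsed_ticks(void (*fn)(void))
{
    unsigned long start = StartStopwatch();
    if (fn)
        fn();
    return StopStopwatch(start);
}
```

Since every measured interval is the same constant, any remaining code that consults the stopwatch can no longer make the amount of work depend on real elapsed time.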