This is an archive of the discontinued LLVM Phabricator instance.


Differential D104935

Update the Polybench tests to check relative error
Accepted · Public

Authored by andrew.w.kaylor on Jun 25 2021, 11:17 AM.

Details

Reviewers

rengolin
sebpop
mibintc
zahiraam

Summary

This change updates the Polybench tests to check results using relative error rather than absolute error and adds an option to have different tolerances depending on whether fast-math and fp-contract are enabled.

(Note: In the first draft, I'm only updating one test in order to get feedback.)

Diff Detail

Event Timeline

andrew.w.kaylor created this revision. (Jun 25 2021, 11:17 AM)

Herald added a subscriber: mgorny. (Jun 25 2021, 11:17 AM)

andrew.w.kaylor requested review of this revision. (Jun 25 2021, 11:17 AM)

Harbormaster completed remote builds in B111043: Diff 354559. (Jun 25 2021, 11:18 AM)

andrew.w.kaylor set the repository for this revision to rT test-suite. (Jun 25 2021, 1:14 PM)

rengolin added inline comments. (Jun 25 2021, 2:04 PM)

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.c
119	Passing comment: this adds a (variably) slow fmul to the existing fsub and fabs, which can more than double the execution time in this piece of code. If we have enough runtime on the actual benchmark, it may not be a big problem, though.

Hi Andrew,

Thanks for the RFC, I think this is a good discussion to have.

I think the CMake side makes sense, because we may want different precision checks depending on how we run. Even in production systems we may accept imprecision (or increased precision) for different reasons (including performance).

The actual check change is less clear.

True, fabs(a-b) is problematic. It doesn't work well for very small or very large numbers, it ignores differences relative to the magnitude of the values, and we need to add an EPSILON for every test, sometimes more than one. But that is also true for normalising the values, as some tests need to be more precise than others anyway.

We're also still not handling corner cases that Boost, for example, does. I'm not proposing we use Boost, but perhaps we could copy some of the logic from there, and create a small library that the tests include and use to do the comparisons.

Then all the tests have to know is which values they should check and what the appropriate epsilon is for each one. We can even make the default epsilon proportional to one of the values, say eps = (a * 1e-5), then do fabs(a - b) < eps (or whatever covers more corner cases in the FP world), and let users override the value of eps only if necessary.

Makes sense?
--renato
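The proportional-epsilon idea above can be sketched in C roughly as follows. This is only an illustration and not code from this revision: the name fp_close_scaled and its scale parameter are hypothetical, and it departs from the formula in the comment by taking fabs(a), so that negative values work, and by using <= so that two exact zeros compare as equal.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper, not part of the test suite: compare a and b using a
   tolerance proportional to the magnitude of a. */
static int fp_close_scaled(double a, double b, double scale)
{
    assert(scale >= 0.0);          /* a negative scale would reject everything */
    double eps = fabs(a) * scale;  /* e.g. scale = 1e-5 gives eps = |a| * 1e-5 */
    return fabs(a - b) <= eps;     /* 1 when within tolerance, 0 otherwise */
}
```

With scale = 1e-5, fp_close_scaled(1000.0, 1000.001, 1e-5) accepts the pair, while fp_close_scaled(1000.0, 1001.0, 1e-5) rejects it. Note that when a is exactly zero the tolerance collapses to zero, which is one of the corner cases a more complete library would need to handle.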

I think having a more robust library for FP comparisons in the test suite would be great. In fact, I thought about moving even the simple comparison I did in this patch to some common location where it could be included by other tests. Maybe that's a good starting point for an incremental change in the direction you're suggesting. A robust library of such checks is beyond the scope of what I have time to do at the moment, but I would certainly support that direction.

In this case (atax) and I think probably for the rest of the polybench tests, the relative error check I added is probably good enough. My main concern is the case where one of the values is zero. If we've enabled FTZ for the test and the values are approaching the denormal range this could cause a very small value to compare as out-of-tolerance with a value that got flushed to zero. I suppose we can deal with that problem if/when it happens.

In the interest of full disclosure, the FP_TOLERANCE value I used for the atax change was determined experimentally from the actual deltas introduced by enabling FMA for the test. It's shorthand for (2.0*DBL_EPSILON), so I think it's a very reasonable tolerance, maybe even a bit conservative. I don't know if it would be enough for other architectures or library-related errors. If not, it could probably be relaxed without reducing the value of the test.

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.c
119	I'm not sure how this test actually measures performance. The code includes statements (polybench_start_instruments, polybench_stop_instruments) that seem to be intended to isolate the optimized kernel for performance measurements, but I am skeptical that this is the measurement that actually shows up in the LNT comparison. In any event, the test is running the kernel twice in addition to doing the comparison. I would hope that the kernel calculation overshadows the FP comparison. It's still less than ideal, but what we'd end up with for total run time is 'precise kernel time' + 'optimized kernel time' + 'FP comparison time'. Most relevant performance changes would probably affect both kernel calculations, so the benchmark isn't entirely useless. Any FP-specific performance changes might be masked somewhat by the double run. I think the best way around this is to have a bot testing for performance that doesn't check the FP values at all and another bot checking for correctness. Of course, I don't know who will be able to supply the machines for this.

I agree a generic comparison library is outside the scope of this patch.

Getting machines to run the test-suite like the bots, for less popular architectures, usually means taking one of the bots offline and running there. This is obviously tricky.

Just like with the previous test-suite patches, and our general policy, if you test on one architecture it should be fine to commit and monitor breakages in other arches.

Thanks!
--renato

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.c
119	Ah, yes, you're right. We used to just time those but that must have changed since last time I looked at them seriously.

This revision is now accepted and ready to land. (Jun 30 2021, 8:07 AM)

mibintc added a reviewer: zahiraam. (Jul 19 2021, 7:58 AM)

Revision Contents

Path	Size
CMakeLists.txt	7 lines
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/CMakeLists.txt	14 lines
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.c	62 lines

Diff 354559

CMakeLists.txt

set(LLVM_CODESIGNING_IDENTITY "" CACHE STRING		set(LLVM_CODESIGNING_IDENTITY "" CACHE STRING
"Sign executables and dylibs with the given identity or skip if empty (Darwin Only)")		"Sign executables and dylibs with the given identity or skip if empty (Darwin Only)")

add_definitions(-DNDEBUG)		add_definitions(-DNDEBUG)
option(TEST_SUITE_SUPPRESS_WARNINGS "Suppress all warnings" ON)		option(TEST_SUITE_SUPPRESS_WARNINGS "Suppress all warnings" ON)
if(${TEST_SUITE_SUPPRESS_WARNINGS})		if(${TEST_SUITE_SUPPRESS_WARNINGS})
add_definitions(-w)		add_definitions(-w)
endif()		endif()

		# Tests that use floating point math should have different tolerances when
		# fast-math and fp-contract are enabled.
		# TODO: Check to see if these settings are consistent with the compile options.
		option(ENABLE_FAST_MATH "Allow value changing FP optimizations" OFF)
		# In most cases clang uses fp-contract=on by default
		option(ENABLE_FP_CONTRACT "Allow fused multiply and add instructions" ON)

# We want reproducible builds, so using __DATE__ and __TIME__ is bad		# We want reproducible builds, so using __DATE__ and __TIME__ is bad
add_definitions(-Werror=date-time)		add_definitions(-Werror=date-time)

# Add path for custom modules		# Add path for custom modules
set(CMAKE_MODULE_PATH		set(CMAKE_MODULE_PATH
${CMAKE_MODULE_PATH}		${CMAKE_MODULE_PATH}
"${CMAKE_CURRENT_SOURCE_DIR}/cmake"		"${CMAKE_CURRENT_SOURCE_DIR}/cmake"
"${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules"		"${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules"

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/CMakeLists.txt

	set(POLYBENCH_UTILS SingleSource/Benchmarks/Polybench/utilities )			set(POLYBENCH_UTILS SingleSource/Benchmarks/Polybench/utilities )
	list(APPEND CFLAGS -I ${CMAKE_SOURCE_DIR}/${POLYBENCH_UTILS} -DPOLYBENCH_DUMP_ARRAYS)			list(APPEND CFLAGS -I ${CMAKE_SOURCE_DIR}/${POLYBENCH_UTILS} -DPOLYBENCH_DUMP_ARRAYS)
	set(HASH_PROGRAM_OUTPUT 1)			set(HASH_PROGRAM_OUTPUT 1)
	add_definitions(-DFP_ABSTOLERANCE=1e-5)			if(ENABLE_FAST_MATH)
	# Floating point contraction must be suppressed due to accuracy issues			add_definitions(-DFP_TOLERANCE=4.5e-16)
	list(APPEND CXXFLAGS -ffp-contract=off -DFMA_DISABLED=1)			add_definitions(-DENABLE_FP_TOLERANCE_CHECK=1)
	list(APPEND CFLAGS -ffp-contract=off -DFMA_DISABLED=1)			elseif(ENABLE_FP_CONTRACT)
				add_definitions(-DFP_TOLERANCE=4.5e-16)
				add_definitions(-DENABLE_FP_TOLERANCE_CHECK=1)
				else()
				add_definitions(-DFP_TOLERANCE=0.0)
				endif()
	llvm_singlesource()			llvm_singlesource()

SingleSource/Benchmarks/Polybench/linear-algebra/kernels/atax/atax.c

[... 71 lines not shown ...]		#pragma scop
tmp[i] = tmp[i] + A[i][j] * x[j];		tmp[i] = tmp[i] + A[i][j] * x[j];
for (j = 0; j < _PB_NY; j++)		for (j = 0; j < _PB_NY; j++)
y[j] = y[j] + A[i][j] * tmp[i];		y[j] = y[j] + A[i][j] * tmp[i];
}		}
#pragma endscop		#pragma endscop

}		}

#if !FMA_DISABLED		#if ENABLE_FP_TOLERANCE_CHECK
// NOTE: FMA_DISABLED is true for targets where FMA contraction causes
// discrepancies which cause the accuracy checks to fail.
// In this case, the test runs with the option -ffp-contract=off
static void		static void
kernel_atax_StrictFP(int nx, int ny,		kernel_atax_ValueSafe(int nx, int ny,
DATA_TYPE POLYBENCH_2D(A,NX,NY,nx,ny),		DATA_TYPE POLYBENCH_2D(A,NX,NY,nx,ny),
DATA_TYPE POLYBENCH_1D(x,NY,ny),		DATA_TYPE POLYBENCH_1D(x,NY,ny),
DATA_TYPE POLYBENCH_1D(y,NY,ny),		DATA_TYPE POLYBENCH_1D(y,NY,ny),
DATA_TYPE POLYBENCH_1D(tmp,NX,nx))		DATA_TYPE POLYBENCH_1D(tmp,NX,nx))
{		{
		#pragma float_control(precise, on)
#pragma STDC FP_CONTRACT OFF		#pragma STDC FP_CONTRACT OFF
int i, j;		int i, j;

for (i = 0; i < _PB_NY; i++)		for (i = 0; i < _PB_NY; i++)
y[i] = 0;		y[i] = 0;
for (i = 0; i < _PB_NX; i++)		for (i = 0; i < _PB_NX; i++)
{		{
tmp[i] = 0;		tmp[i] = 0;
for (j = 0; j < _PB_NY; j++)		for (j = 0; j < _PB_NY; j++)
tmp[i] = tmp[i] + A[i][j] * x[j];		tmp[i] = tmp[i] + A[i][j] * x[j];
for (j = 0; j < _PB_NY; j++)		for (j = 0; j < _PB_NY; j++)
y[j] = y[j] + A[i][j] * tmp[i];		y[j] = y[j] + A[i][j] * tmp[i];
}		}
}		}

/* Return 0 when one of the elements of arrays A and B do not match within the		/* Return 0 when one of the elements of arrays A and B do not match within the
allowed FP_ABSTOLERANCE. Return 1 when all elements match. */		allowed FP_TOLERANCE. Return 1 when all elements match. */
static int		static int
check_FP(int ny,		check_FP(int ny,
DATA_TYPE POLYBENCH_1D(A,NY,ny),		DATA_TYPE POLYBENCH_1D(A,NY,ny),
DATA_TYPE POLYBENCH_1D(B,NY,ny)) {		DATA_TYPE POLYBENCH_1D(B,NY,ny)) {
int i;		int i;
double AbsTolerance = FP_ABSTOLERANCE;		double RelTolerance = FP_TOLERANCE;
for (i = 0; i < _PB_NY; i++)		for (i = 0; i < _PB_NY; i++)
{		{
double V1 = A[i];		double V1 = A[i];
double V2 = B[i];		double V2 = B[i];
double Diff = fabs(V1 - V2);		double RelDiff;
if (Diff > AbsTolerance) {		// If either value is zero, should we add an epsilon?
fprintf(stderr, "A[%d] = %lf and B[%d] = %lf differ more than"		if (V2)
" FP_ABSTOLERANCE = %lf\n", i, V1, i, V2, AbsTolerance);		RelDiff = fabs(V1/V2 - 1.0);
		else if (V1)
		RelDiff = fabs(V2/V1 - 1.0);
		else
		RelDiff = 0; // Both zero.
		if (RelDiff > RelTolerance) {
		fprintf(stderr, "A[%d] = %lf and B[%d] = %lf differ by %le,"
		" more than FP_TOLERANCE = %le\n", i, V1, i, V2,
		RelDiff, RelTolerance);
return 0;		return 0;
}		}
}		}

return 1;		return 1;
}		}
#endif		#endif // ENABLE_FP_TOLERANCE_CHECK

int main(int argc, char** argv)		int main(int argc, char** argv)
{		{
/* Retrieve problem size. */		/* Retrieve problem size. */
int nx = NX;		int nx = NX;
int ny = NY;		int ny = NY;

/* Variable declaration/allocation. */		/* Variable declaration/allocation. */
POLYBENCH_2D_ARRAY_DECL(A, DATA_TYPE, NX, NY, nx, ny);		POLYBENCH_2D_ARRAY_DECL(A, DATA_TYPE, NX, NY, nx, ny);
POLYBENCH_1D_ARRAY_DECL(x, DATA_TYPE, NY, ny);		POLYBENCH_1D_ARRAY_DECL(x, DATA_TYPE, NY, ny);
POLYBENCH_1D_ARRAY_DECL(y, DATA_TYPE, NY, ny);		POLYBENCH_1D_ARRAY_DECL(y, DATA_TYPE, NY, ny);
#if !FMA_DISABLED		#if ENABLE_FP_TOLERANCE_CHECK
POLYBENCH_1D_ARRAY_DECL(y_StrictFP, DATA_TYPE, NY, ny);		POLYBENCH_1D_ARRAY_DECL(y_ValueSafe, DATA_TYPE, NY, ny);
#endif		#endif // ENABLE_FP_TOLERANCE_CHECK
POLYBENCH_1D_ARRAY_DECL(tmp, DATA_TYPE, NX, nx);		POLYBENCH_1D_ARRAY_DECL(tmp, DATA_TYPE, NX, nx);

/* Initialize array(s). */		/* Initialize array(s). */
init_array (nx, ny, POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(x));		init_array (nx, ny, POLYBENCH_ARRAY(A), POLYBENCH_ARRAY(x));

/* Start timer. */		/* Start timer. */
polybench_start_instruments;		polybench_start_instruments;

/* Run kernel. */		/* Run kernel. */
kernel_atax (nx, ny,		kernel_atax (nx, ny,
POLYBENCH_ARRAY(A),		POLYBENCH_ARRAY(A),
POLYBENCH_ARRAY(x),		POLYBENCH_ARRAY(x),
POLYBENCH_ARRAY(y),		POLYBENCH_ARRAY(y),
POLYBENCH_ARRAY(tmp));		POLYBENCH_ARRAY(tmp));

/* Stop and print timer. */		/* Stop and print timer. */
polybench_stop_instruments;		polybench_stop_instruments;
polybench_print_instruments;		polybench_print_instruments;

#if FMA_DISABLED		#if !ENABLE_FP_TOLERANCE_CHECK
/* Prevent dead-code elimination. All live-out data must be printed		/* Prevent dead-code elimination. All live-out data must be printed
by the function call in argument. */		by the function call in argument. */
polybench_prevent_dce(print_array(nx, POLYBENCH_ARRAY(y)));		polybench_prevent_dce(print_array(nx, POLYBENCH_ARRAY(y)));
#else		#else // ENABLE_FP_TOLERANCE_CHECK
kernel_atax_StrictFP (nx, ny,		kernel_atax_ValueSafe (nx, ny,
POLYBENCH_ARRAY(A),		POLYBENCH_ARRAY(A),
POLYBENCH_ARRAY(x),		POLYBENCH_ARRAY(x),
POLYBENCH_ARRAY(y_StrictFP),		POLYBENCH_ARRAY(y_ValueSafe),
POLYBENCH_ARRAY(tmp));		POLYBENCH_ARRAY(tmp));
if (!check_FP(ny, POLYBENCH_ARRAY(y), POLYBENCH_ARRAY(y_StrictFP)))		if (!check_FP(ny, POLYBENCH_ARRAY(y), POLYBENCH_ARRAY(y_ValueSafe)))
return 1;		return 1;

/* Prevent dead-code elimination. All live-out data must be printed		/* Prevent dead-code elimination. All live-out data must be printed
by the function call in argument. */		by the function call in argument. */
polybench_prevent_dce(print_array(nx, POLYBENCH_ARRAY(y_StrictFP)));		polybench_prevent_dce(print_array(nx, POLYBENCH_ARRAY(y_ValueSafe)));
#endif		#endif // ENABLE_FP_TOLERANCE_CHECK

/* Be clean. */		/* Be clean. */
POLYBENCH_FREE_ARRAY(A);		POLYBENCH_FREE_ARRAY(A);
POLYBENCH_FREE_ARRAY(x);		POLYBENCH_FREE_ARRAY(x);
POLYBENCH_FREE_ARRAY(y);		POLYBENCH_FREE_ARRAY(y);
#if !FMA_DISABLED		#if ENABLE_FP_TOLERANCE_CHECK
POLYBENCH_FREE_ARRAY(y_StrictFP);		POLYBENCH_FREE_ARRAY(y_ValueSafe);
#endif		#endif
POLYBENCH_FREE_ARRAY(tmp);		POLYBENCH_FREE_ARRAY(tmp);

return 0;		return 0;
}		}
