
[AIX][test-suite] Update the telecomm-FFT benchmark to use saved sin/cos values for some inputs.
ClosedPublic

Authored by amyk on Aug 4 2021, 6:36 PM.

Details

Summary

This patch updates the telecomm-FFT benchmark to use saved sin/cos values for some inputs, to account
for differences in the AIX libm math library. Compared to Linux libm, the AIX versions of the
sin and cos functions produce slightly less accurate results (off by 1 ulp).

The AIX sin function produces different results when the input is (pi/4) or (pi/16), whereas the cos function
produces different results for the input (pi/4). The Linux libm sin/cos results for these inputs are saved
as three const-declared variables and substituted in when appropriate, to ensure the benchmark on AIX
produces the same results.

Diff Detail

Repository
rT test-suite

Event Timeline

amyk created this revision.Aug 4 2021, 6:36 PM
amyk requested review of this revision.Aug 4 2021, 6:36 PM

Hi Amy,

Thanks for the thorough analysis, it's most welcome.

We have had this problem in the past, but fixing it with an OS reference output wasn't the right fix. The main problem is that an OS can have multiple C library versions (and variants), and the precision across those versions can vary, too. You'd need a reference output for each OS-libc pair, and that could create a lot of different outputs.

I understand that for AIX changes in the libc are probably rare, but opening up reference outputs to generic target OSes will encourage other targets to use it in the wrong way. It could also encourage people to create all variations of endianness, size, OS and libc in the macro expansion, which would be even worse.

If everybody did the thorough analysis you did, this wouldn't be a problem per se. But it's more common that people would just assume the precision is different because of the libc and just add a new reference output. With no way to validate that without getting the OS+libc on the actual hardware, we'd have no choice but to accept the patch.

Another approach, taken in sphereflake, was to implement simpler math functions (e.g. LLVMsin) which are inaccurate but stable, and are simple enough that compilers have a high chance of making them very fast, so as not to move the performance traces.

This is the file: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.cpp.
There's also a better version here: SingleSource/Benchmarks/Misc/fbench.c.

Would that be a reasonable fix for the FFT benchmark?

If so, then we may want to make them more public and move to some lib root directory, so any test or benchmark can use them.

My reasoning for suggesting it for a benchmark is that even if that changes the performance traces of the benchmarks, we really want to test the performance of the compiler, not the libc. It would just be a new baseline that we can assume "equivalent" to the old one, and we'd continue to measure relative performance as before.

Makes sense?

cheers,
--renato

amyk added a comment.Oct 20 2021, 6:25 AM

Hi Renato,

Firstly, I apologize for the delay in my response on this patch. Thank you so much for taking the time to review it! I very much appreciate your feedback.

Also, thanks for explaining the rationale behind why an OS reference output isn’t a suitable fix for these types of situations, I understand the concern behind that. I took a look at the LLVMsin/LLVMcos implementations that you’ve pointed out in:

SingleSource/Benchmarks/Misc-C++/Large/sphereflake.cpp
SingleSource/Benchmarks/Misc/fbench.c

I tried adding these implementations to the telecomm-FFT benchmark to see the results of the benchmark on AIX.
One thing I've noticed is that the results of these implementations differ more significantly from Linux libm. If I remember correctly from my testing, many of the values differ by more than 1 ulp.

This motivated me to look a bit further into telecomm-FFT and pinpoint exactly for which inputs sin and cos differ between AIX libm and Linux libm. For sin, the results differ for the inputs (pi/4) and (pi/16), whereas cos only differs for (pi/4).

I was wondering if it makes sense to implement a look-up value approach and substitute in the Linux libm results whenever the input is (pi/4) or (pi/16).
Furthermore, perhaps this change can be guarded to run on AIX specifically, if that's a good idea. I've updated the revision to illustrate what I mean. I'd appreciate it if you could let me know if you have any thoughts on this.

amyk updated this revision to Diff 380933.Oct 20 2021, 6:31 AM
amyk retitled this revision from [AIX][test-suite] Create a new reference output for telecomm-FFT on AIX to [AIX][test-suite] Update the telecomm-FFT benchmark to use saved sin/cos values for some inputs..
amyk edited the summary of this revision. (Show Details)

Update the telecomm-FFT benchmark to replace the values of sin/cos when the inputs are pi/4 or pi/16, respectively.

amyk added a comment.Tue, Nov 9, 6:29 AM

Hi @hubert.reinterpretcast @renenkel @rengolin,
Do any of you have further comments on this patch, regarding whether the change is acceptable or whether it should be guarded for AIX only?

Hi Amy,

Really sorry about the delay, I completely missed it.

I agree having a lookup table just for these values, if those are the only ones that come out wrong, is better than re-implementing the whole thing.

This means that for those values, which are not often hit, the implementation is instant, but the rest still depends on libm's performance characteristics, so there should be virtually no change at all.

The main issue is if other libms have a slightly different result for those values too, and then you break them by hard-coding AIX libm's answer.

So, if this also works on another architecture (like x86_64 or Arm on Linux or Mac), then it should be fine to merge.

lkail added a subscriber: lkail.Tue, Nov 9, 7:03 AM
amyk added a comment.Wed, Nov 17, 7:59 AM

No worries at all, Renato. My apologies for not seeing your reply earlier as well, and thanks for getting back to me!
I just wanted to clarify that the few values I'm hard-coding actually come from what Linux libm produces, not AIX libm (since AIX libm produces different results).

So theoretically, I believe there shouldn't be any changes either, since using these few hard-coded Linux libm values gives the same overall benchmark results as what is expected in MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.reference_output.
This change produces the same successful expected output on PPC Linux and PPC AIX.

amyk added a comment.Thu, Nov 25, 11:20 AM

Hi @rengolin :)

I thought about this again and just wanted to make sure that I did not misunderstand your previous comment.

So, if this also works for in another architecture (like x86_64 or Arm on Linux or Mac), then it should be fine to merge.

Do you mean that it would be preferable for other targets to test this patch as well, to ensure it is a reasonable fix? Would this be necessary, given that I'm hard-coding a few of the Linux libm-produced values (which produce what is already expected in MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.reference_output)?

rengolin accepted this revision.Thu, Nov 25, 12:53 PM

Good point. This is just for this benchmark, so the golden file is enough.

We'd only need further tests if this was a generic FP library for other tests as well.

LGTM, thanks!

This revision is now accepted and ready to land.Thu, Nov 25, 12:53 PM
amyk added a comment.Thu, Nov 25, 1:00 PM

Thank you for elaborating, Renato! I appreciate your review.