
[AIX][test-suite] Update the telecomm-FFT benchmark to use saved sin/cos values for some inputs.
ClosedPublic

Authored by amyk on Aug 4 2021, 6:36 PM.

Details

Summary

This patch updates the telecomm-FFT benchmark to use saved sin/cos values for some inputs, to account
for differences in the AIX libm math library. Compared to Linux libm, the AIX versions of the
sin and cos functions produce slightly less accurate results (off by 1 ulp).

The AIX sin function produces different results when the input is (pi/4) or (pi/16), whereas the cos function
produces different results for the input (pi/4). The Linux libm sin/cos results for these inputs are saved
as three const-declared variables and substituted in when appropriate, to ensure the benchmark on AIX
produces the same results.

Diff Detail

Repository
rT test-suite

Event Timeline

amyk created this revision.Aug 4 2021, 6:36 PM
amyk requested review of this revision.Aug 4 2021, 6:36 PM

Hi Amy,

Thanks for the thorough analysis, it's most welcome.

We have had this problem in the past, but fixing it with an OS reference output wasn't the right fix. The main problem is that an OS can have multiple C library versions (and variants), and the precision across those versions can vary, too. You'd need a reference output for each OS-libc pair, and that could create a lot of different outputs.

I understand that for AIX changes in the libc are probably rare, but opening up reference outputs to generic target OSes will encourage other targets to use it in the wrong way. It could also encourage people to create all variations of endianness, size, OS and libc in the macro expansion, which would be even worse.

If everybody did the thorough analysis you did, this wouldn't be a problem per se. But it's more common that people would just assume the precision is different because of the libc and just add a new reference output. With no way to validate that without getting the OS+libc on the actual hardware, we'd have no choice but to accept the patch.

Another approach, taken in sphereflake, was to implement simpler math functions (e.g. LLVMsin) which are inaccurate but stable, and are simple enough that compilers have a high chance of making them very fast, so as not to move the performance traces.

This is the file: SingleSource/Benchmarks/Misc-C++/Large/sphereflake.cpp.
There's also a better version here: SingleSource/Benchmarks/Misc/fbench.c.

Would that be a reasonable fix for the FFT benchmark?

If so, then we may want to make them more public and move to some lib root directory, so any test or benchmark can use them.

My reasoning for suggesting it for a benchmark is that even if that changes the performance traces of the benchmarks, we really want to test the performance of the compiler, not the libc. It would just be a new baseline that we can assume "equivalent" to the old one, and we'd continue to measure relative performance as before.

Makes sense?

cheers,
--renato

amyk added a comment.Oct 20 2021, 6:25 AM

Hi Renato,

Firstly, I apologize for the delay in my response on this patch. Thank you so much for taking the time to review it! I very much appreciate your feedback.

Also, thanks for explaining the rationale behind why an OS reference output isn’t a suitable fix for these types of situations, I understand the concern behind that. I took a look at the LLVMsin/LLVMcos implementations that you’ve pointed out in:

SingleSource/Benchmarks/Misc-C++/Large/sphereflake.cpp
SingleSource/Benchmarks/Misc/fbench.c

I tried adding these implementations to the telecomm-FFT benchmark to see the results of the benchmark on AIX.
One thing I've noticed is that the results of these implementations differ more significantly from Linux libm. If I remember correctly from my testing, many of the values differ by more than 1 ulp.

This motivated me to look a bit further into telecomm-FFT and pinpoint exactly for which inputs sin and cos differ between AIX libm and Linux libm. For sin, the results differ for the inputs (pi/4) and (pi/16), whereas cos only differs for (pi/4).

I was wondering if it makes sense to implement a look-up value approach and substitute in the Linux libm results whenever the input is (pi/4) or (pi/16).
Furthermore, perhaps this change can be guarded to run on AIX specifically, if that's a good idea. I've updated the revision to illustrate what I mean. I'd appreciate it if you could let me know if you have any thoughts on this.

amyk updated this revision to Diff 380933.Oct 20 2021, 6:31 AM
amyk retitled this revision from [AIX][test-suite] Create a new reference output for telecomm-FFT on AIX to [AIX][test-suite] Update the telecomm-FFT benchmark to use saved sin/cos values for some inputs..
amyk edited the summary of this revision. (Show Details)

Update the telecomm-FFT benchmark to replace the values of sin/cos when the inputs are pi/4 or pi/16, respectively.

amyk added a comment.Tue, Nov 9, 6:29 AM

Hi @hubert.reinterpretcast @renenkel @rengolin,
Do any of you have further comments on this patch, regarding whether the change is acceptable or whether it should be guarded for AIX only?

Hi Amy,

Really sorry about the delay, I completely missed it.

I agree having a lookup table just for these values, if those are the only ones that come out wrong, is better than re-implementing the whole thing.

This means that for those values, which are not often hit, the implementation is instant, but the rest still depends on libm's performance characteristics, so there should be virtually no change at all.

The main issue is if other libms have a slightly different result for those values too, and then you break them by hard-coding AIX libm's answer.

So, if this also works on another architecture (like x86_64 or Arm on Linux or Mac), then it should be fine to merge.

lkail added a subscriber: lkail.Tue, Nov 9, 7:03 AM
amyk added a comment.Wed, Nov 17, 7:59 AM

No worries at all, Renato. My apologies for not seeing your reply earlier as well, and thanks for getting back to me!
I just wanted to clarify that the few values I'm hard-coding actually come from what Linux libm produces, not AIX libm (since AIX libm produces different results).

So theoretically, I believe there shouldn't be any changes either, since using these few hard-coded Linux libm values gives the same overall benchmark results as what is expected in MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.reference_output.
This change produces the same successful expected output on PPC Linux and PPC AIX.

amyk added a comment.Thu, Nov 25, 11:20 AM

Hi @rengolin :)

I thought about this again and just wanted to make sure that I did not misunderstand your previous comment.

So, if this also works for in another architecture (like x86_64 or Arm on Linux or Mac), then it should be fine to merge.

Do you mean that it would be preferable for other targets to test this patch as well, to ensure it is a reasonable fix? Would this be necessary, given that I'm hard-coding a few of the Linux libm-produced values (which produce what is already expected in MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.reference_output)?

rengolin accepted this revision.Thu, Nov 25, 12:53 PM

Good point. This is just for this benchmark, so the golden file is enough.

We'd only need further tests if this was a generic FP library for other tests as well.

LGTM, thanks!

This revision is now accepted and ready to land.Thu, Nov 25, 12:53 PM
amyk added a comment.Thu, Nov 25, 1:00 PM

Thank you for elaborating, Renato! I appreciate your review.