This is an archive of the discontinued LLVM Phabricator instance.

[test-suite] Add unit test for libmvec sincos() auto-vectorisation
Needs ReviewPublic

Authored by tim.schmielau on Mar 4 2022, 12:39 AM.

Details

Summary

Prepare for merging https://reviews.llvm.org/D116879 into LLVM,
which enables vectorization of sincos() via libmvec.

Compared to the other functions in libmvec, sincos() vectorisation
offers additional pitfalls as it returns two results via pointers.
Thus a test seems justified that checks the transformations performed
by llvm match the interface of the underlying vector library.

The test is run for any instruction set in {SSE, SSE2, AVX, AVX2, AVX512F}
supported by the CPU as well as with -march=native to improve coverage.
The supplied CMakeLists.txt only enables the check for -fveclib=libmvec on x86.
I leave enabling other vector libraries to people who can test those setups.
There is no check for successfull vectorization here, however a
regression test in the llvm repo already checks that.

Event Timeline

tim.schmielau created this revision.Mar 4 2022, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 12:39 AM
tim.schmielau requested review of this revision.Mar 4 2022, 12:39 AM

fix mixed up comments

my previous update seems to have eaten the original changes.
Let's see if I can get this right.

fhahn added a comment.Mar 4 2022, 1:42 AM

Thanks for putting up the test. I am not sure if it should be Libmvec or X86 specific though. There are other vector libraries on different platforms that provide vector versions of sincos. I don't think we should exclude those from this testing here.

SingleSource/UnitTests/Vectorizer/CMakeLists.txt
4 ↗(On Diff #412952)

Is this enough to ensure that the library is actually available on the system or could the compiler have the flag but the linker can't find libmvec?

I am not sure if it should be Libmvec or X86 specific though. There are other vector libraries on different platforms that provide vector versions of sincos. I don't think we should exclude those from this testing here.

I've been unsure about this as well. My preference would be to always run the test, but (somehow) issue a warning if the code does not get vectorized.

However a warning would alter output and thus make the test fail (unless the warning goes to stderr, but I'm not sure that is desirable either). I'm open to suggestions!

SingleSource/UnitTests/Vectorizer/CMakeLists.txt
4 ↗(On Diff #412952)

Indeed, thanks for pointing that out. With consideration of your other comment, shall I drop the entire SSE2 and libmvec detection logic?

tim.schmielau added a comment.EditedMar 4 2022, 2:21 AM

Even if we run the test unconditionally, some logic will still be required to select the right flags to enable auto-vectorization.

I am wondering if the aim should be to make auto-vectorization of math library calls a commodity that does not require special compiler flags but is automatically enabled with -O3. In that case, the detection logic shouldn't go into the test suite but the compiler itself.

fhahn added a comment.Mar 4 2022, 2:28 AM

Even if we run the test unconditionally, some logic will still be required to select the right flags to enable auto-vectorization.

I'm not sure what you mean here. Are you talking about automatically choosing a vector library (-fveclib)? I am not sure that will be feasible, as there are 2 potential complications:

  1. Unless the platform guarantees that the vector library will always be available the compiler has no way of knowing what libraries are available.
  2. Most vector libraries are not drop-in replacements for existing math functions, i.e. they will not produce the same result in all cases

Note that we could check in CMake if the library is available by trying to link a simple program that uses a function from libmvec.

tim.schmielau planned changes to this revision.Mar 4 2022, 2:38 AM

Are you talking about automatically choosing a vector library (-fveclib)?

Yes, where available I want this test to be run with a vector library. As you point out, this is not entirely trivial though. I'll follow your suggestion and make the directory generic, with a specific test for libmvec presence to enable the test with required flags. Others can then add suitable setups for other vector libraries where they are able to check the setup is working.

Add check that libmvec is available for linking and at runtime
and rename Libmvec/ directory to Veclib/ to indicate the test is
not specific to libmvec and might be enabled for other vector
libraries as well.

RKSimon added inline comments.Mar 4 2022, 9:58 AM
cmake/modules/DetectLibmvecX86.c
6 ↗(On Diff #413045)

do we need to test for 32 and 64 bit x86? with 32-bit pointers will the args take __m128i ?

  • check for AVX2, not SSE2

What a blunder...

tim.schmielau added inline comments.Mar 4 2022, 10:12 AM
cmake/modules/DetectLibmvecX86.c
6 ↗(On Diff #413045)

I don't know. https://sourceware.org/glibc/wiki/libmvec is silent about anything else bit x86-64 so I'll write a 32 bit test to find out...

tim.schmielau edited the summary of this revision. (Show Details)Mar 4 2022, 10:41 AM
  • handle 32 bit pointer size
tim.schmielau added inline comments.Mar 4 2022, 11:48 PM
cmake/modules/DetectLibmvecX86.c
6 ↗(On Diff #413045)

You are of course right. I've updated the code to handle 32 bit wordsize.
Now anyone tell me about a case where pointer size isn't the same as word size?

  • run libmvec tests for each supported ISA
  • don't depend on immintrin.h

It turns out my mixup of AVX2 and SSE2 had its value, although in an unexpected way:
I've modified the test so that it runs for all instruction sets in {SSE, SSE2, AVX, AVX2, AVX512F} as well as with -march=native to improve coverage.
And indeed SSE and SSE2 tests are currently failing when run with the changes proposed in https://reviews.llvm.org/D116879.
The failures look like they could be related to outer loop vectorisation, but will require further investigation.

  • run tests in both single and double precision

Interestingly, the SSE and SSE2 tests succeed in single precision.

I haven't tried to debug the failures yet but that is what I need to do next.
I am happy with the coverage this test now provides, but maybe I'll still learn more from debugging the failures.

tim.schmielau edited the summary of this revision. (Show Details)Mar 6 2022, 5:11 AM
  • compute reference result in long double

to ensure it cannot get vectorized as well.

  • increase nest count
  • add nounroll sincos() tests
  • make sincos test fail reliably even in single precision
  • protect against wrongly computed reference results by printing a few values

So it turns out I was quite mistaken when I started the LLVM sincos() autovectorization
patch assuming I am only exposing functionality already available via OpenMP.
Careful reading of the OpenMP documentation for the current development branch at
https://clang.llvm.org/docs/OpenMPSupport.html, or a look at the LLVM sources reveals
that while #pragma omp declare simd is implemented, #pragma omp simd is not.
I.e., the OpenMP equivalent for declaring vectorized functions via -fveclib is present,
but the necessary analysis for deciding whether a function call can safely be vectorized
is not.
Instead, the decision relies on the simple heuristc that any currently exposed function is safe.
This assumption is of course violated if sincos() is added to the list of vectorized functions.

Given this status, all nested tests in this commit should have failed. What made some of them pass is
that LLVM chooses to aggressively interleave the inner loop of the nested test with interleave
factors up to 4. With vector lengths up to 8 for single precision AVX2 this means that at least
32 inner loop iterations are required before the incorrect dependency analysis reliably leads to
incorrect concurrent evaluation of dependent expressions and reliable test failure.

An alternative way of making the test fail is to avoid interleaving the loop by adding #pragma unroll 1.
I've done both for better coverage - increased the nested loop iteration beyond 32, and added a separate
test with unrolling disabled.

The updated test now always fails with my current incorrect patch at https://reviews.llvm.org/D120977,
and succeeds with the fixed version that I'll post there shortly.

  • at newlines at end of file where missing

Add tests using #pragma clang loop vectorize(...).