Import/adapt the SLEEF vector math-function library as an LLVM runtime
Needs Review, Public

Authored by hfinkel on Sep 26 2016, 6:32 PM.
This revision needs review, but there are no reviewers specified.

Details

Reviewers
None
Summary

This represents the start of my work to import and adapt the SLEEF vector math-function library, authored by Naoki Shibata, to LLVM. See https://github.com/shibatch/sleef for the original source. For the RFC, see: http://lists.llvm.org/pipermail/llvm-dev/2016-July/102254.html

I've not changed any of the meat of the implementation in order to make this patch, but I've tried to make it more like a runtime library (and I've made the source files C++ files instead of C files). All of the external functions start with __. The largest issue is how to deal with vector ISA/ABI compatibility. I've tried to properly separate the concerns of:

  1. For what processor is the runtime library itself being compiled
  2. For what vector ABIs are vectorized functions being made available

Aside from the scalar versions, which are pure C/C++ and always compiled, vector versions are compiled when possible. For example, we have xsin and xsinf, the scalar versions; xsinsse2 and xsinfsse2 (which use the __m128d and __m128 types); xsinavx and xsinfavx (which use __m256d and __m256); and xsinavx2 and xsinfavx2 (which also use __m256d and __m256, although some functions use a different integer type compared to the avx versions). As many of these variants as possible are compiled into the library simultaneously.

The library is implemented using intrinsics, not assembly, and so the associated target features must be enabled in the compiler to compile the relevant versions of these functions. By default, compilers on x86 often enable support only for SSE2 (i.e. not AVX or later ISAs). When the compiler supports flags to turn on AVX, AVX2, etc., the build adds them, but only for the files which require them. This is important because if you're building for an older core (or just trying to be portable), you don't want the compiler to start generating AVX instructions inside your SSE2 functions.
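
To illustrate why the target features matter per file, here is a minimal sketch (not taken from the patch) of a translation unit that only uses SSE2 intrinsics when the compiler has SSE2 enabled for that file. The names add_arrays and compiled_variant are hypothetical; the __SSE2__ macro is the usual GCC/Clang feature macro, and other compilers may simply take the scalar path.

```cpp
#include <cstddef>

#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_add_pd, ...
#endif

// Hypothetical helper, not part of the patch: element-wise sum of two arrays.
// With SSE2 enabled for this file, the main loop uses 128-bit intrinsics;
// otherwise only the scalar loop below is compiled.
void add_arrays(const double *a, const double *b, double *out, std::size_t n) {
  std::size_t i = 0;
#if defined(__SSE2__)
  for (; i + 2 <= n; i += 2) {
    __m128d va = _mm_loadu_pd(a + i);  // unaligned load of two doubles
    __m128d vb = _mm_loadu_pd(b + i);
    _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
  }
#endif
  for (; i < n; ++i)  // scalar tail, or the whole loop without SSE2
    out[i] = a[i] + b[i];
}

// Reports which code path this translation unit was compiled for.
const char *compiled_variant() {
#if defined(__SSE2__)
  return "sse2";
#else
  return "scalar";
#endif
}
```

An AVX variant would live in a separate file built with the AVX flag, so that the compiler never emits AVX instructions into the SSE2 file.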

For ARM, NEON is supported (although only single-precision currently).

I've not yet dealt with testing; the source on github has testing programs. They make use of mpfr (a dependency I doubt we want), and, in part, perform randomized testing (and, at least for regression tests, we probably don't want that either). We need to figure out what we want to do here.
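
One possible direction, sketched here purely as an assumption and not as what the patch does: for single-precision functions, a double-precision libm result can stand in for mpfr as the reference, and a fixed input grid can replace randomized inputs. The helper names ulp_error and max_ulp_error are hypothetical. This approach would not be accurate enough to check double-precision functions.

```cpp
#include <cmath>

// Error of `got` relative to the reference `want`, measured in units of the
// last place of a float. Assumes `want` is nonzero and in the normal float range.
double ulp_error(float got, double want) {
  int exp = std::ilogb(want);              // floor(log2(|want|))
  double ulp = std::ldexp(1.0, exp - 23);  // float has 23 explicit mantissa bits
  return std::fabs(static_cast<double>(got) - want) / ulp;
}

// Largest ULP error of the float function `f` against the double reference
// `ref`, evaluated on a fixed (deterministic) grid of steps + 1 points in [lo, hi].
template <typename F, typename Ref>
double max_ulp_error(F f, Ref ref, double lo, double hi, int steps) {
  double worst = 0.0;
  for (int i = 0; i <= steps; ++i) {
    double x = lo + (hi - lo) * i / steps;
    float xf = static_cast<float>(x);  // the exact input both functions see
    double err = ulp_error(f(xf), ref(static_cast<double>(xf)));
    if (err > worst) worst = err;
  }
  return worst;
}
```

A regression test could then assert that the maximum error over the grid stays below the bound the function advertises.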

In any case, there's a lot to discuss here about code structure, naming conventions, testing, etc.

Diff Detail

hfinkel retitled this revision to "Import/adapt the SLEEF vector math-function library as an LLVM runtime". Sep 26 2016, 6:32 PM
hfinkel updated this object.
hfinkel added a subscriber: parallel_libs-commits.
hfinkel updated this revision to Diff 72595. Sep 26 2016, 6:39 PM

Update lib license headers to be consistent with our policy.

lsaba added a subscriber: lsaba. Sep 27 2016, 4:15 AM
spatel added a subscriber: spatel. Sep 27 2016, 6:56 AM
hfinkel updated this revision to Diff 72731. Sep 27 2016, 4:30 PM

Start on some basic unit tests (and some other minor fixes).

hfinkel updated this revision to Diff 72940. Sep 28 2016, 7:15 PM

I finished up the initial set of unit tests (we now have essentially all of the tests from the original source, for both the scalar and vector versions of the functions).

A few people have suggested using the Intel vector ABI name-mangling scheme (https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf) for these functions.

There are two potential not-mutually-exclusive meanings to this suggestion:

  1. Use our own base names for the functions (e.g. xcos or sleef_cos), but identify the vector versions using the vector ABI mangling scheme (e.g. _ZGVxN4v___xcos or _ZGVxN4v___sleef_cos).
  2. Name our vector functions so that they provide vector-ABI-mangled versions of the libm functions (e.g. _ZGVxN4v_cos).

Regarding (1), this makes sense to me; the only disadvantage I can see is that the names are less human-readable. What do people prefer? Regarding (2), one potential issue is that, since we (i.e. the provider of a compiler and not the provider of libc, which is true for some of us) don't "own" cos, we also don't really "own" _ZGVxN4v_cos either, and providing this symbol when libc might also provide it could lead to conflicts. One option is to use our own base names for the functions, but also provide weak aliases to the standard names with the vector ABI names. Thoughts?
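
For readers unfamiliar with the mangling, the following sketch shows how the example names above are put together. It assumes the general shape _ZGV<isa><mask><vlen><params>_<base> from the vector ABI document, with N for an unmasked variant, M for a masked one, and v for a vector parameter; the helper name vector_abi_name is hypothetical, and the ISA letter x is simply the one used in the examples here.

```cpp
#include <string>

// Builds a vector-ABI name of the form _ZGV<isa><mask><vlen><params>_<base>.
// `params` holds one letter per parameter (for example "v" for one vector
// argument); `base` is the scalar function name being vectorized.
std::string vector_abi_name(char isa, bool masked, unsigned vlen,
                            const std::string &params,
                            const std::string &base) {
  std::string name = "_ZGV";
  name += isa;                   // ISA class letter
  name += masked ? 'M' : 'N';    // masked or unmasked variant
  name += std::to_string(vlen);  // number of lanes
  name += params;                // parameter kinds
  name += '_';
  name += base;
  return name;
}
```

With this helper, vector_abi_name('x', false, 4, "v", "cos") gives the option (2) name, and passing "__xcos" as the base gives the option (1) name.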

Focusing on the (very) high level issues:

I think we should probably import this with a name other than "SLEEF". At least for me, that name doesn't convey much. I would choose the name to help guide how the library should be developed and used, hopefully in a way that is reasonably short.

At least two possible and reasonable long-term goals come to my mind:

  1. Provide vectorized versions of any math routines whose scalar counterparts are typically provided via libc or libm such that LLVM can use the system libc and/or libm without being limited to the vectorized facilities it happens to provide when vectorizing code.
  2. Provide vectorized versions of any math routines useful to enable vectorization.

Clearly #2 is a superset of #1 here. The interesting question between these two is whether there is a specific desire to limit the scope to the libc and libm common set of mathematical functions. I can see good reasons to do that, and I can see good reasons to expand the scope if there are common mathematical routines that come up for users in contexts where vectorization is desired.

If folks want the scope to be narrow along the lines of #1, I would suggest a library name that is reasonably reminiscent of libm. libmv or libmvec? Dunno.

If folks want the scope to be broader, I would probably be inclined to go for a broader name as well. vector_math or vmath? Other ideas?

Or are there other suggestions on how to describe / scope the library that would lend themselves to other names?

Once the name at the project level is settled, we should pick the installed name. I would strongly suggest something with 'llvm' or at least 'll' in it. I can imagine people wanting to incorporate this code into other vector math libraries which is I think ultimately a good thing. But LLVM should be able to ship as a toolchain a clearly named runtime library that we own and that provides a set of runtime functions the toolchain expects to be able to use fairly directly.

This brings us to the question of how to name the actual routines which you raised in the original patch description and which you raised again with your latest email Hal:

A few people have suggested using the Intel vector ABI name-mangling scheme (https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf) for these functions.

There are two potential not-mutually-exclusive meanings to this suggestion:

  1. Use our own base names for the functions (e.g. xcos or sleef_cos), but identify the vector versions using the vector ABI mangling scheme (e.g. _ZGVxN4v___xcos or _ZGVxN4v___sleef_cos).
  2. Name our vector functions so that they provide vector-ABI-mangled versions of the libm functions (e.g. _ZGVxN4v_cos).

    Regarding (1), this makes sense to me; the only disadvantage I can see is that the names are less human-readable. What do people prefer? Regarding (2), one potential issue is that, since we (i.e. the provider of a compiler and not the provider of libc, which is true for some of us) don't "own" cos, we also don't really "own" _ZGVxN4v_cos either, and providing this symbol when libc might also provide it could lead to conflicts. One option is to use our own base names for the functions, but also provide weak aliases to the standard names with the vector ABI names. Thoughts?

A huge initial question here: do we want to support users of LLVM calling these routines directly from their code? That is, do we want to use this *both* to support LLVM's vectorizers and to support users vectorizing their code?

I'm inclined to say "yes" but I represent a relatively small and specialized set of potential users for these routines. I suspect Hal and others should comment how they expect to proceed here.

If the answer is "yes" then I think we should think carefully about how that API is spelled in user code. That would (IMO) make it important that the routines have quite friendly names and interfaces. I suspect that will end up being the overriding concern with naming these, potentially making things like C++ namespaces part of what we want to think about here. However, I can also see an argument we should design a user-facing API as wrappers for the underlying runtime library.

Past all of that, or if the answer is "no", my concern about naming these routines following the Intel ABI is that it would mean zero consistency between architectures which seems really unfortunate.

My suggested naming of the routines would follow a set of conventions to build each name in a predictable way:

  • Library (and LLVM) specific prefix like __llvm_vector_math
  • Name of the mathematical function
  • Vectorization factor and lane type like v4f32
  • (optional) An architecture and potentially ABI-specific suffix like x86_64

This would give us as an example:

__llvm_vector_math_sin_vNiM_<arch/abi>
__llvm_vector_math_sin_vNfM_<arch/abi>

I very much want a common prefix (and for that common prefix to be something LLVM-specific) so we don't have collisions and can identify where these routines came from. I would like the component of the name that just identifies the vectorization and lane type to be common across architectures to make maintenance and IR calling these routines much more clear. We have good common naming conventions within the LLVM backend already. If there are more dimensions than vectorization factor and element type, we can add more to this descriptor, I just pulled this example off the top of my head.

The last bit I'm not as confident in... I can see an argument that we should just have the raw name and generate the code for the architecture, but I think for testing and other purposes we will often want multiple variants per architecture and in *source code* may see multiple architectures at the same time. There, having suffixes may make development much more clear. If we go that route, it would be good to have a suffix-free alias that is computed using an IFUNC at runtime or otherwise a reasonable default.

All that said, I really like the idea of using the Intel ABI where appropriate. I would suggest that we have a build mode that includes (weak?) aliases for the libc and libm routines with the vector ABI applied on x86 so that many things "just work". If there are other contexts in which emitting names under that ABI make sense, that seems good to me as well, I would just always do it with aliases to the more predictable / easy to read names.

Does all of this make sense? Curious if there are drawbacks here I've not thought of...

Focusing on the (very) high level issues:

I think we should probably import this with a name other than "SLEEF". At least for me, that name doesn't convey much. I would choose the name to help guide how the library should be developed and used, hopefully in a way that is reasonably short.

At least two possible and reasonable long-term goals come to my mind:

  1. Provide vectorized versions of any math routines whose scalar counterparts are typically provided via libc or libm such that LLVM can use the system libc and/or libm without being limited to the vectorized facilities it happens to provide when vectorizing code.
  2. Provide vectorized versions of any math routines useful to enable vectorization.

    Clearly #2 is a superset of #1 here. The interesting question between these two is whether there is a specific desire to limit the scope to the libc and libm common set of mathematical functions. I can see good reasons to do that, and I can see good reasons to expand the scope if there are common mathematical routines that come up for users in contexts where vectorization is desired.

    If folks want the scope to be narrow along the lines of #1, I would suggest a library name that is reasonably reminiscent of libm. libmv or libmvec? Dunno.

    If folks want the scope to be broader, I would probably be inclined to go for a broader name as well. vector_math or vmath? Other ideas?

I see no reason to, up front, limit the scope to only vectorized versions of functions provided by libm. vmath sounds good to me.

Or are there other suggestions on how to describe / scope the library that would lend themselves to other names?

Once the name at the project level is settled, we should pick the installed name. I would strongly suggest something with 'llvm' or at least 'll' in it. I can imagine people wanting to incorporate this code into other vector math libraries which is I think ultimately a good thing. But LLVM should be able to ship as a toolchain a clearly named runtime library that we own and that provides a set of runtime functions the toolchain expects to be able to use fairly directly.

You mean like using libllvmvmath or liblvmath or libllvm_vmath instead of libvmath? It looks like we could use libvmath itself with very minimal conflict.

This brings us to the question of how to name the actual routines which you raised in the original patch description and which you raised again with your latest email Hal:

A few people have suggested using the Intel vector ABI name-mangling scheme (https://www.cilkplus.org/sites/default/files/open_specifications/Intel-ABI-Vector-Function-2012-v0.9.5.pdf) for these functions.

There are two potential not-mutually-exclusive meanings to this suggestion:

  1. Use our own base names for the functions (e.g. xcos or sleef_cos), but identify the vector versions using the vector ABI mangling scheme (e.g. _ZGVxN4v___xcos or _ZGVxN4v___sleef_cos).
  2. Name our vector functions so that they provide vector-ABI-mangled versions of the libm functions (e.g. _ZGVxN4v_cos).

    Regarding (1), this makes sense to me; the only disadvantage I can see is that the names are less human-readable. What do people prefer? Regarding (2), one potential issue is that, since we (i.e. the provider of a compiler and not the provider of libc, which is true for some of us) don't "own" cos, we also don't really "own" _ZGVxN4v_cos either, and providing this symbol when libc might also provide it could lead to conflicts. One option is to use our own base names for the functions, but also provide weak aliases to the standard names with the vector ABI names. Thoughts?

A huge initial question here: do we want to support users of LLVM calling these routines directly from their code? That is, do we want to use this *both* to support LLVM's vectorizers and to support users vectorizing their code?

I'm inclined to say "yes" but I represent a relatively small and specialized set of potential users for these routines. I suspect Hal and others should comment how they expect to proceed here.

Yes, I definitely think we should expect that users will want to call these routines themselves, and we should provide some human-readable interface. We could do this by using wrapper functions in a header file, and I'd want to encourage that so that we can provide definitions with __attribute__((const)), etc.

If the answer is "yes" then I think we should think carefully about how that API is spelled in user code. That would (IMO) make it important that the routines have quite friendly names and interfaces. I suspect that will end up being the overriding concern with naming these, potentially making things like C++ namespaces part of what we want to think about here. However, I can also see an argument we should design a user-facing API as wrappers for the underlying runtime library.

Past all of that, or if the answer is "no", my concern about naming these routines following the Intel ABI is that it would mean zero consistency between architectures which seems really unfortunate.

My suggested naming of the routines would follow a set of conventions to build each name in a predictable way:

  • Library (and LLVM) specific prefix like __llvm_vector_math
  • Name of the mathematical function
  • Vectorization factor and lane type like v4f32
  • (optional) An architecture and potentially ABI-specific suffix like x86_64

    This would give us as an example:

    __llvm_vector_math_sin_vNiM_<arch/abi>
    __llvm_vector_math_sin_vNfM_<arch/abi>

    I very much want a common prefix (and for that common prefix to be something LLVM-specific) so we don't have collisions and can identify where these routines came from. I would like the component of the name that just identifies the vectorization and lane type to be common across architectures to make maintenance and IR calling these routines much more clear. We have good common naming conventions within the LLVM backend already. If there are more dimensions than vectorization factor and element type, we can add more to this descriptor, I just pulled this example off the top of my head.

    The last bit I'm not as confident in... I can see an argument that we should just have the raw name and generate the code for the architecture, but I think for testing and other purposes we will often want multiple variants per architecture and in *source code* may see multiple architectures at the same time. There, having suffixes may make development much more clear. If we go that route, it would be good to have a suffix-free alias that is computed using an IFUNC at runtime or otherwise a reasonable default.

I agree, although I'm not sure about the IFUNC part, because the different ISA variants have different ABIs (although sometimes only for a few functions, like ldexp, because it needs to take a vector of integers instead of just floating-point numbers).
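
As a sketch of the suffix-free entry point discussed above, the following uses a plain function pointer resolved on first call rather than a real IFUNC. It only works because both stand-in variants share one signature, which is exactly the condition the ABI concern is about. All names are hypothetical, std::array stands in for a real vector type, and both variants are scalar loops rather than real AVX or AVX2 code. __builtin_cpu_supports is the GCC/Clang x86 builtin; elsewhere the sketch falls back to the first variant.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using v4f64 = std::array<double, 4>;  // stand-in for a 4 x double vector type

// Hypothetical stand-ins for two ISA-specific variants with one signature.
v4f64 sin_v4f64_avx(const v4f64 &x) {
  v4f64 r{};
  for (std::size_t i = 0; i < x.size(); ++i) r[i] = std::sin(x[i]);
  return r;
}
v4f64 sin_v4f64_avx2(const v4f64 &x) {
  v4f64 r{};
  for (std::size_t i = 0; i < x.size(); ++i) r[i] = std::sin(x[i]);
  return r;
}

// Runtime feature check; reports false where the builtin is unavailable.
bool cpu_has_avx2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
  return __builtin_cpu_supports("avx2") != 0;
#else
  return false;
#endif
}

using sin_fn = v4f64 (*)(const v4f64 &);

// Picks a variant once, in the spirit of an IFUNC resolver.
sin_fn resolve_sin_v4f64() {
  return cpu_has_avx2() ? sin_v4f64_avx2 : sin_v4f64_avx;
}

// Suffix-free entry point: resolves on first use, then forwards.
v4f64 sin_v4f64(const v4f64 &x) {
  static const sin_fn impl = resolve_sin_v4f64();
  return impl(x);
}
```

For a function such as ldexp, where the integer vector argument differs between variants, a single pointer type like sin_fn would not exist, so this kind of dispatch would not apply directly.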

All that said, I really like the idea of using the Intel ABI where appropriate. I would suggest that we have a build mode that includes (weak?) aliases for the libc and libm routines with the vector ABI applied on x86 so that many things "just work". If there are other contexts in which emitting names under that ABI make sense, that seems good to me as well, I would just always do it with aliases to the more predictable / easy to read names.

Sounds good. FWIW, my understanding is that other architectures are adopting a similar ABI, so it shouldn't just be Intel specific. I'm not sure how that's progressing in practice, however.

Does all of this make sense? Curious if there are drawbacks here I've not thought of...

hfinkel updated this revision to Diff 73970. Oct 7 2016, 12:36 PM

The library name has been updated to be vmath. The function names now all look like:

__llvm_<func>_<type>[_<abi>]

So, for example, for sin the library contains on my x86_64 system:

__llvm_sin_f64
__llvm_sin_u1_f64
__llvm_sin_f32
__llvm_sin_u1_f32
__llvm_sin_u1_v2f64_sse2
__llvm_sin_v2f64_sse2
__llvm_sin_u1_v4f32_sse2
__llvm_sin_v4f32_sse2
__llvm_sin_u1_v4f64_avx
__llvm_sin_v4f64_avx
__llvm_sin_u1_v8f32_avx
__llvm_sin_v8f32_avx
__llvm_sin_u1_v4f64_avx2
__llvm_sin_v4f64_avx2
__llvm_sin_u1_v8f32_avx2
__llvm_sin_v8f32_avx2
hfinkel updated this revision to Diff 73988. Oct 7 2016, 2:41 PM

Make the build a little more compiler-rt-like: when doing an in-tree build, we now build the library such that it will be installed as lib/clang/4.0.0/lib/linux/libclang_rt.vmath-x86_64.a (on my system anyway).

hfinkel updated this revision to Diff 73998. Oct 7 2016, 3:23 PM

When doing an in-tree build, copy the headers to the build directory like compiler-rt does.

Hi Hal,

thank you for working on including SLEEF into parallel-libs.

I think it would be great if the symbols provided by the "libvmath"
would be easily accessible also by other compilers.

The best way to achieve this is to make sure that the symbol names
are generated according to a well-known and accepted classification. I
think that generating the names under the "#pragma omp declare simd"
classification is the best way to achieve this, as the pragma takes
care of different aspects that need to be considered when generating a
vector version of a vector function.

To my understanding, this is what the Intel vector ABI [1] is taking
care of.

Since other compilers are interested in linking their vector code
against this library, I recommend using for each target architecture the
naming conventions specified in the Vector ABI - obviously, ARM may
develop such an equivalent ABI in the future.

For the end user interested in using the auto-vectorization framework
provided by their favourite compiler, using either "vector math
library - flavour A" (say sleef in parallel-libs) or another "vector
math library - flavour B" (say e.g. libmvec in glibc) would reduce to
linking against one of the libraries. The auto-vectorizer would
be library independent (up to the need of knowing what vector math
functions are available in the code).

Does this make sense to you? Anyone else having arguments pro/against
this choice?

Regards,

Francesco

[1] https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt
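
For context, a minimal sketch of the pragma mentioned above. The function names are hypothetical. With OpenMP SIMD support enabled, a compiler that implements the vector ABI is expected to create or reference vector variants of the annotated function under _ZGV-mangled names; without that support the pragmas are ignored and the code stays scalar.

```cpp
#include <cstddef>

// Hypothetical scalar routine. The pragma asks the compiler to also provide
// vector variants of it; notinbranch requests only the unmasked forms.
#pragma omp declare simd notinbranch
double scale_and_shift(double x) { return 2.0 * x + 1.0; }

// A loop the compiler may vectorize by calling one of those vector variants.
void apply(const double *in, double *out, std::size_t n) {
#pragma omp simd
  for (std::size_t i = 0; i < n; ++i)
    out[i] = scale_and_shift(in[i]);
}
```

Under this model a math library only has to export the mangled names for the functions it vectorizes, and the caller's compiler does not need to know which library supplies them.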

vasich added a subscriber: vasich. Tue, Aug 8, 3:03 AM