
[AArch64] Enable libm vectorized functions via SLEEF
Accepted · Public

Authored by steleman on Oct 31 2018, 4:00 AM.

Details

Summary

This changeset is modeled after Intel's submission for SVML. It enables vectorization of trigonometry and other libm functions via SLEEF: http://sleef.org/.

  • A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
  • A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
  • A comprehensive test case is included in this changeset.
  • In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF.
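
For context, a hedged miniature of what the new TLI entries encode: a mapping from a scalar libm name to a SLEEF GNUABI vector name at a given vectorization factor. The struct shape is modeled on the VecDesc records used for the existing SVML support; the sample entries are illustrative.

#include <cstring>

struct VecDesc {
  const char *ScalarFnName;     // libm name as it appears in the IR
  const char *VectorFnName;     // SLEEF GNUABI counterpart
  unsigned VectorizationFactor; // lanes per call
};

static const VecDesc SleefFuncs[] = {
    {"sin", "_ZGVnN2v_sin", 2},   // 2 x double, AdvSIMD
    {"sinf", "_ZGVnN4v_sinf", 4}, // 4 x float, AdvSIMD
    {"cos", "_ZGVnN2v_cos", 2},
};

// The question the vectorizer asks: is there a vector variant of F at VF?
static const char *vectorVariant(const char *F, unsigned VF) {
  for (const VecDesc &D : SleefFuncs)
    if (D.VectorizationFactor == VF && std::strcmp(D.ScalarFnName, F) == 0)
      return D.VectorFnName;
  return nullptr;
}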

Libm functions that are vectorized by SLEEF:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma
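
A minimal sketch of the intended end-to-end effect (the mangled name assumes the AArch64 GNUABI convention discussed later in this review):

#include <cmath>

void apply_sin(double *out, const double *in, int n) {
  // Built with -fvectorize -fveclib=SLEEF on AArch64, the scalar call in
  // this loop is expected to be replaced in the vectorized body by a
  // SLEEF variant such as _ZGVnN2v_sin(<2 x double>).
  for (int i = 0; i < n; ++i)
    out[i] = std::sin(in[i]);
}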

Diff Detail

Repository
rL LLVM

Event Timeline


@joel_k_jones, @steleman

I am sorry for having kept you waiting for the patch. I was not fully aware of the upstream approval process (I haven't contributed much upstream myself), and before being able to approve I had to understand the process.

I will approve the patch and you will be able to submit it, but before doing that I would like to ask for another tiny change.

FWIW, I'm sorry to push on this, but I think it would be better to have folks more familiar with generic LLVM optimizations and with SLEEF review this. The obvious candidate, IMO, would be Hal. If Hal is unavailable, there are a number of other people who would be very effective at reviewing (Tim, others...) with deep and longstanding experience in this area.

Given that this is a pretty significant change for AArch64 at least, and potentially for LLVM as a whole, I think it is reasonable to expect somewhat wider review before it gets landed. Essentially, I'd suggest being a bit more reluctant to approve this kind of patch. =D

That said, I also want to say a huge thank you for actually reviewing (and doing a really detailed review!) of patches like this. It's really helpful, and please don't stop doing that. It's simply amazing and definitely helps our process scale.

FWIW, I'm sorry to push on this, but I think it would be better to have folks more familiar with generic LLVM optimizations and with SLEEF review this. The obvious candidate, IMO, would be Hal. If Hal is unavailable, there are a number of other people who would be very effective at reviewing (Tim, others...) with deep and longstanding experience in this area.

Given that this is a pretty significant change for AArch64 at least, and potentially for LLVM as a whole, I think it is reasonable to expect somewhat wider review before it gets landed.

To make things perfectly clear:

  • This changeset doesn't do anything that wasn't already in LLVM. -fveclib=SVML support was added by Intel prior to this changeset.
  • There is nothing here that qualifies as "a pretty significant change for AArch64 at least, and potentially for LLVM as a whole", since -fveclib=<X> is not the default in clang.
  • The only difference is in the veclib library: SLEEF instead of SVML. And in the vector function names, obviously.
  • @shibatch - a.k.a. the author of SLEEF - is subscribed to this changeset, and has commented on it already.

Essentially, I'd suggest being a bit more reluctant to approve this kind of patch.

I'd like a better understanding of what this really means.

Adding Hal Finkel and Tim Northover as reviewers as I discussed with Chandler in IRC.

I don't think that this is such a major change, at least in terms of possibility of breaking anything without notice or in terms of introducing a new kind of infrastructure.

Adding new support for a vector math library seems worth getting reviewed / approved by someone reasonably deeply familiar w/ our vectorization infra.

There are a bunch of possibilities here. I suggest Hal because I know he has worked specifically with SLEEF and vectorization before. But there are plenty of others. You can look at folks who have touched lib/Transforms/Vectorize if you'd like to see others.

I'm not trying to slow anything down, and maybe the code change is super simple and this will just be a quick stamp. But it seems quite useful to double check with folks that might have context on this area, etc.

Overall, this change looks ok to me. The tests look good, too.

However, I have a question that I have asked before and don't seem to see a solution here: how do you deal with evolving library support?

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

The objects compiled with LLVM next year won't link against SLEEF compiled today. You can say "compile SLEEF again", but Linux distros have different schedules for different packages, especially when they have different maintainers.

They cope with the main libraries by validating and releasing a clear set of compatible libraries, and the GNU community participates heavily in that process by testing and patching gcc, glibc, binutils, etc.

I don't think SLEEF maintainers will have the same stamina, so it's possible that an update in either distro or SLEEF breaks things, they reject the new package, but now, LLVM is broken, too.

This is not an LLVM problem, per se, but by supporting SLEEF, you'll be transferring the problem to the LLVM community, as we'll be the ones receiving the complaints first.

This also happens in LLVM proper, when we have just released a new version and people haven't updated SLEEF yet, and we receive a burst of complaints.

The real question is: is the SLEEF community ready to take on the task of fixing compatibility / linkage issues as soon as they occur? The alternative is what we do with other broken code: deletion. I want to avoid that.

The other question, inline, is about the argument name. We really want to use the same name as the existing one, for the sake of least surprise.

Some additional random comments...

Adding the triple here exposes the real dependency that was never declared, because SVML assumes Intel hardware, so no problems there.

The discussion on how to simplify the pattern matching is valid, but at this stage, I think we can keep the existing mechanism.

I agree that this is bound to increase with SVE, but I'm not sure writing a whole new TableGen backend just for this is worthwhile.

The code would be slightly longer on the src side (instead of build side) but it's well organised and clear, so no big deal.

I tend to agree with @steleman that auto-generating as a loop+replace can be tricky if there are support gaps in the library.

But further cleanup changes to this patch could prove me wrong. :)

cheers,
--renato

llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

A quick google search told me GCC uses -fveclib=SLEEF. We don't want to be different or that would create big problems in build systems. It seems, at least right now, there's only one ABI, so there's no point in premature classification.

Hello Renato,

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

SLEEF already has support for SVE.

The objects compiled with LLVM next year won't link against SLEEF compiled today. You can say "compile SLEEF again", but Linux distros have different schedules for different packages, especially when they have different maintainers.

This is also a problem in SVML.
My understanding is that this patch is kind of tentative, to be used until the OpenMP patch ( https://reviews.llvm.org/D54412 ) is merged.
After that, the header file shipped with the vector math library will be consulted by the compiler to see which math functions are available, so this problem won't occur.

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

SLEEF already has support for SVE.

I see, but the point is still the same.

Implementation will evolve, new functions will be added, and testing all variants will still lead to the same maintenance issues.

The bottom line is: this is a long term commitment, and I want to make sure everyone involved is aware of that.

This is also a problem in SVML.

Yes, and Intel has been very forthcoming in maintaining their code.

My understanding is that this patch is kind of tentative, to be used until the OpenMP patch ( https://reviews.llvm.org/D54412 ) is merged.
After that, the header file shipped with the vector math library will be consulted by the compiler to see which math functions are available, so this problem won't occur.

In that case, I propose we either make this experimental, or we wait until that patch lands and we can change this one accordingly.

I am worried about walking into a situation where we generate OpenMP code where no pragmas were written, as we don't (and shouldn't) include libomp by default.

I know this patch doesn't do that, but some of the comments here and there allude to it, and we should be careful how we proceed.

I would welcome @hfinkel's opinion.

Hi Renato,

My answers inline.

Overall, this change looks ok to me. The tests look good, too.

However, I have a question that I have asked before and don't seem to see a solution here: how do you deal with evolving library support?

I'm not. I'm punting, simply because I can't control SLEEF's potential ABI changes from LLVM. :-) And absent a formal definition of SLEEF's mangling and ABI, I don't see how this problem can be dealt with in a definitive way.

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.

AFAICS, SLEEF isn't included in any Linux distro. I can't find any reference to it in RH/CentOS/Fedora, or at rpmfind. Or Ubuntu or SuSE.

Which means that those environments that want to use SLEEF will have to build their own.

The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

The objects compiled with LLVM next year won't link against SLEEF compiled today. You can say "compile SLEEF again", but Linux distros have different schedules for different packages, especially when they have different maintainers.

The objects compiled with LLVM next year will still link with SLEEF from today. They just won't be binding to the SVE version of SLEEF. They'll keep binding to the non-SVE SLEEF.

Meaning, ThunderX2/T99 - for example - will never use SVE, because SVE is not available in T99 silicon.

So, future SVE becomes a source of problems only for AArch64 silicon that has SVE support. For those CPUs it becomes a compile-time/link-time choice.

If they want to use an SVE-enabled SLEEF, they'll have to build an SVE-enabled SLEEF to link against. And I suspect they'll have to say '-march=armv8.2+sve' - or something like that - on the compile-line.

Those environments can always use two different versions of SLEEF - one with SVE, one without - just by using different shared object names and different binding SONAMEs.

clang doesn't auto-magically pass -lsleefgnuabi on link-line, so that's left up to the consumer.
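
For illustration, the expected usage pairs the compile flag with an explicit link flag (library name assumed from the SLEEF GNUABI build):

clang -O3 -fvectorize -fveclib=SLEEF t.c -lsleefgnuabi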

They cope with the main libraries by validating and releasing a clear set of compatible libraries, and the GNU community participates heavily in that process by testing and patching gcc, glibc, binutils, etc.

I don't think SLEEF maintainers will have the same stamina, so it's possible that an update in either distro or SLEEF breaks things, they reject the new package, but now, LLVM is broken, too.

This is not an LLVM problem, per se, but by supporting SLEEF, you'll be transferring the problem to the LLVM community, as we'll be the ones receiving the complaints first.

This also happens in LLVM proper, when we have just released a new version and people haven't updated SLEEF yet, and we receive a burst of complaints.

The real question is: is the SLEEF community ready to take on the task of fixing compatibility / linkage issues as soon as they occur? The alternative is what we do with other broken code: deletion. I want to avoid that.

The other question, inline, is about the argument name. We really want to use the same name as the existing one, for the sake of least surprise.

I agree with you. ;-) I will change it back to -fveclib=SLEEF. ;-)

Some additional random comments...

Adding the triple here exposes the real dependency that was never declared, because SVML assumes Intel hardware, so no problems there.

The discussion on how to simplify the pattern matching is valid, but at this stage, I think we can keep the existing mechanism.

I agree that this is bound to increase with SVE, but I'm not sure writing a whole new TableGen backend just for this is worthwhile.

The code would be slightly longer on the src side (instead of build side) but it's well organised and clear, so no big deal.

I tend to agree with @steleman that auto-generating as a loop+replace can be tricky if there are support gaps in the library.

But further cleanup changes to this patch could prove me wrong. :)

Now about the -fveclib=OpenMP + #pragma omp simd story:

I am fully aware of the OpenMP SIMD Plan[tm] and of the not-really-a-RFC RFC at https://reviews.llvm.org/D54412.

Here are my thoughts on that - i've already expressed some vague version of those thoughts earlier in this changeset:

  1. If the -fveclib=OpenMP feature requires changes to existing code to work -- namely, inserting #pragma omp simd directives in existing production code -- then it's a non-starter by design. There are many environments in HPC/Super-computing where making changes to existing production code requires an Act Of God and half a million $USD in code re-certification costs.
  2. Not every program wants to become dependent on OpenMP. Some programs really, really don't want OpenMP. There should be a mechanism for those kinds of programs to vectorize libm functions without involving OpenMP.
  3. The story with the "magic" header file: some environments might consider this "magic" header file a sneaky way of implementing [1] -- namely, production source code changes. A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.
  4. I am very happy about the thought of OpenMP being able - at some point in the future - to vectorize things. Programs that already use OpenMP will happily take advantage of this future feature. But, based on my understanding of the current OpenMP reality, this new OpenMP feature won't be available for a while. It's certainly not available now. So, how do we accomplish libm vectorization in the meanwhile?

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.

The name mangling rule in SLEEF is basically the vector function ABI for AArch64.
The functions for Intel ISA conform to the corresponding ABI for x86.

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi
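
For reference, a hedged reading of one such name under that ABI, e.g. _ZGVnN2v_cos:

  • _ZGV: the vector-function ABI prefix
  • n: AdvSIMD ('s' denotes SVE)
  • N: unmasked ('M' marks masked variants)
  • 2: two lanes, i.e. 2 x double
  • v: one plain vector parameter
  • cos: the scalar libm name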

AFAICS, SLEEF isn't included in any Linux distro. I can't find any reference to it in RH/CentOS/Fedora, or at rpmfind. Or Ubuntu or SuSE.

Debian now includes SLEEF as a package. https://tracker.debian.org/pkg/sleef

Which means that those environments that want to use SLEEF will have to build their own.
The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

Several open source projects including PyTorch have already adopted SLEEF.

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.

The name mangling rule in SLEEF is basically the vector function ABI for AArch64.
The functions for Intel ISA conform to the corresponding ABI for x86.

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

I know. :-)

Speaking from my own experience: I have never met an ABI that didn't have to change after it was adopted and ratified as a PDF. ;-)

So yes, the AArch64 Vector ABI is probably stable, with a 99.3% level of confidence.

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.
AFAICS, SLEEF isn't included in any Linux distro. I can't find any reference to it in RH/CentOS/Fedora, or at rpmfind. Or Ubuntu or SuSE.

As @shibatch said, they use Intel's and Arm's ABIs, so that should be fine. But note that this is not just a distro issue; it also affects bundling into other libraries.

Which means that those environments that want to use SLEEF will have to build their own.
The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

The signal I'm getting from all labs (in US, Europe and Asia) is that they really *don't* want to maintain their own binaries. They just cope with it because we don't do a good job at the tools/packaging level.

I'm not willing to add more chaos into the mix.

Those environments can always use two different versions of SLEEF - one with SVE, one without - just by using different shared object names and different binding SONAMEs.
clang doesn't auto-magically pass -lsleefgnuabi on link-line, so that's left up to the consumer.

SVE was just an example of completely missing options, but new functions and potential renames can raise much harder issues to fix. We had such issues with glibc, libgcc, and compiler-rt in the past, and they take awfully long to go away.

The link flag issue is minor, as people passing -fveclib=sleef to CFLAGS will also pass -lsleef to LDFLAGS, but that doesn't solve the versioning problem.

  4. I am very happy about the thought of OpenMP being able - at some point in the future - to vectorize things. Programs that already use OpenMP will happily take advantage of this future feature. But, based on my understanding of the current OpenMP reality, this new OpenMP feature won't be available for a while. It's certainly not available now. So, how do we accomplish libm vectorization in the meanwhile?

The alternative is to call it experimental and take slow steps. This is mostly a political change, not technical, so it won't involve a lot of red tape. As I said before, the code looks fine to me.

What this normally means is:

  • Commit messages and code comments have "experimental" in them
  • A small paragraph of comments on the code explaining the problems with the current approach
  • Developer availability to move code around or re-write in the near future

The bad side is that:

  • Experimental features have less priority than other changes
  • Other developers are far less likely to keep that part of the code in good shape
  • If this code breaks some other, "official" code, it could be reverted or changed
  • If no one takes the maintenance role, the code will bit rot and be eventually removed

Code moves from experimental to official when:

  • There are enough users / developers maintaining it for months / years
  • There are enough validation efforts (buildbots, new tests, distro use)
  • The code, and its tests, don't break unrelated changes too often
  • Developers usually respond promptly to such breakages

As I said, I still want to hear from Hal/Tim, but I'd be happy to call this experimental, as long as we agree that there is a commitment to look after this code (and all ramifications, such as for OMP).

The commitment doesn't need to be from you directly, but from enough developers to make this worthwhile.

I just want to make clear I do want SLEEF support in LLVM, but we need to make it right from the start.

cheers,
--renato

Which means that those environments that want to use SLEEF will have to build their own.
The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

The signal I'm getting from all labs (in US, Europe and Asia) is that they really *don't* want to maintain their own binaries. They just cope with it because we don't do a good job at the tools/packaging level.

I'm not willing to add more chaos into the mix.

Then they are in trouble with a hypothetical distro SLEEF.

That's because no commercial distro - if they ever decide to include SLEEF - will compile SLEEF for the maximum CPU feature-set possible. They do the exact opposite - they compile it for the lowest possible common denominator.

For example, on AArch64, you won't get distro libraries compiled specifically for -mcpu=thunderx2t99. Or on x86_64 you'll never get -mavx512ifma, for example.

That's because no commercial distro - if they ever decide to include SLEEF - will compile SLEEF for the maximum CPU feature-set possible. They do the exact opposite - they compile it for the lowest possible common denominator.

For example, on AArch64, you won't get distro libraries compiled specifically for -mcpu=thunderx2t99. Or on x86_64 you'll never get -mavx512ifma, for example.

Many applications and libraries, including SLEEF, use dispatchers to choose at run time a code path that exploits the available vector extensions.

Then they are in trouble with a hypothetical distro SLEEF.

That's because no commercial distro - if they ever decide to include SLEEF - will compile SLEEF for the maximum CPU feature-set possible. They do the exact opposite - they compile it for the lowest possible common denominator.

For example, on AArch64, you won't get distro libraries compiled specifically for -mcpu=thunderx2t99. Or on x86_64 you'll never get -mavx512ifma, for example.

A bit off topic, but there are two solutions to that problem:

  1. Dynamic Arch

The likes of IFUNC and virtual dispatch, which compile all variants but choose the right one at run time. OpenBLAS uses this, for example.

This is easy on the compiler, which knows all variants at build time, and no users are harmed in the process (see the IFUNC sketch at the end of this comment). But it's hard to maintain and has additional runtime costs.

  2. Modules / Overlays

LMod, EasyBuild, Spack can choose the appropriate libraries at build / run time as needed.

That's not so easy on the compiler, especially if the modules are shared libraries, linked at runtime. It gets worse if users build half local and half use the modules.

Both are preferable to having every cluster user re-compile the whole stack every time, or having cluster admins curate a random number of random builds at random places (which is usually the end result).

That is also assuming the user knows how to best compile the libraries, better than the developers or packagers, which is usually not true.

For example, just adding march / mtune to OpenBLAS doesn't do much good: it yields only a fraction of the improvement you get from forcing the right arch via TARGET=ARCH, which then selects the right flags anyway.
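
As promised above, a minimal IFUNC-style sketch of option 1 (assumes glibc on AArch64 Linux; the my_sin* names are hypothetical):

#include <sys/auxv.h>

#ifndef HWCAP_SVE
#define HWCAP_SVE (1 << 22) // AArch64 hwcap bit, per the Linux uapi headers
#endif

// Two build-time variants; a real library would compile these with
// different -march settings.
extern "C" double my_sin_advsimd(double x) { return __builtin_sin(x); }
extern "C" double my_sin_sve(double x) { return __builtin_sin(x); }

// Load-time resolver: the dynamic linker calls this once and binds my_sin
// to whichever variant the running CPU supports.
extern "C" void *resolve_my_sin() {
  return (getauxval(AT_HWCAP) & HWCAP_SVE) ? (void *)my_sin_sve
                                           : (void *)my_sin_advsimd;
}

extern "C" double my_sin(double) __attribute__((ifunc("resolve_my_sin")));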

fpetrogalli added inline comments. Dec 11 2018, 1:38 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Hi @rengolin - where did you find this option? I am struggling to find it in the gcc code base and in the gcc developer mailing lists. GCC uses -mveclibabi, but it is only for x86.

rengolin added inline comments. Dec 11 2018, 1:53 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Hum, now I'm struggling to find it, too. I may have hit a sweet spot between sleef from clang and mveclibabi from gcc and assumed it worked.

This page [1] tells me only svml and acml are supported, so I stand corrected. However, if/when gcc supports it, mveclibabi=sleefgnuabi will be very weird. So I still prefer sleef instead of sleefgnuabi.

Also, why do we use fveclib and not mveclibabi? This isn't really helpful for build systems...

[1] https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

fpetrogalli added inline comments. Dec 11 2018, 2:07 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

I proposed to use -whateveroption=SLEEFGNUABI instead of -whateveroption=SLEEF because SLEEF is produced in two versions: libsleef.so and libsleefgnuabi.so. The two libraries have different signatures and naming conventions. The GNUABI version is compatible with the names of the Vector Function ABI of x86 and AArch64, and with the signatures of the functions in glibc, which makes it a superior choice as it adheres to standards. That is why I think this patch is making the right choice in choosing the GNUABI names.

Now, although we might have personal preferences on how to name things (I agree with you that mveclibabi|fveclib=sleefgnuabi is not pretty), I think the ultimate criteria for our choice is correctness and clarity. Having -fveclib=SLEEF when the compiler will be really using libsleefgnuabi.so would just be confusing, especially if some day we decide that we want to support also libsleef.so (for example, some workloads might prefer the sincos signature in there over the one provided in libsleefgnuabi).

rengolin added inline comments. Dec 11 2018, 2:15 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Is there any realistic expectation that we should support two ABIs for the same library with one being incompatible with the other ABIs (x86, Arm, Gnu)?

If these are just aliases, we don't even need to have two SO objects, and just add the aliases and wrappers to the main SO. We do that in Compiler-RT because it would be a nightmare otherwise.

A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.

@steleman Could you explain a little bit about when a re-certification is required?
Is it required when a standard header file (e.g. math.h) is changed?

gcc/glibc is doing a similar thing. In math-vector.h, which is included from math.h, there is declaration like below.

# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif

# ifdef __DECL_SIMD_x86_64
#  undef __DECL_SIMD_cos
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64
...

A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.

@steleman Could you explain a little bit about when a re-certification is required?
Is it required when a standard header file (e.g. math.h) is changed?

Yes. Distro upgrades must be (and are) certified. Once they're certified, they stay immutable for a very long time.

gcc/glibc is doing a similar thing. In math-vector.h, which is included from math.h, there is declaration like below.

# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif

# ifdef __DECL_SIMD_x86_64
#  undef __DECL_SIMD_cos
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64
...

So the non-OpenMP case doesn't change code and doesn't introduce external dependencies that may not have been present in the initial implementation. It just adds a function attribute, which, in GCC-land, translates to a call to a (possibly inlined) vectorized function.

I am not, by any means, suggesting that there is something inherently wrong with using OpenMP for SIMD libm vectorization. There is absolutely nothing wrong with that approach.

However, OpenMP cannot be the only method of obtaining vectorization of libm functions, for the reasons already mentioned. There has to be an alternate method, one that does not involve importing OpenMP pragmas and interfaces, and the code changes they implicitly trigger. GCC provides that alternative with attribute((simd)).
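
For illustration, a minimal sketch of that GCC-style route, mirroring the glibc expansion quoted earlier in this thread (whether a given compiler honors the attribute depends on its version and flags):

// Redeclare the libm function with the simd attribute: no pragma, no
// source change beyond the declaration itself.
extern "C" double sin(double) __attribute__((__simd__("notinbranch")));

double sum_sin(const double *v, int n) {
  double s = 0.0;
  for (int i = 0; i < n; ++i)
    s += sin(v[i]); // eligible for a vector-variant call once vectorized
  return s;
}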

The RFC for reimplementing -fveclib with OpenMP is titled "OpenMP" because the corresponding code for the ARM HPC compiler requires OpenMP.
However, the proposal in RFC does not require OpenMP, as quoted below.

The newly proposed #pragma directives are:

  1. #pragma veclib declare simd.
  2. #pragma veclib declare variant.

    Both directives follow the syntax of the declare simd and the declare variant directives of OpenMP, with the exception that declare variant is used only for the simd context.

The RFC for reimplementing -fveclib with OpenMP is titled "OpenMP" because the corresponding code for the ARM HPC compiler requires OpenMP.
However, the proposal in RFC does not require OpenMP, as quoted below.

The newly proposed #pragma directives are:

  1. #pragma veclib declare simd.
  2. #pragma veclib declare variant.

    Both directives follow the syntax of the declare simd and the declare variant directives of OpenMP, with the exception that declare variant is used only for the simd context.
  1. Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
  2. What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces into the open-source clang as well?
  3. If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on the compile-line but end up not using OpenMP.

Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces into the open-source clang as well?
If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on the compile-line but end up not using OpenMP.

I think you should post these questions as a reply to the RFC.

Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces into the open-source clang as well?
If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on the compile-line but end up not using OpenMP.

I think you should post these questions as a reply to the RFC.

Yes, but bear in mind that the proprietary compiler options will always have less weight than other OSS compilers.

Putting it bluntly, if it doesn't work in GCC, people will not use it. Regardless of what any proprietary compiler does, based or not in LLVM.

We already have some clang pragmas for vectorisation; we don't need yet another syntax, and if we do, we should use OpenMP's.

Using OMP pragmas doesn't necessarily mean we need to link with OMP libraries, and that's a special case for OMP SIMD.

Any new pragma MUST be done through an RFC process on the llvm-dev list, that's for sure.

steleman updated this revision to Diff 178475. Dec 17 2018, 9:04 AM

Updated version of the patch/changeset, as per comments from Renato:

  • veclib argument reverted back to SLEEF.
  • Corresponding enum names reverted back to SLEEF.
  • New benchmark comparing SLEEF performance when built with GCC 8.2 vs. clang ToT from 12/15/2018:

https://docs.google.com/spreadsheets/d/1QgdJjgCTFA9I2_5MvJU0uR4JT4knoW5xu-ujxszRbd0/edit?usp=sharing

Thanks! I've updated the reviewers on the Clang review (D53928).

Ping!

Yes, I realize people were on holidays. :-)

Is this OK to go in (the second time around)?

Is anyone else interested in commenting?

Perhaps best to ping the Clang review, as that's the one blocking this.

As long as we agree on the behaviour in Clang, this patch should be fine.

Perhaps best to ping the Clang review, as that's the one blocking this.

As long as we agree on the behaviour in Clang, this patch should be fine.

OK, so I pinged Clang and no-one is biting. :-)

So, what are the next steps for getting this into LLVM?

Hal approved the Clang side, so this one looks good, too.

Please commit this one first, then Clang's, to make sure fast bots and bisects don't break.

Thanks!

Thank you Hal and Renato.

steleman updated this revision to Diff 185576. Wed, Feb 6, 9:21 AM

This is a significant update to the original version of the SLEEF
changeset.

While testing LLVM+SLEEF on AArch64, I discovered that, for programs
such as the following simple test case:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main()
{
  double D;
  double R;

  srand48(time(0));

  D = drand48();
  R = tgamma(D);
  (void) fprintf(stderr, "tgamma: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = acos(D);
  (void) fprintf(stderr, "acos: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = acosh(D);
  (void) fprintf(stderr, "acosh: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = asin(D);
  (void) fprintf(stderr, "asin: D=%lf R=%lf\n", D, R);

  return 0;
}

Compile with:

./clang -O3 -mcpu=thunderx2t99 -fvectorize -fveclib=SLEEF -ffast-math -ffp-contract=fast -c t.c -o t.o

compilation will fail in LLVM:

fatal error: error in backend: Cannot select: intrinsic %llvm.asin
clang-8: error: clang frontend command failed with exit code 70 (use -v to see invocation)
[ ... etc etc ... ]

This failure is triggered by -ffast-math -ffp-contract=fast. Without
these flags, the fatal error does not occur.

This only happens when the newly-enabled intrinsics (tgamma, acos,
acosh, asin, etc) are called on a scalar -- as in the test program above.

When called on a vector, with or without -ffast-math -ffp-contract=fast,
these new intrinsics always succeed.

This updated version of the patch addresses this problem.
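
For context, the shape of the fix: the backend has to learn to lower the new scalar intrinsic nodes to plain libm calls when it cannot select them, the way LegalizeDAG already converts fsin into a libcall. Below is a hedged sketch modeled on that existing pattern; the ISD::FACOS node and the RTLIB::ACOS_* libcall names are stand-ins for what the patch adds:

// Sketch of a LegalizeDAG-style libcall conversion (not verbatim from the
// patch): an unselectable acos node falls back to the runtime library
// entry registered in RuntimeLibcalls.def.
case ISD::FACOS:
  Results.push_back(ExpandFPLibCall(Node, RTLIB::ACOS_F32, RTLIB::ACOS_F64,
                                    RTLIB::ACOS_F80, RTLIB::ACOS_F128,
                                    RTLIB::ACOS_PPCF128));
  break;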

I have also added a second test case for SLEEF AArch64 which tests for

%call = call fast float @acosf(float %conv)

I realize that this patch has grown in size significantly, and that
it may be a candidate for splitting it into smaller parts.

The problem is, I do not quite see how it can be split into several
smaller parts, simply because without this entire patch, the new
intrinsics will cause a fatal error in the LLVM backend, as shown above.

The patch is based on LLVM ToT from 02/06/2019.

Herald added a project: Restricted Project. Wed, Feb 6, 9:21 AM
rengolin reopened this revision. Sat, Feb 9, 5:08 AM

Right, this is a completely different patch and it seems that the original proposal hadn't been tested thoroughly.

It did feel as though the initial patch was far too simple for what it enabled, but I assumed that was due to changes in intrinsic selection that I hadn't followed for the past year or two.

By looking over this new patch, it feels as though most changes are mechanical, but I'm afraid I don't have enough familiarity with the code base any more to be sure without further testing.

Given that this review was "approved" and "closed" and then reverted, and that the implementation is now totally different from the original approved patch, I'll reopen this one and request revision again.

Also, the Clang part (D53928) seems to have changed significantly after approved, so that, too, would need revisiting. I have updated that review to look at it again.

This revision is now accepted and ready to land. Sat, Feb 9, 5:08 AM
rengolin requested changes to this revision. Sat, Feb 9, 5:08 AM
This revision now requires changes to proceed. Sat, Feb 9, 5:08 AM

Hi,

I have a few comments inline, mostly harmless, as the change is almost entirely mechanical and makes sense.

I'm happy with the change as is, no need to split (as I also can't see how), after we address all the comments.

That'd also give some time for other people to review the new changes.

cheers,
--renato

include/llvm/Analysis/TargetLibraryInfo.def
315

Probably a silly question but, why "tgamma" when the function name is "gamma"?

include/llvm/IR/RuntimeLibcalls.def
258

Looks like someone should one day look into generating this file, an opcode section, and TargetLibraryInfo from the intrinsics TableGen... Not now, though.

lib/Analysis/TargetLibraryInfo.cpp
370

This would probably fail in many non-Arm64 arches, too.

Best to restrict to what this was meant for, which I'm guessing is x86.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1108

Please clean up all "new intrinsics" comments, as they're not necessary for the patch.

I imagine they were helpful while developing... :)

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4952

I think we can have these in the same place as the others, but with an "experimental" comment before.

The main idea behind the experimental tag is so that people don't rely on it until it's complete, but once it is, they can.

So if we create them in different places, we'd have to move them once they're no longer experimental, and that would be a pain.

I'm not expecting this to be in experimental for too long and am hoping to get this official by the next release (July-ish 2019), so let's keep the experimental status in comments only and then remove the comments when everything is working as expected.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64-fast.ll
2

We don't generally recommend O3 in tests anyway, as this introduces test noise when the passes in O3 change (add/del/move around).

4

does fp-contract=fast really make a difference for these transformations?

7

is this really specific to Arm64? At least the generation could be done for any target; whether the library supports it is not relevant for this test, no?

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
4

What's the difference between these two test files?

Hi Renato,

Thank you very much for the comments.

I will create a new changeset with all your suggestions, and then re-submit.

bryanpkc added inline comments.
include/llvm/Analysis/TargetLibraryInfo.def
315

This didn't work in my test build. The function names must be in alphabetical order, so __gamma_r_finite etc. cannot come after __isoc99_scanf.

rengolin added inline comments. Thu, Feb 14, 2:51 AM
include/llvm/Analysis/TargetLibraryInfo.def
315

Better move to alphabetical order, then. Would that remove the need to call it "tgamma"?

tgamma stands for "true gamma", and it is the function name specified in the C99 standard.

tgamma stands for "true gamma", and it is the function name specified in the C99 standard.

Hm, if this is the same as tgamma, why is it __gamma_r_finite and not __tgamma_r_finite?

tgamma stands for "true gamma", and it is the function name specified in the C99 standard.

Hm, if this is the same as tgamma, why is it __gamma_r_finite and not __tgamma_r_finite?

Because __tgamma_r_finite doesn't exist, at least on Linux. :-)

See /usr/include/bits/math-finite.h.

Because __tgamma_r_finite doesn't exist, at least on Linux. :-)

See /usr/include/bits/math-finite.h.

Ah! Sounds good. :)

Thanks!

steleman marked 11 inline comments as done. Thu, Feb 14, 10:32 PM

Added inline comments/answers to Renato's questions.

lib/Analysis/TargetLibraryInfo.cpp
370

I am a bit worried about this one. The original code set exp10|exp10f|exp10l as unavailable on all arches. I originally enabled these functions for AArch64 only.

If we restrict this to x86|x86_64 only, then these functions will become available
on MIPS, PPC, PPC64, etc. I have no way of testing these ISAs to determine whether they're broken or not.

So I disabled these functions for x86|x86_64 only. Let me know if you'd like to revert this to the original, which was to enable them on AArch64 only.
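
A hedged sketch of the idiom in question, modeled on how TargetLibraryInfoImpl's initialize() switches functions off per-triple (the exact condition in the patch may differ):

// Keep exp10/exp10f/exp10l available only on AArch64 targets here.
if (T.getArch() != Triple::aarch64 && T.getArch() != Triple::aarch64_be) {
  TLI.setUnavailable(LibFunc_exp10);
  TLI.setUnavailable(LibFunc_exp10f);
  TLI.setUnavailable(LibFunc_exp10l);
}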

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4952

Renamed this function to expandNonVectorIntrinsics.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64-fast.ll
4

It does not, which is good. :-) That's the point of the test - the transformations should stay identical, and pass, with or without -ffp-contract=fast.

7

SLEEF is only currently enabled for AArch64 in TargetLibraryInfo. So, for any other ISA, all the tests will fail, because the transformations won't happen.

Plus, for any ISA other than AArch64, the SLEEF function name mangling will be different than on AArch64. Which means this particular test - as written for AArch64 - can't be used on any other ISA; all the CHECKs will fail.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
4

sleef-calls-aarch64-fast.ll calls

%call = call fast <type> <function-name>(<argument>)

sleef-calls-aarch64.ll calls

%call = tail call <type> <function-name>(<argument>)
steleman updated this revision to Diff 186962. Thu, Feb 14, 10:34 PM
steleman marked 5 inline comments as done.

Updated changeset to reflect comments/questions from Renato.

This new changeset is based on LLVM ToT from 02/14/2019.

Thanks for your answers, they make sense. Other than the mistake I made on disabling exp calls (you were right), looks fine.

lib/Analysis/TargetLibraryInfo.cpp
370

My bad, I read it the other way round. Your original patch was correct.

steleman updated this revision to Diff 186993. Fri, Feb 15, 4:17 AM

Reverted change in TargetLibraryInfo.cpp re: exp10|exp10f|exp10l on Linux with GLIBC.

These functions are now allowed on AArch64 or AArch64_be only.

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

include/llvm/Analysis/TargetLibraryInfo.def
315

@bryanpkc What's your environment?

@steleman Any idea why you don't see that on your builds?

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

rengolin accepted this revision. Fri, Feb 15, 5:51 AM

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

It seems like a build issue (or a different library issue), that we can fix after commit.

LGTM, thanks!

This revision is now accepted and ready to land. Fri, Feb 15, 5:51 AM

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

It seems like a build issue (or a different library issue), that we can fix after commit.

LGTM, thanks!

Thank you Sir! :-)

bryanpkc added a comment. Edited Fri, Feb 15, 3:35 PM

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

The build was fine, but when running the compiler, this assertion in lib/Analysis/TargetLibraryInfo.cpp will fail:

static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
                       ArrayRef<StringRef> StandardNames) {
  // Verify that the StandardNames array is in alphabetical order.
  assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
                        [](StringRef LHS, StringRef RHS) {
                          return LHS < RHS;
                        }) &&
         "TargetLibraryInfoImpl function names must be sorted");

You probably don't see it because you are only building in Release mode.

P.S. The examples you listed are sorted (_ is less than a).

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

The build was fine, but when running the compiler, this assertion in lib/Analysis/TargetLibraryInfo.cpp will fail:

static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
                       ArrayRef<StringRef> StandardNames) {
  // Verify that the StandardNames array is in alphabetical order.
  assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
                        [](StringRef LHS, StringRef RHS) {
                          return LHS < RHS;
                        }) &&
         "TargetLibraryInfoImpl function names must be sorted");

You probably don't see it because you are only building in Release mode.

P.S. The examples you listed are sorted (_ is less than a).

I think the correct fix here is to use an associative container, not an array, and not rely on writing a text file containing sorted character strings.

std::set comes to mind.
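
A minimal sketch of that idea (hypothetical helper; the existing lookup relies on the array generated from TargetLibraryInfo.def being sorted):

#include <set>
#include "llvm/ADT/StringRef.h"

// Names would be populated once from the .def file; the container keeps
// them ordered internally, so source order stops mattering.
static bool isStandardName(const std::set<llvm::StringRef> &Names,
                           llvm::StringRef F) {
  return Names.count(F) != 0;
}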

@bryanpkc debug build makes sense, thanks! @steleman, I'd rather just order now and try to refactor later. This patch has changed enough times already. :-)

steleman updated this revision to Diff 187131. Fri, Feb 15, 10:41 PM

Corrected manual sorting in TargetLibraryInfo.def as per comments.

@bryanpkc debug build makes sense, thanks! @steleman, I'd rather just order now and try to refactor later. This patch has changed enough times already. :-)

OK, changeset updated. Tested with Debug+Assertions+ExpensiveChecks.

Perfect, thanks!

steleman updated this revision to Diff 187507. Tue, Feb 19, 7:53 PM

One minor update:

${top_srcdir}/unittests/Analysis/TargetLibraryInfoTest.cpp

needs an update to account for lgamma, tgamma, lgamma_r_finite and gamma_r_finite.