This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Enable libm vectorized functions via SLEEF
AcceptedPublic

Authored by steleman on Oct 31 2018, 4:00 AM.

Details

Summary

This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/.

  • A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
  • A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
  • A comprehensive test case is included in this changeset.
  • In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Diff Detail

Event Timeline

steleman created this revision.Oct 31 2018, 4:00 AM

The corresponding Clang changeset is: https://reviews.llvm.org/D53928.

Hi @steleman , thank you for working on this!

Before doing a detailed review, I have a couple of comments and requests.

  1. Please add full context to the patch (git show -U999999).
  2. You are using Vector Function ABI mangled names, which means you ware assuming to link agains libsleefgnuabi.so instead of libsleef.so. For this reason, I think it is worth to replace the -fveclib=SLEEF option with -fveclib=SLEEFGNUABI. This request applies to all places where you have used SLEEF.
  3. Given that SLEEF has some degree of target independence, I would recommend using the TargetTriple if that selects the list of functions into a switch statement that sets the architectural extension token in the mangled names. On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets. Notice that I am not saying that you should add all targets, I am just saying that you should prepare the ground for extending this, as I think that -fveclib=SLEEFGNUABI should work the same for all the extensions supported by SLEEF.
  4. On the longer term, I want to make you aware of the fact that there is plan to replace the TLI implementation of -fveclib with an OpenMP based one (to decouple backend and frontend). We discussed this at the last LLVM dev meeting, I will circulate an RFC soon. See https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7.
  5. Please change your commit messages header, the function you are adding are not just trigonometric functions. May I suggest "[AArch64] Enable libm vectorized functions via SLEEFGNUABI"?

Kind regards,

Francesco

Hi @steleman , thank you for working on this!

Before doing a detailed review, I have a couple of comments and requests.

  1. You are using Vector Function ABI mangled names, which means you ware assuming to link agains libsleefgnuabi.so instead of libsleef.so. For this reason, I think it is worth to replace the -fveclib=SLEEF option with -fveclib=SLEEFGNUABI. This request applies to all places where you have used SLEEF.

Sorry but I wholeheartedly disagree with this idea.

For starters, saying -fveclib=SLEEFGNUABI is just ugly IMO.

Furthermore, I do not understand why this change is necessary. It provides nothing that the current naming convention doesn't already provide. So, really, it's a matter of personal preference.

Also: Intel did not name their veclib argument "SVMLINTELABI". They named it "SVML".

  1. Given that SLEEF has some degree of target independence, I would recommend using the TargetTriple if that selects the list of functions into a switch statement that sets the architectural extension token in the mangled names. On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets. Notice that I am not saying that you should add all targets, I am just saying that you should prepare the ground for extending this, as I think that -fveclib=SLEEFGNUABI should work the same for all the extensions supported by SLEEF.

I don't understand any of this. It seems to me that you are describing what I've already done.

The TargetTriple wasn't present in the TargetLibraryInfoImpl class. I had to add it, specifically because it is needed by the SLEEF switch statement case for AArch64.

On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets.

I don't understand what this means.

Are you suggesting that the VecLib array of function name mappings should be generated by tblgen? Yes, I agree. It should be. But it currently isn't. And I don't know when tblgen will be extended to do this job. Or if it ever will be extended to do this job.

Also, we (Cavium) will not be extending the SLEEF mappings to other ISA's. Simply because we do not have the capability of testing PPC64, or Windows.

So, really, providing additional SLEEF mappings for other ISA's or platforms is best suited for those who are best versed in the subtle details of those other ISA's or platforms.

The SLEEFGNUABI ABI has very little CPU independence when compiling for for X86_64. It is, in fact, very CPU-dependent.

In fact, in the current implementation of TargetLibraryInfo, it is impossible to provide X86_64 mappings for SLEEF using the GNUABI. That is because the X86_64 GNUABI mappings require knowledge of the capabilities of the specific CPU that the object is being compiled for. These CPU capabilities are hardcoded by SLEEF, in the SLEEF GNU ABI mangled name:

'b' -> SSE
'c' -> AVX
'd' -> AVX2
'e' -> AVX512

So, at a minimum, on X86_64, you would need to know the values passed to -march= and -mtune=. But these CPU capabilities are currently inaccessible from TargetLibraryInfo.

TargetLibraryInfo could be extended to acquire the CPU capability information, but that's for another changeset. Maybe after TargetLibraryInfo is extended to support this capability we can revisit the construction of the GNU ABI mangled name.

  1. On the longer term, I want to make you aware of the fact that there is plan to replace the TLI implementation of -fveclib with an OpenMP based one (to decouple backend and frontend). We discussed this at the last LLVM dev meeting, I will circulate an RFC soon. See https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7.

That's an interesting plan.

I'd like to make you aware that brute-forcing OpenMP for vectorization will probably run afoul of SPEC rules.

SPEC allows OpenMP in SPECspeed, but not in SPECrate, and that is clearly stated at the SPEC web site. Not to mention that the SPEC benchmarks code cannot be modified to add #pragma omp simd. And having clang silently and secretly modify emitted code to insert OpenMP simd will likely not fly at SPEC.

I am also not a fan of OpenMP being used as a giant hammer that can hit any nail. You would be introducing a dependency on OpenMP for programs that do not otherwise use, or need, OpenMP. So, in reality, there is no "decoupling" here. There is only a dependency substitution: the existing dependency on SLEEF is replaced by the future dependency on OpenMP.

  1. Please change your commit messages header, the function you are adding are not just trigonometric functions. May I suggest "[AArch64] Enable libm vectorized functions via SLEEFGNUABI"?

I'll settle for "[AArch64] Enable libm vectorized functions via SLEEF".

steleman retitled this revision from [AArch64] Enable trigonometry libm vectorized functions via SLEEF to [AArch64] Enable libm vectorized functions via SLEEF.Oct 31 2018, 10:42 AM
  1. You are using Vector Function ABI mangled names, which means you ware assuming to link agains libsleefgnuabi.so instead of libsleef.so. For this reason, I think it is worth to replace the -fveclib=SLEEF option with -fveclib=SLEEFGNUABI. This request applies to all places where you have used SLEEF.

Sorry but I wholeheartedly disagree with this idea.

For starters, saying -fveclib=SLEEFGNUABI is just ugly IMO.

Furthermore, I do not understand why this change is necessary. It provides nothing that the current naming convention doesn't already provide. So, really, it's a matter of personal preference.

Fair enough. :)
I just think SLEEFGNUABI is more precise, and avoids confusion given that SLEEF is packaged in two version, with incompatible naming conventions and signatures. You are clearly using the conventions in libsleefgnuabi.so, I think I just think it is better to use -fveclib=SLEEFGNUABI.
Moreover, someone might want to add -fveclib for the non GNUABI version of the library - for example, they might prefer the sincos signatures of libsleef over libsleefgnuabi. I agree that -ffveclib=SLEEFGNUABI might be ugly to someone, but not more than -fveclib=SLEEFNONGNUABi.

I will not enforce this though, it is just a name.

Also: Intel did not name their veclib argument "SVMLINTELABI". They named it "SVML”.

I think this could have happened whether they would have come up with a version of the library that had new signatures and name mangling scheme.

  1. Given that SLEEF has some degree of target independence, I would recommend using the TargetTriple if that selects the list of functions into a switch statement that sets the architectural extension token in the mangled names. On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets. Notice that I am not saying that you should add all targets, I am just saying that you should prepare the ground for extending this, as I think that -fveclib=SLEEFGNUABI should work the same for all the extensions supported by SLEEF.

I don't understand any of this. It seems to me that you are describing what I've already done.

The TargetTriple wasn't present in the TargetLibraryInfoImpl class. I had to add it, specifically because it is needed by the SLEEF switch statement case for AArch64.

On the long term, the list of available vector functions should be generated by setting such token instead of hard coding all names for all targets.

I don't understand what this means.

Are you suggesting that the VecLib array of function name mappings should be generated by tblgen? Yes, I agree. It should be. But it currently isn't. And I don't know when tblgen will be extended to do this job. Or if it ever will be extended to do this job.

Also, we (Cavium) will not be extending the SLEEF mappings to other ISA's. Simply because we do not have the capability of testing PPC64, or Windows.

So, really, providing additional SLEEF mappings for other ISA's or platforms is best suited for those who are best versed in the subtle details of those other ISA's or platforms.

The SLEEFGNUABI ABI has very little CPU independence when compiling for for X86_64. It is, in fact, very CPU-dependent.

In fact, in the current implementation of TargetLibraryInfo, it is impossible to provide X86_64 mappings for SLEEF using the GNUABI. That is because the X86_64 GNUABI mappings require knowledge of the capabilities of the specific CPU that the object is being compiled for. These CPU capabilities are hardcoded by SLEEF, in the SLEEF GNU ABI mangled name:

'b' -> SSE
'c' -> AVX
'd' -> AVX2
'e' -> AVX512

So, at a minimum, on X86_64, you would need to know the values passed to -march= and -mtune=. But these CPU capabilities are currently inaccessible from TargetLibraryInfo.

TargetLibraryInfo could be extended to acquire the CPU capability information, but that's for another changeset. Maybe after TargetLibraryInfo is extended to support this capability we can revisit the construction of the GNU ABI mangled name.

I think there is no need to use tablegen for this. For the rest, I agree with all you said. Let me explain what I exactly mean by “prepare the ground to extend this”.

You have encoded the list of function for AArch64 as follows (allow me to use pseudocode).

If (AArch64) {
  List ={ {“sin”, “_ZGVn2v_sin”, 2},
             {“cos”, “_ZGVn2v_cos”, 2},
             ….
};
}

I think you should rewrite your code as

  List ={ {“sin”, “_ZGV<simdext>2v_sin”, 2},
             {“cos”, “_ZGV<simdext>2v_cos”, 2},
             ….
};
If (aarch64)
   ext_token = “n”.

Loop through (list) and replace “<simdext>” with ext_token.

With this change, if anyone decide to add any of the other architectures (of course, adding was is missing in target triple to discern the right extension token), all they have to do is to add new logic to set the ext_token variable correctly.

Let me stress again the fact that I am not asking you do do the non aarch64 work. I just think it make sense to make sure other people can extend it easily.

  1. On the longer term, I want to make you aware of the fact that there is plan to replace the TLI implementation of -fveclib with an OpenMP based one (to decouple backend and frontend). We discussed this at the last LLVM dev meeting, I will circulate an RFC soon. See https://llvm.org/devmtg/2018-10/talk-abstracts.html#bof7.

That's an interesting plan.

I'd like to make you aware that brute-forcing OpenMP for vectorization will probably run afoul of SPEC rules.

SPEC allows OpenMP in SPECspeed, but not in SPECrate, and that is clearly stated at the SPEC web site. Not to mention that the SPEC benchmarks code cannot be modified to add #pragma omp simd. And having clang silently and secretly modify emitted code to insert OpenMP simd will likely not fly at SPEC.

I am also not a fan of OpenMP being used as a giant hammer that can hit any nail. You would be introducing a dependency on OpenMP for programs that do not otherwise use, or need, OpenMP. So, in reality, there is no "decoupling" here. There is only a dependency substitution: the existing dependency on SLEEF is replaced by the future dependency on OpenMP.

The OpenMP based implementation will pull in only the simd capabilities of openmp (-fopenmp-simd), not the whole OpenMP specs. I will be more detailed on this in the RFC, and add you in the discussion to make sure your concerns are addressed.

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

Apologies for the confidentiality notice at the end my last message. It wasn't supposed to be there, please ignore it. Nothing of what I have written is confidential.

The SLEEFGNUABI ABI has very little CPU independence when compiling for for X86_64. It is, in fact, very CPU-dependent.

In fact, in the current implementation of TargetLibraryInfo, it is impossible to provide X86_64 mappings for SLEEF using the GNUABI. That is because the X86_64 GNUABI mappings require knowledge of the capabilities of the specific CPU that the object is being compiled for. These CPU capabilities are hardcoded by SLEEF, in the SLEEF GNU ABI mangled name:

'b' -> SSE
'c' -> AVX
'd' -> AVX2
'e' -> AVX512

So, at a minimum, on X86_64, you would need to know the values passed to -march= and -mtune=. But these CPU capabilities are currently inaccessible from TargetLibraryInfo.

TargetLibraryInfo could be extended to acquire the CPU capability information, but that's for another changeset. Maybe after TargetLibraryInfo is extended to support this capability we can revisit the construction of the GNU ABI mangled name.

I think there is no need to use tablegen for this. For the rest, I agree with all you said. Let me explain what I exactly mean by “prepare the ground to extend this”.

You have encoded the list of function for AArch64 as follows (allow me to use pseudocode).

If (AArch64) {
  List ={ {“sin”, “_ZGVn2v_sin”, 2},
             {“cos”, “_ZGVn2v_cos”, 2},
             ….
};
}

I think you should rewrite your code as

  List ={ {“sin”, “_ZGV<simdext>2v_sin”, 2},
             {“cos”, “_ZGV<simdext>2v_cos”, 2},
             ….
};
If (aarch64)
   ext_token = “n”.

Loop through (list) and replace “<simdext>” with ext_token.

With this change, if anyone decide to add any of the other architectures (of course, adding was is missing in target triple to discern the right extension token), all they have to do is to add new logic to set the ext_token variable correctly.

OK, I like the idea of having a generalized way of expressing the libm <-> SLEEF bindings.

But, there's always a 'but':

  1. Looping through this static array of struct bindings, and doing the replacement of the ${ext_token} with the appropriate mangling symbol requires an extra function call. Do we want that extra function call?
  1. This gets even more complicated. Here's why:
  • Here are the available SLEEF functions for atan on AArch64. We get them with

readelf -sW libsleefgnuabi.so.3.3 | awk '{ printf "%s\t%s\n", $5, $8 }' | egrep atan | egrep -v 'atan2|atanh

GLOBAL _ZGVnN2v_atan
GLOBAL _ZGVnN4v_atanf_u35
GLOBAL _ZGVnN4v_atanf
GLOBAL _ZGVnN2v_atan_u35
GLOBAL _ZGVnN2v_atan_u35
GLOBAL _ZGVnN2v_atan
GLOBAL _ZGVnN4v_atanf
GLOBAL _ZGVnN4v_atanf_u35

  • And here are the SLEEF functions for atan for X86_64, which was compiled with -march=core-avx2:

WEAK _ZGVeM8v___atan_finite
GLOBAL _ZGVeM8v_atan
GLOBAL _ZGVeM8v_atan_u35
GLOBAL _ZGVbN4v_atanf
GLOBAL _ZGVbN2v_atan_u35
GLOBAL _ZGVcN4v_atan
GLOBAL _ZGVeN8v_atan_u35
GLOBAL _ZGVdN4v_atan_u35
GLOBAL _ZGVbN2v_atan
GLOBAL _ZGVbN4v_atanf_u35
WEAK _ZGVeM16v___atanf_finite_u35
WEAK _ZGVeM16v___atanf_finite
GLOBAL _ZGVdN8v_atanf_u35
WEAK _ZGVeM8v___atan_finite_u35
GLOBAL _ZGVcN4v_atan_u35
GLOBAL _ZGVdN8v_atanf
GLOBAL _ZGVcN8v_atanf
GLOBAL _ZGVcN8v_atanf_u35
GLOBAL _ZGVdN4v_atan
GLOBAL _ZGVeN16v_atanf_u35
GLOBAL _ZGVeN16v_atanf
GLOBAL _ZGVeM16v_atanf
GLOBAL _ZGVeM16v_atanf_u35
GLOBAL _ZGVeN8v_atan
GLOBAL _ZGVeN16v_atanf
WEAK _ZGVeM8v___atan_finite
WEAK _ZGVeM16v___atanf_finite_u35
GLOBAL _ZGVeM8v_atan_u35
GLOBAL _ZGVdN4v_atan_u35
GLOBAL _ZGVbN4v_atanf
WEAK _ZGVeM16v___atanf_finite
WEAK _ZGVeM8v___atan_finite_u35
GLOBAL _ZGVeM16v_atanf
GLOBAL _ZGVeM8v_atan
GLOBAL _ZGVeN16v_atanf_u35
GLOBAL _ZGVcN4v_atan_u35
GLOBAL _ZGVcN4v_atan
GLOBAL _ZGVeN8v_atan_u35
GLOBAL _ZGVbN4v_atanf_u35
GLOBAL _ZGVdN4v_atan
GLOBAL _ZGVbN2v_atan_u35
GLOBAL _ZGVdN8v_atanf
GLOBAL _ZGVbN2v_atan
GLOBAL _ZGVcN8v_atanf
GLOBAL _ZGVcN8v_atanf_u35
GLOBAL _ZGVdN8v_atanf_u35
GLOBAL _ZGVeN8v_atan
GLOBAL _ZGVeM16v_atanf_u35

As you can see, for X86_64, the bindings are not as straightforward (read: 1-1) as they currently are on AArch64. On X86_64, there's several variants for atan, and for the same vectorization factor. The discriminating factor between the several variants being the CPU capability.

This will happen on AArch64 too when SVE is introduced. We'll end up having several different versions of atan: one for SVE, the other for non-SVE.

I have a strong suspicion - that I have not verified in practice - that PPC/PPC64 is exactly the same. There will be several different versions for every function, depending on which vectorization model / CPU capability model was chosen at compile-time (-march= | -mtune=).

Right now, it is not possible to make the distinction between different CPU capabilities inside TargetLibraryInfo. That's because the information from -march= and -mtune= isn't captured in TargetLibraryInfo. The only thing we have - right now - is the TargetTriple. And the TargetTriple tells us nothing about the -march= and/or -mcpu= micro-arch tuning.

So, we need to add some new information to TargetLibraryInfo: namely the -march= and -mtune= information that was passed to clang. I think implementing this bit is part of a larger - and different - discussion. And I am very reluctant to mix the TargetLibraryInfo enhancement with SLEEF support in AArch64, in the same changeset.

So, what I would suggest for now, is that we postpone this generalization of the libm <-> SLEEF bindings until we have a complete picture of what are the various - and subtle - differences between the various CPU capabilities mangling. We don't yet have a complete picture - I can't build SLEEF on PPC/PPC64 simply because I don't have access to a PPC/PPC64 box.

What do you think?

OK, I like the idea of having a generalized way of expressing the libm <-> SLEEF bindings.

Great! But… (yes, I also have a but now!)

But, there's always a 'but':

  1. Looping through this static array of struct bindings, and doing the replacement of the ${ext_token} with the appropriate mangling symbol requires an extra function call. Do we want that extra function call?
  1. This gets even more complicated. Here's why:
  • Here are the available SLEEF functions for atan on AArch64. We get them with

readelf -sW libsleefgnuabi.so.3.3 | awk '{ printf "%s\t%s\n", $5, $8 }' | egrep atan | egrep -v 'atan2|atanh

GLOBAL _ZGVnN2v_atan
GLOBAL _ZGVnN4v_atanf_u35
GLOBAL _ZGVnN4v_atanf
GLOBAL _ZGVnN2v_atan_u35
GLOBAL _ZGVnN2v_atan_u35
GLOBAL _ZGVnN2v_atan
GLOBAL _ZGVnN4v_atanf
GLOBAL _ZGVnN4v_atanf_u35

  • And here are the SLEEF functions for atan for X86_64, which was compiled with -march=core-avx2:

WEAK _ZGVeM8v___atan_finite
GLOBAL _ZGVeM8v_atan
GLOBAL _ZGVeM8v_atan_u35
GLOBAL _ZGVbN4v_atanf
GLOBAL _ZGVbN2v_atan_u35
GLOBAL _ZGVcN4v_atan
GLOBAL _ZGVeN8v_atan_u35
GLOBAL _ZGVdN4v_atan_u35
GLOBAL _ZGVbN2v_atan
GLOBAL _ZGVbN4v_atanf_u35
WEAK _ZGVeM16v___atanf_finite_u35
WEAK _ZGVeM16v___atanf_finite
GLOBAL _ZGVdN8v_atanf_u35
WEAK _ZGVeM8v___atan_finite_u35
GLOBAL _ZGVcN4v_atan_u35
GLOBAL _ZGVdN8v_atanf
GLOBAL _ZGVcN8v_atanf
GLOBAL _ZGVcN8v_atanf_u35
GLOBAL _ZGVdN4v_atan
GLOBAL _ZGVeN16v_atanf_u35
GLOBAL _ZGVeN16v_atanf
GLOBAL _ZGVeM16v_atanf
GLOBAL _ZGVeM16v_atanf_u35
GLOBAL _ZGVeN8v_atan
GLOBAL _ZGVeN16v_atanf
WEAK _ZGVeM8v___atan_finite
WEAK _ZGVeM16v___atanf_finite_u35
GLOBAL _ZGVeM8v_atan_u35
GLOBAL _ZGVdN4v_atan_u35
GLOBAL _ZGVbN4v_atanf
WEAK _ZGVeM16v___atanf_finite
WEAK _ZGVeM8v___atan_finite_u35
GLOBAL _ZGVeM16v_atanf
GLOBAL _ZGVeM8v_atan
GLOBAL _ZGVeN16v_atanf_u35
GLOBAL _ZGVcN4v_atan_u35
GLOBAL _ZGVcN4v_atan
GLOBAL _ZGVeN8v_atan_u35
GLOBAL _ZGVbN4v_atanf_u35
GLOBAL _ZGVdN4v_atan
GLOBAL _ZGVbN2v_atan_u35
GLOBAL _ZGVdN8v_atanf
GLOBAL _ZGVbN2v_atan
GLOBAL _ZGVcN8v_atanf
GLOBAL _ZGVcN8v_atanf_u35
GLOBAL _ZGVdN8v_atanf_u35
GLOBAL _ZGVeN8v_atan
GLOBAL _ZGVeM16v_atanf_u35

As you can see, for X86_64, the bindings are not as straightforward (read: 1-1) as they currently are on AArch64. On X86_64, there's several variants for atan, and for the same vectorization factor. The discriminating factor between the several variants being the CPU capability.

… but you made a good point here in listing the variants with different lanes on x86. I missed this in my reasoning yesterday. Even if we find a way of generalizing the list of function, we still have to make sure the NumberOfLanes integral field of the mapping is appropriate for the function being used, which means the list of AArch64 functions cannot be shared with the x86 one. The hypothetical function call you mentioned before for setting the name would have to set the number of lanes, and actually add new records to the list… too convoluted when compared to a clear static list.

This will happen on AArch64 too when SVE is introduced. We'll end up having several different versions of atan: one for SVE, the other for non-SVE.

I have a strong suspicion - that I have not verified in practice - that PPC/PPC64 is exactly the same. There will be several different versions for every function, depending on which vectorization model / CPU capability model was chosen at compile-time (-march= | -mtune=).

Right now, it is not possible to make the distinction between different CPU capabilities inside TargetLibraryInfo. That's because the information from -march= and -mtune= isn't captured in TargetLibraryInfo. The only thing we have - right now - is the TargetTriple. And the TargetTriple tells us nothing about the -march= and/or -mcpu= micro-arch tuning.

So, we need to add some new information to TargetLibraryInfo: namely the -march= and -mtune= information that was passed to clang. I think implementing this bit is part of a larger - and different - discussion. And I am very reluctant to mix the TargetLibraryInfo enhancement with SLEEF support in AArch64, in the same changeset.

Yes, and the way we will go will probably be to skip the TLI and give this responsibility to the LoopVectorizer. But as you said, this is a discussion that needs to be taken at a different level, I will talk about this in the openMP sims-only based proposal.

So, what I would suggest for now, is that we postpone this generalization of the libm <-> SLEEF bindings until we have a complete picture of what are the various - and subtle - differences between the various CPU capabilities mangling. We don't yet have a complete picture - I can't build SLEEF on PPC/PPC64 simply because I don't have access to a PPC/PPC64 box.

What do you think?

I agree with you. The design of the patch is OK as it is. Thank you for your patience. I will go through it carefully.

But, if I may, I would like to stress the fact that you should use -fveclib=SLEEFGNUABI. I think we can agree that, although SLEEFGNUABI might be ugly, it is the right name to use for uniformity with the linker flags it implies (-fsleefgnuabi).

Kind regards,

Francesco

Is it easy to add pow and atan2?
Since these functions are relatively frequently used, I think people would appreciate if you could also add these two functions.

Is it easy to add pow and atan2?
Since these functions are relatively frequently used, I think people would appreciate if you could also add these two functions.

I will add them. They will be in the next revision of the changeset.

GLOBAL _ZGVnN2v_atan
GLOBAL _ZGVnN4v_atanf_u35
GLOBAL _ZGVnN4v_atanf
GLOBAL _ZGVnN2v_atan_u35
GLOBAL _ZGVnN2v_atan_u35
GLOBAL _ZGVnN2v_atan
GLOBAL _ZGVnN4v_atanf
GLOBAL _ZGVnN4v_atanf_u35

@steleman , one more thing. You might want to map the functions to the _u35 version when it is available, because the one with no _u35 postfix are using the 1.0ULP implementation which is slower than the 3.5ULP one. Of course, assuming that 3.5 ULP is good enough for you.

Francesco

steleman updated this revision to Diff 172637.Nov 5 2018, 12:46 PM
  • changed the -fveclib=<X> argument value to 'sleefgnuabi'.
  • added atan2 and pow
  • spreadsheet with comparison between libm and sleef is here:

https://docs.google.com/spreadsheets/d/1lcpESCnuzEoTl_XHBqE9FLL0tXJB_tZGR8yciCx1yjg/edit?usp=sharing

  • comprehensive test case in C with timings is here:

https://drive.google.com/open?id=1PGKRUdL29_ANoYebOo3Q59syhKp_mNSj

I am sorry for not reading the comments carefully, but I think 3.5-ULP functions should be used only when a fast-math option is specified.

I am sorry for not reading the comments carefully, but I think 3.5-ULP functions should be used only when a fast-math option is specified.

Which means we have to use the non 3.5-ULP functions because we have no way of knowing from inside TargetLibraryInfo if -ffast-math was specified on compile-line or not.

I will revert it back to the non 3.5-ULP functions tomorrow.

steleman updated this revision to Diff 172754.Nov 6 2018, 6:21 AM
  • Reverted to using the non _u35 SLEEF function names as per comment from @shibatch.

Hi @steleman,

sorry for the delay in getting back to you. I have a couple of observations, for you and @shibatch.

  1. @steleman I don't understand some of the values in your benchmarks. In particular, sin and cos should have similar timings, not differ so much as in your report. I wonder whether the choice of the CLOCK_PROCESS_CPUTIME_ID might have caused this. I think that CLOCK_PROCESS_CPUTIME_ID might translate in a syscall, and therefore cause much overhead in the measurement. I'd rather use CLOCK_MONOTONIC. Also, to make sure you are just measuring the function latency, I think you should invoke the benchmark on array of smaller size, and invoke the call a couple of times before actually starting the time measurement, to reduce the amount of noise causes by warm up effects.
  1. @shibatch, do you know for sure that we shouldn't use the 3.5 ULP version of the function? I have just run the following example with both -O3 and -Ofast, they both produce the same SVML calls. As far as I know SVML guarantee a 4ULP precision, so I think we are good in doing SLEEF. @shibatch, so you have any advice here?
// compile with "clang -fveclib=SVML ~/test.c -o - -c -S -emit-llvm [-Ofast|-O3]"
#include <math.h>

void f(double * restrict x, double * restrict y, unsigned n) {
  int i;
  for (i = 0; i < n; ++i) {
    x[i]=sin(y[i]);
  }
}

On a final note, you'll have to excuse again delays in my replies in the coming week, because of SC18. If you will be there too, it would be great to meet.

Kind regards,

Francesco

@shibatch, do you know for sure that we shouldn't use the 3.5 ULP version of the function?

For example, wikepedia says "Reputable numeric libraries compute the basic transcendental functions to between 0.5 and about 1 ULP".

https://en.wikipedia.org/wiki/Unit_in_the_last_place

As far as I know SVML guarantee a 4ULP precision, so I think we are good in doing SLEEF.

I think there is a room for discussion here. Since your OpemMP patch will have the same problem, it is better to post an RFC for this.

  1. @steleman I don't understand some of the values in your benchmarks. In particular, sin and cos should have similar timings, not differ so much as in your report. I wonder whether the choice of the CLOCK_PROCESS_CPUTIME_ID might have caused this. I think that CLOCK_PROCESS_CPUTIME_ID might translate in a syscall, and therefore cause much overhead in the measurement. I'd rather use CLOCK_MONOTONIC. Also, to make sure you are just measuring the function latency, I think you should invoke the benchmark on array of smaller size, and invoke the call a couple of times before actually starting the time measurement, to reduce the amount of noise causes by warm up effects.

I don't understand this. Are you suggesting that decreasing the size of the statistical sample increases the accuracy?

At any rate, the test benchmark and the results spreadsheet aren't part of this changeset.

So I'm confused as to what's being asked here.

If there are no more comments directly related to this changeset, can we move this along?

Hi @steleman , thank you for your patience.

I don't understand this. Are you suggesting that decreasing the size of the statistical sample increases the accuracy?

yes, reducing the size of the array increases the accuracy of the measurement, if what you are measuring is latency of the function. You should also run each measurement multiple times and get the best.

At any rate, the test benchmark and the results spreadsheet aren't part of this changeset.
So I'm confused as to what's being asked here.

Yes, the benchmarks and the results are not part of the submission. I think your results are quite unstable, I was suggesting changes to get them more stable (sin shouldn't differ so much from cos, because their algorithm is very similar). If I may, I would like to point you at the open source benchmarks that our math library (libm) team uses: https://github.com/ARM-software/optimized-routines/blob/master/test/mathbench.c#L253-L274

Having said that, the implementation of the functionality looks good to me. My only concerns are about the actual performance, that is why I was suggesting to use the 3.5 ULP version of the functions. But I'll let you sort out which version is best for you.

As per submitting this code, I have to redirect you to the maintainer of this part of the back-end for submitting your changes.

Kind regards,

Francesco

Hi @steleman , thank you for your patience.

As per submitting this code, I have to redirect you to the maintainer of this part of the back-end for submitting your changes.

It would have been much more appropriate to make this re-direction a week ago, and not let this changeset rot unnecessarily for a week.

@fpetrogalli Who do you recommend as the maintainer? Looking at the history for TargetLibraryInfo.{cpp,h} doesn't indicate an obvious candidate.

@joel_k_jones , @steleman

I am sorry for having kept you waiting for the patch. I was not fully aware of the approval process upstream (didn’t do much contributions upstream myself), and before being able to approve I had to understand the process.

I will approve the patch and you will be able to submit it, but before doing that I would like to ask another tiny change.

I would like to change the value of the option from -fveclib=SLEEFGNUABI to -fveclib=SLEEFGNUABI10. The rest of the patch can stay the same.

By specifying “10” at the end of the option value, it will be clear to the user that they are using high precision functions. In the future, we might add the 3.5 ULP ones with the option SLEEFGNUABI35.

Thank you for promoting SLEEF in clang!

I hope this makes sense.

Francesco

I will approve the patch and you will be able to submit it, but before doing that I would like to ask another tiny change.

I would like to change the value of the option from -fveclib=SLEEFGNUABI to -fveclib=SLEEFGNUABI10. The rest of the patch can stay the same.

By specifying “10” at the end of the option value, it will be clear to the user that they are using high precision functions. In the future, we might add the 3.5 ULP ones with the option SLEEFGNUABI35.

This is completely unnecessary.

Whether or not SLEEF uses ULP 1.0 or ULP 3.5 will be determined by -ffast-math passed on compile-line. There is no need to invent two or more different argument values for SLEEF.

For now, SLEEF will use ULP 1.0 exclusively.

When TargetLibraryInfo is expanded to capture the information about -mcpu=, -march=, -mtune= and -ffast-math, the choice between ULP 1.0 and ULP 3.5 will happen silently at compile-time.

There is no need for having two or more different argument values for -fveclib=sleefgnuabi.

This opens the door for having potentially N number of possible values for -fveclib=sleefgnuabi. That's confusing to the end-user and it doesn't follow the existing model already implemented by Intel with SVML.

fpetrogalli accepted this revision.Nov 20 2018, 10:05 AM

For now, SLEEF will use ULP 1.0 exclusively.

When TargetLibraryInfo is expanded to capture the information about -mcpu=, -march=, -mtune= and -ffast-math, the choice between ULP 1.0 and ULP 3.5 will happen silently at compile-time.

Ah, this sounds reasonable to me, thank you for pointing it out, apologies in advance in case you already mentioned it and I missed it.

Kind regards,

Francesco

This revision is now accepted and ready to land.Nov 20 2018, 10:05 AM
This revision was automatically updated to reflect the committed changes.

@joel_k_jones , @steleman

I am sorry for having kept you waiting for the patch. I was not fully aware of the approval process upstream (didn’t do much contributions upstream myself), and before being able to approve I had to understand the process.

I will approve the patch and you will be able to submit it, but before doing that I would like to ask another tiny change.

FWIW, I'm sorry to push on this, but I think it would be better to have review from folks more familiar w/ generic LLVM optimizations and with SLEEF to review this. The obvious candidate, IMO, would be Hal. If Hal is unavailable, we could have a number of other people that would be very effective at reviewing (Tim, others...) with deep and londstanding experience in this place.

Given that this is a pretty significant change for AArch64 at least, and potentially for LLVM as a whole, so I think it is reasonable to expect a bit more wide review before it gets landed. Essentially, I'd suggest being a bit more reluctant to approve this kind of patch. =D

That said, I also want to say a huge thank you for actually reviewing (and doing a really detailed review!) of patches like this. It's really helpful, and please don't stop doing taht. It's simply amazing and definitely helps our process scale.

FWIW, I'm sorry to push on this, but I think it would be better to have review from folks more familiar w/ generic LLVM optimizations and with SLEEF to review this. The obvious candidate, IMO, would be Hal. If Hal is unavailable, we could have a number of other people that would be very effective at reviewing (Tim, others...) with deep and londstanding experience in this place.

Given that this is a pretty significant change for AArch64 at least, and potentially for LLVM as a whole, so I think it is reasonable to expect a bit more wide review before it gets landed.

To make things perfectly clear:

  • This changeset doesn't do anything that wasn't already in LLVM. -fveclib=SVML support was added by Intel prior to this changeset.
  • There is nothing here that qualifies as this is a pretty significant change for AArch64 at least, and potentially for LLVM as a whole, as -fveclib=<X> is not the default in clang.
  • The only difference is in the veclib library: SLEEF instead of SVML. And in the vector function names, obviously.
  • @shibatch - a.k.a. the author of SLEEF - is subscribed to this changeset, and has commented on it already.

Essentially, I'd suggest being a bit more reluctant to approve this kind of patch.

I'd like a better understanding of what this really means.

Adding Hal Finkel and Tim Northover as reviewers as I discussed with Chandler in IRC.

I don't think that this is such a major change, at least in terms of possibility of breaking anything without notice or in terms of introducing a new kind of infrastructure.

Adding new support for a vector math library seems worth getting reviewed / approved by someone reasonably deeply familiar w/ our vectorization infra.

There are a bunch of possibilities here. I suggest Hal because I know he has worked specifically with SLEEF and vectorization before. But there are plenty of others. You can look at folks who have touched lib/Transforms/Vectorize if you'd like to see others.

I'm not trying to slow anything down, and maybe the code change is super simple and this will just be a quick stamp. But it seems quite useful to double check with folks that might have context on this area, etc.

Overall, this change looks ok to me. The tests look good, too.

However, I have a question that I have asked before and don't seem to see a solution here: how do you deal with evolving library support?

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

The objects compiled with LLVM next year won't link against SLEEF compiled today. You can say "compile SLEEF again", but Linux distros have different schedule for different packages, especially when they have different maintainers.

They cope with the main libraries by validating and releasing a clear set of compatible libraries, and the GNU community participates heavily in that process by testing and patching gcc, glibc, binutils, etc.

I don't think SLEEF maintainers will have the same stamina, so it's possible that an update in either distro or SLEEF breaks things, they reject the new package, but now, LLVM is broken, too.

This is not an LLVM problem, per se, but by supporting SLEEF, you'll be transferring the problem to the LLVM community, as we'll be the ones receiving the complaints first.

This also happens in LLVM proper, when we have just released a new version and people haven't updated SLEEF yet, and we receive a burst of complaints.

The real question is: is the SLEEF community ready to take on the task of fixing compatibility / linkage issues as soon as they occur? The alternative is what we do with other broken code: deletion. I want to avoid that.

The other question, inline, is about the argument name. We really want to use the same as existing one, for the sake of least surprise.

Some additional random comments...

Adding the triple here shows the real dependency that was never added, because SVML assumes Intel hardware, so no problems there.

The discussion on how to simplify the pattern matching is valid, but at this stage, I think we can keep the existing mechanism.

I agree that this is bound to increase with SVE but I'm not sure writing a whole new table-gen back end just for this is worthy.

The code would be slightly longer on the src side (instead of build side) but it's well organised and clear, so no big deal.

I tend to agree with @steleman that auto-generating as a loop+replace can be tricky if there are support gaps in the library.

But further cleanup changes to this patch could prove me wrong. :)

cheers,
--renato

llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

A quick google search told me GCC uses -fveclib=SLEEF. We don't want to be different or that would create big problems in build systems. It seems, at least right now, there's only one ABI, so there's no point in premature classification.

Hello Renato,

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

SLEEF already has support for SVE.

The objects compiled with LLVM next year won't link against SLEEF compiled today. You can say "compile SLEEF again", but Linux distros have different schedule for different packages, especially when they have different maintainers.

This is also a problem in SVML.
My understanding is that this patch is kind of tentative, which will be used until the OpenMP patch ( https://reviews.llvm.org/D54412 ) will be merged.
After that, the include file that is contained in the vector math library is referred by the compiler to see which math functions are available. So this problem won't occur.

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

SLEEF already has support for SVE.

I see, but the point is still the same.

Implementation will evolve, new functions will be added, and testing all variants will still lead to the same maintenance issues.

The bottom line is: this is a long term commitment, and I want to make sure everyone involved is aware of that.

This is also a problem in SVML.

Yes, and Intel has been very forthcoming in maintaining their code.

My understanding is that this patch is kind of tentative, which will be used until the OpenMP patch ( https://reviews.llvm.org/D54412 ) will be merged.
After that, the include file that is contained in the vector math library is referred by the compiler to see which math functions are available. So this problem won't occur.

In that case, I propose we either make this experimental, or we wait until that patch lands and we can change this one accordingly.

I am worried to walk into a situation where we generate OpenMP code where no pragmas were written, as we don't (and shouldn't) include libomp by default.

I know this patch doesn't do that, but some of the comments here and there make allusion to that fact, and we should be careful how we proceed.

I would welcome @hfinkel's opinion.

Hi Renato,

My answers inline.

Overall, this change looks ok to me. The tests look good, too.

However, I have a question that I have asked before and don't seem to see a solution here: how do you deal with evolving library support?

I'm not. I'm punting, simply because I can't control SLEEF's potential ABI changes from LLVM. :-) And absent a formal definiton of SLEEF's mangling and ABI, I don't see how this problem can be dealt with in a definitive way.

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.

AFAICS, SLEEF isn't included in any Linux distro. I can't find any reference to it in RH/CentOS/Fedora, or at rpmfind. Or Ubuntu or SuSE.

Which means that those environments who will want to use SLEEF will have to build their own.

The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

Today, we don't have SVE, so we don't generate SVE bindings, all fine. Say, next year you upstream SVE bindings in SLEEF and then change LLVM to emit those too.

The objects compiled with LLVM next year won't link against SLEEF compiled today. You can say "compile SLEEF again", but Linux distros have different schedule for different packages, especially when they have different maintainers.

The objects compiled with LLVM next year will still link with SLEEF from today. They just won't be binding to the SVE version of SLEEF. They'll keep binding to the non-SVE SLEEF.

Meaning, ThunderX2/T99 - for example - will never use SVE, because SVE is not available in T99 silicon.

So, future SVE becomes a source of problems only for AArch64 silicon that has SVE support. For those CPU's it becomes a compile-time/link-time choice.

If they want to use a SVE-enabled SLEEF, they'll have to build a SVE-enabled SLEEF to link against. And I suspect they'll have to say '-march=armv8.2+sve' - or something like that - on compile-line.

Those environments can always use two different versions of SLEEF - one with SVE, the other one without, just by using different shared object names and different binding SONAMEs: one for SVE, the other one for not SVE.

clang doesn't auto-magically pass -lsleefgnuabi on link-line, so that's left up to the consumer.

They cope with the main libraries by validating and releasing a clear set of compatible libraries, and the GNU community participates heavily in that process by testing and patching gcc, glibc, binutils, etc.

I don't think SLEEF maintainers will have the same stamina, so it's possible that an update in either distro or SLEEF breaks things, they reject the new package, but now, LLVM is broken, too.

This is not an LLVM problem, per se, but by supporting SLEEF, you'll be transferring the problem to the LLVM community, as we'll be the ones receiving the complaints first.

This also happens in LLVM proper, when we have just released a new version and people haven't updated SLEEF yet, and we receive a burst of complaints.

The real question is: is the SLEEF community ready to take on the task of fixing compatibility / linkage issues as soon as they occur? The alternative is what we do with other broken code: deletion. I want to avoid that.

The other question, inline, is about the argument name. We really want to use the same as existing one, for the sake of least surprise.

I agree with you. ;-) I will change it back to fveclib=SLEEF. ;-)

Some additional random comments...

Adding the triple here shows the real dependency that was never added, because SVML assumes Intel hardware, so no problems there.

The discussion on how to simplify the pattern matching is valid, but at this stage, I think we can keep the existing mechanism.

I agree that this is bound to increase with SVE but I'm not sure writing a whole new table-gen back end just for this is worthy.

The code would be slightly longer on the src side (instead of build side) but it's well organised and clear, so no big deal.

I tend to agree with @steleman that auto-generating as a loop+replace can be tricky if there are support gaps in the library.

But further cleanup changes to this patch could prove me wrong. :)

Now about the -fveclib=OpenMP + #pragma omp simd story:

I am fully aware of the OpenMP SIMD Plan[tm] and of the not-really-a-RFC RFC at https://reviews.llvm.org/D54412.

Here are my thoughts on that - i've already expressed some vague version of those thoughts earlier in this changeset:

  1. If the -fveclib=OpenMP feature requires changes to existing code to work -- namely inserting #pragma omp simd directives in existing production code -- then it's a non-starter by design. There are many environments in HPC/Super-computing where making changes to existing production code requires an Act Of God and half a million $USD in code re-certification costs.
  1. Not every program wants to become dependent on OpenMP. Some programs really really don't want OpenMP. There should exist a mechanism for those kinds of programs to vectorize libm functions without involving OpenMP.
  1. The story with the "magic" header file. Some environments might consider this "magic" header file a sneaky way of implementing [1] -- namely production source code changes. A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.
  1. I am very happy about the thought of OpenMP being able - at some point in the future - to vectorize things. Programs that already use OpenMP will happily take advantage of this future feature. But, based on my understanding of the current OpenMP reality, this new OpenMP feature won't be available for a while. It's certainly not available now. So, how do we accomplish libm vectorization in the meanwhile?

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.

The name mangling rule in SLEEF is basically the vector function ABI for AArch64.
The functions for Intel ISA conform to the corresponding ABI for x86.

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

AFAICS, SLEEF isn't included in any Linux distro. I can't find any reference to it in RH/CentOS/Fedora, or at rpmfind. Or Ubuntu or SuSE.

Debian now includes SLEEF as a package. https://tracker.debian.org/pkg/sleef

Which means that those environments who will want to use SLEEF will have to build their own.
The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

Several open source projects including PyTorch have already adopted SLEEF.

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.

The name mangling rule in SLEEF is basically the vector function ABI for AArch64.
The functions for Intel ISA conform to the corresponding ABI for x86.

https://developer.arm.com/products/software-development-tools/hpc/arm-compiler-for-hpc/vector-function-abi

I know. :-)

Speaking from my own experience: I have never met an ABI that didn't have to change after it was adopted and ratified as a PDF. ;-)

So yes, the AArch64 Vector ABI is probably stable, with a 99.3% level of confidence.

I am taking it on faith that SLEEF won't keep changing their mangling and ABI.
AFAICS, SLEEF isn't included in any Linux distro. I can't find any reference to it in RH/CentOS/Fedora, or at rpmfind. Or Ubuntu or SuSE.

As @shibatch said, they use Intel's and Arm's ABIs, so that should be fine. But also, that this is not just a distro issue, but bundling into other libraries.

Which means that those environments who will want to use SLEEF will have to build their own.
The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

The signal I'm getting from all labs (in US, Europe and Asia) is that they really *don't* want to maintain their own binaries. They just cope with it because we don't do a good job at the tools/packaging level.

I'm not willing to add more chaos into the mix.

Those environments can always use two different versions of SLEEF - one with SVE, the other one without, just by using different shared object names and different binding SONAMEs: one for SVE, the other one for not SVE.
clang doesn't auto-magically pass -lsleefgnuabi on link-line, so that's left up to the consumer.

SVE was just an example of completely missing options, but new functions and potentially renames can raise much harder issues to fix. We had such issues with glibc, libgcc, compiler-rt in the past and it takes awfully long to go away.

The link flag issue is minor, as people passing -fveclib=sleef to CFLAGS will also pass -lsleef to LDFLAGS, but that doesn't solve the versioning problem.

  1. I am very happy about the thought of OpenMP being able - at some point in the future - to vectorize things. Programs that already use OpenMP will happily take advantage of this future feature. But, based on my understanding of the current OpenMP reality, this new OpenMP feature won't be available for a while. It's certainly not available now. So, how do we accomplish libm vectorization in the meanwhile?

The alternative is to call it experimental and take slow steps. This is mostly a political change, not technical, so it won't involve a lot of red tape. As I said before, the code looks fine to me.

What this normally means is:

  • Commit messages and code comments have "experimental" in them
  • A small paragraph of comments on the code explaining the problems with the current approach
  • Developer availability to move code around or re-write in the near future

The bad side is that:

  • Experimental features have less priority than other changes
  • Other developers are far less likely to keep that part of the code in good shape
  • If this code breaks some other, "official" code, it could be reverted or changed
  • If no one takes the maintenance role, the code will bit rot and be eventually removed

A code moves from experimental to official when:

  • There are enough users / developers maintaining it for months / year
  • There are enough validation efforts (buildbots, new tests, distro use)
  • The code, and its tests, don't break other unrelated changes often enough
  • Developers usually respond promptly to such breakages

As I said, I still want to hear from Hal/Tim, but I'd be happy to call this experimental, as long as we're agreeing that there is a commitment to look after this code (and all ramifications, such as for OMP).

The commitment doesn't need to be from you directly, but from enough developers to make this worthwhile.

I just want to make clear I do want SLEEF support in LLVM, but we need to make it right from the start.

cheers,
--renato

Which means that those environments who will want to use SLEEF will have to build their own.
The upside is that SLEEF is likely of interest only to the rarefied world of HPC/Super-computing. Those environments can definitely build their own SLEEF and maintain their own binary copy.

The signal I'm getting from all labs (in US, Europe and Asia) is that they really *don't* want to maintain their own binaries. They just cope with it because we don't do a good job at the tools/packaging level.

I'm not willing to add more chaos into the mix.

Then they are in trouble with a hypothetical distro SLEEF.

That's because no commercial distro - if they ever decide to include SLEEF - will compile SLEEF for the maximum CPU feature-set possible. They do the exact opposite - they compile it for the lowest possible common denominator.

For example, on AArch64, you won't get distro libraries compiled specifically for -mcpu=thunderx2t99. Or on x86_64 you'll never get -mavx512ifma, for example.

That's because no commercial distro - if they ever decide to include SLEEF - will compile SLEEF for the maximum CPU feature-set possible. They do the exact opposite - they compile it for the lowest possible common denominator.

For example, on AArch64, you won't get distro libraries compiled specifically for -mcpu=thunderx2t99. Or on x86_64 you'll never get -mavx512ifma, for example.

Many applications and libraries including SLEEF use dispatchers to choose a code with which the available vector extension can be utilized.

Then they are in trouble with a hypothetical distro SLEEF.

That's because no commercial distro - if they ever decide to include SLEEF - will compile SLEEF for the maximum CPU feature-set possible. They do the exact opposite - they compile it for the lowest possible common denominator.

For example, on AArch64, you won't get distro libraries compiled specifically for -mcpu=thunderx2t99. Or on x86_64 you'll never get -mavx512ifma, for example.

A bit off topic, but there are two solutions to that problem:

  1. Dynamic Arch

The likes of IFUNC and virtual dispatch, which compiles all variants, but chooses the right one at run time. OpenBLAS uses this, for example.

This is easy on the compiler, which knows all variables at build time and no user are harmed in the process. But it's hard to maintain and has additional runtime costs.

  1. Modules / Overlays

LMod, EasyBuild, Spack can choose the appropriate libraries at build / run time as needed.

That's not so easy on the compiler, especially if the modules are shared libraries, linked at runtime. It gets worse if users build half local and half use the modules.

Both are preferable to having every cluster user re-compile the whole stack every time, or having cluster admins curate a random number of random builds at random places (which is usually the end result).

That is also assuming the user knows how to best compile the libraries, better than the developers or packagers, which is usually not true.

For example, just adding march / mtune to OpenBLAS doesn't do much good and only improves a fraction of forcing the right arch via TARGET=ARCH, which then will select the right flags anyway.

fpetrogalli added inline comments.Dec 11 2018, 1:38 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Hi @rengolin - where did you find this option? I am struggling to find it in the gcc code base and in the gcc developer mailing lists. GCC uses -mveclibabi, but it is only for x86.

rengolin added inline comments.Dec 11 2018, 1:53 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Hum, now I'm struggling to find it, too. I may have hit a sweet spot between sleef from clang and mveclibabi from gcc and assumed it worked.

This page [1] tells me only svml and acml are supported, so I stand corrected. However, if/when gcc supports it, mveclibabi=sleefgnuabi will be very weird. So I still prefer sleef instead of sleefgnuabi.

Also, why do we use fveclib and not mveclibabi? This isn't really helpful for build systems...

[1] https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

fpetrogalli added inline comments.Dec 11 2018, 2:07 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

I proposed to use -whateveroption=SLEEFGNUABI instead of -whateveroption=SLEEF because SLEEF is produced in two version: libsleef.so and libsleefgnuabi.so. The two libraries have different signatures and naming conventions. The GNUABI version is compatible with the names of the Vector Function ABI of x86 and AArch64, and with the signature of the functions in glibc, which makes it a superior choice as it adheres to standards. That is why I think this patch is making the right choice in choosing the GNUABI names.

Now, although we might have personal preferences on how to name things (I agree with you that mveclibabi|fveclib=sleefgnuabi is not pretty), I think the ultimate criteria for our choice is correctness and clarity. Having -fveclib=SLEEF when the compiler will be really using libsleefgnuabi.so would just be confusing, especially if some day we decide that we want to support also libsleef.so (for example, some workloads might prefer the sincos signature in there over the one provided in libsleefgnuabi).

rengolin added inline comments.Dec 11 2018, 2:15 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Is there any realistic expectation that we should support two ABIs for the same library with one being incompatible with the other ABIs (x86, Arm, Gnu)?

If these are just aliases, we don't even need to have two SO objects, and just add the aliases and wrappers to the main SO. We do that in Compiler-RT because it would be a nightmare otherwise.

A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.

@steleman Could you explain a little bit about when a re-certification is required?
Is it required when a standard header file(e.g. math.h) is changed?

gcc/glibc is doing a similar thing. In math-vector.h, which is included from math.h, there is declaration like below.

# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif

# ifdef __DECL_SIMD_x86_64
#  undef __DECL_SIMD_cos
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64
...

A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.

@steleman Could you explain a little bit about when a re-certification is required?
Is it required when a standard header file(e.g. math.h) is changed?

Yes. Distro upgrades must be (and are) certified. Once they're certified, they stay immutable for a very long time.

gcc/glibc is doing a similar thing. In math-vector.h, which is included from math.h, there is declaration like below.

# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif

# ifdef __DECL_SIMD_x86_64
#  undef __DECL_SIMD_cos
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64
...

So the non-openmp case doesn't change code, and doesn't introduce external dependencies that may not have been present in the initial implementation. It just adds a function attribute. Which, in GCC-land, translates to a function call to a (possibly inlined) vectorized function.

I am not, by any means, suggesting that there is something inherently wrong with using OpenMP for SIMD libm vectorization. There is absolutely nothing wrong with that approach.

However, OpenMP cannot be the only method of obtaining vectorization of libm functions, for reasons already mentioned. There has to be an alternate method, one that does not involve importing OpenMP pragmas and interfaces, and their implicit trigger of code changes. GCC provides that alternative with attribute((simd)).

The RFC for reimplementing -fveclib with OpenMP is titled "OpenMP" because the corresponding code for ARM HPC compiler requires OpenMP.
However, the proposal in RFC does not require OpenMP, as quoted below.

The new proposed #pragma directive are:

  1. #pragma veclib declare simd.
  2. #pragma veclib declare variant.

Both directive follows the syntax of the declare simd and the
declare variant directives of OpenMP, with the exception that
declare variant is used only for the simd context.

The RFC for reimplementing -fveclib with OpenMP is titled "OpenMP" because the corresponding code for ARM HPC compiler requires OpenMP.
However, the proposal in RFC does not require OpenMP, as quoted below.

The new proposed #pragma directive are:

  1. #pragma veclib declare simd.
  2. #pragma veclib declare variant.

Both directive follows the syntax of the declare simd and the
declare variant directives of OpenMP, with the exception that
declare variant is used only for the simd context.

  1. Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
  1. What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces in the open-source clang as well?
  1. If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on compile-line, but ending up not using OpenMP.

Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces in the open-source clang as well?
If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on compile-line, but ending up not using OpenMP.

I think you should post these questions as a reply to the RFC.

Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces in the open-source clang as well?
If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on compile-line, but ending up not using OpenMP.

I think you should post these questions as a reply to the RFC.

Yes, but bear in mind that the proprietary compiler options will always have less weight than other OSS compilers.

Putting it bluntly, if it doesn't work in GCC, people will not use it. Regardless of what any proprietary compiler does, based or not in LLVM.

We already have some clang pragmas for vectorisation, we don't need yet another syntax, and if we do, we should use OpenMPs.

Using OMP pragmas doesn't necessarily means we need to link with OMP libraries, and that's a special case for OMP SIMD.

Any new pragma MUST be done though an RFC process in the llvm-dev list, that's for sure.

steleman updated this revision to Diff 178475.Dec 17 2018, 9:04 AM

Updated version of the patch/changeset, as per comments from Renato:

  • veclib argument reverted back to SLEEF.
  • Corresponding enum names reverted back to SLEEF.
  • New benchmark comparing SLEEF performance when built with GCC 8.2 vs. clang ToT from 12/15/2018:

https://docs.google.com/spreadsheets/d/1QgdJjgCTFA9I2_5MvJU0uR4JT4knoW5xu-ujxszRbd0/edit?usp=sharing

Thanks! I've updated the reviewers on the Clang review (D53928).

Ping!

Yes, I realize people were on holidays. :-)

Is this OK to go in (the second time around)?

Is anyone else interested in commenting?

Perhaps best to ping Clang's commit, as that's the review that is blocking this one.

As long as we agree on the behaviour in Clang, this patch should be fine.

Perhaps best to ping Clang's commit, as that's the review that is blocking this one.

As long as we agree on the behaviour in Clang, this patch should be fine.

OK, so I ping'ed Clang and no-one is biting. :-)

So, what are the next steps for getting this into LLVM?

Hal approved the Clang side, so this one looks good, too.

Please commit this one first, then Clang's, to make sure fast bots and bisects don't break.

Thanks!

Thank you Hal and Renato.

steleman updated this revision to Diff 185576.Feb 6 2019, 9:21 AM

This is a significant update to the original version of the SLEEF
changeset.

While testing LLVM+SLEEF on AArch64, I discovered that, for programs
such as the following simple test case:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main()
{
  double D;
  double R;

  srand48(time(0));

  D = drand48();
  R = tgamma(D);
  (void) fprintf(stderr, "tgamma: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = acos(D);
  (void) fprintf(stderr, "acos: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = acosh(D);
  (void) fprintf(stderr, "acosh: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = asin(D);
  (void) fprintf(stderr, "asin: D=%lf R=%lf\n", D, R);

  return 0;
}

Compile with:

./clang -O3 -mcpu=thunderx2t99 -fvectorize -fveclib=SLEEF -ffast-math -ffp-contract=fast -c t.c -o t.o

compilation will fail in LLVM:

fatal error: error in backend: Cannot select: intrinsic %llvm.asin
clang-8: error: clang frontend command failed with exit code 70 (use -v to see invocation)
[ ... etc etc ... ]

This failure is triggered by -ffast-math -ffp-contract=fast. Without
these flags, the fatal error does not occur.

This only happens when the newly-enabled intrinsics (tgamma, acos,
acosh, asin, etc) are called on a scalar -- as in the test program above.

When called on a vector, with or without -ffast-math -ffp-contract=fast,
these new intrinsics always succeed.

This updated version of the patch addresses this problem.

I have also added a second test case for SLEEF AArch64 which tests for

%call = call fast float @acosf(float %conv)

I realize that this patch has grown in size significantly, and that
it may be a candidate for splitting it into smaller parts.

The problem is, I do not quite see how it can be split into several
smaller parts, simply because without this entire patch, the new
intrinsics will cause a fatal error in the LLVM backend, as shown above.

The patch is based on LLVM ToT from 02/06/2019.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 9:21 AM
rengolin reopened this revision.Feb 9 2019, 5:08 AM

Right, this is a completely different patch and it seems that the original proposal hadn't been tested thoroughly.

It did feel as the initial patch was far too simple for what it enabled, but I thought this was due to implementations in the intrinsic selection that I didn't follow for the past year or two.

By looking over this new patch, it feels as though most changes are mechanical, but I'm afraid I don't have enough familiarity with the code base any more to be sure without further testing.

Given that this review is "approved" and "closed" and then reverted, and that the implementation is now totally different than the original approved patch, I'll reopen this one and request revision again.

Also, the Clang part (D53928) seems to have changed significantly after approved, so that, too, would need revisiting. I have updated that review to look at it again.

This revision is now accepted and ready to land.Feb 9 2019, 5:08 AM
rengolin requested changes to this revision.Feb 9 2019, 5:08 AM
This revision now requires changes to proceed.Feb 9 2019, 5:08 AM

Hi,

I have a few comments inline, mostly harmless, as the change is almost entirely mechanical and makes sense.

I'm happy with the change as is, no need to split (as I also can't see how), after we address all the comments.

That'd also give some time for other people to review the new changes.

cheers,
--renato

include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

Probably a silly question but, why "tgamma" when the function name is "gamma"?

include/llvm/IR/RuntimeLibcalls.def
258 ↗(On Diff #185576)

Looks like someone should one day look into generating this file, an OPcode section and TargetLibraryInfo from the intrinsics table-gen... Not now, though.

lib/Analysis/TargetLibraryInfo.cpp
368 ↗(On Diff #185576)

This would probably fail in many non-Arm64 arches, too.

Best to restrict to what this was meant for, which I'm guessing is x86.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1108 ↗(On Diff #185576)

Please clean up all "new intrinsics" comments, as they're not necessary for the patch.

I imagine they were helpful while developing... :)

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4952 ↗(On Diff #185576)

I think we can have these in the same place as the others, but with an "experimental" comment before.

The main idea behind the experimental tag is so that people don't rely on it until it's complete, but once it is, they can.

So if we create them in different places, we'd have to move them in once they're no longer experimental and that would be a pain.

I'm not expecting this to be in experimental for too long and am hoping to get this official by the next release (july-ish 2019), so, let's keep the experimental status on comments only and then remove the comments when everything is working as expected.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64-fast.ll
1 ↗(On Diff #185576)

We don't generally recommend O3 in tests anyway, as this introduces test noise when the passes in O3 change (add/del/move around).

3 ↗(On Diff #185576)

does fp-contract=fast really make a difference for these transformations?

6 ↗(On Diff #185576)

is this really specific to Arm64? At least the generation could be done for any target, if the library supports it or not is not relevant for this test, no?

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
3 ↗(On Diff #185576)

What's the difference between these two test files?

Hi Renato,

Thank you very much for the comments.

I will create a new changeset with all your suggestions, and then re-submit.

bryanpkc added inline comments.
include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

This didn't work in my test build. The function names must be in alphabetical order, so __gamma_r_finite etc. cannot come after __isoc99_scanf.

rengolin added inline comments.Feb 14 2019, 2:51 AM
include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

Better move to alphabetical order, then. Would that remove the need to call it "tgamma"?

tgamma stands for "true gamma", and it is the function name specified in the ANSI C standard.

tgamma stands for "true gamma", and it is the function name specified in the ANSI C standard.

Hm, if this is the same as tgamma, why is it __gamma_r_finite and not __tgamma_r_finite?

tgamma stands for "true gamma", and it is the function name specified in the ANSI C standard.

Hm, if this is the same as tgamma, why is it __gamma_r_finite and not __tgamma_r_finite?

Because __tgamma_r_finite doesn't exist, at least on Linux. :-)

See /usr/include/bits/math-finite.h.

Because __tgamma_r_finite doesn't exist, at least on Linux. :-)

See /usr/include/bits/math-finite.h.

Ah! Sounds good. :)

Thanks!

steleman marked 11 inline comments as done.Feb 14 2019, 10:32 PM

Added inline comments/answers to Renato's questions.

lib/Analysis/TargetLibraryInfo.cpp
368 ↗(On Diff #185576)

I am a bit worried about this one. The original set exp10|exp10f|exp10l as unavailable on all arches. I originally enabled these functions for AArch64 only.

If we restrict this to x86|x86_64 only, then these functions will become available
on MIPS, PPC, PPC64, etc. I have no way of testing these ISA's to determine if they're broken or not.

So I disabled these functions for x86|x86_64 only. Let me know if you'd like to revert this to the original, which was to enable them on AArch64 only.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4952 ↗(On Diff #185576)

Renamed this function to expandNonVectorIntrinsics.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64-fast.ll
3 ↗(On Diff #185576)

It does not. Which is good. :-) That's the point of the test - the transformations should stay identical, and pass, with or without -fp-contract=fast.

6 ↗(On Diff #185576)

SLEEF is only currently enabled for AArch64 in TargetLibraryInfo. So, for any other ISA, all the tests will fail, because the transformations won't happen.

Plus, for any ISA other than AArch64, the SLEEF function name mangling will be different than on AArch64. Which means this particular test - as written for AArch64 - can't be used on any other ISA; all the CHECKs will fail.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
3 ↗(On Diff #185576)

sleef-calls-aarch64-fast.ll calls

%call = call fast <type> <function-name>(<argument>)

sleef-calls-aarch64.ll calls

%call = tail call <type> <function-name>(<argument>)
steleman updated this revision to Diff 186962.Feb 14 2019, 10:34 PM
steleman marked 5 inline comments as done.

Updated changeset to reflect comments/questions from Renato.

This new changeset is based on LLVM ToT from 02/14/2019.

Thanks for your answers, they make sense. Other than the mistake I made on disabling exp calls (you were right), looks fine.

lib/Analysis/TargetLibraryInfo.cpp
368 ↗(On Diff #185576)

My bad, I read it the other way round. Your original patch was correct.

steleman updated this revision to Diff 186993.Feb 15 2019, 4:17 AM

Reverted change in TargetLibraryInfo.cpp re: exp10|exp10f|exp10l on Linux with GLIBC.

These functions are now allowed on AArch64 or AArch64_be only.

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

@bryanpkc What's your environment?

@steleman Any idea why you don't see that on your builds?

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

rengolin accepted this revision.Feb 15 2019, 5:51 AM

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

It seems like a build issue (or a different library issue), that we can fix after commit.

LGTM, thanks!

This revision is now accepted and ready to land.Feb 15 2019, 5:51 AM

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

It seems like a build issue (or a different library issue), that we can fix after commit.

LGTM, thanks!

Thank you Sir! :-)

bryanpkc added a comment.EditedFeb 15 2019, 3:35 PM

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

The build was fine, but when running the compiler, this assertion in lib/Analysis/TargetLibraryInfo.cpp will fail:

static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
                       ArrayRef<StringRef> StandardNames) {
  // Verify that the StandardNames array is in alphabetical order.
  assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
                        [](StringRef LHS, StringRef RHS) {
                          return LHS < RHS;
                        }) &&
         "TargetLibraryInfoImpl function names must be sorted");

You probably don't see it because you are only building in Release mode.

P.S. The examples you listed are sorted (_ is less than a).

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

The build was fine, but when running the compiler, this assertion in lib/Analysis/TargetLibraryInfo.cpp will fail:

static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
                       ArrayRef<StringRef> StandardNames) {
  // Verify that the StandardNames array is in alphabetical order.
  assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
                        [](StringRef LHS, StringRef RHS) {
                          return LHS < RHS;
                        }) &&
         "TargetLibraryInfoImpl function names must be sorted");

You probably don't see it because you are only building in Release mode.

P.S. The examples you listed are sorted (_ is less than a).

I think the correct fix here is to use an associative container, not an array, and not rely on writing a text file containing sorted character strings.

std::set comes to mind.

@bryanpkc debug build makes sense, thanks! @steleman, I'd rather just order now and try to rector later. This patch has changed enough tunes already. :-)

steleman updated this revision to Diff 187131.Feb 15 2019, 10:41 PM

Corrected manual sorting in TargetLibraryInfo.def as per comments.

@bryanpkc debug build makes sense, thanks! @steleman, I'd rather just order now and try to rector later. This patch has changed enough tunes already. :-)

OK, changeset updated. Tested with Debug+Assertions+ExpensiveChecks.

Perfect, thanks!

steleman updated this revision to Diff 187507.Feb 19 2019, 7:53 PM

One minor update:

${top_srcdir}/unittests/Analysis/TargetLibraryInfoTest.cpp

needs an update to account for lgamma, tgamma, lgamma_r_finite and gamma_r_finite.

arsenm added a subscriber: arsenm.Feb 26 2019, 2:13 PM
arsenm added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

I would split adding the new intrinsics into a separate patch

steleman marked 2 inline comments as done.Feb 26 2019, 2:16 PM
steleman added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

That won't work.

echristo added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

"Why not" is always a good follow up to that sort of response :)

steleman marked 3 inline comments as done.Feb 26 2019, 4:03 PM
steleman added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

Because:

  • It adds unnecessary complexity to this changeset and to D53928.
  • There is no obvious benefit to this added complexity.
  • There are many drawbacks to this added complexity.
  • Splitting it based on new vs. existing intrinsics seems arbitrary. One might as well ask for a separate changeset for each and every new intrinsic.

And, now it's my turn to ask:

  • Why should it be split?
  • What is the benefit of splitting it?
  • What will be accomplished by splitting it?
  • Is increased complexity - with all its drawbacks and pitfalls - a goal?
  • Does integrating it as is prevent, or hinder something else?
  • Why does this come up now, after this changeset - and its friend D53928 - have been reviewed for 4+ months, and already approved, and re-approved, and there are 21 subscribers to this changeset?

Hi @steleman, that wasn't a "loaded question", it was an honest one. Let me try to clarify what that means for all parties.

Because:

It adds unnecessary complexity to this changeset and to D53928.
There is no obvious benefit to this added complexity.
There are many drawbacks to this added complexity.

Every split or merge adds complexity, but when submitting patches upstream, you are required to explain not only what you've done, but why you haven't done some other way. This is the dynamics of upstream review and is not meant to push back or demerit, but to make sure all assumptions are shared and there is consensus.

A better answer would be:

  • because puling it apart would essentially create two very similar patch sets, possibly the second overwriting behaviour
  • because we would need to make sure the intermediate state is sane (to avoid crashes in bisects) and that means testing a state that doesn't make sense
  • because the final subset is too small, next to the refactoring needed, and the refactoring is part of "adding" the functionality

in addition to your own:

Splitting it based on new vs. existing intrinsics seems arbitrary. One might as well ask for a separate changeset for each and every new intrinsic.

To answer your questions...

And, now it's my turn to ask:

Why should it be split?
What is the benefit of splitting it?
What will be accomplished by splitting it?
Does integrating it as is prevent, or hinder something else?

Again, that was a standard question in upstream reviews, especially LLVM, as we have a more aggressive commit policy (more trunk breakages, more bisects, more reverts). For that to work, we must introduce the least amount of breakages, even in intermediate states, so they have to be small ans testable.

Furthermore, reverting the refactoring plus the actual change is more complicated than the actual change. If the breakage is in your usage of the refactoring, we can only revert that, leaving the refactoring, and being able to test the refactoring without the new behaviour and with, to find if the original assumption was broken, or if it was some silly mistake on the usage of it.

Is increased complexity - with all its drawbacks and pitfalls - a goal?

Don't assume ill intent, those questions are always worth asking and rarely meant as anything else other than double (or triple) checking consensus.

Why does this come up now, after this changeset - and its friend D53928 - have been reviewed for 4+ months, and already approved, and re-approved, and there are 21 subscribers to this changeset?

Again, another standard upstream behaviour (on Linux, GCC and others as well).

People are copied in all sorts of reviews, code owners even more. If we were to review everything that people copy us in straight away we wouldn't be able to do anything else, including our day jobs.

Matt's and Eric's comments here are very welcome, and I would appreciate their review on our comments about the design decisions. If they agree, we have more hope that this was the right thing all along. If they have more questions, proposals and change requests, we will be relieved that they caught something we didn't before users started relying on it.

It does take some time, but when it goes in, there's a higher probability of being right.

Given that this change (and its Clang counterpart) will add external user-visible behaviour that will have to be maintained for a long time, it's only right that more people chime in when they have time.

I'm glad Matt and Eric took the time before it went in.

cheers,
--renato

steleman marked an inline comment as done.Feb 27 2019, 5:06 AM

Why does this come up now, after this changeset - and its friend D53928 - have been reviewed for 4+ months, and already approved, and re-approved, and there are 21 subscribers to this changeset?

Again, another standard upstream behaviour (on Linux, GCC and others as well).

People are copied in all sorts of reviews, code owners even more. If we were to review everything that people copy us in straight away we wouldn't be able to do anything else, including our day jobs.

Matt's and Eric's comments here are very welcome, and I would appreciate their review on our comments about the design decisions. If they agree, we have more hope that this was the right thing all along. If they have more questions, proposals and change requests, we will be relieved that they caught something we didn't before users started relying on it.

It does take some time, but when it goes in, there's a higher probability of being right.

Given that this change (and its Clang counterpart) will add external user-visible behaviour that will have to be maintained for a long time, it's only right that more people chime in when they have time.

I'm glad Matt and Eric took the time before it went in.

Renato,

I don't have a problem with various people chiming in and offering comments and ideas.

I am, however, confused, about the concept of open-ended discussion that never reaches a terminal point, even though it would appear that it did.

I'm not sure what the proposed split would consist of. Would it be

  • A patch consisting of only the changes in Intrinsics.td which would be applied first. Wouldn't the changes in SelectionDAG be required as well?
  • Everything but the changes in TargetLibraryInfo.cpp etc. for supporting SLEEF and including any related test cases that would be upstreamed 2nd?

Would the second choice be useful by itself? Either in functionality or improved cross-platform testing before the SLEEF support went in?

I'm not sure what the proposed split would consist of.

I can see a few different concepts introduced here:

  • Lowering sin/cos/tan, etc and all related mechanical changes (large)
  • Adding the STRICT variants, with aliases on the lowering switches (medium)
  • Changing the log10 behaviour and code movement (small)
  • Adding SLEEF at the end, making use of all three patches above (tiny)

The lines between those concepts may be blurred, so it's never a clear cut, and trying to split the patch in that sequence may show problems I haven't consider.

But Git is quite helpful in splitting patches into a branch, and then testing the branch, and then one patch at a time.

From a high point of view, I can't see why any intermediate state would be harmful to builds or tests, given they only add unused code, which will be enabled by the last one.

I also imagine the tests can only work after the last patch is applied, and that's the main reason why I was ok with the patch being a single bulk.

The other reason is because there's a Clang patch, which needs to be applied after all these, and it may cause bisect issues to have too many depending patches across repos.

Hopefully the monorepo will help solve that problem, but we're not there yet.

I'm trying to understand what log10 behavior got changed. Could someone out where that change is?

I'm trying to understand what log10 behavior got changed. Could someone out where that change is?

Sorry, I mixed things up.

I was referring to the log10 order change in the list, but that was done *after* lgamma/tgamma was introduced, to build in the right order (in debug builds), so that was a change on gamma order, not log10.

Just ignore that item.

This revision was automatically updated to reflect the committed changes.
Artoria2e5 added a subscriber: Artoria2e5.EditedAug 21 2020, 7:01 AM

This should probably be reopened following commit 7459398a436f67f304c8653f39b8d82109778fec which reverted this change. Not doing that myself considering I may have missed something like a resubmission.

Update: searched the site to make sure, and it looks like there really is none. D53928 is accepted, but obviously can't be merged unless this one is there.

Artoria2e5 reopened this revision.Aug 21 2020, 10:50 AM
This revision is now accepted and ready to land.Aug 21 2020, 10:50 AM

Hi all!

I was looking throughout LLVM repo to see if there is any SLEEF support here, and this is exactly what I need. Any updates on this patch?

Thanks,
George

Hi @georgemitenkov,

This was a long time ago, I completely forgot about this.

I was happy with the patch before and I have no reason to hot be happy now, but we need someone else to have a look to make sure we don't revert it back again.

@echristo @arsenm @hfinkel would be good candidates.

That is, of course, if @steleman and @joelkevinjones are still interested in pushing this forward (and its clang counterpart, which now can be atomically committed together due to the monorepo).

The main pending issue I see, reading back the comments, is a split on the intrinsics. I personally don't think it's a good idea, but am not opposed to such a change. It'd be up to the authors and other reviewers to decide on that, though.

As it stands, despite having been approved by me and @fpetrogalli, it would need an additional approval by another reviewer (preferably one of the three mentioned above) to be committed.

Thanks,
--renato

Matt added a subscriber: Matt.May 26 2021, 12:13 PM

Thanks @rengolin! I guess in this case we have to see what @steleman and @joelkevinjones think and if they are interested (As well as other reviewers).

It would be really great to have this functionality. With the recent patch (https://reviews.llvm.org/D95373) math intrinsics can be changed into calls for SIMD libraries, and having Sleef there just completes the picture. I am wondering if may be it is better to discuss this on the mailing list? (Sometimes reverted patches are lost for reviewers)

Thanks @rengolin! I guess in this case we have to see what @steleman and @joelkevinjones think and if they are interested (As well as other reviewers).

It would be really great to have this functionality. With the recent patch (https://reviews.llvm.org/D95373) math intrinsics can be changed into calls for SIMD libraries, and having Sleef there just completes the picture. I am wondering if may be it is better to discuss this on the mailing list? (Sometimes reverted patches are lost for reviewers)

  1. I no longer have access to an AArch64 server to re-do and test this patch - I am no longer at Marvell. The patch in its current form will not apply to trunk because it is much too old.
  1. I can't use IBM TJWatson's infrastructure or IBM time to work on this patch. It's outside my responsibilities. I am willing to work on it in my free time, as a hobby.
  1. If I somehow obtain access to an AArch64 box to re-do this patch (I'm fuzzy on that at the moment), and I spend my own free time doing the work, I would like to know ahead of time that it will not degenerate again into what went on here from 2018 through 2019.
  1. If I somehow obtain access to an AArch64 box to re-do this patch (I'm fuzzy on that at the moment), and I spend my own free time doing the work, I would like to know ahead of time that it will not degenerate again into what went on here from 2018 through 2019.

I'd like to get feedback from @echristo @ararslan @hfinkel and others before we start working on it.

But assuming they're ok with it I can perhaps give you access to an AArch64 machine in my garage, or you can ask @maxim-kuvyrkov for some more beefy server access.