Page MenuHomePhabricator

[AArch64] Enable libm vectorized functions via SLEEF
ClosedPublic

Authored by steleman on Oct 31 2018, 4:00 AM.

Details

Summary

This changeset is modeled after Intel's submission for SVML. It enables trigonometry functions vectorization via SLEEF: http://sleef.org/.

  • A new vectorization library enum is added to TargetLibraryInfo.h: SLEEF.
  • A new option is added to TargetLibraryInfoImpl - ClVectorLibrary: SLEEF.
  • A comprehensive test case is included in this changeset.
  • In a separate changeset (for clang), a new vectorization library argument is added to -fveclib: -fveclib=SLEEF.

Trigonometry functions that are vectorized by sleef:

acos
asin
atan
atanh
cos
cosh
exp
exp2
exp10
lgamma
log10
log2
log
sin
sinh
sqrt
tan
tanh
tgamma

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
fpetrogalli added inline comments.Dec 11 2018, 1:38 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Hi @rengolin - where did you find this option? I am struggling to find it in the gcc code base and in the gcc developer mailing lists. GCC uses -mveclibabi, but it is only for x86.

rengolin added inline comments.Dec 11 2018, 1:53 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Hum, now I'm struggling to find it, too. I may have hit a sweet spot between sleef from clang and mveclibabi from gcc and assumed it worked.

This page [1] tells me only svml and acml are supported, so I stand corrected. However, if/when gcc supports it, mveclibabi=sleefgnuabi will be very weird. So I still prefer sleef instead of sleefgnuabi.

Also, why do we use fveclib and not mveclibabi? This isn't really helpful for build systems...

[1] https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

fpetrogalli added inline comments.Dec 11 2018, 2:07 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

I proposed to use -whateveroption=SLEEFGNUABI instead of -whateveroption=SLEEF because SLEEF is produced in two version: libsleef.so and libsleefgnuabi.so. The two libraries have different signatures and naming conventions. The GNUABI version is compatible with the names of the Vector Function ABI of x86 and AArch64, and with the signature of the functions in glibc, which makes it a superior choice as it adheres to standards. That is why I think this patch is making the right choice in choosing the GNUABI names.

Now, although we might have personal preferences on how to name things (I agree with you that mveclibabi|fveclib=sleefgnuabi is not pretty), I think the ultimate criteria for our choice is correctness and clarity. Having -fveclib=SLEEF when the compiler will be really using libsleefgnuabi.so would just be confusing, especially if some day we decide that we want to support also libsleef.so (for example, some workloads might prefer the sincos signature in there over the one provided in libsleefgnuabi).

rengolin added inline comments.Dec 11 2018, 2:15 PM
llvm/trunk/lib/Analysis/TargetLibraryInfo.cpp
29 ↗(On Diff #175149)

Is there any realistic expectation that we should support two ABIs for the same library with one being incompatible with the other ABIs (x86, Arm, Gnu)?

If these are just aliases, we don't even need to have two SO objects, and just add the aliases and wrappers to the main SO. We do that in Compiler-RT because it would be a nightmare otherwise.

A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.

@steleman Could you explain a little bit about when a re-certification is required?
Is it required when a standard header file(e.g. math.h) is changed?

gcc/glibc is doing a similar thing. In math-vector.h, which is included from math.h, there is declaration like below.

# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif

# ifdef __DECL_SIMD_x86_64
#  undef __DECL_SIMD_cos
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64
...

A header file that wasn't there before is now automagically included, and there's nothing they can do to not include it. Boom! - source code change. Happy re-certification planning.

@steleman Could you explain a little bit about when a re-certification is required?
Is it required when a standard header file(e.g. math.h) is changed?

Yes. Distro upgrades must be (and are) certified. Once they're certified, they stay immutable for a very long time.

gcc/glibc is doing a similar thing. In math-vector.h, which is included from math.h, there is declaration like below.

# if defined _OPENMP && _OPENMP >= 201307
/* OpenMP case.  */
#  define __DECL_SIMD_x86_64 _Pragma ("omp declare simd notinbranch")
# elif __GNUC_PREREQ (6,0)
/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
#  define __DECL_SIMD_x86_64 __attribute__ ((__simd__ ("notinbranch")))
# endif

# ifdef __DECL_SIMD_x86_64
#  undef __DECL_SIMD_cos
#  define __DECL_SIMD_cos __DECL_SIMD_x86_64
...

So the non-openmp case doesn't change code, and doesn't introduce external dependencies that may not have been present in the initial implementation. It just adds a function attribute. Which, in GCC-land, translates to a function call to a (possibly inlined) vectorized function.

I am not, by any means, suggesting that there is something inherently wrong with using OpenMP for SIMD libm vectorization. There is absolutely nothing wrong with that approach.

However, OpenMP cannot be the only method of obtaining vectorization of libm functions, for reasons already mentioned. There has to be an alternate method, one that does not involve importing OpenMP pragmas and interfaces, and their implicit trigger of code changes. GCC provides that alternative with attribute((simd)).

The RFC for reimplementing -fveclib with OpenMP is titled "OpenMP" because the corresponding code for ARM HPC compiler requires OpenMP.
However, the proposal in RFC does not require OpenMP, as quoted below.

The new proposed #pragma directive are:

  1. #pragma veclib declare simd.
  2. #pragma veclib declare variant.

    Both directive follows the syntax of the declare simd and the declare variant directives of OpenMP, with the exception that declare variant is used only for the simd context.

The RFC for reimplementing -fveclib with OpenMP is titled "OpenMP" because the corresponding code for ARM HPC compiler requires OpenMP.
However, the proposal in RFC does not require OpenMP, as quoted below.

The new proposed #pragma directive are:

  1. #pragma veclib declare simd.
  2. #pragma veclib declare variant.

    Both directive follows the syntax of the declare simd and the declare variant directives of OpenMP, with the exception that declare variant is used only for the simd context.
  1. Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
  1. What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces in the open-source clang as well?
  1. If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on compile-line, but ending up not using OpenMP.

Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces in the open-source clang as well?
If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on compile-line, but ending up not using OpenMP.

I think you should post these questions as a reply to the RFC.

Where, when, how and by whom are these #pragmas inserted? Are they silently inserted by clang? Do they have to be manually inserted in existing code?
What is the effect of these #pragmas? The ARM closed-source commercial compiler requires OpenMP. How will these #pragmas be implemented in the open-source clang? Will they import OpenMP interfaces in the open-source clang as well?
If importing OpenMP interfaces is not triggered by the insertion of these #pragmas, why is the argument to -fveclib called 'openmp'? I would find it baffling to have to type -fveclib=openmp on compile-line, but ending up not using OpenMP.

I think you should post these questions as a reply to the RFC.

Yes, but bear in mind that the proprietary compiler options will always have less weight than other OSS compilers.

Putting it bluntly, if it doesn't work in GCC, people will not use it. Regardless of what any proprietary compiler does, based or not in LLVM.

We already have some clang pragmas for vectorisation, we don't need yet another syntax, and if we do, we should use OpenMPs.

Using OMP pragmas doesn't necessarily means we need to link with OMP libraries, and that's a special case for OMP SIMD.

Any new pragma MUST be done though an RFC process in the llvm-dev list, that's for sure.

steleman updated this revision to Diff 178475.Dec 17 2018, 9:04 AM

Updated version of the patch/changeset, as per comments from Renato:

  • veclib argument reverted back to SLEEF.
  • Corresponding enum names reverted back to SLEEF.
  • New benchmark comparing SLEEF performance when built with GCC 8.2 vs. clang ToT from 12/15/2018:

https://docs.google.com/spreadsheets/d/1QgdJjgCTFA9I2_5MvJU0uR4JT4knoW5xu-ujxszRbd0/edit?usp=sharing

Thanks! I've updated the reviewers on the Clang review (D53928).

Ping!

Yes, I realize people were on holidays. :-)

Is this OK to go in (the second time around)?

Is anyone else interested in commenting?

Perhaps best to ping Clang's commit, as that's the review that is blocking this one.

As long as we agree on the behaviour in Clang, this patch should be fine.

Perhaps best to ping Clang's commit, as that's the review that is blocking this one.

As long as we agree on the behaviour in Clang, this patch should be fine.

OK, so I ping'ed Clang and no-one is biting. :-)

So, what are the next steps for getting this into LLVM?

Hal approved the Clang side, so this one looks good, too.

Please commit this one first, then Clang's, to make sure fast bots and bisects don't break.

Thanks!

Thank you Hal and Renato.

steleman updated this revision to Diff 185576.Feb 6 2019, 9:21 AM

This is a significant update to the original version of the SLEEF
changeset.

While testing LLVM+SLEEF on AArch64, I discovered that, for programs
such as the following simple test case:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main()
{
  double D;
  double R;

  srand48(time(0));

  D = drand48();
  R = tgamma(D);
  (void) fprintf(stderr, "tgamma: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = acos(D);
  (void) fprintf(stderr, "acos: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = acosh(D);
  (void) fprintf(stderr, "acosh: D=%lf R=%lf\n", D, R);

  D = drand48();
  R = asin(D);
  (void) fprintf(stderr, "asin: D=%lf R=%lf\n", D, R);

  return 0;
}

Compile with:

./clang -O3 -mcpu=thunderx2t99 -fvectorize -fveclib=SLEEF -ffast-math -ffp-contract=fast -c t.c -o t.o

compilation will fail in LLVM:

fatal error: error in backend: Cannot select: intrinsic %llvm.asin
clang-8: error: clang frontend command failed with exit code 70 (use -v to see invocation)
[ ... etc etc ... ]

This failure is triggered by -ffast-math -ffp-contract=fast. Without
these flags, the fatal error does not occur.

This only happens when the newly-enabled intrinsics (tgamma, acos,
acosh, asin, etc) are called on a scalar -- as in the test program above.

When called on a vector, with or without -ffast-math -ffp-contract=fast,
these new intrinsics always succeed.

This updated version of the patch addresses this problem.

I have also added a second test case for SLEEF AArch64 which tests for

%call = call fast float @acosf(float %conv)

I realize that this patch has grown in size significantly, and that
it may be a candidate for splitting it into smaller parts.

The problem is, I do not quite see how it can be split into several
smaller parts, simply because without this entire patch, the new
intrinsics will cause a fatal error in the LLVM backend, as shown above.

The patch is based on LLVM ToT from 02/06/2019.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 9:21 AM
rengolin reopened this revision.Feb 9 2019, 5:08 AM

Right, this is a completely different patch and it seems that the original proposal hadn't been tested thoroughly.

It did feel as the initial patch was far too simple for what it enabled, but I thought this was due to implementations in the intrinsic selection that I didn't follow for the past year or two.

By looking over this new patch, it feels as though most changes are mechanical, but I'm afraid I don't have enough familiarity with the code base any more to be sure without further testing.

Given that this review is "approved" and "closed" and then reverted, and that the implementation is now totally different than the original approved patch, I'll reopen this one and request revision again.

Also, the Clang part (D53928) seems to have changed significantly after approved, so that, too, would need revisiting. I have updated that review to look at it again.

This revision is now accepted and ready to land.Feb 9 2019, 5:08 AM
rengolin requested changes to this revision.Feb 9 2019, 5:08 AM
This revision now requires changes to proceed.Feb 9 2019, 5:08 AM

Hi,

I have a few comments inline, mostly harmless, as the change is almost entirely mechanical and makes sense.

I'm happy with the change as is, no need to split (as I also can't see how), after we address all the comments.

That'd also give some time for other people to review the new changes.

cheers,
--renato

include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

Probably a silly question but, why "tgamma" when the function name is "gamma"?

include/llvm/IR/RuntimeLibcalls.def
258 ↗(On Diff #185576)

Looks like someone should one day look into generating this file, an OPcode section and TargetLibraryInfo from the intrinsics table-gen... Not now, though.

lib/Analysis/TargetLibraryInfo.cpp
368 ↗(On Diff #185576)

This would probably fail in many non-Arm64 arches, too.

Best to restrict to what this was meant for, which I'm guessing is x86.

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1108 ↗(On Diff #185576)

Please clean up all "new intrinsics" comments, as they're not necessary for the patch.

I imagine they were helpful while developing... :)

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4952 ↗(On Diff #185576)

I think we can have these in the same place as the others, but with an "experimental" comment before.

The main idea behind the experimental tag is so that people don't rely on it until it's complete, but once it is, they can.

So if we create them in different places, we'd have to move them in once they're no longer experimental and that would be a pain.

I'm not expecting this to be in experimental for too long and am hoping to get this official by the next release (july-ish 2019), so, let's keep the experimental status on comments only and then remove the comments when everything is working as expected.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64-fast.ll
1 ↗(On Diff #185576)

We don't generally recommend O3 in tests anyway, as this introduces test noise when the passes in O3 change (add/del/move around).

3 ↗(On Diff #185576)

does fp-contract=fast really make a difference for these transformations?

6 ↗(On Diff #185576)

is this really specific to Arm64? At least the generation could be done for any target, if the library supports it or not is not relevant for this test, no?

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
3 ↗(On Diff #185576)

What's the difference between these two test files?

Hi Renato,

Thank you very much for the comments.

I will create a new changeset with all your suggestions, and then re-submit.

bryanpkc added inline comments.
include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

This didn't work in my test build. The function names must be in alphabetical order, so __gamma_r_finite etc. cannot come after __isoc99_scanf.

rengolin added inline comments.Feb 14 2019, 2:51 AM
include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

Better move to alphabetical order, then. Would that remove the need to call it "tgamma"?

tgamma stands for "true gamma", and it is the function name specified in the ANSI C standard.

tgamma stands for "true gamma", and it is the function name specified in the ANSI C standard.

Hm, if this is the same as tgamma, why is it __gamma_r_finite and not __tgamma_r_finite?

tgamma stands for "true gamma", and it is the function name specified in the ANSI C standard.

Hm, if this is the same as tgamma, why is it __gamma_r_finite and not __tgamma_r_finite?

Because __tgamma_r_finite doesn't exist, at least on Linux. :-)

See /usr/include/bits/math-finite.h.

Because __tgamma_r_finite doesn't exist, at least on Linux. :-)

See /usr/include/bits/math-finite.h.

Ah! Sounds good. :)

Thanks!

steleman marked 11 inline comments as done.Feb 14 2019, 10:32 PM

Added inline comments/answers to Renato's questions.

lib/Analysis/TargetLibraryInfo.cpp
368 ↗(On Diff #185576)

I am a bit worried about this one. The original set exp10|exp10f|exp10l as unavailable on all arches. I originally enabled these functions for AArch64 only.

If we restrict this to x86|x86_64 only, then these functions will become available
on MIPS, PPC, PPC64, etc. I have no way of testing these ISA's to determine if they're broken or not.

So I disabled these functions for x86|x86_64 only. Let me know if you'd like to revert this to the original, which was to enable them on AArch64 only.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4952 ↗(On Diff #185576)

Renamed this function to expandNonVectorIntrinsics.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64-fast.ll
3 ↗(On Diff #185576)

It does not. Which is good. :-) That's the point of the test - the transformations should stay identical, and pass, with or without -fp-contract=fast.

6 ↗(On Diff #185576)

SLEEF is only currently enabled for AArch64 in TargetLibraryInfo. So, for any other ISA, all the tests will fail, because the transformations won't happen.

Plus, for any ISA other than AArch64, the SLEEF function name mangling will be different than on AArch64. Which means this particular test - as written for AArch64 - can't be used on any other ISA; all the CHECKs will fail.

test/Transforms/LoopVectorize/AArch64/sleef-calls-aarch64.ll
3 ↗(On Diff #185576)

sleef-calls-aarch64-fast.ll calls

%call = call fast <type> <function-name>(<argument>)

sleef-calls-aarch64.ll calls

%call = tail call <type> <function-name>(<argument>)
steleman updated this revision to Diff 186962.Feb 14 2019, 10:34 PM
steleman marked 5 inline comments as done.

Updated changeset to reflect comments/questions from Renato.

This new changeset is based on LLVM ToT from 02/14/2019.

Thanks for your answers, they make sense. Other than the mistake I made on disabling exp calls (you were right), looks fine.

lib/Analysis/TargetLibraryInfo.cpp
368 ↗(On Diff #185576)

My bad, I read it the other way round. Your original patch was correct.

steleman updated this revision to Diff 186993.Feb 15 2019, 4:17 AM

Reverted change in TargetLibraryInfo.cpp re: exp10|exp10f|exp10l on Linux with GLIBC.

These functions are now allowed on AArch64 or AArch64_be only.

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

include/llvm/Analysis/TargetLibraryInfo.def
315 ↗(On Diff #185576)

@bryanpkc What's your environment?

@steleman Any idea why you don't see that on your builds?

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

rengolin accepted this revision.Feb 15 2019, 5:51 AM

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

It seems like a build issue (or a different library issue), that we can fix after commit.

LGTM, thanks!

This revision is now accepted and ready to land.Feb 15 2019, 5:51 AM

Just did a rebuild after moving lgamma after isoc99. No warnings, no errors, and no difference in my test results ...

It seems like a build issue (or a different library issue), that we can fix after commit.

LGTM, thanks!

Thank you Sir! :-)

bryanpkc added a comment.EditedFeb 15 2019, 3:35 PM

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

The build was fine, but when running the compiler, this assertion in lib/Analysis/TargetLibraryInfo.cpp will fail:

static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
                       ArrayRef<StringRef> StandardNames) {
  // Verify that the StandardNames array is in alphabetical order.
  assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
                        [](StringRef LHS, StringRef RHS) {
                          return LHS < RHS;
                        }) &&
         "TargetLibraryInfoImpl function names must be sorted");

You probably don't see it because you are only building in Release mode.

P.S. The examples you listed are sorted (_ is less than a).

Thanks Stephan, now looks good.

Did you see the comment on lgamma coming after isoc99? I wonder if that's reproducible in all systems...

Yup I saw it. But I don't get any errors here (Ubuntu 16.04 / Ubuntu 18.something) AArch64.

And it's not clear to me how that sorted alphabetical order is supposed to work. There's a bunch of other functions that come after lgamma/tgamma that aren't sorted.

For example:

/// char * __strtok_r(char *s, const char *delim, char **save_ptr);
TLI_DEFINE_ENUM_INTERNAL(dunder_strtok_r)
TLI_DEFINE_STRING_INTERNAL("__strtok_r")
/// int abs(int j);
TLI_DEFINE_ENUM_INTERNAL(abs)
TLI_DEFINE_STRING_INTERNAL("abs")
/// int access(const char *path, int amode);
TLI_DEFINE_ENUM_INTERNAL(access)
TLI_DEFINE_STRING_INTERNAL("access")

The build was fine, but when running the compiler, this assertion in lib/Analysis/TargetLibraryInfo.cpp will fail:

static void initialize(TargetLibraryInfoImpl &TLI, const Triple &T,
                       ArrayRef<StringRef> StandardNames) {
  // Verify that the StandardNames array is in alphabetical order.
  assert(std::is_sorted(StandardNames.begin(), StandardNames.end(),
                        [](StringRef LHS, StringRef RHS) {
                          return LHS < RHS;
                        }) &&
         "TargetLibraryInfoImpl function names must be sorted");

You probably don't see it because you are only building in Release mode.

P.S. The examples you listed are sorted (_ is less than a).

I think the correct fix here is to use an associative container, not an array, and not rely on writing a text file containing sorted character strings.

std::set comes to mind.

@bryanpkc debug build makes sense, thanks! @steleman, I'd rather just order now and try to rector later. This patch has changed enough tunes already. :-)

steleman updated this revision to Diff 187131.Feb 15 2019, 10:41 PM

Corrected manual sorting in TargetLibraryInfo.def as per comments.

@bryanpkc debug build makes sense, thanks! @steleman, I'd rather just order now and try to rector later. This patch has changed enough tunes already. :-)

OK, changeset updated. Tested with Debug+Assertions+ExpensiveChecks.

Perfect, thanks!

steleman updated this revision to Diff 187507.Feb 19 2019, 7:53 PM

One minor update:

${top_srcdir}/unittests/Analysis/TargetLibraryInfoTest.cpp

needs an update to account for lgamma, tgamma, lgamma_r_finite and gamma_r_finite.

arsenm added a subscriber: arsenm.Feb 26 2019, 2:13 PM
arsenm added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

I would split adding the new intrinsics into a separate patch

steleman marked 2 inline comments as done.Feb 26 2019, 2:16 PM
steleman added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

That won't work.

echristo added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

"Why not" is always a good follow up to that sort of response :)

steleman marked 3 inline comments as done.Feb 26 2019, 4:03 PM
steleman added inline comments.
include/llvm/IR/Intrinsics.td
516–533 ↗(On Diff #187507)

Because:

  • It adds unnecessary complexity to this changeset and to D53928.
  • There is no obvious benefit to this added complexity.
  • There are many drawbacks to this added complexity.
  • Splitting it based on new vs. existing intrinsics seems arbitrary. One might as well ask for a separate changeset for each and every new intrinsic.

And, now it's my turn to ask:

  • Why should it be split?
  • What is the benefit of splitting it?
  • What will be accomplished by splitting it?
  • Is increased complexity - with all its drawbacks and pitfalls - a goal?
  • Does integrating it as is prevent, or hinder something else?
  • Why does this come up now, after this changeset - and its friend D53928 - have been reviewed for 4+ months, and already approved, and re-approved, and there are 21 subscribers to this changeset?

Hi @steleman, that wasn't a "loaded question", it was an honest one. Let me try to clarify what that means for all parties.

Because:

It adds unnecessary complexity to this changeset and to D53928.
There is no obvious benefit to this added complexity.
There are many drawbacks to this added complexity.

Every split or merge adds complexity, but when submitting patches upstream, you are required to explain not only what you've done, but why you haven't done some other way. This is the dynamics of upstream review and is not meant to push back or demerit, but to make sure all assumptions are shared and there is consensus.

A better answer would be:

  • because puling it apart would essentially create two very similar patch sets, possibly the second overwriting behaviour
  • because we would need to make sure the intermediate state is sane (to avoid crashes in bisects) and that means testing a state that doesn't make sense
  • because the final subset is too small, next to the refactoring needed, and the refactoring is part of "adding" the functionality

in addition to your own:

Splitting it based on new vs. existing intrinsics seems arbitrary. One might as well ask for a separate changeset for each and every new intrinsic.

To answer your questions...

And, now it's my turn to ask:

Why should it be split?
What is the benefit of splitting it?
What will be accomplished by splitting it?
Does integrating it as is prevent, or hinder something else?

Again, that was a standard question in upstream reviews, especially LLVM, as we have a more aggressive commit policy (more trunk breakages, more bisects, more reverts). For that to work, we must introduce the least amount of breakages, even in intermediate states, so they have to be small ans testable.

Furthermore, reverting the refactoring plus the actual change is more complicated than the actual change. If the breakage is in your usage of the refactoring, we can only revert that, leaving the refactoring, and being able to test the refactoring without the new behaviour and with, to find if the original assumption was broken, or if it was some silly mistake on the usage of it.

Is increased complexity - with all its drawbacks and pitfalls - a goal?

Don't assume ill intent, those questions are always worth asking and rarely meant as anything else other than double (or triple) checking consensus.

Why does this come up now, after this changeset - and its friend D53928 - have been reviewed for 4+ months, and already approved, and re-approved, and there are 21 subscribers to this changeset?

Again, another standard upstream behaviour (on Linux, GCC and others as well).

People are copied in all sorts of reviews, code owners even more. If we were to review everything that people copy us in straight away we wouldn't be able to do anything else, including our day jobs.

Matt's and Eric's comments here are very welcome, and I would appreciate their review on our comments about the design decisions. If they agree, we have more hope that this was the right thing all along. If they have more questions, proposals and change requests, we will be relieved that they caught something we didn't before users started relying on it.

It does take some time, but when it goes in, there's a higher probability of being right.

Given that this change (and its Clang counterpart) will add external user-visible behaviour that will have to be maintained for a long time, it's only right that more people chime in when they have time.

I'm glad Matt and Eric took the time before it went in.

cheers,
--renato

steleman marked an inline comment as done.Feb 27 2019, 5:06 AM

Why does this come up now, after this changeset - and its friend D53928 - have been reviewed for 4+ months, and already approved, and re-approved, and there are 21 subscribers to this changeset?

Again, another standard upstream behaviour (on Linux, GCC and others as well).

People are copied in all sorts of reviews, code owners even more. If we were to review everything that people copy us in straight away we wouldn't be able to do anything else, including our day jobs.

Matt's and Eric's comments here are very welcome, and I would appreciate their review on our comments about the design decisions. If they agree, we have more hope that this was the right thing all along. If they have more questions, proposals and change requests, we will be relieved that they caught something we didn't before users started relying on it.

It does take some time, but when it goes in, there's a higher probability of being right.

Given that this change (and its Clang counterpart) will add external user-visible behaviour that will have to be maintained for a long time, it's only right that more people chime in when they have time.

I'm glad Matt and Eric took the time before it went in.

Renato,

I don't have a problem with various people chiming in and offering comments and ideas.

I am, however, confused, about the concept of open-ended discussion that never reaches a terminal point, even though it would appear that it did.

I'm not sure what the proposed split would consist of. Would it be

  • A patch consisting of only the changes in Intrinsics.td which would be applied first. Wouldn't the changes in SelectionDAG be required as well?
  • Everything but the changes in TargetLibraryInfo.cpp etc. for supporting SLEEF and including any related test cases that would be upstreamed 2nd?

Would the second choice be useful by itself? Either in functionality or improved cross-platform testing before the SLEEF support went in?

I'm not sure what the proposed split would consist of.

I can see a few different concepts introduced here:

  • Lowering sin/cos/tan, etc and all related mechanical changes (large)
  • Adding the STRICT variants, with aliases on the lowering switches (medium)
  • Changing the log10 behaviour and code movement (small)
  • Adding SLEEF at the end, making use of all three patches above (tiny)

The lines between those concepts may be blurred, so it's never a clear cut, and trying to split the patch in that sequence may show problems I haven't consider.

But Git is quite helpful in splitting patches into a branch, and then testing the branch, and then one patch at a time.

From a high point of view, I can't see why any intermediate state would be harmful to builds or tests, given they only add unused code, which will be enabled by the last one.

I also imagine the tests can only work after the last patch is applied, and that's the main reason why I was ok with the patch being a single bulk.

The other reason is because there's a Clang patch, which needs to be applied after all these, and it may cause bisect issues to have too many depending patches across repos.

Hopefully the monorepo will help solve that problem, but we're not there yet.

I'm trying to understand what log10 behavior got changed. Could someone out where that change is?

I'm trying to understand what log10 behavior got changed. Could someone out where that change is?

Sorry, I mixed things up.

I was referring to the log10 order change in the list, but that was done *after* lgamma/tgamma was introduced, to build in the right order (in debug builds), so that was a change on gamma order, not log10.

Just ignore that item.

This revision was automatically updated to reflect the committed changes.