This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Adding IEEE-754 SIMD detection to loop vectorizer
ClosedPublic

Authored by rengolin on Apr 1 2016, 10:19 AM.

Details

Summary

Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.

This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or have enough allowance
flags supporting lack of precision (ex. -ffast-math, Darwin).

For that, the target description now has a method which tells us if the
vectorizer is allowed to handle FP math without falling into unsafe
representations, plus a check on every FP instruction in the candidate loop
to check for the safety flags.

This commit makes LLVM behave like GCC with respect to ARM NEON support, but
it stops short of fixing the underlying problem: sub-normals. Neither GCC
nor LLVM have a flag for allowing sub-normal operations. Before this patch,
GCC only allows it using unsafe-math flags and LLVM allows it by default with
no way to turn it off (short of not using NEON at all).

As a first step, we push this change to make it safe and in sync with GCC.
The second step is to discuss a new sub-normal's flag on both communitues
and come up with a common solution. The third step is to improve the FastMath
flags in LLVM to encode sub-normals and use those flags to restrict NEON FP.

Fixes PR16275.

Diff Detail

Event Timeline

rengolin updated this revision to Diff 52391.Apr 1 2016, 10:19 AM
rengolin retitled this revision from to [ARM] Adding IEEE-754 SIMD detection to loop vectorizer.
rengolin updated this object.
rengolin set the repository for this revision to rL LLVM.
rengolin added subscribers: llvm-commits, grosser.
rengolin updated this revision to Diff 52400.Apr 1 2016, 11:30 AM
rengolin removed rL LLVM as the repository for this revision.

Forgot to modify tests to cope with ARMv8 as well as clean it up a bit.

ashutosh.nema added inline comments.Apr 5 2016, 4:27 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4714

You may want to set unsafe flag for some of the intrinsics as well.
i.e. intrinsic like minnum, maxnum are floating point unsafe and
follows IEEE-754 semantics.

rengolin added inline comments.Apr 5 2016, 5:17 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4714

Hum, that's a good point.

But I don't want to be listing specific intrinsics if the list is not *very* short, or completely target independent.

Maybe I should just be safe and do that for all FP operations except move, load and store?

ashutosh.nema added inline comments.Apr 5 2016, 6:59 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4714

Doing this for all operations except move, load and store would be conservative, this might miss eligible cases for vectorization.

If we list, then list should be for the safe intrinsic(those can be vectorized) and for others will mark ‘setPotentiallyUnsafe’.

Also need to handle function which are later going to be mapped to intrinsic (i.e. fmax, fmaxf).

llvm::getIntrinsicIDForCall returns intrinsic ID for vectorizable CallInst, in case it does not found it returns not_intrinsic.

Probably you can write a wrapper around this function and check safety.

llvm::isTriviallyVectorizable has a list of vectorizable intrinsics, might help during listing.

rengolin added inline comments.Apr 5 2016, 7:54 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4714

Can you guarantee that whatever list you put up will be final? I'd rather be conservative and expect the user to explicitly specify -ffast-math or -fsubnormal-math than get it wrong in the future.

There's no point in knowing which functions are vectoriseable if we don't which one of them can be affected by the "unsafe" behaviour on each target. On ARM NEON, it's subnormals, on others may be something else.

When we get to the point of creating the subnormal's flag, we'll have also to come up with a table for: is this instruction / call safe on this target? We don't need that for now, as we're assuming nothing is safe, just like GCC.

It *will* inhibit eligible cases, and that's intentional.

rengolin updated this revision to Diff 52780.Apr 6 2016, 5:21 AM

Update from reviews:

  • Adding a FIXME comment to the vectorizer's check, make clear how big the hammer is
  • Being even more conservative, using all possible types of instructions (now including shifts and function calls, even those that will be builtins)
  • Changing the CHECK lines to use the legality message, not the final vectorization one, which can change over time and is irrelevant to the patch
  • Adding two new cases for fabs() with and without fast-math

Checked that the behaviour is consistent with GCC.

Hello,

I currently have an RFC for translating vector math intrinsics to svml calls. This proposal includes the user specifying the desired precision requirements via several flags (supported by the Intel compiler currently). The plan is to attach this information in the form of function attributes at the calls sites of the math intrinsics. In turn, these attributes drive the selection of the appropriate svml function variant. Would this be helpful in this particular case? I've included a description of the flags below.

-fimf-absolute-error=value[:funclist]

       define the maximum allowable absolute error for math library
       function results
         value    - a positive, floating-point number conforming to the
                    format [digits][.digits][{e|E}[sign]digits]
         funclist - optional comma separated list of one or more math
                    library functions to which the attribute should be
                    applied

-fimf-accuracy-bits=bits[:funclist]
       define the relative error, measured by the number of correct bits,
       for math library function results
         bits     - a positive, floating-point number
         funclist - optional comma separated list of one or more math
                    library functions to which the attribute should be
                    applied

-fimf-arch-consistency=value[:funclist]
       ensures that the math library functions produce consistent results
       across different implementations of the same architecture
         value    - true or false
         funclist - optional comma separated list of one or more math
                    library functions to which the attribute should be
                    applied

-fimf-max-error=ulps[:funclist]
       defines the maximum allowable relative error, measured in ulps, for
       math library function results
         ulps     - a positive, floating-point number conforming to the
                    format [digits][.digits][{e|E}[sign]digits]
         funclist - optional comma separated list of one or more math
                    library functions to which the attribute should be
                    applied

-fimf-precision=value[:funclist]
       defines the accuracy (precision) for math library functions
         value    - defined as one of the following values
                    high   - equivalent to max-error = 0.6
                    medium - equivalent to max-error = 4 (DEFAULT)
                    low    - equivalent to accuracy-bits = 11 (single
                             precision); accuracy-bits = 26 (double
                             precision)
         funclist - optional comma separated list of one or more math
                    library functions to which the attribute should be
                    applied

-fimf-domain-exclusion=classlist[:funclist]
       indicates the input arguments domain on which math functions
       must provide correct results.
        classlist - defined as one of the following values
                      nans, infinities, denormals, zeros
                      all, none, common
        funclist - optional list of one or more math library
                   functions to which the attribute should be applied.

Hi Matt,

I've seen your proposal, but I don't think it's relevant in this case. IEEE-753 compliance here is not about precision, but about behaviour.

Also, we already have flags to define most other behaviours from the standard, just not subnormals, so when we introduce the new flag, it'll be in the same group as all the others.

cheers,
--renato

@hfinkel, does this make sense? It's as close as I can get from GCC.

@grosser, is this what you expected?

This won't affect how we benchmark LLVM, since we already use -ffast-math when we don't care about precision anyway.

rengolin updated this revision to Diff 53076.Apr 8 2016, 1:18 PM

Forcing check-prefix=CHECK to make sure it gets the common case. I assumed it would by default, apparently, it doesn't.

hfinkel edited edge metadata.Apr 13 2016, 7:07 PM

Yes, this is essentially what I had in mind. Two comments:

include/llvm/Analysis/TargetTransformInfo.h
368

The name of this method, and the corresponding comment, seem potentially confusing (no pun intended ;) ). It does not explain which floating-point operations are potentially unsafe (essentially all of them?). I think it would be much better to name this:

bool isFPVectorizationPotentiallyUnsafe() const;

perhaps with this comment:

/// \brief Indicate that it is potentially unsafe to automatically vectorize floating-point operations because the semantics of vector and scalar floating-point semantics may differ. For example, ARM NEON v7 SIMD math does not support IEEE-754 denormal numbers, while depending on the platform, scalar floating-point math does.
lib/Transforms/Vectorize/LoopVectorize.cpp
4727

I don't understand why you have the (CI || it->isBinaryOp() || it->isCast() || it->isShift()) - why does it matter what kind of operation it is? And do we have any floating-point shifts? If you specifically want to include casts, do you also specifically need to pick up loads and stores?

rengolin added inline comments.Apr 14 2016, 3:55 AM
include/llvm/Analysis/TargetTransformInfo.h
368

I see, and then invert the logic to return "false" by default, as in "it's not unsafe to transform this operation in this target", but on v7 NEON return true because it "is unsafe to do unsafe FP on NEON". I'll change that, as I really hated my original name. :)

lib/Transforms/Vectorize/LoopVectorize.cpp
4727

We don't want to get loads and stores because they're normally safe anyway. That's why I'm using the "else if".

But also I don't want to get MOVs and shuffles either, since there's no chance they'll change the semantics or precision unless it's specifically requested (widening/shortening, type change, etc). So I just selected everything else. :)

Function calls and binary operations are obvious. Casts may change the type, so they may end up as the case above. I don't know of any FP shifts, so that was probably over the top.

I think it should be safe to do:

} else if (it->getType()->isFloatingPointTy() &&
             (CI || it->isBinaryOp()) &&
             !it->hasUnsafeAlgebra()) {
hfinkel added inline comments.Apr 14 2016, 7:18 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4727

That makes sense to me. We should add an associated note to the declaration of isFPVectorizationPotentiallyUnsafe indicating this. Something like this perhaps:

// This applies to floating-point math operations and calls, not memory operations, shuffles, or casts.
rengolin added inline comments.Apr 14 2016, 7:21 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4727

Will do, thanks!

rengolin accepted this revision.Apr 14 2016, 1:50 PM
rengolin added a reviewer: rengolin.

Thanks Hal,

I've applied the changes you suggested and committed as r266363.

I'll study a bit more on would be a good subnormal flag.

--renato

This revision is now accepted and ready to land.Apr 14 2016, 1:50 PM
rengolin closed this revision.Apr 14 2016, 1:50 PM

Sorry to have missed this before commit.

lib/Target/ARM/ARMTargetTransformInfo.h
58

What's the idea here? The FTZ behaviour doesn't change in AArch32 execution state on ARMv8-A, you're still unsafe. The full IEEE-compliant Advanced SIMD is in AArch64 execution state.

rengolin added inline comments.Apr 18 2016, 3:17 AM
lib/Target/ARM/ARMTargetTransformInfo.h
58

Hum. I've been wrongly advised on this one... Now reading the ARM ARM, I can see that this is the case. I'll change it to match AArch64 only. Thanks!