This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Adding IEEE-754 SIMD detection to loop vectorizer
AbandonedPublic

Authored by rengolin on Feb 11 2016, 7:14 AM.

Details

Summary

Some SIMD implementations are not IEEE-754 compliant, for example ARM's NEON.
This patch teaches the loop vectorizer to only allow transformations of loops
that either contain no floating-point operations or carry enough relaxed-FP
flags to permit the loss of precision, including -ffast-math and the other
NaN/Inf-related flags.

To that end, the target description now has a method that reports whether the
SIMD unit is IEEE-754 compliant, and the vectorizer checks every FP instruction
in the candidate loop for the relevant safety flags.

Fixes PR16275.
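A minimal sketch of the per-instruction check the summary describes, written against LLVM's public IR and analysis headers; the hook name hasIEEE754CompliantSIMD() and the exact set of flags required are illustrative assumptions, not the patch's actual code:

#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/Operator.h"
using namespace llvm;

// Illustrative only: refuse vectorization when the SIMD unit is not
// IEEE-754 compliant and an FP instruction lacks relaxed-FP flags.
static bool canVectorizeFPInstructions(Loop *L, const TargetTransformInfo &TTI) {
  if (TTI.hasIEEE754CompliantSIMD()) // hypothetical TTI hook
    return true;                     // The SIMD unit is safe regardless.
  for (BasicBlock *BB : L->blocks())
    for (Instruction &I : *BB)
      if (auto *FPOp = dyn_cast<FPMathOperator>(&I)) {
        // Without compliant SIMD, require the NaN/Inf relaxations that
        // -ffast-math (and friends) attach to each FP operation.
        FastMathFlags FMF = FPOp->getFastMathFlags();
        if (!FMF.noNaNs() || !FMF.noInfs())
          return false;
      }
  return true;
}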

Diff Detail

Event Timeline

rengolin updated this revision to Diff 47643.Feb 11 2016, 7:14 AM
rengolin retitled this revision from to [ARM] Adding IEEE-754 SIMD detection to loop vectorizer.
rengolin updated this object.
rengolin set the repository for this revision to rL LLVM.
rengolin added subscribers: llvm-commits, kristof.beyls, aadg.
rengolin added inline comments.Feb 11 2016, 7:20 AM
include/llvm/Analysis/TargetTransformInfo.h
373

An alternative to this would be to have an enum:

enum IEEE754Support {
  VFP = 0x1,
  SIMD = 0x2
};

and initialise all targets with 0x3, but ARM with 0x1.

Then to get the value, we do:

IsSIMDIEEE = getIEEE754Support() & IEEE754Support::SIMD;

But since this patch doesn't need that, I think we should do it later, when it's needed. Changes to the SLP vectorizer will probably require it, and I can change it when I get there.

anemet added a subscriber: anemet.Feb 11 2016, 10:04 AM

Thanks Renato for working on this.

lib/Transforms/Vectorize/LoopVectorize.cpp
4333

During reduction PHI identification, a floating-point min/max is only handled
when 'no-nans-fp-math' is ON. This condition probably needs to be modified as well.

RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isRecurrenceInstr(Instruction *I, RecurrenceKind Kind,
                                        InstDesc &Prev, bool HasFunNoNaNAttr) {
  // ...
  case Instruction::FCmp:
  case Instruction::ICmp:
  case Instruction::Select:
    if (Kind != RK_IntegerMinMax &&
        (!HasFunNoNaNAttr || Kind != RK_FloatMinMax))
      return InstDesc(false, I);
    return isMinMaxSelectCmpPattern(I, Prev);
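
For context on why the min/max reduction path is gated on 'no-nans-fp-math': the compare/select min pattern is not NaN-symmetric, so reordering the reduction can change which value survives. A small standalone example, not part of the patch:

#include <cmath>
#include <cstdio>

// "min" written as the compare/select pattern the vectorizer recognises.
static float selMin(float a, float b) { return a < b ? a : b; }

int main() {
  float nan = std::nanf("");
  // With a NaN involved, the result depends on operand order, so the
  // reduction is only safe to reorder when NaNs are assumed absent.
  std::printf("%f\n", selMin(nan, 1.0f)); // prints 1.000000 (a < b is false, picks b)
  std::printf("%f\n", selMin(1.0f, nan)); // prints nan      (a < b is false, picks b)
  return 0;
}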
rengolin added inline comments.Feb 12 2016, 5:37 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
4333

That would only stop reduction vectorization. The problem I'm trying to fix is not a *vectorization* issue, but an ISA one.

NEON can't be used for safe-maths operations under *any* circumstance, so I need to make sure that no loop with FP ops ever gets vectorized without -ffast-math. That includes reductions and inductions.

As you can see in my tests, I check that both behaviours are correct.

That's not correct for all platforms. Darwin, for example, uses NEON for floating point even in non-fast-math mode. The platform is simply defined not to support subnormals.

That's not correct for all platforms. Darwin, for example, uses NEON for floating point even in non-fast-math mode. The platform is simply defined not to support subnormals.

Well, NEON denormal behaviour is independent of the OS running underneath, so this is the correct information. But it may not be the best decision for Darwin, and that's a separate story.

Can you always add a fast-math flag on Darwin builds in the front-end? Or maybe have another CC1 option to control SIMD use?

I don't think adding isDarwin() to the TTI makes any sense.

cheers,
--renato

That's not correct for all platforms. Darwin, for example, uses NEON for floating point even in non-fast-math mode. The platform is simply defined not to support subnormals.

Well, NEON denormal behaviour is independent of the OS running underneath, so this is the correct information. But it may not be the best decision for Darwin, and that's a separate story.

It *does* depend on the OS running underneath, and I'm providing an example right here. This is comparable to the fact that many implementations do not provide sNaN support.

Can you always add a fast-math flag on Darwin builds in the front-end? Or maybe have another CC1 option to control SIMD use?

That's not an acceptable solution. Darwin's AArch32-specific documentation states that subnormals are not supported on the platform, but users can and do expect that code compiled without an explicit -ffast-math will respect other floating-point properties, such as not performing floating-point reassociation.

I don't think adding isDarwin() to the TTI makes any sense.

You can probably add a hook for "supportsSubnormals", but it'll end up being "!(isAArch32() && isDarwin())"
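
A rough sketch of what such a hook might look like, assuming it lives on TargetTransformInfo with a conservative default and is overridden in the ARM (AArch32) backend, where the subtarget already knows whether it targets Darwin; the method name and placement are assumptions, not the final interface:

// Generic TTI default: assume the FP unit handles subnormals.
bool supportsSubnormals() const { return true; }

// Hypothetical ARM (AArch32) override: Darwin maps all FP onto NEON,
// which flushes subnormals to zero, so report no subnormal support there.
bool ARMTTIImpl::supportsSubnormals() const {
  return !ST->isTargetDarwin();
}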

It *does* depend on the OS running underneath, and I'm providing an example right here. This is comparable to the fact that many implementations do not provide sNaN support.

I'm not sure I understand. Are you talking about Apple's implementation of the SIMD unit, or about the OS setting up some different flags?

AFAIK, you can't "turn on" IEEE compliance on NEON with hardware flags. So, accepting non-IEEE-compliant code on Darwin (the OS?) becomes a platform decision, not a hardware requirement. This flag is about the hardware requirement.

That's not an acceptable solution. Darwin's AArch32-specific documentation states that subnormals are not supported on the platform, but users can and do expect that code compiled without an explicit -ffast-math will respect other floating-point properties, such as not performing floating-point reassociation.

Now I'm even more confused. The purpose of fast-math is to relax the FP model, not restrict it. I'm not relaxing more than before, I'm restricting it more.

If Darwin doesn't support subnormals, then how can it use NEON? How can it use the vectorizer as it is, which produces NEON instructions?

It *does* depend on the OS running underneath, and I'm providing an example right here. This is comparable to the fact that many implementations do not provide sNaN support.

I'm not sure I understand. Are you talking about Apple's implementation of the SIMD unit, or about the OS setting up some different flags?

AFAIK, you can't "turn on" IEEE compliance on NEON with hardware flags. So, accepting non-IEEE-compliant code on Darwin (the OS?) becomes a platform decision, not a hardware requirement. This flag is about the hardware requirement.

Darwin, as a policy, maps all floating point onto the NEON unit. As you point out, this discards some elements of IEEE conformance, with subnormal support being the one that is typically brought up. Darwin documents this as the correct behavior of the platform, even without -ffast-math.

Now I'm even more confused. The purpose of fast-math is to relax the FP model, not restrict it. I'm not relaxing more than before, I'm restricting it more.

Yes, and restricting it in the way you propose will introduce significant regressions on Darwin. Darwin wants to perform floating point vectorization, even without -ffast-math, provided that the actual constraints like no reassociation are respected. In other words, the reasoning you present here that the NEON unit can only be used with -ffast-math is not true for Darwin.

Darwin, as a policy, maps all floating point onto the NEON unit. As you point out, this discards some elements of IEEE conformance, with subnormal support being the one that is typically brought up. Darwin documents this as the correct behavior of the platform, even without -ffast-math.

Ok, a platform choice, not hardware support.

I understand that enabling fast-math would break other promises, so I agree that an extra subnormal-support property on the target, dependent on the platform, is needed.

I'll work on that.

Thanks!
--renato

rengolin updated this revision to Diff 48812.Feb 23 2016, 6:17 AM
rengolin removed rL LLVM as the repository for this revision.

Adding supportsSubnormal() call to make sure we always optimise on Darwin.

rengolin marked an inline comment as done.Feb 23 2016, 9:25 AM
hfinkel added inline comments.Feb 23 2016, 11:46 AM
include/llvm/Analysis/TargetTransformInfo.h
373

What does one thing have to do with the other (i.e. what does IEEE floating point have to do with allowing fast-math)? The underlying issue is that, without fast-math, the numeric representation, and the operations on numbers in that representation, should be the same. fast-math allows the use of alternate representations and operations (so long as they're not too different), but also allows reassociation. To allow vectorizing reductions, we need reassociation as well (which is a separate matter from the potential change in operational semantics).
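
To make the reassociation point concrete, a small standalone example (not part of the patch) showing why a vectorized reduction, which reorders the additions, is not value-safe without a reassociation allowance:

#include <cstdio>

int main() {
  // Floating-point addition is not associative: reordering a reduction,
  // which vectorization requires, can change the result.
  float a = 1e20f, b = -1e20f, c = 1.0f;
  std::printf("%f\n", (a + b) + c); // prints 1.000000
  std::printf("%f\n", a + (b + c)); // prints 0.000000 (c is absorbed into b)
  return 0;
}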

lib/Target/ARM/ARMTargetTransformInfo.h
59

This comment seems misleading. "fast math" normally implies allowing things like reassociation (and lack of NaNs, etc.), and that has nothing to do with subnormals.

lib/Transforms/Vectorize/LoopVectorize.cpp
1859

This knowledge should live in the frontend. You can add a proper diagnostic-information subclass, and then catch it in the frontend to produce frontend-specific information.

resistor added inline comments.Feb 23 2016, 11:54 AM
include/llvm/Analysis/TargetTransformInfo.h
373

To pile on a bit, it's not just about SIMD. Darwin uses NEON for scalar floating point as well, rather than VFP.

rengolin added inline comments.Feb 23 2016, 12:49 PM
include/llvm/Analysis/TargetTransformInfo.h
373

I need to make this abundantly clear: this is not about reductions. Just because fast-math is used to enable reductions does not mean it has *only* to do with reductions.

The problem is that IEEE 754 states that *any* transformation *has* to preserve the semantics of the original code. This applies to CSE, strength reduction, vectorization, inlining, etc. So even when we're not dealing with reductions, we still care about IEEE compliance.

The flag -ffast-math acts as a collection of flags related to rounding, zeroes, infinities, NaNs, etc. There are many ways to disable specific guarantees of the IEEE standard, but fast-math disables all of them. This is an exaggeration, but it's also safer. Reducing it to the most localised flag is an optimisation. Disabling it for the general case is a correctness change.

Why SIMD?

Because the loop vectorizer in particular only uses SIMD instructions, and this is a change in the loop vectorizer. There will be additional SLP changes, but this particular change is only related to the loop vectorizer. One thing at a time.

It just happens that ARMv7's SIMD is not IEEE compliant, so I need to detect it and avoid vectorization. I'll need to do the same thing on the SLP vectorizer as well, and only allow VFP instructions. The loop vectorizer does not use VFP instructions, so we should be safe.

Other alternatives were discussed (like vectorizing here, but scalarizing later), but that would lead to bad predictions and likely bad performance, and it's not worth the effort.

Darwin can continue to use NEON for scalar FP without an issue; this is *JUST* for the loop vectorizer and will make no difference at all on Darwin.

The newly added flags are just informational. The decisions are left for the passes to do.
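
To illustrate the earlier point that the IEEE guarantees can be relaxed individually rather than all at once, here is a standalone example (not related to the patch) of a transformation that needs only one specific relaxation: folding x + 0.0 to x requires only the no-signed-zeros assumption, not the whole of -ffast-math:

#include <cmath>
#include <cstdio>

int main() {
  double x = -0.0;
  // x + 0.0 yields +0.0, not -0.0, so folding "x + 0.0" to "x" flips the
  // sign bit of the result. The fold is justified by the no-signed-zeros
  // relaxation alone, without any of the other fast-math assumptions.
  std::printf("%g %g\n", x, x + 0.0);                                       // -0 0
  std::printf("%d %d\n", (int)std::signbit(x), (int)std::signbit(x + 0.0)); // 1 0
  return 0;
}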

rengolin added inline comments.Feb 23 2016, 12:51 PM
include/llvm/Analysis/TargetTransformInfo.h
373

To pile on a bit, it's not just about SIMD. Darwin uses NEON for scalar floating point as well, rather than VFP.

To be specific, SIMD is the name of the hardware unit, not what you do with it, so this comment is not relevant to the discussion. ARMv7's SIMD will never be IEEE compliant, no matter what you do with it, and that's what the method is conveying.

rengolin added inline comments.Feb 24 2016, 10:04 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
1859

Which knowledge?

AFAICS, SIMD IEEE compliance should not live in the front-end, and neither should Darwin's compliance.

The way I'm handling this in the loop vectorizer is the same as for other restrictions, and it shouldn't affect SLP or other transformations, so moving this to the front-end would impact everything, not just this specific case.

hfinkel added inline comments.Feb 24 2016, 2:49 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1859

The knowledge of which front-end flags (-ffast-math) and pragmas can change the behavior belongs in the frontend. The messages we send from the backend are often strings, but they can be message subclasses that the frontend can interpret to provide frontend-specific information along with that provided by the backend. (I'll happily provide more details if you'd like, when I'm not working off my phone.)

rengolin added inline comments.Feb 25 2016, 9:39 AM
lib/Target/ARM/ARMTargetTransformInfo.h
59

Indeed, I'll remove this comment.

lib/Transforms/Vectorize/LoopVectorize.cpp
1859

Right, I think we're going off track here. I was trying to be helpful, but I can change it to just say that the SIMD unit is not IEEE compliant and that we can't vectorize unless the user requests floating-point relaxation, and leave -ffast-math as an exercise for the reader.

Is that what you meant?

rengolin updated this revision to Diff 49082.Feb 25 2016, 9:49 AM

Removing fast-math comments and debug messages.

rengolin updated this revision to Diff 49083.Feb 25 2016, 9:51 AM
hfinkel added inline comments.Feb 26 2016, 6:30 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
1859

SGTM

rengolin abandoned this revision.Mar 23 2016, 4:56 AM

I'll work on a different approach.

Drive-by comment:

test/Transforms/LoopVectorize/ARM/arm-ieee-vectorize.ll
2

Do we really need O2 here?

rengolin added inline comments.Mar 23 2016, 12:54 PM
test/Transforms/LoopVectorize/ARM/arm-ieee-vectorize.ll
2

Not really, laziness on my part. I'll just add the necessary flags in my next iteration.