This is an archive of the discontinued LLVM Phabricator instance.


Differential D59335

[RFC] Enable vectorization on Neon even without fast-math
Abandoned · Public

Authored by sanjoy on Mar 13 2019, 3:09 PM.

Details

Reviewers

rengolin
tra

Summary

This patch introduces a "ftz" function attribute and uses that to enable
vectorization for ARM Neon when -ffast-math is not specified. It would be nicer
to encode FTZ as part of FastMathFlags but we've run out of space there.
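
For readers less familiar with the motivation, the following is a minimal sketch of why flush-to-zero (FTZ) matters here. It is not part of the patch: `flushToZero`, `mulIEEE`, and `mulFTZ` are hypothetical helpers that model an IEEE-754 scalar unit and a NEON-like unit that flushes subnormals. The only point is that the two can disagree on the same inputs, which, as I read the summary, is the difference a function marked "ftz" declares acceptable.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not an LLVM API): models a flush-to-zero unit by
// replacing any subnormal value with a zero of the same sign.
float flushToZero(float X) {
  return std::fpclassify(X) == FP_SUBNORMAL ? std::copysign(0.0f, X) : X;
}

// IEEE-754 style multiply: a subnormal result is kept as-is.
float mulIEEE(float A, float B) { return A * B; }

// FTZ style multiply: subnormal inputs and results are flushed to zero.
float mulFTZ(float A, float B) {
  return flushToZero(flushToZero(A) * flushToZero(B));
}

// Returns true when the two models disagree for the given operands,
// i.e. when vectorizing with an FTZ unit would change the result.
bool modelsDisagree(float A, float B) {
  float Scalar = mulIEEE(A, B);
  float Vector = mulFTZ(A, B);
  assert(!std::isnan(Scalar) && !std::isnan(Vector) && "sketch ignores NaNs");
  return Scalar != Vector;
}
```

With operands such as `1e-20f * 1e-20f`, the exact product is far below the smallest normal `float`, so `mulIEEE` yields a subnormal while `mulFTZ` yields zero. For ordinary magnitudes the two agree.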

If this approach looks workable, I'll change the NVPTX backend to also use this
(backend independent) ftz attribute instead of the custom "nvptx-f32ftz"
attribute. I'll also add an entry to the langref.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 29106
Build 29105: arc lint + arc unit

Event Timeline

sanjoy created this revision. Mar 13 2019, 3:09 PM

Herald added a project: Restricted Project. Mar 13 2019, 3:09 PM

Herald added subscribers: jdoerfert, bixia, kristof.beyls and 3 others.

Harbormaster completed remote builds in B29106: Diff 190517. Mar 13 2019, 3:11 PM

langref part seems to be missing.

It would be nicer to encode FTZ as part of FastMathFlags but we've run out of space there.

That recently happened with sanitizer, and backend (predicates? don't recall),
it shouldn't be too hard to fix properly.

langref part seems to be missing.

I'll add it if/once reviewers are okay with this approach.

That recently happened with sanitizer, and backend (predicates? don't recall),

By "That" you mean they too ran out of bits? In this specific case we're running out of bits in Value::SubclassOptionalData, I don't see a simple fix for it that doesn't involve increasing the size of llvm::Value. I could steal a bit or two from, say, the UseList pointer but I'm not sure that counts as simple.
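
To illustrate the bit budget being described, here is a small sketch. It assumes, as the comment above implies, that the per-value optional data field is seven bits wide and that the existing fast-math flags already occupy all of them. The flag names follow LLVM's fast-math flags as I understand them; `HypotheticalFTZ`, `ToyValue`, `fitsInOptionalData`, and `roundTrip` are made up for illustration and are not LLVM APIs.

```cpp
#include <cassert>

// Assumption: seven optional-data bits, modelling Value::SubclassOptionalData.
constexpr unsigned OptionalDataBits = 7;
constexpr unsigned OptionalDataMask = (1u << OptionalDataBits) - 1;

enum ToyFastMathFlag : unsigned {
  AllowReassoc = 1u << 0,
  NoNaNs = 1u << 1,
  NoInfs = 1u << 2,
  NoSignedZeros = 1u << 3,
  AllowReciprocal = 1u << 4,
  AllowContract = 1u << 5,
  ApproxFunc = 1u << 6,
  HypotheticalFTZ = 1u << 7, // not a real flag: the bit there is no room for
};

constexpr unsigned AllExisting = AllowReassoc | NoNaNs | NoInfs |
                                 NoSignedZeros | AllowReciprocal |
                                 AllowContract | ApproxFunc;

static_assert(AllExisting == OptionalDataMask,
              "the seven existing flags fill the field exactly");

// Toy stand-in for the storage; only the 7-bit field is modelled.
struct ToyValue {
  unsigned char OptionalData : 7;
};

// True when every set bit of Flags can be stored in the 7-bit field.
bool fitsInOptionalData(unsigned Flags) {
  return (Flags & ~OptionalDataMask) == 0;
}

// Stores Flags (masked to the field width) and returns what reads back.
unsigned roundTrip(unsigned Flags) {
  ToyValue V{};
  V.OptionalData = static_cast<unsigned char>(Flags & OptionalDataMask);
  return V.OptionalData;
}
```

Under these assumptions an eighth flag is silently lost on a round trip, which is why the patch uses a function attribute instead of a per-instruction flag.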

In D59335#1428404, @sanjoy wrote:

langref part seems to be missing.

I'll add it if/once reviewers are okay with this approach.

Actually, adding the langref changes to the patch helps reviewers understand the new semantics and makes the change atomic, guaranteeing that if the patch is reverted, so are the docs.

That recently happened with sanitizer, and backend (predicates? don't recall),

By "That" you mean they too ran out of bits? In this specific case we're running out of bits in Value::SubclassOptionalData, I don't see a simple fix for it that doesn't involve increasing the size of llvm::Value. I could steal a bit or two from, say, the UseList pointer but I'm not sure that counts as simple.

(Not for this patch, but) could you factor the flags out completely? It would trade space for computation and would probably need a new handler class, but it would be cleaner than having some flags in and others out.

I'm curious as to how we will generate these flags.

Is this the responsibility of the front-end, based on command line options, language standards, target-specific?

Or can some middle-end passes change that, too?

How do they propagate, and how does it merge with other fast-math flags?

I could not find a discussion about this on the list, it would be good to solve those before just adding another flag, especially when we already ran out of space. :)
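
One possible reading of the "factor the flags out" suggestion above is a side table keyed by instruction, so that flags no longer compete for bits inside the value itself. The sketch below is only that reading: `ToyInstruction` and `ToyFlagTable` are hypothetical names, not LLVM classes, and the thread does not say how such a handler would actually be designed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Stand-in for an IR instruction; only its address is used as a key.
struct ToyInstruction {};

// Hypothetical handler class: keeps flags outside the instruction, paying a
// hash lookup per query instead of spending bits in every value.
class ToyFlagTable {
public:
  // Records Flags for I; an all-zero mask removes the entry to save space.
  void set(const ToyInstruction *I, std::uint32_t Flags) {
    if (Flags == 0)
      Table.erase(I);
    else
      Table[I] = Flags;
  }

  // Returns the flags recorded for I, or 0 when none were recorded.
  std::uint32_t get(const ToyInstruction *I) const {
    auto It = Table.find(I);
    return It == Table.end() ? 0 : It->second;
  }

  // Must be called when an instruction is destroyed, otherwise a later
  // instruction allocated at the same address would inherit stale flags.
  void forget(const ToyInstruction *I) { Table.erase(I); }

  std::size_t size() const { return Table.size(); }

private:
  std::unordered_map<const ToyInstruction *, std::uint32_t> Table;
};
```

The open questions raised in the comment (who sets the flags, how they propagate and merge) apply equally to a table like this; the sketch only shows where the storage could live.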

In D59335#1431918, @rengolin wrote:

I'm curious as to how we will generate these flags.

Is this the responsibility of the front-end, based on command line options, language standards, target-specific?

Yes.

Or can some middle-end passes change that, too?

Yes, if they know what they're doing. :)

How do they propagate, and how does it merge with other fast-math flags?

I could not find a discussion about this on the list, it would be good to solve those before just adding another flag, especially when we already ran out of space. :)

Okay, I'll start an RFC on llvm-dev to avoid having a long discussion here.

sanjoy abandoned this revision. Jan 29 2022, 5:37 PM

Revision Contents

Path                                                      Size
include/llvm/Analysis/TargetTransformInfo.h               8 lines
include/llvm/Analysis/TargetTransformInfoImpl.h           2 lines
lib/Analysis/TargetTransformInfo.cpp                      5 lines
lib/Target/ARM/ARMTargetTransformInfo.h                   4 lines
lib/Transforms/Vectorize/LoopVectorize.cpp                4 lines
test/Transforms/LoopVectorize/ARM/arm-ieee-vectorize.ll   39 lines

Diff 190517

include/llvm/Analysis/TargetTransformInfo.h

@@ ... @@ public:

   /// Indicate that it is potentially unsafe to automatically vectorize
   /// floating-point operations because the semantics of vector and scalar
   /// floating-point semantics may differ. For example, ARM NEON v7 SIMD math
   /// does not support IEEE-754 denormal numbers, while depending on the
   /// platform, scalar floating-point math does.
   /// This applies to floating-point math operations and calls, not memory
   /// operations, shuffles, or casts.
-  bool isFPVectorizationPotentiallyUnsafe() const;
+  bool isFPVectorizationPotentiallyUnsafe(bool IsFTZEnabled) const;

   /// Determine if the target supports unaligned memory accesses.
   bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
                                       unsigned BitWidth, unsigned AddressSpace = 0,
                                       unsigned Alignment = 1,
                                       bool *Fast = nullptr) const;

   /// Return hardware support for population count.
@@ ... @@ public:
   virtual unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
                                                     unsigned VF) = 0;
   virtual bool supportsEfficientVectorElementLoadStore() = 0;
   virtual bool enableAggressiveInterleaving(bool LoopHasReductions) = 0;
   virtual const MemCmpExpansionOptions *enableMemCmpExpansion(
       bool IsZeroCmp) const = 0;
   virtual bool enableInterleavedAccessVectorization() = 0;
   virtual bool enableMaskedInterleavedAccessVectorization() = 0;
-  virtual bool isFPVectorizationPotentiallyUnsafe() = 0;
+  virtual bool isFPVectorizationPotentiallyUnsafe(bool IsFTZEnabled) = 0;
   virtual bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
                                               unsigned BitWidth,
                                               unsigned AddressSpace,
                                               unsigned Alignment,
                                               bool *Fast) = 0;
   virtual PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) = 0;
   virtual bool haveFastSqrt(Type *Ty) = 0;
   virtual bool isFCmpOrdCheaperThanFCmpZero(Type *Ty) = 0;
@@ ... @@ const MemCmpExpansionOptions *enableMemCmpExpansion(
     return Impl.enableMemCmpExpansion(IsZeroCmp);
   }
   bool enableInterleavedAccessVectorization() override {
     return Impl.enableInterleavedAccessVectorization();
   }
   bool enableMaskedInterleavedAccessVectorization() override {
     return Impl.enableMaskedInterleavedAccessVectorization();
   }
-  bool isFPVectorizationPotentiallyUnsafe() override {
-    return Impl.isFPVectorizationPotentiallyUnsafe();
+  bool isFPVectorizationPotentiallyUnsafe(bool IsFTZEnabled) override {
+    return Impl.isFPVectorizationPotentiallyUnsafe(IsFTZEnabled);
   }
   bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
                                       unsigned BitWidth, unsigned AddressSpace,
                                       unsigned Alignment, bool *Fast) override {
     return Impl.allowsMisalignedMemoryAccesses(Context, BitWidth, AddressSpace,
                                                Alignment, Fast);
   }
   PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) override {

include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ const TTI::MemCmpExpansionOptions *enableMemCmpExpansion(
       bool IsZeroCmp) const {
     return nullptr;
   }

   bool enableInterleavedAccessVectorization() { return false; }

   bool enableMaskedInterleavedAccessVectorization() { return false; }

-  bool isFPVectorizationPotentiallyUnsafe() { return false; }
+  bool isFPVectorizationPotentiallyUnsafe(bool IsFTZEnabled) { return false; }

   bool allowsMisalignedMemoryAccesses(LLVMContext &Context,
                                       unsigned BitWidth,
                                       unsigned AddressSpace,
                                       unsigned Alignment,
                                       bool *Fast) { return false; }

   TTI::PopcntSupportKind getPopcntSupport(unsigned IntTyWidthInBit) {

lib/Analysis/TargetTransformInfo.cpp

@@ ... @@
 bool TargetTransformInfo::enableInterleavedAccessVectorization() const {
   return TTIImpl->enableInterleavedAccessVectorization();
 }

 bool TargetTransformInfo::enableMaskedInterleavedAccessVectorization() const {
   return TTIImpl->enableMaskedInterleavedAccessVectorization();
 }

-bool TargetTransformInfo::isFPVectorizationPotentiallyUnsafe() const {
-  return TTIImpl->isFPVectorizationPotentiallyUnsafe();
+bool TargetTransformInfo::isFPVectorizationPotentiallyUnsafe(
+    bool IsFTZEnabled) const {
+  return TTIImpl->isFPVectorizationPotentiallyUnsafe(IsFTZEnabled);
 }

 bool TargetTransformInfo::allowsMisalignedMemoryAccesses(LLVMContext &Context,
                                                          unsigned BitWidth,
                                                          unsigned AddressSpace,
                                                          unsigned Alignment,
                                                          bool *Fast) const {
   return TTIImpl->allowsMisalignedMemoryAccesses(Context, BitWidth, AddressSpace,

lib/Target/ARM/ARMTargetTransformInfo.h

@@ ... @@ bool shouldFavorBackedgeIndex(const Loop *L) const {
     if (L->getHeader()->getParent()->optForSize())
       return false;
     return ST->isMClass() && ST->isThumb2() && L->getNumBlocks() == 1;
   }

   /// Floating-point computation using ARMv8 AArch32 Advanced
   /// SIMD instructions remains unchanged from ARMv7. Only AArch64 SIMD
   /// is IEEE-754 compliant, but it's not covered in this target.
-  bool isFPVectorizationPotentiallyUnsafe() {
-    return !ST->isTargetDarwin();
+  bool isFPVectorizationPotentiallyUnsafe(bool IsFTZEnabled) {
+    return !(IsFTZEnabled || ST->isTargetDarwin());
   }

   /// \name Scalar TTI Implementations
   /// @{

   int getIntImmCodeSizeCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
                             Type *Ty);

lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ LLVM_DEBUG(dbgs() << "LV: Can't vectorize when the NoImplicitFloat"
                          "attribute is used.\n");
     ORE->emit(createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(),
                                      "NoImplicitFloat", L)
               << "loop not vectorized due to NoImplicitFloat attribute");
     Hints.emitRemarkWithHints();
     return false;
   }

+  bool IsFTZEnabled = F->hasFnAttribute("ftz");
+
   // Check if the target supports potentially unsafe FP vectorization.
   // FIXME: Add a check for the type of safety issue (denormal, signaling)
   // for the target we're vectorizing for, to make sure none of the
   // additional fp-math flags can help.
   if (Hints.isPotentiallyUnsafe() &&
-      TTI->isFPVectorizationPotentiallyUnsafe()) {
+      TTI->isFPVectorizationPotentiallyUnsafe(IsFTZEnabled)) {
     LLVM_DEBUG(
         dbgs() << "LV: Potentially unsafe FP op prevents vectorization.\n");
     ORE->emit(
         createLVMissedAnalysis(Hints.vectorizeAnalysisPassName(), "UnsafeFP", L)
         << "loop not vectorized due to unsafe FP support.");
     Hints.emitRemarkWithHints();
     return false;
   }
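
Taken together, the ARM hook and the LoopVectorize.cpp check in this diff amount to a small truth table. The sketch below models it with plain structs so it can be compiled without LLVM; `ToyFunction`, `ToyARMTarget`, and `passesUnsafeFPGate` are hypothetical stand-ins, and `HintsPotentiallyUnsafe` is a simplification of whatever `Hints.isPotentiallyUnsafe()` computes in the real pass.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-in for llvm::Function, keeping only string attributes.
struct ToyFunction {
  std::set<std::string> Attrs;
  bool hasFnAttribute(const std::string &Kind) const {
    return Attrs.count(Kind) != 0;
  }
};

// Stand-in for the ARM TTI implementation touched by the patch.
struct ToyARMTarget {
  bool IsDarwin;
  // Same boolean expression as the patched ARM hook in this diff.
  bool isFPVectorizationPotentiallyUnsafe(bool IsFTZEnabled) const {
    return !(IsFTZEnabled || IsDarwin);
  }
};

// Returns true when the "unsafe FP" check would let vectorization proceed,
// false when the vectorizer would bail out with the UnsafeFP remark.
bool passesUnsafeFPGate(bool HintsPotentiallyUnsafe, const ToyFunction &F,
                        const ToyARMTarget &TTI) {
  bool IsFTZEnabled = F.hasFnAttribute("ftz");
  return !(HintsPotentiallyUnsafe &&
           TTI.isFPVectorizationPotentiallyUnsafe(IsFTZEnabled));
}
```

In this model a non-Darwin ARM target rejects a potentially unsafe FP loop unless the function carries "ftz", while Darwin accepts it either way, which matches the LINUX and DARWIN expectations in the test file.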

test/Transforms/LoopVectorize/ARM/arm-ieee-vectorize.ll

 ; RUN: opt -mtriple armv7-linux-gnueabihf -loop-vectorize -S %s -debug-only=loop-vectorize -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=LINUX
 ; RUN: opt -mtriple armv8-linux-gnu -loop-vectorize -S %s -debug-only=loop-vectorize -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=LINUX
 ; RUN: opt -mtriple armv7-unknwon-darwin -loop-vectorize -S %s -debug-only=loop-vectorize -o /dev/null 2>&1 | FileCheck %s --check-prefix=CHECK --check-prefix=DARWIN
 ; REQUIRES: asserts

 ; Testing the ability of the loop vectorizer to tell when SIMD is safe or not
 ; regarding IEEE 754 standard.
-; On Linux, we only want the vectorizer to work when -ffast-math flag is set,
-; because NEON is not IEEE compliant.
+; On Linux, we only want the vectorizer to work when -ffast-math flag is set or
+; when the function has an FTZ attribute, because NEON is not IEEE compliant.
 ; Darwin, on the other hand, doesn't support subnormals, and all optimizations
 ; are allowed, even without -ffast-math.

 ; Integer loops are always vectorizeable
 ; CHECK: Checking a loop in "sumi"
 ; CHECK: We can vectorize this loop!
 define void @sumi(i32* noalias nocapture readonly %A, i32* noalias nocapture readonly %B, i32* noalias nocapture %C, i32 %N) {
 entry:
@@ ... @@ for.body: ; preds = %entry, %for.body
   br i1 %exitcond, label %for.end, label %for.body

 for.end: ; preds = %for.body, %entry
   ret void
 }

 declare float @fabsf(float)

+; Floating-point loops need fast-math to be vectorizeable
+; LINUX: Checking a loop in "sumf_with_ftz"
+; LINUX-NOT: Potentially unsafe FP op prevents vectorization
+; DARWIN: Checking a loop in "sumf_with_ftz"
+; DARWIN-NOT: Potentially unsafe FP op prevents vectorization
+define void @sumf_with_ftz(float* noalias nocapture readonly %A, float* noalias nocapture readonly %B, float* noalias nocapture %C, i32 %N) #3 {
+entry:
+  %cmp5 = icmp eq i32 %N, 0
+  br i1 %cmp5, label %for.end, label %for.body.preheader
+
+for.body.preheader: ; preds = %entry
+  br label %for.body
+
+for.body: ; preds = %for.body.preheader, %for.body
+  %i.06 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
+  %arrayidx = getelementptr inbounds float, float* %A, i32 %i.06
+  %0 = load float, float* %arrayidx, align 4
+  %arrayidx1 = getelementptr inbounds float, float* %B, i32 %i.06
+  %1 = load float, float* %arrayidx1, align 4
+  %mul = fmul float %0, %1
+  %arrayidx2 = getelementptr inbounds float, float* %C, i32 %i.06
+  store float %mul, float* %arrayidx2, align 4
+  %inc = add nuw nsw i32 %i.06, 1
+  %exitcond = icmp eq i32 %inc, %N
+  br i1 %exitcond, label %for.end.loopexit, label %for.body
+
+for.end.loopexit: ; preds = %for.body
+  br label %for.end
+
+for.end: ; preds = %for.end.loopexit, %entry
+  ret void
+}
+
 attributes #1 = { nounwind readnone "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a8" "target-features"="+dsp,+neon,+vfp3" "unsafe-fp-math"="false" "use-soft-float"="false" }
 attributes #2 = { nounwind readnone "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a8" "target-features"="+dsp,+neon,+vfp3" "unsafe-fp-math"="true" "use-soft-float"="false" }
+attributes #3 = { "ftz" }