Diff 346675

llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h

(72 unchanged lines not shown)	public:
}		}

InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,		InstructionCost getGatherScatterOpCost(unsigned Opcode, Type *DataTy,
const Value *Ptr, bool VariableMask,		const Value *Ptr, bool VariableMask,
Align Alignment,		Align Alignment,
TTI::TargetCostKind CostKind,		TTI::TargetCostKind CostKind,
const Instruction *I);		const Instruction *I);

bool isLegalElementTypeForRVV(Type *ScalarTy) {		bool isLegalElementTypeForRVV(Type *ScalarTy) const {
if (ScalarTy->isPointerTy())		if (ScalarTy->isPointerTy())
return true;		return true;

if (ScalarTy->isIntegerTy(8) || ScalarTy->isIntegerTy(16) ||		if (ScalarTy->isIntegerTy(8) || ScalarTy->isIntegerTy(16) ||
ScalarTy->isIntegerTy(32) || ScalarTy->isIntegerTy(64))		ScalarTy->isIntegerTy(32) || ScalarTy->isIntegerTy(64))
return true;		return true;

if (ScalarTy->isHalfTy())		if (ScalarTy->isHalfTy())
(32 unchanged lines not shown)	bool isLegalMaskedGatherScatter(Type *DataType, Align Alignment) {
if (isa<FixedVectorType>(DataType) && ST->getMinRVVVectorSizeInBits() == 0)		if (isa<FixedVectorType>(DataType) && ST->getMinRVVVectorSizeInBits() == 0)
return false;		return false;

return isLegalElementTypeForRVV(DataType->getScalarType());		return isLegalElementTypeForRVV(DataType->getScalarType());
}		}

bool isLegalMaskedGather(Type *DataType, Align Alignment) {		bool isLegalMaskedGather(Type *DataType, Align Alignment) {
return isLegalMaskedGatherScatter(DataType, Alignment);		return isLegalMaskedGatherScatter(DataType, Alignment);
}		}
		craig.topper: There should be a blank line above this.
bool isLegalMaskedScatter(Type *DataType, Align Alignment) {		bool isLegalMaskedScatter(Type *DataType, Align Alignment) {
return isLegalMaskedGatherScatter(DataType, Alignment);		return isLegalMaskedGatherScatter(DataType, Alignment);
		frasercrmck: We support the fixed-length vector reductions. Can we add support for that too?
		luke957: Can we just return true when VF is not scalable to support fixed-length vector reductions?
}		}
		craig.topper: This is returning true for not scalable. Is that saying that any fixed length reduction is supported?
		luke957: I understand returning true for not scalable is just to make canVectorizeReductions() in LoopVectorizer.cpp return the right value. There will be other checks after canVectorizeReductions() returns.
		craig.topper: Let me rephrase a little. You're returning true for fixed vectors, but not checking element type or opcode or hasStdExtV. Does that mean the vectorizer will start generating reductions for vectors of i128, or reductions of floats when the F extension isn't enabled?
		dmgreen: I believe the "legal" in this function really means "are you going to crash if there is a reduction of this type". Normal non-scalable reductions can always be legalized to something, even if that means expanding or scalarizing or converting to soft float. The standard cost modeling then kicks in to say whether it is actually a good idea, i.e. work the same as X86/Arm/etc. (But, because they are outside the loop, the vectorizer doesn't account for them directly, only the arithmetic instructions that will be in the loop. The assumption is that what is in the loop will dominate performance.)
		luke957: I think so. But as StdExtV is optional for RISC-V, a hasStdExtV() check might still be needed.

/// \returns How the target needs this vector-predicated operation to be		/// \returns How the target needs this vector-predicated operation to be
		frasercrmck: I'm wondering if returning `true` for fixed-length vectors, even if correct (i.e. not crashing), is likely to produce worse code. Will it trick the cost modeling into producing code which we'll then expand during legalization, and increase register pressure and the likelihood of spilling?
		craig.topper: For most reductions the expansion should be log2(elements) SingleSrcPermute shuffles and binops to reduce elements by half each step. So it shouldn't increase the register pressure much since you just need 2 registers. This should match the default cost model for getArithmeticReductionCost/getMinMaxReductionCost. That isn't what happens for ordered floating point reduction though, but it also doesn't look like this function is told that it is an ordered reduction. It doesn't look like getArithmeticReductionCost gets told either. Maybe we only vectorize to unordered?
		craig.topper: I think the ordered property is in RecurrenceDescriptor after https://reviews.llvm.org/D98435. But getArithmeticReductionCost/getMinMaxReductionCost don't receive the RecurrenceDescriptor.
		frasercrmck: Yeah fair enough, that sounds fine then. Part of me also finds it weird that we'd say it's legal to generate a fixed-length fmin/fmax/mul reduction even though we'd expand it 100% of the time. But perhaps it's just that this API's job isn't perfectly clear.
		luke957: I agree the API's name is weird.
		luke957: The ordered property is added to RecurrenceDescriptor in https://reviews.llvm.org/D98435, while in-order reduction support is off by default and controlled with the `-enable-strict-reductions` flag. Would ordered floating point reduction be fixed in function InnerLoopVectorizer::fixVectorizedLoop() if the `-enable-strict-reductions` flag is used?
/// transformed.		/// transformed.
TargetTransformInfo::VPLegalization		TargetTransformInfo::VPLegalization
getVPLegalizationStrategy(const VPIntrinsic &PI) const {		getVPLegalizationStrategy(const VPIntrinsic &PI) const {
using VPLegalization = TargetTransformInfo::VPLegalization;		using VPLegalization = TargetTransformInfo::VPLegalization;
return VPLegalization(VPLegalization::Legal, VPLegalization::Legal);		return VPLegalization(VPLegalization::Legal, VPLegalization::Legal);
}		}
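A minimal sketch, in plain C++ rather than LLVM code, of the two behaviors discussed in the thread above: the unordered expansion craig.topper describes, where each of log2(N) steps folds the upper half of the vector onto the lower half (one single-source permute plus one binop), and an ordered reduction that accumulates strictly left to right. The helper names `treeReduce` and `orderedReduce` and the sample values are illustrative assumptions, and the sketch assumes IEEE single-precision arithmetic without excess precision.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unordered reduction expanded as a log2(N) tree: each step combines lane i
// with lane i + half, modeling one single-source shuffle plus one binop.
// Returns the reduced value and stores the number of halving steps in *steps.
template <typename T, typename BinOp>
T treeReduce(std::vector<T> v, BinOp op, int *steps) {
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0 &&
         "vector length must be a power of two");
  *steps = 0;
  for (std::size_t half = v.size() / 2; half > 0; half /= 2) {
    for (std::size_t i = 0; i < half; ++i)
      v[i] = op(v[i], v[i + half]);
    ++*steps;
  }
  return v[0];
}

// Ordered (strict) floating-point reduction: the start value first, then
// each lane in order, with no reassociation.
float orderedReduce(float start, const std::vector<float> &v) {
  float acc = start;
  for (float x : v)
    acc += x;
  return acc;
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f}, the ordered sum is 1.0f (the first 1.0f is absorbed by 1e8f), while the tree order gives 2.0f. This is why an ordered floating-point reduction cannot simply reuse the halving expansion.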

		bool isLegalToVectorizeReduction(RecurrenceDescriptor RdxDesc,
		ElementCount VF) const {
		if (!ST->hasStdExtV())
		return false;

		if (!VF.isScalable())
		return true;

		Type *Ty = RdxDesc.getRecurrenceType();
		frasercrmck: We don't support the fmin/fmax reductions yet. I suspect this would crash in the backend?
		luke957: The added test case contains an fmin case. With opt -loop-vectorize, it does not seem to crash.
		craig.topper: The crash would be in llc in SelectionDAG, not opt.
		luke957: Silly of me. Add FIXME for fmin/fmax.
		if (!isLegalElementTypeForRVV(Ty))
		return false;
		craig.topper: Please wrap this to 80 columns.
		luke957: Fixed.

		frasercrmck: You can update this now that D101518 went through.
		luke957: Updated. Thanks for your work.
		switch (RdxDesc.getRecurrenceKind()) {
		case RecurKind::Add:
		case RecurKind::FAdd:
		case RecurKind::And:
		case RecurKind::Or:
		case RecurKind::Xor:
		case RecurKind::SMin:
		case RecurKind::SMax:
		case RecurKind::UMin:
		case RecurKind::UMax:
		case RecurKind::FMin:
		case RecurKind::FMax:
		return true;
		default:
		return false;
		}
		}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_LIB_TARGET_RISCV_RISCVTARGETTRANSFORMINFO_H		#endif // LLVM_LIB_TARGET_RISCV_RISCVTARGETTRANSFORMINFO_H
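To make the decision table of `isLegalToVectorizeReduction` easy to exercise outside LLVM, here is a self-contained model of it. Everything in it is a hypothetical stand-in: `Kind` mirrors the `RecurKind` enumerators the switch mentions (plus `Mul` as an example of the default case), `Elem` replaces `Type *` with a few element types, and the subtarget and `ElementCount` are reduced to booleans. Only the integer and pointer part of `isLegalElementTypeForRVV` is visible in this diff; treating `float` as legal only with the F extension is an assumption based on the elided branch and the `+f` in the test's RUN line. The model reproduces this revision's behavior, including reporting FMin/FMax as legal.

```cpp
#include <cassert>

// Hypothetical stand-ins for the LLVM types used by the hook.
enum class Kind { Add, Mul, FAdd, And, Or, Xor, SMin, SMax, UMin, UMax, FMin, FMax };
enum class Elem { Ptr, I8, I16, I32, I64, I128, F32, BF16 };

struct Subtarget {
  bool hasV; // stands in for ST->hasStdExtV()
  bool hasF; // assumption: float elements require the F extension
};

// Models the visible part of isLegalElementTypeForRVV, plus the assumed
// F-extension check for float.
bool isLegalElem(const Subtarget &st, Elem e) {
  switch (e) {
  case Elem::Ptr:
  case Elem::I8:
  case Elem::I16:
  case Elem::I32:
  case Elem::I64:
    return true;
  case Elem::F32:
    return st.hasF;
  default:
    return false; // e.g. i128 and bfloat
  }
}

// Mirrors the control flow of isLegalToVectorizeReduction in this revision.
bool isLegalToVectorizeReduction(const Subtarget &st, Kind k, Elem e,
                                 bool scalableVF) {
  if (!st.hasV)
    return false;
  // Fixed-length reductions can always be legalized (possibly by expansion),
  // so they are reported legal; cost modeling decides profitability.
  if (!scalableVF)
    return true;
  if (!isLegalElem(st, e))
    return false;
  switch (k) {
  case Kind::Add:
  case Kind::FAdd:
  case Kind::And:
  case Kind::Or:
  case Kind::Xor:
  case Kind::SMin:
  case Kind::SMax:
  case Kind::UMin:
  case Kind::UMax:
  case Kind::FMin: // reported legal here, although the review suspects
  case Kind::FMax: // SelectionDAG cannot yet lower these for scalable VFs
    return true;
  default:
    return false;
  }
}
```

Under this model, the mul and bfloat fadd cases are rejected only for scalable VFs, which is consistent with those tests falling back to fixed-width vectorization.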

llvm/test/Transforms/LoopVectorize/RISCV/scalable-reductions.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -scalable-vectorization=on \
				; RUN: -riscv-v-vector-bits-min=128 -riscv-v-vector-bits-max=128 \
				; RUN: -pass-remarks=loop-vectorize -pass-remarks-analysis=loop-vectorize \
				; RUN: -pass-remarks-missed=loop-vectorize -mtriple riscv64-linux-gnu \
				; RUN: -mattr=+experimental-v,+f -S 2>%t | FileCheck %s -check-prefix=CHECK
				; RUN: cat %t | FileCheck %s -check-prefix=CHECK-REMARK

				; Reduction can be vectorized

				; ADD

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define i32 @add(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @add
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[ADD1:.*]] = add <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[ADD2:.*]] = add <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ADD:.*]] = add <vscale x 8 x i32> %[[ADD2]], %[[ADD1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.add.nxv8i32(<vscale x 8 x i32> %[[ADD]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret i32 %add
				}
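The CHECK lines for @add follow the usual shape of an interleaved reduction: with interleave count 2 the vector body keeps two partial accumulators (ADD1, ADD2), middle.block combines them with one more vector add, and a single llvm.vector.reduce.add call produces the scalar. Below is a simplified scalar model of that shape, assuming a fixed lane count of 8 (vscale = 1) and a scalar epilogue for leftover iterations. Seeding lane 0 of the first accumulator with the start value is one valid choice, not necessarily what the vectorizer emits; `interleavedAdd`, `kLanes`, and `kInterleave` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 8;      // assumed: vscale x 8 with vscale == 1
constexpr int kInterleave = 2; // matches llvm.loop.interleave.count 2

using Vec = std::array<int32_t, kLanes>;

// start + sum of a[0..n), computed the way an interleaved vector loop would:
// two accumulators per iteration, combined once after the loop.
int32_t interleavedAdd(const int32_t *a, int64_t n, int32_t start) {
  Vec acc1{}; // part 0: lane 0 carries the start value, other lanes are 0
  Vec acc2{}; // part 1: identity (0) in every lane
  acc1[0] = start;
  const int64_t step = int64_t{kLanes} * kInterleave;
  int64_t i = 0;
  for (; i + step <= n; i += step) { // vector.body
    for (int l = 0; l < kLanes; ++l) {
      acc1[l] += a[i + l];
      acc2[l] += a[i + kLanes + l];
    }
  }
  Vec combined{}; // middle.block: add ADD2, ADD1
  for (int l = 0; l < kLanes; ++l)
    combined[l] = acc2[l] + acc1[l];
  int32_t sum = 0; // llvm.vector.reduce.add
  for (int32_t x : combined)
    sum += x;
  for (; i < n; ++i) // scalar epilogue for the remaining iterations
    sum += a[i];
  return sum;
}
```

The other integer tests have the same structure; only the combining operation (and, for smin/umax, the compare-plus-select idiom) changes.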

				; OR

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define i32 @or(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @or
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[OR1:.*]] = or <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[OR2:.*]] = or <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[OR:.*]] = or <vscale x 8 x i32> %[[OR2]], %[[OR1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.or.nxv8i32(<vscale x 8 x i32> %[[OR]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %or, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%or = or i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret i32 %or
				}

				; AND

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define i32 @and(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @and
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[AND1:.*]] = and <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[AND2:.*]] = and <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[AND:.*]] = and <vscale x 8 x i32> %[[AND2]], %[[AND1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.and.nxv8i32(<vscale x 8 x i32> %[[AND]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %and, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%and = and i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret i32 %and
				}

				; XOR

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define i32 @xor(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @xor
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[XOR1:.*]] = xor <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[XOR2:.*]] = xor <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[XOR:.*]] = xor <vscale x 8 x i32> %[[XOR2]], %[[XOR1]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.xor.nxv8i32(<vscale x 8 x i32> %[[XOR]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %xor, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%xor = xor i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret i32 %xor
				}

				; SMIN

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define i32 @smin(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @smin
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[ICMP1:.*]] = icmp slt <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[ICMP2:.*]] = icmp slt <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[ICMP1]], <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[ICMP2]], <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ICMP:.*]] = icmp slt <vscale x 8 x i32> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select <vscale x 8 x i1> %[[ICMP]], <vscale x 8 x i32> %[[SEL1]], <vscale x 8 x i32> %[[SEL2]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.smin.nxv8i32(<vscale x 8 x i32> %[[SEL]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.010 = phi i32 [ 2, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp.i = icmp slt i32 %0, %sum.010
				%.sroa.speculated = select i1 %cmp.i, i32 %0, i32 %sum.010
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret i32 %.sroa.speculated
				}

				; UMAX

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define i32 @umax(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @umax
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x i32>
				; CHECK: %[[ICMP1:.*]] = icmp ugt <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[ICMP2:.*]] = icmp ugt <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[ICMP1]], <vscale x 8 x i32> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[ICMP2]], <vscale x 8 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ICMP:.*]] = icmp ugt <vscale x 8 x i32> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select <vscale x 8 x i1> %[[ICMP]], <vscale x 8 x i32> %[[SEL1]], <vscale x 8 x i32> %[[SEL2]]
				; CHECK-NEXT: call i32 @llvm.vector.reduce.umax.nxv8i32(<vscale x 8 x i32> %[[SEL]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.010 = phi i32 [ 2, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%cmp.i = icmp ugt i32 %0, %sum.010
				%.sroa.speculated = select i1 %cmp.i, i32 %0, i32 %sum.010
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret i32 %.sroa.speculated
				}

				; FADD (FAST)

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define float @fadd_fast(float* noalias nocapture readonly %a, i64 %n) {
				; CHECK-LABEL: @fadd_fast
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x float>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x float>
				; CHECK: %[[ADD1:.*]] = fadd fast <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[ADD2:.*]] = fadd fast <vscale x 8 x float> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[ADD:.*]] = fadd fast <vscale x 8 x float> %[[ADD2]], %[[ADD1]]
				; CHECK-NEXT: call fast float @llvm.vector.reduce.fadd.nxv8f32(float -0.000000e+00, <vscale x 8 x float> %[[ADD]])
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%add = fadd fast float %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret float %add
				}

				; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
				; CHECK-REMARK: vectorized loop (vectorization width: 8, interleaved count: 2)
				define bfloat @fadd_fast_bfloat(bfloat* noalias nocapture readonly %a, i64 %n) {
				; CHECK-LABEL: @fadd_fast_bfloat
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <8 x bfloat>
				; CHECK: %[[LOAD2:.*]] = load <8 x bfloat>
				; CHECK: %[[FADD1:.*]] = fadd fast <8 x bfloat> %[[LOAD1]]
				; CHECK: %[[FADD2:.*]] = fadd fast <8 x bfloat> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[RDX:.*]] = fadd fast <8 x bfloat> %[[FADD2]], %[[FADD1]]
				; CHECK: call fast bfloat @llvm.vector.reduce.fadd.v8bf16(bfloat 0xR8000, <8 x bfloat> %[[RDX]])
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi bfloat [ 0.000000e+00, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds bfloat, bfloat* %a, i64 %iv
				%0 = load bfloat, bfloat* %arrayidx, align 4
				%add = fadd fast bfloat %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret bfloat %add
				}

				; FMIN (FAST)

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define float @fmin_fast(float* noalias nocapture readonly %a, i64 %n) #0 {
				; CHECK-LABEL: @fmin_fast
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x float>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x float>
				; CHECK: %[[FCMP1:.*]] = fcmp olt <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[FCMP2:.*]] = fcmp olt <vscale x 8 x float> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[FCMP1]], <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[FCMP2]], <vscale x 8 x float> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[FCMP:.*]] = fcmp olt <vscale x 8 x float> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select <vscale x 8 x i1> %[[FCMP]], <vscale x 8 x float> %[[SEL1]], <vscale x 8 x float> %[[SEL2]]
				; CHECK-NEXT: call float @llvm.vector.reduce.fmin.nxv8f32(<vscale x 8 x float> %[[SEL]])
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.i = fcmp olt float %0, %sum.07
				%.sroa.speculated = select i1 %cmp.i, float %0, float %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret float %.sroa.speculated
				}

				; FMAX (FAST)

				; CHECK-REMARK: vectorized loop (vectorization width: vscale x 8, interleaved count: 2)
				define float @fmax_fast(float* noalias nocapture readonly %a, i64 %n) #0 {
				; CHECK-LABEL: @fmax_fast
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <vscale x 8 x float>
				; CHECK: %[[LOAD2:.*]] = load <vscale x 8 x float>
				; CHECK: %[[FCMP1:.*]] = fcmp fast ogt <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[FCMP2:.*]] = fcmp fast ogt <vscale x 8 x float> %[[LOAD2]]
				; CHECK: %[[SEL1:.*]] = select <vscale x 8 x i1> %[[FCMP1]], <vscale x 8 x float> %[[LOAD1]]
				; CHECK: %[[SEL2:.*]] = select <vscale x 8 x i1> %[[FCMP2]], <vscale x 8 x float> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[FCMP:.*]] = fcmp fast ogt <vscale x 8 x float> %[[SEL1]], %[[SEL2]]
				; CHECK-NEXT: %[[SEL:.*]] = select fast <vscale x 8 x i1> %[[FCMP]], <vscale x 8 x float> %[[SEL1]], <vscale x 8 x float> %[[SEL2]]
				; CHECK-NEXT: call fast float @llvm.vector.reduce.fmax.nxv8f32(<vscale x 8 x float> %[[SEL]])
				entry:
				br label %for.body

				for.body:
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi float [ 0.000000e+00, %entry ], [ %.sroa.speculated, %for.body ]
				%arrayidx = getelementptr inbounds float, float* %a, i64 %iv
				%0 = load float, float* %arrayidx, align 4
				%cmp.i = fcmp fast ogt float %0, %sum.07
				%.sroa.speculated = select i1 %cmp.i, float %0, float %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret float %.sroa.speculated
				}

				; Reduction cannot be vectorized

				; MUL

				; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
				; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
				define i32 @mul(i32* nocapture %a, i32* nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @mul
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <4 x i32>
				; CHECK: %[[LOAD2:.*]] = load <4 x i32>
				; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD1]]
				; CHECK: %[[MUL2:.*]] = mul <4 x i32> %[[LOAD2]]
				; CHECK: middle.block:
				; CHECK: %[[RDX:.*]] = mul <4 x i32> %[[MUL2]], %[[MUL1]]
				; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[RDX]])
				entry:
				br label %for.body

				for.body: ; preds = %entry, %for.body
				%iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ]
				%sum.07 = phi i32 [ 2, %entry ], [ %mul, %for.body ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %iv
				%0 = load i32, i32* %arrayidx, align 4
				%mul = mul nsw i32 %0, %sum.07
				%iv.next = add nuw nsw i64 %iv, 1
				%exitcond.not = icmp eq i64 %iv.next, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end: ; preds = %for.body, %entry
				ret i32 %mul
				craig.topper: "end emit a warning" -> "and emit a warning"?
				luke957: Fix typo. Remove fmin and fmax test cases.
				}

				; Note: This test was added to ensure we always check the legality of reductions (and emit a warning if necessary) before checking for memory dependencies
				; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
				; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
				define i32 @memory_dependence(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i64 %n) {
				; CHECK-LABEL: @memory_dependence
				; CHECK: vector.body:
				; CHECK: %[[LOAD1:.*]] = load <4 x i32>
				; CHECK: %[[LOAD2:.*]] = load <4 x i32>
				; CHECK: %[[LOAD3:.*]] = load <4 x i32>
				; CHECK: %[[LOAD4:.*]] = load <4 x i32>
				; CHECK: %[[ADD1:.*]] = add nsw <4 x i32> %[[LOAD3]], %[[LOAD1]]
				; CHECK: %[[ADD2:.*]] = add nsw <4 x i32> %[[LOAD4]], %[[LOAD2]]
				; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD3]]
				; CHECK: %[[MUL2:.*]] = mul <4 x i32> %[[LOAD4]]
				; CHECK: middle.block:
				; CHECK: %[[RDX:.*]] = mul <4 x i32> %[[MUL2]], %[[MUL1]]
				; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[RDX]])
				entry:
				br label %for.body

				for.body:
				%i = phi i64 [ %inc, %for.body ], [ 0, %entry ]
				%sum = phi i32 [ %mul, %for.body ], [ 2, %entry ]
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %i
				%0 = load i32, i32* %arrayidx, align 4
				%arrayidx1 = getelementptr inbounds i32, i32* %b, i64 %i
				%1 = load i32, i32* %arrayidx1, align 4
				%add = add nsw i32 %1, %0
				%add2 = add nuw nsw i64 %i, 32
				%arrayidx3 = getelementptr inbounds i32, i32* %a, i64 %add2
				store i32 %add, i32* %arrayidx3, align 4
				%mul = mul nsw i32 %1, %sum
				%inc = add nuw nsw i64 %i, 1
				%exitcond.not = icmp eq i64 %inc, %n
				br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !0

				for.end:
				ret i32 %mul
				}
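In @memory_dependence each iteration reads a[i] and b[i] and stores to a[i + 32], so a value stored in iteration i is read back 32 iterations later. One brute-force way to see why the fixed-width plan in the CHECK lines (VF 4, interleave count 2, so 8 iterations per vector step) is safe is to check whether any iteration in a block reads an element that an earlier iteration of the same block wrote, assuming all loads in a block execute before its stores. This is an illustrative sketch with hypothetical helper names, not the dependence analysis LLVM actually performs.

```cpp
#include <cassert>
#include <cstdint>

// Scalar form of @memory_dependence. The IR has no zero-trip guard, so it
// assumes n >= 1; `a` must have room for n + 32 elements.
int32_t memoryDependence(int32_t *a, const int32_t *b, int64_t n) {
  int32_t sum = 2;
  for (int64_t i = 0; i < n; ++i) {
    a[i + 32] = b[i] + a[i];
    sum *= b[i];
  }
  return sum;
}

// Iteration i loads a[i] and stores a[i + distance]. Returns true if running
// `block` consecutive iterations together (loads before stores) would make
// some iteration miss a value stored by an earlier iteration of that block.
bool blockHasReadAfterWrite(int64_t block, int64_t distance) {
  assert(block > 0 && distance > 0);
  for (int64_t writer = 0; writer < block; ++writer)
    for (int64_t reader = writer + 1; reader < block; ++reader)
      if (reader == writer + distance) // reader loads what writer stored
        return true;
  return false;
}
```

A block of 8 iterations never reaches back 32 elements, so the dependence does not block the fixed-width plan; only blocks longer than 32 iterations would.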

				attributes #0 = { "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" }

				!0 = distinct !{!0, !1, !2, !3, !4}
				!1 = !{!"llvm.loop.vectorize.width", i32 8}
				!2 = !{!"llvm.loop.vectorize.scalable.enable", i1 true}
				!3 = !{!"llvm.loop.interleave.count", i32 2}
				!4 = !{!"llvm.loop.vectorize.enable", i1 true}
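For readers who think in source code, several of the loops above roughly correspond to C++ like the following. This is a hypothetical reconstruction, not the source the tests were generated from. The `#pragma clang loop` options are Clang's spelling of the `!llvm.loop` metadata (!1 to !4); other compilers typically ignore the pragma, possibly with a warning. The `fast` flags in the fadd test would come from something like `-ffast-math`, and n is assumed to be at least 1 because the IR loops have no zero-trip guard.

```cpp
#include <cassert>
#include <cstdint>

int32_t add(const int32_t *a, int64_t n) {
  int32_t sum = 2;
#pragma clang loop vectorize(enable) vectorize_width(8, scalable) interleave_count(2)
  for (int64_t i = 0; i < n; ++i)
    sum += a[i]; // add nsw: signed overflow is undefined, as in C++
  return sum;
}

int32_t smin(const int32_t *a, int64_t n) {
  int32_t sum = 2;
#pragma clang loop vectorize(enable) vectorize_width(8, scalable) interleave_count(2)
  for (int64_t i = 0; i < n; ++i)
    sum = a[i] < sum ? a[i] : sum; // icmp slt + select
  return sum;
}

uint32_t umax(const uint32_t *a, int64_t n) {
  uint32_t sum = 2;
#pragma clang loop vectorize(enable) vectorize_width(8, scalable) interleave_count(2)
  for (int64_t i = 0; i < n; ++i)
    sum = a[i] > sum ? a[i] : sum; // icmp ugt + select
  return sum;
}

float fadd_fast(const float *a, int64_t n) {
  float sum = 0.0f;
#pragma clang loop vectorize(enable) vectorize_width(8, scalable) interleave_count(2)
  for (int64_t i = 0; i < n; ++i)
    sum += a[i]; // needs fast-math flags to become an unordered reduction
  return sum;
}

int32_t mul(const int32_t *a, int64_t n) {
  int32_t sum = 2;
#pragma clang loop vectorize(enable) vectorize_width(8, scalable) interleave_count(2)
  for (int64_t i = 0; i < n; ++i)
    sum *= a[i]; // not legal for scalable VFs in this revision
  return sum;
}
```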

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add legality check for vectoring reduction
Closed, Public
