
Details

Reviewers

tvvikram
mkuper
kristof.beyls
sdesmalen
Ayal

Commits

rG3f5ce18658f0: Reland "Relax constraints for reduction vectorization"
rL355889: Reland "Relax constraints for reduction vectorization"
rG93f8cc186ace: Relax constraints for reduction vectorization
rL355868: Relax constraints for reduction vectorization

Summary

I'm somewhat unsure here, but gating the vectorization of reductions on all
fast-math flags seems unnecessary; `reassoc` should be sufficient.
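Some background on why a reordering permission is needed at all: vectorizing `sum += a[i]` changes the order of the floating-point additions, and float addition is not associative. The standalone C++ sketch below compares the scalar order with a 4-lane interleaved order, roughly like the one a vectorized loop plus its final shuffle reduction uses. It is not LLVM code, and `sequentialSum` and `interleavedSum4` are hypothetical helpers. All inputs are finite, so any difference comes purely from reordering, which is the permission `reassoc` grants.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar loop order: ((((0 + a0) + a1) + a2) + ...), as the IR specifies.
float sequentialSum(const std::vector<float>& a) {
  float sum = 0.0f;
  for (float v : a) sum += v;
  return sum;
}

// Approximates a 4-wide vectorized loop: lane k accumulates a[k], a[k+4], ...
// and the lanes are then combined pairwise (lane0+lane2, lane1+lane3, then
// the two partial sums). Assumes a.size() is a multiple of 4; no scalar
// remainder loop is modeled.
float interleavedSum4(const std::vector<float>& a) {
  assert(a.size() % 4 == 0 && "sketch does not model a remainder loop");
  float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
  for (std::size_t i = 0; i < a.size(); i += 4)
    for (std::size_t k = 0; k < 4; ++k) lane[k] += a[i + k];
  return (lane[0] + lane[2]) + (lane[1] + lane[3]);
}
```

When compiled without `-ffast-math`, the input `{1e8f, 1, 1, 1, -1e8f, 1, 1, 1}` gives 3.0f in scalar order (each `+ 1` is absorbed because the float spacing near 1e8 is 8) but 6.0f in interleaved order.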

Diff Detail

Repository

rL LLVM

Build Status

Buildable 27713
Build 27712: arc lint + arc unit

Event Timeline

sanjoy created this revision. Feb 4 2019, 5:35 PM

Herald added a project: Restricted Project. Feb 4 2019, 5:35 PM

Herald added subscribers: bixia, jlebar, mcrosier.

Harbormaster completed remote builds in B27713: Diff 185204. Feb 4 2019, 5:35 PM

sanjoy added a subscriber: jmolloy. Feb 4 2019, 5:35 PM

ping!

sanjoy added a reviewer: kristof.beyls. Feb 19 2019, 2:34 PM

kristof.beyls added a reviewer: sdesmalen. Feb 20 2019, 12:03 AM

huntergr added a subscriber: huntergr. Feb 20 2019, 5:26 AM

sdesmalen added inline comments. Feb 20 2019, 8:19 AM

lib/Analysis/IVDescriptors.cpp
550	Is there a reason for the condition `I->hasAllowContract()`? As far as I can tell, only `hasAllowReassoc()` is required, so also requiring `hasAllowContract()` would be overly restrictive. As long as `hasAllowReassoc()` is true, I think the resulting reduction operations should have the same properties as the original instruction. For example, if an instruction has `hasNoNaNs() == hasNoInfs() == false`, the vectorised reduction retains those properties under reassociation. When trying out this patch, the reduction block seems to assume `fast` instead of just `reassoc`, e.g. in middle.block: `%bin.rdx = fadd fast <4 x float> %9, %8`. You'll probably want to fix that and extend the test to ensure the flags are retained.
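A minimal sketch of the three gates discussed here, assuming a plain bitmask in place of `llvm::FastMathFlags` (the enum and all helper names are hypothetical, not LLVM API). It also shows the flag spelling the reviewer expects the middle-block reduction to keep: an operation tagged `reassoc contract` should stay `reassoc contract` rather than being widened to `fast`.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for llvm::FastMathFlags: one bit per IR flag,
// declared in the order the textual IR prints them.
enum : unsigned {
  Reassoc = 1u << 0, NNaN = 1u << 1, NInf = 1u << 2, NSZ = 1u << 3,
  ARcp = 1u << 4, Contract = 1u << 5, AFn = 1u << 6,
  AllFlags = (1u << 7) - 1,  // what `fast` means: every flag set
};

// Old gate (I->isFast()): every flag must be present.
bool isFast(unsigned fmf) { return fmf == AllFlags; }
// Gate in Diff 185204 (CanVectorizeReduction): reassoc and contract.
bool reassocAndContract(unsigned fmf) { return (fmf & Reassoc) && (fmf & Contract); }
// Gate suggested in review: reassoc alone.
bool reassocOnly(unsigned fmf) { return (fmf & Reassoc) != 0; }

// Spell a flag set as it would appear on an instruction, so the vector
// reduction can carry the loop's own flags instead of `fast`.
std::string spellFlags(unsigned fmf) {
  assert((fmf & ~AllFlags) == 0 && "unknown flag bit");
  if (isFast(fmf)) return "fast";
  static const char* const kNames[] = {"reassoc", "nnan", "ninf", "nsz",
                                       "arcp", "contract", "afn"};
  std::string out;
  for (unsigned bit = 0; bit < 7; ++bit) {
    if (!(fmf & (1u << bit))) continue;
    if (!out.empty()) out += ' ';
    out += kNames[bit];
  }
  return out;
}
```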

Propagate only correct FP fast-math flags.

I'd appreciate a careful review here since I'm not sure I know all of the ways
the loop vectorizer codegens reductions.

Harbormaster completed remote builds in B28735: Diff 189097. Mar 3 2019, 2:00 PM

sanjoy added a parent revision: D58887: PHI nodes are not `FPMathOperator`s. Mar 3 2019, 2:01 PM

sanjoy marked an inline comment as done.

dcaballe added a subscriber: dcaballe. Mar 4 2019, 10:57 AM

Rebase on trunk.

Harbormaster completed remote builds in B28784: Diff 189272. Mar 4 2019, 8:16 PM

sanjoy added a reviewer: Ayal. Mar 5 2019, 3:36 PM

sanjoy edited the summary of this revision. (Show Details)

Thanks for these changes to your patch @sanjoy! The patch is in good shape; the remaining comments are mostly nits.

include/llvm/Analysis/IVDescriptors.h
244 ↗	(On Diff #189272)	nit: recurrenct -> recurrent
lib/Analysis/IVDescriptors.cpp
254	Is it worth adding a comment describing that FMF will be an intersection of the FastMathFlags from all the reduction operations (and thus needs to start with the full set of flags)?
299	nit: unnecessary curly braces
558	nit: since 'CanVectorizeReduction()' is only used once, it probably makes more sense to just expand it here and remove the function.
lib/CodeGen/ExpandReductions.cpp
124 ↗	(On Diff #189272)	Rather than choosing `getFast()` and going through `IsOrdered` (which is set when the full `fast` property is set), do we want to take the FastMathFlags from the IntrinsicInst directly?
124 ↗	(On Diff #189272)	Note that there is a test missing for this change.
lib/Transforms/Utils/LoopUtils.cpp
675 ↗	(On Diff #189272)	nit: unnecessary curly braces.
lib/Transforms/Vectorize/LoopVectorize.cpp
322 ↗	(On Diff #189272)	nit: unnecessary curly braces.
329 ↗	(On Diff #189272)	nit: unnecessary curly braces.
test/Transforms/LoopVectorize/reduction-fastmath.ll
49	Do you want to add a check for the 'fast' attribute on the reductions in the middle.block, similar to what you've done for reduction_sum_float_only_reassoc_and_contract ?
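The suggested comment at line 254 (FMF as an intersection over all reduction operations, starting from the full set) can be pictured with a small standalone model. This sketch is not LLVM's `FastMathFlags` API; the bit layout and `intersectFlags` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bit layout for fast-math flags (not llvm::FastMathFlags).
constexpr unsigned kReassoc = 1u << 0;
constexpr unsigned kNNaN = 1u << 1;
constexpr unsigned kContract = 1u << 5;
constexpr unsigned kAllFlags = (1u << 7) - 1;  // all seven flags, i.e. `fast`

// Flags usable for the whole reduction = intersection over every operation
// in the chain. The accumulator starts from the full set; starting from the
// empty set would make the result empty regardless of the operations.
unsigned intersectFlags(const std::vector<unsigned>& chainFlags) {
  assert(!chainFlags.empty() && "a reduction has at least one operation");
  unsigned fmf = kAllFlags;
  for (unsigned opFlags : chainFlags) fmf &= opFlags;
  return fmf;
}
```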

Address review comments

lib/CodeGen/ExpandReductions.cpp
124 ↗	(On Diff #189272)	> Rather than choosing `getFast()` and going through `IsOrdered` (which is set when the full `fast` property is set), do we want to take the FastMathFlags from the IntrinsicInst directly?
My rationale was that I wanted to keep this as obviously NFC as possible: previously we would implicitly tag the instructions as `fast`, and this change just makes it explicit.
> Note that there is a test missing for this change.
This change is not supposed to change any behavior (assuming no bugs :) ). Do you want me to add a test case to check existing behavior? `CodeGen/Generic/expand-experimental-reductions.ll` already tests for `fast` flags in the expansion.

Harbormaster completed remote builds in B28858: Diff 189593. Mar 6 2019, 2:14 PM

Thanks for updating your patch @sanjoy ! LGTM (with a side-note on a follow-up patch to take more specific flags in expandReductions).
You may want to update the commit message before you commit :)

lib/CodeGen/ExpandReductions.cpp
124 ↗	(On Diff #189272)	> My rationale was that I wanted to keep this as obviously NFC as possible: previously we would implicitly tag the instructions as fast, and this change just makes it explicit.
In a way that feels a bit artificial, since the patch is already not NFC and its purpose is to be more specific about passing the exact flags. However, if you think it makes more sense to do this change in a separate patch, that's fine with me.

This revision is now accepted and ready to land. Mar 11 2019, 4:53 AM

sanjoy marked 2 inline comments as done. Mar 11 2019, 2:27 PM

sanjoy added inline comments.

lib/CodeGen/ExpandReductions.cpp
124 ↗	(On Diff #189272)	As I said above, we already have tests for this behavior (that we propagate fast-math flags into `getShuffleReduction`).
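The expansion discussed in this thread builds a log2(VF) tree: each step moves the upper half of the live lanes down and combines it with the lower half, and each emitted binary operator receives a set of fast-math flags. Whether those flags come from `getFast()` or from the intrinsic itself only changes what gets attached to each step. The standalone model below illustrates that shape under those assumptions; it is not `getShuffleReduction`, and `Step` and `shuffleReduceAdd` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One combining step: `liveLanes` lanes remain after folding, and `fmf` is
// the flag string the emitted binary operator would carry.
struct Step {
  std::size_t liveLanes;
  std::string fmf;
};

// Models a log2(VF) shuffle reduction of an fadd: each step folds the upper
// half of the live lanes onto the lower half. The flags are a parameter, so
// the caller decides whether to pass `fast` or the intrinsic's own flags.
float shuffleReduceAdd(std::vector<float> lanes, const std::string& fmf,
                       std::vector<Step>* steps) {
  const std::size_t vf = lanes.size();
  assert(vf != 0 && (vf & (vf - 1)) == 0 && "VF must be a power of two");
  for (std::size_t n = vf; n > 1; n /= 2) {
    for (std::size_t i = 0; i < n / 2; ++i) lanes[i] += lanes[i + n / 2];
    if (steps) steps->push_back({n / 2, fmf});
  }
  return lanes[0];
}
```

For `{1, 2, 3, 4}` the first step produces `{1+3, 2+4}` and the second `4+6 = 10`, so a 4-lane reduction emits two operations, each tagged with the flags passed in.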

Closed by commit rL355868: Relax constraints for reduction vectorization (authored by sanjoy). Mar 11 2019, 2:35 PM

This revision was automatically updated to reflect the committed changes.

Diff 185204

lib/Analysis/IVDescriptors.cpp

... (245 lines not shown)
if (!isIntegerRecurrenceKind(Kind))
return false;		return false;
if (isArithmeticRecurrenceKind(Kind))		if (isArithmeticRecurrenceKind(Kind))
Start = lookThroughAnd(Phi, RecurrenceType, VisitedInsts, CastInsts);		Start = lookThroughAnd(Phi, RecurrenceType, VisitedInsts, CastInsts);
}		}

Worklist.push_back(Start);		Worklist.push_back(Start);
VisitedInsts.insert(Start);		VisitedInsts.insert(Start);

// A value in the reduction can be used:		// A value in the reduction can be used:
// - By the reduction:		// - By the reduction:
// - Reduction operation:		// - Reduction operation:
// - One use of reduction value (safe).		// - One use of reduction value (safe).
// - Multiple use of reduction value (not safe).		// - Multiple use of reduction value (not safe).
// - PHI:		// - PHI:
// - All uses of the PHI must be the reduction (safe).		// - All uses of the PHI must be the reduction (safe).
// - Otherwise, not safe.		// - Otherwise, not safe.
// - By instructions outside of the loop (safe).		// - By instructions outside of the loop (safe).
... (28 lines not shown)
while (!Worklist.empty()) {

// Any reduction instruction must be of one of the allowed kinds. We ignore		// Any reduction instruction must be of one of the allowed kinds. We ignore
// the starting value (the Phi or an AND instruction if the Phi has been		// the starting value (the Phi or an AND instruction if the Phi has been
// type-promoted).		// type-promoted).
if (Cur != Start) {		if (Cur != Start) {
ReduxDesc = isRecurrenceInstr(Cur, Kind, ReduxDesc, HasFunNoNaNAttr);		ReduxDesc = isRecurrenceInstr(Cur, Kind, ReduxDesc, HasFunNoNaNAttr);
if (!ReduxDesc.isRecurrence())		if (!ReduxDesc.isRecurrence())
return false;		return false;
}		}

bool IsASelect = isa<SelectInst>(Cur);		bool IsASelect = isa<SelectInst>(Cur);

// A conditional reduction operation must only have 2 or less uses in		// A conditional reduction operation must only have 2 or less uses in
// VisitedInsts.		// VisitedInsts.
if (IsASelect && (Kind == RK_FloatAdd || Kind == RK_FloatMult) &&		if (IsASelect && (Kind == RK_FloatAdd || Kind == RK_FloatMult) &&
hasMultipleUsesOf(Cur, VisitedInsts, 2))		hasMultipleUsesOf(Cur, VisitedInsts, 2))
return false;		return false;
... (233 lines not shown)
if ((m_FAdd(m_Value(Op1), m_Value(Op2)).match(I1) ||
return InstDesc(Kind == RK_FloatAdd, SI);		return InstDesc(Kind == RK_FloatAdd, SI);

if (m_FMul(m_Value(Op1), m_Value(Op2)).match(I1) && (I1->isFast()))		if (m_FMul(m_Value(Op1), m_Value(Op2)).match(I1) && (I1->isFast()))
return InstDesc(Kind == RK_FloatMult, SI);		return InstDesc(Kind == RK_FloatMult, SI);

return InstDesc(false, I);		return InstDesc(false, I);
}		}

		static bool CanVectorizeReduction(Instruction *I) {
		return I->hasAllowReassoc() && I->hasAllowContract();
		}

RecurrenceDescriptor::InstDesc		RecurrenceDescriptor::InstDesc
RecurrenceDescriptor::isRecurrenceInstr(Instruction *I, RecurrenceKind Kind,		RecurrenceDescriptor::isRecurrenceInstr(Instruction *I, RecurrenceKind Kind,
InstDesc &Prev, bool HasFunNoNaNAttr) {		InstDesc &Prev, bool HasFunNoNaNAttr) {
bool FP = I->getType()->isFloatingPointTy();		bool FP = I->getType()->isFloatingPointTy();
Instruction *UAI = Prev.getUnsafeAlgebraInst();		Instruction *UAI = Prev.getUnsafeAlgebraInst();
if (!UAI && FP && !I->isFast())		if (!UAI && FP && !CanVectorizeReduction(I))
UAI = I; // Found an unsafe (unvectorizable) algebra instruction.		UAI = I; // Found an unsafe (unvectorizable) algebra instruction.

switch (I->getOpcode()) {		switch (I->getOpcode()) {
default:		default:
return InstDesc(false, I);		return InstDesc(false, I);
case Instruction::PHI:		case Instruction::PHI:
return InstDesc(I, Prev.getMinMaxKind(), Prev.getUnsafeAlgebraInst());		return InstDesc(I, Prev.getMinMaxKind(), Prev.getUnsafeAlgebraInst());
case Instruction::Sub:		case Instruction::Sub:
... (526 lines not shown)
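The changed check in `isRecurrenceInstr` records only the first floating-point operation that fails the gate as the unsafe-algebra instruction (`UAI`); once set, it is not overwritten. A rough standalone model of that bookkeeping follows, assuming a bitmask for the flags and a flat list of operations; `Op`, `canVectorizeReduction`, and `firstUnsafeAlgebraOp` are hypothetical and simplify how `InstDesc` threads `UAI` through the real code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr unsigned kReassoc = 1u << 0;   // hypothetical bit layout
constexpr unsigned kContract = 1u << 5;

// Hypothetical model of one instruction in a reduction chain.
struct Op {
  bool isFP;     // stands in for I->getType()->isFloatingPointTy()
  unsigned fmf;  // fast-math flag bits
};

// The gate added in Diff 185204 (CanVectorizeReduction).
bool canVectorizeReduction(unsigned fmf) {
  return (fmf & kReassoc) && (fmf & kContract);
}

// Mirrors `!UAI && FP && !CanVectorizeReduction(I)`: only the first FP
// operation that fails the gate is recorded; later failures do not
// overwrite it. Returns that index, or -1 if every FP operation passes.
int firstUnsafeAlgebraOp(const std::vector<Op>& chain) {
  assert(!chain.empty() && "a reduction has at least one operation");
  int uai = -1;
  for (std::size_t i = 0; i < chain.size(); ++i)
    if (uai == -1 && chain[i].isFP && !canVectorizeReduction(chain[i].fmf))
      uai = static_cast<int>(i);
  return uai;
}
```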

test/Transforms/LoopVectorize/reduction-fastmath.ll

This file was added.

				; RUN: opt -S -loop-vectorize < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				define float @reduction_sum_float_ieee(i32 %n, float* %array) {
				; CHECK-LABEL: define float @reduction_sum_float_ieee(
				entry:
				%entry.cond = icmp ne i32 0, 4096
				br i1 %entry.cond, label %loop, label %loop.exit

				loop:
				%idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ]
				%sum = phi float [ 0.000000e+00, %entry ], [ %sum.inc, %loop ]
				%address = getelementptr float, float* %array, i32 %idx
				%value = load float, float* %address
				%sum.inc = fadd float %sum, %value
				%idx.inc = add i32 %idx, 1
				%be.cond = icmp ne i32 %idx.inc, 4096
				br i1 %be.cond, label %loop, label %loop.exit

				loop.exit:
				%sum.lcssa = phi float [ %sum.inc, %loop ], [ 0.000000e+00, %entry ]
				; CHECK-NOT: %wide.load = load <4 x float>, <4 x float>*
				; CHECK: ret float %sum.lcssa
				ret float %sum.lcssa
				}

				define float @reduction_sum_float_fastmath(i32 %n, float* %array) {
				; CHECK-LABEL: define float @reduction_sum_float_fastmath(
				entry:
				%entry.cond = icmp ne i32 0, 4096
				br i1 %entry.cond, label %loop, label %loop.exit

				loop:
				%idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ]
				%sum = phi float [ 0.000000e+00, %entry ], [ %sum.inc, %loop ]
				%address = getelementptr float, float* %array, i32 %idx
				%value = load float, float* %address
				%sum.inc = fadd fast float %sum, %value
				%idx.inc = add i32 %idx, 1
				%be.cond = icmp ne i32 %idx.inc, 4096
				br i1 %be.cond, label %loop, label %loop.exit

				loop.exit:
				%sum.lcssa = phi float [ %sum.inc, %loop ], [ 0.000000e+00, %entry ]
				; CHECK: %wide.load = load <4 x float>, <4 x float>*
				; CHECK: ret float %sum.lcssa
				ret float %sum.lcssa
				}

				define float @reduction_sum_float_partial_fastmath(i32 %n, float* %array) {
				; CHECK-LABEL: define float @reduction_sum_float_partial_fastmath(
				entry:
				%entry.cond = icmp ne i32 0, 4096
				br i1 %entry.cond, label %loop, label %loop.exit

				loop:
				%idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ]
				%sum = phi float [ 0.000000e+00, %entry ], [ %sum.inc, %loop ]
				%address = getelementptr float, float* %array, i32 %idx
				%value = load float, float* %address
				%sum.inc = fadd reassoc contract float %sum, %value
				%idx.inc = add i32 %idx, 1
				%be.cond = icmp ne i32 %idx.inc, 4096
				br i1 %be.cond, label %loop, label %loop.exit

				loop.exit:
				%sum.lcssa = phi float [ %sum.inc, %loop ], [ 0.000000e+00, %entry ]
				; CHECK: %wide.load = load <4 x float>, <4 x float>*
				; CHECK: ret float %sum.lcssa
				ret float %sum.lcssa
				}

This is an archive of the discontinued LLVM Phabricator instance.

Relax constraints for reduction vectorization
ClosedPublic
