This is an archive of the discontinued LLVM Phabricator instance.

Differential D11089

[NVPTX] declare no vector registers
ClosedPublic

Authored by jingyue on Jul 9 2015, 3:51 PM.

Details

Reviewers

jholewinski

Commits

rGad85c8c20487: [NVPTX] declare no vector registers
rL241884: [NVPTX] declare no vector registers

Summary

Without this patch, LoopVectorizer in certain cases (see loop-vectorize.ll)
produces code with complex control flow, which hurts later optimizations. Since
NVPTX doesn't have vector registers in LLVM's sense
(NVPTXTTI::getRegisterBitWidth(true) == 32), we declare no vector registers
for now, which effectively disables loop vectorization.
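
The mechanism in the summary can be sketched as follows. This is a minimal, self-contained illustration and not LLVM code: `MockTTI`, `shouldTryLoopVectorization`, and `checkVectorizationGate` are hypothetical names, and the sketch assumes the vectorizer simply skips any target that reports zero vector registers, which is what the summary implies.

```cpp
#include <cassert>

// Hypothetical stand-in for the small slice of TargetTransformInfo that
// matters here. This is not the real LLVM class.
struct MockTTI {
  unsigned NumScalarRegs;
  unsigned NumVectorRegs;

  // Mirrors the shape of the hook touched by this patch.
  unsigned getNumberOfRegisters(bool Vector) const {
    return Vector ? NumVectorRegs : NumScalarRegs;
  }
};

// Assumed, simplified gate: a vectorizer that gives up early when the
// target claims to have no vector registers.
bool shouldTryLoopVectorization(const MockTTI &TTI) {
  return TTI.getNumberOfRegisters(/*Vector=*/true) != 0;
}

// Small self-check of the two cases (register counts are made up).
void checkVectorizationGate() {
  MockTTI WithVectorRegs{16, 16};
  MockTTI NoVectorRegs{16, 0}; // what this patch makes NVPTX look like
  assert(shouldTryLoopVectorization(WithVectorRegs));
  assert(!shouldTryLoopVectorization(NoVectorRegs));
}
```

Under that assumption, reporting zero vector registers is a coarse switch: it turns loop vectorization off for the whole target rather than tuning its cost model.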

Diff Detail

Repository: rL LLVM

Event Timeline

jingyue updated this revision to Diff 29405. Jul 9 2015, 3:51 PM

jingyue retitled this revision to [NVPTX] declare no vector registers.

jingyue updated this object.

jingyue added a reviewer: jholewinski.

jingyue added a subscriber: llvm-commits.

Herald added a subscriber: jholewinski. Jul 9 2015, 3:51 PM

Justin, I wonder why NVPTX doesn't leverage vector instructions (such as
vadd) at all. llc on

fadd <2 x float> %a, %b

gives me two add.f32 instructions instead of vadd.f32 or the like.

Jingyue

LGTM

The short answer is that ptxas doesn't handle vector registers very well. It may be good to revisit this, but ptxas currently prefers scalar ops.

Last time I looked into this, the implementation cost greatly outweighed any potential benefits. At the SASS level, we don't have vector fp ops anyway.

Makes sense. Thank you.

Closed by commit rL241884: [NVPTX] declare no vector registers (authored by jingyue). Jul 9 2015, 9:32 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
llvm/trunk/lib/Target/NVPTX/NVPTXTargetTransformInfo.h     2 lines
llvm/trunk/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp   6 lines
llvm/trunk/test/CodeGen/NVPTX/loop-vectorize.ll            39 lines

Diff 29427

llvm/trunk/lib/Target/NVPTX/NVPTXTargetTransformInfo.h

(52 lines not shown; context: public:)
   bool isSourceOfDivergence(const Value *V);

   unsigned getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
       TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
       TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
       TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None);
+
+  unsigned getNumberOfRegisters(bool Vector);
 };

 } // end namespace llvm

 #endif

llvm/trunk/lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp

(111 lines not shown; context: case ISD::AND:)
     // those on types that can fit into one machine register.
     if (LT.second.SimpleTy == MVT::i64)
       return 2 * LT.first;
     // Delegate other cases to the basic TTI.
     return BaseT::getArithmeticInstrCost(Opcode, Ty, Opd1Info, Opd2Info,
                                          Opd1PropInfo, Opd2PropInfo);
   }
 }
+
+unsigned NVPTXTTIImpl::getNumberOfRegisters(bool Vector) {
+  if (Vector)
+    return 0;
+  return BaseT::getNumberOfRegisters(Vector);
+}
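
The override-and-delegate pattern used by the new hook can be sketched in isolation as follows. This is an illustration only: `MockBaseTTI`, `MockNVPTXTTI`, and `checkRegisterCounts` are hypothetical names, and the base class's return value of 8 is a made-up default, not a claim about what LLVM's `BaseT` returns.

```cpp
#include <cassert>

// Hypothetical base implementation standing in for BaseT. The value 8 is
// an arbitrary placeholder chosen for this sketch.
struct MockBaseTTI {
  unsigned getNumberOfRegisters(bool Vector) const {
    (void)Vector; // the mock base does not distinguish register classes
    return 8;
  }
};

// Hypothetical target implementation: report no vector registers and
// delegate the scalar query to the base, as the patch does.
struct MockNVPTXTTI : MockBaseTTI {
  using BaseT = MockBaseTTI;

  unsigned getNumberOfRegisters(bool Vector) const {
    if (Vector)
      return 0;
    return BaseT::getNumberOfRegisters(Vector);
  }
};

// Small self-check: only the vector query changes.
void checkRegisterCounts() {
  MockNVPTXTTI TTI;
  assert(TTI.getNumberOfRegisters(/*Vector=*/true) == 0);
  assert(TTI.getNumberOfRegisters(/*Vector=*/false) == 8);
}
```

Delegating the scalar case keeps the target in step with whatever the base reports, so only the answer to the vector query is affected.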

llvm/trunk/test/CodeGen/NVPTX/loop-vectorize.ll

+; RUN: opt < %s -O3 -S | FileCheck %s
+
+target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
+target triple = "nvptx64-nvidia-cuda"
+
+define void @no_vectorization(i32 %n, i32 %a, i32 %b) {
+; CHECK-LABEL: no_vectorization(
+; CHECK-NOT: <4 x i32>
+; CHECK-NOT: <4 x i1>
+entry:
+  %cmp.5 = icmp sgt i32 %n, 0
+  br i1 %cmp.5, label %for.body.preheader, label %for.cond.cleanup
+
+for.body.preheader: ; preds = %entry
+  br label %for.body
+
+for.cond.cleanup.loopexit: ; preds = %for.body
+  br label %for.cond.cleanup
+
+for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
+  ret void
+
+for.body: ; preds = %for.body.preheader, %for.body
+  %i.06 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
+  %add = add nsw i32 %i.06, %a
+  %mul = mul nsw i32 %add, %b
+  %cmp1 = icmp sgt i32 %mul, -1
+  tail call void @llvm.assume(i1 %cmp1)
+  %inc = add nuw nsw i32 %i.06, 1
+  %exitcond = icmp eq i32 %inc, %n
+  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
+}
+
+declare void @llvm.assume(i1) #0
+
+attributes #0 = { nounwind }
+
+!nvvm.annotations = !{!0}
+!0 = !{void (i32, i32, i32)* @no_vectorization, !"kernel", i32 1}