This is an archive of the discontinued LLVM Phabricator instance.

Generation of PSAD in LoopVectorizer
Needs Review (Public)

Authored by Vijender on Mar 7 2015, 1:27 AM.

Details

Reviewers

aschwaighofer
hfinkel
spatel

Summary

This patch follows up on the discussion at the link below about generating PSAD in the loop vectorizer:
http://permalink.gmane.org/gmane.comp.compilers.llvm.devel/81724

A couple of structures were added to keep the implementation generic, so that other patterns can be integrated easily in the future.

Diff Detail

Repository: rL LLVM

Event Timeline

Vijender updated this revision to Diff 21419. Mar 7 2015, 1:27 AM

Vijender retitled this revision to Generation of PSAD in LoopVectorizer.

Vijender updated this object.

Vijender edited the test plan for this revision.

Vijender added reviewers: hfinkel, aschwaighofer, spatel.

Vijender added a subscriber: Unknown Object (MLST).

ab added a subscriber: ab. Mar 7 2015, 4:24 PM

This patch needs regression tests, both for the vectorizer itself (in test/Transforms/LoopVectorize/X86) and CodeGen tests for the new intrinsic (in test/CodeGen/X86).

Also, you need generic lowering support in LegalizeDAG for targets that don't support directly lowering the ISD node (and you should make sure that it is set by default to Expand in TargetLoweringBase::initActions).

include/llvm/Analysis/TargetTransformInfo.h
453	If this function applies only to ISDs, then it should not be here. This header only contains functions directly useful to IR-level passes. You can add this as a callback in the BasicTTI implementation, specialized by the targets, in order to share code.
454	Indenting is off.
573	Indenting is off.
729	Indenting is off.
include/llvm/CodeGen/ISDOpcodes.h
647	Please write out what SAD stands for. Please also say something more about the semantics, and any constraints on the relationship between the scalar type and the input vector element type?
include/llvm/IR/Intrinsics.td
584	Replace this comment with: // Calculate the Sum of Absolute Differences (SAD) of the two input vectors. Also, I assume only integer vectors are supported, in which case, please say so. (note that our IR-level intrinsics are defined only by their semantics, and we never guarantee to produce any specific instruction)
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1291 ↗	(On Diff #21419)	I don't understand this comment. Doesn't the beginning of this function assert the type legality of all node operands and return values (except for TargetConstants)?
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4423 ↗	(On Diff #21419)	Why v2i64 for v16i8?
4426 ↗	(On Diff #21419)	{ should be on the previous line.
4438 ↗	(On Diff #21419)	else should be on the line with the } (per our coding conventions)
4441 ↗	(On Diff #21419)	Shouldn't be i32; it should be the type from the input intrinsic.
lib/Transforms/Vectorize/LoopVectorize.cpp
630	How about: // minVF - Specifies the minimum VF for which the target supports this pattern. For example, for x86 the minimum VF for SAD is 8.
2683	cv -> CV
2808	Remove commented-out code.
4251	two -> Two
4358	Line too long.
4548	Widest == Smallest?
5023	Line too long.
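
As an illustration of the semantics the reviewers ask to have documented in ISDOpcodes.h and Intrinsics.td, here is a minimal scalar reference model of SAD. It is only a sketch: the name referenceSAD, the unsigned i8 element type, and the 32-bit result are assumptions made for the example, not something the patch defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model (not LLVM code): unsigned SAD over 8-bit
// elements, accumulated into a 32-bit scalar result.
inline std::uint32_t referenceSAD(const std::vector<std::uint8_t> &A,
                                  const std::vector<std::uint8_t> &B) {
  assert(A.size() == B.size() && "operands must have the same length");
  std::uint32_t Sum = 0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    // Widen before subtracting so the difference cannot wrap around.
    std::int32_t Diff = std::int32_t(A[I]) - std::int32_t(B[I]);
    Sum += std::uint32_t(Diff < 0 ? -Diff : Diff);
  }
  return Sum;
}
```

Under these assumptions, the inputs {1, 5, 200} and {4, 5, 0} give 3 + 0 + 200 = 203. The result type is deliberately wider than the element type, which is the relationship between scalar type and element type that the comment on ISDOpcodes.h asks about.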

Hello All,

The lowering code for SAD has been separated from this patch. This patch now contains only the LoopVectorizer and cost-modeling changes; the lowering code will come as a separate patch. All previous comments on the LoopVectorizer have been addressed. Please review the LoopVectorize code.

Thank you,
Vijender

Hi Vijender,

Thanks for doing this! I have a bunch of comments below. High-level, why are you lowering this as an intrinsic rather than an IR-level pattern? I ask because we really do want this pattern recognized in straight-line (non vectorized) code too, and that is something the DAGCombiner could do.

Cheers,

James

include/llvm/Analysis/TargetTransformInfo.h
455	Please spell out "SAD" = "Sum of absolute differences" = "C += abs(A - B)" in the docstring, so it is clear (not all backends will use the same mnemonic).
include/llvm/CodeGen/ISDOpcodes.h
647	"of char type"? Do you mean i8? Why only i8? Why is the (scalar part of the) input type not the same as the output type?
include/llvm/IR/Intrinsics.td
596	Char? Why i32? why not i64? why not i8/i16?
597	What is the signedness of this operation? Is there a floating point equivalent? If not, why not?
lib/Transforms/Vectorize/LoopVectorize.cpp
630	Just use "minVF" here, it won't conflict as long as you reference it in the initializer list.
642	I don't like the assumption here that the legality of an operation is a range [min,max]. I don't think we should assume continuity in the range. Perhaps this should be a bitset of supported types instead?
767	Shouldn't this be a multimap? I can imagine that there might be complex patterns and simple patterns, and a complex pattern may subsume a simple pattern. So an instruction could be part of two patterns (until one is selected).
1014	Uh, surely this needs to be just the type of the phi? If smaller types are legal, the phi should have a smaller type, right? Your proposed algorithm falls down in this case: int32_t a; int8_t b; int32_t *c; for (...) { a += abs(c[i] - c[i+1]); b += (int8_t)c[i]; } Oh dear, you'll select "i8" as your type but that is illegal (it's not part of this PHI).
1050	I generally don't like the use of "Special". Can we not think of a more descriptive name for it, like "Pattern"?
2556	Well this isn't right - Who says the identity for all pattern based reductions is integer zero?
2582	If it assumed that all patterns can be added together, that isn't documented anywhere.
3014	I don't like this at all. You've introduced a new generic pattern type then binned that and special-cased for SAD. Why is SAD special? why is it different from ADD for this case?
3295	Args
3312	No default case?
4862	I think here you're special casing the SpecialPattern stuff when really it shouldn't be special-cased - this is a problem (finding the smallest valid type for an operation) that applies in many cases to most operators, not just your patterns. I think it gives the biggest impact in ARM/MIPS world though as i8 and i16 aren't legal scalar types for us.
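
One way to read the suggestion on line 642 is to record each supported VF individually instead of a [minVF, maxVF] pair. The sketch below is hypothetical: SupportedVFSet and its members do not exist in LLVM, and it assumes VFs are powers of two no larger than 256. It can express a target that supports VF 4 and VF 16 but not VF 8, which a contiguous range cannot.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical replacement for the minVF/maxVF fields of
// SpecialPatternDescriptor. Bit K is set when VF == (1 << K) is supported.
class SupportedVFSet {
  static constexpr unsigned MaxLog2VF = 8; // illustrative cap: VF <= 256
  std::bitset<MaxLog2VF + 1> Bits;

  static bool isPowerOf2(unsigned VF) {
    return VF != 0 && (VF & (VF - 1)) == 0;
  }
  static unsigned floorLog2(unsigned VF) {
    unsigned K = 0;
    while (VF >>= 1)
      ++K;
    return K;
  }
  static bool representable(unsigned VF) {
    return isPowerOf2(VF) && floorLog2(VF) <= MaxLog2VF;
  }

public:
  // Marks VF as supported for this pattern.
  void add(unsigned VF) {
    assert(representable(VF) && "VF must be a power of two within the cap");
    Bits.set(floorLog2(VF));
  }
  // Returns true only for VFs that were explicitly added.
  bool supports(unsigned VF) const {
    return representable(VF) && Bits.test(floorLog2(VF));
  }
};
```

A set of supported types, as the reviewer words it, would follow the same shape with a type identifier as the key instead of the VF.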

Hello James,

Thank you for your reply. My comments are below.

Thanks for doing this! I have a bunch of comments below. High-level, why are you lowering this as an intrinsic rather than an IR-level pattern? I ask because we really do want this pattern recognized in straight-line (non vectorized) code too, and that is something the DAGCombiner could do.

[VJ] I understand your concern and accept your point. But if I handle this in the DAGCombiner there will be too many patterns to handle, because multiple paths reach the DAGCombiner. For example, if LV is enabled and VF is selected as 4, the pattern contains instructions expanded to V4. The same applies with VF = 8. If the vectorizer is disabled, the pattern will be a scalar pattern. So basically we want to solve this with a divide-and-conquer approach. Right now we have handled it in LV, and Shahid is making a serious effort to handle it in SLP (because there will be cases that LV cannot handle due to loop unrolling and other reasons). The only remaining part is the non-vectorized path, which can be handled in the DAGCombiner as you mentioned; that will be future work.

[VJ] The reason for replacing the pattern with an intrinsic is that we don’t want that pattern to be disturbed by other optimizations.

Vijender

Thank you James. Comments below.

include/llvm/CodeGen/ISDOpcodes.h
647	[VJ] - The sum of absolute differences instruction is only appropriate for char types. It is a byte-level reduction operation. If you look at PSAD in X86 or USAD in ARM, both operate at byte level. [VJ] - The scalar type of the output is not the same as the scalar type of the input because the sum of eight i8 values may not fit in an i8 without data loss.
include/llvm/IR/Intrinsics.td
596	[VJ] - I accept your point. We can make it to return any type.
597	[VJ] - There is no floating point equivalent of SAD.
lib/Transforms/Vectorize/LoopVectorize.cpp
642	[VJ] - Can you please explain how to solve this issue? Are you suggesting that we keep the instruction's supported types in a list and check that, instead of checking the VF?
767	[VJ] - OK. How about checking whether the instruction belongs to any other pattern before inserting it in the ActionList? I think it does not make sense to keep the instruction in both patterns. Rather, we can select which pattern to associate with the instruction before inserting. What do you think?
1014	[VJ] Actually this value is required to select the maximum VF to try. So the lower the value, the higher the vectorization factors I can try.
1050	[VJ] - This "Special" word was suggested in the previous review comments in RFC.
2556	[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective identity. Will work on it.
2582	[VJ] - I got your point. I need to send the Phi node as one of the arguments to this function to know more about the type and return the respective instruction. Will work on it.
3014	[VJ] The code inside the if block reduces the vector to a single scalar value. Here the output of my SAD intrinsic is already a scalar, so I need to skip this merge block for SAD.
4862	[VJ] - So shall I replace getWidestType in the code with getSmallestType so that it can try all the vectorization factors? The cost model already handles everything properly anyway, so I feel there is no harm in using getSmallestType.
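
The widening argument given for line 647 can be checked with a little arithmetic. Assuming unsigned 8-bit elements, each absolute difference is at most 255, so a sum over eight elements can reach 8 * 255 = 2040, which needs 11 bits. The helper names below are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Largest possible SAD of NumElts unsigned 8-bit pairs: every pair differs
// by 255. (Assumes unsigned elements; a signed variant has the same bound.)
inline std::uint64_t maxByteSAD(std::uint64_t NumElts) {
  assert(NumElts <= UINT64_MAX / 255 && "product would overflow");
  return NumElts * 255u;
}

// Smallest number of bits that can represent Value (0 for Value == 0).
inline unsigned bitsNeeded(std::uint64_t Value) {
  unsigned Bits = 0;
  while (Value != 0) {
    ++Bits;
    Value >>= 1;
  }
  return Bits;
}
```

With sixteen elements the bound is 4080, which needs 12 bits, so under these assumptions a 16-bit result would already be sufficient for both vector lengths discussed in the review.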

Hi Vj,

I've given some more comments. Sorry for any lag on this review, I'm moving house and it's the Easter break.

Cheers,

James

include/llvm/CodeGen/ISDOpcodes.h
647	Hi, No they don't, not in ARM at least. We have i8, i16, i32 and i64 variants. Also, we mustn't tie target agnostic intrinsics to a specific target or set of targets. It should work with whatever types make sense to it. Also, ARM at least does not return a scalar. Its SAD instructions happen elementwise on a vector (see VABA/UABA/SABA). So we'd need support for this too. Does X86 have this?
include/llvm/IR/Intrinsics.td
597	Not in X86, but why not in LLVM? LLVM's target agnostic layer is not X86. Does SAD not make sense on FP types? Also, what about signedness?
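
To make the two open questions concrete (elementwise versus reducing forms, and signedness), here is a small model. It is a sketch under assumptions: the function names are invented for illustration, and absDiffAccumulate only mimics the elementwise absolute-difference-and-accumulate shape mentioned above, with 32-bit accumulator lanes chosen for simplicity rather than to match any particular instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Absolute difference when the bytes are interpreted as unsigned.
inline unsigned absDiffUnsigned(std::uint8_t A, std::uint8_t B) {
  return A > B ? unsigned(A - B) : unsigned(B - A);
}

// Absolute difference when the bytes are interpreted as signed.
inline unsigned absDiffSigned(std::int8_t A, std::int8_t B) {
  int D = int(A) - int(B); // widened, so this cannot overflow
  return unsigned(D < 0 ? -D : D);
}

// Elementwise form: each lane accumulates its own |A[i] - B[i]|, and the
// result is still a vector. No horizontal reduction takes place.
inline std::vector<std::uint32_t>
absDiffAccumulate(std::vector<std::uint32_t> Acc,
                  const std::vector<std::uint8_t> &A,
                  const std::vector<std::uint8_t> &B) {
  assert(Acc.size() == A.size() && A.size() == B.size() &&
         "lane counts must match");
  for (std::size_t I = 0; I < Acc.size(); ++I)
    Acc[I] += absDiffUnsigned(A[I], B[I]);
  return Acc;
}
```

The bit patterns 0x00 and 0xFF differ by 255 when read as unsigned bytes but only by 1 when read as signed bytes (0 and -1), which is why the intrinsic's documentation would need to state its signedness.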

What is the status of this patch? Is there any plan to move it forward?

In D8136#237540, @congh wrote:

What is the status of this patch? Is there any plan to move it forward?

Hi Congh,

Yes, there is a plan for this.

This work depends on the two llvm intrinsics, llvm@*absdiff and llvm@*hsum. Two patches, http://reviews.llvm.org/D11678
and http://reviews.llvm.org/D10964, related to these intrinsics are under review and awaiting clearance. Once those are through we will update this patch accordingly.

Regards,
Shahid

Thank you very much for the information, Shahid! I am looking forward to the check-in of those patches.

Based on the discussion and link above, it is not clear to me how PSAD can be implemented through ABSDIFF and HSUM. Take doing PSAD on 2 x v16i8 -> v?i32 for example: currently v16i8 will be widened before DIFF and ABS, so is ABSDIFF for the widened type (v16i16 or v16i32) or for v16i8 here? How is HSUM used to represent PSAD?
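
One possible reading of the question above, written as a scalar model. This is a sketch, not the design of the patches referenced in the thread: the semantics of absdiff (elementwise unsigned absolute difference) and hsum (sum of all lanes in a wider type) are assumptions, and all function names are hypothetical. psadbwModel follows the x86 PSADBW behaviour of producing one sum per group of eight bytes, which is also one way to see why v16i8 inputs pair with a v2i64 result.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed "absdiff": elementwise |A - B| on unsigned bytes. The result of
// each lane is at most 255, so it still fits in 8 bits without widening.
inline std::vector<std::uint8_t> absDiffU8(const std::vector<std::uint8_t> &A,
                                           const std::vector<std::uint8_t> &B) {
  assert(A.size() == B.size() && "operands must have the same length");
  std::vector<std::uint8_t> R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = static_cast<std::uint8_t>(A[I] > B[I] ? A[I] - B[I] : B[I] - A[I]);
  return R;
}

// Assumed "hsum": horizontal sum of all lanes, accumulated in 32 bits.
inline std::uint32_t hsumWiden(const std::vector<std::uint8_t> &V) {
  std::uint32_t Sum = 0;
  for (std::uint8_t E : V)
    Sum += E;
  return Sum;
}

// Model of PSADBW: one 64-bit sum per group of eight byte differences.
inline std::vector<std::uint64_t>
psadbwModel(const std::vector<std::uint8_t> &A,
            const std::vector<std::uint8_t> &B) {
  assert(A.size() % 8 == 0 && "length must be a multiple of eight bytes");
  std::vector<std::uint8_t> D = absDiffU8(A, B);
  std::vector<std::uint64_t> R(D.size() / 8, 0);
  for (std::size_t I = 0; I < D.size(); ++I)
    R[I / 8] += D[I];
  return R;
}
```

In this model only the summation needs a wider type; adding the two 64-bit lanes of psadbwModel gives the same value as hsumWiden applied to the byte-wide absdiff. Whether the actual intrinsics are meant to compose this way is exactly what the question asks, and the thread does not answer it.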

spatel resigned from this revision. Sep 26 2017, 3:49 PM

Revision Contents

Path                                                Size
include/llvm/Analysis/TargetTransformInfo.h         10 lines
include/llvm/Analysis/TargetTransformInfoImpl.h     4 lines
include/llvm/CodeGen/BasicTTIImpl.h                 3 lines
include/llvm/CodeGen/ISDOpcodes.h                   5 lines
include/llvm/IR/Intrinsics.td                       6 lines
lib/Analysis/TargetTransformInfo.cpp                5 lines
lib/Target/X86/X86TargetTransformInfo.h             2 lines
lib/Target/X86/X86TargetTransformInfo.cpp           27 lines
lib/Transforms/Vectorize/LoopVectorize.cpp          382 lines
test/Transforms/LoopVectorize/X86/sad-pattern.ll    34 lines

Diff 23038

include/llvm/Analysis/TargetTransformInfo.h

Context not available.
	unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,	unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
	ArrayRef<Type *> Tys) const;	ArrayRef<Type *> Tys) const;
		hfinkel: If this function applies only to ISDs, then it should not be here. This header only contains functions directly useful to IR-level passes. You can add this as a callback in the BasicTTI implementation, specialized by the targets, in order to share code.

		hfinkel: Indenting is off.
		/// \returns the cost of SAD instruction.
		jmolloy: Please spell out "SAD" = "Sum of absolute differences" = "C += abs(A - B)" in the docstring, so it is clear (not all backends will use the same mnemonic).
		unsigned getSADInstrCost(Type *RetTy, Type *op1,
		Type *op2) const;

	/// \returns The cost of Call instructions.	/// \returns The cost of Call instructions.
	unsigned getCallInstrCost(Function *F, Type *RetTy,	unsigned getCallInstrCost(Function *F, Type *RetTy,
	ArrayRef<Type *> Tys) const;	ArrayRef<Type *> Tys) const;
		hfinkel: Indenting is off.
Context not available.
	bool IsPairwiseForm) = 0;	bool IsPairwiseForm) = 0;
	virtual unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,	virtual unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
	ArrayRef<Type *> Tys) = 0;	ArrayRef<Type *> Tys) = 0;
		virtual unsigned getSADInstrCost(Type *RetTy, Type *op1,
		Type *op2) = 0;
	virtual unsigned getCallInstrCost(Function *F, Type *RetTy,	virtual unsigned getCallInstrCost(Function *F, Type *RetTy,
	ArrayRef<Type *> Tys) = 0;	ArrayRef<Type *> Tys) = 0;
	virtual unsigned getNumberOfParts(Type *Tp) = 0;	virtual unsigned getNumberOfParts(Type *Tp) = 0;
		hfinkel: Indenting is off.
Context not available.
	ArrayRef<Type *> Tys) override {	ArrayRef<Type *> Tys) override {
	return Impl.getIntrinsicInstrCost(ID, RetTy, Tys);	return Impl.getIntrinsicInstrCost(ID, RetTy, Tys);
	}	}
		unsigned getSADInstrCost(Type *RetTy, Type *op1,
		Type *op2) override {
		return Impl.getSADInstrCost(RetTy, op1, op2);
		}
	unsigned getCallInstrCost(Function *F, Type *RetTy,	unsigned getCallInstrCost(Function *F, Type *RetTy,
	ArrayRef<Type *> Tys) override {	ArrayRef<Type *> Tys) override {
	return Impl.getCallInstrCost(F, RetTy, Tys);	return Impl.getCallInstrCost(F, RetTy, Tys);
Context not available.

include/llvm/Analysis/TargetTransformInfoImpl.h

Context not available.
	ArrayRef<Type *> Tys) {	ArrayRef<Type *> Tys) {
	return 1;	return 1;
	}	}

		unsigned getSADInstrCost(Type *RetTy, Type *op1, Type *op2) {
		return 1;
		}

	unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) {	unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) {
	return 1;	return 1;
Context not available.

include/llvm/CodeGen/BasicTTIImpl.h

Context not available.
	case Intrinsic::masked_load:	case Intrinsic::masked_load:
	return static_cast<T *>(this)	return static_cast<T *>(this)
	->getMaskedMemoryOpCost(Instruction::Load, RetTy, 0, 0);	->getMaskedMemoryOpCost(Instruction::Load, RetTy, 0, 0);
		case Intrinsic::sad:
		return static_cast<T *>(this)
		->getSADInstrCost(RetTy, Tys[0], Tys[1]);
	}	}

	const TargetLoweringBase *TLI = getTLI();	const TargetLoweringBase *TLI = getTLI();
Context not available.

include/llvm/CodeGen/ISDOpcodes.h

Context not available.
	/// read / write specifier, locality specifier and instruction / data cache	/// read / write specifier, locality specifier and instruction / data cache
	/// specifier.	/// specifier.
	PREFETCH,	PREFETCH,

		/// SAD - This corresponds to a Sum of Absolute Difference(SAD) instruction.
		/// The operands are vectors of char type with length of vector being power
		hfinkel: Please write out what SAD stands for. Please also say something more about the semantics, and any constraints on the relationship between the scalar type and the input vector element type?
		jmolloy: "of char type"? Do you mean i8? Why only i8? Why is the (scalar part of the) input type not the same as the output type?
		Vijender: [VJ] - The sum of absolute differences instruction is only appropriate for char types. It is a byte-level reduction operation. If you look at PSAD in X86 or USAD in ARM, both operate at byte level. [VJ] - The scalar type of the output is not the same as the scalar type of the input because the sum of eight i8 values may not fit in an i8 without data loss.
		jmolloy: Hi, No they don't, not in ARM at least. We have i8, i16, i32 and i64 variants. Also, we mustn't tie target agnostic intrinsics to a specific target or set of targets. It should work with whatever types make sense to it. Also, ARM at least does not return a scalar. Its SAD instructions happen elementwise on a vector (see VABA/UABA/SABA). So we'd need support for this too. Does X86 have this?
		/// of 2 and the result is a scalar integer.
		SAD,

	/// OUTCHAIN = ATOMIC_FENCE(INCHAIN, ordering, scope)	/// OUTCHAIN = ATOMIC_FENCE(INCHAIN, ordering, scope)
	/// This corresponds to the fence instruction. It takes an input chain, and	/// This corresponds to the fence instruction. It takes an input chain, and
Context not available.

include/llvm/IR/Intrinsics.td

Context not available.
	def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],	def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
	[], "llvm.clear_cache">;	[], "llvm.clear_cache">;

		// Calculate the Sum of Absolute Differences (SAD) of the two input vectors.
		// Only vectors of char Type are allowed.
		jmolloy: Char? Why i32? why not i64? why not i8/i16?
		Vijender: [VJ] - I accept your point. We can make it to return any type.
		def int_sad : Intrinsic<[llvm_i32_ty],
		jmolloy: What is the signedness of this operation? Is there a floating point equivalent? If not, why not?
		Vijender: [VJ] - There is no floating point equivalent of SAD.
		jmolloy: Not in X86, but why not in LLVM? LLVM's target agnostic layer is not X86. Does SAD not make sense on FP types? Also, what about signedness?
		[llvm_anyvector_ty, llvm_anyvector_ty],
		[IntrNoMem]>;

	//===-------------------------- Masked Intrinsics -------------------------===//	//===-------------------------- Masked Intrinsics -------------------------===//
	//	//
	def int_masked_store : Intrinsic<[], [llvm_anyvector_ty, LLVMPointerTo<0>,	def int_masked_store : Intrinsic<[], [llvm_anyvector_ty, LLVMPointerTo<0>,
Context not available.

lib/Analysis/TargetTransformInfo.cpp

Context not available.
	Opd1PropInfo, Opd2PropInfo);	Opd1PropInfo, Opd2PropInfo);
	}	}

		unsigned TargetTransformInfo::getSADInstrCost(Type *RetTy, Type *op1,
		Type *op2) const {
		return TTIImpl->getSADInstrCost(RetTy, op1, op2);
		}

	unsigned TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Ty,	unsigned TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Ty,
	int Index, Type *SubTp) const {	int Index, Type *SubTp) const {
	return TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);	return TTIImpl->getShuffleCost(Kind, Ty, Index, SubTp);
Context not available.

lib/Target/X86/X86TargetTransformInfo.h

Context not available.
	unsigned getNumberOfRegisters(bool Vector);	unsigned getNumberOfRegisters(bool Vector);
	unsigned getRegisterBitWidth(bool Vector);	unsigned getRegisterBitWidth(bool Vector);
	unsigned getMaxInterleaveFactor();	unsigned getMaxInterleaveFactor();
		unsigned getSADInstrCost(Type *RetTy, Type *op1,
		Type *op2);
	unsigned getArithmeticInstrCost(	unsigned getArithmeticInstrCost(
	unsigned Opcode, Type *Ty,	unsigned Opcode, Type *Ty,
	TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,	TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
Context not available.

lib/Target/X86/X86TargetTransformInfo.cpp

Context not available.

	return 2;	return 2;
	}	}
		unsigned X86TTIImpl::getSADInstrCost(Type *RetTy, Type *op1,
		Type *op2) {

		unsigned result = 0;
		std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(op1);
		EVT arg1Ty = TLI->getValueType(op1);
		EVT arg2Ty = TLI->getValueType(op2);
		assert((arg1Ty == arg2Ty) &&
		"cannot handle different type of arguments for SAD");

		MVT MTy = arg1Ty.getSimpleVT();

		static const CostTblEntry<MVT::SimpleValueType> SSE1CostTable[] = {
		{ISD::SAD, MVT::v8i8, 4},
		{ISD::SAD, MVT::v16i8, 5},
		};

		if (ST->hasSSE1() || ST->hasAVX()) {
		int Idx = CostTableLookup(SSE1CostTable, ISD::SAD, MTy);
		if (Idx != -1)
		return result = LT.first * SSE1CostTable[Idx].Cost;
		}
		if (ST->is64Bit() || ST->hasMMX()) {
		return 4;
		}
		return 0;
		}

	unsigned X86TTIImpl::getArithmeticInstrCost(	unsigned X86TTIImpl::getArithmeticInstrCost(
	unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,	unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,
	TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,	TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,
Context not available.
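
The lookup in getSADInstrCost above can be modelled without the LLVM headers as follows. This is a simplified sketch: VecTy, CostEntry, lookupSADCost and sadCost are hypothetical stand-ins, NumParts plays the role of LT.first (the number of legal-width pieces the type is split into), and FallbackCost is an assumed parameter replacing the subtarget checks. Only the two table costs are taken from the hunk.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for MVT::v8i8, MVT::v16i8 and an unsupported type.
enum class VecTy { v8i8, v16i8, v4i32 };

struct CostEntry {
  VecTy Ty;
  unsigned Cost;
};

// Costs copied from the table in the hunk above.
static const CostEntry SADCostTable[] = {{VecTy::v8i8, 4}, {VecTy::v16i8, 5}};

// Returns the index of Ty in the table, or -1 if there is no entry.
inline int lookupSADCost(VecTy Ty) {
  const std::size_t N = sizeof(SADCostTable) / sizeof(SADCostTable[0]);
  for (std::size_t I = 0; I < N; ++I)
    if (SADCostTable[I].Ty == Ty)
      return static_cast<int>(I);
  return -1;
}

// Scales the per-instruction cost by the number of legalized parts, and
// falls back to a caller-supplied cost when the type has no table entry.
inline unsigned sadCost(VecTy Ty, unsigned NumParts, unsigned FallbackCost) {
  assert(NumParts >= 1 && "a type legalizes to at least one part");
  int Idx = lookupSADCost(Ty);
  return Idx != -1 ? NumParts * SADCostTable[Idx].Cost : FallbackCost;
}
```

In this model a v16i8 operand that legalizes to one part costs 5, and one that would be split into two parts costs 10.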

lib/Transforms/Vectorize/LoopVectorize.cpp

Context not available.
	/// Copy and widen the instructions from the old loop.	/// Copy and widen the instructions from the old loop.
	virtual void vectorizeLoop();	virtual void vectorizeLoop();

		/// While widening the instructions, skip the SAD pattern and replace
		/// with SAD intrinsic call.
		void generateSADInstruction(Instruction *I, SmallVector<Value *, 4> &args,
		VectorParts &Entry);

	/// \brief The Loop exit block may have single value PHI nodes where the	/// \brief The Loop exit block may have single value PHI nodes where the
	/// incoming value is 'Undef'. While vectorizing we only handled real values	/// incoming value is 'Undef'. While vectorizing we only handled real values
	/// that were defined inside the loop. Here we fix the 'undef case'.	/// that were defined inside the loop. Here we fix the 'undef case'.
Context not available.
	RK_IntegerMinMax, ///< Min/max implemented in terms of select(cmp()).	RK_IntegerMinMax, ///< Min/max implemented in terms of select(cmp()).
	RK_FloatAdd, ///< Sum of floats.	RK_FloatAdd, ///< Sum of floats.
	RK_FloatMult, ///< Product of floats.	RK_FloatMult, ///< Product of floats.
	RK_FloatMinMax ///< Min/max implemented in terms of select(cmp()).	RK_FloatMinMax, ///< Min/max implemented in terms of select(cmp()).
		RK_IntegerSpecial ///< Special patterns like SAD.
	};	};

	/// This enum represents the kinds of inductions that we support.	/// This enum represents the kinds of inductions that we support.
Context not available.
	MRK_FloatMax	MRK_FloatMax
	};	};

		// This enum is used to keep track of actions to be performed on
		// Instructions in special patterns.
		enum SpecialActionKind {
		SAK_Invalid,
		SAK_Skip, ///< Skip instruction which is part of special pattern
		SAK_Replace, ///< replace the instruction
		SAK_Arg, ///< arguments
		SAK_Phi ///< Phi instruction
		};

		// This keeps track of special pattern kind
		enum SpecialPatternKind {
		SPK_Invalid,
		SPK_Sad
		};

		// This struct holds information about each instruction in
		// Special Patterns
		struct SpecialPatternDescriptor {
		SpecialPatternDescriptor(SpecialPatternKind SPKind,
		SpecialActionKind SAK, unsigned miVF,
		hfinkel: How about: // minVF - Specifies the minimum VF for which the target supports this pattern. For example, for x86 the minimum VF for SAD is 8.
		jmolloy: Just use "minVF" here, it won't conflict as long as you reference it in the initializer list.
		unsigned maVF)
		: Kind(SPKind), Action(SAK), minVF(miVF), maxVF(maVF) {}

		SpecialPatternDescriptor() : Kind(SPK_Invalid), Action(SAK_Invalid),
		minVF(0), maxVF(1) {}
		// Kind - to keep track of which pattern the instruction belongs
		SpecialPatternKind Kind;
		// Action - Specifies the action to be performed for this instruction
		SpecialActionKind Action;
		// minVF - Specifies the minimum VF for which the target supports
		// this pattern. For example, for x86 the minimum VF for SAD is 8.
		unsigned minVF;
		jmolloy: I don't like the assumption here that the legality of an operation is a range [min,max]. I don't think we should assume continuity in the range. Perhaps this should be a bitset of supported types instead?
		Vijender: [VJ] - Can you please explain how to solve this issue. Are you telling to have supported types of the instruction in a list and check that instead of checking the VF factor?
		// maxVF - Specifies the maximum VF for which the target supports
		// this pattern. For example, for x86 the maximum VF for SAD is 16.
		unsigned maxVF;
		};

	/// This struct holds information about reduction variables.	/// This struct holds information about reduction variables.
	struct ReductionDescriptor {	struct ReductionDescriptor {
	ReductionDescriptor() : StartValue(nullptr), LoopExitInstr(nullptr),	ReductionDescriptor() : StartValue(nullptr), LoopExitInstr(nullptr),
Context not available.
	/// This POD struct holds information about a potential reduction operation.	/// This POD struct holds information about a potential reduction operation.
	struct ReductionInstDesc {	struct ReductionInstDesc {
	ReductionInstDesc(bool IsRedux, Instruction *I) :	ReductionInstDesc(bool IsRedux, Instruction *I) :
	IsReduction(IsRedux), PatternLastInst(I), MinMaxKind(MRK_Invalid) {}	IsReduction(IsRedux), PatternLastInst(I), MinMaxKind(MRK_Invalid),
		SPKind(SPK_Invalid) {}

		ReductionInstDesc(Instruction *I, MinMaxReductionKind K,
		SpecialPatternKind SPK) :
		IsReduction(true), PatternLastInst(I), MinMaxKind(K), SPKind(SPK) {}

	ReductionInstDesc(Instruction *I, MinMaxReductionKind K) :	ReductionInstDesc(Instruction *I, MinMaxReductionKind K) :
	IsReduction(true), PatternLastInst(I), MinMaxKind(K) {}	IsReduction(true), PatternLastInst(I), MinMaxKind(K), SPKind(SPK_Invalid) {}

	// Is this instruction a reduction candidate.	// Is this instruction a reduction candidate.
	bool IsReduction;	bool IsReduction;
Context not available.
	Instruction *PatternLastInst;	Instruction *PatternLastInst;
	// If this is a min/max pattern the comparison predicate.	// If this is a min/max pattern the comparison predicate.
	MinMaxReductionKind MinMaxKind;	MinMaxReductionKind MinMaxKind;
		// if special pattern then hold the pattern kind.
		SpecialPatternKind SPKind;
	};	};

	/// A struct for saving information about induction variables.	/// A struct for saving information about induction variables.
Context not available.
	/// induction descriptor.	/// induction descriptor.
	typedef MapVector<PHINode*, InductionInfo> InductionList;	typedef MapVector<PHINode*, InductionInfo> InductionList;

		/// ActionList stores the actions to be performed for an instruction
		/// which is part of special pattern in a basic block.
		typedef DenseMap<Instruction *, SpecialPatternDescriptor> ActionList;
		jmolloy: Shouldn't this be a multimap? I can imagine that there might be complex patterns and simple patterns, and a complex pattern may subsume a simple pattern. So an instruction could be part of two patterns (until one is selected).
		Vijender: [VJ] - Ok. How about checking the instruction whether it belongs to any other pattern before inserting in the ActionList. Because I think it will not make sense to keep that instruction in both the patterns. Rather we can select which pattern to associate that instruction before inserting. What do you think?

	/// Returns true if it is legal to vectorize this loop.	/// Returns true if it is legal to vectorize this loop.
	/// This does not mean that it is profitable to vectorize this	/// This does not mean that it is profitable to vectorize this
	/// loop, only that it is legal to do so.	/// loop, only that it is legal to do so.
Context not available.
	/// Returns the induction variables found in the loop.	/// Returns the induction variables found in the loop.
	InductionList *getInductionVars() { return &Inductions; }	InductionList *getInductionVars() { return &Inductions; }

		/// Returns the actionlist for instructions in the loop.
		ActionList *getActionMap() { return &SpecialActions; }

	/// Returns the widest induction type.	/// Returns the widest induction type.
	Type *getWidestInductionType() { return WidestIndTy; }	Type *getWidestInductionType() { return WidestIndTy; }

Context not available.
	/// Collect the variables that need to stay uniform after vectorization.	/// Collect the variables that need to stay uniform after vectorization.
	void collectLoopUniforms();	void collectLoopUniforms();

		/// Returns true if we find a SAD pattern.
		bool isSADPattern(PHINode *Phi);

		/// Returns a ReductionInstDesc with SpecialPatternKind set if it matches
		/// any special pattern.
		ReductionInstDesc isSpecialPattern(PHINode *Phi, ReductionInstDesc &Prev);

	/// Return true if all of the instructions in the block can be speculatively	/// Return true if all of the instructions in the block can be speculatively
	/// executed. \p SafePtrs is a list of addresses that are known to be legal	/// executed. \p SafePtrs is a list of addresses that are known to be legal
	/// and we know that we can read from them without segfault.	/// and we know that we can read from them without segfault.
Context not available.
	PHINode *Induction;	PHINode *Induction;
	/// Holds the reduction variables.	/// Holds the reduction variables.
	ReductionList Reductions;	ReductionList Reductions;
		/// Holds the actions for instructions in Loop.
		ActionList SpecialActions;
	/// Holds all of the induction variables that we found in the loop.	/// Holds all of the induction variables that we found in the loop.
	/// Notice that inductions don't need to start at zero and that induction	/// Notice that inductions don't need to start at zero and that induction
	/// variables can be pointers.	/// variables can be pointers.
Context not available.
	/// needs to be vectorized. We ignore values that remain scalar such as	/// needs to be vectorized. We ignore values that remain scalar such as
	/// 64 bit loop indices.	/// 64 bit loop indices.
	unsigned getWidestType();	unsigned getWidestType();

		/// \return The most profitable type for the pattern.
		/// For example, for the SAD pattern the return value is the smallest type
		/// present in the basic block.
		/// If there is no special pattern then the widest type is returned.
		unsigned selectBestTypeForPattern();
		jmolloy: Uh, surely this needs to be just the type of the phi? If smaller types are legal, the phi should have a smaller type, right? Your proposed algorithm falls down in this case: int32_t a; int8_t b; int32_t *c; for (...) { a += abs(c[i] - c[i+1]); b += (int8_t)c[i]; } Oh dear, you'll select "i8" as your type but that is illegal (it's not part of this PHI).
		Vijender (Author): [VJ] Actually, this value is required to select the maximum VF to try. The lower the value, the higher the vectorization factors I can try.
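To make the trade-off in this thread concrete, here is a minimal standalone sketch (not code from the patch) of how the chosen element width bounds the vectorization factor, mirroring the `MaxVectorSize = WidestRegister / BestType` computation in the diff. The helper name `maxVectorSize` is hypothetical, and the 128-bit register width used in the note below is an assumption.

```cpp
#include <cassert>

// Hypothetical helper: the largest VF considered for a given register width
// and element width, both in bits.
unsigned maxVectorSize(unsigned RegisterBits, unsigned ElementBits) {
  assert(ElementBits != 0 && "element width must be non-zero");
  return RegisterBits / ElementBits;
}
```

Assuming a 128-bit register, the i32 accumulator limits the search to VF = 4, while the i8 loads allow VF = 16. That is why the author wants the smaller width, and also why jmolloy's counterexample matters: an unrelated i8 value in the loop would raise the bound just the same.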

	/// \return The most profitable unroll factor.	/// \return The most profitable unroll factor.
	/// If UserUF is non-zero then this method finds the best unroll-factor	/// If UserUF is non-zero then this method finds the best unroll-factor
Context not available.
	/// width. Vector width of one means scalar.	/// width. Vector width of one means scalar.
	unsigned getInstructionCost(Instruction *I, unsigned VF);	unsigned getInstructionCost(Instruction *I, unsigned VF);

		/// Returns the execution time cost of a special reduction instruction
		/// for a given vector width.
		unsigned SpecialPhiCost(PHINode *p, unsigned VF);
		jmolloy: I generally don't like the use of "Special". Can we not think of a more descriptive name for it, like "Pattern"?
		Vijender (Author): [VJ] - The word "Special" was suggested in the previous review comments on the RFC.

	/// Returns whether the instruction is a load or store and will be a emitted	/// Returns whether the instruction is a load or store and will be a emitted
	/// as a vector operation.	/// as a vector operation.
	bool isConsecutiveLoadOrStore(Instruction *I);	bool isConsecutiveLoadOrStore(Instruction *I);
Context not available.
	case RK_IntegerXor:	case RK_IntegerXor:
	case RK_IntegerAdd:	case RK_IntegerAdd:
	case RK_IntegerOr:	case RK_IntegerOr:
		case RK_IntegerSpecial:
		jmolloy: Well, this isn't right. Who says the identity for all pattern-based reductions is integer zero?
		Vijender (Author): [VJ] - I got your point. I need to pass the Phi node as one of the arguments to this function so that it can learn more about the type and return the respective identity. Will work on it.
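The objection can be illustrated with a small standalone sketch (not LLVM code): the identity value depends on the reduction operator, so a catch-all kind cannot safely share the zero used for add, or, and xor. The `ReductionKind` enum and `reductionIdentity` are hypothetical names that only loosely follow the `RK_*` enumerators in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative reduction kinds; hypothetical names, not the patch's enum.
enum class ReductionKind { Add, Mul, Or, And, Xor };

// Returns the value that leaves any 32-bit operand unchanged under the
// given reduction operator.
uint32_t reductionIdentity(ReductionKind Kind) {
  switch (Kind) {
  case ReductionKind::Add:
  case ReductionKind::Or:
  case ReductionKind::Xor:
    return 0u; // x + 0 == x, x | 0 == x, x ^ 0 == x
  case ReductionKind::Mul:
    return 1u; // x * 1 == x
  case ReductionKind::And:
    return UINT32_MAX; // x & ~0 == x
  }
  assert(false && "unhandled reduction kind");
  return 0u;
}
```

A pattern whose combining step is not an addition would need its own entry here, which is the information the author proposes to recover from the Phi node.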
	// Adding, Xoring, Oring zero to a number does not change it.	// Adding, Xoring, Oring zero to a number does not change it.
	return ConstantInt::get(Tp, 0);	return ConstantInt::get(Tp, 0);
	case RK_IntegerMult:	case RK_IntegerMult:
Context not available.
	switch (Kind) {	switch (Kind) {
	case LoopVectorizationLegality::RK_IntegerAdd:	case LoopVectorizationLegality::RK_IntegerAdd:
	return Instruction::Add;	return Instruction::Add;
		case LoopVectorizationLegality::RK_IntegerSpecial:
		jmolloy: If it is assumed that all patterns can be added together, that isn't documented anywhere.
		Vijender (Author): [VJ] - I got your point. I need to pass the Phi node as one of the arguments to this function so that it can learn more about the type and return the respective instruction. Will work on it.
		return Instruction::Add;
	case LoopVectorizationLegality::RK_IntegerMult:	case LoopVectorizationLegality::RK_IntegerMult:
	return Instruction::Mul;	return Instruction::Mul;
	case LoopVectorizationLegality::RK_IntegerOr:	case LoopVectorizationLegality::RK_IntegerOr:
		hfinkel: cv -> CV
Context not available.
	return V;	return V;
	}	}

		void InnerLoopVectorizer::generateSADInstruction(Instruction *I,
		SmallVector<Value *, 4> &args,
		VectorParts &Entry) {
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[I];
		switch (SPD.Action) {
		case LoopVectorizationLegality::SAK_Arg: {
		Instruction *CV = dyn_cast<CastInst>(I);
		if (CV) {
		VectorParts &Op = getVectorValue(CV->getOperand(0));
		args.push_back(Op[0]);
		}
		break;
		}
		case LoopVectorizationLegality::SAK_Skip:
		// do nothing
		break;
		case LoopVectorizationLegality::SAK_Replace: {
		Module *M = I->getParent()->getParent()->getParent();
		SmallVector<Type *, 4> Tys;
		Tys.push_back(args[0]->getType());
		Tys.push_back(args[1]->getType());
		Function *F = Intrinsic::getDeclaration(M, Intrinsic::sad, Tys);
		Value *intr_call = Builder.CreateCall(F, args);
		VectorParts &phi = getVectorValue(I->getOperand(1));
		Value *add = Builder.CreateBinOp(Instruction::Add, intr_call, phi[0]);
		Entry[0] = add;
		propagateMetadata(Entry, I);
		break;
		}
		case LoopVectorizationLegality::SAK_Phi: {
		Type *VecTy = I->getType();
		Entry[0] = PHINode::Create(VecTy, 2, "sadvec.phi",
		LoopVectorBody.back()->getFirstInsertionPt());
		break;
		}
		case LoopVectorizationLegality::SAK_Invalid:
		assert(0 && "Unknown action to be performed!!");
		break;
		}
		}
	/// Estimate the overhead of scalarizing a value. Insert and Extract are set if	/// Estimate the overhead of scalarizing a value. Insert and Extract are set if
	/// the result needs to be inserted and/or extracted from vectors.	/// the result needs to be inserted and/or extracted from vectors.
	static unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract,	static unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract,
		hfinkel: Remove commented-out code.
Context not available.
	for (PhiVector::iterator it = RdxPHIsToFix.begin(), e = RdxPHIsToFix.end();	for (PhiVector::iterator it = RdxPHIsToFix.begin(), e = RdxPHIsToFix.end();
	it != e; ++it) {	it != e; ++it) {
	PHINode *RdxPhi = *it;	PHINode *RdxPhi = *it;
		bool isSAD = false;
	assert(RdxPhi && "Unable to recover vectorized PHI");	assert(RdxPhi && "Unable to recover vectorized PHI");

	// Find the reduction variable descriptor.	// Find the reduction variable descriptor.
Context not available.
	LoopVectorizationLegality::ReductionDescriptor RdxDesc =	LoopVectorizationLegality::ReductionDescriptor RdxDesc =
	(*Legal->getReductionVars())[RdxPhi];	(*Legal->getReductionVars())[RdxPhi];

		if (RdxDesc.Kind == LoopVectorizationLegality::RK_IntegerSpecial) {
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[RdxPhi];

		if (SPD.minVF <= VF && SPD.maxVF >= VF &&
		SPD.Kind == LoopVectorizationLegality::SPK_Sad )
		isSAD = true;
		}

	setDebugLocFromInst(Builder, RdxDesc.StartValue);	setDebugLocFromInst(Builder, RdxDesc.StartValue);

	// We need to generate a reduction vector from the incoming scalar.	// We need to generate a reduction vector from the incoming scalar.
Context not available.
	RdxDesc.StartValue,	RdxDesc.StartValue,
	"minmax.ident");	"minmax.ident");
	}	}
		} else if (RdxDesc.Kind == LoopVectorizationLegality::RK_IntegerSpecial) {
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[RdxPhi];
		if (SPD.minVF <= VF && SPD.maxVF >= VF &&
		SPD.Kind == LoopVectorizationLegality::SPK_Sad)
		VectorStart = Identity = RdxDesc.StartValue;
	} else {	} else {
	// Handle other reduction kinds:	// Handle other reduction kinds:
	Constant *Iden =	Constant *Iden =
Context not available.
	ReducedPartRdx, RdxParts[part]);	ReducedPartRdx, RdxParts[part]);
	}	}

	if (VF > 1) {	if (VF > 1 && !isSAD) {
		jmolloy: I don't like this at all. You've introduced a new generic pattern type, then binned that and special-cased for SAD. Why is SAD special? Why is it different from ADD for this case?
		Vijender (Author): [VJ] The code inside the IF block reduces the vector to a single scalar value. Here, my SAD intrinsic output is already a scalar, so I need to skip this merger block for SAD.
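For readers unfamiliar with the block being skipped, the following standalone sketch (an illustration, not the vectorizer's code) shows the kind of log2(VF) halving reduction that the `VF > 1` branch emits for an ordinary add reduction. The function name `reduceAddLanes` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reduces the lanes of a vector accumulator to one scalar by repeatedly
// adding the upper half onto the lower half: log2(VF) rounds for VF lanes.
int32_t reduceAddLanes(std::vector<int32_t> Lanes) {
  assert(!Lanes.empty() && (Lanes.size() & (Lanes.size() - 1)) == 0 &&
         "lane count must be a power of two");
  for (std::size_t Half = Lanes.size() / 2; Half >= 1; Half /= 2) {
    for (std::size_t I = 0; I < Half; ++I)
      Lanes[I] += Lanes[I + Half];
  }
  return Lanes[0];
}
```

By the author's description, the proposed SAD intrinsic already returns a single i32, so there are no lanes left to fold. jmolloy's question is whether that property should be modelled per pattern rather than through an `isSAD` flag.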
	// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles	// VF is a power of 2 so we can emit the reduction using log2(VF) shuffles
	// and vector ops, reducing the set of values being computed by half each	// and vector ops, reducing the set of values being computed by half each
	// round.	// round.
Context not available.

	void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {	void InnerLoopVectorizer::vectorizeBlockInLoop(BasicBlock *BB, PhiVector *PV) {
	// For each instruction in the old loop.	// For each instruction in the old loop.
		SmallVector<Value *, 4> args;
		jmolloy: Args
	for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {	for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
	VectorParts &Entry = WidenMap.get(it);	VectorParts &Entry = WidenMap.get(it);
		if (Legal->getActionMap()->count(it)) {
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[it];
		if (SPD.minVF <= VF && SPD.maxVF >= VF) {
		switch (SPD.Kind) {
		case LoopVectorizationLegality::SPK_Sad:
		generateSADInstruction(it, args, Entry);
		if (SPD.Action == LoopVectorizationLegality::SAK_Phi) {
		PHINode *P = cast<PHINode>(it);
		PV->push_back(P);
		}
		break;
		case LoopVectorizationLegality::SPK_Invalid:
		assert(0 && "Cannot be Invalid here!!");
		break;
		jmolloy: No default case?
		}
		continue;
		}
		}
	switch (it->getOpcode()) {	switch (it->getOpcode()) {
	case Instruction::Br:	case Instruction::Br:
	// Nothing to do for PHIs and BR, since we already took care of the	// Nothing to do for PHIs and BR, since we already took care of the
Context not available.

	continue;	continue;
	}	}
		// Check the Special Patterns.
		if (AddReductionVar(Phi, RK_IntegerSpecial)) {
		DEBUG(dbgs() << "LV: Found a Special reduction PHI." << *Phi <<"\n");
		continue;
		}
	if (AddReductionVar(Phi, RK_IntegerAdd)) {	if (AddReductionVar(Phi, RK_IntegerAdd)) {
	DEBUG(dbgs() << "LV: Found an ADD reduction PHI."<< *Phi <<"\n");	DEBUG(dbgs() << "LV: Found an ADD reduction PHI."<< *Phi <<"\n");
	continue;	continue;
Context not available.
	return Stride;	return Stride;
	}	}

		LoopVectorizationLegality::ReductionInstDesc
		LoopVectorizationLegality::isSpecialPattern(PHINode *Phi,
		ReductionInstDesc &Prev) {
		if (isSADPattern(Phi)) {
		return ReductionInstDesc(Phi, Prev.MinMaxKind, SPK_Sad);
		} else {
		return Prev;
		}
		}

	void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) {	void LoopVectorizationLegality::collectStridedAccess(Value *MemAccess) {
	Value *Ptr = nullptr;	Value *Ptr = nullptr;
	if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))	if (LoadInst *LI = dyn_cast<LoadInst>(MemAccess))
		hfinkel: two -> Two
		hfinkel: Line too long.
Context not available.

	if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)	if (!FoundStartPHI || !FoundReduxOp || !ExitInstruction)
	return false;	return false;

		ReduxDesc = isSpecialPattern(Phi, ReduxDesc);

		if (Kind == RK_IntegerSpecial && ReduxDesc.SPKind == SPK_Invalid)
		return false;

	// We found a reduction var if we have reached the original phi node and we	// We found a reduction var if we have reached the original phi node and we
	// only have a single instruction with out-of-loop users.	// only have a single instruction with out-of-loop users.

Context not available.
	return true;	return true;
	}	}

		/// \brief This function checks for the following pattern

		/// %sum.05 = phi i32 [ 0, %entry ], [ %add, %for.body ]
		/// ....
		/// %sub = sub nsw i32 %conv, %conv3
		/// %ispos = icmp uge i8 %0, %1
		/// %neg = sub nsw i32 0, %sub
		/// %2 = select i1 %ispos, i32 %sub, i32 %neg
		/// %add = add nsw i32 %2, %sum.05

		/// This pattern is matched from the bottom up. If everything matches,
		/// it returns true.
		bool LoopVectorizationLegality::isSADPattern(PHINode *Phi) {

		LLVMContext &Context = TheLoop->getHeader()->getContext();
		// SmallVector<Instruction, 2> SadIntrList = &SPDesc.IntrList;
		for (User *U : Phi->users()) {
		// This is supposed to be an add instruction
		if (!isa<Instruction>(U))
		return false;
		Instruction *UI = dyn_cast<Instruction>(U);
		if (UI->getOpcode() != Instruction::Add)
		return false;
		// all uses of Add instruction must be phi instructions
		for (User *Ur : UI->users()) {
		Instruction *I = dyn_cast<Instruction>(Ur);
		if (!isa<PHINode>(I))
		return false;
		}
		// Checking for Selection instruction
		if (!isa<SelectInst>(UI->getOperand(0)))
		return false;

		// two subs and one icmp instruction are matched
		Instruction *Sel = dyn_cast<Instruction>(UI->getOperand(0));
		if (!isa<Instruction>(Sel->getOperand(1)))
		return false;
		Instruction *Sub1 = dyn_cast<Instruction>(Sel->getOperand(1));
		if (!isa<Instruction>(Sel->getOperand(2)))
		return false;
		Instruction *Sub2 = dyn_cast<Instruction>(Sel->getOperand(2));
		if (Sub1->getOpcode() != Instruction::Sub ||
		Sub2->getOpcode() != Instruction::Sub ||
		!isa<ICmpInst>(Sel->getOperand(0)))
		return false;
		Instruction *cv1 = dyn_cast<CastInst>(Sub1->getOperand(0));
		Instruction *cv2 = dyn_cast<CastInst>(Sub1->getOperand(1));
		if (!cv1 || !cv2 || (cv1->getOperand(0)->getType()->getScalarType() !=
		Type::getInt8Ty(Context)) ||
		(cv2->getOperand(0)->getType()->getScalarType() !=
		Type::getInt8Ty(Context)))
		return false;

		if (!isa<Instruction>(Sel->getOperand(0)))
		return false;
		Instruction *icmp = dyn_cast<Instruction>(Sel->getOperand(0));
		if (icmp->getOperand(0) != Sub1 || Sub2->getOperand(1) != Sub1) {
		if (((icmp->getOperand(0) != cv1->getOperand(0)) &&
		(icmp->getOperand(1) != cv1->getOperand(0))) ||
		((icmp->getOperand(0) != cv2->getOperand(0)) &&
		(icmp->getOperand(1) != cv2->getOperand(0))))
		return false;
		}
		// All these instructions are added to ActionList
		SpecialActions[UI] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Replace, 8, 16); // SAK_Replace
		SpecialActions[Sel] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Skip, 8, 16); // SAK_Skip
		SpecialActions[icmp] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Skip, 8, 16); // SAK_Skip
		SpecialActions[Sub2] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Skip, 8, 16); // SAK_Skip
		SpecialActions[Sub1] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Skip, 8, 16); // SAK_Skip
		SpecialActions[cv1] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Arg, 8, 16); // SAK_Arg
		SpecialActions[cv2] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Arg, 8, 16); // SAK_Arg
		SpecialActions[Phi] =
		SpecialPatternDescriptor(SPK_Sad, SAK_Phi, 8, 16); // SAK_Phi
		}
		return true;
		}
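As a sanity check on the pattern in the doc comment above, this standalone sketch (not part of the patch) transcribes the IR sequence into scalar C++ and compares it against `|a - b|` for every pair of 8-bit inputs. The function names are hypothetical, and the sketch assumes the casts are zero extensions, as in the accompanying test.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Mirrors the IR in the doc comment: zext both i8 inputs to i32, subtract,
// and select between %sub and its negation based on an unsigned compare.
int32_t absDiffViaSelect(uint8_t A, uint8_t B) {
  int32_t Conv = A;           // %conv  = zext i8 %0 to i32
  int32_t Conv3 = B;          // %conv3 = zext i8 %1 to i32
  int32_t Sub = Conv - Conv3; // %sub   = sub nsw i32 %conv, %conv3
  bool IsPos = A >= B;        // %ispos = icmp uge i8 %0, %1
  int32_t Neg = 0 - Sub;      // %neg   = sub nsw i32 0, %sub
  return IsPos ? Sub : Neg;   // %2     = select i1 %ispos, i32 %sub, i32 %neg
}

// Exhaustively compares the select form against |A - B| for all u8 pairs.
bool selectFormIsAbsDiff() {
  for (int A = 0; A <= 255; ++A)
    for (int B = 0; B <= 255; ++B)
      if (absDiffViaSelect(static_cast<uint8_t>(A), static_cast<uint8_t>(B)) !=
          std::abs(A - B))
        return false;
  return true;
}
```

The check covers only the arithmetic identity. It says nothing about whether the matcher in the patch rejects every loop that does not have this shape.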

	/// Returns true if the instruction is a Select(ICmp(X, Y), X, Y) instruction	/// Returns true if the instruction is a Select(ICmp(X, Y), X, Y) instruction
	/// pattern corresponding to a min(X, Y) or max(X, Y).	/// pattern corresponding to a min(X, Y) or max(X, Y).
	LoopVectorizationLegality::ReductionInstDesc	LoopVectorizationLegality::ReductionInstDesc
Context not available.
	ReductionInstDesc &Prev) {	ReductionInstDesc &Prev) {
	bool FP = I->getType()->isFloatingPointTy();	bool FP = I->getType()->isFloatingPointTy();
	bool FastMath = FP && I->hasUnsafeAlgebra();	bool FastMath = FP && I->hasUnsafeAlgebra();
		if (Kind == RK_IntegerSpecial) {
		return ReductionInstDesc(true, I);
		}
	switch (I->getOpcode()) {	switch (I->getOpcode()) {
	default:	default:
	return ReductionInstDesc(false, I);	return ReductionInstDesc(false, I);
		hfinkel: Widest == Smallest?
Context not available.
	unsigned TC = SE->getSmallConstantTripCount(TheLoop);	unsigned TC = SE->getSmallConstantTripCount(TheLoop);
	DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');	DEBUG(dbgs() << "LV: Found trip count: " << TC << '\n');

	unsigned WidestType = getWidestType();	unsigned BestType = selectBestTypeForPattern();
	unsigned WidestRegister = TTI.getRegisterBitWidth(true);	unsigned WidestRegister = TTI.getRegisterBitWidth(true);
	unsigned MaxSafeDepDist = -1U;	unsigned MaxSafeDepDist = -1U;
	if (Legal->getMaxSafeDepDistBytes() != -1U)	if (Legal->getMaxSafeDepDistBytes() != -1U)
Context not available.
	MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8;	MaxSafeDepDist = Legal->getMaxSafeDepDistBytes() * 8;
	WidestRegister = ((WidestRegister < MaxSafeDepDist) ?	WidestRegister = ((WidestRegister < MaxSafeDepDist) ?
	WidestRegister : MaxSafeDepDist);	WidestRegister : MaxSafeDepDist);
	unsigned MaxVectorSize = WidestRegister / WidestType;	unsigned MaxVectorSize = WidestRegister / BestType;
	DEBUG(dbgs() << "LV: The Widest type: " << WidestType << " bits.\n");	DEBUG(dbgs() << "LV: The Best type: " << BestType << " bits.\n");
	DEBUG(dbgs() << "LV: The Widest register is: "	DEBUG(dbgs() << "LV: The Widest register is: "
	<< WidestRegister << " bits.\n");	<< WidestRegister << " bits.\n");

Context not available.
	return Factor;	return Factor;
	}	}

		unsigned LoopVectorizationCostModel::selectBestTypeForPattern() {
		unsigned MaxWidth = 8;
		const DataLayout &DL = TheFunction->getParent()->getDataLayout();
		unsigned specialPatternWidth = 1024;
		// For each block.
		for (Loop::block_iterator bb = TheLoop->block_begin(),
		be = TheLoop->block_end(); bb != be; ++bb) {
		BasicBlock *BB = *bb;

		// For each instruction in the loop.
		for (BasicBlock::iterator it = BB->begin(), e = BB->end(); it != e; ++it) {
		Type *T = it->getType();

		// Ignore ephemeral values.
		if (EphValues.count(it))
		continue;

		// Only examine Loads, Stores and PHINodes.
		if (!isa<LoadInst>(it) && !isa<StoreInst>(it) && !isa<PHINode>(it))
		continue;

		// Examine PHI nodes that are reduction variables.
		if (PHINode *PN = dyn_cast<PHINode>(it)) {
		if (!Legal->getReductionVars()->count(PN))
		continue;
		if(Legal->getActionMap()->count(PN)) {
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[it];
		switch(SPD.Kind) {
		case LoopVectorizationLegality::SPK_Sad:
		jmolloy: I think here you're special-casing the SpecialPattern stuff when really it shouldn't be special-cased. This is a problem (finding the smallest valid type for an operation) that applies in many cases to most operators, not just your patterns. I think it gives the biggest impact in the ARM/MIPS world though, as i8 and i16 aren't legal scalar types for us.
		Vijender (Author): [VJ] - So shall I replace getWidestType in the code with getSmallestType so that it can try all the vectorization factors? The cost model already handles everything properly anyway, so I feel there is no harm in using getSmallestType.
		specialPatternWidth = std::min(specialPatternWidth,(unsigned)8);
		continue;
		default : break;
		}
		}
		}

		// Examine the stored values.
		if (StoreInst *ST = dyn_cast<StoreInst>(it))
		T = ST->getValueOperand()->getType();

		// Ignore loaded pointer types and stored pointer types that are not
		// consecutive. However, we do want to take consecutive stores/loads of
		// pointer vectors into account.
		if (T->isPointerTy() && !isConsecutiveLoadOrStore(it))
		continue;

		MaxWidth = std::max(MaxWidth,
		(unsigned)DL.getTypeSizeInBits(T->getScalarType()));
		}
		}

		return std::min(MaxWidth,specialPatternWidth);
		}

	unsigned LoopVectorizationCostModel::getWidestType() {	unsigned LoopVectorizationCostModel::getWidestType() {
	unsigned MaxWidth = 8;	unsigned MaxWidth = 8;
	const DataLayout &DL = TheFunction->getParent()->getDataLayout();	const DataLayout &DL = TheFunction->getParent()->getDataLayout();
		hfinkel: Line too long.
Context not available.
	// Ignore ephemeral values.	// Ignore ephemeral values.
	if (EphValues.count(it))	if (EphValues.count(it))
	continue;	continue;

		if (Legal->getActionMap()->count(it)) {
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[it];
		if (VF >= SPD.minVF && VF <= SPD.maxVF) {
		if (SPD.Action == LoopVectorizationLegality::SAK_Phi) {
		PHINode *p = dyn_cast<PHINode>(it);
		unsigned PhiCost;
		PhiCost= SpecialPhiCost(p, VF);
		DEBUG(dbgs() << "LV: Found an estimated cost of " << PhiCost
		<< " for VF " << VF <<
		" For special Phi instruction: " << *it << '\n');
		BlockCost += PhiCost;
		}
		continue;
		}
		}
	unsigned C = getInstructionCost(it, VF);	unsigned C = getInstructionCost(it, VF);

	// Check if we should override the cost.	// Check if we should override the cost.
Context not available.
	}	}

	unsigned	unsigned
		LoopVectorizationCostModel::SpecialPhiCost(PHINode *P, unsigned VF) {

		if (Legal->getReductionVars()->count(P)) {
		LoopVectorizationLegality::ReductionDescriptor RdxDesc =
		(*Legal->getReductionVars())[P];
		assert(Legal->getActionMap()->count(P) && "something wrong");
		LoopVectorizationLegality::SpecialPatternDescriptor SPD =
		(*Legal->getActionMap())[P];
		if (VF >= SPD.minVF && VF <= SPD.maxVF) {
		switch (SPD.Kind) {
		case LoopVectorizationLegality::SPK_Sad: {
		LLVMContext &Context = TheLoop->getHeader()->getContext();
		SmallVector<Type *, 4> Tys;
		Tys.push_back(VectorType::get(Type::getInt8Ty(Context), VF));
		Tys.push_back(VectorType::get(Type::getInt8Ty(Context), VF));
		Type *RetTy = Type::getInt32Ty(Context);
		Intrinsic::ID ID = Intrinsic::sad;
		unsigned IntrinsicCost = TTI.getIntrinsicInstrCost(ID, RetTy, Tys);
		return IntrinsicCost;
		} break;

		default:
		break;
		}
		}
		}
		return 0;
		}

		unsigned
	LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {	LoopVectorizationCostModel::getInstructionCost(Instruction *I, unsigned VF) {
	// If we know that this instruction will remain uniform, check the cost of	// If we know that this instruction will remain uniform, check the cost of
	// the scalar version.	// the scalar version.
Context not available.

test/Transforms/LoopVectorize/X86/sad-pattern.ll

				; RUN: opt < %s -loop-vectorize -S | FileCheck %s

				; ModuleID = '<stdin>'
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				; Function Attrs: nounwind readonly uwtable
				; CHECK-LABEL: sad
				; CHECK: call i32 @llvm.sad.v16i8.v16i8
				define i32 @sad(i8* nocapture readonly %pix1, i8* nocapture readonly %pix2) #0 {
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%sum.07 = phi i32 [ 0, %entry ], [ %add, %for.body ]
				%arrayidx = getelementptr inbounds i8, i8* %pix1, i64 %indvars.iv
				%0 = load i8, i8* %arrayidx, align 1
				%conv = zext i8 %0 to i32
				%arrayidx2 = getelementptr inbounds i8, i8* %pix2, i64 %indvars.iv
				%1 = load i8, i8* %arrayidx2, align 1
				%conv3 = zext i8 %1 to i32
				%sub = sub nsw i32 %conv, %conv3
				%cmp4 = icmp sgt i32 %sub, 0
				%sub6 = sub nsw i32 0, %sub
				%cond = select i1 %cmp4, i32 %sub, i32 %sub6
				%add = add nsw i32 %cond, %sum.07
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 16
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret i32 %add
				}
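For orientation, a source-level loop of roughly the following shape would plausibly lower to the IR in this test. This is a reconstruction for illustration only: the original source is not part of the review, and the parameter names are simply taken from the IR.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences over two 16-byte blocks. The trip count of 16
// matches the `icmp eq i64 %indvars.iv.next, 16` exit test in the IR.
int sad(const uint8_t *Pix1, const uint8_t *Pix2) {
  assert(Pix1 && Pix2 && "inputs must be non-null");
  int Sum = 0;
  for (int I = 0; I < 16; ++I)
    Sum += std::abs(Pix1[I] - Pix2[I]); // u8 operands promote to int
  return Sum;
}
```

The fixed trip count of 16 lines up with the `v16i8` operands that the `CHECK` line expects.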