This is an archive of the discontinued LLVM Phabricator instance.

Differential D86065

[SVE] Make ElementCount members private
ClosedPublic

Authored by david-arm on Aug 17 2020, 5:41 AM.

Details

Reviewers

sdesmalen
ctetreau
efriedma
fpetrogalli
kmclaughlin
c-rhodes
paulwalker-arm

Commits

rGf4257c5832aa: [SVE] Make ElementCount members private

Summary

This patch changes ElementCount so that the Min and Scalable
members are now private and can only be accessed via the get
functions getKnownMinValue() and isScalable(). This is now inline
with the TypeSize class.

In addition I've added some other member functions for more
commonly used operations. Hopefully this makes the class more
useful and will reduce the need for calling getKnownMinValue().
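
As a rough illustration of the change described above, the following is a simplified stand-in for the class, not the actual contents of TypeSize.h. Only the names `getKnownMinValue()`, `isScalable()` and `ElementCount::get()` appear in this review; the constructor layout and everything else is an assumption made for the sketch.

```cpp
#include <cassert>

// Simplified stand-in for llvm::ElementCount (assumption: the real class has
// more members and lives in llvm/Support/TypeSize.h).
class ElementCount {
  unsigned Min;  // known minimum number of elements
  bool Scalable; // if true, the runtime count is Min * vscale

  // Private constructor: callers go through get().
  ElementCount(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}

public:
  static ElementCount get(unsigned Min, bool Scalable) {
    return ElementCount(Min, Scalable);
  }

  // Read-only accessors replace direct use of .Min and .Scalable.
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }

  // Structural equality: both fields must match.
  bool operator==(const ElementCount &RHS) const {
    return Min == RHS.Min && Scalable == RHS.Scalable;
  }
  bool operator!=(const ElementCount &RHS) const { return !(*this == RHS); }
};
```

With the members private, code that previously wrote `EC.Min = ...` has to build a new value instead, for example `EC = ElementCount::get(NewMin, EC.isScalable())`.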

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

david-arm created this revision. Aug 17 2020, 5:41 AM

Herald added projects: Restricted Project, Restricted Project. Aug 17 2020, 5:41 AM

Herald added subscribers: llvm-commits, cfe-commits, psnobl and 2 others.

david-arm requested review of this revision. Aug 17 2020, 5:41 AM

Harbormaster completed remote builds in B68592: Diff 285981. Aug 17 2020, 6:13 AM

david-arm added a reviewer: paulwalker-arm. Aug 17 2020, 8:34 AM

fpetrogalli added inline comments. Aug 17 2020, 8:53 AM

llvm/include/llvm/Support/TypeSize.h
61	I think that @ctetreau is right on https://reviews.llvm.org/D85794#inline-793909. We should not overload a comparison operator on this class because the set it represents cannot be ordered. Chris suggests writing a static function that can be used as a comparison operator, so that the kind of comparison being performed is explicit.

Perhaps now would be a good time to combine TypeSize and ElementCount into a single Polynomial type? We don't have to implement the whole abstraction of c*x^n (since we currently don't use the exponent, and don't distinguish between X's) but if it's ever needed in the future it will be obvious where to add it, and it will Just Work.

llvm/include/llvm/Support/TypeSize.h
61	In C++, it's common to overload the comparison operators for the purposes of being able to std::sort and use ordered sets. Normally, I would be OK with such usages. However, since `ElementCount` is basically a numeric type whose values only have a partial ordering, I think this is dangerous. I'm concerned that this will result in more bugs whereby somebody didn't remember that vectors can be scalable. I don't have a strong opinion on what the comparator function should be called, but I strongly prefer that it not be a comparison operator.
345–347	NIT: this can be rewritten without duplicating `EltCnt.getKnownMinValue() * 37U`
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3548	What is this `>>` thing? Some indicator of whitespace change, or is this a hard tab?

fpetrogalli mentioned this in D85794: [llvm][LV] Replace `unsigned VF` with `ElementCount VF` [NFCI]. Aug 17 2020, 2:30 PM

vkmr added a subscriber: vkmr. Aug 18 2020, 4:21 AM

Perhaps now would be a good time to combine TypeSize and ElementCount into a single Polynomial type? We don't have to implement the whole abstraction of c*x^n (since we currently don't use the exponent, and don't distinguish between X's) but if it's ever needed in the future it will be obvious where to add it, and it will Just Work.

Even if the types are structurally similar, I'd prefer to keep them separate. The uses are distinct, the usage in a function signature indicates what kind of value we actually expect. (It's particularly easy to perform incorrect conversions with types representing size and alignment.)

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3548	It's an indicator of a whitespace change; I think they started showing up with the recent Phabricator upgrade

Hi @ctetreau, I agree with @efriedma that keeping the two classes distinct for now seems best. The reason is I spent quite a lot of time trying to unify these classes already and I hit a stumbling block - TypeSize has the ugly uint64_t() cast operator, which makes unifying difficult. I didn't want to introduce a templated cast operator that ElementCount would then have too. I also tried making TypeSize derive from a templated parent, but that was pretty ugly too. Perhaps once we've removed the TypeSize -> uint64_t we might be better able to consider it?

llvm/include/llvm/Support/TypeSize.h
61	Hi @ctetreau, yeah I understand. The reason I chose to use operators was simply to be consistent with what we have already in TypeSize. Also, we have existing "==" and "!=" operators in ElementCount too, although these are essentially testing that two ElementCounts are identically the same or not, i.e. for 2 given polynomials (a + bx) and (c + dx) we're essentially asking if both a==c and b==d. If I introduce a new comparison function, I'll probably keep the asserts in for now, but in general we can do better than simply asserting if something is scalable or not. For example, we know that (vscale * 4) is definitely >= 4 because vscale is at least 1. I'm just not sure if we have that need yet.

david-arm updated this revision to Diff 287003. Aug 21 2020, 4:45 AM

paulwalker-arm added inline comments. Aug 21 2020, 5:36 AM

llvm/include/llvm/Support/TypeSize.h
61	I think we should treat the non-equality comparison functions more like floating point. What we don't want is somebody writing !GreaterThan when they actually mean LessThan. Perhaps we should name the functions accordingly (i.e. ogt for OrderedAndGreaterThan). We will also need matching less than functions since I can see those being useful when analysing constant insert/extract element indices which stand a good chance to be a known comparison (with 0 being the most common index).
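
A minimal sketch of the floating-point analogy, assuming that a fixed and a scalable count are treated as unordered, much as NaN is for `fcmp ogt`. The struct `EC` and the helper `isOrdered` are hypothetical names for illustration; the revision under review may have asserted on mixed operands rather than returning false.

```cpp
#include <cassert>

// Hypothetical plain-data stand-in for ElementCount.
struct EC {
  unsigned Min;
  bool Scalable;
};

// Assumption: two counts are only comparable when both are fixed or both are
// scalable, because vscale is unknown at compile time.
inline bool isOrdered(EC LHS, EC RHS) { return LHS.Scalable == RHS.Scalable; }

// "Ordered and greater than": false when the operands are unordered.
inline bool ogt(EC LHS, EC RHS) {
  return isOrdered(LHS, RHS) && LHS.Min > RHS.Min;
}

// "Ordered and less than": false when the operands are unordered.
inline bool olt(EC LHS, EC RHS) {
  return isOrdered(LHS, RHS) && LHS.Min < RHS.Min;
}
```

Under this reading, `!ogt(A, B)` does not imply that A is less than or equal to B: for a scalable A and a fixed B both `ogt` and `olt` return false, which is why matching less-than functions are needed rather than negation.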

fpetrogalli added inline comments. Aug 24 2020, 10:09 AM

llvm/include/llvm/Support/TypeSize.h
61	May I suggest the following name scheme? (my 2 c, will not hold the patch for not addressing this comment)

    static bool [Non]Total<cmp>(...)

with `<cmp>` being

    `LT` -> less than, aka `<`
    `LToE` -> less than or equal, aka "<="
    `GT` -> greater than, aka ">"
    `GToE` -> greater than or equal, aka ">="

and `Total`, `NonTotal` being the prefix that gives information about the behavior on the value of scalable:

    `Total` -> for example, all scalable ECs are bigger than fixed ECs.
    `NonTotal` -> asserting on `(LHS.Scalable == RHS.Scalable)` before returning `LHS.Min <cmp> RHS.Min`.

Taking it further: it could also be a template on an enumeration that list the type of comparisons?

    enum CMPType { TotalGT, NonTotalLT, fancy_one };
    ...
    template <unsigned T> static bool cmp(ElementCount &LHS, ElementCount &RHS );
    ...
    static bool ElementCount::cmp<ElementCount::CMPType::TotalGT>(ElementCount &LHS, ElementCount &RHS ) { /// implementation }
    static bool ElementCount::cmp<ElementCount::CMPType::fancy_one>(ElementCount &LHS, ElementCount &RHS ) { /// implementation }

Hi @fpetrogalli, if you don't mind I think I'll stick with Paul's idea for ogt because this matches the IR neatly, i.e. "fcmp ogt". Also, for me personally it's much simpler and more intuitive.

In D86065#2235434, @david-arm wrote:

Hi @fpetrogalli, if you don't mind I think I'll stick with Paul's idea for ogt because this matches the IR neatly, i.e. "fcmp ogt". Also, for me personally it's much simpler and more intuitive.

I don't mind at all, please follow Paul's suggestion, mine was just an alternative. Thanks!

Changed comparison function from gt to ogt and added a olt (less than) comparison function too.
Instead of adding the ">>=" operator I've added "/=" instead as I think this is more common. In places where ">>= 1" was used we now do "/= 2".
After rebasing it was necessary to add a "*=" operator too for the Loop Vectorizer.

Herald added a subscriber: rogfer01. Aug 25 2020, 8:14 AM

david-arm marked 2 inline comments as done. Aug 25 2020, 8:15 AM

ctetreau added inline comments. Aug 25 2020, 12:22 PM

llvm/include/llvm/Support/TypeSize.h
61	Honestly, I think this is actually worse. My issue is the fact that, from a mathematical perspective, `vscale_1 * min_1 < vscale_2 * min_2` is a function of `vscale_1` and `vscale_2`. In principle, we can know some ordering relationships between certain element counts (such as `vscale * min_1 >= min_2 = true`), but in general, this function does not make sense. However, an `operator<` is useful because it allows you to put an `ElementCount` into an ordered set, and it will just work. Renaming the function to `olt` just makes it so that you can't put `ElementCount`s into an ordered set, but still implies that `ElementCount`s are comparable in general. This will also blow up if you actually try to mix fixed width and scalable `ElementCount`s in an ordered set, which should work IMO.

Here is what I propose: Add a predicate for establishing an arbitrary ordering. The predicate would be completely arbitrary, because it's only useful for establishing an ordering for an ordered set or in a sorting algorithm. It could look something like this:

    static bool orderedBefore(const ElementCount &LHS, const ElementCount &RHS) {
      auto l = std::tie(LHS.Scalable, LHS.Min);
      auto r = std::tie(RHS.Scalable, RHS.Min);
      return l < r;
    }

Don't add any sort of mathematical comparison functions. Code working with `ElementCount` almost certainly either inspects the Scalable field and does something with the Min:

    if (EC.isScalable()) {
      unsigned min = EC.getKnownMinValue();
      ... // do stuff with min
    ...

or just uses it as a unit:

    auto * VecTy = VectorType::get(SomeTy, EC);

I do not think that having the relation operators on ElementCount would simplify very much code. However, it is very easy to use incorrectly, and if it is ever extended in the future (one machine with two different `vscale` values? `vscale == 0` becoming valid?), it would become even worse. Best just not open that door.
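
To make the proposal concrete, here is a small sketch of how such an arbitrary ordering predicate could be plugged into `std::set`. The struct `EC`, the functor `OrderedBefore` and the alias `ECSet` are hypothetical names for illustration; only the `std::tie` comparison is taken from the comment above.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Hypothetical plain-data stand-in for ElementCount.
struct EC {
  unsigned Min;
  bool Scalable;
};

// Arbitrary strict weak ordering used only for containers and sorting.
// It sorts every fixed count before every scalable count and says nothing
// about which count is mathematically larger.
struct OrderedBefore {
  bool operator()(const EC &LHS, const EC &RHS) const {
    return std::tie(LHS.Scalable, LHS.Min) < std::tie(RHS.Scalable, RHS.Min);
  }
};

// An ordered set that can safely mix fixed and scalable counts.
using ECSet = std::set<EC, OrderedBefore>;
```

Because the comparator is passed explicitly, nothing in the type suggests that `EC` values are comparable in general, which is the property the comment asks for.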

Hi @ctetreau, ok for now I'm going to completely remove the operators and revert the code using those operators to how it was before. I'm not sure what you mean about the predicate functions so I've left those for now, since they aren't needed for this patch. The purpose of this patch was originally supposed to be mechanical anyway - just making members private. I only added the operators as an after-thought really, just to be consistent with how TypeSize dealt with the identical problem. For what it's worth, I believe that GCC solved this exact same problem by adding two types of comparison functions - one set that absolutely wanted an answer to ">,<,>=,<=" and asserted if it wasn't known at compile time, and another set of comparison functions that returned an additional boolean value indicating whether the answer was known or not. Perhaps my knowledge is out of date, but I believe this was the accepted solution and seemed to work well.

david-arm updated this revision to Diff 288260. Aug 27 2020, 3:15 AM

david-arm edited the summary of this revision.

paulwalker-arm added inline comments. Aug 27 2020, 4:19 AM

llvm/include/llvm/Support/TypeSize.h
114	I don't believe this is safe. For example, we know SVE supported vector lengths only have to be a multiple of 128 bits. So for scalable vectors we cannot know the element count is a power of 2 unless we perform a runtime check.

david-arm added inline comments. Aug 27 2020, 4:33 AM

llvm/include/llvm/Support/TypeSize.h
114	Ok, but if that's true how is code in llvm/lib/CodeGen/TargetLoweringBase.cpp ever safe for scalable vectors? I thought that the question being asked wasn't that the total size was a power of 2, but whether or not it was safe to split the vector. The answer should be the same even if vscale is 3, for example. I thought the problem here is that the legaliser simply needs to know in what way it should break down different types, and that whatever approach it took would work when scaled up. The vector breakdown algorithm relies upon having an answer here - perhaps this is just a case of changing the question and name of function?

I cannot say whether such questions make sense without a deeper investigation, but I can say for certain that EC.isPowerOf2 is a question we cannot answer at compile time. Given this is a mechanical change I would just remove the member function and leave the code as is (well change EC.Min to EC.getKnownMinValue()). We already know that we'll need to visit the places where getKnownMinValue() is used to ensure the question makes sense in the face of scalable vectors.

Removed isPowerOf2() function since this is potentially misleading - it's only the known minimum value that we're checking.
Renamed isEven to isKnownEven to try and make it clear that returning true indicates we know definitely that the total number of elements is even, whereas returning false could mean either the element count is odd or that we don't know.
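
The asymmetry between the two queries can be sketched as follows. The struct `EC`, the `Parity` enum and the helpers `classify`, `runtimeCount` and `isPowerOf2` are hypothetical names for illustration; the only detail taken from the patch is that `isKnownEven()` replaces the old `!(EltCnt.Min & 1)` check.

```cpp
#include <cassert>

// Hypothetical plain-data stand-in for ElementCount.
struct EC {
  unsigned Min;
  bool Scalable;
};

// True only when the total element count is definitely even: an even Min
// stays even after multiplying by any vscale.
inline bool isKnownEven(EC C) { return (C.Min & 1u) == 0; }

enum class Parity { Even, Odd, Unknown };

// Spells out what a false result from isKnownEven can mean.
inline Parity classify(EC C) {
  if (isKnownEven(C))
    return Parity::Even;
  // An odd Min times an unknown vscale may be odd or even.
  return C.Scalable ? Parity::Unknown : Parity::Odd;
}

// Runtime element count for a given vscale (illustration only; vscale is
// not known at compile time).
inline unsigned runtimeCount(EC C, unsigned VScale) {
  return C.Scalable ? C.Min * VScale : C.Min;
}

inline bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }
```

Evenness of the minimum carries over to every runtime count, but a power-of-2 minimum does not: a minimum of 4 with vscale equal to 3 gives 12 elements.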

There are probably a few .Min to .getKnownMinValue() conversions where the .Min could be dropped (calls to Builder.CreateVectorSplat for example), but they can be tidied up as part of a proper activity to reduce the places where getKnownMinValue is called. So other than my suggested update to EC::operator/ the patch looks good to my eye. Please give other reviewers a little more time to provide other insights.

llvm/include/llvm/Support/TypeSize.h
66	If you add an assert that the divide is lossless (i.e. MIN % RHS == 0) then asserts like:

    assert(EltCnt.isKnownEven() && "Splitting vector, but not in half!");

are no longer required. Plus those places which are not checking for lossless division will be automatically protected. This feels like a sensible default to me. If somebody wants a truncated result, they can do the maths using getKnownMinValue().

This revision is now accepted and ready to land. Aug 27 2020, 10:34 AM

In D86065#2241146, @david-arm wrote:

Hi @ctetreau, ok for now I'm going to completely remove the operators and revert the code using those operators to how it was before. ...

This is probably for the best.

In D86065#2241146, @david-arm wrote:

... I'm not sure what you mean about the predicate functions ...

I'm referring to providing some built in way to std::sort a collection of ElementCount or have a std::set<ElementCount>. By default, C++ wants to use operator< for this, which I believe was the original motivation for the operator being here in the first place. I think it's reasonable for ElementCount to provide a built-in function to establish an ordering for these purposes, but the function should be named such that nobody thinks the function is intended to be the mathematical relation.

I think this is good to go as is. Assuming @paulwalker-arm is satisfied with leaving operator/ as is, then LGTM.

llvm/include/llvm/Support/TypeSize.h
66	I would prefer that this not be done. This would make this function non-total in an unrecoverable way, and would force everybody to write a bunch of tedious error handling code, even if the normal integer division behavior would have been fine:

    ElementCount res = LHS.getKnownMinValue() % RHS.getKnownMinValue() == 0 ? LHS / RHS : SomeOtherThing;

Everybody knows how integer division works, so I think the lossy behavior will not surprise anybody. An assert might.
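
The two positions on `operator/` can be compared side by side in a small sketch. The struct `EC` and the free functions `divideTruncating`, `divideExact` and `dividesEvenly` are hypothetical names for illustration, and the divisor is taken as a plain `unsigned` as in the `EltCnt / 2` uses in this patch.

```cpp
#include <cassert>

// Hypothetical plain-data stand-in for ElementCount.
struct EC {
  unsigned Min;
  bool Scalable;
};

// Truncating divide: ordinary unsigned division applied to the minimum.
inline EC divideTruncating(EC LHS, unsigned RHS) {
  assert(RHS != 0 && "Division by zero");
  return EC{LHS.Min / RHS, LHS.Scalable};
}

// Checked divide: asserts that no remainder is discarded.
inline EC divideExact(EC LHS, unsigned RHS) {
  assert(RHS != 0 && "Division by zero");
  assert(LHS.Min % RHS == 0 && "Lossy division of element count");
  return EC{LHS.Min / RHS, LHS.Scalable};
}

// Caller-side test for code that wants to choose a fallback itself.
inline bool dividesEvenly(EC LHS, unsigned RHS) {
  return RHS != 0 && LHS.Min % RHS == 0;
}
```

One point worth noting for scalable counts: truncating the minimum is not the same as truncating the runtime value. Dividing a scalable count with minimum 3 by 2 yields minimum 1, which is 2 elements at vscale 2, whereas 3 * 2 / 2 is 3.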

david-arm added a child revision: D86697: [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll. Aug 28 2020, 2:17 AM

david-arm mentioned this in D86697: [SVE][CodeGen] Fix TypeSize/ElementCount related warnings in sve-split-load.ll. Aug 28 2020, 2:22 AM

Can't say I agree since people are already writing the ugly code, because the result typically demands different handling or they're asserting the divide doesn't truncate in the first place. That said I'm happy for there to be no assert as long as operator% is implemented so users can calculate the remainder in the expected way.

I'm retracting my operator% request. After thinking about it and speaking with Dave I just cannot see how allowing a total divide is safe for scalable vectors. If you are relying on a truncating divide then special handling is required anyway, which is likely to be different between fixed-length and scalable vectors.

To be more clear, I'm happy to defer the divide conversation for if/when we run into issues so my previous acceptance still stands. It'll be good to get the intent of the patch in (i.e. stopping access to internal class members) asap, plus any follow up work will be a smaller more manageable patch. It's worth talking this through during the next sync call to see if we can get some consensus regarding what maths is and isn't allowed.

Closed by commit rGf4257c5832aa: [SVE] Make ElementCount members private (authored by david-arm). Aug 28 2020, 6:44 AM

This revision was automatically updated to reflect the committed changes.

david-arm added a commit: rGf4257c5832aa: [SVE] Make ElementCount members private.

In D86065#2241146, @david-arm wrote:

Hi @ctetreau, ok for now I'm going to completely remove the operators and revert the code using those operators to how it was before. I'm not sure what you mean about the predicate functions so I've left those for now, since they aren't needed for this patch. The purpose of this patch was originally supposed to be mechanical anyway - just making members private. I only added the operators as an after-thought really, just to be consistent with how TypeSize dealt with the identical problem. For what it's worth, I believe that GCC solved this exact same problem by adding two types of comparison functions - one set that absolutely wanted an answer to ">,<,>=,<=" and asserted if it wasn't known at compile time, and another set of comparison functions that returned an additional boolean value indicating whether the answer was known or not. Perhaps my knowledge is out of date, but I believe this was the accepted solution and seemed to work well.

FWIW, the GCC scheme is to have one set of functions maybe_<cond> that are true if a condition *might* hold (i.e. would hold for one possible value of the runtime indeterminates), and another set of functions known_<cond> (originally must_<cond>) that are true if a condition *always* holds (i.e. would hold for all possible values of the runtime indeterminates). known_le is a partial order but maybe_le is not (because it isn't antisymmetric). Having both is redundant with !, since e.g. known_le is the opposite of maybe_gt, but it seemed more readable to allow every condition to be expressed positively.

Like you say, there is also a test for whether two values are ordered by known_le, and there are also some operations like ordered_min and ordered_max that assert if the values aren't ordered by known_le.
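
As a hedged sketch of the known/maybe split, the functions below apply the same idea to an `ElementCount`-shaped value. This is not GCC's implementation: the struct `EC` and the camel-case names are hypothetical, and the model assumes a single runtime indeterminate vscale that is at least 1 and has no upper bound.

```cpp
#include <cassert>

// Hypothetical stand-in: the value is Min if fixed, Min * vscale if scalable.
struct EC {
  unsigned Min;
  bool Scalable;
};

// True if LHS <= RHS holds for every vscale >= 1.
inline bool knownLE(EC LHS, EC RHS) {
  if (LHS.Scalable == RHS.Scalable)
    return LHS.Min <= RHS.Min;
  if (RHS.Scalable)
    return LHS.Min <= RHS.Min; // LHS fixed: the worst case is vscale == 1
  return LHS.Min == 0;         // LHS scalable: it grows without bound
}

// True if LHS <= RHS holds for at least one vscale >= 1.
inline bool maybeLE(EC LHS, EC RHS) {
  if (LHS.Scalable == RHS.Scalable)
    return LHS.Min <= RHS.Min;
  if (RHS.Scalable)
    return RHS.Min > 0 || LHS.Min == 0; // LHS fixed: pick a large vscale
  return LHS.Min <= RHS.Min;            // LHS scalable: best case vscale == 1
}

// Each condition has a positively named opposite.
inline bool maybeGT(EC LHS, EC RHS) { return !knownLE(LHS, RHS); }
inline bool knownGT(EC LHS, EC RHS) { return !maybeLE(LHS, RHS); }
```

In this model a scalable count of 4 and a fixed count of 8 satisfy `maybeLE` in both directions without being equal, which illustrates why the maybe form is not antisymmetric.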

[Sorry for the post-commit comment, but it's related to something that wasn't part of the commit.]

ctetreau mentioned this in D82237: [SVE] Remove calls to VectorType::getNumElements from InstCombine. Aug 28 2020, 12:06 PM

ctetreau mentioned this in D78127: [SVE] Mark VectorType::getNumElements() deprecated. Aug 28 2020, 12:19 PM

david-arm mentioned this in D86894: [SVE] Disable INSERT_SUBVECTOR DAGCombine for scalable vectors. Sep 1 2020, 1:24 AM

Revision Contents

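Path	Size
clang/lib/CodeGen/CGBuiltin.cpp	3 lines
clang/lib/CodeGen/CGDebugInfo.cpp	2 lines
clang/lib/CodeGen/CodeGenTypes.cpp	3 lines
llvm/include/llvm/Analysis/TargetTransformInfo.h	4 lines
llvm/include/llvm/Analysis/VectorUtils.h	2 lines
llvm/include/llvm/CodeGen/ValueTypes.h	7 lines
llvm/include/llvm/IR/DataLayout.h	6 lines
llvm/include/llvm/IR/DerivedTypes.h	15 lines
llvm/include/llvm/IR/Instructions.h	5 lines
llvm/include/llvm/Support/MachineValueType.h	10 lines
llvm/include/llvm/Support/TypeSize.h	35 lines
llvm/lib/Analysis/InstructionSimplify.cpp	7 lines
llvm/lib/Analysis/VFABIDemangling.cpp	2 lines
llvm/lib/Analysis/ValueTracking.cpp	3 lines
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp	2 lines
llvm/lib/CodeGen/CodeGenPrepare.cpp	4 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp	2 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	12 lines
llvm/lib/CodeGen/TargetLoweringBase.cpp	46 lines
llvm/lib/CodeGen/ValueTypes.cpp	10 lines
llvm/lib/IR/AsmWriter.cpp	4 lines
llvm/lib/IR/ConstantFold.cpp	11 lines
llvm/lib/IR/Constants.cpp	12 lines
llvm/lib/IR/Core.cpp	2 lines
llvm/lib/IR/DataLayout.cpp	2 lines
llvm/lib/IR/Function.cpp	5 lines
llvm/lib/IR/IRBuilder.cpp	2 lines
llvm/lib/IR/Instructions.cpp	13 lines
llvm/lib/IR/IntrinsicInst.cpp	10 lines
llvm/lib/IR/Type.cpp	8 lines
llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp	3 lines
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	26 lines
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp	6 lines
llvm/lib/Transforms/Utils/FunctionComparator.cpp	13 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	317 lines
llvm/lib/Transforms/Vectorize/VPlan.h	9 lines
llvm/lib/Transforms/Vectorize/VPlan.cpp	21 lines
llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp	8 lines
llvm/unittests/IR/VectorTypesTest.cpp	10 lines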

Diff 288594

clang/lib/CodeGen/CGBuiltin.cpp

@@ Value *CodeGenFunction::EmitAArch64SVEBuiltinExpr(unsigned BuiltinID,
   case SVE::BI__builtin_sve_svlen_s32:
   case SVE::BI__builtin_sve_svlen_s64:
   case SVE::BI__builtin_sve_svlen_u8:
   case SVE::BI__builtin_sve_svlen_u16:
   case SVE::BI__builtin_sve_svlen_u32:
   case SVE::BI__builtin_sve_svlen_u64: {
     SVETypeFlags TF(Builtin->TypeModifier);
     auto VTy = cast<llvm::VectorType>(getSVEType(TF));
-    auto NumEls = llvm::ConstantInt::get(Ty, VTy->getElementCount().Min);
+    auto *NumEls =
+        llvm::ConstantInt::get(Ty, VTy->getElementCount().getKnownMinValue());

     Function *F = CGM.getIntrinsic(Intrinsic::vscale, Ty);
     return Builder.CreateMul(NumEls, Builder.CreateCall(F));
   }

   case SVE::BI__builtin_sve_svtbl2_u8:
   case SVE::BI__builtin_sve_svtbl2_s8:
   case SVE::BI__builtin_sve_svtbl2_u16:

clang/lib/CodeGen/CGDebugInfo.cpp

@@ case BuiltinType::Id: \
     return getOrCreateStructPtrType("opencl_" #ExtType, Id##Ty);
 #include "clang/Basic/OpenCLExtensionTypes.def"

 #define SVE_TYPE(Name, Id, SingletonId) case BuiltinType::Id:
 #include "clang/Basic/AArch64SVEACLETypes.def"
     {
       ASTContext::BuiltinVectorTypeInfo Info =
           CGM.getContext().getBuiltinVectorTypeInfo(BT);
-      unsigned NumElemsPerVG = (Info.EC.Min * Info.NumVectors) / 2;
+      unsigned NumElemsPerVG = (Info.EC.getKnownMinValue() * Info.NumVectors) / 2;

       // Debuggers can't extract 1bit from a vector, so will display a
       // bitpattern for svbool_t instead.
       if (Info.ElementType == CGM.getContext().BoolTy) {
         NumElemsPerVG /= 8;
         Info.ElementType = CGM.getContext().UnsignedCharTy;
       }

clang/lib/CodeGen/CodeGenTypes.cpp

@@ #include "clang/Basic/OpenCLExtensionTypes.def"
     case BuiltinType::SveFloat64x4:
     case BuiltinType::SveBFloat16:
     case BuiltinType::SveBFloat16x2:
     case BuiltinType::SveBFloat16x3:
     case BuiltinType::SveBFloat16x4: {
       ASTContext::BuiltinVectorTypeInfo Info =
           Context.getBuiltinVectorTypeInfo(cast<BuiltinType>(Ty));
       return llvm::ScalableVectorType::get(ConvertType(Info.ElementType),
-                                           Info.EC.Min * Info.NumVectors);
+                                           Info.EC.getKnownMinValue() *
+                                               Info.NumVectors);
     }
     case BuiltinType::Dependent:
 #define BUILTIN_TYPE(Id, SingletonId)
 #define PLACEHOLDER_TYPE(Id, SingletonId) \
     case BuiltinType::Id:
 #include "clang/AST/BuiltinTypes.def"
       llvm_unreachable("Unexpected placeholder builtin type!");
     }

llvm/include/llvm/Analysis/TargetTransformInfo.h

@@ public:
   IntrinsicCostAttributes(const IntrinsicInst &I);

   IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI);

   IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI,
                           unsigned Factor);
   IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI,
                           ElementCount Factor)
-      : IntrinsicCostAttributes(Id, CI, Factor.Min) {
-    assert(!Factor.Scalable);
+      : IntrinsicCostAttributes(Id, CI, Factor.getKnownMinValue()) {
+    assert(!Factor.isScalable());
   }

   IntrinsicCostAttributes(Intrinsic::ID Id, const CallBase &CI,
                           unsigned Factor, unsigned ScalarCost);

   IntrinsicCostAttributes(Intrinsic::ID Id, Type *RTy,
                           ArrayRef<Type *> Tys, FastMathFlags Flags);

llvm/include/llvm/Analysis/VectorUtils.h

@@ struct VFShape {
   static VFShape get(const CallInst &CI, ElementCount EC, bool HasGlobalPred) {
     SmallVector<VFParameter, 8> Parameters;
     for (unsigned I = 0; I < CI.arg_size(); ++I)
       Parameters.push_back(VFParameter({I, VFParamKind::Vector}));
     if (HasGlobalPred)
       Parameters.push_back(
           VFParameter({CI.arg_size(), VFParamKind::GlobalPredicate}));

-    return {EC.Min, EC.Scalable, Parameters};
+    return {EC.getKnownMinValue(), EC.isScalable(), Parameters};
   }
   /// Sanity check on the Parameters in the VFShape.
   bool hasValidParameterList() const;
 };

 /// Holds the VFShape for a specific scalar to vector function mapping.
 struct VFInfo {
   VFShape Shape; /// Classification of the vector function.

llvm/include/llvm/CodeGen/ValueTypes.h

Show First 20 Lines • Show All 298 Lines • ▼ Show 20 Lines	ElementCount getVectorElementCount() const {
if (isSimple())		if (isSimple())
return V.getVectorElementCount();		return V.getVectorElementCount();

return getExtendedVectorElementCount();		return getExtendedVectorElementCount();
}		}

/// Given a vector type, return the minimum number of elements it contains.		/// Given a vector type, return the minimum number of elements it contains.
unsigned getVectorMinNumElements() const {		unsigned getVectorMinNumElements() const {
return getVectorElementCount().Min;		return getVectorElementCount().getKnownMinValue();
}		}

/// Return the size of the specified value type in bits.		/// Return the size of the specified value type in bits.
///		///
/// If the value type is a scalable vector type, the scalable property will		/// If the value type is a scalable vector type, the scalable property will
/// be set and the runtime size will be a positive integer multiple of the		/// be set and the runtime size will be a positive integer multiple of the
/// base size.		/// base size.
TypeSize getSizeInBits() const {		TypeSize getSizeInBits() const {
▲ Show 20 Lines • Show All 62 Lines • ▼ Show 20 Lines	#endif
}		}

// Return a VT for a vector type with the same element type but		// Return a VT for a vector type with the same element type but
// half the number of elements. The type returned may be an		// half the number of elements. The type returned may be an
// extended type.		// extended type.
EVT getHalfNumVectorElementsVT(LLVMContext &Context) const {		EVT getHalfNumVectorElementsVT(LLVMContext &Context) const {
EVT EltVT = getVectorElementType();		EVT EltVT = getVectorElementType();
auto EltCnt = getVectorElementCount();		auto EltCnt = getVectorElementCount();
assert(!(EltCnt.Min & 1) && "Splitting vector, but not in half!");		assert(EltCnt.isKnownEven() && "Splitting vector, but not in half!");
return EVT::getVectorVT(Context, EltVT, EltCnt / 2);		return EVT::getVectorVT(Context, EltVT, EltCnt / 2);
}		}

/// Returns true if the given vector is a power of 2.		/// Returns true if the given vector is a power of 2.
bool isPow2VectorType() const {		bool isPow2VectorType() const {
unsigned NElts = getVectorMinNumElements();		unsigned NElts = getVectorMinNumElements();
return !(NElts & (NElts - 1));		return !(NElts & (NElts - 1));
}		}

/// Widens the length of the given vector EVT up to the nearest power of 2		/// Widens the length of the given vector EVT up to the nearest power of 2
/// and returns that type.		/// and returns that type.
EVT getPow2VectorType(LLVMContext &Context) const {		EVT getPow2VectorType(LLVMContext &Context) const {
if (!isPow2VectorType()) {		if (!isPow2VectorType()) {
ElementCount NElts = getVectorElementCount();		ElementCount NElts = getVectorElementCount();
NElts.Min = 1 << Log2_32_Ceil(NElts.Min);		unsigned NewMinCount = 1 << Log2_32_Ceil(NElts.getKnownMinValue());
		NElts = ElementCount::get(NewMinCount, NElts.isScalable());
return EVT::getVectorVT(Context, getVectorElementType(), NElts);		return EVT::getVectorVT(Context, getVectorElementType(), NElts);
}		}
else {		else {
return *this;		return *this;
}		}
}		}

/// This function returns value type as a string, e.g. "i32".		/// This function returns value type as a string, e.g. "i32".

llvm/include/llvm/IR/DataLayout.h

inline TypeSize DataLayout::getTypeSizeInBits(Type *Ty) const {
// In memory objects this is always aligned to a higher boundary, but		// In memory objects this is always aligned to a higher boundary, but
// only 80 bits contain information.		// only 80 bits contain information.
case Type::X86_FP80TyID:		case Type::X86_FP80TyID:
return TypeSize::Fixed(80);		return TypeSize::Fixed(80);
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
VectorType *VTy = cast<VectorType>(Ty);		VectorType *VTy = cast<VectorType>(Ty);
auto EltCnt = VTy->getElementCount();		auto EltCnt = VTy->getElementCount();
uint64_t MinBits = EltCnt.Min *		uint64_t MinBits = EltCnt.getKnownMinValue() *
getTypeSizeInBits(VTy->getElementType()).getFixedSize();		getTypeSizeInBits(VTy->getElementType()).getFixedSize();
return TypeSize(MinBits, EltCnt.Scalable);		return TypeSize(MinBits, EltCnt.isScalable());
}		}
default:		default:
llvm_unreachable("DataLayout::getTypeSizeInBits(): Unsupported type");		llvm_unreachable("DataLayout::getTypeSizeInBits(): Unsupported type");
}		}
}		}
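
Note on the vector case above: the size is the known minimum element count multiplied by the fixed element size, and the scalable flag is forwarded unchanged. A rough sketch of that arithmetic, using a hypothetical `SketchTypeSize` struct and `vectorSizeInBits` helper rather than the real `TypeSize` and `DataLayout` APIs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::TypeSize: a known minimum size in bits
// plus a flag saying whether the real size is a runtime multiple of it.
struct SketchTypeSize {
  uint64_t MinBits;
  bool Scalable;
};

// Minimum bit size of a vector with KnownMinElts elements of EltBits each.
inline SketchTypeSize vectorSizeInBits(unsigned KnownMinElts, bool Scalable,
                                       uint64_t EltBits) {
  return {uint64_t(KnownMinElts) * EltBits, Scalable};
}
```

Under these assumptions, `<vscale x 4 x i32>` maps to a minimum of 128 bits with the scalable flag set, while `<4 x i32>` maps to exactly 128 bits.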

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_IR_DATALAYOUT_H		#endif // LLVM_IR_DATALAYOUT_H

llvm/include/llvm/IR/DerivedTypes.h

public:
VectorType &operator=(const VectorType &) = delete;		VectorType &operator=(const VectorType &) = delete;

/// Get the number of elements in this vector. It does not make sense to call		/// Get the number of elements in this vector. It does not make sense to call
/// this function on a scalable vector, and this will be moved into		/// this function on a scalable vector, and this will be moved into
/// FixedVectorType in a future commit		/// FixedVectorType in a future commit
unsigned getNumElements() const {		unsigned getNumElements() const {
ElementCount EC = getElementCount();		ElementCount EC = getElementCount();
#ifdef STRICT_FIXED_SIZE_VECTORS		#ifdef STRICT_FIXED_SIZE_VECTORS
assert(!EC.Scalable &&		assert(!EC.isScalable() &&
"Request for fixed number of elements from scalable vector");		"Request for fixed number of elements from scalable vector");
return EC.Min;		return EC.getKnownMinValue();
#else		#else
if (EC.Scalable)		if (EC.isScalable())
WithColor::warning()		WithColor::warning()
<< "The code that requested the fixed number of elements has made "		<< "The code that requested the fixed number of elements has made "
"the assumption that this vector is not scalable. This assumption "		"the assumption that this vector is not scalable. This assumption "
"was not correct, and this may lead to broken code\n";		"was not correct, and this may lead to broken code\n";
return EC.Min;		return EC.getKnownMinValue();
#endif		#endif
}		}

Type *getElementType() const { return ContainedType; }		Type *getElementType() const { return ContainedType; }

/// This static method is the primary way to construct an VectorType.		/// This static method is the primary way to construct an VectorType.
static VectorType *get(Type *ElementType, ElementCount EC);		static VectorType *get(Type *ElementType, ElementCount EC);

▲ Show 20 Lines • Show All 60 Lines • ▼ Show 20 Lines	static VectorType *getSubdividedVectorType(VectorType *VTy, int NumSubdivs) {
}		}
return VTy;		return VTy;
}		}

/// This static method returns a VectorType with half as many elements as the		/// This static method returns a VectorType with half as many elements as the
/// input type and the same element type.		/// input type and the same element type.
static VectorType *getHalfElementsVectorType(VectorType *VTy) {		static VectorType *getHalfElementsVectorType(VectorType *VTy) {
auto EltCnt = VTy->getElementCount();		auto EltCnt = VTy->getElementCount();
assert ((EltCnt.Min & 1) == 0 &&		assert(EltCnt.isKnownEven() &&
"Cannot halve vector with odd number of elements.");		"Cannot halve vector with odd number of elements.");
return VectorType::get(VTy->getElementType(), EltCnt/2);		return VectorType::get(VTy->getElementType(), EltCnt/2);
}		}

/// This static method returns a VectorType with twice as many elements as the		/// This static method returns a VectorType with twice as many elements as the
/// input type and the same element type.		/// input type and the same element type.
static VectorType *getDoubleElementsVectorType(VectorType *VTy) {		static VectorType *getDoubleElementsVectorType(VectorType *VTy) {
auto EltCnt = VTy->getElementCount();		auto EltCnt = VTy->getElementCount();
assert((EltCnt.Min * 2ull) <= UINT_MAX && "Too many elements in vector");		assert((EltCnt.getKnownMinValue() * 2ull) <= UINT_MAX &&
		"Too many elements in vector");
return VectorType::get(VTy->getElementType(), EltCnt * 2);		return VectorType::get(VTy->getElementType(), EltCnt * 2);
}		}

/// Return true if the specified type is valid as a element type.		/// Return true if the specified type is valid as a element type.
static bool isValidElementType(Type *ElemTy);		static bool isValidElementType(Type *ElemTy);

/// Return an ElementCount instance to represent the (possibly scalable)		/// Return an ElementCount instance to represent the (possibly scalable)
/// number of elements in the vector.		/// number of elements in the vector.

llvm/include/llvm/IR/Instructions.h

public:

ArrayRef<int> getShuffleMask() const { return ShuffleMask; }		ArrayRef<int> getShuffleMask() const { return ShuffleMask; }

/// Return true if this shuffle returns a vector with a different number of		/// Return true if this shuffle returns a vector with a different number of
/// elements than its source vectors.		/// elements than its source vectors.
/// Examples: shufflevector <4 x n> A, <4 x n> B, <1,2,3>		/// Examples: shufflevector <4 x n> A, <4 x n> B, <1,2,3>
/// shufflevector <4 x n> A, <4 x n> B, <1,2,3,4,5>		/// shufflevector <4 x n> A, <4 x n> B, <1,2,3,4,5>
bool changesLength() const {		bool changesLength() const {
unsigned NumSourceElts =		unsigned NumSourceElts = cast<VectorType>(Op<0>()->getType())
cast<VectorType>(Op<0>()->getType())->getElementCount().Min;		->getElementCount()
		.getKnownMinValue();
unsigned NumMaskElts = ShuffleMask.size();		unsigned NumMaskElts = ShuffleMask.size();
return NumSourceElts != NumMaskElts;		return NumSourceElts != NumMaskElts;
}		}

/// Return true if this shuffle returns a vector with a greater number of		/// Return true if this shuffle returns a vector with a greater number of
/// elements than its source vectors.		/// elements than its source vectors.
/// Example: shufflevector <2 x n> A, <2 x n> B, <1,2,3>		/// Example: shufflevector <2 x n> A, <2 x n> B, <1,2,3>
bool increasesLength() const {		bool increasesLength() const {

llvm/include/llvm/Support/MachineValueType.h

bool isOverloaded() const {
SimpleTy == MVT::iPTRAny);		SimpleTy == MVT::iPTRAny);
}		}

/// Return a VT for a vector type with the same element type but		/// Return a VT for a vector type with the same element type but
/// half the number of elements.		/// half the number of elements.
MVT getHalfNumVectorElementsVT() const {		MVT getHalfNumVectorElementsVT() const {
MVT EltVT = getVectorElementType();		MVT EltVT = getVectorElementType();
auto EltCnt = getVectorElementCount();		auto EltCnt = getVectorElementCount();
assert(!(EltCnt.Min & 1) && "Splitting vector, but not in half!");		assert(EltCnt.isKnownEven() && "Splitting vector, but not in half!");
return getVectorVT(EltVT, EltCnt / 2);		return getVectorVT(EltVT, EltCnt / 2);
}		}

/// Returns true if the given vector is a power of 2.		/// Returns true if the given vector is a power of 2.
bool isPow2VectorType() const {		bool isPow2VectorType() const {
unsigned NElts = getVectorNumElements();		unsigned NElts = getVectorNumElements();
return !(NElts & (NElts - 1));		return !(NElts & (NElts - 1));
}		}
▲ Show 20 Lines • Show All 301 Lines • ▼ Show 20 Lines	public:
}		}

ElementCount getVectorElementCount() const {		ElementCount getVectorElementCount() const {
return ElementCount::get(getVectorNumElements(), isScalableVector());		return ElementCount::get(getVectorNumElements(), isScalableVector());
}		}

/// Given a vector type, return the minimum number of elements it contains.		/// Given a vector type, return the minimum number of elements it contains.
unsigned getVectorMinNumElements() const {		unsigned getVectorMinNumElements() const {
return getVectorElementCount().Min;		return getVectorElementCount().getKnownMinValue();
}		}

/// Returns the size of the specified MVT in bits.		/// Returns the size of the specified MVT in bits.
///		///
/// If the value type is a scalable vector type, the scalable property will		/// If the value type is a scalable vector type, the scalable property will
/// be set and the runtime size will be a positive integer multiple of the		/// be set and the runtime size will be a positive integer multiple of the
/// base size.		/// base size.
TypeSize getSizeInBits() const {		TypeSize getSizeInBits() const {
▲ Show 20 Lines • Show All 448 Lines • ▼ Show 20 Lines

static MVT getVectorVT(MVT VT, unsigned NumElements, bool IsScalable) {		static MVT getVectorVT(MVT VT, unsigned NumElements, bool IsScalable) {
if (IsScalable)		if (IsScalable)
return getScalableVectorVT(VT, NumElements);		return getScalableVectorVT(VT, NumElements);
return getVectorVT(VT, NumElements);		return getVectorVT(VT, NumElements);
}		}

static MVT getVectorVT(MVT VT, ElementCount EC) {		static MVT getVectorVT(MVT VT, ElementCount EC) {
if (EC.Scalable)		if (EC.isScalable())
return getScalableVectorVT(VT, EC.Min);		return getScalableVectorVT(VT, EC.getKnownMinValue());
return getVectorVT(VT, EC.Min);		return getVectorVT(VT, EC.getKnownMinValue());
}		}

/// Return the value type corresponding to the specified type. This returns		/// Return the value type corresponding to the specified type. This returns
/// all pointers as iPTR. If HandleUnknown is true, unknown types are		/// all pointers as iPTR. If HandleUnknown is true, unknown types are
/// returned as Other, otherwise they are invalid.		/// returned as Other, otherwise they are invalid.
static MVT getVT(Type *Ty, bool HandleUnknown = false);		static MVT getVT(Type *Ty, bool HandleUnknown = false);

private:		private:

llvm/include/llvm/Support/TypeSize.h

	#include <cassert>			#include <cassert>

	namespace llvm {			namespace llvm {

	template <typename T> struct DenseMapInfo;			template <typename T> struct DenseMapInfo;

	class ElementCount {			class ElementCount {
	private:			private:
				unsigned Min; // Minimum number of vector elements.
				bool Scalable; // If true, NumElements is a multiple of 'Min' determined
				// at runtime rather than compile time.

	/// Prevent code from using initializer-list contructors like			/// Prevent code from using initializer-list contructors like
	/// ElementCount EC = {<unsigned>, <bool>}. The static `get*`			/// ElementCount EC = {<unsigned>, <bool>}. The static `get*`
	/// methods below are preferred, as users should always make a			/// methods below are preferred, as users should always make a
	/// conscious choice on the type of `ElementCount` they are			/// conscious choice on the type of `ElementCount` they are
	/// requesting.			/// requesting.
	ElementCount(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}			ElementCount(unsigned Min, bool Scalable) : Min(Min), Scalable(Scalable) {}

	public:			public:
	unsigned Min; // Minimum number of vector elements.
	bool Scalable; // If true, NumElements is a multiple of 'Min' determined
	// at runtime rather than compile time.

	ElementCount() = default;			ElementCount() = default;

	ElementCount operator*(unsigned RHS) {			ElementCount operator*(unsigned RHS) {
	return { Min * RHS, Scalable };			return { Min * RHS, Scalable };
	}			}
	ElementCount operator/(unsigned RHS) {			ElementCount operator/(unsigned RHS) {
	assert(Min % RHS == 0 && "Min is not a multiple of RHS.");			assert(Min % RHS == 0 && "Min is not a multiple of RHS.");
	return { Min / RHS, Scalable };			return { Min / RHS, Scalable };
	}			}

	bool operator==(const ElementCount& RHS) const {			bool operator==(const ElementCount& RHS) const {
	return Min == RHS.Min && Scalable == RHS.Scalable;			return Min == RHS.Min && Scalable == RHS.Scalable;
	}			}
	bool operator!=(const ElementCount& RHS) const {			bool operator!=(const ElementCount& RHS) const {
	return !(*this == RHS);			return !(*this == RHS);
	}			}
	bool operator==(unsigned RHS) const { return Min == RHS && !Scalable; }			bool operator==(unsigned RHS) const { return Min == RHS && !Scalable; }
	bool operator!=(unsigned RHS) const { return !(*this == RHS); }			bool operator!=(unsigned RHS) const { return !(*this == RHS); }

				ElementCount &operator*=(unsigned RHS) {
				fpetrogalli (Not Done): I think that @ctetreau is right on https://reviews.llvm.org/D85794#inline-793909. We should not overload a comparison operator on this class because the set it represents cannot be ordered. Chris suggests writing a static function that can be used as a comparison operator, so that the kind of comparison being done is explicit.
				ctetreau (Not Done): In C++, it's common to overload the comparison operators for the purposes of being able to std::sort and use ordered sets. Normally, I would be OK with such usages. However, since `ElementCount` is basically a numeric type, and they only have a partial ordering, I think this is dangerous. I'm concerned that this will result in more bugs whereby somebody didn't remember that vectors can be scalable. I don't have a strong opinion what the comparator function should be called, but I strongly prefer that it not be a comparison operator.
				david-arm (Author, Done): Hi @ctetreau, yeah I understand. The reason I chose to use operators was simply to be consistent with what we have already in TypeSize. Also, we have existing "==" and "!=" operators in ElementCount too, although these are essentially testing whether two ElementCounts are identical, i.e. for 2 given polynomials (a + bx) and (c + dx) we're essentially asking if both a==c and b==d. If I introduce a new comparison function, I'll probably keep the asserts in for now, but in general we can do better than simply asserting if something is scalable or not. For example, we know that (vscale * 4) is definitely >= 4 because vscale is at least 1. I'm just not sure if we have that need yet.
				paulwalker-arm (Done): I think we should treat the non-equality comparison functions more like floating point. What we don't want is somebody writing !GreaterThan when they actually mean LessThan. Perhaps we should name the functions accordingly (i.e. ogt for OrderedAndGreaterThan). We will also need matching less than functions since I can see those being useful when analysing constant insert/extract element indices which stand a good chance to be a known comparison (with 0 being the most common index).
				fpetrogalli (Done): May I suggest the following name scheme? (my 2 c, will not hold the patch for not addressing this comment)
				    static bool [Non]Total<cmp>(...)
				with `<cmp>` being
				    `LT` -> less than, aka `<`
				    `LToE` -> less than or equal, aka "<="
				    `GT` -> greater than, aka ">"
				    `GToE` -> greater than or equal, aka ">="
				and `Total`, `NonTotal` being the prefix that gives information about the behavior on the value of scalable:
				    `Total` -> for example, all scalable ECs are bigger than fixed ECs.
				    `NonTotal` -> asserting on `(LHS.Scalable == RHS.Scalable)` before returning `LHS.Min <cmp> RHS.Min`.
				Taking it further: it could also be a template on an enumeration that list the type of comparisons?
				    enum CMPType { TotalGT, NonTotalLT, fancy_one };
				    ...
				    template <unsigned T> static bool cmp(ElementCount &LHS, ElementCount &RHS );
				    ...
				    static bool ElementCount::cmp<ElementCount::CMPType::TotalGT>(ElementCount &LHS, ElementCount &RHS ) { /// implementation }
				    static bool ElementCount::cmp<ElementCount::CMPType::fancy_one>(ElementCount &LHS, ElementCount &RHS ) { /// implementation }
				ctetreau (Not Done): Honestly, I think this is actually worse. My issue is the fact that, from a mathematical perspective, `vscale_1 * min_1 < vscale_2 * min_2` is a function of `vscale_1` and `vscale_2`. In principle, we can know some ordering relationships between certain element counts (such as `vscale * min_1 >= min_2 = true`), but in general, this function does not make sense. However, an `operator<` is useful because it allows you to put an `ElementCount` into an ordered set, and it will just work. Renaming the function to `olt` just makes it so that you can't put `ElementCount`s into an ordered set, but still implies that `ElementCount`s are comparable in general. This will also blow up if you actually try to mix fixed width and scalable `ElementCount`s in an ordered set, which should work IMO.
				Here is what I propose: Add a predicate for establishing an arbitrary ordering. The predicate would be completely arbitrary, because it's only useful for establishing an ordering for an ordered set or in a sorting algorithm. It could look something like this:
				    static bool orderedBefore(const ElementCount &LHS, const ElementCount &RHS) {
				      auto l = std::tie(LHS.Scalable, LHS.Min);
				      auto r = std::tie(RHS.Scalable, RHS.Min);
				      return l < r;
				    }
				don't add any sort of mathematical comparison functions. Code working with `ElementCount` almost certainly either inspects the Scalable field and does something with the Min:
				    if (EC.isScalable()) {
				      unsigned min = EC.getKnownMinValue();
				      ... // do stuff with min ...
				or just uses it as a unit:
				    auto * VecTy = VectorType::get(SomeTy, EC);
				I do not think that having the relation operators on ElementCount would simplify very much code. However, it is very easy to use incorrectly, and if it is ever extended in the future (one machine with two different `vscale` values? `vscale == 0` becoming valid?), it would become even worse. Best just not open that door.
				Min *= RHS;
				return *this;
				}
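
Note on the ordering thread above: the idea of an arbitrary, non-mathematical ordering predicate can be tried out in isolation. The sketch below adapts the `orderedBefore` proposal to accessor-based code, since this patch makes `Min` and `Scalable` private. `SketchElementCount`, the `OrderedBefore` functor and the `ElementCountSet` alias are hypothetical names for illustration and are not part of the patch.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Hypothetical stand-in for llvm::ElementCount with private members.
class SketchElementCount {
  unsigned Min;
  bool Scalable;
  SketchElementCount(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}

public:
  static SketchElementCount get(unsigned Min, bool Scalable) {
    return {Min, Scalable};
  }
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
};

// Arbitrary strict weak ordering: all fixed counts sort before all scalable
// counts, then by known minimum. It says nothing about which vector is
// "bigger" at runtime; it only makes ordered containers and sorting work.
inline bool orderedBefore(const SketchElementCount &LHS,
                          const SketchElementCount &RHS) {
  return std::make_tuple(LHS.isScalable(), LHS.getKnownMinValue()) <
         std::make_tuple(RHS.isScalable(), RHS.getKnownMinValue());
}

// Functor wrapper so the predicate can be used as a std::set comparator.
struct OrderedBefore {
  bool operator()(const SketchElementCount &L,
                  const SketchElementCount &R) const {
    return orderedBefore(L, R);
  }
};

using ElementCountSet = std::set<SketchElementCount, OrderedBefore>;
```

With this comparator a set can hold fixed and scalable counts side by side, which is the case the thread notes would trip an asserting `operator<`.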

				ElementCount &operator/=(unsigned RHS) {
				paulwalker-arm (Not Done): If you add an assert that the divide is lossless (i.e. MIN % RHS == 0) then asserts like:
				    assert(EltCnt.isKnownEven() && "Splitting vector, but not in half!");
				are no longer required. Plus those places which are not checking for lossless division will be automatically protected. This feels like a sensible default to me. If somebody wants a truncated result, they can do the maths using getKnownMinValue().
				ctetreau (Not Done): I would prefer that this not be done. This would make this function non-total in an unrecoverable way, and would force everybody to write a bunch of tedious error handling code, even if the normal integer division behavior would have been fine:
				    ElementCount res = LHS.getKnownMinValue() % RHS.getKnownMinValue() == 0 ? LHS / RHS : SomeOtherThing;
				Everybody knows how integer division works, so I think the lossy behavior will not surprise anybody. An assert might.
				Min /= RHS;
				return *this;
				}
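
Note on the division thread above: one way to compare the two positions is to keep the division truncating and expose the lossless check as a separate predicate that callers opt into. The sketch below is only an illustration of that split. `SketchElementCount`, `dividedBy` and `isKnownMultipleOf` are hypothetical names chosen here, not functions added by this patch.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount with private members.
class SketchElementCount {
  unsigned Min = 0;
  bool Scalable = false;
  SketchElementCount(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}

public:
  SketchElementCount() = default;
  static SketchElementCount get(unsigned Min, bool Scalable) {
    return {Min, Scalable};
  }
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }

  // Truncating division of the known minimum, like plain unsigned division.
  SketchElementCount dividedBy(unsigned RHS) const {
    assert(RHS != 0 && "division by zero");
    return {Min / RHS, Scalable};
  }

  // True when dividing by RHS loses nothing. Because the runtime count is
  // vscale * Min, a Min that is a multiple of RHS stays one for any vscale.
  bool isKnownMultipleOf(unsigned RHS) const {
    return RHS != 0 && Min % RHS == 0;
  }
};
```

A caller that needs the stricter behavior can then write `assert(EC.isKnownMultipleOf(2))` before dividing, while other callers keep ordinary integer semantics.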

	ElementCount NextPowerOf2() const {			ElementCount NextPowerOf2() const {
	return {(unsigned)llvm::NextPowerOf2(Min), Scalable};			return {(unsigned)llvm::NextPowerOf2(Min), Scalable};
	}			}

	static ElementCount getFixed(unsigned Min) { return {Min, false}; }			static ElementCount getFixed(unsigned Min) { return {Min, false}; }
	static ElementCount getScalable(unsigned Min) { return {Min, true}; }			static ElementCount getScalable(unsigned Min) { return {Min, true}; }
	static ElementCount get(unsigned Min, bool Scalable) {			static ElementCount get(unsigned Min, bool Scalable) {
	return {Min, Scalable};			return {Min, Scalable};
	}			}

	/// Printing function.			/// Printing function.
	void print(raw_ostream &OS) const {			void print(raw_ostream &OS) const {
	if (Scalable)			if (Scalable)
	OS << "vscale x ";			OS << "vscale x ";
	OS << Min;			OS << Min;
	}			}
	/// Counting predicates.			/// Counting predicates.
	///			///
	/// Notice that Min = 1 and Scalable = true is considered more than			/// Notice that Min = 1 and Scalable = true is considered more than
	/// one element.			/// one element.
	///			///
	///@{ No elements..			///@{ No elements..
	bool isZero() const { return Min == 0; }			bool isZero() const { return Min == 0; }
				/// At least one element.
				bool isNonZero() const { return Min != 0; }
				/// A return value of true indicates we know at compile time that the number
				/// of elements (vscale * Min) is definitely even. However, returning false
				/// does not guarantee that the total number of elements is odd.
				bool isKnownEven() const { return (Min & 0x1) == 0; }
	/// Exactly one element.			/// Exactly one element.
	bool isScalar() const { return !Scalable && Min == 1; }			bool isScalar() const { return !Scalable && Min == 1; }
	/// One or more elements.			/// One or more elements.
	bool isVector() const { return (Scalable && Min != 0) || Min > 1; }			bool isVector() const { return (Scalable && Min != 0) || Min > 1; }
	///@}			///@}

				unsigned getKnownMinValue() const { return Min; }

				bool isScalable() const { return Scalable; }
	};			};

	/// Stream operator function for `ElementCount`.			/// Stream operator function for `ElementCount`.
	inline raw_ostream &operator<<(raw_ostream &OS, const ElementCount &EC) {			inline raw_ostream &operator<<(raw_ostream &OS, const ElementCount &EC) {
	EC.print(OS);			EC.print(OS);
	return OS;			return OS;
				paulwalker-arm (Not Done): I don't believe this is safe. For example we know SVE supported vector lengths only have to be a multiple of 128bits. So for scalable vectors we cannot know the element count is a power of 2 unless we perform a runtime check.
				david-arm (Author, Done): Ok, but if that's true how is code in llvm/lib/CodeGen/TargetLoweringBase.cpp ever safe for scalable vectors? I thought that the question being asked wasn't that the total size was a power of 2, but whether or not it was safe to split the vector. The answer should be the same even if vscale is 3, for example. I thought the problem here is that the legaliser simply needs to know in what way it should break down different types, and that whatever approach it took would work when scaled up. The vector breakdown algorithm relies upon having an answer here - perhaps this is just a case of changing the question and name of function?
	}			}
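
Note on the power-of-2 thread above and on the `isKnownEven()` comment: both hinge on the runtime count being `vscale * Min`. An even `Min` guarantees an even total for any `vscale`, but a power-of-2 `Min` does not guarantee a power-of-2 total. The small sketch below works through that arithmetic with hypothetical free functions (`runtimeElements`, `isPowerOf2`, `isKnownEven`); the `vscale` values are chosen purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True for 1, 2, 4, 8, ...; false for 0 and everything else.
inline bool isPowerOf2(uint64_t N) { return N != 0 && (N & (N - 1)) == 0; }

// Element count actually present at runtime for a given vscale.
inline uint64_t runtimeElements(unsigned KnownMin, bool Scalable,
                                unsigned VScale) {
  return Scalable ? uint64_t(KnownMin) * VScale : KnownMin;
}

// Mirrors the predicate in the diff: only the known minimum is inspected.
inline bool isKnownEven(unsigned KnownMin) { return (KnownMin & 1u) == 0; }
```

For example, a minimum of 4 with `vscale == 3` gives 12 elements: even, but not a power of two. A minimum of 3 with `vscale == 2` gives 6 elements: even at runtime although `isKnownEven` returns false, which is why a false result does not prove the total is odd.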

	// This class is used to represent the size of types. If the type is of fixed			// This class is used to represent the size of types. If the type is of fixed
	// size, it will represent the exact size. If the type is a scalable vector,			// size, it will represent the exact size. If the type is a scalable vector,
	// it will represent the known minimum size.			// it will represent the known minimum size.
	class TypeSize {			class TypeSize {
	uint64_t MinSize; // The known minimum size.			uint64_t MinSize; // The known minimum size.
	bool IsScalable; // If true, then the runtime size is an integer multiple			bool IsScalable; // If true, then the runtime size is an integer multiple
	▲ Show 20 Lines • Show All 214 Lines • ▼ Show 20 Lines
	template <> struct DenseMapInfo<ElementCount> {			template <> struct DenseMapInfo<ElementCount> {
	static inline ElementCount getEmptyKey() {			static inline ElementCount getEmptyKey() {
	return ElementCount::getScalable(~0U);			return ElementCount::getScalable(~0U);
	}			}
	static inline ElementCount getTombstoneKey() {			static inline ElementCount getTombstoneKey() {
	return ElementCount::getFixed(~0U - 1);			return ElementCount::getFixed(~0U - 1);
	}			}
	static unsigned getHashValue(const ElementCount& EltCnt) {			static unsigned getHashValue(const ElementCount& EltCnt) {
	if (EltCnt.Scalable)			unsigned HashVal = EltCnt.getKnownMinValue() * 37U;
	return (EltCnt.Min * 37U) - 1U;			if (EltCnt.isScalable())
				return (HashVal - 1U);
				ctetreau (Not Done): NIT: this can be rewritten without duplicating `EltCnt.getKnownMinValue() * 37U`

	return EltCnt.Min * 37U;			return HashVal;
	}			}

	static bool isEqual(const ElementCount& LHS, const ElementCount& RHS) {			static bool isEqual(const ElementCount& LHS, const ElementCount& RHS) {
	return LHS == RHS;			return LHS == RHS;
	}			}
	};			};

	} // end namespace llvm			} // end namespace llvm

	#endif // LLVM_SUPPORT_TypeSize_H			#endif // LLVM_SUPPORT_TypeSize_H

llvm/lib/Analysis/InstructionSimplify.cpp

static Value *SimplifyShuffleVectorInst(Value *Op0, Value *Op1,
unsigned MaxRecurse) {		unsigned MaxRecurse) {
if (all_of(Mask, [](int Elem) { return Elem == UndefMaskElem; }))		if (all_of(Mask, [](int Elem) { return Elem == UndefMaskElem; }))
return UndefValue::get(RetTy);		return UndefValue::get(RetTy);

auto *InVecTy = cast<VectorType>(Op0->getType());		auto *InVecTy = cast<VectorType>(Op0->getType());
unsigned MaskNumElts = Mask.size();		unsigned MaskNumElts = Mask.size();
ElementCount InVecEltCount = InVecTy->getElementCount();		ElementCount InVecEltCount = InVecTy->getElementCount();

bool Scalable = InVecEltCount.Scalable;		bool Scalable = InVecEltCount.isScalable();

SmallVector<int, 32> Indices;		SmallVector<int, 32> Indices;
Indices.assign(Mask.begin(), Mask.end());		Indices.assign(Mask.begin(), Mask.end());

// Canonicalization: If mask does not select elements from an input vector,		// Canonicalization: If mask does not select elements from an input vector,
// replace that input vector with undef.		// replace that input vector with undef.
if (!Scalable) {		if (!Scalable) {
bool MaskSelects0 = false, MaskSelects1 = false;		bool MaskSelects0 = false, MaskSelects1 = false;
unsigned InVecNumElts = InVecEltCount.Min;		unsigned InVecNumElts = InVecEltCount.getKnownMinValue();
for (unsigned i = 0; i != MaskNumElts; ++i) {		for (unsigned i = 0; i != MaskNumElts; ++i) {
if (Indices[i] == -1)		if (Indices[i] == -1)
continue;		continue;
if ((unsigned)Indices[i] < InVecNumElts)		if ((unsigned)Indices[i] < InVecNumElts)
MaskSelects0 = true;		MaskSelects0 = true;
else		else
MaskSelects1 = true;		MaskSelects1 = true;
}		}
Show All 12 Lines	static Value *SimplifyShuffleVectorInst(Value *Op0, Value *Op1,
if (Op0Const && Op1Const)		if (Op0Const && Op1Const)
return ConstantExpr::getShuffleVector(Op0Const, Op1Const, Mask);		return ConstantExpr::getShuffleVector(Op0Const, Op1Const, Mask);

// Canonicalization: if only one input vector is constant, it shall be the		// Canonicalization: if only one input vector is constant, it shall be the
// second one. This transformation depends on the value of the mask which		// second one. This transformation depends on the value of the mask which
// is not known at compile time for scalable vectors		// is not known at compile time for scalable vectors
if (!Scalable && Op0Const && !Op1Const) {		if (!Scalable && Op0Const && !Op1Const) {
std::swap(Op0, Op1);		std::swap(Op0, Op1);
ShuffleVectorInst::commuteShuffleMask(Indices, InVecEltCount.Min);		ShuffleVectorInst::commuteShuffleMask(Indices,
		InVecEltCount.getKnownMinValue());
}		}

// A splat of an inserted scalar constant becomes a vector constant:		// A splat of an inserted scalar constant becomes a vector constant:
// shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>		// shuf (inselt ?, C, IndexC), undef, <IndexC, IndexC...> --> <C, C...>
// NOTE: We may have commuted above, so analyze the updated Indices, not the		// NOTE: We may have commuted above, so analyze the updated Indices, not the
// original mask constant.		// original mask constant.
// NOTE: This transformation depends on the value of the mask which is not		// NOTE: This transformation depends on the value of the mask which is not
// known at compile time for scalable vectors		// known at compile time for scalable vectors

llvm/lib/Analysis/VFABIDemangling.cpp

Optional<VFInfo> VFABI::tryDemangleForVFABI(StringRef MangledName,
// set to 0.		// set to 0.
if (IsScalable) {		if (IsScalable) {
const Function *F = M.getFunction(VectorName);		const Function *F = M.getFunction(VectorName);
// The declaration of the function must be present in the module		// The declaration of the function must be present in the module
// to be able to retrieve its signature.		// to be able to retrieve its signature.
if (!F)		if (!F)
return None;		return None;
const ElementCount EC = getECFromSignature(F->getFunctionType());		const ElementCount EC = getECFromSignature(F->getFunctionType());
VF = EC.Min;		VF = EC.getKnownMinValue();
}		}

// Sanity checks.		// Sanity checks.
// 1. We don't accept a zero lanes vectorization factor.		// 1. We don't accept a zero lanes vectorization factor.
// 2. We don't accept the demangling if the vector function is not		// 2. We don't accept the demangling if the vector function is not
// present in the module.		// present in the module.
if (VF == 0)		if (VF == 0)
return None;		return None;

llvm/lib/Analysis/ValueTracking.cpp

case Instruction::Invoke: {
return !CB->hasRetAttr(Attribute::NoUndef);		return !CB->hasRetAttr(Attribute::NoUndef);
}		}
case Instruction::InsertElement:		case Instruction::InsertElement:
case Instruction::ExtractElement: {		case Instruction::ExtractElement: {
// If index exceeds the length of the vector, it returns poison		// If index exceeds the length of the vector, it returns poison
auto *VTy = cast<VectorType>(Op->getOperand(0)->getType());		auto *VTy = cast<VectorType>(Op->getOperand(0)->getType());
unsigned IdxOp = Op->getOpcode() == Instruction::InsertElement ? 2 : 1;		unsigned IdxOp = Op->getOpcode() == Instruction::InsertElement ? 2 : 1;
auto *Idx = dyn_cast<ConstantInt>(Op->getOperand(IdxOp));		auto *Idx = dyn_cast<ConstantInt>(Op->getOperand(IdxOp));
if (!Idx || Idx->getZExtValue() >= VTy->getElementCount().Min)		if (!Idx ||
		Idx->getZExtValue() >= VTy->getElementCount().getKnownMinValue())
return true;		return true;
return false;		return false;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {
// shufflevector may return undef.		// shufflevector may return undef.
if (PoisonOnly)		if (PoisonOnly)
return false;		return false;
ArrayRef<int> Mask = isa<ConstantExpr>(Op)		ArrayRef<int> Mask = isa<ConstantExpr>(Op)

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

Show First 20 Lines • Show All 964 Lines • ▼ Show 20 Lines	case Type::ArrayTyID: {
break;		break;
}		}
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
VectorType *VT = cast<VectorType>(T);		VectorType *VT = cast<VectorType>(T);
// VECTOR [numelts, eltty] or		// VECTOR [numelts, eltty] or
// [numelts, eltty, scalable]		// [numelts, eltty, scalable]
Code = bitc::TYPE_CODE_VECTOR;		Code = bitc::TYPE_CODE_VECTOR;
TypeVals.push_back(VT->getElementCount().Min);		TypeVals.push_back(VT->getElementCount().getKnownMinValue());
TypeVals.push_back(VE.getTypeID(VT->getElementType()));		TypeVals.push_back(VE.getTypeID(VT->getElementType()));
if (isa<ScalableVectorType>(VT))		if (isa<ScalableVectorType>(VT))
TypeVals.push_back(true);		TypeVals.push_back(true);
break;		break;
}		}
}		}

// Emit the finished record.		// Emit the finished record.

llvm/lib/CodeGen/CodeGenPrepare.cpp

Show First 20 Lines • Show All 6,951 Lines • ▼ Show 20 Lines	if (!UseSplat) {
else		else
UseSplat = true;		UseSplat = true;
}		}

ElementCount EC = cast<VectorType>(getTransitionType())->getElementCount();		ElementCount EC = cast<VectorType>(getTransitionType())->getElementCount();
if (UseSplat)		if (UseSplat)
return ConstantVector::getSplat(EC, Val);		return ConstantVector::getSplat(EC, Val);

if (!EC.Scalable) {		if (!EC.isScalable()) {
SmallVector<Constant *, 4> ConstVec;		SmallVector<Constant *, 4> ConstVec;
UndefValue *UndefVal = UndefValue::get(Val->getType());		UndefValue *UndefVal = UndefValue::get(Val->getType());
for (unsigned Idx = 0; Idx != EC.Min; ++Idx) {		for (unsigned Idx = 0; Idx != EC.getKnownMinValue(); ++Idx) {
if (Idx == ExtractIdx)		if (Idx == ExtractIdx)
ConstVec.push_back(Val);		ConstVec.push_back(Val);
else		else
ConstVec.push_back(UndefVal);		ConstVec.push_back(UndefVal);
}		}
return ConstantVector::get(ConstVec);		return ConstantVector::get(ConstVec);
} else		} else
llvm_unreachable(		llvm_unreachable(

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 18,988 Lines • ▼ Show 20 Lines	for (SDValue Op : N->ops()) {
SrcOps.push_back(Op.getOperand(0));		SrcOps.push_back(Op.getOperand(0));
}		}

// The wider cast must be supported by the target. This is unusual because		// The wider cast must be supported by the target. This is unusual because
// the operation support type parameter depends on the opcode. In addition,		// the operation support type parameter depends on the opcode. In addition,
// check the other type in the cast to make sure this is really legal.		// check the other type in the cast to make sure this is really legal.
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
EVT SrcEltVT = SrcVT.getVectorElementType();		EVT SrcEltVT = SrcVT.getVectorElementType();
unsigned NumElts = SrcVT.getVectorElementCount().Min * N->getNumOperands();		ElementCount NumElts = SrcVT.getVectorElementCount() * N->getNumOperands();
EVT ConcatSrcVT = EVT::getVectorVT(*DAG.getContext(), SrcEltVT, NumElts);		EVT ConcatSrcVT = EVT::getVectorVT(*DAG.getContext(), SrcEltVT, NumElts);
const TargetLowering &TLI = DAG.getTargetLoweringInfo();		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
switch (CastOpcode) {		switch (CastOpcode) {
case ISD::SINT_TO_FP:		case ISD::SINT_TO_FP:
case ISD::UINT_TO_FP:		case ISD::UINT_TO_FP:
if (!TLI.isOperationLegalOrCustom(CastOpcode, ConcatSrcVT) ||		if (!TLI.isOperationLegalOrCustom(CastOpcode, ConcatSrcVT) ||
!TLI.isTypeLegal(VT))		!TLI.isTypeLegal(VT))
return SDValue();		return SDValue();

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Show First 20 Lines • Show All 422 Lines • ▼ Show 20 Lines	if (PartEVT == ValueVT)
return Val;		return Val;

if (PartEVT.isVector()) {		if (PartEVT.isVector()) {
// If the element type of the source/dest vectors are the same, but the		// If the element type of the source/dest vectors are the same, but the
// parts vector has more elements than the value vector, then we have a		// parts vector has more elements than the value vector, then we have a
// vector widening case (e.g. <2 x float> -> <4 x float>). Extract the		// vector widening case (e.g. <2 x float> -> <4 x float>). Extract the
// elements we want.		// elements we want.
if (PartEVT.getVectorElementType() == ValueVT.getVectorElementType()) {		if (PartEVT.getVectorElementType() == ValueVT.getVectorElementType()) {
assert((PartEVT.getVectorElementCount().Min >		assert((PartEVT.getVectorElementCount().getKnownMinValue() >
ValueVT.getVectorElementCount().Min) &&		ValueVT.getVectorElementCount().getKnownMinValue()) &&
(PartEVT.getVectorElementCount().Scalable ==		(PartEVT.getVectorElementCount().isScalable() ==
ValueVT.getVectorElementCount().Scalable) &&		ValueVT.getVectorElementCount().isScalable()) &&
"Cannot narrow, it would be a lossy transformation");		"Cannot narrow, it would be a lossy transformation");
return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ValueVT, Val,		return DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ValueVT, Val,
DAG.getVectorIdxConstant(0, DL));		DAG.getVectorIdxConstant(0, DL));
}		}

// Vector/Vector bitcast.		// Vector/Vector bitcast.
if (ValueVT.getSizeInBits() == PartEVT.getSizeInBits())		if (ValueVT.getSizeInBits() == PartEVT.getSizeInBits())
return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val);		return DAG.getNode(ISD::BITCAST, DL, ValueVT, Val);
▲ Show 20 Lines • Show All 3,303 Lines • ▼ Show 20 Lines	void SelectionDAGBuilder::visitGetElementPtr(const User &I) {
bool IsVectorGEP = I.getType()->isVectorTy();		bool IsVectorGEP = I.getType()->isVectorTy();
ElementCount VectorElementCount =		ElementCount VectorElementCount =
IsVectorGEP ? cast<VectorType>(I.getType())->getElementCount()		IsVectorGEP ? cast<VectorType>(I.getType())->getElementCount()
: ElementCount::getFixed(0);		: ElementCount::getFixed(0);

if (IsVectorGEP && !N.getValueType().isVector()) {		if (IsVectorGEP && !N.getValueType().isVector()) {
LLVMContext &Context = *DAG.getContext();		LLVMContext &Context = *DAG.getContext();
EVT VT = EVT::getVectorVT(Context, N.getValueType(), VectorElementCount);		EVT VT = EVT::getVectorVT(Context, N.getValueType(), VectorElementCount);
if (VectorElementCount.Scalable)		if (VectorElementCount.isScalable())
N = DAG.getSplatVector(VT, dl, N);		N = DAG.getSplatVector(VT, dl, N);
else		else
N = DAG.getSplatBuildVector(VT, dl, N);		N = DAG.getSplatBuildVector(VT, dl, N);
}		}

for (gep_type_iterator GTI = gep_type_begin(&I), E = gep_type_end(&I);		for (gep_type_iterator GTI = gep_type_begin(&I), E = gep_type_end(&I);
GTI != E; ++GTI) {		GTI != E; ++GTI) {
const Value *Idx = GTI.getOperand();		const Value *Idx = GTI.getOperand();
▲ Show 20 Lines • Show All 56 Lines • ▼ Show 20 Lines	if (StructType *StTy = GTI.getStructTypeOrNull()) {
}		}

// N = N + Idx * ElementMul;		// N = N + Idx * ElementMul;
SDValue IdxN = getValue(Idx);		SDValue IdxN = getValue(Idx);

if (!IdxN.getValueType().isVector() && IsVectorGEP) {		if (!IdxN.getValueType().isVector() && IsVectorGEP) {
EVT VT = EVT::getVectorVT(*Context, IdxN.getValueType(),		EVT VT = EVT::getVectorVT(*Context, IdxN.getValueType(),
VectorElementCount);		VectorElementCount);
if (VectorElementCount.Scalable)		if (VectorElementCount.isScalable())
IdxN = DAG.getSplatVector(VT, dl, IdxN);		IdxN = DAG.getSplatVector(VT, dl, IdxN);
else		else
IdxN = DAG.getSplatBuildVector(VT, dl, IdxN);		IdxN = DAG.getSplatBuildVector(VT, dl, IdxN);
}		}

// If the index is smaller or larger than intptr_t, truncate or extend		// If the index is smaller or larger than intptr_t, truncate or extend
// it.		// it.
IdxN = DAG.getSExtOrTrunc(IdxN, dl, N.getValueType());		IdxN = DAG.getSExtOrTrunc(IdxN, dl, N.getValueType());

llvm/lib/CodeGen/TargetLoweringBase.cpp

Show First 20 Lines • Show All 958 Lines • ▼ Show 20 Lines	static unsigned getVectorTypeBreakdownMVT(MVT VT, MVT &IntermediateVT,
// Figure out the right, legal destination reg to copy into.		// Figure out the right, legal destination reg to copy into.
ElementCount EC = VT.getVectorElementCount();		ElementCount EC = VT.getVectorElementCount();
MVT EltTy = VT.getVectorElementType();		MVT EltTy = VT.getVectorElementType();

unsigned NumVectorRegs = 1;		unsigned NumVectorRegs = 1;

// Scalable vectors cannot be scalarized, so splitting or widening is		// Scalable vectors cannot be scalarized, so splitting or widening is
// required.		// required.
if (VT.isScalableVector() && !isPowerOf2_32(EC.Min))		if (VT.isScalableVector() && !isPowerOf2_32(EC.getKnownMinValue()))
llvm_unreachable(		llvm_unreachable(
"Splitting or widening of non-power-of-2 MVTs is not implemented.");		"Splitting or widening of non-power-of-2 MVTs is not implemented.");

// FIXME: We don't support non-power-of-2-sized vectors for now.		// FIXME: We don't support non-power-of-2-sized vectors for now.
// Ideally we could break down into LHS/RHS like LegalizeDAG does.		// Ideally we could break down into LHS/RHS like LegalizeDAG does.
if (!isPowerOf2_32(EC.Min)) {		if (!isPowerOf2_32(EC.getKnownMinValue())) {
// Split EC to unit size (scalable property is preserved).		// Split EC to unit size (scalable property is preserved).
NumVectorRegs = EC.Min;		NumVectorRegs = EC.getKnownMinValue();
EC = EC / NumVectorRegs;		EC = ElementCount::getFixed(1);
}		}

// Divide the input until we get to a supported size. This will		// Divide the input until we get to a supported size. This will
// always end up with an EC that represent a scalar or a scalable		// always end up with an EC that represent a scalar or a scalable
// scalar.		// scalar.
while (EC.Min > 1 && !TLI->isTypeLegal(MVT::getVectorVT(EltTy, EC))) {		while (EC.getKnownMinValue() > 1 &&
EC.Min >>= 1;		!TLI->isTypeLegal(MVT::getVectorVT(EltTy, EC))) {
		EC /= 2;
NumVectorRegs <<= 1;		NumVectorRegs <<= 1;
}		}

NumIntermediates = NumVectorRegs;		NumIntermediates = NumVectorRegs;

MVT NewVT = MVT::getVectorVT(EltTy, EC);		MVT NewVT = MVT::getVectorVT(EltTy, EC);
if (!TLI->isTypeLegal(NewVT))		if (!TLI->isTypeLegal(NewVT))
NewVT = EltTy;		NewVT = EltTy;
▲ Show 20 Lines • Show All 318 Lines • ▼ Show 20 Lines	case TypePromoteInteger: {
}		}
}		}
if (IsLegalWiderType)		if (IsLegalWiderType)
break;		break;
LLVM_FALLTHROUGH;		LLVM_FALLTHROUGH;
}		}

case TypeWidenVector:		case TypeWidenVector:
if (isPowerOf2_32(EC.Min)) {		if (isPowerOf2_32(EC.getKnownMinValue())) {
// Try to widen the vector.		// Try to widen the vector.
for (unsigned nVT = i + 1; nVT <= MVT::LAST_VECTOR_VALUETYPE; ++nVT) {		for (unsigned nVT = i + 1; nVT <= MVT::LAST_VECTOR_VALUETYPE; ++nVT) {
MVT SVT = (MVT::SimpleValueType) nVT;		MVT SVT = (MVT::SimpleValueType) nVT;
if (SVT.getVectorElementType() == EltVT &&		if (SVT.getVectorElementType() == EltVT &&
SVT.isScalableVector() == IsScalable &&		SVT.isScalableVector() == IsScalable &&
SVT.getVectorElementCount().Min > EC.Min && isTypeLegal(SVT)) {		SVT.getVectorElementCount().getKnownMinValue() >
		EC.getKnownMinValue() &&
		isTypeLegal(SVT)) {
TransformToType[i] = SVT;		TransformToType[i] = SVT;
RegisterTypeForVT[i] = SVT;		RegisterTypeForVT[i] = SVT;
NumRegistersForVT[i] = 1;		NumRegistersForVT[i] = 1;
ValueTypeActions.setTypeAction(VT, TypeWidenVector);		ValueTypeActions.setTypeAction(VT, TypeWidenVector);
IsLegalWiderType = true;		IsLegalWiderType = true;
break;		break;
}		}
}		}
Show All 27 Lines	case TypeScalarizeVector: {
MVT NVT = VT.getPow2VectorType();		MVT NVT = VT.getPow2VectorType();
if (NVT == VT) {		if (NVT == VT) {
// Type is already a power of 2. The default action is to split.		// Type is already a power of 2. The default action is to split.
TransformToType[i] = MVT::Other;		TransformToType[i] = MVT::Other;
if (PreferredAction == TypeScalarizeVector)		if (PreferredAction == TypeScalarizeVector)
ValueTypeActions.setTypeAction(VT, TypeScalarizeVector);		ValueTypeActions.setTypeAction(VT, TypeScalarizeVector);
else if (PreferredAction == TypeSplitVector)		else if (PreferredAction == TypeSplitVector)
ValueTypeActions.setTypeAction(VT, TypeSplitVector);		ValueTypeActions.setTypeAction(VT, TypeSplitVector);
else if (EC.Min > 1)		else if (EC.getKnownMinValue() > 1)
ValueTypeActions.setTypeAction(VT, TypeSplitVector);		ValueTypeActions.setTypeAction(VT, TypeSplitVector);
else		else
ValueTypeActions.setTypeAction(VT, EC.Scalable		ValueTypeActions.setTypeAction(VT, EC.isScalable()
? TypeScalarizeScalableVector		? TypeScalarizeScalableVector
: TypeScalarizeVector);		: TypeScalarizeVector);
} else {		} else {
TransformToType[i] = NVT;		TransformToType[i] = NVT;
ValueTypeActions.setTypeAction(VT, TypeWidenVector);		ValueTypeActions.setTypeAction(VT, TypeWidenVector);
}		}
break;		break;
}		}
▲ Show 20 Lines • Show All 41 Lines • ▼ Show 20 Lines	unsigned TargetLoweringBase::getVectorTypeBreakdown(LLVMContext &Context, EVT VT,
ElementCount EltCnt = VT.getVectorElementCount();		ElementCount EltCnt = VT.getVectorElementCount();

// If there is a wider vector type with the same element type as this one,		// If there is a wider vector type with the same element type as this one,
// or a promoted vector type that has the same number of elements which		// or a promoted vector type that has the same number of elements which
// are wider, then we should convert to that legal vector type.		// are wider, then we should convert to that legal vector type.
// This handles things like <2 x float> -> <4 x float> and		// This handles things like <2 x float> -> <4 x float> and
// <4 x i1> -> <4 x i32>.		// <4 x i1> -> <4 x i32>.
LegalizeTypeAction TA = getTypeAction(Context, VT);		LegalizeTypeAction TA = getTypeAction(Context, VT);
if (EltCnt.Min != 1 && (TA == TypeWidenVector || TA == TypePromoteInteger)) {		if (EltCnt.getKnownMinValue() != 1 &&
		(TA == TypeWidenVector || TA == TypePromoteInteger)) {
EVT RegisterEVT = getTypeToTransformTo(Context, VT);		EVT RegisterEVT = getTypeToTransformTo(Context, VT);
if (isTypeLegal(RegisterEVT)) {		if (isTypeLegal(RegisterEVT)) {
IntermediateVT = RegisterEVT;		IntermediateVT = RegisterEVT;
RegisterVT = RegisterEVT.getSimpleVT();		RegisterVT = RegisterEVT.getSimpleVT();
NumIntermediates = 1;		NumIntermediates = 1;
return 1;		return 1;
}		}
}		}

// Figure out the right, legal destination reg to copy into.		// Figure out the right, legal destination reg to copy into.
EVT EltTy = VT.getVectorElementType();		EVT EltTy = VT.getVectorElementType();

unsigned NumVectorRegs = 1;		unsigned NumVectorRegs = 1;

// Scalable vectors cannot be scalarized, so handle the legalisation of the		// Scalable vectors cannot be scalarized, so handle the legalisation of the
// types like done elsewhere in SelectionDAG.		// types like done elsewhere in SelectionDAG.
if (VT.isScalableVector() && !isPowerOf2_32(EltCnt.Min)) {		if (VT.isScalableVector() && !isPowerOf2_32(EltCnt.getKnownMinValue())) {
LegalizeKind LK;		LegalizeKind LK;
EVT PartVT = VT;		EVT PartVT = VT;
do {		do {
// Iterate until we've found a legal (part) type to hold VT.		// Iterate until we've found a legal (part) type to hold VT.
LK = getTypeConversion(Context, PartVT);		LK = getTypeConversion(Context, PartVT);
PartVT = LK.second;		PartVT = LK.second;
} while (LK.first != TypeLegal);		} while (LK.first != TypeLegal);

NumIntermediates =		NumIntermediates = VT.getVectorElementCount().getKnownMinValue() /
VT.getVectorElementCount().Min / PartVT.getVectorElementCount().Min;		PartVT.getVectorElementCount().getKnownMinValue();

// FIXME: This code needs to be extended to handle more complex vector		// FIXME: This code needs to be extended to handle more complex vector
// breakdowns, like nxv7i64 -> nxv8i64 -> 4 x nxv2i64. Currently the only		// breakdowns, like nxv7i64 -> nxv8i64 -> 4 x nxv2i64. Currently the only
// supported cases are vectors that are broken down into equal parts		// supported cases are vectors that are broken down into equal parts
// such as nxv6i64 -> 3 x nxv2i64.		// such as nxv6i64 -> 3 x nxv2i64.
assert(NumIntermediates * PartVT.getVectorElementCount().Min ==		assert((PartVT.getVectorElementCount() * NumIntermediates) ==
VT.getVectorElementCount().Min &&		VT.getVectorElementCount() &&
"Expected an integer multiple of PartVT");		"Expected an integer multiple of PartVT");
IntermediateVT = PartVT;		IntermediateVT = PartVT;
RegisterVT = getRegisterType(Context, IntermediateVT);		RegisterVT = getRegisterType(Context, IntermediateVT);
return NumIntermediates;		return NumIntermediates;
}		}

// FIXME: We don't support non-power-of-2-sized vectors for now. Ideally		// FIXME: We don't support non-power-of-2-sized vectors for now. Ideally
// we could break down into LHS/RHS like LegalizeDAG does.		// we could break down into LHS/RHS like LegalizeDAG does.
if (!isPowerOf2_32(EltCnt.Min)) {		if (!isPowerOf2_32(EltCnt.getKnownMinValue())) {
NumVectorRegs = EltCnt.Min;		NumVectorRegs = EltCnt.getKnownMinValue();
EltCnt.Min = 1;		EltCnt = ElementCount::getFixed(1);
}		}

// Divide the input until we get to a supported size. This will always		// Divide the input until we get to a supported size. This will always
// end with a scalar if the target doesn't support vectors.		// end with a scalar if the target doesn't support vectors.
while (EltCnt.Min > 1 &&		while (EltCnt.getKnownMinValue() > 1 &&
!isTypeLegal(EVT::getVectorVT(Context, EltTy, EltCnt))) {		!isTypeLegal(EVT::getVectorVT(Context, EltTy, EltCnt))) {
EltCnt.Min >>= 1;		EltCnt /= 2;
NumVectorRegs <<= 1;		NumVectorRegs <<= 1;
}		}

NumIntermediates = NumVectorRegs;		NumIntermediates = NumVectorRegs;

EVT NewVT = EVT::getVectorVT(Context, EltTy, EltCnt);		EVT NewVT = EVT::getVectorVT(Context, EltTy, EltCnt);
if (!isTypeLegal(NewVT))		if (!isTypeLegal(NewVT))
NewVT = EltTy;		NewVT = EltTy;
▲ Show 20 Lines • Show All 711 Lines • Show Last 20 Lines
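The halving loops in the TargetLoweringBase.cpp hunks above now go through `getKnownMinValue()` and `operator/=` rather than writing to `Min` directly. The following is a minimal, self-contained sketch of that pattern. `MockElementCount` and `splitUntilLegal` are hypothetical names invented for illustration, not the real `llvm::ElementCount` API, and the legality predicate is an assumed stand-in for `isTypeLegal`.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::ElementCount (illustration only).
// The two data members are private, mirroring the intent of this patch.
class MockElementCount {
  unsigned Min;  // Known minimum number of elements.
  bool Scalable; // True if the real count is Min * vscale.

  MockElementCount(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}

public:
  static MockElementCount getFixed(unsigned Min) { return {Min, false}; }
  static MockElementCount getScalable(unsigned Min) { return {Min, true}; }

  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
  bool isNonZero() const { return Min != 0; }

  // Dividing changes only the minimum; the scalable flag is preserved.
  MockElementCount &operator/=(unsigned RHS) {
    Min /= RHS;
    return *this;
  }
};

// Halve EC until IsLegal accepts it or a single lane remains.
// Returns how many parts the original count was split into.
template <typename Pred>
unsigned splitUntilLegal(MockElementCount &EC, Pred IsLegal) {
  unsigned NumParts = 1;
  while (EC.getKnownMinValue() > 1 && !IsLegal(EC)) {
    EC /= 2;
    NumParts <<= 1;
  }
  return NumParts;
}
```

For example, starting from `MockElementCount::getScalable(16)` with a predicate that accepts at most 4 lanes, the loop halves twice and returns 4 parts, and the count is still scalable afterwards.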

llvm/lib/CodeGen/ValueTypes.cpp

	EVT EVT::getExtendedVectorElementType() const {			EVT EVT::getExtendedVectorElementType() const {
	assert(isExtended() && "Type is not extended!");			assert(isExtended() && "Type is not extended!");
	return EVT::getEVT(cast<VectorType>(LLVMTy)->getElementType());			return EVT::getEVT(cast<VectorType>(LLVMTy)->getElementType());
	}			}

	unsigned EVT::getExtendedVectorNumElements() const {			unsigned EVT::getExtendedVectorNumElements() const {
	assert(isExtended() && "Type is not extended!");			assert(isExtended() && "Type is not extended!");
	ElementCount EC = cast<VectorType>(LLVMTy)->getElementCount();			ElementCount EC = cast<VectorType>(LLVMTy)->getElementCount();
	if (EC.Scalable) {			if (EC.isScalable()) {
	WithColor::warning()			WithColor::warning()
	<< "The code that requested the fixed number of elements has made the "			<< "The code that requested the fixed number of elements has made the "
	"assumption that this vector is not scalable. This assumption was "			"assumption that this vector is not scalable. This assumption was "
	"not correct, and this may lead to broken code\n";			"not correct, and this may lead to broken code\n";
	}			}
	return EC.Min;			return EC.getKnownMinValue();
	}			}

	ElementCount EVT::getExtendedVectorElementCount() const {			ElementCount EVT::getExtendedVectorElementCount() const {
	assert(isExtended() && "Type is not extended!");			assert(isExtended() && "Type is not extended!");
	return cast<VectorType>(LLVMTy)->getElementCount();			return cast<VectorType>(LLVMTy)->getElementCount();
	}			}

	TypeSize EVT::getExtendedSizeInBits() const {			TypeSize EVT::getExtendedSizeInBits() const {
	assert(isExtended() && "Type is not extended!");			assert(isExtended() && "Type is not extended!");
	if (IntegerType *ITy = dyn_cast<IntegerType>(LLVMTy))			if (IntegerType *ITy = dyn_cast<IntegerType>(LLVMTy))
	return TypeSize::Fixed(ITy->getBitWidth());			return TypeSize::Fixed(ITy->getBitWidth());
	if (VectorType *VTy = dyn_cast<VectorType>(LLVMTy))			if (VectorType *VTy = dyn_cast<VectorType>(LLVMTy))
	return VTy->getPrimitiveSizeInBits();			return VTy->getPrimitiveSizeInBits();
	llvm_unreachable("Unrecognized extended type!");			llvm_unreachable("Unrecognized extended type!");
	}			}

	/// getEVTString - This function returns value type as a string, e.g. "i32".			/// getEVTString - This function returns value type as a string, e.g. "i32".
	std::string EVT::getEVTString() const {			std::string EVT::getEVTString() const {
	switch (V.SimpleTy) {			switch (V.SimpleTy) {
	default:			default:
	if (isVector())			if (isVector())
	return (isScalableVector() ? "nxv" : "v")			return (isScalableVector() ? "nxv" : "v") +
	+ utostr(getVectorElementCount().Min)			utostr(getVectorElementCount().getKnownMinValue()) +
	+ getVectorElementType().getEVTString();			getVectorElementType().getEVTString();
	if (isInteger())			if (isInteger())
	return "i" + utostr(getSizeInBits());			return "i" + utostr(getSizeInBits());
	if (isFloatingPoint())			if (isFloatingPoint())
	return "f" + utostr(getSizeInBits());			return "f" + utostr(getSizeInBits());
	llvm_unreachable("Invalid EVT!");			llvm_unreachable("Invalid EVT!");
	case MVT::bf16: return "bf16";			case MVT::bf16: return "bf16";
	case MVT::ppcf128: return "ppcf128";			case MVT::ppcf128: return "ppcf128";
	case MVT::isVoid: return "isVoid";			case MVT::isVoid: return "isVoid";

llvm/lib/IR/AsmWriter.cpp

Show First 20 Lines • Show All 650 Lines • ▼ Show 20 Lines	case Type::ArrayTyID: {
OS << ']';		OS << ']';
return;		return;
}		}
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
VectorType *PTy = cast<VectorType>(Ty);		VectorType *PTy = cast<VectorType>(Ty);
ElementCount EC = PTy->getElementCount();		ElementCount EC = PTy->getElementCount();
OS << "<";		OS << "<";
if (EC.Scalable)		if (EC.isScalable())
OS << "vscale x ";		OS << "vscale x ";
OS << EC.Min << " x ";		OS << EC.getKnownMinValue() << " x ";
print(PTy->getElementType(), OS);		print(PTy->getElementType(), OS);
OS << '>';		OS << '>';
return;		return;
}		}
}		}
llvm_unreachable("Invalid TypeID");		llvm_unreachable("Invalid TypeID");
}		}


llvm/lib/IR/ConstantFold.cpp

Show First 20 Lines • Show All 925 Lines • ▼ Show 20 Lines	Constant *llvm::ConstantFoldShuffleVectorInstruction(Constant *V1, Constant *V2,
// Undefined shuffle mask -> undefined value.		// Undefined shuffle mask -> undefined value.
if (all_of(Mask, [](int Elt) { return Elt == UndefMaskElem; })) {		if (all_of(Mask, [](int Elt) { return Elt == UndefMaskElem; })) {
return UndefValue::get(FixedVectorType::get(EltTy, MaskNumElts));		return UndefValue::get(FixedVectorType::get(EltTy, MaskNumElts));
}		}

// If the mask is all zeros this is a splat, no need to go through all		// If the mask is all zeros this is a splat, no need to go through all
// elements.		// elements.
if (all_of(Mask, [](int Elt) { return Elt == 0; }) &&		if (all_of(Mask, [](int Elt) { return Elt == 0; }) &&
!MaskEltCount.Scalable) {		!MaskEltCount.isScalable()) {
Type *Ty = IntegerType::get(V1->getContext(), 32);		Type *Ty = IntegerType::get(V1->getContext(), 32);
Constant *Elt =		Constant *Elt =
ConstantExpr::getExtractElement(V1, ConstantInt::get(Ty, 0));		ConstantExpr::getExtractElement(V1, ConstantInt::get(Ty, 0));
return ConstantVector::getSplat(MaskEltCount, Elt);		return ConstantVector::getSplat(MaskEltCount, Elt);
}		}
// Do not iterate on scalable vector. The num of elements is unknown at		// Do not iterate on scalable vector. The num of elements is unknown at
// compile-time.		// compile-time.
if (isa<ScalableVectorType>(V1VTy))		if (isa<ScalableVectorType>(V1VTy))
return nullptr;		return nullptr;

unsigned SrcNumElts = V1VTy->getElementCount().Min;		unsigned SrcNumElts = V1VTy->getElementCount().getKnownMinValue();

// Loop over the shuffle mask, evaluating each element.		// Loop over the shuffle mask, evaluating each element.
SmallVector<Constant*, 32> Result;		SmallVector<Constant*, 32> Result;
for (unsigned i = 0; i != MaskNumElts; ++i) {		for (unsigned i = 0; i != MaskNumElts; ++i) {
int Elt = Mask[i];		int Elt = Mask[i];
if (Elt == -1) {		if (Elt == -1) {
Result.push_back(UndefValue::get(EltTy));		Result.push_back(UndefValue::get(EltTy));
continue;		continue;
▲ Show 20 Lines • Show All 1,097 Lines • ▼ Show 20 Lines	if (Constant *C1Splat = C1->getSplatValue())
C1VTy->getElementCount(),		C1VTy->getElementCount(),
ConstantExpr::getCompare(pred, C1Splat, C2Splat));		ConstantExpr::getCompare(pred, C1Splat, C2Splat));

// If we can constant fold the comparison of each element, constant fold		// If we can constant fold the comparison of each element, constant fold
// the whole vector comparison.		// the whole vector comparison.
SmallVector<Constant*, 4> ResElts;		SmallVector<Constant*, 4> ResElts;
Type *Ty = IntegerType::get(C1->getContext(), 32);		Type *Ty = IntegerType::get(C1->getContext(), 32);
// Compare the elements, producing an i1 result or constant expr.		// Compare the elements, producing an i1 result or constant expr.
for (unsigned i = 0, e = C1VTy->getElementCount().Min; i != e; ++i) {		for (unsigned I = 0, E = C1VTy->getElementCount().getKnownMinValue();
		I != E; ++I) {
Constant *C1E =		Constant *C1E =
ConstantExpr::getExtractElement(C1, ConstantInt::get(Ty, i));		ConstantExpr::getExtractElement(C1, ConstantInt::get(Ty, I));
Constant *C2E =		Constant *C2E =
ConstantExpr::getExtractElement(C2, ConstantInt::get(Ty, i));		ConstantExpr::getExtractElement(C2, ConstantInt::get(Ty, I));

ResElts.push_back(ConstantExpr::getCompare(pred, C1E, C2E));		ResElts.push_back(ConstantExpr::getCompare(pred, C1E, C2E));
}		}

return ConstantVector::get(ResElts);		return ConstantVector::get(ResElts);
}		}

if (C1->getType()->isFloatingPointTy() &&		if (C1->getType()->isFloatingPointTy() &&

llvm/lib/IR/Constants.cpp

Show First 20 Lines • Show All 1,294 Lines • ▼ Show 20 Lines	if (ConstantDataSequential::isElementTypeCompatible(C->getType()))
return getSequenceIfElementsMatch<ConstantDataVector>(C, V);		return getSequenceIfElementsMatch<ConstantDataVector>(C, V);

// Otherwise, the element type isn't compatible with ConstantDataVector, or		// Otherwise, the element type isn't compatible with ConstantDataVector, or
// the operand list contains a ConstantExpr or something else strange.		// the operand list contains a ConstantExpr or something else strange.
return nullptr;		return nullptr;
}		}

Constant *ConstantVector::getSplat(ElementCount EC, Constant *V) {		Constant *ConstantVector::getSplat(ElementCount EC, Constant *V) {
if (!EC.Scalable) {		if (!EC.isScalable()) {
// If this splat is compatible with ConstantDataVector, use it instead of		// If this splat is compatible with ConstantDataVector, use it instead of
// ConstantVector.		// ConstantVector.
if ((isa<ConstantFP>(V) || isa<ConstantInt>(V)) &&		if ((isa<ConstantFP>(V) || isa<ConstantInt>(V)) &&
ConstantDataSequential::isElementTypeCompatible(V->getType()))		ConstantDataSequential::isElementTypeCompatible(V->getType()))
return ConstantDataVector::getSplat(EC.Min, V);		return ConstantDataVector::getSplat(EC.getKnownMinValue(), V);

SmallVector<Constant *, 32> Elts(EC.Min, V);		SmallVector<Constant *, 32> Elts(EC.getKnownMinValue(), V);
return get(Elts);		return get(Elts);
}		}

Type *VTy = VectorType::get(V->getType(), EC);		Type *VTy = VectorType::get(V->getType(), EC);

if (V->isNullValue())		if (V->isNullValue())
return ConstantAggregateZero::get(VTy);		return ConstantAggregateZero::get(VTy);
else if (isa<UndefValue>(V))		else if (isa<UndefValue>(V))
return UndefValue::get(VTy);		return UndefValue::get(VTy);

Type *I32Ty = Type::getInt32Ty(VTy->getContext());		Type *I32Ty = Type::getInt32Ty(VTy->getContext());

// Move scalar into vector.		// Move scalar into vector.
Constant *UndefV = UndefValue::get(VTy);		Constant *UndefV = UndefValue::get(VTy);
V = ConstantExpr::getInsertElement(UndefV, V, ConstantInt::get(I32Ty, 0));		V = ConstantExpr::getInsertElement(UndefV, V, ConstantInt::get(I32Ty, 0));
// Build shuffle mask to perform the splat.		// Build shuffle mask to perform the splat.
SmallVector<int, 8> Zeros(EC.Min, 0);		SmallVector<int, 8> Zeros(EC.getKnownMinValue(), 0);
// Splat.		// Splat.
return ConstantExpr::getShuffleVector(V, UndefV, Zeros);		return ConstantExpr::getShuffleVector(V, UndefV, Zeros);
}		}

ConstantTokenNone *ConstantTokenNone::get(LLVMContext &Context) {		ConstantTokenNone *ConstantTokenNone::get(LLVMContext &Context) {
LLVMContextImpl *pImpl = Context.pImpl;		LLVMContextImpl *pImpl = Context.pImpl;
if (!pImpl->TheNoneToken)		if (!pImpl->TheNoneToken)
pImpl->TheNoneToken.reset(new ConstantTokenNone(Context));		pImpl->TheNoneToken.reset(new ConstantTokenNone(Context));
▲ Show 20 Lines • Show All 923 Lines • ▼ Show 20 Lines	Constant *ConstantExpr::getGetElementPtr(Type *Ty, Constant *C,
auto EltCount = ElementCount::getFixed(0);		auto EltCount = ElementCount::getFixed(0);
if (VectorType *VecTy = dyn_cast<VectorType>(C->getType()))		if (VectorType *VecTy = dyn_cast<VectorType>(C->getType()))
EltCount = VecTy->getElementCount();		EltCount = VecTy->getElementCount();
else		else
for (auto Idx : Idxs)		for (auto Idx : Idxs)
if (VectorType *VecTy = dyn_cast<VectorType>(Idx->getType()))		if (VectorType *VecTy = dyn_cast<VectorType>(Idx->getType()))
EltCount = VecTy->getElementCount();		EltCount = VecTy->getElementCount();

if (EltCount.Min != 0)		if (EltCount.isNonZero())
ReqTy = VectorType::get(ReqTy, EltCount);		ReqTy = VectorType::get(ReqTy, EltCount);

if (OnlyIfReducedTy == ReqTy)		if (OnlyIfReducedTy == ReqTy)
return nullptr;		return nullptr;

// Look up the constant in the table first to ensure uniqueness		// Look up the constant in the table first to ensure uniqueness
std::vector<Constant*> ArgVec;		std::vector<Constant*> ArgVec;
ArgVec.reserve(1 + Idxs.size());		ArgVec.reserve(1 + Idxs.size());
ArgVec.push_back(C);		ArgVec.push_back(C);
auto GTI = gep_type_begin(Ty, Idxs), GTE = gep_type_end(Ty, Idxs);		auto GTI = gep_type_begin(Ty, Idxs), GTE = gep_type_end(Ty, Idxs);
for (; GTI != GTE; ++GTI) {		for (; GTI != GTE; ++GTI) {
auto *Idx = cast<Constant>(GTI.getOperand());		auto *Idx = cast<Constant>(GTI.getOperand());
assert(		assert(
(!isa<VectorType>(Idx->getType()) ||		(!isa<VectorType>(Idx->getType()) ||
cast<VectorType>(Idx->getType())->getElementCount() == EltCount) &&		cast<VectorType>(Idx->getType())->getElementCount() == EltCount) &&
"getelementptr index type missmatch");		"getelementptr index type missmatch");

if (GTI.isStruct() && Idx->getType()->isVectorTy()) {		if (GTI.isStruct() && Idx->getType()->isVectorTy()) {
Idx = Idx->getSplatValue();		Idx = Idx->getSplatValue();
} else if (GTI.isSequential() && EltCount.Min != 0 &&		} else if (GTI.isSequential() && EltCount.isNonZero() &&
!Idx->getType()->isVectorTy()) {		!Idx->getType()->isVectorTy()) {
Idx = ConstantVector::getSplat(EltCount, Idx);		Idx = ConstantVector::getSplat(EltCount, Idx);
}		}
ArgVec.push_back(Idx);		ArgVec.push_back(Idx);
}		}

unsigned SubClassOptionalData = InBounds ? GEPOperator::IsInBounds : 0;		unsigned SubClassOptionalData = InBounds ? GEPOperator::IsInBounds : 0;
if (InRangeIndex && *InRangeIndex < 63)		if (InRangeIndex && *InRangeIndex < 63)
▲ Show 20 Lines • Show All 1,028 Lines • Show Last 20 Lines

llvm/lib/IR/Core.cpp

Show First 20 Lines • Show All 775 Lines • ▼ Show 20 Lines	unsigned LLVMGetArrayLength(LLVMTypeRef ArrayTy) {
return unwrap<ArrayType>(ArrayTy)->getNumElements();		return unwrap<ArrayType>(ArrayTy)->getNumElements();
}		}

unsigned LLVMGetPointerAddressSpace(LLVMTypeRef PointerTy) {		unsigned LLVMGetPointerAddressSpace(LLVMTypeRef PointerTy) {
return unwrap<PointerType>(PointerTy)->getAddressSpace();		return unwrap<PointerType>(PointerTy)->getAddressSpace();
}		}

unsigned LLVMGetVectorSize(LLVMTypeRef VectorTy) {		unsigned LLVMGetVectorSize(LLVMTypeRef VectorTy) {
return unwrap<VectorType>(VectorTy)->getElementCount().Min;		return unwrap<VectorType>(VectorTy)->getElementCount().getKnownMinValue();
}		}

/*--.. Operations on other types ...........................................--*/		/*--.. Operations on other types ...........................................--*/

LLVMTypeRef LLVMVoidTypeInContext(LLVMContextRef C) {		LLVMTypeRef LLVMVoidTypeInContext(LLVMContextRef C) {
return wrap(Type::getVoidTy(*unwrap(C)));		return wrap(Type::getVoidTy(*unwrap(C)));
}		}
LLVMTypeRef LLVMLabelTypeInContext(LLVMContextRef C) {		LLVMTypeRef LLVMLabelTypeInContext(LLVMContextRef C) {
▲ Show 20 Lines • Show All 3,332 Lines • Show Last 20 Lines

llvm/lib/IR/DataLayout.cpp

Show First 20 Lines • Show All 624 Lines • ▼ Show 20 Lines	Align DataLayout::getAlignmentInfo(AlignTypeEnum AlignType, uint32_t BitWidth,
} else if (AlignType == VECTOR_ALIGN) {		} else if (AlignType == VECTOR_ALIGN) {
// By default, use natural alignment for vector types. This is consistent		// By default, use natural alignment for vector types. This is consistent
// with what clang and llvm-gcc do.		// with what clang and llvm-gcc do.
unsigned Alignment =		unsigned Alignment =
getTypeAllocSize(cast<VectorType>(Ty)->getElementType());		getTypeAllocSize(cast<VectorType>(Ty)->getElementType());
// We're only calculating a natural alignment, so it doesn't have to be		// We're only calculating a natural alignment, so it doesn't have to be
// based on the full size for scalable vectors. Using the minimum element		// based on the full size for scalable vectors. Using the minimum element
// count should be enough here.		// count should be enough here.
Alignment *= cast<VectorType>(Ty)->getElementCount().Min;		Alignment *= cast<VectorType>(Ty)->getElementCount().getKnownMinValue();
Alignment = PowerOf2Ceil(Alignment);		Alignment = PowerOf2Ceil(Alignment);
return Align(Alignment);		return Align(Alignment);
}		}

// If we still couldn't find a reasonable default alignment, fall back		// If we still couldn't find a reasonable default alignment, fall back
// to a simple heuristic that the alignment is the first power of two		// to a simple heuristic that the alignment is the first power of two
// greater-or-equal to the store size of the type. This is a reasonable		// greater-or-equal to the store size of the type. This is a reasonable
// approximation of reality, and if the user wanted something less		// approximation of reality, and if the user wanted something less
▲ Show 20 Lines • Show All 312 Lines • Show Last 20 Lines
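
For readers following the DataLayout.cpp hunk above, here is a minimal sketch of the natural-alignment computation described in the code comment: element allocation size times the known minimum element count, rounded up to a power of two. `ElementCountSketch`, `powerOf2Ceil` and `naturalVectorAlign` are hypothetical stand-ins written for this note, not the LLVM classes or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount: members are private and
// only reachable through accessors, mirroring the intent of this patch.
class ElementCountSketch {
  unsigned Min;
  bool Scalable;

public:
  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
};

// Smallest power of two that is >= V. Assumed stand-in for PowerOf2Ceil.
inline uint64_t powerOf2Ceil(uint64_t V) {
  assert(V != 0 && "expected a non-zero size");
  uint64_t P = 1;
  while (P < V)
    P <<= 1;
  return P;
}

// Natural alignment based on the minimum element count only; for scalable
// vectors the runtime multiple (vscale) is deliberately ignored.
inline uint64_t naturalVectorAlign(uint64_t ElemAllocSize,
                                   ElementCountSketch EC) {
  return powerOf2Ceil(ElemAllocSize * EC.getKnownMinValue());
}
```

Under these assumptions a fixed vector of three 4-byte elements and a scalable vector with a minimum of two 8-byte elements both come out at 16-byte alignment.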

llvm/lib/IR/Function.cpp

Show First 20 Lines • Show All 708 Lines • ▼ Show 20 Lines	if (PointerType* PTyp = dyn_cast<PointerType>(Ty)) {
for (size_t i = 0; i < FT->getNumParams(); i++)		for (size_t i = 0; i < FT->getNumParams(); i++)
Result += getMangledTypeStr(FT->getParamType(i));		Result += getMangledTypeStr(FT->getParamType(i));
if (FT->isVarArg())		if (FT->isVarArg())
Result += "vararg";		Result += "vararg";
// Ensure nested function types are distinguishable.		// Ensure nested function types are distinguishable.
Result += "f";		Result += "f";
} else if (VectorType* VTy = dyn_cast<VectorType>(Ty)) {		} else if (VectorType* VTy = dyn_cast<VectorType>(Ty)) {
ElementCount EC = VTy->getElementCount();		ElementCount EC = VTy->getElementCount();
if (EC.Scalable)		if (EC.isScalable())
Result += "nx";		Result += "nx";
Result += "v" + utostr(EC.Min) + getMangledTypeStr(VTy->getElementType());		Result += "v" + utostr(EC.getKnownMinValue()) +
		getMangledTypeStr(VTy->getElementType());
} else if (Ty) {		} else if (Ty) {
switch (Ty->getTypeID()) {		switch (Ty->getTypeID()) {
default: llvm_unreachable("Unhandled type");		default: llvm_unreachable("Unhandled type");
case Type::VoidTyID: Result += "isVoid"; break;		case Type::VoidTyID: Result += "isVoid"; break;
case Type::MetadataTyID: Result += "Metadata"; break;		case Type::MetadataTyID: Result += "Metadata"; break;
case Type::HalfTyID: Result += "f16"; break;		case Type::HalfTyID: Result += "f16"; break;
case Type::BFloatTyID: Result += "bf16"; break;		case Type::BFloatTyID: Result += "bf16"; break;
case Type::FloatTyID: Result += "f32"; break;		case Type::FloatTyID: Result += "f32"; break;
▲ Show 20 Lines • Show All 1,007 Lines • Show Last 20 Lines
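
As a rough illustration of the vector branch in the Function.cpp hunk above: the mangled string is an optional `nx` prefix for scalable vectors, then `v`, the known minimum element count, and the mangled element type. `ElementCountSketch` and `mangleVectorSketch` are hypothetical names for this note, and the element string is passed in rather than computed.

```cpp
#include <string>

// Hypothetical stand-in for llvm::ElementCount with accessor-only access.
class ElementCountSketch {
  unsigned Min;
  bool Scalable;

public:
  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
};

// Builds "[nx]v<min><elem>" in the same order as the diff: the scalable
// prefix first, then the count taken from getKnownMinValue().
inline std::string mangleVectorSketch(ElementCountSketch EC,
                                      const std::string &ElemStr) {
  std::string Result;
  if (EC.isScalable())
    Result += "nx";
  Result += "v" + std::to_string(EC.getKnownMinValue()) + ElemStr;
  return Result;
}
```

With the `f32` element string from the switch in the same hunk, a fixed count of 4 would give `v4f32` and a scalable count of 4 would give `nxv4f32`.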

llvm/lib/IR/IRBuilder.cpp

	Show First 20 Lines • Show All 997 Lines • ▼ Show 20 Lines
	Value *IRBuilderBase::CreateVectorSplat(unsigned NumElts, Value *V,			Value *IRBuilderBase::CreateVectorSplat(unsigned NumElts, Value *V,
	const Twine &Name) {			const Twine &Name) {
	auto EC = ElementCount::getFixed(NumElts);			auto EC = ElementCount::getFixed(NumElts);
	return CreateVectorSplat(EC, V, Name);			return CreateVectorSplat(EC, V, Name);
	}			}

	Value *IRBuilderBase::CreateVectorSplat(ElementCount EC, Value *V,			Value *IRBuilderBase::CreateVectorSplat(ElementCount EC, Value *V,
	const Twine &Name) {			const Twine &Name) {
	assert(EC.Min > 0 && "Cannot splat to an empty vector!");			assert(EC.isNonZero() && "Cannot splat to an empty vector!");

	// First insert it into an undef vector so we can shuffle it.			// First insert it into an undef vector so we can shuffle it.
	Type *I32Ty = getInt32Ty();			Type *I32Ty = getInt32Ty();
	Value *Undef = UndefValue::get(VectorType::get(V->getType(), EC));			Value *Undef = UndefValue::get(VectorType::get(V->getType(), EC));
	V = CreateInsertElement(Undef, V, ConstantInt::get(I32Ty, 0),			V = CreateInsertElement(Undef, V, ConstantInt::get(I32Ty, 0),
	Name + ".splatinsert");			Name + ".splatinsert");

	// Shuffle the value across the desired number of elements.			// Shuffle the value across the desired number of elements.
	▲ Show 20 Lines • Show All 165 Lines • Show Last 20 Lines

llvm/lib/IR/Instructions.cpp

Show First 20 Lines • Show All 1,961 Lines • ▼ Show 20 Lines

bool ShuffleVectorInst::isValidOperands(const Value *V1, const Value *V2,		bool ShuffleVectorInst::isValidOperands(const Value *V1, const Value *V2,
ArrayRef<int> Mask) {		ArrayRef<int> Mask) {
// V1 and V2 must be vectors of the same type.		// V1 and V2 must be vectors of the same type.
if (!isa<VectorType>(V1->getType()) || V1->getType() != V2->getType())		if (!isa<VectorType>(V1->getType()) || V1->getType() != V2->getType())
return false;		return false;

// Make sure the mask elements make sense.		// Make sure the mask elements make sense.
int V1Size = cast<VectorType>(V1->getType())->getElementCount().Min;		int V1Size =
		cast<VectorType>(V1->getType())->getElementCount().getKnownMinValue();
for (int Elem : Mask)		for (int Elem : Mask)
if (Elem != UndefMaskElem && Elem >= V1Size * 2)		if (Elem != UndefMaskElem && Elem >= V1Size * 2)
return false;		return false;

if (isa<ScalableVectorType>(V1->getType()))		if (isa<ScalableVectorType>(V1->getType()))
if ((Mask[0] != 0 && Mask[0] != UndefMaskElem) || !is_splat(Mask))		if ((Mask[0] != 0 && Mask[0] != UndefMaskElem) || !is_splat(Mask))
return false;		return false;

▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	bool ShuffleVectorInst::isValidOperands(const Value *V1, const Value *V2,
return false;		return false;
}		}

void ShuffleVectorInst::getShuffleMask(const Constant *Mask,		void ShuffleVectorInst::getShuffleMask(const Constant *Mask,
SmallVectorImpl<int> &Result) {		SmallVectorImpl<int> &Result) {
ElementCount EC = cast<VectorType>(Mask->getType())->getElementCount();		ElementCount EC = cast<VectorType>(Mask->getType())->getElementCount();

if (isa<ConstantAggregateZero>(Mask)) {		if (isa<ConstantAggregateZero>(Mask)) {
Result.resize(EC.Min, 0);		Result.resize(EC.getKnownMinValue(), 0);
return;		return;
}		}

Result.reserve(EC.Min);		Result.reserve(EC.getKnownMinValue());

if (EC.Scalable) {		if (EC.isScalable()) {
assert((isa<ConstantAggregateZero>(Mask) || isa<UndefValue>(Mask)) &&		assert((isa<ConstantAggregateZero>(Mask) || isa<UndefValue>(Mask)) &&
"Scalable vector shuffle mask must be undef or zeroinitializer");		"Scalable vector shuffle mask must be undef or zeroinitializer");
int MaskVal = isa<UndefValue>(Mask) ? -1 : 0;		int MaskVal = isa<UndefValue>(Mask) ? -1 : 0;
for (unsigned I = 0; I < EC.Min; ++I)		for (unsigned I = 0; I < EC.getKnownMinValue(); ++I)
Result.emplace_back(MaskVal);		Result.emplace_back(MaskVal);
return;		return;
}		}

unsigned NumElts = EC.Min;		unsigned NumElts = EC.getKnownMinValue();

if (auto *CDS = dyn_cast<ConstantDataSequential>(Mask)) {		if (auto *CDS = dyn_cast<ConstantDataSequential>(Mask)) {
for (unsigned i = 0; i != NumElts; ++i)		for (unsigned i = 0; i != NumElts; ++i)
Result.push_back(CDS->getElementAsInteger(i));		Result.push_back(CDS->getElementAsInteger(i));
return;		return;
}		}
for (unsigned i = 0; i != NumElts; ++i) {		for (unsigned i = 0; i != NumElts; ++i) {
Constant *C = Mask->getAggregateElement(i);		Constant *C = Mask->getAggregateElement(i);
▲ Show 20 Lines • Show All 2,428 Lines • Show Last 20 Lines

llvm/lib/IR/IntrinsicInst.cpp

Show First 20 Lines • Show All 274 Lines • ▼ Show 20 Lines	bool VPIntrinsic::canIgnoreVectorLengthParam() const {
if (!VLParam)		if (!VLParam)
return true;		return true;

// Note that the VP intrinsic causes undefined behavior if the Explicit Vector		// Note that the VP intrinsic causes undefined behavior if the Explicit Vector
// Length parameter is strictly greater-than the number of vector elements of		// Length parameter is strictly greater-than the number of vector elements of
// the operation. This function returns true when this is detected statically		// the operation. This function returns true when this is detected statically
// in the IR.		// in the IR.

// Check whether "W == vscale * EC.Min"		// Check whether "W == vscale * EC.getKnownMinValue()"
if (EC.Scalable) {		if (EC.isScalable()) {
// Undig the DL		// Undig the DL
auto ParMod = this->getModule();		auto ParMod = this->getModule();
if (!ParMod)		if (!ParMod)
return false;		return false;
const auto &DL = ParMod->getDataLayout();		const auto &DL = ParMod->getDataLayout();

// Compare vscale patterns		// Compare vscale patterns
uint64_t VScaleFactor;		uint64_t VScaleFactor;
if (match(VLParam, m_c_Mul(m_ConstantInt(VScaleFactor), m_VScale(DL))))		if (match(VLParam, m_c_Mul(m_ConstantInt(VScaleFactor), m_VScale(DL))))
return VScaleFactor >= EC.Min;		return VScaleFactor >= EC.getKnownMinValue();
return (EC.Min == 1) && match(VLParam, m_VScale(DL));		return (EC.getKnownMinValue() == 1) && match(VLParam, m_VScale(DL));
}		}

// standard SIMD operation		// standard SIMD operation
auto VLConst = dyn_cast<ConstantInt>(VLParam);		auto VLConst = dyn_cast<ConstantInt>(VLParam);
if (!VLConst)		if (!VLConst)
return false;		return false;

uint64_t VLNum = VLConst->getZExtValue();		uint64_t VLNum = VLConst->getZExtValue();
if (VLNum >= EC.Min)		if (VLNum >= EC.getKnownMinValue())
return true;		return true;

return false;		return false;
}		}

Instruction::BinaryOps BinaryOpIntrinsic::getBinaryOp() const {		Instruction::BinaryOps BinaryOpIntrinsic::getBinaryOp() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
case Intrinsic::uadd_with_overflow:		case Intrinsic::uadd_with_overflow:
Show All 36 Lines

llvm/lib/IR/Type.cpp

Show First 20 Lines • Show All 122 Lines • ▼ Show 20 Lines	TypeSize Type::getPrimitiveSizeInBits() const {
case Type::IntegerTyID:		case Type::IntegerTyID:
return TypeSize::Fixed(cast<IntegerType>(this)->getBitWidth());		return TypeSize::Fixed(cast<IntegerType>(this)->getBitWidth());
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
const VectorType *VTy = cast<VectorType>(this);		const VectorType *VTy = cast<VectorType>(this);
ElementCount EC = VTy->getElementCount();		ElementCount EC = VTy->getElementCount();
TypeSize ETS = VTy->getElementType()->getPrimitiveSizeInBits();		TypeSize ETS = VTy->getElementType()->getPrimitiveSizeInBits();
assert(!ETS.isScalable() && "Vector type should have fixed-width elements");		assert(!ETS.isScalable() && "Vector type should have fixed-width elements");
return {ETS.getFixedSize() * EC.Min, EC.Scalable};		return {ETS.getFixedSize() * EC.getKnownMinValue(), EC.isScalable()};
}		}
default: return TypeSize::Fixed(0);		default: return TypeSize::Fixed(0);
}		}
}		}

unsigned Type::getScalarSizeInBits() const {		unsigned Type::getScalarSizeInBits() const {
// It is safe to assume that the scalar types have a fixed size.		// It is safe to assume that the scalar types have a fixed size.
return getScalarType()->getPrimitiveSizeInBits().getFixedSize();		return getScalarType()->getPrimitiveSizeInBits().getFixedSize();
▲ Show 20 Lines • Show All 453 Lines • ▼ Show 20 Lines
VectorType::VectorType(Type *ElType, unsigned EQ, Type::TypeID TID)		VectorType::VectorType(Type *ElType, unsigned EQ, Type::TypeID TID)
: Type(ElType->getContext(), TID), ContainedType(ElType),		: Type(ElType->getContext(), TID), ContainedType(ElType),
ElementQuantity(EQ) {		ElementQuantity(EQ) {
ContainedTys = &ContainedType;		ContainedTys = &ContainedType;
NumContainedTys = 1;		NumContainedTys = 1;
}		}

VectorType *VectorType::get(Type *ElementType, ElementCount EC) {		VectorType *VectorType::get(Type *ElementType, ElementCount EC) {
if (EC.Scalable)		if (EC.isScalable())
return ScalableVectorType::get(ElementType, EC.Min);		return ScalableVectorType::get(ElementType, EC.getKnownMinValue());
else		else
return FixedVectorType::get(ElementType, EC.Min);		return FixedVectorType::get(ElementType, EC.getKnownMinValue());
}		}

bool VectorType::isValidElementType(Type *ElemTy) {		bool VectorType::isValidElementType(Type *ElemTy) {
return ElemTy->isIntegerTy() || ElemTy->isFloatingPointTy() ||		return ElemTy->isIntegerTy() || ElemTy->isFloatingPointTy() ||
ElemTy->isPointerTy();		ElemTy->isPointerTy();
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
▲ Show 20 Lines • Show All 80 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelDAGToDAG.cpp

Show First 20 Lines • Show All 4,821 Lines • ▼ Show 20 Lines	static EVT getPackedVectorTypeFromPredicateType(LLVMContext &Ctx, EVT PredVT,
if (!PredVT.isScalableVector() || PredVT.getVectorElementType() != MVT::i1)		if (!PredVT.isScalableVector() || PredVT.getVectorElementType() != MVT::i1)
return EVT();		return EVT();

if (PredVT != MVT::nxv16i1 && PredVT != MVT::nxv8i1 &&		if (PredVT != MVT::nxv16i1 && PredVT != MVT::nxv8i1 &&
PredVT != MVT::nxv4i1 && PredVT != MVT::nxv2i1)		PredVT != MVT::nxv4i1 && PredVT != MVT::nxv2i1)
return EVT();		return EVT();

ElementCount EC = PredVT.getVectorElementCount();		ElementCount EC = PredVT.getVectorElementCount();
EVT ScalarVT = EVT::getIntegerVT(Ctx, AArch64::SVEBitsPerBlock / EC.Min);		EVT ScalarVT =
		EVT::getIntegerVT(Ctx, AArch64::SVEBitsPerBlock / EC.getKnownMinValue());
EVT MemVT = EVT::getVectorVT(Ctx, ScalarVT, EC * NumVec);		EVT MemVT = EVT::getVectorVT(Ctx, ScalarVT, EC * NumVec);

return MemVT;		return MemVT;
}		}

/// Return the EVT of the data associated to a memory operation in \p		/// Return the EVT of the data associated to a memory operation in \p
/// Root. If such EVT cannot be retrived, it returns an invalid EVT.		/// Root. If such EVT cannot be retrived, it returns an invalid EVT.
static EVT getMemVTFromNode(LLVMContext &Ctx, SDNode *Root) {		static EVT getMemVTFromNode(LLVMContext &Ctx, SDNode *Root) {
▲ Show 20 Lines • Show All 113 Lines • Show Last 20 Lines

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

Show First 20 Lines • Show All 3,526 Lines • ▼ Show 20 Lines	if (VT.isVector()) {
}		}

if (StoreNode->isTruncatingStore()) {		if (StoreNode->isTruncatingStore()) {
return LowerTruncateVectorStore(Dl, StoreNode, VT, MemVT, DAG);		return LowerTruncateVectorStore(Dl, StoreNode, VT, MemVT, DAG);
}		}
// 256 bit non-temporal stores can be lowered to STNP. Do this as part of		// 256 bit non-temporal stores can be lowered to STNP. Do this as part of
// the custom lowering, as there are no un-paired non-temporal stores and		// the custom lowering, as there are no un-paired non-temporal stores and
// legalization will break up 256 bit inputs.		// legalization will break up 256 bit inputs.
		ElementCount EC = MemVT.getVectorElementCount();
if (StoreNode->isNonTemporal() && MemVT.getSizeInBits() == 256u &&		if (StoreNode->isNonTemporal() && MemVT.getSizeInBits() == 256u &&
MemVT.getVectorElementCount().Min % 2u == 0 &&		EC.isKnownEven() &&
((MemVT.getScalarSizeInBits() == 8u ||		((MemVT.getScalarSizeInBits() == 8u ||
MemVT.getScalarSizeInBits() == 16u ||		MemVT.getScalarSizeInBits() == 16u ||
MemVT.getScalarSizeInBits() == 32u ||		MemVT.getScalarSizeInBits() == 32u ||
MemVT.getScalarSizeInBits() == 64u))) {		MemVT.getScalarSizeInBits() == 64u))) {
SDValue Lo =		SDValue Lo =
DAG.getNode(ISD::EXTRACT_SUBVECTOR, Dl,		DAG.getNode(ISD::EXTRACT_SUBVECTOR, Dl,
MemVT.getHalfNumVectorElementsVT(*DAG.getContext()),		MemVT.getHalfNumVectorElementsVT(*DAG.getContext()),
StoreNode->getValue(), DAG.getConstant(0, Dl, MVT::i64));		StoreNode->getValue(), DAG.getConstant(0, Dl, MVT::i64));
SDValue Hi = DAG.getNode(		SDValue Hi =
ISD::EXTRACT_SUBVECTOR, Dl,		DAG.getNode(ISD::EXTRACT_SUBVECTOR, Dl,
MemVT.getHalfNumVectorElementsVT(*DAG.getContext()),		MemVT.getHalfNumVectorElementsVT(*DAG.getContext()),
		ctetreau: What is this `>>` thing? Some indicator of whitespace change, or is this a hard tab?
		efriedma: It's an indicator of a whitespace change; I think they started showing up with the recent Phabricator upgrade.
StoreNode->getValue(),		StoreNode->getValue(),
DAG.getConstant(MemVT.getVectorElementCount().Min / 2, Dl, MVT::i64));		DAG.getConstant(EC.getKnownMinValue() / 2, Dl, MVT::i64));
SDValue Result = DAG.getMemIntrinsicNode(		SDValue Result = DAG.getMemIntrinsicNode(
AArch64ISD::STNP, Dl, DAG.getVTList(MVT::Other),		AArch64ISD::STNP, Dl, DAG.getVTList(MVT::Other),
{StoreNode->getChain(), Lo, Hi, StoreNode->getBasePtr()},		{StoreNode->getChain(), Lo, Hi, StoreNode->getBasePtr()},
StoreNode->getMemoryVT(), StoreNode->getMemOperand());		StoreNode->getMemoryVT(), StoreNode->getMemOperand());
return Result;		return Result;
}		}
} else if (MemVT == MVT::i128 && StoreNode->isVolatile()) {		} else if (MemVT == MVT::i128 && StoreNode->isVolatile()) {
assert(StoreNode->getValue()->getValueType(0) == MVT::i128);		assert(StoreNode->getValue()->getValueType(0) == MVT::i128);
▲ Show 20 Lines • Show All 6,807 Lines • ▼ Show 20 Lines	SDValue AArch64TargetLowering::LowerSVEStructLoad(unsigned Intrinsic,

unsigned N, Opcode;		unsigned N, Opcode;
static std::map<unsigned, std::pair<unsigned, unsigned>> IntrinsicMap = {		static std::map<unsigned, std::pair<unsigned, unsigned>> IntrinsicMap = {
{Intrinsic::aarch64_sve_ld2, {2, AArch64ISD::SVE_LD2_MERGE_ZERO}},		{Intrinsic::aarch64_sve_ld2, {2, AArch64ISD::SVE_LD2_MERGE_ZERO}},
{Intrinsic::aarch64_sve_ld3, {3, AArch64ISD::SVE_LD3_MERGE_ZERO}},		{Intrinsic::aarch64_sve_ld3, {3, AArch64ISD::SVE_LD3_MERGE_ZERO}},
{Intrinsic::aarch64_sve_ld4, {4, AArch64ISD::SVE_LD4_MERGE_ZERO}}};		{Intrinsic::aarch64_sve_ld4, {4, AArch64ISD::SVE_LD4_MERGE_ZERO}}};

std::tie(N, Opcode) = IntrinsicMap[Intrinsic];		std::tie(N, Opcode) = IntrinsicMap[Intrinsic];
assert(VT.getVectorElementCount().Min % N == 0 &&		assert(VT.getVectorElementCount().getKnownMinValue() % N == 0 &&
"invalid tuple vector type!");		"invalid tuple vector type!");

EVT SplitVT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),		EVT SplitVT = EVT::getVectorVT(*DAG.getContext(), VT.getVectorElementType(),
VT.getVectorElementCount() / N);		VT.getVectorElementCount() / N);
assert(isTypeLegal(SplitVT));		assert(isTypeLegal(SplitVT));

SmallVector<EVT, 5> VTs(N, SplitVT);		SmallVector<EVT, 5> VTs(N, SplitVT);
VTs.push_back(MVT::Other); // Chain		VTs.push_back(MVT::Other); // Chain
▲ Show 20 Lines • Show All 4,056 Lines • ▼ Show 20 Lines	case ISD::INTRINSIC_W_CHAIN:
case Intrinsic::aarch64_sve_tuple_get: {		case Intrinsic::aarch64_sve_tuple_get: {
SDLoc DL(N);		SDLoc DL(N);
SDValue Chain = N->getOperand(0);		SDValue Chain = N->getOperand(0);
SDValue Src1 = N->getOperand(2);		SDValue Src1 = N->getOperand(2);
SDValue Idx = N->getOperand(3);		SDValue Idx = N->getOperand(3);

uint64_t IdxConst = cast<ConstantSDNode>(Idx)->getZExtValue();		uint64_t IdxConst = cast<ConstantSDNode>(Idx)->getZExtValue();
EVT ResVT = N->getValueType(0);		EVT ResVT = N->getValueType(0);
uint64_t NumLanes = ResVT.getVectorElementCount().Min;		uint64_t NumLanes = ResVT.getVectorElementCount().getKnownMinValue();
SDValue ExtIdx = DAG.getVectorIdxConstant(IdxConst * NumLanes, DL);		SDValue ExtIdx = DAG.getVectorIdxConstant(IdxConst * NumLanes, DL);
SDValue Val =		SDValue Val =
DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResVT, Src1, ExtIdx);		DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, ResVT, Src1, ExtIdx);
return DAG.getMergeValues({Val, Chain}, DL);		return DAG.getMergeValues({Val, Chain}, DL);
}		}
case Intrinsic::aarch64_sve_tuple_set: {		case Intrinsic::aarch64_sve_tuple_set: {
SDLoc DL(N);		SDLoc DL(N);
SDValue Chain = N->getOperand(0);		SDValue Chain = N->getOperand(0);
SDValue Tuple = N->getOperand(2);		SDValue Tuple = N->getOperand(2);
SDValue Idx = N->getOperand(3);		SDValue Idx = N->getOperand(3);
SDValue Vec = N->getOperand(4);		SDValue Vec = N->getOperand(4);

EVT TupleVT = Tuple.getValueType();		EVT TupleVT = Tuple.getValueType();
uint64_t TupleLanes = TupleVT.getVectorElementCount().Min;		uint64_t TupleLanes = TupleVT.getVectorElementCount().getKnownMinValue();

uint64_t IdxConst = cast<ConstantSDNode>(Idx)->getZExtValue();		uint64_t IdxConst = cast<ConstantSDNode>(Idx)->getZExtValue();
uint64_t NumLanes = Vec.getValueType().getVectorElementCount().Min;		uint64_t NumLanes =
		Vec.getValueType().getVectorElementCount().getKnownMinValue();

if ((TupleLanes % NumLanes) != 0)		if ((TupleLanes % NumLanes) != 0)
report_fatal_error("invalid tuple vector!");		report_fatal_error("invalid tuple vector!");

uint64_t NumVecs = TupleLanes / NumLanes;		uint64_t NumVecs = TupleLanes / NumLanes;

SmallVector<SDValue, 4> Opnds;		SmallVector<SDValue, 4> Opnds;
for (unsigned I = 0; I < NumVecs; ++I) {		for (unsigned I = 0; I < NumVecs; ++I) {
▲ Show 20 Lines • Show All 219 Lines • ▼ Show 20 Lines	void AArch64TargetLowering::ReplaceExtractSubVectorResults(

SDLoc DL(N);		SDLoc DL(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);

// The following checks bail if this is not a halving operation.		// The following checks bail if this is not a halving operation.

ElementCount ResEC = VT.getVectorElementCount();		ElementCount ResEC = VT.getVectorElementCount();

if (InVT.getVectorElementCount().Min != (ResEC.Min * 2))		if (InVT.getVectorElementCount() != (ResEC * 2))
return;		return;

auto *CIndex = dyn_cast<ConstantSDNode>(N->getOperand(1));		auto *CIndex = dyn_cast<ConstantSDNode>(N->getOperand(1));
if (!CIndex)		if (!CIndex)
return;		return;

unsigned Index = CIndex->getZExtValue();		unsigned Index = CIndex->getZExtValue();
if ((Index != 0) && (Index != ResEC.Min))		if ((Index != 0) && (Index != ResEC.getKnownMinValue()))
return;		return;

unsigned Opcode = (Index == 0) ? AArch64ISD::UUNPKLO : AArch64ISD::UUNPKHI;		unsigned Opcode = (Index == 0) ? AArch64ISD::UUNPKLO : AArch64ISD::UUNPKHI;
EVT ExtendedHalfVT = VT.widenIntegerVectorElementType(*DAG.getContext());		EVT ExtendedHalfVT = VT.widenIntegerVectorElementType(*DAG.getContext());

SDValue Half = DAG.getNode(Opcode, DL, ExtendedHalfVT, N->getOperand(0));		SDValue Half = DAG.getNode(Opcode, DL, ExtendedHalfVT, N->getOperand(0));
Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, Half));		Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, VT, Half));
}		}
▲ Show 20 Lines • Show All 1,002 Lines • Show Last 20 Lines
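
The AArch64ISelLowering.cpp hunks above lean on helpers such as `isKnownEven()` and on comparing whole element counts (`InVT.getVectorElementCount() != (ResEC * 2)`) instead of raw `Min` values. The sketch below shows one way such an interface could look; `ElementCountSketch` and `isHalvingExtract` are hypothetical, and the equality semantics (both the minimum value and the scalable flag must match) are an assumption of this sketch rather than a quote from the patch.

```cpp
// Hypothetical stand-in for llvm::ElementCount. The data members are
// private; callers go through named accessors and operators.
class ElementCountSketch {
  unsigned Min;
  bool Scalable;

  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}

public:
  static ElementCountSketch getFixed(unsigned N) { return {N, false}; }
  static ElementCountSketch getScalable(unsigned N) { return {N, true}; }

  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
  bool isNonZero() const { return Min != 0; }
  // Any multiple of an even minimum is even, so this also holds for
  // scalable counts.
  bool isKnownEven() const { return (Min & 1u) == 0; }

  ElementCountSketch operator*(unsigned RHS) const {
    return {Min * RHS, Scalable};
  }
  bool operator==(const ElementCountSketch &RHS) const {
    return Min == RHS.Min && Scalable == RHS.Scalable;
  }
  bool operator!=(const ElementCountSketch &RHS) const {
    return !(*this == RHS);
  }
};

// True when In has exactly twice as many elements as Res and both are the
// same kind (fixed or scalable) of vector.
inline bool isHalvingExtract(ElementCountSketch In, ElementCountSketch Res) {
  return In == Res * 2;
}
```

One consequence of comparing whole counts in this sketch is that a scalable 8-element input and a fixed 4-element result no longer look like a halving operation, whereas a comparison of minimum values alone would have accepted them.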

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp

Show First 20 Lines • Show All 334 Lines • ▼ Show 20 Lines	if (Value *V = SimplifyExtractElementInst(SrcVec, Index,
SQ.getWithInstruction(&EI)))		SQ.getWithInstruction(&EI)))
return replaceInstUsesWith(EI, V);		return replaceInstUsesWith(EI, V);

// If extracting a specified index from the vector, see if we can recursively		// If extracting a specified index from the vector, see if we can recursively
// find a previously computed scalar that was inserted into the vector.		// find a previously computed scalar that was inserted into the vector.
auto *IndexC = dyn_cast<ConstantInt>(Index);		auto *IndexC = dyn_cast<ConstantInt>(Index);
if (IndexC) {		if (IndexC) {
ElementCount EC = EI.getVectorOperandType()->getElementCount();		ElementCount EC = EI.getVectorOperandType()->getElementCount();
unsigned NumElts = EC.Min;		unsigned NumElts = EC.getKnownMinValue();

// InstSimplify should handle cases where the index is invalid.		// InstSimplify should handle cases where the index is invalid.
// For fixed-length vector, it's invalid to extract out-of-range element.		// For fixed-length vector, it's invalid to extract out-of-range element.
if (!EC.Scalable && IndexC->getValue().uge(NumElts))		if (!EC.isScalable() && IndexC->getValue().uge(NumElts))
return nullptr;		return nullptr;

// This instruction only demands the single element from the input vector.		// This instruction only demands the single element from the input vector.
// Skip for scalable type, the number of elements is unknown at		// Skip for scalable type, the number of elements is unknown at
// compile-time.		// compile-time.
if (!EC.Scalable && NumElts != 1) {		if (!EC.isScalable() && NumElts != 1) {
// If the input vector has a single use, simplify it based on this use		// If the input vector has a single use, simplify it based on this use
// property.		// property.
if (SrcVec->hasOneUse()) {		if (SrcVec->hasOneUse()) {
APInt UndefElts(NumElts, 0);		APInt UndefElts(NumElts, 0);
APInt DemandedElts(NumElts, 0);		APInt DemandedElts(NumElts, 0);
DemandedElts.setBit(IndexC->getZExtValue());		DemandedElts.setBit(IndexC->getZExtValue());
if (Value *V =		if (Value *V =
SimplifyDemandedVectorElts(SrcVec, DemandedElts, UndefElts))		SimplifyDemandedVectorElts(SrcVec, DemandedElts, UndefElts))
▲ Show 20 Lines • Show All 2,281 Lines • Show Last 20 Lines

llvm/lib/Transforms/Utils/FunctionComparator.cpp

Show First 20 Lines • Show All 482 Lines • ▼ Show 20 Lines	case Type::ArrayTyID: {
if (STyL->getNumElements() != STyR->getNumElements())		if (STyL->getNumElements() != STyR->getNumElements())
return cmpNumbers(STyL->getNumElements(), STyR->getNumElements());		return cmpNumbers(STyL->getNumElements(), STyR->getNumElements());
return cmpTypes(STyL->getElementType(), STyR->getElementType());		return cmpTypes(STyL->getElementType(), STyR->getElementType());
}		}
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
auto *STyL = cast<VectorType>(TyL);		auto *STyL = cast<VectorType>(TyL);
auto *STyR = cast<VectorType>(TyR);		auto *STyR = cast<VectorType>(TyR);
if (STyL->getElementCount().Scalable != STyR->getElementCount().Scalable)		if (STyL->getElementCount().isScalable() !=
return cmpNumbers(STyL->getElementCount().Scalable,		STyR->getElementCount().isScalable())
STyR->getElementCount().Scalable);		return cmpNumbers(STyL->getElementCount().isScalable(),
if (STyL->getElementCount().Min != STyR->getElementCount().Min)		STyR->getElementCount().isScalable());
return cmpNumbers(STyL->getElementCount().Min,		if (STyL->getElementCount() != STyR->getElementCount())
STyR->getElementCount().Min);		return cmpNumbers(STyL->getElementCount().getKnownMinValue(),
		STyR->getElementCount().getKnownMinValue());
return cmpTypes(STyL->getElementType(), STyR->getElementType());		return cmpTypes(STyL->getElementType(), STyR->getElementType());
}		}
}		}
}		}

// Determine whether the two operations are the same except that pointer-to-A		// Determine whether the two operations are the same except that pointer-to-A
// and pointer-to-B are equivalent. This should be kept in sync with		// and pointer-to-B are equivalent. This should be kept in sync with
// Instruction::isSameOperationAs.		// Instruction::isSameOperationAs.
▲ Show 20 Lines • Show All 465 Lines • Show Last 20 Lines
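
The vector case in the FunctionComparator.cpp hunk above orders two vector types first by whether they are scalable and then by their known minimum element count. A minimal sketch of that two-step ordering follows, using only the accessors; `ElementCountSketch`, `cmpNumbersSketch` and `cmpElementCounts` are hypothetical stand-ins, and the element-type comparison that follows in the real code is left out.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::ElementCount with accessor-only access.
class ElementCountSketch {
  unsigned Min;
  bool Scalable;

public:
  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
};

// Three-way comparison of two numbers: -1, 0 or 1.
inline int cmpNumbersSketch(uint64_t L, uint64_t R) {
  if (L < R)
    return -1;
  if (L > R)
    return 1;
  return 0;
}

// Fixed vectors sort before scalable ones; ties are broken by the known
// minimum element count.
inline int cmpElementCounts(ElementCountSketch L, ElementCountSketch R) {
  if (L.isScalable() != R.isScalable())
    return cmpNumbersSketch(L.isScalable(), R.isScalable());
  return cmpNumbersSketch(L.getKnownMinValue(), R.getKnownMinValue());
}
```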

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 336 Lines • ▼ Show 20 Lines	if (auto *LI = dyn_cast<LoadInst>(I))
return LI->getType();		return LI->getType();
return cast<StoreInst>(I)->getValueOperand()->getType();		return cast<StoreInst>(I)->getValueOperand()->getType();
}		}

/// A helper function that returns true if the given type is irregular. The		/// A helper function that returns true if the given type is irregular. The
/// type is irregular if its allocated size doesn't equal the store size of an		/// type is irregular if its allocated size doesn't equal the store size of an
/// element of the corresponding vector type at the given vectorization factor.		/// element of the corresponding vector type at the given vectorization factor.
static bool hasIrregularType(Type *Ty, const DataLayout &DL, ElementCount VF) {		static bool hasIrregularType(Type *Ty, const DataLayout &DL, ElementCount VF) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
// Determine if an array of VF elements of type Ty is "bitcast compatible"		// Determine if an array of VF elements of type Ty is "bitcast compatible"
// with a <VF x Ty> vector.		// with a <VF x Ty> vector.
if (VF.isVector()) {		if (VF.isVector()) {
auto *VectorTy = VectorType::get(Ty, VF);		auto *VectorTy = VectorType::get(Ty, VF);
return VF * DL.getTypeAllocSize(Ty) != DL.getTypeStoreSize(VectorTy);		return VF * DL.getTypeAllocSize(Ty) != DL.getTypeStoreSize(VectorTy);
}		}

// If the vectorization factor is one, we just check if an array of type Ty		// If the vectorization factor is one, we just check if an array of type Ty
▲ Show 20 Lines • Show All 540 Lines • ▼ Show 20 Lines	static Instruction *getDebugLocFromInstOrOperands(Instruction *I) {
return I;		return I;
}		}

void InnerLoopVectorizer::setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr) {		void InnerLoopVectorizer::setDebugLocFromInst(IRBuilder<> &B, const Value *Ptr) {
if (const Instruction *Inst = dyn_cast_or_null<Instruction>(Ptr)) {		if (const Instruction *Inst = dyn_cast_or_null<Instruction>(Ptr)) {
const DILocation *DIL = Inst->getDebugLoc();		const DILocation *DIL = Inst->getDebugLoc();
if (DIL && Inst->getFunction()->isDebugInfoForProfiling() &&		if (DIL && Inst->getFunction()->isDebugInfoForProfiling() &&
!isa<DbgInfoIntrinsic>(Inst)) {		!isa<DbgInfoIntrinsic>(Inst)) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto NewDIL = DIL->cloneByMultiplyingDuplicationFactor(UF * VF.Min);		auto NewDIL =
		DIL->cloneByMultiplyingDuplicationFactor(UF * VF.getKnownMinValue());
if (NewDIL)		if (NewDIL)
B.SetCurrentDebugLocation(NewDIL.getValue());		B.SetCurrentDebugLocation(NewDIL.getValue());
else		else
LLVM_DEBUG(dbgs()		LLVM_DEBUG(dbgs()
<< "Failed to create new discriminator: "		<< "Failed to create new discriminator: "
<< DIL->getFilename() << " Line: " << DIL->getLine());		<< DIL->getFilename() << " Line: " << DIL->getLine());
}		}
else		else
[299 lines not shown]	for (unsigned i = 0; i < Grp->getFactor(); ++i) {
}		}
}		}
}		}

/// Return the cost model decision for the given instruction \p I and vector		/// Return the cost model decision for the given instruction \p I and vector
/// width \p VF. Return CM_Unknown if this instruction did not pass		/// width \p VF. Return CM_Unknown if this instruction did not pass
/// through the cost modeling.		/// through the cost modeling.
InstWidening getWideningDecision(Instruction *I, ElementCount VF) {		InstWidening getWideningDecision(Instruction *I, ElementCount VF) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
assert(VF.isVector() && "Expected VF >=2");		assert(VF.isVector() && "Expected VF >=2");

// Cost model is not run in the VPlan-native path - return conservative		// Cost model is not run in the VPlan-native path - return conservative
// result until this changes.		// result until this changes.
if (EnableVPlanNativePath)		if (EnableVPlanNativePath)
return CM_GatherScatter;		return CM_GatherScatter;

std::pair<Instruction *, ElementCount> InstOnVF = std::make_pair(I, VF);		std::pair<Instruction *, ElementCount> InstOnVF = std::make_pair(I, VF);
[604 lines not shown]	if (Step->getType()->isIntegerTy()) {
MulOp = Instruction::Mul;		MulOp = Instruction::Mul;
} else {		} else {
AddOp = II.getInductionOpcode();		AddOp = II.getInductionOpcode();
MulOp = Instruction::FMul;		MulOp = Instruction::FMul;
}		}

// Multiply the vectorization factor by the step using integer or		// Multiply the vectorization factor by the step using integer or
// floating-point arithmetic as appropriate.		// floating-point arithmetic as appropriate.
Value *ConstVF = getSignedIntOrFpConstant(Step->getType(), VF.Min);		Value *ConstVF =
		getSignedIntOrFpConstant(Step->getType(), VF.getKnownMinValue());
Value *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, Step, ConstVF));		Value *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, Step, ConstVF));

// Create a vector splat to use in the induction update.		// Create a vector splat to use in the induction update.
//		//
// FIXME: If the step is non-constant, we create the vector splat with		// FIXME: If the step is non-constant, we create the vector splat with
// IRBuilder. IRBuilder can constant-fold the multiply, but it doesn't		// IRBuilder. IRBuilder can constant-fold the multiply, but it doesn't
// handle a constant vector splat.		// handle a constant vector splat.
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Value *SplatVF = isa<Constant>(Mul)		Value *SplatVF = isa<Constant>(Mul)
? ConstantVector::getSplat(VF, cast<Constant>(Mul))		? ConstantVector::getSplat(VF, cast<Constant>(Mul))
: Builder.CreateVectorSplat(VF, Mul);		: Builder.CreateVectorSplat(VF, Mul);
Builder.restoreIP(CurrIP);		Builder.restoreIP(CurrIP);

// We may need to add the step a number of times, depending on the unroll		// We may need to add the step a number of times, depending on the unroll
// factor. The last of those goes into the PHI.		// factor. The last of those goes into the PHI.
PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind",		PHINode *VecInd = PHINode::Create(SteppedStart->getType(), 2, "vec.ind",
[120 lines not shown]	auto CreateScalarIV = [&](Value *&Step) -> Value * {
return ScalarIV;		return ScalarIV;
};		};

// Create the vector values from the scalar IV, in the absence of creating a		// Create the vector values from the scalar IV, in the absence of creating a
// vector IV.		// vector IV.
auto CreateSplatIV = [&](Value *ScalarIV, Value *Step) {		auto CreateSplatIV = [&](Value *ScalarIV, Value *Step) {
Value *Broadcasted = getBroadcastInstrs(ScalarIV);		Value *Broadcasted = getBroadcastInstrs(ScalarIV);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Value *EntryPart = getStepVector(Broadcasted, VF.Min * Part, Step,		Value *EntryPart =
		getStepVector(Broadcasted, VF.getKnownMinValue() * Part, Step,
ID.getInductionOpcode());		ID.getInductionOpcode());
VectorLoopValueMap.setVectorValue(EntryVal, Part, EntryPart);		VectorLoopValueMap.setVectorValue(EntryVal, Part, EntryPart);
if (Trunc)		if (Trunc)
addMetadata(EntryPart, Trunc);		addMetadata(EntryPart, Trunc);
recordVectorLoopValueForInductionCast(ID, EntryVal, EntryPart, Part);		recordVectorLoopValueForInductionCast(ID, EntryVal, EntryPart, Part);
}		}
};		};

// Now do the actual transformations, and start with creating the step value.		// Now do the actual transformations, and start with creating the step value.
[92 lines not shown]	Value *InnerLoopVectorizer::getStepVector(Value *Val, int StartIdx, Value *Step,
return BOp;		return BOp;
}		}

void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,		void InnerLoopVectorizer::buildScalarSteps(Value *ScalarIV, Value *Step,
Instruction *EntryVal,		Instruction *EntryVal,
const InductionDescriptor &ID) {		const InductionDescriptor &ID) {
// We shouldn't have to build scalar steps if we aren't vectorizing.		// We shouldn't have to build scalar steps if we aren't vectorizing.
assert(VF.isVector() && "VF should be greater than one");		assert(VF.isVector() && "VF should be greater than one");
assert(!VF.Scalable &&		assert(!VF.isScalable() &&
"the code below assumes a fixed number of elements at compile time");		"the code below assumes a fixed number of elements at compile time");
// Get the value type and ensure it and the step have the same integer type.		// Get the value type and ensure it and the step have the same integer type.
Type *ScalarIVTy = ScalarIV->getType()->getScalarType();		Type *ScalarIVTy = ScalarIV->getType()->getScalarType();
assert(ScalarIVTy == Step->getType() &&		assert(ScalarIVTy == Step->getType() &&
"Val and Step should have the same type");		"Val and Step should have the same type");

// We build scalar steps for both integer and floating-point induction		// We build scalar steps for both integer and floating-point induction
// variables. Here, we determine the kind of arithmetic we will perform.		// variables. Here, we determine the kind of arithmetic we will perform.
Instruction::BinaryOps AddOp;		Instruction::BinaryOps AddOp;
Instruction::BinaryOps MulOp;		Instruction::BinaryOps MulOp;
if (ScalarIVTy->isIntegerTy()) {		if (ScalarIVTy->isIntegerTy()) {
AddOp = Instruction::Add;		AddOp = Instruction::Add;
MulOp = Instruction::Mul;		MulOp = Instruction::Mul;
} else {		} else {
AddOp = ID.getInductionOpcode();		AddOp = ID.getInductionOpcode();
MulOp = Instruction::FMul;		MulOp = Instruction::FMul;
}		}

// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If EntryVal is uniform, we only need to generate the first		// iteration. If EntryVal is uniform, we only need to generate the first
// lane. Otherwise, we generate all VF values.		// lane. Otherwise, we generate all VF values.
unsigned Lanes =		unsigned Lanes =
Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF)		Cost->isUniformAfterVectorization(cast<Instruction>(EntryVal), VF)
? 1		? 1
: VF.Min;		: VF.getKnownMinValue();
// Compute the scalar steps and save the results in VectorLoopValueMap.		// Compute the scalar steps and save the results in VectorLoopValueMap.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
auto *StartIdx =		auto *StartIdx = getSignedIntOrFpConstant(
getSignedIntOrFpConstant(ScalarIVTy, VF.Min * Part + Lane);		ScalarIVTy, VF.getKnownMinValue() * Part + Lane);
auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step));		auto *Mul = addFastMathFlag(Builder.CreateBinOp(MulOp, StartIdx, Step));
auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul));		auto *Add = addFastMathFlag(Builder.CreateBinOp(AddOp, ScalarIV, Mul));
VectorLoopValueMap.setScalarValue(EntryVal, {Part, Lane}, Add);		VectorLoopValueMap.setScalarValue(EntryVal, {Part, Lane}, Add);
recordVectorLoopValueForInductionCast(ID, EntryVal, Add, Part, Lane);		recordVectorLoopValueForInductionCast(ID, EntryVal, Add, Part, Lane);
}		}
}		}
}		}

[26 lines not shown]	if (VF == 1) {
VectorLoopValueMap.setVectorValue(V, Part, ScalarValue);		VectorLoopValueMap.setVectorValue(V, Part, ScalarValue);
return ScalarValue;		return ScalarValue;
}		}

// Get the last scalar instruction we generated for V and Part. If the value		// Get the last scalar instruction we generated for V and Part. If the value
// is known to be uniform after vectorization, this corresponds to lane zero		// is known to be uniform after vectorization, this corresponds to lane zero
// of the Part unroll iteration. Otherwise, the last instruction is the one		// of the Part unroll iteration. Otherwise, the last instruction is the one
// we created for the last vector lane of the Part unroll iteration.		// we created for the last vector lane of the Part unroll iteration.
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
unsigned LastLane =		unsigned LastLane = Cost->isUniformAfterVectorization(I, VF)
Cost->isUniformAfterVectorization(I, VF) ? 0 : VF.Min - 1;		? 0
		: VF.getKnownMinValue() - 1;
auto *LastInst = cast<Instruction>(		auto *LastInst = cast<Instruction>(
VectorLoopValueMap.getScalarValue(V, {Part, LastLane}));		VectorLoopValueMap.getScalarValue(V, {Part, LastLane}));

// Set the insert point after the last scalarized instruction. This ensures		// Set the insert point after the last scalarized instruction. This ensures
// the insertelement sequence will directly follow the scalar definitions.		// the insertelement sequence will directly follow the scalar definitions.
auto OldIP = Builder.saveIP();		auto OldIP = Builder.saveIP();
auto NewIP = std::next(BasicBlock::iterator(LastInst));		auto NewIP = std::next(BasicBlock::iterator(LastInst));
Builder.SetInsertPoint(&*NewIP);		Builder.SetInsertPoint(&*NewIP);

// However, if we are vectorizing, we need to construct the vector values.		// However, if we are vectorizing, we need to construct the vector values.
// If the value is known to be uniform after vectorization, we can just		// If the value is known to be uniform after vectorization, we can just
// broadcast the scalar value corresponding to lane zero for each unroll		// broadcast the scalar value corresponding to lane zero for each unroll
// iteration. Otherwise, we construct the vector values using insertelement		// iteration. Otherwise, we construct the vector values using insertelement
// instructions. Since the resulting vectors are stored in		// instructions. Since the resulting vectors are stored in
// VectorLoopValueMap, we will only generate the insertelements once.		// VectorLoopValueMap, we will only generate the insertelements once.
Value *VectorValue = nullptr;		Value *VectorValue = nullptr;
if (Cost->isUniformAfterVectorization(I, VF)) {		if (Cost->isUniformAfterVectorization(I, VF)) {
VectorValue = getBroadcastInstrs(ScalarValue);		VectorValue = getBroadcastInstrs(ScalarValue);
VectorLoopValueMap.setVectorValue(V, Part, VectorValue);		VectorLoopValueMap.setVectorValue(V, Part, VectorValue);
} else {		} else {
// Initialize packing with insertelements to start from undef.		// Initialize packing with insertelements to start from undef.
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
Value *Undef = UndefValue::get(VectorType::get(V->getType(), VF));		Value *Undef = UndefValue::get(VectorType::get(V->getType(), VF));
VectorLoopValueMap.setVectorValue(V, Part, Undef);		VectorLoopValueMap.setVectorValue(V, Part, Undef);
for (unsigned Lane = 0; Lane < VF.Min; ++Lane)		for (unsigned Lane = 0; Lane < VF.getKnownMinValue(); ++Lane)
packScalarIntoVectorValue(V, {Part, Lane});		packScalarIntoVectorValue(V, {Part, Lane});
VectorValue = VectorLoopValueMap.getVectorValue(V, Part);		VectorValue = VectorLoopValueMap.getVectorValue(V, Part);
}		}
Builder.restoreIP(OldIP);		Builder.restoreIP(OldIP);
return VectorValue;		return VectorValue;
}		}

// If this scalar is unknown, assume that it is a constant or that it is		// If this scalar is unknown, assume that it is a constant or that it is
[47 lines not shown]	void InnerLoopVectorizer::packScalarIntoVectorValue(
Value *VectorValue = VectorLoopValueMap.getVectorValue(V, Instance.Part);		Value *VectorValue = VectorLoopValueMap.getVectorValue(V, Instance.Part);
VectorValue = Builder.CreateInsertElement(VectorValue, ScalarInst,		VectorValue = Builder.CreateInsertElement(VectorValue, ScalarInst,
Builder.getInt32(Instance.Lane));		Builder.getInt32(Instance.Lane));
VectorLoopValueMap.resetVectorValue(V, Instance.Part, VectorValue);		VectorLoopValueMap.resetVectorValue(V, Instance.Part, VectorValue);
}		}

Value *InnerLoopVectorizer::reverseVector(Value *Vec) {		Value *InnerLoopVectorizer::reverseVector(Value *Vec) {
assert(Vec->getType()->isVectorTy() && "Invalid type");		assert(Vec->getType()->isVectorTy() && "Invalid type");
assert(!VF.Scalable && "Cannot reverse scalable vectors");		assert(!VF.isScalable() && "Cannot reverse scalable vectors");
SmallVector<int, 8> ShuffleMask;		SmallVector<int, 8> ShuffleMask;
for (unsigned i = 0; i < VF.Min; ++i)		for (unsigned i = 0; i < VF.getKnownMinValue(); ++i)
ShuffleMask.push_back(VF.Min - i - 1);		ShuffleMask.push_back(VF.getKnownMinValue() - i - 1);

return Builder.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()),		return Builder.CreateShuffleVector(Vec, UndefValue::get(Vec->getType()),
ShuffleMask, "reverse");		ShuffleMask, "reverse");
}		}

// Return whether we allow using masked interleave-groups (for dealing with		// Return whether we allow using masked interleave-groups (for dealing with
// strided loads/stores that reside in predicated blocks, or for dealing		// strided loads/stores that reside in predicated blocks, or for dealing
// with gaps).		// with gaps).
[37 lines not shown]	void InnerLoopVectorizer::vectorizeInterleaveGroup(
const InterleaveGroup<Instruction> *Group, VPTransformState &State,		const InterleaveGroup<Instruction> *Group, VPTransformState &State,
VPValue *Addr, VPValue *BlockInMask) {		VPValue *Addr, VPValue *BlockInMask) {
Instruction *Instr = Group->getInsertPos();		Instruction *Instr = Group->getInsertPos();
const DataLayout &DL = Instr->getModule()->getDataLayout();		const DataLayout &DL = Instr->getModule()->getDataLayout();

// Prepare for the vector type of the interleaved load/store.		// Prepare for the vector type of the interleaved load/store.
Type *ScalarTy = getMemInstValueType(Instr);		Type *ScalarTy = getMemInstValueType(Instr);
unsigned InterleaveFactor = Group->getFactor();		unsigned InterleaveFactor = Group->getFactor();
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto *VecTy = VectorType::get(ScalarTy, VF * InterleaveFactor);		auto *VecTy = VectorType::get(ScalarTy, VF * InterleaveFactor);

// Prepare for the new pointers.		// Prepare for the new pointers.
SmallVector<Value *, 2> AddrParts;		SmallVector<Value *, 2> AddrParts;
unsigned Index = Group->getIndex(Instr);		unsigned Index = Group->getIndex(Instr);

// TODO: extend the masked interleaved-group support to reversed access.		// TODO: extend the masked interleaved-group support to reversed access.
assert((!BlockInMask || !Group->isReverse()) &&		assert((!BlockInMask || !Group->isReverse()) &&
"Reversed masked interleave-group not supported.");		"Reversed masked interleave-group not supported.");

// If the group is reverse, adjust the index to refer to the last vector lane		// If the group is reverse, adjust the index to refer to the last vector lane
// instead of the first. We adjust the index from the first vector lane,		// instead of the first. We adjust the index from the first vector lane,
// rather than directly getting the pointer for lane VF - 1, because the		// rather than directly getting the pointer for lane VF - 1, because the
// pointer operand of the interleaved access is supposed to be uniform. For		// pointer operand of the interleaved access is supposed to be uniform. For
// uniform instructions, we're only required to generate a value for the		// uniform instructions, we're only required to generate a value for the
// first vector lane in each unroll iteration.		// first vector lane in each unroll iteration.
assert(!VF.Scalable &&		assert(!VF.isScalable() &&
"scalable vector reverse operation is not implemented");		"scalable vector reverse operation is not implemented");
if (Group->isReverse())		if (Group->isReverse())
Index += (VF.Min - 1) * Group->getFactor();		Index += (VF.getKnownMinValue() - 1) * Group->getFactor();

for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Value *AddrPart = State.get(Addr, {Part, 0});		Value *AddrPart = State.get(Addr, {Part, 0});
setDebugLocFromInst(Builder, AddrPart);		setDebugLocFromInst(Builder, AddrPart);

// Notice current instruction could be any index. Need to adjust the address		// Notice current instruction could be any index. Need to adjust the address
// to the member of index 0.		// to the member of index 0.
//		//
[18 lines not shown]	for (unsigned Part = 0; Part < UF; Part++) {
AddrParts.push_back(Builder.CreateBitCast(AddrPart, PtrTy));		AddrParts.push_back(Builder.CreateBitCast(AddrPart, PtrTy));
}		}

setDebugLocFromInst(Builder, Instr);		setDebugLocFromInst(Builder, Instr);
Value *UndefVec = UndefValue::get(VecTy);		Value *UndefVec = UndefValue::get(VecTy);

Value *MaskForGaps = nullptr;		Value *MaskForGaps = nullptr;
if (Group->requiresScalarEpilogue() && !Cost->isScalarEpilogueAllowed()) {		if (Group->requiresScalarEpilogue() && !Cost->isScalarEpilogueAllowed()) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
MaskForGaps = createBitMaskForGaps(Builder, VF.Min, *Group);		MaskForGaps = createBitMaskForGaps(Builder, VF.getKnownMinValue(), *Group);
assert(MaskForGaps && "Mask for Gaps is required but it is null");		assert(MaskForGaps && "Mask for Gaps is required but it is null");
}		}

// Vectorize the interleaved load group.		// Vectorize the interleaved load group.
if (isa<LoadInst>(Instr)) {		if (isa<LoadInst>(Instr)) {
// For each unroll part, create a wide load for the group.		// For each unroll part, create a wide load for the group.
SmallVector<Value *, 2> NewLoads;		SmallVector<Value *, 2> NewLoads;
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Instruction *NewLoad;		Instruction *NewLoad;
if (BlockInMask || MaskForGaps) {		if (BlockInMask || MaskForGaps) {
assert(useMaskedInterleavedAccesses(*TTI) &&		assert(useMaskedInterleavedAccesses(*TTI) &&
"masked interleaved groups are not allowed.");		"masked interleaved groups are not allowed.");
Value *GroupMask = MaskForGaps;		Value *GroupMask = MaskForGaps;
if (BlockInMask) {		if (BlockInMask) {
Value *BlockInMaskPart = State.get(BlockInMask, Part);		Value *BlockInMaskPart = State.get(BlockInMask, Part);
auto *Undefs = UndefValue::get(BlockInMaskPart->getType());		auto *Undefs = UndefValue::get(BlockInMaskPart->getType());
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Value *ShuffledMask = Builder.CreateShuffleVector(		Value *ShuffledMask = Builder.CreateShuffleVector(
BlockInMaskPart, Undefs,		BlockInMaskPart, Undefs,
createReplicatedMask(InterleaveFactor, VF.Min),		createReplicatedMask(InterleaveFactor, VF.getKnownMinValue()),
"interleaved.mask");		"interleaved.mask");
GroupMask = MaskForGaps		GroupMask = MaskForGaps
? Builder.CreateBinOp(Instruction::And, ShuffledMask,		? Builder.CreateBinOp(Instruction::And, ShuffledMask,
MaskForGaps)		MaskForGaps)
: ShuffledMask;		: ShuffledMask;
}		}
NewLoad =		NewLoad =
Builder.CreateMaskedLoad(AddrParts[Part], Group->getAlign(),		Builder.CreateMaskedLoad(AddrParts[Part], Group->getAlign(),
[10 lines not shown]	if (isa<LoadInst>(Instr)) {
// wide loads.		// wide loads.
for (unsigned I = 0; I < InterleaveFactor; ++I) {		for (unsigned I = 0; I < InterleaveFactor; ++I) {
Instruction *Member = Group->getMember(I);		Instruction *Member = Group->getMember(I);

// Skip the gaps in the group.		// Skip the gaps in the group.
if (!Member)		if (!Member)
continue;		continue;

assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto StrideMask = createStrideMask(I, InterleaveFactor, VF.Min);		auto StrideMask =
		createStrideMask(I, InterleaveFactor, VF.getKnownMinValue());
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
Value *StridedVec = Builder.CreateShuffleVector(		Value *StridedVec = Builder.CreateShuffleVector(
NewLoads[Part], UndefVec, StrideMask, "strided.vec");		NewLoads[Part], UndefVec, StrideMask, "strided.vec");

// If this member has different type, cast the result type.		// If this member has different type, cast the result type.
if (Member->getType() != ScalarTy) {		if (Member->getType() != ScalarTy) {
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
VectorType *OtherVTy = VectorType::get(Member->getType(), VF);		VectorType *OtherVTy = VectorType::get(Member->getType(), VF);
StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);		StridedVec = createBitOrPointerCast(StridedVec, OtherVTy, DL);
}		}

if (Group->isReverse())		if (Group->isReverse())
StridedVec = reverseVector(StridedVec);		StridedVec = reverseVector(StridedVec);

VectorLoopValueMap.setVectorValue(Member, Part, StridedVec);		VectorLoopValueMap.setVectorValue(Member, Part, StridedVec);
}		}
}		}
return;		return;
}		}

// The sub vector type for current instruction.		// The sub vector type for current instruction.
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
auto *SubVT = VectorType::get(ScalarTy, VF);		auto *SubVT = VectorType::get(ScalarTy, VF);

// Vectorize the interleaved store group.		// Vectorize the interleaved store group.
for (unsigned Part = 0; Part < UF; Part++) {		for (unsigned Part = 0; Part < UF; Part++) {
// Collect the stored vector from each member.		// Collect the stored vector from each member.
SmallVector<Value *, 4> StoredVecs;		SmallVector<Value *, 4> StoredVecs;
for (unsigned i = 0; i < InterleaveFactor; i++) {		for (unsigned i = 0; i < InterleaveFactor; i++) {
// Interleaved store group doesn't allow a gap, so each index has a member		// Interleaved store group doesn't allow a gap, so each index has a member
[12 lines not shown]	for (unsigned i = 0; i < InterleaveFactor; i++) {

StoredVecs.push_back(StoredVec);		StoredVecs.push_back(StoredVec);
}		}

// Concatenate all vectors into a wide vector.		// Concatenate all vectors into a wide vector.
Value *WideVec = concatenateVectors(Builder, StoredVecs);		Value *WideVec = concatenateVectors(Builder, StoredVecs);

// Interleave the elements in the wide vector.		// Interleave the elements in the wide vector.
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Value *IVec = Builder.CreateShuffleVector(		Value *IVec = Builder.CreateShuffleVector(
WideVec, UndefVec, createInterleaveMask(VF.Min, InterleaveFactor),		WideVec, UndefVec,
		createInterleaveMask(VF.getKnownMinValue(), InterleaveFactor),
"interleaved.vec");		"interleaved.vec");

Instruction *NewStoreInstr;		Instruction *NewStoreInstr;
if (BlockInMask) {		if (BlockInMask) {
Value *BlockInMaskPart = State.get(BlockInMask, Part);		Value *BlockInMaskPart = State.get(BlockInMask, Part);
auto *Undefs = UndefValue::get(BlockInMaskPart->getType());		auto *Undefs = UndefValue::get(BlockInMaskPart->getType());
Value *ShuffledMask = Builder.CreateShuffleVector(		Value *ShuffledMask = Builder.CreateShuffleVector(
BlockInMaskPart, Undefs,		BlockInMaskPart, Undefs,
createReplicatedMask(InterleaveFactor, VF.Min), "interleaved.mask");		createReplicatedMask(InterleaveFactor, VF.getKnownMinValue()),
		"interleaved.mask");
NewStoreInstr = Builder.CreateMaskedStore(		NewStoreInstr = Builder.CreateMaskedStore(
IVec, AddrParts[Part], Group->getAlign(), ShuffledMask);		IVec, AddrParts[Part], Group->getAlign(), ShuffledMask);
}		}
else		else
NewStoreInstr =		NewStoreInstr =
Builder.CreateAlignedStore(IVec, AddrParts[Part], Group->getAlign());		Builder.CreateAlignedStore(IVec, AddrParts[Part], Group->getAlign());

Group->addMetadata(NewStoreInstr);		Group->addMetadata(NewStoreInstr);
[17 lines not shown]	LoopVectorizationCostModel::InstWidening Decision =
Cost->getWideningDecision(Instr, VF);		Cost->getWideningDecision(Instr, VF);
assert((Decision == LoopVectorizationCostModel::CM_Widen ||		assert((Decision == LoopVectorizationCostModel::CM_Widen ||
Decision == LoopVectorizationCostModel::CM_Widen_Reverse ||		Decision == LoopVectorizationCostModel::CM_Widen_Reverse ||
Decision == LoopVectorizationCostModel::CM_GatherScatter) &&		Decision == LoopVectorizationCostModel::CM_GatherScatter) &&
"CM decision is not to widen the memory instruction");		"CM decision is not to widen the memory instruction");

Type *ScalarDataTy = getMemInstValueType(Instr);		Type *ScalarDataTy = getMemInstValueType(Instr);

assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto *DataTy = VectorType::get(ScalarDataTy, VF);		auto *DataTy = VectorType::get(ScalarDataTy, VF);
const Align Alignment = getLoadStoreAlignment(Instr);		const Align Alignment = getLoadStoreAlignment(Instr);

// Determine if the pointer operand of the access is either consecutive or		// Determine if the pointer operand of the access is either consecutive or
// reverse consecutive.		// reverse consecutive.
bool Reverse = (Decision == LoopVectorizationCostModel::CM_Widen_Reverse);		bool Reverse = (Decision == LoopVectorizationCostModel::CM_Widen_Reverse);
bool ConsecutiveStride =		bool ConsecutiveStride =
Reverse || (Decision == LoopVectorizationCostModel::CM_Widen);		Reverse || (Decision == LoopVectorizationCostModel::CM_Widen);
[19 lines not shown]	const auto CreateVecPtr = [&](unsigned Part, Value *Ptr) -> Value * {
bool InBounds = false;		bool InBounds = false;
if (auto *gep = dyn_cast<GetElementPtrInst>(Ptr->stripPointerCasts()))		if (auto *gep = dyn_cast<GetElementPtrInst>(Ptr->stripPointerCasts()))
InBounds = gep->isInBounds();		InBounds = gep->isInBounds();

if (Reverse) {		if (Reverse) {
// If the address is consecutive but reversed, then the		// If the address is consecutive but reversed, then the
// wide store needs to start at the last vector element.		// wide store needs to start at the last vector element.
PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(		PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(
ScalarDataTy, Ptr, Builder.getInt32(-Part * VF.Min)));		ScalarDataTy, Ptr, Builder.getInt32(-Part * VF.getKnownMinValue())));
PartPtr->setIsInBounds(InBounds);		PartPtr->setIsInBounds(InBounds);
PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(		PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(
ScalarDataTy, PartPtr, Builder.getInt32(1 - VF.Min)));		ScalarDataTy, PartPtr, Builder.getInt32(1 - VF.getKnownMinValue())));
PartPtr->setIsInBounds(InBounds);		PartPtr->setIsInBounds(InBounds);
if (isMaskRequired) // Reverse of a null all-one mask is a null mask.		if (isMaskRequired) // Reverse of a null all-one mask is a null mask.
BlockInMaskParts[Part] = reverseVector(BlockInMaskParts[Part]);		BlockInMaskParts[Part] = reverseVector(BlockInMaskParts[Part]);
} else {		} else {
PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(		PartPtr = cast<GetElementPtrInst>(Builder.CreateGEP(
ScalarDataTy, Ptr, Builder.getInt32(Part * VF.Min)));		ScalarDataTy, Ptr, Builder.getInt32(Part * VF.getKnownMinValue())));
PartPtr->setIsInBounds(InBounds);		PartPtr->setIsInBounds(InBounds);
}		}

unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();		unsigned AddressSpace = Ptr->getType()->getPointerAddressSpace();
return Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));		return Builder.CreateBitCast(PartPtr, DataTy->getPointerTo(AddressSpace));
};		};

// Handle Stores:		// Handle Stores:
[180 lines not shown]	Value *InnerLoopVectorizer::getOrCreateVectorTripCount(Loop *L) {
if (VectorTripCount)		if (VectorTripCount)
return VectorTripCount;		return VectorTripCount;

Value *TC = getOrCreateTripCount(L);		Value *TC = getOrCreateTripCount(L);
IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());		IRBuilder<> Builder(L->getLoopPreheader()->getTerminator());

Type *Ty = TC->getType();		Type *Ty = TC->getType();
// This is where we can make the step a runtime constant.		// This is where we can make the step a runtime constant.
assert(!VF.Scalable && "scalable vectorization is not supported yet");		assert(!VF.isScalable() && "scalable vectorization is not supported yet");
Constant *Step = ConstantInt::get(Ty, VF.Min * UF);		Constant *Step = ConstantInt::get(Ty, VF.getKnownMinValue() * UF);

// If the tail is to be folded by masking, round the number of iterations N		// If the tail is to be folded by masking, round the number of iterations N
// up to a multiple of Step instead of rounding down. This is done by first		// up to a multiple of Step instead of rounding down. This is done by first
// adding Step-1 and then rounding down. Note that it's ok if this addition		// adding Step-1 and then rounding down. Note that it's ok if this addition
// overflows: the vector induction variable will eventually wrap to zero given		// overflows: the vector induction variable will eventually wrap to zero given
// that it starts at zero and its Step is a power of two; the loop will then		// that it starts at zero and its Step is a power of two; the loop will then
// exit, with the last early-exit vector comparison also producing all-true.		// exit, with the last early-exit vector comparison also producing all-true.
if (Cost->foldTailByMasking()) {		if (Cost->foldTailByMasking()) {
assert(isPowerOf2_32(VF.Min * UF) &&		assert(isPowerOf2_32(VF.getKnownMinValue() * UF) &&
"VF*UF must be a power of 2 when folding tail by masking");		"VF*UF must be a power of 2 when folding tail by masking");
TC = Builder.CreateAdd(TC, ConstantInt::get(Ty, VF.Min * UF - 1),		TC = Builder.CreateAdd(
"n.rnd.up");		TC, ConstantInt::get(Ty, VF.getKnownMinValue() * UF - 1), "n.rnd.up");
}		}

// Now we need to generate the expression for the part of the loop that the		// Now we need to generate the expression for the part of the loop that the
// vectorized body will execute. This is equal to N - (N % Step) if scalar		// vectorized body will execute. This is equal to N - (N % Step) if scalar
// iterations are not required for correctness, or N - Step, otherwise. Step		// iterations are not required for correctness, or N - Step, otherwise. Step
// is equal to the vectorization factor (number of SIMD elements) times the		// is equal to the vectorization factor (number of SIMD elements) times the
// unroll factor (number of SIMD instructions).		// unroll factor (number of SIMD instructions).
Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");		Value *R = Builder.CreateURem(TC, Step, "n.mod.vf");
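Note (illustrative only, not part of the patch): the trip-count arithmetic described in the comments above can be sketched in plain C++ for a fixed-width VF. The helper name `vectorTripCount` and its parameters are hypothetical, and the sketch covers only the `N - (N % Step)` case with optional round-up for tail folding; it does not model the `N - Step` adjustment used when a scalar epilogue is required.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone helper (not an LLVM API): returns how many of the
// N scalar iterations the vector loop covers, given a fixed-width VF.
uint64_t vectorTripCount(uint64_t N, unsigned KnownMinVF, unsigned UF,
                         bool FoldTailByMasking) {
  // Step is the vectorization factor times the unroll factor.
  const uint64_t Step = static_cast<uint64_t>(KnownMinVF) * UF;
  assert(Step != 0 && "VF * UF must be non-zero");
  if (FoldTailByMasking) {
    // Round up to a multiple of Step: add Step - 1, then round down below.
    assert((Step & (Step - 1)) == 0 && "VF*UF must be a power of 2");
    N += Step - 1; // unsigned wrap-around is well defined in C++
  }
  // Round down to a multiple of Step.
  return N - (N % Step);
}
```

With N = 10, VF = 4 and UF = 1, the sketch rounds down to 8 without tail folding and up to 12 with it.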
(60 lines not shown)	void InnerLoopVectorizer::emitMinimumIterationCountCheck(Loop *L,
// to the backedge-taken count overflowed leading to an incorrect trip count		// to the backedge-taken count overflowed leading to an incorrect trip count
// of zero. In this case we will also jump to the scalar loop.		// of zero. In this case we will also jump to the scalar loop.
auto P = Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE		auto P = Cost->requiresScalarEpilogue() ? ICmpInst::ICMP_ULE
: ICmpInst::ICMP_ULT;		: ICmpInst::ICMP_ULT;

// If tail is to be folded, vector loop takes care of all iterations.		// If tail is to be folded, vector loop takes care of all iterations.
Value *CheckMinIters = Builder.getFalse();		Value *CheckMinIters = Builder.getFalse();
if (!Cost->foldTailByMasking()) {		if (!Cost->foldTailByMasking()) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
CheckMinIters = Builder.CreateICmp(		CheckMinIters = Builder.CreateICmp(
P, Count, ConstantInt::get(Count->getType(), VF.Min * UF),		P, Count,
		ConstantInt::get(Count->getType(), VF.getKnownMinValue() * UF),
"min.iters.check");		"min.iters.check");
}		}
// Create new preheader for vector loop.		// Create new preheader for vector loop.
LoopVectorPreHeader =		LoopVectorPreHeader =
SplitBlock(TCCheckBlock, TCCheckBlock->getTerminator(), DT, LI, nullptr,		SplitBlock(TCCheckBlock, TCCheckBlock->getTerminator(), DT, LI, nullptr,
"vector.ph");		"vector.ph");

assert(DT->properlyDominates(DT->getNode(TCCheckBlock),		assert(DT->properlyDominates(DT->getNode(TCCheckBlock),
(438 lines not shown)	BasicBlock *InnerLoopVectorizer::createVectorizedLoopSkeleton() {
// - counts from zero, stepping by one		// - counts from zero, stepping by one
// - is the size of the widest induction variable type		// - is the size of the widest induction variable type
// then we create a new one.		// then we create a new one.
OldInduction = Legal->getPrimaryInduction();		OldInduction = Legal->getPrimaryInduction();
Type *IdxTy = Legal->getWidestInductionType();		Type *IdxTy = Legal->getWidestInductionType();
Value *StartIdx = ConstantInt::get(IdxTy, 0);		Value *StartIdx = ConstantInt::get(IdxTy, 0);
// The loop step is equal to the vectorization factor (num of SIMD elements)		// The loop step is equal to the vectorization factor (num of SIMD elements)
// times the unroll factor (num of SIMD instructions).		// times the unroll factor (num of SIMD instructions).
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Constant *Step = ConstantInt::get(IdxTy, VF.Min * UF);		Constant *Step = ConstantInt::get(IdxTy, VF.getKnownMinValue() * UF);
Value *CountRoundDown = getOrCreateVectorTripCount(Lp);		Value *CountRoundDown = getOrCreateVectorTripCount(Lp);
Induction =		Induction =
createInductionVariable(Lp, StartIdx, CountRoundDown, Step,		createInductionVariable(Lp, StartIdx, CountRoundDown, Step,
getDebugLocFromInstOrOperands(OldInduction));		getDebugLocFromInstOrOperands(OldInduction));

// Emit phis for the new starting index of the scalar loop.		// Emit phis for the new starting index of the scalar loop.
createInductionResumeValues(Lp, CountRoundDown);		createInductionResumeValues(Lp, CountRoundDown);

(117 lines not shown)	for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E;) {

CSEMap[In] = In;		CSEMap[In] = In;
}		}
}		}

unsigned LoopVectorizationCostModel::getVectorCallCost(CallInst *CI,		unsigned LoopVectorizationCostModel::getVectorCallCost(CallInst *CI,
ElementCount VF,		ElementCount VF,
bool &NeedToScalarize) {		bool &NeedToScalarize) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Function *F = CI->getCalledFunction();		Function *F = CI->getCalledFunction();
Type *ScalarRetTy = CI->getType();		Type *ScalarRetTy = CI->getType();
SmallVector<Type *, 4> Tys, ScalarTys;		SmallVector<Type *, 4> Tys, ScalarTys;
for (auto &ArgOp : CI->arg_operands())		for (auto &ArgOp : CI->arg_operands())
ScalarTys.push_back(ArgOp->getType());		ScalarTys.push_back(ArgOp->getType());

// Estimate cost of scalarized vector call. The source operands are assumed		// Estimate cost of scalarized vector call. The source operands are assumed
// to be vectors, so we need to extract individual elements from there,		// to be vectors, so we need to extract individual elements from there,
// execute VF scalar calls, and then gather the result into the vector return		// execute VF scalar calls, and then gather the result into the vector return
// value.		// value.
unsigned ScalarCallCost = TTI.getCallInstrCost(F, ScalarRetTy, ScalarTys,		unsigned ScalarCallCost = TTI.getCallInstrCost(F, ScalarRetTy, ScalarTys,
TTI::TCK_RecipThroughput);		TTI::TCK_RecipThroughput);
if (VF.isScalar())		if (VF.isScalar())
return ScalarCallCost;		return ScalarCallCost;

// Compute corresponding vector type for return value and arguments.		// Compute corresponding vector type for return value and arguments.
Type *RetTy = ToVectorTy(ScalarRetTy, VF);		Type *RetTy = ToVectorTy(ScalarRetTy, VF);
for (Type *ScalarTy : ScalarTys)		for (Type *ScalarTy : ScalarTys)
Tys.push_back(ToVectorTy(ScalarTy, VF));		Tys.push_back(ToVectorTy(ScalarTy, VF));

// Compute costs of unpacking argument values for the scalar calls and		// Compute costs of unpacking argument values for the scalar calls and
// packing the return values to a vector.		// packing the return values to a vector.
unsigned ScalarizationCost = getScalarizationOverhead(CI, VF);		unsigned ScalarizationCost = getScalarizationOverhead(CI, VF);

unsigned Cost = ScalarCallCost * VF.Min + ScalarizationCost;		unsigned Cost = ScalarCallCost * VF.getKnownMinValue() + ScalarizationCost;

// If we can't emit a vector call for this function, then the currently found		// If we can't emit a vector call for this function, then the currently found
// cost is the cost we need to return.		// cost is the cost we need to return.
NeedToScalarize = true;		NeedToScalarize = true;
VFShape Shape = VFShape::get(*CI, VF, false /*HasGlobalPred*/);		VFShape Shape = VFShape::get(*CI, VF, false /*HasGlobalPred*/);
Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);		Function *VecFunc = VFDatabase(*CI).getVectorizedFunction(Shape);

if (!TLI || CI->isNoBuiltin() || !VecFunc)		if (!TLI || CI->isNoBuiltin() || !VecFunc)
(204 lines not shown)	void InnerLoopVectorizer::fixVectorizedLoop() {
// loop iterations are now distributed among them. Note that original loop		// loop iterations are now distributed among them. Note that original loop
// represented by LoopScalarBody becomes remainder loop after vectorization.		// represented by LoopScalarBody becomes remainder loop after vectorization.
//		//
// For cases like foldTailByMasking() and requiresScalarEpiloque() we may		// For cases like foldTailByMasking() and requiresScalarEpiloque() we may
// end up getting slightly roughened result but that should be OK since		// end up getting slightly roughened result but that should be OK since
// profile is not inherently precise anyway. Note also possible bypass of		// profile is not inherently precise anyway. Note also possible bypass of
// vector code caused by legality checks is ignored, assigning all the weight		// vector code caused by legality checks is ignored, assigning all the weight
// to the vector loop, optimistically.		// to the vector loop, optimistically.
assert(!VF.Scalable &&		assert(!VF.isScalable() &&
"cannot use scalable ElementCount to determine unroll factor");		"cannot use scalable ElementCount to determine unroll factor");
setProfileInfoAfterUnrolling(LI->getLoopFor(LoopScalarBody),		setProfileInfoAfterUnrolling(
LI->getLoopFor(LoopVectorBody),		LI->getLoopFor(LoopScalarBody), LI->getLoopFor(LoopVectorBody),
LI->getLoopFor(LoopScalarBody), VF.Min * UF);		LI->getLoopFor(LoopScalarBody), VF.getKnownMinValue() * UF);
}		}

void InnerLoopVectorizer::fixCrossIterationPHIs() {		void InnerLoopVectorizer::fixCrossIterationPHIs() {
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #2: We now need to fix the recurrences by adding incoming edges to		// stage #2: We now need to fix the recurrences by adding incoming edges to
// the currently empty PHI nodes. At this point every instruction in the		// the currently empty PHI nodes. At this point every instruction in the
// original loop is widened to a vector form so we can use them to construct		// original loop is widened to a vector form so we can use them to construct
(64 lines not shown)	void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {
// Get the initial and previous values of the scalar recurrence.		// Get the initial and previous values of the scalar recurrence.
auto *ScalarInit = Phi->getIncomingValueForBlock(Preheader);		auto *ScalarInit = Phi->getIncomingValueForBlock(Preheader);
auto *Previous = Phi->getIncomingValueForBlock(Latch);		auto *Previous = Phi->getIncomingValueForBlock(Latch);

// Create a vector from the initial value.		// Create a vector from the initial value.
auto *VectorInit = ScalarInit;		auto *VectorInit = ScalarInit;
if (VF.isVector()) {		if (VF.isVector()) {
Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());		Builder.SetInsertPoint(LoopVectorPreHeader->getTerminator());
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
VectorInit = Builder.CreateInsertElement(		VectorInit = Builder.CreateInsertElement(
UndefValue::get(VectorType::get(VectorInit->getType(), VF)), VectorInit,		UndefValue::get(VectorType::get(VectorInit->getType(), VF)), VectorInit,
Builder.getInt32(VF.Min - 1), "vector.recur.init");		Builder.getInt32(VF.getKnownMinValue() - 1), "vector.recur.init");
}		}

// We constructed a temporary phi node in the first phase of vectorization.		// We constructed a temporary phi node in the first phase of vectorization.
// This phi node will eventually be deleted.		// This phi node will eventually be deleted.
Builder.SetInsertPoint(		Builder.SetInsertPoint(
cast<Instruction>(VectorLoopValueMap.getVectorValue(Phi, 0)));		cast<Instruction>(VectorLoopValueMap.getVectorValue(Phi, 0)));

// Create a phi node for the new recurrence. The current value will either be		// Create a phi node for the new recurrence. The current value will either be
(24 lines not shown)	if (isa<PHINode>(PreviousLastPart))
InsertPt = PreviousInst->getParent()->getFirstInsertionPt();		InsertPt = PreviousInst->getParent()->getFirstInsertionPt();
else		else
InsertPt = ++PreviousInst->getIterator();		InsertPt = ++PreviousInst->getIterator();
}		}
Builder.SetInsertPoint(&*InsertPt);		Builder.SetInsertPoint(&*InsertPt);

// We will construct a vector for the recurrence by combining the values for		// We will construct a vector for the recurrence by combining the values for
// the current and previous iterations. This is the required shuffle mask.		// the current and previous iterations. This is the required shuffle mask.
assert(!VF.Scalable);		assert(!VF.isScalable());
SmallVector<int, 8> ShuffleMask(VF.Min);		SmallVector<int, 8> ShuffleMask(VF.getKnownMinValue());
ShuffleMask[0] = VF.Min - 1;		ShuffleMask[0] = VF.getKnownMinValue() - 1;
for (unsigned I = 1; I < VF.Min; ++I)		for (unsigned I = 1; I < VF.getKnownMinValue(); ++I)
ShuffleMask[I] = I + VF.Min - 1;		ShuffleMask[I] = I + VF.getKnownMinValue() - 1;

// The vector from which to take the initial value for the current iteration		// The vector from which to take the initial value for the current iteration
// (actual or unrolled). Initially, this is the vector phi node.		// (actual or unrolled). Initially, this is the vector phi node.
Value *Incoming = VecPhi;		Value *Incoming = VecPhi;

// Shuffle the current and previous vector and update the vector parts.		// Shuffle the current and previous vector and update the vector parts.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *PreviousPart = getOrCreateVectorValue(Previous, Part);		Value *PreviousPart = getOrCreateVectorValue(Previous, Part);
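Note (illustrative only, not part of the patch): the shuffle mask built above selects the last lane of one vector followed by the first VF - 1 lanes of the next. A minimal sketch for a fixed-width VF follows. The helper names are hypothetical, and `shuffle` assumes the usual two-source semantics in which indices below VF pick from the first operand and the rest pick from the second.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds the same mask as the hunk above for a fixed VF.
std::vector<int> recurrenceShuffleMask(unsigned KnownMinVF) {
  assert(KnownMinVF > 0 && "expected at least one lane");
  std::vector<int> Mask(KnownMinVF);
  Mask[0] = static_cast<int>(KnownMinVF - 1);
  for (unsigned I = 1; I < KnownMinVF; ++I)
    Mask[I] = static_cast<int>(I + KnownMinVF - 1);
  return Mask;
}

// Hypothetical model of a two-source shuffle over plain integer lanes.
std::vector<int> shuffle(const std::vector<int> &First,
                         const std::vector<int> &Second,
                         const std::vector<int> &Mask) {
  std::vector<int> Result;
  for (int Idx : Mask) {
    const std::size_t I = static_cast<std::size_t>(Idx);
    // Indices past the end of First address lanes of Second.
    Result.push_back(I < First.size() ? First[I] : Second[I - First.size()]);
  }
  return Result;
}
```

For VF = 4 the mask is 3, 4, 5, 6, so shuffling lanes {10, 11, 12, 13} with {20, 21, 22, 23} yields {13, 20, 21, 22}.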
(12 lines not shown)	void InnerLoopVectorizer::fixFirstOrderRecurrence(PHINode *Phi) {
VecPhi->addIncoming(Incoming, LI->getLoopFor(LoopVectorBody)->getLoopLatch());		VecPhi->addIncoming(Incoming, LI->getLoopFor(LoopVectorBody)->getLoopLatch());

// Extract the last vector element in the middle block. This will be the		// Extract the last vector element in the middle block. This will be the
// initial value for the recurrence when jumping to the scalar loop.		// initial value for the recurrence when jumping to the scalar loop.
auto *ExtractForScalar = Incoming;		auto *ExtractForScalar = Incoming;
if (VF.isVector()) {		if (VF.isVector()) {
Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());		Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
ExtractForScalar = Builder.CreateExtractElement(		ExtractForScalar = Builder.CreateExtractElement(
ExtractForScalar, Builder.getInt32(VF.Min - 1), "vector.recur.extract");		ExtractForScalar, Builder.getInt32(VF.getKnownMinValue() - 1),
		"vector.recur.extract");
}		}
// Extract the second last element in the middle block if the		// Extract the second last element in the middle block if the
// Phi is used outside the loop. We need to extract the phi itself		// Phi is used outside the loop. We need to extract the phi itself
// and not the last element (the phi update in the current iteration). This		// and not the last element (the phi update in the current iteration). This
// will be the value when jumping to the exit block from the LoopMiddleBlock,		// will be the value when jumping to the exit block from the LoopMiddleBlock,
// when the scalar loop is not run at all.		// when the scalar loop is not run at all.
Value *ExtractForPhiUsedOutsideLoop = nullptr;		Value *ExtractForPhiUsedOutsideLoop = nullptr;
if (VF.isVector())		if (VF.isVector())
ExtractForPhiUsedOutsideLoop = Builder.CreateExtractElement(		ExtractForPhiUsedOutsideLoop = Builder.CreateExtractElement(
Incoming, Builder.getInt32(VF.Min - 2), "vector.recur.extract.for.phi");		Incoming, Builder.getInt32(VF.getKnownMinValue() - 2),
		"vector.recur.extract.for.phi");
// When loop is unrolled without vectorizing, initialize		// When loop is unrolled without vectorizing, initialize
// ExtractForPhiUsedOutsideLoop with the value just prior to unrolled value of		// ExtractForPhiUsedOutsideLoop with the value just prior to unrolled value of
// `Incoming`. This is analogous to the vectorized case above: extracting the		// `Incoming`. This is analogous to the vectorized case above: extracting the
// second last element when VF > 1.		// second last element when VF > 1.
else if (UF > 1)		else if (UF > 1)
ExtractForPhiUsedOutsideLoop = getOrCreateVectorValue(Previous, UF - 2);		ExtractForPhiUsedOutsideLoop = getOrCreateVectorValue(Previous, UF - 2);

// Fix the initial value of the original recurrence in the scalar loop.		// Fix the initial value of the original recurrence in the scalar loop.
(140 lines not shown)	if (Cost->foldTailByMasking()) {
}		}
}		}

// If the vector reduction can be performed in a smaller type, we truncate		// If the vector reduction can be performed in a smaller type, we truncate
// then extend the loop exit value to enable InstCombine to evaluate the		// then extend the loop exit value to enable InstCombine to evaluate the
// entire expression in the smaller type.		// entire expression in the smaller type.
if (VF.isVector() && Phi->getType() != RdxDesc.getRecurrenceType()) {		if (VF.isVector() && Phi->getType() != RdxDesc.getRecurrenceType()) {
assert(!IsInLoopReductionPhi && "Unexpected truncated inloop reduction!");		assert(!IsInLoopReductionPhi && "Unexpected truncated inloop reduction!");
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);		Type *RdxVecTy = VectorType::get(RdxDesc.getRecurrenceType(), VF);
Builder.SetInsertPoint(		Builder.SetInsertPoint(
LI->getLoopFor(LoopVectorBody)->getLoopLatch()->getTerminator());		LI->getLoopFor(LoopVectorBody)->getLoopLatch()->getTerminator());
VectorParts RdxParts(UF);		VectorParts RdxParts(UF);
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
RdxParts[Part] = VectorLoopValueMap.getVectorValue(LoopExitInst, Part);		RdxParts[Part] = VectorLoopValueMap.getVectorValue(LoopExitInst, Part);
Value *Trunc = Builder.CreateTrunc(RdxParts[Part], RdxVecTy);		Value *Trunc = Builder.CreateTrunc(RdxParts[Part], RdxVecTy);
Value *Extnd = RdxDesc.isSigned() ? Builder.CreateSExt(Trunc, VecTy)		Value *Extnd = RdxDesc.isSigned() ? Builder.CreateSExt(Trunc, VecTy)
(115 lines not shown)	for (User *U : Cur->users()) {
if ((Cur != LoopExitInstr || OrigLoop->contains(UI->getParent())) &&		if ((Cur != LoopExitInstr || OrigLoop->contains(UI->getParent())) &&
Visited.insert(UI).second)		Visited.insert(UI).second)
Worklist.push_back(UI);		Worklist.push_back(UI);
}		}
}		}
}		}

void InnerLoopVectorizer::fixLCSSAPHIs() {		void InnerLoopVectorizer::fixLCSSAPHIs() {
assert(!VF.Scalable && "the code below assumes fixed width vectors");		assert(!VF.isScalable() && "the code below assumes fixed width vectors");
for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {		for (PHINode &LCSSAPhi : LoopExitBlock->phis()) {
if (LCSSAPhi.getNumIncomingValues() == 1) {		if (LCSSAPhi.getNumIncomingValues() == 1) {
auto *IncomingValue = LCSSAPhi.getIncomingValue(0);		auto *IncomingValue = LCSSAPhi.getIncomingValue(0);
// Non-instruction incoming values will have only one value.		// Non-instruction incoming values will have only one value.
unsigned LastLane = 0;		unsigned LastLane = 0;
if (isa<Instruction>(IncomingValue))		if (isa<Instruction>(IncomingValue))
LastLane = Cost->isUniformAfterVectorization(		LastLane = Cost->isUniformAfterVectorization(
cast<Instruction>(IncomingValue), VF)		cast<Instruction>(IncomingValue), VF)
? 0		? 0
: VF.Min - 1;		: VF.getKnownMinValue() - 1;
// Can be a loop invariant incoming value or the last scalar value to be		// Can be a loop invariant incoming value or the last scalar value to be
// extracted from the vectorized loop.		// extracted from the vectorized loop.
Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());		Builder.SetInsertPoint(LoopMiddleBlock->getTerminator());
Value *lastIncomingValue =		Value *lastIncomingValue =
getOrCreateScalarValue(IncomingValue, { UF - 1, LastLane });		getOrCreateScalarValue(IncomingValue, { UF - 1, LastLane });
LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);		LCSSAPhi.addIncoming(lastIncomingValue, LoopMiddleBlock);
}		}
}		}
(166 lines not shown)	for (unsigned Part = 0; Part < UF; ++Part) {
VectorLoopValueMap.setVectorValue(GEP, Part, NewGEP);		VectorLoopValueMap.setVectorValue(GEP, Part, NewGEP);
addMetadata(NewGEP, GEP);		addMetadata(NewGEP, GEP);
}		}
}		}
}		}

void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,		void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
ElementCount VF) {		ElementCount VF) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
PHINode *P = cast<PHINode>(PN);		PHINode *P = cast<PHINode>(PN);
if (EnableVPlanNativePath) {		if (EnableVPlanNativePath) {
// Currently we enter here in the VPlan-native path for non-induction		// Currently we enter here in the VPlan-native path for non-induction
// PHIs where all control flow is uniform. We simply widen these PHIs.		// PHIs where all control flow is uniform. We simply widen these PHIs.
// Create a vector phi with no operands - the vector phi operands will be		// Create a vector phi with no operands - the vector phi operands will be
// set at the end of vector code generation.		// set at the end of vector code generation.
Type *VecTy =		Type *VecTy =
(VF.isScalar()) ? PN->getType() : VectorType::get(PN->getType(), VF);		(VF.isScalar()) ? PN->getType() : VectorType::get(PN->getType(), VF);
(48 lines not shown)	case InductionDescriptor::IK_PtrInduction: {

if (Cost->isScalarAfterVectorization(P, VF)) {		if (Cost->isScalarAfterVectorization(P, VF)) {
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd =		Value *PtrInd =
Builder.CreateSExtOrTrunc(Induction, II.getStep()->getType());		Builder.CreateSExtOrTrunc(Induction, II.getStep()->getType());
// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the		// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.		// first lane. Otherwise, we generate all VF values.
unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF.Min;		unsigned Lanes =
		Cost->isUniformAfterVectorization(P, VF) ? 1 : VF.getKnownMinValue();
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
for (unsigned Lane = 0; Lane < Lanes; ++Lane) {		for (unsigned Lane = 0; Lane < Lanes; ++Lane) {
Constant *Idx =		Constant *Idx = ConstantInt::get(PtrInd->getType(),
ConstantInt::get(PtrInd->getType(), Lane + Part * VF.Min);		Lane + Part * VF.getKnownMinValue());
Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);		Value *GlobalIdx = Builder.CreateAdd(PtrInd, Idx);
Value *SclrGep =		Value *SclrGep =
emitTransformedIndex(Builder, GlobalIdx, PSE.getSE(), DL, II);		emitTransformedIndex(Builder, GlobalIdx, PSE.getSE(), DL, II);
SclrGep->setName("next.gep");		SclrGep->setName("next.gep");
VectorLoopValueMap.setScalarValue(P, {Part, Lane}, SclrGep);		VectorLoopValueMap.setScalarValue(P, {Part, Lane}, SclrGep);
}		}
}		}
return;		return;
(13 lines not shown)	case InductionDescriptor::IK_PtrInduction: {
BasicBlock *LoopLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();		BasicBlock *LoopLatch = LI->getLoopFor(LoopVectorBody)->getLoopLatch();
Instruction *InductionLoc = LoopLatch->getTerminator();		Instruction *InductionLoc = LoopLatch->getTerminator();
const SCEV *ScalarStep = II.getStep();		const SCEV *ScalarStep = II.getStep();
SCEVExpander Exp(*PSE.getSE(), DL, "induction");		SCEVExpander Exp(*PSE.getSE(), DL, "induction");
Value *ScalarStepValue =		Value *ScalarStepValue =
Exp.expandCodeFor(ScalarStep, PhiType, InductionLoc);		Exp.expandCodeFor(ScalarStep, PhiType, InductionLoc);
Value *InductionGEP = GetElementPtrInst::Create(		Value *InductionGEP = GetElementPtrInst::Create(
ScStValueType->getPointerElementType(), NewPointerPhi,		ScStValueType->getPointerElementType(), NewPointerPhi,
Builder.CreateMul(ScalarStepValue,		Builder.CreateMul(
ConstantInt::get(PhiType, VF.Min * UF)),		ScalarStepValue,
		ConstantInt::get(PhiType, VF.getKnownMinValue() * UF)),
"ptr.ind", InductionLoc);		"ptr.ind", InductionLoc);
NewPointerPhi->addIncoming(InductionGEP, LoopLatch);		NewPointerPhi->addIncoming(InductionGEP, LoopLatch);

// Create UF many actual address geps that use the pointer		// Create UF many actual address geps that use the pointer
// phi as base and a vectorized version of the step value		// phi as base and a vectorized version of the step value
// (<step0, ..., stepN>) as offset.		// (<step0, ..., stepN>) as offset.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
SmallVector<Constant *, 8> Indices;		SmallVector<Constant *, 8> Indices;
// Create a vector of consecutive numbers from zero to VF.		// Create a vector of consecutive numbers from zero to VF.
for (unsigned i = 0; i < VF.Min; ++i)		for (unsigned i = 0; i < VF.getKnownMinValue(); ++i)
Indices.push_back(ConstantInt::get(PhiType, i + Part * VF.Min));		Indices.push_back(
		ConstantInt::get(PhiType, i + Part * VF.getKnownMinValue()));
Constant *StartOffset = ConstantVector::get(Indices);		Constant *StartOffset = ConstantVector::get(Indices);

Value *GEP = Builder.CreateGEP(		Value *GEP = Builder.CreateGEP(
ScStValueType->getPointerElementType(), NewPointerPhi,		ScStValueType->getPointerElementType(), NewPointerPhi,
Builder.CreateMul(StartOffset,		Builder.CreateMul(
Builder.CreateVectorSplat(VF.Min, ScalarStepValue),		StartOffset,
		Builder.CreateVectorSplat(VF.getKnownMinValue(), ScalarStepValue),
"vector.gep"));		"vector.gep"));
VectorLoopValueMap.setVectorValue(P, Part, GEP);		VectorLoopValueMap.setVectorValue(P, Part, GEP);
}		}
}		}
}		}
}		}
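Note (illustrative only, not part of the patch): the pointer-induction code above numbers lanes consecutively across unrolled parts, using `i + Part * VF` as the offset of lane `i` in part `Part`. A minimal sketch for a fixed-width VF, with the hypothetical helper `partLaneOffsets` standing in for the `Indices` vectors:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: for each unrolled part, lists the per-lane start
// offsets that the loop above would place in its Indices vector.
std::vector<std::vector<uint64_t>> partLaneOffsets(unsigned KnownMinVF,
                                                   unsigned UF) {
  std::vector<std::vector<uint64_t>> Offsets(UF);
  for (unsigned Part = 0; Part < UF; ++Part)
    for (unsigned i = 0; i < KnownMinVF; ++i)
      Offsets[Part].push_back(i + static_cast<uint64_t>(Part) * KnownMinVF);
  return Offsets;
}
```

With VF = 4 and UF = 2, part 0 covers offsets 0 to 3 and part 1 covers offsets 4 to 7. This is only meaningful for fixed-width vectors, which is what the `!VF.isScalable()` assertions in this file guard.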

/// A helper function for checking whether an integer division-related		/// A helper function for checking whether an integer division-related
/// instruction may divide by zero (in which case it must be predicated if		/// instruction may divide by zero (in which case it must be predicated if
(10 lines not shown)	assert((I.getOpcode() == Instruction::UDiv ||
"Unexpected instruction");		"Unexpected instruction");
Value *Divisor = I.getOperand(1);		Value *Divisor = I.getOperand(1);
auto *CInt = dyn_cast<ConstantInt>(Divisor);		auto *CInt = dyn_cast<ConstantInt>(Divisor);
return !CInt || CInt->isZero();		return !CInt || CInt->isZero();
}		}

void InnerLoopVectorizer::widenInstruction(Instruction &I, VPUser &User,		void InnerLoopVectorizer::widenInstruction(Instruction &I, VPUser &User,
VPTransformState &State) {		VPTransformState &State) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
switch (I.getOpcode()) {		switch (I.getOpcode()) {
case Instruction::Call:		case Instruction::Call:
case Instruction::Br:		case Instruction::Br:
case Instruction::PHI:		case Instruction::PHI:
case Instruction::GetElementPtr:		case Instruction::GetElementPtr:
case Instruction::Select:		case Instruction::Select:
llvm_unreachable("This instruction is handled by a different recipe.");		llvm_unreachable("This instruction is handled by a different recipe.");
case Instruction::UDiv:		case Instruction::UDiv:
(71 lines not shown)	void InnerLoopVectorizer::widenInstruction(Instruction &I, VPUser &User,
case Instruction::UIToFP:		case Instruction::UIToFP:
case Instruction::Trunc:		case Instruction::Trunc:
case Instruction::FPTrunc:		case Instruction::FPTrunc:
case Instruction::BitCast: {		case Instruction::BitCast: {
auto *CI = cast<CastInst>(&I);		auto *CI = cast<CastInst>(&I);
setDebugLocFromInst(Builder, CI);		setDebugLocFromInst(Builder, CI);

/// Vectorize casts.		/// Vectorize casts.
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
Type *DestTy =		Type *DestTy =
(VF.isScalar()) ? CI->getType() : VectorType::get(CI->getType(), VF);		(VF.isScalar()) ? CI->getType() : VectorType::get(CI->getType(), VF);

for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
Value *A = State.get(User.getOperand(0), Part);		Value *A = State.get(User.getOperand(0), Part);
Value *Cast = Builder.CreateCast(CI->getOpcode(), A, DestTy);		Value *Cast = Builder.CreateCast(CI->getOpcode(), A, DestTy);
VectorLoopValueMap.setVectorValue(&I, Part, Cast);		VectorLoopValueMap.setVectorValue(&I, Part, Cast);
addMetadata(Cast, &I);		addMetadata(Cast, &I);
(13 lines not shown)	assert(!isa<DbgInfoIntrinsic>(I) &&
"DbgInfoIntrinsic should have been dropped during VPlan construction");		"DbgInfoIntrinsic should have been dropped during VPlan construction");
setDebugLocFromInst(Builder, &I);		setDebugLocFromInst(Builder, &I);

Module *M = I.getParent()->getParent()->getParent();		Module *M = I.getParent()->getParent()->getParent();
auto *CI = cast<CallInst>(&I);		auto *CI = cast<CallInst>(&I);

SmallVector<Type *, 4> Tys;		SmallVector<Type *, 4> Tys;
for (Value *ArgOperand : CI->arg_operands())		for (Value *ArgOperand : CI->arg_operands())
Tys.push_back(ToVectorTy(ArgOperand->getType(), VF.Min));		Tys.push_back(ToVectorTy(ArgOperand->getType(), VF.getKnownMinValue()));

Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// The flag shows whether we use Intrinsic or a usual Call for vectorized		// The flag shows whether we use Intrinsic or a usual Call for vectorized
// version of the instruction.		// version of the instruction.
// Is it beneficial to perform intrinsic call compared to lib call?		// Is it beneficial to perform intrinsic call compared to lib call?
bool NeedToScalarize = false;		bool NeedToScalarize = false;
unsigned CallCost = Cost->getVectorCallCost(CI, VF, NeedToScalarize);		unsigned CallCost = Cost->getVectorCallCost(CI, VF, NeedToScalarize);
(15 lines not shown)	for (auto &I : enumerate(ArgOperands.operands())) {
Args.push_back(Arg);		Args.push_back(Arg);
}		}

Function *VectorF;		Function *VectorF;
if (UseVectorIntrinsic) {		if (UseVectorIntrinsic) {
// Use vector version of the intrinsic.		// Use vector version of the intrinsic.
Type *TysForDecl[] = {CI->getType()};		Type *TysForDecl[] = {CI->getType()};
if (VF.isVector()) {		if (VF.isVector()) {
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);		TysForDecl[0] = VectorType::get(CI->getType()->getScalarType(), VF);
}		}
VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);		VectorF = Intrinsic::getDeclaration(M, ID, TysForDecl);
assert(VectorF && "Can't retrieve vector intrinsic.");		assert(VectorF && "Can't retrieve vector intrinsic.");
} else {		} else {
// Use vector version of the function call.		// Use vector version of the function call.
const VFShape Shape = VFShape::get(*CI, VF, false /*HasGlobalPred*/);		const VFShape Shape = VFShape::get(*CI, VF, false /*HasGlobalPred*/);
#ifndef NDEBUG		#ifndef NDEBUG
(222 lines not shown)	LLVM_DEBUG(dbgs() << "LV: Found scalar instruction: " << *IndUpdate
<< "\n");		<< "\n");
}		}

Scalars[VF].insert(Worklist.begin(), Worklist.end());		Scalars[VF].insert(Worklist.begin(), Worklist.end());
}		}

bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,		bool LoopVectorizationCostModel::isScalarWithPredication(Instruction *I,
ElementCount VF) {		ElementCount VF) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
if (!blockNeedsPredication(I->getParent()))		if (!blockNeedsPredication(I->getParent()))
return false;		return false;
switch(I->getOpcode()) {		switch(I->getOpcode()) {
default:		default:
break;		break;
case Instruction::Load:		case Instruction::Load:
case Instruction::Store: {		case Instruction::Store: {
if (!Legal->isMaskRequired(I))		if (!Legal->isMaskRequired(I))
[468 lines not shown]	if (TTI.shouldMaximizeVectorBandwidth(!isScalarEpilogueAllowed()) ||
for (int i = RUs.size() - 1; i >= 0; --i) {		for (int i = RUs.size() - 1; i >= 0; --i) {
bool Selected = true;		bool Selected = true;
for (auto& pair : RUs[i].MaxLocalUsers) {		for (auto& pair : RUs[i].MaxLocalUsers) {
unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);		unsigned TargetNumRegisters = TTI.getNumberOfRegisters(pair.first);
if (pair.second > TargetNumRegisters)		if (pair.second > TargetNumRegisters)
Selected = false;		Selected = false;
}		}
if (Selected) {		if (Selected) {
MaxVF = VFs[i].Min;		MaxVF = VFs[i].getKnownMinValue();
break;		break;
}		}
}		}
if (unsigned MinVF = TTI.getMinimumVF(SmallestType)) {		if (unsigned MinVF = TTI.getMinimumVF(SmallestType)) {
if (MaxVF < MinVF) {		if (MaxVF < MinVF) {
LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF		LLVM_DEBUG(dbgs() << "LV: Overriding calculated MaxVF(" << MaxVF
<< ") with target's minimum: " << MinVF << '\n');		<< ") with target's minimum: " << MinVF << '\n');
MaxVF = MinVF;		MaxVF = MinVF;
[184 lines not shown]	if (EnableIndVarRegisterHeur) {
PowerOf2Floor((TargetNumRegisters - LoopInvariantRegs - 1) /		PowerOf2Floor((TargetNumRegisters - LoopInvariantRegs - 1) /
std::max(1U, (MaxLocalUsers - 1)));		std::max(1U, (MaxLocalUsers - 1)));
}		}

IC = std::min(IC, TmpIC);		IC = std::min(IC, TmpIC);
}		}

// Clamp the interleave ranges to reasonable counts.		// Clamp the interleave ranges to reasonable counts.
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
unsigned MaxInterleaveCount = TTI.getMaxInterleaveFactor(VF.Min);		unsigned MaxInterleaveCount =
		TTI.getMaxInterleaveFactor(VF.getKnownMinValue());

// Check if the user has overridden the max.		// Check if the user has overridden the max.
if (VF == 1) {		if (VF == 1) {
if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)		if (ForceTargetMaxScalarInterleaveFactor.getNumOccurrences() > 0)
MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;		MaxInterleaveCount = ForceTargetMaxScalarInterleaveFactor;
} else {		} else {
if (ForceTargetMaxVectorInterleaveFactor.getNumOccurrences() > 0)		if (ForceTargetMaxVectorInterleaveFactor.getNumOccurrences() > 0)
MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor;		MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor;
}		}

// If trip count is known or estimated compile time constant, limit the		// If trip count is known or estimated compile time constant, limit the
// interleave count to be less than the trip count divided by VF.		// interleave count to be less than the trip count divided by VF.
if (BestKnownTC) {		if (BestKnownTC) {
MaxInterleaveCount = std::min(*BestKnownTC / VF.Min, MaxInterleaveCount);		MaxInterleaveCount =
		std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
}		}

// If we did not calculate the cost for VF (because the user selected the VF)		// If we did not calculate the cost for VF (because the user selected the VF)
// then we calculate the cost of VF here.		// then we calculate the cost of VF here.
if (LoopCost == 0)		if (LoopCost == 0)
LoopCost = expectedCost(VF).first;		LoopCost = expectedCost(VF).first;

assert(LoopCost && "Non-zero loop cost expected");		assert(LoopCost && "Non-zero loop cost expected");
[155 lines not shown]	LoopVectorizationCostModel::calculateRegisterUsage(ArrayRef<ElementCount> VFs) {

LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");		LLVM_DEBUG(dbgs() << "LV(REG): Calculating max register usage:\n");

// A lambda that gets the register usage for the given type and VF.		// A lambda that gets the register usage for the given type and VF.
auto GetRegUsage = [&DL, WidestRegister](Type *Ty, ElementCount VF) {		auto GetRegUsage = [&DL, WidestRegister](Type *Ty, ElementCount VF) {
if (Ty->isTokenTy())		if (Ty->isTokenTy())
return 0U;		return 0U;
unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());		unsigned TypeSize = DL.getTypeSizeInBits(Ty->getScalarType());
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
return std::max<unsigned>(1, VF.Min * TypeSize / WidestRegister);		return std::max<unsigned>(1, VF.getKnownMinValue() * TypeSize /
		WidestRegister);
};		};

for (unsigned int i = 0, s = IdxToInstr.size(); i < s; ++i) {		for (unsigned int i = 0, s = IdxToInstr.size(); i < s; ++i) {
Instruction *I = IdxToInstr[i];		Instruction *I = IdxToInstr[i];

// Remove all of the instructions that end at this location.		// Remove all of the instructions that end at this location.
InstrList &List = TransposeEnds[i];		InstrList &List = TransposeEnds[i];
for (Instruction *ToRemove : List)		for (Instruction *ToRemove : List)
[210 lines not shown]	while (!Worklist.empty()) {
// Compute the cost of the vector instruction. Note that this cost already		// Compute the cost of the vector instruction. Note that this cost already
// includes the scalarization overhead of the predicated instruction.		// includes the scalarization overhead of the predicated instruction.
unsigned VectorCost = getInstructionCost(I, VF).first;		unsigned VectorCost = getInstructionCost(I, VF).first;

// Compute the cost of the scalarized instruction. This cost is the cost of		// Compute the cost of the scalarized instruction. This cost is the cost of
// the instruction as if it wasn't if-converted and instead remained in the		// the instruction as if it wasn't if-converted and instead remained in the
// predicated block. We will scale this cost by block probability after		// predicated block. We will scale this cost by block probability after
// computing the scalarization overhead.		// computing the scalarization overhead.
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
unsigned ScalarCost =		unsigned ScalarCost =
VF.Min * getInstructionCost(I, ElementCount::getFixed(1)).first;		VF.getKnownMinValue() *
		getInstructionCost(I, ElementCount::getFixed(1)).first;

// Compute the scalarization overhead of needed insertelement instructions		// Compute the scalarization overhead of needed insertelement instructions
// and phi nodes.		// and phi nodes.
if (isScalarWithPredication(I) && !I->getType()->isVoidTy()) {		if (isScalarWithPredication(I) && !I->getType()->isVoidTy()) {
ScalarCost += TTI.getScalarizationOverhead(		ScalarCost += TTI.getScalarizationOverhead(
cast<VectorType>(ToVectorTy(I->getType(), VF)),		cast<VectorType>(ToVectorTy(I->getType(), VF)),
APInt::getAllOnesValue(VF.Min), true, false);		APInt::getAllOnesValue(VF.getKnownMinValue()), true, false);
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
ScalarCost +=		ScalarCost +=
VF.Min *		VF.getKnownMinValue() *
TTI.getCFInstrCost(Instruction::PHI, TTI::TCK_RecipThroughput);		TTI.getCFInstrCost(Instruction::PHI, TTI::TCK_RecipThroughput);
}		}

// Compute the scalarization overhead of needed extractelement		// Compute the scalarization overhead of needed extractelement
// instructions. For each of the instruction's operands, if the operand can		// instructions. For each of the instruction's operands, if the operand can
// be scalarized, add it to the worklist; otherwise, account for the		// be scalarized, add it to the worklist; otherwise, account for the
// overhead.		// overhead.
for (Use &U : I->operands())		for (Use &U : I->operands())
if (auto *J = dyn_cast<Instruction>(U.get())) {		if (auto *J = dyn_cast<Instruction>(U.get())) {
assert(VectorType::isValidElementType(J->getType()) &&		assert(VectorType::isValidElementType(J->getType()) &&
"Instruction has non-scalar type");		"Instruction has non-scalar type");
if (canBeScalarized(J))		if (canBeScalarized(J))
Worklist.push_back(J);		Worklist.push_back(J);
else if (needsExtract(J, VF)) {		else if (needsExtract(J, VF)) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
ScalarCost += TTI.getScalarizationOverhead(		ScalarCost += TTI.getScalarizationOverhead(
cast<VectorType>(ToVectorTy(J->getType(), VF)),		cast<VectorType>(ToVectorTy(J->getType(), VF)),
APInt::getAllOnesValue(VF.Min), false, true);		APInt::getAllOnesValue(VF.getKnownMinValue()), false, true);
}		}
}		}

// Scale the total scalar cost by block probability.		// Scale the total scalar cost by block probability.
ScalarCost /= getReciprocalPredBlockProb();		ScalarCost /= getReciprocalPredBlockProb();

// Compute the discount. A non-negative discount means the vector version		// Compute the discount. A non-negative discount means the vector version
// of the instruction costs more, and scalarizing would be beneficial.		// of the instruction costs more, and scalarizing would be beneficial.
Discount += VectorCost - ScalarCost;		Discount += VectorCost - ScalarCost;
ScalarCosts[I] = ScalarCost;		ScalarCosts[I] = ScalarCost;
}		}

return Discount;		return Discount;
}		}

LoopVectorizationCostModel::VectorizationCostTy		LoopVectorizationCostModel::VectorizationCostTy
LoopVectorizationCostModel::expectedCost(ElementCount VF) {		LoopVectorizationCostModel::expectedCost(ElementCount VF) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
VectorizationCostTy Cost;		VectorizationCostTy Cost;

// For each block.		// For each block.
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
VectorizationCostTy BlockCost;		VectorizationCostTy BlockCost;

// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : BB->instructionsWithoutDebug()) {		for (Instruction &I : BB->instructionsWithoutDebug()) {
[66 lines not shown]	return Legal->hasStride(I->getOperand(0)) ||
Legal->hasStride(I->getOperand(1));		Legal->hasStride(I->getOperand(1));
}		}

unsigned		unsigned
LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,		LoopVectorizationCostModel::getMemInstScalarizationCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
assert(VF.isVector() &&		assert(VF.isVector() &&
"Scalarization cost of instruction implies vectorization.");		"Scalarization cost of instruction implies vectorization.");
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
Type *ValTy = getMemInstValueType(I);		Type *ValTy = getMemInstValueType(I);
auto SE = PSE.getSE();		auto SE = PSE.getSE();

unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);
Value *Ptr = getLoadStorePointerOperand(I);		Value *Ptr = getLoadStorePointerOperand(I);
Type *PtrTy = ToVectorTy(Ptr->getType(), VF);		Type *PtrTy = ToVectorTy(Ptr->getType(), VF);

// Figure out whether the access is strided and get the stride value		// Figure out whether the access is strided and get the stride value
// if it's known in compile time		// if it's known in compile time
const SCEV *PtrSCEV = getAddressAccessSCEV(Ptr, Legal, PSE, TheLoop);		const SCEV *PtrSCEV = getAddressAccessSCEV(Ptr, Legal, PSE, TheLoop);

// Get the cost of the scalar memory instruction and address computation.		// Get the cost of the scalar memory instruction and address computation.
unsigned Cost = VF.Min * TTI.getAddressComputationCost(PtrTy, SE, PtrSCEV);		unsigned Cost =
		VF.getKnownMinValue() * TTI.getAddressComputationCost(PtrTy, SE, PtrSCEV);

// Don't pass *I here, since it is scalar but will actually be part of a		// Don't pass *I here, since it is scalar but will actually be part of a
// vectorized loop where the user of it is a vectorized instruction.		// vectorized loop where the user of it is a vectorized instruction.
const Align Alignment = getLoadStoreAlignment(I);		const Align Alignment = getLoadStoreAlignment(I);
Cost += VF.Min *		Cost += VF.getKnownMinValue() *
TTI.getMemoryOpCost(I->getOpcode(), ValTy->getScalarType(), Alignment,		TTI.getMemoryOpCost(I->getOpcode(), ValTy->getScalarType(), Alignment,
AS, TTI::TCK_RecipThroughput);		AS, TTI::TCK_RecipThroughput);

// Get the overhead of the extractelement and insertelement instructions		// Get the overhead of the extractelement and insertelement instructions
// we might create due to scalarization.		// we might create due to scalarization.
Cost += getScalarizationOverhead(I, VF);		Cost += getScalarizationOverhead(I, VF);

// If we have a predicated store, it may not be executed for each vector		// If we have a predicated store, it may not be executed for each vector
[51 lines not shown]	return TTI.getAddressComputationCost(ValTy) +
TTI.getShuffleCost(TargetTransformInfo::SK_Broadcast, VectorTy);		TTI.getShuffleCost(TargetTransformInfo::SK_Broadcast, VectorTy);
}		}
StoreInst *SI = cast<StoreInst>(I);		StoreInst *SI = cast<StoreInst>(I);

bool isLoopInvariantStoreValue = Legal->isUniform(SI->getValueOperand());		bool isLoopInvariantStoreValue = Legal->isUniform(SI->getValueOperand());
return TTI.getAddressComputationCost(ValTy) +		return TTI.getAddressComputationCost(ValTy) +
TTI.getMemoryOpCost(Instruction::Store, ValTy, Alignment, AS,		TTI.getMemoryOpCost(Instruction::Store, ValTy, Alignment, AS,
CostKind) +		CostKind) +
(isLoopInvariantStoreValue ? 0 : TTI.getVectorInstrCost(		(isLoopInvariantStoreValue
Instruction::ExtractElement,		? 0
VectorTy, VF.Min - 1));		: TTI.getVectorInstrCost(Instruction::ExtractElement, VectorTy,
		VF.getKnownMinValue() - 1));
}		}

unsigned LoopVectorizationCostModel::getGatherScatterCost(Instruction *I,		unsigned LoopVectorizationCostModel::getGatherScatterCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
Type *ValTy = getMemInstValueType(I);		Type *ValTy = getMemInstValueType(I);
auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));		auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));
const Align Alignment = getLoadStoreAlignment(I);		const Align Alignment = getLoadStoreAlignment(I);
const Value *Ptr = getLoadStorePointerOperand(I);		const Value *Ptr = getLoadStorePointerOperand(I);
[9 lines not shown]	unsigned LoopVectorizationCostModel::getInterleaveGroupCost(Instruction *I,
Type *ValTy = getMemInstValueType(I);		Type *ValTy = getMemInstValueType(I);
auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));		auto *VectorTy = cast<VectorType>(ToVectorTy(ValTy, VF));
unsigned AS = getLoadStoreAddressSpace(I);		unsigned AS = getLoadStoreAddressSpace(I);

auto Group = getInterleavedAccessGroup(I);		auto Group = getInterleavedAccessGroup(I);
assert(Group && "Fail to get an interleaved access group.");		assert(Group && "Fail to get an interleaved access group.");

unsigned InterleaveFactor = Group->getFactor();		unsigned InterleaveFactor = Group->getFactor();
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto WideVecTy = VectorType::get(ValTy, VF * InterleaveFactor);		auto WideVecTy = VectorType::get(ValTy, VF * InterleaveFactor);

// Holds the indices of existing members in an interleaved load group.		// Holds the indices of existing members in an interleaved load group.
// An interleaved store group doesn't need this as it doesn't allow gaps.		// An interleaved store group doesn't need this as it doesn't allow gaps.
SmallVector<unsigned, 4> Indices;		SmallVector<unsigned, 4> Indices;
if (isa<LoadInst>(I)) {		if (isa<LoadInst>(I)) {
for (unsigned i = 0; i < InterleaveFactor; i++)		for (unsigned i = 0; i < InterleaveFactor; i++)
if (Group->getMember(i))		if (Group->getMember(i))
[31 lines not shown]	return TTI.getAddressComputationCost(ValTy) +
TTI::TCK_RecipThroughput, I);		TTI::TCK_RecipThroughput, I);
}		}
return getWideningCost(I, VF);		return getWideningCost(I, VF);
}		}

LoopVectorizationCostModel::VectorizationCostTy		LoopVectorizationCostModel::VectorizationCostTy
LoopVectorizationCostModel::getInstructionCost(Instruction *I,		LoopVectorizationCostModel::getInstructionCost(Instruction *I,
ElementCount VF) {		ElementCount VF) {
assert(!VF.Scalable &&		assert(!VF.isScalable() &&
"the cost model is not yet implemented for scalable vectorization");		"the cost model is not yet implemented for scalable vectorization");
// If we know that this instruction will remain uniform, check the cost of		// If we know that this instruction will remain uniform, check the cost of
// the scalar version.		// the scalar version.
if (isUniformAfterVectorization(I, VF))		if (isUniformAfterVectorization(I, VF))
VF = ElementCount::getFixed(1);		VF = ElementCount::getFixed(1);

if (VF.isVector() && isProfitableToScalarize(I, VF))		if (VF.isVector() && isProfitableToScalarize(I, VF))
return VectorizationCostTy(InstsToScalarize[VF][I], false);		return VectorizationCostTy(InstsToScalarize[VF][I], false);

// Forced scalars do not have any scalarization overhead.		// Forced scalars do not have any scalarization overhead.
auto ForcedScalar = ForcedScalars.find(VF);		auto ForcedScalar = ForcedScalars.find(VF);
if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {		if (VF.isVector() && ForcedScalar != ForcedScalars.end()) {
auto InstSet = ForcedScalar->second;		auto InstSet = ForcedScalar->second;
if (InstSet.count(I))		if (InstSet.count(I))
return VectorizationCostTy(		return VectorizationCostTy(
(getInstructionCost(I, ElementCount::getFixed(1)).first * VF.Min),		(getInstructionCost(I, ElementCount::getFixed(1)).first *
		VF.getKnownMinValue()),
false);		false);
}		}

Type *VectorTy;		Type *VectorTy;
unsigned C = getInstructionCost(I, VF, VectorTy);		unsigned C = getInstructionCost(I, VF, VectorTy);

bool TypeNotScalarized = VF.isVector() && VectorTy->isVectorTy() &&		bool TypeNotScalarized =
TTI.getNumberOfParts(VectorTy) < VF.Min;		VF.isVector() && VectorTy->isVectorTy() &&
		TTI.getNumberOfParts(VectorTy) < VF.getKnownMinValue();
return VectorizationCostTy(C, TypeNotScalarized);		return VectorizationCostTy(C, TypeNotScalarized);
}		}

unsigned LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,		unsigned LoopVectorizationCostModel::getScalarizationOverhead(Instruction *I,
ElementCount VF) {		ElementCount VF) {

assert(!VF.Scalable &&		assert(!VF.isScalable() &&
"cannot compute scalarization overhead for scalable vectorization");		"cannot compute scalarization overhead for scalable vectorization");
if (VF.isScalar())		if (VF.isScalar())
return 0;		return 0;

unsigned Cost = 0;		unsigned Cost = 0;
Type *RetTy = ToVectorTy(I->getType(), VF);		Type *RetTy = ToVectorTy(I->getType(), VF);
if (!RetTy->isVoidTy() &&		if (!RetTy->isVoidTy() &&
(!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))		(!isa<LoadInst>(I) || !TTI.supportsEfficientVectorElementLoadStore()))
Cost += TTI.getScalarizationOverhead(		Cost += TTI.getScalarizationOverhead(
cast<VectorType>(RetTy), APInt::getAllOnesValue(VF.Min), true, false);		cast<VectorType>(RetTy), APInt::getAllOnesValue(VF.getKnownMinValue()),
		true, false);

// Some targets keep addresses scalar.		// Some targets keep addresses scalar.
if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())		if (isa<LoadInst>(I) && !TTI.prefersVectorizedAddressing())
return Cost;		return Cost;

// Some targets support efficient element stores.		// Some targets support efficient element stores.
if (isa<StoreInst>(I) && TTI.supportsEfficientVectorElementLoadStore())		if (isa<StoreInst>(I) && TTI.supportsEfficientVectorElementLoadStore())
return Cost;		return Cost;

// Collect operands to consider.		// Collect operands to consider.
CallInst *CI = dyn_cast<CallInst>(I);		CallInst *CI = dyn_cast<CallInst>(I);
Instruction::op_range Ops = CI ? CI->arg_operands() : I->operands();		Instruction::op_range Ops = CI ? CI->arg_operands() : I->operands();

// Skip operands that do not require extraction/scalarization and do not incur		// Skip operands that do not require extraction/scalarization and do not incur
// any overhead.		// any overhead.
return Cost +		return Cost + TTI.getOperandsScalarizationOverhead(
TTI.getOperandsScalarizationOverhead(filterExtractingOperands(Ops, VF),		filterExtractingOperands(Ops, VF), VF.getKnownMinValue());
VF.Min);
}		}

void LoopVectorizationCostModel::setCostBasedWideningDecision(ElementCount VF) {		void LoopVectorizationCostModel::setCostBasedWideningDecision(ElementCount VF) {
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
if (VF.isScalar())		if (VF.isScalar())
return;		return;
NumPredStores = 0;		NumPredStores = 0;
for (BasicBlock *BB : TheLoop->blocks()) {		for (BasicBlock *BB : TheLoop->blocks()) {
// For each instruction in the old loop.		// For each instruction in the old loop.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
Value *Ptr = getLoadStorePointerOperand(&I);		Value *Ptr = getLoadStorePointerOperand(&I);
if (!Ptr)		if (!Ptr)
[120 lines not shown]	if (isa<LoadInst>(I)) {
// by cost functions, but since this involves the task of finding out		// by cost functions, but since this involves the task of finding out
// if the loaded register is involved in an address computation, it is		// if the loaded register is involved in an address computation, it is
// instead changed here when we know this is the case.		// instead changed here when we know this is the case.
InstWidening Decision = getWideningDecision(I, VF);		InstWidening Decision = getWideningDecision(I, VF);
if (Decision == CM_Widen || Decision == CM_Widen_Reverse)		if (Decision == CM_Widen || Decision == CM_Widen_Reverse)
// Scalarize a widened load of address.		// Scalarize a widened load of address.
setWideningDecision(		setWideningDecision(
I, VF, CM_Scalarize,		I, VF, CM_Scalarize,
(VF.Min * getMemoryInstructionCost(I, ElementCount::getFixed(1))));		(VF.getKnownMinValue() *
		getMemoryInstructionCost(I, ElementCount::getFixed(1))));
else if (auto Group = getInterleavedAccessGroup(I)) {		else if (auto Group = getInterleavedAccessGroup(I)) {
// Scalarize an interleave group of address loads.		// Scalarize an interleave group of address loads.
for (unsigned I = 0; I < Group->getFactor(); ++I) {		for (unsigned I = 0; I < Group->getFactor(); ++I) {
if (Instruction *Member = Group->getMember(I))		if (Instruction *Member = Group->getMember(I))
setWideningDecision(		setWideningDecision(
Member, VF, CM_Scalarize,		Member, VF, CM_Scalarize,
(VF.Min *		(VF.getKnownMinValue() *
getMemoryInstructionCost(Member, ElementCount::getFixed(1))));		getMemoryInstructionCost(Member, ElementCount::getFixed(1))));
}		}
}		}
} else		} else
// Make sure I gets scalarized and a cost estimate without		// Make sure I gets scalarized and a cost estimate without
// scalarization overhead.		// scalarization overhead.
ForcedScalars[VF].insert(I);		ForcedScalars[VF].insert(I);
}		}
[25 lines not shown]	case Instruction::Br: {
BranchInst *BI = cast<BranchInst>(I);		BranchInst *BI = cast<BranchInst>(I);
if (VF.isVector() && BI->isConditional() &&		if (VF.isVector() && BI->isConditional() &&
(PredicatedBBsAfterVectorization.count(BI->getSuccessor(0)) ||		(PredicatedBBsAfterVectorization.count(BI->getSuccessor(0)) ||
PredicatedBBsAfterVectorization.count(BI->getSuccessor(1))))		PredicatedBBsAfterVectorization.count(BI->getSuccessor(1))))
ScalarPredicatedBB = true;		ScalarPredicatedBB = true;

if (ScalarPredicatedBB) {		if (ScalarPredicatedBB) {
// Return cost for branches around scalarized and predicated blocks.		// Return cost for branches around scalarized and predicated blocks.
assert(!VF.Scalable && "scalable vectors not yet supported.");		assert(!VF.isScalable() && "scalable vectors not yet supported.");
auto *Vec_i1Ty =		auto *Vec_i1Ty =
VectorType::get(IntegerType::getInt1Ty(RetTy->getContext()), VF);		VectorType::get(IntegerType::getInt1Ty(RetTy->getContext()), VF);
return (TTI.getScalarizationOverhead(		return (TTI.getScalarizationOverhead(
Vec_i1Ty, APInt::getAllOnesValue(VF.Min), false, true) +		Vec_i1Ty, APInt::getAllOnesValue(VF.getKnownMinValue()),
(TTI.getCFInstrCost(Instruction::Br, CostKind) * VF.Min));		false, true) +
		(TTI.getCFInstrCost(Instruction::Br, CostKind) *
		VF.getKnownMinValue()));
} else if (I->getParent() == TheLoop->getLoopLatch() || VF.isScalar())		} else if (I->getParent() == TheLoop->getLoopLatch() || VF.isScalar())
// The back-edge branch will remain, as will all scalar branches.		// The back-edge branch will remain, as will all scalar branches.
return TTI.getCFInstrCost(Instruction::Br, CostKind);		return TTI.getCFInstrCost(Instruction::Br, CostKind);
else		else
// This branch will be eliminated by if-conversion.		// This branch will be eliminated by if-conversion.
return 0;		return 0;
// Note: We currently assume zero cost for an unconditional branch inside		// Note: We currently assume zero cost for an unconditional branch inside
// a predicated block since it will become a fall-through, although we		// a predicated block since it will become a fall-through, although we
// may decide in the future to call TTI for all branches.		// may decide in the future to call TTI for all branches.
}		}
case Instruction::PHI: {		case Instruction::PHI: {
auto *Phi = cast<PHINode>(I);		auto *Phi = cast<PHINode>(I);

// First-order recurrences are replaced by vector shuffles inside the loop.		// First-order recurrences are replaced by vector shuffles inside the loop.
// NOTE: Don't use ToVectorTy as SK_ExtractSubvector expects a vector type.		// NOTE: Don't use ToVectorTy as SK_ExtractSubvector expects a vector type.
if (VF.isVector() && Legal->isFirstOrderRecurrence(Phi))		if (VF.isVector() && Legal->isFirstOrderRecurrence(Phi))
return TTI.getShuffleCost(TargetTransformInfo::SK_ExtractSubvector,		return TTI.getShuffleCost(
cast<VectorType>(VectorTy), VF.Min - 1,		TargetTransformInfo::SK_ExtractSubvector, cast<VectorType>(VectorTy),
FixedVectorType::get(RetTy, 1));		VF.getKnownMinValue() - 1, FixedVectorType::get(RetTy, 1));

// Phi nodes in non-header blocks (not inductions, reductions, etc.) are		// Phi nodes in non-header blocks (not inductions, reductions, etc.) are
// converted into select instructions. We require N - 1 selects per phi		// converted into select instructions. We require N - 1 selects per phi
// node, where N is the number of incoming values.		// node, where N is the number of incoming values.
if (VF.isVector() && Phi->getParent() != TheLoop->getHeader())		if (VF.isVector() && Phi->getParent() != TheLoop->getHeader())
return (Phi->getNumIncomingValues() - 1) *		return (Phi->getNumIncomingValues() - 1) *
TTI.getCmpSelInstrCost(		TTI.getCmpSelInstrCost(
Instruction::Select, ToVectorTy(Phi->getType(), VF),		Instruction::Select, ToVectorTy(Phi->getType(), VF),
[12 lines not shown]	case Instruction::SRem:
// predicated, we fall through to the next case.		// predicated, we fall through to the next case.
if (VF.isVector() && isScalarWithPredication(I)) {		if (VF.isVector() && isScalarWithPredication(I)) {
unsigned Cost = 0;		unsigned Cost = 0;

// These instructions have a non-void type, so account for the phi nodes		// These instructions have a non-void type, so account for the phi nodes
// that we will create. This cost is likely to be zero. The phi node		// that we will create. This cost is likely to be zero. The phi node
// cost, if any, should be scaled by the block probability because it		// cost, if any, should be scaled by the block probability because it
// models a copy at the end of each predicated block.		// models a copy at the end of each predicated block.
Cost += VF.Min * TTI.getCFInstrCost(Instruction::PHI, CostKind);		Cost += VF.getKnownMinValue() *
		TTI.getCFInstrCost(Instruction::PHI, CostKind);

// The cost of the non-predicated instruction.		// The cost of the non-predicated instruction.
Cost +=		Cost += VF.getKnownMinValue() *
VF.Min * TTI.getArithmeticInstrCost(I->getOpcode(), RetTy, CostKind);		TTI.getArithmeticInstrCost(I->getOpcode(), RetTy, CostKind);

// The cost of insertelement and extractelement instructions needed for		// The cost of insertelement and extractelement instructions needed for
// scalarization.		// scalarization.
Cost += getScalarizationOverhead(I, VF);		Cost += getScalarizationOverhead(I, VF);

// Scale the cost by the probability of executing the predicated blocks.		// Scale the cost by the probability of executing the predicated blocks.
// This assumes the predicated block for each vector lane is equally		// This assumes the predicated block for each vector lane is equally
// likely.		// likely.
[22 lines not shown]	case Instruction::Xor: {
Value *Op2 = I->getOperand(1);		Value *Op2 = I->getOperand(1);
TargetTransformInfo::OperandValueProperties Op2VP;		TargetTransformInfo::OperandValueProperties Op2VP;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TTI.getOperandInfo(Op2, Op2VP);		TTI.getOperandInfo(Op2, Op2VP);
if (Op2VK == TargetTransformInfo::OK_AnyValue && Legal->isUniform(Op2))		if (Op2VK == TargetTransformInfo::OK_AnyValue && Legal->isUniform(Op2))
Op2VK = TargetTransformInfo::OK_UniformValue;		Op2VK = TargetTransformInfo::OK_UniformValue;

SmallVector<const Value *, 4> Operands(I->operand_values());		SmallVector<const Value *, 4> Operands(I->operand_values());
unsigned N = isScalarAfterVectorization(I, VF) ? VF.Min : 1;		unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(		return N * TTI.getArithmeticInstrCost(
I->getOpcode(), VectorTy, CostKind,		I->getOpcode(), VectorTy, CostKind,
TargetTransformInfo::OK_AnyValue,		TargetTransformInfo::OK_AnyValue,
Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);		Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
}		}
case Instruction::FNeg: {		case Instruction::FNeg: {
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
unsigned N = isScalarAfterVectorization(I, VF) ? VF.Min : 1;		unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N * TTI.getArithmeticInstrCost(		return N * TTI.getArithmeticInstrCost(
I->getOpcode(), VectorTy, CostKind,		I->getOpcode(), VectorTy, CostKind,
TargetTransformInfo::OK_AnyValue,		TargetTransformInfo::OK_AnyValue,
TargetTransformInfo::OK_AnyValue,		TargetTransformInfo::OK_AnyValue,
TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,		TargetTransformInfo::OP_None, TargetTransformInfo::OP_None,
I->getOperand(0), I);		I->getOperand(0), I);
}		}
case Instruction::Select: {		case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);		SelectInst *SI = cast<SelectInst>(I);
const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());		const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));		bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));
Type *CondTy = SI->getCondition()->getType();		Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond) {		if (!ScalarCond) {
assert(!VF.Scalable && "VF is assumed to be non scalable.");		assert(!VF.isScalable() && "VF is assumed to be non scalable.");
CondTy = VectorType::get(CondTy, VF);		CondTy = VectorType::get(CondTy, VF);
}		}
return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,		return TTI.getCmpSelInstrCost(I->getOpcode(), VectorTy, CondTy,
CostKind, I);		CostKind, I);
}		}
case Instruction::ICmp:		case Instruction::ICmp:
case Instruction::FCmp: {		case Instruction::FCmp: {
Type *ValTy = I->getOperand(0)->getType();		Type *ValTy = I->getOperand(0)->getType();
▲ Show 20 Lines • Show All 95 Lines • ▼ Show 20 Lines	if (canTruncateToMinimalBitwidth(I, VF)) {
largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);		largestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
} else if (Opcode == Instruction::ZExt || Opcode == Instruction::SExt) {		} else if (Opcode == Instruction::ZExt || Opcode == Instruction::SExt) {
SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy);		SrcVecTy = largestIntegerVectorType(SrcVecTy, MinVecTy);
VectorTy =		VectorTy =
smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);		smallestIntegerVectorType(ToVectorTy(I->getType(), VF), MinVecTy);
}		}
}		}

assert(!VF.Scalable && "VF is assumed to be non scalable");		assert(!VF.isScalable() && "VF is assumed to be non scalable");
unsigned N = isScalarAfterVectorization(I, VF) ? VF.Min : 1;		unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
return N *		return N *
TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);		TTI.getCastInstrCost(Opcode, VectorTy, SrcVecTy, CCH, CostKind, I);
}		}
case Instruction::Call: {		case Instruction::Call: {
bool NeedToScalarize;		bool NeedToScalarize;
CallInst *CI = cast<CallInst>(I);		CallInst *CI = cast<CallInst>(I);
unsigned CallCost = getVectorCallCost(CI, VF, NeedToScalarize);		unsigned CallCost = getVectorCallCost(CI, VF, NeedToScalarize);
if (getVectorIntrinsicIDForCall(CI, TLI))		if (getVectorIntrinsicIDForCall(CI, TLI))
return std::min(CallCost, getVectorIntrinsicCost(CI, VF));		return std::min(CallCost, getVectorIntrinsicCost(CI, VF));
return CallCost;		return CallCost;
}		}
default:		default:
// The cost of executing VF copies of the scalar instruction. This opcode		// The cost of executing VF copies of the scalar instruction. This opcode
// is unknown. Assume that it is the same as 'mul'.		// is unknown. Assume that it is the same as 'mul'.
return VF.Min *		return VF.getKnownMinValue() * TTI.getArithmeticInstrCost(
TTI.getArithmeticInstrCost(Instruction::Mul, VectorTy,		Instruction::Mul, VectorTy, CostKind) +
CostKind) +
getScalarizationOverhead(I, VF);		getScalarizationOverhead(I, VF);
} // end of switch.		} // end of switch.
}		}

char LoopVectorize::ID = 0;		char LoopVectorize::ID = 0;

static const char lv_name[] = "Loop Vectorization";		static const char lv_name[] = "Loop Vectorization";

▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	static unsigned determineVPlanVF(const unsigned WidestVectorRegBits,
LoopVectorizationCostModel &CM) {		LoopVectorizationCostModel &CM) {
unsigned WidestType;		unsigned WidestType;
std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();		std::tie(std::ignore, WidestType) = CM.getSmallestAndWidestTypes();
return WidestVectorRegBits / WidestType;		return WidestVectorRegBits / WidestType;
}		}

VectorizationFactor		VectorizationFactor
LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {		LoopVectorizationPlanner::planInVPlanNativePath(ElementCount UserVF) {
assert(!UserVF.Scalable && "scalable vectors not yet supported");		assert(!UserVF.isScalable() && "scalable vectors not yet supported");
ElementCount VF = UserVF;		ElementCount VF = UserVF;
// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
if (!OrigLoop->empty()) {		if (!OrigLoop->empty()) {
// If the user doesn't provide a vectorization factor, determine a		// If the user doesn't provide a vectorization factor, determine a
// reasonable one.		// reasonable one.
if (UserVF.isZero()) {		if (UserVF.isZero()) {
VF = ElementCount::getFixed(		VF = ElementCount::getFixed(
determineVPlanVF(TTI->getRegisterBitWidth(true /* Vector*/), CM));		determineVPlanVF(TTI->getRegisterBitWidth(true /* Vector*/), CM));
LLVM_DEBUG(dbgs() << "LV: VPlan computed VF " << VF << ".\n");		LLVM_DEBUG(dbgs() << "LV: VPlan computed VF " << VF << ".\n");

// Make sure we have a VF > 1 for stress testing.		// Make sure we have a VF > 1 for stress testing.
if (VPlanBuildStressTest && (VF.isScalar() || VF.isZero())) {		if (VPlanBuildStressTest && (VF.isScalar() || VF.isZero())) {
LLVM_DEBUG(dbgs() << "LV: VPlan stress testing: "		LLVM_DEBUG(dbgs() << "LV: VPlan stress testing: "
<< "overriding computed VF.\n");		<< "overriding computed VF.\n");
VF = ElementCount::getFixed(4);		VF = ElementCount::getFixed(4);
}		}
}		}
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
assert(isPowerOf2_32(VF.Min) && "VF needs to be a power of two");		assert(isPowerOf2_32(VF.getKnownMinValue()) &&
		"VF needs to be a power of two");
LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")		LLVM_DEBUG(dbgs() << "LV: Using " << (!UserVF.isZero() ? "user " : "")
<< "VF " << VF << " to build VPlans.\n");		<< "VF " << VF << " to build VPlans.\n");
buildVPlans(VF.Min, VF.Min);		buildVPlans(VF.getKnownMinValue(), VF.getKnownMinValue());

// For VPlan build stress testing, we bail out after VPlan construction.		// For VPlan build stress testing, we bail out after VPlan construction.
if (VPlanBuildStressTest)		if (VPlanBuildStressTest)
return VectorizationFactor::Disabled();		return VectorizationFactor::Disabled();

return {VF, 0 /*Cost*/};		return {VF, 0 /*Cost*/};
}		}

LLVM_DEBUG(		LLVM_DEBUG(
dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "		dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
"VPlan-native path.\n");		"VPlan-native path.\n");
return VectorizationFactor::Disabled();		return VectorizationFactor::Disabled();
}		}

Optional<VectorizationFactor>		Optional<VectorizationFactor>
LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {		LoopVectorizationPlanner::plan(ElementCount UserVF, unsigned UserIC) {
assert(!UserVF.Scalable && "scalable vectorization not yet handled");		assert(!UserVF.isScalable() && "scalable vectorization not yet handled");
assert(OrigLoop->empty() && "Inner loop expected.");		assert(OrigLoop->empty() && "Inner loop expected.");
Optional<unsigned> MaybeMaxVF = CM.computeMaxVF(UserVF.Min, UserIC);		Optional<unsigned> MaybeMaxVF =
		CM.computeMaxVF(UserVF.getKnownMinValue(), UserIC);
if (!MaybeMaxVF) // Cases that should not to be vectorized nor interleaved.		if (!MaybeMaxVF) // Cases that should not to be vectorized nor interleaved.
return None;		return None;

// Invalidate interleave groups if all blocks of loop will be predicated.		// Invalidate interleave groups if all blocks of loop will be predicated.
if (CM.blockNeedsPredication(OrigLoop->getHeader()) &&		if (CM.blockNeedsPredication(OrigLoop->getHeader()) &&
!useMaskedInterleavedAccesses(*TTI)) {		!useMaskedInterleavedAccesses(*TTI)) {
LLVM_DEBUG(		LLVM_DEBUG(
dbgs()		dbgs()
<< "LV: Invalidate all interleaved groups due to fold-tail by masking "		<< "LV: Invalidate all interleaved groups due to fold-tail by masking "
"which requires masked-interleaved support.\n");		"which requires masked-interleaved support.\n");
if (CM.InterleaveInfo.invalidateGroups())		if (CM.InterleaveInfo.invalidateGroups())
// Invalidating interleave groups also requires invalidating all decisions		// Invalidating interleave groups also requires invalidating all decisions
// based on them, which includes widening decisions and uniform and scalar		// based on them, which includes widening decisions and uniform and scalar
// values.		// values.
CM.invalidateCostModelingDecisions();		CM.invalidateCostModelingDecisions();
}		}

if (!UserVF.isZero()) {		if (!UserVF.isZero()) {
LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		LLVM_DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF.Min) && "VF needs to be a power of two");		assert(isPowerOf2_32(UserVF.getKnownMinValue()) &&
		"VF needs to be a power of two");
// Collect the instructions (and their associated costs) that will be more		// Collect the instructions (and their associated costs) that will be more
// profitable to scalarize.		// profitable to scalarize.
CM.selectUserVectorizationFactor(UserVF);		CM.selectUserVectorizationFactor(UserVF);
CM.collectInLoopReductions();		CM.collectInLoopReductions();
buildVPlansWithVPRecipes(UserVF.Min, UserVF.Min);		buildVPlansWithVPRecipes(UserVF.getKnownMinValue(),
		UserVF.getKnownMinValue());
LLVM_DEBUG(printPlans(dbgs()));		LLVM_DEBUG(printPlans(dbgs()));
return {{UserVF, 0}};		return {{UserVF, 0}};
}		}

unsigned MaxVF = MaybeMaxVF.getValue();		unsigned MaxVF = MaybeMaxVF.getValue();
assert(MaxVF != 0 && "MaxVF is zero.");		assert(MaxVF != 0 && "MaxVF is zero.");

for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {		for (unsigned VF = 1; VF <= MaxVF; VF *= 2) {
▲ Show 20 Lines • Show All 272 Lines • ▼ Show 20 Lines

VPWidenMemoryInstructionRecipe *		VPWidenMemoryInstructionRecipe *
VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,		VPRecipeBuilder::tryToWidenMemory(Instruction *I, VFRange &Range,
VPlanPtr &Plan) {		VPlanPtr &Plan) {
assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&		assert((isa<LoadInst>(I) || isa<StoreInst>(I)) &&
"Must be called with either a load or store");		"Must be called with either a load or store");

auto willWiden = [&](ElementCount VF) -> bool {		auto willWiden = [&](ElementCount VF) -> bool {
assert(!VF.Scalable && "unexpected scalable ElementCount");		assert(!VF.isScalable() && "unexpected scalable ElementCount");
if (VF.isScalar())		if (VF.isScalar())
return false;		return false;
LoopVectorizationCostModel::InstWidening Decision =		LoopVectorizationCostModel::InstWidening Decision =
CM.getWideningDecision(I, VF);		CM.getWideningDecision(I, VF);
assert(Decision != LoopVectorizationCostModel::CM_Unknown &&		assert(Decision != LoopVectorizationCostModel::CM_Unknown &&
"CM decision should be taken at this point.");		"CM decision should be taken at this point.");
if (Decision == LoopVectorizationCostModel::CM_Interleave)		if (Decision == LoopVectorizationCostModel::CM_Interleave)
return true;		return true;
▲ Show 20 Lines • Show All 517 Lines • ▼ Show 20 Lines	if (CM.foldTailByMasking() && !Legal->getReductionVars().empty()) {
}		}
}		}

std::string PlanName;		std::string PlanName;
raw_string_ostream RSO(PlanName);		raw_string_ostream RSO(PlanName);
ElementCount VF = ElementCount::getFixed(Range.Start);		ElementCount VF = ElementCount::getFixed(Range.Start);
Plan->addVF(VF);		Plan->addVF(VF);
RSO << "Initial VPlan for VF={" << VF;		RSO << "Initial VPlan for VF={" << VF;
for (VF.Min *= 2; VF.Min < Range.End; VF.Min *= 2) {		for (VF *= 2; VF.getKnownMinValue() < Range.End; VF *= 2) {
Plan->addVF(VF);		Plan->addVF(VF);
RSO << "," << VF;		RSO << "," << VF;
}		}
RSO << "},UF>=1";		RSO << "},UF>=1";
RSO.flush();		RSO.flush();
Plan->setName(PlanName);		Plan->setName(PlanName);

return Plan;		return Plan;
▲ Show 20 Lines • Show All 207 Lines • ▼ Show 20 Lines
void VPReplicateRecipe::execute(VPTransformState &State) {		void VPReplicateRecipe::execute(VPTransformState &State) {
if (State.Instance) { // Generate a single instance.		if (State.Instance) { // Generate a single instance.
State.ILV->scalarizeInstruction(Ingredient, User, *State.Instance,		State.ILV->scalarizeInstruction(Ingredient, User, *State.Instance,
IsPredicated, State);		IsPredicated, State);
// Insert scalar instance packing it into a vector.		// Insert scalar instance packing it into a vector.
if (AlsoPack && State.VF.isVector()) {		if (AlsoPack && State.VF.isVector()) {
// If we're constructing lane 0, initialize to start from undef.		// If we're constructing lane 0, initialize to start from undef.
if (State.Instance->Lane == 0) {		if (State.Instance->Lane == 0) {
assert(!State.VF.Scalable && "VF is assumed to be non scalable.");		assert(!State.VF.isScalable() && "VF is assumed to be non scalable.");
Value *Undef =		Value *Undef =
UndefValue::get(VectorType::get(Ingredient->getType(), State.VF));		UndefValue::get(VectorType::get(Ingredient->getType(), State.VF));
State.ValueMap.setVectorValue(Ingredient, State.Instance->Part, Undef);		State.ValueMap.setVectorValue(Ingredient, State.Instance->Part, Undef);
}		}
State.ILV->packScalarIntoVectorValue(Ingredient, *State.Instance);		State.ILV->packScalarIntoVectorValue(Ingredient, *State.Instance);
}		}
return;		return;
}		}

// Generate scalar instances for all VF lanes of all UF parts, unless the		// Generate scalar instances for all VF lanes of all UF parts, unless the
// instruction is uniform inwhich case generate only the first lane for each		// instruction is uniform inwhich case generate only the first lane for each
// of the UF parts.		// of the UF parts.
unsigned EndLane = IsUniform ? 1 : State.VF.Min;		unsigned EndLane = IsUniform ? 1 : State.VF.getKnownMinValue();
for (unsigned Part = 0; Part < State.UF; ++Part)		for (unsigned Part = 0; Part < State.UF; ++Part)
for (unsigned Lane = 0; Lane < EndLane; ++Lane)		for (unsigned Lane = 0; Lane < EndLane; ++Lane)
State.ILV->scalarizeInstruction(Ingredient, User, {Part, Lane},		State.ILV->scalarizeInstruction(Ingredient, User, {Part, Lane},
IsPredicated, State);		IsPredicated, State);
}		}

void VPBranchOnMaskRecipe::execute(VPTransformState &State) {		void VPBranchOnMaskRecipe::execute(VPTransformState &State) {
assert(State.Instance && "Branch on Mask works only on single instance.");		assert(State.Instance && "Branch on Mask works only on single instance.");
▲ Show 20 Lines • Show All 576 Lines • Show Last 20 Lines
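
The LoopVectorize.cpp hunks above swap direct `Min`/`Scalable` field access for `getKnownMinValue()` and `isScalable()`, and the VPlan-building loop doubles the VF through the class itself rather than through its field. The following is a minimal sketch of that pattern, not the real `llvm::ElementCount`: `ElementCountSketch` and `collectVFs` are hypothetical names invented here, and only the accessor and factory names are taken from this patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for llvm::ElementCount (illustration only).
class ElementCountSketch {
  unsigned Min = 0;      // known minimum number of elements
  bool Scalable = false; // true if the count is a runtime multiple of Min

  ElementCountSketch(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}

public:
  static ElementCountSketch getFixed(unsigned Min) { return {Min, false}; }
  static ElementCountSketch getScalable(unsigned Min) { return {Min, true}; }

  // Read-only accessors replace direct reads of the now-private members.
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }

  // Scaling goes through the class instead of writing to Min directly.
  ElementCountSketch &operator*=(unsigned RHS) {
    Min *= RHS;
    return *this;
  }
};

// Collects the minimum values visited when doubling a fixed VF from Start
// up to (but excluding) End, following the loop shape used in the diff.
std::vector<unsigned> collectVFs(unsigned Start, unsigned End) {
  assert(Start != 0 && "doubling zero would never reach End");
  std::vector<unsigned> Result;
  ElementCountSketch VF = ElementCountSketch::getFixed(Start);
  Result.push_back(VF.getKnownMinValue());
  for (VF *= 2; VF.getKnownMinValue() < End; VF *= 2)
    Result.push_back(VF.getKnownMinValue());
  return Result;
}
```

With the members private, callers can no longer change `Min` without going through an operation the class defines, which appears to be the motivation given in the summary.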

llvm/lib/Transforms/Vectorize/VPlan.h

Show First 20 Lines • Show All 145 Lines • ▼ Show 20 Lines	public:
/// \return True if the map has any scalar entry for \p Key.		/// \return True if the map has any scalar entry for \p Key.
bool hasAnyScalarValue(Value *Key) const {		bool hasAnyScalarValue(Value *Key) const {
return ScalarMapStorage.count(Key);		return ScalarMapStorage.count(Key);
}		}

/// \return True if the map has a scalar entry for \p Key and \p Instance.		/// \return True if the map has a scalar entry for \p Key and \p Instance.
bool hasScalarValue(Value *Key, const VPIteration &Instance) const {		bool hasScalarValue(Value *Key, const VPIteration &Instance) const {
assert(Instance.Part < UF && "Queried Scalar Part is too large.");		assert(Instance.Part < UF && "Queried Scalar Part is too large.");
assert(Instance.Lane < VF.Min && "Queried Scalar Lane is too large.");		assert(Instance.Lane < VF.getKnownMinValue() &&
assert(!VF.Scalable && "VF is assumed to be non scalable.");		"Queried Scalar Lane is too large.");
		assert(!VF.isScalable() && "VF is assumed to be non scalable.");

if (!hasAnyScalarValue(Key))		if (!hasAnyScalarValue(Key))
return false;		return false;
const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;		const ScalarParts &Entry = ScalarMapStorage.find(Key)->second;
assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");		assert(Entry.size() == UF && "ScalarParts has wrong dimensions.");
assert(Entry[Instance.Part].size() == VF.Min &&		assert(Entry[Instance.Part].size() == VF.getKnownMinValue() &&
"ScalarParts has wrong dimensions.");		"ScalarParts has wrong dimensions.");
return Entry[Instance.Part][Instance.Lane] != nullptr;		return Entry[Instance.Part][Instance.Lane] != nullptr;
}		}

/// Retrieve the existing vector value that corresponds to \p Key and		/// Retrieve the existing vector value that corresponds to \p Key and
/// \p Part.		/// \p Part.
Value *getVectorValue(Value *Key, unsigned Part) {		Value *getVectorValue(Value *Key, unsigned Part) {
assert(hasVectorValue(Key, Part) && "Getting non-existent value.");		assert(hasVectorValue(Key, Part) && "Getting non-existent value.");
Show All 22 Lines	public:
/// value is not already set.		/// value is not already set.
void setScalarValue(Value *Key, const VPIteration &Instance, Value *Scalar) {		void setScalarValue(Value *Key, const VPIteration &Instance, Value *Scalar) {
assert(!hasScalarValue(Key, Instance) && "Scalar value already set");		assert(!hasScalarValue(Key, Instance) && "Scalar value already set");
if (!ScalarMapStorage.count(Key)) {		if (!ScalarMapStorage.count(Key)) {
ScalarParts Entry(UF);		ScalarParts Entry(UF);
// TODO: Consider storing uniform values only per-part, as they occupy		// TODO: Consider storing uniform values only per-part, as they occupy
// lane 0 only, keeping the other VF-1 redundant entries null.		// lane 0 only, keeping the other VF-1 redundant entries null.
for (unsigned Part = 0; Part < UF; ++Part)		for (unsigned Part = 0; Part < UF; ++Part)
Entry[Part].resize(VF.Min, nullptr);		Entry[Part].resize(VF.getKnownMinValue(), nullptr);
ScalarMapStorage[Key] = Entry;		ScalarMapStorage[Key] = Entry;
}		}
ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;		ScalarMapStorage[Key][Instance.Part][Instance.Lane] = Scalar;
}		}

/// Reset the vector value associated with \p Key for the given \p Part.		/// Reset the vector value associated with \p Key for the given \p Part.
/// This function can be used to update values that have already been		/// This function can be used to update values that have already been
/// vectorized. This is the case for "fix-up" operations including type		/// vectorized. This is the case for "fix-up" operations including type
▲ Show 20 Lines • Show All 1,819 Lines • Show Last 20 Lines

llvm/lib/Transforms/Vectorize/VPlan.cpp

Show First 20 Lines • Show All 294 Lines • ▼ Show 20 Lines	void VPRegionBlock::execute(VPTransformState *State) {

assert(!State->Instance && "Replicating a Region with non-null instance.");		assert(!State->Instance && "Replicating a Region with non-null instance.");

// Enter replicating mode.		// Enter replicating mode.
State->Instance = {0, 0};		State->Instance = {0, 0};

for (unsigned Part = 0, UF = State->UF; Part < UF; ++Part) {		for (unsigned Part = 0, UF = State->UF; Part < UF; ++Part) {
State->Instance->Part = Part;		State->Instance->Part = Part;
assert(!State->VF.Scalable && "VF is assumed to be non scalable.");		assert(!State->VF.isScalable() && "VF is assumed to be non scalable.");
for (unsigned Lane = 0, VF = State->VF.Min; Lane < VF; ++Lane) {		for (unsigned Lane = 0, VF = State->VF.getKnownMinValue(); Lane < VF;
		++Lane) {
State->Instance->Lane = Lane;		State->Instance->Lane = Lane;
// Visit the VPBlocks connected to \p this, starting from it.		// Visit the VPBlocks connected to \p this, starting from it.
for (VPBlockBase *Block : RPOT) {		for (VPBlockBase *Block : RPOT) {
LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');		LLVM_DEBUG(dbgs() << "LV: VPBlock in RPO " << Block->getName() << '\n');
Block->execute(State);		Block->execute(State);
}		}
}		}
}		}
▲ Show 20 Lines • Show All 70 Lines • ▼ Show 20 Lines	void VPInstruction::generateInstruction(VPTransformState &State,
}		}
case VPInstruction::ActiveLaneMask: {		case VPInstruction::ActiveLaneMask: {
// Get first lane of vector induction variable.		// Get first lane of vector induction variable.
Value *VIVElem0 = State.get(getOperand(0), {Part, 0});		Value *VIVElem0 = State.get(getOperand(0), {Part, 0});
// Get the original loop tripcount.		// Get the original loop tripcount.
Value *ScalarTC = State.TripCount;		Value *ScalarTC = State.TripCount;

auto *Int1Ty = Type::getInt1Ty(Builder.getContext());		auto *Int1Ty = Type::getInt1Ty(Builder.getContext());
auto *PredTy = FixedVectorType::get(Int1Ty, State.VF.Min);		auto *PredTy = FixedVectorType::get(Int1Ty, State.VF.getKnownMinValue());
Instruction *Call = Builder.CreateIntrinsic(		Instruction *Call = Builder.CreateIntrinsic(
Intrinsic::get_active_lane_mask, {PredTy, ScalarTC->getType()},		Intrinsic::get_active_lane_mask, {PredTy, ScalarTC->getType()},
{VIVElem0, ScalarTC}, nullptr, "active.lane.mask");		{VIVElem0, ScalarTC}, nullptr, "active.lane.mask");
State.set(this, Call, Part);		State.set(this, Call, Part);
break;		break;
}		}
default:		default:
llvm_unreachable("Unsupported opcode for instruction");		llvm_unreachable("Unsupported opcode for instruction");
▲ Show 20 Lines • Show All 435 Lines • ▼ Show 20 Lines	void VPWidenMemoryInstructionRecipe::print(raw_ostream &O, const Twine &Indent,
}		}
}		}

void VPWidenCanonicalIVRecipe::execute(VPTransformState &State) {		void VPWidenCanonicalIVRecipe::execute(VPTransformState &State) {
Value *CanonicalIV = State.CanonicalIV;		Value *CanonicalIV = State.CanonicalIV;
Type *STy = CanonicalIV->getType();		Type *STy = CanonicalIV->getType();
IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());		IRBuilder<> Builder(State.CFG.PrevBB->getTerminator());
ElementCount VF = State.VF;		ElementCount VF = State.VF;
assert(!VF.Scalable && "the code following assumes non scalables ECs");		assert(!VF.isScalable() && "the code following assumes non scalables ECs");
Value *VStart = VF.isScalar() ? CanonicalIV		Value *VStart = VF.isScalar()
: Builder.CreateVectorSplat(VF.Min, CanonicalIV,		? CanonicalIV
"broadcast");		: Builder.CreateVectorSplat(VF.getKnownMinValue(),
		CanonicalIV, "broadcast");
for (unsigned Part = 0, UF = State.UF; Part < UF; ++Part) {		for (unsigned Part = 0, UF = State.UF; Part < UF; ++Part) {
SmallVector<Constant *, 8> Indices;		SmallVector<Constant *, 8> Indices;
for (unsigned Lane = 0; Lane < VF.Min; ++Lane)		for (unsigned Lane = 0; Lane < VF.getKnownMinValue(); ++Lane)
Indices.push_back(ConstantInt::get(STy, Part * VF.Min + Lane));		Indices.push_back(
		ConstantInt::get(STy, Part * VF.getKnownMinValue() + Lane));
// If VF == 1, there is only one iteration in the loop above, thus the		// If VF == 1, there is only one iteration in the loop above, thus the
// element pushed back into Indices is ConstantInt::get(STy, Part)		// element pushed back into Indices is ConstantInt::get(STy, Part)
Constant *VStep = VF == 1 ? Indices.back() : ConstantVector::get(Indices);		Constant *VStep = VF == 1 ? Indices.back() : ConstantVector::get(Indices);
// Add the consecutive indices to the vector value.		// Add the consecutive indices to the vector value.
Value *CanonicalVectorIV = Builder.CreateAdd(VStart, VStep, "vec.iv");		Value *CanonicalVectorIV = Builder.CreateAdd(VStart, VStep, "vec.iv");
State.set(getVPValue(), CanonicalVectorIV, Part);		State.set(getVPValue(), CanonicalVectorIV, Part);
}		}
}		}
▲ Show 20 Lines • Show All 131 Lines • Show Last 20 Lines
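
Several VPlan.cpp hunks above follow the same shape: assert that the VF is not scalable, then treat `getKnownMinValue()` as the exact lane count. Below is a small sketch of that idiom, modelled on the index computation in `VPWidenCanonicalIVRecipe::execute`. `FixedOrScalableCount` and `laneIndicesForPart` are hypothetical names used only for illustration, and plain integers stand in for the `ConstantInt` values the real code builds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in exposing only the two accessors named in the patch
// summary; it is not llvm::ElementCount.
class FixedOrScalableCount {
  unsigned Min;
  bool Scalable;

public:
  FixedOrScalableCount(unsigned Min, bool Scalable)
      : Min(Min), Scalable(Scalable) {}
  unsigned getKnownMinValue() const { return Min; }
  bool isScalable() const { return Scalable; }
};

// Returns the per-lane step values for one unrolled part, computed as
// Part * VF + Lane. For a scalable VF the known minimum is only a lower
// bound on the lane count, so the sketch rejects that case up front.
std::vector<std::uint64_t> laneIndicesForPart(FixedOrScalableCount VF,
                                              unsigned Part) {
  assert(!VF.isScalable() && "sketch assumes a fixed-width VF");
  const unsigned Lanes = VF.getKnownMinValue();
  std::vector<std::uint64_t> Indices;
  Indices.reserve(Lanes);
  for (unsigned Lane = 0; Lane < Lanes; ++Lane)
    Indices.push_back(static_cast<std::uint64_t>(Part) * Lanes + Lane);
  return Indices;
}
```

The assertion matters because the minimum value equals the real element count only for fixed-width vectors.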

llvm/unittests/CodeGen/ScalableVectorMVTsTest.cpp

Show First 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	TEST(ScalableVectorMVTsTest, HelperFuncs) {
EXPECT_EQ(EVT::getVectorVT(Ctx, MVT::i64, EltCnt / 2), MVT::nxv1i64);		EXPECT_EQ(EVT::getVectorVT(Ctx, MVT::i64, EltCnt / 2), MVT::nxv1i64);

// Check that float->int conversion works		// Check that float->int conversion works
EVT Vnx2f64 = EVT::getVectorVT(Ctx, MVT::f64, ElementCount::getScalable(2));		EVT Vnx2f64 = EVT::getVectorVT(Ctx, MVT::f64, ElementCount::getScalable(2));
EXPECT_EQ(Vnx2f64.changeTypeToInteger(), Vnx2i64);		EXPECT_EQ(Vnx2f64.changeTypeToInteger(), Vnx2i64);

// Check fields inside llvm::ElementCount		// Check fields inside llvm::ElementCount
EltCnt = Vnx4i32.getVectorElementCount();		EltCnt = Vnx4i32.getVectorElementCount();
EXPECT_EQ(EltCnt.Min, 4U);		EXPECT_EQ(EltCnt.getKnownMinValue(), 4U);
ASSERT_TRUE(EltCnt.Scalable);		ASSERT_TRUE(EltCnt.isScalable());

// Check that fixed-length vector types aren't scalable.		// Check that fixed-length vector types aren't scalable.
EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);		EVT V8i32 = EVT::getVectorVT(Ctx, MVT::i32, 8);
ASSERT_FALSE(V8i32.isScalableVector());		ASSERT_FALSE(V8i32.isScalableVector());
EVT V4f64 = EVT::getVectorVT(Ctx, MVT::f64, ElementCount::getFixed(4));		EVT V4f64 = EVT::getVectorVT(Ctx, MVT::f64, ElementCount::getFixed(4));
ASSERT_FALSE(V4f64.isScalableVector());		ASSERT_FALSE(V4f64.isScalableVector());

// Check that llvm::ElementCount works for fixed-length types.		// Check that llvm::ElementCount works for fixed-length types.
EltCnt = V8i32.getVectorElementCount();		EltCnt = V8i32.getVectorElementCount();
EXPECT_EQ(EltCnt.Min, 8U);		EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
ASSERT_FALSE(EltCnt.Scalable);		ASSERT_FALSE(EltCnt.isScalable());
}		}

TEST(ScalableVectorMVTsTest, IRToVTTranslation) {		TEST(ScalableVectorMVTsTest, IRToVTTranslation) {
LLVMContext Ctx;		LLVMContext Ctx;

Type *Int64Ty = Type::getInt64Ty(Ctx);		Type *Int64Ty = Type::getInt64Ty(Ctx);
VectorType *ScV8Int64Ty =		VectorType *ScV8Int64Ty =
VectorType::get(Int64Ty, ElementCount::getScalable(8));		VectorType::get(Int64Ty, ElementCount::getScalable(8));
▲ Show 20 Lines • Show All 87 Lines • Show Last 20 Lines

llvm/unittests/IR/VectorTypesTest.cpp

Show First 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	TEST(VectorTypesTest, FixedLength) {
EXPECT_EQ(DoubledTy->getElementType()->getScalarSizeInBits(), 64U);		EXPECT_EQ(DoubledTy->getElementType()->getScalarSizeInBits(), 64U);

auto *ConvTy = dyn_cast<FixedVectorType>(VectorType::getInteger(V4Float64Ty));		auto *ConvTy = dyn_cast<FixedVectorType>(VectorType::getInteger(V4Float64Ty));
EXPECT_VTY_EQ(ConvTy, V4Int64Ty);		EXPECT_VTY_EQ(ConvTy, V4Int64Ty);
EXPECT_EQ(ConvTy->getNumElements(), 4U);		EXPECT_EQ(ConvTy->getNumElements(), 4U);
EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);		EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);

EltCnt = V8Int64Ty->getElementCount();		EltCnt = V8Int64Ty->getElementCount();
EXPECT_EQ(EltCnt.Min, 8U);		EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
ASSERT_FALSE(EltCnt.Scalable);		ASSERT_FALSE(EltCnt.isScalable());
}		}

TEST(VectorTypesTest, Scalable) {		TEST(VectorTypesTest, Scalable) {
LLVMContext Ctx;		LLVMContext Ctx;

Type *Int8Ty = Type::getInt8Ty(Ctx);		Type *Int8Ty = Type::getInt8Ty(Ctx);
Type *Int16Ty = Type::getInt16Ty(Ctx);		Type *Int16Ty = Type::getInt16Ty(Ctx);
Type *Int32Ty = Type::getInt32Ty(Ctx);		Type *Int32Ty = Type::getInt32Ty(Ctx);
▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines	TEST(VectorTypesTest, Scalable) {

auto *ConvTy =		auto *ConvTy =
dyn_cast<ScalableVectorType>(VectorType::getInteger(ScV4Float64Ty));		dyn_cast<ScalableVectorType>(VectorType::getInteger(ScV4Float64Ty));
EXPECT_VTY_EQ(ConvTy, ScV4Int64Ty);		EXPECT_VTY_EQ(ConvTy, ScV4Int64Ty);
EXPECT_EQ(ConvTy->getMinNumElements(), 4U);		EXPECT_EQ(ConvTy->getMinNumElements(), 4U);
EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);		EXPECT_EQ(ConvTy->getElementType()->getScalarSizeInBits(), 64U);

EltCnt = ScV8Int64Ty->getElementCount();		EltCnt = ScV8Int64Ty->getElementCount();
EXPECT_EQ(EltCnt.Min, 8U);		EXPECT_EQ(EltCnt.getKnownMinValue(), 8U);
ASSERT_TRUE(EltCnt.Scalable);		ASSERT_TRUE(EltCnt.isScalable());
}		}

TEST(VectorTypesTest, BaseVectorType) {		TEST(VectorTypesTest, BaseVectorType) {
LLVMContext Ctx;		LLVMContext Ctx;

Type *Int16Ty = Type::getInt16Ty(Ctx);		Type *Int16Ty = Type::getInt16Ty(Ctx);
Type *Int32Ty = Type::getInt32Ty(Ctx);		Type *Int32Ty = Type::getInt32Ty(Ctx);

Show All 17 Lines	/*
. .		. .
. .		. .
(7,0) ... (7,7)		(7,0) ... (7,7)
*/		*/
for (size_t I = 0, IEnd = VTys.size(); I < IEnd; ++I) {		for (size_t I = 0, IEnd = VTys.size(); I < IEnd; ++I) {
// test I == J		// test I == J
VectorType *VI = VTys[I];		VectorType *VI = VTys[I];
ElementCount ECI = VI->getElementCount();		ElementCount ECI = VI->getElementCount();
EXPECT_EQ(isa<ScalableVectorType>(VI), ECI.Scalable);		EXPECT_EQ(isa<ScalableVectorType>(VI), ECI.isScalable());

for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {		for (size_t J = I + 1, JEnd = VTys.size(); J < JEnd; ++J) {
// test I < J		// test I < J
VectorType *VJ = VTys[J];		VectorType *VJ = VTys[J];
EXPECT_VTY_NE(VI, VJ);		EXPECT_VTY_NE(VI, VJ);

VectorType *VJPrime = VectorType::get(VI->getElementType(), VJ);		VectorType *VJPrime = VectorType::get(VI->getElementType(), VJ);
if (VI->getElementType() == VJ->getElementType()) {		if (VI->getElementType() == VJ->getElementType()) {
▲ Show 20 Lines • Show All 120 Lines • Show Last 20 Lines

Diff 288594