This is an archive of the discontinued LLVM Phabricator instance.

Differential D4971

Improve Cost model for SLPVectorizer when we have a vector division by power of 2
ClosedPublic

Authored by karthikthecool on Aug 19 2014, 6:33 AM.

Details

Reviewers

nadav
aschwaighofer
andreadb
hfinkel

Summary

Hi All,
This patch improves the SLPVectorizer cost model for division by a power of 2. Currently clang does not vectorize the code below at -O3, although GCC is able to vectorize it.

void f(int* restrict a,int *restrict b,int *restrict c ) {
  a[0] = (b[0]+c[0])/2;
  a[1] = (b[1]+c[1])/2;
  a[2] = (b[2]+c[2])/2;
  a[3] = (b[3]+c[3])/2;
}

The problem is that SLPVectorizer estimates the cost of a vector divide as too high to be profitable and gives up on vectorization.
But in cases such as the above, where we divide by a power of 2, the cost is in fact much lower, because the backend converts the division into instructions such as psrad/psraw on X86 targets.
The current patch updates the cost model for division by a power of 2 to enable vectorization in such cases.

Please let me know your inputs on the same.

Thanks and Regards
Karthik Bhat
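
To make the profitability argument in the summary concrete, here is a minimal sketch of the kind of comparison involved. It assumes the vectorizer weighs one vector operation against the sum of the scalar operations it would replace; `ToyCosts`, `vectorizationDelta`, `isProfitable`, and every number are hypothetical and are not taken from the LLVM cost tables.

```cpp
// Toy profitability check. All names and costs here are illustrative only.
struct ToyCosts {
  int ScalarOp; // assumed cost of one scalar operation
  int VectorOp; // assumed cost of one vector operation covering all lanes
};

// Cost change from replacing Lanes scalar ops with one vector op.
// A negative value means the vector form is estimated to be cheaper.
constexpr int vectorizationDelta(ToyCosts C, int Lanes) {
  return C.VectorOp - Lanes * C.ScalarOp;
}

constexpr bool isProfitable(ToyCosts C, int Lanes) {
  return vectorizationDelta(C, Lanes) < 0;
}
```

With a vector divide priced at 20 against four scalar divides at 1 each, the delta is +16 and the bundle is rejected. Pricing the same divide as a few shifts and an add makes the delta negative.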

Diff Detail

Event Timeline

karthikthecool updated this revision to Diff 12663. Aug 19 2014, 6:33 AM

karthikthecool retitled this revision to "Improve Cost model for SLPVectorizer when we have a vector division by power of 2".

karthikthecool updated this object.

karthikthecool edited the test plan for this revision.

karthikthecool added reviewers: nadav, aschwaighofer, hfinkel.

karthikthecool added a subscriber: Unknown Object (MLST).

spatel added a subscriber: spatel. Aug 19 2014, 10:13 AM

spatel added inline comments. Aug 19 2014, 10:34 AM

lib/Target/X86/X86TargetTransformInfo.cpp
231	udiv should become a logical shift right: "vpsrld" or "vpsrlw" with AVX2. With SSE, it's "psrld" or "psrlw" (just remove the leading 'v'). sdiv is handled with a sequence of logical shift right, add, arithmetic shift right. The cost should be the sum of those ops?

----- Original Message -----

From: "Sanjay Patel" <spatel@rotateright.com>
To: "kv bhat" <kv.bhat@samsung.com>, nrotem@apple.com, aschwaighofer@apple.com, hfinkel@anl.gov
Cc: spatel@rotateright.com, llvm-commits@cs.uiuc.edu
Sent: Tuesday, August 19, 2014 12:34:33 PM
Subject: Re: [PATCH] Improve Cost model for SLPVectorizer when we have a vector division by power of 2

Comment at: lib/Target/X86/X86TargetTransformInfo.cpp:208
@@ +207,3 @@
+ {ISD::SDIV, MVT::v16i16, 1}, // psraw instruction
+ {ISD::UDIV, MVT::v16i16, 1}, // psraw instruction
+ {ISD::SDIV, MVT::v8i32, 1}, // psrad instruction

udiv should become a logical shift right: "vpsrld" or "vpsrlw" with
AVX2. With SSE, it's "psrld" or "psrlw" (just remove the leading
'v').

sdiv is handled with a sequence of logical shift right, add, arithmetic
shift right. The cost should be the sum of those ops?

The costs are really about throughput. I'd imagine that it will only really be the sum if there is a single dependency chain, no uop fusion, etc. Likely best to measure it.

-Hal

http://reviews.llvm.org/D4971

Should there also be a corresponding change in the loop vectorizer?

include/llvm/Analysis/TargetTransformInfo.h
338	I wonder if this is the right design: This is really a subcase of the uniform constant case, and requires all implementations to handle the defaulting. We could have non-uniform values that are all powers of two. Having some optional feature bits seems better. I imagine we will want more special properties like this in the future.

Hi Hal, Sanjay,
Thanks for the inputs. I added support only for a uniform constant power of 2 because the x86 backend currently emits PSRAW/PSRAD successfully for that case (if I'm not wrong, this requires all the shifts to be the same, i.e. division by the same power of 2).

Non-uniform values that are all powers of 2 are currently not that profitable, as the backend doesn't emit, or doesn't have, instructions that can do this profitably.

Hal, I had a doubt: by feature bit, do we mean having something like a member variable, say isPowerof2, and setting it before calling getArithmeticInstrCost?
Please correct me if my understanding is wrong.

Thanks for all the help and for taking the time to review the patch.
I will update the patch shortly based on your inputs and post the updated patch.

Thanks
Karthik Bhat

----- Original Message -----

From: "Karthik Bhat" <kv.bhat@samsung.com>
To: "kv bhat" <kv.bhat@samsung.com>, nrotem@apple.com, aschwaighofer@apple.com, hfinkel@anl.gov
Cc: spatel@rotateright.com, llvm-commits@cs.uiuc.edu
Sent: Wednesday, August 20, 2014 12:18:05 PM
Subject: Re: [PATCH] Improve Cost model for SLPVectorizer when we have a vector division by power of 2

Hi Hal, Sanjay,
Thanks for the inputs. I added support only for a uniform constant
power of 2 because the x86 backend currently emits PSRAW/PSRAD
successfully for that case (if I'm not wrong, this requires all the
shifts to be the same, i.e. division by the same power of 2).

Non-uniform values that are all powers of 2 are currently not that
profitable, as the backend doesn't emit, or doesn't have, instructions
that can do this profitably.

Hal, I had a doubt: by feature bit, do we mean having something like a
member variable, say isPowerof2, and setting it before calling
getArithmeticInstrCost?
Please correct me if my understanding is wrong.

No; maybe we want to add something like:

  enum OperandValueProperties {
    OP_None = 0,
    OP_PowerOf2 = 1
  };

(this is really a bit field).

and then make getArithmeticInstrCost take (OperandValueKind Opd1Info, unsigned Prop1, OperandValueKind Opd2Info, OperandValueProperties Prop2)
instead of just Opd1Info and Opd2Info.

Then in the routines, you check for (Prop1 & OP_PowerOf2), etc.

-Hal

Thanks for all the help and for taking the time to review the patch.
I will update the patch shortly based on your inputs and post the
updated patch.

Thanks
Karthik Bhat

http://reviews.llvm.org/D4971
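
The bit-field idea suggested above can be sketched in isolation as follows. This is only an illustration under stated assumptions: the enum mirrors the proposed OperandValueProperties, but `OP_ExampleFuture`, `isPositivePowerOf2`, `classifyConstant`, and `hasProperty` are hypothetical names invented here and are not part of TargetTransformInfo.

```cpp
#include <cstdint>

// Stand-in for the proposed enum. OP_ExampleFuture is a made-up second bit,
// included only to show that further properties can be OR'ed in later.
enum OperandValueProperties : unsigned {
  OP_None = 0,
  OP_PowerOf2 = 1u << 0,
  OP_ExampleFuture = 1u << 1
};

// True only for values greater than zero with exactly one bit set.
constexpr bool isPositivePowerOf2(std::int64_t V) {
  return V > 0 && (V & (V - 1)) == 0;
}

// Derive the property bits for a constant operand (hypothetical helper).
constexpr unsigned classifyConstant(std::int64_t V) {
  return isPositivePowerOf2(V) ? OP_PowerOf2 : OP_None;
}

// The check a cost routine would perform: (Props & OP_PowerOf2).
constexpr bool hasProperty(unsigned Props, OperandValueProperties P) {
  return (Props & P) != 0;
}
```

Because the properties are tested with a mask rather than compared for equality, a caller that sets additional bits does not break an implementation that only looks at `OP_PowerOf2`.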

Very interesting. Not just X86, either. Filed http://llvm.org/bugs/show_bug.cgi?id=20714 for the AArch64 equivalent.

Hi Hal, Sanjay,
Updated the patch as per review comments. The modifications are:

1. Added a feature bit, as per review comments, to check whether we have a uniform constant power of 2. Currently only a uniform constant power of 2 is handled. I will look further into non-uniform constants that are all powers of 2 and come up with another patch.
2. Updated the loop vectorizer's cost model to handle the same situation.
3. Updated the cost table comments and values in X86TargetTransformInfo.

Please let me know if this looks good to commit. Thanks for your time and valuable inputs.

Regards
Karthik Bhat

Herald added a subscriber: mcrosier. Aug 21 2014, 1:45 AM

This LGTM (but please also get an okay from someone familiar with the X86 vector instructions).

lib/Transforms/Vectorize/LoopVectorize.cpp
5855	This can just be: if (SplatValue) {

Thanks Hal for the review. Updated the patch.
Nadav, Arnold, any inputs on this one? Shall I go ahead with the commit?
Thanks
Karthik Bhat

Hi Karthik,

sorry for the long post.
I have a few comments related to your patch.

I think your cost tables are incomplete. For example (correct me if I am wrong), the cost of a UDIV/SDIV by a constant splat of powers-of-2 is exactly the same on AVX2 and AVX (and on SSE as well) if we exclude the legalization cost, which may be different for SSE.

Also, the cost of an SDIV would be 4 and not 3:
on AVX for example, you would get a vpsrad + vpsrld + vpaddd + vpsrad.
You would get an extra arithmetic shift at the beginning because you would need to propagate the sign bit. Under the assumption that we are dividing by a non-negative power-of-2 then we are done (i.e. the cost is 4); otherwise we would pay an extra ISD::SUB to negate the result. In your case you don't need to account for the extra ISD::SUB because method APInt::isPowerOf2() would only return true for values that are > 0.

That said, if possible I suggest to reuse the existing cost tables as much as possible rather than defining new tables (see comments below).

I hope this makes sense to you.
-Andrea
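
As a scalar illustration of the four-operation expansion described above (arithmetic shift, logical shift, add, arithmetic shift), here is a minimal sketch. `sdivPow2` is a hypothetical helper, not LLVM code, and it assumes that `>>` on a negative `int32_t` is an arithmetic shift, which is guaranteed from C++20 and holds on mainstream compilers before that.

```cpp
#include <cassert>
#include <cstdint>

// Signed division of X by 2^K for 1 <= K <= 31, rounding toward zero,
// written as SRA + SRL + ADD + SRA.
inline std::int32_t sdivPow2(std::int32_t X, unsigned K) {
  assert(K >= 1 && K <= 31 && "shift amount out of range");
  // SRA: replicate the sign bit, giving 0 for X >= 0 and -1 for X < 0.
  std::int32_t Sign = X >> 31;
  // SRL: keep the low K bits of the mask, i.e. 2^K - 1 for negative X, else 0.
  std::uint32_t Bias = static_cast<std::uint32_t>(Sign) >> (32 - K);
  // ADD: bias negative inputs so the final shift rounds toward zero.
  // The unsigned add avoids signed overflow; the cast back is modular.
  std::int32_t Adjusted =
      static_cast<std::int32_t>(static_cast<std::uint32_t>(X) + Bias);
  // SRA: the actual division.
  return Adjusted >> K;
}
```

For K == 1 the bias is just the sign bit, so a logical shift of X by 31 produces it directly and the first shift is not needed. That is consistent with the three-instruction sequence (vpsrld, vpaddd, vpsrad) reported later in the thread for division by 2.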

lib/Target/X86/X86TargetTransformInfo.cpp
189–212	Here you can add the following check:

  // Unsigned divisions by powers-of-2 are always reduced to SRL.
  if (ISD == ISD::UDIV &&
      Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
      Opd2PropInfo == TargetTransformInfo::OP_PowerOf2)
    ISD = ISD::SRL;

I had a quick look at the code and I am pretty sure that this is what we do for every target. This will allow you to get rid of all your new entries for UDIV in the new cost tables you added for powers-of-2.

In the case of SDIV, each target may provide its own custom implementation of `sdiv x, pow2`. On X86 we don't override method 'BuildSDIVPow2' and therefore we fall back to the standard expansion of signed divisions by pow2; as a result, we introduce the sequence SRA + SRL + ADD + SRA.

  if (ISD == ISD::SDIV &&
      Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
      Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
    // On X86, vector signed divisions by constant powers-of-two are
    // normally expanded to the sequence SRA + SRL + ADD + SRA.
    unsigned Cost = 2 * getArithmeticInstrCost(ISD::SRA, Ty, Op1Info, Op2Info,
                                               Opd1PropInfo, Opd2PropInfo);
    Cost += getArithmeticInstrCost(ISD::SRL, Ty, Op1Info, Op2Info,
                                   Opd1PropInfo, Opd2PropInfo);
    Cost += getArithmeticInstrCost(ISD::ADD, Ty, Op1Info, Op2Info,
                                   Opd1PropInfo, Opd2PropInfo);
    return Cost;
  }

This will allow you to reuse the existing cost tables without having to add extra cost tables to handle the special case of divisions by powers-of-2.

As a side note: in the future, we can think of removing the check for TargetTransformInfo::OK_UniformConstantValue to also address non-uniform constants as suggested by Hal. For example, on AVX2, UDIV by non-uniform constant powers-of-2 could be emitted as a single vpsrlvd. That is because we can take advantage of vector shifts with variable count (which we don't have on SSE/AVX). Unfortunately, this is not happening at the moment and we end up scalarizing the logical shifts instead. This is why we still need that check for now.

Example:

  %div = udiv <4 x i32> %A, <i32 4, i32 8, i32 16, i32 4>

could be strength reduced to:

  %div = lshr <4 x i32> %A, <i32 2, i32 3, i32 4, i32 2>

On AVX2, that logical shift right would be emitted as a single vpsrlvd. Unfortunately this is not happening at the moment and we get a long sequence of vpextrd + shrl + vpinsrd. Similarly, on AVX2 we could strongly optimize the SDIV expansion by non-uniform constant powers-of-2 (this, again, is not happening at the moment).
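
The udiv-to-lshr rewrite in the example above can be modelled lane by lane in plain C++. This is a sketch for illustration only: `Vec4`, `isPowerOf2`, `allPowersOf2`, `log2Pow2`, and `udivAsLshr` are hypothetical names, and the loop stands in for what a single variable-count vector shift would do.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<std::uint32_t, 4>;

inline bool isPowerOf2(std::uint32_t V) { return V != 0 && (V & (V - 1)) == 0; }

// True when every lane of the divisor is a power of two (not necessarily
// the same one), i.e. the non-uniform case discussed in the comment.
inline bool allPowersOf2(const Vec4 &D) {
  for (std::uint32_t V : D)
    if (!isPowerOf2(V))
      return false;
  return true;
}

// log2 of a value already known to be a power of two.
inline unsigned log2Pow2(std::uint32_t D) {
  unsigned N = 0;
  while (D > 1) {
    D >>= 1;
    ++N;
  }
  return N;
}

// Model of "udiv A, D" rewritten as "lshr A, log2(D)" with per-lane counts.
inline Vec4 udivAsLshr(const Vec4 &A, const Vec4 &D) {
  assert(allPowersOf2(D) && "rewrite is only valid for power-of-2 divisors");
  Vec4 R{};
  for (std::size_t I = 0; I < R.size(); ++I)
    R[I] = A[I] >> log2Pow2(D[I]);
  return R;
}
```

With divisors <4, 8, 16, 4> the per-lane shift counts come out as <2, 3, 4, 2>, matching the IR above.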

Hi Andrea,
Thanks for the inputs. Yes, for UDIV the cost can be reduced to that of SRL.
For SDIV, though, I checked the code below with gcc and clang, and we seem to get 3 extra instructions (vpsrld, vpaddd, vpsrad). Please correct me if I'm wrong.

void f(int* restrict a,int* restrict b,int* restrict c) {
  int i;
  for(i=0;i<4;i=i+4) {
    a[i] = (b[i]+c[i])/2;
    a[i+1] = (b[i+1]+c[i+1])/2;
    a[i+2] = (b[i+2]+c[i+2])/2;
    a[i+3] = (b[i+3]+c[i+3])/2;
  }
}

compiled with clang -O3 -mavx2 test.c -S -o test.s

#BB#0:                                 # %entry
vmovdqu	(%rdx), %xmm0
vpaddd	(%rsi), %xmm0, %xmm0
vpsrld	$31, %xmm0, %xmm1
vpaddd	%xmm1, %xmm0, %xmm0
vpsrad	$1, %xmm0, %xmm0
vmovdqu	%xmm0, (%rdi)
retq

I agree that reusing the existing table is a good option. I will update the patch accordingly.

Yes, we can vectorize division by non-uniform constant powers of 2 on AVX2, but I suppose it will be using vpsrlvd and vpsravd?
Thanks all for helping me out here.

Regards
Karthik Bhat

Hi Andrea,
Thanks for clarifying my doubts. Updated the patch as per review comments and ran clang-format.
I will try to handle vectorization of division by non-uniform constant powers of 2 separately in another patch.
Please let me know your opinion on the same.

Thanks and Regards
Karthik Bhat

I have two minor comments, otherwise the patch LGTM!
(see comments below).

Thanks,
Andrea

lib/Target/X86/X86TargetTransformInfo.cpp
222	I think we also have to explicitly set Opd2PropInfo to 'TargetTransformInfo::OP_None'. The shift count of the resulting shift is unlikely to be a power-of-2.
230–235	Now that I think about it.. it is wrong to pass the old 'Opd1PropInfo' and 'Opd2PropInfo' to those new calls to 'getArithmeticInstrCost' (sorry, that code came from my original suggestion...). The expanded shifts will have different shift counts which may or may not be powers-of-2. The Add will have different operands, so we cannot reuse the operand properties of the original UDIV/SDIV. To be conservative here, we have to use the default values for Opd1PropInfo and Opd2PropInfo (OP_None) in each call to 'getArithmeticInstrCost'.
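
A toy model of the corrected cost composition may help here. It is a sketch under stated assumptions, not the X86 implementation: `ToyOp`, `baseCost`, `toyCost`, and all numeric costs are hypothetical. The point it shows is that the expanded shifts and add are queried with OP_None instead of inheriting the properties of the original SDIV.

```cpp
// Stand-ins for the TargetTransformInfo enums discussed in this review.
enum OperandValueKind {
  OK_AnyValue,
  OK_UniformValue,
  OK_UniformConstantValue,
  OK_NonUniformConstantValue
};
enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };

// Hypothetical opcode set and made-up base costs.
enum class ToyOp { Add, LShr, AShr, SDiv };

constexpr unsigned baseCost(ToyOp Op) { return Op == ToyOp::SDiv ? 20 : 1; }

// SDIV by a uniform constant power of 2 is priced as SRA + SRL + ADD + SRA.
// Each nested query passes OP_None, because the shift counts and add
// operands of the expansion are different values from the original divisor.
constexpr unsigned toyCost(ToyOp Op, OperandValueKind Op2Info = OK_AnyValue,
                           OperandValueProperties Opd2PropInfo = OP_None) {
  return (Op == ToyOp::SDiv && Op2Info == OK_UniformConstantValue &&
          (Opd2PropInfo & OP_PowerOf2) != 0)
             ? 2 * toyCost(ToyOp::AShr, Op2Info, OP_None) +
                   toyCost(ToyOp::LShr, Op2Info, OP_None) +
                   toyCost(ToyOp::Add, Op2Info, OP_None)
             : baseCost(Op);
}
```

With these made-up numbers a power-of-2 SDIV costs 4 instead of 20, while a division by an arbitrary uniform constant keeps the expensive default.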

This revision is now accepted and ready to land. Aug 21 2014, 8:21 AM

spatel added inline comments. Aug 21 2014, 8:29 AM

lib/Target/X86/X86TargetTransformInfo.cpp
245	Nit: the changes to the bracket spacing are not consistent with the rest of the file. Can you adjust all of those in a later checkin so this patch is minimal?

Hi all -
If we recognize that power of 2 integer division is always converted to one or more simple ops (shifts, adds, subs) for all architectures via DAGCombiner, then can we hoist the changes out of X86TargetTransformInfo and into the superclass TargetTransformInfo so we don't have to repeat this logic for every target?

In D4971#28, @spatel wrote:

Hi all -
If we recognize that power of 2 integer division is always converted to one or more simple ops (shifts, adds, subs) for all architectures via DAGCombiner, then can we hoist the changes out of X86TargetTransformInfo and into the superclass TargetTransformInfo so we don't have to repeat this logic for every target?

We can safely do this only for UDIV. UDIV by pow-2 is always converted to a SRL. This is true for all targets.

However, we cannot guarantee that SDIV is treated the same way by all targets.
How SDIV gets expanded in the backend is always target specific.
By default, SDIV is expanded into a sequence of shifts+adds (this is the behavior on X86). Other targets may not implement that same default behavior.
For example, AArch64 custom expands SDIV by Pow2 in a different way (see AArch64TargetLowering::BuildSDIVPow2).

Also, some targets may want to define TLI.isPow2DivCheap... so, as you can see, the problem is complicated.

In D4971#29, @andreadb wrote:

We can safely do this only for UDIV. UDIV by pow-2 is always converted to a SRL. This is true for all targets.

Does it make sense to split this patch into 2 pieces then? One that handles UDIV universally, and then a follow-on for SDIV. I was going to suggest additional test cases for each op anyway. :)

However, we cannot guarantee that SDIV is treated the same way by all targets.
How SDIV gets expanded in the backend is always target specific.
By default, SDIV is expanded into a sequence of shifts+adds (this is the behavior on X86). Other targets may not implement that same default behavior.
For example, AArch64 custom expands SDIV by Pow2 in a different way (see AArch64TargetLowering::BuildSDIVPow2).

Right - Aarch64 has extra goodness via the rounding constant in "usra"; this is shown in the bug ( http://llvm.org/bugs/show_bug.cgi?id=20714 ) that Jim filed.

But we could still have a conservative upper cost bound for all targets? As you noted, division by exactly "2" costs one inst less, and there's one more instruction if we change this code to handle negative divisors, so we're not getting an exact cost value in any case.

Also, some targets may want to define TLI.isPow2DivCheap... so, as you can see, the problem is complicated.

I think PPC is the only arch that sets it (which seems like a bug to me, but I'm probably missing the reason; the PPC backend turns scalar signed int pow2div into sra/addze anyway).

I assume the intent of that flag is to say that the HW itself recognizes pow2div (signed or unsigned?) and can do it just as fast as a shift. But I'm not aware of any vector ISA that even includes an integer division instruction.

In D4971#30, @spatel wrote:

I assume the intent of that flag is to say that the HW itself recognizes pow2div (signed or unsigned?) and can do it just as fast as a shift.
But I'm not aware of any vector ISA that even includes an integer division instruction.

Let me try to make my point here clearer: if there's no vector integer division instruction to use, then I think the value of isPow2DivCheap() is irrelevant; we have to implement division using shift(s) anyway.

I don't know if checking for the existence of a vector division instruction is available at this level of LLVM, so the point may be moot...or that could be added as another target feature flag?

On Thu, Aug 21, 2014 at 11:16 AM, Andrea Di Biagio <andrea.dibiagio@gmail.com> wrote:

I think it makes perfect sense :-).
I have a (maybe stupid) question: do we really have to worry about the
case of UDIV by powers-of-2 in the vectorizer?
I am asking this because the optimizer would always convert a UDIV by
powers-of-2 into a SRL. So, by the time we run the vectorizer, all the
foldable UDIVs by power-of-2 have already been optimized into SRL.

That's an excellent question...
I converted both of the current test cases in the patch to 'udiv', and we already get vectorized 'lshr' with opt -O2. :)

Whether we need extra code down in TargetTransformInfo for UDIV as a safeguard, I don't know.

Hi Andrea, Sanjay,
It seems I missed out on some interesting discussion yesterday.
Updating the patch to address Andrea's and Sanjay's comments.

Interestingly, yes, UDIV by a power of 2 is actually converted into SRL before it reaches the vectorization code, so I think checking for UDIV with a power of 2 is as good as dead code here. Removed the same.

Opd1PropInfo and Opd2PropInfo don't actually affect the cost of Instruction::AShr, Instruction::LShr, or Instruction::Add, as they are used only by SDIV. Having said that, I agree it is incorrect to pass the old properties while getting the cost, even though they are not used as of now. So I updated it to use OP_None.

Updated the formatting as per Sanjay's comments.
Since we are now only checking for signed division by a power of 2, the existing test cases should suffice?

Thanks again for your interest in the patch. Does this now look good to commit?

Thanks and Regards
Karthik Bhat

Patch LGTM. Thanks!

spatel added inline comments. Aug 22 2014, 7:53 AM

test/Transforms/SLPVectorizer/X86/powof2div.ll
9	Does this test case require 3 operands and adds, or can it be simplified to just do the sdivs? Please also use CHECK-LABEL for both test cases to make it easier for future additions in each file. If the output is entirely predictable, use CHECK-NEXT's and specify each instruction including the 'ret'.

Hi Karthik,
We haven't answered the biggest question that I have about this patch:
Can we supply a default cost calculation for SDIV that uses the sra/srl/add/sra sequence that is generated by default by DAGCombiner? If the answer is yes, then all targets will benefit. There should be a way to tie the cost calculation to whatever is implemented in "BuildSDIVPow2()" - so if a target is overriding that, they can also override the default cost calculation if they'd like.

Since you already have LGTM from the other reviewers, I won't hold up this patch, but please add a FIXME comment somewhere if you commit this only for x86 or file a bug so we don't lose track of the issue.

spatel added inline comments. Aug 24 2014, 3:57 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
5848	Replace isa<> and cast<> with a dyn_cast<>? http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates

Hi All,
Updated patch as per Sanjay's comment and submitted as r216371.

I think the reason we have different cost tables for different targets is that it may not be possible to come up with a common upper bound that is profitable for all targets.
For example, the sra/srl/add/sra sequence may be the upper bound for X86, but this may or may not be true for other targets.
One more advantage I see in having target-specific cost tables is that they are easily extendable in case a new instruction is added to support some feature.

I'm not that familiar with DAGCombiner as of now; I will have a look at it and get back on this.
Thanks and Regards
Karthik Bhat

Hi Karthik -
Thanks for adding the FIXME comment.
Your patch did not address the inline comments I made regarding the test case for the SLP vectorizer and using dyn_cast<>. Please update.

Revision Contents

Path                                                  Size
include/llvm/Analysis/TargetTransformInfo.h           12 lines
lib/Analysis/TargetTransformInfo.cpp                  14 lines
lib/CodeGen/BasicTargetTransformInfo.cpp              8 lines
lib/Target/AArch64/AArch64TargetTransformInfo.cpp     20 lines
lib/Target/ARM/ARMTargetTransformInfo.cpp             20 lines
lib/Target/PowerPC/PPCTargetTransformInfo.cpp         18 lines
lib/Target/X86/X86TargetTransformInfo.cpp             31 lines
lib/Transforms/Vectorize/LoopVectorize.cpp            21 lines
lib/Transforms/Vectorize/SLPVectorizer.cpp            16 lines
test/Transforms/LoopVectorize/X86/powof2div.ll        31 lines
test/Transforms/SLPVectorizer/X86/powof2div.ll        42 lines

Diff 12832

include/llvm/Analysis/TargetTransformInfo.h

[...]
 public:
   /// \brief Additional information about an operand's possible values.
   enum OperandValueKind {
     OK_AnyValue,               // Operand can have any value.
     OK_UniformValue,           // Operand is uniform (splat of a value).
     OK_UniformConstantValue,   // Operand is uniform constant.
     OK_NonUniformConstantValue // Operand is a non uniform constant value.
   };

+  /// \brief Additional properties of an operand's values.
+  enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };
+
   /// \return The number of scalar or vector registers that the target has.
   /// If 'Vectors' is true, it returns the number of vector registers. If it is
   /// set to false, it returns the number of scalar registers.
   virtual unsigned getNumberOfRegisters(bool Vector) const;

   /// \return The width of the largest scalar or vector register type.
   virtual unsigned getRegisterBitWidth(bool Vector) const;

   /// \return The maximum unroll factor that the vectorizer should try to
   /// perform for this target. This number depends on the level of parallelism
   /// and the number of execution units in the CPU.
   virtual unsigned getMaximumUnrollFactor() const;

   /// \return The expected cost of arithmetic ops, such as mul, xor, fsub, etc.
-  virtual unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,
-                                  OperandValueKind Opd1Info = OK_AnyValue,
-                                  OperandValueKind Opd2Info = OK_AnyValue) const;
+  virtual unsigned
+  getArithmeticInstrCost(unsigned Opcode, Type *Ty,
+                         OperandValueKind Opd1Info = OK_AnyValue,
+                         OperandValueKind Opd2Info = OK_AnyValue,
+                         OperandValueProperties Opd1PropInfo = OP_None,
+                         OperandValueProperties Opd2PropInfo = OP_None) const;

   /// \return The cost of a shuffle instruction of kind Kind and of type Tp.
   /// The index and subtype parameters are used by the subvector insertion and
   /// extraction shuffle kinds.
   virtual unsigned getShuffleCost(ShuffleKind Kind, Type *Tp, int Index = 0,
                                   Type *SubTp = nullptr) const;

   /// \return The expected cost of cast instructions, such as bitcast, trunc,
[...]

lib/Analysis/TargetTransformInfo.cpp

[...]
 unsigned TargetTransformInfo::getRegisterBitWidth(bool Vector) const {
   return PrevTTI->getRegisterBitWidth(Vector);
 }

 unsigned TargetTransformInfo::getMaximumUnrollFactor() const {
   return PrevTTI->getMaximumUnrollFactor();
 }

-unsigned TargetTransformInfo::getArithmeticInstrCost(unsigned Opcode,
-                                                Type *Ty,
-                                                OperandValueKind Op1Info,
-                                                OperandValueKind Op2Info) const {
-  return PrevTTI->getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info);
+unsigned TargetTransformInfo::getArithmeticInstrCost(
+    unsigned Opcode, Type *Ty, OperandValueKind Op1Info,
+    OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo,
+    OperandValueProperties Opd2PropInfo) const {
+  return PrevTTI->getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info,
+                                         Opd1PropInfo, Opd2PropInfo);
 }

 unsigned TargetTransformInfo::getShuffleCost(ShuffleKind Kind, Type *Tp,
                                              int Index, Type *SubTp) const {
   return PrevTTI->getShuffleCost(Kind, Tp, Index, SubTp);
 }

 unsigned TargetTransformInfo::getCastInstrCost(unsigned Opcode, Type *Dst,
[...]
   unsigned getRegisterBitWidth(bool Vector) const override {
     return 32;
   }

   unsigned getMaximumUnrollFactor() const override {
     return 1;
   }

   unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind,
-                                  OperandValueKind) const override {
+                                  OperandValueKind, OperandValueProperties,
+                                  OperandValueProperties) const override {
     return 1;
   }

   unsigned getShuffleCost(ShuffleKind Kind, Type *Ty,
                           int Index = 0, Type *SubTp = nullptr) const override {
     return 1;
   }

[...]

lib/CodeGen/BasicTargetTransformInfo.cpp

[...]
 public:

   /// \name Vector TTI Implementations
   /// @{

   unsigned getNumberOfRegisters(bool Vector) const override;
   unsigned getMaximumUnrollFactor() const override;
   unsigned getRegisterBitWidth(bool Vector) const override;
   unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind,
-                                  OperandValueKind) const override;
+                                  OperandValueKind, OperandValueProperties,
+                                  OperandValueProperties) const override;
   unsigned getShuffleCost(ShuffleKind Kind, Type *Tp,
                           int Index, Type *SubTp) const override;
   unsigned getCastInstrCost(unsigned Opcode, Type *Dst,
                             Type *Src) const override;
   unsigned getCFInstrCost(unsigned Opcode) const override;
   unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
                               Type *CondTy) const override;
   unsigned getVectorInstrCost(unsigned Opcode, Type *Val,
[...]
 unsigned BasicTTI::getRegisterBitWidth(bool Vector) const {
   return 32;
 }

 unsigned BasicTTI::getMaximumUnrollFactor() const {
   return 1;
 }

 unsigned BasicTTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty,
-                                          OperandValueKind,
-                                          OperandValueKind) const {
+                                          OperandValueKind, OperandValueKind,
+                                          OperandValueProperties,
+                                          OperandValueProperties) const {
   // Check if any of the operands are vector operands.
   const TargetLoweringBase *TLI = getTLI();
   int ISD = TLI->InstructionOpcodeToISD(Opcode);
   assert(ISD && "Invalid opcode");

   std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);

   bool IsFloat = Ty->getScalarType()->isFloatingPointTy();
[...]

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

Show First 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	public:
unsigned getMaximumUnrollFactor() const override;		unsigned getMaximumUnrollFactor() const override;

unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const		unsigned getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const
override;		override;

unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) const		unsigned getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) const
override;		override;

unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,		unsigned getArithmeticInstrCost(
OperandValueKind Opd1Info = OK_AnyValue,		unsigned Opcode, Type *Ty, OperandValueKind Opd1Info = OK_AnyValue,
OperandValueKind Opd2Info = OK_AnyValue) const		OperandValueKind Opd2Info = OK_AnyValue,
override;		OperandValueProperties Opd1PropInfo = OP_None,
		OperandValueProperties Opd2PropInfo = OP_None) const override;

unsigned getAddressComputationCost(Type *Ty, bool IsComplex) const override;		unsigned getAddressComputationCost(Type *Ty, bool IsComplex) const override;

unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) const		unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) const
override;		override;

unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,		unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
unsigned AddressSpace) const override;		unsigned AddressSpace) const override;
[271 lines not shown]	if (Index != -1U) {
if (Index == 0)		if (Index == 0)
return 0;		return 0;
}		}

// All other insert/extracts cost this much.		// All other insert/extracts cost this much.
return 2;		return 2;
}		}

unsigned AArch64TTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty,		unsigned AArch64TTI::getArithmeticInstrCost(
OperandValueKind Opd1Info,		unsigned Opcode, Type *Ty, OperandValueKind Opd1Info,
OperandValueKind Opd2Info) const {		OperandValueKind Opd2Info, OperandValueProperties Opd1PropInfo,
		OperandValueProperties Opd2PropInfo) const {
// Legalize the type.		// Legalize the type.
std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);		std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);

int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);

switch (ISD) {		switch (ISD) {
default:		default:
return TargetTransformInfo::getArithmeticInstrCost(Opcode, Ty, Opd1Info,		return TargetTransformInfo::getArithmeticInstrCost(
Opd2Info);		Opcode, Ty, Opd1Info, Opd2Info, Opd1PropInfo, Opd2PropInfo);
case ISD::ADD:		case ISD::ADD:
case ISD::MUL:		case ISD::MUL:
case ISD::XOR:		case ISD::XOR:
case ISD::OR:		case ISD::OR:
case ISD::AND:		case ISD::AND:
// These nodes are marked as 'custom' for combining purposes only.		// These nodes are marked as 'custom' for combining purposes only.
// We know that they are legal. See LowerAdd in ISelLowering.		// We know that they are legal. See LowerAdd in ISelLowering.
return 1 * LT.first;		return 1 * LT.first;
[96 lines not shown]

lib/Target/ARM/ARMTargetTransformInfo.cpp

[120 lines not shown]	unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
Type *CondTy) const override;		Type *CondTy) const override;

unsigned getVectorInstrCost(unsigned Opcode, Type *Val,		unsigned getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) const override;		unsigned Index) const override;

unsigned getAddressComputationCost(Type *Val,		unsigned getAddressComputationCost(Type *Val,
bool IsComplex) const override;		bool IsComplex) const override;

unsigned		unsigned getArithmeticInstrCost(
getArithmeticInstrCost(unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty, OperandValueKind Op1Info = OK_AnyValue,
OperandValueKind Op1Info = OK_AnyValue,		OperandValueKind Op2Info = OK_AnyValue,
OperandValueKind Op2Info = OK_AnyValue) const override;		OperandValueProperties Opd1PropInfo = OP_None,
		OperandValueProperties Opd2PropInfo = OP_None) const override;

unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,		unsigned getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
unsigned AddressSpace) const override;		unsigned AddressSpace) const override;
/// @}		/// @}
};		};

} // end anonymous namespace		} // end anonymous namespace

[351 lines not shown]	int Idx =
CostTableLookup(NEONAltShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second);		CostTableLookup(NEONAltShuffleTbl, ISD::VECTOR_SHUFFLE, LT.second);
if (Idx == -1)		if (Idx == -1)
return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);		return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);
return LT.first * NEONAltShuffleTbl[Idx].Cost;		return LT.first * NEONAltShuffleTbl[Idx].Cost;
}		}
return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);		return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);
}		}

unsigned ARMTTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty,		unsigned ARMTTI::getArithmeticInstrCost(
OperandValueKind Op1Info,		unsigned Opcode, Type *Ty, OperandValueKind Op1Info,
OperandValueKind Op2Info) const {		OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo,
		OperandValueProperties Opd2PropInfo) const {

int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);		int ISDOpcode = TLI->InstructionOpcodeToISD(Opcode);
std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);		std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);

const unsigned FunctionCallDivCost = 20;		const unsigned FunctionCallDivCost = 20;
const unsigned ReciprocalDivCost = 10;		const unsigned ReciprocalDivCost = 10;
static const CostTblEntry<MVT::SimpleValueType> CostTbl[] = {		static const CostTblEntry<MVT::SimpleValueType> CostTbl[] = {
// Division.		// Division.
[39 lines not shown]	unsigned ARMTTI::getArithmeticInstrCost(
int Idx = -1;		int Idx = -1;

if (ST->hasNEON())		if (ST->hasNEON())
Idx = CostTableLookup(CostTbl, ISDOpcode, LT.second);		Idx = CostTableLookup(CostTbl, ISDOpcode, LT.second);

if (Idx != -1)		if (Idx != -1)
return LT.first * CostTbl[Idx].Cost;		return LT.first * CostTbl[Idx].Cost;

unsigned Cost =		unsigned Cost = TargetTransformInfo::getArithmeticInstrCost(
TargetTransformInfo::getArithmeticInstrCost(Opcode, Ty, Op1Info, Op2Info);		Opcode, Ty, Op1Info, Op2Info, Opd1PropInfo, Opd2PropInfo);

// This is somewhat of a hack. The problem that we are facing is that SROA		// This is somewhat of a hack. The problem that we are facing is that SROA
// creates a sequence of shift, and, or instructions to construct values.		// creates a sequence of shift, and, or instructions to construct values.
// These sequences are recognized by the ISel and have zero-cost. Not so for		// These sequences are recognized by the ISel and have zero-cost. Not so for
// the vectorized code. Because we have support for v2i64 but not i64 those		// the vectorized code. Because we have support for v2i64 but not i64 those
// sequences look particularly beneficial to vectorize.		// sequences look particularly beneficial to vectorize.
// To work around this we increase the cost of v2i64 operations to make them		// To work around this we increase the cost of v2i64 operations to make them
// seem less beneficial.		// seem less beneficial.
[19 lines not shown]

lib/Target/PowerPC/PPCTargetTransformInfo.cpp

[86 lines not shown]	public:
/// @}		/// @}

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

virtual unsigned getNumberOfRegisters(bool Vector) const override;		virtual unsigned getNumberOfRegisters(bool Vector) const override;
virtual unsigned getRegisterBitWidth(bool Vector) const override;		virtual unsigned getRegisterBitWidth(bool Vector) const override;
virtual unsigned getMaximumUnrollFactor() const override;		virtual unsigned getMaximumUnrollFactor() const override;
virtual unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty,		virtual unsigned
OperandValueKind,		getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind,
OperandValueKind) const override;		OperandValueKind, OperandValueProperties,
		OperandValueProperties) const override;
virtual unsigned getShuffleCost(ShuffleKind Kind, Type *Tp,		virtual unsigned getShuffleCost(ShuffleKind Kind, Type *Tp,
int Index, Type *SubTp) const override;		int Index, Type *SubTp) const override;
virtual unsigned getCastInstrCost(unsigned Opcode, Type *Dst,		virtual unsigned getCastInstrCost(unsigned Opcode, Type *Dst,
Type *Src) const override;		Type *Src) const override;
virtual unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,		virtual unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
Type *CondTy) const override;		Type *CondTy) const override;
virtual unsigned getVectorInstrCost(unsigned Opcode, Type *Val,		virtual unsigned getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) const override;		unsigned Index) const override;
[207 lines not shown]	unsigned PPCTTI::getMaximumUnrollFactor() const {
if (Directive == PPC::DIR_E500mc || Directive == PPC::DIR_E5500)		if (Directive == PPC::DIR_E500mc || Directive == PPC::DIR_E5500)
return 1;		return 1;

// For most things, modern systems have two execution units (and		// For most things, modern systems have two execution units (and
// out-of-order execution).		// out-of-order execution).
return 2;		return 2;
}		}

unsigned PPCTTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty,		unsigned PPCTTI::getArithmeticInstrCost(
OperandValueKind Op1Info,		unsigned Opcode, Type *Ty, OperandValueKind Op1Info,
OperandValueKind Op2Info) const {		OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo,
		OperandValueProperties Opd2PropInfo) const {
assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");		assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");

// Fallback to the default implementation.		// Fallback to the default implementation.
return TargetTransformInfo::getArithmeticInstrCost(Opcode, Ty, Op1Info,		return TargetTransformInfo::getArithmeticInstrCost(
Op2Info);		Opcode, Ty, Op1Info, Op2Info, Opd1PropInfo, Opd2PropInfo);
}		}

unsigned PPCTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index,		unsigned PPCTTI::getShuffleCost(ShuffleKind Kind, Type *Tp, int Index,
Type *SubTp) const {		Type *SubTp) const {
return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);		return TargetTransformInfo::getShuffleCost(Kind, Tp, Index, SubTp);
}		}

unsigned PPCTTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const {		unsigned PPCTTI::getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src) const {
[84 lines not shown]

lib/Target/X86/X86TargetTransformInfo.cpp

[78 lines not shown]	public:

/// \name Vector TTI Implementations		/// \name Vector TTI Implementations
/// @{		/// @{

unsigned getNumberOfRegisters(bool Vector) const override;		unsigned getNumberOfRegisters(bool Vector) const override;
unsigned getRegisterBitWidth(bool Vector) const override;		unsigned getRegisterBitWidth(bool Vector) const override;
unsigned getMaximumUnrollFactor() const override;		unsigned getMaximumUnrollFactor() const override;
unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind,		unsigned getArithmeticInstrCost(unsigned Opcode, Type *Ty, OperandValueKind,
OperandValueKind) const override;		OperandValueKind, OperandValueProperties,
		OperandValueProperties) const override;
unsigned getShuffleCost(ShuffleKind Kind, Type *Tp,		unsigned getShuffleCost(ShuffleKind Kind, Type *Tp,
int Index, Type *SubTp) const override;		int Index, Type *SubTp) const override;
unsigned getCastInstrCost(unsigned Opcode, Type *Dst,		unsigned getCastInstrCost(unsigned Opcode, Type *Dst,
Type *Src) const override;		Type *Src) const override;
unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,		unsigned getCmpSelInstrCost(unsigned Opcode, Type *ValTy,
Type *CondTy) const override;		Type *CondTy) const override;
unsigned getVectorInstrCost(unsigned Opcode, Type *Val,		unsigned getVectorInstrCost(unsigned Opcode, Type *Val,
unsigned Index) const override;		unsigned Index) const override;
[77 lines not shown]	unsigned X86TTI::getMaximumUnrollFactor() const {
// Sandybridge and Haswell have multiple execution ports and pipelined		// Sandybridge and Haswell have multiple execution ports and pipelined
// vector units.		// vector units.
if (ST->hasAVX())		if (ST->hasAVX())
return 4;		return 4;

return 2;		return 2;
}		}

unsigned X86TTI::getArithmeticInstrCost(unsigned Opcode, Type *Ty,		unsigned X86TTI::getArithmeticInstrCost(
OperandValueKind Op1Info,		unsigned Opcode, Type *Ty, OperandValueKind Op1Info,
OperandValueKind Op2Info) const {		OperandValueKind Op2Info, OperandValueProperties Opd1PropInfo,
		OperandValueProperties Opd2PropInfo) const {
// Legalize the type.		// Legalize the type.
std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);		std::pair<unsigned, MVT> LT = TLI->getTypeLegalizationCost(Ty);

int ISD = TLI->InstructionOpcodeToISD(Opcode);		int ISD = TLI->InstructionOpcodeToISD(Opcode);
assert(ISD && "Invalid opcode");		assert(ISD && "Invalid opcode");

		if (ISD == ISD::SDIV &&
		Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
		Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
		// On X86, vector signed divisions by a power-of-two constant are
		// normally expanded to the sequence SRA + SRL + ADD + SRA.
		// The OperandValue properties may not be the same as those of the
		// original operation; conservatively assume OP_None.
		unsigned Cost =
		2 * getArithmeticInstrCost(Instruction::AShr, Ty, Op1Info, Op2Info,
		TargetTransformInfo::OP_None,
		TargetTransformInfo::OP_None);
		Cost += getArithmeticInstrCost(Instruction::LShr, Ty, Op1Info, Op2Info,
		TargetTransformInfo::OP_None,
		TargetTransformInfo::OP_None);
		Cost += getArithmeticInstrCost(Instruction::Add, Ty, Op1Info, Op2Info,
		TargetTransformInfo::OP_None,
		TargetTransformInfo::OP_None);

		return Cost;
		}

		andreadb (inline comment): Here you can add the following check:

		  // Unsigned divisions by powers-of-2 are always reduced to SRL.
		  if (ISD == ISD::UDIV &&
		      Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
		      Opd2PropInfo == TargetTransformInfo::OP_PowerOf2)
		    ISD = ISD::SRL;

		I had a quick look at the code and I am pretty sure that this is what we do for every target. This will allow you to get rid of all the new UDIV entries in the cost tables you added for powers-of-2.

		In the case of SDIV, each target may provide its own custom implementation of `sdiv x, pow2`. On X86 we don't override the method 'BuildSDIVPow2', so we fall back to the standard expansion of signed divisions by pow2; as a result, we introduce the sequence SRA + SRL + ADD + SRA.

		  if (ISD == ISD::SDIV &&
		      Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
		      Opd2PropInfo == TargetTransformInfo::OP_PowerOf2) {
		    // On X86, vector signed divisions by a power-of-two constant are
		    // normally expanded to the sequence SRA + SRL + ADD + SRA.
		    unsigned Cost =
		        2 * getArithmeticInstrCost(Instruction::AShr, Ty, Op1Info, Op2Info,
		                                   Opd1PropInfo, Opd2PropInfo);
		    Cost += getArithmeticInstrCost(Instruction::LShr, Ty, Op1Info, Op2Info,
		                                   Opd1PropInfo, Opd2PropInfo);
		    Cost += getArithmeticInstrCost(Instruction::Add, Ty, Op1Info, Op2Info,
		                                   Opd1PropInfo, Opd2PropInfo);
		    return Cost;
		  }

		This will allow you to reuse the existing cost tables without having to add extra cost tables to handle the special case of divisions by powers-of-2.

		As a side note: in the future, we can think of removing the check for TargetTransformInfo::OK_UniformConstantValue to also address non-uniform constants, as suggested by Hal. For example, on AVX2, a UDIV by non-uniform power-of-2 constants could be emitted as a single vpsrlv, because we can take advantage of vector shifts with variable count (which we don't have on SSE/AVX). Unfortunately, this is not happening at the moment and we end up scalarizing the logical shifts instead. This is why we still need that check for now.

		Example:

		  %div = udiv <4 x i32> %A, <i32 4, i32 8, i32 16, i32 4>

		could be strength-reduced to:

		  %div = lshr <4 x i32> %A, <i32 2, i32 3, i32 4, i32 2>

		On AVX2, that logical shift right would be emitted as a single vpsrlvd. At the moment we instead get a long sequence of vpextrd + shrl + vpinsrd. Similarly, on AVX2 we could strongly optimize the SDIV expansion by non-uniform power-of-2 constants (this, again, is not happening at the moment).
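		Editorial sketch (not part of the patch): the SRA + SRL + ADD + SRA sequence named in the comment above can be illustrated on a single 32-bit lane. The helper names `sdiv_pow2` and `udiv_pow2` are hypothetical, and the code assumes that `>>` on a negative `int32_t` is an arithmetic shift, which holds on mainstream compilers and is guaranteed from C++20 onward.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: signed division by 2^k (1 <= k <= 31), rounding toward
// zero, written as the four-step sequence SRA + SRL + ADD + SRA.
// Assumption: >> on a negative int32_t behaves as an arithmetic shift.
int32_t sdiv_pow2(int32_t x, unsigned k) {
  // SRA: broadcast the sign bit, giving 0 for x >= 0 and -1 for x < 0.
  int32_t sign = x >> 31;
  // SRL: turn the sign mask into the rounding bias, 2^k - 1 for negative x.
  uint32_t bias = static_cast<uint32_t>(sign) >> (32 - k);
  // ADD: apply the bias (done in unsigned arithmetic to avoid signed overflow).
  int32_t adjusted = static_cast<int32_t>(static_cast<uint32_t>(x) + bias);
  // SRA: the final arithmetic shift performs the division.
  return adjusted >> k;
}

// Hypothetical helper: unsigned division by 2^k is a single logical shift.
uint32_t udiv_pow2(uint32_t x, unsigned k) { return x >> k; }
```

		The bias step is what separates this from a plain arithmetic shift: without it, negative inputs would round toward negative infinity rather than toward zero, which is not what `sdiv` computes.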
static const CostTblEntry<MVT::SimpleValueType>		static const CostTblEntry<MVT::SimpleValueType>
AVX2UniformConstCostTable[] = {		AVX2UniformConstCostTable[] = {
{ ISD::SDIV, MVT::v16i16, 6 }, // vpmulhw sequence		{ ISD::SDIV, MVT::v16i16, 6 }, // vpmulhw sequence
{ ISD::UDIV, MVT::v16i16, 6 }, // vpmulhuw sequence		{ ISD::UDIV, MVT::v16i16, 6 }, // vpmulhuw sequence
{ ISD::SDIV, MVT::v8i32, 15 }, // vpmuldq sequence		{ ISD::SDIV, MVT::v8i32, 15 }, // vpmuldq sequence
{ ISD::UDIV, MVT::v8i32, 15 }, // vpmuludq sequence		{ ISD::UDIV, MVT::v8i32, 15 }, // vpmuludq sequence
};		};

if (Op2Info == TargetTransformInfo::OK_UniformConstantValue &&		if (Op2Info == TargetTransformInfo::OK_UniformConstantValue &&
ST->hasAVX2()) {		ST->hasAVX2()) {
		andreadb (inline comment): I think we also have to explicitly set Opd2PropInfo to 'TargetTransformInfo::OP_None'. The shift count of the resulting shift is unlikely to be a power-of-2.
int Idx = CostTableLookup(AVX2UniformConstCostTable, ISD, LT.second);		int Idx = CostTableLookup(AVX2UniformConstCostTable, ISD, LT.second);
if (Idx != -1)		if (Idx != -1)
return LT.first * AVX2UniformConstCostTable[Idx].Cost;		return LT.first * AVX2UniformConstCostTable[Idx].Cost;
}		}

static const CostTblEntry<MVT::SimpleValueType> AVX2CostTable[] = {		static const CostTblEntry<MVT::SimpleValueType> AVX2CostTable[] = {
// Shifts on v4i64/v8i32 on AVX2 is legal even though we declare to		// Shifts on v4i64/v8i32 on AVX2 is legal even though we declare to
// customize them to detect the cases where shift amount is a scalar one.		// customize them to detect the cases where shift amount is a scalar one.
{ ISD::SHL, MVT::v4i32, 1 },		{ ISD::SHL, MVT::v4i32, 1 },
		spatel (inline comment): udiv should become a logical shift right: "vpsrld" or "vpsrlw" with AVX2. With SSE, it's "psrld" or "psrlw" (just remove the leading 'v'). sdiv is handled with a sequence of logical shift right, add, arithmetic shift right. The cost should be the sum of those ops?
{ ISD::SRL, MVT::v4i32, 1 },		{ ISD::SRL, MVT::v4i32, 1 },
{ ISD::SRA, MVT::v4i32, 1 },		{ ISD::SRA, MVT::v4i32, 1 },
{ ISD::SHL, MVT::v8i32, 1 },		{ ISD::SHL, MVT::v8i32, 1 },
{ ISD::SRL, MVT::v8i32, 1 },		{ ISD::SRL, MVT::v8i32, 1 },
		andreadb (inline comment): Now that I think about it, it is wrong to pass the old 'Opd1PropInfo' and 'Opd2PropInfo' to those new calls to 'getArithmeticInstrCost' (sorry, that code came from my original suggestion). The expanded shifts will have different shift counts, which may or may not be powers-of-2. The Add will have different operands, so we cannot reuse the operand properties of the original UDIV/SDIV. To be conservative here, we have to use the default value (OP_None) for Opd1PropInfo and Opd2PropInfo in each call to 'getArithmeticInstrCost'.
{ ISD::SRA, MVT::v8i32, 1 },		{ ISD::SRA, MVT::v8i32, 1 },
{ ISD::SHL, MVT::v2i64, 1 },		{ ISD::SHL, MVT::v2i64, 1 },
{ ISD::SRL, MVT::v2i64, 1 },		{ ISD::SRL, MVT::v2i64, 1 },
{ ISD::SHL, MVT::v4i64, 1 },		{ ISD::SHL, MVT::v4i64, 1 },
{ ISD::SRL, MVT::v4i64, 1 },		{ ISD::SRL, MVT::v4i64, 1 },

{ ISD::SHL, MVT::v32i8, 42 }, // cmpeqb sequence.		{ ISD::SHL, MVT::v32i8, 42 }, // cmpeqb sequence.
{ ISD::SHL, MVT::v16i16, 16*10 }, // Scalarized.		{ ISD::SHL, MVT::v16i16, 16*10 }, // Scalarized.

{ ISD::SRL, MVT::v32i8, 32*10 }, // Scalarized.		{ ISD::SRL, MVT::v32i8, 32*10 }, // Scalarized.
		spatel (inline comment): Nit: the changes to the bracket spacing are not consistent with the rest of the file. Can you adjust all of those in a later checkin so this patch is minimal?
{ ISD::SRL, MVT::v16i16, 8*10 }, // Scalarized.		{ ISD::SRL, MVT::v16i16, 8*10 }, // Scalarized.

{ ISD::SRA, MVT::v32i8, 32*10 }, // Scalarized.		{ ISD::SRA, MVT::v32i8, 32*10 }, // Scalarized.
{ ISD::SRA, MVT::v16i16, 16*10 }, // Scalarized.		{ ISD::SRA, MVT::v16i16, 16*10 }, // Scalarized.
{ ISD::SRA, MVT::v4i64, 4*10 }, // Scalarized.		{ ISD::SRA, MVT::v4i64, 4*10 }, // Scalarized.

// Vectorizing division is a bad idea. See the SSE2 table for more comments.		// Vectorizing division is a bad idea. See the SSE2 table for more comments.
{ ISD::SDIV, MVT::v32i8, 32*20 },		{ ISD::SDIV, MVT::v32i8, 32*20 },
[834 lines not shown]
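
Editorial sketch (not part of the patch): the cost logic discussed in this file can be modeled with a toy function. The enums, the function names `baseCost` and `arithCost`, and every number below are illustrative assumptions that only mirror the shape of `getArithmeticInstrCost`; they are not the real X86 tables. The point is that a signed division by a uniform power-of-two constant is priced as two arithmetic shifts, one logical shift, and one add, and that the expanded operations are queried with no operand property, as the review requested.

```cpp
#include <cassert>

// Hypothetical stand-ins for the TTI opcode, operand-kind and property enums.
enum class Op { Add, LShr, AShr, SDiv, UDiv };
enum class Kind { AnyValue, UniformConstant, NonUniformConstant };
enum class Prop { None, PowerOf2 };

// Toy per-operation costs for one legal vector type (made-up numbers).
unsigned baseCost(Op O) {
  switch (O) {
  case Op::Add:
  case Op::LShr:
  case Op::AShr:
    return 1;
  case Op::SDiv:
  case Op::UDiv:
    return 15;
  }
  return 1;
}

unsigned arithCost(Op O, Kind Op2Kind, Prop Op2Prop) {
  const bool Pow2Splat =
      Op2Kind == Kind::UniformConstant && Op2Prop == Prop::PowerOf2;
  if (O == Op::SDiv && Pow2Splat) {
    // SRA + SRL + ADD + SRA. The expanded operations do not inherit the
    // power-of-two property of the original divisor, so pass Prop::None.
    return 2 * arithCost(Op::AShr, Op2Kind, Prop::None) +
           arithCost(Op::LShr, Op2Kind, Prop::None) +
           arithCost(Op::Add, Op2Kind, Prop::None);
  }
  if (O == Op::UDiv && Pow2Splat)
    return arithCost(Op::LShr, Op2Kind, Prop::None); // a single logical shift
  return baseCost(O);
}
```

With these made-up numbers, a power-of-two signed division costs 4 instead of 15, which is the kind of gap that lets a vectorizer consider the vector form profitable.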

lib/Transforms/Vectorize/LoopVectorize.cpp

[5,831 lines not shown]	case Instruction::Xor: {
if (I->getOpcode() == Instruction::Mul && isStrideMul(I, Legal))		if (I->getOpcode() == Instruction::Mul && isStrideMul(I, Legal))
return 0;		return 0;
// Certain instructions can be cheaper to vectorize if they have a constant		// Certain instructions can be cheaper to vectorize if they have a constant
// second vector operand. One example of this are shifts on x86.		// second vector operand. One example of this are shifts on x86.
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
		TargetTransformInfo::OperandValueProperties Op1VP =
		TargetTransformInfo::OP_None;
		TargetTransformInfo::OperandValueProperties Op2VP =
		TargetTransformInfo::OP_None;
Value *Op2 = I->getOperand(1);		Value *Op2 = I->getOperand(1);

// Check for a splat of a constant or for a non uniform vector of constants.		// Check for a splat of a constant or for a non uniform vector of constants.
if (isa<ConstantInt>(Op2))		if (isa<ConstantInt>(Op2)) {
		ConstantInt *CInt = cast<ConstantInt>(Op2);
		spatel (inline comment): Replace isa<> and cast<> with a dyn_cast<>? http://llvm.org/docs/ProgrammersManual.html#the-isa-cast-and-dyn-cast-templates
		if (CInt && CInt->getValue().isPowerOf2())
		Op2VP = TargetTransformInfo::OP_PowerOf2;
Op2VK = TargetTransformInfo::OK_UniformConstantValue;		Op2VK = TargetTransformInfo::OK_UniformConstantValue;
else if (isa<ConstantVector>(Op2) || isa<ConstantDataVector>(Op2)) {		} else if (isa<ConstantVector>(Op2) || isa<ConstantDataVector>(Op2)) {
Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;		Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
if (cast<Constant>(Op2)->getSplatValue() != nullptr)		Constant *SplatValue = cast<Constant>(Op2)->getSplatValue();
		if (SplatValue) {
		hfinkel (inline comment): This can just be: if (SplatValue) {
		ConstantInt *CInt = dyn_cast<ConstantInt>(SplatValue);
		if (CInt && CInt->getValue().isPowerOf2())
		Op2VP = TargetTransformInfo::OP_PowerOf2;
Op2VK = TargetTransformInfo::OK_UniformConstantValue;		Op2VK = TargetTransformInfo::OK_UniformConstantValue;
}		}
		}

return TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK, Op2VK);		return TTI.getArithmeticInstrCost(I->getOpcode(), VectorTy, Op1VK, Op2VK,
		Op1VP, Op2VP);
}		}
case Instruction::Select: {		case Instruction::Select: {
SelectInst *SI = cast<SelectInst>(I);		SelectInst *SI = cast<SelectInst>(I);
const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());		const SCEV *CondSCEV = SE->getSCEV(SI->getCondition());
bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));		bool ScalarCond = (SE->isLoopInvariant(CondSCEV, TheLoop));
Type *CondTy = SI->getCondition()->getType();		Type *CondTy = SI->getCondition()->getType();
if (!ScalarCond)		if (!ScalarCond)
CondTy = VectorType::get(CondTy, VF);		CondTy = VectorType::get(CondTy, VF);
[295 lines not shown]
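
Editorial sketch (not part of the patch): the operand classification added to the loop vectorizer above can be modeled without LLVM types. Here a constant vector is represented as a `std::vector<uint64_t>`, and `classifyConstantVector`, `OperandInfo`, and the two enums are hypothetical names that only mirror `OK_*` and `OP_*`. A constant vector starts out as non-uniform; if every lane holds the same value it is treated as a uniform constant, and only then is the power-of-two property checked.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for OperandValueKind and OperandValueProperties.
enum class Kind { AnyValue, UniformConstant, NonUniformConstant };
enum class Prop { None, PowerOf2 };

struct OperandInfo {
  Kind K;
  Prop P;
};

// True when exactly one bit is set.
bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

// Classify the second operand when it is a vector of integer constants.
OperandInfo classifyConstantVector(const std::vector<uint64_t> &Lanes) {
  OperandInfo Info{Kind::NonUniformConstant, Prop::None};
  bool IsSplat = !Lanes.empty();
  for (uint64_t Lane : Lanes)
    if (Lane != Lanes.front())
      IsSplat = false;
  if (IsSplat) {
    Info.K = Kind::UniformConstant;
    // The property is only recorded for splats, matching the patch.
    if (isPowerOf2(Lanes.front()))
      Info.P = Prop::PowerOf2;
  }
  return Info;
}
```

A vector such as <4, 8, 16, 4> therefore stays non-uniform with no property, even though every lane is a power of two; the review comments note that handling this case is left for future work.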

lib/Transforms/Vectorize/SLPVectorizer.cpp

[1,419 lines not shown]	case Instruction::Xor: {
VecCost = TTI->getCmpSelInstrCost(Opcode, VecTy, MaskTy);		VecCost = TTI->getCmpSelInstrCost(Opcode, VecTy, MaskTy);
} else {		} else {
// Certain instructions can be cheaper to vectorize if they have a		// Certain instructions can be cheaper to vectorize if they have a
// constant second vector operand.		// constant second vector operand.
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;		TargetTransformInfo::OK_UniformConstantValue;
		TargetTransformInfo::OperandValueProperties Op1VP =
		TargetTransformInfo::OP_None;
		TargetTransformInfo::OperandValueProperties Op2VP =
		TargetTransformInfo::OP_None;

// If all operands are exactly the same ConstantInt then set the		// If all operands are exactly the same ConstantInt then set the
// operand kind to OK_UniformConstantValue.		// operand kind to OK_UniformConstantValue.
// If instead not all operands are constants, then set the operand kind		// If instead not all operands are constants, then set the operand kind
// to OK_AnyValue. If all operands are constants but not the same,		// to OK_AnyValue. If all operands are constants but not the same,
// then set the operand kind to OK_NonUniformConstantValue.		// then set the operand kind to OK_NonUniformConstantValue.
ConstantInt *CInt = nullptr;		ConstantInt *CInt = nullptr;
for (unsigned i = 0; i < VL.size(); ++i) {		for (unsigned i = 0; i < VL.size(); ++i) {
const Instruction *I = cast<Instruction>(VL[i]);		const Instruction *I = cast<Instruction>(VL[i]);
if (!isa<ConstantInt>(I->getOperand(1))) {		if (!isa<ConstantInt>(I->getOperand(1))) {
Op2VK = TargetTransformInfo::OK_AnyValue;		Op2VK = TargetTransformInfo::OK_AnyValue;
break;		break;
}		}
if (i == 0) {		if (i == 0) {
CInt = cast<ConstantInt>(I->getOperand(1));		CInt = cast<ConstantInt>(I->getOperand(1));
continue;		continue;
}		}
if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&		if (Op2VK == TargetTransformInfo::OK_UniformConstantValue &&
CInt != cast<ConstantInt>(I->getOperand(1)))		CInt != cast<ConstantInt>(I->getOperand(1)))
Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;		Op2VK = TargetTransformInfo::OK_NonUniformConstantValue;
}		}
		if (Op2VK == TargetTransformInfo::OK_UniformConstantValue && CInt &&
		CInt->getValue().isPowerOf2())
		Op2VP = TargetTransformInfo::OP_PowerOf2;

ScalarCost =		ScalarCost = VecTy->getNumElements() *
VecTy->getNumElements() *		TTI->getArithmeticInstrCost(Opcode, ScalarTy, Op1VK, Op2VK,
TTI->getArithmeticInstrCost(Opcode, ScalarTy, Op1VK, Op2VK);		Op1VP, Op2VP);
VecCost = TTI->getArithmeticInstrCost(Opcode, VecTy, Op1VK, Op2VK);		VecCost = TTI->getArithmeticInstrCost(Opcode, VecTy, Op1VK, Op2VK,
		Op1VP, Op2VP);
}		}
return VecCost - ScalarCost;		return VecCost - ScalarCost;
}		}
case Instruction::GetElementPtr: {		case Instruction::GetElementPtr: {
TargetTransformInfo::OperandValueKind Op1VK =		TargetTransformInfo::OperandValueKind Op1VK =
TargetTransformInfo::OK_AnyValue;		TargetTransformInfo::OK_AnyValue;
TargetTransformInfo::OperandValueKind Op2VK =		TargetTransformInfo::OperandValueKind Op2VK =
TargetTransformInfo::OK_UniformConstantValue;		TargetTransformInfo::OK_UniformConstantValue;
[2,204 lines not shown]
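
Editorial sketch (not part of the patch): the SLP vectorizer hunk above looks at the second operand of every scalar in a bundle and then compares the vector cost with the summed scalar cost. The model below assumes a hypothetical `Lane` record in place of LLVM values; `classifyBundle` and `costDelta` are illustrative names, and the cost numbers in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for OperandValueKind and OperandValueProperties.
enum class Kind { AnyValue, UniformConstant, NonUniformConstant };
enum class Prop { None, PowerOf2 };

// One scalar instruction's second operand: a constant or an unknown value.
struct Lane {
  bool IsConstant;
  uint64_t Value; // meaningful only when IsConstant is true
};

struct OperandInfo {
  Kind K;
  Prop P;
};

bool isPowerOf2(uint64_t V) { return V != 0 && (V & (V - 1)) == 0; }

OperandInfo classifyBundle(const std::vector<Lane> &Lanes) {
  if (Lanes.empty())
    return {Kind::AnyValue, Prop::None};
  OperandInfo Info{Kind::UniformConstant, Prop::None};
  for (const Lane &L : Lanes) {
    if (!L.IsConstant) { // any non-constant lane ends the scan
      Info.K = Kind::AnyValue;
      break;
    }
    if (L.Value != Lanes.front().Value)
      Info.K = Kind::NonUniformConstant;
  }
  // The property is recorded only when all lanes share one constant.
  if (Info.K == Kind::UniformConstant && isPowerOf2(Lanes.front().Value))
    Info.P = Prop::PowerOf2;
  return Info;
}

// Negative result: the vector form is estimated to be cheaper.
long costDelta(unsigned NumLanes, unsigned ScalarOpCost, unsigned VecOpCost) {
  return static_cast<long>(VecOpCost) -
         static_cast<long>(NumLanes) * static_cast<long>(ScalarOpCost);
}
```

As a usage illustration with invented costs: four scalar divisions at cost 1 each against a vector division priced at 15 give a delta of 11, so the bundle is rejected, while a vector cost of 4 gives a delta of 0.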

test/Transforms/LoopVectorize/X86/powof2div.ll

				; RUN: opt < %s -loop-vectorize -mtriple=x86_64-unknown-linux-gnu -S | FileCheck %s
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				%struct.anon = type { [100 x i32], i32, [100 x i32] }

				@Foo = common global %struct.anon zeroinitializer, align 4

				;CHECK: load <4 x i32>*
				;CHECK: sdiv <4 x i32>
				;CHECK: store <4 x i32>

				define void @foo(){
				entry:
				br label %for.body

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds %struct.anon* @Foo, i64 0, i32 2, i64 %indvars.iv
				%0 = load i32* %arrayidx, align 4
				%div = sdiv i32 %0, 2
				%arrayidx2 = getelementptr inbounds %struct.anon* @Foo, i64 0, i32 0, i64 %indvars.iv
				store i32 %div, i32* %arrayidx2, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 100
				br i1 %exitcond, label %for.end, label %for.body

				for.end: ; preds = %for.body
				ret void
				}

test/Transforms/SLPVectorizer/X86/powof2div.ll

				; RUN: opt < %s -basicaa -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				;CHECK: load <4 x i32>*
				;CHECK: add <4 x i32>
				;CHECK: sdiv <4 x i32>
				define void @f(i32* noalias nocapture %a, i32* noalias nocapture readonly %b, i32* noalias nocapture readonly %c){
				spatel (inline comment): Does this test case require 3 operands and adds, or can it be simplified to just do the sdivs? Please also use CHECK-LABEL for both test cases to make it easier for future additions in each file. If the output is entirely predictable, use CHECK-NEXT lines and specify each instruction, including the 'ret'.
				entry:
				%0 = load i32* %b, align 4
				%1 = load i32* %c, align 4
				%add = add nsw i32 %1, %0
				%div = sdiv i32 %add, 2
				store i32 %div, i32* %a, align 4
				%arrayidx3 = getelementptr inbounds i32* %b, i64 1
				%2 = load i32* %arrayidx3, align 4
				%arrayidx4 = getelementptr inbounds i32* %c, i64 1
				%3 = load i32* %arrayidx4, align 4
				%add5 = add nsw i32 %3, %2
				%div6 = sdiv i32 %add5, 2
				%arrayidx7 = getelementptr inbounds i32* %a, i64 1
				store i32 %div6, i32* %arrayidx7, align 4
				%arrayidx8 = getelementptr inbounds i32* %b, i64 2
				%4 = load i32* %arrayidx8, align 4
				%arrayidx9 = getelementptr inbounds i32* %c, i64 2
				%5 = load i32* %arrayidx9, align 4
				%add10 = add nsw i32 %5, %4
				%div11 = sdiv i32 %add10, 2
				%arrayidx12 = getelementptr inbounds i32* %a, i64 2
				store i32 %div11, i32* %arrayidx12, align 4
				%arrayidx13 = getelementptr inbounds i32* %b, i64 3
				%6 = load i32* %arrayidx13, align 4
				%arrayidx14 = getelementptr inbounds i32* %c, i64 3
				%7 = load i32* %arrayidx14, align 4
				%add15 = add nsw i32 %7, %6
				%div16 = sdiv i32 %add15, 2
				%arrayidx17 = getelementptr inbounds i32* %a, i64 3
				store i32 %div16, i32* %arrayidx17, align 4
				ret void
				}

Event Timeline

+ {ISD::SDIV, MVT::v8i32, 1}, // psrad instruction

Revision Contents

Diff 12832

include/llvm/Analysis/TargetTransformInfo.h

lib/Analysis/TargetTransformInfo.cpp

lib/CodeGen/BasicTargetTransformInfo.cpp

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

lib/Target/ARM/ARMTargetTransformInfo.cpp

lib/Target/PowerPC/PPCTargetTransformInfo.cpp

lib/Target/X86/X86TargetTransformInfo.cpp

lib/Transforms/Vectorize/LoopVectorize.cpp

lib/Transforms/Vectorize/SLPVectorizer.cpp

test/Transforms/LoopVectorize/X86/powof2div.ll

test/Transforms/SLPVectorizer/X86/powof2div.ll