This is an archive of the discontinued LLVM Phabricator instance.

Differential D29540

Scalarization overhead estimation in getIntrinsicInstrCost() improved
ClosedPublic

Authored by jonpa on Feb 4 2017, 4:58 AM.

Details

Reviewers

javed.absar
hfinkel

Summary

getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands.

getIntrinsicInstrCost("Types") has gotten a new parameter 'ScalarizationCostPassed', so that a caller can pass on a value for scalarization cost based on actual operands. If this is UINT_MAX, the Type based estimation is made as before.
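The sentinel convention described above can be sketched as follows. This is a minimal illustration, not the patch's code: `typeBasedScalarizationCost` and `pickScalarizationCost` are hypothetical names, and the unit insert/extract costs are an assumption made only to keep the arithmetic visible.

```cpp
#include <cassert>
#include <climits>

// Hypothetical stand-in for the type-based estimate. Assumes one insert per
// result element and one extract per argument element, each costing 1.
unsigned typeBasedScalarizationCost(unsigned NumElts, unsigned NumArgs) {
  assert(NumElts >= 1 && "element count must be positive");
  return NumElts + NumElts * NumArgs;
}

// UINT_MAX means "no operand-based cost was passed": fall back to the
// type-based estimate. Any other value is used as given by the caller.
unsigned pickScalarizationCost(unsigned ScalarizationCostPassed,
                               unsigned NumElts, unsigned NumArgs) {
  if (ScalarizationCostPassed == UINT_MAX)
    return typeBasedScalarizationCost(NumElts, NumArgs);
  return ScalarizationCostPassed;
}
```

A caller that has inspected the real operands passes its own value; a caller that only has types leaves the default in place.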

getIntrinsicInstrCost("Args") has gotten a new parameter 'VF'. If VF > 1, the types involved are vectorized with it before analyzing begins (vectorized Args that are passed can however not be combined with VF > 1). This seemed like a good idea, since both SLPVectorizer and LoopVectorizer can use this.
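A rough sketch of the VF rule described above, using a toy type instead of `llvm::Type`. `ToyType` and `widen` are hypothetical names introduced only for illustration; the point is that a scalar type with VF > 1 and an already-vector type with VF == 1 end up describing the same vector.

```cpp
#include <cassert>

// Toy stand-in for llvm::Type: only the element count is modelled, and a
// count of 1 is treated as a scalar (an assumption of this sketch).
struct ToyType {
  unsigned NumElts;
  bool isVector() const { return NumElts > 1; }
};

// Widen a scalar type by VF. A vector type may only be passed with VF == 1.
ToyType widen(ToyType Ty, unsigned VF) {
  assert((VF == 1 || !Ty.isVector()) && "vector type combined with VF > 1");
  return VF == 1 ? Ty : ToyType{VF};
}
```

With this rule, a vectorizer passing a scalar type and VF = 4 and a cost-model client passing a 4-element vector and VF = 1 are analyzed on the same widened type.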

getOperandsScalarizationOverhead() now also checks for Constants. It has also been extended to allow vector operands (in which case VF must be 1). I deduced this to be needed since BBVectorize calls getVecTypeForPair(), which also handles vector types as input. I am not that familiar with BBVectorize however, so if this is wrong in the sense that BBVectorize never further vectorizes a vector intrinsic, this is then not needed...
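The operand handling described above might look roughly like this when reduced to a standalone sketch. `ToyValue` and the flat per-element `ExtractCost` are assumptions of the sketch; the real function works on `llvm::Value` and asks the target for extract costs.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy stand-in for llvm::Value: whether it is a constant, and how many
// elements its type has (1 is treated as scalar in this sketch).
struct ToyValue {
  bool IsConstant;
  unsigned NumElts;
};

// Sum the extract cost over unique, non-constant operands. Scalar operands
// are widened by VF; vector operands are only allowed when VF == 1.
unsigned operandsScalarizationOverhead(
    const std::vector<const ToyValue *> &Args, unsigned VF,
    unsigned ExtractCost) {
  unsigned Cost = 0;
  std::set<const ToyValue *> UniqueOperands;
  for (const ToyValue *A : Args) {
    // Constants need no extracts; repeated operands are counted once.
    if (A->IsConstant || !UniqueOperands.insert(A).second)
      continue;
    unsigned Elts = A->NumElts;
    if (Elts > 1) {
      assert(VF == 1 && "Vector argument passed with VF > 1");
    } else {
      Elts = VF;
    }
    Cost += Elts * ExtractCost;
  }
  return Cost;
}
```

For a call such as `f(x, x, 42)` at VF 4 with an extract cost of 2, only `x` is charged, and only once.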

In BBVectorize, things got a bit tricky while handling merged arguments. Here, the scalarization cost is computed locally, by considering all the input operands of both instructions, plus the vectorized return type. Since vectorization is done by arguments merging, getOperandsScalarizationOverhead() is then called for all operands with a VF of 1. I hope this is right.

test/Analysis/CostModel/X86/arith-fp.ll has been updated to expect lower instruction costs, which should make sense since the calls have undef operands, so the scalarization cost of them (extracts) should not be added.

Event Timeline

jonpa created this revision. Feb 4 2017, 4:58 AM

Herald added a subscriber: mzolotukhin. Feb 4 2017, 4:58 AM

ping.

RKSimon added a subscriber: RKSimon. Feb 21 2017, 5:35 AM

ping!

ping!!

hfinkel added inline comments. Mar 8 2017, 5:57 AM

include/llvm/Analysis/TargetTransformInfo.h
632	Please define what "it" is. This is the cost of scalarizing the arguments and the return value, right?
639	/// Three cases are handled:
641	"If VF > 1, it will be applied before analyzis." - I don't see how this adds value to the user of the interface? I don't see how else it could work or what it would mean if it worked otherwise.
include/llvm/CodeGen/BasicTTIImpl.h
725	"Can't multiply VF here." does not really explain the problem. How about, "VF > 1 and RetVF is a vector type"?
742	I don't understand why `RetVF > 1` is here. If RetVF is not one, then VF needs to be one, and so we're not vectorizing. As a result, we're not scalarizing either, and so adding the scalarization cost associated with the return type seems odd.
786	This interface seems a bit broken for how we're using it. Why are we assuming that we need to scalarize all intrinsics? (many target-specific intrinsics are really vector intrinsics - maybe only assume we need to scalarize the generic ones for which we don't have special handling?).

Updated per review.

include/llvm/Analysis/TargetTransformInfo.h
632	yes
641	ok - removed that sentence :-)
include/llvm/CodeGen/BasicTTIImpl.h
725	sure
742	My idea was that this function can be called from different contexts ("three cases"). The LoopVectorizer can pass the scalar instruction and VF > 1. CostModel can pass a vector instruction, RetVF > 1 (and VF argument defaults to 1). Both cases should produce the same return value. Am I missing something?
786	I was assuming this made sense in the 'default:' case at least. How scalarization cost is computed, and when it is done are two separate issues, so perhaps that can be handled later with a separate patch?

LGTM.

In BBVectorize, things got a bit tricky while handling merged arguments. Here, the scalarization cost is computed locally, by considering all the input operands of both instructions, plus the vectorized return type. Since vectorization is done by arguments merging, getOperandsScalarizationOverhead() is then called for all operands with a VF of 1. I hope this is right.

Eeh, don't worry about it. I'm going to delete the whole pass at some point soon anyway.

include/llvm/CodeGen/BasicTTIImpl.h
742	Maybe the underlying problem here is the: "// Assume that we need to scalarize this intrinsic." This seems like a bad assumption for a general code model (although it probably makes sense for the vectorizer because the vectorizer's legality checking will bail out before we get here for a vector intrinsic). In any case, a comment here explaining the logic would be good.
786	Yea, this should be cleaned up as a separate patch.

This revision is now accepted and ready to land. Mar 9 2017, 1:13 PM

Thanks for review.

In any case, a comment here explaining the logic would be good.

Added a comment here - is it clear enough?

I also had to update two more tests for ARM and AArch64 targets. Please have a look and let me know if this is acceptable, before I commit.

Herald added a reviewer: javed.absar. Mar 9 2017, 10:47 PM

hfinkel added inline comments. Mar 12 2017, 6:44 PM

test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll
173	Why do these change? (they're not intrinsics).

jonpa added inline comments. Mar 13 2017, 12:40 AM

test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll
173	getOperandsScalarizationOverhead() has been improved so that it doesn't count extraction costs for constants. ARM: This is 2 for each extract, so for VF 4, 40 -> 32 makes sense, as well as for VF 8. Interleaved cost was 40, and now scalarizing the memop is 32. (AArch64: Same) To me these changes look ok.
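The arithmetic behind the 40 -> 32 change can be sketched as below. The extract cost of 2 and the VF of 4 come from the comment above; that exactly one constant operand stops being charged is an assumption inferred from the size of the difference, and `savedExtractCost` is a hypothetical helper name.

```cpp
#include <cassert>

// Extract cost no longer charged once constant operands are skipped:
// one extract per vector lane for each constant operand.
unsigned savedExtractCost(unsigned VF, unsigned NumConstantOperands,
                          unsigned ExtractCost) {
  assert(VF >= 1 && "VF must be positive");
  return VF * NumConstantOperands * ExtractCost;
}
```

With VF = 4, one constant operand and an extract cost of 2, the saving is 8, which matches the drop from 40 to 32.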

hfinkel added inline comments. Mar 13 2017, 6:41 AM

test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll
173	Okay, please proceed. Please include this explanation for the test case changes in the commit message, however, so it is clear what's going on.

Committed as @297705.

There's a chance this commit broke our code coverage bot.

@jonpa Could you take a look? http://green.lab.llvm.org/green/job/clang-stage2-coverage-R_build/926/

FAILED: lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o 
/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/host-compiler/bin/clang++   -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/CodeGen/SelectionDAG -I/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG -Iinclude -I/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/include -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Werror=date-time -std=c++11 -fcolor-diagnostics -fprofile-instr-generate='/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build/profiles/%6m.profraw' -fcoverage-mapping -O3 -DNDEBUG    -fno-exceptions -fno-rtti -MMD -MT lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -MF lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o.d -o lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -c '/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp'
Assertion failed: (Ty->isVectorTy() && "Can only scalarize vectors"), function getScalarizationOverhead, file /Users/buildslave/jenkins/sharedspace/phase1@2/llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 294.
0  clang-5.0                0x000000010df391b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  clang-5.0                0x000000010df38416 llvm::sys::RunSignalHandlers() + 86
2  clang-5.0                0x000000010df39859 SignalHandler(int) + 361
3  libsystem_platform.dylib 0x00007fff897db52a _sigtramp + 26
4  libsystem_platform.dylib 0x00007fe600000040 _sigtramp + 1988250416
5  libsystem_c.dylib        0x00007fff99af66df abort + 129
6  libsystem_c.dylib        0x00007fff99abddd8 basename + 0
7  clang-5.0                0x000000010d3e4c13 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getIntrinsicInstrCost(llvm::Intrinsic::ID, llvm::Type*, llvm::ArrayRef<llvm::Value*>, llvm::FastMathFlags, unsigned int) + 1267
8  clang-5.0                0x000000010d6344cc llvm::TargetTransformInfo::getIntrinsicInstrCost(llvm::Intrinsic::ID, llvm::Type*, llvm::ArrayRef<llvm::Value*>, llvm::FastMathFlags, unsigned int) const + 28
9  clang-5.0                0x000000010e05fee5 getVectorIntrinsicCost(llvm::CallInst*, unsigned int, llvm::TargetTransformInfo const&, llvm::TargetLibraryInfo const*) + 389
10 clang-5.0                0x000000010e05ed06 (anonymous namespace)::LoopVectorizationCostModel::getInstructionCost(llvm::Instruction*, unsigned int) + 1782
11 clang-5.0                0x000000010e06009f (anonymous namespace)::LoopVectorizationCostModel::expectedCost(unsigned int) + 319
12 clang-5.0                0x000000010e04a8bf llvm::LoopVectorizePass::processLoop(llvm::Loop*) + 8735
13 clang-5.0                0x000000010e0553ab llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&) + 539
14 clang-5.0                0x000000010e056c23 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) + 1235
15 clang-5.0                0x000000010d9efee3 llvm::FPPassManager::runOnFunction(llvm::Function&) + 547
16 clang-5.0                0x000000010d9f0143 llvm::FPPassManager::runOnModule(llvm::Module&) + 51
17 clang-5.0                0x000000010d9f05ce llvm::legacy::PassManagerImpl::run(llvm::Module&) + 766
18 clang-5.0                0x000000010e1469a3 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 14579
19 clang-5.0                0x000000010e344cc5 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 917
20 clang-5.0                0x000000010ec93245 clang::ParseAST(clang::Sema&, bool, bool) + 469
21 clang-5.0                0x000000010e5970cc clang::FrontendAction::Execute() + 76
22 clang-5.0                0x000000010e556381 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1217
23 clang-5.0                0x000000010e5ef3a8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 4904
24 clang-5.0                0x000000010c9584cc cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1388
25 clang-5.0                0x000000010c956c33 main + 11955
26 libdyld.dylib            0x00007fff93aba5ad start + 1
27 libdyld.dylib            0x0000000000000060 start + 1817467572
Stack dump:
0.	Program arguments: /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/host-compiler/bin/clang-5.0 -cc1 -triple x86_64-apple-macosx10.11.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -disable-free -main-file-name LegalizeIntegerTypes.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core2 -target-linker-version 264.3.102 -dwarf-column-info -debugger-tuning=lldb -fprofile-instrument-path=/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build/profiles/%6m.profraw -fprofile-instrument=clang -fcoverage-mapping -coverage-notes-file /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build/lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.gcno -resource-dir /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/host-compiler/lib/clang/5.0.0 -dependency-file lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o.d -MT lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -D GTEST_HAS_RTTI=0 -D LLVM_BUILD_GLOBAL_ISEL -D __STDC_CONSTANT_MACROS -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -I lib/CodeGen/SelectionDAG -I /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG -I include -I /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/include -D NDEBUG -stdlib=libc++ -O3 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Werror=date-time -pedantic -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build -ferror-limit 19 -fmessage-length 0 -fvisibility-inlines-hidden -stack-protector 1 -fblocks -fno-rtti -fobjc-runtime=macosx-10.11.0 
-fencode-extended-block-signature -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -x c++ /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp 
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp'.
4.	Running pass 'Loop Vectorization' on function '@_ZN4llvm16DAGTypeLegalizer28PromoteIntRes_CONCAT_VECTORSEPNS_6SDNodeE'

In D29540#700891, @vsk wrote:
There's a chance this commit broke our code coverage bot.

@jonpa Could you take a look? http://green.lab.llvm.org/green/job/clang-stage2-coverage-R_build/926/
FAILED: lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o

I tried to build this with same cmake flags, but then got:
/usr/bin/ld: cannot find /root/llvm/install/llvm-dev/lib/clang/5.0.0/lib/linux/libclang_rt.profile-s390x.a: No such file or directory
If I removed -DLLVM_BUILD_INSTRUMENTED_COVERAGE=ON, everything builds fine.

Could you perhaps send me a reduced test case with command for llc?

Looking at the problem, it is difficult to tell where it is. It "should work" because the code has explicit checks everywhere for whether the intrinsic is vectorized or not, and only then should getScalarizationOverhead() be called. Could this be an intrinsic that returns a vector type even though it itself is not being vectorized?

Sorry, but I can't help more without a test case.

Thanks for taking a look, I filed a PR: http://bugs.llvm.org/show_bug.cgi?id=32285

In D29540#701888, @vsk wrote:

Thanks for taking a look, I filed a PR: http://bugs.llvm.org/show_bug.cgi?id=32285

I think this was an omission to check for a void return type. See my fix at https://reviews.llvm.org/D31024.
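The kind of guard described here could look like the sketch below. It is not taken from D31024, whose contents are not shown on this page; `ToyType` and `returnScalarizationCost` are hypothetical, and the assertion text is copied from the crash log above to show which check a void return type would otherwise trip.

```cpp
#include <cassert>

// Toy stand-in for llvm::Type: void or not, plus an element count.
struct ToyType {
  bool IsVoid;
  unsigned NumElts;
};

// Cost of inserting each result element into the return vector. A void
// return type has nothing to scalarize, so it must be skipped before the
// vector-only helper is reached.
unsigned returnScalarizationCost(ToyType RetTy, unsigned InsertCost) {
  if (RetTy.IsVoid)
    return 0;
  assert(RetTy.NumElts > 1 && "Can only scalarize vectors");
  return RetTy.NumElts * InsertCost;
}
```

An intrinsic with no result then contributes only the operand extracts to the scalarization cost.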

Fix committed as r297954.

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	31 lines
include/llvm/Analysis/TargetTransformInfoImpl.h	5 lines
include/llvm/CodeGen/BasicTTIImpl.h	76 lines
lib/Analysis/CostModel.cpp	4 lines
lib/Analysis/TargetTransformInfo.cpp	12 lines
lib/Target/X86/X86TargetTransformInfo.h	6 lines
lib/Target/X86/X86TargetTransformInfo.cpp	9 lines
lib/Transforms/Vectorize/BBVectorize.cpp	40 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	8 lines
lib/Transforms/Vectorize/SLPVectorizer.cpp	11 lines
test/Analysis/CostModel/X86/arith-fp.ll	24 lines
test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll	4 lines
test/Transforms/LoopVectorize/ARM/interleaved_cost.ll	4 lines

Diff 91268

include/llvm/Analysis/TargetTransformInfo.h

[...]	public:
/// Pairwise:		/// Pairwise:
/// (v0, v1, v2, v3)		/// (v0, v1, v2, v3)
/// ((v0+v1), (v2, v3), undef, undef)		/// ((v0+v1), (v2, v3), undef, undef)
/// Split:		/// Split:
/// (v0, v1, v2, v3)		/// (v0, v1, v2, v3)
/// ((v0+v2), (v1+v3), undef, undef)		/// ((v0+v2), (v1+v3), undef, undef)
int getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm) const;		int getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm) const;

/// \returns The cost of Intrinsic instructions. Types analysis only.		/// \returns The cost of Intrinsic instructions. Analyses the real arguments.
		/// Three cases are handled: 1. scalar instruction 2. vector instruction
		hfinkel: Please define what "it" is. This is the cost of scalarizing the arguments and the return value, right?
		jonpa: yes
		/// 3. scalar instruction which is to be vectorized with VF.
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF) const;		ArrayRef<Value *> Args, FastMathFlags FMF,
		unsigned VF = 1) const;

/// \returns The cost of Intrinsic instructions. Analyses the real arguments.		/// \returns The cost of Intrinsic instructions. Types analysis only.
		/// If ScalarizationCostPassed is UINT_MAX, the cost of scalarizing the
		hfinkel: /// Three cases are handled:
		/// arguments and the return value will be computed based on types.
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
		hfinkel: "If VF > 1, it will be applied before analyzis." - I don't see how this adds value to the user of the interface? I don't see how else it could work or what it would mean if it worked otherwise.
		jonpa: ok - removed that sentence :-)
ArrayRef<Value *> Args, FastMathFlags FMF) const;		ArrayRef<Type *> Tys, FastMathFlags FMF,
		unsigned ScalarizationCostPassed = UINT_MAX) const;

/// \returns The cost of Call instructions.		/// \returns The cost of Call instructions.
int getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) const;		int getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) const;

/// \returns The number of pieces into which the provided type must be		/// \returns The number of pieces into which the provided type must be
/// split during legalization. Zero is returned when the answer is unknown.		/// split during legalization. Zero is returned when the answer is unknown.
unsigned getNumberOfParts(Type *Tp) const;		unsigned getNumberOfParts(Type *Tp) const;

[...]	public:
virtual int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,		virtual int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
unsigned Factor,		unsigned Factor,
ArrayRef<unsigned> Indices,		ArrayRef<unsigned> Indices,
unsigned Alignment,		unsigned Alignment,
unsigned AddressSpace) = 0;		unsigned AddressSpace) = 0;
virtual int getReductionCost(unsigned Opcode, Type *Ty,		virtual int getReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm) = 0;		bool IsPairwiseForm) = 0;
virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Type *> Tys,		ArrayRef<Type *> Tys, FastMathFlags FMF,
FastMathFlags FMF) = 0;		unsigned ScalarizationCostPassed) = 0;
virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		virtual int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args,		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) = 0;
FastMathFlags FMF) = 0;
virtual int getCallInstrCost(Function *F, Type *RetTy,		virtual int getCallInstrCost(Function *F, Type *RetTy,
ArrayRef<Type *> Tys) = 0;		ArrayRef<Type *> Tys) = 0;
virtual unsigned getNumberOfParts(Type *Tp) = 0;		virtual unsigned getNumberOfParts(Type *Tp) = 0;
virtual int getAddressComputationCost(Type *Ty, ScalarEvolution *SE,		virtual int getAddressComputationCost(Type *Ty, ScalarEvolution *SE,
const SCEV *Ptr) = 0;		const SCEV *Ptr) = 0;
virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) = 0;		virtual unsigned getCostOfKeepingLiveOverCall(ArrayRef<Type *> Tys) = 0;
virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst,		virtual bool getTgtMemIntrinsic(IntrinsicInst *Inst,
MemIntrinsicInfo &Info) = 0;		MemIntrinsicInfo &Info) = 0;
[...]	int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy, unsigned Factor,
return Impl.getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		return Impl.getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
}		}
int getReductionCost(unsigned Opcode, Type *Ty,		int getReductionCost(unsigned Opcode, Type *Ty,
bool IsPairwiseForm) override {		bool IsPairwiseForm) override {
return Impl.getReductionCost(Opcode, Ty, IsPairwiseForm);		return Impl.getReductionCost(Opcode, Ty, IsPairwiseForm);
}		}
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type *> Tys,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy, ArrayRef<Type *> Tys,
FastMathFlags FMF) override {		FastMathFlags FMF, unsigned ScalarizationCostPassed) override {
return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF);		return Impl.getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
		ScalarizationCostPassed);
}		}
int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args,		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) override {
FastMathFlags FMF) override {		return Impl.getIntrinsicInstrCost(ID, RetTy, Args, FMF, VF);
return Impl.getIntrinsicInstrCost(ID, RetTy, Args, FMF);
}		}
int getCallInstrCost(Function *F, Type *RetTy,		int getCallInstrCost(Function *F, Type *RetTy,
ArrayRef<Type *> Tys) override {		ArrayRef<Type *> Tys) override {
return Impl.getCallInstrCost(F, RetTy, Tys);		return Impl.getCallInstrCost(F, RetTy, Tys);
}		}
unsigned getNumberOfParts(Type *Tp) override {		unsigned getNumberOfParts(Type *Tp) override {
return Impl.getNumberOfParts(Tp);		return Impl.getNumberOfParts(Tp);
}		}

include/llvm/Analysis/TargetTransformInfoImpl.h

[...]	unsigned getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
unsigned Factor,		unsigned Factor,
ArrayRef<unsigned> Indices,		ArrayRef<unsigned> Indices,
unsigned Alignment,		unsigned Alignment,
unsigned AddressSpace) {		unsigned AddressSpace) {
return 1;		return 1;
}		}

unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF) {		ArrayRef<Type *> Tys, FastMathFlags FMF,
		unsigned ScalarizationCostPassed) {
return 1;		return 1;
}		}
unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		unsigned getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF) {		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) {
return 1;		return 1;
}		}

unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) {		unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) {
return 1;		return 1;
}		}

unsigned getNumberOfParts(Type *Tp) { return 0; }		unsigned getNumberOfParts(Type *Tp) { return 0; }

include/llvm/CodeGen/BasicTTIImpl.h

[...]	for (int i = 0, e = Ty->getVectorNumElements(); i < e; ++i) {
if (Extract)		if (Extract)
Cost += static_cast<T *>(this)		Cost += static_cast<T *>(this)
->getVectorInstrCost(Instruction::ExtractElement, Ty, i);		->getVectorInstrCost(Instruction::ExtractElement, Ty, i);
}		}

return Cost;		return Cost;
}		}

/// Estimate the overhead of scalarizing an instructions unique operands.		/// Estimate the overhead of scalarizing an instructions unique
		/// non-constant operands. The types of the arguments are ordinarily
		/// scalar, in which case the costs are multiplied with VF. Vector
		/// arguments are allowed if 1 is passed for VF.
unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,		unsigned getOperandsScalarizationOverhead(ArrayRef<const Value *> Args,
unsigned VF) {		unsigned VF) {
unsigned Cost = 0;		unsigned Cost = 0;
SmallPtrSet<const Value*, 4> UniqueOperands;		SmallPtrSet<const Value*, 4> UniqueOperands;
for (const Value *A : Args) {		for (const Value *A : Args) {
if (UniqueOperands.insert(A).second)		if (!isa<Constant>(A) && UniqueOperands.insert(A).second) {
Cost += getScalarizationOverhead(VectorType::get(A->getType(), VF),		Type *VecTy = nullptr;
false, true);		if (A->getType()->isVectorTy()) {
		assert (VF == 1 && "Vector argument passed with VF > 1");
		VecTy = A->getType();
}		}
		else
		VecTy = VectorType::get(A->getType(), VF);

		Cost += getScalarizationOverhead(VecTy, false, true);
		}
		}

return Cost;		return Cost;
}		}

unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }		unsigned getMaxInterleaveFactor(unsigned VF) { return 1; }

unsigned getArithmeticInstrCost(		unsigned getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
[...]	if (Opcode == Instruction::Load) {
for (unsigned i = 0; i < NumElts; i++)		for (unsigned i = 0; i < NumElts; i++)
Cost += static_cast<T *>(this)		Cost += static_cast<T *>(this)
->getVectorInstrCost(Instruction::InsertElement, VT, i);		->getVectorInstrCost(Instruction::InsertElement, VT, i);
}		}

return Cost;		return Cost;
}		}

/// Get intrinsic cost based on arguments		/// Get intrinsic cost based on arguments.
unsigned getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		unsigned getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF) {		ArrayRef<Value *> Args, FastMathFlags FMF,
		unsigned VF = 1) {
		unsigned RetVF = (RetTy->isVectorTy() ? RetTy->getVectorNumElements() : 1);
		assert ((RetVF == 1 || VF == 1) && "VF > 1 and RetVF is a vector type");
		hfinkel: "Can't multiply VF here." does not really explain the problem. How about, "VF > 1 and RetVF is a vector type"?
		jonpa: sure

switch (IID) {		switch (IID) {
default: {		default: {
		// Assume that we need to scalarize this intrinsic.
SmallVector<Type *, 4> Types;		SmallVector<Type *, 4> Types;
for (Value *Op : Args)		for (Value *Op : Args) {
Types.push_back(Op->getType());		Type *OpTy = Op->getType();
return static_cast<T *>(this)->getIntrinsicInstrCost(IID, RetTy, Types,		assert (VF == 1 || !OpTy->isVectorTy());
FMF);		Types.push_back(VF == 1 ? OpTy : VectorType::get(OpTy, VF));
		}

		if (VF > 1 && !RetTy->isVoidTy())
		RetTy = VectorType::get(RetTy, VF);

		// Compute the scalarization overhead based on Args for a vector
		// intrinsic. A vectorizer will pass a scalar RetTy and VF > 1, while
		// CostModel will pass a vector RetTy and VF is 1.
		hfinkel: I don't understand why `RetVF > 1` is here. If RetVF is not one, then VF needs to be one, and so we're not vectorizing. As a result, we're not scalarizing either, and so adding the scalarization cost associated with the return type seems odd.
		jonpa: My idea was that this function can be called from different contexts ("three cases"). The LoopVectorizer can pass the scalar instruction and VF > 1. CostModel can pass a vector instruction, RetVF > 1 (and VF argument defaults to 1). Both cases should produce the same return value. Am I missing something?
		hfinkel: Maybe the underlying problem here is the: "// Assume that we need to scalarize this intrinsic." This seems like a bad assumption for a general code model (although it probably makes sense for the vectorizer because the vectorizer's legality checking will bail out before we get here for a vector intrinsic). In any case, a comment here explaining the logic would be good.
		unsigned ScalarizationCost = UINT_MAX;
		if (RetVF > 1 || VF > 1) {
		ScalarizationCost = getScalarizationOverhead(RetTy, true, false);
		ScalarizationCost += getOperandsScalarizationOverhead(Args, VF);
		}

		return static_cast<T *>(this)->
		getIntrinsicInstrCost(IID, RetTy, Types, FMF, ScalarizationCost);
}		}
case Intrinsic::masked_scatter: {		case Intrinsic::masked_scatter: {
		assert (VF == 1 && "Can't vectorize types here.");
Value *Mask = Args[3];		Value *Mask = Args[3];
bool VarMask = !isa<Constant>(Mask);		bool VarMask = !isa<Constant>(Mask);
unsigned Alignment = cast<ConstantInt>(Args[2])->getZExtValue();		unsigned Alignment = cast<ConstantInt>(Args[2])->getZExtValue();
return		return
static_cast<T *>(this)->getGatherScatterOpCost(Instruction::Store,		static_cast<T *>(this)->getGatherScatterOpCost(Instruction::Store,
Args[0]->getType(),		Args[0]->getType(),
Args[1], VarMask,		Args[1], VarMask,
Alignment);		Alignment);
}		}
case Intrinsic::masked_gather: {		case Intrinsic::masked_gather: {
		assert (VF == 1 && "Can't vectorize types here.");
Value *Mask = Args[2];		Value *Mask = Args[2];
bool VarMask = !isa<Constant>(Mask);		bool VarMask = !isa<Constant>(Mask);
unsigned Alignment = cast<ConstantInt>(Args[1])->getZExtValue();		unsigned Alignment = cast<ConstantInt>(Args[1])->getZExtValue();
return		return
static_cast<T *>(this)->getGatherScatterOpCost(Instruction::Load,		static_cast<T *>(this)->getGatherScatterOpCost(Instruction::Load,
RetTy, Args[0], VarMask,		RetTy, Args[0], VarMask,
Alignment);		Alignment);
}		}
}		}
}		}

/// Get intrinsic cost based on argument types		/// Get intrinsic cost based on argument types.
		/// If ScalarizationCostPassed is UINT_MAX, the cost of scalarizing the
		/// arguments and the return value will be computed based on types.
unsigned getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		unsigned getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF) {		ArrayRef<Type *> Tys, FastMathFlags FMF,
		unsigned ScalarizationCostPassed = UINT_MAX) {
SmallVector<unsigned, 2> ISDs;		SmallVector<unsigned, 2> ISDs;
unsigned SingleCallCost = 10; // Library call cost. Make it expensive.		unsigned SingleCallCost = 10; // Library call cost. Make it expensive.
switch (IID) {		switch (IID) {
default: {		default: {
// Assume that we need to scalarize this intrinsic.		// Assume that we need to scalarize this intrinsic.
		hfinkel: This interface seems a bit broken for how we're using it. Why are we assuming that we need to scalarize all intrinsics? Many target-specific intrinsics are really vector intrinsics; maybe only assume we need to scalarize the generic ones for which we don't have special handling?
		jonpa: I was assuming this made sense in the 'default:' case at least. How scalarization cost is computed and when it is done are two separate issues, so perhaps that can be handled later with a separate patch?
		hfinkel: Yea, this should be cleaned up as a separate patch.
unsigned ScalarizationCost = 0;		unsigned ScalarizationCost = ScalarizationCostPassed;
unsigned ScalarCalls = 1;		unsigned ScalarCalls = 1;
Type *ScalarRetTy = RetTy;		Type *ScalarRetTy = RetTy;
if (RetTy->isVectorTy()) {		if (RetTy->isVectorTy()) {
		if (ScalarizationCostPassed == UINT_MAX)
ScalarizationCost = getScalarizationOverhead(RetTy, true, false);		ScalarizationCost = getScalarizationOverhead(RetTy, true, false);
ScalarCalls = std::max(ScalarCalls, RetTy->getVectorNumElements());		ScalarCalls = std::max(ScalarCalls, RetTy->getVectorNumElements());
ScalarRetTy = RetTy->getScalarType();		ScalarRetTy = RetTy->getScalarType();
}		}
SmallVector<Type *, 4> ScalarTys;		SmallVector<Type *, 4> ScalarTys;
for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {		for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {
Type *Ty = Tys[i];		Type *Ty = Tys[i];
if (Ty->isVectorTy()) {		if (Ty->isVectorTy()) {
		if (ScalarizationCostPassed == UINT_MAX)
ScalarizationCost += getScalarizationOverhead(Ty, false, true);		ScalarizationCost += getScalarizationOverhead(Ty, false, true);
ScalarCalls = std::max(ScalarCalls, Ty->getVectorNumElements());		ScalarCalls = std::max(ScalarCalls, Ty->getVectorNumElements());
Ty = Ty->getScalarType();		Ty = Ty->getScalarType();
}		}
ScalarTys.push_back(Ty);		ScalarTys.push_back(Ty);
}		}
if (ScalarCalls == 1)		if (ScalarCalls == 1)
return 1; // Return cost of a scalar intrinsic. Assume it to be cheap.		return 1; // Return cost of a scalar intrinsic. Assume it to be cheap.

▲ Show 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	if (IID == Intrinsic::fmuladd)
->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) +		->getArithmeticInstrCost(BinaryOperator::FMul, RetTy) +
static_cast<T *>(this)		static_cast<T *>(this)
->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);		->getArithmeticInstrCost(BinaryOperator::FAdd, RetTy);

// Else, assume that we need to scalarize this intrinsic. For math builtins		// Else, assume that we need to scalarize this intrinsic. For math builtins
// this will emit a costly libcall, adding call overhead and spills. Make it		// this will emit a costly libcall, adding call overhead and spills. Make it
// very expensive.		// very expensive.
if (RetTy->isVectorTy()) {		if (RetTy->isVectorTy()) {
unsigned ScalarizationCost = getScalarizationOverhead(RetTy, true, false);		unsigned ScalarizationCost = ((ScalarizationCostPassed != UINT_MAX) ?
		ScalarizationCostPassed : getScalarizationOverhead(RetTy, true, false));
unsigned ScalarCalls = RetTy->getVectorNumElements();		unsigned ScalarCalls = RetTy->getVectorNumElements();
SmallVector<Type *, 4> ScalarTys;		SmallVector<Type *, 4> ScalarTys;
for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {		for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {
Type *Ty = Tys[i];		Type *Ty = Tys[i];
if (Ty->isVectorTy())		if (Ty->isVectorTy())
Ty = Ty->getScalarType();		Ty = Ty->getScalarType();
ScalarTys.push_back(Ty);		ScalarTys.push_back(Ty);
}		}
unsigned ScalarCost = static_cast<T *>(this)->getIntrinsicInstrCost(		unsigned ScalarCost = static_cast<T *>(this)->getIntrinsicInstrCost(
IID, RetTy->getScalarType(), ScalarTys, FMF);		IID, RetTy->getScalarType(), ScalarTys, FMF);
for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {		for (unsigned i = 0, ie = Tys.size(); i != ie; ++i) {
if (Tys[i]->isVectorTy()) {		if (Tys[i]->isVectorTy()) {
		if (ScalarizationCostPassed == UINT_MAX)
ScalarizationCost += getScalarizationOverhead(Tys[i], false, true);		ScalarizationCost += getScalarizationOverhead(Tys[i], false, true);
ScalarCalls = std::max(ScalarCalls, Tys[i]->getVectorNumElements());		ScalarCalls = std::max(ScalarCalls, Tys[i]->getVectorNumElements());
}		}
}		}

return ScalarCalls * ScalarCost + ScalarizationCost;		return ScalarCalls * ScalarCost + ScalarizationCost;
}		}

// This is going to be turned into a library call, make it expensive.		// This is going to be turned into a library call, make it expensive.

lib/Analysis/CostModel.cpp

Show First 20 Lines • Show All 536 Lines • ▼ Show 20 Lines	if (NumVecElems == Mask.size()) {
return TTI->getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc,		return TTI->getShuffleCost(TargetTransformInfo::SK_PermuteTwoSrc,
VecTypOp0, 0, nullptr);		VecTypOp0, 0, nullptr);
}		}

return -1;		return -1;
}		}
case Instruction::Call:		case Instruction::Call:
if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {		if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
SmallVector<Value *, 4> Args;		SmallVector<Value *, 4> Args(II->arg_operands());
for (unsigned J = 0, JE = II->getNumArgOperands(); J != JE; ++J)
Args.push_back(II->getArgOperand(J));

FastMathFlags FMF;		FastMathFlags FMF;
if (auto *FPMO = dyn_cast<FPMathOperator>(II))		if (auto *FPMO = dyn_cast<FPMathOperator>(II))
FMF = FPMO->getFastMathFlags();		FMF = FPMO->getFastMathFlags();

return TTI->getIntrinsicInstrCost(II->getIntrinsicID(), II->getType(),		return TTI->getIntrinsicInstrCost(II->getIntrinsicID(), II->getType(),
Args, FMF);		Args, FMF);
}		}

lib/Analysis/TargetTransformInfo.cpp

Show First 20 Lines • Show All 372 Lines • ▼ Show 20 Lines	int TargetTransformInfo::getInterleavedMemoryOpCost(
unsigned Alignment, unsigned AddressSpace) const {		unsigned Alignment, unsigned AddressSpace) const {
int Cost = TTIImpl->getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,		int Cost = TTIImpl->getInterleavedMemoryOpCost(Opcode, VecTy, Factor, Indices,
Alignment, AddressSpace);		Alignment, AddressSpace);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Type *> Tys,		ArrayRef<Type *> Tys, FastMathFlags FMF,
FastMathFlags FMF) const {		unsigned ScalarizationCostPassed) const {
int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Tys, FMF);		int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Tys, FMF,
		ScalarizationCostPassed);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,		int TargetTransformInfo::getIntrinsicInstrCost(Intrinsic::ID ID, Type *RetTy,
ArrayRef<Value *> Args,		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) const {
FastMathFlags FMF) const {		int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Args, FMF, VF);
int Cost = TTIImpl->getIntrinsicInstrCost(ID, RetTy, Args, FMF);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");
return Cost;		return Cost;
}		}

int TargetTransformInfo::getCallInstrCost(Function *F, Type *RetTy,		int TargetTransformInfo::getCallInstrCost(Function *F, Type *RetTy,
ArrayRef<Type *> Tys) const {		ArrayRef<Type *> Tys) const {
int Cost = TTIImpl->getCallInstrCost(F, RetTy, Tys);		int Cost = TTIImpl->getCallInstrCost(F, RetTy, Tys);
assert(Cost >= 0 && "TTI should not produce negative costs!");		assert(Cost >= 0 && "TTI should not produce negative costs!");

lib/Target/X86/X86TargetTransformInfo.h

Show First 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	public:
int getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,		int getMaskedMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
unsigned AddressSpace);		unsigned AddressSpace);
int getGatherScatterOpCost(unsigned Opcode, Type *DataTy, Value *Ptr,		int getGatherScatterOpCost(unsigned Opcode, Type *DataTy, Value *Ptr,
bool VariableMask, unsigned Alignment);		bool VariableMask, unsigned Alignment);
int getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,		int getAddressComputationCost(Type *PtrTy, ScalarEvolution *SE,
const SCEV *Ptr);		const SCEV *Ptr);

int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF);		ArrayRef<Type *> Tys, FastMathFlags FMF,
		unsigned ScalarizationCostPassed = UINT_MAX);
int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF);		ArrayRef<Value *> Args, FastMathFlags FMF,
		unsigned VF = 1);

int getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm);		int getReductionCost(unsigned Opcode, Type *Ty, bool IsPairwiseForm);

int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,		int getInterleavedMemoryOpCost(unsigned Opcode, Type *VecTy,
unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Factor, ArrayRef<unsigned> Indices,
unsigned Alignment, unsigned AddressSpace);		unsigned Alignment, unsigned AddressSpace);
int getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,		int getInterleavedMemoryOpCostAVX512(unsigned Opcode, Type *VecTy,
unsigned Factor, ArrayRef<unsigned> Indices,		unsigned Factor, ArrayRef<unsigned> Indices,

lib/Target/X86/X86TargetTransformInfo.cpp

Show First 20 Lines • Show All 1,364 Lines • ▼ Show 20 Lines	int X86TTIImpl::getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy) {
if (ST->hasSSE2())		if (ST->hasSSE2())
if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))		if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy);		return BaseT::getCmpSelInstrCost(Opcode, ValTy, CondTy);
}		}

int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Type *> Tys, FastMathFlags FMF) {		ArrayRef<Type *> Tys, FastMathFlags FMF,
		unsigned ScalarizationCostPassed) {
// Costs should match the codegen from:		// Costs should match the codegen from:
// BITREVERSE: llvm\test\CodeGen\X86\vector-bitreverse.ll		// BITREVERSE: llvm\test\CodeGen\X86\vector-bitreverse.ll
// BSWAP: llvm\test\CodeGen\X86\bswap-vector.ll		// BSWAP: llvm\test\CodeGen\X86\bswap-vector.ll
// CTLZ: llvm\test\CodeGen\X86\vector-lzcnt-*.ll		// CTLZ: llvm\test\CodeGen\X86\vector-lzcnt-*.ll
// CTPOP: llvm\test\CodeGen\X86\vector-popcnt-*.ll		// CTPOP: llvm\test\CodeGen\X86\vector-popcnt-*.ll
// CTTZ: llvm\test\CodeGen\X86\vector-tzcnt-*.ll		// CTTZ: llvm\test\CodeGen\X86\vector-tzcnt-*.ll
static const CostTblEntry XOPCostTbl[] = {		static const CostTblEntry XOPCostTbl[] = {
{ ISD::BITREVERSE, MVT::v4i64, 4 },		{ ISD::BITREVERSE, MVT::v4i64, 4 },
▲ Show 20 Lines • Show All 164 Lines • ▼ Show 20 Lines	int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
if (ST->hasSSE2())		if (ST->hasSSE2())
if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))		if (const auto *Entry = CostTableLookup(SSE2CostTbl, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

if (ST->hasSSE1())		if (ST->hasSSE1())
if (const auto *Entry = CostTableLookup(SSE1CostTbl, ISD, MTy))		if (const auto *Entry = CostTableLookup(SSE1CostTbl, ISD, MTy))
return LT.first * Entry->Cost;		return LT.first * Entry->Cost;

return BaseT::getIntrinsicInstrCost(IID, RetTy, Tys, FMF);		return BaseT::getIntrinsicInstrCost(IID, RetTy, Tys, FMF, ScalarizationCostPassed);
}		}

int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,		int X86TTIImpl::getIntrinsicInstrCost(Intrinsic::ID IID, Type *RetTy,
ArrayRef<Value *> Args, FastMathFlags FMF) {		ArrayRef<Value *> Args, FastMathFlags FMF, unsigned VF) {
return BaseT::getIntrinsicInstrCost(IID, RetTy, Args, FMF);		return BaseT::getIntrinsicInstrCost(IID, RetTy, Args, FMF, VF);
}		}

int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {		int X86TTIImpl::getVectorInstrCost(unsigned Opcode, Type *Val, unsigned Index) {
assert(Val->isVectorTy() && "This must be a vector type");		assert(Val->isVectorTy() && "This must be a vector type");

Type *ScalarType = Val->getScalarType();		Type *ScalarType = Val->getScalarType();

if (Index != -1U) {		if (Index != -1U) {

lib/Transforms/Vectorize/BBVectorize.cpp

Show First 20 Lines • Show All 1,121 Lines • ▼ Show 20 Lines	if (CI && (FI = CI->getCalledFunction())) {
*A1JSCEV = SE->getSCEV(A1J);		*A1JSCEV = SE->getSCEV(A1J);
return (A1ISCEV == A1JSCEV);		return (A1ISCEV == A1JSCEV);
}		}

if (IID && TTI) {		if (IID && TTI) {
FastMathFlags FMFCI;		FastMathFlags FMFCI;
if (auto *FPMOCI = dyn_cast<FPMathOperator>(CI))		if (auto *FPMOCI = dyn_cast<FPMathOperator>(CI))
FMFCI = FPMOCI->getFastMathFlags();		FMFCI = FPMOCI->getFastMathFlags();
		SmallVector<Value *, 4> IArgs(CI->arg_operands());
		unsigned ICost = TTI->getIntrinsicInstrCost(IID, IT1, IArgs, FMFCI);

SmallVector<Type*, 4> Tys;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i)
Tys.push_back(CI->getArgOperand(i)->getType());
unsigned ICost = TTI->getIntrinsicInstrCost(IID, IT1, Tys, FMFCI);

Tys.clear();
CallInst *CJ = cast<CallInst>(J);		CallInst *CJ = cast<CallInst>(J);

FastMathFlags FMFCJ;		FastMathFlags FMFCJ;
if (auto *FPMOCJ = dyn_cast<FPMathOperator>(CJ))		if (auto *FPMOCJ = dyn_cast<FPMathOperator>(CJ))
FMFCJ = FPMOCJ->getFastMathFlags();		FMFCJ = FPMOCJ->getFastMathFlags();

for (unsigned i = 0, ie = CJ->getNumArgOperands(); i != ie; ++i)		SmallVector<Value *, 4> JArgs(CJ->arg_operands());
Tys.push_back(CJ->getArgOperand(i)->getType());		unsigned JCost = TTI->getIntrinsicInstrCost(IID, JT1, JArgs, FMFCJ);
unsigned JCost = TTI->getIntrinsicInstrCost(IID, JT1, Tys, FMFCJ);

Tys.clear();
assert(CI->getNumArgOperands() == CJ->getNumArgOperands() &&		assert(CI->getNumArgOperands() == CJ->getNumArgOperands() &&
"Intrinsic argument counts differ");		"Intrinsic argument counts differ");
		SmallVector<Type*, 4> Tys;
		SmallVector<Value *, 4> VecArgs;
for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {		for (unsigned i = 0, ie = CI->getNumArgOperands(); i != ie; ++i) {
if ((IID == Intrinsic::powi || IID == Intrinsic::ctlz ||		if ((IID == Intrinsic::powi || IID == Intrinsic::ctlz ||
IID == Intrinsic::cttz) && i == 1)		IID == Intrinsic::cttz) && i == 1) {
Tys.push_back(CI->getArgOperand(i)->getType());		Tys.push_back(CI->getArgOperand(i)->getType());
else		VecArgs.push_back(CI->getArgOperand(i));
		}
		else {
Tys.push_back(getVecTypeForPair(CI->getArgOperand(i)->getType(),		Tys.push_back(getVecTypeForPair(CI->getArgOperand(i)->getType(),
CJ->getArgOperand(i)->getType()));		CJ->getArgOperand(i)->getType()));
		// Add both operands, and then count their scalarization overhead
		// with VF 1.
		VecArgs.push_back(CI->getArgOperand(i));
		VecArgs.push_back(CJ->getArgOperand(i));
		}
}		}

		// Compute the scalarization cost here with the original operands (to
		// check for uniqueness etc), and then call getIntrinsicInstrCost()
		// with the constructed vector types.
		Type *RetTy = getVecTypeForPair(IT1, JT1);
		unsigned ScalarizationCost = 0;
		if (!RetTy->isVoidTy())
		ScalarizationCost += TTI->getScalarizationOverhead(RetTy, true, false);
		ScalarizationCost += TTI->getOperandsScalarizationOverhead(VecArgs, 1);

FastMathFlags FMFV = FMFCI;		FastMathFlags FMFV = FMFCI;
FMFV &= FMFCJ;		FMFV &= FMFCJ;
Type *RetTy = getVecTypeForPair(IT1, JT1);		unsigned VCost = TTI->getIntrinsicInstrCost(IID, RetTy, Tys, FMFV,
unsigned VCost = TTI->getIntrinsicInstrCost(IID, RetTy, Tys, FMFV);		ScalarizationCost);

if (VCost > ICost + JCost)		if (VCost > ICost + JCost)
return false;		return false;

// We don't want to fuse to a type that will be split, even		// We don't want to fuse to a type that will be split, even
// if the two input types will also be split and there is no other		// if the two input types will also be split and there is no other
// associated cost.		// associated cost.
unsigned RetParts = TTI->getNumberOfParts(RetTy);		unsigned RetParts = TTI->getNumberOfParts(RetTy);

lib/Transforms/Vectorize/LoopVectorize.cpp


	// factor VF. Return the cost of the instruction, including scalarization			// factor VF. Return the cost of the instruction, including scalarization
	// overhead if it's needed.			// overhead if it's needed.
	static unsigned getVectorIntrinsicCost(CallInst *CI, unsigned VF,			static unsigned getVectorIntrinsicCost(CallInst *CI, unsigned VF,
	const TargetTransformInfo &TTI,			const TargetTransformInfo &TTI,
	const TargetLibraryInfo *TLI) {			const TargetLibraryInfo *TLI) {
	Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);			Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);
	assert(ID && "Expected intrinsic call!");			assert(ID && "Expected intrinsic call!");

	Type *RetTy = ToVectorTy(CI->getType(), VF);
	SmallVector<Type *, 4> Tys;
	for (Value *ArgOperand : CI->arg_operands())
	Tys.push_back(ToVectorTy(ArgOperand->getType(), VF));

	FastMathFlags FMF;			FastMathFlags FMF;
	if (auto *FPMO = dyn_cast<FPMathOperator>(CI))			if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
	FMF = FPMO->getFastMathFlags();			FMF = FPMO->getFastMathFlags();

	return TTI.getIntrinsicInstrCost(ID, RetTy, Tys, FMF);			SmallVector<Value *, 4> Operands(CI->arg_operands());
				return TTI.getIntrinsicInstrCost(ID, CI->getType(), Operands, FMF, VF);
	}			}

	static Type *smallestIntegerVectorType(Type *T1, Type *T2) {			static Type *smallestIntegerVectorType(Type *T1, Type *T2) {
	auto *I1 = cast<IntegerType>(T1->getVectorElementType());			auto *I1 = cast<IntegerType>(T1->getVectorElementType());
	auto *I2 = cast<IntegerType>(T2->getVectorElementType());			auto *I2 = cast<IntegerType>(T2->getVectorElementType());
	return I1->getBitWidth() < I2->getBitWidth() ? T1 : T2;			return I1->getBitWidth() < I2->getBitWidth() ? T1 : T2;
	}			}
	static Type *largestIntegerVectorType(Type *T1, Type *T2) {			static Type *largestIntegerVectorType(Type *T1, Type *T2) {

lib/Transforms/Vectorize/SLPVectorizer.cpp

Show First 20 Lines • Show All 1,908 Lines • ▼ Show 20 Lines	case Instruction::Store: {
VecTy, alignment, 0);		VecTy, alignment, 0);
return VecStCost - ScalarStCost;		return VecStCost - ScalarStCost;
}		}
case Instruction::Call: {		case Instruction::Call: {
CallInst *CI = cast<CallInst>(VL0);		CallInst *CI = cast<CallInst>(VL0);
Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);		Intrinsic::ID ID = getVectorIntrinsicIDForCall(CI, TLI);

// Calculate the cost of the scalar and vector calls.		// Calculate the cost of the scalar and vector calls.
SmallVector<Type*, 4> ScalarTys, VecTys;		SmallVector<Type*, 4> ScalarTys;
for (unsigned op = 0, opc = CI->getNumArgOperands(); op!= opc; ++op) {		for (unsigned op = 0, opc = CI->getNumArgOperands(); op!= opc; ++op)
ScalarTys.push_back(CI->getArgOperand(op)->getType());		ScalarTys.push_back(CI->getArgOperand(op)->getType());
VecTys.push_back(VectorType::get(CI->getArgOperand(op)->getType(),
VecTy->getNumElements()));
}

FastMathFlags FMF;		FastMathFlags FMF;
if (auto *FPMO = dyn_cast<FPMathOperator>(CI))		if (auto *FPMO = dyn_cast<FPMathOperator>(CI))
FMF = FPMO->getFastMathFlags();		FMF = FPMO->getFastMathFlags();

int ScalarCallCost = VecTy->getNumElements() *		int ScalarCallCost = VecTy->getNumElements() *
TTI->getIntrinsicInstrCost(ID, ScalarTy, ScalarTys, FMF);		TTI->getIntrinsicInstrCost(ID, ScalarTy, ScalarTys, FMF);

int VecCallCost = TTI->getIntrinsicInstrCost(ID, VecTy, VecTys, FMF);		SmallVector<Value *, 4> Args(CI->arg_operands());
		int VecCallCost = TTI->getIntrinsicInstrCost(ID, CI->getType(), Args, FMF,
		VecTy->getNumElements());

DEBUG(dbgs() << "SLP: Call cost "<< VecCallCost - ScalarCallCost		DEBUG(dbgs() << "SLP: Call cost "<< VecCallCost - ScalarCallCost
<< " (" << VecCallCost << "-" << ScalarCallCost << ")"		<< " (" << VecCallCost << "-" << ScalarCallCost << ")"
<< " for " << *CI << "\n");		<< " for " << *CI << "\n");

return VecCallCost - ScalarCallCost;		return VecCallCost - ScalarCallCost;
}		}
case Instruction::ShuffleVector: {		case Instruction::ShuffleVector: {

test/Analysis/CostModel/X86/arith-fp.ll

	Show First 20 Lines • Show All 450 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: 'fma'			; CHECK-LABEL: 'fma'
	define i32 @fma(i32 %arg) {			define i32 @fma(i32 %arg) {
	; SSE2: cost of 10 {{.*}} %F32 = call float @llvm.fma.f32			; SSE2: cost of 10 {{.*}} %F32 = call float @llvm.fma.f32
	; SSE42: cost of 10 {{.*}} %F32 = call float @llvm.fma.f32			; SSE42: cost of 10 {{.*}} %F32 = call float @llvm.fma.f32
	; AVX: cost of 1 {{.*}} %F32 = call float @llvm.fma.f32			; AVX: cost of 1 {{.*}} %F32 = call float @llvm.fma.f32
	; AVX2: cost of 1 {{.*}} %F32 = call float @llvm.fma.f32			; AVX2: cost of 1 {{.*}} %F32 = call float @llvm.fma.f32
	; AVX512: cost of 1 {{.*}} %F32 = call float @llvm.fma.f32			; AVX512: cost of 1 {{.*}} %F32 = call float @llvm.fma.f32
	%F32 = call float @llvm.fma.f32(float undef, float undef, float undef)			%F32 = call float @llvm.fma.f32(float undef, float undef, float undef)
	; SSE2: cost of 52 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32			; SSE2: cost of 43 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32
	; SSE42: cost of 52 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32			; SSE42: cost of 43 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32
	; AVX: cost of 1 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32			; AVX: cost of 1 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32
	; AVX2: cost of 1 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32			; AVX2: cost of 1 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32
	; AVX512: cost of 1 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32			; AVX512: cost of 1 {{.*}} %V4F32 = call <4 x float> @llvm.fma.v4f32
	%V4F32 = call <4 x float> @llvm.fma.v4f32(<4 x float> undef, <4 x float> undef, <4 x float> undef)			%V4F32 = call <4 x float> @llvm.fma.v4f32(<4 x float> undef, <4 x float> undef, <4 x float> undef)
	; SSE2: cost of 104 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32			; SSE2: cost of 86 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32
	; SSE42: cost of 104 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32			; SSE42: cost of 86 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32
	; AVX: cost of 1 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32			; AVX: cost of 1 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32
	; AVX2: cost of 1 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32			; AVX2: cost of 1 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32
	; AVX512: cost of 1 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32			; AVX512: cost of 1 {{.*}} %V8F32 = call <8 x float> @llvm.fma.v8f32
	%V8F32 = call <8 x float> @llvm.fma.v8f32(<8 x float> undef, <8 x float> undef, <8 x float> undef)			%V8F32 = call <8 x float> @llvm.fma.v8f32(<8 x float> undef, <8 x float> undef, <8 x float> undef)
	; SSE2: cost of 208 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32			; SSE2: cost of 172 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32
	; SSE42: cost of 208 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32			; SSE42: cost of 172 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32
	; AVX: cost of 4 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32			; AVX: cost of 4 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32
	; AVX2: cost of 4 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32			; AVX2: cost of 4 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32
	; AVX512: cost of 1 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32			; AVX512: cost of 1 {{.*}} %V16F32 = call <16 x float> @llvm.fma.v16f32
	%V16F32 = call <16 x float> @llvm.fma.v16f32(<16 x float> undef, <16 x float> undef, <16 x float> undef)			%V16F32 = call <16 x float> @llvm.fma.v16f32(<16 x float> undef, <16 x float> undef, <16 x float> undef)

	; SSE2: cost of 10 {{.*}} %F64 = call double @llvm.fma.f64			; SSE2: cost of 10 {{.*}} %F64 = call double @llvm.fma.f64
	; SSE42: cost of 10 {{.*}} %F64 = call double @llvm.fma.f64			; SSE42: cost of 10 {{.*}} %F64 = call double @llvm.fma.f64
	; AVX: cost of 1 {{.*}} %F64 = call double @llvm.fma.f64			; AVX: cost of 1 {{.*}} %F64 = call double @llvm.fma.f64
	; AVX2: cost of 1 {{.*}} %F64 = call double @llvm.fma.f64			; AVX2: cost of 1 {{.*}} %F64 = call double @llvm.fma.f64
	; AVX512: cost of 1 {{.*}} %F64 = call double @llvm.fma.f64			; AVX512: cost of 1 {{.*}} %F64 = call double @llvm.fma.f64
	%F64 = call double @llvm.fma.f64(double undef, double undef, double undef)			%F64 = call double @llvm.fma.f64(double undef, double undef, double undef)
	; SSE2: cost of 24 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64			; SSE2: cost of 21 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64
	; SSE42: cost of 24 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64			; SSE42: cost of 21 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64
	; AVX: cost of 1 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64			; AVX: cost of 1 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64
	; AVX2: cost of 1 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64			; AVX2: cost of 1 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64
	; AVX512: cost of 1 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64			; AVX512: cost of 1 {{.*}} %V2F64 = call <2 x double> @llvm.fma.v2f64
	%V2F64 = call <2 x double> @llvm.fma.v2f64(<2 x double> undef, <2 x double> undef, <2 x double> undef)			%V2F64 = call <2 x double> @llvm.fma.v2f64(<2 x double> undef, <2 x double> undef, <2 x double> undef)
	; SSE2: cost of 48 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64			; SSE2: cost of 42 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64
	; SSE42: cost of 48 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64			; SSE42: cost of 42 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64
	; AVX: cost of 1 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64			; AVX: cost of 1 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64
	; AVX2: cost of 1 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64			; AVX2: cost of 1 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64
	; AVX512: cost of 1 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64			; AVX512: cost of 1 {{.*}} %V4F64 = call <4 x double> @llvm.fma.v4f64
	%V4F64 = call <4 x double> @llvm.fma.v4f64(<4 x double> undef, <4 x double> undef, <4 x double> undef)			%V4F64 = call <4 x double> @llvm.fma.v4f64(<4 x double> undef, <4 x double> undef, <4 x double> undef)
	; SSE2: cost of 96 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64			; SSE2: cost of 84 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64
	; SSE42: cost of 96 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64			; SSE42: cost of 84 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64
	; AVX: cost of 4 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64			; AVX: cost of 4 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64
	; AVX2: cost of 4 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64			; AVX2: cost of 4 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64
	; AVX512: cost of 1 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64			; AVX512: cost of 1 {{.*}} %V8F64 = call <8 x double> @llvm.fma.v8f64
	%V8F64 = call <8 x double> @llvm.fma.v8f64(<8 x double> undef, <8 x double> undef, <8 x double> undef)			%V8F64 = call <8 x double> @llvm.fma.v8f64(<8 x double> undef, <8 x double> undef, <8 x double> undef)

	ret i32 undef			ret i32 undef
	}			}


test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll

	; allowed factor for AArch64 (4). Thus, we will fall back to the basic TTI			; allowed factor for AArch64 (4). Thus, we will fall back to the basic TTI
	; implementation for determining the cost of the interleaved load group. The			; implementation for determining the cost of the interleaved load group. The
	; stores do not form a legal interleaved group because the group would contain			; stores do not form a legal interleaved group because the group would contain
	; gaps.			; gaps.
	;			;
	; VF_2-LABEL: Checking a loop in "i64_factor_8"			; VF_2-LABEL: Checking a loop in "i64_factor_8"
	; VF_2: Found an estimated cost of 6 for VF 2 For instruction: %tmp2 = load i64, i64* %tmp0, align 8			; VF_2: Found an estimated cost of 6 for VF 2 For instruction: %tmp2 = load i64, i64* %tmp0, align 8
	; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, i64* %tmp1, align 8			; VF_2-NEXT: Found an estimated cost of 0 for VF 2 For instruction: %tmp3 = load i64, i64* %tmp1, align 8
	; VF_2-NEXT: Found an estimated cost of 10 for VF 2 For instruction: store i64 0, i64* %tmp0, align 8			; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp0, align 8
	hfinkel: Why do these change? (They're not intrinsics.)
	jonpa: getOperandsScalarizationOverhead() has been improved so that it doesn't count extraction costs for constants. ARM: This is 2 for each extract, so for VF 4, 40 -> 32 makes sense, as well as for VF 8. Interleaved cost was 40, and now scalarizing the memop is 32. (AArch64: same.) To me these changes look ok.
	hfinkel: Okay, please proceed. Please include this explanation for the test case changes in the commit message, however, so it is clear what's going on.
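	The effect described in this thread can be illustrated with a small standalone sketch. It is not the patch's code: `Operand`, `operandsScalarizationOverhead`, and `ExtractCostPerElt` are hypothetical names invented for this illustration, and the per-element extract cost is simply passed in rather than queried from a target. The sketch only shows the idea that constants and repeated operands contribute no extract cost.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for an IR operand; this is not an LLVM type.
struct Operand {
  int Id;          // identity of the value (equal Id means the same value)
  bool IsConstant; // a constant does not need to be extracted from a vector
};

// Sum the extract cost over unique, non-constant operands only.
// ExtractCostPerElt is an assumed cost of extracting one element.
unsigned operandsScalarizationOverhead(const std::vector<Operand> &Args,
                                       unsigned VF,
                                       unsigned ExtractCostPerElt) {
  assert(VF >= 1 && "vectorization factor must be positive");
  std::set<int> Seen;
  unsigned Cost = 0;
  for (const Operand &Op : Args) {
    if (Op.IsConstant)
      continue; // constants are skipped entirely
    if (!Seen.insert(Op.Id).second)
      continue; // an operand that was already counted is not counted again
    Cost += VF * ExtractCostPerElt; // one extract per vector lane
  }
  return Cost;
}
```

	Under these assumptions, a store of a constant through a variable pointer at VF 4 with an extract cost of 2 yields 8 instead of 16, which matches the size of the 40 -> 32 change discussed above for the ARM test.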
	; VF_2-NEXT: Found an estimated cost of 10 for VF 2 For instruction: store i64 0, i64* %tmp1, align 8			; VF_2-NEXT: Found an estimated cost of 7 for VF 2 For instruction: store i64 0, i64* %tmp1, align 8
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 2			%tmp0 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 2
	%tmp1 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 6			%tmp1 = getelementptr inbounds %i64.8, %i64.8* %data, i64 %i, i32 6
	%tmp2 = load i64, i64* %tmp0, align 8			%tmp2 = load i64, i64* %tmp0, align 8
	%tmp3 = load i64, i64* %tmp1, align 8			%tmp3 = load i64, i64* %tmp1, align 8
	store i64 0, i64* %tmp0, align 8			store i64 0, i64* %tmp0, align 8
	store i64 0, i64* %tmp1, align 8			store i64 0, i64* %tmp1, align 8
	%i.next = add nuw nsw i64 %i, 1			%i.next = add nuw nsw i64 %i, 1
	%cond = icmp slt i64 %i.next, %n			%cond = icmp slt i64 %i.next, %n
	br i1 %cond, label %for.body, label %for.end			br i1 %cond, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}

test/Transforms/LoopVectorize/ARM/interleaved_cost.ll

	define void @half_factor_2(%half.2* %data, i64 %n) {			define void @half_factor_2(%half.2* %data, i64 %n) {
	entry:			entry:
	br label %for.body			br label %for.body

	; VF_4-LABEL: Checking a loop in "half_factor_2"			; VF_4-LABEL: Checking a loop in "half_factor_2"
	; VF_4: Found an estimated cost of 40 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2			; VF_4: Found an estimated cost of 40 for VF 4 For instruction: %tmp2 = load half, half* %tmp0, align 2
	; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2			; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: %tmp3 = load half, half* %tmp1, align 2
	; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_4-NEXT: Found an estimated cost of 0 for VF 4 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_4-NEXT: Found an estimated cost of 40 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_4-NEXT: Found an estimated cost of 32 for VF 4 For instruction: store half 0xH0000, half* %tmp1, align 2
	; VF_8-LABEL: Checking a loop in "half_factor_2"			; VF_8-LABEL: Checking a loop in "half_factor_2"
	; VF_8: Found an estimated cost of 80 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2			; VF_8: Found an estimated cost of 80 for VF 8 For instruction: %tmp2 = load half, half* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: %tmp3 = load half, half* %tmp1, align 2
	; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2			; VF_8-NEXT: Found an estimated cost of 0 for VF 8 For instruction: store half 0xH0000, half* %tmp0, align 2
	; VF_8-NEXT: Found an estimated cost of 80 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2			; VF_8-NEXT: Found an estimated cost of 64 for VF 8 For instruction: store half 0xH0000, half* %tmp1, align 2
	for.body:			for.body:
	%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]			%i = phi i64 [ 0, %entry ], [ %i.next, %for.body ]
	%tmp0 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 0			%tmp0 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 0
	%tmp1 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 1			%tmp1 = getelementptr inbounds %half.2, %half.2* %data, i64 %i, i32 1
	%tmp2 = load half, half* %tmp0, align 2			%tmp2 = load half, half* %tmp0, align 2
	%tmp3 = load half, half* %tmp1, align 2			%tmp3 = load half, half* %tmp1, align 2
	store half 0., half* %tmp0, align 2			store half 0., half* %tmp0, align 2
	store half 0., half* %tmp1, align 2			store half 0., half* %tmp1, align 2
	%i.next = add nuw nsw i64 %i, 1			%i.next = add nuw nsw i64 %i, 1
	%cond = icmp slt i64 %i.next, %n			%cond = icmp slt i64 %i.next, %n
	br i1 %cond, label %for.body, label %for.end			br i1 %cond, label %for.body, label %for.end

	for.end:			for.end:
	ret void			ret void
	}			}
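
The ARM changes above (40 -> 32 at VF 4, 80 -> 64 at VF 8) are consistent with the inline explanation that constant operands no longer incur extract costs: the stored value is a constant, so VF extracts at a cost of 2 each drop out. The following is a minimal C++ sketch of that rule, not the LLVM implementation. `Operand` and `operandsScalarizationOverhead` are hypothetical names introduced for illustration, and the per-extract cost of 2 is taken from the review discussion.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for an IR operand; this is not an LLVM type.
struct Operand {
  std::string Name; // identifies the value, used to detect repeated operands
  bool IsConstant;  // constants need no extract when an instruction is scalarized
};

// Sums the extract cost over the unique, non-constant operands.
// Each such operand needs one extract per vector lane (VF lanes).
unsigned operandsScalarizationOverhead(const std::vector<Operand> &Ops,
                                       unsigned VF,
                                       unsigned ExtractCostPerLane) {
  assert(VF >= 1 && "vectorization factor must be at least 1");
  std::set<std::string> Seen;
  unsigned Cost = 0;
  for (const Operand &Op : Ops) {
    if (Op.IsConstant)
      continue; // constants are skipped
    if (!Seen.insert(Op.Name).second)
      continue; // an operand used twice is counted once
    Cost += VF * ExtractCostPerLane;
  }
  return Cost;
}
```

Under these assumptions, treating the stored zero as a non-constant operand gives 4 * 2 = 8 at VF 4 and 8 * 2 = 16 at VF 8, while treating it as a constant gives 0. Those differences equal the reductions seen in the test (40 - 8 = 32 and 80 - 16 = 64). How the remaining cost is composed is not broken down in the discussion.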

Revision Contents

Diff 91268
