This is an archive of the discontinued LLVM Phabricator instance.

Scalarization overhead estimation in getIntrinsicInstrCost() improved
ClosedPublic

Authored by jonpa on Feb 4 2017, 4:58 AM.

Details

Summary

getIntrinsicInstrCost() used to only compute scalarization cost based on types. This patch improves this so that the actual arguments are checked when they are available, in order to handle only unique non-constant operands.

getIntrinsicInstrCost("Types") has gotten a new parameter 'ScalarizationCostPassed', so that a caller can pass on a value for scalarization cost based on actual operands. If this is UINT_MAX, the Type based estimation is made as before.

getIntrinsicInstrCost("Args") has gotten a new parameter 'VF'. If VF > 1, the types involved are vectorized with it before analyzing begins (vectorized Args that are passed can however not be combined with VF > 1). This seemed like a good idea, since both SLPVectorizer and LoopVectorizer can use this.

getOperandsScalarizationOverhead() now also checks for Constants. It has also been extended, to allow vector operands (in case which VF must be 1). I deduced this to be needed since BBVectorize calls getVecTypeForPair(), which also handles vector types as input. I am not that familiar with BBVectrize however, so if this is wrong in the sense that BBVectorize never further vectorizes a vector intrinsic, this is then not needed...

In BBVectorize, things got a bit tricky while handling merged arguments. Here, the scalarization cost is computed locally, by considering all the input operands of both instructions, plus the vectorized return type. Since vectorization is done by arguments merging, getOperandsScalarizationOverhead() is then called for all operands with a VF of 1. I hope this is right.

test/Analysis/CostModel/X86/arith-fp.ll has been updated to expect lower instruction costs, which should make sense since the calls has undef operands, so the scalarization cost of them (extracts) should not be added.

Diff Detail

Event Timeline

jonpa created this revision.Feb 4 2017, 4:58 AM
hfinkel added inline comments.Mar 8 2017, 5:57 AM
include/llvm/Analysis/TargetTransformInfo.h
632

Please define what "it" is. This is the cost of scalarizing the arguments and the return value, right?

639
/// Three cases are handled:
641

"If VF > 1, it will be applied before analyzis." - I don't see how this adds value to the user of the interface? I don't see how else it could work or what it would mean if it worked otherwise.

include/llvm/CodeGen/BasicTTIImpl.h
725

"Can't multiply VF here." does not really explain the problem. How about, "VF > 1 and RetVF is a vector type"?

742

I don't understand why RetVF > 1 is here. If RetVF is not one, then VF needs to be one, and so we're not vectorizing. As a result, we're not scalarizing either, and so adding the scalarization cost associated with the return type seems odd.

784

This interface seems a bit broken for how we're using it. Why are we assuming that we need to scalarize all intrinsics? (many target-specific intrinsics are really vector intrinsics - maybe only assume we need to scalarize the generic ones for which we don't have special handling?).

jonpa updated this revision to Diff 91140.Mar 9 2017, 12:19 AM
jonpa marked 4 inline comments as done.

Updated per review.

include/llvm/Analysis/TargetTransformInfo.h
632

yes

641

ok - removed that sentence :-)

include/llvm/CodeGen/BasicTTIImpl.h
725

sure

742

My idea was that this function can be called from different contexts ("three cases").

The LoopVectorizer can pass the scalar instruction and VF > 1.
CostModel can pass a vector instruction, RetVF > 1 (and VF argument defaults to 1).

Both cases should produce the same return value.

Am I missing something?

784

I was assuming this made sense in the 'default:' case at least.

How scalarization cost is computed, and when it is done are two separate issues, so perhaps that can be handled later with a separate patch?

hfinkel accepted this revision.Mar 9 2017, 1:13 PM

LGTM.

In BBVectorize, things got a bit tricky while handling merged arguments. Here, the scalarization cost is computed locally, by considering all the input operands of both instructions, plus the vectorized return type. Since vectorization is done by arguments merging, getOperandsScalarizationOverhead() is then called for all operands with a VF of 1. I hope this is right.

Eeh, don't worry about it. I'm going to delete the whole pass at some point soon anyway.

include/llvm/CodeGen/BasicTTIImpl.h
742

Maybe the underlying problem here is the:

// Assume that we need to scalarize this intrinsic.

this seems like a bad assumption for a general code model (although it probably makes sense for the vectorizer because the vectorizer's legality checking will bail out before we get here for a vector intrinsic).

In any case, a comment here explaining the logic would be good.

784

Yea, this should be cleaned up as a separate patch.

This revision is now accepted and ready to land.Mar 9 2017, 1:13 PM
jonpa updated this revision to Diff 91268.Mar 9 2017, 10:47 PM

Thanks for review.

In any case, a comment here explaining the logic would be good.

Added a comment here - is it clear enough?

I also had to update two more tests for ARM and AArch64 targets. Please have a look and let me know if this is acceptable, before I commit.

hfinkel added inline comments.Mar 12 2017, 6:44 PM
test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll
173 ↗(On Diff #91268)

Why do these change? (they're not intrinsics).

jonpa added inline comments.Mar 13 2017, 12:40 AM
test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll
173 ↗(On Diff #91268)

getOperandsScalarizationOverhead() has been improved so that it doesn't count extraction
costs for constants.

ARM: This is 2 for each extract, so for VF 4, 40 -> 32 makes sense, as
well as for VF 8. Interleaved cost was 40, and now scalarizing the memop is 32.
(AArch64: Same)

To me these changes look ok.

hfinkel added inline comments.Mar 13 2017, 6:41 AM
test/Transforms/LoopVectorize/AArch64/interleaved_cost.ll
173 ↗(On Diff #91268)

Okay, please proceed. Please include this explanation for the test case changes in the commit message, however, so it is clear what's going on.

jonpa closed this revision.Mar 13 2017, 11:49 PM

Commited as @297705.

vsk added subscribers: bruno, vsk.Mar 14 2017, 12:59 PM

There's a chance this commit broke our code coverage bot.

@jonpa Could you take a look? http://green.lab.llvm.org/green/job/clang-stage2-coverage-R_build/926/

FAILED: lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o 
/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/host-compiler/bin/clang++   -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/CodeGen/SelectionDAG -I/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG -Iinclude -I/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/include -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Werror=date-time -std=c++11 -fcolor-diagnostics -fprofile-instr-generate='/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build/profiles/%6m.profraw' -fcoverage-mapping -O3 -DNDEBUG    -fno-exceptions -fno-rtti -MMD -MT lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -MF lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o.d -o lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -c '/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp'
Assertion failed: (Ty->isVectorTy() && "Can only scalarize vectors"), function getScalarizationOverhead, file /Users/buildslave/jenkins/sharedspace/phase1@2/llvm/include/llvm/CodeGen/BasicTTIImpl.h, line 294.
0  clang-5.0                0x000000010df391b8 llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 40
1  clang-5.0                0x000000010df38416 llvm::sys::RunSignalHandlers() + 86
2  clang-5.0                0x000000010df39859 SignalHandler(int) + 361
3  libsystem_platform.dylib 0x00007fff897db52a _sigtramp + 26
4  libsystem_platform.dylib 0x00007fe600000040 _sigtramp + 1988250416
5  libsystem_c.dylib        0x00007fff99af66df abort + 129
6  libsystem_c.dylib        0x00007fff99abddd8 basename + 0
7  clang-5.0                0x000000010d3e4c13 llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getIntrinsicInstrCost(llvm::Intrinsic::ID, llvm::Type*, llvm::ArrayRef<llvm::Value*>, llvm::FastMathFlags, unsigned int) + 1267
8  clang-5.0                0x000000010d6344cc llvm::TargetTransformInfo::getIntrinsicInstrCost(llvm::Intrinsic::ID, llvm::Type*, llvm::ArrayRef<llvm::Value*>, llvm::FastMathFlags, unsigned int) const + 28
9  clang-5.0                0x000000010e05fee5 getVectorIntrinsicCost(llvm::CallInst*, unsigned int, llvm::TargetTransformInfo const&, llvm::TargetLibraryInfo const*) + 389
10 clang-5.0                0x000000010e05ed06 (anonymous namespace)::LoopVectorizationCostModel::getInstructionCost(llvm::Instruction*, unsigned int) + 1782
11 clang-5.0                0x000000010e06009f (anonymous namespace)::LoopVectorizationCostModel::expectedCost(unsigned int) + 319
12 clang-5.0                0x000000010e04a8bf llvm::LoopVectorizePass::processLoop(llvm::Loop*) + 8735
13 clang-5.0                0x000000010e0553ab llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__1::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&) + 539
14 clang-5.0                0x000000010e056c23 (anonymous namespace)::LoopVectorize::runOnFunction(llvm::Function&) + 1235
15 clang-5.0                0x000000010d9efee3 llvm::FPPassManager::runOnFunction(llvm::Function&) + 547
16 clang-5.0                0x000000010d9f0143 llvm::FPPassManager::runOnModule(llvm::Module&) + 51
17 clang-5.0                0x000000010d9f05ce llvm::legacy::PassManagerImpl::run(llvm::Module&) + 766
18 clang-5.0                0x000000010e1469a3 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::__1::unique_ptr<llvm::raw_pwrite_stream, std::__1::default_delete<llvm::raw_pwrite_stream> >) + 14579
19 clang-5.0                0x000000010e344cc5 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) + 917
20 clang-5.0                0x000000010ec93245 clang::ParseAST(clang::Sema&, bool, bool) + 469
21 clang-5.0                0x000000010e5970cc clang::FrontendAction::Execute() + 76
22 clang-5.0                0x000000010e556381 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) + 1217
23 clang-5.0                0x000000010e5ef3a8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) + 4904
24 clang-5.0                0x000000010c9584cc cc1_main(llvm::ArrayRef<char const*>, char const*, void*) + 1388
25 clang-5.0                0x000000010c956c33 main + 11955
26 libdyld.dylib            0x00007fff93aba5ad start + 1
27 libdyld.dylib            0x0000000000000060 start + 1817467572
Stack dump:
0.	Program arguments: /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/host-compiler/bin/clang-5.0 -cc1 -triple x86_64-apple-macosx10.11.0 -Wdeprecated-objc-isa-usage -Werror=deprecated-objc-isa-usage -emit-obj -disable-free -main-file-name LegalizeIntegerTypes.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -mdisable-fp-elim -masm-verbose -munwind-tables -target-cpu core2 -target-linker-version 264.3.102 -dwarf-column-info -debugger-tuning=lldb -fprofile-instrument-path=/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build/profiles/%6m.profraw -fprofile-instrument=clang -fcoverage-mapping -coverage-notes-file /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build/lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.gcno -resource-dir /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/host-compiler/lib/clang/5.0.0 -dependency-file lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o.d -MT lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -D GTEST_HAS_RTTI=0 -D LLVM_BUILD_GLOBAL_ISEL -D __STDC_CONSTANT_MACROS -D __STDC_FORMAT_MACROS -D __STDC_LIMIT_MACROS -I lib/CodeGen/SelectionDAG -I /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG -I include -I /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/include -D NDEBUG -stdlib=libc++ -O3 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -Werror=date-time -pedantic -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/clang-build -ferror-limit 19 -fmessage-length 0 -fvisibility-inlines-hidden -stack-protector 1 -fblocks -fno-rtti -fobjc-runtime=macosx-10.11.0 -fencode-extended-block-signature -fmax-type-align=16 -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -o lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o -x c++ /Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp 
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'Function Pass Manager' on module '/Users/buildslave/jenkins/sharedspace/clang-stage2-coverage-R@2/llvm/lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp'.
4.	Running pass 'Loop Vectorization' on function '@_ZN4llvm16DAGTypeLegalizer28PromoteIntRes_CONCAT_VECTORSEPNS_6SDNodeE'
jonpa added a comment.Mar 15 2017, 3:32 AM
In D29540#700891, @vsk wrote:

There's a chance this commit broke our code coverage bot.

@jonpa Could you take a look? http://green.lab.llvm.org/green/job/clang-stage2-coverage-R_build/926/

FAILED: lib/CodeGen/SelectionDAG/CMakeFiles/LLVMSelectionDAG.dir/LegalizeIntegerTypes.cpp.o

I tried to build this with same cmake flags, but then got:
/usr/bin/ld: cannot find /root/llvm/install/llvm-dev/lib/clang/5.0.0/lib/linux/libclang_rt.profile-s390x.a: No such file or directory
If I removed -DLLVM_BUILD_INSTRUMENTED_COVERAGE=ON, everything builds fine.

Could you perhaps send me a reduced test case with command for llc?

Looking at the problem, it is difficult to tell where the problem is. It "should work" because has explicit checks everywhere if the intrinsic is vectorized or not, and only then should getScalarizationOverhead() be called. Could this be an intrinsic that returns a vector type even though it itself is not being vectorized?

Sorry, but I can't help more without a test case.

vsk added a comment.Mar 15 2017, 11:22 AM

Thanks for taking a look, I filed a PR: http://bugs.llvm.org/show_bug.cgi?id=32285

jonpa added a comment.Mar 16 2017, 1:55 AM
In D29540#701888, @vsk wrote:

Thanks for taking a look, I filed a PR: http://bugs.llvm.org/show_bug.cgi?id=32285

I think this was an omission to check for a void return type. See my fix at https://reviews.llvm.org/D31024.

jonpa added a comment.Mar 16 2017, 7:19 AM

Fix commited as r297954