Repository: rG LLVM Github Monorepo
Don't bother type legalizing, just count the elements.
Only handle reciprocal throughput.
Only handle vectors that won't scalarize.
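For readers following along, the three updates above can be sketched as a standalone toy model. Nothing here is the real LLVM API: `CostKind`, `VecTy`, `kMemOpCost`, `baseGatherScatterCost`, and `gatherScatterCost` are hypothetical names, and the value returned by the fallback is only a placeholder for whatever the generic implementation would compute.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the LLVM types involved; not the real API.
enum class CostKind { RecipThroughput, Latency, CodeSize };

struct VecTy {
  unsigned NumElts;    // number of elements in the fixed-length vector
  bool WouldScalarize; // true if the backend would split it into scalars
};

// Assumed cost of one element-sized memory access (placeholder value).
constexpr uint64_t kMemOpCost = 1;

// Placeholder for the generic (base-class) estimate; the factor is made up.
uint64_t baseGatherScatterCost(const VecTy &Ty) {
  return 3 * Ty.NumElts * kMemOpCost;
}

uint64_t gatherScatterCost(const VecTy &Ty, CostKind Kind) {
  // Only reciprocal throughput is modelled; other cost kinds defer.
  if (Kind != CostKind::RecipThroughput)
    return baseGatherScatterCost(Ty);
  // Vectors that would scalarize are left to the generic estimate.
  if (Ty.WouldScalarize)
    return baseGatherScatterCost(Ty);
  // No type legalization: just count the elements.
  return Ty.NumElts * kMemOpCost;
}
```

With these assumptions, a non-scalarizing 8-element vector costs 8 under reciprocal throughput, and any other query falls through to the placeholder.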
I've not done much in this area of RISC-V, but I take it the non-vector costs are suitably higher than the vector costs? We'd still prefer to vectorize these?
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h, line 64: nit: this clang-format suggestion looks good.
The non-vector cost I think is just the same getMemoryOpCost call we're using here multiplied by VF. X86 adds a gather/scatter overhead cost on top of the element*getMemoryOpCost. It doesn't look like AArch64 adds any extra cost and they only support scalable vectors.
I think we'll compute the same cost for scalar loads/stores and gather/scatter with this patch. Hopefully vectorizing the GEP is accounted for separately.
I've played around with this for my own benefit, and it looks like illegal-vector scatter/gathers give roughly these scores multiplied by 3 to account for extract_element and some scalarization overhead. Before this patch all vector scatter/gathers are treated as illegal and so all costs are the mul-by-3 ones.
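To make the comparison in the last few comments concrete, here is a small sketch of the three costs being discussed. It is only an illustration: `CostSummary` and `summarizeCosts` are hypothetical names, and the factor of 3 is the rough multiplier reported above, not a constant taken from the LLVM sources.

```cpp
#include <cassert>
#include <cstdint>

struct CostSummary {
  uint64_t ScalarLoop;    // VF separate scalar loads/stores
  uint64_t LegalGather;   // gather/scatter on a vector type RVV can hold
  uint64_t IllegalGather; // gather/scatter that ends up scalarized
};

// VF is the vectorization factor; MemOpCost is the assumed cost of one
// element-sized access (what getMemoryOpCost would return in the thread).
CostSummary summarizeCosts(uint64_t VF, uint64_t MemOpCost) {
  CostSummary S;
  S.ScalarLoop = VF * MemOpCost;
  // With the patch, a legal gather/scatter is costed per element.
  S.LegalGather = VF * MemOpCost;
  // Rough observation from the thread: about 3x for illegal vectors.
  S.IllegalGather = 3 * VF * MemOpCost;
  return S;
}
```

For VF = 4 and a unit memory-op cost this gives 4, 4, and 12, which matches the observation that scalar accesses and legal gather/scatter come out equal.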
It sounds like this change will affect vectorization decisions. Should we have loop-vectorizer tests to accompany this?
Grabbed a loop-vectorizer test case from X86 to show that it works. I can add more data types if we want.
This breaks compiling the Linux kernel for RISC-V (defconfig):
$ make -skj"$(nproc)" ARCH=riscv CC=clang CROSS_COMPILE=riscv64-linux-gnu- O=build/riscv distclean defconfig drivers/scsi/scsi_common.o
clang-13: /home/nathan/cbl/github/tc-build/llvm-project/llvm/lib/Target/RISCV/RISCVSubtarget.cpp:124: unsigned int llvm::RISCVSubtarget::getMinRVVVectorSizeInBits() const: Assertion `hasStdExtV() && "Tried to get vector length without V extension support!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0. Program arguments: /home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13 -cc1 -triple riscv64-unknown-linux-gnu -S -disable-free -main-file-name scsi_common.c -mrelocation-model static -fno-delete-null-pointer-checks -mllvm -warn-stack-size=2048 -mframe-pointer=all -relaxed-aliasing -mdisable-tail-calls -fmath-errno -fno-rounding-math -no-integrated-as -mconstructor-aliases -mcmodel=medium -target-feature +m -target-feature +a -target-feature +c -target-feature +relax -target-feature -save-restore -target-abi lp64 -msmall-data-limit 8 -debugger-tuning=gdb -fcoverage-compilation-dir=/home/nathan/cbl/src/linux/build/riscv -nostdsysteminc -nobuiltininc -resource-dir /home/nathan/cbl/github/tc-build/build/llvm/stage1/lib/clang/13.0.0 -dependency-file drivers/scsi/.scsi_common.o.d -MT drivers/scsi/scsi_common.o -isystem /home/nathan/cbl/github/tc-build/build/llvm/stage1/lib/clang/13.0.0/include -include /home/nathan/cbl/src/linux/include/linux/compiler-version.h -include /home/nathan/cbl/src/linux/include/linux/kconfig.h -include /home/nathan/cbl/src/linux/include/linux/compiler_types.h -I /home/nathan/cbl/src/linux/arch/riscv/include -I ./arch/riscv/include/generated -I /home/nathan/cbl/src/linux/include -I ./include -I /home/nathan/cbl/src/linux/arch/riscv/include/uapi -I ./arch/riscv/include/generated/uapi -I /home/nathan/cbl/src/linux/include/uapi -I ./include/generated/uapi -D __KERNEL__ -D CONFIG_PAGE_OFFSET=0xffffffe000000000 -I /home/nathan/cbl/src/linux/drivers/scsi -I ./drivers/scsi -D KBUILD_MODFILE=\"drivers/scsi/scsi_common\" -D KBUILD_BASENAME=\"scsi_common\" -D KBUILD_MODNAME=\"scsi_common\" -D __KBUILD_MODNAME=kmod_scsi_common -fmacro-prefix-map=/home/nathan/cbl/src/linux/= -O2 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -Werror=implicit-function-declaration -Werror=implicit-int -Werror=return-type -Wno-format-security -Werror=unknown-warning-option -Wno-frame-address -Wno-address-of-packed-member -Wno-format-invalid-specifier -Wno-gnu -Wno-unused-const-variable -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-array-bounds -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-pointer-to-enum-cast -Wno-tautological-constant-out-of-range-compare -std=gnu89 -fno-dwarf-directory-asm -fdebug-compilation-dir=/home/nathan/cbl/src/linux/build/riscv -ferror-limit 19 -fwrapv -stack-protector 2 -fno-signed-char -fwchar-type=short -fno-signed-wchar -fgnuc-version=4.2.1 -fcolor-diagnostics -vectorize-loops -vectorize-slp -o /tmp/scsi_common-bb0ee4.s -x c /home/nathan/cbl/src/linux/drivers/scsi/scsi_common.c
1. <eof> parser at end of file
2. Optimizer
#0 0x0000000002b413c3 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2b413c3)
#1 0x0000000002b3f22e llvm::sys::RunSignalHandlers() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2b3f22e)
#2 0x0000000002b4188a SignalHandler(int) Signals.cpp:0:0
#3 0x00007f6951ca5960 __restore_rt sigaction.c:0:0
#4 0x00007f6951797ef5 raise (/usr/lib/libc.so.6+0x3cef5)
#5 0x00007f6951781862 abort (/usr/lib/libc.so.6+0x26862)
#6 0x00007f6951781747 _nl_load_domain.cold loadmsgcat.c:0:0
#7 0x00007f6951790646 (/usr/lib/libc.so.6+0x35646)
#8 0x000000000196142d llvm::RISCVSubtarget::getMinRVVVectorSizeInBits() const RISCVSubtarget.cpp:0:0
#9 0x00000000019be430 llvm::RISCVTTIImpl::getGatherScatterOpCost(unsigned int, llvm::Type*, llvm::Value const*, bool, llvm::Align, llvm::TargetTransformInfo::TargetCostKind, llvm::Instruction const*) RISCVTargetTransformInfo.cpp:0:0
#10 0x0000000001ec6188 llvm::TargetTransformInfo::getGatherScatterOpCost(unsigned int, llvm::Type*, llvm::Value const*, bool, llvm::Align, llvm::TargetTransformInfo::TargetCostKind, llvm::Instruction const*) const (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x1ec6188)
#11 0x0000000002cfc60e llvm::slpvectorizer::BoUpSLP::getEntryCost(llvm::slpvectorizer::BoUpSLP::TreeEntry*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2cfc60e)
#12 0x0000000002cffe8e llvm::slpvectorizer::BoUpSLP::getTreeCost() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2cffe8e)
#13 0x0000000002d166a5 tryToVectorizeHorReductionOrInstOperands(llvm::PHINode*, llvm::Instruction*, llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*, llvm::function_ref<bool (llvm::Instruction*, llvm::slpvectorizer::BoUpSLP&)>) SLPVectorizer.cpp:0:0
#14 0x0000000002d0f072 llvm::SLPVectorizerPass::vectorizeChainsInBlock(llvm::BasicBlock*, llvm::slpvectorizer::BoUpSLP&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2d0f072)
#15 0x0000000002d0d90c llvm::SLPVectorizerPass::runImpl(llvm::Function&, llvm::ScalarEvolution*, llvm::TargetTransformInfo*, llvm::TargetLibraryInfo*, llvm::AAResults*, llvm::LoopInfo*, llvm::DominatorTree*, llvm::AssumptionCache*, llvm::DemandedBits*, llvm::OptimizationRemarkEmitter*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2d0d90c)
#16 0x0000000002d0ce37 llvm::SLPVectorizerPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x2d0ce37)
#17 0x0000000003c2655d llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) PassBuilder.cpp:0:0
#18 0x00000000024b8835 llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x24b8835)
#19 0x00000000031a626d llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) BackendUtil.cpp:0:0
#20 0x00000000024bc124 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x24bc124)
#21 0x00000000031a7f2d llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) BackendUtil.cpp:0:0
#22 0x00000000024b71b8 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x24b71b8)
#23 0x000000000319c56b (anonymous namespace)::EmitAssemblyHelper::EmitAssemblyWithNewPassManager(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) BackendUtil.cpp:0:0
#24 0x0000000003196796 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x3196796)
#25 0x00000000036611a9 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) CodeGenAction.cpp:0:0
#26 0x0000000003d8a4a4 clang::ParseAST(clang::Sema&, bool, bool) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x3d8a4a4)
#27 0x00000000035b9150 clang::FrontendAction::Execute() (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x35b9150)
#28 0x00000000034f4c6a clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x34f4c6a)
#29 0x000000000365b0c8 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x365b0c8)
#30 0x0000000001940ec4 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x1940ec4)
#31 0x000000000193e982 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) driver.cpp:0:0
#32 0x000000000193e695 main (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x193e695)
#33 0x00007f6951782b25 __libc_start_main (/usr/lib/libc.so.6+0x27b25)
#34 0x000000000193b60e _start (/home/nathan/cbl/github/tc-build/build/llvm/stage1/bin/clang-13+0x193b60e)
clang-13: error: unable to execute command: Aborted (core dumped)
clang-13: error: clang frontend command failed due to signal (use -v to see invocation)
ClangBuiltLinux clang version 13.0.0 (https://github.com/llvm/llvm-project 512bae81cc525de436f0d49d11b563f2afd51037)
Target: riscv64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/nathan/cbl/github/tc-build/build/llvm/stage1/bin
...
A reduced reproducer:
$ cat scsi_common.i
struct {
  char scsi_lun[8]
} scsilun_to_int_scsilun;
scsilun_to_int_i;
scsilun_to_int() {
  long lun = scsilun_to_int_i = 0;
  for (; scsilun_to_int_i < sizeof(lun); scsilun_to_int_i += 2)
    lun = lun | scsilun_to_int_scsilun.scsi_lun[scsilun_to_int_i] + 1;
  return lun;
}

$ clang -O2 --target=riscv64-linux-gnu -c -o /dev/null scsi_common.i
...
clang: /home/nathan/cbl/github/tc-build/llvm-project/llvm/lib/Target/RISCV/RISCVSubtarget.cpp:124: unsigned int llvm::RISCVSubtarget::getMinRVVVectorSizeInBits() const: Assertion `hasStdExtV() && "Tried to get vector length without V extension support!"' failed.
...
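The backtrace shows the cost hook reaching getMinRVVVectorSizeInBits() on a target built without +v. One possible shape for a guard is sketched below as a standalone toy. `ToySubtarget`, `fallbackCost`, and `guardedGatherScatterCost` are hypothetical names, treating a minimum vector size of 0 as "fixed-length vectors unavailable" is an assumption of this sketch, and this is not necessarily the fix that landed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the subtarget; the assertion mirrors the failing one.
struct ToySubtarget {
  bool HasV;
  unsigned MinVLenBits;
  bool hasStdExtV() const { return HasV; }
  unsigned getMinRVVVectorSizeInBits() const {
    assert(hasStdExtV() &&
           "Tried to get vector length without V extension support!");
    return MinVLenBits;
  }
};

// Placeholder for the generic estimate; the factor is made up.
uint64_t fallbackCost(unsigned NumElts) { return 3ull * NumElts; }

uint64_t guardedGatherScatterCost(const ToySubtarget &ST, unsigned NumElts) {
  // Test for V first, so the vector-length query below is never reached
  // on a subtarget that lacks the extension.
  if (!ST.hasStdExtV())
    return fallbackCost(NumElts);
  // Assumption of this sketch: 0 means fixed-length vectors are not in use.
  if (ST.getMinRVVVectorSizeInBits() == 0)
    return fallbackCost(NumElts);
  // Otherwise cost the access per element.
  return NumElts;
}
```

The point is only the ordering: the hasStdExtV() test must come before any call that asserts on it, which is what the SLP vectorizer path in the reproducer trips over.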
@craig.topper It seems that for the llvm.masked.gather.xxx intrinsics, the backend produces similar assembly instructions.
For example, llvm.masked.gather.v8f64.v8p0f64 produces
vsetivli zero, 8, e64, m4, tu, mu
vluxei64.v v12, (zero), v8, v0.t
vmv4r.v v8, v12
and llvm.masked.gather.v4f64.v4p0f64 produces
vsetivli zero, 4, e64, m2, tu, mu
vluxei64.v v10, (zero), v8, v0.t
vmv2r.v v8, v10
So why is the cost calculated from the number of elements? I'm not quite clear on this. Could you give a reference or a pointer? Thanks.
The memory system in a chip is likely unable to handle all of the accesses in parallel since they go to disjoint addresses. Each address could be in a different cache line. Most implementations would not be able to read all of the cache lines simultaneously.
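As a rough illustration of that argument, the sketch below counts how many distinct cache lines a set of gather addresses touches. `distinctCacheLines` is a hypothetical helper, and the 64-byte line size is an assumption (common, but not universal).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Number of distinct cache lines touched by a set of byte addresses.
// LineBytes is an assumed cache-line size.
std::size_t distinctCacheLines(const std::vector<uint64_t> &Addrs,
                               uint64_t LineBytes = 64) {
  std::set<uint64_t> Lines;
  for (uint64_t A : Addrs)
    Lines.insert(A / LineBytes); // index of the line holding this address
  return Lines.size();
}
```

Eight contiguous doubles starting at address 0 fall in one 64-byte line, while eight doubles gathered at a 64-byte stride touch eight lines. In the worst case a gather needs one line per element, which is why a per-element cost is a reasonable first approximation even though the instruction count stays the same.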