Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.
Details
- Reviewers
RKSimon anton-afanasyev dtemirbulatov - Commits
- rGad2a0ccf8f98: [SLP]Alternate vectorization for cmp instructions.
rG842a2360a846: [SLP]Alternate vectorization for cmp instructions.
rG83620bd2ad86: [SLP]Alternate vectorization for cmp instructions.
rGafaaecc88c6e: [SLP]Alternate vectorization for cmp instructions.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
110 ms | x64 debian > LLVM.Bindings/Go::go.test |
Event Timeline
LGTM - cheers
llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll | ||
---|---|---|
213 | Remove this TODO? Or move it down to one of the later tests? |
Nope, just joined a new company, need some time to configure the environment. Will commit after that.
Congrats! The 14.x branch is Feb 1 so ideally we need the SLP patches cooking in trunk for a few days beforehand, cheers.
This makes clang crash when building SciPy. Reverted in 5281f0dab2398fdbc60fc7131f8ad2438600e7ae and test case at P8276
Just a heads-up: This commit appears to be breaking some of our tests. We're working on trying to get a reproducible test case.
We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?
I don't have a reproducible test case yet, but comparing IR before and after the change, I think what is happening is that we originally had a comparison like icmp slt i64 %VALUE, 2147483648; after this change, this gets combined into a vectorized cmp but (apparently) with the condition reversed: icmp sgt <4 x i64> %1057, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483648>, where %VALUE is in the last slot of the vector, thus giving us effectively a sgt comparison where we should have slt.
(Needless to say, this causes amusing failures, with error messages like Error: Product of extents for buffer is 16384, which exceeds the maximum size of 2147483647...)
Anyway, I don't think I can narrow this to an easily encapsulated repro case to give you tonight, but in the meantime, perhaps the nature of the apparent failure mode may help track down the issue?
I went ahead and reverted it for now to unblock things. Hopefully Halide folks can find a test case soon.
Hi, can you copy-paste a small piece of the transformed code with the vector instructions? +- 10 instructions before and after icmp sgt <4 x i64> %1057, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483648>?
I have enclosed the smallest before-and-after cases that I've been able to narrow it down to, so far. I am going to hold off on trying to reduce it further until I hear back from you as to whether this is helpful or not.
The .ll files are what Halide produces via use with LLVM with the given commit (17a39aecd1703d94e2b1dff0dedd690cc1b0aaaf is correct behavior, 83620bd2ad867f706c699d0f2b8be10e43d9f3d7 is this commit with the incorrect behavior).
I realize this isn't very small, but I think the relevant sections are line 779 in the "before" case ( %.inv = icmp slt i64 %points_image2.total_extent.2, 2147483648), which seems to get incorrectly vectorized into a sgt in the "after" case (line 785 or so).
The .s files are the resulting assembly on x86-64. Note that we are configured to generate SSE4.1 and AVX, but *not* AVX2.
I committed a fixed version of the patch, could you try your repro with the updated compiler?
It still fails.
LLVM built with commit 287ce6b51675aee43870bebfff68bb144d1ab90e succeeds, but built with commit 842a2360a84692f2e4c37cc3e652640e6627d004 (your updated commit) still fails with the same symptom ("16384 > 2147483647").
(I haven't re-run my tests from earlier today with these new commits to see how the IR differs.)
Oh, and it pains me to say this, but we'll probably need to revert the latest change too, since it also breaks things :-(
Update: one of Halide's build-in tests fails reliably (for me) with these patches in place; if you pull and build Halide locally, it may make it easier for you to reproduce the failures locally. Here are the steps I think are necessary:
- git clone https://github.com/halide/Halide.git /path/to/halide
- cd /path/to/halide
- git checkout master
- export LLVM_CONFIG=/path/to/llvm-install/bin/llvm-config
- export LLVM_DIR=/path/to/llvm-install/lib/cmake/llvm
- export HL_TARGET=x86-64-linux-sse41 # Assumes you are on an x86-64 Linux machine w/ SSE4.1
- make -j8 generator_aot_metadata_tester
(wait for build & test to complete)
If everything is fine, this will produce something like Success!.
If broken, this will produce something like Error: Total allocation for buffer array_outputs6_0 is 1024, which exceeds the maximum size of 2147483647 (Aborted)
Hi, thanks, already found a bug using your previous archive, it was a misprint I overlooked, will commit fixed version tomorrow.
Here was the wrong code:
CmpInst::Predicate CurrentPred = CI->getPredicate();
in function buildTree_rec. Need to use Cmp instead of CI.
Did the final version of this change make it into the LLVM 14rc branch? I don't think it did because Halide just added testing on that branch and we're seeing the same sort of errors there now. What's the protocol for getting fixes into that branch?
That doesn't appear to be accurate -- the revert, https://github.com/llvm/llvm-project/commit/0c3d22a5926dd7d6fd28682683abd9d476f8e78b didn't make it into 14.x branch cut, while the initial change, https://github.com/llvm/llvm-project/commit/83620bd2ad867f706c699d0f2b8be10e43d9f3d7 did. There is no cherry-pick revert subsequently on the branch, either.
I've filed https://github.com/llvm/llvm-project/issues/53678
SLPVectorizer patches seem extremely revert-prone, suggesting that the pass's code
is in need of a serous refactoring and in an increase of the amount of people who know that code.
I think, the last commit (ad2a0ccf8f9816fd7805138bbf582117246f5489) still has some problems. We are seeing sporadic clang crashes after this commit. I'm trying to confirm this by comparing this revision with the previous one and trying to get a test case.
So far I have a stack trace from a release clang build:
#0 0x000055efacbf6c29 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) #1 0x000055efacbf7065 SignalHandler(int) #2 0x00007f1f25022750 __restore_rt #3 0x000055efa973210b (anonymous namespace)::LazyValueInfoImpl::getEdgeValue(llvm::Value*, llvm::BasicBlock*, llvm::BasicBlock*, llvm::Instruction*) #4 0x000055efa972db80 (anonymous namespace)::LazyValueInfoImpl::solve() (.llvm.1834014330660040286) #5 0x000055efa972a2dd (anonymous namespace)::LazyValueInfoImpl::getValueInBlock(llvm::Value*, llvm::BasicBlock*, llvm::Instruction*) #6 0x000055efa978882b runImpl(llvm::Function&, llvm::LazyValueInfo*, llvm::DominatorTree*, llvm::SimplifyQuery const&) #7 0x000055efa9787755 llvm::CorrelatedValuePropagationPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) #8 0x000055efa97873b2 llvm::detail::PassModel<llvm::Function, llvm::CorrelatedValuePropagationPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) #9 0x000055efa983d69a llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) #10 0x000055efa983d2b2 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) #11 0x000055efa999414e llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) #12 0x000055efa9993b32 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) #13 0x000055efa9940a64 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) #14 0x000055efa99406f2 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) #15 0x000055efa987de69 llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) #16 0x000055efa987dc52 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) #17 0x000055efa98934a0 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) #18 0x000055efa919ddb2 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) #19 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) #20 0x000055efa8f49e4a llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) #21 0x000055efa8f49c92 llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) #22 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) #23 0x000055efa8f8c260 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&) #24 0x000055efa8f84bb2 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >) #25 0x000055efa9a4b8d1 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) #26 0x000055efa8f7e180 clang::ParseAST(clang::Sema&, bool, bool) #27 0x000055efa90338f5 clang::FrontendAction::Execute() #28 0x000055efa903372e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) #29 0x000055efa90335a1 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) #30 0x000055efa9a15d58 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) #31 0x000055efa9112f7e ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) #32 0x000055efa98b2a2d main
... and an assertion from a checked clang build, but it has a different stack trace:
assert.h assertion failed at llvm/include/llvm/Support/Casting.h:262 in typename cast_retty<X, Y>::ret_type llvm::cast(Y &) [X = clang::CodeGen::EHCleanupScope, Y = clang::CodeGen::EHScope]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!" @ 0x55948ac83344 __assert_fail @ 0x559486524fb0 clang::CodeGen::CodeGenFunction::DeactivateCleanupBlock() @ 0x55948640de42 clang::CodeGen::CodeGenFunction::EmitCall() @ 0x559486421016 clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall() @ 0x559486420429 clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall() @ 0x5594865d9cb8 clang::CodeGen::CodeGenFunction::EmitCXXConstructExpr() @ 0x5594865cb013 (anonymous namespace)::AggExprEmitter::VisitCXXConstructExpr() @ 0x5594865c6ff6 clang::CodeGen::CodeGenFunction::EmitAggExpr() @ 0x5594865dbb0b StoreAnyExprIntoOneUnit() @ 0x5594865de49d clang::CodeGen::CodeGenFunction::EmitCXXNewExpr() @ 0x55948646006a (anonymous namespace)::ScalarExprEmitter::VisitExprWithCleanups() @ 0x5594864507ab clang::CodeGen::CodeGenFunction::EmitScalarExpr() @ 0x55948642ae09 clang::CodeGen::CodeGenFunction::EmitAnyExpr() @ 0x55948642ada1 clang::CodeGen::CodeGenFunction::EmitIgnoredExpr() @ 0x5594864c8790 clang::CodeGen::CodeGenFunction::EmitStmt() @ 0x5594864d4910 clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope() @ 0x5594866e9ed3 clang::CodeGen::CodeGenFunction::EmitFunctionBody() @ 0x5594866eac9e clang::CodeGen::CodeGenFunction::GenerateCode() @ 0x55948670a722 clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition() @ 0x559486702e90 clang::CodeGen::CodeGenModule::EmitGlobalDefinition() @ 0x5594866f734f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f737f clang::CodeGen::CodeGenModule::EmitDeferred() @ 0x5594866f587c clang::CodeGen::CodeGenModule::Release() @ 0x559486827989 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit() @ 0x5594863da495 clang::BackendConsumer::HandleTranslationUnit() @ 0x55948718fa94 clang::ParseAST() @ 0x559486f07443 clang::FrontendAction::Execute() @ 0x559486e5d3bf clang::CompilerInstance::ExecuteAction() @ 0x559485fd88fe clang::ExecuteCompilerInvocation()
Not sure if these two are somehow related though. Looking further...
I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?
Not yet. I'm trying to extract something but thought I'd ask if you're already aware of any miscompilation problems in the meantime.
Ok, here's a reproducer:
opt --passes=slp-vectorizer slp.ll -o - -S
With this patch, the function returns the value 2. Without the patch the value returned is 0.
This can be seen with
build-all/bin/opt --passes=slp-vectorizer slp.ll -o - -S | build-all/bin/opt -O3 -o - -S
with/without this patch.
Pull this change out as a pre-commit? I can't recall if we had a reason to use getOpcode()/getAltOpcode() (and the null handling) or not.