Page MenuHomePhabricator

[SLP]Alternate vectorization for cmp instructions.
ClosedPublic

Authored by ABataev on Dec 17 2021, 10:42 AM.

Details

Summary

Added support for alternate ops vectorization of the cmp instructions.
It allows to vectorize either cmp instructions with same/swapped
predicate but different (swapped) operands kinds or cmp instructions
with different predicates and compatible operands kinds.

Diff Detail

Event Timeline

ABataev created this revision.Dec 17 2021, 10:42 AM
ABataev requested review of this revision.Dec 17 2021, 10:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 17 2021, 10:42 AM
RKSimon added inline comments.Jan 3 2022, 8:49 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
438

Pull this change out as a pre-commit? I can't recall if we had a reason to use getOpcode()/getAltOpcode() (and the null handling) or not.

2095–2096

Pull this change out as a pre-commit?

RKSimon accepted this revision.Jan 10 2022, 2:34 PM

LGTM - cheers

llvm/test/Transforms/SLPVectorizer/X86/reduction-logical.ll
190

Remove this TODO? Or move it down to one of the later tests?

This revision is now accepted and ready to land.Jan 10 2022, 2:34 PM

@ABataev Any blockers for this being committed?

@ABataev Any blockers for this being committed?

Nope, just joined a new company, need some time to configure the environment. Will commit after that.

@ABataev Any blockers for this being committed?

Nope, just joined a new company, need some time to configure the environment. Will commit after that.

Congrats! The 14.x branch is Feb 1 so ideally we need the SLP patches cooking in trunk for a few days beforehand, cheers.

This revision was landed with ongoing or failed builds.Jan 31 2022, 11:12 AM
This revision was automatically updated to reflect the committed changes.
bkramer added a subscriber: bkramer.Feb 1 2022, 2:42 AM

This makes clang crash when building SciPy. Reverted in 5281f0dab2398fdbc60fc7131f8ad2438600e7ae and test case at P8276

RKSimon reopened this revision.Feb 1 2022, 2:50 AM
This revision is now accepted and ready to land.Feb 1 2022, 2:50 AM
This revision was automatically updated to reflect the committed changes.
cmtice added a subscriber: cmtice.Feb 1 2022, 5:25 PM

Just a heads-up: This commit appears to be breaking some of our tests. We're working on trying to get a reproducible test case.

srj added a subscriber: srj.Feb 1 2022, 5:47 PM
cmtice added a comment.Feb 1 2022, 6:12 PM

We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?

srj added a comment.Feb 1 2022, 6:46 PM

We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?

I don't have a reproducible test case yet, but comparing IR before and after the change, I think what is happening is that we originally had a comparison like icmp slt i64 %VALUE, 2147483648; after this change, this gets combined into a vectorized cmp but (apparently) with the condition reversed: icmp sgt <4 x i64> %1057, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483648>, where %VALUE is in the last slot of the vector, thus giving us effectively a sgt comparison where we should have slt.

(Needless to say, this causes amusing failures, with error messages like Error: Product of extents for buffer is 16384, which exceeds the maximum size of 2147483647...)

Anyway, I don't think I can narrow this to an easily encapsulated repro case to give you tonight, but in the meantime, perhaps the nature of the apparent failure mode may help track down the issue?

We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?

Hi, sure, I'll revert it. Still need a reproducer to investigate and fix the bug.

We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?

Hi, sure, I'll revert it. Still need a reproducer to investigate and fix the bug.

I went ahead and reverted it for now to unblock things. Hopefully Halide folks can find a test case soon.

uabelho added a subscriber: uabelho.Feb 2 2022, 4:10 AM

We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?

Hi, sure, I'll revert it. Still need a reproducer to investigate and fix the bug.

I went ahead and reverted it for now to unblock things. Hopefully Halide folks can find a test case soon.

Thanks!

We're still working on getting you a reproducible test case, but this is definitely causing some bad codegen for us. Could you please revert it while we work on getting you the test case?

I don't have a reproducible test case yet, but comparing IR before and after the change, I think what is happening is that we originally had a comparison like icmp slt i64 %VALUE, 2147483648; after this change, this gets combined into a vectorized cmp but (apparently) with the condition reversed: icmp sgt <4 x i64> %1057, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483648>, where %VALUE is in the last slot of the vector, thus giving us effectively a sgt comparison where we should have slt.

(Needless to say, this causes amusing failures, with error messages like Error: Product of extents for buffer is 16384, which exceeds the maximum size of 2147483647...)

Anyway, I don't think I can narrow this to an easily encapsulated repro case to give you tonight, but in the meantime, perhaps the nature of the apparent failure mode may help track down the issue?

Hi, can you copy-paste a small piece of the transformed code with the vector instructions? +- 10 instructions before and after icmp sgt <4 x i64> %1057, <i64 2147483647, i64 2147483647, i64 2147483647, i64 2147483648>?

srj added a comment.Feb 2 2022, 11:14 AM

I have enclosed the smallest before-and-after cases that I've been able to narrow it down to, so far. I am going to hold off on trying to reduce it further until I hear back from you as to whether this is helpful or not.

The .ll files are what Halide produces via use with LLVM with the given commit (17a39aecd1703d94e2b1dff0dedd690cc1b0aaaf is correct behavior, 83620bd2ad867f706c699d0f2b8be10e43d9f3d7 is this commit with the incorrect behavior).

I realize this isn't very small, but I think the relevant sections are line 779 in the "before" case ( %.inv = icmp slt i64 %points_image2.total_extent.2, 2147483648), which seems to get incorrectly vectorized into a sgt in the "after" case (line 785 or so).

The .s files are the resulting assembly on x86-64. Note that we are configured to generate SSE4.1 and AVX, but *not* AVX2.

I have enclosed the smallest before-and-after cases that I've been able to narrow it down to, so far. I am going to hold off on trying to reduce it further until I hear back from you as to whether this is helpful or not.

The .ll files are what Halide produces via use with LLVM with the given commit (17a39aecd1703d94e2b1dff0dedd690cc1b0aaaf is correct behavior, 83620bd2ad867f706c699d0f2b8be10e43d9f3d7 is this commit with the incorrect behavior).

I realize this isn't very small, but I think the relevant sections are line 779 in the "before" case ( %.inv = icmp slt i64 %points_image2.total_extent.2, 2147483648), which seems to get incorrectly vectorized into a sgt in the "after" case (line 785 or so).

The .s files are the resulting assembly on x86-64. Note that we are configured to generate SSE4.1 and AVX, but *not* AVX2.

I committed a fixed version of the patch, could you try your repro with the updated compiler?

srj added a comment.Feb 2 2022, 11:39 AM

I committed a fixed version of the patch, could you try your repro with the updated compiler?

It still fails.

LLVM built with commit 287ce6b51675aee43870bebfff68bb144d1ab90e succeeds, but built with commit 842a2360a84692f2e4c37cc3e652640e6627d004 (your updated commit) still fails with the same symptom ("16384 > 2147483647").

(I haven't re-run my tests from earlier today with these new commits to see how the IR differs.)

srj added a comment.Feb 2 2022, 11:40 AM

Oh, and it pains me to say this, but we'll probably need to revert the latest change too, since it also breaks things :-(

Oh, and it pains me to say this, but we'll probably need to revert the latest change too, since it also breaks things :-(

Sure, no problem, will revert it in few minutes.

srj added a comment.EditedFeb 2 2022, 2:16 PM

Update: one of Halide's build-in tests fails reliably (for me) with these patches in place; if you pull and build Halide locally, it may make it easier for you to reproduce the failures locally. Here are the steps I think are necessary:

  • git clone https://github.com/halide/Halide.git /path/to/halide
  • cd /path/to/halide
  • git checkout master
  • export LLVM_CONFIG=/path/to/llvm-install/bin/llvm-config
  • export LLVM_DIR=/path/to/llvm-install/lib/cmake/llvm
  • export HL_TARGET=x86-64-linux-sse41 # Assumes you are on an x86-64 Linux machine w/ SSE4.1
  • make -j8 generator_aot_metadata_tester

(wait for build & test to complete)

If everything is fine, this will produce something like Success!.
If broken, this will produce something like Error: Total allocation for buffer array_outputs6_0 is 1024, which exceeds the maximum size of 2147483647 (Aborted)

Update: one of Halide's build-in tests fails reliably (for me) with these patches in place; if you pull and build Halide locally, it may make it easier for you to reproduce the failures locally. Here are the steps I think are necessary:

  • git clone https://github.com/halide/Halide.git /path/to/halide
  • cd /path/to/halide
  • git checkout master
  • export LLVM_CONFIG=/path/to/llvm-install/bin/llvm-config
  • export LLVM_DIR=/path/to/llvm-install/lib/cmake/llvm
  • export HL_TARGET=x86-64-linux-sse41 # Assumes you are on an x86-64 Linux machine w/ SSE4.1
  • make -j8 generator_aot_metadata_tester

(wait for build & test to complete)

If everything is fine, this will produce something like Success!.
If broken, this will produce something like Error: Total allocation for buffer array_outputs6_0 is 1024, which exceeds the maximum size of 2147483647 (Aborted)

Hi, thanks, already found a bug using your previous archive, it was a misprint I overlooked, will commit fixed version tomorrow.

srj added a comment.Feb 3 2022, 9:11 AM

Latest commit seems to be working! What was the nature of the typo?

Latest commit seems to be working! What was the nature of the typo?

Here was the wrong code:

CmpInst::Predicate CurrentPred = CI->getPredicate();

in function buildTree_rec. Need to use Cmp instead of CI.

srj added a comment.Feb 9 2022, 9:15 AM

Did the final version of this change make it into the LLVM 14rc branch? I don't think it did because Halide just added testing on that branch and we're seeing the same sort of errors there now. What's the protocol for getting fixes into that branch?

Did the final version of this change make it into the LLVM 14rc branch? I don't think it did because Halide just added testing on that branch and we're seeing the same sort of errors there now. What's the protocol for getting fixes into that branch?

It was reverted in 14, currently only in trunk.

It was reverted in 14, currently only in trunk.

That doesn't appear to be accurate -- the revert, https://github.com/llvm/llvm-project/commit/0c3d22a5926dd7d6fd28682683abd9d476f8e78b didn't make it into 14.x branch cut, while the initial change, https://github.com/llvm/llvm-project/commit/83620bd2ad867f706c699d0f2b8be10e43d9f3d7 did. There is no cherry-pick revert subsequently on the branch, either.

It was reverted in 14, currently only in trunk.

That doesn't appear to be accurate -- the revert, https://github.com/llvm/llvm-project/commit/0c3d22a5926dd7d6fd28682683abd9d476f8e78b didn't make it into 14.x branch cut, while the initial change, https://github.com/llvm/llvm-project/commit/83620bd2ad867f706c699d0f2b8be10e43d9f3d7 did. There is no cherry-pick revert subsequently on the branch, either.

I've filed https://github.com/llvm/llvm-project/issues/53678

SLPVectorizer patches seem extremely revert-prone, suggesting that the pass's code
is in need of a serous refactoring and in an increase of the amount of people who know that code.

It was reverted in 14, currently only in trunk.

That doesn't appear to be accurate -- the revert, https://github.com/llvm/llvm-project/commit/0c3d22a5926dd7d6fd28682683abd9d476f8e78b didn't make it into 14.x branch cut, while the initial change, https://github.com/llvm/llvm-project/commit/83620bd2ad867f706c699d0f2b8be10e43d9f3d7 did. There is no cherry-pick revert subsequently on the branch, either.

Ok, need to create a ticket to revert the patch from 14.0, it was not correct.

alexfh added a subscriber: alexfh.Feb 10 2022, 6:22 AM

I think, the last commit (ad2a0ccf8f9816fd7805138bbf582117246f5489) still has some problems. We are seeing sporadic clang crashes after this commit. I'm trying to confirm this by comparing this revision with the previous one and trying to get a test case.

I think, the last commit (ad2a0ccf8f9816fd7805138bbf582117246f5489) still has some problems. We are seeing sporadic clang crashes after this commit. I'm trying to confirm this by comparing this revision with the previous one and trying to get a test case.

Let me know if you have a reproducer, will try to fix it ASAP.

So far I have a stack trace from a release clang build:

 #0 0x000055efacbf6c29 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
 #1 0x000055efacbf7065 SignalHandler(int)
 #2 0x00007f1f25022750 __restore_rt
 #3 0x000055efa973210b (anonymous namespace)::LazyValueInfoImpl::getEdgeValue(llvm::Value*, llvm::BasicBlock*, llvm::BasicBlock*, llvm::Instruction*)
 #4 0x000055efa972db80 (anonymous namespace)::LazyValueInfoImpl::solve() (.llvm.1834014330660040286)
 #5 0x000055efa972a2dd (anonymous namespace)::LazyValueInfoImpl::getValueInBlock(llvm::Value*, llvm::BasicBlock*, llvm::Instruction*)
 #6 0x000055efa978882b runImpl(llvm::Function&, llvm::LazyValueInfo*, llvm::DominatorTree*, llvm::SimplifyQuery const&)
 #7 0x000055efa9787755 llvm::CorrelatedValuePropagationPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
 #8 0x000055efa97873b2 llvm::detail::PassModel<llvm::Function, llvm::CorrelatedValuePropagationPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
 #9 0x000055efa983d69a llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
#10 0x000055efa983d2b2 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
#11 0x000055efa999414e llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#12 0x000055efa9993b32 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#13 0x000055efa9940a64 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) 
#14 0x000055efa99406f2 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#15 0x000055efa987de69 llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#16 0x000055efa987dc52 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) 
#17 0x000055efa98934a0 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#18 0x000055efa919ddb2 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#19 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#20 0x000055efa8f49e4a llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#21 0x000055efa8f49c92 llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#22 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#23 0x000055efa8f8c260 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&)
#24 0x000055efa8f84bb2 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >) 
#25 0x000055efa9a4b8d1 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&)
#26 0x000055efa8f7e180 clang::ParseAST(clang::Sema&, bool, bool)
#27 0x000055efa90338f5 clang::FrontendAction::Execute()
#28 0x000055efa903372e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)
#29 0x000055efa90335a1 clang::ExecuteCompilerInvocation(clang::CompilerInstance*)
#30 0x000055efa9a15d58 cc1_main(llvm::ArrayRef<char const*>, char const*, void*)
#31 0x000055efa9112f7e ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&)
#32 0x000055efa98b2a2d main

... and an assertion from a checked clang build, but it has a different stack trace:

assert.h assertion failed at llvm/include/llvm/Support/Casting.h:262 in typename cast_retty<X, Y>::ret_type llvm::cast(Y &) [X = clang::CodeGen::EHCleanupScope, Y = clang::CodeGen::EHScope]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!"
    @     0x55948ac83344  __assert_fail
    @     0x559486524fb0  clang::CodeGen::CodeGenFunction::DeactivateCleanupBlock()
    @     0x55948640de42  clang::CodeGen::CodeGenFunction::EmitCall()
    @     0x559486421016  clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall()
    @     0x559486420429  clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall()
    @     0x5594865d9cb8  clang::CodeGen::CodeGenFunction::EmitCXXConstructExpr()
    @     0x5594865cb013  (anonymous namespace)::AggExprEmitter::VisitCXXConstructExpr()  
    @     0x5594865c6ff6  clang::CodeGen::CodeGenFunction::EmitAggExpr() 
    @     0x5594865dbb0b  StoreAnyExprIntoOneUnit()  
    @     0x5594865de49d  clang::CodeGen::CodeGenFunction::EmitCXXNewExpr()
    @     0x55948646006a  (anonymous namespace)::ScalarExprEmitter::VisitExprWithCleanups()
    @     0x5594864507ab  clang::CodeGen::CodeGenFunction::EmitScalarExpr()
    @     0x55948642ae09  clang::CodeGen::CodeGenFunction::EmitAnyExpr()
    @     0x55948642ada1  clang::CodeGen::CodeGenFunction::EmitIgnoredExpr()
    @     0x5594864c8790  clang::CodeGen::CodeGenFunction::EmitStmt()
    @     0x5594864d4910  clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope()
    @     0x5594866e9ed3  clang::CodeGen::CodeGenFunction::EmitFunctionBody()
    @     0x5594866eac9e  clang::CodeGen::CodeGenFunction::GenerateCode()
    @     0x55948670a722  clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition()
    @     0x559486702e90  clang::CodeGen::CodeGenModule::EmitGlobalDefinition()
    @     0x5594866f734f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f587c  clang::CodeGen::CodeGenModule::Release()
    @     0x559486827989  (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit()
    @     0x5594863da495  clang::BackendConsumer::HandleTranslationUnit()
    @     0x55948718fa94  clang::ParseAST()
    @     0x559486f07443  clang::FrontendAction::Execute()
    @     0x559486e5d3bf  clang::CompilerInstance::ExecuteAction()
    @     0x559485fd88fe  clang::ExecuteCompilerInvocation()

Not sure if these two are somehow related though. Looking further...

So far I have a stack trace from a release clang build:

 #0 0x000055efacbf6c29 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
 #1 0x000055efacbf7065 SignalHandler(int)
 #2 0x00007f1f25022750 __restore_rt
 #3 0x000055efa973210b (anonymous namespace)::LazyValueInfoImpl::getEdgeValue(llvm::Value*, llvm::BasicBlock*, llvm::BasicBlock*, llvm::Instruction*)
 #4 0x000055efa972db80 (anonymous namespace)::LazyValueInfoImpl::solve() (.llvm.1834014330660040286)
 #5 0x000055efa972a2dd (anonymous namespace)::LazyValueInfoImpl::getValueInBlock(llvm::Value*, llvm::BasicBlock*, llvm::Instruction*)
 #6 0x000055efa978882b runImpl(llvm::Function&, llvm::LazyValueInfo*, llvm::DominatorTree*, llvm::SimplifyQuery const&)
 #7 0x000055efa9787755 llvm::CorrelatedValuePropagationPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
 #8 0x000055efa97873b2 llvm::detail::PassModel<llvm::Function, llvm::CorrelatedValuePropagationPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
 #9 0x000055efa983d69a llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
#10 0x000055efa983d2b2 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
#11 0x000055efa999414e llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#12 0x000055efa9993b32 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#13 0x000055efa9940a64 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) 
#14 0x000055efa99406f2 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#15 0x000055efa987de69 llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#16 0x000055efa987dc52 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) 
#17 0x000055efa98934a0 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#18 0x000055efa919ddb2 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#19 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#20 0x000055efa8f49e4a llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#21 0x000055efa8f49c92 llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#22 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#23 0x000055efa8f8c260 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&)
#24 0x000055efa8f84bb2 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >) 
#25 0x000055efa9a4b8d1 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&)
#26 0x000055efa8f7e180 clang::ParseAST(clang::Sema&, bool, bool)
#27 0x000055efa90338f5 clang::FrontendAction::Execute()
#28 0x000055efa903372e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)
#29 0x000055efa90335a1 clang::ExecuteCompilerInvocation(clang::CompilerInstance*)
#30 0x000055efa9a15d58 cc1_main(llvm::ArrayRef<char const*>, char const*, void*)
#31 0x000055efa9112f7e ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&)
#32 0x000055efa98b2a2d main

... and an assertion from a checked clang build, but it has a different stack trace:

assert.h assertion failed at llvm/include/llvm/Support/Casting.h:262 in typename cast_retty<X, Y>::ret_type llvm::cast(Y &) [X = clang::CodeGen::EHCleanupScope, Y = clang::CodeGen::EHScope]: isa<X>(Val) && "cast<Ty>() argument of incompatible type!"
    @     0x55948ac83344  __assert_fail
    @     0x559486524fb0  clang::CodeGen::CodeGenFunction::DeactivateCleanupBlock()
    @     0x55948640de42  clang::CodeGen::CodeGenFunction::EmitCall()
    @     0x559486421016  clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall()
    @     0x559486420429  clang::CodeGen::CodeGenFunction::EmitCXXConstructorCall()
    @     0x5594865d9cb8  clang::CodeGen::CodeGenFunction::EmitCXXConstructExpr()
    @     0x5594865cb013  (anonymous namespace)::AggExprEmitter::VisitCXXConstructExpr()  
    @     0x5594865c6ff6  clang::CodeGen::CodeGenFunction::EmitAggExpr() 
    @     0x5594865dbb0b  StoreAnyExprIntoOneUnit()  
    @     0x5594865de49d  clang::CodeGen::CodeGenFunction::EmitCXXNewExpr()
    @     0x55948646006a  (anonymous namespace)::ScalarExprEmitter::VisitExprWithCleanups()
    @     0x5594864507ab  clang::CodeGen::CodeGenFunction::EmitScalarExpr()
    @     0x55948642ae09  clang::CodeGen::CodeGenFunction::EmitAnyExpr()
    @     0x55948642ada1  clang::CodeGen::CodeGenFunction::EmitIgnoredExpr()
    @     0x5594864c8790  clang::CodeGen::CodeGenFunction::EmitStmt()
    @     0x5594864d4910  clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope()
    @     0x5594866e9ed3  clang::CodeGen::CodeGenFunction::EmitFunctionBody()
    @     0x5594866eac9e  clang::CodeGen::CodeGenFunction::GenerateCode()
    @     0x55948670a722  clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition()
    @     0x559486702e90  clang::CodeGen::CodeGenModule::EmitGlobalDefinition()
    @     0x5594866f734f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f737f  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x5594866f587c  clang::CodeGen::CodeGenModule::Release()
    @     0x559486827989  (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit()
    @     0x5594863da495  clang::BackendConsumer::HandleTranslationUnit()
    @     0x55948718fa94  clang::ParseAST()
    @     0x559486f07443  clang::FrontendAction::Execute()
    @     0x559486e5d3bf  clang::CompilerInstance::ExecuteAction()
    @     0x559485fd88fe  clang::ExecuteCompilerInvocation()

Not sure if these two are somehow related though. Looking further...

Yeah, currently it does not look like related to SLP at all.

I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?

I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?

Do you have a reproducer or something like this?

I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?

Do you have a reproducer or something like this?

Not yet. I'm trying to extract something but thought I'd ask if you're already aware of any miscompilation problems in the meantime.

I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?

Do you have a reproducer or something like this?

Not yet. I'm trying to extract something but thought I'd ask if you're already aware of any miscompilation problems in the meantime.

No.

I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?

Do you have a reproducer or something like this?

Not yet. I'm trying to extract something but thought I'd ask if you're already aware of any miscompilation problems in the meantime.

No.

Ok, here's a reproducer:

opt --passes=slp-vectorizer slp.ll -o - -S

With this patch, the function returns the value 2. Without the patch the value returned is 0.
This can be seen with

build-all/bin/opt --passes=slp-vectorizer slp.ll -o - -S | build-all/bin/opt -O3 -o - -S

with/without this patch.

I think I'm seeing a miscompile with this patch on trunk. Are we aware of any such problems?

Do you have a reproducer or something like this?

Not yet. I'm trying to extract something but thought I'd ask if you're already aware of any miscompilation problems in the meantime.

No.

Ok, here's a reproducer:

opt --passes=slp-vectorizer slp.ll -o - -S

With this patch, the function returns the value 2. Without the patch the value returned is 0.
This can be seen with

build-all/bin/opt --passes=slp-vectorizer slp.ll -o - -S | build-all/bin/opt -O3 -o - -S

with/without this patch.

Fix https://reviews.llvm.org/D119855