This is an archive of the discontinued LLVM Phabricator instance.

[SLP]Improve gathering of scalar elements.
ClosedPublic

Authored by ABataev on Jun 1 2021, 6:53 AM.

Details

Summary
  1. Better sorting of scalars to be gathered. Trying to insert constants/arguments/instructions-out-of-loop at first and only then the instructions which are inside the loop. It improves hoisting of invariant insertelements instructions.
  2. Better detection of shuffle candidates in gathering function.
  3. The cost of insertelement for constants is 0.

Part of D57059.

Diff Detail

Event Timeline

ABataev created this revision.Jun 1 2021, 6:53 AM
ABataev requested review of this revision.Jun 1 2021, 6:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2021, 6:53 AM
RKSimon added inline comments.Jun 3 2021, 1:32 PM
llvm/test/Transforms/SLPVectorizer/X86/ordering-bug.ll
33 ↗(On Diff #349604)

This is annoying - SelectionDAG wastes a lot of time reordering insertelement chains patterns into ascending order to try and match buildvector patterns - can we avoid it?

ABataev updated this revision to Diff 349880.Jun 4 2021, 8:23 AM

Address comments, better analysis of gathering order.

a couple of minors - does anyone else have any comments?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4833

unnecessary () ?

4937

Pull out repeated cast<FixedVectorType>(V->getType())->getNumElements()?

ABataev updated this revision to Diff 349886.Jun 4 2021, 8:52 AM

Address comments.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4833

Checked, all () required for the correct condition.

RKSimon accepted this revision.Jun 8 2021, 11:41 AM

LGTM

This revision is now accepted and ready to land.Jun 8 2021, 11:41 AM
This revision was landed with ongoing or failed builds.Jun 9 2021, 5:24 AM
This revision was automatically updated to reflect the committed changes.

We're seeing a test failure (and true miscompile AFAICT) in the CVC4 library that bisects to this patch. I don't have a nice reduction, but I can describe the issue we see.

In Constraint::externalExplain(AssertionOrder order), we construct a NodeBuilder [1]:

NodeBuilder<> nb(kind::AND);

That constructor expands to this [2]:

inline NodeBuilder(Kind k) :
  d_nv(&d_inlineNv),
  d_nm(NodeManager::currentNM()),
  d_nvMaxChildren(nchild_thresh) {
  Assert(k != kind::NULL_EXPR && k != kind::UNDEFINED_KIND)
      << "illegal Node-building kind";

  d_inlineNv.d_id = 1; // have a kind already
  d_inlineNv.d_rc = 0;
  d_inlineNv.d_kind = expr::NodeValue::kindToDKind(k);
  d_inlineNv.d_nchildren = 0;
}

d_inlineNv is a local class data member, and d_nv by default is just a pointer to that data member (but can be reassigned to point to something heap allocated)

The fields of d_inlineNv should be zero except for d_id which is 1, and d_kind which is 21 (corresponding to kind::AND). However after this commit, the struct is initialized with poison. The IR diff we see is this:

define dso_local void @_ZNK4CVC46theory5arith10Constraint15externalExplainEm(%"class.CVC4::NodeTemplate"* noalias sret(%"class.CVC4::NodeTemplate") align 8 %agg.result, %"class.CVC4::theory::arith::Constraint"* nonnull align 8 dereferenceable(144) %this, i64 %order) local_unnamed_addr #15 align 32 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
...
-  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ <i64 1, i64 21>, %if.else69 ]
+  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ poison, %if.else69 ]
...
  %42 = bitcast %"class.CVC4::NodeBuilder.623"* %nb to <2 x i64>*
  store <2 x i64> %40, <2 x i64>* %42, align 16

(%phi.bo220 is a path never taken AFAICT)

[1] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/theory/arith/constraint.cpp#L1563
[2] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/expr/node_builder.h#L377

We're seeing a test failure (and true miscompile AFAICT) in the CVC4 library that bisects to this patch. I don't have a nice reduction, but I can describe the issue we see.

In Constraint::externalExplain(AssertionOrder order), we construct a NodeBuilder [1]:

NodeBuilder<> nb(kind::AND);

That constructor expands to this [2]:

inline NodeBuilder(Kind k) :
  d_nv(&d_inlineNv),
  d_nm(NodeManager::currentNM()),
  d_nvMaxChildren(nchild_thresh) {
  Assert(k != kind::NULL_EXPR && k != kind::UNDEFINED_KIND)
      << "illegal Node-building kind";

  d_inlineNv.d_id = 1; // have a kind already
  d_inlineNv.d_rc = 0;
  d_inlineNv.d_kind = expr::NodeValue::kindToDKind(k);
  d_inlineNv.d_nchildren = 0;
}

d_inlineNv is a local class data member, and d_nv by default is just a pointer to that data member (but can be reassigned to point to something heap allocated)

The fields of d_inlineNv should be zero except for d_id which is 1, and d_kind which is 21 (corresponding to kind::AND). However after this commit, the struct is initialized with poison. The IR diff we see is this:

define dso_local void @_ZNK4CVC46theory5arith10Constraint15externalExplainEm(%"class.CVC4::NodeTemplate"* noalias sret(%"class.CVC4::NodeTemplate") align 8 %agg.result, %"class.CVC4::theory::arith::Constraint"* nonnull align 8 dereferenceable(144) %this, i64 %order) local_unnamed_addr #15 align 32 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
...
-  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ <i64 1, i64 21>, %if.else69 ]
+  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ poison, %if.else69 ]
...
  %42 = bitcast %"class.CVC4::NodeBuilder.623"* %nb to <2 x i64>*
  store <2 x i64> %40, <2 x i64>* %42, align 16

(%phi.bo220 is a path never taken AFAICT)

[1] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/theory/arith/constraint.cpp#L1563
[2] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/expr/node_builder.h#L377

Hi, really need a reproducer and a command line. Otherwise, it is hard to understand why there is a miscompilation. This is rather strange that this patch could cause such miscompilation, I would suppose that a https://github.com/llvm/llvm-project/commit/f126e8ec2873ceedde53d2ccee1a66a83620e9a6#diff-e3b933e8c46156c0eaf7cbb67866b712f69a8484bc941d10acec9d4d17b9f061 could cause this.

We're seeing a test failure (and true miscompile AFAICT) in the CVC4 library that bisects to this patch. I don't have a nice reduction, but I can describe the issue we see.

In Constraint::externalExplain(AssertionOrder order), we construct a NodeBuilder [1]:

NodeBuilder<> nb(kind::AND);

That constructor expands to this [2]:

inline NodeBuilder(Kind k) :
  d_nv(&d_inlineNv),
  d_nm(NodeManager::currentNM()),
  d_nvMaxChildren(nchild_thresh) {
  Assert(k != kind::NULL_EXPR && k != kind::UNDEFINED_KIND)
      << "illegal Node-building kind";

  d_inlineNv.d_id = 1; // have a kind already
  d_inlineNv.d_rc = 0;
  d_inlineNv.d_kind = expr::NodeValue::kindToDKind(k);
  d_inlineNv.d_nchildren = 0;
}

d_inlineNv is a local class data member, and d_nv by default is just a pointer to that data member (but can be reassigned to point to something heap allocated)

The fields of d_inlineNv should be zero except for d_id which is 1, and d_kind which is 21 (corresponding to kind::AND). However after this commit, the struct is initialized with poison. The IR diff we see is this:

define dso_local void @_ZNK4CVC46theory5arith10Constraint15externalExplainEm(%"class.CVC4::NodeTemplate"* noalias sret(%"class.CVC4::NodeTemplate") align 8 %agg.result, %"class.CVC4::theory::arith::Constraint"* nonnull align 8 dereferenceable(144) %this, i64 %order) local_unnamed_addr #15 align 32 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
...
-  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ <i64 1, i64 21>, %if.else69 ]
+  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ poison, %if.else69 ]
...
  %42 = bitcast %"class.CVC4::NodeBuilder.623"* %nb to <2 x i64>*
  store <2 x i64> %40, <2 x i64>* %42, align 16

(%phi.bo220 is a path never taken AFAICT)

[1] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/theory/arith/constraint.cpp#L1563
[2] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/expr/node_builder.h#L377

Hi, really need a reproducer and a command line.

Understood -- this is just an initial report with some early analysis; I've occasionally gotten lucky in reporting bugs.

Otherwise, it is hard to understand why there is a miscompilation.

I fully agree. And the IR code around the change doesn't have any of the usual shufflevector/insertelement/extractelement instructions, so it doesn't look like SLP is really involved much here. But to clarify, I really do mean that it bisects to this patch -- if I build clang at 0120e6c295e42d3b9ed2cd125b1c9056a59fbcf6 (one commit prior to this), the generated IR looks correct and the test passes. My hunch is that SLP just makes some valid optimization that exposes a miscompilation elsewhere.

This is rather strange that this patch could cause such miscompilation, I would suppose that a https://github.com/llvm/llvm-project/commit/f126e8ec2873ceedde53d2ccee1a66a83620e9a6#diff-e3b933e8c46156c0eaf7cbb67866b712f69a8484bc941d10acec9d4d17b9f061 could cause this.

Reverting that patch locally does not help, but thanks for the suggestion.

We're seeing a test failure (and true miscompile AFAICT) in the CVC4 library that bisects to this patch. I don't have a nice reduction, but I can describe the issue we see.

In Constraint::externalExplain(AssertionOrder order), we construct a NodeBuilder [1]:

NodeBuilder<> nb(kind::AND);

That constructor expands to this [2]:

inline NodeBuilder(Kind k) :
  d_nv(&d_inlineNv),
  d_nm(NodeManager::currentNM()),
  d_nvMaxChildren(nchild_thresh) {
  Assert(k != kind::NULL_EXPR && k != kind::UNDEFINED_KIND)
      << "illegal Node-building kind";

  d_inlineNv.d_id = 1; // have a kind already
  d_inlineNv.d_rc = 0;
  d_inlineNv.d_kind = expr::NodeValue::kindToDKind(k);
  d_inlineNv.d_nchildren = 0;
}

d_inlineNv is a local class data member, and d_nv by default is just a pointer to that data member (but can be reassigned to point to something heap allocated)

The fields of d_inlineNv should be zero except for d_id which is 1, and d_kind which is 21 (corresponding to kind::AND). However after this commit, the struct is initialized with poison. The IR diff we see is this:

define dso_local void @_ZNK4CVC46theory5arith10Constraint15externalExplainEm(%"class.CVC4::NodeTemplate"* noalias sret(%"class.CVC4::NodeTemplate") align 8 %agg.result, %"class.CVC4::theory::arith::Constraint"* nonnull align 8 dereferenceable(144) %this, i64 %order) local_unnamed_addr #15 align 32 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
...
-  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ <i64 1, i64 21>, %if.else69 ]
+  %40 = phi <2 x i64> [ %phi.bo220, %37 ], [ poison, %if.else69 ]
...
  %42 = bitcast %"class.CVC4::NodeBuilder.623"* %nb to <2 x i64>*
  store <2 x i64> %40, <2 x i64>* %42, align 16

(%phi.bo220 is a path never taken AFAICT)

[1] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/theory/arith/constraint.cpp#L1563
[2] https://github.com/CVC4/CVC4-archived/blob/40eac7f0529176bcc8464d6c4c8804fbde628c2b/src/expr/node_builder.h#L377

Hi, really need a reproducer and a command line.

Understood -- this is just an initial report with some early analysis; I've occasionally gotten lucky in reporting bugs.

Otherwise, it is hard to understand why there is a miscompilation.

I fully agree. And the IR code around the change doesn't have any of the usual shufflevector/insertelement/extractelement instructions, so it doesn't look like SLP is really involved much here. But to clarify, I really do mean that it bisects to this patch -- if I build clang at 0120e6c295e42d3b9ed2cd125b1c9056a59fbcf6 (one commit prior to this), the generated IR looks correct and the test passes. My hunch is that SLP just makes some valid optimization that exposes a miscompilation elsewhere.

Yes, this is possible but we have to be absolutely sure. I would really appreciate it if you could provide a reproducer.

This is rather strange that this patch could cause such miscompilation, I would suppose that a https://github.com/llvm/llvm-project/commit/f126e8ec2873ceedde53d2ccee1a66a83620e9a6#diff-e3b933e8c46156c0eaf7cbb67866b712f69a8484bc941d10acec9d4d17b9f061 could cause this.

Reverting that patch locally does not help, but thanks for the suggestion.

Carrot added a subscriber: Carrot.Jun 29 2021, 9:21 PM

Compile the following test case with

opt -slp-vectorizer -S test3.ll

Then I can get a poison value in the phi instruction.

target triple = "x86_64-grtev4-linux-gnu"

%S = type { i64, i40 }

define void @foo(i1 %cond) {
entry:
  %nb = alloca %S, align 8
  br i1 %cond, label %then, label %bug

then:
  call void @baz()
  %ptr_2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %load2 = load i64, i64* %ptr_2, align 8
  %bf_1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr_1 = bitcast i40* %bf_1 to i64*
  %load1 = load i64, i64* %ptr_1, align 8
  br label %bug

bug:
  %word1 = phi i64 [ %load1, %then ], [ undef, %entry ]
  %word2 = phi i64 [ %load2, %then ], [ undef, %entry ]
  %ptr2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %word2.clear = and i64 %word2, -1152921504606846976
  %word2.set = or i64 %word2.clear, 1
  store i64 %word2.set, i64* %ptr2, align 8
  %bf1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr1 = bitcast i40* %bf1 to i64*
  %word1.clear = and i64 %word1, -68719476736
  %word1.set = or i64 %word1.clear, 21
  store i64 %word1.set, i64* %ptr1, align 8
  call void @bar()
  br label %exit

exit:
  ret void
}

declare void @bar()
declare void @baz()

Compile the following test case with

opt -slp-vectorizer -S test3.ll

Then I can get a poison value in the phi instruction.

target triple = "x86_64-grtev4-linux-gnu"

%S = type { i64, i40 }

define void @foo(i1 %cond) {
entry:
  %nb = alloca %S, align 8
  br i1 %cond, label %then, label %bug

then:
  call void @baz()
  %ptr_2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %load2 = load i64, i64* %ptr_2, align 8
  %bf_1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr_1 = bitcast i40* %bf_1 to i64*
  %load1 = load i64, i64* %ptr_1, align 8
  br label %bug

bug:
  %word1 = phi i64 [ %load1, %then ], [ undef, %entry ]
  %word2 = phi i64 [ %load2, %then ], [ undef, %entry ]
  %ptr2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %word2.clear = and i64 %word2, -1152921504606846976
  %word2.set = or i64 %word2.clear, 1
  store i64 %word2.set, i64* %ptr2, align 8
  %bf1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr1 = bitcast i40* %bf1 to i64*
  %word1.clear = and i64 %word1, -68719476736
  %word1.set = or i64 %word1.clear, 21
  store i64 %word1.set, i64* %ptr1, align 8
  call void @bar()
  br label %exit

exit:
  ret void
}

declare void @bar()
declare void @baz()

Thanks for the reproducer, will check it and fix ASAP.

Compile the following test case with

opt -slp-vectorizer -S test3.ll

Then I can get a poison value in the phi instruction.

target triple = "x86_64-grtev4-linux-gnu"

%S = type { i64, i40 }

define void @foo(i1 %cond) {
entry:
  %nb = alloca %S, align 8
  br i1 %cond, label %then, label %bug

then:
  call void @baz()
  %ptr_2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %load2 = load i64, i64* %ptr_2, align 8
  %bf_1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr_1 = bitcast i40* %bf_1 to i64*
  %load1 = load i64, i64* %ptr_1, align 8
  br label %bug

bug:
  %word1 = phi i64 [ %load1, %then ], [ undef, %entry ]
  %word2 = phi i64 [ %load2, %then ], [ undef, %entry ]
  %ptr2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %word2.clear = and i64 %word2, -1152921504606846976
  %word2.set = or i64 %word2.clear, 1
  store i64 %word2.set, i64* %ptr2, align 8
  %bf1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr1 = bitcast i40* %bf1 to i64*
  %word1.clear = and i64 %word1, -68719476736
  %word1.set = or i64 %word1.clear, 21
  store i64 %word1.set, i64* %ptr1, align 8
  call void @bar()
  br label %exit

exit:
  ret void
}

declare void @bar()
declare void @baz()

Compile the following test case with

opt -slp-vectorizer -S test3.ll

Then I can get a poison value in the phi instruction.

target triple = "x86_64-grtev4-linux-gnu"

%S = type { i64, i40 }

define void @foo(i1 %cond) {
entry:
  %nb = alloca %S, align 8
  br i1 %cond, label %then, label %bug

then:
  call void @baz()
  %ptr_2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %load2 = load i64, i64* %ptr_2, align 8
  %bf_1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr_1 = bitcast i40* %bf_1 to i64*
  %load1 = load i64, i64* %ptr_1, align 8
  br label %bug

bug:
  %word1 = phi i64 [ %load1, %then ], [ undef, %entry ]
  %word2 = phi i64 [ %load2, %then ], [ undef, %entry ]
  %ptr2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %word2.clear = and i64 %word2, -1152921504606846976
  %word2.set = or i64 %word2.clear, 1
  store i64 %word2.set, i64* %ptr2, align 8
  %bf1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr1 = bitcast i40* %bf1 to i64*
  %word1.clear = and i64 %word1, -68719476736
  %word1.set = or i64 %word1.clear, 21
  store i64 %word1.set, i64* %ptr1, align 8
  call void @bar()
  br label %exit

exit:
  ret void
}

declare void @bar()
declare void @baz()

Checked it, looks like everything is correct. You have 2 undefs in phi, they are correctly merged into a vector poison. Also, checked the transformation via alive (https://alive2.llvm.org/ce/z/q6G8TK), it is correct

The problem is exactly in the merging of two undef into poison.

In our original code, the function foo is actually a class constructor, it initializes two bit fields

entry:
   ...
   br %cond, label %then, label %bug

then:
   call baz()
   br label %bug

bug:
  word1 = load i64 bitfield1                          
  word1.clear = and i64 word1, mask1
  word1.set = or i64 word1.clear, value1            // set bitfield2
  store i64 word1.set, bitfield1                    // store bitfield1
  word2 = load i64 bitfiled2
  word2.clear = and i64 word2, mask2
  word2.set = or i64 word2.clear, value2           // set bitfield2
  store i64 word2.set, bitfield2                   // store bitfield2
  bar()
  ret

Notice that word1 is a 64b integer which contains bitfield1 and some other bitfields, 
since it is in constructor, these fields have not been initialized, so word1 is actually undef, 
LLVM doesn't recognize it because there is a function call in block %then (without precise 
alias analysis). There is similar conclusion with word2.

In GVN pass, it has the knowledge that word1 must have undef value from control flow
%entry -> %bug, so the load instruction can be deleted in this path, but not sure from the
 control flow %then -> %bug, so it must keep the load in block %then, similar analysis applies
to word2, so it changes the IR into the form 

entry:
  %nb = alloca %S, align 8
  br i1 %cond, label %then, label %bug

then:
  call void @baz()
  %ptr_2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %load2 = load i64, i64* %ptr_2, align 8
  %bf_1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr_1 = bitcast i40* %bf_1 to i64*
  %load1 = load i64, i64* %ptr_1, align 8
  br label %bug

bug:
  %word1 = phi i64 [ %load1, %then ], [ undef, %entry ]
  %word2 = phi i64 [ %load2, %then ], [ undef, %entry ]
  %ptr2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %word2.clear = and i64 %word2, -1152921504606846976
  %word2.set = or i64 %word2.clear, 1
  store i64 %word2.set, i64* %ptr2, align 8
  %bf1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr1 = bitcast i40* %bf1 to i64*
  %word1.clear = and i64 %word1, -68719476736
  %word1.set = or i64 %word1.clear, 21
  store i64 %word1.set, i64* %ptr1, align 8
  call void @bar()
  br label %exit

So the two undef values are actually correct and reasonable. But a poison value makes the code completely wrong.
According to https://llvm.org/docs/LangRef.html#poison-values, A poison value is a result of an erroneous operation.
There is no erroneous operation here.

jgorbe added a subscriber: jgorbe.Jun 30 2021, 11:08 AM

The problem is exactly in the merging of two undef into poison.

In our original code, the function foo is actually a class constructor, it initializes two bit fields

entry:
   ...
   br %cond, label %then, label %bug

then:
   call baz()
   br label %bug

bug:
  word1 = load i64 bitfield1                          
  word1.clear = and i64 word1, mask1
  word1.set = or i64 word1.clear, value1            // set bitfield2
  store i64 word1.set, bitfield1                    // store bitfield1
  word2 = load i64 bitfiled2
  word2.clear = and i64 word2, mask2
  word2.set = or i64 word2.clear, value2           // set bitfield2
  store i64 word2.set, bitfield2                   // store bitfield2
  bar()
  ret

Notice that word1 is a 64b integer which contains bitfield1 and some other bitfields, 
since it is in constructor, these fields have not been initialized, so word1 is actually undef, 
LLVM doesn't recognize it because there is a function call in block %then (without precise 
alias analysis). There is similar conclusion with word2.

In GVN pass, it has the knowledge that word1 must have undef value from control flow
%entry -> %bug, so the load instruction can be deleted in this path, but not sure from the
 control flow %then -> %bug, so it must keep the load in block %then, similar analysis applies
to word2, so it changes the IR into the form 

entry:
  %nb = alloca %S, align 8
  br i1 %cond, label %then, label %bug

then:
  call void @baz()
  %ptr_2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %load2 = load i64, i64* %ptr_2, align 8
  %bf_1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr_1 = bitcast i40* %bf_1 to i64*
  %load1 = load i64, i64* %ptr_1, align 8
  br label %bug

bug:
  %word1 = phi i64 [ %load1, %then ], [ undef, %entry ]
  %word2 = phi i64 [ %load2, %then ], [ undef, %entry ]
  %ptr2 = getelementptr inbounds %S, %S* %nb, i64 0, i32 0
  %word2.clear = and i64 %word2, -1152921504606846976
  %word2.set = or i64 %word2.clear, 1
  store i64 %word2.set, i64* %ptr2, align 8
  %bf1 = getelementptr inbounds %S, %S* %nb, i64 0, i32 1
  %ptr1 = bitcast i40* %bf1 to i64*
  %word1.clear = and i64 %word1, -68719476736
  %word1.set = or i64 %word1.clear, 21
  store i64 %word1.set, i64* %ptr1, align 8
  call void @bar()
  br label %exit

So the two undef values are actually correct and reasonable. But a poison value makes the code completely wrong.
According to https://llvm.org/docs/LangRef.html#poison-values, A poison value is a result of an erroneous operation.
There is no erroneous operation here.

Alive does not agree with your analysis. Also, it is not SLP who merges undefs into poison, Builder.CreateInsertElement does this magic. Also, https://llvm.org/docs/LangRef.html#poison-values says that phis do not depend on the operands, so phi is not poisoned.

To me, the code is strange. You're allocating a variable on the stack, initialize it and then just exit from the function and the variable is not used anyware. Of course optimizer optimizes it out and all operations with it. To me, looks like something wrong with the code or other transformations, if the code is correct, SLP just reveals it.

To me, the code is strange. You're allocating a variable on the stack, initialize it and then just exit from the function and the variable is not used anyware.

Are you referring to d_inlineNv? That is a class member, not a stack variable.

Of course optimizer optimizes it out and all operations with it. To me, looks like something wrong with the code or other transformations, if the code is correct, SLP just reveals it.

I still haven't ruled out that something is wrong with the code, but it looks totally correct to me. There is also nothing that ubsan/msan/asan catch, not that those are perfect though.

To me, the code is strange. You're allocating a variable on the stack, initialize it and then just exit from the function and the variable is not used anyware.

Are you referring to d_inlineNv? That is a class member, not a stack variable.

I mean, in the LLVM IR reproducer. Here is a code:

%nb = alloca %S,

You're writing/reading data to/from this local variable, this value is not going anywhere then. So, the compiler just optimizes out all the operations with this variable.

Of course optimizer optimizes it out and all operations with it. To me, looks like something wrong with the code or other transformations, if the code is correct, SLP just reveals it.

I still haven't ruled out that something is wrong with the code, but it looks totally correct to me. There is also nothing that ubsan/msan/asan catch, not that those are perfect though.

Then probably the other pass incorrectly does some transformations. Why is this data member transformed into a stack variable?

Alive does not agree with your analysis. Also, it is not SLP who merges undefs into poison, Builder.CreateInsertElement does this magic. Also, https://llvm.org/docs/LangRef.html#poison-values says that phis do not depend on the operands, so phi is not poisoned.

Thanks for the pointer, from the LLVM IR dump, the poison value was generated after SLP pass, we may dig it further.

To me, the code is strange. You're allocating a variable on the stack, initialize it and then just exit from the function and the variable is not used anyware. Of course optimizer optimizes it out and all operations with it. To me, looks like something wrong with the code or other transformations, if the code is correct, SLP just reveals it.

It is because the code snippet is extremely reduced, the original function has hundreds of blocks! I can pass the address of the stack variable to function bar() to make it more meaningful, but it does not impact the generation of poison value.

Carrot added inline comments.Jun 30 2021, 3:17 PM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4841

The problem is here.

An undef value can be a correct value. But a poison always means something is wrong. In line 4866 you created a poison value as default, an undef should overwrite the default poison value.

ABataev added inline comments.Jul 1 2021, 5:25 AM
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4841

Modified the reproducer a bit, after this Alive detected the problem in the transformation. Will post the fix soon, thanks!