This is an archive of the discontinued LLVM Phabricator instance.

DAGCombiner bugfix in MergeStoresOfConstantsOrVecElts()
AbandonedPublic

Authored by jonpa on Dec 4 2017, 8:18 AM.

Details

Summary

A csmith test program that contained among other things adjacent stores of 16bit immediates:

          t153: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 1)](tbaa=<0x2aa54f17a48>)> t151, Constant:i16<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 128, undef:i64
        t155: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 2)](tbaa=<0x2aa54f17a48>)> t153, Constant:i16<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 130, undef:i64
      t157: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 3)](tbaa=<0x2aa54f17a48>)> t155, Constant:i16<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 132, undef:i64
    t159: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 4)](tbaa=<0x2aa54f17a48>)> t157, Constant:i16<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 134, undef:i64
  t161: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 5)](tbaa=<0x2aa54f17a48>)> t159, Constant:i16<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 136, undef:i64
t163: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 6)](tbaa=<0x2aa54f17a48>)> t161, Constant:i16<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 138, undef:i64

-> Type-legalized selection DAG:
(i16 is not a legal type in the SystemZ backend, so the stores are converted to truncating stores of -7 of type i32.)

       t362: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 8, i64 5)](tbaa=<0x2aa54f17a48>), trunc to i16> t361, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 122, undef:i64
        t363: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 8, i64 6)](tbaa=<0x2aa54f17a48>), trunc to i16> t362, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 124, undef:i64
      t364: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 0)](tbaa=<0x2aa54f17a48>), trunc to i16> t363, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 126, undef:i64
    t365: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 1)](tbaa=<0x2aa54f17a48>), trunc to i16> t364, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 128, undef:i64
  t366: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 2)](tbaa=<0x2aa54f17a48>), trunc to i16> t365, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 130, undef:i64
t367: ch = store<ST2[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 3)](tbaa=<0x2aa54f17a48>), trunc to i16> t366, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 132, undef:i64

-> Optimized type-legalized selection DAG: BB

DAGCombiner merges stores but they will store -7, wich is wrong:

t398: ch = store<ST4[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 4)](align=2)> t348, Constant:i32<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 134, undef:i64
t399: ch = store<ST8[getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @g_112, i64 0, i64 9, i64 0)](align=2)> t348, Constant:i64<-7>, GlobalAddress:i64<[10 x [7 x i16]]* @g_112> + 126, undef:i64

This patch seems to fix the problem by fixing the way the new constant is built for a truncating store.

Diff Detail

Event Timeline

jonpa created this revision.Dec 4 2017, 8:18 AM
spatel edited edge metadata.

It's not clear to me what it takes to reveal the bug, but it shouldn't be (much?) more than a series of stores. Can you reduce the test further than this? Also, can FP constant stores have the same problem?

@x = internal unnamed_addr global i16 1, align 2
@y = internal unnamed_addr global [10 x [7 x i16]] zeroinitializer

define signext i32 @bad_merge(i16* %x) {
  %ld = load i16, i16* @x, align 2
  %or = or i16 %ld, 32606
  store i16 %or, i16* @x, align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 0, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 1, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 2, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 3, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 4, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 5, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 6, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 7, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 8, i64 6), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 0), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 1), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 2), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 3), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 4), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 5), align 2
  store i16 -7, i16* getelementptr inbounds ([10 x [7 x i16]], [10 x [7 x i16]]* @y, i64 0, i64 9, i64 6), align 2
  ret i32 0
}
spatel added a comment.Dec 4 2017, 3:33 PM

Did this bug start after r319036 (D33675)?

jonpa updated this revision to Diff 125482.Dec 5 2017, 1:49 AM

Test case reduced further.

It's not clear to me what it takes to reveal the bug, but it shouldn't be (much?) more than a series of stores. Can you reduce the test further than this?

thanks - yes I could get it down a bit further with bugpoint.

Also, can FP constant stores have the same problem?

Good question... This would happen if the original stores were truncating wider FP constants. That seems much less likely to me, but I am not sure.

Did this bug start after r319036?

This problem disappeared at least on the test case when I reverted that patch, so yes, this seems to be the cause (It also explains why I all of a sudden got some 10 csmith wrong-code results relating to this recently...)

spatel added a comment.Dec 5 2017, 6:17 AM

Did this bug start after r319036?

This problem disappeared at least on the test case when I reverted that patch, so yes, this seems to be the cause (It also explains why I all of a sudden got some 10 csmith wrong-code results relating to this recently...)

Sorry I hadn't seen it until now, but D40701 is a superset of this fix. I think we can abandon this (though we might want to include a SystemZ test for extra coverage).

jonpa abandoned this revision.Dec 6 2017, 4:41 AM

Did this bug start after r319036?

This problem disappeared at least on the test case when I reverted that patch, so yes, this seems to be the cause (It also explains why I all of a sudden got some 10 csmith wrong-code results relating to this recently...)

Sorry I hadn't seen it until now, but D40701 is a superset of this fix. I think we can abandon this (though we might want to include a SystemZ test for extra coverage).

I don't see this fix yet in the tree... What is holding it back?

OK, I trust you then to commit the other fix and if you want also include the test case here for SystemZ.

spatel added a comment.Dec 6 2017, 6:44 AM

Did this bug start after r319036?

This problem disappeared at least on the test case when I reverted that patch, so yes, this seems to be the cause (It also explains why I all of a sudden got some 10 csmith wrong-code results relating to this recently...)

Sorry I hadn't seen it until now, but D40701 is a superset of this fix. I think we can abandon this (though we might want to include a SystemZ test for extra coverage).

I don't see this fix yet in the tree... What is holding it back?

It was approved and committed, but there were other failures, so it was reverted and being investigated.

OK, I trust you then to commit the other fix and if you want also include the test case here for SystemZ.

Not sure if this was to me. I'm not the author of the patch; @niravd is working on it from what I can see (and should see this discussion too).

niravd edited edge metadata.Dec 6 2017, 7:56 AM

D40701 has landed so this should be set. Can you commit this test case?

jonpa added a comment.Dec 7 2017, 2:14 AM

D40701 has landed so this should be set. Can you commit this test case?

It's so similar to arm-storebytesmerge.ll, so it seems unnecessary.