This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Add X / X -> 1 & X % X -> 0 folds.
ClosedPublic

Authored by RKSimon on Aug 13 2018, 7:20 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Aug 13 2018, 7:20 AM
jonpa added a comment.Aug 13 2018, 8:22 AM

Not sure if I'm doing something wrong (I use git and tried first patch -p0, and then also with dos2unix, but then that didn't work so I had to write the patch manually although just a few lines...)

This did not give any change at all on benchmarks on SystemZ (NFC), and the SystemZ test succeeds with or without the patch...?

Not sure if I'm doing something wrong (I use git and tried first patch -p0, and then also with dos2unix, but then that didn't work so I had to write the patch manually although just a few lines...)

Sorry - I've never worked out why the upload always screws up my line endings.....

This did not give any change at all on benchmarks on SystemZ (NFC), and the SystemZ test succeeds with or without the patch...?

Are the x86 tests changing? They contain explicit tests for this fold

jonpa added a comment.Aug 13 2018, 8:33 AM

Not sure if I'm doing something wrong (I use git and tried first patch -p0, and then also with dos2unix, but then that didn't work so I had to write the patch manually although just a few lines...)

Sorry - I've never worked out why the upload always screws up my line endings.....

This did not give any change at all on benchmarks on SystemZ (NFC), and the SystemZ test succeeds with or without the patch...?

Are the x86 tests changing? They contain explicit tests for this fold

Yes, change in x86 test (at least the one I tried)...

RKSimon added inline comments.Aug 13 2018, 10:07 AM
test/CodeGen/SystemZ/pr32372.ll
14

Tried again against bleeding edge trunk and I still see this change and without the volatile (which prevents the loads being combined) the checks reduce to:

define void @pr32372(i8*) {
; CHECK-LABEL: pr32372:
; CHECK:       # %bb.0: # %BB
; CHECK-NEXT:    mvhhi 0(%r1), -3825
; CHECK-NEXT:  .LBB0_1: # %CF251
; CHECK-NEXT:    # =>This Inner Loop Header: Depth=1
; CHECK-NEXT:    j .LBB0_1
craig.topper added inline comments.Aug 13 2018, 8:46 PM
test/CodeGen/X86/combine-srem.ll
455–456

Should these test IR changes be committed separately?

jonpa added inline comments.Aug 14 2018, 12:12 AM
test/CodeGen/SystemZ/pr32372.ll
14

Yes, I see the change after updating the test, but also note that this updated test passes also without the patch... So I am not sure why the test is changed if the patch does not affect it...

craig.topper added inline comments.Aug 14 2018, 12:23 AM
test/CodeGen/SystemZ/pr32372.ll
14

I think this patch affects the original version of the test. Without the volatile the two loads get CSEd to the same SDNode. This causes the same SDNode to be both operands of the urem. This patch would simplify that to 0. Simon added the volatile to prevent the loads from being CSEd. This way the urem won't get affected by this patch. So the new version should pass on trunk as well.

jonpa added inline comments.Aug 14 2018, 12:26 AM
test/CodeGen/SystemZ/pr32372.ll
14

Ah, I see. Thank you for the explanation.

I'm happy to apply the test changes beforehand once everybody is happy with the patch - but I'll keep them in the diff for now to make its effects clear.

test/CodeGen/SystemZ/pr32372.ll
14

Sorry - I missed that you were applying the test changes before the code changes!

The DAGCombiner changes and the X86 test changes all look good to me.

jonpa added a comment.EditedAug 15 2018, 3:42 AM

The origin for that test case I believe is that I had found a crash during testing and I reported this as PR32372.

The test changes here look OK to me, but when I reverted the fix for that original bug it seems the new test does no longer fail, while the unmodified one still does... :-/

Unfortunately, I don't seem to have the unreduced test failure available, so I can't make a new reduced version or even try if the original test case still fails...

Not sure if we should remove this test then, or keep it in case it may trigger somehow in the future...? Of course, the assert that the test used to trigger is still there, so removing it isn't all that unthinkable to me...

I tried some other experiments to the test, such as having two i8* arguments - one for each load - but that did not help...

RKSimon updated this revision to Diff 162547.Aug 25 2018, 7:28 AM

Rebased patch

@jonpa, it looks like changing the urem to this will trigger the original bug and avoid the optimization Simon is adding here.

%B8 = call { i8, i1 } @llvm.umul.with.overflow.i8(i8 %L5, i8 %L)
%B9 = extractvalue {i8, i1} %B8, 0

it looks like changing the urem to this will trigger the original bug and avoid the optimization Simon is adding here.

Ah, nice :-) I tried it, and just like you found that it triggers the original bug. I updated the test, patch is:

RKSimon updated this revision to Diff 162822.Aug 28 2018, 3:51 AM

Applied @jonpa's test fix.

@craig.topper I also made a fix to vector-idiv-v2i32.ll which I think was a typo?

This revision is now accepted and ready to land.Aug 28 2018, 3:42 PM
This revision was automatically updated to reflect the committed changes.