This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Revisit user of newly one-use instructions
ClosedPublic

Authored by nikic on May 31 2023, 7:50 AM.

Details

Summary

Many folds in InstCombine are limited to one-use instructions. For that reason, if the use-count of an instruction drops to one, it makes sense to revisit that one user. This is one of the most common reasons why InstCombine fails to finish in a single iteration.

Doing this revisit actually slightly improves compile-time (http://llvm-compile-time-tracker.com/compare.php?from=97f0e7b06e6b76fd85fb81b8c12eba2255ff1742&to=fc740dee13f42a948b378d731e666d0a80d16061&stat=instructions:u), because we save an extra InstCombine iteration in enough cases to make a visible difference.

This is conceptually NFC, but not NFC in practice, because differences in worklist order can result in slightly different folding behavior.

Diff Detail

Event Timeline

nikic created this revision.May 31 2023, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2023, 7:50 AM
nikic requested review of this revision.May 31 2023, 7:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 31 2023, 7:50 AM
nikic added inline comments.May 31 2023, 8:08 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
55

This is a bit unfortunate, but I believe it's ultimately not a problem. The two lshrs here are the same instruction and will be CSEd, at which point this reduces to the same IR. Similar for the test below.

I double checked the worklist order, and the new one is "more correct" (i.e. follows instruction order more closely), but happens to produce worse results in this instance.

goldstein.w.n added inline comments.Jun 7 2023, 2:01 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
80

This one seems to keep the worse codegen (full -O3 pipeline):
Before:

define i32 @multiuse2(i32 %x) {
  %i = shl i32 %x, 1
  %i6 = shl i32 %x, 8
  %i10 = and i32 %i6, 32256
  %i12 = and i32 %i, 252
  %i13 = or i32 %i10, %i12
  ret i32 %i13
}

After:

define i32 @multiuse2(i32 %x) {
  %i = shl i32 %x, 1
  %i6 = and i32 %x, 96
  %i7 = shl nuw nsw i32 %i6, 8
  %i8 = shl nuw nsw i32 %i6, 1
  %i14 = shl i32 %x, 8
  %i9 = and i32 %i14, 7680
  %i10 = or i32 %i7, %i9
  %1 = and i32 %i, 60
  %i12 = or i32 %i8, %1
  %i13 = or i32 %i10, %i12
  ret i32 %i13
}

Its not a dealbreaker imo, but maybe leave a TODO on the fold the implements it indicating the combine is missing a case.

RKSimon added inline comments.Jun 7 2023, 6:27 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
80

@nikic Are you intending to look at this regression in a followup?

nikic added inline comments.Jun 9 2023, 6:45 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
80

We're missing this fold: https://alive2.llvm.org/ce/z/ZXvVh-

goldstein.w.n added inline comments.Jun 9 2023, 11:03 AM
llvm/test/Transforms/InstCombine/or-shifted-masks.ll
80

See: D152568 to fix that.

RKSimon accepted this revision.Jun 12 2023, 6:04 AM

LGTM @goldstein.w.n any other comments?

This revision is now accepted and ready to land.Jun 12 2023, 6:04 AM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.Jun 14 2023, 3:21 AM

@nikic this is causing an infinite loop in opt -passes=instcombine, test case:

define void @main(<4 x float> %arg) {
bb:
  %i = shufflevector <4 x float> %arg, <4 x float> zeroinitializer, <2 x i32> <i32 2, i32 3>
  %i1 = bitcast <2 x float> %i to <2 x i32>
  %i2 = shufflevector <2 x i32> %i1, <2 x i32> zeroinitializer, <4 x i32> <i32 0, i32 1, i32 poison, i32 poison>
  %i3 = extractelement <4 x i32> %i2, i32 0
  %i4 = insertelement <4 x i32> zeroinitializer, i32 %i3, i32 0
  %i5 = shufflevector <4 x i32> %i4, <4 x i32> zeroinitializer, <2 x i32> <i32 1, i32 0>
  %i6 = bitcast <2 x i32> %i5 to <2 x float>
  %i7 = call <2 x float> (...) null(<2 x float> %i6, <2 x float> zeroinitializer, <2 x float> zeroinitializer)
  ret void
}
nikic added a comment.Jun 14 2023, 5:59 AM

@foad Thanks for the report! This should be fixed with https://github.com/llvm/llvm-project/commit/57a8ea85538503a35d1e04fd8c8ba32aa2ba3f2a. Let me know if you encounter any further issues.

foad added a comment.Jun 14 2023, 6:20 AM

@foad Thanks for the report! This should be fixed with https://github.com/llvm/llvm-project/commit/57a8ea85538503a35d1e04fd8c8ba32aa2ba3f2a. Let me know if you encounter any further issues.

Thanks! I've also verified that it fixes my original (unreduced) test case.