This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Insert instructions before adding them to worklist
ClosedPublic

Authored by kuhar on Dec 5 2019, 2:50 PM.

Details

Summary

This patch adds instructions to the InstCombine worklist after they are properly inserted. This way we don't get <badref>s printed when logging added instructions.
It also adds a check in Worklist::Add that ensures that all added instructions have parents.

Simple test case that illustrates the difference when run with --debug-only=instcombine:

define i32 @test35(i32 %a, i32 %b) {
  %1 = or i32 %a, 1135
  %2 = or i32 %1, %b
  ret i32 %2
}

Before this patch:

INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   <badref> = or i32 %2, 1135
...

With this patch:

INSTCOMBINE ITERATION #1 on test35
IC: ADDING: 3 instrs to worklist
IC: Visiting:   %1 = or i32 %a, 1135
IC: Visiting:   %2 = or i32 %1, %b
IC: ADD:   %2 = or i32 %a, %b
IC: Old =   %3 = or i32 %1, %b
    New =   <badref> = or i32 %2, 1135
IC: ADD:   %3 = or i32 %2, 1135
...

Diff Detail

Event Timeline

kuhar created this revision.Dec 5 2019, 2:50 PM
Herald added a project: Restricted Project. · View Herald Transcript
davide added a comment.Dec 5 2019, 3:15 PM

This looks fine, I wonder whether it's worth testing [how easy it's to test, that is]

spatel added inline comments.Dec 6 2019, 6:08 AM
llvm/lib/Transforms/InstCombine/InstructionCombining.cpp
3347

Remove the print of 'New' here?
I think we always produce redundant debug output for the Result instruction, so we still get something like this with <badref>:

IC: Old =   %t2 = insertelement <4 x float> %t1, float 2.000000e+00, i32 2
    New =   <badref> = shufflevector <4 x float> %b, <4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float undef>, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
IC: ADD:   %t2 = shufflevector <4 x float> %b, <4 x float> <float undef, float 1.000000e+00, float 2.000000e+00, float undef>, <4 x i32> <i32 0, i32 5, i32 6, i32 3>
kuhar added a comment.Dec 6 2019, 8:15 AM

This looks fine, I wonder whether it's worth testing [how easy it's to test, that is]

I'd rather not create that rely on debug messages. As an alternative, we could add an assertion in Worklist::Add checking if the added instruction has a parent. What do you think?

kuhar updated this revision to Diff 232572.Dec 6 2019, 8:18 AM
kuhar marked an inline comment as done.

I'm not convinced this is an improvement overall.

As a concrete point: we will no longer print the 'new' if said 'new' isn't a new instruction;
as in, printing of new will now depend on the fact that it will be added into worklist,
which won't be the case if we returned preexisting instruction.

I'm not convinced this is an improvement overall.

As a concrete point: we will no longer print the 'new' if said 'new' isn't a new instruction;
as in, printing of new will now depend on the fact that it will be added into worklist,
which won't be the case if we returned preexisting instruction.

If we return the existing instruction, don't we always fall down to the else at line 3372 and print it there?

I'm not convinced this is an improvement overall.

As a concrete point: we will no longer print the 'new' if said 'new' isn't a new instruction;
as in, printing of new will now depend on the fact that it will be added into worklist,
which won't be the case if we returned preexisting instruction.

If we return the existing instruction, don't we always fall down to the else at line 3372 and print it there?

No, because we only get there if we return *the same* instruction we just visited.
I'm talking about the case like:

%x = mul i8 %y, 42
%t0 = sub i8 0, %x
%t1 = sub i8 0, %t0

visit %t1
... returned %x
%x != %t1

I'm not convinced this is an improvement overall.

As a concrete point: we will no longer print the 'new' if said 'new' isn't a new instruction;
as in, printing of new will now depend on the fact that it will be added into worklist,
which won't be the case if we returned preexisting instruction.

If we return the existing instruction, don't we always fall down to the else at line 3372 and print it there?

No, because we only get there if we return *the same* instruction we just visited.
I'm talking about the case like:

%x = mul i8 %y, 42
%t0 = sub i8 0, %x
%t1 = sub i8 0, %t0

visit %t1
... returned %x
%x != %t1

Ah, disregard my suggestion then. Sorry for the noise. I don't have strong feelings about the debug spew, so feel free to proceed with whatever the consensus leads to.

kuhar updated this revision to Diff 232605.Dec 6 2019, 10:29 AM

I added an assertion checking if the instructions added have a parent (block).
Also restored the previous debug message.

For the context, I find this invariant useful for 2 reasons:

  • as mentioned, we don't print badrefs
  • I'm experimenting with new InstCombine iteration strategies geared towards GPU code and more local rewrites. To know control visitation better, I need to know which basic blocks to re-visit.
kuhar added a comment.Dec 13 2019, 8:15 AM

Ping.

Do you have any other feedback? @spatel @lebedev.ri

Ping.

Do you have any other feedback? @spatel @lebedev.ri

LGTM, but better see if @lebedev.ri concurs.
Do you have a small IR example with before/after output that we can put here in the review? That way there's at least a record of what the improvement looks like.

kuhar edited the summary of this revision. (Show Details)Dec 13 2019, 9:29 AM

Do you have a small IR example with before/after output that we can put here in the review? That way there's at least a record of what the improvement looks like.

Sure, I added one to the description.

I'm sorry, i'm not following the patch's logic anymore.
First it was about relying on Add() printing ADD: line, and now
all the printing stuff seems to be gone and the patch is adding some
invariants on when the instruction should be inserted into worklist?

Does the current diff still achieve the same initial goal of not printing badref?
How? If not, the description is not longer applicable, and post a new patch instead?

Apologies if i'm horribly misreading the patch?

llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h
96–97

The described behavior sounds like a bug to me.
More importantly, it's not related to the printing changes?

I'm sorry, i'm not following the patch's logic anymore.
First it was about relying on Add() printing ADD: line, and now
all the printing stuff seems to be gone and the patch is adding some
invariants on when the instruction should be inserted into worklist?

Since the beginning, the patch was about ensuring that the instructions added to the worklist are properly inserted. One manifestation of instructions not being correctly inserted that was easy to observe was debug ADD lines with <badrefs> being printed.
One confusing thing that happened was that I followed a suggestion of reducing the amount of debug output printed and removing the line containing NEW that directly preceded ADD. But this change is no more and the amount of the debug output is the same.

Does the current diff still achieve the same initial goal of not printing badref?

Yes.

How? If not, the description is not longer applicable, and post a new patch instead?

By ensuring that instructions are inserted. I moved adding instructions later in one place in InstCombine that solves the issue.
In addition, the patch asserts that all inserted instructions have a parent (block). This is stronger than making debug logs pretty, but serves the same goal of ensuring instructions are properly inserted.

Apologies if i'm horribly misreading the patch?

I think having many iterations made is a bit messy. I also may have simplified the original description of the patch, and though that explaining the change in terms of printing will make it simpler to understand. I apologize and will try to make future patches like this be described more directly.

kuhar marked an inline comment as done.Dec 13 2019, 10:35 AM
kuhar added inline comments.
llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h
96–97

The users that I refer to here are new instructions that are being constructed; InstCombine inserts them in subsequent calls to Add(). This happens in a few places and it's not obvious to me make it more straight-forward.
With the changes to the worklist here, everything works like before and the order of iteration is exactly the same.

Let me know if you want me to dig up some concrete examples for this behavior here.

...

Okay.
Thank you for detailed responses.

I'm mainly confused because, as per the patch's description:

IC: Old = %3 = or i32 %1, %b

New =   <badref> = or i32 %2, 1135

But i think now the patch as it is now is okay,
if @spatel accepted i will not block this any further.

nikic added a subscriber: nikic.Dec 14 2019, 1:33 AM
nikic added inline comments.
llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h
96–97

I'd appreciate an example for this. I can see how it can happen in principle, but not really why we'd write the code that way in practice.

kuhar marked 3 inline comments as done.Dec 17 2019, 7:58 AM
kuhar added inline comments.
llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h
96–97

Consider the test llvm/test/Transforms/InstCombine/zext-or-icmp.ll, function zext_or_icmp_icmp:

define i8 @zext_or_icmp_icmp(i8 %a, i8 %b) {
  %mask = and i8 %a, 1
  %toBool1 = icmp eq i8 %mask, 0
  %toBool2 = icmp eq i8 %b, 0
  %bothCond = or i1 %toBool1, %toBool2
  %zext = zext i1 %bothCond to i8
  ret i8 %zext
}

Instcombine starts like this:

INSTCOMBINE ITERATION #1 on zext_or_icmp_icmp
IC: ADDING: 6 instrs to worklist
IC: Visiting:   %mask = and i8 %a, 1
IC: Visiting:   %toBool1 = icmp eq i8 %mask, 0
IC: Visiting:   %toBool2 = icmp eq i8 %b, 0
IC: Visiting:   %bothCond = or i1 %toBool1, %toBool2
IC: Visiting:   %zext = zext i1 %bothCond to i8
IC: ADD:   %toBool11 = zext i1 %toBool1 to i8
IC: ADD:   %toBool22 = zext i1 %toBool2 to i8
IC: ADD:   %1 = xor i8 %mask, 1

Instcombine wants to replace all uses of %toBool11 = zext i1 %toBool1 to i8 with %1 = xor i8 %mask, 1. One of the uses is <badref> = or i8 %toBool11, %toBool22:

IC: Replacing   %toBool11 = zext i1 %toBool1 to i8
    with   %1 = xor i8 %mask, 1
IC: Old =   %zext = zext i1 %bothCond to i8
    New =   <badref> = or i8 %1, %toBool22
IC: ADD:   %zext = or i8 %1, %toBool22

The way I understand the issue is that one of the uses is the instructions that currently being simplified. This instruction will be added to the worklist anyway once the simplification is finished.

kuhar marked an inline comment as done.Dec 17 2019, 8:02 AM
nikic added inline comments.Dec 17 2019, 12:13 PM
llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h
96–97

Thanks for the example. This transform seems to be a pretty special case that manually runs other InstCombine transforms to prevent an infinite loop.

Assuming that this is the only case (at least I didn't get anything else running InstCombine tests), possibly we might just want to adjust the transform to insert the instruction earlier? Something along these lines:

diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
index 5112fb1a6c3..70a7b9bfbba 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCasts.cpp
@@ -1189,7 +1189,8 @@ Instruction *InstCombiner::visitZExt(ZExtInst &CI) {
       // zext (or icmp, icmp) -> or (zext icmp), (zext icmp)
       Value *LCast = Builder.CreateZExt(LHS, CI.getType(), LHS->getName());
       Value *RCast = Builder.CreateZExt(RHS, CI.getType(), RHS->getName());
-      BinaryOperator *Or = BinaryOperator::Create(Instruction::Or, LCast, RCast);
+      Value *Or = Builder.CreateOr(LCast, RCast);
+      Builder.SetInsertPoint(cast<Instruction>(Or));
 
       // Perform the elimination.
       if (auto *LZExt = dyn_cast<ZExtInst>(LCast))
@@ -1197,7 +1198,7 @@ Instruction *InstCombiner::visitZExt(ZExtInst &CI) {
       if (auto *RZExt = dyn_cast<ZExtInst>(RCast))
         transformZExtICmp(RHS, *RZExt);
 
-      return Or;
+      return replaceInstUsesWith(CI, Or);
     }
   }

I feel like doing this might be better than carving out an exception in the worklist management itself.

kuhar updated this revision to Diff 234575.Dec 18 2019, 10:33 AM

Fix instruction insertion in InstCombineCasts for zext-or-icmp.

kuhar marked 2 inline comments as done.Dec 18 2019, 10:34 AM
kuhar added inline comments.
llvm/include/llvm/Transforms/InstCombine/InstCombineWorklist.h
96–97

You approach seems to like a better way than silently skipping some uses.

I ran the in-source test suite and compiled a few large projects with -O3 to make sure it works.

kuhar updated this revision to Diff 234577.Dec 18 2019, 10:41 AM
kuhar marked an inline comment as done.
This comment was removed by kuhar.
kuhar updated this revision to Diff 234578.Dec 18 2019, 10:44 AM
nikic accepted this revision.Dec 18 2019, 11:19 AM

LGTM

This revision is now accepted and ready to land.Dec 18 2019, 11:19 AM
This revision was automatically updated to reflect the committed changes.