This is an archive of the discontinued LLVM Phabricator instance.

[PATCH] [GVN] Ensure replacement instruction dominates its uses
Needs ReviewPublic

Authored by PFerreira on Jul 17 2017, 2:04 AM.
This revision needs review, but all specified reviewers are disabled or inactive.

Details

Reviewers
dberlin
Summary

Related to bugzilla 33731 https://bugs.llvm.org/show_bug.cgi?id=33731

This is caused by a a PRE insertion when the predecessor block hasn't yet been through Value Numbering (due to a back-edge).
The PRE insertion adds a duplicate instruction at the end of the block, but the already-existing instruction is already used in the predecessor block, like so:

L.LB1_346: ; preds = %L.LB1_423

%7 = sext i32 %2 to i64
%8 = mul i64 %7, 4
  • %.pre = sext i32 %1 to i64** ; inserted by GVN::performScalarPREInsertion

When we then iterate over this block (later in the pass), GVN chooses to delete the first instance of the instruction as it is duplicated (instead of the later one which was created before). This is because "GVN::performScalarPREInsertion" will number the instruction as it is created (so it is in the leader-list).

The attached patch moves the "%.pre" instruction up so that the replacement is valid.

I am not sure this is a valid fix, but I couldn't figure out another solution.
I though about stoping "GVN::performScalarPREInsertion" from inserting the instruction by numbering the BasicBlock, but that might lead to infinite loops. I also tried to stop PREInsertion from running if the BasicBlock had not been numbered yet (by keeping a list of numbered blocks) but that wouldn't work since GVN only does one pass over the function (so this case would never be optimised out).

The patch does fix the bug though, without adding regressions.

Diff Detail

Event Timeline

PFerreira created this revision.Jul 17 2017, 2:04 AM
dberlin edited edge metadata.Jul 17 2017, 8:10 AM

Analyzing more, i'm confused, see the last paragraph.

This is caused by a a PRE insertion when the predecessor block hasn't yet been through Value Numbering (due to a back-edge).

I'm not surprised this is buggy, sadly.
I suspect there are a bunch of issues around this kind of thing.

When we then iterate over this block (later in the pass), GVN chooses to delete the first instance of the instruction as it is duplicated (instead of the later one which was created before).

This sounds like an issue with findLeader/addToLeaderTable not checking local dominance if they are in the same block, based on the assumption that it will process the block in order.
addToLeaderTable always adds to the end, and so it will hit the PRE leader before any other.

If so, the fastest fix i can think of is to number the instructions in the block starting at 0 as we process them. An instruction can only be a leader when BB = LeaderBB iff the number is < than the current instruction's number.
Assign all PRE'd instructions numbers that are MAX_INT.

If you have two MAX_INT instructions, they are in-order already (since PRE will only add them in order).

This is because "GVN::performScalarPREInsertion" will number the instruction as it is created (so it is in the leader-list).

The attached patch moves the "%.pre" instruction up so that the replacement is valid.

I am not sure this is a valid fix, but I couldn't figure out another solution.

There are a couple other ways to fix this.
Another would be "always check local dominance using OrderedInstruction in FindLeader ". This is equivalent to the above.

I also tried to stop PREInsertion from running if the BasicBlock had not been numbered yet (by keeping a list of numbered blocks) but that wouldn't work since GVN only does one pass over the function (so this case would never be optimised out).

I'm unclear on how this happens at all. Maybe you are talking about Load PRE and not scalar PRE?

Otherwise, from what I see, scalar PRE doesn't run until value numbering is completely done, except in *one* case, which is for GEP's.

while (ShouldContinue) {
  DEBUG(dbgs() << "GVN iteration: " << Iteration << "\n");
  ShouldContinue = iterateOnFunction(F);
  Changed |= ShouldContinue;
  ++Iteration;
}

if (EnablePRE) {
  // Fabricate val-num for dead-code in order to suppress assertion in
  // performPRE().
  assignValNumForDeadCode();
  bool PREChanged = true;
  while (PREChanged) {
    PREChanged = performPRE(F);
    Changed |= PREChanged;
  }

Note that GVN above is completed before scalar PRE.

The only case this isn't true is here:

//

If this load follows a GEP, see if we can PRE the indices before analyzing.
 if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(LI->getOperand(0))) {
   for (GetElementPtrInst::op_iterator OI = GEP->idx_begin(),
                                       OE = GEP->idx_end();
        OI != OE; ++OI)
     if (Instruction *I = dyn_cast<Instruction>(OI->get()))
       performScalarPRE(I);
 }

Obviously, this is broken if we haven't value numbered the predecessors.
In fact, even with your fix, i don't see how it could ever get correct answers.

So i'm confused how your issue occurs at all, and feel like i'm missing something.
Can you give me a code path/example, other than the GEP one (which is not mentioned in your bugzilla bug), where we try to perform scalar PRE before value numbering a predecessor?

I set a breakpoint in the R->moveBefore(I) which I added and this was the block in question:

L.LB1_346:                                        ; preds = %L.LB1_423
  %6 = sext i32 %1 to i64
  %7 = mul i64 %6, 4
  %8 = getelementptr i8, i8* null, i64 %7
  %9 = bitcast void (...)* @poisn2_ to void ([...])*
  call [...]
  %.pre = sext i32 %1 to i64
  br label %L.LB1_350

The way I debugged this was to track the creation of the "%.pre" instruction. This led me to "GVN::performScalarPREInsertion". I set a breakpoint in that function and it gets hit before the BasicBlock above has a chance to go through "processBlock" (during iterateOnFunction). Hence the %7 has not been found yet (not yet in the leader list). This violates the assumption in performScalarPREInsertion, as you stated before (I think) which is documented by the comment in that function:

// Because we are going top-down through the block, all value numbers
// will be available in the predecessor by the time we need them.

Thus "%.pre" gets created.
The iteration order doesn't process %7's parent block due to a loop.

Does that make sense?

I set a breakpoint in the R->moveBefore(I) which I added and this was the block in question:

L.LB1_346:                                        ; preds = %L.LB1_423
  %6 = sext i32 %1 to i64
  %7 = mul i64 %6, 4
  %8 = getelementptr i8, i8* null, i64 %7
  %9 = bitcast void (...)* @poisn2_ to void ([...])*
  call [...]
  %.pre = sext i32 %1 to i64
  br label %L.LB1_350

The way I debugged this was to track the creation of the "%.pre" instruction. This led me to "GVN::performScalarPREInsertion". I set a breakpoint in that function and it gets hit before the BasicBlock above has a chance to go through "processBlock" (during iterateOnFunction). Hence the %7 has not been found yet (not yet in the leader list). This violates the assumption in performScalarPREInsertion, as you stated before (I think) which is documented by the comment in that function:

This is, in fact, because of the GEP case, where it is PRE'ing in the middle of value numbering.
There is no other case it does this.

This seems fairly dangerous.
I'd actually really like to understand if this produces any real performance improvements, because i'd rather just disable this.
There are a bunch of cases we know such things will fail (for example, if GVN has discovered dead code, etc).

Because GVN's VN is so weak, we may be able to prove it safe, but it seems ... a bad plan.

A-ha, I see your point. Apologies for the confusion, but as I mentioned before, it was my hack on GVN.

I wouldn't be able to say if this improves performance or not; the original bug isn't mine. I found it on bugzilla and thought it might be something interesting to hack on.
If you'd prefer to fix the bug by disabling parts of the transformation, should I discard this review?

PFerreira updated this revision to Diff 108080.Jul 25 2017, 7:38 AM

Keep track of BasicBlocks that have been processed and avoid doing PRE on blocks that haven't been operated on.

This causes the transformation not to be applied in some of the PRE test cases (but fixes the original problem). Not sure if this is desirable.