This is an archive of the discontinued LLVM Phabricator instance.

Extra processing for BitCast + PHI in InstCombine
ClosedPublic

Authored by itsimbal on Jan 22 2019, 6:23 AM.

Details

Summary

For some specific cases with bitcast A->B->A with intervening PHI nodes InstCombiner::optimizeBitCastFromPhi transformation creates extra PHI nodes, which are actually a copy of already created PHI or in another words, they are redundant. These extra PHI nodes could lead to extra move instructions generated after DeSSA transformation. This happens when several conditions are met

  • SROA kicks in and creates new alloca;
  • there is a simple assignment L = R, which falls under 'canonicalize loads' done by combineLoadToOperationType (this transformation is by default). Exactly this transformation is the reason of bitcasts generated;
  • the alloca is then used in A->B->A + PHI chain;
  • there is a loop unrolling.

As a result optimizeBitCastFromPhi creates as many of PHI nodes for each new SROA alloca as loop unrolling factor is. These new extra PHI nodes are redundant actually except of one and should not be created. Moreover the idea of optimizeBitCastFromPhi is to get rid of the cast (when possible) but that doesn't happen in these conditions.

The proposed fix is to do the cast replacement for the whole calculated/accumulated PHI closure not for one cast only, which is an argument to the optimizeBitCastFromPhi. These will help to accomplish several things: 1) avoid extra PHI nodes generated as all casts which may trigger optimizeBitCastFromPhi transformation will be replaced, 3) bitcasts will be replaced, and 3) create more opportunities to remove dead code, which appears after the replacement.

A new test case shows that it's possible to get rid of all bitcasts completely and get quite good code reduction.

Diff Detail

Repository
rL LLVM

Event Timeline

itsimbal created this revision.Jan 22 2019, 6:23 AM

A friendly reminder for the review.

Carrot added a comment.Feb 6 2019, 2:53 PM

I couldn't understand the problem after reading the description.

Can I describe it as: A value is used as two different types in two basic blocks (or more). Although the bitcast of %11 (I guess optimizeBitCastFromPhi starts from here) can be removed with new set of PHIs, the old PHIs can't be removed due to its usage of type B in bb6. So we need to keep two set of values(registers) in many places even after DeSSA.

Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 2:53 PM

There are two issues:

  1. Extra redundant phi nodes are created. If you look at the test output (without proposed changes) you will see
  %4 = phi float [ %21, %.bb12 ], [ %conv.i, %.bb2 ]
  %5 = phi float [ %22, %.bb12 ], [ %conv.i, %.bb2 ]
  %rA.sroa.8.0 = phi i32 [ %rA.sroa.8.2, %.bb12 ], [ %1, %.bb2 ]
  %6 = phi float [ %23, %.bb12 ], [ %conv.i, %.bb2 ]
  %7 = phi float [ %24, %.bb12 ], [ %conv.i, %.bb2 ]
  %rA.sroa.0.0 = phi i32 [ %rA.sroa.0.2, %.bb12 ], [ %1, %.bb2 ]

and

  %13 = phi float [ %add33.1, %.bb4 ], [ %4, %.bb3 ]
  %14 = phi float [ %add33.1, %.bb4 ], [ %5, %.bb3 ]
  %rA.sroa.8.1 = phi i32 [ %11, %.bb4 ], [ %rA.sroa.8.0, %.bb3 ]
  %15 = phi float [ %add33.2, %.bb4 ], [ %6, %.bb3 ]
  %16 = phi float [ %add33.2, %.bb4 ], [ %7, %.bb3 ]
  %rA.sroa.0.1 = phi i32 [ %12, %.bb4 ], [ %rA.sroa.0.0, %.bb3 ]

Here %5, %7, %14, %16 are redundant. These insns are created as not all bitcast insns are removed when we first hit the needed pattern. Depending on a loop unroll factor there will be (factor-1) redundant phi nodes. In the attached test case for simplicity unroll the factor is 2. In my real test the unroll factor is 16 and I got 15 redundant phi nodes for each original phi node.

  1. Not all bitcast insns are removed while they could be. The optimizeBitCastFromPhi removes only one bitcast, which triggers it. Due to this the old phi nodes are also not removed. The fix tries to remove all bitcast insns which are reachable from found phi closure. The test output (with the fix applied) doesn't have any bitcast insns and old phi nodes as well.
Carrot accepted this revision.Feb 8 2019, 8:12 AM

Thanks for the explanation.
LGTM

This revision is now accepted and ready to land.Feb 8 2019, 8:12 AM
This revision was automatically updated to reflect the committed changes.