This is an archive of the discontinued LLVM Phabricator instance.

[MachinePipeliner] Improve the TargetInstrInfo API analyzeLoop/reduceLoopCount
ClosedPublic

Authored by jmolloy on Sep 4 2019, 6:27 AM.

Details

Summary

The way MachinePipeliner uses these target hooks is stateful - we reduce trip
count by one per call to reduceLoopCount. It's a little overfit for hardware
loops, where we don't have to worry about stitching a loop induction variable
across prologs and epilogs (the induction variable is implicit).

This patch introduces a new API:

/// Analyze loop L, which must be a single-basic-block loop, and if the
/// conditions can be understood enough produce a PipelinerLoopInfo object.
virtual std::unique_ptr<PipelinerLoopInfo>
analyzeLoopForPipelining(MachineBasicBlock *LoopBB) const;

The return value is expected to be an implementation of the abstract class:

/// Object returned by analyzeLoopForPipelining. Allows software pipelining
/// implementations to query attributes of the loop being pipelined.
class PipelinerLoopInfo {
public:
  virtual ~PipelinerLoopInfo();
  /// Return true if the given instruction should not be pipelined and should
  /// be ignored. An example could be a loop comparison, or induction variable
  /// update with no users being pipelined.
  virtual bool shouldIgnoreForPipelining(const MachineInstr *MI) const = 0;

  /// Create a condition to determine if the trip count of the loop is greater
  /// than TC.
  ///
  /// If the trip count is statically known to be greater than TC, return
  /// true. If the trip count is statically known to be not greater than TC,
  /// return false. Otherwise return nullopt and fill out Cond with the test
  /// condition.
  virtual Optional<bool>
  getTripCountGreaterCondition(int TC, MachineBasicBlock &MBB,
                               SmallVectorImpl<MachineOperand> &Cond) = 0;

  /// Modify the loop such that the trip count is
  /// OriginalTC + TripCountAdjust.
  /// Additionally the loop's preheader is now NewPreheader.
  virtual void adjust(int TripCountAdjust,
                      MachineBasicBlock *NewPreheader) = 0;

  /// Called when the loop is being removed. Any instructions in the preheader
  /// should be removed.
  virtual void disposed() = 0;
};

The Pipeliner (ModuloSchedule.cpp) can use this object to modify the loop while
allowing the target to hold its own state across all calls. This API, in
particular the disjunction of creating a trip count check condition and
adjusting the loop, improves the code quality in ModuloSchedule.cpp.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy created this revision.Sep 4 2019, 6:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 4 2019, 6:27 AM
jsji added a comment.Sep 9 2019, 2:51 PM

General idea is great, some comments when I looked at PowerPC part. Thanks.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
665

No just query, but also modify the loop?

672

What is the use of this API? Did not see it in-tree.

682

I might be wrong, but it reads like a get accessor , so looks like *READ* only, but we actually create things with API.

688

I don't know why we would like to do two things in one API? Can we split?

693

It might be a little confusing to call LoopInfo->adjust after calling LoopInfo->disposed.
So maybe we should be more specific about what we dispose here in API name?

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3980

I don't quite follow when we should call *splice* to NewPreheader for a target, and why Hexagon needs it, but PowerPC don't?

4027

I think we skip this loop intentionally when enabling PowerPC target , why we need it back?

jmolloy updated this revision to Diff 219889.Sep 12 2019, 5:13 AM
jmolloy marked 7 inline comments as done.

Thanks for the great feedback Jinsong! I've addressed your comments.

llvm/include/llvm/CodeGen/TargetInstrInfo.h
672

There are only two targets in-tree that use the pipeliner (Hexagon and PPC9). Both of these use dedicated hardware loop instructions so there is no induction variable update. However, the more traditional loop code sequence:

%newindvar = ADDri %indvar, 1
%done = CMPri %newindvar, TripCount
BR %done

we have two non-terminator instructions that *must not be pipelined*, because there is no guarantee that they will end up in stage 0 (which is the only valid stage for them). Tanya Lattner's thesis (and the original SMS paper) states that they strip canonical indvar updates while pipelining and add them in later when constructing the pipelined loop. This function allows the target to specify these instructions to strip.

682

Thanks! I agree.

688

Good idea!

693

I agree. This is a hook point to allow PPC and Hexagon to remove the LOOP setup instruction in the preheader. I don't particularly want to bake that implementation detail into this target-independent header though.

I've reworded this to remove references to preheaders and just mention that the loop will be removed and once this function is called no other functions can be called. Does this look alright?

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
3980

Hexagon wants the LOOP instruction in the immediate predecessor of the pipelined loop kernel:

prolog1:
prolog2:
  LOOP
kernel:
  ENDLOOP

Whereas PPC wants it to stay in the preheader to the prolog, so it can use BDZ within the prolog to adapt the loop count:

  LOOP
prolog1:
  BDZ
prolog2:
  BDZ
kernel:
  BDNZ
4027

We needed it because I was calling analyzeLoop after doing some modifications to the loop. I've changed so that ModuloScheduleExpander calls analyzeLoop early, so the preheader is in the right place and we don't need this change any more. Thanks for noticing this!

jsji added a comment.Sep 13 2019, 6:45 AM

Thanks for the great feedback Jinsong! I've addressed your comments.

Thanks @jmolloy . The API and PPC part LGTM.

jmolloy accepted this revision.Sep 20 2019, 1:56 AM

Thanks Jinsong! I've committed this as rL372376.

This revision is now accepted and ready to land.Sep 20 2019, 1:56 AM
jmolloy closed this revision.Sep 20 2019, 1:56 AM