This is an archive of the discontinued LLVM Phabricator instance.

Power9 - Add exploitation of oris/ori fusion
AbandonedPublic

Authored by nemanjai on Jul 10 2016, 1:50 AM.

Details

Summary

The patch teaches LLVM to schedule oris/ori together on Power9, because Power9 can fusing oris/ori into one instruction.

Note! Part of this patch overlaps with D22193, because D22193 provides macro fusion facilities.

Diff Detail

Event Timeline

cycheng updated this revision to Diff 63419.Jul 10 2016, 1:50 AM
cycheng retitled this revision from to Power9 - Add exploitation of oris/ori fusion.
cycheng updated this object.
cycheng added a subscriber: llvm-commits.
nemanjai added inline comments.Jul 11 2016, 5:43 AM
lib/CodeGen/PostRASchedulerList.cpp
561

I am probably missing something, but I don't see any intervening code that might set NextClusterSucc between this assert that ensures it is not set and the code that queries it.

lib/Target/PowerPC/PPCMacroFusion.cpp
18

I may be mis-remembering what was actually done, but I thought there was some work recently to convert functions that take MachineInstr by-pointer to by-reference when they can't be null (which I think is the case here).

32

Can you add a comment about what this function does (including describing the parameters)?

65

Left over from debugging?

67

Since this switch statement will grow and I assume that we will perform the same action on all pairs, might it be a cleaner implementation to write a function to query if an opcode pair can be fused?
Something along the lines of:

bool areOpcodesFusable(unsigned Opcode1, unsigned Opcode 2) {
  static unsigned P9FusedOpcodes[] = {
    PPC::ORIS8, PPC::ORI8,
    PPC::ORIS,  PPC::ORI,
    //...
    0
  }
  for (unsigned i = 0; P9FusedOpcodes[i]; i += 2)
    if (P9FusedOpcodes[i] == Opcode1 && P9FusedOpcodes[i+1] == Opcode2)
      return true;
  return false;
}

Then in this function, we would just call:

if(areOpcodesFusable(MI->getOpcode(), SU[Idx + 1].getInstr()->getOpcode())
  // ...

Of course, you might also want to take the actual instructions as arguments to areOpcodesFusable so that you would only return true if the requisite data dependency exists. It just seems like it would be easier to maintain a mapping this way.

test/CodeGen/PowerPC/fusing-constant.ll
17

Don't we want the fused instructions to be checked with CHECK-NEXT?

amehsan edited edge metadata.Jul 11 2016, 9:03 AM

Again, I do not see why you are changing PostRASchedulerList.cpp. We are going to switch to new post-ra MI-scheduler. You can enforce running that code by passing -mllvm misched-postra to clang for your experiments.

Can you explain why you need a custom mutator here? At first glance it looks like overriding TargetInstrInfo::shouldScheduleAdjacent() and using the default MacroFusion mutator is enough.

Can you explain why you need a custom mutator here? At first glance it looks like overriding TargetInstrInfo::shouldScheduleAdjacent() and using the default MacroFusion mutator is enough.

No, because default MacroFusion can only detect one pattern:

XXXInstr (on X86: Test, Cmp, Inc; On AArch64: CMN, CMP, TST, ALU)
BranchInstr

Also I didn't see any simple way to make MacroFusion more generic, so that's why we added a customized MacroFusion.

cycheng marked 4 inline comments as done.Jul 13 2016, 4:54 AM
cycheng added inline comments.
lib/CodeGen/PostRASchedulerList.cpp
561

Or should I just set it to nullptr? Because we will call schedule several times, even though I thought I've processed it properly, but I just want to make sure again.

lib/Target/PowerPC/PPCMacroFusion.cpp
18

They can't be null, I can convert to by-reference as well.

65

Oh.. I thought I've removed it .. :P

67

Thanks, this is a good idea, I am working on a new version based on your suggestion here, it is table based design and would be able to handle our current requirements.

By the way, this design should be able to handle the following pattern in a more clear way.

1-to-1 pattern:

oris Rx,RA,SIh
ori Rt,Rx,SIl

xoris Rx,RA,SIh
xori Rt,Rx,SIl 

addis Rx,RA,SIh
addi Rt,Rx,SIl

1-to-n pattern:

op1: addis Rx,RA,SIh

op2: stb RS,Rx,SIl
     sth RS,Rx,SIl
     stw RS,Rx,SIl
     std RS,Rx,SIl

special requirement on op1 or op2:

op1: addi Rx,0,SI
op2: stdx RS,Rx,RB
test/CodeGen/PowerPC/fusing-constant.ll
17

Yes! I should use CHECK-NEXT, thanks a lot !!

cycheng updated this revision to Diff 63951.Jul 14 2016, 4:27 AM
cycheng edited edge metadata.
cycheng marked 3 inline comments as done.
  • Addressed Nemanjai's comments
  • Implement a table based methodology for managing fusion instruction patterns, so we can have a cleaner way to add new patterns, query pattern, .. and so on.
cycheng updated this revision to Diff 64120.Jul 15 2016, 3:46 AM
  • Fix a bug when traversing fusion-table
  • Support addi/{stdx,ldx} for Pwr9

Not yet looked at everything. Will gradually add comments as I go through the code.

lib/CodeGen/PostRASchedulerList.cpp
574–589

IIUC, this implementation breaks an important invariant of the code: Anything that is in pending queue, and its operands are ready should be moved to the available queue. You should still let everything that is avaialble to go to available queue and then find a way to prioritize the one that you want there.

Again, I do not see why you are changing PostRASchedulerList.cpp. We are going to switch to new post-ra MI-scheduler. You can enforce running that code by passing -mllvm misched-postra to clang for your experiments.

After conversation with Kit, it seems that our plans are somewhat different from what I thought. So no objection here.

amehsan added inline comments.Jul 16 2016, 6:09 PM
lib/CodeGen/PostRASchedulerList.cpp
606–612

Somewhere in this loop, you have to find a way to prioritize the cluster node. One potential way to do it (and there might be better ways) is to check if the last scheduled node has a cluster outgoing edge. If yes, continue your search in the queue until you reach that node.

amehsan added inline comments.Jul 16 2016, 9:16 PM
lib/Target/PowerPC/PPCISelLowering.cpp
1674–1681 ↗(On Diff #64120)

Two comments here:

1- This should be separated to a different patch. This patch is focused on scheduling the code, such that fusion opportunities are caught. This piece of code, is about generating code that has more fusion opportunities. These are separate issues that has to be addressed separately.

2- Are you saying that we should always prefer (addi + stdx) to (addi + std) because the former pair can be fused? If yes, that is incorrect. This can significantly increase register pressure and costs us way more than what we gain from fusing two insns.

Hi Ehsan,

Thanks for your review, and sorry for late response, I am reading your comments, but I might not be able to give responses and updated patch in the short period.

CY

nemanjai commandeered this revision.Feb 7 2018, 12:23 PM
nemanjai edited reviewers, added: cycheng; removed: nemanjai.

Taking this over to abandon the revision as there are currently no plans to implement instruction fusion for Power9.

nemanjai abandoned this revision.Feb 7 2018, 12:23 PM