This is an archive of the discontinued LLVM Phabricator instance.

[Power8] Add the MacroFusion support for Power8
ClosedPublic

Authored by steven.zhang on Nov 24 2019, 9:51 PM.

Details

Summary

Power8 User's Manual Section 10.1.12 Instruction Fusion

There are two fuse types:
  {addi} followed by one of these {lxvd2x, lxvw4x, lxvdsx, lvebx, lvehx, lvewx, lvx, lxsdx}
  Requirements:
    addi(rt) = lxvd2x(rb)
    lxvd2x(ra) cannot be ‘0’
  Result:
    addi - no changelxvd2x gets the immed field from addi., rb is not used.This effectively provides a d-form version for the vector loads.The dependency between the two operations is removed.
  {addis) followed by one of these {ld, lbz, lhz, lwz}
  Requirements:
    addis(rt) = ld(ra) = ld(rt) - (cannot be ‘0’)
    addis(SI) first 12 bits must be all 0’s or all 1’sTA = 0 for this fusion to occur and if SI = 111111111110000 and the msb of the d/ds field of the load equals ‘1’, then fusion does not occur.
  Result:addis gets changed to a NOP. (It still takes up a dispatch slot, but is sent directly to completion.)The last 5 bits of addis(SI) are sent with the ld (information from the addis is passed to the ld).The addis is removed from execution.

This patch is intend to implement the missing P8 MacroFusion for LLVM.

Diff Detail

Event Timeline

steven.zhang created this revision.Nov 24 2019, 9:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2019, 9:51 PM
jsji added a comment.Nov 26 2019, 11:14 AM

Looks mostly good to me. But maybe we should seperate a new NFC patch before this.
Also can we get some testcases?

llvm/lib/Target/PowerPC/PPC.td
170

No . for others, why we need one here? :)

257

I like the idea of separating the inheritable feature and sub-target specific feature.
However, I think this should be done in an NFC patch before this.

And it looks confusing to me about having Power8SpecificFeatures and Power8OnlyFeature,
are P8 specific feature supposed to be P8 only?

Maybe we should align with X86, name them: P8InheritableFeatures , P*AdditionalFeatures , P*SpecificFeatures/P*OnlyFeatures?

llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
21

Is it better to put this into PPC namespace instead of anonymous namespace?

steven.zhang marked an inline comment as done.Nov 26 2019, 11:40 PM
steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPC.td
257

Good point.

steven.zhang marked 2 inline comments as done.Nov 27 2019, 12:10 AM

https://reviews.llvm.org/D70768 is created for the NFC patch.

llvm/lib/Target/PowerPC/PPC.td
170

It is typo.

llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
21

As it is used internally and I don't want to pollute the PPC namespace. Anonymous namespace is fine from what I think.

Update the patch according to Jinsong's comments.

kbarton requested changes to this revision.Nov 27 2019, 8:47 AM
kbarton added inline comments.
llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
64
80

Could you document what the return type means?

137

From reading the code, it seems like returning true from this function means that they should be scheduled together.
Thus, this function defaults to fusing instructions if it comes across something it doesn't understand (and asserts are not enabled). Would it not be better to return false here, and let the existing scheduling heuristics decide whether to make the instructions adjacent?

140

The preference is to not use \brief, and instead use a single sentence for the abstract and then follow-on details in a separate paragraph:

The first sentence (or a paragraph beginning with \brief) is used as an abstract. Try to use a single sentence as the \brief adds visual clutter. Put detailed discussion into separate paragraphs.

(https://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments)

160

Unless I'm missing something, for each call to shouldScheduleAdjacent for a given pair of instructions, we are going to walk all features in the FusionFeatures list. Is that correct?
If that's the case, I think this could be problematic (N*M where N is the number of pairs we want to check and M is the number of fusion features we have).

Is it possible to change the FusionFeatures list to use a map instead of an array, and use one of the instructions as a key into the map?

Also, how often do we want to check if the SecondMI is fusable without specifying a FirstMI? Is this a common case or an exceptional case?

161

I don't consider this an early return, this is just skipping it if it is not supported.

162

Is it possible to check for this earlier (e.g., when creating the FusionFeatures structure) and if it is not supported simply do not add it into the structure?

This revision now requires changes to proceed.Nov 27 2019, 8:47 AM
steven.zhang marked 7 inline comments as done.

Update the patch according to Kit's comments. Thank you!

llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
64

Done.

80

Done

137

When the checkOpConstraints() is called, we have got the fusion pair(FirstMI and SecondMI are fusion pair according to opcode). What we need to do is to kick some situation that we are sure that, it is NOT a fusion pair according to the spec. So, we have to return true by default unless we have the reason(i.e. the switch case) to say, it is NOT a fusion pair, as this is the last check.

140

Good catch. Done.

160

You are completely correct and yes, we need N*M complexity as you point out. However, this is NOT a 1:1 map, because, one instruction could be fused by multiple fusion pair. i.e. ( addi + load, addi + addis).

And in fact, it is a map between two groups. { inst1, inst2, inst3, ... } and { inst4, inst 6, ...}. Any instructions in group 1 is fused with any instructions in group2. And the FusionFeature is the data structure to map these two group. And we have an array of these maps. So, we have to walk all the element to find out all the maps with the instruction as the key. It would be great if we can reduce the complexity with better data structure.

For the case that without specifying the FirstMI, it is used to find the anchor instruction that we want to fuse. Then, MacroFusion will walk all the instructions pred that anchor instruction to check if they can be fused. It is not special case but the way that MacroFusion used to get the FirstMI.

  • List Item
161

Good catch.

162

We cannot, as we need to enable/disable each feature at runtime by option -mattr=[feature].

qiucf added a subscriber: qiucf.Dec 4 2019, 7:35 PM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
160

I think it's feasible to build a radix tree instead of array of sets for better performance, which can be done by either using helper methods (like defineRule(definePattern(...), definePattern(...))) or using tblgen. But since the number of patterns here aren't so large, this looks fine to me.

178

I think we still need some comments here explaining what would be checked in the mutation if we provide a negative depOpIdx. It might confuse someone to think here we have NO checks on registers at all.

The best way to do this is to add a tblgen backend which consume the TD that describe the fused instruction and produce a table that has 1:1 map for all the fused instruction pair. However, it is overkill I think. And in fact, we could add a multiple map to map one opcode with several fusion candidates but I don't think it is a good idea as we only have small groups of fusion candidates.

Hi, Kit @kbarton I have addressed your comments, would you please continue the review now ? Thank you!

Could you please rebase this patch?
It no longer applies cleanly, since you committed https://reviews.llvm.org/D70768.

llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
137

Could you please explain this in the comments to the function?

Also, is it possible to add an assert at the checking that FirstMI and SecondMI are a valid fusion pair? This will validate your assumption, and also prevent the function from being used erroneously in the future.

steven.zhang marked an inline comment as done.Dec 29 2019, 6:34 PM
steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
137

ok, I will do it.

Address comments.

steven.zhang marked an inline comment as done.Dec 29 2019, 10:20 PM
steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPCMacroFusion.cpp
137

I have added the comments in the default case to explain why it returns true. Regarding to the assertion you mentioned, it is not easy to add it as all other conditions are within shouldScheduleAdjacent() and we are called by it also. I think, as this is a static function and we have the context of the call, it is not a big issue if we didn't add the assertion. This is the context of this call:

shouldScheduleAdjacent() {

 foreach fusion candidates {
   if (cannot fuse)
       continue
    
    if (checkOpConstraints(...))
        return true;
}
}
kbarton accepted this revision.Mar 2 2020, 2:07 PM

LGTM.
I apologize for the delay in getting to this.

This revision is now accepted and ready to land.Mar 2 2020, 2:07 PM
This revision was automatically updated to reflect the committed changes.