This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Move MacroFusion to the target
ClosedPublic

Authored by evandro on Jan 9 2017, 3:04 PM.

Details

Summary

This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target.

In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end.

Diff Detail

Repository
rL LLVM

Event Timeline

evandro updated this revision to Diff 83713.Jan 9 2017, 3:04 PM
evandro retitled this revision from to [CodeGen] Generalize MacroFusion for any instructions pair.
evandro updated this object.
evandro added reviewers: atrick, MatzeB.
evandro set the repository for this revision to rL LLVM.
evandro added subscribers: hiraditya, kparzysz.
evandro added a subscriber: llvm-commits.
MatzeB edited edge metadata.Jan 9 2017, 3:18 PM

Have you checked the effects for compiletime? This will check every edge in the schedule graph, I wonder if we shouldn't rather delegate the whole search to the target so it can restrict this to the actually interesting instructions instead of checking every edge.

Have you checked the effects for compiletime? This will check every edge in the schedule graph, I wonder if we shouldn't rather delegate the whole search to the target so it can restrict this to the actually interesting instructions instead of checking every edge.

At least on a rather fast x86 machine, any difference in the compile time was buried below the noise level.

MatzeB added a comment.Jan 9 2017, 3:23 PM

And just to warn you because I am currently running into these issues: The current macrofusion code fails to work properly for nodes having around the pending queue (which mostly means macrofusion often failing for post-ra schedulers).
If there are no post-ra schedulers on the other hand the register allocator sometimes places copy, spill, reload instructions in between.

I am currently working on patches that form instruction bundles out of macrofusion opportunities, unfortunately this is coming along slowly as instruction bundles pre-ra are not a commonly used feature.

If there are no post-ra schedulers on the other hand the register allocator sometimes places copy, spill, reload instructions in between.

Yes, I noticed such irritating occurrences.

atrick edited edge metadata.Jan 9 2017, 3:30 PM

Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this a rare opportunity.

MatzeB added a comment.Jan 9 2017, 3:41 PM

Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this a rare opportunity.

Or let targets subclass or write their own scheduledag mutation with an apropriate search strategy instead of the TII callback?

atrick added a comment.Jan 9 2017, 4:40 PM

Yep, this could all be done in the target. SDep::Cluster is effectively a scheduler API for the subtarget to use as it wishes.

On the other hand, that just pushes the problem to the target code, and this is proposed for AArch64.

Pardon my cluelessness, but are you guys on a tangent? I'm truly confused.

evandro updated this revision to Diff 84375.Jan 13 2017, 2:00 PM
evandro edited edge metadata.

I think you should find a way to do this without calling shouldScheduleAdjacent on every DAG node. It's fine to say you've tested compilation time but what really matters here are the pathological cases with very large blocks. It's rare for instructions in the middle of blocks to have fusion opportunities, so it's wrong to introduce this potential cost for all blocks.

I have other patches in the line that depend on this one which fuse other pairs of instrs (e.g., D28698) that do happen in the middle of blocks.

But I understand your point. An alternative that I considered before was through TargetSubtargetInfo::adjustSchedDependency(). Thoughts?

Note that adjustSchedDependency is defined as updating the latency. It's very important for target hooks not to mutate data structures people's backs.

I think I see @MatzeB's point. Just remove MacroFusion from the target independent MachineScheduler. Code reuse is not really helpful here. X86 should just have it's own MacroFusion, as with AArch64. The still register the SchedDAGMutation the same way.

  • Remove shouldScheduleAdjacent from TargetInstrInfo. Targets can define that helper locally.
  • X86 MacroFusion doesn't change at all. The code just moves.
  • In AArch64 MacroFusion, *before* checking the edge, determine if if this opcode wants to be fused (e.g. isi it MOVK). Only the edges leading to fusable intrustrions are checked. No need to go through a TargetInstrInfo virtual call.
evandro updated this revision to Diff 86105.Jan 27 2017, 12:55 PM
evandro retitled this revision from [CodeGen] Generalize MacroFusion for any instructions pair to [CodeGen] Move MacroFusion to the target.
evandro edited the summary of this revision. (Show Details)

Just the skeleton of MacroFusion was left behind. If anything, in order to leave the option misched-fusion intact and to keep the interface using createMacroFusionDAGMutation().

The targets add the DAG Mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... } class which they then add as a mutator.

The targets add the DAG Mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... } class which they then add as a mutator.

That would also mean to get rid of the createMacroFusionDAGMutation(const TargetInstrInfo *TII) function and thereby the EnableMacroFusion flag. This is fine IMO as on AArch64 you can just as well enable/disable it with the FeatureArithmeticBccFusion/FeatureArithmeticCbzFusion.

@MatzeB,

I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers.

@MatzeB,

I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers.

I don't think a global flag is worth adding a callback and extra functions to MachineScheduler. I don't think there is that much value in the flag to justify that esp. since you can do -mattr=-FeatureArithmeticBccFusion,-FeatureArithmeticCbzFusion as well.

evandro updated this revision to Diff 86350.Jan 30 2017, 2:30 PM
evandro edited the summary of this revision. (Show Details)
MatzeB added inline comments.Jan 30 2017, 2:58 PM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1910–1919

This code could be put directly in AArch64MacroFusion::apply(). The same comment applies to the X86 version.

llvm/lib/Target/AArch64/AArch64MacroFusion.h
27–36

This can stay private to the .cpp file where createMacroFusionDAGMutation() is defined and doesn't need to go into a header.

The same comment applies to the X86 version.

llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19

Why is there a testcase change, shoulnd't this be NFC?

evandro added inline comments.Jan 30 2017, 3:08 PM
llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19

Since I made it common for iOS and Linux (v. line 4), I meant to trim the check to the germane part.

MatzeB added inline comments.Jan 30 2017, 3:09 PM
llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19

makes sense

evandro added inline comments.Jan 30 2017, 3:10 PM
llvm/lib/Target/AArch64/AArch64MacroFusion.h
27–36

You mean moving the method scheduleAdjacent() from <Target>InstrInfo to <Target>MacroFusion as a private function?

MatzeB added inline comments.Jan 30 2017, 4:38 PM
llvm/lib/Target/AArch64/AArch64MacroFusion.h
27–36

Pretty much. After moving the method you will probably realize that there the only caller is inside apply() and the apply() method consists only of that 1 call, so you can just as well "inline" manually and move the code into the apply() method.

evandro updated this revision to Diff 86373.Jan 30 2017, 5:38 PM
MatzeB accepted this revision.Jan 30 2017, 6:25 PM

LGTM with nitpicks addressed:

llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
10

Should be /// \file This file ...

191

No space before ().

195

Should be // end namespace llvm according to coding standards.

llvm/lib/Target/AArch64/AArch64MacroFusion.h
10

Should be /// \file This file ...

24–31

Please move the class declaration into the AArch64MacroFusion.cpp file and into an anonymous namespace.

llvm/lib/Target/X86/X86MacroFusion.cpp
10

Should be /// \file ...

245

No space before ()

249

Should be // end namespace llvm.

llvm/lib/Target/X86/X86MacroFusion.h
10

Should be /// \file This file ...

24–31

Please move the class declaration into the X86MacroFusion.cpp file and into an anonymous namespace.

This revision is now accepted and ready to land.Jan 30 2017, 6:25 PM
evandro marked 6 inline comments as done.Jan 31 2017, 8:20 AM
evandro updated this revision to Diff 86537.Jan 31 2017, 5:47 PM

Final patch after approval.

This revision was automatically updated to reflect the committed changes.

Usings of \param are improper. Tweaked in r293744 just to eliminate \param(s).

\param takes at least two parameters.

\param NAME Description...

You may write \param(s) like,

/// \brief Verify that the instruction pair, should be scheduled back to back.
/// \param First The first MI to verify
/// \param Second The second MI

Note, trunk clang doesn't recognize like "\param First,Second".

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.cpp
29 ↗(On Diff #86556)

First and Second

123 ↗(On Diff #86556)

DAG, ASU, and Preds

llvm/trunk/lib/Target/X86/X86MacroFusion.cpp
29 ↗(On Diff #86556)

First and Second