⚙ D28489 [CodeGen] Move MacroFusion to the target

evandro updated this revision to Diff 83713.Jan 9 2017, 3:04 PM

evandro retitled this revision from to [CodeGen] Generalize MacroFusion for any instructions pair.

evandro updated this object.

evandro added reviewers: atrick, MatzeB.

evandro set the repository for this revision to rL LLVM.

evandro added a parent revision: D28488: [CodeGen] Implement the SUnit::print() method.

evandro added subscribers: hiraditya, kparzysz.

evandro added a subscriber: llvm-commits.

evandro added a child revision: D28491: [AArch64] Add new subtarget feature to fuse AES crypto operations.Jan 9 2017, 3:11 PM

Have you checked the effects on compile time? This will check every edge in the schedule graph. I wonder if we shouldn't rather delegate the whole search to the target, so it can restrict this to the actually interesting instructions instead of checking every edge.

evandro mentioned this in D28491: [AArch64] Add new subtarget feature to fuse AES crypto operations.Jan 9 2017, 3:18 PM

In D28489#640575, @MatzeB wrote:

Have you checked the effects on compile time? This will check every edge in the schedule graph. I wonder if we shouldn't rather delegate the whole search to the target, so it can restrict this to the actually interesting instructions instead of checking every edge.

At least on a rather fast x86 machine, any difference in the compile time was buried below the noise level.

And just to warn you, because I am currently running into these issues: the current macrofusion code fails to work properly for nodes hanging around in the pending queue (which mostly means macrofusion often fails for post-ra schedulers).
If there are no post-ra schedulers, on the other hand, the register allocator sometimes places copy, spill, and reload instructions in between.

I am currently working on patches that form instruction bundles out of macrofusion opportunities. Unfortunately, this is coming along slowly, as instruction bundles pre-ra are not a commonly used feature.

In D28489#640596, @MatzeB wrote:

If there are no post-ra schedulers, on the other hand, the register allocator sometimes places copy, spill, and reload instructions in between.

Yes, I noticed such irritating occurrences.

Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this is a rare opportunity.

In D28489#640628, @atrick wrote:

Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this is a rare opportunity.

Or let targets subclass or write their own ScheduleDAG mutation with an appropriate search strategy instead of the TII callback?

Yep, this could all be done in the target. SDep::Cluster is effectively a scheduler API for the subtarget to use as it wishes.

On the other hand, that just pushes the problem to the target code, and this is proposed for AArch64.

flyingforyou added a subscriber: flyingforyou.Jan 9 2017, 5:45 PM

mcrosier added a subscriber: mcrosier.Jan 10 2017, 7:21 AM

Pardon my cluelessness, but are you guys on a tangent? I'm truly confused.

evandro added a child revision: D28698: [AArch64] Add new target feature to fuse literal generation.Jan 13 2017, 1:57 PM

evandro updated this revision to Diff 84375.Jan 13 2017, 2:00 PM

evandro edited edge metadata.

sbaranga added a subscriber: sbaranga.Jan 18 2017, 8:11 AM

Ping^1

I think you should find a way to do this without calling shouldScheduleAdjacent on every DAG node. It's fine to say you've tested compilation time, but what really matters here are the pathological cases with very large blocks. It's rare for instructions in the middle of blocks to have fusion opportunities, so it's wrong to introduce this potential cost for all blocks.

I have other patches in the line that depend on this one which fuse other pairs of instrs (e.g., D28698) that do happen in the middle of blocks.

But I understand your point. An alternative that I considered before was through TargetSubtargetInfo::adjustSchedDependency(). Thoughts?

Note that adjustSchedDependency is defined as updating the latency. It's very important for target hooks not to mutate data structures behind people's backs.

I think I see @MatzeB's point. Just remove MacroFusion from the target-independent MachineScheduler. Code reuse is not really helpful here. X86 should just have its own MacroFusion, as with AArch64. They still register the ScheduleDAGMutation the same way.

Remove shouldScheduleAdjacent from TargetInstrInfo. Targets can define that helper locally.

X86 MacroFusion doesn't change at all. The code just moves.

In AArch64 MacroFusion, *before* checking the edge, determine if this opcode wants to be fused (e.g. is it MOVK?). Only the edges leading to fusable instructions are checked. No need to go through a TargetInstrInfo virtual call.
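Editor's sketch of the opcode pre-filter described above. It is a toy model under stated assumptions, not the patch's code: `ToyNode`, `isFusionCandidate`, `shouldFuse`, and `findFusionPairs` are hypothetical names, no LLVM API is used, and the MOVZ/MOVK pairing is chosen only for illustration. The point is that the cheap opcode test runs once per node, so edges into non-candidate nodes are never visited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-ins, NOT the LLVM classes: a node has an opcode and a list of
// predecessor indices into the same vector.
enum class Opcode { Add, Load, MovZ, MovK };

struct ToyNode {
  Opcode Op;
  std::vector<std::size_t> Preds;
};

// Hypothetical predicate: can this opcode ever be the second half of a fused
// pair? It only looks at the opcode, so it is cheap.
inline bool isFusionCandidate(Opcode Second) { return Second == Opcode::MovK; }

// Hypothetical pair predicate (assumed pairing: MOVZ followed by MOVK).
inline bool shouldFuse(Opcode First, Opcode Second) {
  return First == Opcode::MovZ && Second == Opcode::MovK;
}

struct FusionStats {
  std::size_t EdgesChecked = 0;
  std::size_t PairsFused = 0;
};

// Walk the nodes, but only inspect predecessor edges of candidate nodes.
inline FusionStats findFusionPairs(const std::vector<ToyNode> &Nodes) {
  FusionStats Stats;
  for (const ToyNode &N : Nodes) {
    if (!isFusionCandidate(N.Op))
      continue; // Skip every edge leading into a non-candidate node.
    for (std::size_t P : N.Preds) {
      assert(P < Nodes.size() && "predecessor index out of range");
      ++Stats.EdgesChecked;
      if (shouldFuse(Nodes[P].Op, N.Op)) {
        ++Stats.PairsFused;
        break; // Fuse with at most one predecessor.
      }
    }
  }
  return Stats;
}
```

In a block where few nodes are candidates, the number of edges inspected is bounded by the in-degree of those few nodes rather than by the size of the whole graph.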

Just the skeleton of MacroFusion was left behind, if only to leave the option misched-fusion intact and to keep the interface using createMacroFusionDAGMutation().

Herald added a subscriber: aemerson.Jan 27 2017, 12:55 PM

The targets add the DAG mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback and let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... }, which they then add as a mutator.

In D28489#659179, @MatzeB wrote:

The targets add the DAG mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback and let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... }, which they then add as a mutator.

That would also mean getting rid of the createMacroFusionDAGMutation(const TargetInstrInfo *TII) function and thereby the EnableMacroFusion flag. This is fine IMO, as on AArch64 you can just as well enable/disable it with FeatureArithmeticBccFusion/FeatureArithmeticCbzFusion.
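Editor's sketch of the structure proposed here: the target defines its own mutation class, exposes only a factory, and decides from a feature bit whether to add it as a mutator. Everything below is a toy stand-in under stated assumptions. `ToyDAG`, `ToyMutation`, `ToyScheduler`, `ToyTargetMacroFusion`, `createToyTargetMacroFusion`, and `configureScheduler` are hypothetical names that only mimic the shape of the scheduler interfaces; they are not the LLVM declarations.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins for the scheduler interfaces; NOT the real LLVM declarations.
struct ToyDAG {
  std::vector<std::string> Log; // Records which mutations ran.
};

struct ToyMutation {
  virtual ~ToyMutation() = default;
  virtual void apply(ToyDAG &DAG) = 0;
};

struct ToyScheduler {
  ToyDAG DAG;
  std::vector<std::unique_ptr<ToyMutation>> Mutations;

  void addMutation(std::unique_ptr<ToyMutation> M) {
    assert(M && "null mutation");
    Mutations.push_back(std::move(M));
  }

  void run() {
    for (auto &M : Mutations)
      M->apply(DAG);
  }
};

namespace {
// The target's fusion mutation is private to the file that defines it.
class ToyTargetMacroFusion : public ToyMutation {
  void apply(ToyDAG &DAG) override { DAG.Log.push_back("target-fusion"); }
};
} // end anonymous namespace

// Only this factory is visible to the rest of the (hypothetical) target.
std::unique_ptr<ToyMutation> createToyTargetMacroFusion() {
  return std::make_unique<ToyTargetMacroFusion>();
}

// The target gates the mutator on its own feature bit (an assumption here),
// so no global flag or scheduler-side callback is needed.
inline void configureScheduler(ToyScheduler &S, bool HasFusionFeature) {
  if (HasFusionFeature)
    S.addMutation(createToyTargetMacroFusion());
}
```

With this shape the generic scheduler knows nothing about fusion; turning the feature off simply means the mutator is never registered.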

@MatzeB,

I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers.

In D28489#659194, @evandro wrote:

@MatzeB,

I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers.

I don't think a global flag is worth adding a callback and extra functions to MachineScheduler. I don't think there is that much value in the flag to justify that esp. since you can do -mattr=-FeatureArithmeticBccFusion,-FeatureArithmeticCbzFusion as well.

evandro updated this revision to Diff 86350.Jan 30 2017, 2:30 PM

evandro edited the summary of this revision.

Herald added a subscriber: mgorny.Jan 30 2017, 2:30 PM

MatzeB added inline comments.Jan 30 2017, 2:58 PM

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
2054–2063 ↗	(On Diff #86350)	This code could be put directly in `AArch64MacroFusion::apply()`. The same comment applies to the X86 version.
llvm/lib/Target/AArch64/AArch64MacroFusion.h
26–35 ↗	(On Diff #86350)	This can stay private to the .cpp file where createMacroFusionDAGMutation() is defined and doesn't need to go into a header. The same comment applies to the X86 version.
llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19 ↗	(On Diff #86350)	Why is there a testcase change? Shouldn't this be NFC?

evandro added inline comments.Jan 30 2017, 3:08 PM

llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19 ↗	(On Diff #86350)	Since I made it common for iOS and Linux (v. line 4), I meant to trim the check to the germane part.

MatzeB added inline comments.Jan 30 2017, 3:09 PM

llvm/test/CodeGen/AArch64/misched-fusion.ll
10–19 ↗	(On Diff #86350)	makes sense

evandro added inline comments.Jan 30 2017, 3:10 PM

llvm/lib/Target/AArch64/AArch64MacroFusion.h
26–35 ↗	(On Diff #86350)	You mean moving the method `scheduleAdjacent()` from `<Target>InstrInfo` to `<Target>MacroFusion` as a private function?

MatzeB added inline comments.Jan 30 2017, 4:38 PM

llvm/lib/Target/AArch64/AArch64MacroFusion.h
26–35 ↗	(On Diff #86350)	Pretty much. After moving the method you will probably realize that the only caller is inside apply() and that the apply() method consists only of that one call, so you can just as well "inline" it manually and move the code into the apply() method.

evandro updated this revision to Diff 86373.Jan 30 2017, 5:38 PM

LGTM with nitpicks addressed:

llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
10 ↗	(On Diff #86373)	Should be `/// \file This file ...`
191 ↗	(On Diff #86373)	No space before `()`.
195 ↗	(On Diff #86373)	Should be `// end namespace llvm` according to coding standards.
llvm/lib/Target/AArch64/AArch64MacroFusion.h
10 ↗	(On Diff #86373)	Should be `/// \file This file ...`
24–31 ↗	(On Diff #86373)	Please move the class declaration into the AArch64MacroFusion.cpp file and into an anonymous namespace.
llvm/lib/Target/X86/X86MacroFusion.cpp
10 ↗	(On Diff #86373)	Should be `/// \file ...`
245 ↗	(On Diff #86373)	No space before `()`
249 ↗	(On Diff #86373)	Should be `// end namespace llvm`.
llvm/lib/Target/X86/X86MacroFusion.h
10 ↗	(On Diff #86373)	Should be `/// \file This file ...`
24–31 ↗	(On Diff #86373)	Please move the class declaration into the X86MacroFusion.cpp file and into an anonymous namespace.

This revision is now accepted and ready to land.Jan 30 2017, 6:25 PM

evandro marked 6 inline comments as done.Jan 31 2017, 8:20 AM

Final patch after approval.

Closed by commit rL293737: [CodeGen] Move MacroFusion to the target (authored by evandro).Jan 31 2017, 7:05 PM

This revision was automatically updated to reflect the committed changes.

The uses of \param are improper. I tweaked them in r293744, just eliminating the \param(s).

\param takes at least two parameters.

\param NAME Description...

You may write \param(s) like this:

/// \brief Verify that the instruction pair should be scheduled back to back.
/// \param First The first MI to verify
/// \param Second The second MI

Note that trunk clang doesn't recognize forms like "\param First,Second".

llvm/trunk/lib/Target/AArch64/AArch64MacroFusion.cpp
29 ↗	(On Diff #86556)	First and Second
123 ↗	(On Diff #86556)	DAG, ASU, and Preds
llvm/trunk/lib/Target/X86/X86MacroFusion.cpp
29 ↗	(On Diff #86556)	First and Second

Thank you, @chapuni.

This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Move MacroFusion to the target
ClosedPublic
