This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target.
In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end.
Paths
| Differential D28489
[CodeGen] Move MacroFusion to the target ClosedPublic Authored by evandro on Jan 9 2017, 3:04 PM.
Details Summary This patch moves the class for scheduling adjacent instructions, MacroFusion, to the target. In AArch64, it also expands the fusion to all instructions pairs in a scheduling block, beyond just among the predecessors of the branch at the end.
Diff Detail
Event Timelineevandro updated this object. evandro added a child revision: D28491: [AArch64] Add new subtarget feature to fuse AES crypto operations.Jan 9 2017, 3:11 PM Comment Actions Have you checked the effects for compiletime? This will check every edge in the schedule graph, I wonder if we shouldn't rather delegate the whole search to the target so it can restrict this to the actually interesting instructions instead of checking every edge. Comment Actions
At least on a rather fast x86 machine, any difference in the compile time was buried below the noise level. Comment Actions And just to warn you because I am currently running into these issues: The current macrofusion code fails to work properly for nodes having around the pending queue (which mostly means macrofusion often failing for post-ra schedulers). I am currently working on patches that form instruction bundles out of macrofusion opportunities, unfortunately this is coming along slowly as instruction bundles pre-ra are not a commonly used feature. Comment Actions
Yes, I noticed such irritating occurrences. Comment Actions Would it make sense to have a TII.mayFuseWithPrecedingInstr() to avoid testing all DAG edges? The DAG is quadratic, but this a rare opportunity. Comment Actions
Or let targets subclass or write their own scheduledag mutation with an apropriate search strategy instead of the TII callback? Comment Actions Yep, this could all be done in the target. SDep::Cluster is effectively a scheduler API for the subtarget to use as it wishes. On the other hand, that just pushes the problem to the target code, and this is proposed for AArch64. evandro added a child revision: D28698: [AArch64] Add new target feature to fuse literal generation.Jan 13 2017, 1:57 PM Comment Actions I think you should find a way to do this without calling shouldScheduleAdjacent on every DAG node. It's fine to say you've tested compilation time but what really matters here are the pathological cases with very large blocks. It's rare for instructions in the middle of blocks to have fusion opportunities, so it's wrong to introduce this potential cost for all blocks. Comment Actions I have other patches in the line that depend on this one which fuse other pairs of instrs (e.g., D28698) that do happen in the middle of blocks. But I understand your point. An alternative that I considered before was through TargetSubtargetInfo::adjustSchedDependency(). Thoughts? Comment Actions Note that adjustSchedDependency is defined as updating the latency. It's very important for target hooks not to mutate data structures people's backs. I think I see @MatzeB's point. Just remove MacroFusion from the target independent MachineScheduler. Code reuse is not really helpful here. X86 should just have it's own MacroFusion, as with AArch64. The still register the SchedDAGMutation the same way.
evandro retitled this revision from [CodeGen] Generalize MacroFusion for any instructions pair to [CodeGen] Move MacroFusion to the target. Comment ActionsJust the skeleton of MacroFusion was left behind. If anything, in order to leave the option misched-fusion intact and to keep the interface using createMacroFusionDAGMutation(). Comment Actions The targets add the DAG Mutators anyway, so you should be able to remove the whole shouldScheduleAdjacent() callback let the targets define their own class MacroFusion : public ScheduleDAGMutation { ... } class which they then add as a mutator. Comment Actions
That would also mean to get rid of the createMacroFusionDAGMutation(const TargetInstrInfo *TII) function and thereby the EnableMacroFusion flag. This is fine IMO as on AArch64 you can just as well enable/disable it with the FeatureArithmeticBccFusion/FeatureArithmeticCbzFusion. Comment Actions I just thought that it was convenient to control MacroFusion with a global option, misched-fusion, regardless of what the target prefers. Comment Actions
I don't think a global flag is worth adding a callback and extra functions to MachineScheduler. I don't think there is that much value in the flag to justify that esp. since you can do -mattr=-FeatureArithmeticBccFusion,-FeatureArithmeticCbzFusion as well.
Comment Actions LGTM with nitpicks addressed:
This revision is now accepted and ready to land.Jan 30 2017, 6:25 PM Closed by commit rL293737: [CodeGen] Move MacroFusion to the target (authored by evandro). · Explain WhyJan 31 2017, 7:05 PM This revision was automatically updated to reflect the committed changes. Comment Actions Usings of \param are improper. Tweaked in r293744 just to eliminate \param(s). \param takes at least two parameters. \param NAME Description... You may write \param(s) like, /// \brief Verify that the instruction pair, should be scheduled back to back. /// \param First The first MI to verify /// \param Second The second MI Note, trunk clang doesn't recognize like "\param First,Second".
Revision Contents
Diff 86373 llvm/include/llvm/CodeGen/MachineScheduler.h
llvm/include/llvm/Target/TargetInstrInfo.h
llvm/lib/CodeGen/MachineScheduler.cpp
llvm/lib/Target/AArch64/AArch64InstrInfo.h
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
llvm/lib/Target/AArch64/AArch64MacroFusion.h
llvm/lib/Target/AArch64/AArch64MacroFusion.cpp
llvm/lib/Target/AArch64/AArch64TargetMachine.cpp
llvm/lib/Target/AArch64/CMakeLists.txt
llvm/lib/Target/X86/CMakeLists.txt
llvm/lib/Target/X86/X86InstrInfo.h
llvm/lib/Target/X86/X86InstrInfo.cpp
llvm/lib/Target/X86/X86MacroFusion.h
llvm/lib/Target/X86/X86MacroFusion.cpp
llvm/lib/Target/X86/X86TargetMachine.cpp
llvm/test/CodeGen/AArch64/misched-fusion.ll
|
This code could be put directly in AArch64MacroFusion::apply(). The same comment applies to the X86 version.