This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.
Details
- Reviewers
craig.topper evandro t.p.northover atrick MatzeB - Commits
- rG5f746c8e2704: Recommit rL305677: [CodeGen] Add generic MacroFusion pass
rGee1b096f8a15: [CodeGen] Add generic MacroFusion pass.
rL305690: Recommit rL305677: [CodeGen] Add generic MacroFusion pass
rL305677: [CodeGen] Add generic MacroFusion pass.
rL293737: [CodeGen] Move MacroFusion to the target
Diff Detail
Event Timeline
lib/CodeGen/MacroFusion.cpp | ||
---|---|---|
123 | How about using the C++11 range based for loop here? |
lib/CodeGen/MacroFusion.cpp | ||
---|---|---|
124 | !SI.getSUnit() seems to be reduntant. |
This patch looks fine to me, but @atrick felt that MacroFusion should belong to the target.
I will have to look into the details later.
This patch looks fine to me, but @atrick felt that MacroFusion should belong to the target.
I did some of the pushing away from the previous shared macrofusion code in the MachineScheduler. I was mainly opposed at the previous static structure that pushed all macrofusion through a TII callback. This way all targets were force to share the same logic: X86 which only does macrofusion for branches would still need to look at all instructions of the basic block. This change brings this problem back I believe?
Having said that code sharing is of course a good idea. There are obviously pieces of macrofusion that can and should be shared. Some notes on the patch though I haven't read it in too much detail:
- Is there a way to split this into smaller utility functions/classes so X86 can keep just checking the ExitSU for opportunities?
- That target callback really needs some documentation if it is intended to be reused. I find the current behavior where you sometimes call it with SecondMI == nullptr and sometimes you do not confusing; It feels like two interfaces being mixed together in a single function. At the very least this needs some documentation, better yet simplify the interface if possible.
include/llvm/CodeGen/MacroFusion.h | ||
---|---|---|
9–12 | Should use /// \file | |
19–21 | Isn't this just repeating the \file comment above? | |
36 | // end namespace llvm | |
lib/CodeGen/MacroFusion.cpp | ||
10–11 | Needs three slashes to be recognized by doxygen. | |
142 | no space after function name |
@MatzeB thank you very much for having a look. I've updated the patch and hope that addresses your comments adequately:
- I've added two boolean member variables FuseBlock and FuseExit .
- I've simplified MacroFusion::scheduleAdjacentImpl to only look at predecessors. I think that shouldn't make a difference as the same pairs should be found, just the order they are discovered may be different. @evandro was there a reason why the successors were considered for SUs in a block?
- shouldScheduleAdjacent now only supports specifying both instructions or the second instruction, I had to update the AArch64 implementation to always check the second opcode first, but it was just the order that changed.
If memory serves, I came across some anecdotal instances when not all instructions were visited otherwise. The reason, methinks, is that some instructions are marked in a special manner, like stores and prefetches. So, I had to start with at the successors of the EntrySU and then again at the predecessors of ExitSU to make sure that all instructions in a BB would be visited.
include/llvm/CodeGen/MacroFusion.h | ||
---|---|---|
24–28 | You could make SecondMI a reference now to indicate it is never nullptr, also saves you an assert() below. | |
30–35 | How about splitting this into two functions | |
lib/CodeGen/MacroFusion.cpp | ||
38 | DAG cannot be nullptr, so use a reference; Contrary to apply() this is an internal function which we can change easily. | |
66 | Not true anymore. | |
68–69 | I don't think we should have this shortcut:
| |
72–73 | Maybe factor out DAG->TII and DAG->MF.getSubTarget() into a variable, it's used two or three times. | |
78–80 | Maybe we should even go for if (Dep.getKind() != Data) at least I can't imagine right now how anti or output dependencies could lead to macrofusion. | |
84–86 | Similar reasoning as above; Except for DepMI == nullptr being a real possibility for the EntrySU which we can hit here, though you may switch to checking for DepSU.isBoundaryNode() insteadof getInstr() == nullptr to make this fact more obvious. | |
92–125 | Extract the part that adds the fusion edge and does the adjustments into a separate function? I could imagine this being of value if it turns out a new target can use a different/special search strategy for finding fusion opportunities. | |
132 | end anonymous namespace |
include/llvm/CodeGen/MacroFusion.h | ||
---|---|---|
30–35 | As a matter of fact, I can only imagine 2 scenarios: one when the whole BB, including its exit, should be examined and another when only the exit should be examined. If so, then a single flag should suffice. |
Thanks again for the feedback!
lib/CodeGen/MacroFusion.cpp | ||
---|---|---|
78–80 | I think we have to consider Order dependencies as well, because Order dependencies are used between compare and branch instructions, which should be fused on X86. |
LGTM with the change below:
lib/CodeGen/MacroFusion.cpp | ||
---|---|---|
78–80 | Odd, I really would have expected a data dependency on the eflags register for X86. Anyway if that isn't the case today, then you should be able to use at least: if (Dep.getKind() == Anti || Dep.getKind() == Output || Dep.isWeak()) continue; then. |
Hi.
This change is failing to build on windows buildbots.
C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\cl.exe /nologo /TP -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib\CodeGen -IC:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen -Iinclude -IC:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\include /DWIN32 /D_WINDOWS /Zc:inline /Zc:strictStrings /Oi /Zc:rvalueCast /W4 -wd4141 -wd4146 -wd4180 -wd4244 -wd4258 -wd4267 -wd4291 -wd4345 -wd4351 -wd4355 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4800 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4324 -w14062 -we4238 /MD /O2 /Ob2 -UNDEBUG /EHs-c- /GR- /showIncludes /Folib\CodeGen\CMakeFiles\LLVMCodeGen.dir\MacroFusion.cpp.obj /Fdlib\CodeGen\CMakeFiles\LLVMCodeGen.dir\ /FS -c C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(139): error C2668: 'llvm::make_unique': ambiguous call to overloaded function
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\include\llvm/ADT/STLExtras.h(939): note: could be 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> llvm::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)'
with [ _Ty=`anonymous-namespace'::MacroFusion ]
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\memory(1628): note: or 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> std::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)' [found using argument-dependent lookup]
with [ _Ty=`anonymous-namespace'::MacroFusion ]
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(139): note: while trying to match the argument list '(llvm::ShouldSchedulePredTy, bool)'
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(146): error C2668: 'llvm::make_unique': ambiguous call to overloaded function
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\include\llvm/ADT/STLExtras.h(939): note: could be 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> llvm::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)'
with [ _Ty=`anonymous-namespace'::MacroFusion ]
C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\memory(1628): note: or 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> std::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)' [found using argument-dependent lookup]
with [ _Ty=`anonymous-namespace'::MacroFusion ]
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(146): note: while trying to match the argument list '(llvm::ShouldSchedulePredTy, bool)'
regards
I've reverted the change for now, I'll how to figure out why the call to make_unique is ambiguous with the Microsoft compiler.
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141215/249348.html
It seems you need to use llvm::make_unique explicitly, as MSVC tries to use the std variants.
Re-committed https://reviews.llvm.org/rL305690 and a small fix https://reviews.llvm.org/rL305694
Should use /// \file