[CodeGen] Add generic MacroFusion pass.
ClosedPublic

Authored by fhahn on Tue, Jun 13, 6:03 AM.

Details

Summary

This patch adds a generic MacroFusion pass, that is used on X86 and
AArch64, which both define target-specific shouldScheduleAdjacent
functions. This generic pass should make it easier for other targets to
implement macro fusion and I intend to add macro fusion for ARM shortly.

Diff Detail

fhahn created this revision.Tue, Jun 13, 6:03 AM
evandro added inline comments.Tue, Jun 13, 8:13 AM
lib/CodeGen/MacroFusion.cpp
123

How about using the C++11 range based for loop here?

fhahn updated this revision to Diff 102338.Tue, Jun 13, 8:40 AM
fhahn marked an inline comment as done.

Use range based for loop

evandro added inline comments.Tue, Jun 13, 11:25 AM
lib/CodeGen/MacroFusion.cpp
124

!SI.getSUnit() seems to be reduntant.

fhahn updated this revision to Diff 102510.Wed, Jun 14, 2:53 AM
fhahn marked an inline comment as done.

Remove redundant !SI.getSUnit()

evandro added a subscriber: atrick.

This patch looks fine to me, but @atrick felt that MacroFusion should belong to the target.

I will have to look into the details later.

This patch looks fine to me, but @atrick felt that MacroFusion should belong to the target.

I did some of the pushing away from the previous shared macrofusion code in the MachineScheduler. I was mainly opposed at the previous static structure that pushed all macrofusion through a TII callback. This way all targets were force to share the same logic: X86 which only does macrofusion for branches would still need to look at all instructions of the basic block. This change brings this problem back I believe?

Having said that code sharing is of course a good idea. There are obviously pieces of macrofusion that can and should be shared. Some notes on the patch though I haven't read it in too much detail:

  • Is there a way to split this into smaller utility functions/classes so X86 can keep just checking the ExitSU for opportunities?
  • That target callback really needs some documentation if it is intended to be reused. I find the current behavior where you sometimes call it with SecondMI == nullptr and sometimes you do not confusing; It feels like two interfaces being mixed together in a single function. At the very least this needs some documentation, better yet simplify the interface if possible.
include/llvm/CodeGen/MacroFusion.h
10–13

Should use /// \file

20–22

Isn't this just repeating the \file comment above?

37

// end namespace llvm

lib/CodeGen/MacroFusion.cpp
11–12

Needs three slashes to be recognized by doxygen.

143

no space after function name

fhahn updated this revision to Diff 102659.Thu, Jun 15, 5:43 AM
fhahn marked 5 inline comments as done.

@MatzeB thank you very much for having a look. I've updated the patch and hope that addresses your comments adequately:

  • I've added two boolean member variables FuseBlock and FuseExit .
  • I've simplified MacroFusion::scheduleAdjacentImpl to only look at predecessors. I think that shouldn't make a difference as the same pairs should be found, just the order they are discovered may be different. @evandro was there a reason why the successors were considered for SUs in a block?
  • shouldScheduleAdjacent now only supports specifying both instructions or the second instruction, I had to update the AArch64 implementation to always check the second opcode first, but it was just the order that changed.
  • I've simplified MacroFusion::scheduleAdjacentImpl to only look at predecessors. I think that shouldn't make a difference as the same pairs should be found, just the order they are discovered may be different. @evandro was there a reason why the successors were considered for SUs in a block?

If memory serves, I came across some anecdotal instances when not all instructions were visited otherwise. The reason, methinks, is that some instructions are marked in a special manner, like stores and prefetches. So, I had to start with at the successors of the EntrySU and then again at the predecessors of ExitSU to make sure that all instructions in a BB would be visited.

evandro added inline comments.Thu, Jun 15, 1:41 PM
lib/CodeGen/MacroFusion.cpp
55

I'm inclined to agree that this should be enough.

79

I wonder if this should actually be Dep.isData(), since other dependencies probably don't matter to instruction fusion.

MatzeB added inline comments.Thu, Jun 15, 2:06 PM
include/llvm/CodeGen/MacroFusion.h
24–28

You could make SecondMI a reference now to indicate it is never nullptr, also saves you an assert() below.

30–35

How about splitting this into two functions
createBranchMacroFusionDAGMutation() and createMacroFusionDAGMutation() (or similar names) instead of the two bool parameters to make the calling code easier to read?

lib/CodeGen/MacroFusion.cpp
38

DAG cannot be nullptr, so use a reference; Contrary to apply() this is an internal function which we can change easily.

66

Not true anymore.

68–69

I don't think we should have this shortcut:

  • AnchorMI == nullptr should only be possible for ExitSU, which we could guard against in apply()
  • Hiding isTransient() seems unnecessary.
  • Hiding isPseudo() seems bad. I can easily imagine cases where the last instruction of a pseudo expansions fuses with the next instruction.
72–73

Maybe factor out DAG->TII and DAG->MF.getSubTarget() into a variable, it's used two or three times.

78–80

Maybe we should even go for if (Dep.getKind() != Data) at least I can't imagine right now how anti or output dependencies could lead to macrofusion.

84–86

Similar reasoning as above; Except for DepMI == nullptr being a real possibility for the EntrySU which we can hit here, though you may switch to checking for DepSU.isBoundaryNode() insteadof getInstr() == nullptr to make this fact more obvious.

92–125

Extract the part that adds the fusion edge and does the adjustments into a separate function? I could imagine this being of value if it turns out a new target can use a different/special search strategy for finding fusion opportunities.

132

end anonymous namespace

evandro added inline comments.Thu, Jun 15, 5:41 PM
include/llvm/CodeGen/MacroFusion.h
30–35

As a matter of fact, I can only imagine 2 scenarios: one when the whole BB, including its exit, should be examined and another when only the exit should be examined. If so, then a single flag should suffice.

fhahn updated this revision to Diff 102819.Fri, Jun 16, 6:51 AM
fhahn marked 10 inline comments as done.

Thanks again for the feedback!

lib/CodeGen/MacroFusion.cpp
78–80

I think we have to consider Order dependencies as well, because Order dependencies are used between compare and branch instructions, which should be fused on X86.

MatzeB accepted this revision.Fri, Jun 16, 2:15 PM

LGTM with the change below:

lib/CodeGen/MacroFusion.cpp
78–80

Odd, I really would have expected a data dependency on the eflags register for X86. Anyway if that isn't the case today, then you should be able to use at least:

if (Dep.getKind() == Anti || Dep.getKind() == Output || Dep.isWeak())
  continue;

then.

This revision is now accepted and ready to land.Fri, Jun 16, 2:15 PM
fhahn updated this revision to Diff 103004.Mon, Jun 19, 2:49 AM
fhahn edited the summary of this revision. (Show Details)
fhahn updated this revision to Diff 103007.Mon, Jun 19, 3:44 AM

minor comment cleanup

fhahn closed this revision.Mon, Jun 19, 3:52 AM

Hi.

This change is failing to build on windows buildbots.

http://lab.llvm.org:8011/waterfall?builder=llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast&builder=llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast

C:\PROGRA~2\MICROS~1.0\VC\bin\amd64\cl.exe /nologo /TP -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -DSTDC_CONSTANT_MACROS -DSTDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib\CodeGen -IC:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen -Iinclude -IC:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\include /DWIN32 /D_WINDOWS /Zc:inline /Zc:strictStrings /Oi /Zc:rvalueCast /W4 -wd4141 -wd4146 -wd4180 -wd4244 -wd4258 -wd4267 -wd4291 -wd4345 -wd4351 -wd4355 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4800 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4324 -w14062 -we4238 /MD /O2 /Ob2 -UNDEBUG /EHs-c- /GR- /showIncludes /Folib\CodeGen\CMakeFiles\LLVMCodeGen.dir\MacroFusion.cpp.obj /Fdlib\CodeGen\CMakeFiles\LLVMCodeGen.dir\ /FS -c C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(139): error C2668: 'llvm::make_unique': ambiguous call to overloaded function
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\include\llvm/ADT/STLExtras.h(939): note: could be 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> llvm::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)'

with
[
    _Ty=`anonymous-namespace'::MacroFusion
]

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\memory(1628): note: or 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> std::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)' [found using argument-dependent lookup]

with
[
    _Ty=`anonymous-namespace'::MacroFusion
]

C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(139): note: while trying to match the argument list '(llvm::ShouldSchedulePredTy, bool)'
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(146): error C2668: 'llvm::make_unique': ambiguous call to overloaded function
C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\include\llvm/ADT/STLExtras.h(939): note: could be 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> llvm::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)'

with
[
    _Ty=`anonymous-namespace'::MacroFusion
]

C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\INCLUDE\memory(1628): note: or 'std::unique_ptr<anonymous-namespace'::MacroFusion,std::default_delete<_Ty>> std::make_unique<anonymous-namespace'::MacroFusion,llvm::ShouldSchedulePredTy&,bool>(llvm::ShouldSchedulePredTy &,bool &&)' [found using argument-dependent lookup]

with
[
    _Ty=`anonymous-namespace'::MacroFusion
]

C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MacroFusion.cpp(146): note: while trying to match the argument list '(llvm::ShouldSchedulePredTy, bool)'

regards

fhahn added a comment.Mon, Jun 19, 4:37 AM

I've reverted the change for now, I'll how to figure out why the call to make_unique is ambiguous with the Microsoft compiler.

http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141215/249348.html
It seems you need to use llvm::make_unique explicitly, as MSVC tries to use the std variants.