This is an archive of the discontinued LLVM Phabricator instance.

Paths

- llvm/trunk/include/llvm/CodeGen/Passes.h
- llvm/trunk/include/llvm/CodeGen/InitializePasses.h
- llvm/trunk/include/llvm/Target/TargetInstrInfo.h
- llvm/trunk/lib/CodeGen/CMakeLists.txt
- llvm/trunk/lib/CodeGen/CodeGen.cpp
- llvm/trunk/lib/CodeGen/MachineOutliner.cpp
- llvm/trunk/lib/CodeGen/TargetPassConfig.cpp
- llvm/trunk/lib/Target/X86/X86InstrInfo.h
- llvm/trunk/lib/Target/X86/X86InstrInfo.cpp
- llvm/trunk/test/CodeGen/X86/machine-outliner-debuginfo.ll
- llvm/trunk/test/CodeGen/X86/machine-outliner.ll

Differential D26872

Outliner: Add MIR-level outlining pass
Closed, Public

Authored by paquette on Nov 18 2016, 3:01 PM.


Details

Reviewers

qcolombet
MatzeB
mkuper
craig.topper

Commits

rGd36410945fc6: Add MIR-level outlining pass
rL296418: Add MIR-level outlining pass

Summary

This is an updated patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This would be especially useful to people working in low-memory environments, where sacrificing performance for space is acceptable.
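As a rough illustration of the idea (not the algorithm used by this patch, which builds a suffix tree over integer-mapped instructions), the sketch below treats a basic block as a vector of integers, one per instruction, brute-forces the repeated sequence with the largest saving, and replaces every non-overlapping occurrence with a single call marker. The cost model and the names `Repeat`, `bestRepeat`, `outline` and `CallId` are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A repeated sequence Seq[Start, Start + Len) occurring Count times without
// overlapping, and the (toy) number of instructions saved by outlining it.
struct Repeat {
  std::size_t Start = 0, Len = 0, Count = 0;
  long Saving = 0;
};

// Count non-overlapping occurrences of Seq[Start, Start + Len), left to right.
static std::size_t countOccurrences(const std::vector<unsigned> &Seq,
                                    std::size_t Start, std::size_t Len) {
  std::size_t Count = 0;
  for (std::size_t I = 0; I + Len <= Seq.size();) {
    if (std::equal(Seq.begin() + Start, Seq.begin() + Start + Len, Seq.begin() + I)) {
      ++Count;
      I += Len; // Skip past this occurrence so occurrences never overlap.
    } else {
      ++I;
    }
  }
  return Count;
}

// Brute-force search under a toy cost model (assumption): each occurrence of
// Len instructions becomes 1 call, and the outlined body (Len instructions
// plus 1 return) is emitted once.
static Repeat bestRepeat(const std::vector<unsigned> &Seq) {
  Repeat Best;
  for (std::size_t Len = 2; Len <= Seq.size() / 2; ++Len)
    for (std::size_t Start = 0; Start + Len <= Seq.size(); ++Start) {
      std::size_t Count = countOccurrences(Seq, Start, Len);
      if (Count < 2)
        continue;
      long Saving = static_cast<long>(Count * (Len - 1)) - static_cast<long>(Len + 1);
      if (Saving > Best.Saving) {
        Best.Start = Start;
        Best.Len = Len;
        Best.Count = Count;
        Best.Saving = Saving;
      }
    }
  return Best;
}

// Replace each non-overlapping occurrence of R's sequence with CallId.
static std::vector<unsigned> outline(const std::vector<unsigned> &Seq,
                                     const Repeat &R, unsigned CallId) {
  assert(R.Start + R.Len <= Seq.size() && "repeat must lie inside Seq");
  std::vector<unsigned> Pattern(Seq.begin() + R.Start, Seq.begin() + R.Start + R.Len);
  std::vector<unsigned> Out;
  for (std::size_t I = 0; I < Seq.size();) {
    if (!Pattern.empty() && I + Pattern.size() <= Seq.size() &&
        std::equal(Pattern.begin(), Pattern.end(), Seq.begin() + I)) {
      Out.push_back(CallId);
      I += Pattern.size();
    } else {
      Out.push_back(Seq[I++]);
    }
  }
  return Out;
}
```

On `{1, 2, 3, 9, 1, 2, 3, 8, 1, 2, 3}` the sequence `1 2 3` wins and the block shrinks to `{CallId, 9, CallId, 8, CallId}`. A real outliner also has to check that each occurrence is legal to outline, which this sketch ignores.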

This would add an interprocedural outliner that runs directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

I would love for people to test it out and tell me about how well it works for them, and maybe even play around with the provided target hooks. Tell me what you think!

Diff Detail

Repository: rL LLVM

Event Timeline


Thanks for the feedback on the previous patches! I've updated the outliner significantly from the previous version. The main changes in this version:

No suffix tree in ADT: The suffix tree is now a part of MachineOutliner.h

No TerminatedString: Everything is done just using std::vectors now, with a couple of helper methods.

Slightly different outlining technique: Candidates are now sorted in descending rather than ascending order, so we no longer have to keep track of offsets.

Better documentation, comments, etc: I fixed up the Doxygen stuff, went through, and wrote detailed explanations of each function. I also added some FIXMEs and TODOs to help guide further development on the outliner.

Tell me what you think!
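Why descending order removes the need for offsets can be seen with plain vectors: when each replacement shrinks the sequence, rewriting the right-most candidate first never moves the start index of a candidate further to the left. A minimal sketch, assuming non-overlapping candidates given as hypothetical `(StartIdx, Len)` pairs into a single vector and a made-up call marker `CallId`:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Candidate {
  std::size_t StartIdx; // Index of the first instruction of the occurrence.
  std::size_t Len;      // Number of instructions in the occurrence.
};

// Replace each candidate range with a single CallId. Candidates must not
// overlap. Processing them from right to left means shrinking one range never
// shifts the StartIdx of a candidate that has not been processed yet.
void replaceCandidates(std::vector<unsigned> &Seq, std::vector<Candidate> Cands,
                       unsigned CallId) {
  std::sort(Cands.begin(), Cands.end(),
            [](const Candidate &A, const Candidate &B) { return A.StartIdx > B.StartIdx; });
  for (const Candidate &C : Cands) {
    assert(C.Len > 0 && C.StartIdx + C.Len <= Seq.size());
    auto First = Seq.begin() + C.StartIdx;
    *First = CallId;                     // Reuse the first slot for the call.
    Seq.erase(First + 1, First + C.Len); // Drop the rest of the occurrence.
  }
}
```

Processing the same candidates in ascending order would instead require shifting every later `StartIdx` by `Len - 1` after each replacement.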

It might help to mark all comments that were addressed so far as "Done" in Phabricator.

I realized I missed a bunch of stuff on MachineOutliner.cpp/MachineOutliner.h, so I went ahead and picked through the code and fixed a good chunk of it.

Highlights:

No more MachineOutliner.h!
Less gross X86 TargetInstrInfo!
Range-based for loops!
Proper pass initialization!

I'd also like to ask about the best way to handle custom epilogue/prologue insertion. I left in the three separate virtual functions in TargetInstrInfo.h. Does anyone think it'd be better to just smack them all together into one function which handles function call insertion? I feel like that'd be the best way to do it, but I left it as is for now.

Thanks!
Jessica

It would be awesome to have an option that serializes the Collection for use by external tools like Souper:
`std::vector<std::vector<unsigned> *> Collection;`

Perhaps replace SuffixTree with SuffixContainer<SuffixTree> and SuffixContainer<SuffixArray>? Suffix arrays are more performant and are trivial to copy. Suffix trees are nice to walk.
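To make the suffix-array alternative concrete: sorting all suffixes and taking the longest common prefix (LCP) of neighbouring suffixes in sorted order finds the longest sequence that occurs at least twice (occurrences may overlap). The sketch below is an illustration only; it builds the array with a naive comparison sort (practical implementations use much faster construction algorithms), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Naive suffix array: start indices of all suffixes of S, sorted lexicographically.
std::vector<std::size_t> buildSuffixArray(const std::vector<unsigned> &S) {
  std::vector<std::size_t> SA(S.size());
  std::iota(SA.begin(), SA.end(), std::size_t{0});
  std::sort(SA.begin(), SA.end(), [&S](std::size_t A, std::size_t B) {
    return std::lexicographical_compare(S.begin() + A, S.end(), S.begin() + B, S.end());
  });
  return SA;
}

// Length of the common prefix of the suffixes starting at A and B.
std::size_t commonPrefix(const std::vector<unsigned> &S, std::size_t A, std::size_t B) {
  assert(A < S.size() && B < S.size());
  std::size_t L = 0;
  while (A + L < S.size() && B + L < S.size() && S[A + L] == S[B + L])
    ++L;
  return L;
}

// Longest repeated (possibly overlapping) substring as {start, length}.
// Only neighbours in sorted order need to be compared.
std::pair<std::size_t, std::size_t> longestRepeat(const std::vector<unsigned> &S) {
  std::vector<std::size_t> SA = buildSuffixArray(S);
  std::pair<std::size_t, std::size_t> Best{0, 0};
  for (std::size_t I = 1; I < SA.size(); ++I) {
    std::size_t L = commonPrefix(S, SA[I - 1], SA[I]);
    if (L > Best.second)
      Best = {SA[I], L};
  }
  return Best;
}
```

Because the array is just a `std::vector<std::size_t>`, it is indeed cheap to copy; the trade-off is that walking it for all repeats needs the LCP values rather than an explicit tree.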

Also, think about Vitányi's approximation of Kolmogorov complexity using gzip, https://arxiv.org/pdf/cs/0111054.pdf

Dirty hack, but you could use the nesting of instructions that gzip comes up with.

Hi Jessica,

this is great progress from the last patch! I think we need one more roundtrip before we can commit the code and keep improving it inside the repository (as this is an independent pass it shouldn't affect the rest of the compiler).

For the whole MachineOutliner.cpp file: Internal types and functions should be marked as such. In this case it's probably best to open an anonymous namespace (except for the externally visible functions createOutlinerPass() and initializeMachineOutlinerPass()).

include/llvm/CodeGen/Passes.h
401 ↗	(On Diff #80470)	Indent.
include/llvm/Target/TargetInstrInfo.h
1517–1563 ↗	(On Diff #80512)	You should move this block a bit upwards (We usually try to keep private fields at the beginning or end of a class, this would move them in the middle).
1559 ↗	(On Diff #80512)	As this is a codegen API passing in a MachineFunction& parameter is more natural. (Implementations can always use MF.getFunction() to get back to the llvm::Function)
lib/CodeGen/MachineOutliner.cpp
1 ↗	(On Diff #80512)	MachineOutliner.cpp
108 ↗	(On Diff #80512)	`std::pair<String *, size_t>`?
178–186 ↗	(On Diff #80512)	Looks like this could be a constructor.
179 ↗	(On Diff #80512)	This looks like it could just be a constructor on the SuffixTreeNode struct.
189 ↗	(On Diff #80512)	Should this be called `deleteSuffixTreeSubTree` or similar as it deletes more than 1 node?
205–206 ↗	(On Diff #80512)	this has no effect
391 ↗	(On Diff #80512)	can be `size_t size() const`
397 ↗	(On Diff #80512)	`\param`
416–417 ↗	(On Diff #80512)	No need to cast `EndIdx`.
468 ↗	(On Diff #80512)	use a reference instead of a pointer?
476 ↗	(On Diff #80512)	Could do an early exit: if (MaxHeight == 0) return nullptr;
551 ↗	(On Diff #80512)	the braces around the cases are unnecessary
582 ↗	(On Diff #80512)	This can use an early exit so you do not need to indent everything inside the if.
595 ↗	(On Diff #80512)	This seems to be an impossible case as you already tested for `N != nullptr`.
617–621 ↗	(On Diff #80512)	This could be for (SuffixTreeNode *T = N->Link; T && T != Root; T = T->Link) T->Valid = false; (It is usually better to go with a for loop instead of `while() { ... increment; }` on principle, because you do not have to remember to duplicate the increment code when you want to use `continue` somewhere inside the loop)
630–631 ↗	(On Diff #80512)	Maybe do not initialize these to give a hint to the reader that they will be set by `findString()` anyway (If findString() doesn't always set them, then you should make it).
634 ↗	(On Diff #80512)	Could be an early exit.
647 ↗	(On Diff #80512)	Use `const StringCollection &Strings`
666 ↗	(On Diff #80512)	The deleteSuffixTreeNode() implementation already protects against nullptrs.
713 ↗	(On Diff #80512)	the llvm:: prefix should not be necessary (same for OccBB)
741 ↗	(On Diff #80512)	You probably only need to put INITIALIZE_PASS and createOutlinerPass() in the llvm namespace.
772 ↗	(On Diff #80512)	Maybe use `"MIR Function Outlining"` to be in sync with INITIALIZE_PASS. Most llvm pass 'names' read more like short descriptions.
799–801 ↗	(On Diff #80512)	Could use references instead of pointers.
843–845 ↗	(On Diff #80512)	This makes no sense.
853 ↗	(On Diff #80512)	Maybe use `"machine-outliner"` to be in sync with DEBUG_TYPE.
868 ↗	(On Diff #80512)	Maybe add `assert(CurrLegalInstrMapping < CurrIllegalInstrMapping && "Overflow");` The same at the place where you increment CurrLegalInstrMapping.
875–877 ↗	(On Diff #80512)	The `find(), if (I != end()) ... else insert()` sequence walks over the data structure in the find() and again in the insert() step. Instead you can simply always `insert()` and check the return value to see whether an element was actually inserted or an existing one reused: auto I = map.insert(make_pair(MI, CurrLegalInstrMapping)); // Newly inserted? if (I.second) CurrLegalInstrMapping++; unsigned MINumber = I.first->second;
895 ↗	(On Diff #80512)	Use a reference.
990 ↗	(On Diff #80512)	I don't think the value names need to be kept around.
1019 ↗	(On Diff #80512)	Can be `begin()` instead of `instr_begin()`.
1024 ↗	(On Diff #80512)	Should probably document this with something like `// The cloned memory operands reference the old function. Drop them.`
1025 ↗	(On Diff #80512)	can be `MBB->begin()`
1047–1050 ↗	(On Diff #80512)	Could use a range based for: for (OutlinedFunction &OF : FunctionList) OF.MF = createOutlinedFunction(M, OF);
1108–1112 ↗	(On Diff #80512)	Constructing names should not be necessary here. You should be able to add a GlobalValue MachineOperand instead of a Symbol one when you construct the call instruction.
1147–1149 ↗	(On Diff #80512)	You can move those out of the loop.
1158 ↗	(On Diff #80512)	There seems to be an unnecessary duplication of the string.
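The single-`insert()` idiom suggested above can be sketched with the standard library (written with `std::tie` for readability). In this hypothetical `InstrNumbering` class, `std::unordered_map` with string keys stands in for a `DenseMap` keyed on `MachineInstr`; legal instructions count up from 0 and share a number when equal, while every illegal instruction gets a fresh number counting down. Starting the illegal range at `~0U - 2` is an assumption that mirrors `DenseMap`'s reserved empty (`~0U`) and tombstone (`~0U - 1`) keys for `unsigned`, and the overflow check throws here where LLVM code would call `report_fatal_error()`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <tuple>
#include <unordered_map>

// Hypothetical numbering helper (not the patch's Mapper class).
class InstrNumbering {
  std::unordered_map<std::string, unsigned> Map;
  unsigned NextLegal = 0;         // Next number handed to a new legal instruction.
  unsigned NextIllegal = ~0U - 2; // Next number handed to an illegal instruction.

  // Conservatively fail once the two ranges meet.
  void checkOverflow() const {
    if (NextLegal >= NextIllegal)
      throw std::overflow_error("Instruction mapping overflow!");
  }

public:
  unsigned mapLegal(const std::string &Instr) {
    std::unordered_map<std::string, unsigned>::iterator It;
    bool WasInserted;
    // One lookup: insert() either adds the pair or returns the existing entry.
    std::tie(It, WasInserted) = Map.insert({Instr, NextLegal});
    if (WasInserted) {
      ++NextLegal;
      checkOverflow();
    }
    return It->second;
  }

  unsigned mapIllegal() {
    unsigned N = NextIllegal--;
    checkOverflow();
    return N;
  }
};
```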

Some stylistic comments.

Also, the amount of testing seems small compared to the amount of code being added. Can you beef it up? Especially testing for the handling of "unsafe" cases would be good.

lib/CodeGen/MachineOutliner.cpp
16 ↗	(On Diff #80512)	A link to your devmtg talk might be good (together with some explanation about how it relates to what is implemented here).
54 ↗	(On Diff #80512)	This seems to have some nontrivial invariants, so you may want something a bit stronger than a typedef. Maybe a lightweight struct around this would be better? You already have a couple of helpers that seem like they could quite naturally be methods of such a struct. I guess the question is: does it ever make sense for a random piece of code that is using this typedef to actually operate on it using the std::vector API? Or would such code inherently risk violating some sort of invariant? If the latter, a lightweight struct encapsulating the underlying vector is likely to be a good choice.
60 ↗	(On Diff #80512)	This doesn't really make much sense to me. Can you explain this data structure a bit better?
73 ↗	(On Diff #80512)	This could use a better name.
124 ↗	(On Diff #80512)	Is there a good online resource you can link to about Ukkonen's algorithm? If this implementation is patterned off of a particular resource that might be good to link to if possible.
126 ↗	(On Diff #80512)	Small coding standard nit: use static/anonymous namespace on all the helper stuff here and elsewhere: http://llvm.org/docs/CodingStandards.html#anonymous-namespaces
141 ↗	(On Diff #80512)	I assume "start" was meant here. But "star index" (as in Kleene) sounds vaguely plausible in the context of a string algorithm so this typo is worth fixing.
152 ↗	(On Diff #80512)	Do you really need a std::map? http://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc Also, if the number of outgoing edges is small, a small-type container is probably better here.
157 ↗	(On Diff #80512)	As per the comment, maybe call this "is in tree" or "already pruned" or something like that?
163 ↗	(On Diff #80512)	Can you be a bit more specific about where this shortcut link comes in in Ukkonen's algorithm?
172 ↗	(On Diff #80512)	Both `Start` and `End` here are described as "index" but one is a pointer. That's worth explaining in the comment. Also, SuffixIndex is also an "index" and gets an "Index" suffix to its variable name. Maybe these should be `StartIndex` and `EndIndex`? Also, other places in this patch use `Idx` for variable names to mean "index". Maybe these should be `StartIdx` etc.?
306 ↗	(On Diff #80512)	Move the comment inside the braces so that this can use the more common `} else {` style, here and elsewhere.
337 ↗	(On Diff #80512)	There's quite a few naked new/delete in this code. Can you encapsulate the ownership better? If not, can you centralize the new/delete in helpers and add some comments about lifetime/ownership?
770 ↗	(On Diff #80512)	Can you hold this by value?
1175 ↗	(On Diff #80512)	Can you improve the ownership here? E.g. use a std::unique_ptr to manage the lifetime?
lib/CodeGen/TargetPassConfig.cpp
95 ↗	(On Diff #80512)	What is the official name? "machine outliner" or "MIR outliner". Please be consistent here and elsewhere.
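A standard-library sketch of the ownership suggestion: one object owns every node through `std::unique_ptr`, and nodes refer to each other (child edges, suffix links) only through non-owning raw pointers, so no `delete` is written anywhere. `Node` and `NodeOwner` are hypothetical names; a `BumpPtrAllocator`, which a later revision in this review uses, gives the same single-owner lifetime with arena allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

struct Node {
  std::map<unsigned, Node *> Children; // Non-owning edges, keyed by first symbol.
  Node *Link = nullptr;                // Non-owning suffix link.
  std::size_t StartIdx = 0, EndIdx = 0;
};

// Hypothetical owner: every node lives exactly as long as the owner.
class NodeOwner {
  std::vector<std::unique_ptr<Node>> Nodes;

public:
  Node *create(std::size_t StartIdx, std::size_t EndIdx) {
    assert(StartIdx <= EndIdx && "malformed edge label");
    Nodes.push_back(std::make_unique<Node>());
    Node *N = Nodes.back().get();
    N->StartIdx = StartIdx;
    N->EndIdx = EndIdx;
    return N;
  }
  std::size_t size() const { return Nodes.size(); }
  // No destructor needed: the unique_ptrs free every node with the owner.
};
```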

Hi everyone,

I think I addressed most if not all of the comments from the previous patch; thanks for all of the help with reviewing this! I'm quite happy with how this has been coming along in progressing from intern project to real code.

Major changes:

No use of raw new/delete in the outliner
Tried to decouple the "program strings" from normal strings-- that seemed to be somewhat confusing
New test which also checks if the outliner obeys basic block boundaries
Lots of type changes; lots of things that didn't have to be pointers are no longer pointers
More use of appropriate LLVM data structures for memory management, maps, etc.

Tell me what you think!

Great work!

This round of review feedback is mostly about comment accuracy and some implementation improvements like centralizing/encapsulating the numbering state machine, handling the EndIdx for leaf nodes in a simpler way, and avoiding copying too many vectors around. Also various style/readability nits.

The testing still seems light considering how much functionality this adds. Can we use MIR to test this? Ideally we'd have pretty good coverage of X86InstrInfo::isLegalToOutline
One way to think about this is: suppose we run this on a bunch of programs in the wild and find some bugs in the suffix tree construction algorithm or isLegalToOutline. How are we going to write tests for those fixes?
Since you've done such a nice job keeping SuffixTree separate (with the simple interface based on unsigned) it might even be possible to write a C++ unittest for it. That requires pulling it up into a header though and other headaches/boilerplate that might not be worth it right now.

lib/CodeGen/MachineOutliner.cpp
70 ↗	(On Diff #83675)	My reading of this comment interprets this as saying that this class e.g. holds a DenseMap of MI's to integers itself, but that is not the case. Can you make this more precise? The term "mapping" is somewhat confusing since in common usage a "mapping" usually denotes a container. But throughout this patch the term "mapping" is used to refer to a "string". I think I get what sense it is meant in (e.g. "this is the mapping of this MBB through our value numbering map"). I can't think of any really good names except for something horribly verbose like "StringOfInstructionNumbers" or something like that. Anyway, you probably want to beef up this comment to describe the sense in which "mapping" is used here and throughout this patch. After staring at the patch for a while the term "mapping" has grown on me, so it's not a big deal.
72 ↗	(On Diff #83675)	Rather than a generic "this is used for compatibility with the suffix tree", can you be more precise. Something like "Our suffix tree implementation operates on this class" or something? After changing the constructor argument of SuffixTree to `const std::vector<std::vector<unsigned>> &` the only use of this class is as an internal data structure of the SuffixTree, which is an even more precise statement.
76 ↗	(On Diff #83675)	This description in terms of "hash" vs "unique" doesn't seem accurate. Once you pull out a class to encapsulate the numbering you can just mention that here. As far as users of this class are concerned the numbers are just unique symbols; I don't think you need to go into too much detail besides linking to the place where we assign the numbering.
99 ↗	(On Diff #83675)	Generally the term "program" is reserved for talking about the final linked executable, but this code (except during LTO) generally operates on a single Module/TU. Can you give this a better name? Maybe just `BlockMappings` or something like that.
143 ↗	(On Diff #83675)	Why is this comment talking about "2D mapping"? There is just a single index.
202 ↗	(On Diff #83675)	This comment is pure gold. Nice.
241 ↗	(On Diff #83675)	Comment on the `+ 1`.
264 ↗	(On Diff #83675)	Isn't it more that the leaves "represent" suffixes rather than "contain" suffixes?
266 ↗	(On Diff #83675)	I don't think this is correct. It isn't the leaves representing suffixes per se that facilitates finding repeated substrings, but rather the fact that the internal nodes represent repeated substrings (shared prefixes of the suffixes). You may want to beef up this paragraph on suffix trees a bit to describe the basic invariants a bit more (leaves represent suffixes, internal nodes represent shared prefixes of the suffixes).
269 ↗	(On Diff #83675)	Same comment as on ProgramMapping. The integers themselves aren't hashes (otherwise this code would have to do something special for collisions).
279 ↗	(On Diff #83675)	I'm not sure what you are trying to convey with this paragraph. Maybe you can just mention that the implementation maintains parent links (maybe also describe what they are used for). I don't see the big picture for talking about cycles or explicit digraphs here.
288 ↗	(On Diff #83675)	The RAII behavior of BumpPtrAllocator is pretty well-known, so you don't need to explicitly mention it (that's the beauty of RAII; it just cleans up for you!). This comment can probably be reduced to "Allocator that owns all nodes in the tree".
292 ↗	(On Diff #83675)	This arrangement of adding an extra layer of indirection for the special "EndIdx" handling for leaf nodes is interesting but after staring at the code for a while it seems like it obscures things (or maybe I'm missing something). It seems that the only thing that needs this extra layer of indirection is during construction in the test `if (StrIdx > *(CurrSuffixTreeNode->EndIdx)) {` which could be replaced by something like `if (StrIdx > (CurrSuffixTreeNode->EndIdx == -1 ? LeafEndIdx : CurrSuffixTreeNode->EndIdx)) {` or similar. To do that, SuffixTreeNode::EndIdx would just be an integer held by value, with -1 being a sentinel indicating that this is a leaf that needs the special handling in that one test.
294 ↗	(On Diff #83675)	what does the EndIdx even mean for an internal node, since they can be shared? Maybe explain that in the comment of the EndIdx member of SuffixTreeNode?
302 ↗	(On Diff #83675)	You can use the same BumpPtrAllocator for both of these if you want. It's kind of nice to have a place for these comments though so no biggie.
347 ↗	(On Diff #83675)	Can `Parent` just be a constructor argument?
366 ↗	(On Diff #83675)	This assert is inside an `if(ChildPair.second != nullptr)` so probably doesn't buy you much.
547 ↗	(On Diff #83675)	Nit: move the comment inside so you can use the coding-standard compliant `} else if (...) {`
595 ↗	(On Diff #83675)	Nit: Putting this `CurrSuffixTreeNode->Children[QueryString[CurrIdx]]` expression (used 3 times) in a variable will make things a bit shorter and also give you an opportunity to give that value a name. E.g. maybe `if (Child && Child->IsInTree) {`?
604 ↗	(On Diff #83675)	Nit: move this comment inside the `else` so that you have `} else {`.
613 ↗	(On Diff #83675)	Comment on the `- 1` part of this. It's setting off my off-by-one error spidey sense.
673 ↗	(On Diff #83675)	You're copying quite a few std::vector's here. Can they be ArrayRef's?
725 ↗	(On Diff #83675)	This iterates over Strings.MBBMappings. In what sense does it treat Strings as "flat" if it looks at individual substrings? Also, I find it a bit weird that we take a ProgramMapping as input to this constructor, but then all these `append` calls seem to build up a different ProgramMapping. Do the two ProgramMapping's end up being equivalent? It would be nice to see a bit more explanation about this. At least for the purposes of this constructor, maybe a `const std::vector<std::vector<unsigned>> &` is the natural interface because it doesn't use any of the fancy methods on ProgramMapping.
751 ↗	(On Diff #83675)	It seems a bit weird to me that this class is caring explicitly about the distinction between the "flat" and non-"flat" senses of ProgramMapping. I thought that ProgramMapping was just supposed to encapsulate a 2D ragged array and make it look flat, but here it seems that the external code still cares about the distinction between 2D-ness and flat-ness. Can you make it a bit clearer in the code and comments what ProgramMapping is supposed to represent and what its interactions with the rest of the code are?
797 ↗	(On Diff #83675)	I don't see much mention of the function Id's in `ProgramMapping`. Looking at the code, it seems like it should be `unsigned` and roughly represents the call instruction that jumps to the outlined function.
826 ↗	(On Diff #83675)	There are a couple members here related to the legal-instruction/illegal-instruction/function numbering that could stand to be pulled out into an isolated class (which can then be held by value in the pass) separate from the pass boilerplate. Such a class will also be a good place to authoritatively document the numbering scheme and encapsulate it.
943 ↗	(On Diff #83675)	Small readability nit: use `std::tie(I, WasInserted) = ...` or something like that to make this a bit clearer (this map interface returning the pair is always confusing without that).
1154 ↗	(On Diff #83675)	It isn't really the "function's id" but rather the id for a call instruction that jumps to it, right?
1155 ↗	(On Diff #83675)	Nit: remove commented out code.
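The invariants asked for above (every suffix ends at a leaf, and every internal node spells a prefix shared by at least two suffixes, i.e. a repeated substring) are easiest to see in an uncompressed suffix trie. The sketch below builds one naively in quadratic time, which is deliberately not Ukkonen's algorithm, appends a terminator so that every suffix ends at its own leaf, and reads off the deepest node that at least two suffixes pass through. It is illustrative only, and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Uncompressed suffix trie over unsigned symbols.
struct TrieNode {
  std::map<unsigned, std::unique_ptr<TrieNode>> Children;
  // Number of suffixes whose path goes through this node; each one ends at a
  // distinct leaf below it.
  std::size_t SuffixCount = 0;
};

struct RepeatInfo {
  std::size_t Depth = 0;       // Length of the repeated substring.
  std::size_t Occurrences = 0; // Number of (possibly overlapping) occurrences.
};

// The deepest node shared by at least two suffixes spells the longest repeat.
static void findDeepest(const TrieNode &N, std::size_t Depth, RepeatInfo &Best) {
  if (N.SuffixCount >= 2 && Depth > Best.Depth) {
    Best.Depth = Depth;
    Best.Occurrences = N.SuffixCount;
  }
  for (const auto &KV : N.Children)
    findDeepest(*KV.second, Depth + 1, Best);
}

RepeatInfo longestRepeatedSubstring(std::vector<unsigned> S, unsigned Terminator) {
  assert(std::find(S.begin(), S.end(), Terminator) == S.end() &&
         "terminator must not occur in S");
  S.push_back(Terminator);
  TrieNode Root;
  for (std::size_t Start = 0; Start < S.size(); ++Start) {
    TrieNode *Cur = &Root;
    ++Cur->SuffixCount;
    for (std::size_t I = Start; I < S.size(); ++I) {
      std::unique_ptr<TrieNode> &Child = Cur->Children[S[I]];
      if (!Child)
        Child = std::make_unique<TrieNode>();
      Cur = Child.get();
      ++Cur->SuffixCount;
    }
  }
  RepeatInfo Best;
  findDeepest(Root, 0, Best);
  return Best;
}
```

On `{5, 5, 5}` the answer is length 2 with two occurrences, because suffixes may overlap; deciding which occurrences can actually be outlined together is a separate pruning step. Small, hand-checkable inputs like these are also the kind of cases a SuffixTree unittest could use.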

silvas added inline comments. Jan 10 2017, 4:34 AM

lib/CodeGen/MachineOutliner.cpp
95 ↗	(On Diff #83675)	Can you add a high-level explanation of why we have a 2D vector in the first place. I.e. why do we need to "pretend" instead of just materializing the flattened vector? One thing that may help make this clearer is encapsulating the mutation of `MBBMappings` a bit better. Right now there seem to be some un-encapsulated mutations, so just looking at the class it's not clear what the elementary operations on it are.
122 ↗	(On Diff #83675)	It shouldn't be too hard to bring this down to log(N) if necessary, but it surprises me that O(N) is fine when N = number of MBB's in the module (for reference, a typical FullLTO for a codebase ~ the size of clang has 10's of thousands of functions, and probably an order of magnitude more MBB's). Can you add a comment explaining that it takes linear time and why that's fine (or not fine, but fine for now)?
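The 2D-versus-flat question and the O(N) lookup can both be addressed with a small wrapper that stores per-block start offsets: a flat index is mapped back to (block, position) with a binary search over those prefix sums, which is O(log N) in the number of blocks and never materializes the flattened vector. A minimal sketch with hypothetical names, assuming the blocks are not modified after the view is built:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Read-only flat view over a ragged 2D array (e.g. one vector of instruction
// numbers per basic block). The referenced blocks must outlive the view.
class FlatView {
  const std::vector<std::vector<unsigned>> &Blocks;
  std::vector<std::size_t> Starts; // Starts[B] = flat index of Blocks[B][0].
  std::size_t Total = 0;

public:
  explicit FlatView(const std::vector<std::vector<unsigned>> &Blocks)
      : Blocks(Blocks) {
    for (const auto &B : Blocks) {
      Starts.push_back(Total);
      Total += B.size();
    }
  }

  std::size_t size() const { return Total; }

  // Map a flat index to {block, index within block} in O(log #blocks).
  std::pair<std::size_t, std::size_t> locate(std::size_t FlatIdx) const {
    assert(FlatIdx < Total && "index out of range");
    // Find the first block starting after FlatIdx and step back one. Empty
    // blocks share a start offset with the next block, so they are skipped.
    auto It = std::upper_bound(Starts.begin(), Starts.end(), FlatIdx);
    std::size_t B = static_cast<std::size_t>(It - Starts.begin()) - 1;
    return {B, FlatIdx - Starts[B]};
  }

  unsigned operator[](std::size_t FlatIdx) const {
    std::pair<std::size_t, std::size_t> Loc = locate(FlatIdx);
    return Blocks[Loc.first][Loc.second];
  }
};
```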

aprantl added inline comments. Jan 12 2017, 8:53 AM

lib/CodeGen/MachineOutliner.cpp
199 ↗	(On Diff #78585)	Did you get a chance to look into this?

I did a quick run on SPEC CPU 2006 with FullLTO and ran into 3 different assertion failures across various programs: https://reviews.llvm.org/P7954

Here are some basic bugpoint-reduced test cases. Repro with llc -enable-machine-outliner -O3 foo.ll (though I expect these will be sensitive to minor codegen changes; I assume you have access to SPEC so you can reduce them again if needed)

https://reviews.llvm.org/P7957 :

Assertion `!NodePtr->isKnownSentinel()' failed.

https://reviews.llvm.org/P7958 :

Assertion `(isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"' failed.

https://reviews.llvm.org/P7960 :

Assertion `Occurrences.size() > 0 && "Longest repeated substring has no occurrences."' failed.

Also, here is a case that takes a really long time to compile (it does eventually finish) stuck in MachineOutliner::buildCandidateList :
https://reviews.llvm.org/P7959
Bugpoint found this one from the "Occurrences.size() > 0" assertion failure case.

For now, it's probably best to focus on the code review comments. Once the code is in good shape and committed, these sorts of bugs can be incrementally hammered out (tracked in bugzilla, fixed with clear individual patches (with test cases :) )).

include/llvm/CodeGen/Passes.h
402 ↗	(On Diff #83675)	Nit: comment and function aren't aligned.
include/llvm/Target/TargetInstrInfo.h
1518 ↗	(On Diff #83675)	Nit: inconsistent indent.
lib/CodeGen/MachineOutliner.cpp
48 ↗	(On Diff #83675)	Nit: this won't work on case-sensitive file systems.

In case you're wondering, that didn't take that long to reduce to that point. Just have to learn the tools and workflow. We have amazing tools for doing test case reduction and an easily understandable test-suite build system! (mad props to those who made bugpoint; the folks that made test-suite awesome; and LLD's -save-temps; and a bit of -globalopt and -metarenamer to clean things up)

I hear that you are planning to get rid of the ProgramMapping construct which would be my last real stopper.
There is still a lot of nitpicky stuff around.
This time I checked the suffix tree construction algorithm which looks good.
For future patches: Having a DenseMap<unsigned, SuffixTreeNode *> in every node is potentially wasteful. Intuitively I would expect the majority of nodes to only have a small number of entries, so it may be good to measure and explore alternative representations like a dynamically adapting datastructure (i.e. switching from linear list to map depending on number of children).

include/llvm/CodeGen/Passes.h
401–402 ↗	(On Diff #83675)	This should probably be `createMachineOutlinerPass()` to be consistent.
include/llvm/Target/TargetInstrInfo.h
1518 ↗	(On Diff #83675)	Indentation. You should also consider moving these functions towards the other functions so the "private:" part can stay at the end of the class definition.
1530 ↗	(On Diff #83675)	Indentation
1559 ↗	(On Diff #83675)	As this is a codegen API I would rather pass a `MachineFunction&`
lib/CodeGen/MachineOutliner.cpp
40 ↗	(On Diff #83675)	Move this below the #includes so you do not accidentally affect the headers.
63–64 ↗	(On Diff #83675)	Maybe drop the `Stat` suffix, esp. in a statistic dump that looks superfluous.
172 ↗	(On Diff #83675)	Suggestion (possibly for later patches): As far as I see it a node is either a leaf or an inner node and never changes its nature. You could make this and the constraints on the End and Children members a bit more obvious when representing this in a type hierarchy (and save a bit of memory): struct SuffixTreeNode { bool IsLeaf; ... }; struct SuffixTreeLeafNode : public SuffixTreeNode { size_t EndIdx; size_t SuffixIdx; }; struct SuffixTreeInternalNode : SuffixTreeNode { Map<SuffixTreeNode> Children; };
237 ↗	(On Diff #83675)	can be const.
238–243 ↗	(On Diff #83675)	Maybe handle the special case early: if (StartIdx == EmptyIdx) return 0; return EndIdx - StartIdx + 1; I assume this is not supposed to be called with EndIdx == EmptyIdx? Add an assert()?
248–250 ↗	(On Diff #83675)	Interesting to see this packaged up in its own struct instead of just putting the members directly into the SuffixTree class. But doesn't hurt either I guess.
302 ↗	(On Diff #83675)	It looks like you can use a single allocator for nodes and EndIndexes.
345 ↗	(On Diff #83675)	Maybe add else assert(EndIdx == EmptyIdx); to make sure callers know what they are doing. An alternative would be to provide different functions for inserting leaves or inner nodes.
363 ↗	(On Diff #83675)	You can save an indentation level here with if (ChildPair.second == nullptr) continue;
387 ↗	(On Diff #83675)	As you only have 1 "out" parameter you could simply return the new value instead. At the call site I find `SuffixesToAdd = extend(x, y, SuffixesToAdd);` easier to understand when you do not have to wonder whether a parameter is an "out" parameter. The NeedsLink parameter is nullptr for all callers?
390–391 ↗	(On Diff #83675)	General note on comments: I would expect comments at this indentation level to talk about properties/situations at that level and not just inside the if. This gets clearer if you formulate the comment as a question, or an if-then statement (or move the comment into the if block): // Look at the last character if the current mapping is 0. if (Active.Len == 0) Active.Idx = EndIdx; // Current mapping is 0? if (Active.Len == 0) { // Look at the last added character. Active.Idx = EndIdx; }
397 ↗	(On Diff #83675)	LastChar can be moved further down as it's not used by some paths through the function.
472 ↗	(On Diff #83675)	Add a comment that you check whether Active.Node is the root or alternatively add a `SuffixTreeNode::isRoot()` function.
483–484 ↗	(On Diff #83675)	Indentation
490 ↗	(On Diff #83675)	This should rather be ArrayRef<unsigned>.
512 ↗	(On Diff #83675)	Add `assert(SuffixesToAdd == 0);`?
517 ↗	(On Diff #83675)	Maybe `setSuffixIndices(..., /*LabelHeight=*/0)` instead of the local variable so readers do not keep wondering whether it is an "out" parameter.
687 ↗	(On Diff #83675)	move this into the loop.
803–807 ↗	(On Diff #83675)	C++ does the right thing for `Name(Name)` etc. so you can drop the `_` suffixes from the parameter names. Similar with some other constructors.
809 ↗	(On Diff #83675)	You should extend the anonymous namespace to include the MachineOutliner class. The only things that needs to be visible to the outside are `initializeMachineOutlinerPass()` (=the stuff coming out of INITIALIZE_PASS) and `createOutlinerPass()`.
832 ↗	(On Diff #83675)	Because this relies on implementation details of DenseMapInfo, better play it safe with static_assert so the compilation fails if someone decides to change the values: static_assert(DenseMapInfo<unsigned>::getEmptyKey() == (unsigned)-1, "DenseMapInfo<unsigned> empty key changed"); static_assert(DenseMapInfo<unsigned>::getTombstoneKey() == (unsigned)-2, "DenseMapInfo<unsigned> tombstone key changed"); that way things also explain themselves and you can get away with a shorter comment. The module pass instance can in theory be reused for multiple programs. So the state here needs to be initialized and cleared in `runOnModule()`.
834 ↗	(On Diff #83675)	It's the next number to be assigned, isn't it? Same with CurrIllegalInstrMapping.
921 ↗	(On Diff #83675)	Indentation, MBB can be `const`
952–953 ↗	(On Diff #83675)	The overflow could (in theory) be triggered by a user and not just by compiler bugs. So you could use report_fatal_error() so it stays around in release builds: if (CurrLegalInstrMapping >= CurrIllegalInstrMapping) report_fatal_error("Instruction mapping overflow!");
954 ↗	(On Diff #83675)	Use `DenseMapInfo<unsigned>::get{Empty|Tombstone}Key()` instead of hardcoding the values.
1008 ↗	(On Diff #83675)	Could use `emplace_back(OccBB, StartIdxInBB, ...)`
1017 ↗	(On Diff #83675)	Could use `emplace_back()`.
1050 ↗	(On Diff #83675)	I think getOrInsertFunction() copies the name and does not take ownership of the passed string so this is an unnecessary copy and a memory leak.
1066–1068 ↗	(On Diff #83675)	Use references for variables that cannot be `nullptr`.
1103–1104 ↗	(On Diff #83675)	I think CandidateList and FunctionList can be `const` or better `ArrayRef`.
1213 ↗	(On Diff #83675)	Move to assignment.
1214–1215 ↗	(On Diff #83675)	As you have some state such as CurrentFunctionID, CurrLegalMapping in the class anyway, maybe the two vectors and the Worklist can move there as well so you do not need to pass them around? Just need to `clear()` them at the end of the function then.
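The adaptive-representation idea from the DenseMap-per-node comment above can be sketched as a container that scans a small vector while a node has few children and migrates to a hash map once it passes a threshold. `AdaptiveChildren` and the default threshold of 8 are assumptions; the right cut-over point would have to be measured, as that comment suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical child container for a suffix tree node.
template <typename NodeT, std::size_t Threshold = 8> class AdaptiveChildren {
  std::vector<std::pair<unsigned, NodeT *>> Small; // Used while few children.
  std::unordered_map<unsigned, NodeT *> Large;     // Used after migration.
  bool IsLarge = false;

public:
  // Returns the child for Key, or nullptr if there is none.
  NodeT *lookup(unsigned Key) const {
    if (IsLarge) {
      auto It = Large.find(Key);
      return It == Large.end() ? nullptr : It->second;
    }
    for (const auto &P : Small)
      if (P.first == Key)
        return P.second;
    return nullptr;
  }

  // Inserts or overwrites the child for Key.
  void set(unsigned Key, NodeT *Child) {
    assert(Child != nullptr && "children must be non-null");
    if (IsLarge) {
      Large[Key] = Child;
      return;
    }
    for (auto &P : Small)
      if (P.first == Key) {
        P.second = Child;
        return;
      }
    Small.emplace_back(Key, Child);
    if (Small.size() > Threshold) {
      // Migrate once; linear scans are assumed to stop paying off here.
      Large.insert(Small.begin(), Small.end());
      Small.clear();
      Small.shrink_to_fit();
      IsLarge = true;
    }
  }

  std::size_t size() const { return IsLarge ? Large.size() : Small.size(); }
  bool isLarge() const { return IsLarge; }
};
```

LLVM's own `SmallDenseMap` is another option worth measuring, since it keeps a small inline buffer before spilling to the heap.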

And a few more for the X86 part.

lib/Target/X86/X86InstrInfo.cpp
9764–9766 ↗	(On Diff #83675)	Heh, in theory every single x86 instruction modifies RIP. But I assume we don't model it like that in LLVM. In any case, restricting this to reads(RIP) should be enough.
9776–9781 ↗	(On Diff #83675)	Are those tests necessary given that you already throw out operations with FrameIndex operands?
9783 ↗	(On Diff #83675)	You could use `MachineInstr::isPosition()` instead of checking for `isLabel()` and `isCFIInstruction()`
9786 ↗	(On Diff #83675)	Better use `const MachineOperand &MOP` to avoid some copying.
lib/Target/X86/X86InstrInfo.h
617 ↗	(On Diff #83675)	This linebreak seems unnecessary.

pmatos added a subscriber: pmatos. Jan 30 2017, 4:28 AM

aprantl added inline comments. Feb 21 2017, 3:34 PM

lib/CodeGen/MachineOutliner.cpp
199 ↗	(On Diff #78585)	Ping :-) Note that it is really important to skip over any DBG_VALUE intrinsics while deciding whether to outline a sequence of instructions. Otherwise compiling with -g will produce different code than without, which we generally consider to be a serious bug in the compiler.
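One way to make DBG_VALUEs invisible to the outlining decision is to handle them where instructions are turned into integers: debug instructions simply receive no number, so the integer string (and therefore every repeat found in it) is identical with and without -g. A standalone sketch with a hypothetical `Instr` record standing in for `MachineInstr` (in LLVM the check would be something like `MI.isDebugValue()`); it also records where each numbered instruction sits, so ranges can be mapped back to the block. What to do with DBG_VALUEs that lie inside an outlined range is a separate question this sketch does not answer.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction.
struct Instr {
  std::string Text; // Opcode plus operands, used as the equality key.
  bool IsDebug;     // True for debug-only instructions such as DBG_VALUE.
};

struct MappedBlock {
  std::vector<unsigned> Ids;          // One number per non-debug instruction.
  std::vector<std::size_t> Positions; // Index of that instruction in the block.
};

// Map equal instructions to equal numbers, skipping debug instructions entirely
// so that they cannot influence which sequences look repeated.
MappedBlock mapBlock(const std::vector<Instr> &Block,
                     std::map<std::string, unsigned> &Numbering) {
  MappedBlock Out;
  for (std::size_t I = 0; I < Block.size(); ++I) {
    if (Block[I].IsDebug)
      continue;
    // size() is evaluated before the insert, so new keys get 0, 1, 2, ...
    auto Ins = Numbering.insert({Block[I].Text, static_cast<unsigned>(Numbering.size())});
    Out.Ids.push_back(Ins.first->second);
    Out.Positions.push_back(I);
  }
  assert(Out.Ids.size() == Out.Positions.size());
  return Out;
}
```

A testcase along the lines requested in this review would compile the same input with and without debug info and check that the outlined functions match.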

Alright, it's been a while, but here's the next version of the outlining patch! As always, thanks to everyone taking the time to read through all this code. This version of the outliner is quite different from the previous one, since I've improved on it a lot since the last patch.

Major changes

More LLVM-ey and most comments addressed!

X86 target won't outline debug instructions anymore. There are other things to think about wrt debug info, which I'm currently working on.

More tests: Right now, MIR tests with the outliner make LLC unhappy, so I wrote a couple IR tests which should be easy enough to transition to MIR.

No more ProgramMapping: instead there's another vector which keeps track of the positions of each instruction in the program. It uses iterators because the delete function on MachineBasicBlocks takes a start and end iterator. (If that makes anyone uncomfortable, I can change it to pointers.)

Improved suffix tree pruning: the previous version was too aggressive and threw out too many candidates. The new version uses a vector of leaves.

No more unnecessary functions: the previous version had a FIXME stating that sometimes the outliner could create unnecessary functions when all of the candidates for a function were removed. Now overlap pruning happens directly before outlining, so functions are created as they're needed.

Outlined functions are now linkonce_odr: This allows the linker to dedupe outlined functions without LTO.

Mapper class: This class performs the instruction-integer mappings and is passed around the outliner.

General suffix tree queries: Different targets will have different benefit functions, and even different types of functions to outline. For example, after this, I have a version of the outliner which supports tail-calling outlined functions. In the interest of keeping target-specific stuff out of the SuffixTree, the target now defines a benefit function, which is then maximized by the DFS query for repeated substrings. I think this will allow for more fine-grained outlining on various targets.
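For illustration, a benefit function of the kind described above might look like the following sketch, counted in instructions. The cost model is an assumption made here, not taken from the patch: without outlining, every occurrence keeps its own copy of the sequence; with outlining, one copy plus a return lives in the outlined function and each occurrence becomes a single call. The name `outliningBenefitSketch` is hypothetical; its parameters only mirror the `getOutliningBenefit(SequenceSize, Occurrences)` hook in `TargetInstrInfo`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-target benefit function, measured in instructions.
// Assumed model: NotOutlined = one copy of the sequence per occurrence;
// Outlined = one shared copy + one return + one call per occurrence.
unsigned outliningBenefitSketch(std::size_t SequenceSize,
                                std::size_t Occurrences) {
  std::size_t NotOutlined = SequenceSize * Occurrences;
  std::size_t Outlined = (SequenceSize + 1) + Occurrences;
  // Report 0 instead of a negative benefit; callers only keep positive ones.
  return NotOutlined > Outlined
             ? static_cast<unsigned>(NotOutlined - Outlined)
             : 0u;
}
```

Under this model a two-instruction sequence only pays off from its fourth occurrence (`outliningBenefitSketch(2, 3)` is 0, `outliningBenefitSketch(2, 4)` is 1), and a DFS over the suffix tree would keep whichever repeated substring maximizes this value.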

Tell me what you think!

aprantl added inline comments. Feb 23 2017, 11:13 AM

lib/Target/X86/X86InstrInfo.cpp
10412 ↗	(On Diff #89535)	This is not the right way to do this. We need to skip over DBG_VALUE instructions as if they didn't exist. Otherwise the presence of DBG_VALUEs in the instruction stream will affect the outlining decision, which means that compiling with -g will generate different code than compiling without it. Please also be sure to include a test case that exercises this.

Thanks for pushing this, this is coming along nicely!

At this point I don't see any correctness or compiletime problems (with the comments below addressed).
Let's commit this and keep improving it in-tree. This should also make reviewing easier as we can have smaller follow-up patches.

lib/CodeGen/MachineOutliner.cpp
782–783 ↗	(On Diff #89535)	An instance of this only describes a single outlined function.
800 ↗	(On Diff #89535)	doxygen `///`
844 ↗	(On Diff #89535)	I don't think you need to initialize this, it gets overwritten anyway in the next line.
1225–1241 ↗	(On Diff #89535)	Hmm... This hash doesn't seem collision-free. Two files with the same name (maybe in two different projects that are linked together later) can easily happen. Of course a collision shouldn't hurt as the linker will compare the contents anyway, but why even bother with a hash then? I think the linker will only try to merge functions with the same name, but the function name(-hash) is currently based on the name, not the contents, of the function, so I would expect this to be unhelpful in most cases. Maybe stay with the previous internal linkage and try the LinkOnce tricks in a follow-up commit (where the name is based on the contents).
lib/Target/X86/X86InstrInfo.cpp
10424–10425 ↗	(On Diff #89535)	This is surprising; checking the MCInstrDesc should not be necessary. This is most probably a bug somewhere else in codegen, so there is nothing we can do here. However, it'd be good if you could find the time later to create a reproducer and file a PR about it; reading and writing registers without having operands for them looks like a bug waiting to happen elsewhere.
test/CodeGen/X86/machine-outliner-basic.ll
1 ↗	(On Diff #89535)	Better use `-mtriple=x86_64--` instead of `-march` so we also force the operating system etc.
24 ↗	(On Diff #89535)	You probably want to force this test to use the same outlined function everywhere. FileCheck allows capturing a name into a variable and checking for repeated uses of it:
; CHECK: callq [[OUTLINEFUNC:_OUTLINED_FUNCTION[0-9]+_0]]
...
; CHECK: callq [[OUTLINEFUNC]]
...
; CHECK: [[OUTLINEFUNC]]:
test/CodeGen/X86/machine-outliner-bb-boundaries.ll
1 ↗	(On Diff #89535)	You can probably merge the tests together into a single file as they are all about the same pass and use the same llc flags.
22 ↗	(On Diff #89535)	I'd remove those standard dumping comments. If you actually care about the label give it a real name, if not the comment shouldn't be necessary either.

This revision is now accepted and ready to land. Feb 23 2017, 4:18 PM

silvas added inline comments. Feb 23 2017, 6:45 PM

lib/CodeGen/MachineOutliner.cpp
1225–1241 ↗	(On Diff #89535)	"Of course a collision shouldn't hurt as the linker will compare the contents anyway, but why even bother with a hash then?" No. linkonce_odr requires that if the names match then the contents are interchangeable, since one copy gets selected arbitrarily. So for correctness the hash must be collision-free. (See also the discussion in D29512, which also involves finding a stable "name" for the TU.) Also, I don't see the point of doing this. The linker's content-based deduplication ("ICF") should handle this case without caring about the name. If you want to use the linker's comdat/linkonce (i.e. name-based) deduplication, then you can just use the function's contents as the name (mangling away NUL bytes), or a strong hash (collisions are a correctness problem). Presumably, if users are using this pass, then they care about code size and so they are likely to have ICF enabled already. So I don't see the point of doing this linkage trick.
test/CodeGen/X86/machine-outliner-interprocedural.ll
8 ↗	(On Diff #89535)	The leading underscore here is Darwin-specific. Add an explicit triple to avoid this (otherwise non-Darwin bots will break).
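A minimal sketch of the content-based naming suggested above: the symbol name is derived from a canonical byte serialization of the outlined function's body and hex-encoded. Hex encoding removes NUL bytes and is injective, so two functions can only share a name if their bytes are identical, which is what linkonce_odr requires. The helper name, the `OUTLINED_FUNCTION_` prefix, and the availability of such a serialization at the MIR level are all assumptions; a strong hash of the same bytes would give shorter names but would then rely on collision resistance.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: build a symbol name from a canonical byte
// serialization of an outlined function's body. Each byte becomes exactly
// two hex digits, so equal names imply equal contents and no NUL survives.
std::string contentBasedName(const std::vector<unsigned char> &Bytes) {
  static const char Digits[] = "0123456789abcdef";
  std::string Name = "OUTLINED_FUNCTION_";
  for (unsigned char B : Bytes) {
    Name += Digits[B >> 4];  // high nibble
    Name += Digits[B & 0xF]; // low nibble
  }
  return Name;
}
```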

FYI this doesn't build on Linux with gcc

[94/1782] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o
FAILED: lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o 
/usr/bin/c++   -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/CodeGen -I/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen -Iinclude -I/usr/local/google/home/silvasean/pg/llvm/llvm/include -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -std=c++11 -ffunction-sections -fdata-sections -O2 -g -DNDEBUG    -fno-exceptions -fno-rtti -MD -MT lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o -MF lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o.d -o lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o -c /usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:171:18: error: enclosing class of constexpr non-static member function ‘bool {anonymous}::SuffixTreeNode::isLeaf() const’ is not a literal type
   constexpr bool isLeaf() const { return SuffixIdx != EmptyIdx; }
                  ^
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:91:8: note: ‘{anonymous}::SuffixTreeNode’ is not literal because:
 struct SuffixTreeNode {
        ^
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:91:8: note:   ‘{anonymous}::SuffixTreeNode’ has a non-trivial destructor
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:174:18: error: enclosing class of constexpr non-static member function ‘bool {anonymous}::SuffixTreeNode::isRoot() const’ is not a literal type
   constexpr bool isRoot() const { return StartIdx == EmptyIdx; }
                  ^
[143/1782] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/RegAllocGreedy.cpp.o

The compiler is gcc 4.8.4
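GCC 4.8 in C++11 mode rejects these declarations because, under C++11 rules, a non-static `constexpr` member function requires its enclosing class to be a literal type, and a class with a non-trivial destructor is not one (a restriction later standards removed). Since `isLeaf()` and `isRoot()` are never needed in constant expressions, the simplest fix is probably to drop `constexpr`. A minimal sketch, reduced to the members the error messages mention; the `std::map` of children is an assumption standing in for whatever gives the real node its non-trivial destructor, and the `EmptyIdx` value is assumed too.

```cpp
#include <cassert>
#include <map>

struct SuffixTreeNode {
  // Sentinel meaning "no index" (value assumed for this sketch).
  static const unsigned EmptyIdx = ~0u;

  // A member with a non-trivial destructor makes the class non-literal,
  // which is what GCC 4.8 complains about for constexpr member functions.
  std::map<unsigned, SuffixTreeNode *> Children;
  unsigned StartIdx = EmptyIdx;
  unsigned SuffixIdx = EmptyIdx;

  // Plain const member functions compile on any C++11 compiler.
  bool isLeaf() const { return SuffixIdx != EmptyIdx; }
  bool isRoot() const { return StartIdx == EmptyIdx; }
};
```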

I just tested on SPECCPU2006 (FullLTO) and no assertion failures!

However, 403.gcc and 483.xalancbmk (at least) seem to have a huge compile time slowdown (superlinear behavior?). Some rough numbers comparing LLC runtime:
403.gcc 11s -> 66s
483.xalancbmk 16s -> 144s
(so about 5-10x slowdown of LLC due to the suffix tree)

Most of the time seems to be spent inside buildCandidateList. Sampling a couple stacks it seems like it is stuck in findBest, usually just 1 or 2 stack frames in findBest and so at least the problem isn't that it is recursing too deeply.
I added some printfs to print out the depth and vertex degree of each node in the suffix tree for 483.xalancbmk and I got this: https://reviews.llvm.org/F3114496

So it makes sense that typically one would be only 1 or 2 stack frames deep.

Modulo the pruning that is going on, we seem to do O(N) work in bestRepeatedSubstring once per outlining candidate. Is the pruning effective enough that the sum of all calls to bestRepeatedSubstring doesn't grow out of control? My suspicion is that it isn't, and I think a contrived case like AAABBBCCCDDD... (Assume "A" represents constant-size string large enough to be profitable to outline) will trigger O(N^2) behavior in the number of instructions in the module.
Is it possible to do algorithmically better? (exploiting suffix tree invariants maybe?)

Also, it looks like this pass actually increases (1-5%) text size on all of the SPEC binaries except for 401.bzip2: https://reviews.llvm.org/F3114409

(I double and triple checked and I don't have it switched around; the raw data (doublechecked the labels are right) is: https://reviews.llvm.org/P7971)

Can you please find out why this isn't helping (and in fact is hurting)? Are better heuristics needed? At the very least, the cost function seems like it needs to be amended to take into account the true overheads.

In particular, it seems that the cost function does not take into account that the outlined functions will have some minimum alignment applied to them (or can you mark them as not requiring this alignment? still, it would end up depending on linker placement (alignment of adjacent sections) and such as to how much padding actually is inserted).
On 483.xalancbmk, the suffix-tree-based outliner finds 2311 functions to outline, and almost all of them are 2 instructions, which is typically less than 16 bytes, the minimum alignment that will be imposed (just from looking at the output binary).
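A rough, hypothetical byte-based cost model that accounts for the alignment padding described above. Assumptions, none taken from the patch: a call is 5 bytes and a return 1 byte on x86-64, and padding is approximated by rounding the outlined function's size up to a multiple of `Align` (the real padding depends on linker placement). The function names are illustrative only.

```cpp
#include <cassert>

// Round Size up to the next multiple of Align (Align must be non-zero).
long alignUp(long Size, long Align) {
  return (Size + Align - 1) / Align * Align;
}

// Hypothetical byte-based benefit; a negative result means outlining grows
// the code. Assumed costs: a 5-byte call per occurrence, a 1-byte ret in the
// outlined body, and the outlined body padded up to a multiple of Align.
long byteBenefitSketch(long SeqBytes, long Occurrences, long Align) {
  const long CallBytes = 5, RetBytes = 1;
  long Before = SeqBytes * Occurrences;
  long After = Occurrences * CallBytes + alignUp(SeqBytes + RetBytes, Align);
  return Before - After;
}
```

With these assumptions, a 7-byte sequence (such as a 4-byte `mov` followed by a 3-byte `test`) loses 10 bytes when outlined from 3 sites at 16-byte alignment, still loses 2 bytes at alignment 1, and only starts to pay off at 16-byte alignment from its ninth occurrence.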
A naive approach which just looks for identical runs of outlinable instructions (ignoring substrings) outlines 2391 functions (slightly more). The total benefit is somewhat greater for the suffix tree though at 29379 vs 27994 for the naive approach.
This appears to be due to the outliner finding many more length-2 sequences to outline: https://reviews.llvm.org/F3114690

Overall, it seems like the vast majority of the benefit on 483.xalancbmk is due to extremely short instruction sequences. But if we are going to avoid very short instruction sequences because they actually aren't profitable, then most of the outlinable instructions disappear on this test case (and at a glance, the other SPEC benchmarks are pretty similar). I'd also like to note that this testing is with FullLTO, so it is a best-case scenario for the outliner (whole program visibility to the suffix tree). What kinds of programs does this outliner perform well on?

For reference, here are all the outlined functions from 483.xalancbmk: https://reviews.llvm.org/F3114805

One interesting thing is that they are almost all short sequences of mov instructions. Staring at the code that calls them, it's clear why this is: almost all of the outlined functions in 483.xalancbmk are in sequences like this:

...
  362a1e:       e8 dd 20 0e 00          callq  444b00 <OUTLINED_FUNCTION2637142655534006531_61>
  362a23:       e8 4c e8 09 00          callq  401274 <_ZN11xercesc_2_512XMLBufferMgr13releaseBufferERNS_9XMLBufferE>
...

(FWIW, I tried and IPRA doesn't actually decrease text size much on SPEC with FullLTO)

I.e. what has been outlined is function setup overhead. There are also quite a few outlined functions right before jumps, which are factoring out code sequences like this:

00000000004a2eb0 <OUTLINED_FUNCTION2637142655534006531_2458>:
  4a2eb0:       48 8b 41 18             mov    0x18(%rcx),%rax
  4a2eb4:       48 85 c0                test   %rax,%rax
  4a2eb7:       c3                      retq
  4a2eb8:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  4a2ebf:       00

lib/CodeGen/MachineOutliner.cpp
1031 ↗	(On Diff #89535)	If I understand what this is doing correctly, it can be easily made less than O(N^2) by sorting ascending by Start and descending by End (SROA does something similar to do efficient overlap calculations).
lib/Target/X86/X86InstrInfo.cpp
10387 ↗	(On Diff #89535)	This name does not follow the coding standard. Should be `getOutliningBenefit` or something
10400 ↗	(On Diff #89535)	isFunctionSafeToOutlineFrom
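A sketch of the sorting idea, assuming each candidate is a half-open range [Start, End) of instruction indices. After sorting by ascending Start (ties broken by descending End), a single sweep that tracks the furthest End seen so far flags every candidate that overlaps one sorted before it, for O(N log N) total instead of comparing all pairs. The `Candidate` struct and `findOverlapping` are hypothetical names, and this only detects overlaps; deciding which of two overlapping candidates to keep is a separate policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: a half-open range [Start, End) of instructions.
struct Candidate {
  unsigned Start, End;
};

// Sorts Cands in place and returns the indices (into the sorted vector) of
// candidates that overlap some candidate sorted before them.
std::vector<std::size_t> findOverlapping(std::vector<Candidate> &Cands) {
  std::sort(Cands.begin(), Cands.end(),
            [](const Candidate &A, const Candidate &B) {
              if (A.Start != B.Start)
                return A.Start < B.Start; // ascending by Start
              return A.End > B.End;       // then descending by End
            });
  std::vector<std::size_t> Overlapping;
  unsigned MaxEnd = 0; // furthest End among candidates already swept
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    // Every earlier candidate starts at or before this one, so they overlap
    // exactly when this one starts before the furthest earlier End.
    if (Cands[I].Start < MaxEnd)
      Overlapping.push_back(I);
    MaxEnd = std::max(MaxEnd, Cands[I].End);
  }
  return Overlapping;
}
```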

Also, this pass will almost surely introduce timing side-channel attacks into cryptography code (code that would otherwise be "constant time" and needs to be for security).

I'm not sure how heavily we care about this security aspect as a community, but I'm slightly wary of having this on by default at any optimization level due to this issue. E.g. a size-constrained program for a secure processing element on a phone recompiles with this option and it silently breaks the security of the entire device. Hopefully the folks programming the secure element have some sort of testing to avoid this or at least have all critical primitives written in asm (or done by a hardware peripheral).

I can't think of any other optimizations we have that would move a program away from being "constant time"; is there any precedent?

In D26872#686175, @silvas wrote:

I'm not sure how heavily we care about this security aspect as a community, but I'm slightly wary of having this on by default at any optimization level due to this issue.

Not that it is on by default right now. Just a concern to keep in mind down the road.

In D26872#686175, @silvas wrote:

Also, this pass will almost surely introduce timing side-channel attacks into cryptography code (code that would otherwise be "constant time" and needs to be for security).

I'm not sure how heavily we care about this security aspect as a community, but I'm slightly wary of having this on by default at any optimization level due to this issue. E.g. a size-constrained program for a secure processing element on a phone recompiles with this option and it silently breaks the security of the entire device. Hopefully the folks programming the secure element have some sort of testing to avoid this or at least have all critical primitives written in asm (or done by a hardware peripheral).

I can't think of any other optimizations we have that would move a program away from being "constant time"; is there any precedent?

?!? This should be true for most compiler transformations. I don't know how these problems are handled in practice but I doubt they enable compiler optimizations. I don't see why we should start this discussion with this particular review.

In D26872#686220, @MatzeB wrote:

?!? This should be true for most compiler transformations. I don't know how these problems are handled in practice but I doubt they enable compiler optimizations. I don't see why we should start this discussion with this particular review.

I agree that we don't want to discuss it in this review (that's why I said "down the road"), but most compiler transformations I can think of remove indirection or otherwise simplify things towards the set of "constant time" instructions (such as elementary reg-reg adds and such). This pass introduces call instructions into arbitrary code (and calls on x86 architecturally write to memory and are subject to branch prediction, etc.). I agree, let's not have this discussion here though.

btw, down the road you may want to have this pass really know in detail the encoded length of each instruction on x86. There are quite a few *single instructions* that would be beneficial from a code size perspective to outline (if the outlined function is set to have alignment of 1). A quick analysis of an LLD binary (which contains all of LLVM linked in for LTO) shows there is over 5% code size savings just from outlining single instructions (since many x86 instructions encode to be larger than a CALL instruction which is 5 bytes). About half of the benefit (so about 2-3% of the total on this test case) comes from instructions that reference the stack via %rsp (mostly zeroing out stack slots), which could still be outlined if the offset was rewritten.
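The arithmetic behind this can be sketched as follows, assuming a 5-byte `call`, a 1-byte `ret`, and an outlined function with alignment 1 (so no padding). Outlining a single instruction of b bytes that appears n times saves the following number of bytes (negative means the code grows):

```latex
\Delta(b, n) = n\,b - \bigl(5n + (b + 1)\bigr) = n\,(b - 5) - (b + 1)
```

So outlining only helps for instructions longer than the 5-byte call (b > 5), and then only with enough occurrences: for a 9-byte instruction, Δ(9, 2) = -2 but Δ(9, 3) = 2.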

In D26872#686155, @silvas wrote:

... 403.gcc and 483.xalancbmk (at least) seem to have a huge compile time slowdown (superlinear behavior?). Some rough numbers comparing LLC runtime:
403.gcc 11s -> 66s
483.xalancbmk 16s -> 144s
(so about 5-10x slowdown of LLC due to the suffix tree)

Most of the time seems to be spent inside buildCandidateList. Sampling a couple stacks it seems like it is stuck in findBest...
...
Modulo the pruning that is going on, we seem to do O(N) work in bestRepeatedSubstring once per outlining candidate. Is the pruning effective enough that the sum of all calls to bestRepeatedSubstring doesn't grow out of control? My suspicion is that it isn't, and I think a contrived case like AAABBBCCCDDD... (Assume "A" represents constant-size string large enough to be profitable to outline) will trigger O(N^2) behavior in the number of instructions in the module.
Is it possible to do algorithmically better? (exploiting suffix tree invariants maybe?)

Yeah that'd be a nasty case, and it's worth looking into, for sure.

Some quick ideas off the top of my head:

Pre-prune nodes which can never lead to outlining candidates while setting suffix indices. This would still be vulnerable to programs that look like AABBCC, but may improve query time on average.
Keep track of a collection of prospective "Outlining points". During the first traversal, if we find anything beneficial remember where it was. On the next traversal, if we have a next best point, start at that point instead of the root.
Keep track of every beneficial substring, during one O(n) traversal, and prune overlaps choosing the most beneficial ones greedily.
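The third idea can be sketched roughly as follows. Simplifying assumptions made here: every beneficial occurrence was already collected in one traversal as a [Start, End) range tagged with a benefit, and selection is a plain greedy pass, highest benefit first, that keeps an occurrence only if it overlaps nothing already kept. Kept ranges sit in an ordered `std::set`, so each overlap check is O(log N). Names are hypothetical, and greedy selection is a heuristic rather than an optimal solution.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Hypothetical occurrence of a repeated sequence: [Start, End) plus benefit.
struct Occurrence {
  unsigned Start, End;
  unsigned Benefit;
};

// Greedily keep the most beneficial occurrences that overlap nothing kept so
// far. Returns the kept ranges sorted by Start.
std::vector<std::pair<unsigned, unsigned>>
selectGreedy(std::vector<Occurrence> Occs) {
  std::stable_sort(Occs.begin(), Occs.end(),
                   [](const Occurrence &A, const Occurrence &B) {
                     return A.Benefit > B.Benefit;
                   });
  // Kept ranges never overlap each other, so ordering them by Start suffices.
  std::set<std::pair<unsigned, unsigned>> Kept;
  for (const Occurrence &O : Occs) {
    auto Next = Kept.lower_bound({O.Start, 0u});
    if (Next != Kept.end() && Next->first < O.End)
      continue; // overlaps a kept range starting at or after O.Start
    if (Next != Kept.begin() && std::prev(Next)->second > O.Start)
      continue; // overlaps a kept range starting before O.Start
    Kept.insert({O.Start, O.End});
  }
  return std::vector<std::pair<unsigned, unsigned>>(Kept.begin(), Kept.end());
}
```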

In D26872#686155, @silvas wrote:

... it seems that the cost function does not take into account that the outlined functions will have some minimum alignment applied to them (or can you mark them as not requiring this alignment? still, it would end up depending on linker placement (alignment of adjacent sections) and such as to how much padding actually is inserted).

I'll have to look into that and see what happens.

In D26872#686155, @silvas wrote:

On 483.xalancbmk, the suffix-tree-based outliner finds 2311 functions to outline, and almost all of them are 2 instructions, which is typically less than 16 bytes, the minimum alignment that will be imposed (just from looking at the output binary).
...
Overall, it seems like the vast majority of the benefit on 483.xalancbmk is due to extremely short instruction sequences. But if we are going to avoid very short instruction sequences because they actually aren't profitable, then most of the outlinable instructions disappear on this test case (and at a glance, the other SPEC benchmarks are pretty similar). I'd also like to note that this testing is with FullLTO, so it is a best-case scenario for the outliner (whole program visibility to the suffix tree).

Did you modify the benefit function to verify that removing length-2 instruction sequences actually removes most candidates? We could have found, say, BC as the most beneficial, which would prune out all instances of ABC. There could very well be repeated instances of ABC that would be beneficial to outline. It might be possible to impose a minimum length restriction on x86 without losing all of the candidates.

In D26872#686155, @silvas wrote:

What kinds of programs does this outliner perform well on?

In the test suite, the x86 outliner tended to do well on programs with heavy macro usage or automatically-generated code.

As you found, x86 is a particularly hostile environment for this sort of pass. :) It was just used for a proof of concept and for ease of testing. Most work for this pass should be done for other targets, like, say, ARM64.

In D26872#686770, @silvas wrote:

btw, down the road you may want to have this pass really know in detail the encoded length of each instruction on x86. There are quite a few *single instructions* that would be beneficial from a code size perspective to outline (if the outlined function is set to have alignment of 1). A quick analysis of an LLD binary (which contains all of LLVM linked in for LTO) shows there is over 5% code size savings just from outlining single instructions (since many x86 instructions encode to be larger than a CALL instruction which is 5 bytes). About half of the benefit (so about 2-3% of the total on this test case) comes from instructions that reference the stack via %rsp (mostly zeroing out stack slots), which could still be outlined if the offset was rewritten.

I would really love to do this, but I'm not sure if it's possible in LLVM at the moment. If it is, then I'll gladly add it in since I think it's probably one of the main reasons that x86 tests can get larger rather than smaller. The only thing that's (architecturally) tricky is that the target would need to know about the instruction-integer mappings. This could be done by moving the InstructionMapper over to the target, but I'm not sure if that's the best approach. If it's okay to do that, I doubt it'd be too difficult.

Okay, here's the next revision, everyone!

Changes

Debug info is now skipped over as if it doesn't exist
isLegalToOutline->getOutliningType, which returns Legal, Illegal, or Invisible. Invisible is used for instructions which should be ignored.
Combined outliner tests
Added test with debug info
Outlined functions are private again without the wonky "hash"
Style conformance changes, etc.
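To illustrate how the Legal/Illegal/Invisible split can keep DBG_VALUEs from affecting outlining, here is a minimal sketch of an instruction-to-integer mapping. Everything except the three enumerator names is a hypothetical simplification: instructions are plain strings already classified by the target, and the mapping is done per block rather than across the whole module. Identical legal instructions share an ID, each illegal instruction gets a fresh ID counting down from the top of the range so it can never be part of a repeat, and invisible instructions are skipped.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

enum MachineOutlinerInstrType { Legal, Illegal, Invisible };

// Hypothetical instruction: its text plus the target's classification.
using ClassifiedInstr = std::pair<std::string, MachineOutlinerInstrType>;

// Map one block to the integer string the suffix tree would see.
std::vector<unsigned> mapBlock(const std::vector<ClassifiedInstr> &Block) {
  std::map<std::string, unsigned> LegalIDs; // instruction text -> shared ID
  unsigned NextLegal = 0;                   // legal IDs count up from 0
  unsigned NextIllegal = ~0u;               // illegal IDs count down
  std::vector<unsigned> Mapped;
  for (const ClassifiedInstr &Instr : Block) {
    switch (Instr.second) {
    case Legal: {
      // emplace leaves the map unchanged if this instruction was seen before.
      auto Res = LegalIDs.emplace(Instr.first, NextLegal);
      if (Res.second)
        ++NextLegal;
      Mapped.push_back(Res.first->second);
      break;
    }
    case Illegal:
      Mapped.push_back(NextIllegal--); // fresh ID, so it never repeats
      break;
    case Invisible:
      break; // skipped as if it did not exist (e.g. DBG_VALUE)
    }
  }
  return Mapped;
}
```

Because invisible instructions never reach the integer string, inserting debug instructions cannot change which repeated substrings the suffix tree sees.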

My LGTM still stands. Should I commit on your behalf or do you already have access?

aprantl added inline comments. Feb 27 2017, 2:02 PM

lib/Target/X86/X86InstrInfo.cpp
10439 ↗	(On Diff #89928)	Thanks!
test/CodeGen/X86/machine-outliner-debuginfo.ll
43 ↗	(On Diff #89928)	There should also be a negative check to ensure that no DBG_VALUE is in the outlined function and that no debug locations are attached to the outlined function.

In D26872#687815, @MatzeB wrote:

My LGTM still stands. Should I commit on your behalf or do you already have access?

I don't have commit access yet, so go ahead.

Changes

Added negative test for debug info in machine-outliner-debuginfo.ll. The check allows for is_stmt debug stuff because that seems to be out of my control.

Changes

Realized that the last debug test wasn't sufficient, so I fixed it up. It now handles debug values as well.

Third time is the charm. Edits to the tests. The debug test now *truly* makes sure that debug values don't impact outlining. Also removed some cruft from the tests.

Closed by commit rL296418: Add MIR-level outlining pass (authored by matze). Feb 27 2017, 4:45 PM

This revision was automatically updated to reflect the committed changes.

I went ahead and committed the current state as we believe all immediately actionable things are addressed. And I'd really like us to do further work on it upstream so we don't get a 2000-line review for every little change.

We do appreciate all the discussion here and hope we can continue on llvm-dev and the upcoming patches.

Nice to see this finally land!

FWIW, I did talk to a security professional inside Google (the type of person for whom the common advice "don't write your own crypto" doesn't apply) and they said that they weren't particularly worried about the transformation done by this pass. Phew!

Revision Contents

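Path                                                          Size
llvm/trunk/include/llvm/CodeGen/Passes.h                      4 lines
llvm/trunk/include/llvm/InitializePasses.h                    1 line
llvm/trunk/include/llvm/Target/TargetInstrInfo.h              57 lines
llvm/trunk/lib/CodeGen/CMakeLists.txt                         1 line
llvm/trunk/lib/CodeGen/CodeGen.cpp                            1 line
llvm/trunk/lib/CodeGen/MachineOutliner.cpp                    1399 lines
llvm/trunk/lib/CodeGen/TargetPassConfig.cpp                   6 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.h                      21 lines
llvm/trunk/lib/Target/X86/X86InstrInfo.cpp                    80 lines
llvm/trunk/test/CodeGen/X86/machine-outliner-debuginfo.ll     75 lines
llvm/trunk/test/CodeGen/X86/machine-outliner.ll               110 lines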

Diff 89953

llvm/trunk/include/llvm/CodeGen/Passes.h

@@ ... @@ /// MachineDominanaceFrontier - This pass is a machine dominators analysis pass.
   /// if available with PysicalRegisterUsageInfo pass.
   FunctionPass *createRegUsageInfoPropPass();
 
   /// This pass performs software pipelining on machine instructions.
   extern char &MachinePipelinerID;
 
   /// This pass frees the memory occupied by the MachineFunction.
   FunctionPass *createFreeMachineFunctionPass();
+
+  /// This pass performs outlining on machine instructions directly before
+  /// printing assembly.
+  ModulePass *createMachineOutlinerPass();
 } // End llvm namespace
 
 /// Target machine pass initializer for passes with dependencies. Use with
 /// INITIALIZE_TM_PASS_END.
 #define INITIALIZE_TM_PASS_BEGIN INITIALIZE_PASS_BEGIN
 
 /// Target machine pass initializer for passes with dependencies. Use with
 /// INITIALIZE_TM_PASS_BEGIN.
...

llvm/trunk/include/llvm/InitializePasses.h

...
 void initializeMachineCopyPropagationPass(PassRegistry&);
 void initializeMachineDominanceFrontierPass(PassRegistry&);
 void initializeMachineDominatorTreePass(PassRegistry&);
 void initializeMachineFunctionPrinterPassPass(PassRegistry&);
 void initializeMachineLICMPass(PassRegistry&);
 void initializeMachineLoopInfoPass(PassRegistry&);
 void initializeMachineModuleInfoPass(PassRegistry&);
 void initializeMachineOptimizationRemarkEmitterPassPass(PassRegistry&);
+void initializeMachineOutlinerPass(PassRegistry&);
 void initializeMachinePipelinerPass(PassRegistry&);
 void initializeMachinePostDominatorTreePass(PassRegistry&);
 void initializeMachineRegionInfoPassPass(PassRegistry&);
 void initializeMachineSchedulerPass(PassRegistry&);
 void initializeMachineSinkingPass(PassRegistry&);
 void initializeMachineTraceMetricsPass(PassRegistry&);
 void initializeMachineVerifierPassPass(PassRegistry&);
 void initializeMemCpyOptLegacyPassPass(PassRegistry&);
...

llvm/trunk/include/llvm/Target/TargetInstrInfo.h

@@ ... @@ public:
 
   /// True if the instruction is bound to the top of its basic block and no
   /// other instructions shall be inserted before it. This can be implemented
   /// to prevent register allocator to insert spills before such instructions.
   virtual bool isBasicBlockPrologue(const MachineInstr &MI) const {
     return false;
   }
 
+  /// \brief Return how many instructions would be saved by outlining a
+  /// sequence containing \p SequenceSize instructions that appears
+  /// \p Occurrences times in a module.
+  virtual unsigned getOutliningBenefit(size_t SequenceSize, size_t Occurrences)
+      const {
+    llvm_unreachable(
+        "Target didn't implement TargetInstrInfo::getOutliningBenefit!");
+  }
+
+  /// Represents how an instruction should be mapped by the outliner.
+  /// \p Legal instructions are those which are safe to outline.
+  /// \p Illegal instructions are those which cannot be outlined.
+  /// \p Invisible instructions are instructions which can be outlined, but
+  /// shouldn't actually impact the outlining result.
+  enum MachineOutlinerInstrType {Legal, Illegal, Invisible};
+
+  /// Return true if the instruction is legal to outline.
+  virtual MachineOutlinerInstrType getOutliningType(MachineInstr &MI) const {
+    llvm_unreachable(
+        "Target didn't implement TargetInstrInfo::getOutliningType!");
+  }
+
+  /// Insert a custom epilogue for outlined functions.
+  /// This may be empty, in which case no epilogue or return statement will be
+  /// emitted.
+  virtual void insertOutlinerEpilogue(MachineBasicBlock &MBB,
+                                      MachineFunction &MF) const {
+    llvm_unreachable(
+        "Target didn't implement TargetInstrInfo::insertOutlinerEpilogue!");
+  }
+
+  /// Insert a call to an outlined function into the program.
+  /// Returns an iterator to the spot where we inserted the call. This must be
+  /// implemented by the target.
+  virtual MachineBasicBlock::iterator
+  insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
+                     MachineBasicBlock::iterator &It, MachineFunction &MF)
+      const {
+    llvm_unreachable(
+        "Target didn't implement TargetInstrInfo::insertOutlinedCall!");
+  }
+
+  /// Insert a custom prologue for outlined functions.
+  /// This may be empty, in which case no prologue will be emitted.
+  virtual void insertOutlinerPrologue(MachineBasicBlock &MBB,
+                                      MachineFunction &MF) const {
+    llvm_unreachable(
+        "Target didn't implement TargetInstrInfo::insertOutlinerPrologue!");
+  }
+
+  /// Return true if the function can safely be outlined from.
+  /// By default, this means that the function has no red zone.
+  virtual bool isFunctionSafeToOutlineFrom(MachineFunction &F) const {
+    llvm_unreachable("Target didn't implement "
+                     "TargetInstrInfo::isFunctionSafeToOutlineFrom!");
+  }
+
 private:
   unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;
   unsigned CatchRetOpcode;
   unsigned ReturnOpcode;
 };
 
 /// \brief Provide DenseMapInfo for TargetInstrInfo::RegSubRegPair.
 template<>
...

llvm/trunk/lib/CodeGen/CMakeLists.txt

@@ ... @@ add_llvm_library(LLVMCodeGen
   MachineFunctionPrinterPass.cpp
   MachineInstrBundle.cpp
   MachineInstr.cpp
   MachineLICM.cpp
   MachineLoopInfo.cpp
   MachineModuleInfo.cpp
   MachineModuleInfoImpls.cpp
   MachineOptimizationRemarkEmitter.cpp
+  MachineOutliner.cpp
   MachinePassRegistry.cpp
   MachinePipeliner.cpp
   MachinePostDominators.cpp
   MachineRegionInfo.cpp
   MachineRegisterInfo.cpp
   MachineScheduler.cpp
   MachineSink.cpp
   MachineSSAUpdater.cpp
...

llvm/trunk/lib/CodeGen/CodeGen.cpp

void llvm::initializeCodeGen(PassRegistry &Registry) {
...
initializeMachineCombinerPass(Registry);
initializeMachineCopyPropagationPass(Registry);
initializeMachineDominatorTreePass(Registry);
initializeMachineFunctionPrinterPassPass(Registry);
initializeMachineLICMPass(Registry);
initializeMachineLoopInfoPass(Registry);
initializeMachineModuleInfoPass(Registry);
initializeMachineOptimizationRemarkEmitterPassPass(Registry);
initializeMachineOutlinerPass(Registry);
initializeMachinePipelinerPass(Registry);
initializeMachinePostDominatorTreePass(Registry);
initializeMachineRegionInfoPassPass(Registry);
initializeMachineSchedulerPass(Registry);
initializeMachineSinkingPass(Registry);
initializeMachineVerifierPassPass(Registry);
initializeXRayInstrumentationPass(Registry);
initializePatchableFunctionPass(Registry);
...

llvm/trunk/lib/CodeGen/MachineOutliner.cpp

				//===---- MachineOutliner.cpp - Outline instructions -----------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Replaces repeated sequences of instructions with function calls.
				///
				/// This works by placing every instruction from every basic block in a
				/// suffix tree, and repeatedly querying that tree for repeated sequences of
				/// instructions. If a sequence of instructions appears often, then it ought
				/// to be beneficial to pull out into a function.
				///
				/// This was originally presented at the 2016 LLVM Developers' Meeting in the
				/// talk "Reducing Code Size Using Outlining". For a high-level overview of
				/// how this pass works, the talk is available on YouTube at
				///
				/// https://www.youtube.com/watch?v=yorld-WSOeU
				///
				/// The slides for the talk are available at
				///
				/// http://www.llvm.org/devmtg/2016-11/Slides/Paquette-Outliner.pdf
				///
				/// The talk provides an overview of how the outliner finds candidates and
				/// ultimately outlines them. It describes how the main data structure for this
				/// pass, the suffix tree, is queried and purged for candidates. It also gives
				/// a simplified suffix tree construction algorithm, based on the algorithm
				/// actually used here: Ukkonen's algorithm.
				///
				/// For the original RFC for this pass, please see
				///
				/// http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
				///
				/// For more information on the suffix tree data structure, please see
				/// https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
				///
				//===----------------------------------------------------------------------===//
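The overview above can be made concrete with a brute-force stand-in for the suffix tree. This sketch assumes the instructions have already been mapped to unsigned integers (as `InstructionMapper` does later in the file) and uses an assumed toy cost model: one call instruction per occurrence, plus one copy of the sequence and a return in the outlined function. That model is an illustration, not the pass's own benefit function, and `savings` and `bestRepeatNaive` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Assumed toy cost model: Len * Count instructions before outlining versus
// Count calls + Len body instructions + 1 return afterwards.
long savings(std::size_t Len, std::size_t Count) {
  assert(Len >= 2 && "Only sequences of length >= 2 are considered!");
  return static_cast<long>(Len * Count) - static_cast<long>(Count + Len + 1);
}

// Enumerate every substring of length >= 2, count its non-overlapping
// occurrences, and keep the one with the largest positive savings.
std::vector<unsigned> bestRepeatNaive(const std::vector<unsigned> &Str) {
  std::map<std::vector<unsigned>, std::vector<std::size_t>> Starts;
  for (std::size_t Len = 2; Len <= Str.size(); ++Len)
    for (std::size_t I = 0; I + Len <= Str.size(); ++I)
      Starts[std::vector<unsigned>(Str.begin() + I, Str.begin() + I + Len)]
          .push_back(I); // Start indices are recorded in ascending order.

  std::vector<unsigned> Best;
  long BestSavings = 0;
  for (const auto &Entry : Starts) {
    // Count occurrences greedily, skipping any that overlap the previous one.
    std::size_t Len = Entry.first.size(), Count = 0, NextFree = 0;
    for (std::size_t S : Entry.second)
      if (S >= NextFree) {
        ++Count;
        NextFree = S + Len;
      }
    long Save = savings(Len, Count);
    if (Save > BestSavings) {
      BestSavings = Save;
      Best = Entry.first;
    }
  }
  return Best;
}
```

For the input 1 2 3 9 1 2 3 8 1 2 3, only 1 2 3 (three occurrences) yields a positive saving under this model. The enumeration costs roughly cubic time in the string length, which is the work the suffix tree is meant to avoid.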
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/ADT/Twine.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/CodeGen/Passes.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/Support/Allocator.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetMachine.h"
				#include "llvm/Target/TargetRegisterInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"
				#include <functional>
				#include <map>
				#include <sstream>
				#include <tuple>
				#include <vector>

				#define DEBUG_TYPE "machine-outliner"

				using namespace llvm;

				STATISTIC(NumOutlined, "Number of candidates outlined");
				STATISTIC(FunctionsCreated, "Number of functions created");

				namespace {

				/// Represents an undefined index in the suffix tree.
				const size_t EmptyIdx = -1;

				/// A node in a suffix tree which represents a substring or suffix.
				///
				/// Each node has either no children or at least two children, with the root
				/// being an exception in the empty tree.
				///
				/// Children are represented as a map between unsigned integers and nodes. If
				/// a node N has a child M on unsigned integer k, then the mapping represented
				/// by N is a proper prefix of the mapping represented by M. Note that this,
				/// although similar to a trie, is somewhat different: each node stores a full
				/// substring of the full mapping rather than a single character state.
				///
				/// Each internal node contains a pointer to the internal node representing
				/// the same string, but with the first character chopped off. This is stored
				/// in \p Link. Each leaf node stores the start index of its respective
				/// suffix in \p SuffixIdx.
				struct SuffixTreeNode {

				/// The children of this node.
				///
				/// A child existing on an unsigned integer implies that from the mapping
				/// represented by the current node, there is a way to reach another
				/// mapping by tacking that character on the end of the current string.
				DenseMap<unsigned, SuffixTreeNode *> Children;

				/// A flag set to false if the node has been pruned from the tree.
				bool IsInTree = true;

				/// The start index of this node's substring in the main string.
				size_t StartIdx = EmptyIdx;

				/// The end index of this node's substring in the main string.
				///
				/// Every leaf node must have its \p EndIdx incremented at the end of every
				/// step in the construction algorithm. To avoid having to update O(N)
				/// nodes individually at the end of every step, the end index is stored
				/// as a pointer.
				size_t *EndIdx = nullptr;

				/// For leaves, the start index of the suffix represented by this node.
				///
				/// For all other nodes, this is ignored.
				size_t SuffixIdx = EmptyIdx;

				/// \brief For internal nodes, a pointer to the internal node representing
				/// the same sequence with the first character chopped off.
				///
				/// This has two major purposes in the suffix tree. The first is as a
				/// shortcut in Ukkonen's construction algorithm. One of the things that
				/// Ukkonen's algorithm does to achieve linear-time construction is
				/// keep track of which node the next insert should be at. This makes each
				/// insert O(1), and there are a total of O(N) inserts. The suffix link
				/// helps with inserting children of internal nodes.
				///
				/// Say we add a child to an internal node with associated mapping S. The
				/// next insertion must be at the node representing S with its first
				/// character removed.
				/// This is given by the way that we iteratively build the tree in Ukkonen's
				/// algorithm. The main idea is to look at the suffixes of each prefix in the
				/// string, starting with the longest suffix of the prefix, and ending with
				/// the shortest. Therefore, if we keep pointers between such nodes, we can
				/// move to the next insertion point in O(1) time. If we don't, then we'd
				/// have to query from the root, which takes O(N) time. This would make the
				/// construction algorithm O(N^2) rather than O(N).
				///
				/// The suffix link is also used during the tree pruning process to let us
				/// quickly throw out a bunch of potential overlaps. Say we have a sequence
				/// S we want to outline. Then each of its suffixes contributes to at least
				/// one overlapping case. Therefore, we can follow the suffix links
				/// starting at the node associated with S to the root and "delete" those
				/// nodes, save for the root. For each candidate, this removes
				/// O(|candidate|) overlaps from the search space. We don't actually
				/// completely invalidate these nodes though; doing that is far too
				/// aggressive. Consider the following pathological string:
				///
				/// 1 2 3 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3
				///
				/// If we, for the sake of example, outlined 1 2 3, then we would throw
				/// out all instances of 2 3. This isn't desirable. To get around this,
				/// when we visit a link node, we decrement its occurrence count by the
				/// number of sequences we outlined in the current step. In the pathological
				/// example, the 2 3 node would have an occurrence count of 8, while the
				/// 1 2 3 node would have an occurrence count of 2. Thus, the 2 3 node
				/// would survive to the next round allowing us to outline the extra
				/// instances of 2 3.
				SuffixTreeNode *Link = nullptr;

				/// The parent of this node. Every node except for the root has a parent.
				SuffixTreeNode *Parent = nullptr;

				/// The number of times this node's string appears in the tree.
				///
				/// This is equal to the number of leaf children of the string. It represents
				/// the number of suffixes that the node's string is a prefix of.
				size_t OccurrenceCount = 0;

				/// Returns true if this node is a leaf.
				bool isLeaf() const { return SuffixIdx != EmptyIdx; }

				/// Returns true if this node is the root of its owning \p SuffixTree.
				bool isRoot() const { return StartIdx == EmptyIdx; }

				/// Return the number of elements in the substring associated with this node.
				size_t size() const {

				// Is it the root? If so, it's the empty string so return 0.
				if (isRoot())
				return 0;

				assert(*EndIdx != EmptyIdx && "EndIdx is undefined!");

				// Size = the number of elements in the string.
				// For example, [0 1 2 3] has length 4, not 3. 3-0 = 3, so we have 3-0+1.
				return *EndIdx - StartIdx + 1;
				}

				SuffixTreeNode(size_t StartIdx, size_t *EndIdx, SuffixTreeNode *Link,
				SuffixTreeNode *Parent)
				: StartIdx(StartIdx), EndIdx(EndIdx), Link(Link), Parent(Parent) {}

				SuffixTreeNode() {}
				};
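The `EndIdx` comment above explains why leaves store their end index by pointer. In the stripped-down model below, `ToyNode` is hypothetical and keeps only `StartIdx`, `EndIdx`, and the same inclusive `size()` arithmetic. Every leaf points at one shared counter (playing the role of `SuffixTree::LeafEndIdx`), so a single store lengthens all leaves at once, while an internal node with its own counter keeps its size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for SuffixTreeNode, reduced to the end-index trick.
struct ToyNode {
  std::size_t StartIdx;
  std::size_t *EndIdx; // Leaves share one counter; internal nodes own theirs.

  // Inclusive range: [StartIdx, *EndIdx] has *EndIdx - StartIdx + 1 elements.
  std::size_t size() const {
    assert(StartIdx <= *EndIdx && "String can't start after it ends!");
    return *EndIdx - StartIdx + 1;
  }
};
```

Updating the shared counter once per construction step is what lets Ukkonen's algorithm avoid touching O(N) leaves at every step.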

				/// A data structure for fast substring queries.
				///
				/// Suffix trees represent the suffixes of their input strings in their leaves.
				/// A suffix tree is a type of compressed trie structure where each node
				/// represents an entire substring rather than a single character. Each leaf
				/// of the tree is a suffix.
				///
				/// A suffix tree can be seen as a type of state machine where each state is a
				/// substring of the full string. The tree is structured so that, for a string
				/// of length N, there are exactly N leaves in the tree. This structure allows
				/// us to quickly find repeated substrings of the input string.
				///
				/// In this implementation, a "string" is a vector of unsigned integers.
				/// These integers may result from hashing some data type. A suffix tree can
				/// contain 1 or many strings, which can then be queried as one large string.
				///
				/// The suffix tree is implemented using Ukkonen's algorithm for linear-time
				/// suffix tree construction. Ukkonen's algorithm is explained in more detail
				/// in the paper by Esko Ukkonen, "On-line construction of suffix trees". The
				/// paper is available at
				///
				/// https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
				class SuffixTree {
				private:
				/// Each element is an integer representing an instruction in the module.
				ArrayRef<unsigned> Str;

				/// Maintains each node in the tree.
				BumpPtrAllocator NodeAllocator;

				/// The root of the suffix tree.
				///
				/// The root represents the empty string. It is maintained by the
				/// \p NodeAllocator like every other node in the tree.
				SuffixTreeNode *Root = nullptr;

				/// Stores each leaf in the tree for better pruning.
				std::vector<SuffixTreeNode *> LeafVector;

				/// Maintains the end indices of the internal nodes in the tree.
				///
				/// Each internal node is guaranteed to never have its end index change
				/// during the construction algorithm; however, leaves must be updated at
				/// every step. Therefore, we need to store leaf end indices by reference
				/// to avoid updating O(N) leaves at every step of construction. Thus,
				/// every internal node must be allocated its own end index.
				BumpPtrAllocator InternalEndIdxAllocator;

				/// The end index of each leaf in the tree.
				size_t LeafEndIdx = -1;

				/// \brief Helper struct which keeps track of the next insertion point in
				/// Ukkonen's algorithm.
				struct ActiveState {
				/// The next node to insert at.
				SuffixTreeNode *Node;

				/// The index of the first character in the substring currently being added.
				size_t Idx = EmptyIdx;

				/// The length of the substring we have to add at the current step.
				size_t Len = 0;
				};

				/// \brief The point the next insertion will take place at in the
				/// construction algorithm.
				ActiveState Active;

				/// Allocate a leaf node and add it to the tree.
				///
				/// \param Parent The parent of this node.
				/// \param StartIdx The start index of this node's associated string.
				/// \param Edge The label on the edge leaving \p Parent to this node.
				///
				/// \returns A pointer to the allocated leaf node.
				SuffixTreeNode *insertLeaf(SuffixTreeNode &Parent, size_t StartIdx,
				unsigned Edge) {

				assert(StartIdx <= LeafEndIdx && "String can't start after it ends!");

				SuffixTreeNode *N = new (NodeAllocator) SuffixTreeNode(StartIdx,
				&LeafEndIdx,
				nullptr,
				&Parent);
				Parent.Children[Edge] = N;

				return N;
				}

				/// Allocate an internal node and add it to the tree.
				///
				/// \param Parent The parent of this node. Only null when allocating the root.
				/// \param StartIdx The start index of this node's associated string.
				/// \param EndIdx The end index of this node's associated string.
				/// \param Edge The label on the edge leaving \p Parent to this node.
				///
				/// \returns A pointer to the allocated internal node.
				SuffixTreeNode *insertInternalNode(SuffixTreeNode *Parent, size_t StartIdx,
				size_t EndIdx, unsigned Edge) {

				assert(StartIdx <= EndIdx && "String can't start after it ends!");
				assert(!(!Parent && StartIdx != EmptyIdx) &&
				"Non-root internal nodes must have parents!");

				size_t *E = new (InternalEndIdxAllocator) size_t(EndIdx);
				SuffixTreeNode *N = new (NodeAllocator) SuffixTreeNode(StartIdx,
				E,
				Root,
				Parent);
				if (Parent)
				Parent->Children[Edge] = N;

				return N;
				}

				/// \brief Set the suffix indices of the leaves to the start indices of their
				/// respective suffixes. Also stores each leaf in \p LeafVector at its
				/// respective suffix index.
				///
				/// \param[in] CurrNode The node currently being visited.
				/// \param CurrIdx The current index of the string being visited.
				void setSuffixIndices(SuffixTreeNode &CurrNode, size_t CurrIdx) {

				bool IsLeaf = CurrNode.Children.size() == 0 && !CurrNode.isRoot();

				// Traverse the tree depth-first.
				for (auto &ChildPair : CurrNode.Children) {
				assert(ChildPair.second && "Node had a null child!");
				setSuffixIndices(*ChildPair.second,
				CurrIdx + ChildPair.second->size());
				}

				// Is this node a leaf?
				if (IsLeaf) {
				// If yes, give it a suffix index and bump its parent's occurrence count.
				CurrNode.SuffixIdx = Str.size() - CurrIdx;
				assert(CurrNode.Parent && "CurrNode had no parent!");
				CurrNode.Parent->OccurrenceCount++;

				// Store the leaf in the leaf vector for pruning later.
				LeafVector[CurrNode.SuffixIdx] = &CurrNode;
				}
				}

				/// \brief Construct the suffix tree for the prefix of the input ending at
				/// \p EndIdx.
				///
				/// Used to construct the full suffix tree iteratively. At the end of each
				/// step, the constructed suffix tree is either a valid suffix tree, or a
				/// suffix tree with implicit suffixes. At the end of the final step, the
				/// suffix tree is a valid tree.
				///
				/// \param EndIdx The end index of the current prefix in the main string.
				/// \param SuffixesToAdd The number of suffixes that must be added
				/// to complete the suffix tree at the current phase.
				///
				/// \returns The number of suffixes that have not been added at the end of
				/// this step.
				unsigned extend(size_t EndIdx, size_t SuffixesToAdd) {
				SuffixTreeNode *NeedsLink = nullptr;

				while (SuffixesToAdd > 0) {

				// Are we waiting to add anything other than just the last character?
				if (Active.Len == 0) {
				// If not, then say the active index is the end index.
				Active.Idx = EndIdx;
				}

				assert(Active.Idx <= EndIdx && "Start index can't be after end index!");

				// The first character in the current substring we're looking at.
				unsigned FirstChar = Str[Active.Idx];

				// Have we inserted anything starting with FirstChar at the current node?
				if (Active.Node->Children.count(FirstChar) == 0) {
				// If not, then we can just insert a leaf and move to the next step.
				insertLeaf(*Active.Node, EndIdx, FirstChar);

				// The active node is an internal node, and we visited it, so it must
				// need a link if it doesn't have one.
				if (NeedsLink) {
				NeedsLink->Link = Active.Node;
				NeedsLink = nullptr;
				}
				} else {
				// There's a match with FirstChar, so look for the point in the tree to
				// insert a new node.
				SuffixTreeNode *NextNode = Active.Node->Children[FirstChar];

				size_t SubstringLen = NextNode->size();

				// Is the current suffix we're trying to insert longer than the size of
				// the child we want to move to?
				if (Active.Len >= SubstringLen) {
				// If yes, then consume the characters we've seen and move to the next
				// node.
				Active.Idx += SubstringLen;
				Active.Len -= SubstringLen;
				Active.Node = NextNode;
				continue;
				}

				// Otherwise, the suffix we're trying to insert must be contained in the
				// next node we want to move to.
				unsigned LastChar = Str[EndIdx];

				// Is the string we're trying to insert a substring of the next node?
				if (Str[NextNode->StartIdx + Active.Len] == LastChar) {
				// If yes, then we're done for this step. Remember our insertion point
				// and move to the next end index. At this point, we have an implicit
				// suffix tree.
				if (NeedsLink && !Active.Node->isRoot()) {
				NeedsLink->Link = Active.Node;
				NeedsLink = nullptr;
				}

				Active.Len++;
				break;
				}

				// The string we're trying to insert isn't a substring of the next node,
				// but matches up to a point. Split the node.
				//
				// For example, say we ended our search at a node n representing ABC and
				// we're trying to insert ABD. Then we'll create a new node s for AB,
				// reduce n to just representing C, and insert a new leaf node l to
				// represent D. This allows us to ensure that if n was a leaf, it remains
				// a leaf.
				//
				//   | ABC  ---split--->  | AB
				//   n                    s
				//                     C / \ D
				//                      n   l

				// The node s from the diagram
				SuffixTreeNode *SplitNode =
				insertInternalNode(Active.Node,
				NextNode->StartIdx,
				NextNode->StartIdx + Active.Len - 1,
				FirstChar);

				// Insert the new node representing the new substring into the tree as
				// a child of the split node. This is the node l from the diagram.
				insertLeaf(*SplitNode, EndIdx, LastChar);

				// Make the old node a child of the split node and update its start
				// index. This is the node n from the diagram.
				NextNode->StartIdx += Active.Len;
				NextNode->Parent = SplitNode;
				SplitNode->Children[Str[NextNode->StartIdx]] = NextNode;

				// SplitNode is an internal node, update the suffix link.
				if (NeedsLink)
				NeedsLink->Link = SplitNode;

				NeedsLink = SplitNode;
				}

				// We've added something new to the tree, so there's one less suffix to
				// add.
				SuffixesToAdd--;

				if (Active.Node->isRoot()) {
				if (Active.Len > 0) {
				Active.Len--;
				Active.Idx = EndIdx - SuffixesToAdd + 1;
				}
				} else {
				// Start the next phase at the next smallest suffix.
				Active.Node = Active.Node->Link;
				}
				}

				return SuffixesToAdd;
				}

				/// \brief Return the start index and length of a string which maximizes a
				/// benefit function by traversing the tree depth-first.
				///
				/// Helper function for \p bestRepeatedSubstring.
				///
				/// \param CurrNode The node currently being visited.
				/// \param CurrLen Length of the current string.
				/// \param[out] BestLen Length of the most beneficial substring.
				/// \param[out] MaxBenefit Benefit of the most beneficial substring.
				/// \param[out] BestStartIdx Start index of the most beneficial substring.
				/// \param BenefitFn The function the query should return a maximum string
				/// for.
				void findBest(SuffixTreeNode &CurrNode, size_t CurrLen, size_t &BestLen,
				size_t &MaxBenefit, size_t &BestStartIdx,
				const std::function<unsigned(SuffixTreeNode &, size_t CurrLen)>
				&BenefitFn) {

				if (!CurrNode.IsInTree)
				return;

				// Can we traverse further down the tree?
				if (!CurrNode.isLeaf()) {
				// If yes, continue the traversal.
				for (auto &ChildPair : CurrNode.Children) {
				if (ChildPair.second && ChildPair.second->IsInTree)
				findBest(*ChildPair.second, CurrLen + ChildPair.second->size(),
				BestLen, MaxBenefit, BestStartIdx, BenefitFn);
				}
				} else {
				// We hit a leaf.
				size_t StringLen = CurrLen - CurrNode.size();
				unsigned Benefit = BenefitFn(CurrNode, StringLen);

				// Did we do better than in the last step?
				if (Benefit <= MaxBenefit)
				return;

				// We did better, so update the best string.
				MaxBenefit = Benefit;
				BestStartIdx = CurrNode.SuffixIdx;
				BestLen = StringLen;
				}
				}

				public:

				/// \brief Return a substring of the tree with maximum benefit if such a
				/// substring exists.
				///
				/// Clears the input vector and fills it with a maximum substring or empty.
				///
				/// \param[in,out] Best The most beneficial substring in the tree. Empty
				/// if it does not exist.
				/// \param BenefitFn The function the query should return a maximum string
				/// for.
				void bestRepeatedSubstring(std::vector<unsigned> &Best,
				const std::function<unsigned(SuffixTreeNode &, size_t CurrLen)>
				&BenefitFn) {
				Best.clear();
				size_t Length = 0; // Becomes the length of the best substring.
				size_t Benefit = 0; // Becomes the benefit of the best substring.
				size_t StartIdx = 0; // Becomes the start index of the best substring.
				findBest(*Root, 0, Length, Benefit, StartIdx, BenefitFn);

				for (size_t Idx = 0; Idx < Length; Idx++)
				Best.push_back(Str[Idx + StartIdx]);
				}

				/// Perform a depth-first search for \p QueryString on the suffix tree.
				///
				/// \param QueryString The string to search for.
				/// \param CurrIdx The current index in \p QueryString that is being matched
				/// against.
				/// \param CurrNode The suffix tree node being searched in.
				///
				/// \returns A \p SuffixTreeNode that \p QueryString appears in if such a
				/// node exists, and \p nullptr otherwise.
				SuffixTreeNode *findString(const std::vector<unsigned> &QueryString,
				size_t &CurrIdx, SuffixTreeNode *CurrNode) {

				// The search ended at a nonexistent or pruned node. Quit.
				if (!CurrNode || !CurrNode->IsInTree)
				return nullptr;

				unsigned Edge = QueryString[CurrIdx]; // The edge we want to move on.
				SuffixTreeNode *NextNode = CurrNode->Children[Edge]; // Next node in query.

				if (CurrNode->isRoot()) {
				// If we're at the root we have to check if there's a child, and move to
				// that child. Don't consume the character since \p Root represents the
				// empty string.
				if (NextNode && NextNode->IsInTree)
				return findString(QueryString, CurrIdx, NextNode);
				return nullptr;
				}

				size_t StrIdx = CurrNode->StartIdx;
				size_t MaxIdx = QueryString.size();
				bool ContinueSearching = false;

				// Match as far as possible into the string. If there's a mismatch, quit.
				for (; CurrIdx < MaxIdx; CurrIdx++, StrIdx++) {
				Edge = QueryString[CurrIdx];

				// We matched perfectly, but still have a remainder to search.
				if (StrIdx > *(CurrNode->EndIdx)) {
				ContinueSearching = true;
				break;
				}

				if (Edge != Str[StrIdx])
				return nullptr;
				}

				NextNode = CurrNode->Children[Edge];

				// Move to the node which matches what we're looking for and continue
				// searching.
				if (ContinueSearching)
				return findString(QueryString, CurrIdx, NextNode);

				// We matched perfectly so we're done.
				return CurrNode;
				}

				/// \brief Remove a node from a tree and all nodes representing proper
				/// suffixes of that node's string.
				///
				/// This is used in the outlining algorithm to reduce the number of
				/// overlapping candidates.
				///
				/// \param N The suffix tree node to start pruning from.
				/// \param Len The length of the string to be pruned.
				///
				/// \returns True if this candidate didn't overlap with a previously chosen
				/// candidate.
				bool prune(SuffixTreeNode *N, size_t Len) {

				bool NoOverlap = true;
				std::vector<unsigned> IndicesToPrune;

				// Look at each of N's children.
				for (auto &ChildPair : N->Children) {
				SuffixTreeNode *M = ChildPair.second;

				// Is this a leaf child?
				if (M && M->IsInTree && M->isLeaf()) {
				// Save each leaf child's suffix indices and remove them from the tree.
				IndicesToPrune.push_back(M->SuffixIdx);
				M->IsInTree = false;
				}
				}

				// Remove each suffix we have to prune from the tree. Each of these will be
				// I + some offset for I in IndicesToPrune and some offset < Len.
				unsigned Offset = 1;
				for (unsigned CurrentSuffix = 1; CurrentSuffix < Len; CurrentSuffix++) {
				for (unsigned I : IndicesToPrune) {

				unsigned PruneIdx = I + Offset;

				// Is this index actually in the string?
				if (PruneIdx < LeafVector.size()) {
				// If yes, we have to try and prune it.
				// Was the current leaf already pruned by another candidate?
				if (LeafVector[PruneIdx]->IsInTree) {
				// If not, prune it.
				LeafVector[PruneIdx]->IsInTree = false;
				} else {
				// If yes, signify that we've found an overlap, but keep pruning.
				NoOverlap = false;
				}

				// Update the parent of the current leaf's occurrence count.
				SuffixTreeNode *Parent = LeafVector[PruneIdx]->Parent;

				// Is the parent still in the tree?
				if (Parent->OccurrenceCount > 0) {
				Parent->OccurrenceCount--;
				Parent->IsInTree = (Parent->OccurrenceCount > 1);
				}
				}
				}

				// Move to the next character in the string.
				Offset++;
				}

				// We know we can never outline anything which starts one index back from
				// the indices we want to outline. This is because our minimum outlining
				// length is always 2.
				for (unsigned I : IndicesToPrune) {
				if (I > 0) {

				unsigned PruneIdx = I-1;
				SuffixTreeNode *Parent = LeafVector[PruneIdx]->Parent;

				// Was the leaf one index back from I already pruned?
				if (LeafVector[PruneIdx]->IsInTree) {
				// If not, prune it.
				LeafVector[PruneIdx]->IsInTree = false;
				} else {
				// If yes, signify that we've found an overlap, but keep pruning.
				NoOverlap = false;
				}

				// Update the parent of the current leaf's occurrence count.
				if (Parent->OccurrenceCount > 0) {
				Parent->OccurrenceCount--;
				Parent->IsInTree = (Parent->OccurrenceCount > 1);
				}
				}
				}

				// Finally, remove N from the tree and set its occurrence count to 0.
				N->IsInTree = false;
				N->OccurrenceCount = 0;

				return NoOverlap;
				}

				/// \brief Find each occurrence of \p QueryString in the tree and prune
				/// their nodes.
				///
				/// \param QueryString The string to search for.
				/// \param[out] Occurrences The start indices of each occurrence.
				///
				/// \returns True if \p QueryString was found and none of its occurrences
				/// overlapped a previously chosen candidate; false otherwise.
				bool findOccurrencesAndPrune(const std::vector<unsigned> &QueryString,
				std::vector<size_t> &Occurrences) {
				size_t Dummy = 0;
				SuffixTreeNode *N = findString(QueryString, Dummy, Root);

				if (!N || !N->IsInTree)
				return false;

				// If this is an internal node, occurrences are the number of leaf children
				// of the node.
				for (auto &ChildPair : N->Children) {
				SuffixTreeNode *M = ChildPair.second;

				// Is it a leaf? If so, we have an occurrence.
				if (M && M->IsInTree && M->isLeaf())
				Occurrences.push_back(M->SuffixIdx);
				}

				// If we're in a leaf, then this node is the only occurrence.
				if (N->isLeaf())
				Occurrences.push_back(N->SuffixIdx);

				return prune(N, QueryString.size());
				}

				/// Construct a suffix tree from a sequence of unsigned integers.
				///
				/// \param Str The string to construct the suffix tree for.
				SuffixTree(const std::vector<unsigned> &Str) : Str(Str) {
				Root = insertInternalNode(nullptr, EmptyIdx, EmptyIdx, 0);
				Root->IsInTree = true;
				Active.Node = Root;
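				// Size the vector (not just reserve it): setSuffixIndices() and prune()
				// index into it directly.
				LeafVector.resize(Str.size(), nullptr);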

				// Keep track of the number of suffixes we have to add of the current
				// prefix.
				size_t SuffixesToAdd = 0;
				Active.Node = Root;

				// Construct the suffix tree iteratively on each prefix of the string.
				// PfxEndIdx is the end index of the current prefix.
				// End is one past the last element in the string.
				for (size_t PfxEndIdx = 0, End = Str.size(); PfxEndIdx < End; PfxEndIdx++) {
				SuffixesToAdd++;
				LeafEndIdx = PfxEndIdx; // Extend each of the leaves.
				SuffixesToAdd = extend(PfxEndIdx, SuffixesToAdd);
				}

				// Set the suffix indices of each leaf.
				assert(Root && "Root node can't be nullptr!");
				setSuffixIndices(*Root, 0);
				}
				};
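The `Link` and `OccurrenceCount` comments in `SuffixTreeNode` quote concrete numbers for the pathological string 1 2 3 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3. A brute-force check reproduces them, assuming (as the `OccurrenceCount` comment states) that a node's count is the number of suffixes its string is a prefix of. `countOccurrences` is a hypothetical helper, not part of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of suffixes of Str that begin with Pattern. In a suffix tree this is
// the number of leaves below the node spelling Pattern. O(|Str| * |Pattern|).
std::size_t countOccurrences(const std::vector<unsigned> &Str,
                             const std::vector<unsigned> &Pattern) {
  assert(!Pattern.empty() && "Pattern must be non-empty!");
  std::size_t Count = 0;
  for (std::size_t I = 0; I + Pattern.size() <= Str.size(); ++I) {
    bool Match = true;
    for (std::size_t J = 0; J < Pattern.size() && Match; ++J)
      Match = Str[I + J] == Pattern[J];
    if (Match)
      ++Count;
  }
  return Count;
}
```

Here 2 3 occurs 8 times and 1 2 3 occurs twice. Decrementing the 2 3 count by the two outlined 1 2 3 sequences leaves 6, so that node survives pruning, as the `Link` comment describes.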

				/// \brief An individual sequence of instructions to be replaced with a call to
				/// an outlined function.
				struct Candidate {

				/// Set to false if the candidate overlapped with another candidate.
				bool InCandidateList = true;

				/// The start index of this \p Candidate.
				size_t StartIdx;

				/// The number of instructions in this \p Candidate.
				size_t Len;

				/// The index of this \p Candidate's \p OutlinedFunction in the list of
				/// \p OutlinedFunctions.
				size_t FunctionIdx;

				Candidate(size_t StartIdx, size_t Len, size_t FunctionIdx)
				: StartIdx(StartIdx), Len(Len), FunctionIdx(FunctionIdx) {}

				Candidate() {}

				/// \brief Used to ensure that \p Candidates are outlined in an order that
				/// preserves the start and end indices of other \p Candidates.
				bool operator<(const Candidate &RHS) const { return StartIdx > RHS.StartIdx; }
				};

				/// \brief The information necessary to create an outlined function for some
				/// class of candidate.
				struct OutlinedFunction {

				/// The actual outlined function created.
				/// This is initialized after we go through and create the actual function.
				MachineFunction *MF = nullptr;

				/// A number assigned to this function which appears at the end of its name.
				size_t Name;

				/// The number of times that this function has appeared.
				size_t OccurrenceCount = 0;

				/// \brief The sequence of integers corresponding to the instructions in this
				/// function.
				std::vector<unsigned> Sequence;

				/// The number of instructions this function would save.
				unsigned Benefit = 0;

				OutlinedFunction(size_t Name, size_t OccurrenceCount,
				const std::vector<unsigned> &Sequence,
				unsigned Benefit)
				: Name(Name), OccurrenceCount(OccurrenceCount), Sequence(Sequence),
				Benefit(Benefit)
				{}
				};

				/// \brief Maps \p MachineInstrs to unsigned integers and stores the mappings.
				struct InstructionMapper {

				/// \brief The next available integer to assign to a \p MachineInstr that
				/// cannot be outlined.
				///
				/// Set to -3 for compatibility with \p DenseMapInfo<unsigned>.
				unsigned IllegalInstrNumber = -3;

				/// \brief The next available integer to assign to a \p MachineInstr that can
				/// be outlined.
				unsigned LegalInstrNumber = 0;

				/// Correspondence from \p MachineInstrs to unsigned integers.
				DenseMap<MachineInstr *, unsigned, MachineInstrExpressionTrait>
				InstructionIntegerMap;

				/// Correspondence from unsigned integers to \p MachineInstrs.
				/// Inverse of \p InstructionIntegerMap.
				DenseMap<unsigned, MachineInstr *> IntegerInstructionMap;

				/// The vector of unsigned integers that the module is mapped to.
				std::vector<unsigned> UnsignedVec;

				/// \brief Stores the location of the instruction associated with the integer
				/// at index i in \p UnsignedVec for each index i.
				std::vector<MachineBasicBlock::iterator> InstrList;

				/// \brief Maps \p *It to a legal integer.
				///
				/// Updates \p InstrList, \p UnsignedVec, \p InstructionIntegerMap,
				/// \p IntegerInstructionMap, and \p LegalInstrNumber.
				///
				/// \returns The integer that \p *It was mapped to.
				unsigned mapToLegalUnsigned(MachineBasicBlock::iterator &It) {

				// Get the integer for this instruction or give it the current
				// LegalInstrNumber.
				InstrList.push_back(It);
				MachineInstr &MI = *It;
				bool WasInserted;
				DenseMap<MachineInstr *, unsigned, MachineInstrExpressionTrait>::iterator
				ResultIt;
				std::tie(ResultIt, WasInserted) =
				InstructionIntegerMap.insert(std::make_pair(&MI, LegalInstrNumber));
				unsigned MINumber = ResultIt->second;

				// There was an insertion.
				if (WasInserted) {
				LegalInstrNumber++;
				IntegerInstructionMap.insert(std::make_pair(MINumber, &MI));
				}

				UnsignedVec.push_back(MINumber);

				// Make sure we don't overflow or use any integers reserved by the DenseMap.
				if (LegalInstrNumber >= IllegalInstrNumber)
				report_fatal_error("Instruction mapping overflow!");

				assert(LegalInstrNumber != DenseMapInfo<unsigned>::getEmptyKey()
				&& "Tried to assign DenseMap tombstone or empty key to instruction.");
				assert(LegalInstrNumber != DenseMapInfo<unsigned>::getTombstoneKey()
				&& "Tried to assign DenseMap tombstone or empty key to instruction.");

				return MINumber;
				}

				/// Maps \p *It to an illegal integer.
				///
				/// Updates \p InstrList, \p UnsignedVec, and \p IllegalInstrNumber.
				///
				/// \returns The integer that \p *It was mapped to.
				unsigned mapToIllegalUnsigned(MachineBasicBlock::iterator &It) {
				unsigned MINumber = IllegalInstrNumber;

				InstrList.push_back(It);
				UnsignedVec.push_back(IllegalInstrNumber);
				IllegalInstrNumber--;

				assert(LegalInstrNumber < IllegalInstrNumber &&
				"Instruction mapping overflow!");

				assert(IllegalInstrNumber !=
				DenseMapInfo<unsigned>::getEmptyKey() &&
				"IllegalInstrNumber cannot be DenseMap tombstone or empty key!");

				assert(IllegalInstrNumber !=
				DenseMapInfo<unsigned>::getTombstoneKey() &&
				"IllegalInstrNumber cannot be DenseMap tombstone or empty key!");

				return MINumber;
				}

				/// \brief Transforms a \p MachineBasicBlock into a \p vector of \p unsigneds
				/// and appends it to \p UnsignedVec and \p InstrList.
				///
				/// Two instructions are assigned the same integer if they are identical.
				/// If an instruction is deemed unsafe to outline, then it will be assigned an
				/// unique integer. The resulting mapping is placed into a suffix tree and
				/// queried for candidates.
				///
				/// \param MBB The \p MachineBasicBlock to be translated into integers.
				/// \param TRI \p TargetRegisterInfo for the module.
				/// \param TII \p TargetInstrInfo for the module.
				void convertToUnsignedVec(MachineBasicBlock &MBB,
				const TargetRegisterInfo &TRI,
				const TargetInstrInfo &TII) {
				for (MachineBasicBlock::iterator It = MBB.begin(), Et = MBB.end(); It != Et;
				It++) {

				// Keep track of where this instruction is in the module.
				switch(TII.getOutliningType(*It)) {
				case TargetInstrInfo::MachineOutlinerInstrType::Illegal:
				mapToIllegalUnsigned(It);
				break;

				case TargetInstrInfo::MachineOutlinerInstrType::Legal:
				mapToLegalUnsigned(It);
				break;

				case TargetInstrInfo::MachineOutlinerInstrType::Invisible:
				break;
				}
				}

				// After every instruction has been mapped, uniquely terminate this part of the
				// "string". This makes sure we won't match across basic block or function
				// boundaries since the "end" is encoded uniquely and thus appears in no
				// repeated substring.
				InstrList.push_back(MBB.end());
				UnsignedVec.push_back(IllegalInstrNumber);
				IllegalInstrNumber--;
				}

				InstructionMapper() {
				// Make sure that the implementation of DenseMapInfo<unsigned> hasn't
				// changed.
				assert(DenseMapInfo<unsigned>::getEmptyKey() == (unsigned)-1 &&
				"DenseMapInfo<unsigned>'s empty key isn't -1!");
				assert(DenseMapInfo<unsigned>::getTombstoneKey() == (unsigned)-2 &&
				"DenseMapInfo<unsigned>'s tombstone key isn't -2!");
				}
				};

				/// \brief An interprocedural pass which finds repeated sequences of
				/// instructions and replaces them with calls to functions.
				///
				/// Each instruction is mapped to an unsigned integer and placed in a string.
				/// A \p SuffixTree is built on the resulting string and repeatedly queried
				/// for repeated sequences of instructions. Each non-overlapping repeated
				/// sequence is placed in its own \p MachineFunction, and each instance of it
				/// is replaced with a call to that function.
				struct MachineOutliner : public ModulePass {

				static char ID;

				StringRef getPassName() const override { return "Machine Outliner"; }

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<MachineModuleInfo>();
				AU.addPreserved<MachineModuleInfo>();
				AU.setPreservesAll();
				ModulePass::getAnalysisUsage(AU);
				}

				MachineOutliner() : ModulePass(ID) {
				initializeMachineOutlinerPass(*PassRegistry::getPassRegistry());
				}

				/// \brief Replace the sequences of instructions represented by the
				/// \p Candidates in \p CandidateList with calls to \p MachineFunctions
				/// described in \p FunctionList.
				///
				/// \param M The module we are outlining from.
				/// \param CandidateList A list of candidates to be outlined.
				/// \param FunctionList A list of functions to be inserted into the module.
				/// \param Mapper Contains the instruction mappings for the module.
				bool outline(Module &M, const ArrayRef<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				InstructionMapper &Mapper);

				/// Creates a function for \p OF and inserts it into the module.
				MachineFunction *createOutlinedFunction(Module &M, const OutlinedFunction &OF,
				InstructionMapper &Mapper);

				/// Find potential outlining candidates and store them in \p CandidateList.
				///
				/// For each type of potential candidate, also build an \p OutlinedFunction
				/// struct containing the information to build the function for that
				/// candidate.
				///
				/// \param[out] CandidateList Filled with outlining candidates for the module.
				/// \param[out] FunctionList Filled with functions corresponding to each type
				/// of \p Candidate.
				/// \param ST The suffix tree for the module.
				/// \param TII TargetInstrInfo for the module.
				///
				/// \returns The length of the longest candidate found. 0 if there are none.
				unsigned buildCandidateList(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				SuffixTree &ST, const TargetInstrInfo &TII);

				/// \brief Remove any overlapping candidates that weren't handled by the
				/// suffix tree's pruning method.
				///
				/// Pruning from the suffix tree doesn't necessarily remove all overlaps.
				/// If a short candidate is chosen for outlining and a longer candidate that
				/// has that short candidate as a suffix is chosen afterwards, the tree's
				/// pruning method will not detect the overlap. Thus, we need to prune before
				/// outlining as well.
				///
				/// \param[in,out] CandidateList A list of outlining candidates.
				/// \param[in,out] FunctionList A list of functions to be outlined.
				/// \param MaxCandidateLen The length of the longest candidate.
				/// \param TII TargetInstrInfo for the module.
				void pruneOverlaps(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				unsigned MaxCandidateLen,
				const TargetInstrInfo &TII);

				/// Construct a suffix tree on the instructions in \p M and outline repeated
				/// strings from that tree.
				bool runOnModule(Module &M) override;
				};

				} // Anonymous namespace.

				char MachineOutliner::ID = 0;

				namespace llvm {
				ModulePass *createMachineOutlinerPass() { return new MachineOutliner(); }
				}

				INITIALIZE_PASS(MachineOutliner, "machine-outliner",
				"Machine Function Outliner", false, false)

				void MachineOutliner::pruneOverlaps(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				unsigned MaxCandidateLen,
				const TargetInstrInfo &TII) {

				// Check for overlaps in the range. This is O(n^2) worst case, but we can
				// alleviate that somewhat by bounding our search space using the start
				// index of our first candidate and the maximum distance an overlapping
				// candidate could have from the first candidate.
				for (auto It = CandidateList.begin(), Et = CandidateList.end(); It != Et;
				It++) {
				Candidate &C1 = *It;
				OutlinedFunction &F1 = FunctionList[C1.FunctionIdx];

				// If we removed this candidate, skip it.
				if (!C1.InCandidateList)
				continue;

				// If the candidate's function isn't good to outline anymore, then
				// remove the candidate and skip it.
				if (F1.OccurrenceCount < 2 || F1.Benefit < 1) {
				C1.InCandidateList = false;
				continue;
				}

				// The minimum start index of any candidate that could overlap with this
				// one.
				unsigned FarthestPossibleIdx = 0;

				// Either the index is 0, or it's at most MaxCandidateLen indices away.
				if (C1.StartIdx > MaxCandidateLen)
				FarthestPossibleIdx = C1.StartIdx - MaxCandidateLen;

				// Compare against the other candidates in the list.
				// This is at most MaxCandidateLen/2 other candidates.
				// This is because each candidate has to be at least 2 indices away.
				// = O(n * MaxCandidateLen/2) comparisons
				//
				// On average, the maximum length of a candidate is quite small; a fraction
				// of the total module length in terms of instructions. If the maximum
				// candidate length is large, then there are fewer possible candidates to
				// compare against in the first place.
				for (auto Sit = It + 1; Sit != Et; Sit++) {
				Candidate &C2 = *Sit;
				OutlinedFunction &F2 = FunctionList[C2.FunctionIdx];

				// Is this candidate too far away to overlap?
				// NOTE: This will be true in
				// O(max(FarthestPossibleIdx/2, #Candidates remaining)) steps
				// for every candidate.
				if (C2.StartIdx < FarthestPossibleIdx)
				break;

				// Did we already remove this candidate in a previous step?
				if (!C2.InCandidateList)
				continue;

				// Is the function beneficial to outline?
				if (F2.OccurrenceCount < 2 || F2.Benefit < 1) {
				// If not, remove this candidate and move to the next one.
				C2.InCandidateList = false;
				continue;
				}

				size_t C2End = C2.StartIdx + C2.Len - 1;

				// Do C1 and C2 overlap?
				//
				// Not overlapping:
				// High indices... [C1End ... C1Start][C2End ... C2Start] ...Low indices
				//
				// We sorted our candidate list so C2Start <= C1Start. We know that
				// C2End > C2Start since each candidate has length >= 2. Therefore, all we
				// have to check is whether C2End < C1Start; if so, they don't overlap.
				if (C2End < C1.StartIdx)
				continue;

				// C2 overlaps with C1. Because we pruned the tree already, the only way
				// this can happen is if C1 is a proper suffix of C2. Thus, we must have
				// found C1 first during our query, so it must have benefit greater or
				// equal to C2. Greedily pick C1 as the candidate to keep and toss out C2.
				DEBUG (
				size_t C1End = C1.StartIdx + C1.Len - 1;
				dbgs() << "- Found an overlap to purge.\n";
				dbgs() << "--- C1 :[" << C1.StartIdx << ", " << C1End << "]\n";
				dbgs() << "--- C2 :[" << C2.StartIdx << ", " << C2End << "]\n";
				);

				// Update the function's occurrence count and benefit to reflect that C2
				// is being removed.
				F2.OccurrenceCount--;
				F2.Benefit = TII.getOutliningBenefit(F2.Sequence.size(),
				F2.OccurrenceCount
				);

				// Mark C2 as not in the list.
				C2.InCandidateList = false;

				DEBUG (
				dbgs() << "- Removed C2. \n";
				dbgs() << "--- Num fns left for C2: " << F2.OccurrenceCount << "\n";
				dbgs() << "--- C2's benefit: " << F2.Benefit << "\n";
				);
				}
				}
				}

				unsigned
				MachineOutliner::buildCandidateList(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				SuffixTree &ST,
				const TargetInstrInfo &TII) {

				std::vector<unsigned> CandidateSequence; // Current outlining candidate.
				unsigned MaxCandidateLen = 0; // Length of the longest candidate.

				// Benefit function for the suffix tree query to maximize.
				// This allows us to define more fine-grained types of things to outline in
				// the target without putting target-specific info in the suffix tree.
				auto BenefitFn = [&TII](const SuffixTreeNode &Curr, size_t StringLen) {

				// Any leaf whose parent is the root only has one occurrence.
				if (Curr.Parent->isRoot())
				return 0u;

				// Anything with length < 2 will never be beneficial on any target.
				if (StringLen < 2)
				return 0u;

				size_t Occurrences = Curr.Parent->OccurrenceCount;

				// Anything with fewer than 2 occurrences will never be beneficial on any
				// target.
				if (Occurrences < 2)
				return 0u;

				return TII.getOutliningBenefit(StringLen, Occurrences);
				};

				// Repeatedly query the suffix tree for the substring that maximizes
				// BenefitFn. Find the occurrences of that string, prune the tree, and store
				// each occurrence as a candidate.
				for (ST.bestRepeatedSubstring(CandidateSequence, BenefitFn);
				CandidateSequence.size() > 1;
				ST.bestRepeatedSubstring(CandidateSequence, BenefitFn)) {

				std::vector<size_t> Occurrences;

				bool GotNonOverlappingCandidate =
				ST.findOccurrencesAndPrune(CandidateSequence, Occurrences);

				// Is the candidate we found known to overlap with something we already
				// outlined?
				if (!GotNonOverlappingCandidate)
				continue;

				// Is this candidate the longest so far?
				if (CandidateSequence.size() > MaxCandidateLen)
				MaxCandidateLen = CandidateSequence.size();

				// Keep track of the benefit of outlining this candidate in its
				// OutlinedFunction.
				unsigned FnBenefit = TII.getOutliningBenefit(CandidateSequence.size(),
				Occurrences.size()
				);

				assert(FnBenefit > 0 && "Function cannot be unbeneficial!");

				// Save an OutlinedFunction for this candidate.
				FunctionList.emplace_back(
				FunctionList.size(), // Number of this function.
				Occurrences.size(), // Number of occurrences.
				CandidateSequence, // Sequence to outline.
				FnBenefit // Instructions saved by outlining this function.
				);

				// Save each of the occurrences of the candidate so we can outline them.
				for (size_t &Occ : Occurrences)
				CandidateList.emplace_back(
				Occ, // Starting idx in that MBB.
				CandidateSequence.size(), // Candidate length.
				FunctionList.size() - 1 // Idx of the corresponding function.
				);

				FunctionsCreated++;
				}

				// Sort the candidates in descending order. This will simplify the outlining
				// process when we have to remove the candidates from the mapping by
				// allowing us to cut them out without keeping track of an offset.
				std::stable_sort(CandidateList.begin(), CandidateList.end());

				return MaxCandidateLen;
				}

				MachineFunction *
				MachineOutliner::createOutlinedFunction(Module &M, const OutlinedFunction &OF,
				InstructionMapper &Mapper) {

				// Create the function name. This should be unique within the module; for
				// now, it is just "OUTLINED_FUNCTION_" followed by the number of this
				// function.
				std::ostringstream NameStream;
				NameStream << "OUTLINED_FUNCTION" << "_" << OF.Name;

				// Create the function using an IR-level function.
				LLVMContext &C = M.getContext();
				Function *F = dyn_cast<Function>(
				M.getOrInsertFunction(NameStream.str(), Type::getVoidTy(C), NULL));
				assert(F && "Function was null!");

				// NOTE: If this is linkonceodr, then we can take advantage of linker deduping
				// which gives us better results when we outline from linkonceodr functions.
				F->setLinkage(GlobalValue::PrivateLinkage);
				F->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);

				BasicBlock *EntryBB = BasicBlock::Create(C, "entry", F);
				IRBuilder<> Builder(EntryBB);
				Builder.CreateRetVoid();

				MachineModuleInfo &MMI = getAnalysis<MachineModuleInfo>();
				MachineFunction &MF = MMI.getMachineFunction(*F);
				MachineBasicBlock &MBB = *MF.CreateMachineBasicBlock();
				const TargetSubtargetInfo &STI = MF.getSubtarget();
				const TargetInstrInfo &TII = *STI.getInstrInfo();

				// Insert the new function into the module.
				MF.insert(MF.begin(), &MBB);

				TII.insertOutlinerPrologue(MBB, MF);

				// Copy over the instructions for the function using the integer mappings in
				// its sequence.
				for (unsigned Str : OF.Sequence) {
				MachineInstr *NewMI =
				MF.CloneMachineInstr(Mapper.IntegerInstructionMap.find(Str)->second);
				NewMI->dropMemRefs();

				// Don't keep debug information for outlined instructions.
				// FIXME: This means outlined functions are currently undebuggable.
				NewMI->setDebugLoc(DebugLoc());
				MBB.insert(MBB.end(), NewMI);
				}

				TII.insertOutlinerEpilogue(MBB, MF);

				return &MF;
				}

				bool MachineOutliner::outline(Module &M,
				const ArrayRef<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				InstructionMapper &Mapper) {

				bool OutlinedSomething = false;

				// Replace the candidates with calls to their respective outlined functions.
				for (const Candidate &C : CandidateList) {

				// Was the candidate removed during pruneOverlaps?
				if (!C.InCandidateList)
				continue;

				// If not, then look at its OutlinedFunction.
				OutlinedFunction &OF = FunctionList[C.FunctionIdx];

				// Was its OutlinedFunction made unbeneficial during pruneOverlaps?
				if (OF.OccurrenceCount < 2 || OF.Benefit < 1)
				continue;

				// If not, then outline it.
				assert(C.StartIdx < Mapper.InstrList.size() && "Candidate out of bounds!");
				MachineBasicBlock *MBB = (*Mapper.InstrList[C.StartIdx]).getParent();
				MachineBasicBlock::iterator StartIt = Mapper.InstrList[C.StartIdx];
				unsigned EndIdx = C.StartIdx + C.Len - 1;

				assert(EndIdx < Mapper.InstrList.size() && "Candidate out of bounds!");
				MachineBasicBlock::iterator EndIt = Mapper.InstrList[EndIdx];
				assert(EndIt != MBB->end() && "EndIt out of bounds!");

				EndIt++; // Erase needs one past the end index.

				// Does this candidate have a function yet?
				if (!OF.MF)
				OF.MF = createOutlinedFunction(M, OF, Mapper);

				MachineFunction *MF = OF.MF;
				const TargetSubtargetInfo &STI = MF->getSubtarget();
				const TargetInstrInfo &TII = *STI.getInstrInfo();

				// Insert a call to the new function and erase the old sequence.
				TII.insertOutlinedCall(M, *MBB, StartIt, *MF);
				StartIt = Mapper.InstrList[C.StartIdx];
				MBB->erase(StartIt, EndIt);

				OutlinedSomething = true;

				// Statistics.
				NumOutlined++;
				}

				DEBUG (
				dbgs() << "OutlinedSomething = " << OutlinedSomething << "\n";
				);

				return OutlinedSomething;
				}

				bool MachineOutliner::runOnModule(Module &M) {

				// Is there anything in the module at all?
				if (M.empty())
				return false;

				MachineModuleInfo &MMI = getAnalysis<MachineModuleInfo>();
				const TargetSubtargetInfo &STI = MMI.getMachineFunction(*M.begin())
				.getSubtarget();
				const TargetRegisterInfo *TRI = STI.getRegisterInfo();
				const TargetInstrInfo *TII = STI.getInstrInfo();

				InstructionMapper Mapper;

				// Build instruction mappings for each function in the module.
				for (Function &F : M) {
				MachineFunction &MF = MMI.getMachineFunction(F);

				// Skip the function if it's empty or unsafe to outline from.
				if (F.empty() || !TII->isFunctionSafeToOutlineFrom(MF))
				continue;

				// Otherwise, look at each MachineBasicBlock in the function.
				for (MachineBasicBlock &MBB : MF) {

				// Is there anything in MBB?
				if (MBB.empty())
				continue;

				// If yes, map it.
				Mapper.convertToUnsignedVec(MBB, *TRI, *TII);
				}
				}

				// Construct a suffix tree, use it to find candidates, and then outline them.
				SuffixTree ST(Mapper.UnsignedVec);
				std::vector<Candidate> CandidateList;
				std::vector<OutlinedFunction> FunctionList;

				unsigned MaxCandidateLen =
				buildCandidateList(CandidateList, FunctionList, ST, *TII);

				pruneOverlaps(CandidateList, FunctionList, MaxCandidateLen, *TII);
				return outline(M, CandidateList, FunctionList, Mapper);
				}
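To make the integer mapping in InstructionMapper concrete, here is a minimal standalone sketch. It assumes a hypothetical ToyInstr type (instruction text plus an outlining kind) in place of MachineInstr, and a std::map keyed on that text in place of the DenseMap keyed with MachineInstrExpressionTrait. Identical legal instructions share a number counting up from 0, every illegal instruction gets a fresh number counting down from (unsigned)-3, invisible instructions such as debug values are skipped, and each block ends with a unique terminator so that no repeated substring can cross a block boundary.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for MachineInstr and the target's outlining type.
enum class OutlineKind { Legal, Illegal, Invisible };

struct ToyInstr {
  std::string Text; // e.g. "movl $1, -8(%rbp)"
  OutlineKind Kind;
};

struct ToyMapper {
  // Legal numbers grow up from 0; illegal numbers grow down from (unsigned)-3,
  // leaving (unsigned)-1 and (unsigned)-2 free, as in the pass.
  unsigned LegalNum = 0;
  unsigned IllegalNum = static_cast<unsigned>(-3);
  std::map<std::string, unsigned> InstrToInt; // identical instrs share a number
  std::vector<unsigned> UnsignedVec;          // the "string" for the suffix tree

  void mapBlock(const std::vector<ToyInstr> &Block) {
    for (const ToyInstr &I : Block) {
      switch (I.Kind) {
      case OutlineKind::Illegal:
        // A fresh number means this position can never be part of a repeat.
        UnsignedVec.push_back(IllegalNum--);
        break;
      case OutlineKind::Legal: {
        auto Res = InstrToInt.insert({I.Text, LegalNum});
        if (Res.second) // first time this instruction has been seen
          ++LegalNum;
        UnsignedVec.push_back(Res.first->second);
        break;
      }
      case OutlineKind::Invisible:
        break; // e.g. debug values: left out of the string entirely
      }
    }
    // Unique terminator so that no match can span two basic blocks.
    UnsignedVec.push_back(IllegalNum--);
    assert(LegalNum < IllegalNum && "Instruction mapping overflow!");
  }
};
```

With this scheme, two blocks whose legal instructions match map to the same run of integers even when one of them has a debug value in the middle, which is the behaviour machine-outliner-debuginfo.ll exercises.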

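pruneOverlaps relies on the candidate list being sorted by descending start index, so any candidate that can overlap C1 comes after it and starts at most MaxCandidateLen positions below it. The sketch below reproduces just that loop on plain indices. ToyCandidate and pruneOverlapsSketch are hypothetical names; the per-function bookkeeping is reduced to an occurrence count (the benefit recomputation and the per-C2 benefit check are omitted), and, as in the pass, the first candidate of an overlapping pair is kept.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified version of the pass's Candidate struct.
struct ToyCandidate {
  std::size_t StartIdx;    // first index in the mapped string
  std::size_t Len;         // number of instructions
  std::size_t FunctionIdx; // which outlined function it belongs to
  bool InCandidateList = true;

  ToyCandidate(std::size_t S, std::size_t L, std::size_t F)
      : StartIdx(S), Len(L), FunctionIdx(F) {}

  // Sort by descending start index, like Candidate::operator<.
  bool operator<(const ToyCandidate &RHS) const {
    return StartIdx > RHS.StartIdx;
  }
};

// OccurrenceCount[F] holds the number of live candidates for function F.
void pruneOverlapsSketch(std::vector<ToyCandidate> &Cands,
                         std::vector<std::size_t> &OccurrenceCount,
                         std::size_t MaxCandidateLen) {
  std::stable_sort(Cands.begin(), Cands.end());
  for (auto It = Cands.begin(); It != Cands.end(); ++It) {
    ToyCandidate &C1 = *It;
    if (!C1.InCandidateList)
      continue;
    // A function with fewer than two occurrences is no longer worth outlining.
    if (OccurrenceCount[C1.FunctionIdx] < 2) {
      C1.InCandidateList = false;
      continue;
    }
    // Lowest start index from which a candidate could still reach C1.
    std::size_t Farthest =
        C1.StartIdx > MaxCandidateLen ? C1.StartIdx - MaxCandidateLen : 0;
    for (auto Sit = It + 1; Sit != Cands.end(); ++Sit) {
      ToyCandidate &C2 = *Sit;
      if (C2.StartIdx < Farthest)
        break; // this and every later candidate start too low to overlap
      if (!C2.InCandidateList)
        continue;
      std::size_t C2End = C2.StartIdx + C2.Len - 1;
      if (C2End < C1.StartIdx)
        continue; // C2 ends before C1 starts: no overlap
      // Overlap: keep C1, drop C2, and update C2's function.
      C2.InCandidateList = false;
      --OccurrenceCount[C2.FunctionIdx];
    }
  }
}
```

For example, with a length-3 function at indices 0, 10 and 20 and a length-2 function at 11 and 30, the candidates at 10 and 11 overlap. Index 11 sorts first, so the candidate at 10 is dropped and its function falls to two occurrences, which is still enough to outline it.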
llvm/trunk/lib/CodeGen/TargetPassConfig.cpp

// ... (lines elided)
static cl::opt<bool> PrintISelInput("print-isel-input", cl::Hidden,
cl::desc("Print LLVM IR input to isel pass"));
static cl::opt<bool> PrintGCInfo("print-gc", cl::Hidden,
cl::desc("Dump garbage collector data"));
static cl::opt<bool> VerifyMachineCode("verify-machineinstrs", cl::Hidden,
cl::desc("Verify generated machine code"),
cl::init(false),
cl::ZeroOrMore);
		static cl::opt<bool> EnableMachineOutliner("enable-machine-outliner",
		cl::Hidden,
		cl::desc("Enable machine outliner"));

static cl::opt<std::string>
PrintMachineInstrs("print-machineinstrs", cl::ValueOptional,
cl::desc("Print machine instrs"),
cl::value_desc("pass-name"), cl::init("option-unspecified"));

static cl::opt<int> EnableGlobalISelAbort(
"global-isel-abort", cl::Hidden,
// ... (lines elided; context: void TargetPassConfig::addMachinePasses())
addPass(&LiveDebugValuesID, false);

// Insert before XRay Instrumentation.
addPass(&FEntryInserterID, false);

addPass(&XRayInstrumentationID, false);
addPass(&PatchableFunctionID, false);

		if (EnableMachineOutliner)
		PM->add(createMachineOutlinerPass());

AddingMachinePasses = false;
}

/// Add passes that optimize machine instructions in SSA form.
void TargetPassConfig::addMachineSSAOptimization() {
// Pre-ra tail duplication.
addPass(&EarlyTailDuplicateID);

// ... (lines elided)

llvm/trunk/lib/Target/X86/X86InstrInfo.h

// ... (lines elided; context: public:)
std::pair<unsigned, unsigned>
decomposeMachineOperandsTargetFlags(unsigned TF) const override;

ArrayRef<std::pair<unsigned, const char *>>
getSerializableDirectMachineOperandTargetFlags() const override;

bool isTailCall(const MachineInstr &Inst) const override;

		unsigned getOutliningBenefit(size_t SequenceSize,
		size_t Occurrences) const override;

		bool isFunctionSafeToOutlineFrom(MachineFunction &MF) const override;

		llvm::X86GenInstrInfo::MachineOutlinerInstrType
		getOutliningType(MachineInstr &MI) const override;

		bool isFixablePostOutline(MachineInstr &MI) const;

		void insertOutlinerEpilogue(MachineBasicBlock &MBB,
		MachineFunction &MF) const override;

		void insertOutlinerPrologue(MachineBasicBlock &MBB,
		MachineFunction &MF) const override;

		MachineBasicBlock::iterator
		insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator &It,
		MachineFunction &MF) const override;

protected:
/// Commutes the operands in the given instruction by changing the operands
/// order and/or changing the instruction's opcode and/or the immediate value
/// operand.
///
/// The arguments 'CommuteOpIdx1' and 'CommuteOpIdx2' specify the operands
/// to be commuted.
///
// ... (lines elided)

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp


// ... (lines elided; context: void getAnalysisUsage(AnalysisUsage &AU) const override)
MachineFunctionPass::getAnalysisUsage(AU);
}
};
}

char LDTLSCleanup::ID = 0;
FunctionPass*
llvm::createCleanupLocalDynamicTLSPass() { return new LDTLSCleanup(); }

		unsigned X86InstrInfo::getOutliningBenefit(size_t SequenceSize,
		size_t Occurrences) const {
		unsigned NotOutlinedSize = SequenceSize * Occurrences;

		// Sequence appears once in outlined function (Sequence.size())
		// One return instruction (+1)
		// One call per occurrence (Occurrences)
		unsigned OutlinedSize = (SequenceSize + 1) + Occurrences;

		// Return the number of instructions saved by outlining this sequence.
		return NotOutlinedSize > OutlinedSize ? NotOutlinedSize - OutlinedSize : 0;
		}

		bool X86InstrInfo::isFunctionSafeToOutlineFrom(MachineFunction &MF) const {
		return MF.getFunction()->hasFnAttribute(Attribute::NoRedZone);
		}

		X86GenInstrInfo::MachineOutlinerInstrType
		X86InstrInfo::getOutliningType(MachineInstr &MI) const {

		// Don't outline returns or basic block terminators.
		if (MI.isReturn() || MI.isTerminator())
		return MachineOutlinerInstrType::Illegal;

		// Don't outline anything that modifies or reads from the stack pointer.
		//
		// FIXME: There are instructions which are being manually built without
		// explicit uses/defs so we also have to check the MCInstrDesc. We should be
		// able to remove the extra checks once those are fixed up. For example,
		// sometimes we might get something like %RAX<def> = POP64r 1. This won't be
		// caught by modifiesRegister or readsRegister even though the instruction
		// really ought to be formed so that modifiesRegister/readsRegister would
		// catch it.
		if (MI.modifiesRegister(X86::RSP, &RI) || MI.readsRegister(X86::RSP, &RI) ||
		MI.getDesc().hasImplicitUseOfPhysReg(X86::RSP) ||
		MI.getDesc().hasImplicitDefOfPhysReg(X86::RSP))
		return MachineOutlinerInstrType::Illegal;

		// Don't outline anything that reads, or implicitly uses or defines, RIP.
		if (MI.readsRegister(X86::RIP, &RI) ||
		MI.getDesc().hasImplicitUseOfPhysReg(X86::RIP) ||
		MI.getDesc().hasImplicitDefOfPhysReg(X86::RIP))
		return MachineOutlinerInstrType::Illegal;

		if (MI.isPosition())
		return MachineOutlinerInstrType::Illegal;

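		// Don't outline anything that refers to constant pools, jump tables, CFI
		// indices, frame indices, or target indices.
		for (const MachineOperand &MOP : MI.operands())
		if (MOP.isCPI() || MOP.isJTI() || MOP.isCFIIndex() || MOP.isFI() ||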
		MOP.isTargetIndex())
		return MachineOutlinerInstrType::Illegal;

		// Don't allow debug values to impact outlining type.
		if (MI.isDebugValue() || MI.isIndirectDebugValue())
		return MachineOutlinerInstrType::Invisible;

		return MachineOutlinerInstrType::Legal;
		}

		void X86InstrInfo::insertOutlinerEpilogue(MachineBasicBlock &MBB,
		MachineFunction &MF) const {

		MachineInstr *retq = BuildMI(MF, DebugLoc(), get(X86::RETQ));
		MBB.insert(MBB.end(), retq);
		}

		void X86InstrInfo::insertOutlinerPrologue(MachineBasicBlock &MBB,
		MachineFunction &MF) const {
		return;
		}

		MachineBasicBlock::iterator
		X86InstrInfo::insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator &It,
		MachineFunction &MF) const {
		It = MBB.insert(It,
		BuildMI(MF, DebugLoc(), get(X86::CALL64pcrel32))
		.addGlobalAddress(M.getNamedValue(MF.getName())));
		return It;
		}

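The X86 cost model counts instructions rather than bytes: without outlining, the sequence costs SequenceSize * Occurrences instructions; with outlining, it costs one copy of the sequence plus a return, plus one call per occurrence. A standalone restatement of that arithmetic follows; outliningBenefitSketch is a hypothetical name, and it uses std::size_t for the intermediate values where the hook uses unsigned.

```cpp
#include <cassert>
#include <cstddef>

// Same arithmetic as X86InstrInfo::getOutliningBenefit.
unsigned outliningBenefitSketch(std::size_t SequenceSize,
                                std::size_t Occurrences) {
  // Every occurrence stays inline.
  std::size_t NotOutlinedSize = SequenceSize * Occurrences;
  // One copy in the outlined function, one return, one call per occurrence.
  std::size_t OutlinedSize = (SequenceSize + 1) + Occurrences;
  // Instructions saved, clamped at zero.
  return NotOutlinedSize > OutlinedSize
             ? static_cast<unsigned>(NotOutlinedSize - OutlinedSize)
             : 0u;
}
```

A four-instruction sequence that occurs twice, like the run of stores in machine-outliner-debuginfo.ll, saves 8 - (5 + 2) = 1 instruction, whereas a two-instruction sequence needs at least four occurrences before it saves anything.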
llvm/trunk/test/CodeGen/X86/machine-outliner-debuginfo.ll

				; RUN: llc -enable-machine-outliner -mtriple=x86_64-apple-darwin < %s | FileCheck %s

				@x = global i32 0, align 4, !dbg !0

				define i32 @main() #0 !dbg !11 {
				; CHECK-LABEL: _main:
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4
				; There is a debug value in the middle of this section; make sure debug values are ignored.
				; CHECK: callq l_OUTLINED_FUNCTION_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				call void @llvm.dbg.value(metadata i32 10, i64 0, metadata !15, metadata !16), !dbg !17
				store i32 4, i32* %5, align 4
				store i32 0, i32* @x, align 4, !dbg !24
				; This is the same sequence of instructions without a debug value. It should be outlined
				; in the same way.
				; CHECK: callq l_OUTLINED_FUNCTION_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				store i32 1, i32* @x, align 4, !dbg !14
				ret i32 0, !dbg !25
				}

				; CHECK-LABEL: l_OUTLINED_FUNCTION_0:
				; CHECK-NOT: .loc {{[0-9]+}} {{[0-9]+}} {{[0-9]+}} {{^(is_stmt)}}
				; CHECK-NOT: ##DEBUG_VALUE: main:{{[a-z]}} <- {{[0-9]+}}
				; CHECK: movl $1, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $2, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $3, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $4, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: retq

				declare void @llvm.dbg.declare(metadata, metadata, metadata) #1

				declare void @llvm.dbg.value(metadata, i64, metadata, metadata) #1

				attributes #0 = { noredzone nounwind ssp uwtable "no-frame-pointer-elim"="true" }

				!llvm.dbg.cu = !{!2}
				!llvm.module.flags = !{!7, !8, !9}
				!llvm.ident = !{!10}

				!0 = !DIGlobalVariableExpression(var: !1)
				!1 = distinct !DIGlobalVariable(name: "x", scope: !2, file: !3, line: 2, type: !6, isLocal: false, isDefinition: true)
				!2 = distinct !DICompileUnit(language: DW_LANG_C99, file: !3, producer: "clang version 5.0.0", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !4, globals: !5)
				!3 = !DIFile(filename: "debug-test.c", directory: "dir")
				!4 = !{}
				!5 = !{!0}
				!6 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!7 = !{i32 2, !"Dwarf Version", i32 4}
				!8 = !{i32 2, !"Debug Info Version", i32 3}
				!9 = !{i32 1, !"PIC Level", i32 2}
				!10 = !{!"clang version 5.0.0"}
				!11 = distinct !DISubprogram(name: "main", scope: !3, file: !3, line: 4, type: !12, isLocal: false, isDefinition: true, scopeLine: 4, flags: DIFlagPrototyped, isOptimized: false, unit: !2, variables: !4)
				!12 = !DISubroutineType(types: !13)
				!13 = !{!6}
				!14 = !DILocation(line: 7, column: 4, scope: !11)
				!15 = !DILocalVariable(name: "a", scope: !11, file: !3, line: 5, type: !6)
				!16 = !DIExpression()
				!17 = !DILocation(line: 5, column: 6, scope: !11)
				!18 = !DILocalVariable(name: "b", scope: !11, file: !3, line: 5, type: !6)
				!19 = !DILocation(line: 5, column: 9, scope: !11)
				!20 = !DILocalVariable(name: "c", scope: !11, file: !3, line: 5, type: !6)
				!21 = !DILocation(line: 5, column: 12, scope: !11)
				!22 = !DILocalVariable(name: "d", scope: !11, file: !3, line: 5, type: !6)
				!23 = !DILocation(line: 5, column: 15, scope: !11)
				!24 = !DILocation(line: 14, column: 4, scope: !11)
				!25 = !DILocation(line: 21, column: 2, scope: !11)
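This test checks that a debug value in the middle of a sequence does not stop it from matching the same sequence without one, which follows from `getOutliningType` returning `Invisible` for debug values. A toy sketch of how such a classification can feed matching: map each instruction to an integer, share IDs between identical legal instructions, give every illegal instruction a fresh ID, and drop invisible ones. The names `InstrType`, `classify`, and `InstrMapper` are hypothetical, comparing instructions by their text is a simplification, and treating calls as illegal is an assumption made only to illustrate the third category.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for MachineOutlinerInstrType.
enum class InstrType { Legal, Illegal, Invisible };

// Toy classifier: DBG_VALUE is invisible; treating calls as illegal is an
// assumption for illustration only.
InstrType classify(const std::string &I) {
  if (I.rfind("DBG_VALUE", 0) == 0)
    return InstrType::Invisible;
  if (I.rfind("callq", 0) == 0)
    return InstrType::Illegal;
  return InstrType::Legal;
}

// Maps a block to integers: identical legal instructions share an ID, each
// illegal instruction gets a fresh ID so it never matches anything, and
// invisible instructions are skipped entirely.
struct InstrMapper {
  std::map<std::string, unsigned> LegalIDs;
  unsigned NextLegal = 0;
  unsigned NextIllegal = ~0u;  // count down so the two ranges don't collide

  std::vector<unsigned> mapBlock(const std::vector<std::string> &Block) {
    std::vector<unsigned> Out;
    for (const std::string &I : Block) {
      switch (classify(I)) {
      case InstrType::Invisible:
        break;
      case InstrType::Illegal:
        Out.push_back(NextIllegal--);
        break;
      case InstrType::Legal: {
        auto It = LegalIDs.find(I);
        if (It == LegalIDs.end())
          It = LegalIDs.emplace(I, NextLegal++).first;
        Out.push_back(It->second);
        break;
      }
      }
    }
    return Out;
  }
};
```

With this mapping, `{"movl $1", "movl $2", "DBG_VALUE a", "movl $3"}` and `{"movl $1", "movl $2", "movl $3"}` both become `{0, 1, 2}`, so a repeat search over the integers sees them as the same sequence.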

llvm/trunk/test/CodeGen/X86/machine-outliner.ll

				; RUN: llc -enable-machine-outliner -mtriple=x86_64-apple-darwin < %s | FileCheck %s

				@x = global i32 0, align 4

				define i32 @check_boundaries() #0 {
				; CHECK-LABEL: _check_boundaries:
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4
				store i32 0, i32* %1, align 4
				store i32 0, i32* %2, align 4
				%6 = load i32, i32* %2, align 4
				%7 = icmp ne i32 %6, 0
				br i1 %7, label %9, label %8

				; CHECK: callq l_OUTLINED_FUNCTION_1
				; CHECK: cmpl $0, -{{[0-9]+}}(%rbp)
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				br label %10

				store i32 1, i32* %4, align 4
				br label %10

				%11 = load i32, i32* %2, align 4
				%12 = icmp ne i32 %11, 0
				br i1 %12, label %14, label %13

				; CHECK: callq l_OUTLINED_FUNCTION_1
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				br label %15

				store i32 1, i32* %4, align 4
				br label %15

				ret i32 0
				}

				define i32 @empty_1() #0 {
				; CHECK-LABEL: _empty_1:
				; CHECK-NOT: callq l_OUTLINED_FUNCTION_{{[0-9]+}}
				ret i32 1
				}

				define i32 @empty_2() #0 {
				; CHECK-LABEL: _empty_2:
				; CHECK-NOT: callq l_OUTLINED_FUNCTION_{{[0-9]+}}
				ret i32 1
				}

				define i32 @no_empty_outlining() #0 {
				; CHECK-LABEL: _no_empty_outlining:
				%1 = alloca i32, align 4
				store i32 0, i32* %1, align 4
				; CHECK-NOT: callq l_OUTLINED_FUNCTION_{{[0-9]+}}
				%2 = call i32 @empty_1() #1
				%3 = call i32 @empty_2() #1
				%4 = call i32 @empty_1() #1
				%5 = call i32 @empty_2() #1
				%6 = call i32 @empty_1() #1
				%7 = call i32 @empty_2() #1
				ret i32 0
				}

				define i32 @main() #0 {
				; CHECK-LABEL: _main:
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4

				store i32 0, i32* %1, align 4
				store i32 0, i32* @x, align 4
				; CHECK: callq l_OUTLINED_FUNCTION_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				store i32 1, i32* @x, align 4
				; CHECK: callq l_OUTLINED_FUNCTION_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				ret i32 0
				}

				attributes #0 = { noredzone nounwind ssp uwtable "no-frame-pointer-elim"="true" }

				; CHECK-LABEL: l_OUTLINED_FUNCTION_0:
				; CHECK: movl $1, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $2, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $3, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $4, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: retq

				; CHECK-LABEL: l_OUTLINED_FUNCTION_1:
				; CHECK: movl $1, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $2, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $3, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: movl $4, -{{[0-9]+}}(%rbp)
				; CHECK-NEXT: retq
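In `main`, the four stores to `%2`..`%5` repeat twice, and the outliner replaces each copy with a call. A rough sketch of deciding whether that pays off: search the integer-mapped instruction string for repeated substrings and score each with a simple cost model. Two substitutions are made here and should not be read as the patch's method: a brute-force search stands in for the suffix tree the patch uses, and the instruction-count benefit formula (one call per occurrence plus one `retq`) is a hypothetical simplification. `outliningBenefit`, `Candidate`, and `bestCandidate` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical cost model in instructions: each occurrence becomes one call,
// and the outlined function adds one return.
long outliningBenefit(std::size_t SeqLen, std::size_t Occurrences) {
  long Before = static_cast<long>(SeqLen * Occurrences);
  long After = static_cast<long>(Occurrences /* calls */ + SeqLen + 1 /* ret */);
  return Before - After;
}

struct Candidate {
  std::size_t Start;  // index of the first occurrence
  std::size_t Len;
  std::size_t Count;  // non-overlapping occurrences
  long Benefit;
};

// Brute-force stand-in for the suffix-tree search: tries every substring of
// length >= 2, counts its non-overlapping occurrences, and keeps the one
// with the largest positive benefit.
Candidate bestCandidate(const std::vector<unsigned> &S) {
  Candidate Best = {0, 0, 0, 0};
  for (std::size_t Len = 2; Len <= S.size() / 2; ++Len) {
    std::map<std::vector<unsigned>, std::vector<std::size_t>> Seen;
    for (std::size_t I = 0; I + Len <= S.size(); ++I)
      Seen[std::vector<unsigned>(S.begin() + I, S.begin() + I + Len)]
          .push_back(I);
    for (const auto &Entry : Seen) {
      // Positions are increasing; count greedily, skipping overlaps.
      std::size_t Count = 0, NextFree = 0;
      for (std::size_t Pos : Entry.second)
        if (Pos >= NextFree) {
          ++Count;
          NextFree = Pos + Len;
        }
      long B = outliningBenefit(Len, Count);
      if (Count >= 2 && B > Best.Benefit)
        Best = Candidate{Entry.second.front(), Len, Count, B};
    }
  }
  return Best;
}
```

On a string shaped like `main`, `{9, 1, 2, 3, 4, 8, 1, 2, 3, 4}` (the differing stores to `@x` mapped to 9 and 8), this picks the length-4 sequence starting at index 1 with two occurrences and a benefit of one instruction; shorter repeats break even or lose under this model.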


Revision Contents

Diff 89953

llvm/trunk/include/llvm/CodeGen/Passes.h

llvm/trunk/include/llvm/InitializePasses.h

llvm/trunk/include/llvm/Target/TargetInstrInfo.h

llvm/trunk/lib/CodeGen/CMakeLists.txt

llvm/trunk/lib/CodeGen/CodeGen.cpp

llvm/trunk/lib/CodeGen/MachineOutliner.cpp

llvm/trunk/lib/CodeGen/TargetPassConfig.cpp

llvm/trunk/lib/Target/X86/X86InstrInfo.h

llvm/trunk/lib/Target/X86/X86InstrInfo.cpp

llvm/trunk/test/CodeGen/X86/machine-outliner-debuginfo.ll

llvm/trunk/test/CodeGen/X86/machine-outliner.ll
