This is an archive of the discontinued LLVM Phabricator instance.

Differential D26872

Outliner: Add MIR-level outlining pass
ClosedPublic

Authored by paquette on Nov 18 2016, 3:01 PM.

Details

Reviewers

qcolombet
MatzeB
mkuper
craig.topper

Commits

rGd36410945fc6: Add MIR-level outlining pass
rL296418: Add MIR-level outlining pass

Summary

This is an updated patch for the outliner described in the RFC at: http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html

The outliner is a code-size reduction pass which works by finding repeated sequences of instructions in a program, and replacing them with calls to functions. This would be especially useful to people working in low-memory environments, where sacrificing performance for space is acceptable.

This would add an interprocedural outliner directly before printing assembly. For reference on how this would work, this patch also includes X86 target hooks and an X86 test.

The outliner is run like so:

clang -mno-red-zone -mllvm -enable-machine-outliner file.c

I would love for people to test it out and tell me about how well it works for them, and maybe even play around with the provided target hooks. Tell me what you think!
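The core idea in the summary, finding repeated instruction sequences, can be illustrated on plain integers. The following is a minimal sketch only, assuming each instruction has already been mapped to an `unsigned` id. It sorts suffixes instead of building the suffix tree the patch uses, and `Repeat` and `longestRepeat` are hypothetical names that do not appear in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Where one occurrence of the repeated run starts, and how long the run is.
// Len == 0 means nothing repeats.
struct Repeat {
  std::size_t Start;
  std::size_t Len;
};

// Find the longest sequence that occurs at least twice in Str.
// Occurrences may overlap here; a real outliner must also reject overlaps.
Repeat longestRepeat(const std::vector<unsigned> &Str) {
  std::vector<std::size_t> Suffixes(Str.size());
  std::iota(Suffixes.begin(), Suffixes.end(), 0);
  // Sort suffix start positions lexicographically by suffix contents.
  std::sort(Suffixes.begin(), Suffixes.end(),
            [&](std::size_t A, std::size_t B) {
              return std::lexicographical_compare(
                  Str.begin() + A, Str.end(), Str.begin() + B, Str.end());
            });
  Repeat Best{0, 0};
  // Neighbours in sorted order share the longest common prefixes.
  for (std::size_t I = 1; I < Suffixes.size(); ++I) {
    std::size_t A = Suffixes[I - 1], B = Suffixes[I], Len = 0;
    while (A + Len < Str.size() && B + Len < Str.size() &&
           Str[A + Len] == Str[B + Len])
      ++Len;
    if (Len > Best.Len)
      Best = Repeat{std::min(A, B), Len};
  }
  return Best;
}
```

For the input `{1, 2, 3, 9, 1, 2, 3, 8}` this reports a run of length 3 starting at index 0. Replacing each occurrence with a call and deciding whether that actually saves space are separate steps not shown here.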

Event Timeline

Thanks for the feedback on the previous patches! I've updated the outliner significantly from the previous version. In this version there is...

No suffix tree in ADT: The suffix tree is now a part of MachineOutliner.h

No TerminatedString: Everything is done just using std::vectors now with a couple helper methods.

Slightly different outlining technique: We don't have to keep track of offsets anymore by sorting candidates in descending order rather than ascending order.

Better documentation, comments, etc: I fixed up the Doxygen stuff, went through, and wrote detailed explanations of each function. I also added some FIXMEs and TODOs to help guide further development on the outliner.

Tell me what you think!

It might help to mark all comments that were addressed so far as "Done" in phabricator.

I realized I missed a bunch of stuff on MachineOutliner.cpp/MachineOutliner.h, so I went ahead and picked through the code and fixed a good chunk of it.

Highlights:

No more MachineOutliner.h!
Less gross X86 TargetInstrInfo!
Range-based for loops!
Proper pass initialization!

I'd also like to ask about the best way to handle custom epilogue/prologue insertion. I left in the three separate virtual functions in TargetInstrInfo.h. Does anyone think it'd be better to just smack them all together into one function which handles function call insertion? I feel like that'd be the best way to do it, but I left it as is for now.

Thanks!
Jessica

It would be awesome to have an option that serializes the Collection for use by external tools like Souper:
//std::vector<std::vector<unsigned> *> Collection;

Perhaps replace SuffixTree with SuffixContainer<SuffixTree> and SuffixContainer<SuffixArray>? Suffix arrays are more performant and are trivial to copy. Suffix trees are nice to walk.

Also, think about Vitányi's approximation of Kolmogorov complexity using gzip, https://arxiv.org/pdf/cs/0111054.pdf

Dirty hack, but you could use the nesting of instructions that gzip comes up with.

Hi Jessica,

this is great progress from the last patch! I think we need one more roundtrip until we can commit the code and keep improving it inside the repository (as this is an independent pass it shouldn't affect the rest of the compiler).

For the whole Outliner.cpp file: Internal types and functions should be marked as such. In this case it's probably best to open an anonymous namespace (except for the externally visible functions createOutlinerPass(), and InitializerMachineOutliner())

include/llvm/CodeGen/Passes.h
406	Indent.
include/llvm/Target/TargetInstrInfo.h
1563–1609	You should move this block a bit upwards (We usually try to keep private fields at the beginning or end of a class, this would move them in the middle).
1605	As this is a codegen API passing in a MachineFunction& parameter is more natural. (Implementations can always use MF.getFunction() to get back to the llvm::Function)
lib/CodeGen/MachineOutliner.cpp
2	MachineOutliner.cpp
109	`std::pair<String *, size_t>`?
179–187	Looks like this could be a constructor.
180	This looks like it could just be a constructor on the SuffixTreeNode struct.
190	Should this be called `deleteSuffixTreeSubTree` or similar as it deletes more than 1 node?
206–207	this has no effect
392	can be `size_t size() const`
398	`\param`
417–418	No need to cast `EndIdx`.
469	use a reference instead of a pointer?
477	Could do an early exit: if (MaxHeight == 0) return nullptr;
552	the braces around the cases are unnecessary
583	This can use an early exit so you do not need to indent everything inside the if.
596	This seems to be an impossible case as you already tested for `N != nullptr`.
618–622	This could be for (SuffixTreeNode *T = N->Link; T && T != Root; T = T->Link) T->Valid = false; (It is usually better to go with a for loop instead of `while() { ... increment; }` on principle, because you do not have to remember to duplicate the increment code when you want to use `continue` somewhere inside the loop)
631–632	Maybe do not initialize these to give a hint to the reader that they will be set by `findString()` anyway (If findString() doesn't always set them, then you should make it).
635	Could be an early exit.
648	Use `const StringCollection &Strings`
667	The deleteSuffixTreeNode() implementation already protects against nullptrs.
714	the llvm:: prefix should not be necessary (same for OccBB)
742	You probably only need to put INITIALIZE_PASS and createOutlinerPass() in the llvm namespace.
773	Maybe use `"MIR Function Outlining"` to be in sync with INITIALIZE_PASS. Most llvm pass 'names' read more like short descriptions.
800–802	Could use references instead of pointers.
844–846	This makes no sense.
854	Maybe use `"machine-outliner"` to be in sync with DEBUG_TYPE.
869	Maybe add `assert(CurrLegalInstrMapping < CurrIllegalInstrMapping && "Overflow");` The same at the place where you increment CurrLegalInstrMapping.
876–878	The `find(), if (I != end()) ... else insert()` sequence walks over the data structure in the find() and again in the insert() step. Instead you can simply always `insert()` and check the return value to see whether an element was actually inserted or an existing one reused:
    auto I = map.insert(make_pair(MI, CurrLegalInstrMapping));
    // Newly inserted?
    if (I.second)
      CurrLegalInstrMapping++;
    unsigned MINumber = I.first->second;
896	Use a reference.
991	I don't think the value names need to be kept around.
1020	Can be `begin()` instead of `instr_begin()`.
1025	Should probably document this with something like `// The cloned memory operands reference the old function. Drop them.`
1026	can be `MBB->begin()`
1048–1051	Could use a range based for: for (OutlinedFunction &OF : FunctionList) OF.MF = createOutlinedFunction(M, OF);
1109–1113	Constructing names should not be necessary here. You should be able to add a GlobalValue MachineOperand instead of a Symbol one when you construct the call instruction.
1148–1150	You can move those out of the loop.
1159	There seems to be an unnecessary duplication of the string.
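The insert-and-check idiom suggested for lines 876–878 can be tried outside LLVM. This is a sketch under stated assumptions: `InstrNumbering` and `idFor` are hypothetical names, instructions are modelled as strings, and `std::unordered_map` stands in for whatever map the pass uses.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the outliner's instruction-to-integer table;
// instructions are represented by their textual form for illustration.
struct InstrNumbering {
  std::unordered_map<std::string, unsigned> Ids;
  unsigned NextLegalId = 0;

  // Return the id for Instr, assigning the next free id on first sight.
  unsigned idFor(const std::string &Instr) {
    // insert() performs a single lookup and reports whether it inserted.
    auto Result = Ids.insert({Instr, NextLegalId});
    if (Result.second) // Newly inserted: consume the id.
      ++NextLegalId;
    // Result.first is an iterator to the (key, id) pair, new or existing.
    return Result.first->second;
  }
};
```

Calling `idFor("add")`, `idFor("mov")`, then `idFor("add")` again yields 0, 1, 0, with only two ids consumed.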

Some stylistic comments.

Also, the amount of testing seems small compared to the amount of code being added. Can you beef it up? Especially testing for the handling of "unsafe" cases would be good.

lib/CodeGen/MachineOutliner.cpp
17	A link to your devmtg talk might be good (together with some explanation about how it relates to what is implemented here).
55	This seems to have some nontrivial invariants, so you may want something a bit stronger than a typedef. Maybe a lightweight struct around this would be better? You already have a couple helpers that seem like they could quite naturally be methods of such a struct. I guess the question is: does it ever make sense for a random piece of code that is using this typedef to actually operate on it using the std::vector API? Or would such code inherently risk violating some sort of invariant? If the latter, a lightweight struct encapsulating the underlying vector is likely to be a good choice.
61	This doesn't really make much sense to me. Can you explain this data structure a bit better?
74	This could use a better name.
125	Is there a good online resource you can link to about Ukkonen's algorithm? If this implementation is patterned off of a particular resource that might be good to link to if possible.
127	Small coding standard nit: use static/anonymous namespace on all the helper stuff here and elsewhere: http://llvm.org/docs/CodingStandards.html#anonymous-namespaces
142	I assume "start" was meant here. But "star index" (as in Kleene) sounds vaguely plausible in the context of a string algorithm so this typo is worth fixing.
153	Do you really need a std::map? http://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc Also, if the number of outgoing edges is small, a small-type container is probably better here.
158	As per the comment, maybe call this "is in tree" or "already pruned" or something like that?
164	Can you be a bit more specific about where this shortcut link comes in in Ukkonen's algorithm?
173	Both `Start` and `End` here are described as "index" but one is a pointer. That's worth explaining in the comment. Also, SuffixIndex is also an "index" and gets an "Index" suffix to its variable name. Maybe these should be `StartIndex` and `EndIndex`? Also, other places in this patch use `Idx` for variable names to mean "index". Maybe these should be `StartIdx` etc.?
307	Move the comment inside the braces so that this can use the more common `} else {` style, here and elsewhere.
338	There's quite a few naked new/delete in this code. Can you encapsulate the ownership better? If not, can you centralize the new/delete in helpers and add some comments about lifetime/ownership?
771	Can you hold this by value?
1176	Can you improve the ownership here? E.g. use a std::unique_ptr to manage the lifetime?
lib/CodeGen/TargetPassConfig.cpp
95	What is the official name? "machine outliner" or "MIR outliner". Please be consistent here and elsewhere.
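One way to read the ownership comments on lines 338 and 1176 is to give the tree a single allocation helper and let it own every node. The sketch below is an assumption about how that could look, not the patch's code: `Node`, `Tree`, `allocate`, and `addChild` are hypothetical, and `std::unique_ptr` is used as suggested in the comment on line 1176.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified node: parent and child pointers are non-owning.
struct Node {
  Node *Parent = nullptr;
  std::map<unsigned, Node *> Children;
};

// The tree is the single owner of every node; nodes die with the tree.
class Tree {
  std::vector<std::unique_ptr<Node>> Storage; // Sole owner of all nodes.
  Node *Root = nullptr;

  // The only place where nodes are allocated.
  Node *allocate(Node *Parent) {
    Storage.push_back(std::unique_ptr<Node>(new Node()));
    Storage.back()->Parent = Parent;
    return Storage.back().get();
  }

public:
  Tree() { Root = allocate(nullptr); }
  Tree(const Tree &) = delete;
  Tree &operator=(const Tree &) = delete;

  Node *root() { return Root; }
  std::size_t size() const { return Storage.size(); }

  // Create a child of Parent reached via the edge label Edge.
  Node *addChild(Node &Parent, unsigned Edge) {
    assert(!Parent.Children.count(Edge) && "Edge already present");
    Node *Child = allocate(&Parent);
    Parent.Children[Edge] = Child;
    return Child;
  }
};
```

With this layout there is no `delete` anywhere, and the lifetime rule fits in one sentence: a `Node *` is valid for as long as its `Tree` is alive.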

Hi everyone,

I think I addressed most if not all of the comments from the previous patch; thanks for all of the help with reviewing this! I'm quite happy with how this has been coming along in progressing from intern project to real code.

Major changes:

No use of raw new/delete in the outliner
Tried to decouple the "program strings" from normal strings-- that seemed to be somewhat confusing
New test which also checks if the outliner obeys basic block boundaries
Lots of type changes; lots of things that didn't have to be pointers are no longer pointers
More use of appropriate LLVM data structures for memory management, maps, etc.

Tell me what you think!

Great work!

This round of review feedback is mostly about comment accuracy and some implementation improvements like centralizing/encapsulating the numbering state machine, handling the EndIdx for leaf nodes in a simpler way, and avoiding copying too many vectors around. Also various style/readability nits.

The testing still seems light considering how much functionality this adds. Can we use MIR to test this? Ideally we'd have pretty good coverage of X86InstrInfo::isLegalToOutline
One way to think about this is: suppose we run this on a bunch of programs in the wild and find some bugs in the suffix tree construction algorithm or isLegalToOutline. How are we going to write tests for those fixes?
Since you've done such a nice job keeping SuffixTree separate (with the simple interface based on unsigned) it might even be possible to write a C++ unittest for it. That requires pulling it up into a header though and other headaches/boilerplate that might not be worth it right now.

lib/CodeGen/MachineOutliner.cpp
71	My reading of this comment interprets this as saying that this class e.g. holds a DenseMap of MI's to integers itself, but that is not the case. Can you make this more precise? The term "mapping" is somewhat confusing since in common usage a "mapping" usually denotes a container. But throughout this patch the term "mapping" is used to refer to a "string". I think I get what sense it is meant in (e.g. "this is the mapping of this MBB through our value numbering map"). I can't think of any really good names except for something horribly verbose like "StringOfInstructionNumbers" or something like that. Anyway, you probably want to beef up this comment to describe the sense in which "mapping" is used here and throughout this patch. After staring at the patch for a while the term "mapping" has grown on me, so it's not a big deal.
73	Rather than a generic "this is used for compatibility with the suffix tree", can you be more precise. Something like "Our suffix tree implementation operates on this class" or something? After changing the constructor argument of SuffixTree to `const std::vector<std::vector<unsigned>> &` the only use of this class is as an internal data structure of the SuffixTree, which is an even more precise statement.
77	This description in terms of "hash" vs "unique" doesn't seem accurate. Once you pull out a class to encapsulate the numbering you can just mention that here. As far as users of this class are concerned the numbers are just unique symbols; I don't think you need to go into too much detail besides linking to the place where we assign the numbering.
100	Generally the term "program" is reserved for talking about the final linked executable, but this code (except during LTO) generally operates on a single Module/TU. Can you give this a better name? Maybe just `BlockMappings` or something like that.
144	Why is this comment talking about "2D mapping"? There is just a single index.
203	This comment is pure gold. Nice.
242	Comment on the `+ 1`.
265	Isn't it more that the leaves "represent" suffixes rather than "contain" suffixes?
267	I don't think this is correct. It isn't the leaves representing suffixes per se that facilitates finding repeated substrings, but rather the fact that the internal nodes represent repeated substrings (shared prefixes of the suffixes). You may want to beef up this paragraph on suffix trees a bit to describe the basic invariants a bit more (leaves represent suffixes, internal nodes represent shared prefixes of the suffixes).
270	Same comment as on ProgramMapping. The integers themselves aren't hashes (otherwise this code would have to do something special for collisions).
280	I'm not sure what you are trying to convey with this paragraph. Maybe you can just mention that the implementation maintains parent links (maybe also describe what they are used for). I don't see the big picture for talking about cycles or explicit digraphs here.
289	The RAII behavior of BumpPtrAllocator is pretty well-known, so you don't need to explicitly mention it (that's the beauty of RAII; it just cleans up for you!). This comment can probably be reduced to "Allocator that owns all nodes in the tree".
293	This arrangement of adding an extra layer of indirection for the special "EndIdx" handling for leaf nodes is interesting but after staring at the code for a while it seems like it obscures things (or maybe I'm missing something). It seems that the only thing that needs this extra layer of indirection is during construction in the test `if (StrIdx > *(CurrSuffixTreeNode->EndIdx)) {` which could be replaced by something like `if (StrIdx > (CurrSuffixTreeNode->EndIdx == -1 ? LeafEndIdx : CurrSuffixTreeNode->EndIdx)) {` or similar. To do that, SuffixTreeNode::EndIdx would just be an integer held by value, with -1 being a sentinel indicating that this is a leaf that needs the special handling in that one test.
295	what does the EndIdx even mean for an internal node, since they can be shared? Maybe explain that in the comment of the EndIdx member of SuffixTreeNode?
303	You can use the same BumpPtrAllocator for both of these if you want. It's kind of nice to have a place for these comments though so no biggie.
348	Can `Parent` just be a constructor argument?
367	This assert is inside an `if(ChildPair.second != nullptr)` so probably doesn't buy you much.
548	Nit: move the comment inside so you can use the coding-standard compliant `} else if (...) {`
596	Nit: Putting this `CurrSuffixTreeNode->Children[QueryString[CurrIdx]]` expression (used 3 times) in a variable will make things a bit shorter and also give you an opportunity to give that value a name. E.g. maybe `if (Child && Child->IsInTree) {`?
605	Nit: move this comment inside the `else` so that you have `} else {`.
614	Comment on the `- 1` part of this. It's setting off my off-by-one error spidey sense.
674	You're copying quite a few std::vector's here. Can they be ArrayRef's?
726	This iterates over Strings.MBBMappings. In what sense does it treat Strings as "flat" if it looks at individual substrings? Also, I find it a bit weird that we take a ProgramMapping as input to this constructor, but then all these `append` calls seem to build up a different ProgramMapping. Do the two ProgramMapping's end up being equivalent? It would be nice to see a bit more explanation about this. At least for the purposes of this constructor, maybe a `const std::vector<std::vector<unsigned>> &` is the natural interface because it doesn't use any of the fancy methods on ProgramMapping.
752	It seems a bit weird to me that this class is caring explicitly about the distinction between the "flat" and non-"flat" senses of ProgramMapping. I thought that ProgramMapping was just supposed to encapsulate a 2D ragged array and make it look flat, but here it seems that the external code still cares about the distinction between 2D-ness and flat-ness. Can you make it a bit clearer in the code and comments what ProgramMapping is supposed to represent and what its interactions with the rest of the code are?
798	I don't see much mention of the function Id's in `ProgramMapping`. Looking at the code, it seems like it should be `unsigned` and roughly represents the call instruction that jumps to the outlined function.
827	There are a couple members here related to the legal-instruction/illegal-instruction/function numbering that could stand to be pulled out into an isolated class (which can then be held by value in the pass) separate from the pass boilerplate. Such a class will also be a good place to authoritatively document the numbering scheme and encapsulate it.
944	Small readability nit: use `std::tie(I, WasInserted) = ...` or something like that to make this a bit clearer (this map interface returning the pair is always confusing without that).
1155	It isn't really the "function's id" but rather the id for a call instruction that jumps to it, right?
1156	Nit: remove commented out code.
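The by-value `EndIdx` with a sentinel proposed in the comment on line 293 might look roughly like this. It is a sketch, not the patch's implementation: `SuffixTreeNode`, `StartIdx`, and `EndIdx` are names used in the review, while `LeafEnd`, `endIdx`, and `edgeLength` are hypothetical helpers, and inclusive end indices are an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Sentinel meaning "this node is a leaf; its edge ends wherever the
// construction has currently reached".
constexpr std::size_t LeafEnd = static_cast<std::size_t>(-1);

struct SuffixTreeNode {
  std::size_t StartIdx;
  std::size_t EndIdx; // Held by value; LeafEnd marks a leaf.
};

// Resolve the inclusive end index of the edge into N. CurrentLeafEnd is the
// tree-wide end index shared by all leaves during construction.
std::size_t endIdx(const SuffixTreeNode &N, std::size_t CurrentLeafEnd) {
  return N.EndIdx == LeafEnd ? CurrentLeafEnd : N.EndIdx;
}

// Number of elements on the edge leading into N.
std::size_t edgeLength(const SuffixTreeNode &N, std::size_t CurrentLeafEnd) {
  std::size_t End = endIdx(N, CurrentLeafEnd);
  assert(End >= N.StartIdx && "Edge ends before it starts");
  return End - N.StartIdx + 1;
}
```

Every leaf then grows implicitly as `CurrentLeafEnd` advances, without a pointer to a shared end index, while internal nodes keep a fixed end.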

silvas added inline comments. Jan 10 2017, 4:34 AM

lib/CodeGen/MachineOutliner.cpp
96	Can you add a high-level explanation of why we have a 2D vector in the first place. I.e. why do we need to "pretend" instead of just materializing the flattened vector? One thing that may help make this clearer is encapsulating the mutation of `MBBMappings` a bit better. Right now there seem to be some un-encapsulated mutations, so just looking at the class it's not clear what the elementary operations on it are.
123	It shouldn't be too hard to bring this down to log(N) if necessary, but it surprises me that O(N) is fine when N = number of MBB's in the module (for reference, a typical FullLTO for a codebase ~ the size of clang has 10's of thousands of functions, and probably an order of magnitude more MBB's). Can you add a comment explaining that it takes linear time and why that's fine (or not fine, but fine for now)?
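To illustrate the log(N) lookup mentioned in the comment on line 123, here is a sketch that precomputes block offsets and binary-searches them. `FlatIndex` and `locate` are hypothetical names, and the ragged `std::vector<std::vector<unsigned>>` is only an assumed stand-in for the per-block mappings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical view that makes a ragged 2D array look flat.
class FlatIndex {
  // Offsets[I] is the flat index of the first element of block I;
  // the final entry is the total element count.
  std::vector<std::size_t> Offsets;

public:
  explicit FlatIndex(const std::vector<std::vector<unsigned>> &Blocks) {
    Offsets.push_back(0);
    for (const auto &Block : Blocks)
      Offsets.push_back(Offsets.back() + Block.size());
  }

  std::size_t size() const { return Offsets.back(); }

  // Map a flat index to (block, offset within block) in O(log N).
  std::pair<std::size_t, std::size_t> locate(std::size_t Flat) const {
    assert(Flat < size() && "Flat index out of range");
    // First offset strictly greater than Flat; its predecessor is the block.
    auto It = std::upper_bound(Offsets.begin(), Offsets.end(), Flat);
    std::size_t Block = static_cast<std::size_t>(It - Offsets.begin()) - 1;
    return {Block, Flat - Offsets[Block]};
  }
};
```

Using `std::upper_bound` rather than `std::lower_bound` means empty blocks are skipped naturally: for blocks of sizes 3, 0, and 2, flat index 3 resolves to block 2, offset 0.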

aprantl added inline comments. Jan 12 2017, 8:53 AM

lib/CodeGen/MachineOutliner.cpp
200	Did you get a chance to look into this?

I did a quick run on SPEC CPU 2006 with FullLTO and ran into 3 different assertion failures across various programs: https://reviews.llvm.org/P7954

Here are some basic bugpoint-reduced test cases. Repro with llc -enable-machine-outliner -O3 foo.ll (though I expect these will be sensitive to minor codegen changes; I assume you have access to SPEC so you can reduce them again if needed)

https://reviews.llvm.org/P7957 :

Assertion `!NodePtr->isKnownSentinel()' failed.

https://reviews.llvm.org/P7958 :

Assertion `(isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"' failed.

https://reviews.llvm.org/P7960 :

Assertion `Occurrences.size() > 0 && "Longest repeated substring has no occurrences."' failed.

Also, here is a case that takes a really long time to compile (it does eventually finish) stuck in MachineOutliner::buildCandidateList :
https://reviews.llvm.org/P7959
Bugpoint found this one from the "Occurrences.size() > 0" assertion failure case.

For now, it's probably best to focus on the code review comments. Once the code is in good shape and committed, these sorts of bugs can be incrementally hammered out (tracked in bugzilla, fixed with clear individual patches (with test cases :) )).

include/llvm/CodeGen/Passes.h
407	Nit: comment and function aren't aligned.
include/llvm/Target/TargetInstrInfo.h
1564	Nit: inconsistent indent.
lib/CodeGen/MachineOutliner.cpp
49	Nit: this won't work on case-sensitive file systems.

In case you're wondering, that didn't take that long to reduce to that point. Just have to learn the tools and workflow. We have amazing tools for doing test case reduction and an easily understandable test-suite build system! (mad props to those who made bugpoint; the folks that made test-suite awesome; and LLD's -save-temps; and a bit of -globalopt and -metarenamer to clean things up)

I hear that you are planning to get rid of the ProgramMapping construct which would be my last real stopper.
There is still a lot of nitpicky stuff around.
This time I checked the suffix tree construction algorithm which looks good.
For future patches: Having a DenseMap<unsigned, SuffixTreeNode *> in every node is potentially wasteful. Intuitively I would expect the majority of nodes to only have a small number of entries, so it may be good to measure and explore alternative representations like a dynamically adapting datastructure (i.e. switching from linear list to map depending on number of children).
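The dynamically adapting child container floated above could be prototyped along these lines. Everything here is an assumption for illustration: `ChildTable`, its threshold of 4, and the use of `std::map` are hypothetical, and any real choice would need the measurements the comment asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct Node {
  unsigned Id; // Placeholder payload; only pointers to Node are stored below.
};

// Hypothetical child table: a linear list while small, a map once it grows.
class ChildTable {
  static constexpr std::size_t SmallLimit = 4; // Assumed threshold.
  std::vector<std::pair<unsigned, Node *>> Small;
  std::map<unsigned, Node *> Large;
  bool UsingMap = false;

public:
  // Return the child stored for Key, or nullptr if there is none.
  Node *lookup(unsigned Key) const {
    if (UsingMap) {
      auto It = Large.find(Key);
      return It == Large.end() ? nullptr : It->second;
    }
    for (const auto &Entry : Small)
      if (Entry.first == Key)
        return Entry.second;
    return nullptr;
  }

  // Insert or overwrite the child for Key.
  void set(unsigned Key, Node *Child) {
    if (UsingMap) {
      Large[Key] = Child;
      return;
    }
    for (auto &Entry : Small) {
      if (Entry.first == Key) {
        Entry.second = Child;
        return;
      }
    }
    if (Small.size() < SmallLimit) {
      Small.emplace_back(Key, Child);
      return;
    }
    // Too many children for a linear scan: migrate everything to the map.
    Large.insert(Small.begin(), Small.end());
    Small.clear();
    UsingMap = true;
    Large[Key] = Child;
  }

  std::size_t size() const { return UsingMap ? Large.size() : Small.size(); }
  bool usesMap() const { return UsingMap; }
};
```

Nodes with at most four children never allocate a map, so the common case pays only for a short linear scan.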

include/llvm/CodeGen/Passes.h
406–407	This should probably be `createMachineOutlinerPass()` to be consistent.
include/llvm/Target/TargetInstrInfo.h
1564	Indentation. You should also consider moving these functions towards the other functions so the "private:" part can stay at the end of the class definition.
1576	Indentation
1605	As this is a codegen API I would rather pass a `MachineFunction&`
lib/CodeGen/MachineOutliner.cpp
41	Move this below the #includes so you do not accidentally affect the headers.
64–65	Maybe drop the `Stat` suffix, esp. in a statistic dump that looks superfluous.
173	Suggestion (possibly for later patches): As far as I see it a node is either a leaf or an inner node and never changes its nature. You could make this and the constraints on the End and Children members a bit more obvious when representing this in a type hierarchy (and save a bit of memory):
    struct SuffixTreeNode { bool IsLeaf; ... };
    struct SuffixTreeLeafNode : public SuffixTreeNode { size_t EndIdx; size_t SuffixIdx; };
    struct SuffixTreeInternalNode : SuffixTreeNode { Map<SuffixTreeNode> Children; };
238	can be const.
239–244	Maybe handle the special case early: if (StartIdx == EmptyIdx) return 0; return EndIdx - StartIdx + 1; I assume this is not supposed to be called with EndIdx == EmptyIdx? Add an assert()?
249–251	Interesting to see this packaged up in an own struct instead of just putting the members directly into the SuffixTree class. But doesn't hurt either I guess.
303	It looks like you can use a single allocator for nodes and EndIndexes.
346	Maybe add else assert(EndIdx == EmptyIdx); to make sure callers know what they are doing. An alternative would be to provide different functions for inserting leafs or inner nodes.
364	You can save an indentation level here with if (ChildPair.second == nullptr) continue;
388	As you only have 1 "out" parameter you could simply return the new value instead. At the call site I find `SuffixesToAdd = extend(x, y, SuffixesToAdd);` easier to understand when you do not have to wonder whether a parameter is an "out" parameter. The NeedsLink parameter is nullptr for all callers?
391–392	General note on comments: I would expect comments at this indentation level to talk about properties/situations at that level and not just inside the if. This gets clearer if you formulate the comment as a question, or an if-then statement (or move the comment into the if block):
    // Look at the last character if the current mapping is 0.
    if (Active.Len == 0)
      Active.Idx = EndIdx;

    // Current mapping is 0?
    if (Active.Len == 0) {
      // Look at the last added character.
      Active.Idx = EndIdx;
    }
398	LastChar can be moved further down as it's not used by some paths through the function.
473	Add a comment that you check whether Active.Node is the root or alternatively add a `SuffixTreeNode::isRoot()` function.
484–485	Indentation
491	This should rather be ArrayRef<unsigned>.
513	Add `assert(SuffixesToAdd == 0);`?
518	Maybe `setSuffixIndices(..., /*LabelHeight=*/0)` instead of the local variable so readers do not keep wondering whether it is an "out" parameter.
688	move this into the loop.
804–808	C++ does the right thing for `Name(Name)` etc. so you can drop the `_` suffixes from the parameter names. Similar with some other constructors.
810	You should extend the anonymous namespace to include the MachineOutliner class. The only things that need to be visible to the outside are `initializeMachineOutlinerPass()` (=the stuff coming out of INITIALIZE_PASS) and `createOutlinerPass()`.
833	Because this relies on implementation details of DenseMapInfo, better play it safe with static_assert so the compilation fails if someone decides to change the values: static_assert(DenseMapInfo<unsigned>::getEmptyKey() == (unsigned)-1); static_assert(DenseMapInfo<unsigned>::getTombstoneKey() == (unsigned)-2); that way things also explain themselves and you can get away with a shorter comment. The module pass instance can in theory be reused for multiple programs. So the state here needs to be initialized and cleared in `runOnModule()`.
835	It's the next number to be assigned, isn't it? Same with CurrIllegalInstrMapping.
922	Indentation, MBB can be `const`
953–954	The overflow could (in theory) be triggered by a user and not just by compiler bugs. So could use report_fatal_error() so it stays around in release builds: if (CurrLegalInstrMapping >= CurrIllegalInstrMapping) report_fatal_error("Instruction mapping overflow!");
955	Use `DenseMapInfo<unsigned>::get{Empty|Tombstone}Key()` instead of hardcoding the values.
1009	Could use `emplace_back(OccBB, StartIdxInBB, ...)`
1018	Could use `emplace_back()`.
1051	I think getOrInsertFunction() copies the name and does not take ownership of the passed string so this is an unnecessary copy and a memory leak.
1067–1069	Use references for variables that cannot be `nullptr`.
1104–1105	I think CandidateList and FunctionList can be `const` or better `ArrayRef`.
1214	Move to assignment.
1215–1216	As you have some state such as CurrentFunctionID, CurrLegalMapping in the class anyway, maybe the two vectors and the Worklist can move there as well so you do not need to pass them around? Just need to `clear()` them at the end of the function then.
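The numbering comments on lines 833 to 955 (reserved keys, a check that survives release builds, and resetting state per run) can be combined in one small class. This is a hedged sketch: `Numbering`, `takeLegal`, `takeIllegal`, and `reset` are hypothetical, the reserved values are assumed to be `(unsigned)-1` and `(unsigned)-2` as stated in the review, and an exception stands in for `report_fatal_error()`.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>

// Assumed reserved values, mirroring the (unsigned)-1 and (unsigned)-2 keys
// named in the review; a real implementation would query its map type.
constexpr unsigned EmptyKey = std::numeric_limits<unsigned>::max();
constexpr unsigned TombstoneKey = EmptyKey - 1;
static_assert(EmptyKey == static_cast<unsigned>(-1), "unexpected empty key");
static_assert(TombstoneKey == static_cast<unsigned>(-2),
              "unexpected tombstone key");

// Hypothetical numbering state: legal ids grow up from 0, illegal ids grow
// down from just below the reserved keys, and the two must never meet.
class Numbering {
  unsigned NextLegal = 0;
  unsigned NextIllegal = TombstoneKey - 1;

  void checkOverflow() const {
    // Input-dependent, so this must also fire in release builds. The check
    // is slightly conservative: it refuses once the counters touch.
    if (NextLegal >= NextIllegal)
      throw std::runtime_error("Instruction mapping overflow!");
  }

public:
  unsigned takeLegal() {
    checkOverflow();
    return NextLegal++;
  }

  unsigned takeIllegal() {
    checkOverflow();
    return NextIllegal--;
  }

  // Call at the start of each run so a reused pass instance starts clean.
  void reset() {
    NextLegal = 0;
    NextIllegal = TombstoneKey - 1;
  }
};
```

Keeping both counters in one class gives a single place to document the scheme, which is also what the earlier review round asked for.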

And a few more for the X86 part.

lib/Target/X86/X86InstrInfo.cpp
10403–10405	Heh, in theory every single x86 instruction modifies RIP. But I assume we don't model it like that in LLVM. Either way, restricting this to reads(RIP) should be enough.
10415–10420	Are those tests necessary given that you already throw out operations with FrameIndex operands?
10422	You could use `MachineInstr::isPosition()` instead of checking for `isLabel()` and `isCFIInstruction()`
10425	Better use `const MachineOperand &MOP` to avoid some copying.
lib/Target/X86/X86InstrInfo.h
634	This linebreak seems unnecessary.

pmatos added a subscriber: pmatos. Jan 30 2017, 4:28 AM

aprantl added inline comments. Feb 21 2017, 3:34 PM

lib/CodeGen/MachineOutliner.cpp
200	Ping :-) Note that it is really important to skip over any DBG_VALUE intrinsics while deciding whether to outline a sequence of instructions. Otherwise compiling with -g will produce different code than without, which we generally consider to be a serious bug in the compiler.

Alright, it's been a while, but here's the next version of the outlining patch! As always, thanks to everyone taking the time to read through all this code. This version of the outliner is quite different from the previous one, since I've improved on it a lot since the last patch.

Major changes

More LLVM-ey and most comments addressed!

X86 target won't outline debug instructions anymore. There are other things to think about wrt debug info, which I'm currently working on.

More tests: Right now, MIR tests with the outliner make LLC unhappy, so I wrote a couple IR tests which should be easy enough to transition to MIR.

No more ProgramMapping: instead there's another vector which keeps track of the positions of each instruction in the program. It uses iterators because the delete function on MachineBasicBlocks takes a start and end iterator. (If that makes anyone uncomfortable, I can change it to pointers.)

Improved suffix tree pruning: the previous version was too aggressive and threw out too many candidates. The new version uses a vector of leaves.

No more unnecessary functions: the previous version had a FIXME stating that sometimes the outliner could create unnecessary functions when all of the candidates for a function were removed. Now overlap pruning happens directly before outlining, so functions are created as they're needed.

Outlined functions are now link once ODR: This allows the linker to dedupe outlined functions without LTO.

Mapper class: This class performs the instruction-integer mappings and is passed around the outliner

General suffix tree queries: Different targets will have different benefit functions, and even different types of functions to outline. For example, after this, I have a version of the outliner which supports tail-calling outlined functions. In the interest of keeping target-specific stuff out of the SuffixTree, the target now defines a benefit function, which is then maximized by the DFS query for repeated substrings. I think this will allow for more fine-grained outlining on various targets.
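A target-defined benefit function of the kind described in the last item could be as simple as counting instructions before and after outlining. The cost model below is purely an assumption for illustration: the patch text does not specify one, and `OutliningCosts`, `CallOverhead`, `FrameOverhead`, and `benefit` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-target costs, counted in instructions.
struct OutliningCosts {
  std::size_t CallOverhead;  // Instructions needed at each call site.
  std::size_t FrameOverhead; // Extra instructions in the outlined function.
};

// Instructions saved by outlining a sequence of Length that occurs
// Occurrences times; 0 when outlining would not shrink the code.
std::size_t benefit(std::size_t Length, std::size_t Occurrences,
                    const OutliningCosts &Costs) {
  std::size_t Before = Length * Occurrences;
  std::size_t After =
      Occurrences * Costs.CallOverhead + Length + Costs.FrameOverhead;
  return Before > After ? Before - After : 0;
}
```

Under this model a tail-calling variant would correspond to `FrameOverhead == 0` (no return instruction needed), which makes shorter sequences profitable. A query over the suffix tree would then pick the candidate with the largest value.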

Tell me what you think!

aprantl added inline comments. Feb 23 2017, 11:13 AM

lib/Target/X86/X86InstrInfo.cpp
10412	This is not the right way to do this. We need to skip over DBG_VALUE instructions as if they didn't exist. Otherwise the presence of DBG_VALUEs in the instruction stream will have an effect on the outlining decision, which means that compiling with -g will generate different code than without. Please also be sure to include a testcase that exercises this.
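The requirement can be stated as a property: mapping a block with and without debug instructions must give the same integer string. The sketch below shows that property on a toy model. It does not use the MachineInstr API; `ToyInstr`, `IsDebug`, and `mapBlock` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy instruction: an opcode string plus a debug flag. This only models
// "is this a DBG_VALUE?" and nothing else about real instructions.
struct ToyInstr {
  std::string Text;
  bool IsDebug;
};

// Map a block to its integer string, ignoring debug instructions entirely.
std::vector<unsigned> mapBlock(const std::vector<ToyInstr> &Block,
                               std::unordered_map<std::string, unsigned> &Ids) {
  std::vector<unsigned> Result;
  for (const ToyInstr &I : Block) {
    if (I.IsDebug)
      continue; // Neither numbered nor allowed to break up a sequence.
    // Reuse the existing id, or assign the next one on first sight.
    auto Inserted = Ids.insert({I.Text, static_cast<unsigned>(Ids.size())});
    Result.push_back(Inserted.first->second);
  }
  return Result;
}
```

Treating a debug instruction as "illegal to outline" would instead insert a unique id into the string and split candidate sequences, which is exactly the -g-dependent behaviour the comment warns about.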

Thanks for pushing this, this is coming along nicely!

At this point I don't see any correctness or compiletime problems (with the comments below addressed).
Let's commit this and keep improving it in-tree. This should also make reviewing easier as we can have smaller follow-up patches.

lib/CodeGen/MachineOutliner.cpp
782–783	An instance of this only describes a single outlined function.
800	doxygen `///`
844	I don't think you need to initialize this, it gets overwritten anyway in the next line.
1225–1241	Hmm... This hash doesn't seem collision-free. Someone may well have two files with the same name (maybe in two different projects that they link together later). Of course a collision shouldn't hurt as the linker will compare the contents anyway, but why even bother with a hash then? I think the linker will only try to merge functions with the same name, but the function name (hash) is currently based on the file name, not the contents of the function, so I would expect this not to be helpful in most cases. Maybe stay with the previous internal linkage and try the LinkOnce tricks in a follow-up commit (where it is based on the contents).
lib/Target/X86/X86InstrInfo.cpp
10424–10425	This is surprising; checking the MCInstrDesc should not be necessary. This is most probably a bug somewhere else in codegen, so there is nothing we can do here. However, it'd be good if you could find the time later to create a reproducer and file a PR about it: reading and writing registers without having operands for them looks like a bug waiting to happen elsewhere.
test/CodeGen/X86/machine-outliner-basic.ll
1	Better use `-mtriple=x86_64--` instead of `-march` so we also force the operating system etc.
24	You probably want to force this test to use the same outlined function everywhere. FileCheck allows assigning names and checking for repeated patterns:
	; CHECK: callq [[OUTLINEFUNC:_OUTLINED_FUNCTION[0-9]+_0]]
	...
	; CHECK: callq [[OUTLINEFUNC]]
	...
	; CHECK-LABEL: [[OUTLINEFUNC]]:
test/CodeGen/X86/machine-outliner-bb-boundaries.ll
1	You can probably merge the tests together into a single file as they are all about the same pass and use the same llc flags.
22	I'd remove those standard dumping comments. If you actually care about the label give it a real name, if not the comment shouldn't be necessary either.

This revision is now accepted and ready to land. Feb 23 2017, 4:18 PM

silvas added inline comments. Feb 23 2017, 6:45 PM

lib/CodeGen/MachineOutliner.cpp
1225–1241	"Of course a collision shouldn't hurt as the linker will compare the contents anyway, but why even bother with a hash then?"
	No. linkonce_odr requires that if the name matches then the contents are interchangeable, since one gets selected arbitrarily. So for correctness the hash must be collision-free (see also the discussion in D29512, which also involves finding a stable "name" for the TU).
	Also, I don't see the point of doing this. The linker's content-based deduplication ("ICF") should handle this case without caring about the name. If you want to use the linker's comdat/linkonce (i.e. name-based) deduplication then you can just use the function's contents as the name (mangling away NUL bytes), or a strong hash (collisions are a correctness problem). Presumably, if users are using this pass, then they care about code size and so they are likely to have ICF enabled already. So I don't see the point of doing this linkage trick.
test/CodeGen/X86/machine-outliner-interprocedural.ll
8	The leading underscore here is darwin-specific. Add an explicit triple to avoid this (otherwise non-Darwin bots will break).
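
The "use the function's contents as the name" suggestion in the comment on lines 1225–1241 can be sketched as below. This is only an illustration under stated assumptions: `contentDerivedName` and the `OUTLINED_FUNCTION_` prefix are hypothetical, and the input is assumed to be the already-encoded bytes of the outlined function. Hex encoding is used because it is injective, so two equal names always mean equal contents, and the name can never contain a NUL byte.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: derive a symbol name from the encoded bytes of an
// outlined function. Because the encoding is injective, a name match implies
// a content match, which is what name-based deduplication relies on.
std::string contentDerivedName(const std::vector<std::uint8_t> &Bytes) {
  static const char Hex[] = "0123456789abcdef";
  std::string Name = "OUTLINED_FUNCTION_";
  for (std::uint8_t B : Bytes) {
    Name.push_back(Hex[B >> 4]);   // high nibble
    Name.push_back(Hex[B & 0xF]);  // low nibble
  }
  return Name;
}
```

The obvious cost is name length, which grows with the function body. A strong hash would shorten the names, but then, as noted above, any collision becomes a correctness problem rather than a missed optimization.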

FYI this doesn't build on Linux with gcc

[94/1782] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o
FAILED: lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o 
/usr/bin/c++   -DGTEST_HAS_RTTI=0 -DLLVM_BUILD_GLOBAL_ISEL -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/CodeGen -I/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen -Iinclude -I/usr/local/google/home/silvasean/pg/llvm/llvm/include -fPIC -fvisibility-inlines-hidden -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment -std=c++11 -ffunction-sections -fdata-sections -O2 -g -DNDEBUG    -fno-exceptions -fno-rtti -MD -MT lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o -MF lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o.d -o lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/MachineOutliner.cpp.o -c /usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:171:18: error: enclosing class of constexpr non-static member function ‘bool {anonymous}::SuffixTreeNode::isLeaf() const’ is not a literal type
   constexpr bool isLeaf() const { return SuffixIdx != EmptyIdx; }
                  ^
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:91:8: note: ‘{anonymous}::SuffixTreeNode’ is not literal because:
 struct SuffixTreeNode {
        ^
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:91:8: note:   ‘{anonymous}::SuffixTreeNode’ has a non-trivial destructor
/usr/local/google/home/silvasean/pg/llvm/llvm/lib/CodeGen/MachineOutliner.cpp:174:18: error: enclosing class of constexpr non-static member function ‘bool {anonymous}::SuffixTreeNode::isRoot() const’ is not a literal type
   constexpr bool isRoot() const { return StartIdx == EmptyIdx; }
                  ^
[143/1782] Building CXX object lib/CodeGen/CMakeFiles/LLVMCodeGen.dir/RegAllocGreedy.cpp.o

The compiler is gcc 4.8.4
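
For context on the error: C++11 requires the class of a constexpr non-static member function to be a literal type, and a class with a non-trivial destructor is not one. gcc 4.8 enforces this. The sketch below shows one way around it, simply dropping `constexpr` from the accessors. The field names follow the error log; the `std::map` member is an assumption standing in for whatever gives the real node its non-trivial destructor.

```cpp
#include <cstddef>
#include <map>

// Sentinel index, mirroring the EmptyIdx name seen in the error log.
const std::size_t EmptyIdx = static_cast<std::size_t>(-1);

// Sketch only: the Children member is an assumption. Any member with a
// non-trivial destructor makes the struct a non-literal type.
struct SuffixTreeNode {
  std::map<unsigned, SuffixTreeNode *> Children;
  std::size_t StartIdx = EmptyIdx;
  std::size_t SuffixIdx = EmptyIdx;

  // Plain const member functions: without constexpr, the class does not need
  // to be a literal type, so this compiles as C++11.
  bool isLeaf() const { return SuffixIdx != EmptyIdx; }
  bool isRoot() const { return StartIdx == EmptyIdx; }
};
```

Since these are trivial inline accessors, removing `constexpr` should not cost anything at run time.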

I just tested on SPECCPU2006 (FullLTO) and no assertion failures!

However, 403.gcc and 483.xalancbmk (at least) seem to have a huge compile time slowdown (superlinear behavior?). Some rough numbers comparing LLC runtime:
403.gcc 11s -> 66s
483.xalancbmk 16s -> 144s
(so about 5-10x slowdown of LLC due to the suffix tree)

Most of the time seems to be spent inside buildCandidateList. Sampling a couple stacks it seems like it is stuck in findBest, usually just 1 or 2 stack frames in findBest and so at least the problem isn't that it is recursing too deeply.
I added some printfs to print out the depth and vertex degree of each node in the suffix tree for 483.xalancbmk and I got this: https://reviews.llvm.org/F3114496

So it makes sense that typically one would be only 1 or 2 stack frames deep.

Modulo the pruning that is going on, we seem to do O(N) work in bestRepeatedSubstring once per outlining candidate. Is the pruning effective enough that the sum of all calls to bestRepeatedSubstring doesn't grow out of control? My suspicion is that it isn't, and I think a contrived case like AAABBBCCCDDD... (Assume "A" represents constant-size string large enough to be profitable to outline) will trigger O(N^2) behavior in the number of instructions in the module.
Is it possible to do algorithmically better? (exploiting suffix tree invariants maybe?)

Also, it looks like this pass actually increases (1-5%) text size on all of the SPEC binaries except for 401.bzip2: https://reviews.llvm.org/F3114409

(I double and triple checked and I don't have it switched around; the raw data (doublechecked the labels are right) is: https://reviews.llvm.org/P7971)

Can you please find out why this isn't helping (and in fact is hurting)? Are better heuristics needed? At the very least, the cost function seems like it needs to be amended to take into account the true overheads.

In particular, it seems that the cost function does not take into account that the outlined functions will have some minimum alignment applied to them (or can you mark them as not requiring this alignment? still, it would end up depending on linker placement (alignment of adjacent sections) and such as to how much padding actually is inserted).
On 483.xalancbmk, the suffix tree based outliner finds 2311 functions to outline, and almost all of them are 2 instructions, which is typically less than 16 bytes, which is the minimum alignment that will be imposed (just from looking at the output binary).
A naive approach which just looks for identical runs of outlinable instructions (ignoring substrings) outlines 2391 functions (slightly more). The total benefit is somewhat greater for the suffix tree though at 29379 vs 27994 for the naive approach.
This appears to be due to the outliner finding many more length-2 sequences to outline: https://reviews.llvm.org/F3114690

Overall, it seems like the vast majority of the benefit on 483.xalancbmk is due to extremely short instruction sequences. But if we are going to avoid very short instruction sequences because they actually aren't profitable, then most of the outlinable instructions disappear on this test case (and at a glance, the other SPEC benchmarks are pretty similar). I'd also like to note that this testing is with FullLTO, so it is a best-case scenario for the outliner (whole program visibility to the suffix tree). What kinds of programs does this outliner perform well on?

For reference, here are all the outlined functions from 483.xalancbmk: https://reviews.llvm.org/F3114805

One interesting thing is that they are almost all short sequences of mov instructions. Staring at the code that calls them, it's clear why this is: almost all of the outlined functions in 483.xalancbmk are in sequences like this:

...
  362a1e:       e8 dd 20 0e 00          callq  444b00 <OUTLINED_FUNCTION2637142655534006531_61>
  362a23:       e8 4c e8 09 00          callq  401274 <_ZN11xercesc_2_512XMLBufferMgr13releaseBufferERNS_9XMLBufferE>
...

(FWIW, I tried and IPRA doesn't actually decrease text size much on SPEC with FullLTO)

I.e. what has been outlined is function setup overhead. There are also quite a few outlined functions right before jumps, which are factoring out code sequences like this:

00000000004a2eb0 <OUTLINED_FUNCTION2637142655534006531_2458>:
  4a2eb0:       48 8b 41 18             mov    0x18(%rcx),%rax
  4a2eb4:       48 85 c0                test   %rax,%rax
  4a2eb7:       c3                      retq
  4a2eb8:       0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  4a2ebf:       00
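
To put numbers on the alignment concern, here is a rough byte-level model applied to the listing above. It is a sketch under assumptions, not the patch's cost function: `byteBenefit` and its parameters are hypothetical, the call is taken to be the 5-byte `callq` shown earlier, and the outlined function is assumed to be padded up to a multiple of `Align`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical byte-level model. All sizes are in bytes.
//   SeqBytes:    encoded size of the repeated sequence
//   Occurrences: number of call sites that get rewritten
//   CallBytes:   size of the call replacing each occurrence
//   RetBytes:    size of the return appended to the outlined function
//   Align:       the outlined function is padded to a multiple of this
// Returns the net number of bytes saved; negative means the binary grows.
std::int64_t byteBenefit(std::int64_t SeqBytes, std::int64_t Occurrences,
                         std::int64_t CallBytes, std::int64_t RetBytes,
                         std::int64_t Align) {
  assert(Align > 0 && "alignment must be positive");
  std::int64_t Body = SeqBytes + RetBytes;
  // Round the outlined function up to the next multiple of Align.
  std::int64_t Padded = (Body + Align - 1) / Align * Align;
  return Occurrences * (SeqBytes - CallBytes) - Padded;
}
```

For the function above, the `mov` and `test` encode to 7 bytes and the `retq` to 1, so the 8-byte body is padded to 16. Each call site saves only 7 - 5 = 2 bytes, which means this model needs more than 8 occurrences before the outlined function pays for itself: `byteBenefit(7, 8, 5, 1, 16)` is 0 and `byteBenefit(7, 2, 5, 1, 16)` is -12. With an alignment of 1 the break-even point drops to just over 4 occurrences.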

lib/CodeGen/MachineOutliner.cpp
1031	If I understand what this is doing correctly, it can be easily made less than O(N^2) by sorting ascending by Start and descending by End (SROA does something similar to do efficient overlap calculations).
lib/Target/X86/X86InstrInfo.cpp
10387	This name does not follow the coding standard. Should be `getOutliningBenefit` or something
10400	isFunctionSafeToOutlineFrom
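
The sorting suggestion for line 1031 can be sketched as follows. The code at that line is not visible here, so this assumes it removes overlapping candidate ranges; `Candidate` and `pruneOverlaps` are hypothetical names, and ranges are taken to be half-open `[Start, End)`. After sorting by `Start` ascending and `End` descending, one linear sweep is enough, giving O(N log N) overall.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical candidate: the half-open instruction range [Start, End).
struct Candidate {
  std::size_t Start;
  std::size_t End;
};

// Sort by Start ascending, then End descending, and keep a candidate only if
// it begins at or after the end of the last kept one.
std::vector<Candidate> pruneOverlaps(std::vector<Candidate> Cands) {
  std::sort(Cands.begin(), Cands.end(),
            [](const Candidate &A, const Candidate &B) {
              if (A.Start != B.Start)
                return A.Start < B.Start;
              return A.End > B.End;
            });
  std::vector<Candidate> Kept;
  for (const Candidate &C : Cands)
    if (Kept.empty() || C.Start >= Kept.back().End)
      Kept.push_back(C);
  return Kept;
}
```

This sweep always prefers the earliest (and, on ties, the longest) range. It ignores benefit entirely, so a real implementation would need to decide which of two overlapping candidates is worth keeping.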

Also, this pass will almost surely introduce timing side-channel attacks into cryptography code (code that would otherwise be "constant time" and needs to be for security).

I'm not sure how heavily we care about this security aspect as a community, but I'm slightly wary of having this on by default at any optimization level due to this issue. E.g. a size-constrained program for a secure processing element on a phone recompiles with this option and it silently breaks the security of the entire device. Hopefully the folks programming the secure element have some sort of testing to avoid this or at least have all critical primitives written in asm (or done by a hardware peripheral).

I can't think of any other optimizations we have that would move a program away from being "constant time"; is there any precedent?

In D26872#686175, @silvas wrote:

I'm not sure how heavily we care about this security aspect as a community, but I'm slightly wary of having this on by default at any optimization level due to this issue.

Not that it is on by default right now. Just a concern to keep in mind down the road.

In D26872#686175, @silvas wrote:

Also, this pass will almost surely introduce timing side-channel attacks into cryptography code (code that would otherwise be "constant time" and needs to be for security).

I'm not sure how heavily we care about this security aspect as a community, but I'm slightly wary of having this on by default at any optimization level due to this issue. E.g. a size-constrained program for a secure processing element on a phone recompiles with this option and it silently breaks the security of the entire device. Hopefully the folks programming the secure element have some sort of testing to avoid this or at least have all critical primitives written in asm (or done by a hardware peripheral).

I can't think of any other optimizations we have that would move a program away from being "constant time"; is there any precedent?

?!? This should be true for most compiler transformations. I don't know how these problems are handled in practice but I doubt they enable compiler optimizations. I don't see why we should start this discussion with this particular review.

In D26872#686220, @MatzeB wrote:

?!? This should be true for most compiler transformations. I don't know how these problems are handled in practice but I doubt they enable compiler optimizations. I don't see why we should start this discussion with this particular review.

I agree that we don't want to discuss it in this review (that's why I said "down the road"), but most compiler transformations I can think of remove indirection or otherwise simplify things towards the set of "constant time" instructions (such as elementary reg-reg adds and such). This pass introduces call instructions into arbitrary code (and calls on x86 architecturally write to memory and are subject to branch prediction, etc.). I agree, let's not have this discussion here though.

btw, down the road you may want to have this pass really know in detail the encoded length of each instruction on x86. There are quite a few *single instructions* that would be beneficial from a code size perspective to outline (if the outlined function is set to have alignment of 1). A quick analysis of an LLD binary (which contains all of LLVM linked in for LTO) shows there is over 5% code size savings just from outlining single instructions (since many x86 instructions encode to be larger than a CALL instruction which is 5 bytes). About half of the benefit (so about 2-3% of the total on this test case) comes from instructions that reference the stack via %rsp (mostly zeroing out stack slots), which could still be outlined if the offset was rewritten.

In D26872#686155, @silvas wrote:

... 403.gcc and 483.xalancbmk (at least) seem to have a huge compile time slowdown (superlinear behavior?). Some rough numbers comparing LLC runtime:
403.gcc 11s -> 66s
483.xalancbmk 16s -> 144s
(so about 5-10x slowdown of LLC due to the suffix tree)

Most of the time seems to be spent inside buildCandidateList. Sampling a couple stacks it seems like it is stuck in findBest...
...
Modulo the pruning that is going on, we seem to do O(N) work in bestRepeatedSubstring once per outlining candidate. Is the pruning effective enough that the sum of all calls to bestRepeatedSubstring doesn't grow out of control? My suspicion is that it isn't, and I think a contrived case like AAABBBCCCDDD... (Assume "A" represents constant-size string large enough to be profitable to outline) will trigger O(N^2) behavior in the number of instructions in the module.
Is it possible to do algorithmically better? (exploiting suffix tree invariants maybe?)

Yeah that'd be a nasty case, and it's worth looking into, for sure.

Some quick ideas off the top of my head:

Pre-prune nodes which can never lead to outlining candidates while setting suffix indices. This would still be vulnerable to programs that look like AABBCC, but may improve query time on average.
Keep track of a collection of prospective "Outlining points". During the first traversal, if we find anything beneficial remember where it was. On the next traversal, if we have a next best point, start at that point instead of the root.
Keep track of every beneficial substring, during one O(n) traversal, and prune overlaps choosing the most beneficial ones greedily.
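
The third idea, collecting every beneficial substring in one traversal and then pruning overlaps greedily, might look roughly like this. It is a sketch, not the outliner's code: `Occurrence` and `selectGreedy` are hypothetical, and the benefit attached to each occurrence is assumed to have been computed up front and is not recomputed as occurrences get discarded.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical record for one occurrence of a beneficial substring.
struct Occurrence {
  std::size_t Start;  // index of the first instruction
  std::size_t Length; // number of instructions
  unsigned Benefit;   // benefit of the substring this occurrence belongs to
};

// Greedy selection: visit occurrences from most to least beneficial and keep
// one only if none of its instructions has been claimed already.
// ProgramSize is the total number of mapped instructions.
std::vector<Occurrence> selectGreedy(std::vector<Occurrence> All,
                                     std::size_t ProgramSize) {
  std::stable_sort(All.begin(), All.end(),
                   [](const Occurrence &A, const Occurrence &B) {
                     return A.Benefit > B.Benefit;
                   });
  std::vector<bool> Claimed(ProgramSize, false);
  std::vector<Occurrence> Chosen;
  for (const Occurrence &O : All) {
    if (O.Start + O.Length > ProgramSize)
      continue; // ignore out-of-range input
    bool Free = true;
    for (std::size_t I = O.Start; I < O.Start + O.Length && Free; ++I)
      Free = !Claimed[I];
    if (!Free)
      continue;
    for (std::size_t I = O.Start; I < O.Start + O.Length; ++I)
      Claimed[I] = true;
    Chosen.push_back(O);
  }
  return Chosen;
}
```

One caveat of this shortcut: once some occurrences of a substring lose to a better neighbour, the remaining occurrences may no longer be numerous enough to be profitable, so a follow-up pass that re-checks the benefit per substring would still be needed.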

In D26872#686155, @silvas wrote:

... it seems that the cost function does not take into account that the outlined functions will have some minimum alignment applied to them (or can you mark them as not requiring this alignment? still, it would end up depending on linker placement (alignment of adjacent sections) and such as to how much padding actually is inserted).

I'll have to look into that and see what happens.

In D26872#686155, @silvas wrote:

On 483.xalancbmk, the suffix tree based outliner finds 2311 functions to outline, and almost all of them are 2 instructions, which is typically less than 16 bytes, which is the minimum alignment that will be imposed (just from looking at the output binary).
...
Overall, it seems like the vast majority of the benefit on 483.xalancbmk is due to extremely short instruction sequences. But if we are going to avoid very short instruction sequences because they actually aren't profitable, then most of the outlinable instructions disappear on this test case (and at a glance, the other SPEC benchmarks are pretty similar). I'd also like to note that this testing is with FullLTO, so it is a best-case scenario for the outliner (whole program visibility to the suffix tree).

Did you modify the benefit function to verify that removing length-2 instruction sequences actually removes most candidates? We could have found, say BC as the most beneficial, which would prune out all instances of ABC. There could very well be repeated instances of ABC that would be beneficial to outline. It might be possible to impose a minimum length restriction on x86 without losing all of the candidates.

In D26872#686155, @silvas wrote:

What kinds of programs does this outliner perform well on?

In the test suite, the x86 outliner tended to do well on programs with heavy macro usage or automatically-generated code.

As you found, x86 is a particularly hostile environment for this sort of pass. :) It was just used for a proof of concept and for ease of testing. Most work for this pass should be done for other targets, like, say ARM64.

In D26872#686770, @silvas wrote:

btw, down the road you may want to have this pass really know in detail the encoded length of each instruction on x86. There are quite a few *single instructions* that would be beneficial from a code size perspective to outline (if the outlined function is set to have alignment of 1). A quick analysis of an LLD binary (which contains all of LLVM linked in for LTO) shows there is over 5% code size savings just from outlining single instructions (since many x86 instructions encode to be larger than a CALL instruction which is 5 bytes). About half of the benefit (so about 2-3% of the total on this test case) comes from instructions that reference the stack via %rsp (mostly zeroing out stack slots), which could still be outlined if the offset was rewritten.

I would really love to do this, but I'm not sure if it's possible in LLVM at the moment. If it is, then I'll gladly add it in since I think it's probably one of the main reasons that x86 tests can get larger rather than smaller. The only thing that's (architecturally) tricky is that the target would need to know about the instruction-integer mappings. This could be done by moving the InstructionMapper over to the target, but I'm not sure if that's the best approach. If it's okay to do that, I doubt it'd be too difficult.

Okay, here's the next revision, everyone!

Changes

Debug info is now skipped over as if it doesn't exist
isLegalToOutline->getOutliningType, which returns Legal, Illegal, or Invisible. Invisible is used for instructions which should be ignored.
Combined outliner tests
Added test with debug info
Outlined functions are private again without the wonky "hash"
Style conformance changes, etc.
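
As a rough illustration of how a three-way classification lets debug info be skipped "as if it doesn't exist", consider the sketch below. Everything in it is a stand-in: instructions are plain strings, `classify` and `mapInstructions` are hypothetical, the choice of which opcodes are illegal is arbitrary, and giving each illegal instruction a unique id is an assumption about how such a mapper could keep them out of repeated substrings.

```cpp
#include <map>
#include <string>
#include <vector>

enum class OutliningType { Legal, Illegal, Invisible };

// Illustrative classification only; a real target hook inspects the
// instruction itself rather than a string.
OutliningType classify(const std::string &Inst) {
  if (Inst == "DBG_VALUE")
    return OutliningType::Invisible; // ignored entirely
  if (Inst == "ret" || Inst == "call")
    return OutliningType::Illegal;   // must never be outlined
  return OutliningType::Legal;
}

// Map a stream of instructions to integers. Identical legal instructions
// share an id, every illegal instruction gets a fresh id counting down from
// the top of the range, and invisible instructions produce no id at all.
std::vector<unsigned> mapInstructions(const std::vector<std::string> &Insts) {
  std::map<std::string, unsigned> LegalIds;
  unsigned NextLegal = 0;
  unsigned NextIllegal = static_cast<unsigned>(-1);
  std::vector<unsigned> Mapped;
  for (const std::string &I : Insts) {
    switch (classify(I)) {
    case OutliningType::Invisible:
      break;
    case OutliningType::Illegal:
      Mapped.push_back(NextIllegal--);
      break;
    case OutliningType::Legal: {
      auto It = LegalIds.find(I);
      if (It == LegalIds.end())
        It = LegalIds.emplace(I, NextLegal++).first;
      Mapped.push_back(It->second);
      break;
    }
    }
  }
  return Mapped;
}
```

Because invisible instructions never reach the mapped string, a function compiled with `-g` and the same function compiled without it map to the same integer sequence, which is the property the earlier inline comment asked for.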

My LGTM still stands. Should I commit on your behalf or do you already have access?

aprantl added inline comments. Feb 27 2017, 2:02 PM

lib/Target/X86/X86InstrInfo.cpp
10439	Thanks!
test/CodeGen/X86/machine-outliner-debuginfo.ll
43 ↗	(On Diff #89928)	There should also be a negative check to ensure that no DBG_VALUE is in the outlined function and that no debug locations are attached to the outlined function.

In D26872#687815, @MatzeB wrote:

My LGTM still stands. Should I commit on your behalf or do you already have access?

I don't have commit access yet, so go ahead.

Changes

Added negative test for debug info in machine-outliner-debuginfo.ll. The check allows for is_stmt debug stuff because that seems to be out of my control.

Changes

Realized that the last debug test wasn't sufficient, so I fixed it up. It now handles debug values as well.

Third time is the charm. Edits to the tests. The debug test now *truly* makes sure that debug values don't impact outlining. Also removed some cruft from the tests.

Closed by commit rL296418: Add MIR-level outlining pass (authored by matze). Feb 27 2017, 4:45 PM

This revision was automatically updated to reflect the committed changes.

I went ahead and committed the current state as we believe all immediately actionable things are addressed. And I'd really like us to do further work on it upstream so we don't get a 2000 lines review for every little change.

We do appreciate all the discussion here and hope we can continue on llvm-dev and the upcoming patches.

Nice to see this finally land!

FWIW, I did talk to a security professional inside google (the type of person for which common advice "don't write your own crypto" doesn't apply) and they said that they weren't particularly worried about the transformation done by this pass. Phew!

Revision Contents

Path	Size
include/llvm/CodeGen/Passes.h	4 lines
include/llvm/InitializePasses.h	1 line
include/llvm/Target/TargetInstrInfo.h	48 lines
lib/CodeGen/CMakeLists.txt	1 line
lib/CodeGen/CodeGen.cpp	1 line
lib/CodeGen/MachineOutliner.cpp	1369 lines
lib/CodeGen/TargetPassConfig.cpp	6 lines
lib/Target/X86/X86InstrInfo.h	20 lines
lib/Target/X86/X86InstrInfo.cpp	80 lines
test/CodeGen/X86/machine-outliner-basic.ll	39 lines
test/CodeGen/X86/machine-outliner-bb-boundaries.ll	65 lines
test/CodeGen/X86/machine-outliner-interprocedural.ll	57 lines
test/CodeGen/X86/machine-outliner-nocalls.ll	32 lines

Diff 89535

include/llvm/CodeGen/Passes.h

  /// if available with PysicalRegisterUsageInfo pass.
  FunctionPass *createRegUsageInfoPropPass();

  /// This pass performs software pipelining on machine instructions.
  extern char &MachinePipelinerID;

  /// This pass frees the memory occupied by the MachineFunction.
  FunctionPass *createFreeMachineFunctionPass();

+ /// This pass performs outlining on machine instructions directly before
MatzeB (inline comment, Done): Indent.
+ /// printing assembly.
MatzeB (inline comment, Done): This should probably be `createMachineOutlinerPass()` to be consistent.
silvas (inline comment, Done): Nit: comment and function aren't aligned.
+ ModulePass *createMachineOutlinerPass();
MatzeB (inline comment, Done): no newline.
  } // End llvm namespace

  /// Target machine pass initializer for passes with dependencies. Use with
  /// INITIALIZE_TM_PASS_END.
  #define INITIALIZE_TM_PASS_BEGIN INITIALIZE_PASS_BEGIN

  /// Target machine pass initializer for passes with dependencies. Use with
  /// INITIALIZE_TM_PASS_BEGIN.

include/llvm/InitializePasses.h

  void initializeMachineCopyPropagationPass(PassRegistry&);
  void initializeMachineDominanceFrontierPass(PassRegistry&);
  void initializeMachineDominatorTreePass(PassRegistry&);
  void initializeMachineFunctionPrinterPassPass(PassRegistry&);
  void initializeMachineLICMPass(PassRegistry&);
  void initializeMachineLoopInfoPass(PassRegistry&);
  void initializeMachineModuleInfoPass(PassRegistry&);
  void initializeMachineOptimizationRemarkEmitterPassPass(PassRegistry&);
+ void initializeMachineOutlinerPass(PassRegistry&);
  void initializeMachinePipelinerPass(PassRegistry&);
  void initializeMachinePostDominatorTreePass(PassRegistry&);
  void initializeMachineRegionInfoPassPass(PassRegistry&);
  void initializeMachineSchedulerPass(PassRegistry&);
  void initializeMachineSinkingPass(PassRegistry&);
  void initializeMachineTraceMetricsPass(PassRegistry&);
  void initializeMachineVerifierPassPass(PassRegistry&);
  void initializeMemCpyOptLegacyPassPass(PassRegistry&);

include/llvm/Target/TargetInstrInfo.h

Show First 20 Lines • Show All 1,502 Lines • ▼ Show 20 Lines	public:

/// True if the instruction is bound to the top of its basic block and no		/// True if the instruction is bound to the top of its basic block and no
/// other instructions shall be inserted before it. This can be implemented		/// other instructions shall be inserted before it. This can be implemented
/// to prevent register allocator to insert spills before such instructions.		/// to prevent register allocator to insert spills before such instructions.
virtual bool isBasicBlockPrologue(const MachineInstr &MI) const {		virtual bool isBasicBlockPrologue(const MachineInstr &MI) const {
return false;		return false;
}		}

		/// \brief Return how many instructions would be saved by outlining a
		/// sequence containing \p SequenceSize instructions that appears
		/// \p Occurrences times in a module.
		virtual unsigned outliningBenefit(size_t SequenceSize, size_t Occurrences) const {
		llvm_unreachable(
		"Target didn't implement TargetInstrInfo::outliningBenefit!");
		}

		/// Return true if the instruction is legal to outline.
		virtual bool isLegalToOutline(MachineInstr &MI) const {
		llvm_unreachable(
		"Target didn't implement TargetInstrInfo::isLegalToOutline!");
		}

		/// Insert a custom epilogue for outlined functions.
		/// This may be empty, in which case no epilogue or return statement will be
		/// emitted.
		virtual void insertOutlinerEpilogue(MachineBasicBlock &MBB,
		MachineFunction &MF) const {
		llvm_unreachable(
		"Target didn't implement TargetInstrInfo::insertOutlinerEpilogue!");
		}

		/// Insert a call to an outlined function into the program.
		/// Returns an iterator to the spot where we inserted the call. This must be
		/// implemented by the target.
		virtual MachineBasicBlock::iterator
		insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator &It, MachineFunction &MF) const {
		llvm_unreachable(
		"Target didn't implement TargetInstrInfo::insertOutlinedCall!");
		}

		/// Insert a custom prologue for outlined functions.
		/// This may be empty, in which case no prologue will be emitted.
		virtual void insertOutlinerPrologue(MachineBasicBlock &MBB,
		MachineFunction &MF) const {
		llvm_unreachable(
		"Target didn't implement TargetInstrInfo::insertOutlinerPrologue!");
		}

		/// Return true if the function can safely be outlined from.
		/// By default, this means that the function has no red zone.
		virtual bool functionIsSafeToOutlineFrom(MachineFunction &F) const {
		llvm_unreachable("Target didn't implement "
		"TargetInstrInfo::functionIsSafeToOutlineFrom!");
		}

private:		private:
unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;		unsigned CallFrameSetupOpcode, CallFrameDestroyOpcode;
unsigned CatchRetOpcode;		unsigned CatchRetOpcode;
unsigned ReturnOpcode;		unsigned ReturnOpcode;
};		};

		MatzeBUnsubmitted Done Reply Inline Actions indentation. You should also consider to move these functions towards the other functions so the "private:" part can stay at the end of the class definition. MatzeB: indentation. You should also consider to move these functions towards the other functions so…
		silvasUnsubmitted Done Reply Inline Actions Nit: inconsistent indent. silvas: Nit: inconsistent indent.
/// \brief Provide DenseMapInfo for TargetInstrInfo::RegSubRegPair.		/// \brief Provide DenseMapInfo for TargetInstrInfo::RegSubRegPair.
		MatzeBUnsubmitted Done Reply Inline Actions Don't repeat function name, similar in other functions. MatzeB: Don't repeat function name, similar in other functions.
template<>		template<>
struct DenseMapInfo<TargetInstrInfo::RegSubRegPair> {		struct DenseMapInfo<TargetInstrInfo::RegSubRegPair> {
typedef DenseMapInfo<unsigned> RegInfo;		typedef DenseMapInfo<unsigned> RegInfo;

static inline TargetInstrInfo::RegSubRegPair getEmptyKey() {		static inline TargetInstrInfo::RegSubRegPair getEmptyKey() {
return TargetInstrInfo::RegSubRegPair(RegInfo::getEmptyKey(),		return TargetInstrInfo::RegSubRegPair(RegInfo::getEmptyKey(),
RegInfo::getEmptyKey());		RegInfo::getEmptyKey());
}		}
static inline TargetInstrInfo::RegSubRegPair getTombstoneKey() {		static inline TargetInstrInfo::RegSubRegPair getTombstoneKey() {
		MatzeBUnsubmitted Done Reply Inline Actions `MBB` can probably be a reference. MatzeB: `MBB` can probably be a reference.
return TargetInstrInfo::RegSubRegPair(RegInfo::getTombstoneKey(),		return TargetInstrInfo::RegSubRegPair(RegInfo::getTombstoneKey(),
RegInfo::getTombstoneKey());		RegInfo::getTombstoneKey());
		MatzeBUnsubmitted Done Reply Inline Actions unnecessary MatzeB: unnecessary
		MatzeBUnsubmitted Not Done Reply Inline Actions Indentation MatzeB: Indentation
}		}
/// \brief Reuse getHashValue implementation from		/// \brief Reuse getHashValue implementation from
/// std::pair<unsigned, unsigned>.		/// std::pair<unsigned, unsigned>.
static unsigned getHashValue(const TargetInstrInfo::RegSubRegPair &Val) {		static unsigned getHashValue(const TargetInstrInfo::RegSubRegPair &Val) {
std::pair<unsigned, unsigned> PairVal =		std::pair<unsigned, unsigned> PairVal =
std::make_pair(Val.Reg, Val.SubReg);		std::make_pair(Val.Reg, Val.SubReg);
return DenseMapInfo<std::pair<unsigned, unsigned>>::getHashValue(PairVal);		return DenseMapInfo<std::pair<unsigned, unsigned>>::getHashValue(PairVal);
		MatzeBUnsubmitted Done Reply Inline Actions This probably won't work inside bundles so can use `MachineBasicBlock::iterator`. MatzeB: This probably won't work inside bundles so can use `MachineBasicBlock::iterator`.
}		}
static bool isEqual(const TargetInstrInfo::RegSubRegPair &LHS,		static bool isEqual(const TargetInstrInfo::RegSubRegPair &LHS,
const TargetInstrInfo::RegSubRegPair &RHS) {		const TargetInstrInfo::RegSubRegPair &RHS) {
		MatzeB (Done): Use references for things that cannot be nullptr. Similar in other functions.
return RegInfo::isEqual(LHS.Reg, RHS.Reg) &&		return RegInfo::isEqual(LHS.Reg, RHS.Reg) &&
RegInfo::isEqual(LHS.SubReg, RHS.SubReg);		RegInfo::isEqual(LHS.SubReg, RHS.SubReg);
}		}
};		};

} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TARGET_TARGETINSTRINFO_H		#endif // LLVM_TARGET_TARGETINSTRINFO_H
		MatzeB (Done): unnecessary
		MatzeB (Done): We probably have to leave this to the targets. The ones without a concept of a red zone probably never have NoRedZone set.
		MatzeB (Not Done): As this is a codegen API passing in a MachineFunction& parameter is more natural. (Implementations can always use MF.getFunction() to get back to the llvm::Function)
		MatzeB (Done): As this is a codegen API I would rather pass a `MachineFunction&`
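
For readers unfamiliar with the four hooks that the `DenseMapInfo` specialization in TargetInstrInfo.h provides, the following is a minimal standalone sketch of the same shape. It does not use LLVM: `RegSubRegPairSketch` and `PairInfoSketch` are hypothetical names, the sentinel values are assumed to mirror the "all ones" and "all ones minus one" keys of `DenseMapInfo<unsigned>`, and the hash combiner is an arbitrary illustrative choice rather than the one LLVM uses for `std::pair`.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical stand-in for TargetInstrInfo::RegSubRegPair.
struct RegSubRegPairSketch {
  unsigned Reg;
  unsigned SubReg;
};

// Hypothetical traits struct with the same four hooks as the diff.
struct PairInfoSketch {
  // Marks a bucket that has never held a key. Must never be a real key.
  static RegSubRegPairSketch getEmptyKey() { return {~0u, ~0u}; }

  // Marks a bucket whose key was erased. Must differ from the empty key
  // and must also never be a real key.
  static RegSubRegPairSketch getTombstoneKey() { return {~0u - 1, ~0u - 1}; }

  // Combines the hashes of both fields (assumed combiner, not LLVM's).
  static std::size_t getHashValue(const RegSubRegPairSketch &Val) {
    std::size_t H = std::hash<unsigned>()(Val.Reg);
    std::size_t G = std::hash<unsigned>()(Val.SubReg);
    return H ^ (G + 0x9e3779b9u + (H << 6) + (H >> 2));
  }

  // Two keys are equal only if both fields match.
  static bool isEqual(const RegSubRegPairSketch &LHS,
                      const RegSubRegPairSketch &RHS) {
    return LHS.Reg == RHS.Reg && LHS.SubReg == RHS.SubReg;
  }
};
```

The point of the two sentinels is that an open-addressing table can tell "never used" from "used, then erased" without a separate flag per bucket.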

lib/CodeGen/CMakeLists.txt

... (68 lines not shown)	add_llvm_library(LLVMCodeGen
MachineFunctionPrinterPass.cpp		MachineFunctionPrinterPass.cpp
MachineInstrBundle.cpp		MachineInstrBundle.cpp
MachineInstr.cpp		MachineInstr.cpp
MachineLICM.cpp		MachineLICM.cpp
MachineLoopInfo.cpp		MachineLoopInfo.cpp
MachineModuleInfo.cpp		MachineModuleInfo.cpp
MachineModuleInfoImpls.cpp		MachineModuleInfoImpls.cpp
MachineOptimizationRemarkEmitter.cpp		MachineOptimizationRemarkEmitter.cpp
		MachineOutliner.cpp
MachinePassRegistry.cpp		MachinePassRegistry.cpp
MachinePipeliner.cpp		MachinePipeliner.cpp
MachinePostDominators.cpp		MachinePostDominators.cpp
MachineRegionInfo.cpp		MachineRegionInfo.cpp
MachineRegisterInfo.cpp		MachineRegisterInfo.cpp
MachineScheduler.cpp		MachineScheduler.cpp
MachineSink.cpp		MachineSink.cpp
MachineSSAUpdater.cpp		MachineSSAUpdater.cpp
... (56 lines not shown)	add_llvm_library(LLVMCodeGen
TargetRegisterInfo.cpp		TargetRegisterInfo.cpp
TargetSchedule.cpp		TargetSchedule.cpp
TargetSubtargetInfo.cpp		TargetSubtargetInfo.cpp
TwoAddressInstructionPass.cpp		TwoAddressInstructionPass.cpp
UnreachableBlockElim.cpp		UnreachableBlockElim.cpp
VirtRegMap.cpp		VirtRegMap.cpp
WinEHPrepare.cpp		WinEHPrepare.cpp
XRayInstrumentation.cpp		XRayInstrumentation.cpp

		MatzeB (Done): insert alphabetically.
ADDITIONAL_HEADER_DIRS		ADDITIONAL_HEADER_DIRS
${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen		${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen
${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen/PBQP		${LLVM_MAIN_INCLUDE_DIR}/llvm/CodeGen/PBQP

LINK_LIBS ${LLVM_PTHREAD_LIB}		LINK_LIBS ${LLVM_PTHREAD_LIB}

DEPENDS		DEPENDS
intrinsics_gen		intrinsics_gen
)		)

add_subdirectory(SelectionDAG)		add_subdirectory(SelectionDAG)
add_subdirectory(AsmPrinter)		add_subdirectory(AsmPrinter)
add_subdirectory(MIRParser)		add_subdirectory(MIRParser)
add_subdirectory(GlobalISel)		add_subdirectory(GlobalISel)

lib/CodeGen/CodeGen.cpp

... (50 lines not shown)	void llvm::initializeCodeGen(PassRegistry &Registry) {
initializeImplicitNullChecksPass(Registry);		initializeImplicitNullChecksPass(Registry);
initializeMachineCombinerPass(Registry);		initializeMachineCombinerPass(Registry);
initializeMachineCopyPropagationPass(Registry);		initializeMachineCopyPropagationPass(Registry);
initializeMachineDominatorTreePass(Registry);		initializeMachineDominatorTreePass(Registry);
initializeMachineFunctionPrinterPassPass(Registry);		initializeMachineFunctionPrinterPassPass(Registry);
initializeMachineLICMPass(Registry);		initializeMachineLICMPass(Registry);
initializeMachineLoopInfoPass(Registry);		initializeMachineLoopInfoPass(Registry);
initializeMachineModuleInfoPass(Registry);		initializeMachineModuleInfoPass(Registry);
		initializeMachineOutlinerPass(Registry);
initializeMachinePipelinerPass(Registry);		initializeMachinePipelinerPass(Registry);
initializeMachinePostDominatorTreePass(Registry);		initializeMachinePostDominatorTreePass(Registry);
initializeMachineRegionInfoPassPass(Registry);		initializeMachineRegionInfoPassPass(Registry);
initializeMachineSchedulerPass(Registry);		initializeMachineSchedulerPass(Registry);
initializeMachineSinkingPass(Registry);		initializeMachineSinkingPass(Registry);
initializeMachineVerifierPassPass(Registry);		initializeMachineVerifierPassPass(Registry);
initializeXRayInstrumentationPass(Registry);		initializeXRayInstrumentationPass(Registry);
initializePatchableFunctionPass(Registry);		initializePatchableFunctionPass(Registry);
... (34 lines not shown)

lib/CodeGen/MachineOutliner.cpp

This file was added.

				//===---- MachineOutliner.cpp - Outline instructions -----------*- C++ -*-===//
				//
				MatzeB (Done): MachineOutliner.cpp
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// Replaces repeated sequences of instructions with function calls.
				MatzeB (Done): `/// \file ...`
				///
				/// This works by placing every instruction from every basic block in a
				/// suffix tree, and repeatedly querying that tree for repeated sequences of
				/// instructions. If a sequence of instructions appears often, then it ought
				/// to be beneficial to pull out into a function.
				MatzeB (Not Done): Should have INITIALIZE_PASS somewhere (so we can test it with llc -run-pass).
				///
				silvas (Done): A link to your devmtg talk might be good (together with some explanation about how it relates to what is implemented here).
				/// This was originally presented at the 2016 LLVM Developers' Meeting in the
				/// talk "Reducing Code Size Using Outlining". For a high-level overview of
				/// how this pass works, the talk is available on YouTube at
				///
				/// https://www.youtube.com/watch?v=yorld-WSOeU
				///
				/// The slides for the talk are available at
				///
				/// http://www.llvm.org/devmtg/2016-11/Slides/Paquette-Outliner.pdf
				///
				/// The talk provides an overview of how the outliner finds candidates and
				/// ultimately outlines them. It describes how the main data structure for this
				MatzeB (Done): Use range based for. Use begin()/end(); instr_begin()/instr_end() will move into machine instruction bundles, which is probably not wanted here. A few similar cases follow.
				/// pass, the suffix tree, is queried and purged for candidates. It also gives
				/// a simplified suffix tree construction algorithm for suffix trees based off
				aprantl (Done): The LLVM coding guidelines prefer full sentences with a trailing "." at the end.
				MatzeB (Done): Comments should be full sentences (ending in a dot). Similar in the following comments.
				/// of the algorithm actually used here, Ukkonen's algorithm.
				///
				/// For the original RFC for this pass, please see
				///
				/// http://lists.llvm.org/pipermail/llvm-dev/2016-August/104170.html
				///
				/// For more information on the suffix tree data structure, please see
				/// https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
				///
				//===----------------------------------------------------------------------===//
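
The file comment above describes the core idea: treat the module as one long string of instruction numbers and look for sequences that repeat. As a rough illustration of what "how often does a sequence occur" means, here is a brute-force sketch. It is deliberately not a suffix tree and is far slower than the pass; `countOccurrences` and `pathologicalString` are hypothetical names, and the input is the example string used later in the `Link` comment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts every (possibly overlapping) position at
// which Pattern occurs in Str. Quadratic time; for illustration only.
std::size_t countOccurrences(const std::vector<unsigned> &Str,
                             const std::vector<unsigned> &Pattern) {
  if (Pattern.empty() || Pattern.size() > Str.size())
    return 0;
  std::size_t Count = 0;
  for (std::size_t I = 0; I + Pattern.size() <= Str.size(); ++I) {
    bool Match = true;
    for (std::size_t J = 0; J < Pattern.size(); ++J) {
      if (Str[I + J] != Pattern[J]) {
        Match = false;
        break;
      }
    }
    if (Match)
      ++Count;
  }
  return Count;
}

// The example string from the Link comment: 1 2 3 1 2 3 followed by six
// repetitions of 2 3.
std::vector<unsigned> pathologicalString() {
  return {1, 2, 3, 1, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3, 2, 3};
}
```

On that string, `2 3` occurs 8 times and `1 2 3` occurs twice, which matches the occurrence counts quoted in the `Link` comment. The suffix tree exists to answer the same question without rescanning the string for every candidate.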
				MatzeB (Done): Move this below the #includes so you do not accidentally affect the headers.
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/Statistic.h"
				#include "llvm/ADT/Twine.h"
				#include "llvm/CodeGen/MachineFrameInfo.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineModuleInfo.h"
				#include "llvm/CodeGen/Passes.h"
				silvas (Done): Nit: this won't work on case-sensitive file systems.
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/Support/Allocator.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				MatzeB (Done): You could use `make_pair(...)`
				#include "llvm/Target/TargetInstrInfo.h"
				#include "llvm/Target/TargetMachine.h"
				silvas (Done): This seems to have some nontrivial invariants, so you may want something a bit stronger than a typedef. Maybe a lightweight struct around this would be better? You already have a couple of helpers that seem like they could quite naturally be methods of such a struct. I guess the question is: does it ever make sense for a random piece of code that is using this typedef to actually operate on it using the std::vector API? Or would such code inherently risk violating some sort of invariant? If the latter, a lightweight struct encapsulating the underlying vector is likely to be a good choice.
				#include "llvm/Target/TargetRegisterInfo.h"
				#include "llvm/Target/TargetSubtargetInfo.h"
				#include <functional>
				#include <map>
				#include <sstream>
				#include <tuple>
				silvas (Done): This doesn't really make much sense to me. Can you explain this data structure a bit better?
				#include <vector>

				#define DEBUG_TYPE "machine-outliner"
				MatzeB (Done): pass basic types as value.

				MatzeB (Done): Maybe drop the `Stat` suffix, esp. in a statistic dump that looks superfluous.
				using namespace llvm;

				STATISTIC(NumOutlined, "Number of candidates outlined");
				STATISTIC(FunctionsCreated, "Number of functions created");

				namespace {
				silvas (Done): My reading of this comment interprets this as saying that this class e.g. holds a DenseMap of MI's to integers itself, but that is not the case. Can you make this more precise? The term "mapping" is somewhat confusing since in common usage a "mapping" usually denotes a container. But throughout this patch the term "mapping" is used to refer to a "string". I think I get what sense it is meant in (e.g. "this is the mapping of this MBB through our value numbering map"). I can't think of any really good names except for something horribly verbose like "StringOfInstructionNumbers" or something like that. Anyway, you probably want to beef up this comment to describe the sense in which "mapping" is used here and throughout this patch. After staring at the patch for a while the term "mapping" has grown on me, so it's not a big deal.

				/// Represents an undefined index in the suffix tree.
				silvas (Done): Rather than a generic "this is used for compatibility with the suffix tree", can you be more precise? Something like "Our suffix tree implementation operates on this class" or something? After changing the constructor argument of SuffixTree to `const std::vector<std::vector<unsigned>> &` the only use of this class is as an internal data structure of the SuffixTree, which is an even more precise statement.
				const size_t EmptyIdx = -1;
				silvas (Done): This could use a better name.

				/// A node in a suffix tree which represents a substring or suffix.
				///
				silvas (Done): This description in terms of "hash" vs "unique" doesn't seem accurate. Once you pull out a class to encapsulate the numbering you can just mention that here. As far as users of this class are concerned the numbers are just unique symbols; I don't think you need to go into too much detail besides linking to the place where we assign the numbering.
				/// Each node has either no children or at least two children, with the root
				/// being an exception in the empty tree.
				///
				/// Children are represented as a map between unsigned integers and nodes. If
				/// a node N has a child M on unsigned integer k, then the mapping represented
				/// by N is a proper prefix of the mapping represented by M. Note that this,
				/// although similar to a trie, is somewhat different: each node stores a full
				/// substring of the full mapping rather than a single character state.
				///
				/// Each internal node contains a pointer to the internal node representing
				/// the same string, but with the first character chopped off. This is stored
				/// in \p Link. Each leaf node stores the start index of its respective
				/// suffix in \p SuffixIdx.
				struct SuffixTreeNode {

				/// The children of this node.
				MatzeB (Done): If I see this correctly, uses of this variable could be replaced with `FunctionList.size()`?
				///
				/// A child existing on an unsigned integer implies that from the mapping
				/// represented by the current node, there is a way to reach another
				silvas (Done): Can you add a high-level explanation of why we have a 2D vector in the first place? I.e. why do we need to "pretend" instead of just materializing the flattened vector? One thing that may help make this clearer is encapsulating the mutation of `MBBMappings` a bit better. Right now there seem to be some un-encapsulated mutations, so just looking at the class it's not clear what the elementary operations on it are.
				/// mapping by tacking that character on the end of the current string.
				DenseMap<unsigned, SuffixTreeNode *> Children;

				/// A flag set to false if the node has been pruned from the tree.
				silvas (Done): Generally the term "program" is reserved for talking about the final linked executable, but this code (except during LTO) generally operates on a single Module/TU. Can you give this a better name? Maybe just `BlockMappings` or something like that.
				bool IsInTree = true;

				/// The start index of this node's substring in the main string.
				size_t StartIdx = EmptyIdx;

				/// The end index of this node's substring in the main string.
				///
				MatzeB (Done): maybe repeat the type instead of `auto` to make it reader friendly.
				/// Every leaf node must have its \p EndIdx incremented at the end of every
				/// step in the construction algorithm. To avoid having to update O(N)
				MatzeB (Done): `std::pair<String *, size_t>`?
				/// nodes individually at the end of every step, the end index is stored
				/// as a pointer.
				size_t *EndIdx = nullptr;

				/// For leaves, the start index of the suffix represented by this node.
				///
				/// For all other nodes, this is ignored.
				size_t SuffixIdx = EmptyIdx;

				/// \brief For internal nodes, a pointer to the internal node representing
				/// the same sequence with the first character chopped off.
				///
				/// This has two major purposes in the suffix tree. The first is as a
				/// shortcut in Ukkonen's construction algorithm. One of the things that
				silvas (Done): It shouldn't be too hard to bring this down to log(N) if necessary, but it surprises me that O(N) is fine when N = number of MBB's in the module (for reference, a typical FullLTO for a codebase ~ the size of clang has 10's of thousands of functions, and probably an order of magnitude more MBB's). Can you add a comment explaining that it takes linear time and why that's fine (or not fine, but fine for now)?
				/// Ukkonen's algorithm does to achieve linear-time construction is
				/// keep track of which node the next insert should be at. This makes each
				silvas (Done): Is there a good online resource you can link to about Ukkonen's algorithm? If this implementation is patterned off of a particular resource that might be good to link to if possible.
				/// insert O(1), and there are a total of O(N) inserts. The suffix link
				/// helps with inserting children of internal nodes.
				silvas (Done): Small coding standard nit: use static/anonymous namespace on all the helper stuff here and elsewhere: http://llvm.org/docs/CodingStandards.html#anonymous-namespaces
				///
				/// Say we add a child to an internal node with associated mapping S. The
				/// next insertion must be at the node representing S - its first character.
				/// This is given by the way that we iteratively build the tree in Ukkonen's
				/// algorithm. The main idea is to look at the suffixes of each prefix in the
				/// string, starting with the longest suffix of the prefix, and ending with
				/// the shortest. Therefore, if we keep pointers between such nodes, we can
				/// move to the next insertion point in O(1) time. If we don't, then we'd
				/// have to query from the root, which takes O(N) time. This would make the
				/// construction algorithm O(N^2) rather than O(N).
				MatzeB (Done): Could this create nondeterminism? (Candidate::operator< looks like a partial ordering to me.) Possibly use `stable_sort` or, better, enhance operator< to a full ordering.
				///
				/// The suffix link is also used during the tree pruning process to let us
				/// quickly throw out a bunch of potential overlaps. Say we have a sequence
				/// S we want to outline. Then each of its suffixes contribute to at least
				/// one overlapping case. Therefore, we can follow the suffix links
				silvas (Done): I assume "start" was meant here. But "star index" (as in Kleene) sounds vaguely plausible in the context of a string algorithm so this typo is worth fixing.
				/// starting at the node associated with S to the root and "delete" those
				/// nodes, save for the root. For each candidate, this removes
				silvas (Done): Why is this comment talking about "2D mapping"? There is just a single index.
				/// O(\|candidate\|) overlaps from the search space. We don't actually
				/// completely invalidate these nodes though; doing that is far too
				/// aggressive. Consider the following pathological string:
				///
				/// 1 2 3 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3
				///
				/// If we, for the sake of example, outlined 1 2 3, then we would throw
				/// out all instances of 2 3. This isn't desirable. To get around this,
				/// when we visit a link node, we decrement its occurrence count by the
				silvas (Done): Do you really need a std::map? http://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc Also, if the number of outgoing edges is small, a small-type container is probably better here.
				/// number of sequences we outlined in the current step. In the pathological
				MatzeB (Not Done): You can probably use a Twine here to avoid some of the heap allocations.
				/// example, the 2 3 node would have an occurrence count of 8, while the
				/// 1 2 3 node would have an occurrence count of 2. Thus, the 2 3 node
				/// would survive to the next round allowing us to outline the extra
				/// instances of 2 3.
				silvas (Done): As per the comment, maybe call this "is in tree" or "already pruned" or something like that?
				SuffixTreeNode *Link = nullptr;

				/// The parent of this node. Every node except for the root has a parent.
				SuffixTreeNode *Parent = nullptr;

				/// The number of times this node's string appears in the tree.
				silvas (Done): Can you be a bit more specific about where this shortcut link comes in in Ukkonen's algorithm?
				///
				/// This is equal to the number of leaf children of the string. It represents
				/// the number of suffixes that the node's string is a prefix of.
				size_t OccurrenceCount = 0;

				/// Returns true if this node is a leaf.
				constexpr bool isLeaf() const { return SuffixIdx != EmptyIdx; }

				/// Returns true if this node is the root of its owning \p SuffixTree.
				silvas (Done): Both `Start` and `End` here are described as "index" but one is a pointer. That's worth explaining in the comment. Also, SuffixIndex is also an "index" and gets an "Index" suffix to its variable name. Maybe these should be `StartIndex` and `EndIndex`? Also, other places in this patch use `Idx` for variable names to mean "index". Maybe these should be `StartIdx` etc.?
				MatzeB (Not Done): Suggestion (possibly for later patches): As far as I see it a node is either a leaf or an inner node and never changes its nature. You could make this and the constraints on the End and Children members a bit more obvious when representing this in a type hierarchy (and save a bit of memory): `struct SuffixTreeNode { bool IsLeaf; ... }; struct SuffixTreeLeafNode : public SuffixTreeNode { size_t EndIdx; size_t SuffixIdx; }; struct SuffixTreeInternalNode : SuffixTreeNode { Map<SuffixTreeNode> Children; };`
				constexpr bool isRoot() const { return StartIdx == EmptyIdx; }

				/// Return the number of elements in the substring associated with this node.
				size_t size() const {

				// Is it the root? If so, it's the empty string so return 0.
				if (isRoot())
				MatzeB (Done): This looks like it could just be a constructor on the SuffixTreeNode struct.
				return 0;

				assert(*EndIdx != EmptyIdx && "EndIdx is undefined!");

				// Size = the number of elements in the string.
				// For example, [0 1 2 3] has length 4, not 3. 3-0 = 3, so we have 3-0+1.
				return *EndIdx - StartIdx + 1;
				MatzeB (Done): Looks like this could be a constructor.
				}

				SuffixTreeNode(size_t StartIdx, size_t *EndIdx, SuffixTreeNode *Link,
				MatzeB (Done): Should this be called `deleteSuffixTreeSubTree` or similar as it deletes more than 1 node?
				SuffixTreeNode *Parent)
				: StartIdx(StartIdx), EndIdx(EndIdx), Link(Link), Parent(Parent) {}
				MatzeB (Not Done): can be push_front().

				SuffixTreeNode() {}
				};
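
The `EndIdx` comments explain why the end index is held by pointer: every leaf's substring grows by one element per construction step, so all leaves point at one shared counter instead of each being updated in turn. A minimal sketch of that idea, using a hypothetical trimmed-down `SketchNode` that keeps only the two index fields and the `size()` logic from the struct above:

```cpp
#include <cassert>
#include <cstddef>

// Same sentinel as in the diff: an index that is not set.
const std::size_t EmptyIdx = static_cast<std::size_t>(-1);

// Hypothetical reduced node: only the fields needed to show size().
struct SketchNode {
  std::size_t StartIdx = EmptyIdx; // First element of the substring.
  std::size_t *EndIdx = nullptr;   // Last element, stored indirectly.

  // The root represents the empty string and has no start index.
  bool isRoot() const { return StartIdx == EmptyIdx; }

  // Number of elements in the substring; both ends are inclusive, hence
  // the + 1 (for example, indices 0..3 cover 4 elements).
  std::size_t size() const {
    if (isRoot())
      return 0;
    assert(EndIdx && *EndIdx != EmptyIdx && "EndIdx is undefined!");
    return *EndIdx - StartIdx + 1;
  }
};
```

If two leaves share one `std::size_t LeafEnd` through their `EndIdx` pointers, a single `++LeafEnd` lengthens both at once, which is the O(1) per-step update the comment describes. Internal nodes would instead point at their own private end index, since their ends never move.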
				MatzeB (Done): The MI usage inside the while loop could use a separate declaration and this one moved down to the first assignment.

				/// A data structure for fast substring queries.
				///
				/// Suffix trees represent the suffixes of their input strings in their leaves.
				/// A suffix tree is a type of compressed trie structure where each node
				aprantl (Not Done): I looked at the implementation of MachineFunction::CloneMachineInstr() but I still can't tell: What happens when the original instruction has a debug location on it - will it be stripped? What happens when the original instruction is a DBG_VALUE - will it be copied?
				aprantl (Not Done): Did you get a chance to look into this?
				aprantl (Done): Ping :-) Note that it is really important to skip over any DBG_VALUE intrinsics while deciding whether to outline a sequence of instructions. Otherwise compiling with -g will produce different code than without, which we generally consider to be a serious bug in the compiler.
				/// represents an entire substring rather than a single character. Each leaf
				/// of the tree is a suffix.
				///
				silvas (Done): This comment is pure gold. Nice.
				/// A suffix tree can be seen as a type of state machine where each state is a
				/// substring of the full string. The tree is structured so that, for a string
				/// of length N, there are exactly N leaves in the tree. This structure allows
				/// us to quickly find repeated substrings of the input string.
				MatzeB (Done): this has no effect
				///
				MatzeB (Done): Can this be rewritten to a `do {} while` loop to avoid the code duplication?
				/// In this implementation, a "string" is a vector of unsigned integers.
				/// These integers may result from hashing some data type. A suffix tree can
				/// contain 1 or many strings, which can then be queried as one large string.
				///
				/// The suffix tree is implemented using Ukkonen's algorithm for linear-time
				/// suffix tree construction. Ukkonen's algorithm is explained in more detail
				MatzeB (Done): `&*` should not be necessary here, but can use a range based for loop anyway.
				/// in the paper by Esko Ukkonen "On-line construction of suffix trees". The
				/// paper is available at
				///
				/// https://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
				class SuffixTree {
				private:
				/// Each element is an integer representing an instruction in the module.
				ArrayRef<unsigned> Str;

				/// Maintains each node in the tree.
				BumpPtrAllocator NodeAllocator;

				/// The root of the suffix tree.
				///
				/// The root represents the empty string. It is maintained by the
				/// \p NodeAllocator like every other node in the tree.
				SuffixTreeNode *Root = nullptr;

				MatzeB (Done): range based for.
				/// Stores each leaf in the tree for better pruning.
				std::vector<SuffixTreeNode *> LeafVector;

				/// Maintains the end indices of the internal nodes in the tree.
				///
				/// Each internal node is guaranteed to never have its end index change
				MatzeB (Done): can be const.
				/// during the construction algorithm; however, leaves must be updated at
				/// every step. Therefore, we need to store leaf end indices by reference
				/// to avoid updating O(N) leaves at every step of construction. Thus,
				/// every internal node must be allocated its own end index.
				silvas (Done): Comment on the `+ 1`.
				BumpPtrAllocator InternalEndIdxAllocator;

				MatzeB (Done): Maybe handle the special case early: `if (StartIdx == EmptyIdx) return 0; return EndIdx - StartIdx + 1;` I assume this is not supposed to be called with EndIdx == EmptyIdx? Add an assert()?
				/// The end index of each leaf in the tree.
				size_t LeafEndIdx = -1;

				/// \brief Helper struct which keeps track of the next insertion point in
				MatzeB (Done): Maybe replace this with `size_t j = i - OffsetedStringStart;` inside the loop?
				/// Ukkonen's algorithm.
				struct ActiveState {
				/// The next node to insert at.
				MatzeB (Done): Interesting to see this packaged up in an own struct instead of just putting the members directly into the SuffixTree class. But doesn't hurt either I guess.
				SuffixTreeNode *Node;

				/// The index of the first character in the substring currently being added.
				size_t Idx = EmptyIdx;

				/// The length of the substring we have to add at the current step.
				size_t Len = 0;
				};

				/// \brief The point the next insertion will take place at in the
				/// construction algorithm.
				ActiveState Active;

				/// Allocate a leaf node and add it to the tree.
				silvas (Done): Isn't it more like the leaves "represent" suffixes rather than "contain" suffixes?
				///
				/// \param Parent The parent of this node.
				silvas (Done): I don't think this is correct. It isn't the leaves representing suffixes per se that facilitates finding repeated substrings, but rather the fact that the internal nodes represent repeated substrings (shared prefixes of the suffixes). You may want to beef up this paragraph on suffix trees a bit to describe the basic invariants a bit more (leaves represent suffixes, internal nodes represent shared prefixes of the suffixes).
				/// \param StartIdx The start index of this node's associated string.
				/// \param Edge The label on the edge leaving \p Parent to this node.
				///
				silvas (Done): Same comment as on ProgramMapping. The integers themselves aren't hashes (otherwise this code would have to do something special for collisions).
				/// \returns A pointer to the allocated leaf node.
				SuffixTreeNode *insertLeaf(SuffixTreeNode &Parent, size_t StartIdx,
				unsigned Edge) {

				assert(StartIdx <= LeafEndIdx && "String can't start after it ends!");

				MatzeB (Done): `Target` or rather `STI` like most codegen code.
				SuffixTreeNode *N = new (NodeAllocator) SuffixTreeNode(StartIdx,
				&LeafEndIdx,
				nullptr,
				&Parent);
				silvas (Done): I'm not sure what you are trying to convey with this paragraph. Maybe you can just mention that the implementation maintains parent links (maybe also describe what they are used for). I don't see the big picture for talking about cycles or explicit digraphs here.
				Parent.Children[Edge] = N;

				return N;
				}

				/// Allocate an internal node and add it to the tree.
				///
				/// \param Parent The parent of this node. Only null when allocating the root.
				MatzeB (Done): Couldn't you rather do `auto StartIt = It;` after the loop?
				/// \param StartIdx The start index of this node's associated string.
				MatzeB (Done): Can be moved after the loop.
				silvas (Done): The RAII behavior of BumpPtrAllocator is pretty well-known, so you don't need to explicitly mention it (that's the beauty of RAII; it just cleans up for you!). This comment can probably be reduced to "Allocator that owns all nodes in the tree".
				/// \param EndIdx The end index of this node's associated string.
				/// \param Edge The label on the edge leaving \p Parent to this node.
				///
				/// \returns A pointer to the allocated internal node.
				silvas (Not Done): This arrangement of adding an extra layer of indirection for the special "EndIdx" handling for leaf nodes is interesting but after staring at the code for a while it seems like it obscures things (or maybe I'm missing something). It seems that the only thing that needs this extra layer of indirection is during construction in the test `if (StrIdx > *(CurrSuffixTreeNode->EndIdx)) {` which could be replaced by something like `if (StrIdx > (CurrSuffixTreeNode->EndIdx == -1 ? LeafEndIdx : CurrSuffixTreeNode->EndIdx)) {` or similar. To do that, SuffixTreeNode::EndIdx would just be an integer held by value, with -1 being a sentinel indicating that this is a leaf that needs the special handling in that one test.
				SuffixTreeNode *insertInternalNode(SuffixTreeNode *Parent, size_t StartIdx,
				MatzeB (Done): Maybe `It = std::advance(It, StringLocation.second);`? Similar in the next loop.
				size_t EndIdx, unsigned Edge) {
				silvas (Done): What does the EndIdx even mean for an internal node, since they can be shared? Maybe explain that in the comment of the EndIdx member of SuffixTreeNode?

				assert(StartIdx <= EndIdx && "String can't start after it ends!");
				assert(!(!Parent && StartIdx != EmptyIdx) &&
				"Non-root internal nodes must have parents!");

				MatzeB (Not Done): Use a new iteration variable and count from 0 to C.Length?
				size_t *E = new (InternalEndIdxAllocator) size_t(EndIdx);
				SuffixTreeNode *N = new (NodeAllocator) SuffixTreeNode(StartIdx,
				E,
				MatzeB (Done): It looks like you can use a single allocator for nodes and EndIndexes.
				silvas (Done): You can use the same BumpPtrAllocator for both of these if you want. It's kind of nice to have a place for these comments though so no biggie.
				Root,
				Parent);
				if (Parent)
				Parent->Children[Edge] = N;
				silvas (Done): Move the comment inside the braces so that this can use the more common `} else {` style, here and elsewhere.

				return N;
				}
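
The by-value alternative that silvas raises in the inline comment on `insertInternalNode` can be sketched roughly as follows. This is only an illustration under assumptions: `SketchNode`, `LeafSentinel`, `effectiveEnd`, and `edgeSize` are hypothetical names that do not appear in the patch, and the real `SuffixTreeNode` carries more state than shown here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sentinel meaning "this node is a leaf; use the shared end".
static const std::size_t LeafSentinel = static_cast<std::size_t>(-1);

// Hypothetical, reduced stand-in for SuffixTreeNode.
struct SketchNode {
  std::size_t StartIdx;
  std::size_t EndIdx; // Held by value; LeafSentinel for leaves.
};

// Resolve the end index of a node. All leaves share the tree-wide
// LeafEndIdx, so extending every leaf is still a single assignment.
inline std::size_t effectiveEnd(const SketchNode &N, std::size_t LeafEndIdx) {
  return N.EndIdx == LeafSentinel ? LeafEndIdx : N.EndIdx;
}

// Number of elements on the edge into N (the index range is inclusive).
inline std::size_t edgeSize(const SketchNode &N, std::size_t LeafEndIdx) {
  std::size_t End = effectiveEnd(N, LeafEndIdx);
  assert(N.StartIdx <= End && "String can't start after it ends!");
  return End - N.StartIdx + 1;
}
```

With this layout a leaf created as `SketchNode{2, LeafSentinel}` grows automatically as the shared leaf end index advances, while an internal node keeps the fixed end it was split at, and no second allocator is needed for end indices.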

				/// \brief Set the suffix indices of the leaves to the start indices of their
				/// respective suffixes. Also stores each leaf in \p LeafVector at its
				/// respective suffix index.
				///
				/// \param[in] CurrNode The node currently being visited.
				/// \param CurrIdx The current index of the string being visited.
				void setSuffixIndices(SuffixTreeNode &CurrNode, size_t CurrIdx) {

				bool IsLeaf = CurrNode.Children.size() == 0 && !CurrNode.isRoot();

				MatzeB (Done): Range-based for.
				// Traverse the tree depth-first.
				for (auto &ChildPair : CurrNode.Children) {
				assert(ChildPair.second && "Node had a null child!");
				MatzeB (Done): Usually called `STI`.
				setSuffixIndices(*ChildPair.second,
				CurrIdx + ChildPair.second->size());
				}

				// Is this node a leaf?
				if (IsLeaf) {
				// If yes, give it a suffix index and bump its parent's occurrence count.
				MatzeB (Done): Range-based for.
				CurrNode.SuffixIdx = Str.size() - CurrIdx;
				assert(CurrNode.Parent && "CurrNode had no parent!");
				CurrNode.Parent->OccurrenceCount++;

				// Store the leaf in the leaf vector for pruning later.
				LeafVector[CurrNode.SuffixIdx] = &CurrNode;
				}
				silvas (Done): There's quite a few naked new/delete in this code. Can you encapsulate the ownership better? If not, can you centralize the new/delete in helpers and add some comments about lifetime/ownership?
				}

				/// \brief Construct the suffix tree for the prefix of the input ending at
				/// \p EndIdx.
				///
				/// Used to construct the full suffix tree iteratively. At the end of each
				/// step, the constructed suffix tree is either a valid suffix tree, or a
				/// suffix tree with implicit suffixes. At the end of the final step, the
				MatzeB (Done): Maybe add `else assert(EndIdx == EmptyIdx);` to make sure callers know what they are doing. An alternative would be to provide different functions for inserting leaves or inner nodes.
				/// suffix tree is a valid tree.
				///
				silvas (Done): Can `Parent` just be a constructor argument?
				/// \param EndIdx The end index of the current prefix in the main string.
				aprantl (Not Done): Maybe use a unique_ptr for this?
				MatzeB (Done): runOnModule() may be called multiple times if we have multiple modules. With the suffix tree being allocated in the constructor this would lead to multiple deletions of the same thing.
				/// \param SuffixesToAdd The number of suffixes that must be added
				/// to complete the suffix tree at the current phase.
				///
				/// \returns The number of suffixes that have not been added at the end of
				/// this step.
				unsigned extend(size_t EndIdx, size_t SuffixesToAdd) {
				SuffixTreeNode *NeedsLink = nullptr;

				while (SuffixesToAdd > 0) {

				// Are we waiting to add anything other than just the last character?
				if (Active.Len == 0) {
				// If not, then say the active index is the end index.
				Active.Idx = EndIdx;
				}
				MatzeB (Done): You can save an indentation level here with `if (ChildPair.second == nullptr) continue;`

				assert(Active.Idx <= EndIdx && "Start index can't be after end index!");

				silvas (Done): This assert is inside an `if(ChildPair.second != nullptr)` so probably doesn't buy you much.
				// The first character in the current substring we're looking at.
				unsigned FirstChar = Str[Active.Idx];

				// Have we inserted anything starting with FirstChar at the current node?
				if (Active.Node->Children.count(FirstChar) == 0) {
				// If not, then we can just insert a leaf and move to the next step.
				insertLeaf(*Active.Node, EndIdx, FirstChar);

				// The active node is an internal node, and we visited it, so it must
				// need a link if it doesn't have one.
				if (NeedsLink) {
				NeedsLink->Link = Active.Node;
				NeedsLink = nullptr;
				}
				} else {
				// There's a match with FirstChar, so look for the point in the tree to
				// insert a new node.
				SuffixTreeNode *NextNode = Active.Node->Children[FirstChar];

				size_t SubstringLen = NextNode->size();

				MatzeB (Done): As you only have 1 "out" parameter you could simply return the new value instead. At the call site I find `SuffixesToAdd = extend(x, y, SuffixesToAdd);` easier to understand when you do not have to wonder whether a parameter is an "out" parameter. The NeedsLink parameter is nullptr for all callers?
				// Is the current suffix we're trying to insert longer than the size of
				// the child we want to move to?
				if (Active.Len >= SubstringLen) {
				// If yes, then consume the characters we've seen and move to the next
				MatzeB (Done): Can be `size_t size() const`.
				MatzeB (Done): General note on comments: I would expect comments at this indentation level to talk about properties/situations at that level and not just inside the if. This gets clearer if you formulate the comment as a question, or an if-then statement (or move the comment into the if block). Either write `// Look at the last character if the current mapping is 0.` above `if (Active.Len == 0) Active.Idx = EndIdx;`, or write `// Current mapping is 0?` above `if (Active.Len == 0) {` and put `// Look at the last added character.` and `Active.Idx = EndIdx;` inside the braces.
				// node.
				Active.Idx += SubstringLen;
				Active.Len -= SubstringLen;
				Active.Node = NextNode;
				continue;
				}
				MatzeB (Done): `\param`
				MatzeB (Done): LastChar can be moved further down as it's not used by some paths through the function.

				// Otherwise, the suffix we're trying to insert must be contained in the
				// next node we want to move to.
				unsigned LastChar = Str[EndIdx];

				// Is the string we're trying to insert a substring of the next node?
				if (Str[NextNode->StartIdx + Active.Len] == LastChar) {
				// If yes, then we're done for this step. Remember our insertion point
				// and move to the next end index. At this point, we have an implicit
				// suffix tree.
				if (NeedsLink && !Active.Node->isRoot()) {
				NeedsLink->Link = Active.Node;
				NeedsLink = nullptr;
				}

				Active.Len++;
				break;
				}

				// The string we're trying to insert isn't a substring of the next node,
				MatzeB (Done): No need to cast `EndIdx`.
				// but matches up to a point. Split the node.
				//
				// For example, say we ended our search at a node n and we're trying to
				// insert ABD. Then we'll create a new node s for AB, reduce n to just
				// representing C, and insert a new leaf node l to represent D. This
				// allows us to ensure that if n was a leaf, it remains a leaf.
				//
				//   | ABC  ---split--->  | AB
				//   n                    s
				//                     C / \ D
				//                      n   l

				// The node s from the diagram
				SuffixTreeNode *SplitNode =
				insertInternalNode(Active.Node,
				NextNode->StartIdx,
				NextNode->StartIdx + Active.Len - 1,
				FirstChar);

				// Insert the new node representing the new substring into the tree as
				// a child of the split node. This is the node l from the diagram.
				insertLeaf(*SplitNode, EndIdx, LastChar);

				// Make the old node a child of the split node and update its start
				// index. This is the node n from the diagram.
				NextNode->StartIdx += Active.Len;
				NextNode->Parent = SplitNode;
				SplitNode->Children[Str[NextNode->StartIdx]] = NextNode;

				// SplitNode is an internal node, update the suffix link.
				if (NeedsLink)
				NeedsLink->Link = SplitNode;

				NeedsLink = SplitNode;
				}

				// We've added something new to the tree, so there's one less suffix to
				// add.
				SuffixesToAdd--;

				if (Active.Node->isRoot()) {
				if (Active.Len > 0) {
				Active.Len--;
				Active.Idx = EndIdx - SuffixesToAdd + 1;
				}
				} else {
				// Start the next phase at the next smallest suffix.
				Active.Node = Active.Node->Link;
				}
				}

				MatzeB (Done): Use a reference instead of a pointer?
				return SuffixesToAdd;
				}

				/// \brief Return the start index and length of a string which maximizes a
				MatzeB (Done): Add a comment that you check whether Active.Node is the root or alternatively add a `SuffixTreeNode::isRoot()` function.
				/// benefit function by traversing the tree depth-first.
				///
				/// Helper function for \p bestRepeatedSubstring.
				///
				MatzeB (Done): Could do an early exit: `if (MaxHeight == 0) return nullptr;`
				/// \param CurrNode The node currently being visited.
				/// \param CurrLen Length of the current string.
				/// \param[out] BestLen Length of the most beneficial substring.
				/// \param[out] MaxBenefit Benefit of the most beneficial substring.
				/// \param[out] BestStartIdx Start index of the most beneficial substring.
				/// \param BenefitFn The function the query should return a maximum string
				/// for.
				void findBest(SuffixTreeNode &CurrNode, size_t CurrLen, size_t &BestLen,
				MatzeB (Done): Indentation.
				size_t &MaxBenefit, size_t &BestStartIdx,
				const std::function<unsigned(SuffixTreeNode &, size_t CurrLen)>
				&BenefitFn) {

				if (!CurrNode.IsInTree)
				return;
				MatzeB (Done): This should rather be ArrayRef<unsigned>.

				// Can we traverse further down the tree?
				if (!CurrNode.isLeaf()) {
				// If yes, continue the traversal.
				for (auto &ChildPair : CurrNode.Children) {
				if (ChildPair.second && ChildPair.second->IsInTree)
				findBest(*ChildPair.second, CurrLen + ChildPair.second->size(),
				BestLen, MaxBenefit, BestStartIdx, BenefitFn);
				}
				} else {
				// We hit a leaf.
				size_t StringLen = CurrLen - CurrNode.size();
				unsigned Benefit = BenefitFn(CurrNode, StringLen);

				// Did we do better than in the last step?
				if (Benefit <= MaxBenefit)
				return;

				// We did better, so update the best string.
				MaxBenefit = Benefit;
				BestStartIdx = CurrNode.SuffixIdx;
				BestLen = StringLen;
				MatzeB (Done): Add `assert(SuffixesToAdd == 0);`?
				}
				}

				public:

				MatzeB (Done): Maybe `setSuffixIndices(..., /*LabelHeight =*/0)` instead of the local variable so readers do not keep wondering whether it is an "out" parameter.
				/// \brief Return a substring of the tree with maximum benefit if such a
				/// substring exists.
				///
				/// Clears the input vector and fills it with a maximum substring or empty.
				///
				/// \param[in,out] Best The most beneficial substring in the tree. Empty
				/// if it does not exist.
				/// \param BenefitFn The function the query should return a maximum string
				/// for.
				void bestRepeatedSubstring(std::vector<unsigned> &Best,
				const std::function<unsigned(SuffixTreeNode &, size_t CurrLen)>
				&BenefitFn) {
				Best.clear();
				size_t Length = 0; // Becomes the length of the best substring.
				size_t Benefit = 0; // Becomes the benefit of the best substring.
				size_t StartIdx = 0; // Becomes the start index of the best substring.
				findBest(*Root, 0, Length, Benefit, StartIdx, BenefitFn);

				for (size_t Idx = 0; Idx < Length; Idx++)
				Best.push_back(Str[Idx + StartIdx]);
				}

				/// Perform a depth-first search for \p QueryString on the suffix tree.
				///
				/// \param QueryString The string to search for.
				/// \param CurrIdx The current index in \p QueryString that is being matched
				/// against.
				/// \param CurrNode The suffix tree node being searched in.
				///
				/// \returns A \p SuffixTreeNode that \p QueryString appears in if such a
				silvas (Done): Nit: move the comment inside so you can use the coding-standard compliant `} else if (...) {`
				/// node exists, and \p nullptr otherwise.
				SuffixTreeNode *findString(const std::vector<unsigned> &QueryString,
				size_t &CurrIdx, SuffixTreeNode *CurrNode) {

				MatzeB (Done): The braces around the cases are unnecessary.
				// The search ended at a nonexistent or pruned node. Quit.
				if (!CurrNode || !CurrNode->IsInTree)
				return nullptr;

				unsigned Edge = QueryString[CurrIdx]; // The edge we want to move on.
				SuffixTreeNode *NextNode = CurrNode->Children[Edge]; // Next node in query.

				if (CurrNode->isRoot()) {
				// If we're at the root we have to check if there's a child, and move to
				// that child. Don't consume the character since \p Root represents the
				// empty string.
				if (NextNode && NextNode->IsInTree)
				return findString(QueryString, CurrIdx, NextNode);
				return nullptr;
				}

				size_t StrIdx = CurrNode->StartIdx;
				size_t MaxIdx = QueryString.size();
				bool ContinueSearching = false;

				// Match as far as possible into the string. If there's a mismatch, quit.
				for (; CurrIdx < MaxIdx; CurrIdx++, StrIdx++) {
				Edge = QueryString[CurrIdx];

				// We matched perfectly, but still have a remainder to search.
				if (StrIdx > *(CurrNode->EndIdx)) {
				ContinueSearching = true;
				break;
				}

				if (Edge != Str[StrIdx])
				MatzeB (Done): This can use an early exit so you do not need to indent everything inside the if.
				return nullptr;
				}

				NextNode = CurrNode->Children[Edge];

				// Move to the node which matches what we're looking for and continue
				// searching.
				if (ContinueSearching)
				return findString(QueryString, CurrIdx, NextNode);

				// We matched perfectly so we're done.
				return CurrNode;
				}
				MatzeB (Done): This seems to be an impossible case as you already tested for `N != nullptr`.
				silvas (Done): Nit: Putting this `CurrSuffixTreeNode->Children[QueryString[CurrIdx]]` expression (used 3 times) in a variable will make things a bit shorter and also give you an opportunity to give that value a name. E.g. maybe `if (Child && Child->IsInTree) {`?

				/// \brief Remove a node from a tree and all nodes representing proper
				/// suffixes of that node's string.
				///
				/// This is used in the outlining algorithm to reduce the number of
				/// overlapping candidates.
				///
				/// \param N The suffix tree node to start pruning from.
				/// \param Len The length of the string to be pruned.
				silvas (Done): Nit: move this comment inside the `else` so that you have `} else {`.
				///
				/// \returns True if this candidate didn't overlap with a previously chosen
				/// candidate.
				bool prune(SuffixTreeNode *N, size_t Len) {

				bool NoOverlap = true;
				std::vector<unsigned> IndicesToPrune;

				// Look at each of N's children.
				silvas (Done): Comment on the `- 1` part of this. It's setting off my off-by-one error spidey sense.
				for (auto &ChildPair : N->Children) {
				SuffixTreeNode *M = ChildPair.second;

				// Is this a leaf child?
				if (M && M->IsInTree && M->isLeaf()) {
				// Save each leaf child's suffix indices and remove them from the tree.
				IndicesToPrune.push_back(M->SuffixIdx);
				M->IsInTree = false;
				MatzeB (Done): This could be `for (SuffixTreeNode *T = N->Link; T && T != Root; T = T->Link) T->Valid = false;` (It is usually better to go with a for loop instead of `while() { ... increment; }` on principle, because you do not have to remember to duplicate the increment code when you want to use `continue` somewhere inside the loop.)
				}
				}

				// Remove each suffix we have to prune from the tree. Each of these will be
				// I + some offset for I in IndicesToPrune and some offset < Len.
				unsigned Offset = 1;
				for (unsigned CurrentSuffix = 1; CurrentSuffix < Len; CurrentSuffix++) {
				for (unsigned I : IndicesToPrune) {

				unsigned PruneIdx = I + Offset;
				MatzeB (Done): Maybe do not initialize these to give a hint to the reader that they will be set by `findString()` anyway (If findString() doesn't always set them, then you should make it).

				// Is this index actually in the string?
				if (PruneIdx < LeafVector.size()) {
				MatzeB (Done): Could be an early exit.
				// If yes, we have to try and prune it.
				// Was the current leaf already pruned by another candidate?
				if (LeafVector[PruneIdx]->IsInTree) {
				// If not, prune it.
				LeafVector[PruneIdx]->IsInTree = false;
				} else {
				// If yes, signify that we've found an overlap, but keep pruning.
				NoOverlap = false;
				}

				// Update the parent of the current leaf's occurrence count.
				SuffixTreeNode *Parent = LeafVector[PruneIdx]->Parent;

				MatzeB (Done): Use `const StringCollection &Strings`.
				// Is the parent still in the tree?
				if (Parent->OccurrenceCount > 0) {
				Parent->OccurrenceCount--;
				Parent->IsInTree = (Parent->OccurrenceCount > 1);
				}
				}
				}

				// Move to the next character in the string.
				Offset++;
				}

				// We know we can never outline anything which starts one index back from
				// the indices we want to outline. This is because our minimum outlining
				// length is always 2.
				for (unsigned I : IndicesToPrune) {
				if (I > 0) {

				unsigned PruneIdx = I-1;
				MatzeB (Done): The deleteSuffixTreeNode() implementation already protects against nullptrs.
				SuffixTreeNode *Parent = LeafVector[PruneIdx]->Parent;

				// Was the leaf one index back from I already pruned?
				if (LeafVector[PruneIdx]->IsInTree) {
				// If not, prune it.
				LeafVector[PruneIdx]->IsInTree = false;
				} else {
				silvas (Done): You're copying quite a few std::vector's here. Can they be ArrayRef's?
				// If yes, signify that we've found an overlap, but keep pruning.
				NoOverlap = false;
				}

				// Update the parent of the current leaf's occurrence count.
				if (Parent->OccurrenceCount > 0) {
				Parent->OccurrenceCount--;
				Parent->IsInTree = (Parent->OccurrenceCount > 1);
				}
				}
				}

				// Finally, remove N from the tree and set its occurrence count to 0.
				N->IsInTree = false;
				MatzeB (Done): Move this into the loop.
				N->OccurrenceCount = 0;

				return NoOverlap;
				}
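
As a rough illustration of which suffix indices `prune` marks as removed, the index arithmetic can be restated over plain integers. This is a sketch under assumptions, not code from the patch: `prunedSuffixIndices` is a hypothetical helper, it ignores the `OccurrenceCount` bookkeeping and the overlap flag, and it assumes the bound checked against `LeafVector.size()` equals the string length.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper: the suffix indices removed when a string of length
// Len occurring at each index in Starts is pruned from a tree built over a
// string of StrSize elements.
inline std::set<std::size_t>
prunedSuffixIndices(const std::vector<std::size_t> &Starts, std::size_t Len,
                    std::size_t StrSize) {
  assert(Len >= 2 && "Minimum outlining length is 2!");
  std::set<std::size_t> Pruned;
  for (std::size_t I : Starts) {
    // The occurrence itself (a leaf child of the pruned node).
    Pruned.insert(I);

    // Every suffix that starts inside the occurrence: I + 1 .. I + Len - 1.
    for (std::size_t Offset = 1; Offset < Len; ++Offset)
      if (I + Offset < StrSize)
        Pruned.insert(I + Offset);

    // The suffix starting one index back, which would overlap the
    // occurrence for any candidate of length 2 or more.
    if (I > 0)
      Pruned.insert(I - 1);
  }
  return Pruned;
}
```

For occurrences at indices 2 and 7 with `Len == 3` in a 10-element string, the helper reports indices 1 through 4 and 6 through 9, leaving only 0 and 5 available to later candidates.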

				/// \brief Find each occurrence of a string in \p QueryString and prune
				/// their nodes.
				///
				/// \param QueryString The string to search for.
				/// \param[out] Occurrences The start indices of each occurrence.
				///
				/// \returns True if the string was found and none of its occurrences
				/// overlap a previously chosen candidate; false otherwise.
				bool findOccurrencesAndPrune(const std::vector<unsigned> &QueryString,
				std::vector<size_t> &Occurrences) {
				size_t Dummy = 0;
				SuffixTreeNode *N = findString(QueryString, Dummy, Root);

				if (!N || !N->IsInTree)
				return false;

				// If this is an internal node, occurrences are the number of leaf children
				// of the node.
				for (auto &ChildPair : N->Children) {
				SuffixTreeNode *M = ChildPair.second;

				// Is it a leaf? If so, we have an occurrence.
				MatzeB (Done): The llvm:: prefix should not be necessary (same for OccBB).
				if (M && M->IsInTree && M->isLeaf())
				Occurrences.push_back(M->SuffixIdx);
				}

				// If we're in a leaf, then this node is the only occurrence.
				if (N->isLeaf())
				Occurrences.push_back(N->SuffixIdx);

				return prune(N, QueryString.size());
				}

				/// Construct a suffix tree from a sequence of unsigned integers.
				silvas (Done): This iterates over Strings.MBBMappings. In what sense does it treat Strings as "flat" if it looks at individual substrings? Also, I find it a bit weird that we take a ProgramMapping as input to this constructor, but then all these `append` calls seem to build up a different ProgramMapping. Do the two ProgramMapping's end up being equivalent? It would be nice to see a bit more explanation about this. At least for the purposes of this constructor, maybe a `const std::vector<std::vector<unsigned>> &` is the natural interface because it doesn't use any of the fancy methods on ProgramMapping.
				///
				/// \param Str The string to construct the suffix tree for.
				SuffixTree(const std::vector<unsigned> &Str) : Str(Str) {
				Root = insertInternalNode(nullptr, EmptyIdx, EmptyIdx, 0);
				Root->IsInTree = true;
				Active.Node = Root;
				LeafVector.reserve(Str.size());

				// Keep track of the number of suffixes we have to add of the current
				// prefix.
				size_t SuffixesToAdd = 0;
				Active.Node = Root;

				// Construct the suffix tree iteratively on each prefix of the string.
				// PfxEndIdx is the end index of the current prefix.
				// End is one past the last element in the string.
				MatzeB (Done): You probably only need to put INITIALIZE_PASS and createOutlinerPass() in the llvm namespace.
				for (size_t PfxEndIdx = 0, End = Str.size(); PfxEndIdx < End; PfxEndIdx++) {
				SuffixesToAdd++;
				LeafEndIdx = PfxEndIdx; // Extend each of the leaves.
				SuffixesToAdd = extend(PfxEndIdx, SuffixesToAdd);
				}

				// Set the suffix indices of each leaf.
				assert(Root && "Root node can't be nullptr!");
				setSuffixIndices(*Root, 0);
				}
				silvas (Done): It seems a bit weird to me that this class is caring explicitly about the distinction between the "flat" and non-"flat" senses of ProgramMapping. I thought that ProgramMapping was just supposed to encapsulate a 2D ragged array and make it look flat, but here it seems that the external code still cares about the distinction between 2D-ness and flat-ness. Can you make it a bit clearer in the code and comments what ProgramMapping is supposed to represent and what its interactions with the rest of the code are?
				};
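
For cross-checking `bestRepeatedSubstring` on small inputs, a brute-force reference can be handy. The sketch below is not the patch's algorithm: `bruteForceBestRepeatedSubstring` is a hypothetical name, it enumerates every substring instead of walking a suffix tree, and it assumes a simplified benefit function of (length, occurrence count) rather than the patch's `BenefitFn(SuffixTreeNode &, size_t)`. Overlapping occurrences are counted, so results may differ from the tree once pruning is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical brute-force reference: returns the substring of Str that
// occurs at least twice and maximizes BenefitFn(Length, Occurrences), or an
// empty vector if no repeated substring has a positive benefit.
inline std::vector<unsigned> bruteForceBestRepeatedSubstring(
    const std::vector<unsigned> &Str,
    const std::function<unsigned(std::size_t, std::size_t)> &BenefitFn) {
  // Count every occurrence of every non-empty substring.
  std::map<std::vector<unsigned>, std::size_t> Counts;
  for (std::size_t Start = 0; Start < Str.size(); ++Start)
    for (std::size_t End = Start + 1; End <= Str.size(); ++End)
      ++Counts[std::vector<unsigned>(Str.begin() + Start, Str.begin() + End)];

  std::vector<unsigned> Best;
  unsigned MaxBenefit = 0;
  for (const auto &Entry : Counts) {
    // Only repeated substrings are outlining candidates.
    if (Entry.second < 2)
      continue;

    // Did we do better than the best substring seen so far?
    unsigned Benefit = BenefitFn(Entry.first.size(), Entry.second);
    if (Benefit <= MaxBenefit)
      continue;

    MaxBenefit = Benefit;
    Best = Entry.first;
  }
  assert((MaxBenefit > 0) == !Best.empty() && "Benefit and result disagree!");
  return Best;
}
```

This enumerates a quadratic number of substrings and copies each one, so it is only suitable for short test strings; the suffix tree exists precisely to avoid that cost.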

				/// \brief An individual sequence of instructions to be replaced with a call to
				/// an outlined function.
				struct Candidate {

				/// Set to false if the candidate overlapped with another candidate.
				bool InCandidateList = true;

				/// The start index of this \p Candidate.
				size_t StartIdx;

				/// The number of instructions in this \p Candidate.
				size_t Len;

				/// The index of this \p Candidate's \p OutlinedFunction in the list of
				/// \p OutlinedFunctions.
				size_t FunctionIdx;

				silvas (Done): Can you hold this by value?
				Candidate(size_t StartIdx, size_t Len, size_t FunctionIdx)
				: StartIdx(StartIdx), Len(Len), FunctionIdx(FunctionIdx) {}
				MatzeB (Done): Maybe use `"MIR Function Outlining"` to be in sync with INITIALIZE_PASS. Most llvm pass 'names' read more like short descriptions.

				Candidate() {}

				/// \brief Used to ensure that \p Candidates are outlined in an order that
				/// preserves the start and end indices of other \p Candidates.
				bool operator<(const Candidate &RHS) const { return StartIdx > RHS.StartIdx; }
				};

				/// \brief Stores created outlined functions and the information needed to
				/// construct them.
				MatzeB (Done): An instance of this only describes a single outlined function.
				struct OutlinedFunction {

				/// The actual outlined function created.
				/// This is initialized after we go through and create the actual function.
				MachineFunction *MF = nullptr;

				/// A number assigned to this function which appears at the end of its name.
				size_t Name;

				/// The number of times that this function has appeared.
				size_t OccurrenceCount = 0;

				/// \brief The sequence of integers corresponding to the instructions in this
				/// function.
				std::vector<unsigned> Sequence;
				silvas (Done): I don't see much mention of the function Id's in `ProgramMapping`. Looking at the code, it seems like it should be `unsigned` and roughly represents the call instruction that jumps to the outlined function.

				/// The number of instructions this function would save.
				MatzeB (Done): doxygen `///`
				unsigned Benefit = 0;

				MatzeB (Done): Could use references instead of pointers.
				OutlinedFunction(size_t Name, size_t OccurrenceCount,
				const std::vector<unsigned> &Sequence,
				unsigned Benefit)
				: Name(Name), OccurrenceCount(OccurrenceCount), Sequence(Sequence),
				Benefit(Benefit)
				{}
				MatzeB (Done): C++ does the right thing for `Name(Name)` etc. so you can drop the `_` suffixes from the parameter names. Similar with some other constructors.
				};

				MatzeB (Done): You should extend the anonymous namespace to include the MachineOutliner class. The only things that need to be visible to the outside are `initializeMachineOutlinerPass()` (=the stuff coming out of INITIALIZE_PASS) and `createOutlinerPass()`.
				/// \brief Maps \p MachineInstrs to unsigned integers and stores the mappings.
				struct InstructionMapper {

				/// \brief The next available integer to assign to a \p MachineInstr that
				/// cannot be outlined.
				///
				/// Set to -3 for compatibility with \p DenseMapInfo<unsigned>.
				unsigned IllegalInstrNumber = -3;

				/// \brief The next available integer to assign to a \p MachineInstr that can
				/// be outlined.
				unsigned LegalInstrNumber = 0;

				/// Correspondence from \p MachineInstrs to unsigned integers.
				DenseMap<MachineInstr *, unsigned, MachineInstrExpressionTrait>
				InstructionIntegerMap;

				silvas (Done): There are a couple members here related to the legal-instruction/illegal-instruction/function numbering that could stand to be pulled out into an isolated class (which can then be held by value in the pass) separate from the pass boilerplate. Such a class will also be a good place to authoritatively document the numbering scheme and encapsulate it.
				/// Correspondence from unsigned integers to \p MachineInstrs.
				/// Inverse of \p InstructionIntegerMap.
				DenseMap<unsigned, MachineInstr *> IntegerInstructionMap;

				/// The vector of unsigned integers that the module is mapped to.
				std::vector<unsigned> UnsignedVec;
				MatzeB (Done): Because this relies on implementation details of DenseMapInfo, better play it safe with static_assert so the compilation fails if someone decides to change the values: `static_assert(DenseMapInfo<unsigned>::getEmptyKey() == (unsigned)-1);` `static_assert(DenseMapInfo<unsigned>::getTombstoneKey() == (unsigned)-2);` That way things also explain themselves and you can get away with a shorter comment. The module pass instance can in theory be reused for multiple programs. So the state here needs to be initialized and cleared in `runOnModule()`.

				/// \brief Stores the location of the instruction associated with the integer
				MatzeB (Done): It's the next number to be assigned, isn't it? Same with CurrIllegalInstrMapping.
				/// at index i in \p UnsignedVec for each index i.
				std::vector<MachineBasicBlock::iterator> InstrList;

				/// Converts \p MI to an unsigned integer.
				unsigned mapToUnsigned(MachineInstr &MI) {
				// Get the integer for this instruction or give it the current
				// LegalInstrNumber.
				bool WasInserted;
				auto It = InstructionIntegerMap.end();
				MatzeB (Done): I don't think you need to initialize this, it gets overwritten anyway in the next line.
				std::tie(It, WasInserted) =
				InstructionIntegerMap.insert(std::make_pair(&MI, LegalInstrNumber));
				MatzeB (Done): This makes no sense.
				unsigned MINumber = It->second;

				// There was an insertion.
				if (WasInserted) {
				LegalInstrNumber++;
				IntegerInstructionMap.insert(std::make_pair(MINumber, &MI));
				}

				MatzeB (Done): Maybe use `"machine-outliner"` to be in sync with DEBUG_TYPE.
				UnsignedVec.push_back(MINumber);

				// Make sure we don't overflow or use any integers reserved by the DenseMap.
				if (LegalInstrNumber >= IllegalInstrNumber)
				report_fatal_error("Instruction mapping overflow!");

				assert(LegalInstrNumber != DenseMapInfo<unsigned>::getEmptyKey()
				&& "Tried to assign DenseMap tombstone or empty key to instruction.");
				assert(LegalInstrNumber != DenseMapInfo<unsigned>::getTombstoneKey()
				&& "Tried to assign DenseMap tombstone or empty key to instruction.");

				return MINumber;
				}

				/// \brief Transforms a \p MachineBasicBlock into a \p vector of \p unsigneds
				MatzeB (Done): Maybe add `assert(CurrLegalInstrMapping < CurrIllegalInstrMapping && "Overflow");` The same at the place where you increment CurrLegalInstrMapping.
				/// and appends it to \p UnsignedVec and \p InstrList.
				///
				/// Two instructions are assigned the same integer if they are identical.
				/// If an instruction is deemed unsafe to outline, then it will be assigned a
				/// unique integer. The resulting mapping is placed into a suffix tree and
				/// queried for candidates.
				///
				/// \param MBB The \p MachineBasicBlock to be translated into integers.
				/// \param TRI \p TargetRegisterInfo for the module.
				MatzeB (Done): The `find(), if (I != end()) ... else insert()` sequence walks over the datastructure in the find() and again in the insert() step. Instead you can simply always `insert()` and check the return value to see whether an element was actually inserted or an existing one reused: `auto I = map.insert(make_pair(MI, CurrLegalInstrMapping));` `// Newly inserted?` `if (I.second) CurrLegalInstrMapping++;` `unsigned MINumber = I.first;`
				/// \param TII \p TargetInstrInfo for the module.
				void convertToUnsignedVec(MachineBasicBlock &MBB,
				const TargetRegisterInfo &TRI,
				const TargetInstrInfo &TII) {

				for (MachineBasicBlock::iterator It = MBB.begin(), Et = MBB.end(); It != Et;
				It++) {

				// Keep track of where this instruction is in the module.
				InstrList.push_back(It);
				MachineInstr &MI = *It;

				// If it's not legal to outline, then give it a unique integer and move
				// on. This way it will never appear in a repeated substring.
				if (!TII.isLegalToOutline(MI)) {
				UnsignedVec.push_back(IllegalInstrNumber);
				IllegalInstrNumber--;

				MatzeB (Done): Use a reference.
				assert(LegalInstrNumber < IllegalInstrNumber &&
				"Instruction mapping overflow!");

				assert(IllegalInstrNumber !=
				DenseMapInfo<unsigned>::getEmptyKey()
				&& "IllegalInstrNumber cannot be DenseMap tombstone or empty key!");

				assert(IllegalInstrNumber !=
				DenseMapInfo<unsigned>::getTombstoneKey()
				&& "IllegalInstrNumber cannot be DenseMap tombstone or empty key!");
				continue;
				}

				// It's safe and we're not the last instruction in the last basic block.
				// Just map it and move on.
				mapToUnsigned(MI);
				}

				// After we're done with every insertion, uniquely terminate this part of the
				// "string". This makes sure we won't match across basic block or function
				// boundaries since the "end" is encoded uniquely and thus appears in no
				// repeated substring.
				InstrList.push_back(nullptr);
				UnsignedVec.push_back(IllegalInstrNumber);
				IllegalInstrNumber--;
				}
				MatzeB (Done): Indentation, MBB can be `const`

				InstructionMapper() {
				// Make sure that the implementation of DenseMapInfo<unsigned> hasn't
				// changed.
				assert(DenseMapInfo<unsigned>::getEmptyKey() == (unsigned)-1 &&
				"DenseMapInfo<unsigned>'s empty key isn't -1!");
				assert(DenseMapInfo<unsigned>::getTombstoneKey() == (unsigned)-2 &&
				"DenseMapInfo<unsigned>'s tombstone key isn't -2!");
				}
				};
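The numbering scheme used by `InstructionMapper` can be illustrated outside LLVM. The following is only a minimal sketch under stated assumptions: `ToyMapper`, `mapLegal`, `mapUnique`, and `mapBlock` are hypothetical names, strings stand in for `MachineInstr`s, and the string `"call"` is arbitrarily treated as illegal to outline in place of the target hook. It shows legal instructions sharing integers that count up from 0, illegal instructions and block terminators receiving fresh integers that count down from `(unsigned)-3`, and a single `insert()` doing both lookup and insertion.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for InstructionMapper; not the patch's class.
struct ToyMapper {
  unsigned IllegalNumber = static_cast<unsigned>(-3); // Counts down.
  unsigned LegalNumber = 0;                            // Counts up.
  std::unordered_map<std::string, unsigned> ToInt;
  std::vector<unsigned> Mapped;

  // Identical legal instructions share one integer.
  unsigned mapLegal(const std::string &Instr) {
    // One insert() call both looks up and inserts; .second says which happened.
    auto Result = ToInt.insert({Instr, LegalNumber});
    if (Result.second)
      ++LegalNumber;
    assert(LegalNumber < IllegalNumber && "Instruction mapping overflow!");
    Mapped.push_back(Result.first->second);
    return Result.first->second;
  }

  // Every illegal instruction or block terminator gets a fresh integer, so it
  // can never be part of a repeated substring.
  unsigned mapUnique() {
    unsigned N = IllegalNumber--;
    assert(LegalNumber < IllegalNumber && "Instruction mapping overflow!");
    Mapped.push_back(N);
    return N;
  }
};

// Maps one toy "basic block" and terminates it uniquely.
// Assumption for this sketch only: "call" is illegal to outline.
inline void mapBlock(ToyMapper &M, const std::vector<std::string> &Block) {
  for (const std::string &I : Block) {
    if (I == "call")
      M.mapUnique();
    else
      M.mapLegal(I);
  }
  M.mapUnique(); // End-of-block marker.
}
```

Mapping the blocks `{mov, add, call, mov}` and `{mov, add}` with this sketch gives `mov` and `add` the same integers in both blocks, while the `call` and the two block ends each get a distinct value near the top of the `unsigned` range.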

				/// \brief An interprocedural pass which finds repeated sequences of
				/// instructions and replaces them with calls to functions.
				///
				/// Each instruction is mapped to an unsigned integer and placed in a string.
				/// The resulting mapping is then placed in a \p SuffixTree. The \p SuffixTree
				/// is then repeatedly queried for repeated sequences of instructions. Each
				/// non-overlapping repeated sequence is then placed in its own
				/// \p MachineFunction and each instance is then replaced with a call to that
				/// function.
				struct MachineOutliner : public ModulePass {

				silvas (Done): Small readability nit: use `std::tie(I, WasInserted) = ...` or something like that to make this a bit clearer (this map interface returning the pair is always confusing without that).
				static char ID;

				StringRef getPassName() const override { return "MIR Function Outlining"; }

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.addRequired<MachineModuleInfo>();
				AU.addPreserved<MachineModuleInfo>();
				AU.setPreservesAll();
				ModulePass::getAnalysisUsage(AU);
				}
				MatzeB (Done): The overflow could (in theory) be triggered by a user and not just by compiler bugs. So could use report_fatal_error() so it stays around in release builds: `if (CurrLegalInstrMapping < CurIllegalInstrMapping) report_fatal_error("Instruction mapping overflow!");`

				MatzeB (Done): Use `DenseMapInfo<unsigned>::get{Empty|Tombstone}Key()` instead of hardcoding the values.
				MachineOutliner() : ModulePass(ID) {
				initializeMachineOutlinerPass(*PassRegistry::getPassRegistry());
				}

				/// \brief Replace the sequences of instructions represented by the
				/// \p Candidates in \p CandidateList with calls to \p MachineFunctions
				/// described in \p FunctionList.
				///
				/// \param M The module we are outlining from.
				/// \param CandidateList A list of candidates to be outlined.
				/// \param FunctionList A list of functions to be inserted into the module.
				/// \param Mapper Contains the instruction mappings for the module.
				bool outline(Module &M, const ArrayRef<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				InstructionMapper &Mapper);

				/// Creates a function for \p OF and inserts it into the module.
				MachineFunction *createOutlinedFunction(Module &M, const OutlinedFunction &OF,
				InstructionMapper &Mapper);

				/// Find potential outlining candidates and store them in \p CandidateList.
				///
				/// For each type of potential candidate, also build an \p OutlinedFunction
				/// struct containing the information to build the function for that
				/// candidate.
				///
				/// \param[out] CandidateList Filled with outlining candidates for the module.
				/// \param[out] FunctionList Filled with functions corresponding to each type
				/// of \p Candidate.
				/// \param ST The suffix tree for the module.
				/// \param TII TargetInstrInfo for the module.
				///
				/// \returns The length of the longest candidate found. 0 if there are none.
				unsigned buildCandidateList(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				SuffixTree &ST, const TargetInstrInfo &TII);
				MatzeB (Done): I don't think the value names need to be kept around.

				/// \brief Remove any overlapping candidates that weren't handled by the
				/// suffix tree's pruning method.
				///
				/// Pruning from the suffix tree doesn't necessarily remove all overlaps.
				/// If a short candidate is chosen for outlining, then a longer candidate
				/// which has that short candidate as a suffix is chosen, the tree's pruning
				/// method will not find it. Thus, we need to prune before outlining as well.
				///
				/// \param[in,out] CandidateList A list of outlining candidates.
				/// \param[in,out] FunctionList A list of functions to be outlined.
				/// \param MaxCandidateLen The length of the longest candidate.
				/// \param TII TargetInstrInfo for the module.
				void pruneOverlaps(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				unsigned MaxCandidateLen,
				const TargetInstrInfo &TII);

				MatzeB (Done): Could use `emplace_back(OccBB, StartIdxInBB, ...)`
				/// Construct a suffix tree on the instructions in \p M and outline repeated
				/// strings from that tree.
				bool runOnModule(Module &M) override;
				};

				} // Anonymous namespace.

				char MachineOutliner::ID = 0;

				MatzeB (Done): Could use `emplace_back()`.
				namespace llvm {
				ModulePass *createMachineOutlinerPass() { return new MachineOutliner(); }
				MatzeB (Done): Can be `begin()` instead of `instr_begin()`.
				}

				INITIALIZE_PASS(MachineOutliner, "machine-outliner",
				"Machine Function Outliner", false, false)

				MatzeB (Done): Should probably document this with something like `// The cloned memory operands reference the old function. Drop them.`
				void MachineOutliner::pruneOverlaps(std::vector<Candidate> &CandidateList,
				MatzeB (Done): can be `MBB->begin()`
				std::vector<OutlinedFunction> &FunctionList,
				unsigned MaxCandidateLen,
				const TargetInstrInfo &TII) {

				// Check for overlaps in the range. This is O(n^2) worst case, but we can
				silvas (Not Done): If I understand what this is doing correctly, it can be easily made less than O(N^2) by sorting ascending by Start and descending by End (SROA does something similar to do efficient overlap calculations).
				// alleviate that somewhat by bounding our search space using the start
				// index of our first candidate and the maximum distance an overlapping
				// candidate could have from the first candidate.
				for (auto It = CandidateList.begin(), Et = CandidateList.end(); It != Et;
				It++) {
				Candidate &C1 = *It;
				OutlinedFunction &F1 = FunctionList[C1.FunctionIdx];

				// If we removed this candidate, skip it.
				if (!C1.InCandidateList)
				continue;

				// If the candidate's function isn't good to outline anymore, then
				// remove the candidate and skip it.
				if (F1.OccurrenceCount < 2 || F1.Benefit < 1) {
				C1.InCandidateList = false;
				continue;
				}

				// The minimum start index of any candidate that could overlap with this
				MatzeB (Done): Could use a range based for: `for (OutlinedFunction &OF : FunctionList) OF.MF = createOutlinedFunction(M, OF);`
				MatzeB (Done): I think getOrInsertFunction() copies the name and does not take ownership of the passed string so this is an unnecessary copy and a memory leak.
				// one.
				unsigned FarthestPossibleIdx = 0;

				// Either the index is 0, or it's at most MaxCandidateLen indices away.
				if (C1.StartIdx > MaxCandidateLen)
				FarthestPossibleIdx = C1.StartIdx - MaxCandidateLen;

				// Compare against the other candidates in the list.
				// This is at most MaxCandidateLen/2 other candidates.
				// This is because each candidate has to be at least 2 indices away.
				// = O(n * MaxCandidateLen/2) comparisons
				//
				// On average, the maximum length of a candidate is quite small; a fraction
				// of the total module length in terms of instructions. If the maximum
				// candidate length is large, then there are fewer possible candidates to
				// compare against in the first place.
				for (auto Sit = It + 1; Sit != Et; Sit++) {
				Candidate &C2 = *Sit;
				MatzeB (Done): Use references for variables that cannot be `nullptr`.
				OutlinedFunction &F2 = FunctionList[C2.FunctionIdx];

				// Is this candidate too far away to overlap?
				// NOTE: This will be true in
				// O(max(FarthestPossibleIdx/2, #Candidates remaining)) steps
				// for every candidate.
				if (C2.StartIdx < FarthestPossibleIdx)
				break;

				// Did we already remove this candidate in a previous step?
				if (!C2.InCandidateList)
				continue;

				// Is the function beneficial to outline?
				if (F2.OccurrenceCount < 2 || F2.Benefit < 1) {
				// If not, remove this candidate and move to the next one.
				C2.InCandidateList = false;
				continue;
				}

				size_t C2End = C2.StartIdx + C2.Len - 1;

				// Do C1 and C2 overlap?
				//
				// Not overlapping:
				// High indices... [C1End ... C1Start][C2End ... C2Start] ...Low indices
				//
				// We sorted our candidate list so C2Start <= C1Start. We know that
				// C2End > C2Start since each candidate has length >= 2. Therefore, all we
				// have to check is C2End < C1Start to see if we overlap.
				if (C2End < C1.StartIdx)
				continue;

				// C2 overlaps with C1. Because we pruned the tree already, the only way
				// this can happen is if C1 is a proper suffix of C2. Thus, we must have
				// found C1 first during our query, so it must have benefit greater or
				MatzeB (Done): I think CandidateList and FunctionList can be `const` or better `ArrayRef`.
				// equal to C2. Greedily pick C1 as the candidate to keep and toss out C2.
				DEBUG (
				size_t C1End = C1.StartIdx + C1.Len - 1;
				dbgs() << "- Found an overlap to purge.\n";
				dbgs() << "--- C1 :[" << C1.StartIdx << ", " << C1End << "]\n";
				dbgs() << "--- C2 :[" << C2.StartIdx << ", " << C2End << "]\n";
				);

				MatzeB (Done): Constructing names should not be necessary here. You should be able to add a GlobalValue MachineOperand instead of a Symbol one when you construct the call instruction.
				// Update the function's occurrence count and benefit to reflect that C2
				// is being removed.
				F2.OccurrenceCount--;
				F2.Benefit = TII.outliningBenefit(F2.Sequence.size(),
				F2.OccurrenceCount
				);

				// Mark C2 as not in the list.
				C2.InCandidateList = false;

				DEBUG (
				dbgs() << "- Removed C2. \n";
				dbgs() << "--- Num fns left for C2: " << F2.OccurrenceCount << "\n";
				dbgs() << "--- C2's benefit: " << F2.Benefit << "\n";
				);
				}
				}
				}
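As a point of comparison for the nested loop above, the same greedy "keep the first candidate in sort order, drop whatever overlaps it" policy can be expressed as one linear sweep after sorting. This is only a sketch under simplifying assumptions, not the patch's method: `ToyCandidate` and `pruneOverlapsLinear` are hypothetical names, the descending-start order of the patch is kept rather than the ascending-start/descending-end order silvas suggests, and the per-function `OccurrenceCount`/`Benefit` bookkeeping is left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced version of Candidate: just an inclusive index range.
struct ToyCandidate {
  std::size_t StartIdx;
  std::size_t Len; // Assumed >= 1.
  bool Kept;
  ToyCandidate(std::size_t Start, std::size_t Length)
      : StartIdx(Start), Len(Length), Kept(true) {}
};

// Sort by descending start index, then sweep once. Kept candidates are
// pairwise disjoint, so the most recently kept one has the lowest start; a
// later candidate overlaps some kept candidate exactly when its end reaches
// that lowest start.
inline void pruneOverlapsLinear(std::vector<ToyCandidate> &Cands) {
  std::stable_sort(Cands.begin(), Cands.end(),
                   [](const ToyCandidate &A, const ToyCandidate &B) {
                     return A.StartIdx > B.StartIdx;
                   });
  bool HaveKept = false;
  std::size_t LastKeptStart = 0;
  for (ToyCandidate &C : Cands) {
    std::size_t End = C.StartIdx + C.Len - 1; // Inclusive end index.
    if (HaveKept && End >= LastKeptStart) {
      C.Kept = false; // Overlaps the last kept candidate.
      continue;
    }
    HaveKept = true;
    LastKeptStart = C.StartIdx;
  }
}
```

The sort dominates the cost, so the sketch runs in O(n log n). Whether it can replace the loop in the patch depends on how the benefit updates interact with the pruning order, which this sketch does not model.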

				unsigned
				MachineOutliner::buildCandidateList(std::vector<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				SuffixTree &ST,
				const TargetInstrInfo &TII) {

				std::vector<unsigned> CandidateSequence; // Current outlining candidate.
				unsigned MaxCandidateLen = 0; // Length of the longest candidate.

				// Function for maximizing query in the suffix tree.
				// This allows us to define more fine-grained types of things to outline in
				// the target without putting target-specific info in the suffix tree.
				auto BenefitFn = [&TII](const SuffixTreeNode &Curr, size_t StringLen) {

				// Any leaf whose parent is the root only has one occurrence.
				if (Curr.Parent->isRoot())
				return 0u;

				MatzeB (Done): You can move those out of the loop.
				// Anything with length < 2 will never be beneficial on any target.
				if (StringLen < 2)
				return 0u;

				size_t Occurrences = Curr.Parent->OccurrenceCount;
				silvas (Done): It isn't really the "function's id" but rather the id for a call instruction that jumps to it, right?

				silvas (Done): Nit: remove commented out code.
				// Anything with fewer than 2 occurrences will never be beneficial on any
				// target.
				if (Occurrences < 2)
				MatzeB (Done): There seems to be an unnecessary duplication of the string.
				return 0u;

				return TII.outliningBenefit(StringLen, Occurrences);
				};

				// Repeatedly query the suffix tree for the substring that maximizes
				// BenefitFn. Find the occurrences of that string, prune the tree, and store
				// each occurrence as a candidate.
				for (ST.bestRepeatedSubstring(CandidateSequence, BenefitFn);
				CandidateSequence.size() > 1;
				ST.bestRepeatedSubstring(CandidateSequence, BenefitFn)) {

				std::vector<size_t> Occurrences;

				bool GotNonOverlappingCandidate =
				ST.findOccurrencesAndPrune(CandidateSequence, Occurrences);

				silvas (Done): Can you improve the ownership here? E.g. use a std::unique_ptr to manage the lifetime?
				// Is the candidate we found known to overlap with something we already
				// outlined?
				if (!GotNonOverlappingCandidate)
				continue;

				// Is this candidate the longest so far?
				if (CandidateSequence.size() > MaxCandidateLen)
				MaxCandidateLen = CandidateSequence.size() + 1;

				// Keep track of the benefit of outlining this candidate in its
				// OutlinedFunction.
				unsigned FnBenefit = TII.outliningBenefit(CandidateSequence.size(),
				Occurrences.size()
				);

				assert(FnBenefit > 0 && "Function cannot be unbeneficial!");

				// Save an OutlinedFunction for this candidate.
				FunctionList.emplace_back(
				FunctionList.size(), // Number of this function.
				Occurrences.size(), // Number of occurrences.
				CandidateSequence, // Sequence to outline.
				FnBenefit // Instructions saved by outlining this function.
				);

				// Save each of the occurrences of the candidate so we can outline them.
				for (size_t &Occ : Occurrences)
				CandidateList.emplace_back(
				Occ, // Starting idx in that MBB.
				CandidateSequence.size(), // Candidate length.
				FunctionList.size() - 1 // Idx of the corresponding function.
				);

				FunctionsCreated++;
				}

				// Sort the candidates in descending order. This will simplify the outlining
				// process when we have to remove the candidates from the mapping by
				MatzeB (Done): Move to assignment.
				// allowing us to cut them out without keeping track of an offset.
				std::stable_sort(CandidateList.begin(), CandidateList.end());
				MatzeB (Done): As you have some state such as CurrentFunctionID, CurrLegalMapping in the class anyway, maybe the two vectors and the Worklist can move there as well so you do not need to pass them around? Just need to `clear()` them at the end of the function then.

				return MaxCandidateLen;
				}
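The reason for sorting candidates by descending start index can be seen on a plain vector. The sketch below is an illustration under assumptions, not code from the patch: `ToyRange` and `replaceDescending` are hypothetical names, the ranges are assumed non-overlapping with length of at least 1, and a single integer `CallMarker` stands in for the inserted call instruction. Because each replacement only shifts elements at higher indices, the start indices of the ranges still to be processed stay valid and no running offset is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical half-open range [Start, Start + Len) in the mapped program.
struct ToyRange {
  std::size_t Start;
  std::size_t Len;
};

// Replaces each range with one CallMarker element. RangesDescending must be
// non-overlapping and sorted by descending Start (assumption of this sketch).
inline void replaceDescending(std::vector<unsigned> &Prog,
                              const std::vector<ToyRange> &RangesDescending,
                              unsigned CallMarker) {
  for (const ToyRange &R : RangesDescending) {
    assert(R.Len >= 1 && R.Start + R.Len <= Prog.size());
    auto First = Prog.begin() + static_cast<std::ptrdiff_t>(R.Start);
    *First = CallMarker; // The "call" takes the place of the first element.
    // Erase the rest of the range; only indices above R.Start move.
    Prog.erase(First + 1, First + static_cast<std::ptrdiff_t>(R.Len));
  }
}
```

Processing the same ranges in ascending order would require subtracting the number of already-erased elements from every later start index.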

				MachineFunction *
				MachineOutliner::createOutlinedFunction(Module &M, const OutlinedFunction &OF,
				InstructionMapper &Mapper) {

				// Create the function name. This should be unique. For now, just hash the
				// module name and include it in the function name plus the number of this
				// function.
				std::ostringstream NameStream;
				size_t HashedModuleName = std::hash<std::string>{}(M.getName().str());
				NameStream << "OUTLINED_FUNCTION" << HashedModuleName << "_" << OF.Name;

				// Create the function using an IR-level function.
				LLVMContext &C = M.getContext();
				Function *F = dyn_cast<Function>(
				M.getOrInsertFunction(NameStream.str(), Type::getVoidTy(C), NULL));
				assert(F && "Function was null!");

				// Allow the linker to merge together identical outlined functions between
				// modules.
				F->setLinkage(GlobalValue::LinkOnceODRLinkage);
				F->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
				MatzeB (Done): Hmm...
				- This hash doesn't seem collision free. Someone having two files with the same name (maybe in two different projects that he links together later) may happen. Of course a collision shouldn't hurt as the linker will compare the contents anyway, but why even bother with a hash then?
				- I think the linker will only try to merge functions with the same name but the function name(-hash) is currently based on the name not the contents of the function so I would expect this to be not helpful in most cases.
				Maybe stay with the previous internal linking and try the LinkOnce tricks in a follow-up commit (where it is based on the contents).
				silvas (Done):
				> Of course a collision shouldn't hurt as the linker will compare the contents anyway, but why even bother with a hash then?
				No. linkonce_odr requires that if the name matches then the contents are interchangeable, since one gets selected arbitrarily. So for correctness the hash must be collision-free. (see also the discussion in D29512 which also involves finding a stable "name" for the TU)
				Also, I don't see the point of doing this. The linker's content-based deduplication ("ICF") should handle this case without caring about the name. If you want to use the linker's comdat/linkonce (i.e. name-based) deduplication then you can just use the function's contents as the name (mangling away NUL bytes), or a strong hash (collisions are a correctness problem).
				Presumably, if users are using this pass, then they care about code size and so they are likely to have ICF enabled already. So I don't see the point of doing this linkage trick.

				BasicBlock *EntryBB = BasicBlock::Create(C, "entry", F);
				IRBuilder<> Builder(EntryBB);
				Builder.CreateRetVoid();

				MachineModuleInfo &MMI = getAnalysis<MachineModuleInfo>();
				MachineFunction &MF = MMI.getMachineFunction(*F);
				MachineBasicBlock &MBB = *MF.CreateMachineBasicBlock();
				const TargetSubtargetInfo &STI = MF.getSubtarget();
				const TargetInstrInfo &TII = *STI.getInstrInfo();

				// Insert the new function into the module.
				MF.insert(MF.begin(), &MBB);

				TII.insertOutlinerPrologue(MBB, MF);

				// Copy over the instructions for the function using the integer mappings in
				// its sequence.
				for (unsigned Str : OF.Sequence) {
				MachineInstr *NewMI =
				MF.CloneMachineInstr(Mapper.IntegerInstructionMap.find(Str)->second);
				NewMI->dropMemRefs();
				MBB.insert(MBB.end(), NewMI);
				}

				TII.insertOutlinerEpilogue(MBB, MF);

				return &MF;
				}

				bool MachineOutliner::outline(Module &M,
				const ArrayRef<Candidate> &CandidateList,
				std::vector<OutlinedFunction> &FunctionList,
				InstructionMapper &Mapper) {

				bool OutlinedSomething = false;

				// Replace the candidates with calls to their respective outlined functions.
				for (const Candidate &C : CandidateList) {

				// Was the candidate removed during pruneOverlaps?
				if (!C.InCandidateList)
				continue;

				// If not, then look at its OutlinedFunction.
				OutlinedFunction &OF = FunctionList[C.FunctionIdx];

				// Was its OutlinedFunction made unbeneficial during pruneOverlaps?
				if (OF.OccurrenceCount < 2 || OF.Benefit < 1)
				continue;

				// If not, then outline it.
				MachineBasicBlock *MBB = (*Mapper.InstrList[C.StartIdx]).getParent();
				MachineBasicBlock::iterator StartIt = Mapper.InstrList[C.StartIdx];

				// Does this candidate have a function yet?
				if (!OF.MF)
				OF.MF = createOutlinedFunction(M, OF, Mapper);

				MachineFunction *MF = OF.MF;
				const TargetSubtargetInfo &STI = MF->getSubtarget();
				const TargetInstrInfo &TII = *STI.getInstrInfo();

				// Insert a call to the new function and erase the old sequence.
				MachineBasicBlock::iterator EndIt = StartIt;
				std::advance(EndIt, C.Len);
				StartIt = TII.insertOutlinedCall(M, *MBB, StartIt, *MF);
				++StartIt;
				MBB->erase(StartIt, EndIt);

				OutlinedSomething = true;

				// Statistics.
				NumOutlined++;
				}

				DEBUG (
				dbgs() << "OutlinedSomething = " << OutlinedSomething << "\n";
				);

				return OutlinedSomething;
				}

				bool MachineOutliner::runOnModule(Module &M) {

				// Is there anything in the module at all?
				if (M.empty())
				return false;

				MachineModuleInfo &MMI = getAnalysis<MachineModuleInfo>();
				const TargetSubtargetInfo &STI = MMI.getMachineFunction(*M.begin())
				.getSubtarget();
				const TargetRegisterInfo *TRI = STI.getRegisterInfo();
				const TargetInstrInfo *TII = STI.getInstrInfo();

				InstructionMapper Mapper;

				// Build instruction mappings for each function in the module.
				for (Function &F : M) {
				MachineFunction &MF = MMI.getMachineFunction(F);

				// Is the function empty? Safe to outline from?
				if (F.empty() || !TII->functionIsSafeToOutlineFrom(MF))
				continue;

				// If it is, look at each MachineBasicBlock in the function.
				for (MachineBasicBlock &MBB : MF) {

				// Is there anything in MBB?
				if (MBB.empty())
				continue;

				// If yes, map it.
				Mapper.convertToUnsignedVec(MBB, TRI, TII);
				}
				}

				// Construct a suffix tree, use it to find candidates, and then outline them.
				SuffixTree ST(Mapper.UnsignedVec);
				std::vector<Candidate> CandidateList;
				std::vector<OutlinedFunction> FunctionList;

				unsigned MaxCandidateLen =
				buildCandidateList(CandidateList, FunctionList, ST, *TII);

				pruneOverlaps(CandidateList, FunctionList, MaxCandidateLen, *TII);
				return outline(M, CandidateList, FunctionList, Mapper);
				}
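
To make the "find repeated sequences" step concrete, here is a minimal sketch that assumes each instruction has already been mapped to an unsigned integer, as `Mapper.UnsignedVec` suggests. It is not the suffix-tree approach used by the patch: it swaps in a simple quadratic dynamic-programming search for the longest repeated, non-overlapping run. `Repeat` and `longestRepeat` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Result of the search: start indices of the two copies and their length.
struct Repeat {
  std::size_t First;
  std::size_t Second;
  std::size_t Len;
};

// Naive stand-in for the suffix-tree query: find the longest sequence that
// occurs twice in S without the two occurrences overlapping.
Repeat longestRepeat(const std::vector<unsigned> &S) {
  const std::size_t N = S.size();
  // DP[I][J] (I < J) is the length of the longest common suffix of S[0..I)
  // and S[0..J), extended only while the two copies do not overlap.
  std::vector<std::vector<std::size_t>> DP(
      N + 1, std::vector<std::size_t>(N + 1, 0));
  Repeat Best{0, 0, 0};
  for (std::size_t I = 1; I <= N; ++I) {
    for (std::size_t J = I + 1; J <= N; ++J) {
      // Extend the match only if the first copy would still end before
      // the second copy begins.
      if (S[I - 1] == S[J - 1] && DP[I - 1][J - 1] < J - I) {
        DP[I][J] = DP[I - 1][J - 1] + 1;
        if (DP[I][J] > Best.Len)
          Best = Repeat{I - DP[I][J], J - DP[I][J], DP[I][J]};
      }
    }
  }
  return Best;
}
```

A suffix tree answers the same kind of query for all repeated substrings at once and in far less time on large inputs, which is presumably why the pass builds one instead of searching like this.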

lib/CodeGen/TargetPassConfig.cpp

[... 86 lines not shown ...]
static cl::opt<bool> PrintISelInput("print-isel-input", cl::Hidden,
cl::desc("Print LLVM IR input to isel pass"));
static cl::opt<bool> PrintGCInfo("print-gc", cl::Hidden,
cl::desc("Dump garbage collector data"));
static cl::opt<bool> VerifyMachineCode("verify-machineinstrs", cl::Hidden,
cl::desc("Verify generated machine code"),
cl::init(false),
cl::ZeroOrMore);
		static cl::opt<bool> EnableMachineOutliner("enable-machine-outliner",
		silvas (Done): What is the official name? "machine outliner" or "MIR outliner". Please be consistent here and elsewhere.
		cl::Hidden,
		cl::desc("Enable machine outliner"));

static cl::opt<std::string>
PrintMachineInstrs("print-machineinstrs", cl::ValueOptional,
cl::desc("Print machine instrs"),
cl::value_desc("pass-name"), cl::init("option-unspecified"));

static cl::opt<int> EnableGlobalISelAbort(
"global-isel-abort", cl::Hidden,
[... 566 lines not shown ...]	void TargetPassConfig::addMachinePasses() {
addPass(&LiveDebugValuesID, false);

// Insert before XRay Instrumentation.
addPass(&FEntryInserterID, false);

addPass(&XRayInstrumentationID, false);
addPass(&PatchableFunctionID, false);

		if (EnableMachineOutliner)
		PM->add(createMachineOutlinerPass());

AddingMachinePasses = false;
}

/// Add passes that optimize machine instructions in SSA form.
void TargetPassConfig::addMachineSSAOptimization() {
// Pre-ra tail duplication.
addPass(&EarlyTailDuplicateID);

[... 231 lines not shown ...]

lib/Target/X86/X86InstrInfo.h

[... 539 lines not shown ...]	public:
std::pair<unsigned, unsigned>
decomposeMachineOperandsTargetFlags(unsigned TF) const override;

ArrayRef<std::pair<unsigned, const char *>>
getSerializableDirectMachineOperandTargetFlags() const override;

bool isTailCall(const MachineInstr &Inst) const override;

		unsigned outliningBenefit(size_t SequenceSize,
		size_t Occurrences) const override;

		bool functionIsSafeToOutlineFrom(MachineFunction &MF) const override;

		bool isLegalToOutline(MachineInstr &MI) const override;

		bool isFixablePostOutline(MachineInstr &MI) const;

		void insertOutlinerEpilogue(MachineBasicBlock &MBB,
		MachineFunction &MF) const override;

		void insertOutlinerPrologue(MachineBasicBlock &MBB,
		MachineFunction &MF) const override;

		MachineBasicBlock::iterator
		insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator &It,
		MachineFunction &MF) const override;

protected:
/// Commutes the operands in the given instruction by changing the operands
/// order and/or changing the instruction's opcode and/or the immediate value
/// operand.
///
/// The arguments 'CommuteOpIdx1' and 'CommuteOpIdx2' specify the operands
/// to be commuted.
///
[... 20 lines not shown ...]	MachineInstr *foldMemoryOperandCustom(MachineFunction &MF, MachineInstr &MI,
unsigned Size, unsigned Align) const;

/// isFrameOperand - Return true and the FrameIndex if the specified
/// operand and follow operands form a reference to the stack frame.
bool isFrameOperand(const MachineInstr &MI, unsigned int Op,
int &FrameIndex) const;

/// Returns true iff the routine could find two commutable operands in the
/// given machine instruction with 3 vector inputs.
		MatzeB (Done): May be left out if it is just repeating the comment from the parent class. That's less risk of the comment becoming out of date.
/// The 'SrcOpIdx1' and 'SrcOpIdx2' are INPUT and OUTPUT arguments. Their
/// input values can be re-defined in this method only if the input values
/// are not pre-defined, which is designated by the special value
/// 'CommuteAnyOperandIndex' assigned to it.
/// If both of indices are pre-defined and refer to some operands, then the
/// method simply returns true if the corresponding operands are commutable
/// and returns false otherwise.
///
/// For example, calling this method this way:
/// unsigned Op1 = 1, Op2 = CommuteAnyOperandIndex;
/// findThreeSrcCommutedOpIndices(MI, Op1, Op2);
/// can be interpreted as a query asking to find an operand that would be
/// commutable with the operand#1.
bool findThreeSrcCommutedOpIndices(const MachineInstr &MI,
unsigned &SrcOpIdx1,
unsigned &SrcOpIdx2) const;
};

} // End llvm namespace

#endif
		MatzeB (Not Done): This linebreak seems unnecessary.

lib/Target/X86/X86InstrInfo.cpp

[... 10,377 lines not shown ...]	void getAnalysisUsage(AnalysisUsage &AU) const override {
MachineFunctionPass::getAnalysisUsage(AU);
}
};
}

char LDTLSCleanup::ID = 0;
FunctionPass*
llvm::createCleanupLocalDynamicTLSPass() { return new LDTLSCleanup(); }

		unsigned X86InstrInfo::outliningBenefit(size_t SequenceSize,
		silvas (Done): This name does not follow the coding standard. Should be `getOutliningBenefit` or something.
		size_t Occurrences) const {
		unsigned NotOutlinedSize = SequenceSize * Occurrences;

		// Sequence appears once in outlined function (Sequence.size())
		// One return instruction (+1)
		// One call per occurrence (Occurrences)
		unsigned OutlinedSize = (SequenceSize + 1) + Occurrences;

		// Return the number of instructions saved by outlining this sequence.
		return NotOutlinedSize > OutlinedSize ? NotOutlinedSize - OutlinedSize : 0;
		}
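
The cost model in `outliningBenefit` is easy to check by hand. The sketch below restates the same arithmetic as a free function so that a few cases can be worked through; `instructionsSaved` is a hypothetical name, and like the X86 hook in this diff it counts instructions rather than bytes.

```cpp
#include <cstddef>

// Same formula as the X86 hook above, restated as a standalone function.
unsigned instructionsSaved(std::size_t SequenceSize, std::size_t Occurrences) {
  // Without outlining, every occurrence carries the full sequence.
  std::size_t NotOutlinedSize = SequenceSize * Occurrences;

  // With outlining: one copy of the sequence, one return instruction,
  // and one call per occurrence.
  std::size_t OutlinedSize = (SequenceSize + 1) + Occurrences;

  // Clamp at zero when outlining would not shrink the code.
  return NotOutlinedSize > OutlinedSize
             ? static_cast<unsigned>(NotOutlinedSize - OutlinedSize)
             : 0;
}
```

For a four-instruction sequence that occurs twice, the un-outlined cost is 4 * 2 = 8 and the outlined cost is (4 + 1) + 2 = 7, so one instruction is saved. A two-instruction sequence occurring twice costs 4 un-outlined and 5 outlined, so the benefit is 0 and the `OF.Benefit < 1` check in `outline` would skip it.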

		bool X86InstrInfo::functionIsSafeToOutlineFrom(MachineFunction &MF) const {
		silvas (Done): isFunctionSafeToOutlineFrom
		return MF.getFunction()->hasFnAttribute(Attribute::NoRedZone);
		}

		bool X86InstrInfo::isLegalToOutline(MachineInstr &MI) const {

		MatzeB (Done): Heh, in theory every single x86 instruction modifies RIP. But I assume we don't model it like that in LLVM. In any case, restricting this to reads(RIP) should be enough.
		// Don't outline returns or basic block terminators.
		if (MI.isReturn() || MI.isTerminator())
		return false;

		// Don't outline debug values.
		if (MI.isDebugValue())
		return false;
		aprantl (Done): This is not the right way to do this. We need to skip over DBG_VALUE instructions as if they didn't exist. Otherwise the presence of DBG_VALUEs in the instruction stream will have an effect on the outlining decision, which means that compiling with -g will generate different code than without. Please also be sure to include a testcase that exercises this.

		// Don't outline anything that modifies or reads from the stack pointer.
		//
		// FIXME: There are instructions which are being manually built without
		// explicit uses/defs so we also have to check the MCInstrDesc. We should be
		// able to remove the extra checks once those are fixed up. For example,
		craig.topper (Done): All of the 'else's on each of these cases can be removed since the 'if's all return.
		// sometimes we might get something like %RAX<def> = POP64r 1. This won't be
		// caught by modifiesRegister or readsRegister even though the instruction
		MatzeB (Done): Are those tests necessary given that you already throw out operations with FrameIndex operands?
		// really ought to be formed so that modifiesRegister/readsRegister would
		MatzeB (Done): Luckily CPIs aren't human and can't feel insulted :)
		// catch it.
		MatzeB (Done): You could use `MachineInstr::isPosition()` instead of checking for `isLabel()` and `isCFIInstruction()`.
		if (MI.modifiesRegister(X86::RSP, &RI) || MI.readsRegister(X86::RSP, &RI) ||
		MI.getDesc().hasImplicitUseOfPhysReg(X86::RSP) ||
		MI.getDesc().hasImplicitDefOfPhysReg(X86::RSP))
		MatzeB (Done): range based for.
		MatzeB (Done): Better use `const MachineOperand &MOP` to avoid some copying.
		MatzeB (Done): This is surprising; checking the MCInstrDesc should not be necessary. This is most probably a bug somewhere else in codegen, so there is nothing we can do here. However, it'd be good if you could find the time later to create a reproducer and file a PR about it. Reading and writing registers without having operands for it looks like a bug waiting to happen elsewhere.
		return false;

		if (MI.readsRegister(X86::RIP, &RI) ||
		MI.getDesc().hasImplicitUseOfPhysReg(X86::RIP) ||
		MI.getDesc().hasImplicitDefOfPhysReg(X86::RIP))
		return false;

		if (MI.isPosition())
		return false;

		for (const MachineOperand &MOP : MI.operands())
		if (MOP.isCPI() || MOP.isJTI() || MOP.isCFIIndex() || MOP.isFI() ||
		MOP.isTargetIndex())
		return false;
		aprantl (Not Done): Thanks!

		return true;
		}

		void X86InstrInfo::insertOutlinerEpilogue(MachineBasicBlock &MBB,
		MachineFunction &MF) const {

		MachineInstr *retq = BuildMI(MF, DebugLoc(), get(X86::RETQ));
		MBB.insert(MBB.end(), retq);
		}

		void X86InstrInfo::insertOutlinerPrologue(MachineBasicBlock &MBB,
		MachineFunction &MF) const {
		return;
		}

		MachineBasicBlock::iterator
		X86InstrInfo::insertOutlinedCall(Module &M, MachineBasicBlock &MBB,
		MachineBasicBlock::iterator &It,
		MachineFunction &MF) const {
		It = MBB.insert(It,
		BuildMI(MF, DebugLoc(), get(X86::CALL64pcrel32))
		.addGlobalAddress(M.getNamedValue(MF.getName())));
		return It;
		}

test/CodeGen/X86/machine-outliner-basic.ll

This file was added.

				; RUN: llc -enable-machine-outliner -march=x86-64 < %s | FileCheck %s
				MatzeB (Done): Better use `-mtriple=x86_64--` instead of `-march` so we also force the operating system etc.

				; Make sure the outliner can create simple calls.

				@x = global i32 0, align 4

				; Function Attrs: noinline noredzone nounwind ssp uwtable
				define i32 @main() #0 {
				; CHECK-LABEL: _main:
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4

				store i32 0, i32* %1, align 4
				store i32 0, i32* @x, align 4
				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				store i32 1, i32* @x, align 4
				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				MatzeB (Done): You probably want to force this test to use the same outlined function everywhere. FileCheck allows assigning names and checking for repeated patterns: `; CHECK: callq [[OUTLINEFUNC:_OUTLINED_FUNCTION[0-9]+_0]]` ... `; CHECK: callq [[OUTLINEFUNC]]` ... `; CHECK-LABEL: [[OUTLINEFUNC]]:`
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				ret i32 0
				}

				attributes #0 = { noredzone nounwind ssp uwtable "no-frame-pointer-elim"="true" }

				; CHECK-LABEL: _OUTLINED_FUNCTION{{[0-9]+}}_0:
				; CHECK: movl $1, -{{[0-9]+}}(%rbp)
				; CHECK: movl $2, -{{[0-9]+}}(%rbp)
				; CHECK: movl $3, -{{[0-9]+}}(%rbp)
				; CHECK: movl $4, -{{[0-9]+}}(%rbp)
				; CHECK: retq

test/CodeGen/X86/machine-outliner-bb-boundaries.ll

This file was added.

				; RUN: llc -enable-machine-outliner -march=x86-64 < %s | FileCheck %s
				MatzeB (Done): You can probably merge the tests together into a single file as they are all about the same pass and use the same llc flags.

				; Make sure the outliner doesn't outline instructions that aren't in the same
				; basic block.

				@x = global i32 0, align 4

				; Function Attrs: noinline noredzone nounwind ssp uwtable
				define i32 @main() #0 {
				; CHECK-LABEL: _main:
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4
				store i32 0, i32* %1, align 4
				store i32 0, i32* %2, align 4
				%6 = load i32, i32* %2, align 4
				%7 = icmp ne i32 %6, 0
				br i1 %7, label %9, label %8

				; <label>:8: ; preds = %0
				MatzeB (Done): I'd remove those standard dumping comments. If you actually care about the label give it a real name, if not the comment shouldn't be necessary either.
				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				; CHECK: cmpl $0, -{{[0-9]+}}(%rbp)
				; CHECK: jne LBB0_{{[0-9]+}}
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				br label %10

				; <label>:9: ; preds = %0
				store i32 1, i32* %4, align 4
				br label %10

				; <label>:10: ; preds = %9, %8
				%11 = load i32, i32* %2, align 4
				%12 = icmp ne i32 %11, 0
				br i1 %12, label %14, label %13

				; <label>:13: ; preds = %10
				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				; CHECK: LBB0_6:
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				br label %15

				; <label>:14: ; preds = %10
				store i32 1, i32* %4, align 4
				br label %15

				; <label>:15: ; preds = %14, %13
				ret i32 0
				}

				attributes #0 = { noredzone nounwind ssp uwtable "no-frame-pointer-elim"="true" }

				; CHECK-LABEL: _OUTLINED_FUNCTION{{[0-9]+}}_0:
				; CHECK: movl $1, -{{[0-9]+}}(%rbp)
				; CHECK: movl $2, -{{[0-9]+}}(%rbp)
				; CHECK: movl $3, -{{[0-9]+}}(%rbp)
				; CHECK: movl $4, -{{[0-9]+}}(%rbp)
				; CHECK: retq

test/CodeGen/X86/machine-outliner-interprocedural.ll

This file was added.

				; RUN: llc -enable-machine-outliner -march=x86-64 < %s | FileCheck %s

				; Make sure the outliner can create simple calls across more than one function.

				@x = global i32 0, align 4

				define i32 @foo() #0 {
				; CHECK-LABEL: _foo:
				silvas (Done): The leading underscore here is darwin-specific. Add an explicit triple to avoid this (otherwise non-Darwin bots will break).
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4

				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				store i32 1, i32* @x, align 4

				ret i32 0
				}

				; Function Attrs: noinline noredzone nounwind ssp uwtable
				define i32 @main() #0 {
				; CHECK-LABEL: _main:
				%1 = alloca i32, align 4
				%2 = alloca i32, align 4
				%3 = alloca i32, align 4
				%4 = alloca i32, align 4
				%5 = alloca i32, align 4

				store i32 0, i32* %1, align 4
				store i32 0, i32* @x, align 4
				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				store i32 1, i32* @x, align 4
				; CHECK: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				store i32 1, i32* %2, align 4
				store i32 2, i32* %3, align 4
				store i32 3, i32* %4, align 4
				store i32 4, i32* %5, align 4
				ret i32 0
				}

				attributes #0 = { noredzone nounwind ssp uwtable "no-frame-pointer-elim"="true" }

				; CHECK-LABEL: _OUTLINED_FUNCTION{{[0-9]+}}_0:
				; CHECK: movl $1, -{{[0-9]+}}(%rbp)
				; CHECK: movl $2, -{{[0-9]+}}(%rbp)
				; CHECK: movl $3, -{{[0-9]+}}(%rbp)
				; CHECK: movl $4, -{{[0-9]+}}(%rbp)
				; CHECK: retq

test/CodeGen/X86/machine-outliner-nocalls.ll

This file was added.

				; RUN: llc -enable-machine-outliner -march=x86-64 < %s | FileCheck %s

				; Make sure the outliner never outlines call instructions.

				; Function Attrs: noinline noredzone nounwind ssp uwtable
				define i32 @bar() #0 {
				; CHECK-NOT: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				ret i32 1
				}

				; Function Attrs: noinline noredzone nounwind ssp uwtable
				define i32 @foo() #0 {
				; CHECK-NOT: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				ret i32 1
				}

				; Function Attrs: noinline noredzone nounwind ssp uwtable
				define i32 @main() #0 {
				; CHECK-LABEL: _main:
				%1 = alloca i32, align 4
				store i32 0, i32* %1, align 4
				; CHECK-NOT: callq _OUTLINED_FUNCTION{{[0-9]+}}_0
				%2 = call i32 @bar() #1
				%3 = call i32 @foo() #1
				%4 = call i32 @bar() #1
				%5 = call i32 @foo() #1
				%6 = call i32 @bar() #1
				%7 = call i32 @foo() #1
				ret i32 0
				}

				; CHECK-NOT: _OUTLINED_FUNCTION{{[0-9]+}}_0:

Diff 89535
