This is an archive of the discontinued LLVM Phabricator instance.

Differential D22778

Add Loop Sink pass to reverse the LICM based of basic block frequency.
ClosedPublic

Authored by danielcdh on Jul 25 2016, 2:05 PM.

Details

Reviewers

chandlerc
davidxl
hfinkel

Commits

rGb94c09baa058: Add Loop Sink pass to reverse the LICM based of basic block frequency.
rL285308: Add Loop Sink pass to reverse the LICM based of basic block frequency.

Summary

LICM may speculatively hoist instructions into the loop preheader. Before code generation, we need to sink the hoisted instructions back into the loop when that is beneficial. This pass is the reverse of LICM: it looks at the instructions in the preheader and sinks each one into basic blocks inside the loop body if the basic block frequency is smaller than the preheader frequency.
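A minimal sketch of the profitability test described in the summary, written against plain integers rather than LLVM's BlockFrequencyInfo. The function name `shouldSink` and the choice to sink when the frequencies are equal are assumptions for illustration (the thread later mentions that equal frequency still favors sinking because it shortens the live range); this is not the code from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: decide whether an instruction that was hoisted into
// the preheader should be sunk back into the loop blocks that use it.
// UseBlockFreqs holds the profile frequency of each loop block using it.
bool shouldSink(const std::vector<std::uint64_t> &UseBlockFreqs,
                std::uint64_t PreheaderFreq) {
  // Nothing inside the loop uses the value, so there is nowhere to sink to.
  if (UseBlockFreqs.empty())
    return false;
  // Total cost of executing a copy in every use block.
  std::uint64_t Sum = std::accumulate(UseBlockFreqs.begin(),
                                      UseBlockFreqs.end(), std::uint64_t{0});
  // Assumption: sink when the copies run no more often than the preheader.
  return Sum <= PreheaderFreq;
}
```

For example, with a preheader frequency of 100, use blocks with frequencies 10 and 20 would be sunk into, while use blocks with frequencies 60 and 50 would not.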

Diff Detail

Build Status

Buildable 827
Build 827: arc lint + arc unit

Event Timeline

davidxl added inline comments. Jul 30 2016, 9:31 PM

lib/Transforms/Scalar/LoopSink.cpp
38	Add statistics for number of instructions sunk etc.
79	LoopBase class has a member method 'contains' which can be used.
99	Remove dead code
134	add documentation for the method.
140	To speed up the pass, perhaps it is better to do a quick scan of the BB of the loop to see if there are any blocks that are really cold (colder than preheader). If there are not, early return.
144	add a comment for this variable.
160	if -> is
179	Perhaps compare cdt's frequency with the sum of bb and sink bb? If it is greater than the sum, do not use cdt.
200	if T >= ... early return
205	Add debug trace here
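The quick scan suggested for line 140 above can be sketched as follows. This is a standalone illustration over plain frequency values, not LLVM code; `hasColdBlock` is a hypothetical name, and the strict `<` comparison is an assumption based on the phrase "colder than preheader".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns true if at least one loop block is strictly
// colder than the preheader. When it returns false, the pass could return
// early, since no sinking destination can be profitable.
bool hasColdBlock(const std::vector<std::uint64_t> &LoopBlockFreqs,
                  std::uint64_t PreheaderFreq) {
  return std::any_of(
      LoopBlockFreqs.begin(), LoopBlockFreqs.end(),
      [PreheaderFreq](std::uint64_t Freq) { return Freq < PreheaderFreq; });
}
```

The scan is linear in the number of loop blocks, which is cheap compared with the per-instruction analysis it guards.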

Have you talked to anyone about the design for this?

I know Daniel Jasper, Quentin, and several others have looked at similar things before. Previous attempts have focused on using MachineLICM to do sinking as well as hoisting. While I don't have a strong opinion about one design over the other, we should be consistent about the plan here, and possibly consolidate some of the logic.

add more test and address David's comments.

davidxl added inline comments. Aug 11 2016, 4:22 PM

lib/Transforms/Scalar/LoopSink.cpp
123	This is not correct -- it should merge in SubLoop's AST too. See LICM's CollectAliasInfoForLoop -- that should be extracted as a common utility.
132	Is the first check needed?
148	Continue if BBs is empty
156	This logic seems to be inverted -- Using CDT should be encouraged if its frequency equals the sum of SinkBB and N. In other words, division should be used, not multiplication
184	Is the formatting correct here?

update

lib/Transforms/Scalar/LoopSink.cpp
123	Yes, we can refactor the code to reuse the CollectAliasInfoForLoop from LICM for the legacy pass manager: Build a base class ASTLoopTransformation, and make LoopInvariantCodeMotion and LoopSink subclasses of it. LoopToAliasSetMap and collectAliasInfoForLoop should be protected members of the base class. The logic in the subclasses will also be shared with the new pass manager. Build a base class ASTLoopLegacyPass which inherits from LoopPass, and make LICMLegacyPass and LoopSinkLegacyPass subclasses of it. cloneBasicBlockAnalysis, deleteAnalysisValue and deleteAnalysisLoop should be private members of the base class. The base class also needs to have an ASTLoopTransformation pointer to invoke the actual logic shared with the new pass manager. This is definitely doable, but seems like overkill to me because: collectAliasInfoForLoop is an optimization for compile time, and it is not yet available in the new pass manager. The refactoring mainly focuses on abstraction of the old pass manager, which will be replaced by the new pass manager soon. There is much complexity involved because the new pass manager does not support this optimization, and we need to make it fall back to what we do right now (add all basic blocks to the AST) without introducing a memory leak. Comments?
132	I think yes, because if any of its operands is loop-variant, sinking it into the loop may change its value on every iteration.
156	It's actually as expected: if CDT's frequency is equal to or only a little larger than SinkBB+N's frequency, then the check will fail and go to the "else" branch, picking the CDT instead of SinkBB.
184	I used clang-format --style=llvm for the formatting.
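The exchange about line 156 concerns when a common dominator (CDT) should replace two candidate blocks. One possible reading is sketched below with plain integers. The name `preferCommonDominator` and the 90 percent threshold are assumptions for illustration only; the thread does not state the actual value at this point.

```cpp
#include <cassert>
#include <cstdint>

// Assumed tuning knob (hypothetical value): how close the combined frequency
// of the two blocks must be to the dominator's frequency before we prefer
// the dominator.
constexpr std::uint64_t ThresholdPercent = 90;

// Hypothetical helper: returns true when a single copy in the common
// dominator is preferred over one copy in each of the two blocks.
// Mathematically this is (FreqA + FreqB) / (ThresholdPercent / 100) >= DomFreq,
// rearranged to stay in integer arithmetic. Inputs are assumed small enough
// that the multiplications do not overflow.
bool preferCommonDominator(std::uint64_t DomFreq, std::uint64_t FreqA,
                           std::uint64_t FreqB) {
  return (FreqA + FreqB) * 100 >= DomFreq * ThresholdPercent;
}
```

With a dominator frequency of 100, blocks at 50 and 50 (equal to the dominator) or 45 and 46 (slightly below it) would be merged into the dominator, while blocks at 10 and 10 would each keep their own copy.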

davidxl added inline comments. Aug 16 2016, 1:26 PM

test/Transforms/LICM/loopsink.ll
18	b6 --> b7
24	add check-not of @g after preheader.
27	add check-not @g after b3 and b4
66	Add a branch profile data here.
75	B6 --> b6
77	B3 --> b3
81	This should be b7
132	annotate with branch profile data
143	B3 -> b3
147	b6 -> b7
211	but this loop will be executed at least once per call of t4, so the loop body frequency should not be lower than entry frequency
272	This test can be simplified a little by just making an external call here.

update tests

danielcdh marked 6 inline comments as done. Aug 16 2016, 2:43 PM

danielcdh added inline comments.

test/Transforms/LICM/loopsink.ll
211	So the current algorithm is that even if the frequency is equal (as in this case), we still tend to sink because it will reduce live range.

I don't have further comments.

Hal, does this patch look ok to you?

David

Are you going to have a separate patch to hook this in the pass manager ?

lib/Transforms/Scalar/LoopSink.cpp
142	Isn't it still okay to try to sink in outside the loop if the user block is cold enough?
162	Why not early return if frequency of SinkBB is greater than PreheaderFreq.
179	I guess you intend L->contains(LI->getLoopFor(N)) ?

update

lib/Transforms/Scalar/LoopSink.cpp
143	That would become general purpose sinking instead of loop-sinking. And we need to handle alias/invariant differently.
180	good catch. Thanks!

In D22778#520053, @junbuml wrote:

Are you going to have a separate patch to hook this in the pass manager ?

Yes, I'll send another patch to hook it up in clang.

dberlin added a subscriber: dberlin. Aug 18 2016, 3:54 PM

dberlin added inline comments.

include/llvm/Transforms/Utils/LoopUtils.h
484	The comment is not right or specific. First, this is used for both hoisting and sinking. Second, it does not say what "hoistable" means. In fact, this function is really checking "is aliased with loop", you probably should call it something like that and use that.
487	Ditto above
490	This should be "Return true if a non-memory instruction can be handled by the hoister/sinker" Please don't call it isHoistableInst and then use it for sinking :)
lib/Transforms/Scalar/LICM.cpp
501	This is not really right, it returns whether a non-memory instruction is hoistable. :)
lib/Transforms/Scalar/LoopSink.cpp
42	"Don't sink instructions that require cloning unless they execute less than this percent of the time" (or whatever)
87	"sunk into loop body"
89	CurLoop is unused?
90	s/hoist/sink/
110	bool HasColdBB = llvm::any_of(L->blocks(), [&](const BasicBlock *BB) { return BFI->getBlockFreq(BB) <= PreHeaderFreq; });
133	I'm confused. Why is this necessary, instead of using the reverse iterators and just advancing them when necessary?
170	Please factor this out into FindSinkBlocks or something. This is non-deterministic, because you are iterating over a denseset. I am also confused by this placement strategy. You are not ordering the blocks in any particular processing order, so you may not actually choose the best sink points, as once you NCA something high in the domtree and something low, NCA will always be something high in the domtree. If you ordered it so it was the lowest things first (using the DFS numbers or whatever), you may decide multiple intermediate placements are cheaper than what you are doing here.
191	Needs a comment
199	This looks ... interesting. Instead, why not add SinkBBS.size() == 0 check above (so it skips if it's empty). Then you should simply move the i == 0 case outside of the loop, and the loop is just doing the insertions.
205	This is Local.cpp's replaceDominatedUsesWith (if you need a Instruction, Use version, make one :P)

refactor

include/llvm/Transforms/Utils/LoopUtils.h
490	Refactored code to remove these functions.
lib/Transforms/Scalar/LICM.cpp
501–502	refactored code to remove this.
lib/Transforms/Scalar/LoopSink.cpp
134	We need reverse iterator because instructions in the back of the BB may depend on the instructions in the front, thus it needs to be sunk first before other instructions can be sunk.
171	Good point. Refactored the code and updated the algorithm to iterate from the coldest blocks to ensure optimality.
200	SinkBBs.size() == 0 check is already moved above the total frequency check, so it will not reach here. The i==0 check is to distinguish between the first SinkBB (that we use move instead of insert) and the later SinkBB (that we make a copy for each insert).
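The move-versus-clone distinction described for line 200 can be illustrated with a small standalone sketch. The `Placement` struct and `placeInstruction` function are hypothetical names and block names are plain strings; the real pass operates on LLVM instructions and basic blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of where a copy of the instruction ends up.
struct Placement {
  std::string Block; // destination block
  bool IsClone;      // false for the moved original, true for a fresh clone
};

// Hypothetical helper: the original instruction is moved into the first sink
// block, and every later sink block receives a clone.
std::vector<Placement>
placeInstruction(const std::vector<std::string> &SinkBlocks) {
  std::vector<Placement> Result;
  for (std::size_t I = 0; I < SinkBlocks.size(); ++I)
    Result.push_back({SinkBlocks[I], /*IsClone=*/I != 0});
  return Result;
}
```

Moving the original into the first destination avoids creating one more clone than necessary, which is why the first entry is handled differently from the rest.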

refactor

minor drive by comments

include/llvm/Transforms/Scalar.h
141	Missing comments
include/llvm/Transforms/Utils/Local.h
325 ↗	(On Diff #68730)	Clarification needed. This doesn't really tell me what the parameter does. In fact, there's nothing that says what the BB param is for either. Can you fix that?
lib/Transforms/Scalar/LoopSink.cpp
11	How does this relate to the existing Sink.cpp pass? (Not asking for an answer in review, but updating the comment with context might be helpful.)

update comments

Integrate with reverse-iterator enhancement.

looks good to me too.

This revision is now accepted and ready to land. Sep 1 2016, 2:14 PM

So, I don't think the code is quite ready to go into the tree yet. There are a bunch of minor issues that should be cleaned up. None of these are really big (it seems like you all have sorted out the algorithmic and high level design stuff), but I think they should be addressed before the code goes in. Especially the refactoring bit I suggest below.

lib/Transforms/Scalar/LoopSink.cpp
128–138	I think this code needs significantly better variable names. You have `BB`, `B`, and `B` again all for basic blocks in different containers. And `AddedBBs` doesn't really tell the reader much about what the container is doing. Compare that to `SortedLoopBBs` which says exactly what it is. `SinkBBs` might also benefit from a slightly better name (and the function name might similarly benefit).
129	Usually it is better to keep a set outside the loop and clear it on each iteration...
159–161	Why the variable rather than inlining this? Also, can you just call any_of directly since we have a using declaration and this is a range variant that doesn't exist in the standard?
172–175	This comment again doesn't parse for me, but isn't this dead code now that you're just directly using the reverse iterators?

chandlerc added inline comments. Sep 1 2016, 3:47 PM

lib/Transforms/Scalar/LICM.cpp
434–438	Can you split these refactorings into a separate patch please? They seem strictly beneficial and unrelated, and will make for example reverts messier if left in as part of this patch. I have several minor and boring comments on the refactorings, but it seems better to make them on a dedicated patch than to clutter this thread with them. (Just to be clear, I would leave it a static function, and just get the API you want and the implementation improvements. Then in this patch you can just make it an external function.)
lib/Transforms/Scalar/LoopSink.cpp
89–92	I'm having a hard time understanding what this comment is trying to say. Can you try to re-word this to be more clear (and more clearly worded)?
102	Since this is new code, please use the more modern form of doxygen throughout: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments (I've also updated those to more accurately reflect that we use auto-brief now rather than explicit '\brief ...' sections.)
102–106	I feel like this comment too could be much more clear. First, it isn't clear without a lot of context what the purpose of this would be. I'm guessing you mean something like find a candidate set of basic blocks to sink into? "Dominate BBs" - this is ambiguous. Do all returned basic blocks need to dominate the set of blocks in BBs? Or is it more that for each block in BBs, at least one block in the returned set must dominate that block? A better name for the parameter than "BBs" would probably help here. The frequency constraint isn't really explained. Why is that important? Give the reader more help to understand what the code will end up doing.
107–108	Do you expect these sets to be large? Naively, I wouldn't... If small is likely to be common, it would be better to use SmallPtrSetImpl as an input and SmallPtrSet as a result with a reasonable small size optimization.
126–127	This comment doesn't parse: "that are dominated it" seems to have a grammar error.

Integrate Chandler's comment.

clang-format

ping...

(Trying to first clarify the split-off of the patch I'm suggesting...)

lib/Transforms/Scalar/LICM.cpp
434–438	I see that we got confused here and in the other review. The part of this refactoring I do think makes sense to split out and send for review independently is changing the signature (for example, removing TargetLibraryInfo) and re-organizing the implementation. The only part I think needs to happen with this patch is making this routine be a public routine in the 'llvm' namespace. Does that make more sense?

rebase and update

Herald added a subscriber: beanz. Sep 8 2016, 10:29 AM

danielcdh added inline comments. Sep 8 2016, 10:37 AM

lib/Transforms/Scalar/LICM.cpp
434–438	Sure. The changes in LICM are reduced to minimal in https://reviews.llvm.org/D24168, PTAL. Please do not look at LICM changes in this patch for now because it also includes the refactoring bit. I'll rebase once D24168 is closed.

update the logic to replace dominated uses after sinking.

Herald added a subscriber: mgorny. Sep 14 2016, 2:47 PM

ping...

rebase

Herald added a subscriber: modocache. Oct 3 2016, 12:03 PM

First batch of comments. While I'll probably have some more minor comments later, there are a couple of particularly interesting ones that I wanted to go ahead and send out.

include/llvm/Transforms/Utils/LoopUtils.h
472–474	Here and elsewhere in comments, I would just say "is null" rather than "is nullptr".
lib/Transforms/Scalar/LoopSink.cpp
11–15	Some grammar issues here: "all instructions" -> "all of the instructions" "in loop preheader" -> "in the loop preheader" "sink it" -> "sink them" "the Sink pass that it only" -> "the Sink pass in that it only" "in loop's preheader" -> "in the loop's preheader". I also think the wording could be improved to be more clear when reading it. For example "This pass does the inverse transformation of what LICM does" reads more clearly to me. Lastly, I would lead with that high-level description, then go into specifics. I would separate the comparison with the other Sink pass into a second paragraph. So something along the lines of: This pass does the inverse transformation of what LICM does. It traverses all of the instructions ... It differs from the Sink pass ... Does that make sense?
42	Do you want to count separately the number of instruction clones created as part of this? Not sure if that has been an interesting metric while working on this patch or not.
49	I would suggest sinking the LegacyLoopSinkPass to the bottom of the file to avoid needing this forward declaration.
102	Naming convention: adjustedSumFreq
130	naming: findBBsToSinkInto
136–144	This shouldn't be done each time we try to sink an instruction. This should be pre-computed once for the loop and re-used for each instruction we try to sink.
149–171	This seems really expensive. By my reading this is O(N * L * M) where L is the number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large. Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)
181–182	This doxygen is still in the old form. Also, this should be 'sinkLoop'. Use `AAResults` here rather than the old name? Can any of these parameters be null? If not, pass references? I would generally partition the arguments into those that are required and pass references for them and then pass the optional ones as pointers. Then you can document that they are optional and the types will reinforce that fact.
190–192	I think a comment along the lines of "If there are no basic blocks with lower frequency than the preheader then we can avoid the detailed analysis as we will never find profitable sinking opportunities." I would also find this easier to read without the negation as: if (all_of(... return BFI->getBlockFreq(BB) > PreheaderFreq;
211	This comment is a little confusing. It seems to be describing a thing (like a variable, for example BBs) but is also right above a loop that populates that variable. Generally, once comments can be read as implementation comments about the code, I try to make them describe behavior of the code as that reads a bit better IMO. So "Compute the set of blocks which contain a use of I and ..." would read a bit better for me. Also "are in the sub loop of L" doesn't parse very well although I understand what you mean. I think it would be more clear to say "... blocks in the loop L which ..." rather than going into the issue of subloops.
215	If this is the case we can't sink I at all though, right? I think that is what the code already does, maybe just update the comment?
216–217	I would use two ifs here since one needs its own comment (and it is a nice comment!)
217	Why not use `L->contains(UI)`?
223	Check for an empty `BBs` here to handle the case of a use that can't be handled? This combined with the below BBsToSinkInto makes me think this should be extracted to a helper that tries to sink one instruction so that we can use early exit from that function.
235	So, this has an important problem: it introduces a non-determinism into the compiler. The initial problem is that SmallPtrSet does not provide stable iteration order, and so there is no predicting which basic block gets the original instruction and which one gets the clone. However, merely using something like SetVector helps but isn't fully satisfying here because the insertion order is also something we would very much like to not depend on: the use list order. I would suggest essentially numbering the basic blocks in the loop and use a vector of the BBs sorted by their number here. You can just create a map out of the blocks range with something like: int i = 0; for (auto *BB : L->blocks()) LoopBlockNumber[BB] = ++i; (Just pseudo code, but you get the idea.) That will punt the ordering requirement to LoopInfo which is I think the right place for it.
251	I'd indicate the difference between this and the previous debug message. Maybe "Sinking a clone of " instead of just "Sinking".
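The block-numbering suggestion made for line 235 above can be sketched with standard containers. Block names are plain strings here and `sortByLoopBlockNumber` is a hypothetical name; the point is only that the resulting order depends on the position of each block in the loop's block list and not on the iteration order of the unordered set.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical helper: order the chosen sink blocks by their position in the
// loop's block list. Precondition (assumed): every chosen block appears in
// LoopBlocks, otherwise the lookup below throws std::out_of_range.
std::vector<std::string>
sortByLoopBlockNumber(const std::vector<std::string> &LoopBlocks,
                      const std::unordered_set<std::string> &BlocksToSinkInto) {
  // Number the loop blocks once, in the order the loop reports them.
  std::unordered_map<std::string, int> LoopBlockNumber;
  int I = 0;
  for (const std::string &BB : LoopBlocks)
    LoopBlockNumber[BB] = ++I;

  // Copy the set into a vector and sort by block number. The numbers are
  // unique, so this is a total order and the result is deterministic.
  std::vector<std::string> Sorted(BlocksToSinkInto.begin(),
                                  BlocksToSinkInto.end());
  std::sort(Sorted.begin(), Sorted.end(),
            [&](const std::string &A, const std::string &B) {
              return LoopBlockNumber.at(A) < LoopBlockNumber.at(B);
            });
  return Sorted;
}
```

With this ordering, which block receives the original instruction and which receive clones no longer depends on hashing or on use-list order.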

Thanks for the reviews!

lib/Transforms/Scalar/LoopSink.cpp
149–171	I initially started with an ad hoc algorithm which is O(L * M), but Danny pointed out it is not optimal, so I changed to this optimal algorithm. The lower bound for any sinking algorithm is O(L * M), but if an optimal solution is desired, O(N * L * M) is the best I can get. Yes, this could be expensive when N is large. In practice, I did not see a noticeable compile time increase in the speccpu2006 benchmarks after applying this patch (and enabling the pass in the frontend). How about we limit N to be no more than a certain number to avoid expensive computation in extreme cases?
215	Not sure if I get this right, do you mean update the comment (as I just did) to make it less redundant?
217	Because it does not check for sub loops. e.g.
Loop1 { I1 Loop2 { I2 } }
Loop1->contains(I1) --> true
Loop2->contains(I2) --> true
Loop1->contains(I2) --> false
For this check we want to make sure I1 and I2 both return true.

integrate Chandler's reviews

missed one comment

What is the status of this patch? Any more comments need to be addressed?

Chandler said he has more comment.

M is the number of use BBs.

The pass already filters out loops which do not have any cold blocks -- this effectively filters out most of the loops in reality, so the compile time impact will be minimal. Furthermore, the following can be done:

1) only collect cold BBs in the loop body that are colder than the header, and sort them instead
2) skip any instructions with use BBs that are not members of the cold BBs collected in 1).
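The two filtering steps proposed above might look roughly like this in a standalone form. `ProfiledBlock`, `collectColdBlocks` and `allUsesInColdBlocks` are hypothetical names, and the reference frequency is passed in as a plain number so the sketch does not take a position on whether it should come from the header or the preheader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a loop block with a profile frequency.
struct ProfiledBlock {
  std::string Name;
  std::uint64_t Freq;
};

// Step 1: keep only the blocks colder than RefFreq, sorted coldest first.
std::vector<ProfiledBlock> collectColdBlocks(std::vector<ProfiledBlock> Blocks,
                                             std::uint64_t RefFreq) {
  Blocks.erase(std::remove_if(Blocks.begin(), Blocks.end(),
                              [RefFreq](const ProfiledBlock &B) {
                                return B.Freq >= RefFreq;
                              }),
               Blocks.end());
  std::stable_sort(Blocks.begin(), Blocks.end(),
                   [](const ProfiledBlock &A, const ProfiledBlock &B) {
                     return A.Freq < B.Freq;
                   });
  return Blocks;
}

// Step 2: an instruction is only worth analyzing if every block that uses it
// is one of the cold blocks collected in step 1.
bool allUsesInColdBlocks(const std::vector<std::string> &UseBlocks,
                         const std::vector<ProfiledBlock> &ColdBlocks) {
  return std::all_of(UseBlocks.begin(), UseBlocks.end(),
                     [&](const std::string &Use) {
                       return std::any_of(ColdBlocks.begin(), ColdBlocks.end(),
                                          [&](const ProfiledBlock &B) {
                                            return B.Name == Use;
                                          });
                     });
}
```

Because the cold set is computed once per loop, each instruction is checked only against that precomputed set instead of against every loop block.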

Example of parent BB being colder than the use BB?

Will try to make a full pass through, thanks for the extensive updates Dehao! One specific point of discussion below:

lib/Transforms/Scalar/LoopSink.cpp
149–171	I'm not worried about N being large. I'm worried about the fact that L is >= to M and so it is quadratic in the number of basic blocks that use each instruction. The other thing is that if this scales in compile time by N then it scales in compile time by how much effect it is having. If it scales in compile time by M^2, then we pay more and more compile time as loops get larger even if we only sink very few instructions. I would either bound M to a small number, and/or look for some way to not have this be quadratic. It seems like a bunch of this should be pre-computable for the loop?

add max use threshold

This is really close. Some minor nit picks and a few more interesting comments below.

include/llvm/Transforms/Utils/LoopUtils.h
472–474	You changed one 'nullptr' to 'null' but missed the other.
lib/Transforms/Scalar/LoopSink.cpp
12–13	"and sink them/ to" -> "and sinks them to"
102	"/p" -> "\p"
160	The number of user instructions isn't really the right thing to apply the threshold to as that doesn't directly change the cost. The idea is that we need the size of `BBsToSinkInto` to be a small constant in order for the search for the coldest dominating set to be "just" linear in the number of blocks in the loop. So while a threshold of "40" may make sense for number of user instructions, I suspect the threshold should be much smaller when applied to the size of `BBsToSinkInto`.
I also think you should add two comments about this. One, you should add a comment to the `findBBsToSinkInto` function clarifying the algorithmic complexity (that it is O(N * M) or O(M^2) where N is SortedLoopBBs.size() and M is BBsToSinkInto.size()), and you should mention where you check this threshold that the reason is because we're going to call `findBBsToSinkInto` which would go quadratic if we didn't set a cap.
The reason for all of this is that I'm worried some future maintainer will come along and not really understand how risky it is to adjust these thresholds, so I think it is important to document the implications.
I still think we will long-term need a better algorithmic approach here as I suspect we'll find annoying cases where the threshold blocks an important optimization (such as when there are way too many initial BBsToInsertInto but there are a small number of common dominating blocks). But I understand this is one of the really hard problems (it's the same as optimal PHI placement and a bunch of other ones), and I don't want to hold up your patch on a comprehensive approach here.
On an unrelated note, you should also document that this threshold has a secondary function: it places an upper bound on how much code growth we may trigger here. I'd document this in particular as that seems somewhat accidental and I suspect we may long-term want a better threshold for that. I would in fact encourage you to leave a FIXME to adjust this for min_size and opt_size.
179–180	We generally prefer calling `.empty()` to testing `.size()` against zero.
185	Here you don't need stable sort since this is a total ordering. You should just use std::sort and mention that this is a known total ordering in a comment. You could do that in an overall comment that explains what you're doing here: // Copy the final BBs into a vector and sort them using the total ordering // of the loop block numbers as iterating the set doesn't give a useful // order. No need to stable sort as the block numbers are a total ordering.
195	You didn't actually switch to the sorted list here. Also, you can just use a range based for loop here.
219	This routine doesn't sink the loop, so a name `sinkLoop` seems confusing. Maybe `sinkLoopInvariantInstructions`? Also, I think this should be a static function.
242–247	It wasn't obvious to me reading this that `SortedLoopBBs` only contained the loop BBs that are less hot than the preheader. I think it might be nice to clue the reader in that this isn't all the loop BBs. Maybe `SortedColdLoopBBs`? Or just `ColdLoopBBs`? If you make this change, I'd keep the name consistent throughout of course. Also, you use `<=` here, but `<` everywhere else I see, any particular reason to include BBs in this list with the same frequency?
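The comment on line 160 above argues that the cap should apply to the number of distinct blocks containing uses, not to the number of user instructions. A minimal standalone sketch of that distinction follows; `exceedsUseBlockCap` is a hypothetical name and the cap is a caller-supplied value, since the thread does not settle on a number for this variant.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: UserParentBlocks lists the parent block of each user
// instruction (so a block may appear several times). Returns true as soon as
// the number of distinct blocks exceeds Cap, in which case the expensive
// dominating-set search would be skipped for this instruction.
bool exceedsUseBlockCap(const std::vector<std::string> &UserParentBlocks,
                        std::size_t Cap) {
  std::set<std::string> DistinctBlocks;
  for (const std::string &BB : UserParentBlocks) {
    DistinctBlocks.insert(BB);
    if (DistinctBlocks.size() > Cap)
      return true; // bail out early; no need to look at the remaining users
  }
  return false;
}
```

Under this scheme an instruction with many users concentrated in two blocks still qualifies, while one with only a few users spread over many blocks does not.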

update

Thanks for the reviews.

I also added an overall algorithm description at file level per Madhur's suggestion.

lib/Transforms/Scalar/LoopSink.cpp
160	You actually mean the size of BBs to be small constant? Because computing BBsToSinkInto is the most computation intensive part of the algorithm.
195	The reason I used iterator here is because we need to handle the first entry in a different way.

Very cool. I think this patch LGTM with the comments below addressed (documentation fixes, simple changes, a fixme, a bunch of minor test cleanup). Feel free to submit. I've asked for one somewhat immediate follow-up patch, and it'd also be good to get a patch to put this into the pipeline behind a flag so folks can test the size impact.

lib/Transforms/Scalar/LoopSink.cpp
14–23	The differences seem to be a bit duplicated at this point. Sorry if this is the result of my suggestions. I think that you only need one of the prose "It Differs ..." and the bulleted list. If you want the detail in the list, I would just clean up the wording of that section so the English reads cleanly: "in a way that" -> "in the following ways" "prehead" -> "preheader" "find optimal" -> "find the optimal"
85	I would use `SmallPtrSetImpl<BasicBlock *>` here and elsewhere on API boundaries where you can below.
213	So, this technically will break the verifier if you ever look at the IR at this point. While that is allowed, it seems fairly easy to avoid this by first creating all the clones and rewriting uses to the clones before moving the instruction. By the time you move it, the only uses remaining should be the ones dominated by the destination insertion point.
221–229	This causes the cloning to be quadratic in the number of uses as we may have a single clone for each use. I know we have an upper bound, but this is still pretty slow (the use list is slow to walk). I'd suggest you fix this in a follow-up patch though (I think it'll be a lot of code and easier to review as a follow-up), just leave a short FIXME saying that this is slow and may be quadratic in the number of uses. (For the follow-up patch, the approach I'd suggest is that as you build up BBsToInsertInto, you also build up a map from UseBBs to the dominating BB that will be inserted into. Then you can insert into each BBsToInsertInto here without rewriting any uses but building up a map from those BBs to the inserted clones. Finally, you can do a single walk over the uses and for each one look up the inserted BB in the first map and then the inserted clone in the second map and rewrite the use. Or maybe you see a simpler way? This was just the first that came to mind.)
265	Use a SmallDenseMap? Good to dodge allocations for small loops.
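The follow-up approach outlined for lines 221–229 above (two maps and a single walk over the uses) can be sketched with standard containers. All names here are hypothetical, blocks and clones are represented by strings, and the mapping from use blocks to dominating sink blocks is taken as an input rather than computed.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: returns, for each use (given by its parent block),
// the name of the copy it should be rewritten to refer to.
// Precondition (assumed): every use block has an entry in
// UseBlockToSinkBlock, and every mapped sink block is listed in SinkBlocks;
// otherwise std::map::at throws std::out_of_range.
std::vector<std::string>
rewriteUses(const std::vector<std::string> &UseBlocks,
            const std::map<std::string, std::string> &UseBlockToSinkBlock,
            const std::vector<std::string> &SinkBlocks) {
  // Insert one copy per sink block up front and remember which copy lives
  // where. No uses are rewritten during this step.
  std::map<std::string, std::string> SinkBlockToCopy;
  for (const std::string &BB : SinkBlocks)
    SinkBlockToCopy[BB] = "copy.in." + BB;

  // Single walk over the uses: two map lookups per use, instead of
  // re-walking the whole use list once per inserted copy.
  std::vector<std::string> Rewritten;
  Rewritten.reserve(UseBlocks.size());
  for (const std::string &UseBB : UseBlocks)
    Rewritten.push_back(SinkBlockToCopy.at(UseBlockToSinkBlock.at(UseBB)));
  return Rewritten;
}
```

For example, uses in b4 and b5 that are both dominated by sink block b3 would both be rewritten to the copy in b3, while a use in b7 keeps the copy placed in b7 itself.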
test/Transforms/LICM/loopsink.ll
9	You can prune out these "Function Attrs" comments... See below.
293	Please try to minimize function attributes you have in your test cases. You may not need any. If you do need them, you can attach the textual form directly to the functions which is much more friendly for test cases (and makes the comments explaining what the '#0' attribute set contains unnecessary).
295–304	Please prune out all the metadata your test isn't directly using (TBAA stuff, Clang stuff).
test/Transforms/LICM/sink.ll
5–6	Unless your tests depend on specific datalayout or triple, please avoid including them in the IR test cases so that things are more generic and less tied to platforms.
23	Same comments as above about function attributes. Also, please don't use C++ mangled names, but instead provide clean and easy to read names directly.

Thanks for the review!

Herald added a subscriber: anna. Oct 27 2016, 9:24 AM

danielcdh closed this revision. Oct 27 2016, 9:39 AM

davidxl mentioned this in D65060: [LICM] Make Loop ICM profile aware. Jul 22 2019, 11:45 AM

wenlei mentioned this in D152772: [LoopSink] Allow sinking to PHI-use. Jun 13 2023, 8:48 AM

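Revision Contents

Path	Size
include/llvm/InitializePasses.h	1 line
include/llvm/LinkAllPasses.h	1 line
include/llvm/Transforms/Scalar.h	7 lines
include/llvm/Transforms/Utils/LoopUtils.h	11 lines
lib/Transforms/Scalar/CMakeLists.txt	1 line
lib/Transforms/Scalar/LICM.cpp	17 lines
lib/Transforms/Scalar/LoopSink.cpp	328 lines
lib/Transforms/Scalar/Scalar.cpp	5 lines
test/Transforms/LICM/loopsink.ll	286 lines
test/Transforms/LICM/sink.ll	60 lines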

Diff 76055

include/llvm/InitializePasses.h

 void initializeInstructionCombiningPassPass(PassRegistry&);
 void initializeInstructionSelectPass(PassRegistry &);
 void initializeInterleavedAccessPass(PassRegistry &);
 void initializeInternalizeLegacyPassPass(PassRegistry&);
 void initializeIntervalPartitionPass(PassRegistry&);
 void initializeJumpThreadingPass(PassRegistry&);
 void initializeLCSSAWrapperPassPass(PassRegistry &);
 void initializeLegacyLICMPassPass(PassRegistry&);
+void initializeLegacyLoopSinkPassPass(PassRegistry&);
 void initializeLazyBranchProbabilityInfoPassPass(PassRegistry&);
 void initializeLazyBlockFrequencyInfoPassPass(PassRegistry&);
 void initializeLazyValueInfoWrapperPassPass(PassRegistry&);
 void initializeLegalizerPass(PassRegistry&);
 void initializeLibCallsShrinkWrapLegacyPassPass(PassRegistry&);
 void initializeLintPass(PassRegistry&);
 void initializeLiveDebugValuesPass(PassRegistry&);
 void initializeLiveDebugVariablesPass(PassRegistry&);

include/llvm/LinkAllPasses.h

Show First 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	ForcePassLinking() {
(void) llvm::createIPConstantPropagationPass();		(void) llvm::createIPConstantPropagationPass();
(void) llvm::createIPSCCPPass();		(void) llvm::createIPSCCPPass();
(void) llvm::createInductiveRangeCheckEliminationPass();		(void) llvm::createInductiveRangeCheckEliminationPass();
(void) llvm::createIndVarSimplifyPass();		(void) llvm::createIndVarSimplifyPass();
(void) llvm::createInstructionCombiningPass();		(void) llvm::createInstructionCombiningPass();
(void) llvm::createInternalizePass();		(void) llvm::createInternalizePass();
(void) llvm::createLCSSAPass();		(void) llvm::createLCSSAPass();
(void) llvm::createLICMPass();		(void) llvm::createLICMPass();
		(void) llvm::createLoopSinkPass();
(void) llvm::createLazyValueInfoPass();		(void) llvm::createLazyValueInfoPass();
(void) llvm::createLoopExtractorPass();		(void) llvm::createLoopExtractorPass();
(void) llvm::createLoopInterchangePass();		(void) llvm::createLoopInterchangePass();
(void) llvm::createLoopSimplifyPass();		(void) llvm::createLoopSimplifyPass();
(void) llvm::createLoopSimplifyCFGPass();		(void) llvm::createLoopSimplifyCFGPass();
(void) llvm::createLoopStrengthReducePass();		(void) llvm::createLoopStrengthReducePass();
(void) llvm::createLoopRerollPass();		(void) llvm::createLoopRerollPass();
(void) llvm::createLoopUnrollPass();		(void) llvm::createLoopUnrollPass();

include/llvm/Transforms/Scalar.h

	FunctionPass *createInstructionCombiningPass(bool ExpensiveCombines = true);			FunctionPass *createInstructionCombiningPass(bool ExpensiveCombines = true);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LICM - This pass is a loop invariant code motion and memory promotion pass.			// LICM - This pass is a loop invariant code motion and memory promotion pass.
	//			//
	Pass *createLICMPass();			Pass *createLICMPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
				reames (Done): Missing comments
	//			//
				// LoopSink - This pass sinks invariants from preheader to loop body where
				// frequency is lower than loop preheader.
				//
				Pass *createLoopSinkPass();

				//===----------------------------------------------------------------------===//
				//
	// LoopInterchange - This pass interchanges loops to provide a more			// LoopInterchange - This pass interchanges loops to provide a more
	// cache-friendly memory access patterns.			// cache-friendly memory access patterns.
	//			//
	Pass *createLoopInterchangePass();			Pass *createLoopInterchangePass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopStrengthReduce - This pass is strength reduces GEP instructions that use			// LoopStrengthReduce - This pass is strength reduces GEP instructions that use

include/llvm/Transforms/Utils/LoopUtils.h

Show First 20 Lines • Show All 461 Lines • ▼ Show 20 Lines	void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
unsigned V = 0);		unsigned V = 0);

/// Helper to consistently add the set of standard passes to a loop pass's \c		/// Helper to consistently add the set of standard passes to a loop pass's \c
/// AnalysisUsage.		/// AnalysisUsage.
///		///
/// All loop passes should call this as part of implementing their \c		/// All loop passes should call this as part of implementing their \c
/// getAnalysisUsage.		/// getAnalysisUsage.
void getLoopAnalysisUsage(AnalysisUsage &AU);		void getLoopAnalysisUsage(AnalysisUsage &AU);

		/// Returns true if the hoister and sinker can handle this instruction.
		/// If SafetyInfo is null, we are checking for sinking instructions from
		/// preheader to loop body (no speculation).
		/// If SafetyInfo is not null, we are checking for hoisting/sinking
		chandlerc (Done): Here and elsewhere in comments, I would just say "is null" rather than "is nullptr".
		chandlerc (Done): You changed one 'nullptr' to 'null' but missed the other.
		/// instructions from loop body to preheader/exit. Check if the instruction
		/// can execute speculatively.
		///
		bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
		Loop *CurLoop, AliasSetTracker *CurAST,
		LoopSafetyInfo *SafetyInfo);
}		}

#endif		#endif
		dberlin (Done): The comment is not right or specific. First, this is used for both hoisting and sinking. Second, it does not say what "hoistable" means. In fact, this function is really checking "is aliased with loop", you probably should call it something like that and use that.
		dberlin (Done): Ditto above
		dberlin (Done): This should be "Return true if a non-memory instruction can be handled by the hoister/sinker". Please don't call it isHoistableInst and then use it for sinking :)
		danielcdh (Author, Not Done): Refactored code to remove these functions.

lib/Transforms/Scalar/CMakeLists.txt

Show All 11 Lines	add_llvm_library(LLVMScalarOpts
Float2Int.cpp		Float2Int.cpp
GuardWidening.cpp		GuardWidening.cpp
GVN.cpp		GVN.cpp
GVNHoist.cpp		GVNHoist.cpp
InductiveRangeCheckElimination.cpp		InductiveRangeCheckElimination.cpp
IndVarSimplify.cpp		IndVarSimplify.cpp
JumpThreading.cpp		JumpThreading.cpp
LICM.cpp		LICM.cpp
		LoopSink.cpp
LoadCombine.cpp		LoadCombine.cpp
LoopDeletion.cpp		LoopDeletion.cpp
LoopDataPrefetch.cpp		LoopDataPrefetch.cpp
LoopDistribute.cpp		LoopDistribute.cpp
LoopIdiomRecognize.cpp		LoopIdiomRecognize.cpp
LoopInstSimplify.cpp		LoopInstSimplify.cpp
LoopInterchange.cpp		LoopInterchange.cpp
LoopLoadElimination.cpp		LoopLoadElimination.cpp

lib/Transforms/Scalar/LICM.cpp

Show First 20 Lines • Show All 94 Lines • ▼ Show 20 Lines	static bool isSafeToExecuteUnconditionally(const Instruction &Inst,
const Instruction *CtxI = nullptr);		const Instruction *CtxI = nullptr);
static bool pointerInvalidatedByLoop(Value *V, uint64_t Size,		static bool pointerInvalidatedByLoop(Value *V, uint64_t Size,
const AAMDNodes &AAInfo,		const AAMDNodes &AAInfo,
AliasSetTracker *CurAST);		AliasSetTracker *CurAST);
static Instruction *		static Instruction *
CloneInstructionInExitBlock(Instruction &I, BasicBlock &ExitBlock, PHINode &PN,		CloneInstructionInExitBlock(Instruction &I, BasicBlock &ExitBlock, PHINode &PN,
const LoopInfo *LI,		const LoopInfo *LI,
const LoopSafetyInfo *SafetyInfo);		const LoopSafetyInfo *SafetyInfo);
static bool canSinkOrHoistInst(Instruction &I, AliasAnalysis *AA,
DominatorTree *DT,
Loop *CurLoop, AliasSetTracker *CurAST,
LoopSafetyInfo *SafetyInfo);

namespace {		namespace {
struct LoopInvariantCodeMotion {		struct LoopInvariantCodeMotion {
bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,		bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,
TargetLibraryInfo *TLI, ScalarEvolution *SE, bool DeleteAST);		TargetLibraryInfo *TLI, ScalarEvolution *SE, bool DeleteAST);

DenseMap<Loop *, AliasSetTracker *> &getLoopToAliasSetMap() {		DenseMap<Loop *, AliasSetTracker *> &getLoopToAliasSetMap() {
return LoopToAliasSetMap;		return LoopToAliasSetMap;
▲ Show 20 Lines • Show All 315 Lines • ▼ Show 20 Lines	void llvm::computeLoopSafetyInfo(LoopSafetyInfo *SafetyInfo, Loop *CurLoop) {
// Compute funclet colors if we might sink/hoist in a function with a funclet		// Compute funclet colors if we might sink/hoist in a function with a funclet
// personality routine.		// personality routine.
Function *Fn = CurLoop->getHeader()->getParent();		Function *Fn = CurLoop->getHeader()->getParent();
if (Fn->hasPersonalityFn())		if (Fn->hasPersonalityFn())
if (Constant *PersonalityFn = Fn->getPersonalityFn())		if (Constant *PersonalityFn = Fn->getPersonalityFn())
if (isFuncletEHPersonality(classifyEHPersonality(PersonalityFn)))		if (isFuncletEHPersonality(classifyEHPersonality(PersonalityFn)))
SafetyInfo->BlockColors = colorEHFunclets(*Fn);		SafetyInfo->BlockColors = colorEHFunclets(*Fn);
}		}

/// Returns true if the hoister and sinker can handle this instruction.		bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
/// If SafetyInfo is nullptr, we are checking for sinking instructions from
/// preheader to loop body (no speculation).
/// If SafetyInfo is not nullptr, we are checking for hoisting/sinking
/// instructions from loop body to preheader/exit. Check if the instruction
/// can execute speculatively.
///
bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
Loop *CurLoop, AliasSetTracker *CurAST,		Loop *CurLoop, AliasSetTracker *CurAST,
LoopSafetyInfo *SafetyInfo) {		LoopSafetyInfo *SafetyInfo) {
// Loads have extra constraints we have to verify before we can hoist them.		// Loads have extra constraints we have to verify before we can hoist them.
		chandlerc (Done): Can you split these refactorings into a separate patch please? They seem strictly beneficial and unrelated, and will make for example reverts messier if left in as part of this patch. I have several minor and boring comments on the refactorings, but it seems better to make them on a dedicated patch than to clutter this thread with them. (Just to be clear, I would leave it a static function, and just get the API you want and the implementation improvements. Then in this patch you can just make it an external function.)
		chandlerc (Not Done): I see that we got confused here and in the other review. The part of this refactoring I do think makes sense to split out and send for review independently is changing the signature (for example, removing TargetLibraryInfo) and re-organizing the implementation. The only part I think needs to happen with this patch is making this routine be a public routine in the 'llvm' namespace. Does that make more sense?
		danielcdh (Author, Not Done): Sure. The changes in LICM are reduced to minimal in https://reviews.llvm.org/D24168, PTAL. Please do not look at LICM changes in this patch for now because it also includes the refactoring bit. I'll rebase once D24168 is closed.
if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {		if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
if (!LI->isUnordered())		if (!LI->isUnordered())
return false; // Don't hoist volatile/atomic loads!		return false; // Don't hoist volatile/atomic loads!

// Loads from constant memory are always safe to move, even if they end up		// Loads from constant memory are always safe to move, even if they end up
// in the same alias set as something that ends up being modified.		// in the same alias set as something that ends up being modified.
if (AA->pointsToConstantMemory(LI->getOperand(0)))		if (AA->pointsToConstantMemory(LI->getOperand(0)))
return true;		return true;
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	if (AliasAnalysis::onlyReadsMemory(Behavior)) {
if (!FoundMod)		if (!FoundMod)
return true;		return true;
}		}

// FIXME: This should use mod/ref information to see if we can hoist or		// FIXME: This should use mod/ref information to see if we can hoist or
// sink the call.		// sink the call.

return false;		return false;
}		}
		dberlin (Done): This is not really right, it returns whether a non-memory instruction is hoistable. :)

		danielcdh (Author, Done): Refactored code to remove this.
// Only these instructions are hoistable/sinkable.		// Only these instructions are hoistable/sinkable.
if (!isa<BinaryOperator>(I) && !isa<CastInst>(I) && !isa<SelectInst>(I) &&		if (!isa<BinaryOperator>(I) && !isa<CastInst>(I) && !isa<SelectInst>(I) &&
!isa<GetElementPtrInst>(I) && !isa<CmpInst>(I) &&		!isa<GetElementPtrInst>(I) && !isa<CmpInst>(I) &&
!isa<InsertElementInst>(I) && !isa<ExtractElementInst>(I) &&		!isa<InsertElementInst>(I) && !isa<ExtractElementInst>(I) &&
!isa<ShuffleVectorInst>(I) && !isa<ExtractValueInst>(I) &&		!isa<ShuffleVectorInst>(I) && !isa<ExtractValueInst>(I) &&
!isa<InsertValueInst>(I))		!isa<InsertValueInst>(I))
return false;		return false;


lib/Transforms/Scalar/LoopSink.cpp

This file was added.

				//===-- LoopSink.cpp - Loop Sink Pass ------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass does the inverse transformation of what LICM does.
				// It traverses all of the instructions in the loop's preheader and sinks
				reames (Done): How does this relate to the existing Sink.cpp pass? (Not asking for an answer in review, but updating the comment with context might be helpful.)
				// them to the loop body where frequency is lower than the loop's preheader.
				// This pass is a reverse-transformation of LICM. It differs from the Sink
				chandlerc (Done): "and sink them/ to" -> "and sinks them to"
				// pass in the following ways:
				//
				chandlerc (Done): Some grammar issues here:
				- "all instructions" -> "all of the instructions"
				- "in loop preheader" -> "in the loop preheader"
				- "sink it" -> "sink them"
				- "the Sink pass that it only" -> "the Sink pass in that it only"
				- "in loop's preheader" -> "in the loop's preheader"
				I also think the wording could be improved to be more clear when reading it. For example "This pass does the inverse transformation of what LICM does" reads more clearly to me. Lastly, I would lead with that high-level description, then go into specifics. I would separate the comparison with the other Sink pass into a second paragraph. So something along the lines of: This pass does the inverse transformation of what LICM does. It traverses all of the instructions ... It differs from the Sink pass ... Does that make sense?
				// * It only handles sinking of instructions from the loop's preheader to the
				// loop's body
				// * It uses alias set tracker to get more accurate alias info
				// * It uses block frequency info to find the optimal sinking locations
				//
				// Overall algorithm:
				//
				// For I in Preheader:
				chandlerc (Done): The differences seem to be a bit duplicated at this point. Sorry if this is the result of my suggestions. I think that you only need one of the prose "It differs ..." and the bulleted list. If you want the detail in the list, I would just clean up the wording of that section so the English reads cleanly:
				- "in a way that" -> "in the following ways"
				- "prehead" -> "preheader"
				- "find optimal" -> "find the optimal"
				//   InsertBBs = BBs that use I
				//   For BB in sorted(LoopBBs):
				//     DomBBs = BBs in InsertBBs that are dominated by BB
				//     if freq(DomBBs) > freq(BB)
				//       InsertBBs = UseBBs - DomBBs + BB
				//   For BB in InsertBBs:
				//     Insert I at BB's beginning
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AliasSetTracker.h"
				#include "llvm/Analysis/BasicAliasAnalysis.h"
				#include "llvm/Analysis/BlockFrequencyInfo.h"
				#include "llvm/Analysis/Loads.h"
				davidxl (Done): Add statistics for number of instructions sinked etc.
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/LoopPassManager.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				dberlin (Done): "Don't sink instructions that require cloning unless they execute less than this percent of the time" (or whatever)
				chandlerc (Done): Do you want to count separately the number of instruction clones created as part of this? Not sure if that has been an interesting metric while working on this patch or not.
				#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Metadata.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Transforms/Scalar.h"
				chandlerc (Done): I would suggest sinking the LegacyLoopSinkPass to the bottom of the file to avoid needing this forward declaration.
				#include "llvm/Transforms/Utils/Local.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				using namespace llvm;

				#define DEBUG_TYPE "loopsink"

				STATISTIC(NumLoopSunk, "Number of instructions sunk into loop");
				STATISTIC(NumLoopSunkCloned, "Number of cloned instructions sunk into loop");

				static cl::opt<unsigned> SinkFrequencyPercentThreshold(
				"sink-freq-percent-threshold", cl::Hidden, cl::init(90),
				cl::desc("Do not sink instructions that require cloning unless they "
				"execute less than this percent of the time."));

				static cl::opt<unsigned> MaxNumberOfUseBBsForSinking(
				"max-uses-for-sinking", cl::Hidden, cl::init(30),
				cl::desc("Do not sink instructions that have too many uses."));

				/// Return adjusted total frequency of \p BBs.
				///
				/// * If there is only one BB, sinking instruction will not introduce code
				/// size increase. Thus there is no need to adjust the frequency.
				/// * If there are more than one BB, sinking would lead to code size increase.
				/// In this case, we add some "tax" to the total frequency to make it harder
				/// to sink. E.g.
				/// Freq(Preheader) = 100
				/// Freq(BBs) = sum(50, 49) = 99
				/// Even if Freq(BBs) < Freq(Preheader), we will not sink from Preheader to
				/// BBs as the difference is too small to justify the code size increase.
				/// To model this, the adjusted Freq(BBs) will be:
				davidxl (Done): LoopBase class has a member method 'contains' which can be used.
				/// AdjustedFreq(BBs) = 99 / SinkFrequencyPercentThreshold%
				static BlockFrequency adjustedSumFreq(SmallPtrSetImpl<BasicBlock *> &BBs,
				BlockFrequencyInfo &BFI) {
				BlockFrequency T = 0;
				for (BasicBlock *B : BBs)
				T += BFI.getBlockFreq(B);
				chandlerc (Done): I would use `SmallPtrSetImpl<BasicBlock *>` here and elsewhere on API boundaries where you can below.
				if (BBs.size() > 1)
				T /= BranchProbability(SinkFrequencyPercentThreshold, 100);
				dberlin (Done): "sunk into loop body"
				return T;
				}
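
The "tax" described in the comment above adjustedSumFreq can be checked with plain integers. The following is only a sketch: `adjustedSumFreqSketch` and its `thresholdPercent` parameter are hypothetical stand-ins for the patch's function and for SinkFrequencyPercentThreshold, and integer division here may round differently from the BlockFrequency / BranchProbability arithmetic the patch actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for adjustedSumFreq, using plain integers instead of
// llvm::BlockFrequency. thresholdPercent plays the role of
// SinkFrequencyPercentThreshold (90 by default in the patch).
uint64_t adjustedSumFreqSketch(const std::vector<uint64_t> &freqs,
                               unsigned thresholdPercent = 90) {
  assert(thresholdPercent > 0 && thresholdPercent <= 100);
  uint64_t total = 0;
  for (uint64_t f : freqs)
    total += f;
  // A single target block needs no clone, so no tax is applied.
  if (freqs.size() > 1)
    total = total * 100 / thresholdPercent; // integer division, rounds down
  return total;
}
```

With the numbers from the comment, `adjustedSumFreqSketch({50, 49})` yields 110, which exceeds the preheader frequency of 100, so sinking is rejected even though the raw sum (99) is lower.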
				dberlin (Done): CurLoop is unused?

				dberlin (Done): s/hoist/sink/
				/// Return a set of basic blocks to insert sinked instructions.
				///
				davidxl (Done): Same here -- please share the utility with LICM (isHoistableLoad)
				chandlerc (Done): I'm having a hard time understanding what this comment is trying to say. Can you try to re-word this to be more clear (and more clearly worded)?
				/// The returned set of basic blocks (BBsToSinkInto) should satisfy:
				///
				/// * Inside the loop \p L
				/// * For each UseBB in \p UseBBs, there is at least one BB in BBsToSinkInto
				/// that dominates the UseBB
				/// * Has minimum total frequency that is no greater than preheader frequency
				///
				davidxl (Done): Remove dead code
				/// The purpose of the function is to find the optimal sinking points to
				/// minimize execution cost, which is defined as "sum of frequency of
				/// BBsToSinkInto".
				chandlerc (Done): Since this is new code, please use the more modern form of doxygen throughout: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments (I've also updated those to more accurately reflect that we use auto-brief now rather than explicit '\brief ...' sections.)
				chandlerc (Done): Naming convention: adjustedSumFreq
				chandlerc (Done): "/p" -> "\p"
				/// As a result, the returned BBsToSinkInto needs to have minimum total
				/// frequency.
				/// Additionally, if the total frequency of BBsToSinkInto exceeds preheader
				/// frequency, the optimal solution is not sinking (return empty set).
				chandlerc (Done): I feel like this comment too could be much more clear.
				- First, it isn't clear without a lot of context what the purpose of this would be. I'm guessing you mean something like find a candidate set of basic blocks to sink into?
				- "Dominate BBs" - this is ambiguous. Do all returned basic blocks need to dominate the set of blocks in BBs? Or is it more that for each block in BBs, at least one block in the returned set must dominate that block? A better name for the parameter than "BBs" would probably help here.
				- The frequency constraint isn't really explained. Why is that important? Give the reader more help to understand what the code will end up doing.
				///
				/// \p ColdLoopBBs is used to help find the optimal sinking locations.
				chandlerc (Done): Do you expect these sets to be large? Naively, I wouldn't... If small is likely to be common, it would be better to use SmallPtrSetImpl as an input and SmallPtrSet as a result with a reasonable small size optimization.
				/// It stores a list of BBs that is:
				///
				dberlin (Done): bool HasColdBB = llvm::any_of(L->blocks(), [&](const BasicBlock *BB) { return BFI->getBlockFreq(BB) <= PreHeaderFreq; });
				/// * Inside the loop \p L
				/// * Has a frequency no larger than the loop's preheader
				/// * Sorted by BB frequency
				///
				/// The complexity of the function is O(UseBBs.size() * ColdLoopBBs.size()).
				davidxl (Done): This code is shared with LICM, can this code be refactored into some utility (declared in LICM.h) helper?
				/// To avoid expensive computation, we cap the maximum UseBBs.size() in its
				/// caller.
				static SmallPtrSet<BasicBlock *, 2>
				findBBsToSinkInto(const Loop &L, const SmallPtrSetImpl<BasicBlock *> &UseBBs,
				const SmallVectorImpl<BasicBlock *> &ColdLoopBBs,
				DominatorTree &DT, BlockFrequencyInfo &BFI) {
				SmallPtrSet<BasicBlock *, 2> BBsToSinkInto;
				if (UseBBs.size() == 0)
				davidxl (Done): This is not correct -- it should merge in SubLoop's AST too. See LICM's CollectAliasInfoForLoop -- that should be extracted as a common utility.
				danielcdh (Author, Done): Yes, we can refactor the code to reuse the CollectAliasInfoForLoop from LICM for the legacy pass manager:
				- Build a base class ASTLoopTransformation, and make LoopInvariantCodeMotion and LoopSink subclasses of it. LoopToAliasSetMap and collectAliasInfoForLoop should be protected members of the base class. The logic in the subclasses will also be shared with the new pass manager.
				- Build a base class ASTLoopLegacyPass which inherits from LoopPass, and make LICMLegacyPass and LoopSinkLegacyPass subclasses of it. cloneBasicBlockAnalysis, deleteAnalysisValue and deleteAnalysisLoop should be private members of the base class. The base class also needs an ASTLoopTransformation pointer to invoke the actual logic shared with the new pass manager.
				This is definitely doable, but seems like overkill to me because:
				- collectAliasInfoForLoop is a compile-time optimization, and it is not yet available in the new pass manager.
				- The refactoring mainly focuses on abstraction of the old pass manager, which will be replaced by the new pass manager soon.
				- There is much complexity involved because the new pass manager does not support this optimization, and we need to make it fall back to what we do right now (add all basic blocks to AST) without introducing a memory leak.
				Comments?
				return BBsToSinkInto;

				BBsToSinkInto.insert(UseBBs.begin(), UseBBs.end());
				SmallPtrSet<BasicBlock *, 2> BBsDominatedByColdestBB;
				chandlerc (Done): This comment doesn't parse: "that are dominated it" seems to have a grammar error.

				// For every iteration:
				chandlerc (Done): Usually it is better to keep a set outside the loop and clear it on each iteration...
				// * Pick the ColdestBB from ColdLoopBBs
				chandlerc (Done): naming: findBBsToSinkInto
				// * Find the set BBsDominatedByColdestBB that satisfy:
				// - BBsDominatedByColdestBB is a subset of BBsToSinkInto
				davidxl (Done): Is the first check needed?
				danielcdh (Author, Done): I think yes, because if one of its operands is loop-variant, sinking it into the loop may change the value every iteration.
				// - Every BB in BBsDominatedByColdestBB is dominated by ColdestBB
				dberlin (Not Done): I'm confused. Why is this necessary, instead of using the reverse iterators and just advancing them when necessary?
				// * If Freq(ColdestBB) < Freq(BBsDominatedByColdestBB), remove
				davidxl (Done): add documentation for the method.
				danielcdh (Author, Not Done): We need reverse iterator because instructions in the back of the BB may depend on the instructions in the front, thus it needs to be sunk first before other instructions can be sunk.
				// BBsDominatedByColdestBB from BBsToSinkInto, add ColdestBB to
				// BBsToSinkInto
				for (BasicBlock *ColdestBB : ColdLoopBBs) {
				BBsDominatedByColdestBB.clear();
				chandlerc (Done): I think this code needs significantly better variable names. You have `BB`, `B`, and `B` again all for basic blocks in different containers. And `AddedBBs` doesn't really tell the reader much about what the container is doing. Compare that to `SortedLoopBBs` which says exactly what it is. `SinkBBs` might also benefit from a slightly better name (and the function name might similarly benefit).
				for (BasicBlock *SinkedBB : BBsToSinkInto)
				if (DT.dominates(ColdestBB, SinkedBB))
				davidxl (Done): To speed up the pass, perhaps it is better to do a quick scan of the BB of the loop to see if there are any blocks that are really cold (colder than preheader). If there are not, early return.
				BBsDominatedByColdestBB.insert(SinkedBB);
				if (BBsDominatedByColdestBB.size() == 0)
				junbuml (Done): Isn't it still okay to try to sink in outside the loop if the user block is cold enough?
				continue;
				danielcdh (Author, Done): That would become general purpose sinking instead of loop-sinking. And we need to handle alias/invariant differently.
				if (adjustedSumFreq(BBsDominatedByColdestBB, BFI) >
				davidxl (Done): add a comment for this variable.
				chandlerc (Done): This shouldn't be done each time we try to sink an instruction. This should be pre-computed once for the loop and re-used for each instruction we try to sink.
				BFI.getBlockFreq(ColdestBB)) {
				for (BasicBlock *DominatedBB : BBsDominatedByColdestBB) {
				BBsToSinkInto.erase(DominatedBB);
				}
				davidxl (Done): Continue if BBs is empty
				BBsToSinkInto.insert(ColdestBB);
				}
				}

				// If the total frequency of BBsToSinkInto is larger than preheader frequency,
				// do not sink.
				if (adjustedSumFreq(BBsToSinkInto, BFI) >
				BFI.getBlockFreq(L.getLoopPreheader()))
				davidxl (Done): This logic seems to be inverted -- using CDT should be encouraged if its frequency equals the sum of SinkBB and N. In other words, division should be used, not multiplication.
				danielcdh (Author, Done): It's actually expected: if CDT's frequency is equal to or only a little larger than SinkBB+N's frequency, then the check fails and goes to the "else" branch, picking the CDT instead of SinkBB.
				BBsToSinkInto.clear();
				return BBsToSinkInto;
				}
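
The selection loop above can be sketched outside LLVM roughly as follows. This is only an illustration under stated assumptions, not the patch's code: `ToyLoop`, `dominates`, `sumFreq`, and `findBlocksToSinkInto` are hypothetical names, blocks are plain integer indices, dominance is read from an immediate-dominator array, and the patch's `adjustedSumFreq` is approximated by an unadjusted sum.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical toy model: a block is an index into these two tables.
struct ToyLoop {
  std::vector<int> IDom;      // immediate dominator per block, -1 for the root
  std::vector<uint64_t> Freq; // profile frequency per block
};

// True if block A dominates block B (every block dominates itself).
bool dominates(const ToyLoop &L, int A, int B) {
  for (int Cur = B; Cur != -1; Cur = L.IDom[Cur])
    if (Cur == A)
      return true;
  return false;
}

// Plain sum of frequencies; the patch's adjustedSumFreq is NOT modelled here.
uint64_t sumFreq(const ToyLoop &L, const std::set<int> &BBs) {
  uint64_t Sum = 0;
  for (int BB : BBs)
    Sum += L.Freq[BB];
  return Sum;
}

// ColdBBs is assumed to be sorted from coldest to hottest.
std::set<int> findBlocksToSinkInto(const ToyLoop &L,
                                   const std::set<int> &UseBBs,
                                   const std::vector<int> &ColdBBs,
                                   uint64_t PreheaderFreq) {
  std::set<int> SinkBBs = UseBBs;
  for (int ColdestBB : ColdBBs) {
    // Collect the current sink blocks that ColdestBB dominates.
    std::set<int> Dominated;
    for (int BB : SinkBBs)
      if (dominates(L, ColdestBB, BB))
        Dominated.insert(BB);
    if (Dominated.empty())
      continue;
    // One copy in ColdestBB is cheaper than copies in all dominated blocks.
    if (sumFreq(L, Dominated) > L.Freq[ColdestBB]) {
      for (int BB : Dominated)
        SinkBBs.erase(BB);
      SinkBBs.insert(ColdestBB);
    }
  }
  // Sinking is not profitable if the copies run more often than the preheader.
  if (sumFreq(L, SinkBBs) > PreheaderFreq)
    SinkBBs.clear();
  return SinkBBs;
}
```

With two use blocks of frequency 8 that share a dominator of frequency 10, the sketch replaces both with the dominator, because one copy at frequency 10 costs less than two copies at a combined 16.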

				davidxl (Done): if -> is
				chandlerc (Done): The number of user instructions isn't really the right thing to apply the threshold to as that doesn't directly change the cost. The idea is that we need the size of `BBsToSinkInto` to be a small constant in order for the search for the coldest dominating set to be "just" linear in the number of blocks in the loop. So while a threshold of "40" may make sense for number of user instructions, I suspect the threshold should be much smaller when applied to the size of `BBsToSinkInto`.
				I also think you should add two comments about this. One, you should comment to the `findBBsToSinkInto` function clarifying the algorithmic complexity (that it is O(N * M) or O(M^2) where N is SortedLoopBBs.size() and M is BBsToSinkInto.size()), and you should mention where you check this threshold that the reason is because we're going to call `findBBsToSinkInto` which would go quadratic if we didn't set a cap.
				The reason for all of this is that I'm worried some future maintainer will come along and not really understand how risky it is to adjust these thresholds, so I think it is important to document the implications.
				I still think we will long-term need a better algorithmic approach here as I suspect we'll find annoying cases where the threshold blocks an important optimization (such as when there are way too many initial BBsToInsertInto but there are a small number of common dominating blocks). But I understand this is one of the really hard problems (it's the same as optimal PHI placement and a bunch of other ones), and I don't want to hold up your patch on a comprehensive approach here.
				On an unrelated note, you should also document that this threshold has a secondary function: it places an upper bound on how much code growth we may trigger here. I'd document this in particular as that seems somewhat accidental and I suspect we may long-term want a better threshold for that. I would in fact encourage you to leave a FIXME to adjust this for min_size and opt_size.
				danielcdh (Author, Done): You actually mean the size of BBs to be a small constant? Because computing BBsToSinkInto is the most computation-intensive part of the algorithm.
				// Sinks \p I from the loop \p L's preheader to its uses. Returns true if
				chandlerc (Done): Why the variable rather than inlining this? Also, can you just call any_of directly since we have a using declaration and this is a range variant that doesn't exist in the standard?
				// sinking is successful.
				junbuml (Done): Why not return early if the frequency of SinkBB is greater than PreheaderFreq?
				// \p LoopBlockNumber is used to sort the insertion blocks to ensure
				// determinism.
				static bool sinkInstruction(Loop &L, Instruction &I,
				const SmallVectorImpl<BasicBlock *> &ColdLoopBBs,
				const SmallDenseMap<BasicBlock *, int, 16> &LoopBlockNumber,
				LoopInfo &LI, DominatorTree &DT,
				BlockFrequencyInfo &BFI) {
				// Compute the set of blocks in loop L which contain a use of I.
				dberlin (Done):
				1. Please factor this out into FindSinkBlocks or something.
				2. This is non-deterministic, because you are iterating over a denseset.
				I am also confused by this placement strategy. You are not ordering the blocks in any particular processing order, so you may not actually choose the best sink points, as once you NCA something high in the domtree and something low, NCA will always be something high in the domtree. If you ordered it so it was the lowest things first (using the DFS numbers or whatever), you may decide multiple intermediate placements are cheaper than what you are doing here.
				SmallPtrSet<BasicBlock *, 2> BBs;
				danielcdh (Author, Done): Good point. Refactored the code and updated the algorithm to iterate from cold blocks to ensure an optimal result.
				chandlerc (Done): This seems really expensive. By my reading this is O(N * L * M) where L is the number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large. Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)
				danielcdh (Author, Done): I initially started with an ad hoc algorithm which is O(L * M), but Danny pointed out it is not optimal, so I changed to this optimal algorithm. The lower bound for any sinking algorithm is O(LM), but if an optimal solution is desired, O(NLM) is the best I can get. Yes, this could be expensive when N is large. In practice, I did not see a noticeable compile time increase in the speccpu2006 benchmarks after applying this patch (and enabling the pass in the frontend). How about we limit N to be no more than a certain number to avoid expensive computation in extreme cases?
				chandlerc (Done): I'm not worried about N being large. I'm worried about the fact that L is >= M and so it is quadratic in the number of basic blocks that use each instruction. The other thing is that if this scales in compile time by N then it scales in compile time by how much effect it is having. If it scales in compile time by M^2, then we pay more and more compile time as loops get larger even if we only sink very few instructions. I would either bound M to a small number, and/or look for some way to not have this be quadratic. It seems like a bunch of this should be pre-computable for the loop?
				for (auto &U : I.uses()) {
				Instruction *UI = cast<Instruction>(U.getUser());
				// We cannot sink I to PHI-uses.
				if (dyn_cast<PHINode>(UI))
				chandlerc (Done): This comment again doesn't parse for me, but isn't this dead code now that you're just directly using the reverse iterators?
				return false;
				// We cannot sink I if it has uses outside of the loop.
				if (!L.contains(LI.getLoopFor(UI->getParent())))
				return false;
				davidxl (Done): Perhaps compare cdt's frequency with the sum of bb and sink bb? If it is greater than the sum, do not use cdt.
				junbuml (Done): I guess you intend L->contains(LI->getLoopFor(N))?
				BBs.insert(UI->getParent());
				danielcdh (Author, Done): Good catch. Thanks!
				chandlerc (Done): We generally prefer calling `.empty()` to testing `.size()` against zero.
				}

				chandlerc (Done): This doxygen is still in the old form. Also, this should be 'sinkLoop'. Use `AAResults` here rather than the old name? Can any of these parameters be null? If not, pass references? I would generally partition the arguments into those that are required and pass references for them and then pass the optional ones as pointers. Then you can document that they are optional and the types will reinforce that fact.
				// findBBsToSinkInto is O(BBs.size() * ColdLoopBBs.size()). We cap the max
				// BBs.size() to avoid expensive computation.
				davidxl (Done): Is the formatting correct here?
				danielcdh (Author, Done): I used clang-format --style=llvm for the formatting.
				// FIXME: Handle code size growth for min_size and opt_size.
				chandlerc (Done): Here you don't need stable sort since this is a total ordering. You should just use std::sort and mention that this is a known total ordering in a comment. You could do that in an overall comment that explains what you're doing here:
				// Copy the final BBs into a vector and sort them using the total ordering
				// of the loop block numbers as iterating the set doesn't give a useful
				// order. No need to stable sort as the block numbers are a total ordering.
				if (BBs.size() > MaxNumberOfUseBBsForSinking)
				return false;

				// Find the set of BBs that we should insert a copy of I.
				SmallPtrSet<BasicBlock *, 2> BBsToSinkInto =
				findBBsToSinkInto(L, BBs, ColdLoopBBs, DT, BFI);
				dberlin (Done): Needs a comment.
				if (BBsToSinkInto.empty())
				chandlerc (Done): I think a comment along the lines of "If there are no basic blocks with lower frequency than the preheader then we can avoid the detailed analysis as we will never find profitable sinking opportunities." I would also find this easier to read without the negation as:
				if (all_of(... return BFI->getBlockFreq(BB) > PreheaderFreq;
				return false;

				// Copy the final BBs into a vector and sort them using the total ordering
				chandlerc (Done): You didn't actually switch to the sorted list here. Also, you can just use a range based for loop here.
				danielcdh (Author, Done): The reason I used an iterator here is because we need to handle the first entry in a different way.
				// of the loop block numbers as iterating the set doesn't give a useful
				// order. No need to stable sort as the block numbers are a total ordering.
				SmallVector<BasicBlock *, 2> SortedBBsToSinkInto;
				SortedBBsToSinkInto.insert(SortedBBsToSinkInto.begin(), BBsToSinkInto.begin(),
				dberlin (Not Done): This looks ... interesting. Instead, why not add a SinkBBs.size() == 0 check above (so it skips if it's empty). Then you should simply move the i == 0 case outside of the loop, and the loop is just doing the insertions.
				BBsToSinkInto.end());
				davidxl (Done): if T >= ... early return
				danielcdh (Author, Not Done): The SinkBBs.size() == 0 check is already moved above the total frequency check, so it will not reach here. The i == 0 check is to distinguish between the first SinkBB (where we use move instead of insert) and the later SinkBBs (where we make a copy for each insert).
				std::sort(SortedBBsToSinkInto.begin(), SortedBBsToSinkInto.end(),
				[&](BasicBlock *A, BasicBlock *B) {
				return LoopBlockNumber.find(A)->second < LoopBlockNumber.find(B)->second;
				});

				davidxl (Done): Add debug trace here.
				dberlin (Done): This is Local.cpp's replaceDominatedUsesWith (if you need an Instruction, Use version, make one :P)
				BasicBlock *MoveBB = *SortedBBsToSinkInto.begin();
				// FIXME: Optimize the efficiency for cloned value replacement. The current
				// implementation is O(SortedBBsToSinkInto.size() * I.num_uses()).
				for (BasicBlock *N : SortedBBsToSinkInto) {
				if (N == MoveBB)
				continue;
				chandlerc (Done): This comment is a little confusing. It seems to be describing a thing (like a variable, for example BBs) but is also right above a loop that populates that variable. Generally, once comments can be read as implementation comments about the code, I try to make them describe behavior of the code as that reads a bit better IMO. So "Compute the set of blocks which contain a use of I and ..." would read a bit better for me. Also "are in the sub loop of L" doesn't parse very well although I understand what you mean. I think it would be more clear to say "... blocks in the loop L which ..." rather than going into the issue of subloops.
				// Clone I and replace its uses.
				Instruction *IC = I.clone();
				chandlerc (Done): So, this technically will break the verifier if you ever look at the IR at this point. While that is allowed, it seems fairly easy to avoid this by first creating all the clones and rewriting uses to the clones before moving the instruction. By the time you move it, the only uses remaining should be the ones dominated by the destination insertion point.
				IC->setName(I.getName());
				IC->insertBefore(&*N->getFirstInsertionPt());
				chandlerc (Done): If this is the case we can't sink I at all though, right? I think that is what the code already does, maybe just update the comment?
				danielcdh (Author, Done): Not sure if I get this right, do you mean update the comment (as I just did) to make it less redundant?
				// Replaces uses of I with IC in N
				for (Value::use_iterator UI = I.use_begin(), UE = I.use_end(); UI != UE;) {
				chandlerc (Done): I would use two ifs here since one needs its own comment (and it is a nice comment!)
				chandlerc (Done): Why not use `L->contains(UI)`?
				danielcdh (Author, Done): Because it does not check for sub loops, e.g.
				Loop1 { I1 Loop2 { I2 } }
				Loop1->contains(I1) --> true
				Loop2->contains(I2) --> true
				Loop1->contains(I2) --> false
				For this check we want to make sure I1 and I2 both return true.
				Use &U = *UI++;
				auto *I = cast<Instruction>(U.getUser());
				chandlerc (Done): This routine doesn't sink the loop, so a name `sinkLoop` seems confusing. Maybe `sinkLoopInvariantInstructions`? Also, I think this should be a static function.
				if (I->getParent() == N)
				U.set(IC);
				}
				// Replaces uses of I with IC in blocks dominated by N
				chandlerc (Done): Check for an empty `BBs` here to handle the case of a use that can't be handled? This combined with the below BBsToSinkInto makes me think this should be extracted to a helper that tries to sink one instruction so that we can use early exit from that function.
				replaceDominatedUsesWith(&I, IC, DT, N);
				DEBUG(dbgs() << "Sinking a clone of " << I << " To: " << N->getName()
				<< '\n');
				NumLoopSunkCloned++;
				}
				DEBUG(dbgs() << "Sinking " << I << " To: " << MoveBB->getName() << '\n');
				chandlerc (Done): This causes the cloning to be quadratic in the number of uses as we may have a single clone for each use. I know we have an upper bound, but this is still pretty slow (the use list is slow to walk). I'd suggest you fix this in a follow-up patch though (I think it'll be a lot of code and easier to review as a follow-up), just leave a short FIXME saying that this is slow and may be quadratic in the number of uses.
				(For the follow-up patch, the approach I'd suggest is that as you build up BBsToInsertInto, you also build up a map from UseBBs to the dominating BB that will be inserted into. Then you can insert into each BBsToInsertInto here without rewriting any uses but building up a map from those BBs to the inserted clones. Finally, you can do a single walk over the uses and for each one look up the inserted BB in the first map and then the inserted clone in the second map and rewrite the use. Or maybe you see a simpler way? This was just the first that came to mind.)
				NumLoopSunk++;
				I.moveBefore(&*MoveBB->getFirstInsertionPt());

				return true;
				}
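
The follow-up that chandlerc outlines in the review (one map from use blocks to their dominating sink block, a second map from sink blocks to the inserted value, then a single walk over the uses) might look roughly like this. It is a sketch on toy data, not LLVM code: `ToyUse`, `rewriteUsesOnce`, and the `.clone<N>` naming are hypothetical, and the first sink block is assumed to receive the original instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an LLVM Use.
struct ToyUse {
  int UserBB;          // block containing the using instruction
  std::string Operand; // name of the value the use currently refers to
};

// Rewrites every use in one pass instead of walking the use list per clone.
// UseBBToSinkBB must map every use block to the sink block that dominates it.
void rewriteUsesOnce(std::vector<ToyUse> &Uses,
                     const std::map<int, int> &UseBBToSinkBB,
                     const std::vector<int> &SortedSinkBBs,
                     const std::string &Name) {
  // Second map: sink block -> value placed there. The first sink block gets
  // the original (moved) value, every later one gets a clone.
  std::map<int, std::string> SinkBBToValue;
  for (std::size_t Idx = 0; Idx < SortedSinkBBs.size(); ++Idx) {
    int BB = SortedSinkBBs[Idx];
    SinkBBToValue[BB] = Idx == 0 ? Name : Name + ".clone" + std::to_string(BB);
  }
  // Single walk over the uses: two lookups per use.
  for (ToyUse &U : Uses)
    U.Operand = SinkBBToValue.at(UseBBToSinkBB.at(U.UserBB));
}
```

The cost is one pass over the sink blocks plus one pass over the uses, rather than one walk of the use list per clone.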

				chandlerc (Done): So, this has an important problem: it introduces a non-determinism into the compiler. The initial problem is that SmallPtrSet does not provide stable iteration order, and so there is no predicting which basic block gets the original instruction and which one gets the clone. However, merely using something like SetVector helps but isn't fully satisfying here because the insertion order is also something we would very much like to not depend on: the use list order.
				I would suggest essentially numbering the basic blocks in the loop and using a vector of the BBs sorted by their number here. You can just create a map out of the blocks range with something like:
				int i = 0;
				for (auto BB : L->blocks())
				LoopBlockNumber[BB] = ++i;
				(Just pseudo code, but you get the idea.) That will punt the ordering requirement to LoopInfo which is I think the right place for it.
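
The numbering idea in the comment above can be illustrated with standard containers. This is a sketch only: `sortByLoopBlockNumber` is a hypothetical helper, blocks are represented by their names, and every chosen block is assumed to appear in the loop's block list.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Returns the chosen blocks ordered by their position in the loop's block
// list, independent of the hash set's iteration order.
std::vector<std::string>
sortByLoopBlockNumber(const std::vector<std::string> &LoopBlocks,
                      const std::unordered_set<std::string> &Chosen) {
  // Number the blocks once, in the order the loop reports them.
  std::unordered_map<std::string, int> LoopBlockNumber;
  int I = 0;
  for (const std::string &BB : LoopBlocks)
    LoopBlockNumber[BB] = ++I;

  // Copy the set into a vector and sort by block number. The numbers are
  // unique, so this is a total ordering and std::sort is sufficient.
  std::vector<std::string> Sorted(Chosen.begin(), Chosen.end());
  std::sort(Sorted.begin(), Sorted.end(),
            [&](const std::string &A, const std::string &B) {
              return LoopBlockNumber.at(A) < LoopBlockNumber.at(B);
            });
  return Sorted;
}
```

Because the order comes from the block list rather than from pointer values or use-list order, the first element (the block that receives the original instruction) is the same on every run.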
				/// Sinks instructions from loop's preheader to the loop body if the
				/// sum frequency of inserted copy is smaller than preheader's frequency.
				static bool sinkLoopInvariantInstructions(Loop &L, AAResults &AA, LoopInfo &LI,
				DominatorTree &DT,
				BlockFrequencyInfo &BFI,
				ScalarEvolution *SE) {
				BasicBlock *Preheader = L.getLoopPreheader();
				if (!Preheader)
				return false;

				const BlockFrequency PreheaderFreq = BFI.getBlockFreq(Preheader);
				// If there are no basic blocks with lower frequency than the preheader then
				chandlerc (Done): It wasn't obvious to me reading this that `SortedLoopBBs` only contained the loop BBs that are less hot than the preheader. I think it might be nice to clue the reader in that this isn't all the loop BBs. Maybe `SortedColdLoopBBs`? Or just `ColdLoopBBs`? If you make this change, I'd keep the name consistent throughout of course. Also, you use `<=` here, but `<` everywhere else I see, any particular reason to include BBs in this list with the same frequency?
				// we can avoid the detailed analysis as we will never find profitable sinking
				// opportunities.
				if (all_of(L.blocks(), [&](const BasicBlock *BB) {
				return BFI.getBlockFreq(BB) > PreheaderFreq;
				chandlerc (Done): I'd indicate the difference between this and the previous debug message. Maybe "Sinking a clone of " instead of just "Sinking".
				}))
				return false;

				bool Changed = false;
				AliasSetTracker CurAST(AA);

				// Compute alias set.
				for (BasicBlock *BB : L.blocks())
				CurAST.add(*BB);

				// Sort loop's basic blocks by frequency
				SmallVector<BasicBlock *, 10> ColdLoopBBs;
				SmallDenseMap<BasicBlock *, int, 16> LoopBlockNumber;
				int i = 0;
				chandlerc (Done): Use a SmallDenseMap? Good to dodge allocations for small loops.
				for (BasicBlock *B : L.blocks())
				if (BFI.getBlockFreq(B) < BFI.getBlockFreq(L.getLoopPreheader())) {
				ColdLoopBBs.push_back(B);
				LoopBlockNumber[B] = ++i;
				}
				std::stable_sort(ColdLoopBBs.begin(), ColdLoopBBs.end(),
				[&](BasicBlock *A, BasicBlock *B) {
				return BFI.getBlockFreq(A) < BFI.getBlockFreq(B);
				});

				// Traverse preheader's instructions in reverse order because if A depends
				// on B (A appears after B), A needs to be sunk first before B can be
				// sunk.
				for (auto II = Preheader->rbegin(), E = Preheader->rend(); II != E;) {
				Instruction *I = &*II++;
				if (!L.hasLoopInvariantOperands(I) ||
				!canSinkOrHoistInst(*I, &AA, &DT, &L, &CurAST, nullptr))
				continue;
				if (sinkInstruction(L, *I, ColdLoopBBs, LoopBlockNumber, LI, DT, BFI))
				Changed = true;
				}

				if (Changed && SE)
				SE->forgetLoopDispositions(&L);
				return Changed;
				}
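
The early exit and the cold-block collection in `sinkLoopInvariantInstructions` can be sketched on plain data as below. The names `ToyBlock` and `collectColdBlocks` are hypothetical and frequencies are plain integers; the sketch only mirrors the comparisons visible in the code above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a basic block with its profile frequency.
struct ToyBlock {
  std::string Name;
  uint64_t Freq;
};

// Returns the loop blocks that are strictly colder than the preheader,
// ordered from coldest to hottest. Returns an empty vector when every block
// is hotter than the preheader, since no sinking can be profitable then.
std::vector<ToyBlock> collectColdBlocks(const std::vector<ToyBlock> &LoopBlocks,
                                        uint64_t PreheaderFreq) {
  std::vector<ToyBlock> Cold;
  if (std::all_of(LoopBlocks.begin(), LoopBlocks.end(),
                  [&](const ToyBlock &BB) { return BB.Freq > PreheaderFreq; }))
    return Cold;

  for (const ToyBlock &BB : LoopBlocks)
    if (BB.Freq < PreheaderFreq)
      Cold.push_back(BB);

  // Frequencies can tie, so a stable sort keeps tied blocks in loop order.
  std::stable_sort(Cold.begin(), Cold.end(),
                   [](const ToyBlock &A, const ToyBlock &B) {
                     return A.Freq < B.Freq;
                   });
  return Cold;
}
```

Computing this list once per loop, rather than once per instruction, is what the review asks for when it says the work should be pre-computed and reused.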

				namespace {
				struct LegacyLoopSinkPass : public LoopPass {
				static char ID;
				LegacyLoopSinkPass() : LoopPass(ID) {
				initializeLegacyLoopSinkPassPass(*PassRegistry::getPassRegistry());
				}

				bool runOnLoop(Loop *L, LPPassManager &LPM) override {
				if (skipLoop(L))
				return false;

				auto *SE = getAnalysisIfAvailable<ScalarEvolutionWrapperPass>();
				return sinkLoopInvariantInstructions(
				*L, getAnalysis<AAResultsWrapperPass>().getAAResults(),
				getAnalysis<LoopInfoWrapperPass>().getLoopInfo(),
				getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
				getAnalysis<BlockFrequencyInfoWrapperPass>().getBFI(),
				SE ? &SE->getSE() : nullptr);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				AU.addRequired<BlockFrequencyInfoWrapperPass>();
				getLoopAnalysisUsage(AU);
				}
				};
				}

				char LegacyLoopSinkPass::ID = 0;
				INITIALIZE_PASS_BEGIN(LegacyLoopSinkPass, "loop-sink", "Loop Sink", false,
				false)
				INITIALIZE_PASS_DEPENDENCY(LoopPass)
				INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
				INITIALIZE_PASS_END(LegacyLoopSinkPass, "loop-sink", "Loop Sink", false, false)

				Pass *llvm::createLoopSinkPass() { return new LegacyLoopSinkPass(); }

lib/Transforms/Scalar/Scalar.cpp

void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeEarlyCSELegacyPassPass(Registry);		initializeEarlyCSELegacyPassPass(Registry);
initializeEarlyCSEMemSSALegacyPassPass(Registry);		initializeEarlyCSEMemSSALegacyPassPass(Registry);
initializeGVNHoistLegacyPassPass(Registry);		initializeGVNHoistLegacyPassPass(Registry);
initializeFlattenCFGPassPass(Registry);		initializeFlattenCFGPassPass(Registry);
initializeInductiveRangeCheckEliminationPass(Registry);		initializeInductiveRangeCheckEliminationPass(Registry);
initializeIndVarSimplifyLegacyPassPass(Registry);		initializeIndVarSimplifyLegacyPassPass(Registry);
initializeJumpThreadingPass(Registry);		initializeJumpThreadingPass(Registry);
initializeLegacyLICMPassPass(Registry);		initializeLegacyLICMPassPass(Registry);
		initializeLegacyLoopSinkPassPass(Registry);
initializeLoopDataPrefetchLegacyPassPass(Registry);		initializeLoopDataPrefetchLegacyPassPass(Registry);
initializeLoopDeletionLegacyPassPass(Registry);		initializeLoopDeletionLegacyPassPass(Registry);
initializeLoopAccessLegacyAnalysisPass(Registry);		initializeLoopAccessLegacyAnalysisPass(Registry);
initializeLoopInstSimplifyLegacyPassPass(Registry);		initializeLoopInstSimplifyLegacyPassPass(Registry);
initializeLoopInterchangePass(Registry);		initializeLoopInterchangePass(Registry);
initializeLoopRotateLegacyPassPass(Registry);		initializeLoopRotateLegacyPassPass(Registry);
initializeLoopStrengthReducePass(Registry);		initializeLoopStrengthReducePass(Registry);
initializeLoopRerollPass(Registry);		initializeLoopRerollPass(Registry);
[...]
void LLVMAddInstructionCombiningPass(LLVMPassManagerRef PM) {		void LLVMAddInstructionCombiningPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createInstructionCombiningPass());		unwrap(PM)->add(createInstructionCombiningPass());
}		}

void LLVMAddJumpThreadingPass(LLVMPassManagerRef PM) {		void LLVMAddJumpThreadingPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createJumpThreadingPass());		unwrap(PM)->add(createJumpThreadingPass());
}		}

		void LLVMAddLoopSinkPass(LLVMPassManagerRef PM) {
		unwrap(PM)->add(createLoopSinkPass());
		}

void LLVMAddLICMPass(LLVMPassManagerRef PM) {		void LLVMAddLICMPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLICMPass());		unwrap(PM)->add(createLICMPass());
}		}

void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {		void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopDeletionPass());		unwrap(PM)->add(createLoopDeletionPass());
}		}


test/Transforms/LICM/loopsink.ll

This file was added.

				; RUN: opt -S -loop-sink < %s | FileCheck %s

				@g = global i32 0, align 4

				;     b1
				;    /  \
				;   b2  b6
				;  /  \  |
				; b3  b4 |
				chandlerc (Done): You can prune out these "Function Attrs" comments... See below.
				;  \  /  |
				;   b5   |
				;    \  /
				;     b7
				; preheader: 1000
				; b2: 15
				; b3: 7
				; b4: 7
				; Sink load to b2
				davidxl (Done): b6 --> b7
				; CHECK: t1
				; CHECK: .b2:
				; CHECK: load i32, i32* @g
				; CHECK: .b3:
				; CHECK-NOT: load i32, i32* @g
				define i32 @t1(i32, i32) #0 {
				davidxl (Done): Add a CHECK-NOT of @g after the preheader.
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %.exit, label %.preheader

				davidxl (Done): Add a CHECK-NOT of @g after b3 and b4.
				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !1

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 %invariant, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, 5
				br label %.b7

				.b6:
				%t6 = add nsw i32 %iv, 100
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !3

				.exit:
				ret i32 10
				}
				davidxl (Done): Add branch profile data here.

				;     b1
				;    /  \
				;   b2  b6
				;  /  \  |
				; b3  b4 |
				;  \  /  |
				;   b5   |
				;    \  /
				davidxl (Done): B6 --> b6
				;     b7
				; preheader: 500
				davidxl (Done): B3 --> b3
				; b1: 16016
				; b3: 8
				; b6: 8
				; Sink load to b3 and b6
				davidxl (Done): This should be b7.
				; CHECK: t2
				; CHECK: .preheader:
				; CHECK-NOT: load i32, i32* @g
				; CHECK: .b3:
				; CHECK: load i32, i32* @g
				; CHECK: .b4:
				; CHECK: .b6:
				; CHECK: load i32, i32* @g
				; CHECK: .b7:
				define i32 @t2(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !2

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4, !prof !1

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 5, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, 5
				br label %.b7

				.b6:
				%t6 = add nsw i32 %iv, %invariant
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !3

				.exit:
				ret i32 10
				davidxl (Done): Annotate with branch profile data.
				}

				;     b1
				;    /  \
				;   b2  b6
				;  /  \  |
				; b3  b4 |
				;  \  /  |
				;   b5   |
				;    \  /
				;     b7
				davidxl (Done): B3 -> b3
				; preheader: 500
				; b3: 8
				; b5: 16008
				; Do not sink load from preheader.
				davidxl (Done): b6 -> b7
				; CHECK: t3
				; CHECK: .preheader:
				; CHECK: load i32, i32* @g
				; CHECK: .b1:
				; CHECK-NOT: load i32, i32* @g
				define i32 @t3(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !2

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4, !prof !1

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 5, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, %invariant
				br label %.b7

				.b6:
				%t6 = add nsw i32 %iv, 5
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !3

				.exit:
				ret i32 10
				}

				; For single-BB loop with <=1 avg trip count, sink load to b1
				; CHECK: t4
				; CHECK: .preheader:
				; CHECK-NOT: load i32, i32* @g
				; CHECK: .b1:
				; CHECK: load i32, i32* @g
				; CHECK: .exit:
				define i32 @t4(i32, i32) #0 {
				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t1, %.b1 ], [ 0, %.preheader ]
				%t1 = add nsw i32 %invariant, %iv
				davidxl: but this loop will be executed at least once per call of t4, so the loop body frequency should not be lower than entry frequency
				danielcdh: So the current algorithm is that even if the frequency is equal (as in this case), we still tend to sink because it will reduce live range.
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b1, label %.exit, !prof !1

				.exit:
				ret i32 10
				}

				;     b1
				;    /  \
				;   b2  b6
				;  /  \  |
				; b3  b4 |
				;  \  /  |
				;   b5   |
				;    \  /
				;     b7
				; preheader: 1000
				; b2: 15
				; b3: 7
				; b4: 7
				; There is alias store in loop, do not sink load
				; CHECK: t5
				; CHECK: .preheader:
				; CHECK: load i32, i32* @g
				; CHECK: .b1:
				; CHECK-NOT: load i32, i32* @g
				define i32 @t5(i32, i32*) #0 {
				%3 = icmp eq i32 %0, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !1

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 %invariant, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, 5
				br label %.b7

				.b6:
				%t6 = call i32 @foo()
				br label %.b7

				.b7:
				davidxl: This test can be simplified a little by just making an external call here.
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !3

				.exit:
				ret i32 10
				}

				declare i32 @foo()

				!1 = !{!"branch_weights", i32 1, i32 2000}
				!2 = !{!"branch_weights", i32 2000, i32 1}
				!3 = !{!"branch_weights", i32 100, i32 1}
				chandlerc: Please try to minimize function attributes you have in your test cases. You may not need any. If you do need them, you can attach the textual form directly to the functions which is much more friendly for test cases (and makes the comments explaining what the '#0' attribute set contains unnecessary).
				chandlerc: Please prune out all the metadata your test isn't directly using (TBAA stuff, Clang stuff).
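
The frequency comparison these tests exercise can be sketched in a few lines. This is only an illustration, not the code in LoopSink.cpp: the helper name isSinkingProfitable and the idea of summing the frequencies of the blocks that use the value are assumptions made for this sketch, and the pass under review also has to reason about dominators and aliasing. The tie-breaking rule follows the author's reply on t4, where equal frequency still favors sinking because it shortens the live range.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (not part of the patch): decide whether an instruction
// sitting in the preheader is worth sinking into the loop blocks that use it.
// UseBlockFreqs holds the block frequency of each loop block that would
// receive a copy of the instruction.
bool isSinkingProfitable(std::uint64_t PreheaderFreq,
                         const std::vector<std::uint64_t> &UseBlockFreqs) {
  // Nothing in the loop uses the value, so there is nowhere to sink it.
  if (UseBlockFreqs.empty())
    return false;

  // Total cost of executing the instruction inside the loop instead.
  const std::uint64_t Total = std::accumulate(
      UseBlockFreqs.begin(), UseBlockFreqs.end(), std::uint64_t{0});

  // Ties favor sinking: the cost is the same and the live range is shorter.
  return Total <= PreheaderFreq;
}
```

With the numbers annotated on t3 (preheader 500, uses in b3 at 8 and b5 at 16008) the helper rejects sinking, which matches the "Do not sink load from preheader" expectation. A single use block whose frequency equals the preheader frequency is accepted, as in the t4 discussion.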

test/Transforms/LICM/sink.ll

This file was added.

				; RUN: opt -S -licm < %s | FileCheck %s --check-prefix=CHECK-LICM
				; RUN: opt -S -licm < %s | opt -S -loop-sink | FileCheck %s --check-prefix=CHECK-SINK

				; Original source code:
				; int g;
				; int foo(int p, int x) {
				chandlerc: Unless your tests depend on specific datalayout or triple, please avoid including them in the IR test cases so that things are more generic and less tied to platforms.
				; for (int i = 0; i != x; i++)
				; if (__builtin_expect(i == p, 0)) {
				; x += g; x *= g;
				; }
				; return x;
				; }
				;
				; Load of global value g should not be hoisted to preheader.

				@g = global i32 0, align 4

				define i32 @foo(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %._crit_edge, label %.lr.ph.preheader

				.lr.ph.preheader:
				br label %.lr.ph
				chandlerc: Same comments as above about function attributes. Also, please don't use C++ mangled names, but instead provide clean and easy to read names directly.

				; CHECK-LICM: .lr.ph.preheader:
				; CHECK-LICM: load i32, i32* @g
				; CHECK-LICM: br label %.lr.ph

				.lr.ph:
				%.03 = phi i32 [ %8, %.combine ], [ 0, %.lr.ph.preheader ]
				%.012 = phi i32 [ %.1, %.combine ], [ %1, %.lr.ph.preheader ]
				%4 = icmp eq i32 %.03, %0
				br i1 %4, label %.then, label %.combine, !prof !1

				.then:
				%5 = load i32, i32* @g, align 4
				%6 = add nsw i32 %5, %.012
				%7 = mul nsw i32 %6, %5
				br label %.combine

				; CHECK-SINK: .then:
				; CHECK-SINK: load i32, i32* @g
				; CHECK-SINK: br label %.combine

				.combine:
				%.1 = phi i32 [ %7, %.then ], [ %.012, %.lr.ph ]
				%8 = add nuw nsw i32 %.03, 1
				%9 = icmp eq i32 %8, %.1
				br i1 %9, label %._crit_edge.loopexit, label %.lr.ph

				._crit_edge.loopexit:
				%.1.lcssa = phi i32 [ %.1, %.combine ]
				br label %._crit_edge

				._crit_edge:
				%.01.lcssa = phi i32 [ 0, %2 ], [ %.1.lcssa, %._crit_edge.loopexit ]
				ret i32 %.01.lcssa
				}

				!1 = !{!"branch_weights", i32 1, i32 2000}
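
For readers unfamiliar with !prof metadata: branch_weights lists one weight per successor, in successor order, and only the relative sizes matter. A minimal sketch of the arithmetic follows. The function name firstSuccessorProbability is hypothetical, and the even split for all-zero weights is an assumption of this sketch rather than a statement about what LLVM does.

```cpp
#include <cstdint>

// Hypothetical helper: probability that a two-way conditional branch goes to
// its first successor, given the two branch_weights operands.
double firstSuccessorProbability(std::uint32_t FirstWeight,
                                 std::uint32_t SecondWeight) {
  const double Total =
      static_cast<double>(FirstWeight) + static_cast<double>(SecondWeight);
  // Assumption for this sketch only: with no weight data, treat the branch
  // as an even split.
  if (Total == 0.0)
    return 0.5;
  return static_cast<double>(FirstWeight) / Total;
}
```

Applied to !1 above (weights 1 and 2000), the branch in .lr.ph reaches .then with probability 1/2001, roughly 0.05%, which is why .then is much colder than the preheader in this test.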

Revision Contents

Diff 76055

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Scalar.h

include/llvm/Transforms/Utils/LoopUtils.h

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/LICM.cpp

lib/Transforms/Scalar/LoopSink.cpp

lib/Transforms/Scalar/Scalar.cpp

test/Transforms/LICM/loopsink.ll

test/Transforms/LICM/sink.ll
