This is an archive of the discontinued LLVM Phabricator instance.

Differential D22778

Add Loop Sink pass to reverse the LICM based of basic block frequency.
ClosedPublic

Authored by danielcdh on Jul 25 2016, 2:05 PM.

Details

Reviewers

chandlerc
davidxl
hfinkel

Commits

rGb94c09baa058: Add Loop Sink pass to reverse the LICM based of basic block frequency.
rL285308: Add Loop Sink pass to reverse the LICM based of basic block frequency.

Summary

LICM may speculatively hoist instructions to the preheader. Before code generation, we need to sink the hoisted instructions back into the loop if it's beneficial. This pass is the reverse of LICM: it looks at the instructions in the preheader and sinks each one to basic blocks inside the loop body if the basic block frequency is smaller than the preheader frequency.
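
As a rough illustration of the profitability test described in the summary, here is a minimal sketch that uses plain integers in place of LLVM's block frequency types. The names `BlockInfo` and `isProfitableToSinkInto` are hypothetical and do not come from the patch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for a basic block and its profile-derived frequency.
struct BlockInfo {
  std::string Name;
  uint64_t Freq; // relative execution frequency
};

// Per the summary, sinking from the preheader into BB is considered only
// when BB executes less often than the preheader.
bool isProfitableToSinkInto(const BlockInfo &BB, const BlockInfo &Preheader) {
  return BB.Freq < Preheader.Freq;
}
```

The strict comparison follows the wording of the summary; the review discussion later considers whether equal frequencies should also qualify.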

Diff Detail

Event Timeline

davidxl added inline comments. Jul 30 2016, 9:31 PM

lib/Transforms/Scalar/LoopSink.cpp
38	Add statistics for number of instructions sunk etc.
79	LoopBase class has a member method 'contains' which can be used.
99	Remove dead code
134	add documentation for the method.
140	To speed up the pass, perhaps it is better to do a quick scan of the BB of the loop to see if there are any blocks that are really cold (colder than preheader). If there are not, early return.
144	add a comment for this variable.
160	if -> is
179	Perhaps compare cdt's frequency with the sum of bb and sink bb? If it is greater than the sum, do not use cdt.
200	if T >= ... early return
205	Add debug trace here

Have you talked to anyone about the design for this?

I know Daniel Jasper, Quentin, and several others have looked at similar things before. Previous attempts have focused on using MachineLICM to do sinking as well as hoisting. While I don't have a strong opinion about one design over the other, we should be consistent about the plan here, and possibly consolidate some of the logic.

add more test and address David's comments.

davidxl added inline comments. Aug 11 2016, 4:22 PM

lib/Transforms/Scalar/LoopSink.cpp
123	This is not correct -- it should merge in SubLoop's AST too. See LICM's CollectAliasInfoForLoop -- that should be extracted as a common utility.
132	Is the first check needed?
148	Continue if BBs is empty
156	This logic seems to be inverted -- Using CDT should be encouraged if its frequency equals the sum of SinkBB and N. In other words, division should be used, not multiplication
184	Is the formatting correct here?

update

lib/Transforms/Scalar/LoopSink.cpp
123	Yes, we can refactor the code to reuse CollectAliasInfoForLoop from LICM for the legacy pass manager: (1) Build a base class ASTLoopTransformation, and make LoopInvariantCodeMotion and LoopSink subclasses of it. LoopToAliasSetMap and collectAliasInfoForLoop should be protected members of the base class. The logic in the subclasses will also be shared with the new pass manager. (2) Build a base class ASTLoopLegacyPass which inherits from LoopPass, and make LICMLegacyPass and LoopSinkLegacyPass subclasses of it. cloneBasicBlockAnalysis, deleteAnalysisValue and deleteAnalysisLoop should be private members of the base class. The base class also needs an ASTLoopTransformation pointer to invoke the actual logic shared with the new pass manager. This is definitely doable, but seems like overkill to me because: (a) collectAliasInfoForLoop is a compile-time optimization, and it is not yet available in the new pass manager. (b) The refactoring mainly focuses on abstractions for the old pass manager, which will be replaced by the new pass manager soon. (c) There is much complexity involved because the new pass manager does not support this optimization, and we need to make it fall back to what we do right now (add all basic blocks to the AST) without introducing a memory leak. Comments?
132	I think yes, because if any of its operands is loop-variant, sinking it into the loop may change its value on every iteration.
156	It's actually expected: if CDT's frequency is equal to or only a little larger than SinkBB+N's frequency, then the check will fail and go to the "else" branch, picking the CDT instead of SinkBB.
184	I used clang-format --style=llvm for the formatting.

davidxl added inline comments. Aug 16 2016, 1:26 PM

test/Transforms/LICM/loopsink.ll
18	b6 --> b7
24	add check-not of @g after preheader.
27	add check-not @g after b3 and b4
66	Add a branch profile data here.
75	B6 --> b6
77	B3 --> b3
81	This should be b7
132	annotate with branch profile data
143	B3 -> b3
147	b6 -> b7
211	but this loop will be executed at least once per call of t4, so the loop body frequency should not be lower than entry frequency
272	This test can be simplified a little by just making an external call here.

update tests

danielcdh marked 6 inline comments as done. Aug 16 2016, 2:43 PM

danielcdh added inline comments.

test/Transforms/LICM/loopsink.ll
211	So the current algorithm is that even if the frequency is equal (as in this case), we still tend to sink because it will reduce live range.

I don't have further comments.

Hal, does this patch look ok to you?

David

Are you going to have a separate patch to hook this in the pass manager ?

lib/Transforms/Scalar/LoopSink.cpp
142	Isn't it still okay to try to sink in outside the loop if the user block is cold enough?
162	Why not early return if frequency of SinkBB is greater than PreheaderFreq.
179	I guess you intend L->contains(LI->getLoopFor(N)) ?

update

lib/Transforms/Scalar/LoopSink.cpp
143	That would become general purpose sinking instead of loop-sinking. And we need to handle alias/invariant differently.
180	good catch. Thanks!

In D22778#520053, @junbuml wrote:

Are you going to have a separate patch to hook this in the pass manager ?

Yes, I'll send another patch to hook it up in clang.

dberlin added a subscriber: dberlin. Aug 18 2016, 3:54 PM

dberlin added inline comments.

include/llvm/Transforms/Utils/LoopUtils.h
474	The comment is not right or specific. First, this is used for both hoisting and sinking. Second, it does not say what "hoistable" means. In fact, this function is really checking "is aliased with loop", you probably should call it something like that and use that.
477	Ditto above
480	This should be "Return true if a non-memory instruction can be handled by the hoister/sinker" Please don't call it isHoistableInst and then use it for sinking :)
lib/Transforms/Scalar/LICM.cpp
511	This is not really right, it returns whether a non-memory instruction is hoistable. :)
lib/Transforms/Scalar/LoopSink.cpp
42	"Don't sink instructions that require cloning unless they execute less than this percent of the time" (or whatever)
87	"sunk into loop body"
89	CurLoop is unused?
90	s/hoist/sink/
110	bool HasColdBB = llvm::any_of(L->blocks(), [&](const BasicBlock *BB) { return BFI->getBlockFreq(BB) <= PreHeaderFreq; });
133	I'm confused. Why is this necessary, instead of using the reverse iterators and just advancing them when necessary?
170	Please factor this out into FindSinkBlocks or something. This is non-deterministic, because you are iterating over a denseset. I am also confused by this placement strategy. You are not ordering the blocks in any particular processing order, so you may not actually choose the best sink points, as once you NCA something high in the domtree and something low, NCA will always be something high in the domtree. If you ordered it so it was the lowest things first (using the DFS numbers or whatever), you may decide multiple intermediate placements are cheaper than what you are doing here.
191	Needs a comment
199	This looks ... interesting. Instead, why not add SinkBBS.size() == 0 check above (so it skips if it's empty). Then you should simply move the i == 0 case outside of the loop, and the loop is just doing the insertions.
205	This is Local.cpp's replaceDominatedUsesWith (if you need a Instruction, Use version, make one :P)
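
The early-exit check suggested for line 110 can be modelled outside of LLVM as follows. This is only a sketch: block frequencies are plain integers and `hasColdBlock` is a hypothetical name, whereas the suggestion above uses `llvm::any_of` over `L->blocks()`.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Returns true if at least one loop block is no hotter than the preheader,
// i.e. there is at least one candidate destination for sinking.
bool hasColdBlock(const std::vector<uint64_t> &LoopBlockFreqs,
                  uint64_t PreheaderFreq) {
  return std::any_of(LoopBlockFreqs.begin(), LoopBlockFreqs.end(),
                     [&](uint64_t Freq) { return Freq <= PreheaderFreq; });
}
```

A caller would return early when this is false, so that loops without cold blocks pay only for one linear scan.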

refactor

include/llvm/Transforms/Utils/LoopUtils.h
480	Refactored code to remove these functions.
lib/Transforms/Scalar/LICM.cpp
511–512	refactored code to remove this.
lib/Transforms/Scalar/LoopSink.cpp
134	We need reverse iterator because instructions in the back of the BB may depend on the instructions in the front, thus it needs to be sunk first before other instructions can be sunk.
171	Good point. Refactored the code and updated the algorithm to iterate from cold blocks to ensure optimality.
200	SinkBBs.size() == 0 check is already moved above the total frequency check, so it will not reach here. The i==0 check is to distinguish between the first SinkBB (that we use move instead of insert) and the later SinkBB (that we make a copy for each insert).
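
The reply about line 134 (why the preheader is walked in reverse) can be illustrated with a toy model. This is a sketch under stated assumptions, not the patch's code: `ToyInst`, `LoopUsesAreCold` and `sinkFromPreheader` are hypothetical, and the profitability decision is reduced to a boolean.

```cpp
#include <vector>

// Toy preheader instruction. Operands holds the indices of earlier
// preheader instructions that this instruction reads.
struct ToyInst {
  std::vector<int> Operands;
  bool LoopUsesAreCold; // hypothetical: profitability is already decided
};

// Walks the preheader bottom-up and returns the indices that were sunk,
// in the order they were processed.
std::vector<int> sinkFromPreheader(const std::vector<ToyInst> &Preheader) {
  const int N = static_cast<int>(Preheader.size());
  std::vector<bool> Sunk(N, false);
  std::vector<int> Order;
  for (int I = N - 1; I >= 0; --I) {
    // An instruction must stay if something still in the preheader reads it.
    bool HasPreheaderUser = false;
    for (int J = I + 1; J < N && !HasPreheaderUser; ++J) {
      if (Sunk[J])
        continue;
      for (int Op : Preheader[J].Operands)
        if (Op == I)
          HasPreheaderUser = true;
    }
    if (!HasPreheaderUser && Preheader[I].LoopUsesAreCold) {
      Sunk[I] = true;
      Order.push_back(I);
    }
  }
  return Order;
}
```

With two instructions where the second reads the first, the reverse walk sinks the user first and then its operand; a forward walk would find the operand still pinned by its preheader user.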

refactor

minor drive by comments

include/llvm/Transforms/Scalar.h
141	Missing comments
include/llvm/Transforms/Utils/Local.h
326	Clarification needed. This doesn't really tell me what the parameter does. In fact, there's nothing that says what the BB param is for either. Can you fix that?
lib/Transforms/Scalar/LoopSink.cpp
11	How does this relate to the existing Sink.cpp pass? (Not asking for an answer in review, but updating the comment with context might be helpful.)

update comments

Integrate with reverse-iterator enhancement.

looks good to me too.

This revision is now accepted and ready to land. Sep 1 2016, 2:14 PM

So, I don't think the code is quite ready to go into the tree yet. There are a bunch of minor issues that should be cleaned up. None of these are really big (it seems like you all have sorted out the algorithmic and high level design stuff), but I think they should be addressed before the code goes in. Especially the refactoring bit I suggest below.

lib/Transforms/Scalar/LoopSink.cpp
128–138	I think this code needs significantly better variable names. You have `BB`, `B`, and `B` again all for basic blocks in different containers. And `AddedBBs` doesn't really tell the reader much about what the container is doing. Compare that to `SortedLoopBBs` which says exactly what it is. `SinkBBs` might also benefit from a slightly better name (and the function name might similarly benefit).
129	Usually it is better to keep a set outside the loop and clear it on each iteration...
159–161	Why the variable rather than inlining this? Also, can you just call any_of directly since we have a using declaration and this is a range variant that doesn't exist in the standard?
172–175	This comment again doesn't parse for me, but isn't this dead code now that you're just directly using the reverse iterators?

chandlerc added inline comments. Sep 1 2016, 3:47 PM

lib/Transforms/Scalar/LICM.cpp
438–449	Can you split these refactorings into a separate patch please? They seem strictly beneficial and unrelated, and will make for example reverts messier if left in as part of this patch. I have several minor and boring comments on the refactorings, but it seems better to make them on a dedicated patch than to clutter this thread with them. (Just to be clear, I would leave it a static function, and just get the API you want and the implementation improvements. Then in this patch you can just make it an external function.)
lib/Transforms/Scalar/LoopSink.cpp
89–92	I'm having a hard time understanding what this comment is trying to say. Can you try to re-word this to be more clear (and more clearly worded)?
102	Since this is new code, please use the more modern form of doxygen throughout: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments (I've also updated those to more accurately reflect that we use auto-brief new rather than explicit '\brief ...' sections.)
102–106	I feel like this comment too could be much more clear. First, it isn't clear without a lot of context what the purpose of this would be. I'm guessing you mean something like find a candidate set of basic blocks to sink into? "Dominate BBs" - this is ambiguous. Do all returned basic blocks need to dominate the set of blocks in BBs? Or is it more that for each block in BBs, at least one block in the returned set must dominate that block? A better name for the parameter than "BBs" would probably help here. The frequency constraint isn't really explained. Why is that important? Give the reader more help to understand what the code will end up doing.
107–108	Do you expect these sets to be large? Naively, I wouldn't... If small is likely to be common, it would be better to use SmallPtrSetImpl as an input and SmallPtrSet as a result with a reasonable small size optimization.
126–127	This comment doesn't parse: "that are dominated it" seems to have a grammar error.

Integrate Chandler's comment.

clang-format

ping...

(Trying to first clarify the split-off of the patch I'm suggesting...)

lib/Transforms/Scalar/LICM.cpp
438–449	I see that we got confused here and in the other review. The part of this refactoring I do think makes sense to split out and send for review independently is changing the signature (for example, removing TargetLibraryInfo) and re-organizing the implementation. The only part I think needs to happen with this patch is making this routine be a public routine in the 'llvm' namespace. Does that make more sense?

rebase and update

Herald added a subscriber: beanz. Sep 8 2016, 10:29 AM

danielcdh added inline comments. Sep 8 2016, 10:37 AM

lib/Transforms/Scalar/LICM.cpp
438–449	Sure. The changes in LICM are reduced to minimal in https://reviews.llvm.org/D24168, PTAL. Please do not look at LICM changes in this patch for now because it also includes the refactoring bit. I'll rebase once D24168 is closed.

update the logic to replace dominated uses after sinking.

Herald added a subscriber: mgorny. Sep 14 2016, 2:47 PM

ping...

rebase

Herald added a subscriber: modocache. Oct 3 2016, 12:03 PM

First batch of comments. While I'll probably have some more minor comments later, there are a couple of particularly interesting ones that I wanted to go ahead and send out.

include/llvm/Transforms/Utils/LoopUtils.h
474–476	Here and elsewhere in comments, I would just say "is null" rather than "is nullptr".
lib/Transforms/Scalar/LoopSink.cpp
11–15	Some grammar issues here: "all instructions" -> "all of the instructions" "in loop preheader" -> "in the loop preheader" "sink it" -> "sink them" "the Sink pass that it only" -> "the Sink pass in that it only" "in loop's preheader" -> "in the loop's preheader". I also think the wording could be improved to be more clear when reading it. For example "This pass does the inverse transformation of what LICM does" reads more clearly to me. Lastly, I would lead with that high-level description, then go into specifics. I would separate the comparison with the other Sink pass into a second paragraph. So something along the lines of: This pass does the inverse transformation of what LICM does. It traverses all of the instructions ... It differs from the Sink pass ... Does that make sense?
42	Do you want to count separately the number of instruction clones created as part of this? Not sure if that has been an interesting metric while working on this patch or not.
49	I would suggest sinking the LegacyLoopSinkPass to the bottom of the file to avoid needing this forward declaration.
102	Naming convention: adjustedSumFreq
130	naming: findBBsToSinkInto
136–144	This shouldn't be done each time we try to sink an instruction. This should be pre-computed once for the loop and re-used for each instruction we try to sink.
149–171	This seems really expensive. By my reading this is O(N * L * M) where L is the number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large. Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)
181–182	This doxygen is still in the old form. Also, this should be 'sinkLoop'. Use `AAResults` here rather than the old name? Can any of these parameters be null? If not, pass references? I would generally partition the arguments into those that are required and pass references for them and then pass the optional ones as pointers. Then you can document that they are optional and the types will reinforce that fact.
190–192	I think a comment along the lines of "If there are no basic blocks with lower frequency than the preheader then we can avoid the detailed analysis as we will never find profitable sinking opportunities." I would also find this easier to read without the negation as: if (all_of(... return BFI->getBlockFreq(BB) > PreheaderFreq;
211	This comment is a little confusing. It seems to be describing a thing (like a variable, for example BBs) but is also right above a loop that populates that variable. Generally, once comments can be read as implementation comments about the code, I try to make them describe behavior of the code as that reads a bit better IMO. So "Compute the set of blocks which contain a use of I and ..." would read a bit better for me. Also "are in the sub loop of L" doesn't parse very well although I understand what you mean. I think it would be more clear to say "... blocks in the loop L which ..." rather than going into the issue of subloops.
215	If this is the case we can't sink I at all though, right? I think that is what the code already does, maybe just update the comment?
216–217	I would use two ifs here since one needs its own comment (and it is a nice comment!)
217	Why not use `L->contains(UI)`?
223	Check for an empty `BBs` here to handle the case of a use that can't be handled? This combined with the below BBsToSinkInto makes me think this should be extracted to a helper that tries to sink one instruction so that we can use early exit from that function.
235	So, this has an important problem: it introduces a non-determinism into the compiler. The initial problem is that SmallPtrSet does not provide stable iteration order, and so there is no predicting which basic block gets the original instruction and which one gets the clone. However, merely using something like SetVector helps but isn't fully satisfying here because the insertion order is also something we would very much like to not depend on: the use list order. I would suggest essentially numbering the basic blocks in the loop and use a vector of the BBs sorted by their number here. You can just create a map out of the blocks range with something like: int i = 0; for (auto *BB : L->blocks()) LoopBlockNumber[BB] = ++i; (Just pseudo code, but you get the idea.) That will punt the ordering requirement to LoopInfo which is I think the right place for it.
251	I'd indicate the difference between this and the previous debug message. Maybe "Sinking a clone of " instead of just "Sinking".
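
The block-numbering idea suggested for line 235 can be sketched in isolation as follows. Blocks are modelled as strings, and `numberBlocks` and `sortByBlockNumber` are hypothetical helper names; in the pass itself the numbering would come from the loop's own block order, as the comment describes.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Number the blocks in the loop's own block order, starting at 1.
std::map<std::string, int>
numberBlocks(const std::vector<std::string> &LoopBlocks) {
  std::map<std::string, int> LoopBlockNumber;
  int I = 0;
  for (const std::string &BB : LoopBlocks)
    LoopBlockNumber[BB] = ++I;
  return LoopBlockNumber;
}

// Order the chosen sink blocks by their number, so the result no longer
// depends on set iteration order or on use-list order. Every block in
// Blocks must have been numbered, otherwise at() throws.
std::vector<std::string>
sortByBlockNumber(std::vector<std::string> Blocks,
                  const std::map<std::string, int> &LoopBlockNumber) {
  std::sort(Blocks.begin(), Blocks.end(),
            [&](const std::string &A, const std::string &B) {
              return LoopBlockNumber.at(A) < LoopBlockNumber.at(B);
            });
  return Blocks;
}
```

Because every block gets a distinct number, the comparison is a total order and the block that receives the original instruction (rather than a clone) is the same on every run.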

Thanks for the reviews!

lib/Transforms/Scalar/LoopSink.cpp
149–171	I initially started with an ad hoc algorithm which is O(L * M), but Danny pointed out it is not optimal, so I changed to this optimal algorithm. The lower bound for any sinking algorithm is O(L * M), but if an optimal solution is desired, O(N * L * M) is the best I can get. Yes, this could be expensive when N is large. In practice, I did not see a noticeable compile time increase in speccpu2006 benchmarks after applying this patch (and enabling the pass in the frontend). How about we limit N to be no more than a certain number to avoid expensive computation in extreme cases?
215	Not sure if I get this right, do you mean update the comment (as I just did) to make it less redundant?
217	Because it does not check for sub loops. e.g. Loop1 { I1 Loop2 { I2 } } Loop1->contains(I1) --> true Loop2->contains(I2) --> true Loop1->contains(I2) --> false For this check we want to make sure I1 and I2 both return true.

integrate Chandler's reviews

missed one comment

What is the status of this patch? Any more comments need to be addressed?

Chandler said he has more comment.

M is the number of use BBs.

The pass already filters out loops which do not have any cold blocks -- this effectively filters out most loops in practice, so the compile time impact will be minimal. Furthermore, the following can be done:

1) only collect cold BBs in the loop body that are colder than the header, and sort them instead
2) skip any instructions with use BBs that are not members of the cold BBs collected in 1).
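
The two steps above can be sketched with a toy model in which blocks are indices into a frequency table. The helper names are hypothetical, and step 2 is read here as "every use block must be one of the collected cold blocks", which is one possible interpretation of the comment.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Step 1: return the indices of blocks colder than the header, coldest
// first. Ties are broken by block index so the order is deterministic.
std::vector<int> collectColdBlocks(const std::vector<uint64_t> &Freq,
                                   uint64_t HeaderFreq) {
  std::vector<int> Cold;
  for (int BB = 0; BB < static_cast<int>(Freq.size()); ++BB)
    if (Freq[BB] < HeaderFreq)
      Cold.push_back(BB);
  std::sort(Cold.begin(), Cold.end(), [&](int A, int B) {
    return Freq[A] != Freq[B] ? Freq[A] < Freq[B] : A < B;
  });
  return Cold;
}

// Step 2: an instruction is analysed further only if every block that
// uses it is one of the cold blocks collected in step 1.
bool allUsesInColdBlocks(const std::vector<int> &UseBlocks,
                         const std::vector<int> &ColdBlocks) {
  return std::all_of(UseBlocks.begin(), UseBlocks.end(), [&](int BB) {
    return std::find(ColdBlocks.begin(), ColdBlocks.end(), BB) !=
           ColdBlocks.end();
  });
}
```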

Example of parent BB being colder than the use BB?

Will try to make a full pass through, thanks for the extensive updates Dehao! One specific point of discussion below:

lib/Transforms/Scalar/LoopSink.cpp
149–171	I'm not worried about N being large. I'm worried about the fact that L is >= to M and so it is quadratic in the number of basic blocks that use each instruction. The other thing is that if this scales in compile time by N then it scales in compile time by how much effect it is having. If it scales in compile time by M^2, then we pay more and more compile time as loops get larger even if we only sink very few instructions. I would either bound M to a small number, and/or look for some way to not have this be quadratic. It seems like a bunch of this should be pre-computable for the loop?

add max use threshold

This is really close. Some minor nit picks and a few more interesting comments below.

include/llvm/Transforms/Utils/LoopUtils.h
474–476	You changed one 'nullptr' to 'null' but missed the other.
lib/Transforms/Scalar/LoopSink.cpp
12–13	"and sink them/ to" -> "and sinks them to"
102	"/p" -> "\p"
160	The number of user instructions isn't really the right thing to apply the threshold to as that doesn't directly change the cost. The idea is that we need the size of `BBsToSinkInto` to be a small constant in order for the search for the coldest dominating set to be "just" linear in the number of blocks in the loop. So while a threshold of "40" may make sense for number of user instructions, I suspect the threshold should be much smaller when applied to the size of `BBsToSinkInto`. I also think you should add two comments about this. One, you should add a comment to the `findBBsToSinkInto` function clarifying the algorithmic complexity (that it is O(N * M) or O(M^2) where N is SortedLoopBBs.size() and M is BBsToSinkInto.size()), and two, you should mention where you check this threshold that the reason is because we're going to call `findBBsToSinkInto` which would go quadratic if we didn't set a cap. The reason for all of this is that I'm worried some future maintainer will come along and not really understand how risky it is to adjust these thresholds so I think it is important to document the implications. I still think we will long-term need a better algorithmic approach here as I suspect we'll find annoying cases where the threshold blocks an important optimization (such as when there are way too many initial BBsToInsertInto but there are a small number of common dominating blocks). But I understand this is one of the really hard problems (it's the same as optimal PHI placement and a bunch of other ones), and I don't want to hold up your patch on a comprehensive approach here. On an unrelated note, you should also document that this threshold has a secondary function: it places an upper bound on how much code growth we may trigger here. I'd document this in particular as that seems somewhat accidental and I suspect we may long-term want a better threshold for that. I would in fact encourage you to leave a FIXME to adjust this for min_size and opt_size.
179–180	We generally prefer calling `.empty()` to testing `.size()` against zero.
185	Here you don't need stable sort since this is a total ordering. You should just use std::sort and mention that this is a known total ordering in a comment. You could do that in an overall comment that explains what you're doing here: // Copy the final BBs into a vector and sort them using the total ordering // of the loop block numbers as iterating the set doesn't give a useful // order. No need to stable sort as the block numbers are a total ordering.
195	You didn't actually switch to the sorted list here. Also, you can just use a range based for loop here.
219	This routine doesn't sink the loop, so a name `sinkLoop` seems confusing. Maybe `sinkLoopInvariantInstructions`? Also, I think this should be a static function.
242–247	It wasn't obvious to me reading this that `SortedLoopBBs` only contained the loop BBs that are less hot than the preheader. I think it might be nice to clue the reader in that this isn't all the loop BBs. Maybe `SortedColdLoopBBs`? Or just `ColdLoopBBs`? If you make this change, I'd keep the name consistent throughout of course. Also, you use `<=` here, but `<` everywhere else I see, any particular reason to include BBs in this list with the same frequency?
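
To make the complexity argument for line 160 concrete, here is one way a capped search for cold dominating blocks could look. This is a sketch under loud assumptions and is not taken from the patch: the dominator tree is a toy immediate-dominator array, `MaxUseBlocks` is a made-up cap, and the merge rule (replace the dominated blocks with the cold block when their summed frequency exceeds the cold block's frequency) is just one plausible cost test.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Toy loop: block i has immediate dominator IDom[i] (the root is its own
// immediate dominator) and execution frequency Freq[i].
struct ToyLoop {
  std::vector<int> IDom;
  std::vector<uint64_t> Freq;

  // True if block A dominates block B (every block dominates itself).
  bool dominates(int A, int B) const {
    while (true) {
      if (A == B)
        return true;
      if (IDom[B] == B)
        return false; // reached the root without meeting A
      B = IDom[B];
    }
  }
};

// Hypothetical cap on the number of use blocks; not the patch's value.
constexpr std::size_t MaxUseBlocks = 4;

// ColdBlocks must be ordered from coldest to hottest. The cost is
// O(N * M) with N = ColdBlocks.size() and M = UseBlocks.size(), which is
// why M is capped before the search starts.
std::set<int> findBlocksToSinkInto(const ToyLoop &L,
                                   const std::vector<int> &ColdBlocks,
                                   const std::set<int> &UseBlocks) {
  if (UseBlocks.size() > MaxUseBlocks)
    return std::set<int>(); // too many use blocks: give up
  std::set<int> Result = UseBlocks;
  for (int Cold : ColdBlocks) {
    std::vector<int> Dominated;
    uint64_t Sum = 0;
    for (int BB : Result)
      if (L.dominates(Cold, BB)) {
        Dominated.push_back(BB);
        Sum += L.Freq[BB];
      }
    // Merge only when one copy in Cold is cheaper than the copies it replaces.
    if (Dominated.empty() || Sum <= L.Freq[Cold])
      continue;
    for (int BB : Dominated)
      Result.erase(BB);
    Result.insert(Cold);
  }
  return Result;
}
```

The cap also bounds how many clones a single instruction can produce, which matches the secondary role of the threshold noted in the comment on line 160.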

update

Thanks for the reviews.

I also added an overall algorithm description at file level per Madhur's suggestion.

lib/Transforms/Scalar/LoopSink.cpp
160	You actually mean the size of BBs to be small constant? Because computing BBsToSinkInto is the most computation intensive part of the algorithm.
195	The reason I used iterator here is because we need to handle the first entry in a different way.

Very cool. I think this patch LGTM with the comments below addressed (documentation fixes, simple changes, a fixme, a bunch of minor test cleanup). Feel free to submit. I've asked for one somewhat immediate follow-up patch, and it'd also be good to get a patch to put this into the pipeline behind a flag so folks can test the size impact.

lib/Transforms/Scalar/LoopSink.cpp
14–23	The differences seem to be a bit duplicated at this point. Sorry if this is the result of my suggestions. I think that you only need one of the prose "It Differs ..." and the bulleted list. If you want the detail in the list, I would just clean up the wording of that section so the English reads cleanly: "in a way that" -> "in the following ways" "prehead" -> "preheader" "find optimal" -> "find the optimal"
85	I would use `SmallPtrSetImpl<BasicBlock *>` here and elsewhere on API boundaries where you can below.
213	So, this technically will break the verifier if you ever look at the IR at this point. While that is allowed, it seems fairly easy to avoid this by first creating all the clones and rewriting uses to the clones before moving the instruction. By the time you move it, the only uses remaining should be the ones dominated by the destination insertion point.
221–229	This causes the cloning to be quadratic in the number of uses as we may have a single clone for each use. I know we have an upper bound, but this is still pretty slow (the use list is slow to walk). I'd suggest you fix this in a follow-up patch though (I think it'll be a lot of code and easier to review as a follow-up), just leave a short FIXME saying that this is slow and may be quadratic in the number of uses. (For the follow-up patch, the approach I'd suggest is that as you build up BBsToInsertInto, you also build up a map from UseBBs to the dominating BB that will be inserted into. Then you can insert into each BBsToInsertInto here without rewriting any uses but building up a map from those BBs to the inserted clones. Finally, you can do a single walk over the uses and for each one look up the inserted BB in the first map and then the inserted clone in the second map and rewrite the use. Or maybe you see a simpler way? This was just the first that came to mind.)
265	Use a SmallDenseMap? Good to dodge allocations for small loops.
test/Transforms/LICM/loopsink.ll
9	You can prune out these "Function Attrs" comments... See below.
293	Please try to minimize function attributes you have in your test cases. You may not need any. If you do need them, you can attach the textual form directly to the functions which is much more friendly for test cases (and makes the comments explaining what the '#0' attribute set contains unnecessary).
295–304	Please prune out all the metadata your test isn't directly using (TBAA stuff, Clang stuff).
test/Transforms/LICM/sink.ll
5–6	Unless your tests depend on specific datalayout or triple, please avoid including them in the IR test cases so that things are more generic and less tied to platforms.
23	Same comments as above about function attributes. Also, please don't use C++ mangled names, but instead provide clean and easy to read names directly.
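
The two-map approach suggested above for the follow-up to lines 221–229 of LoopSink.cpp can be sketched as follows. Blocks and values are modelled as strings, and `Use`, `DominatingSinkBB`, `CloneInBB` and `rewriteUses` are hypothetical names chosen for this illustration.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy use: the block containing the user and the value it currently reads.
struct Use {
  std::string UseBlock;
  std::string Operand;
};

// DominatingSinkBB maps each use block to the sink block that dominates it.
// CloneInBB maps each sink block to the copy of the instruction placed there.
// One walk over the uses rewrites each of them with two map lookups instead
// of rescanning the use list once per clone.
void rewriteUses(std::vector<Use> &Uses,
                 const std::map<std::string, std::string> &DominatingSinkBB,
                 const std::map<std::string, std::string> &CloneInBB) {
  for (Use &U : Uses) {
    auto SinkIt = DominatingSinkBB.find(U.UseBlock);
    if (SinkIt == DominatingSinkBB.end())
      continue; // use is not covered by any sink block; leave it alone
    auto CloneIt = CloneInBB.find(SinkIt->second);
    if (CloneIt != CloneInBB.end())
      U.Operand = CloneIt->second;
  }
}
```

With ordered maps this costs O(U log B) for U uses and B blocks; a hash map would bring the lookups down to expected constant time.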

Thanks for the review!

Herald added a subscriber: anna. Oct 27 2016, 9:24 AM

danielcdh closed this revision. Oct 27 2016, 9:39 AM

davidxl mentioned this in D65060: [LICM] Make Loop ICM profile aware. Jul 22 2019, 11:45 AM

wenlei mentioned this in D152772: [LoopSink] Allow sinking to PHI-use. Jun 13 2023, 8:48 AM

Revision Contents

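Path	Size
include/llvm/InitializePasses.h	1 line
include/llvm/LinkAllPasses.h	1 line
include/llvm/Transforms/Scalar.h	7 lines
include/llvm/Transforms/Utils/Local.h	6 lines
include/llvm/Transforms/Utils/LoopUtils.h	10 lines
lib/Transforms/Scalar/CMakeLists.txt	1 line
lib/Transforms/Scalar/GVN.cpp	4 lines
lib/Transforms/Scalar/LICM.cpp	46 lines
lib/Transforms/Scalar/LoopSink.cpp	221 lines
lib/Transforms/Scalar/Scalar.cpp	5 lines
lib/Transforms/Utils/Local.cpp	6 lines
lib/Transforms/Utils/SimplifyInstructions.cpp	3 lines
test/Transforms/LICM/loopsink.ll	303 lines
test/Transforms/LICM/sink.ll	73 lines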

Diff 68880

include/llvm/InitializePasses.h

 void initializeInstructionCombiningPassPass(PassRegistry&);
 void initializeInstructionSelectPass(PassRegistry &);
 void initializeInterleavedAccessPass(PassRegistry &);
 void initializeInternalizeLegacyPassPass(PassRegistry&);
 void initializeIntervalPartitionPass(PassRegistry&);
 void initializeJumpThreadingPass(PassRegistry&);
 void initializeLCSSAWrapperPassPass(PassRegistry &);
 void initializeLegacyLICMPassPass(PassRegistry&);
+void initializeLegacyLoopSinkPassPass(PassRegistry&);
 void initializeLazyBranchProbabilityInfoPassPass(PassRegistry&);
 void initializeLazyBlockFrequencyInfoPassPass(PassRegistry&);
 void initializeLazyValueInfoWrapperPassPass(PassRegistry&);
 void initializeLintPass(PassRegistry&);
 void initializeLiveDebugValuesPass(PassRegistry&);
 void initializeLiveDebugVariablesPass(PassRegistry&);
 void initializeLiveIntervalsPass(PassRegistry&);
 void initializeLiveRegMatrixPass(PassRegistry&);

include/llvm/LinkAllPasses.h

Show First 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	ForcePassLinking() {
(void) llvm::createIPConstantPropagationPass();		(void) llvm::createIPConstantPropagationPass();
(void) llvm::createIPSCCPPass();		(void) llvm::createIPSCCPPass();
(void) llvm::createInductiveRangeCheckEliminationPass();		(void) llvm::createInductiveRangeCheckEliminationPass();
(void) llvm::createIndVarSimplifyPass();		(void) llvm::createIndVarSimplifyPass();
(void) llvm::createInstructionCombiningPass();		(void) llvm::createInstructionCombiningPass();
(void) llvm::createInternalizePass();		(void) llvm::createInternalizePass();
(void) llvm::createLCSSAPass();		(void) llvm::createLCSSAPass();
(void) llvm::createLICMPass();		(void) llvm::createLICMPass();
		(void) llvm::createLoopSinkPass();
(void) llvm::createLazyValueInfoPass();		(void) llvm::createLazyValueInfoPass();
(void) llvm::createLoopExtractorPass();		(void) llvm::createLoopExtractorPass();
(void) llvm::createLoopInterchangePass();		(void) llvm::createLoopInterchangePass();
(void) llvm::createLoopSimplifyPass();		(void) llvm::createLoopSimplifyPass();
(void) llvm::createLoopSimplifyCFGPass();		(void) llvm::createLoopSimplifyCFGPass();
(void) llvm::createLoopStrengthReducePass();		(void) llvm::createLoopStrengthReducePass();
(void) llvm::createLoopRerollPass();		(void) llvm::createLoopRerollPass();
(void) llvm::createLoopUnrollPass();		(void) llvm::createLoopUnrollPass();

include/llvm/Transforms/Scalar.h

	FunctionPass *createInstructionCombiningPass(bool ExpensiveCombines = true);			FunctionPass *createInstructionCombiningPass(bool ExpensiveCombines = true);

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LICM - This pass is a loop invariant code motion and memory promotion pass.			// LICM - This pass is a loop invariant code motion and memory promotion pass.
	//			//
	Pass *createLICMPass();			Pass *createLICMPass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
				reames (Done): Missing comments
	//			//
				// LoopSink - This pass sinks invariants from preheader to loop body where
				// frequency is lower than loop preheader.
				//
				Pass *createLoopSinkPass();

				//===----------------------------------------------------------------------===//
				//
	// LoopInterchange - This pass interchanges loops to provide a more			// LoopInterchange - This pass interchanges loops to provide a more
	// cache-friendly memory access patterns.			// cache-friendly memory access patterns.
	//			//
	Pass *createLoopInterchangePass();			Pass *createLoopInterchangePass();

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// LoopStrengthReduce - This pass is strength reduces GEP instructions that use			// LoopStrengthReduce - This pass is strength reduces GEP instructions that use

include/llvm/Transforms/Utils/Local.h

	///			///
	/// Metadata not listed as known via KnownIDs is removed			/// Metadata not listed as known via KnownIDs is removed
	void combineMetadata(Instruction *K, const Instruction *J, ArrayRef<unsigned> KnownIDs);			void combineMetadata(Instruction *K, const Instruction *J, ArrayRef<unsigned> KnownIDs);

	/// Replace each use of 'From' with 'To' if that use is dominated by			/// Replace each use of 'From' with 'To' if that use is dominated by
	/// the given edge. Returns the number of replacements made.			/// the given edge. Returns the number of replacements made.
	unsigned replaceDominatedUsesWith(Value *From, Value *To, DominatorTree &DT,			unsigned replaceDominatedUsesWith(Value *From, Value *To, DominatorTree &DT,
	const BasicBlockEdge &Edge);			const BasicBlockEdge &Edge);

	/// Replace each use of 'From' with 'To' if that use is dominated by			/// Replace each use of 'From' with 'To' if that use is dominated by
	/// the end of the given BasicBlock. Returns the number of replacements made.			/// the end of 'BB'. Returns the number of replacements made.
				/// Replace use of 'From' with 'To' in 'BB' if 'IncludeSelf' is true.
				reames (Done): Clarification needed. This doesn't really tell me what the parameter does. In fact, there's nothing that says what the BB param is for either. Can you fix that?
	unsigned replaceDominatedUsesWith(Value *From, Value *To, DominatorTree &DT,			unsigned replaceDominatedUsesWith(Value *From, Value *To, DominatorTree &DT,
	const BasicBlock *BB);			const BasicBlock *BB, bool IncludeSelf);
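
One possible reading of the 'IncludeSelf' comment on the declaration above, as a toy model and not the patch's implementation: uses in blocks properly dominated by 'BB' are always replaced, and uses inside 'BB' itself are replaced only when 'IncludeSelf' is true. The names `properlyDominates` and `countReplaceableUses`, the block indices, and the `IDom` array are all hypothetical stand-ins for LLVM's DominatorTree and use lists; PHI-node uses are ignored.

```cpp
#include <vector>

// Toy dominator tree: blocks are indices and IDom[B] is the immediate
// dominator of block B. The entry block is its own immediate dominator.
bool properlyDominates(const std::vector<int> &IDom, int A, int B) {
  // Walk up from B; A properly dominates B if it is a strict ancestor.
  while (B != IDom[B]) {
    B = IDom[B];
    if (B == A)
      return true;
  }
  return false;
}

// Count the uses that would be rewritten under the assumed semantics.
// UseBlocks holds the index of the block containing each use.
unsigned countReplaceableUses(const std::vector<int> &IDom,
                              const std::vector<int> &UseBlocks, int BB,
                              bool IncludeSelf) {
  unsigned Count = 0;
  for (int UseBB : UseBlocks) {
    bool Replace =
        (UseBB == BB) ? IncludeSelf : properlyDominates(IDom, BB, UseBB);
    if (Replace)
      ++Count;
  }
  return Count;
}
```

Under this reading, the GVN call sites in this diff, which pass 'false', would leave uses inside the block itself untouched.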


	/// Return true if the CallSite CS calls a gc leaf function.			/// Return true if the CallSite CS calls a gc leaf function.
	///			///
	/// A leaf function is a function that does not safepoint the thread during its			/// A leaf function is a function that does not safepoint the thread during its
	/// execution. During a call or invoke to such a function, the callers stack			/// execution. During a call or invoke to such a function, the callers stack
	/// does not have to be made parseable.			/// does not have to be made parseable.
	///			///

include/llvm/Transforms/Utils/LoopUtils.h


#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/AliasAnalysis.h"		#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/Analysis/EHPersonalities.h"		#include "llvm/Analysis/EHPersonalities.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"

namespace llvm {		namespace llvm {
		class AAResults;
class AliasSet;		class AliasSet;
class AliasSetTracker;		class AliasSetTracker;
class AssumptionCache;		class AssumptionCache;
class BasicBlock;		class BasicBlock;
class DataLayout;		class DataLayout;
class DominatorTree;		class DominatorTree;
		class Instruction;
class Loop;		class Loop;
class LoopInfo;		class LoopInfo;
class Pass;		class Pass;
class PredicatedScalarEvolution;		class PredicatedScalarEvolution;
class PredIteratorCache;		class PredIteratorCache;
class ScalarEvolution;		class ScalarEvolution;
class SCEV;		class SCEV;
class TargetLibraryInfo;		class TargetLibraryInfo;
[...]	void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
unsigned V = 0);		unsigned V = 0);

/// Helper to consistently add the set of standard passes to a loop pass's \c		/// Helper to consistently add the set of standard passes to a loop pass's \c
/// AnalysisUsage.		/// AnalysisUsage.
///		///
/// All loop passes should call this as part of implementing their \c		/// All loop passes should call this as part of implementing their \c
/// getAnalysisUsage.		/// getAnalysisUsage.
void getLoopAnalysisUsage(AnalysisUsage &AU);		void getLoopAnalysisUsage(AnalysisUsage &AU);
}

		/// canSinkOrHoistInst - Return true if the hoister and sinker can handle this
		/// instruction. If SafetyInfo is not nullptr, check if the instruction can
		dberlin (Done): The comment is not right or specific. First, this is used for both hoisting and sinking. Second, it does not say what "hoistable" means. In fact, this function is really checking "is aliased with loop", you probably should call it something like that and use that.
		/// execute speculatively.
		bool canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
		chandlerc (Done): Here and elsewhere in comments, I would just say "is null" rather than "is nullptr".
		chandlerc (Done): You changed one 'nullptr' to 'null' but missed the other.
		Loop *CurLoop, AliasSetTracker *CurAST,
		dberlin (Done): Ditto above
		LoopSafetyInfo *SafetyInfo);
		}
#endif		#endif
		dberlin (Done): This should be "Return true if a non-memory instruction can be handled by the hoister/sinker". Please don't call it isHoistableInst and then use it for sinking :)
		danielcdh (Author, Not Done): Refactored code to remove these functions.

lib/Transforms/Scalar/CMakeLists.txt

[...]	add_llvm_library(LLVMScalarOpts
Float2Int.cpp		Float2Int.cpp
GuardWidening.cpp		GuardWidening.cpp
GVN.cpp		GVN.cpp
GVNHoist.cpp		GVNHoist.cpp
InductiveRangeCheckElimination.cpp		InductiveRangeCheckElimination.cpp
IndVarSimplify.cpp		IndVarSimplify.cpp
JumpThreading.cpp		JumpThreading.cpp
LICM.cpp		LICM.cpp
		LoopSink.cpp
LoadCombine.cpp		LoadCombine.cpp
LoopDeletion.cpp		LoopDeletion.cpp
LoopDataPrefetch.cpp		LoopDataPrefetch.cpp
LoopDistribute.cpp		LoopDistribute.cpp
LoopIdiomRecognize.cpp		LoopIdiomRecognize.cpp
LoopInstSimplify.cpp		LoopInstSimplify.cpp
LoopInterchange.cpp		LoopInterchange.cpp
LoopLoadElimination.cpp		LoopLoadElimination.cpp

lib/Transforms/Scalar/GVN.cpp

[...]	while (!Worklist.empty()) {

// Replace all occurrences of 'LHS' with 'RHS' everywhere in the scope. As		// Replace all occurrences of 'LHS' with 'RHS' everywhere in the scope. As
// LHS always has at least one use that is not dominated by Root, this will		// LHS always has at least one use that is not dominated by Root, this will
// never do anything if LHS has only one use.		// never do anything if LHS has only one use.
if (!LHS->hasOneUse()) {		if (!LHS->hasOneUse()) {
unsigned NumReplacements =		unsigned NumReplacements =
DominatesByEdge		DominatesByEdge
? replaceDominatedUsesWith(LHS, RHS, *DT, Root)		? replaceDominatedUsesWith(LHS, RHS, *DT, Root)
: replaceDominatedUsesWith(LHS, RHS, *DT, Root.getStart());		: replaceDominatedUsesWith(LHS, RHS, *DT, Root.getStart(), false);

Changed |= NumReplacements > 0;		Changed |= NumReplacements > 0;
NumGVNEqProp += NumReplacements;		NumGVNEqProp += NumReplacements;
}		}

// Now try to deduce additional equalities from this one. For example, if		// Now try to deduce additional equalities from this one. For example, if
// the known equality was "(A != B)" == "false" then it follows that A and B		// the known equality was "(A != B)" == "false" then it follows that A and B
// are equal in the scope. Only boolean equalities with an explicit true or		// are equal in the scope. Only boolean equalities with an explicit true or
[...]	if (CmpInst *Cmp = dyn_cast<CmpInst>(LHS)) {
// looking for an instruction realizing it: there cannot be one!		// looking for an instruction realizing it: there cannot be one!
if (Num < NextNum) {		if (Num < NextNum) {
Value *NotCmp = findLeader(Root.getEnd(), Num);		Value *NotCmp = findLeader(Root.getEnd(), Num);
if (NotCmp && isa<Instruction>(NotCmp)) {		if (NotCmp && isa<Instruction>(NotCmp)) {
unsigned NumReplacements =		unsigned NumReplacements =
DominatesByEdge		DominatesByEdge
? replaceDominatedUsesWith(NotCmp, NotVal, *DT, Root)		? replaceDominatedUsesWith(NotCmp, NotVal, *DT, Root)
: replaceDominatedUsesWith(NotCmp, NotVal, *DT,		: replaceDominatedUsesWith(NotCmp, NotVal, *DT,
Root.getStart());		Root.getStart(), false);
Changed |= NumReplacements > 0;		Changed |= NumReplacements > 0;
NumGVNEqProp += NumReplacements;		NumGVNEqProp += NumReplacements;
}		}
}		}
// Ensure that any instruction in scope that gets the "A < B" value number		// Ensure that any instruction in scope that gets the "A < B" value number
// is replaced with false.		// is replaced with false.
// The leader table only tracks basic blocks, not edges. Only add to if we		// The leader table only tracks basic blocks, not edges. Only add to if we
// have the simple case where the edge dominates the end.		// have the simple case where the edge dominates the end.

lib/Transforms/Scalar/LICM.cpp

[...]	static bool isSafeToExecuteUnconditionally(const Instruction &Inst,
const Instruction *CtxI = nullptr);		const Instruction *CtxI = nullptr);
static bool pointerInvalidatedByLoop(Value *V, uint64_t Size,		static bool pointerInvalidatedByLoop(Value *V, uint64_t Size,
const AAMDNodes &AAInfo,		const AAMDNodes &AAInfo,
AliasSetTracker *CurAST);		AliasSetTracker *CurAST);
static Instruction *		static Instruction *
CloneInstructionInExitBlock(Instruction &I, BasicBlock &ExitBlock, PHINode &PN,		CloneInstructionInExitBlock(Instruction &I, BasicBlock &ExitBlock, PHINode &PN,
const LoopInfo *LI,		const LoopInfo *LI,
const LoopSafetyInfo *SafetyInfo);		const LoopSafetyInfo *SafetyInfo);
static bool canSinkOrHoistInst(Instruction &I, AliasAnalysis *AA,
DominatorTree *DT, TargetLibraryInfo *TLI,
Loop *CurLoop, AliasSetTracker *CurAST,
LoopSafetyInfo *SafetyInfo);

namespace {		namespace {
struct LoopInvariantCodeMotion {		struct LoopInvariantCodeMotion {
bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,		bool runOnLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,
TargetLibraryInfo *TLI, ScalarEvolution *SE, bool DeleteAST);		TargetLibraryInfo *TLI, ScalarEvolution *SE, bool DeleteAST);

DenseMap<Loop *, AliasSetTracker *> &getLoopToAliasSetMap() {		DenseMap<Loop *, AliasSetTracker *> &getLoopToAliasSetMap() {
return LoopToAliasSetMap;		return LoopToAliasSetMap;
[...]	for (BasicBlock::iterator II = BB->end(); II != BB->begin();) {
}		}

// Check to see if we can sink this instruction to the exit blocks		// Check to see if we can sink this instruction to the exit blocks
// of the loop. We can do this if the all users of the instruction are		// of the loop. We can do this if the all users of the instruction are
// outside of the loop. In this case, it doesn't even matter if the		// outside of the loop. In this case, it doesn't even matter if the
// operands of the instruction are loop invariant.		// operands of the instruction are loop invariant.
//		//
if (isNotUsedInLoop(I, CurLoop, SafetyInfo) &&		if (isNotUsedInLoop(I, CurLoop, SafetyInfo) &&
canSinkOrHoistInst(I, AA, DT, TLI, CurLoop, CurAST, SafetyInfo)) {		canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, SafetyInfo)) {
++II;		++II;
Changed |= sink(I, LI, DT, CurLoop, CurAST, SafetyInfo);		Changed |= sink(I, LI, DT, CurLoop, CurAST, SafetyInfo);
}		}
}		}
return Changed;		return Changed;
}		}

/// Walk the specified region of the CFG (defined by all blocks dominated by		/// Walk the specified region of the CFG (defined by all blocks dominated by
[...]	for (BasicBlock::iterator II = BB->begin(), E = BB->end(); II != E;) {
continue;		continue;
}		}

// Try hoisting the instruction out to the preheader. We can only do this		// Try hoisting the instruction out to the preheader. We can only do this
// if all of the operands of the instruction are loop invariant and if it		// if all of the operands of the instruction are loop invariant and if it
// is safe to hoist the instruction.		// is safe to hoist the instruction.
//		//
if (CurLoop->hasLoopInvariantOperands(&I) &&		if (CurLoop->hasLoopInvariantOperands(&I) &&
canSinkOrHoistInst(I, AA, DT, TLI, CurLoop, CurAST, SafetyInfo) &&		canSinkOrHoistInst(I, AA, DT, CurLoop, CurAST, SafetyInfo) &&
isSafeToExecuteUnconditionally(		isSafeToExecuteUnconditionally(
I, DT, CurLoop, SafetyInfo,		I, DT, CurLoop, SafetyInfo,
CurLoop->getLoopPreheader()->getTerminator()))		CurLoop->getLoopPreheader()->getTerminator()))
Changed |= hoist(I, DT, CurLoop, SafetyInfo);		Changed |= hoist(I, DT, CurLoop, SafetyInfo);
}		}

const std::vector<DomTreeNode *> &Children = N->getChildren();		const std::vector<DomTreeNode *> &Children = N->getChildren();
for (DomTreeNode *Child : Children)		for (DomTreeNode *Child : Children)
[...]	if (Fn->hasPersonalityFn())
if (Constant *PersonalityFn = Fn->getPersonalityFn())		if (Constant *PersonalityFn = Fn->getPersonalityFn())
if (isFuncletEHPersonality(classifyEHPersonality(PersonalityFn)))		if (isFuncletEHPersonality(classifyEHPersonality(PersonalityFn)))
SafetyInfo->BlockColors = colorEHFunclets(*Fn);		SafetyInfo->BlockColors = colorEHFunclets(*Fn);
}		}

/// canSinkOrHoistInst - Return true if the hoister and sinker can handle this		/// canSinkOrHoistInst - Return true if the hoister and sinker can handle this
/// instruction.		/// instruction.
///		///
bool canSinkOrHoistInst(Instruction &I, AliasAnalysis *AA, DominatorTree *DT,		bool llvm::canSinkOrHoistInst(Instruction &I, AAResults *AA, DominatorTree *DT,
TargetLibraryInfo *TLI, Loop *CurLoop,		Loop *CurLoop, AliasSetTracker *CurAST,
AliasSetTracker *CurAST, LoopSafetyInfo *SafetyInfo) {		LoopSafetyInfo *SafetyInfo) {
		if (!isa<LoadInst>(I) && !isa<CallInst>(I) &&
		!isa<BinaryOperator>(I) && !isa<CastInst>(I) && !isa<SelectInst>(I) &&
		!isa<GetElementPtrInst>(I) && !isa<CmpInst>(I) &&
		!isa<InsertElementInst>(I) && !isa<ExtractElementInst>(I) &&
		!isa<ShuffleVectorInst>(I) && !isa<ExtractValueInst>(I) &&
		!isa<InsertValueInst>(I))
		return false;

// Loads have extra constraints we have to verify before we can hoist them.		// Loads have extra constraints we have to verify before we can hoist them.
		chandlerc (Done): Can you split these refactorings into a separate patch please? They seem strictly beneficial and unrelated, and will make for example reverts messier if left in as part of this patch. I have several minor and boring comments on the refactorings, but it seems better to make them on a dedicated patch than to clutter this thread with them. (Just to be clear, I would leave it a static function, and just get the API you want and the implementation improvements. Then in this patch you can just make it an external function.)
		chandlerc (Not Done): I see that we got confused here and in the other review. The part of this refactoring I do think makes sense to split out and send for review independently is changing the signature (for example, removing TargetLibraryInfo) and re-organizing the implementation. The only part I think needs to happen with this patch is making this routine be a public routine in the 'llvm' namespace. Does that make more sense?
		danielcdh (Author, Not Done): Sure. The changes in LICM are reduced to minimal in https://reviews.llvm.org/D24168, PTAL. Please do not look at LICM changes in this patch for now because it also includes the refactoring bit. I'll rebase once D24168 is closed.
if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {		if (LoadInst *LI = dyn_cast<LoadInst>(&I)) {
if (!LI->isUnordered())		if (!LI->isUnordered())
return false; // Don't hoist volatile/atomic loads!		return false; // Don't hoist volatile/atomic loads!

// Loads from constant memory are always safe to move, even if they end up		// Loads from constant memory are always safe to move, even if they end up
// in the same alias set as something that ends up being modified.		// in the same alias set as something that ends up being modified.
if (AA->pointsToConstantMemory(LI->getOperand(0)))		if (AA->pointsToConstantMemory(LI->getOperand(0)))
return true;		return true;
if (LI->getMetadata(LLVMContext::MD_invariant_load))		if (LI->getMetadata(LLVMContext::MD_invariant_load))
return true;		return true;

// Don't hoist loads which have may-aliased stores in loop.		// Don't hoist loads which have may-aliased stores in loop.
uint64_t Size = 0;		uint64_t Size = 0;
if (LI->getType()->isSized())		if (LI->getType()->isSized())
Size = I.getModule()->getDataLayout().getTypeStoreSize(LI->getType());		Size = LI->getModule()->getDataLayout().getTypeStoreSize(LI->getType());

AAMDNodes AAInfo;		AAMDNodes AAInfo;
LI->getAAMetadata(AAInfo);		LI->getAAMetadata(AAInfo);

return !pointerInvalidatedByLoop(LI->getOperand(0), Size, AAInfo, CurAST);		return !pointerInvalidatedByLoop(LI->getOperand(0), Size, AAInfo, CurAST);
} else if (CallInst *CI = dyn_cast<CallInst>(&I)) {		} else if (CallInst *CI = dyn_cast<CallInst>(&I)) {
// Don't sink or hoist dbg info; it's legal, but not useful.		// Don't sink or hoist dbg info; it's legal, but not useful.
if (isa<DbgInfoIntrinsic>(I))		if (isa<DbgInfoIntrinsic>(*CI))
return false;		return false;

// Don't sink calls which can throw.		// Don't sink calls which can throw.
if (CI->mayThrow())		if (CI->mayThrow())
return false;		return false;

// Handle simple cases by querying alias analysis.		// Handle simple cases by querying alias analysis.
FunctionModRefBehavior Behavior = AA->getModRefBehavior(CI);		FunctionModRefBehavior Behavior = AA->getModRefBehavior(CI);
[...]	if (AliasAnalysis::onlyReadsMemory(Behavior)) {
}		}
}		}
if (!FoundMod)		if (!FoundMod)
return true;		return true;
}		}

// FIXME: This should use mod/ref information to see if we can hoist or		// FIXME: This should use mod/ref information to see if we can hoist or
// sink the call.		// sink the call.

return false;		return false;
}		}
		dberlin (Done): This is not really right, it returns whether a non-memory instruction is hoistable. :)

		danielcdh (Author, Done): refactored code to remove this.
// Only these instructions are hoistable/sinkable.		if (SafetyInfo)
if (!isa<BinaryOperator>(I) && !isa<CastInst>(I) && !isa<SelectInst>(I) &&
!isa<GetElementPtrInst>(I) && !isa<CmpInst>(I) &&
!isa<InsertElementInst>(I) && !isa<ExtractElementInst>(I) &&
!isa<ShuffleVectorInst>(I) && !isa<ExtractValueInst>(I) &&
!isa<InsertValueInst>(I))
return false;

// TODO: Plumb the context instruction through to make hoisting and sinking		// TODO: Plumb the context instruction through to make hoisting and sinking
// more powerful. Hoisting of loads already works due to the special casing		// more powerful. Hoisting of loads already works due to the special casing
// above.		// above.
return isSafeToExecuteUnconditionally(I, DT, CurLoop, SafetyInfo, nullptr);		return isSafeToExecuteUnconditionally(I, DT, CurLoop, SafetyInfo, nullptr);
		else
		return true;
}		}

/// Returns true if a PHINode is a trivially replaceable with an		/// Returns true if a PHINode is a trivially replaceable with an
/// Instruction.		/// Instruction.
/// This is true when all incoming values are that instruction.		/// This is true when all incoming values are that instruction.
/// This pattern occurs most often with LCSSA PHI nodes.		/// This pattern occurs most often with LCSSA PHI nodes.
///		///
static bool isTriviallyReplacablePHI(const PHINode &PN, const Instruction &I) {		static bool isTriviallyReplacablePHI(const PHINode &PN, const Instruction &I) {

lib/Transforms/Scalar/LoopSink.cpp

This file was added.

				//===-- LoopSink.cpp - Loop Sink Pass ------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This pass traverses all instructions in loop preheader and sink it to the
				// loop body where frequency is lower than the loop's preheader.
				reames (Done): How does this relate to the existing Sink.cpp pass? (Not asking for an answer in review, but updating the comment with context might be helpful.)
				// This pass is a reverse-transformation of LICM. It differs from the Sink
				// pass that it only processes instructions in loop's preheader, and has more
				chandlerc (Done): "and sink them/ to" -> "and sinks them to"
				// accurate alias/profile info to guide sinking decisions.
				//
				chandlerc (Done): Some grammar issues here: "all instructions" -> "all of the instructions"; "in loop preheader" -> "in the loop preheader"; "sink it" -> "sink them"; "the Sink pass that it only" -> "the Sink pass in that it only"; "in loop's preheader" -> "in the loop's preheader". I also think the wording could be improved to be more clear when reading it. For example "This pass does the inverse transformation of what LICM does" reads more clearly to me. Lastly, I would lead with that high-level description, then go into specifics. I would separate the comparison with the other Sink pass into a second paragraph. So something along the lines of: "This pass does the inverse transformation of what LICM does. It traverses all of the instructions ... It differs from the Sink pass ..." Does that make sense?
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/Statistic.h"
				#include "llvm/Analysis/AliasAnalysis.h"
				#include "llvm/Analysis/AliasSetTracker.h"
				#include "llvm/Analysis/BasicAliasAnalysis.h"
				#include "llvm/Analysis/BlockFrequencyInfo.h"
				#include "llvm/Analysis/Loads.h"
				chandlerc (Done): The differences seem to be a bit duplicated at this point. Sorry if this is the result of my suggestions. I think that you only need one of the prose "It Differs ..." and the bulleted list. If you want the detail in the list, I would just clean up the wording of that section so the English reads cleanly: "in a way that" -> "in the following ways"; "prehead" -> "preheader"; "find optimal" -> "find the optimal".
				#include "llvm/Analysis/LoopInfo.h"
				#include "llvm/Analysis/LoopPass.h"
				#include "llvm/Analysis/LoopPassManager.h"
				#include "llvm/Analysis/ScalarEvolution.h"
				#include "llvm/Analysis/ScalarEvolutionAliasAnalysis.h"
				#include "llvm/IR/Dominators.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Metadata.h"
				#include "llvm/Support/CommandLine.h"
				#include "llvm/Transforms/Scalar.h"
				#include "llvm/Transforms/Utils/Local.h"
				#include "llvm/Transforms/Utils/LoopUtils.h"
				using namespace llvm;

				davidxl (Done): Add statistics for number of instructions sinked etc.
				#define DEBUG_TYPE "loopsink"

				STATISTIC(NumLoopSunk, "Number of instructions sunk into loop");

				dberlin (Done): "Don't sink instructions that require cloning unless they execute less than this percent of the time" (or whatever)
				chandlerc (Done): Do you want to count separately the number of instruction clones created as part of this? Not sure if that has been an interesting metric while working on this patch or not.
				static cl::opt<unsigned> SinkFrequencyPercentThreshold(
				"sink-freq-percent-threshold", cl::Hidden, cl::init(90),
				cl::desc("Do not sink instructions that require cloning unless they "
				"execute less than this percent of the time."));

				static bool SinkLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI,
				DominatorTree *DT, BlockFrequencyInfo *BFI,
				chandlerc (Done): I would suggest sinking the LegacyLoopSinkPass to the bottom of the file to avoid needing this forward declaration.
				ScalarEvolution *SE);

				namespace {
				struct LegacyLoopSinkPass : public LoopPass {
				static char ID;
				LegacyLoopSinkPass() : LoopPass(ID) {
				initializeLegacyLoopSinkPassPass(*PassRegistry::getPassRegistry());
				}

				bool runOnLoop(Loop *L, LPPassManager &LPM) override {
				if (skipLoop(L))
				return false;

				auto *SE = getAnalysisIfAvailable<ScalarEvolutionWrapperPass>();
				return SinkLoop(L, &getAnalysis<AAResultsWrapperPass>().getAAResults(),
				&getAnalysis<LoopInfoWrapperPass>().getLoopInfo(),
				&getAnalysis<DominatorTreeWrapperPass>().getDomTree(),
				&getAnalysis<BlockFrequencyInfoWrapperPass>().getBFI(),
				SE ? &SE->getSE() : nullptr);
				}

				void getAnalysisUsage(AnalysisUsage &AU) const override {
				AU.setPreservesCFG();
				AU.addRequired<BlockFrequencyInfoWrapperPass>();
				getLoopAnalysisUsage(AU);
				}
				};
				}

				char LegacyLoopSinkPass::ID = 0;
				davidxl (Done): LoopBase class has a member method 'contains' which can be used.
				INITIALIZE_PASS_BEGIN(LegacyLoopSinkPass, "loop-sink", "Loop Sink", false,
				false)
				INITIALIZE_PASS_DEPENDENCY(LoopPass)
				INITIALIZE_PASS_DEPENDENCY(BlockFrequencyInfoWrapperPass)
				INITIALIZE_PASS_END(LegacyLoopSinkPass, "loop-sink", "Loop Sink", false, false)

				chandlerc (Done): I would use `SmallPtrSetImpl<BasicBlock *>` here and elsewhere on API boundaries where you can below.
				Pass *llvm::createLoopSinkPass() { return new LegacyLoopSinkPass(); }

				dberlin (Done): "sunk into loop body"
				/// Return adjusted total frequency of BBs.
				/// If the size of BBs is > 1, sinking would lead to code size increase,
				dberlin (Done): CurLoop is unused?
				/// which is tax by adding extra frequency to the total frequency.
				dberlin (Done): s/hoist/sink/
				static BlockFrequency AdjustedSumFreq(DenseSet<BasicBlock *> &BBs,
				BlockFrequencyInfo *BFI) {
				davidxl (Done): Same here -- please share the utility with LICM (isHoistableLoad)
				chandlerc (Done): I'm having a hard time understanding what this comment is trying to say. Can you try to re-word this to be more clear (and more clearly worded)?
				BlockFrequency T = 0;
				for (BasicBlock *B : BBs)
				T += BFI->getBlockFreq(B);
				if (BBs.size() > 1)
				T /= BranchProbability(SinkFrequencyPercentThreshold, 100);
				return T;
				}
				davidxl (Done): Remove dead code
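
The frequency adjustment in AdjustedSumFreq can be tried out in isolation. The following is a minimal sketch, not the patch's code: plain 64-bit integers stand in for BlockFrequency, the threshold is a function parameter rather than the sink-freq-percent-threshold option, and `worthSinking` is a hypothetical helper that assumes the "no greater than preheader frequency" comparison described in the FindSinkBBs comment. LLVM's division by a BranchProbability may round differently from the truncating integer division used here.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for AdjustedSumFreq. ThresholdPercent must be
// nonzero; 90 mirrors the default of the option in the patch.
std::uint64_t adjustedSumFreq(const std::vector<std::uint64_t> &BlockFreqs,
                              unsigned ThresholdPercent = 90) {
  std::uint64_t Total = 0;
  for (std::uint64_t Freq : BlockFreqs)
    Total += Freq;
  // More than one target block means the instruction must be cloned, so the
  // sum is scaled up by 100 / ThresholdPercent to penalize code growth.
  if (BlockFreqs.size() > 1)
    Total = Total * 100 / ThresholdPercent;
  return Total;
}

// Assumed profitability check: sink only if the adjusted sum is no greater
// than the preheader frequency.
bool worthSinking(const std::vector<std::uint64_t> &BlockFreqs,
                  std::uint64_t PreheaderFreq) {
  return adjustedSumFreq(BlockFreqs) <= PreheaderFreq;
}
```

With the default threshold, two blocks of frequency 45 each adjust to 100 and still qualify against a preheader frequency of 100, while two blocks of frequency 46 adjust to 102 and do not. A single block is compared without any penalty.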

				/// FindSinkBBs - Return a set of basic blocks that are:
				/// * Inside the loop
				chandlerc (Done): Since this is new code, please use the more modern form of doxygen throughout: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments (I've also updated those to more accurately reflect that we use auto-brief now rather than explicit '\brief ...' sections.)
				chandlerc (Done): Naming convention: adjustedSumFreq
				chandlerc (Done): "/p" -> "\p"
				/// * Dominate BBs
				/// * Has minimum total frequency that is no greater than preheader frequency
				/// If no such set can be found, return an empty set.
				static DenseSet<BasicBlock *> FindSinkBBs(const Loop *L,
				chandlerc (Done): I feel like this comment too could be much more clear. First, it isn't clear without a lot of context what the purpose of this would be. I'm guessing you mean something like find a candidate set of basic blocks to sink into? "Dominate BBs" - this is ambiguous. Do all returned basic blocks need to dominate the set of blocks in BBs? Or is it more that for each block in BBs, at least one block in the returned set must dominate that block? A better name for the parameter than "BBs" would probably help here. The frequency constraint isn't really explained. Why is that important? Give the reader more help to understand what the code will end up doing.
				const DenseSet<BasicBlock *> &BBs,
				DominatorTree *DT,
				chandlerc (Done): Do you expect these sets to be large? Naively, I wouldn't... If small is likely to be common, it would be better to use SmallPtrSetImpl as an input and SmallPtrSet as a result with a reasonable small size optimization.
				BlockFrequencyInfo *BFI) {
				DenseSet<BasicBlock *> SinkBBs;
				dberlin (Done): bool HasColdBB = llvm::any_of(L->blocks(), [&](const BasicBlock *BB) { return BFI->getBlockFreq(BB) <= PreHeaderFreq});
				if (BBs.size() == 0)
				return SinkBBs;

				// Sort loop's basic blocks by frequency
				SmallVector<BasicBlock *, 10> SortedLoopBBs;
				davidxl (Done): This code is shared with LICM, can this code be refactored into some utility (declared in LICM.h) helper?
				for (BasicBlock *B : L->blocks())
				if (BFI->getBlockFreq(B) <= BFI->getBlockFreq(L->getLoopPreheader()))
				SortedLoopBBs.push_back(B);
				std::stable_sort(SortedLoopBBs.begin(), SortedLoopBBs.end(),
				[&](BasicBlock *A, BasicBlock *B) {
				return BFI->getBlockFreq(A) < BFI->getBlockFreq(B);
				});

				davidxl (Done): This is not correct -- it should merge in SubLoop's AST too. See LICM's CollectAliasInfoForLoop -- that should be extracted as a common utility.
				danielcdh (Done): Yes, we can refactor the code to reuse the CollectAliasInfoForLoop from LICM for the legacy pass manager:
				- Build a base class ASTLoopTransformation, and make LoopInvariantCodeMotion and LoopSink subclasses of it. LoopToAliasSetMap and collectAliasInfoForLoop should be protected members of the base class. The logic in the subclass will also be shared with the new pass manager.
				- Build a base class ASTLoopLegacyPass which inherits from LoopPass, and make LICMLegacyPass and LoopSinkLegacyPass subclasses of it. cloneBasicBlockAnalysis, deleteAnalysisValue and deleteAnalysisLoop should be private members of the base class. The base class also needs an ASTLoopTransformation pointer to invoke the actual logic shared with the new pass manager.
				This is definitely doable, but seems overkill to me because:
				- collectAliasInfoForLoop is an optimization for compile time, and it is not yet available in the new pass manager.
				- The refactoring mainly focuses on abstraction of the old pass manager, which will be replaced by the new pass manager soon.
				- There is much complexity involved because the new pass manager does not support this optimization, and we need to make it fall back to what we do right now (add all basic blocks to AST) without introducing a memory leak.
				Comments?
				SinkBBs.insert(BBs.begin(), BBs.end());
				// Start from the coldest BB, if its frequency is no greater than all SinkBBs
				// that are dominated it, replace them with the coldest BB.
				for (BasicBlock *BB : SortedLoopBBs) {
				chandlerc (Done): This comment doesn't parse: "that are dominated it" seems to have a grammar error.
				DenseSet<BasicBlock *> AddedBBs;
				for (BasicBlock *B : SinkBBs)
				chandlerc (Done): Usually it is better to keep a set outside the loop and clear it on each iteration...
				if (DT->dominates(BB, B))
				chandlerc (Done): naming: findBBsToSinkInto
				AddedBBs.insert(B);
				if (AddedBBs.size() == 0)
				davidxl (Done): Is the first check needed?
				danielcdh (Done): I think yes, because if there is a loop-variant value in its operands, sinking it into the loop may change the value every iteration.
				continue;
				dberlin (Not Done): I'm confused. Why is this necessary, instead of using the reverse iterators and just advancing them when necessary?
				if (AdjustedSumFreq(AddedBBs, BFI) > BFI->getBlockFreq(BB)) {
				davidxl (Done): add documentation for the method.
				danielcdh (Not Done): We need the reverse iterator because instructions at the back of the BB may depend on instructions at the front, so they need to be sunk first before the other instructions can be sunk.
				for (BasicBlock *B : AddedBBs) {
				SinkBBs.erase(B);
				}
				SinkBBs.insert(BB);
				chandlerc (Done): I think this code needs significantly better variable names. You have `BB`, `B`, and `B` again all for basic blocks in different containers. And `AddedBBs` doesn't really tell the reader much about what the container is doing. Compare that to `SortedLoopBBs` which says exactly what it is. `SinkBBs` might also benefit from a slightly better name (and the function name might similarly benefit).
				}
				}
				davidxl (Done): To speed up the pass, perhaps it is better to do a quick scan of the BB of the loop to see if there are any blocks that are really cold (colder than preheader). If there are not, early return.

				// If SinkBBs' frequency sum to be larger than preheader frequency, do not
				junbuml (Done): Isn't it still okay to try to sink outside the loop if the user block is cold enough?
				// sink.
				danielcdh (Done): That would become general purpose sinking instead of loop-sinking. And we need to handle alias/invariant differently.
				if (AdjustedSumFreq(SinkBBs, BFI) > BFI->getBlockFreq(L->getLoopPreheader()))
				davidxl (Done): add a comment for this variable.
				chandlerc (Done): This shouldn't be done each time we try to sink an instruction. This should be pre-computed once for the loop and re-used for each instruction we try to sink.
				SinkBBs.clear();
				return SinkBBs;
				}
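
The greedy search in `FindSinkBBs` can be sketched outside LLVM, assuming a toy model where blocks are integer indices, `Idom` encodes the dominator tree, and every block in `ToyLoop` belongs to the loop. All names here (`ToyLoop`, `toyDominates`, `findBlocksToSinkIntoSketch`) are hypothetical, and `std::set` is used only to keep the sketch deterministic:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Toy loop model: Freq[B] is the frequency of block B, Idom[B] its
// immediate dominator (-1 for the loop header).
struct ToyLoop {
  std::vector<uint64_t> Freq;
  std::vector<int> Idom;
  uint64_t PreheaderFreq;
};

// Reflexive dominance: walk up the dominator tree from B looking for A.
inline bool toyDominates(const ToyLoop &L, int A, int B) {
  for (int X = B; X != -1; X = L.Idom[X])
    if (X == A)
      return true;
  return false;
}

// Sum of frequencies, scaled up when more than one block is involved.
inline uint64_t toyAdjustedSum(const ToyLoop &L, const std::set<int> &BBs,
                               unsigned ThresholdPercent) {
  uint64_t Total = 0;
  for (int B : BBs)
    Total += L.Freq[B];
  if (BBs.size() > 1)
    Total = Total * 100 / ThresholdPercent;
  return Total;
}

inline std::set<int> findBlocksToSinkIntoSketch(const ToyLoop &L,
                                                const std::set<int> &UseBlocks,
                                                unsigned ThresholdPercent) {
  assert(L.Freq.size() == L.Idom.size() && ThresholdPercent > 0);
  std::set<int> Sink;
  if (UseBlocks.empty())
    return Sink;

  // Collect blocks no hotter than the preheader, coldest first.
  std::vector<int> Cold;
  for (int B = 0; B < static_cast<int>(L.Freq.size()); ++B)
    if (L.Freq[B] <= L.PreheaderFreq)
      Cold.push_back(B);
  std::stable_sort(Cold.begin(), Cold.end(),
                   [&](int A, int B) { return L.Freq[A] < L.Freq[B]; });

  Sink = UseBlocks;
  std::set<int> Dominated; // hoisted out of the loop and cleared per iteration
  for (int Candidate : Cold) {
    Dominated.clear();
    for (int B : Sink)
      if (toyDominates(L, Candidate, B))
        Dominated.insert(B);
    if (Dominated.empty())
      continue;
    // Replace the dominated blocks when the single dominator is cheaper.
    if (toyAdjustedSum(L, Dominated, ThresholdPercent) > L.Freq[Candidate]) {
      for (int B : Dominated)
        Sink.erase(B);
      Sink.insert(Candidate);
    }
  }
  // Sinking must not cost more than leaving the instruction in the preheader.
  if (toyAdjustedSum(L, Sink, ThresholdPercent) > L.PreheaderFreq)
    Sink.clear();
  return Sink;
}
```

The nested loops over `Cold` and `Sink` are what make the cost proportional to the product of the two set sizes, which is the complexity concern raised in the review.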

				davidxl (Done): Continue if BBs is empty
				/// SinkLoop - Sink instructions from loop's preheader to the loop body if the
				/// sum frequency of inserted copy is smaller than preheader's frequency.
				bool SinkLoop(Loop *L, AliasAnalysis *AA, LoopInfo *LI, DominatorTree *DT,
				BlockFrequencyInfo *BFI, ScalarEvolution *SE) {
				BasicBlock *Preheader = L->getLoopPreheader();
				if (!Preheader)
				return false;

				davidxl (Done): This logic seems to be inverted -- Using CDT should be encouraged if its frequency equals the sum of SinkBB and N. In other words, division should be used, not multiplication
				danielcdh (Done): It's actually expected: if CDT's frequency is equal to or only a little larger than SinkBB+N's frequency, then the check will fail and go to the "else" branch: picking the CDT instead of SinkBB.
				const BlockFrequency PreheaderFreq = BFI->getBlockFreq(Preheader);
				bool HasColdBB = llvm::any_of(L->blocks(), [&](const BasicBlock *BB) {
				return BFI->getBlockFreq(BB) <= PreheaderFreq;
				});
				davidxl (Done): if -> is
				chandlerc (Done): The number of user instructions isn't really the right thing to apply the threshold to as that doesn't directly change the cost. The idea is that we need the size of `BBsToSinkInto` to be a small constant in order for the search for the coldest dominating set to be "just" linear in the number of blocks in the loop. So while a threshold of "40" may make sense for number of user instructions, I suspect the threshold should be much smaller when applied to the size of `BBsToSinkInto`.
				I also think you should add two comments about this. One, you should add a comment to the `findBBsToSinkInto` function clarifying the algorithmic complexity (that it is O(N * M) or O(M^2) where N is SortedLoopBBs.size() and M is BBsToSinkInto.size()), and you should mention where you check this threshold that the reason is because we're going to call `findBBsToSinkInto` which would go quadratic if we didn't set a cap. The reason for all of this is that I'm worried some future maintainer will come along and not really understand how risky it is to adjust these thresholds so I think it is important to document the implications.
				I still think we will long-term need a better algorithmic approach here as I suspect we'll find annoying cases where the threshold blocks an important optimization (such as when there are way too many initial BBsToInsertInto but there are a small number of common dominating blocks). But I understand this is one of the really hard problems (it's the same as optimal PHI placement and a bunch of other ones), and I don't want to hold up your patch on a comprehensive approach here.
				On an unrelated note, you should also document that this threshold has a secondary function: it places an upper bound on how much code growth we may trigger here. I'd document this in particular as that seems somewhat accidental and I suspect we may long-term want a better threshold for that. I would in fact encourage you to leave a FIXME to adjust this for min_size and opt_size.
				danielcdh (Done): You actually mean the size of BBs to be a small constant? Because computing BBsToSinkInto is the most computation-intensive part of the algorithm.
				if (!HasColdBB)
				chandlerc (Done): Why the variable rather than inlining this? Also, can you just call any_of directly since we have a using declaration and this is a range variant that doesn't exist in the standard?
				return false;
				junbuml (Done): Why not early return if the frequency of SinkBB is greater than PreheaderFreq?

				bool Changed = false;
				AliasSetTracker CurAST(*AA);

				// Compute alias set.
				for (BasicBlock *BB : L->blocks())
				CurAST.add(*BB);

				dberlin (Done):
				1. Please factor this out into FindSinkBlocks or something.
				2. This is non-deterministic, because you are iterating over a denseset.
				3. I am also confused by this placement strategy. You are not ordering the blocks in any particular processing order, so you may not actually choose the best sink points, as once you NCA something high in the domtree and something low, NCA will always be something high in the domtree. If you ordered it so it was the lowest things first (using the DFS numbers or whatever), you may decide multiple intermediate placements are cheaper than what you are doing here.
				// Putting all preheader instructions in a working list in reverse order.
				danielcdh (Done): Good point. Refactored the code and updated the algorithm to iterate from cold blocks to ensure optimality.
				chandlerc (Done): This seems really expensive. By my reading this is O(N * L * M) where L is the number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large. Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)
				danielcdh (Done): I initially started with an ad hoc algorithm which is O(L * M), but Danny pointed out it is not optimal, so I changed to this optimal algorithm. The lower bound for any sinking algorithm is O(LM), but if an optimal solution is desired, O(NLM) is the best I can get. Yes, this could be expensive when N is large. In practice, I did not see a noticeable compile time increase in speccpu2006 benchmarks after applying this patch (and enabling the pass in the frontend). How about we limit N to be no more than a certain number to avoid expensive computation in extreme cases?
				chandlerc (Done): I'm not worried about N being large. I'm worried about the fact that L is >= M and so it is quadratic in the number of basic blocks that use each instruction. The other thing is that if this scales in compile time by N then it scales in compile time by how much effect it is having. If it scales in compile time by M^2, then we pay more and more compile time as loops get larger even if we only sink very few instructions. I would either bound M to a small number, and/or look for some way to not have this be quadratic. It seems like a bunch of this should be pre-computable for the loop?
				// This maintains the integrety of the working set while some instructions
				// sunk into loop body.
				SmallVector<Instruction *, 10> INS;
				for (auto II = Preheader->rbegin(), E = Preheader->rend(); II != E;) {
				chandlerc (Done): This comment again doesn't parse for me, but isn't this dead code now that you're just directly using the reverse iterators?
				Instruction &I = *II++;
				if (L->hasLoopInvariantOperands(&I) &&
				canSinkOrHoistInst(I, AA, DT, L, &CurAST, nullptr))
				INS.push_back(&I);
				davidxl (Done): Perhaps compare cdt's frequency with the sum of bb and sink bb? If it is greater than the sum, do not use cdt.
				junbuml (Done): I guess you intend L->contains(LI->getLoopFor(N)) ?
				}
				danielcdh (Done): good catch. Thanks!
				chandlerc (Done): We generally prefer calling `.empty()` to testing `.size()` against zero.
				for (auto I : INS) {
				// All blocks that have uses of I and are in the sub loop of L.
				chandlerc (Done): This doxygen is still in the old form. Also, this should be 'sinkLoop'. Use `AAResults` here rather than the old name? Can any of these parameters be null? If not, pass references? I would generally partition the arguments into those that are required and pass references for them and then pass the optional ones as pointers. Then you can document that they are optional and the types will reinforce that fact.
				DenseSet<BasicBlock *> BBs;
				for (auto &U : I->uses()) {
				davidxl (Done): Is the formatting correct here?
				danielcdh (Done): I used clang-format --style=llvm for the formatting.
				Instruction *UI = cast<Instruction>(U.getUser());
				chandlerc (Done): Here you don't need stable sort since this is a total ordering. You should just use std::sort and mention that this is a known total ordering in a comment. You could do that in an overall comment that explains what you're doing here:
				// Copy the final BBs into a vector and sort them using the total ordering
				// of the loop block numbers as iterating the set doesn't give a useful
				// order. No need to stable sort as the block numbers are a total ordering.
				// If the use is phi node, we can not sink I to this BB.
				if (dyn_cast<PHINode>(UI) ||
				!L->contains(LI->getLoopFor(UI->getParent()))) {
				BBs.clear();
				break;
				}
				dberlin (Done): Needs a comment
				BBs.insert(UI->getParent());
				chandlerc (Done): I think a comment would help here, along the lines of "If there are no basic blocks with lower frequency than the preheader then we can avoid the detailed analysis as we will never find profitable sinking opportunities." I would also find this easier to read without the negation as: if (all_of(... return BFI->getBlockFreq(BB) > PreheaderFreq;
				}

				// Find the set of BBs that we should insert a copy of I.
				chandlerc (Done): You didn't actually switch to the sorted list here. Also, you can just use a range based for loop here.
				danielcdh (Done): The reason I used iterator here is because we need to handle the first entry in a different way.
				DenseSet<BasicBlock *> SinkBBs = FindSinkBBs(L, BBs, DT, BFI);
				if (SinkBBs.size() == 0)
				continue;

				dberlin (Not Done): This looks ... interesting. Instead, why not add SinkBBS.size() == 0 check above (so it skips if it's empty). Then you should simply move the i == 0 case outside of the loop, and the loop is just doing the insertions.
				auto BI = SinkBBs.begin();
				davidxl (Done): if T >= ... early return
				danielcdh (Not Done): SinkBBs.size() == 0 check is already moved above the total frequency check, so it will not reach here. The i==0 check is to distinguish between the first SinkBB (that we use move instead of insert) and the later SinkBB (that we make a copy for each insert).
				DEBUG(dbgs() << "Sinking " << I << " To: " << BI->getName() << '\n');
				NumLoopSunk++;
				I->moveBefore(&*(*BI)->getFirstInsertionPt());

				for (++BI; BI != SinkBBs.end(); ++BI) {
				davidxl (Done): Add debug trace here
				dberlin (Done): This is Local.cpp's replaceDominatedUsesWith (if you need a Instruction, Use version, make one :P)
				BasicBlock *N = *BI;
				// Clone I and replace its uses.
				Instruction *IC = I->clone();
				IC->setName(I->getName());
				IC->insertBefore(&*N->getFirstInsertionPt());
				replaceDominatedUsesWith(I, IC, *DT, IC->getParent(), true);
				chandlerc (Done): This comment is a little confusing. It seems to be describing a thing (like a variable, for example BBs) but is also right above a loop that populates that variable. Generally, once comments can be read as implementation comments about the code, I try to make them describe behavior of the code as that reads a bit better IMO. So "Compute the set of blocks which contain a use of I and ..." would read a bit better for me. Also "are in the sub loop of L" doesn't parse very well although I understand what you mean. I think it would be more clear to say "... blocks in the loop L which ..." rather than going into the issue of subloops.
				DEBUG(dbgs() << "Sinking " << I << " To: " << N->getName() << '\n');
				NumLoopSunk++;
				chandlerc (Done): So, this technically will break the verifier if you ever look at the IR at this point. While that is allowed, it seems fairly easy to avoid this by first creating all the clones and rewriting uses to the clones before moving the instruction. By the time you move it, the only uses remaining should be the ones dominated by the destination insertion point.
				}
				Changed = true;
				chandlerc (Done): If this is the case we can't sink I at all though, right? I think that is what the code already does, maybe just update the comment?
				danielcdh (Done): Not sure if I get this right, do you mean update the comment (as I just did) to make it less redundant?
				}

				chandlerc (Done): I would use two ifs here since one needs its own comment (and it is a nice comment!)
				chandlerc (Done): Why not use `L->contains(UI)`?
				danielcdh (Done): Because it does not check for sub loops. e.g.
				Loop1 {
				  I1
				  Loop2 {
				    I2
				  }
				}
				Loop1->contains(I1) --> true
				Loop2->contains(I2) --> true
				Loop1->contains(I2) --> false
				For this check we want to make sure I1 and I2 both return true.
				if (Changed && SE)
				SE->forgetLoopDispositions(L);
				chandlerc (Done): This routine doesn't sink the loop, so a name `sinkLoop` seems confusing. Maybe `sinkLoopInvariantInstructions`? Also, I think this should be a static function.
				return Changed;
				}
				chandlerc (Done): Check for an empty `BBs` here to handle the case of a use that can't be handled? This combined with the below BBsToSinkInto makes me think this should be extracted to a helper that tries to sink one instruction so that we can use early exit from that function.
				chandlerc (Done): I'd indicate the difference between this and the previous debug message. Maybe "Sinking a clone of " instead of just "Sinking".
				chandlerc (Done): So, this has an important problem: it introduces a non-determinism into the compiler. The initial problem is that SmallPtrSet does not provide stable iteration order, and so there is no predicting which basic block gets the original instruction and which one gets the clone. However, merely using something like SetVector helps but isn't fully satisfying here because the insertion order is also something we would very much like to not depend on: the use list order. I would suggest essentially numbering the basic blocks in the loop and use a vector of the BBs sorted by their number here. You can just create a map out of the blocks range with something like:
				int i = 0;
				for (auto *BB : L->blocks())
				  LoopBlockNumber[BB] = ++i;
				(Just pseudo code, but you get the idea.) That will punt the ordering requirement to LoopInfo which is I think the right place for it.
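
The numbering idea in the comment above can be illustrated with standard containers, assuming block names as strings stand in for `BasicBlock *` and a vector stands in for `L->blocks()`. The function name `orderByLoopBlockNumber` is hypothetical; this is only a sketch of the technique, not the code that landed:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Returns the sink targets ordered by their position in the loop's block
// list, so the result does not depend on hash-set iteration order.
inline std::vector<std::string>
orderByLoopBlockNumber(const std::vector<std::string> &LoopBlocks,
                       const std::unordered_set<std::string> &BBsToSinkInto) {
  // Number the loop blocks in the order the loop reports them.
  std::unordered_map<std::string, int> LoopBlockNumber;
  int I = 0;
  for (const std::string &BB : LoopBlocks)
    LoopBlockNumber[BB] = ++I;

  // Copy the unordered set into a vector; iteration order is arbitrary here.
  std::vector<std::string> Sorted(BBsToSinkInto.begin(), BBsToSinkInto.end());
  for (const std::string &BB : Sorted) {
    (void)BB;
    assert(LoopBlockNumber.count(BB) && "sink target must be a loop block");
  }

  // Block numbers are unique, so this is a total ordering and a plain
  // (non-stable) sort is enough.
  std::sort(Sorted.begin(), Sorted.end(),
            [&](const std::string &A, const std::string &B) {
              return LoopBlockNumber.at(A) < LoopBlockNumber.at(B);
            });
  return Sorted;
}
```

Whichever block comes first in the sorted vector would then receive the moved instruction and the rest would receive clones, independent of use-list order.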
				chandlerc (Done): It wasn't obvious to me reading this that `SortedLoopBBs` only contained the loop BBs that are less hot than the preheader. I think it might be nice to clue the reader in that this isn't all the loop BBs. Maybe `SortedColdLoopBBs`? Or just `ColdLoopBBs`? If you make this change, I'd keep the name consistent throughout of course. Also, you use `<=` here, but `<` everywhere else I see, any particular reason to include BBs in this list with the same frequency?
				chandlerc (Done): Use a SmallDenseMap? Good to dodge allocations for small loops.

lib/Transforms/Scalar/Scalar.cpp

(44 lines not shown)
		void llvm::initializeScalarOpts(PassRegistry &Registry) {
initializeGVNLegacyPassPass(Registry);		initializeGVNLegacyPassPass(Registry);
initializeEarlyCSELegacyPassPass(Registry);		initializeEarlyCSELegacyPassPass(Registry);
initializeGVNHoistLegacyPassPass(Registry);		initializeGVNHoistLegacyPassPass(Registry);
initializeFlattenCFGPassPass(Registry);		initializeFlattenCFGPassPass(Registry);
initializeInductiveRangeCheckEliminationPass(Registry);		initializeInductiveRangeCheckEliminationPass(Registry);
initializeIndVarSimplifyLegacyPassPass(Registry);		initializeIndVarSimplifyLegacyPassPass(Registry);
initializeJumpThreadingPass(Registry);		initializeJumpThreadingPass(Registry);
initializeLegacyLICMPassPass(Registry);		initializeLegacyLICMPassPass(Registry);
		initializeLegacyLoopSinkPassPass(Registry);
initializeLoopDataPrefetchPass(Registry);		initializeLoopDataPrefetchPass(Registry);
initializeLoopDeletionLegacyPassPass(Registry);		initializeLoopDeletionLegacyPassPass(Registry);
initializeLoopAccessLegacyAnalysisPass(Registry);		initializeLoopAccessLegacyAnalysisPass(Registry);
initializeLoopInstSimplifyLegacyPassPass(Registry);		initializeLoopInstSimplifyLegacyPassPass(Registry);
initializeLoopInterchangePass(Registry);		initializeLoopInterchangePass(Registry);
initializeLoopRotateLegacyPassPass(Registry);		initializeLoopRotateLegacyPassPass(Registry);
initializeLoopStrengthReducePass(Registry);		initializeLoopStrengthReducePass(Registry);
initializeLoopRerollPass(Registry);		initializeLoopRerollPass(Registry);
(74 lines not shown)
void LLVMAddInstructionCombiningPass(LLVMPassManagerRef PM) {		void LLVMAddInstructionCombiningPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createInstructionCombiningPass());		unwrap(PM)->add(createInstructionCombiningPass());
}		}

void LLVMAddJumpThreadingPass(LLVMPassManagerRef PM) {		void LLVMAddJumpThreadingPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createJumpThreadingPass());		unwrap(PM)->add(createJumpThreadingPass());
}		}

		void LLVMAddLoopSinkPass(LLVMPassManagerRef PM) {
		unwrap(PM)->add(createLoopSinkPass());
		}

void LLVMAddLICMPass(LLVMPassManagerRef PM) {		void LLVMAddLICMPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLICMPass());		unwrap(PM)->add(createLICMPass());
}		}

void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {		void LLVMAddLoopDeletionPass(LLVMPassManagerRef PM) {
unwrap(PM)->add(createLoopDeletionPass());		unwrap(PM)->add(createLoopDeletionPass());
}		}

(108 lines not shown)

lib/Transforms/Utils/Local.cpp

(1,662 lines not shown)
		if (DT.dominates(Root, U)) {
++Count;		++Count;
}		}
}		}
return Count;		return Count;
}		}

unsigned llvm::replaceDominatedUsesWith(Value *From, Value *To,		unsigned llvm::replaceDominatedUsesWith(Value *From, Value *To,
DominatorTree &DT,		DominatorTree &DT,
const BasicBlock *BB) {		const BasicBlock *BB,
		bool IncludeSelf) {
assert(From->getType() == To->getType());		assert(From->getType() == To->getType());

unsigned Count = 0;		unsigned Count = 0;
for (Value::use_iterator UI = From->use_begin(), UE = From->use_end();		for (Value::use_iterator UI = From->use_begin(), UE = From->use_end();
UI != UE;) {		UI != UE;) {
Use &U = *UI++;		Use &U = *UI++;
auto *I = cast<Instruction>(U.getUser());		auto *I = cast<Instruction>(U.getUser());
if (DT.properlyDominates(BB, I->getParent())) {		if ((IncludeSelf && BB == I->getParent()) ||
		DT.properlyDominates(BB, I->getParent())) {
U.set(To);		U.set(To);
DEBUG(dbgs() << "Replace dominated use of '" << From->getName() << "' as "		DEBUG(dbgs() << "Replace dominated use of '" << From->getName() << "' as "
<< To << " in " << U << "\n");		<< To << " in " << U << "\n");
++Count;		++Count;
}		}
}		}
return Count;		return Count;
}		}
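
The effect of the new `IncludeSelf` parameter can be illustrated with a small, self-contained sketch. `ToyDomTree` and `shouldReplaceUse` are hypothetical names introduced only for this illustration; they are not part of LLVM, and the toy tree merely stands in for `llvm::DominatorTree`. The sketch assumes blocks are numbered from 0 and that the entry block is its own immediate dominator.

```cpp
#include <vector>

// Hypothetical stand-in for llvm::DominatorTree (illustration only).
// Idom[B] is the immediate dominator of block B; the entry block is
// assumed to be its own immediate dominator.
struct ToyDomTree {
  std::vector<int> Idom;

  // True if A dominates B and A != B.
  bool properlyDominates(int A, int B) const {
    if (A == B)
      return false;
    // Walk up the dominator tree from B until the entry block is reached.
    while (B != Idom[B]) {
      B = Idom[B];
      if (B == A)
        return true;
    }
    return false;
  }
};

// Mirrors the condition in the right-hand column of the diff: a use is
// replaced if its block is properly dominated by BB, or, when IncludeSelf
// is set, if the use lives in BB itself.
bool shouldReplaceUse(const ToyDomTree &DT, int BB, int UseBB,
                      bool IncludeSelf) {
  return (IncludeSelf && BB == UseBB) || DT.properlyDominates(BB, UseBB);
}
```

With a tree where block 0 is the entry, block 1 is its child, and blocks 2 and 3 are children of block 1, a use located in block 1 is replaced only when `IncludeSelf` is true, while uses in blocks 2 and 3 are replaced either way.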
[278 lines not shown]

lib/Transforms/Utils/SimplifyInstructions.cpp

[84 lines not shown]
		namespace {
struct InstSimplifier : public FunctionPass {		struct InstSimplifier : public FunctionPass {
static char ID; // Pass identification, replacement for typeid		static char ID; // Pass identification, replacement for typeid
InstSimplifier() : FunctionPass(ID) {		InstSimplifier() : FunctionPass(ID) {
initializeInstSimplifierPass(*PassRegistry::getPassRegistry());		initializeInstSimplifierPass(*PassRegistry::getPassRegistry());
}		}

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.setPreservesCFG();		AU.setPreservesCFG();
		AU.addRequired<DominatorTreeWrapperPass>();
AU.addRequired<AssumptionCacheTracker>();		AU.addRequired<AssumptionCacheTracker>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
}		}

/// runOnFunction - Remove instructions that simplify.		/// runOnFunction - Remove instructions that simplify.
bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (skipFunction(F))		if (skipFunction(F))
return false;		return false;

const DominatorTreeWrapperPass *DTWP =		const DominatorTreeWrapperPass *DTWP =
getAnalysisIfAvailable<DominatorTreeWrapperPass>();		&getAnalysis<DominatorTreeWrapperPass>();
const DominatorTree *DT = DTWP ? &DTWP->getDomTree() : nullptr;		const DominatorTree *DT = DTWP ? &DTWP->getDomTree() : nullptr;
const TargetLibraryInfo *TLI =		const TargetLibraryInfo *TLI =
&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();		&getAnalysis<TargetLibraryInfoWrapperPass>().getTLI();
AssumptionCache *AC =		AssumptionCache *AC =
&getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);		&getAnalysis<AssumptionCacheTracker>().getAssumptionCache(F);
return runImpl(F, DT, TLI, AC);		return runImpl(F, DT, TLI, AC);
}		}
};		};
[27 lines not shown]

test/Transforms/LICM/loopsink.ll

This file was added.

				; RUN: opt -S -loop-sink < %s | FileCheck %s

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				@g = global i32 0, align 4

				; Function Attrs: norecurse nounwind readonly uwtable
				;     b1
				chandlerc (inline comment): You can prune out these "Function Attrs" comments... See below.
				;    /  \
				;   b2  b6
				;  /  \  |
				; b3  b4 |
				;  \  /  |
				;   b5   |
				;    \  /
				;     b7
				; preheader: 1000
				davidxl (inline comment): b6 --> b7
				; b2: 15
				; b3: 7
				; b4: 7
				; Sink load to b2
				; CHECK: t1
				; CHECK: .b2:
				davidxl (inline comment): add check-not of @g after preheader.
				; CHECK: load i32, i32* @g
				; CHECK: .b3:
				; CHECK-NOT: load i32, i32* @g
				davidxl (inline comment): add check-not @g after b3 and b4
				define i32 @t1(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !1

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 %invariant, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, 5
				br label %.b7

				.b6:
				%t6 = add nsw i32 %iv, 100
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !7
				davidxl (inline comment): Add a branch profile data here.

				.exit:
				ret i32 10
				}

				; Function Attrs: norecurse nounwind readonly uwtable
				;     b1
				;    /  \
				;   b2  b6
				davidxl (inline comment): B6 --> b6
				;  /  \  |
				; b3  b4 |
				davidxl (inline comment): B3 --> b3
				;  \  /  |
				;   b5   |
				;    \  /
				;     b7
				davidxl (inline comment): This should be b7
				; preheader: 500
				; b1: 16016
				; b3: 8
				; b6: 8
				; Sink load to b3 and b6
				; CHECK: t2
				; CHECK: .preheader:
				; CHECK-NOT: load i32, i32* @g
				; CHECK: .b3:
				; CHECK: load i32, i32* @g
				; CHECK: .b4:
				; CHECK: .b6:
				; CHECK: load i32, i32* @g
				; CHECK: .b7:
				define i32 @t2(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !6

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4, !prof !1

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 5, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, 5
				br label %.b7

				.b6:
				%t6 = add nsw i32 %iv, %invariant
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				davidxl (inline comment): annotate with branch profile data
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !7

				.exit:
				ret i32 10
				}

				; Function Attrs: norecurse nounwind readonly uwtable
				;     b1
				;    /  \
				;   b2  b6
				davidxl (inline comment): B3 -> b3
				;  /  \  |
				; b3  b4 |
				;  \  /  |
				;   b5   |
				davidxl (inline comment): b6 -> b7
				;    \  /
				;     b7
				; preheader: 500
				; b3: 8
				; b5: 16008
				; Do not sink load from preheader.
				; CHECK: t3
				; CHECK: .preheader:
				; CHECK: load i32, i32* @g
				; CHECK: .b1:
				; CHECK-NOT: load i32, i32* @g
				define i32 @t3(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !6

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4, !prof !1

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 5, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				%t5 = mul nsw i32 %p5, %invariant
				br label %.b7

				.b6:
				%t6 = add nsw i32 %iv, 5
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !7

				.exit:
				ret i32 10
				}

				; Function Attrs: norecurse nounwind readonly uwtable
				; For single-BB loop with <=1 avg trip count, sink load to b1
				; CHECK: t4
				; CHECK: .preheader:
				; CHECK-NOT: load i32, i32* @g
				; CHECK: .b1:
				; CHECK: load i32, i32* @g
				; CHECK: .exit:
				define i32 @t4(i32, i32) #0 {
				davidxl (inline comment): but this loop will be executed at least once per call of t4, so the loop body frequency should not be lower than entry frequency
				danielcdh (author, inline comment): So the current algorithm is that even if the frequency is equal (as in this case), we still tend to sink because it will reduce live range.
				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t1, %.b1 ], [ 0, %.preheader ]
				%t1 = add nsw i32 %invariant, %iv
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b1, label %.exit, !prof !1

				.exit:
				ret i32 10
				}

				; Function Attrs: norecurse nounwind readonly uwtable
				;     b1
				;    /  \
				;   b2  b6
				;  /  \  |
				; b3  b4 |
				;  \  /  |
				;   b5   |
				;    \  /
				;     b7
				; preheader: 1000
				; b2: 15
				; b3: 7
				; b4: 7
				; There is alias store in loop, do not sink load
				; CHECK: t5
				; CHECK: .preheader:
				; CHECK: load i32, i32* @g
				; CHECK: .b1:
				; CHECK-NOT: load i32, i32* @g
				define i32 @t5(i32, i32*) #0 {
				%3 = icmp eq i32 %0, 0
				br i1 %3, label %.exit, label %.preheader

				.preheader:
				%invariant = load i32, i32* @g
				br label %.b1

				.b1:
				%iv = phi i32 [ %t7, %.b7 ], [ 0, %.preheader ]
				%c1 = icmp sgt i32 %iv, %0
				br i1 %c1, label %.b2, label %.b6, !prof !1

				.b2:
				%c2 = icmp sgt i32 %iv, 1
				br i1 %c2, label %.b3, label %.b4

				.b3:
				%t3 = sub nsw i32 %invariant, %iv
				br label %.b5

				.b4:
				%t4 = add nsw i32 %invariant, %iv
				br label %.b5

				.b5:
				%p5 = phi i32 [ %t3, %.b3 ], [ %t4, %.b4 ]
				davidxl (inline comment): This test can be simplified a little by just making an external call here.
				%t5 = mul nsw i32 %p5, 5
				br label %.b7

				.b6:
				%t6 = call i32 @foo()
				br label %.b7

				.b7:
				%p7 = phi i32 [ %t6, %.b6 ], [ %t5, %.b5 ]
				%t7 = add nuw nsw i32 %iv, 1
				%c7 = icmp eq i32 %t7, %p7
				br i1 %c7, label %.b1, label %.exit, !prof !7

				.exit:
				ret i32 10
				}

				declare i32 @foo()

				attributes #0 = { norecurse nounwind readonly uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				chandlerc (inline comment): Please try to minimize function attributes you have in your test cases. You may not need any. If you do need them, you can attach the textual form directly to the functions which is much more friendly for test cases (and makes the comments explaining what the '#0' attribute set contains unnecessary).
				!llvm.ident = !{!0}

				!0 = !{!"clang version 3.9.0 (trunk 268689)"}
				!1 = !{!"branch_weights", i32 1, i32 2000}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C++ TBAA"}
				!6 = !{!"branch_weights", i32 2000, i32 1}
				!7 = !{!"branch_weights", i32 100, i32 1}
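
The frequency annotations in the test comments above (for example `preheader: 500` against `b3: 8` and `b6: 8` in `t2`, or `b3: 8` and `b5: 16008` in `t3`) suggest a simple profitability rule. The following is a minimal sketch of that rule, not the code in LoopSink.cpp: `isSinkingProfitable` is a hypothetical name, and the sketch assumes that sinking duplicates the instruction into every listed block and is worthwhile when the combined frequency of those blocks does not exceed the preheader frequency. The tie case (equal frequencies) is treated as profitable, following the author's remark on `t4` that sinking still reduces live range. Alias safety (test `t5`) and dominator-based block selection are deliberately left out.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): returns true when placing a copy
// of the instruction in each block of UseBlockFreqs is expected to execute
// no more often than leaving one copy in the preheader.
bool isSinkingProfitable(std::uint64_t PreheaderFreq,
                         const std::vector<std::uint64_t> &UseBlockFreqs) {
  // Total frequency of all blocks that would receive a copy.
  std::uint64_t Sum = std::accumulate(UseBlockFreqs.begin(),
                                      UseBlockFreqs.end(), std::uint64_t{0});
  // Equal frequency still counts as profitable (shorter live range).
  return Sum <= PreheaderFreq;
}
```

Applied to the numbers in the comments, `isSinkingProfitable(500, {8, 8})` holds for `t2`, matching "Sink load to b3 and b6", while `isSinkingProfitable(500, {8, 16008})` does not hold for `t3`, matching "Do not sink load from preheader".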

test/Transforms/LICM/sink.ll

This file was added.

				; RUN: opt -S -licm < %s | FileCheck %s --check-prefix=CHECK-LICM
				; RUN: opt -S -licm < %s | opt -S -loop-sink | FileCheck %s --check-prefix=CHECK-SINK

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

				chandlerc (inline comment): Unless your tests depend on specific datalayout or triple, please avoid including them in the IR test cases so that things are more generic and less tied to platforms.
				; Original source code:
				; int g;
				; int foo(int p, int x) {
				; for (int i = 0; i != x; i++)
				; if (__builtin_expect(i == p, 0)) {
				; x += g; x *= g;
				; }
				; return x;
				; }
				;
				; Load of global value g should not be hoisted to preheader.

				@g = global i32 0, align 4

				; Function Attrs: norecurse nounwind readonly uwtable
				define i32 @_Z3fooii(i32, i32) #0 {
				%3 = icmp eq i32 %1, 0
				chandlerc (inline comment): Same comments as above about function attributes. Also, please don't use C++ mangled names, but instead provide clean and easy to read names directly.
				br i1 %3, label %._crit_edge, label %.lr.ph.preheader

				.lr.ph.preheader: ; preds = %2
				br label %.lr.ph

				; CHECK-LICM: .lr.ph.preheader:
				; CHECK-LICM: load i32, i32* @g
				; CHECK-LICM: br label %.lr.ph

				.lr.ph: ; preds = %.lr.ph.preheader, %9
				%.03 = phi i32 [ %8, %.combine ], [ 0, %.lr.ph.preheader ]
				%.012 = phi i32 [ %.1, %.combine ], [ %1, %.lr.ph.preheader ]
				%4 = icmp eq i32 %.03, %0
				br i1 %4, label %.then, label %.combine, !prof !1

				.then: ; preds = %.lr.ph
				%5 = load i32, i32* @g, align 4, !tbaa !2
				%6 = add nsw i32 %5, %.012
				%7 = mul nsw i32 %6, %5
				br label %.combine

				; CHECK-SINK: .then:
				; CHECK-SINK: load i32, i32* @g
				; CHECK-SINK: br label %.combine

				.combine: ; preds = %.lr.ph, %.then
				%.1 = phi i32 [ %7, %.then ], [ %.012, %.lr.ph ]
				%8 = add nuw nsw i32 %.03, 1
				%9 = icmp eq i32 %8, %.1
				br i1 %9, label %._crit_edge.loopexit, label %.lr.ph

				._crit_edge.loopexit: ; preds = %.combine
				%.1.lcssa = phi i32 [ %.1, %.combine ]
				br label %._crit_edge

				._crit_edge: ; preds = %._crit_edge.loopexit, %2
				%.01.lcssa = phi i32 [ 0, %2 ], [ %.1.lcssa, %._crit_edge.loopexit ]
				ret i32 %.01.lcssa
				}

				attributes #0 = { norecurse nounwind readonly uwtable "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

				!llvm.ident = !{!0}

				!0 = !{!"clang version 3.9.0 (trunk 268689)"}
				!1 = !{!"branch_weights", i32 1, i32 2000}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C++ TBAA"}
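
The `!prof !1` annotation on the branch to `.then` carries `branch_weights` of 1 and 2000, which is how the `__builtin_expect(i == p, 0)` hint from the original source shows up in the IR. As a rough illustration of why `.then` is considered cold, the sketch below turns a list of branch weights into a per-successor probability by dividing each weight by the total. `successorProbability` is a hypothetical helper written for this note; LLVM's own branch probability and block frequency analyses use fixed-point arithmetic and loop scaling that are not modeled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper (illustration only): probability of taking the
// successor at position Index, given the weights listed in a
// "branch_weights" metadata node in successor order.
double successorProbability(const std::vector<std::uint32_t> &Weights,
                            std::size_t Index) {
  // Sum in 64 bits so that large 32-bit weights cannot overflow.
  std::uint64_t Total = std::accumulate(Weights.begin(), Weights.end(),
                                        std::uint64_t{0});
  return static_cast<double>(Weights[Index]) / static_cast<double>(Total);
}
```

For the weights `{1, 2000}`, index 0 (the `.then` successor) comes out at 1/2001, roughly 0.05%, and index 1 (`.combine`) at 2000/2001.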


Revision Contents

Diff 68880

include/llvm/InitializePasses.h

include/llvm/LinkAllPasses.h

include/llvm/Transforms/Scalar.h

include/llvm/Transforms/Utils/Local.h

include/llvm/Transforms/Utils/LoopUtils.h

lib/Transforms/Scalar/CMakeLists.txt

lib/Transforms/Scalar/GVN.cpp

lib/Transforms/Scalar/LICM.cpp

lib/Transforms/Scalar/LoopSink.cpp

lib/Transforms/Scalar/Scalar.cpp

lib/Transforms/Utils/Local.cpp

lib/Transforms/Utils/SimplifyInstructions.cpp

test/Transforms/LICM/loopsink.ll

test/Transforms/LICM/sink.ll
