This is an archive of the discontinued LLVM Phabricator instance.

Add Loop Sink pass to reverse the LICM based on basic block frequency.
ClosedPublic

Authored by danielcdh on Jul 25 2016, 2:05 PM.

Details

Summary

LICM may hoist instructions to the preheader speculatively. Before code generation, we need to sink the hoisted instructions back into the loop if doing so is beneficial. This pass is the reverse of LICM: it looks at instructions in the preheader and sinks them into basic blocks inside the loop body whose frequency is smaller than the preheader frequency.
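The profitability test described here can be sketched in plain C++. This is a simplified model with illustrative names, not the patch's actual LLVM code (which operates on BlockFrequencyInfo); it only captures the idea that sinking pays off when the chosen sink blocks are collectively colder than the preheader:

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Simplified model of the loop-sink profitability test: sinking an
// instruction out of the preheader is considered beneficial only when
// the total frequency of the blocks it would be sunk into is smaller
// than the preheader frequency (otherwise executing it once per loop
// entry is cheaper).
bool isSinkProfitable(uint64_t PreheaderFreq,
                      const std::vector<uint64_t> &SinkBlockFreqs) {
  uint64_t Total = std::accumulate(SinkBlockFreqs.begin(),
                                   SinkBlockFreqs.end(), uint64_t{0});
  return Total < PreheaderFreq;
}
```

The strict `<` here is a simplification; as discussed later in the thread, the actual patch also sinks on equal frequency to shrink live ranges.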

Event Timeline

There are a very large number of changes, so older changes are hidden.
davidxl added inline comments.Jul 30 2016, 9:31 PM
lib/Transforms/Scalar/LoopSink.cpp
38

Add statistics for the number of instructions sunk, etc.

79

LoopBase class has a member method 'contains' which can be used.

99

Remove dead code

134

add documentation for the method.

140

To speed up the pass, perhaps it is better to do a quick scan of the BBs of the loop to see if there are any blocks that are really cold (colder than the preheader). If there are none, return early.

144

add a comment for this variable.

160

if -> is

179

Perhaps compare the CDT's frequency with the sum of BB and the sink BB? If it is greater than the sum, do not use the CDT.

200

if T >= ... early return

205

Add debug trace here

Have you talked to anyone about the design for this?

I know Daniel Jasper, Quentin, and several others have looked at similar things before. Previous attempts have focused on using MachineLICM to do sinking as well as hoisting. While I don't have a strong opinion about one design over the other, we should be consistent about the plan here, and possibly consolidate some of the logic.

danielcdh updated this revision to Diff 66880.Aug 4 2016, 4:44 PM
danielcdh marked 10 inline comments as done.
danielcdh edited edge metadata.

add more test and address David's comments.

davidxl added inline comments.Aug 11 2016, 4:22 PM
lib/Transforms/Scalar/LoopSink.cpp
123

This is not correct -- it should merge in SubLoop's AST too. See LICM's CollectAliasInfoForLoop -- that should be extracted as a common utility.

132

Is the first check needed?

148

Continue if BBs is empty.

156

This logic seems to be inverted -- Using CDT should be encouraged if its frequency equals the sum of SinkBB and N. In other words, division should be used, not multiplication

184

Is the formatting correct here?

danielcdh updated this revision to Diff 67802.Aug 11 2016, 11:13 PM
danielcdh marked an inline comment as done.

update

lib/Transforms/Scalar/LoopSink.cpp
123

Yes, we can refactor the code to reuse CollectAliasInfoForLoop from LICM for the legacy pass manager:

  1. Build a base class ASTLoopTransformation, and make LoopInvariantCodeMotion and LoopSink subclasses of it. LoopToAliasSetMap and collectAliasInfoForLoop should be protected members of the base class. The logic in the subclasses will also be shared with the new pass manager.
  2. Build a base class ASTLoopLegacyPass which inherits from LoopPass, and make LICMLegacyPass and LoopSinkLegacyPass subclasses of it. cloneBasicBlockAnalysis, deleteAnalysisValue and deleteAnalysisLoop should be private members of the base class. The base class also needs an ASTLoopTransformation pointer to invoke the actual logic shared with the new pass manager.

This is definitely doable, but it seems like overkill to me because:

  1. collectAliasInfoForLoop is a compile-time optimization, and it is not yet available in the new pass manager.
  2. The refactoring mainly focuses on abstraction of the old pass manager, which will be replaced by the new pass manager soon.
  3. There is much complexity involved because the new pass manager does not support this optimization, and we need to make it fall back to what we do right now (add all basic blocks to the AST) without introducing a memory leak.

Comments?

132

I think yes, because if there is a loop-variant value among its operands, sinking it into the loop may change the value every iteration.

156

It's actually expected: if the CDT's frequency is equal to or only a little larger than SinkBB+N's frequency, then the check will fail and go to the "else" branch, picking the CDT instead of SinkBB.
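The trade-off being debated here can be sketched as a small predicate. The 10% bias constant and the function name below are illustrative assumptions, not values from the patch:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the dominator-vs-split decision: prefer sinking a single
// copy into the common dominator (CDT) unless its frequency exceeds the
// combined frequency of the two candidate blocks by more than a small
// bias.  The 10% bias here is an illustrative value only.
bool preferCommonDominator(uint64_t CdtFreq, uint64_t BBFreq,
                           uint64_t SinkBBFreq) {
  // Multiply instead of dividing to stay in integer arithmetic.
  return CdtFreq * 100 <= (BBFreq + SinkBBFreq) * 110;
}
```

With this shape, a CDT that is exactly as hot as the two blocks combined (or slightly hotter) is still chosen, matching the behavior described above.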

184

I used clang-format --style=llvm for the formatting.

davidxl added inline comments.Aug 16 2016, 1:26 PM
test/Transforms/LICM/loopsink.ll
18

b6 --> b7

24

add check-not of @g after preheader.

27

add check-not @g after b3 and b4

66

Add a branch profile data here.

75

B6 --> b6

77

B3 --> b3

81

This should be b7

132

annotate with branch profile data

143

B3 -> b3

147

b6 -> b7

211

but this loop will be executed at least once per call of t4, so the loop body frequency should not be lower than the entry frequency

272

This test can be simplified a little by just making an external call here.

danielcdh updated this revision to Diff 68259.Aug 16 2016, 2:42 PM
danielcdh marked 4 inline comments as done.

update tests

danielcdh marked 6 inline comments as done.Aug 16 2016, 2:43 PM
danielcdh added inline comments.
test/Transforms/LICM/loopsink.ll
211

So the current algorithm is that even if the frequency is equal (as in this case), we still tend to sink because it will reduce the live range.

I don't have further comments.

Hal, does this patch look ok to you?

David

Are you going to have a separate patch to hook this in the pass manager ?

lib/Transforms/Scalar/LoopSink.cpp
142

Isn't it still okay to try to sink to outside the loop if the user block is cold enough?

162

Why not return early if the frequency of SinkBB is greater than PreheaderFreq?

179

I guess you intend L->contains(LI->getLoopFor(N)) ?

danielcdh updated this revision to Diff 68615.Aug 18 2016, 2:46 PM
danielcdh marked 16 inline comments as done.

update

lib/Transforms/Scalar/LoopSink.cpp
143

That would become general purpose sinking instead of loop-sinking. And we need to handle alias/invariant differently.

180

good catch. Thanks!

Are you going to have a separate patch to hook this in the pass manager ?

Yes, I'll send another patch to hook it up in clang.

dberlin added inline comments.
include/llvm/Transforms/Utils/LoopUtils.h
484
The comment is not right or specific.

First, this is used for both hoisting and sinking.
Second, it does not say what "hoistable" means.

In fact, this function is really checking "is aliased with loop", you probably should call it something like that and use that.

487

Ditto above

490

This should be "Return true if a non-memory instruction can be handled by the hoister/sinker"

Please don't call it isHoistableInst and then use it for sinking :)

lib/Transforms/Scalar/LICM.cpp
501

This is not really right, it returns whether a non-memory instruction is hoistable. :)

lib/Transforms/Scalar/LoopSink.cpp
42

"Don't sink instructions that require cloning unless they execute less than this percent of the time"

(or whatever)

87

"sunk into loop body"

89

CurLoop is unused?

90

s/hoist/sink/

110

bool HasColdBB = llvm::any_of(L->blocks(), [&](const BasicBlock *BB) {
  return BFI->getBlockFreq(BB) <= PreHeaderFreq;
});

133

I'm confused.
Why is this necessary, instead of using the reverse iterators and just advancing them when necessary?

170
  1. Please factor this out into FindSinkBlocks or something.
  2. This is non-deterministic, because you are iterating over a DenseSet.

I am also confused by this placement strategy.

You are not ordering the blocks in any particular processing order, so you may not actually choose the best sink points: once you take the NCA of something high in the domtree and something low, the NCA will always be something high in the domtree. If you ordered it so the lowest things come first (using the DFS numbers or whatever), you might decide multiple intermediate placements are cheaper than what you are doing here.
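The placement concern above can be illustrated with a toy model that visits candidate blocks coldest-first and picks the coldest dominator for a given use. The string-named blocks and the parent-map dominance relation are illustrative simplifications of the real DominatorTree machinery, not the patch's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy model: blocks are named strings, dominance is the ancestor
// relation over a parent map, and candidates are visited coldest-first
// so a cold block low in the tree is preferred over a hot dominator.
struct Block {
  std::string Name;
  uint64_t Freq;
};

// True if A is B or an ancestor of B in the (acyclic) parent map.
bool dominates(const std::map<std::string, std::string> &Parent,
               const std::string &A, const std::string &B) {
  for (std::string Cur = B; !Cur.empty();
       Cur = Parent.count(Cur) ? Parent.at(Cur) : "")
    if (Cur == A)
      return true;
  return false;
}

// Return the coldest candidate block dominating UseBB, or "" if none.
std::string coldestDominator(std::vector<Block> Candidates,
                             const std::map<std::string, std::string> &Parent,
                             const std::string &UseBB) {
  std::sort(Candidates.begin(), Candidates.end(),
            [](const Block &L, const Block &R) { return L.Freq < R.Freq; });
  for (const Block &C : Candidates)
    if (dominates(Parent, C.Name, UseBB))
      return C.Name;
  return "";
}
```

Visiting candidates in frequency order is what keeps a hot, high dominator from being chosen when a colder intermediate block would do.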

191

Needs a comment

199

This looks ... interesting.

Instead, why not add a SinkBBs.size() == 0 check above (so it skips if it's empty).

Then you should simply move the i == 0 case outside of the loop, and the loop is just doing the insertions.

205

This is Local.cpp's replaceDominatedUsesWith (if you need a Instruction, Use version, make one :P)

danielcdh updated this revision to Diff 68722.Aug 19 2016, 12:46 PM
danielcdh marked 13 inline comments as done.

refactor

include/llvm/Transforms/Utils/LoopUtils.h
490

Refactored code to remove these functions.

lib/Transforms/Scalar/LICM.cpp
501–502

refactored code to remove this.

lib/Transforms/Scalar/LoopSink.cpp
134

We need the reverse iterator because instructions at the back of the BB may depend on instructions at the front, so they need to be sunk first before the other instructions can be sunk.

171

Good point.

Refactored the code and updated the algorithm to iterate from the coldest blocks to ensure optimality.

200

The SinkBBs.size() == 0 check has already been moved above the total frequency check, so execution will not reach here.

The i == 0 check distinguishes the first SinkBB (into which we move the instruction instead of inserting a copy) from later SinkBBs (for which we insert a copy).

reames added a subscriber: reames.Aug 21 2016, 7:55 PM

minor drive by comments

include/llvm/Transforms/Scalar.h
141

Missing comments

include/llvm/Transforms/Utils/Local.h
325 (On Diff #68730)

Clarification needed. This doesn't really tell me what the parameter does. In fact, there's nothing that says what the BB param is for either. Can you fix that?

lib/Transforms/Scalar/LoopSink.cpp
11

How does this relate to the existing Sink.cpp pass? (Not asking for an answer in review, but updating the comment with context might be helpful.)

danielcdh updated this revision to Diff 68880.Aug 22 2016, 10:11 AM
danielcdh marked 3 inline comments as done.

update comments

danielcdh updated this revision to Diff 69869.Aug 31 2016, 9:41 AM

Integrate with reverse-iterator enhancement.

davidxl accepted this revision.Sep 1 2016, 2:14 PM
davidxl edited edge metadata.

looks good to me too.

This revision is now accepted and ready to land.Sep 1 2016, 2:14 PM

So, I don't think the code is quite ready to go into the tree yet. There are a bunch of minor issues that should be cleaned up. None of these are really big (it seems like you all have sorted out the algorithmic and high-level design stuff), but I think they should be addressed before the code goes in. Especially the refactoring bit I suggest below.

lib/Transforms/Scalar/LoopSink.cpp
128–138

I think this code needs significantly better variable names.

You have BB, B, and B again all for basic blocks in *different* containers.

And AddedBBs doesn't really tell the reader much about what the container is doing. Compare that to SortedLoopBBs which says exactly what it is. SinkBBs might also benefit from a slightly better name (and the function name might similarly benefit).

129

Usually it is better to keep a set outside the loop and clear it on each iteration...

159–161

Why the variable rather than inlining this? Also, can you just call any_of directly since we have a using declaration and this is a range variant that doesn't exist in the standard?

172–175

This comment again doesn't parse for me, but isn't this dead code now that you're just directly using the reverse iterators?

chandlerc added inline comments.Sep 1 2016, 3:47 PM
lib/Transforms/Scalar/LICM.cpp
434–438

Can you split these refactorings into a separate patch please? They seem strictly beneficial and unrelated, and will make for example reverts messier if left in as part of this patch.

I have several minor and boring comments on the refactorings, but it seems better to make them on a dedicated patch than to clutter this thread with them.

(Just to be clear, I would leave it as a static function, and just get the API you want and the implementation improvements. Then in this patch you can just make it an external function.)

lib/Transforms/Scalar/LoopSink.cpp
89–92

I'm having a hard time understanding what this comment is trying to say. Can you try to re-word this to be more clear (and more clearly worded)?

102

Since this is new code, please use the more modern form of doxygen throughout: http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments

(I've also updated those to more accurately reflect that we now use auto-brief rather than explicit '\brief ...' sections.)

102–106

I feel like this comment too could be much more clear.

First, it isn't clear without a lot of context what the purpose of this would be. I'm guessing you mean something like find a candidate set of basic blocks to sink *into*?

"Dominate BBs" - this is ambiguous. Do *all* returned basic blocks need to dominate the set of blocks in BBs? Or is it more that for each block in BBs, at least one block in the returned set must dominate that block? A better name for the parameter than "BBs" would probably help here.

The frequency constraint isn't really explained. Why is that important? Give the reader more help to understand what the code will end up doing.

107–108

Do you expect these sets to be large? Naively, I wouldn't...

If small is likely to be common, it would be better to use SmallPtrSetImpl as an input and SmallPtrSet as a result with a reasonable small size optimization.

126–127

This comment doesn't parse: "that are dominated it" seems to have a grammar error.

danielcdh updated this revision to Diff 70123.Sep 1 2016, 6:11 PM
danielcdh marked 13 inline comments as done.
danielcdh edited edge metadata.

Integrate Chandler's comment.

danielcdh updated this revision to Diff 70124.Sep 1 2016, 6:17 PM

clang-format

(Trying to first clarify the split-off of the patch I'm suggesting...)

lib/Transforms/Scalar/LICM.cpp
434–438

I see that we got confused here and in the other review.

The part of this refactoring I *do* think makes sense to split out and send for review independently is changing the signature (for example, removing TargetLibraryInfo) and re-organizing the implementation.

The only part I think needs to happen with this patch is making this routine be a public routine in the 'llvm' namespace.

Does that make more sense?

danielcdh updated this revision to Diff 70721.Sep 8 2016, 10:29 AM

rebase and update

danielcdh added inline comments.Sep 8 2016, 10:37 AM
lib/Transforms/Scalar/LICM.cpp
434–438

Sure.

The changes in LICM are reduced to minimal in https://reviews.llvm.org/D24168, PTAL.

Please do not look at LICM changes in this patch for now because it also includes the refactoring bit. I'll rebase once D24168 is closed.

danielcdh updated this revision to Diff 71441.Sep 14 2016, 2:47 PM

update the logic to replace dominated uses after sinking.

First batch of comments. While I'll probably have some more minor comments later, there are a couple of particularly interesting ones that I wanted to go ahead and send out.

include/llvm/Transforms/Utils/LoopUtils.h
472–474

Here and elsewhere in comments, I would just say "is null" rather than "is nullptr".

lib/Transforms/Scalar/LoopSink.cpp
11–15

Some grammar issues here:

  • "all instructions" -> "all of the instructions"
  • "in loop preheader" -> "in the loop preheader"
  • "sink it" -> "sink them"
  • "the Sink pass that it only" -> "the Sink pass in that it only"
  • "in loop's preheader" -> "in the loop's preheader".

I also think the wording could be improved to be more clear when reading it. For example "This pass does the inverse transformation of what LICM does" reads more clearly to me. Lastly, I would lead with that high-level description, then go into specifics. I would separate the comparison with the other Sink pass into a second paragraph. So something along the lines of:

This pass does the inverse transformation of what LICM does. It
traverses all of the instructions ...

It differs from the Sink pass ...

Does that make sense?

42

Do you want to count separately the number of instruction clones created as part of this? Not sure if that has been an interesting metric while working on this patch or not.

49

I would suggest sinking the LegacyLoopSinkPass to the bottom of the file to avoid needing this forward declaration.

102

Naming convention: adjustedSumFreq

130

naming: findBBsToSinkInto

136–144

This shouldn't be done each time we try to sink an instruction. This should be pre-computed once for the loop and re-used for each instruction we try to sink.

149–171

This seems really expensive. By my reading this is O(N * L * M) where L is the number of basic blocks in a loop, N is the number of instructions we try to sink into that loop, and M is the number of basic blocks within the loop that use the instructions. If there is for example one hot basic block in the loop and a large number of cold basic blocks and all of the uses are in those cold basic blocks, it seems like this could become quite large.

Have you looked at other algorithms? Is there a particular reason to go with this one? (I've not thought about the problem very closely yet...)

181–182

This doxygen is still in the old form. Also, this should be 'sinkLoop'.

Use AAResults here rather than the old name?

Can any of these parameters be null? If not, pass references? I would generally partition the arguments into those that are required and pass references for them and then pass the optional ones as pointers. Then you can document that they are optional and the types will reinforce that fact.

190–192

I think a comment along the lines of "If there are no basic blocks with lower frequency than the preheader then we can avoid the detailed analysis as we will never find profitable sinking opportunities."

I would also find this easier to read without the negation as:

if (all_of(...
      return BFI->getBlockFreq(BB) > PreheaderFreq;
211

This comment is a little confusing. It seems to be describing a thing (like a variable, for example BBs) but is also right above a loop that populates that variable. Generally, when comments can read as implementation comments about the code, I try to make them describe the behavior of the code instead, as that reads a bit better IMO. So "Compute the set of blocks which contain a use of I and ..." would read a bit better for me.

Also "are in the sub loop of L" doesn't parse very well although I understand what you mean. I think it would be more clear to say "... blocks in the loop L which ..." rather than going into the issue of subloops.

215

If this is the case we can't sink I at all though, right? I think that is what the code already does, maybe just update the comment?

216–217

I would use two ifs here since one needs its own comment (and it is a nice comment!)

217

Why not use L->contains(UI)?

223

Check for an empty BBs here to handle the case of a use that can't be handled?

This combined with the below BBsToSinkInto makes me think this should be extracted to a helper that tries to sink one instruction so that we can use early exit from that function.

235

So, this has an important problem: it introduces a non-determinism into the compiler.

The initial problem is that SmallPtrSet does not provide stable iteration order, and so there is no predicting which basic block gets the original instruction and which one gets the clone.

However, merely using something like SetVector helps but isn't fully satisfying here because the insertion order is *also* something we would very much like to not depend on: the use list order.

I would suggest essentially numbering the basic blocks in the loop and use a vector of the BBs sorted by their number here. You can just create a map out of the blocks range with something like:

int i = 0;
for (auto *BB : L->blocks())
  LoopBlockNumber[BB] = ++i;

(Just pseudo code, but you get the idea.)

That will punt the ordering requirement to LoopInfo which is I think the right place for it.

251

I'd indicate the difference between this and the previous debug message. Maybe "Sinking a clone of " instead of just "Sinking".

danielcdh marked 13 inline comments as done.Oct 4 2016, 11:26 AM

Thanks for the reviews!

lib/Transforms/Scalar/LoopSink.cpp
149–171

I initially started with an ad hoc algorithm which is O(L * M), but Danny pointed out it is not optimal, so I changed to this optimal algorithm. The lower bound for any sinking algorithm is O(L * M), but if an optimal solution is desired, O(N * L * M) is the best I can get.

Yes, this could be expensive when N is large. In practice, I did not see a noticeable compile time increase in the speccpu2006 benchmarks after applying this patch (and enabling the pass in the frontend).

How about we limit the N to be no more than a certain number to avoid expensive computation in extreme cases?

215

Not sure if I got this right; do you mean updating the comment (as I just did) to make it less redundant?

217

Because it does not check for subloops.

e.g.

Loop1 {
  I1
  Loop2 {
    I2
  }
}

Loop1->contains(I1) --> true
Loop2->contains(I2) --> true
Loop1->contains(I2) --> false

For this check we want to make sure I1 and I2 both return true.

danielcdh updated this revision to Diff 73512.Oct 4 2016, 11:27 AM

integrate Chandler's reviews

danielcdh updated this revision to Diff 73513.Oct 4 2016, 11:31 AM
danielcdh marked an inline comment as done.

missed one comment

What is the status of this patch? Any more comments need to be addressed?

Chandler said he has more comments.

M is the number of use BBs.

The pass already filters out loops which do not have any cold blocks -- this effectively filters out most of the loops in reality, so the compile time impact will be minimal. Furthermore, the following can be done:

  1. Only collect the cold BBs in the loop body that are colder than the preheader, and sort those instead.
  2. Skip any instructions whose use BBs are not members of the cold BBs collected in 1).

Example of parent BB being colder than the use BB?

Will try to make a full pass through, thanks for the extensive updates Dehao! One specific point of discussion below:

lib/Transforms/Scalar/LoopSink.cpp
149–171

I'm not worried about N being large. I'm worried about the fact that L is >= M, and so it is quadratic in the number of basic blocks that use each instruction.

The other thing is that if this scales in compile time by N then it scales in compile time *by how much effect it is having*. If it scales in compile time by M^2, then we pay more and more compile time as loops get larger even if we only sink very few instructions.

I would either bound M to a small number, and/or look for some way to not have this be quadratic. It seems like a bunch of this should be pre-computable for the loop?

danielcdh updated this revision to Diff 74907.Oct 17 2016, 2:38 PM

add max use threshold

This is really close. Some minor nit picks and a few more interesting comments below.

include/llvm/Transforms/Utils/LoopUtils.h
472–474

You changed one 'nullptr' to 'null' but missed the other.

lib/Transforms/Scalar/LoopSink.cpp
12–13

"and sink them/ to" -> "and sinks them to"

102

"/p" -> "\p"

160

The number of user *instructions* isn't really the right thing to apply the threshold to as that doesn't directly change the cost.

The idea is that we need the size of BBsToSinkInto to be a small constant in order for the search for the coldest dominating set to be "just" linear in the number of blocks in the loop.

So while a threshold of "40" may make sense for number of user instructions, I suspect the threshold should be much smaller when applied to the size of BBsToSinkInto.

I also think you should add two comments about this. One, you should add a comment to the findBBsToSinkInto function clarifying the algorithmic complexity (that it is O(N * M) or O(M^2), where N is SortedLoopBBs.size() and M is BBsToSinkInto.size()), and you should mention where you check this threshold that the reason is that we're going to call findBBsToSinkInto, which would go quadratic if we didn't set a cap.

The reason for all of this is that I'm worried some future maintainer will come along and not really understand how risky it is to adjust these thresholds, so I think it is important to document the implications.

I still think we will long-term need a better algorithmic approach here, as I suspect we'll find annoying cases where the threshold blocks an important optimization (such as when there are way too many initial BBsToInsertInto but there are a small number of common dominating blocks). But I understand this is one of the really hard problems (it's the same as optimal PHI placement and a bunch of other ones), and I don't want to hold up your patch on a comprehensive approach here.

On an unrelated note, you should also document that this threshold has a secondary function: it places an upper bound on how much code growth we may trigger here. I'd document this in particular as that seems somewhat accidental and I suspect we may long-term want a better threshold for that. I would in fact encourage you to leave a FIXME to adjust this for min_size and opt_size.

179–180

We generally prefer calling .empty() to testing .size() against zero.

185

Here you don't need stable sort since this is a total ordering. You should just use std::sort and mention that this is a known total ordering in a comment. You could do that in an overall comment that explains what you're doing here:

// Copy the final BBs into a vector and sort them using the total ordering
// of the loop block numbers as iterating the set doesn't give a useful
// order. No need to stable sort as the block numbers are a total ordering.
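This determinism fix can be sketched in isolation; plain strings and std::map below stand in for LLVM's BasicBlock pointers and DenseMap, so the names are illustrative only:

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Sketch of the determinism fix: number the blocks once in loop order,
// then sort the chosen set by that number instead of iterating a hash
// set whose order is unspecified.
std::vector<std::string>
sortByLoopNumber(std::vector<std::string> BBs,
                 const std::map<std::string, int> &LoopBlockNumber) {
  // The numbering is a total ordering, so std::sort (not
  // std::stable_sort) is sufficient.
  std::sort(BBs.begin(), BBs.end(),
            [&](const std::string &A, const std::string &B) {
              return LoopBlockNumber.at(A) < LoopBlockNumber.at(B);
            });
  return BBs;
}
```

This punts the ordering requirement to whoever assigns the numbers (LoopInfo in the real pass), which is the right place for it.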
195

You didn't actually switch to the sorted list here.

Also, you can just use a range based for loop here.

219

This routine doesn't sink the loop, so a name sinkLoop seems confusing. Maybe sinkLoopInvariantInstructions?

Also, I think this should be a static function.

242–247

It wasn't obvious to me reading this that SortedLoopBBs only contains the loop BBs that are less hot than the preheader. I think it might be nice to clue the reader in that this isn't *all* the loop BBs. Maybe SortedColdLoopBBs? Or just ColdLoopBBs? If you make this change, I'd keep the name consistent throughout of course.

Also, you use <= here, but < everywhere else I see, any particular reason to include BBs in this list with the same frequency?

danielcdh updated this revision to Diff 75159.Oct 19 2016, 8:38 AM
danielcdh marked 7 inline comments as done.

update

Thanks for the reviews.

I also added an overall algorithm description at the file level per Madhur's suggestion.

lib/Transforms/Scalar/LoopSink.cpp
160

You actually mean the size of BBs should be a small constant? Computing BBsToSinkInto is the most computation-intensive part of the algorithm.

195

The reason I used an iterator here is that we need to handle the first entry differently.

chandlerc accepted this revision.Oct 27 2016, 12:58 AM
chandlerc added a reviewer: chandlerc.

Very cool. I think this patch LGTM with the comments below addressed (documentation fixes, simple changes, a fixme, a bunch of minor test cleanup). Feel free to submit. I've asked for one somewhat immediate follow-up patch, and it'd also be good to get a patch to put this into the pipeline behind a flag so folks can test the size impact.

lib/Transforms/Scalar/LoopSink.cpp
13–22

The differences seem to be a bit duplicated at this point. Sorry if this is the result of my suggestions.

I think that you only need one of the prose "It Differs ..." and the bulleted list. If you want the detail in the list, I would just clean up the wording of that section so the English reads cleanly:

"in a way that" -> "in the following ways"
"prehead" -> "preheader"
"find optimal" -> "find the optimal"

84

I would use SmallPtrSetImpl<BasicBlock *> here and elsewhere on API boundaries where you can below.

212

So, this technically will break the verifier if you ever look at the IR at this point. While that is allowed, it seems fairly easy to avoid this by first creating all the clones and rewriting uses to the clones before moving the instruction. By the time you move it, the only uses remaining should be the ones dominated by the destination insertion point.

220–228

This causes the cloning to be quadratic in the number of uses as we may have a single clone for each use.

I know we have an upper bound, but this is still pretty slow (the use list is slow to walk).

I'd suggest you fix this in a follow-up patch though (I think it'll be a lot of code and easier to review as a follow-up), just leave a short FIXME saying that this is slow and may be quadratic in the number of uses.

(For the follow-up patch, the approach I'd suggest is that as you build up BBsToInsertInto, you also build up a map from UseBBs to the dominating BB that will be inserted into. Then you can insert into each BBsToInsertInto here without rewriting any uses but building up a map from those BBs to the inserted clones. Finally, you can do a single walk over the uses and for each one look up the inserted BB in the first map and then the inserted clone in the second map and rewrite the use. Or maybe you see a simpler way? This was just the first that came to mind.)
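The two-map scheme proposed above can be sketched with plain containers; all names are illustrative, and the clone is modeled as a string rather than a real Instruction:

```cpp
#include <cassert>
#include <map>
#include <string>

// Sketch of the suggested follow-up: map each use block to its
// dominating insertion block, create at most one clone per insertion
// block, then resolve every use through the two maps.  This keeps
// cloning linear in the number of insertion blocks rather than the
// number of uses.
struct RewriteResult {
  std::map<std::string, std::string> CloneInBlock; // insert BB -> clone
  std::map<std::string, std::string> UseToClone;   // use BB -> clone
};

RewriteResult
rewriteUses(const std::map<std::string, std::string> &UseToInsertBB,
            const std::string &InstName) {
  RewriteResult R;
  for (const auto &UI : UseToInsertBB) {
    const std::string &InsertBB = UI.second;
    // Create the clone only the first time this insertion block is seen.
    if (!R.CloneInBlock.count(InsertBB))
      R.CloneInBlock[InsertBB] = InstName + ".clone." + InsertBB;
    R.UseToClone[UI.first] = R.CloneInBlock[InsertBB];
  }
  return R;
}
```

Two uses that share a dominating insertion block end up sharing one clone, which is the property that avoids the quadratic cloning.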

264

Use a SmallDenseMap? Good to dodge allocations for small loops.

test/Transforms/LICM/loopsink.ll
8

You can prune out these "Function Attrs" comments... See below.

292

Please try to minimize function attributes you have in your test cases. You may not need any.

If you do need them, you can attach the textual form directly to the functions which is much more friendly for test cases (and makes the comments explaining what the '#0' attribute set contains unnecessary).

294–303

Please prune out all the metadata your test isn't directly using (TBAA stuff, Clang stuff).

test/Transforms/LICM/sink.ll
4–5

Unless your tests depend on specific datalayout or triple, please avoid including them in the IR test cases so that things are more generic and less tied to platforms.

22

Same comments as above about function attributes. Also, please don't use C++ mangled names, but instead provide clean and easy to read names directly.

danielcdh updated this revision to Diff 76055.Oct 27 2016, 9:24 AM
danielcdh marked 21 inline comments as done.
danielcdh edited edge metadata.

Thanks for the review!

danielcdh closed this revision.Oct 27 2016, 9:39 AM