This is an archive of the discontinued LLVM Phabricator instance.

PGO IR-level instrumentation infrastructure

Authored by xur on Sep 10 2015, 4:03 PM.



This patch implements the infrastructure for PGO late (i.e. IR-level) instrumentation. (Refer to RFC: PGO Late instrumentation for LLVM

The main part of code is in a newly added file: lib/Transforms/Instrumentation/PGOLateInstr.cpp
This file implements a module pass PGOLateInstrumeatnion. It applies the instrumentation to each function by class PGOLateInstrumentationFunc. For each function, perform the following steps:
(1) Collect all the CFG edges. Assign an estimated weight to each edge. Critical edges and back-edges are assigned to high value of weights. One fake node and a few fake edges (from the fake node to the entry node, and from the exit nodes to the fake node) are also added to the worklist.
(2) Construct the MST. The edges with the higher weight will be put to MST first, unless it forms a cycle.
(3) Traverse the CFG and compute the CFG hash.

The above three steps are the same for profile-generate and profile-use compilation.

In the next step, for profile-generation compilation:
(4-gen) Instrument all the edges that not in the MST. If this is a critical edge, split the edge first. The actual instrumentation is to generate Intrinsic::instrprof_increment() in the instrumented BB. This intrinsic will be lowed by pass createInstrProfilingPass().

For profile-use compilation,
(4-use) Read in the counters and the CFG hash from the profile file.
(5-use) If there is no error, populate the counters to all the edges in reverse topological order of the MST.
(6-use) Once having all the edge counts, set the branch weights metadata for the IR having multiple branches. Also apply the cold/hot function attributes based on function level counts.

This pass is added to PassManagerBuilder when populate module passes. see lib/Transforms/IPO/PassManagerBuilder.cpp.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
xur added a comment.Sep 23 2015, 3:06 PM

This new patch integrated David's review comments. The main changes are
(1) more efficient code for union-find algorithm.
(2) merge EdgeCount structure into Edge
(3) merge BBGroupInfo and BBCount into and rename it to BBInfo.

xur added a subscriber: xur.Sep 23 2015, 3:12 PM

The reason I do not call SplitAllCriticalEdges is because I don't want to
split all critical edges. I only split these critical edges that need to be
instrumented. For critical edges, I prioritize them into MST to avoid the


A couple more comments I spotted when going back through the patch again.

44–45 ↗(On Diff #35559)

Why do we need to modify InstrProfWriter.{h,cpp}? That seems like a layering violation.

270–278 ↗(On Diff #35559)

No naked new/delete. Use RAII (e.g. you can probably use BumpPtrAllocator to allocate these).

davidxl added inline comments.Oct 2 2015, 9:58 PM
86–94 ↗(On Diff #35559)

Please add a comment.

117 ↗(On Diff #35559)

make it std::vector<std::unique_ptr<Edge>>

150 ↗(On Diff #35559)

make it DenseMap<const BasicBlock *, std::unique_ptr<BBInfo>> BBInfos;

272 ↗(On Diff #35559)

No need for explicit delete with unique_ptr

336 ↗(On Diff #35559)

You can use the ReversePostOrderTraversal Class defined in ADT/PostOrderIterator.h. Actually does the order of traverse matter here?

349 ↗(On Diff #35559)

What is the rationale of this heuristic?

351 ↗(On Diff #35559)

Why not compute BlockFrequencyInfo and use the real static Edge frequency (from BB freq and edge prob)? For Critical edge, the weight can be increased to infinite.

493 ↗(On Diff #35559)

As the name linkage function, the name globalization function also needs to be a common interface which FE instrumentation can share in the future.

516 ↗(On Diff #35559)

This code is shared with FE based instrumentation and it should be refactored as a utility function. Suggested location: include/llvm/Transformation/Instrumentation.h.

Converting FE to use the common interface can be done as a follow-up patch

666 ↗(On Diff #35559)

Make reader a shared object at module level.

xur updated this revision to Diff 39382.Nov 5 2015, 11:00 AM

Here is the updated patch that integrated David and Sean's comments and suggestions. The major changes are:

  • move profile reader to module level.
  • move minimum spanning tree into a utility class.
  • separate profile generate and use into to two separated passes.
  • use static profile to set the edge weights.
  • move the code that can be shared with clang instrumentation to include/llvm/Transforms/Instrumentation.h as utility functions.
  • name change (using IR instrumentation now)
  • move InstrProfWrite to a later patch.

I did not use ReversePostOrderTraversal in ADT/PostOrderiterator.h (David suggested) as compared to current version, it leads to more passes to populate the counters.

I think it is better to separate this IR level instrumentation from FE code, to avoid code duplication for other language frond-ends. So I keep to pass the file name instead of IndexedInstrProfRead to llvm. The error checking is already well encapsulated in getFunctioncounts() and ProfReader(). I kept it in middle end too.

I will add unit tests for both minimum spanning tree and IR instrumentation later.

Thanks for working on this!

Mostly nitpicking. I haven't really looked at the implementation yet.



Nit: it is a little strange to change some of the formatting.

35 ↗(On Diff #39382)

Nit: It may contain

49 ↗(On Diff #39382)

Nit: Find

165 ↗(On Diff #39382)

This will cause "unused variable" warning when building clang without assertion.

173 ↗(On Diff #39382)

Same here.

247 ↗(On Diff #39382)

Can you commit this separately if it is necessary?

140 ↗(On Diff #39382)

Should you add const here "infoString() const {"?

145 ↗(On Diff #39382)

This seems wrong. StringRef does not own the string data.

manmanren added inline comments.Nov 5 2015, 4:44 PM
36 ↗(On Diff #39382)

Can you add some high-level comments for CFGMST? What fields does this class expect on Edge and BBInfo? And what Removed field is used for? Do all members need to be public?

92 ↗(On Diff #39382)

Use nullptr instead of 0?

181 ↗(On Diff #39382)


187 ↗(On Diff #39382)


195 ↗(On Diff #39382)

emplace_back maybe?

194 ↗(On Diff #39382)

typo: option

majnemer added inline comments.
301–302 ↗(On Diff #34503)

We already have a CRC implementation in LLVM, do we need another one? Ours lives in <llvm/Support/JamCRC.h>

I'd imagine the one we have in-tree is considerably faster than this one...

davidxl added inline comments.Nov 6 2015, 11:50 AM
1 ↗(On Diff #39382)


29 ↗(On Diff #39382)

It seems that this class is still not general enough to be put into llvm/Support/ directory, I suggest move this file back to PGO directory for now.

A more general implementation should also support MachineBB. Also what interfaces need to be exposed, and what template parameters are needed, do we really need to expose edge at interface level (for this class, instead just use a pair of BBs to query on-tree status?) do we need to pass in opaque edge info types (which only clients know about) etc are all open questions -- this is the reason I don't think it is ready to be put into Support. Add a TODO for future work there.

More comments will follow later.

xur updated this revision to Diff 39600.Nov 6 2015, 3:12 PM

Thanks Manman Ren and David Majnemer for the code review. I have integrated their feedbacks to the newly upload patch.
The only thing I did not do is the "make_unque" suggestion by Manman. It seems to be a c++14 feature and I got an compile time error for that.

I'll integrate David Li's feedback in a later patch.



silvas added a comment.Nov 6 2015, 9:10 PM

This is looking really good!

My biggest concern overall about the implementation is that there seems to be overuse of classes (unnecessary inheritance; unusual use of constructors). It creates quite a maze.

I think that in the process of cleaning up the awkward PGOIRInstrumentationGenFunc Func(F, &M, BPI, BFI); to be free functions (see comment inline) you will find many simplifications to the entire implementation.

Remember, the purpose of a constructor is to initialize the invariants for the class's data members. Don't "do an algorithm" in a constructor or execute "steps" of a procedural computation.

10 ↗(On Diff #39600)

"IR level" is redundant since this is in lib/Transforms.

Please give an overview here of the structure of the file. (2 module passes: "use" and "gen"). Also explain their relationship, sharing data structures, etc.

124 ↗(On Diff #39600)

There's a lot of things here that don't seem to be used externally. So use an anonymous namespace.

161 ↗(On Diff #39600)

Do you actually need dynamic dispatch?

167 ↗(On Diff #39600)

Why "IR"? Are you planning to generalize this to Machine? For now just leave off "IR" (here and elsewhere).

193 ↗(On Diff #39600)

Switch to using a trailing underscore. Names starting with underscore followed by a capital letter are reserved for the implementation (it is undefined behavior to name a variable like that).

295 ↗(On Diff #39600)

Avoid std::unique_ptr<T> &. For a simple non-owning handle, use a T *. If you don't need to store the handle and the value is guaranteed non-null, consider using T &.

Same applies in many other parts of this patch where std::unique_ptr<T> & is used.

337 ↗(On Diff #39600)

This class is local to this file, right? No need to use such a large name; makes it sound like this is for an external interface or something.

(same comment applies to many other things in this file; I think that the module passes themselves are the only things that really need "PGOInstrumentation" prefix on their name)

Look at e.g. lib/Transforms/SROA.cpp with e.g. names like Slice or AllocaSlices.

340 ↗(On Diff #39600)

I don't think wrapping behavior is intended, so avoid unsigned types.

368 ↗(On Diff #39600)

Nit: ArrayRef

440 ↗(On Diff #39600)

Nit: no braces.

505 ↗(On Diff #39600)
517 ↗(On Diff #39600)


524 ↗(On Diff #39600)

In what way? And why does it matter to know that here?

563 ↗(On Diff #39600)

Just use reverse adapter from llvm/ADT/STLExtras.h

627 ↗(On Diff #39600)

Just declare this as SmallVector<unsigned, 2> EdgeCounts(Size, 0);

631 ↗(On Diff #39600)

Is this ever initialized?

632 ↗(On Diff #39600)

Can you use a range-for?

638 ↗(On Diff #39600)

Can you fix GetSuccessorNumber to take a const BasicBlock *? (hopefully it is a pretty obvious fix and there is no need for pre-commit review).
Then you can get rid of all these const_cast's.

665 ↗(On Diff #39600)

This is extremely awkward (declaring a local variable as a way to do a stateful mutating operation). Just use a free function. Same thing for the similar line with PGOIRInstrumentationUseFunc below.

davidxl added inline comments.Nov 7 2015, 1:55 PM
78 ↗(On Diff #39600)

Return BBInfo& instead of unique_ptr<BBInfo>&. It is more efficient for uses later .

81 ↗(On Diff #39600)

assert It->second.get() != null.

Return *It->second.get().

94 ↗(On Diff #39600)

Add more comments explaining here why need two duplicate fake edges.

104 ↗(On Diff #39600)

Redundant guard.

105 ↗(On Diff #39600)

Make successors and i 'unsigned' type.

110 ↗(On Diff #39600)

Use a macro for 1000? CRITICAL_EDGE_MULTIPLIER? Also handling overflow situation?

113 ↗(On Diff #39600)

Stale code.

127 ↗(On Diff #39600)

--> sortEdgesByWeight

146 ↗(On Diff #39600)

Should this be marked unconditionally here? Otherwise, we can simply boost the weight for such edges even higher without special treatment here?

184 ↗(On Diff #39600)

Make Edge* the return type for efficiency.

185 ↗(On Diff #39600)

Use a const var for default weight.

186 ↗(On Diff #39600)

Combine the find + insert pattern into one insert (with pair of source BB and null ptr). If the map is updated, a iterator is returned that can be updated with unique_ptr reset.

191 ↗(On Diff #39600)

Same here.

197 ↗(On Diff #39600)

return AllEdges.back().get();

167 ↗(On Diff #39600)

There was an earlier suggestion to change 'Late' in PGOLateInstrumentation to 'IR'. To add more to the bikeshed, I suggest change the class name to

class FuncPGOInstrumentation {

181 ↗(On Diff #39600)

Mst --> MST or MinSpanTree.

185 ↗(On Diff #39600)

BasicBlock *getInstrBB(const Edge &E);

silvas added inline comments.Nov 9 2015, 11:20 AM
167 ↗(On Diff #39600)

The suggestion to use "IR" instead of "Late" is just for informal discussion when needed to differentiate from what we currently have in clang. For comments in code or naming variables, all code in lib/Transforms is inherently at IR level so it is redundant. For example, we do not have a "SROAIR" pass or "InstCombineIR" pass etc. (grep for "IR" in lib/Transforms to see typical usage; it is almost never used)

davidxl added inline comments.Nov 9 2015, 11:56 AM

An interface is available in ProfileData/InstrProf.h:


which is also used by cfe.

Remove this dup.


Similarly llvm::createPGOFuncNameVar will do what you need.

Remove this duplicate.

xur marked 44 inline comments as done.Nov 13 2015, 4:27 PM

Thank David and Sean for the detailed comments -- they are really helpful.
I integrated the review to the patch. Could you please take a look. Thanks!

94 ↗(On Diff #39600)

This is a mistaken introduced in the lat minute formatting. Fixed.

104 ↗(On Diff #39600)

why Redundant? numsuccessors can be 0 (and these is else branch)

146 ↗(On Diff #39600)

The reason to split two passes is for the case of using profile count value as the weight, in which the values can vary drastically.

185 ↗(On Diff #39600)

removed the default weight.

161 ↗(On Diff #39600)

it returns different information strings. I can get rid of it. But I don't think it really matters -- it only used in debug printing.

524 ↗(On Diff #39600)

you are right. This comment is unnecessary and does not belong here. Deleted.

631 ↗(On Diff #39600)

good catch. fixed.

638 ↗(On Diff #39600)

Will do this later.

xur updated this revision to Diff 40187.Nov 13 2015, 4:29 PM
xur marked 3 inline comments as done.

Here is the latest patch that integrated David and Sean's comments.

The use of inheritance is still excessive. For example, PGOUseFunc and PGOGenFunc don't seem to need inheritance. Their "subclass state" would be better as local variables of a free function. FuncPGOInstrumentation does not need to be a base class. It can simply be a FuncInfo class that manages/caches information about a single function. Both uses of FuncPGOInstrumentation seem to call performOnFunc before they do anything else, so this appears to be establishing an invariant for the class, so this operation should be performed in the constructor (this is much more natural as FuncInfo; the complex construction was confusing when it was a base class).


Use a lambda.


Move the declaration for bool Critical = false; down one line and simplify to bool Critical = isCriticalEdge(TI, i);


No need to mention clang here. Please file a bug report for this if there isn't one open already.


Avoid using reserved names.

silvas added inline comments.Nov 13 2015, 7:04 PM

Factor the DEBUG macro usage into a single point of truth here. E.g. factor out a function printEdges(raw_ostream &OS, const StringRef Message). Then change this function to be simply DEBUG(dumpEdges(dbgs(), Message)). (this is similar to the pattern used by Value, for example).

This will also give clang-format an easier time.


Use std::tie


Other places seem to use uint64_t for weight. Why unsigned here?


Does this actually need to be virtual? Same for all the other virtual methods in this file.


"dump" functions are for debugging and should be conditionalized. For example, I suggest refactoring to allow a call

std::string Message = "Dump Function " + FuncName +
                          " after CFGMST: \t Hash: " +
MST.dumpEdges(dbgs(), Message);

Avoid having debug code be non-conditional.


Avoid const std::unique_ptr<PGOUseEdge> &.


Use a free function instrumentOneFunc


Use a free function setPGOCountOnFunc

Also, can you begin to work on tests? I think right now the core algorithm is pretty good. There are some cleanups to the implementation that I'd like to see but I can do those post-commit if necessary. Right now the main missing piece needed before this is committed are tests.

davidxl added inline comments.Nov 14 2015, 11:59 AM

if there is no BB ..


It is better to just call it 'setEdgeCount' with same documentation -- or make it just

getUnknownCountEdge() and let the client to set the edge directly.


Sean's comment here is not addressed -- use raw PGOUseEdge&


In general it is not safe to update the container (AllEdges) while iterating -- it may invalidate the iterator or element reference saved before. The code should create a worklist of edges before the count setting -- the worklist will then be 'frozen' (It is safe to update worklist, but there is no need to push new split edges into the list) and AllEdges can be safely updated.


Use the new getInstrProfRecord method -- eventually we will need to read value profile data too.

xur marked 17 inline comments as done.Nov 16 2015, 3:54 PM
xur added inline comments.

reorganized the code, also handles overflow.


changed to uint64_t.


Now guarded by DEBUG.


I see your point. But this is exactly the reason I did not use range-based loop like all others in the file. I get the vector size in the loop initialization and use the index to reference the vector elements in the body. So adding element to the vector will not be a problem. I could use a worklist, but the effect should be the same.

xur updated this revision to Diff 40361.Nov 16 2015, 3:58 PM
xur marked 3 inline comments as done.

Here is the latest patch that integrated Sean and David Li's comments.
I'm working on the test case. One thing I'm not clear is that I need the passmanagerbuild change to invoke the new functionality. That part of code was split from this patch from Manman's review comments.


Some more small comments.


Use a real variable.




return *AllEdges.back()


Give intuition for why it is edges *not in* the MST rather than edges *in* the MST? Edges with high counts should have high weight and therefore *not* be in the MST (which tries for minimum weight); we don't want to instrument edges with high counts, and they are *not* in the MST, so why would we place counters there?


Please cite the Knuth paper. Also explain what we actually do with the MST (and give intuition for why that makes sense (it's fairly simple, should not need a long explanation)).


Make these two variable just local variables of instrumentOnFunc free function.


This is only called in one place, inline the code into instrumentOnFunc.


Can you use continue to reduce indentation for the "large" side of the if? ( )


Do you mean to do a copy here? Probably std::vector<uint64_t> & is intended.


e instead of E, as is common in LLVM.


getBBInfo(Ei->SrcBB). You shouldn't need a cast here (if you do, then please fix whatever code is requiring this to cast away constness).


Just make the iteration variable BB and use getBBInfo(&BB), like you do below in the loop // Assert every BB has a valid counter..


Remove the const_cast (I think this will require making GetSuccessorNumber take const BasicBlock *, which you can do as a separate patch (no need for pre-commit review))


Do you mean instrumentOneFunc? instrumentOnFunc doesn't make sense (weird use of "on").

xur marked 14 inline comments as done.Nov 19 2015, 11:17 AM

Thanks for the suggestion. All fixed. Please refer to the newest patch for the update.

xur updated this revision to Diff 40682.Nov 19 2015, 11:18 AM

Updated with Sean's new comments.
Also includes the unit tests.



davidxl edited edge metadata.Nov 19 2015, 2:57 PM

Can you also add a script to re-generate the binary profile data needed whenever profile format changes ?

xur added a comment.Nov 19 2015, 3:11 PM

These .ll files are the complete program IR that can generate the
profiles. I can embed the commands in the comment of the .ll files. Just
to make sure: You want a separated shell script in the same director to
generate the profiles?



These test files look like they are just a dump of IR generated from a C/C++, which is extremely verbose and has a lot of inessential details. I also don't like checking in binary profdata files. Overall these tests seem extremely brittle. I also don't understand why the test files have names like "for" or "goto" or "ifelse". Those concepts don't exist in the IR. Surely the tests should have names like "criticaledge" etc.

It seems like this testing might be more readable as C++ unit tests. Using the new pass manager this should be easy to wire up. I think the hardest part would be to stub out IndexedInstrProfReader.

I thought about the size of the test case too. Initially I had the same
concern, but in second look, it does not seem to be brittle -- because it
just checks whether the key branches are properly annotated with the right
profile count or not, or profile counter update instruction is inserted or

I think adding C++ unit tests are independent tasks which can be done once
the driver part of the changes land.


xur updated this revision to Diff 41062.Nov 24 2015, 10:29 AM
xur edited edge metadata.

I changed the regression tests based on the feedbacks from Sean, David and Justin. The IR is much simplified and the profile uses the text format.
I still keep the target triplet as x86_64-unknown-linux-gnu, because this is the only platform I tested. I will try to relax it later.



Looks good with the minor fix.


Remove unused code.

This revision was automatically updated to reflect the committed changes.

Seems it behaves differently on i686. Investigating.

See also;
(It is configured as "gcc -m32")

silvas added a subscriber: silvas.Nov 24 2015, 3:16 PM

Hi Rong,

Typically, when there have been multiple active reviewers, it is common
courtesy to wait for all of them to LGTM the patch.

I had further comments on the testing here (which the bots seem to have
caught you on anyway), so I recommend reverting this patch for the moment
and reopening the review.

  • Sean Silva
xur updated this revision to Diff 41175.Nov 25 2015, 1:20 PM
xur removed rL LLVM as the repository for this revision.

Fix a few issues that being exposed by the buildbot:
(1) wrong shift operation in functionhash computing,
(2) missing type cast for vector<...>size_type (which results bad functionhash in m32 host)
(3) sorting is not stable

Also improved the test checks suggested by Sean.



I took another pass through, so some of the comments are nits. But most of the comments on the tests in particular are quite significant and not just "nits".


Why does IPO now depend on Instrumentation, but didn't previously?
What is different about the PGO passes vs the other instrumentation passes?


Naming convention.


The length of this line does not match the one below.


typo: two spaces after 'the' should be just one space.


Why are they mutually exclusive? Weren't you wanting to the the count profile in MST computation eventually?


typo: "is done"


class PGOGenFunc is no longer present, please update the comment.


These header comments easily go out of date during patch review. I would recommend reading the comment from top to bottom and verifying that it is up to date.


This option is only for testing, so please rename it to make that clear (also update the description).


Do you still need these const_cast<>'s after r253733?


You say "edges" (plural) here but then say "There should be one and only one".

7 ↗(On Diff #41175)

This would be more readable with a non-mangled name. (same for the other tests)

10 ↗(On Diff #41175)

Verify that it is attached to the instruction you expect (CHECK-SAME may be useful).

(same for the other "use" tests)

53 ↗(On Diff #41175)

Will this test fail if the call ends up in the previous or next BB?

I would recommend, for each BB, having a CHECK line verifying the BB name, followed by either CHECK or CHECK-NOT verifying the presence or absence of the increment call.

10 ↗(On Diff #41175)

Please have the names consistent with the switch values.

25 ↗(On Diff #41175)

This test case can be simplified further. At the very least, the function bar is not needed.

49 ↗(On Diff #41175)

The convention for FileCheck variable captures is all upper case (e.g. [[BW2:[0-9]+]]).
Also, can you use more semantic names instead of numbers? E.g. BW_<name> instead of BW<number>.

5 ↗(On Diff #41175)

What part of the code is this test trying to cover that is not covered by loop3 or loop1? Do we need this test?

6 ↗(On Diff #41175)

You can merge the _gen and _use files by using FileCheck's option --check-prefix (e.g. --check-prefix=GEN or --check-prefix=USE).

Actually, doing this is necessary so that there is a single point of truth for the IR used by both.

16 ↗(On Diff #41175)

These add instructions are not needed. Just make the function return void.

xur marked 17 inline comments as done.Nov 30 2015, 1:04 PM

Thanks for Sean's suggestion. They indeed make the tests cleaner and more robust. I integrated his reviews in the latest patch that I'll post soon.


thanks for catching this. This should not be there -- it is only needed for the passmangerbuilder change.

25 ↗(On Diff #41175)

bar is a file static function and was used to check the source name is part of the profile variable name.

xur updated this revision to Diff 41428.Nov 30 2015, 1:08 PM
xur marked 2 inline comments as done.

Integrated Sean's most recently review comments.

The tests are looking great! A couple nits and a final question about the test.

31 ↗(On Diff #41428)

Can you use CHECK-DAG directive to move these next to the place where the BW_* capture occurs? That would improve locality and clarify what is being checked.

(same in the other files)

4 ↗(On Diff #41428)

Tiny nit: could you name all the tests which are just checking diagnostics to match diag_*.ll? (or whatever naming convention seems appropriate)

That will help to clearly identify them.

9 ↗(On Diff #41428)

What is the importance of testing a nested for loop? What part of the code are you trying to exercise that isn't covered by loop1.ll?

davidxl added inline comments.Nov 30 2015, 8:38 PM
9 ↗(On Diff #41428)

Having loop nest in the test for better coverage is fine -- but looks like we don't actually need 3-deep nest -- a 2-deep loop nest is good enough and will be easier to read.

loop1.ll has a loop with top test. How about a loop test with bottom testing. Also a loop with more control flow inside the body, and a loop with early exit might be nice to have -- but those can be added later as follow ups if needed.

xur added a comment.Dec 1 2015, 2:47 PM

It's not clear how can I use CHECK-DAG to group together the branch weight
meta data and the related instruction. if there is a single instruction and
single meta data, I can move up the branch weight meta data next the the
instruction. But if there are multiple, I don't know how to do it. From
the document, CHECK-DAG can be used b/w two matches (or before the first
match, or after the last match). Move the branch-weight meta data and use
USE-DAG is not working.

I also tried to change all use USE to USE (except the first and last). That
won't work either.

As for the loop2, it just a more complex data flow. I'll change it to two
level nexted loop according to David's suggestion.

I used to use a buttom test loop, but I removed based on silvas's
suggestion. I'll add more tests later if necessary.

xur updated this revision to Diff 41686.Dec 2 2015, 3:37 PM

Integrated most recent review comments from Sean, David and Justin.
Let me know if I missed anything.



xur added a comment.Dec 7 2015, 10:12 AM

Sean, Justin and David: Do you have any comment on the latest patch?
If it looks fine to you, I plan to submit it again this week.