This is an archive of the discontinued LLVM Phabricator instance.

Increases full-unroll threshold.
ClosedPublic

Authored by danielcdh on Jan 5 2017, 10:55 AM.

Details

Summary

The default threshold for fully unroll is too conservative. This patch doubles the full-unroll threshold

This change will affect the following speccpu2006 benchmarks (performance numbers were collected from Intel Sandybridge):

Performance:

403 0.11%
433 0.51%
445 0.48%
447 3.50%
453 1.49%
464 0.75%

Code size:

403 0.56%
433 0.96%
445 2.16%
447 2.96%
453 0.94%
464 8.02%

The compiler time overhead is similar with code size.

Event Timeline

danielcdh updated this revision to Diff 83274.Jan 5 2017, 10:55 AM
danielcdh retitled this revision from to Give higher full-unroll boosting when the loop iteration is small..
danielcdh updated this object.
danielcdh added reviewers: mzolotukhin, davidxl, mkuper.
danielcdh added a subscriber: llvm-commits.

So 464.h264ref is getting a bit larger and faster or slower?

So 464.h264ref is getting a bit larger and faster or slower?

Yes, it's 3% larger (text size). The performance difference is in noise range (it disappears in a different run).

So 464.h264ref is getting a bit larger and faster or slower?

Yes, it's 3% larger (text size). The performance difference is in noise range (it disappears in a different run).

Is that a concern, or is it just a small benchmark?

I don't think it is a concern.

  • it is a small benchmark (1.2M text size)
  • there are a lot of countable loops in the code, thus it increased the most
  • the increased size does not change performance
  • the size will not change if optForSize() is true

I don't think it is a concern.

  • it is a small benchmark (1.2M text size)
  • there are a lot of countable loops in the code, thus it increased the most
  • the increased size does not change performance
  • the size will not change if optForSize() is true

SGTM

lib/Transforms/Scalar/LoopUnrollPass.cpp
678

fully unroll -> fully unrolling

684

Can you please put the magic numbers here into cl::opt flags so that it is easy to experiment with tuning them later?

Hi Dehao,

Does your change impact the inline decision of 464.h264ref? If so, would you please let me know the functions that are no longer inlined?

Haicheng

mzolotukhin edited edge metadata.Jan 5 2017, 3:07 PM

Hi,

I don't think that a right approach for it. We already have a lot of thresholds and I'd prefer to avoid adding another one as much as possible.

In this particular case it also seems that we can express the same 'bonus' with existing thresholds. There is unroll-max-iteration-count-to-analyze, which you might want to play with - if your loops have higher trip-counts then this thresholds, we don't try to predict benefits of unrolling and behave conservatively. Increasing this threshold should help unrolling more loops. If for some reason you don't want to change this parameter, then you can get almost the same effect as you propose by just bumping up unroll-threshold.

Please also note that we already take into account the fact that unrolling removes branches: see getUnrolledLoopSize.

Also, please also provide results of compile time testing for such changes.

Michael

Hi Dehao,

Does your change impact the inline decision of 464.h264ref? If so, would you please let me know the functions that are no longer inlined?

Haicheng

The inline decision of 464.h264ref did not change with this patch.

Hi,

I don't think that a right approach for it. We already have a lot of thresholds and I'd prefer to avoid adding another one as much as possible.

Sorry that I don't seem to quite follow the comment. We did not add a new threshold in this patch, or am I missing something? Can you help clarify?

In this particular case it also seems that we can express the same 'bonus' with existing thresholds. There is unroll-max-iteration-count-to-analyze, which you might want to play with - if your loops have higher trip-counts then this thresholds, we don't try to predict benefits of unrolling and behave conservatively. Increasing this threshold should help unrolling more loops. If for some reason you don't want to change this parameter, then you can get almost the same effect as you propose by just bumping up unroll-threshold.

Please also note that we already take into account the fact that unrolling removes branches: see getUnrolledLoopSize.

Yes and we model that in the original getFullUnrollBoostingFactor. This patch tried to model the other benefit: expanding optimization scope.

Also, please also provide results of compile time testing for such changes.

Michael

danielcdh updated this object.Jan 5 2017, 4:09 PM
danielcdh edited edge metadata.
danielcdh updated this revision to Diff 83320.Jan 5 2017, 4:12 PM
danielcdh marked an inline comment as done.

update

Hi,

I don't think that a right approach for it. We already have a lot of thresholds and I'd prefer to avoid adding another one as much as possible.

Sorry that I don't seem to quite follow the comment. We did not add a new threshold in this patch, or am I missing something? Can you help clarify?

Ahh, do you mean that we should not add a new option to tune this boosting factor? (Thanks Micheal (mkuper@) for pointing this out).

If that's the case, I've updated the patch to make it tunable with unroll-max-iteration-count-to-analyze. PTAL.

Thanks,
Dehao

In this particular case it also seems that we can express the same 'bonus' with existing thresholds. There is unroll-max-iteration-count-to-analyze, which you might want to play with - if your loops have higher trip-counts then this thresholds, we don't try to predict benefits of unrolling and behave conservatively. Increasing this threshold should help unrolling more loops. If for some reason you don't want to change this parameter, then you can get almost the same effect as you propose by just bumping up unroll-threshold.

Please also note that we already take into account the fact that unrolling removes branches: see getUnrolledLoopSize.

Yes and we model that in the original getFullUnrollBoostingFactor. This patch tried to model the other benefit: expanding optimization scope.

Also, please also provide results of compile time testing for such changes.

Michael

lib/Transforms/Scalar/LoopUnrollPass.cpp
684

Updated to make it tunable with unroll-max-iteration-count-to-analyze

Sorry that I don't seem to quite follow the comment. We did not add a new threshold in this patch, or am I missing something? Can you help clarify?

Oh, my apologies. I implicitly agreed with what @hfinkel suggested and implied that we use a cl::opt parameter instead of the magic number 400.

This patch tried to model the other benefit: expanding optimization scope.

Ok, I understand the idea. However, it looks too vague to model, and I don't see a relation with tripcounts then. If a loop has a huge tripcount, and we completely unroll it, we'll get a huge single basic block with tons of optimization opportunities, right? How is it different for a loop with a smaller tripcount? Why does it depend on the tripcount at all, and why do we model it as a 400/TripCount?

I think that we either need to be more explicit about what benefits we expect from unrolling (see e.g. what LoopUnrollAnalyzer does), or just use generic thresholds like unroll-threshold.

Michael

Sorry that I don't seem to quite follow the comment. We did not add a new threshold in this patch, or am I missing something? Can you help clarify?

Oh, my apologies. I implicitly agreed with what @hfinkel suggested and implied that we use a cl::opt parameter instead of the magic number 400.

This patch tried to model the other benefit: expanding optimization scope.

Ok, I understand the idea. However, it looks too vague to model, and I don't see a relation with tripcounts then. If a loop has a huge tripcount, and we completely unroll it, we'll get a huge single basic block with tons of optimization opportunities, right? How is it different for a loop with a smaller tripcount? Why does it depend on the tripcount at all, and why do we model it as a 400/TripCount?

I see your point. Yes, the optimization scope is not related to trip count, and it is confusing to relate one with each other.

The real motivation for this patch is to boost the threshold for fully unroll so that we can materialize the performance benefits in our benchmarks. The initial thoughts were: simply boost unroll-threshold by a minimum of 2X for fully unrolling (which is fine to materialize the performance). But I think this might be harder to be accepted by upstream as it seems too brutal-force. Then I'm thinking of integrating trip_count into the model to limit the "relative code size increase".

Anyway, I think the patch needs to be updated to make it accurate. We could either use a constant minimum boosting factor, or still use the trip_count to limit "relative code size increase" but update the comment to make it accurate.

Any suggestions?

Thanks,
Dehao

I think that we either need to be more explicit about what benefits we expect from unrolling (see e.g. what LoopUnrollAnalyzer does), or just use generic thresholds like unroll-threshold.

Michael

The real motivation for this patch is to boost the threshold for fully unroll so that we can materialize the performance benefits in our benchmarks. The initial thoughts were: simply boost unroll-threshold by a minimum of 2X for fully unrolling (which is fine to materialize the performance). But I think this might be harder to be accepted by upstream as it seems too brutal-force. Then I'm thinking of integrating trip_count into the model to limit the "relative code size increase".

I see. Just doubling the threshold indeed will be hard to upstream, and reasonably so. While it improves performance on your benchmarks, it also increases code-size/compile-time on many other tests. IMHO the ideal solution here would be to understand what differentiates loops in your benchmarks from other loops, then estimate how popular such cases are, and if they're popular enough, teach llvm to recognize them (i.e. see benefits of unrolling such loops). This way we'll only increase code-size/compile time in cases where we gain performance, and lose nothing in other cases.

Anyway, I think the patch needs to be updated to make it accurate. We could either use a constant minimum boosting factor, or still use the trip_count to limit "relative code size increase" but update the comment to make it accurate.

Any suggestions?

The real motivation for this patch is to boost the threshold for fully unroll so that we can materialize the performance benefits in our benchmarks. The initial thoughts were: simply boost unroll-threshold by a minimum of 2X for fully unrolling (which is fine to materialize the performance). But I think this might be harder to be accepted by upstream as it seems too brutal-force. Then I'm thinking of integrating trip_count into the model to limit the "relative code size increase".

I see. Just doubling the threshold indeed will be hard to upstream, and reasonably so. While it improves performance on your benchmarks, it also increases code-size/compile-time on many other tests. IMHO the ideal solution here would be to understand what differentiates loops in your benchmarks from other loops, then estimate how popular such cases are, and if they're popular enough, teach llvm to recognize them (i.e. see benefits of unrolling such loops). This way we'll only increase code-size/compile time in cases where we gain performance, and lose nothing in other cases.

I compared the profile between our internal benchmark and 464.h264ref, the only difference is that the fully unrolled loop showed high up in the profile of our benchmark, while the fully unrolled loop is cold in 464.h264ref, thus it has no performance impact.

We could use profile info to allow more aggressive threshold only for hot loops, so that code size increase can be avoided. But this requires BFI within the loop pass, which will be expensive (compile time overhead).

OTOH, if the heuristic is generally helping performance, I guess it would be worth to tradeoff 3% code size/compile time with potential better performance?

Thanks,
Dehao

Anyway, I think the patch needs to be updated to make it accurate. We could either use a constant minimum boosting factor, or still use the trip_count to limit "relative code size increase" but update the comment to make it accurate.

Any suggestions?

I compared the profile between our internal benchmark and 464.h264ref, the only difference is that the fully unrolled loop showed high up in the profile of our benchmark, while the fully unrolled loop is cold in 464.h264ref, thus it has no performance impact.

Could you tell what exactly happens to the loop after unrolling? Do we get the performance improvement from just removing branches, or does unrolling enable later optimizations (if so, which ones)?

We could use profile info to allow more aggressive threshold only for hot loops, so that code size increase can be avoided. But this requires BFI within the loop pass, which will be expensive (compile time overhead).

Using profile info in loop-unrolling is definitely worthwhile. Most of the loops we unroll are actually in the cold parts, so if we can avoid unrolling them, we can save some budget for more aggressive unrolling in hot regions (or just get smaller code and faster compilation).

OTOH, if the heuristic is generally helping performance...

What heuristic are you referring to here?

Michael

The real motivation for this patch is to boost the threshold for fully unroll so that we can materialize the performance benefits in our benchmarks. The initial thoughts were: simply boost unroll-threshold by a minimum of 2X for fully unrolling (which is fine to materialize the performance). But I think this might be harder to be accepted by upstream as it seems too brutal-force. Then I'm thinking of integrating trip_count into the model to limit the "relative code size increase".

I see. Just doubling the threshold indeed will be hard to upstream,...

Alternatively, maybe we can make the cost model more accurate. I observe the cost model used by the unroller overestimate the cost of free (S/Z)EXT and unconditional branches.

I compared the profile between our internal benchmark and 464.h264ref, the only difference is that the fully unrolled loop showed high up in the profile of our benchmark, while the fully unrolled loop is cold in 464.h264ref, thus it has no performance impact.

Could you tell what exactly happens to the loop after unrolling? Do we get the performance improvement from just removing branches, or does unrolling enable later optimizations (if so, which ones)?

It's from reduced branch as well as loop preparation code (dynamic instruction reduced from 179 to 167, which has already been captured by the unroll size analysis (boosting = rolled_cost/unroll_cost = 179/167). However, for that specific case, we need a threshold of ~200 to make the fully unroll happen.

We could use profile info to allow more aggressive threshold only for hot loops, so that code size increase can be avoided. But this requires BFI within the loop pass, which will be expensive (compile time overhead).

Using profile info in loop-unrolling is definitely worthwhile. Most of the loops we unroll are actually in the cold parts, so if we can avoid unrolling them, we can save some budget for more aggressive unrolling in hot regions (or just get smaller code and faster compilation).

I agree profile can help get a good balance here, but O2 build cannot benefit from it.

OTOH, if the heuristic is generally helping performance...

What heuristic are you referring to here?

Sorry, I meant the profile I proposed in this patch.

Thanks,
Dehao

Michael

The real motivation for this patch is to boost the threshold for fully unroll so that we can materialize the performance benefits in our benchmarks. The initial thoughts were: simply boost unroll-threshold by a minimum of 2X for fully unrolling (which is fine to materialize the performance). But I think this might be harder to be accepted by upstream as it seems too brutal-force. Then I'm thinking of integrating trip_count into the model to limit the "relative code size increase".

I see. Just doubling the threshold indeed will be hard to upstream,...

Alternatively, maybe we can make the cost model more accurate. I observe the cost model used by the unroller overestimate the cost of free (S/Z)EXT and unconditional branches.

Agree that we need more accurate model, but the problem is that even the model is 100% accurate, linear-boosting factor cannot help boost threshold big enough for our case.

Thanks,
Dehao

Agree that we need more accurate model, but the problem is that even the model is 100% accurate, linear-boosting factor cannot help boost threshold big enough for our case.

One problem with current cost model is that it's used for estimating both code size and runtime performance. It might be worth checking if we can gain anything from separating these two metrics more clearly - I think it was discussed in the past, but no decision has been made.

I agree profile can help get a good balance here, but O2 build cannot benefit from it.

There are always cases where we generate sub-optimal code. For users striving for the outmost performance we provide higher optimization levels (+LTO, +PGO) and pragmas. We cannot just bump thresholds for every case we want to unroll/inline/whatever.

Sorry, I meant the profile I proposed in this patch.

Adding Constant/TripCount looks like simply bumping the threshold to me, except it also adds complexity to the code, so I'm not convinced we want this.

Michael

Agree that we need more accurate model, but the problem is that even the model is 100% accurate, linear-boosting factor cannot help boost threshold big enough for our case.

One problem with current cost model is that it's used for estimating both code size and runtime performance. It might be worth checking if we can gain anything from separating these two metrics more clearly - I think it was discussed in the past, but no decision has been made.

The relationship between code size and runtime performance is different between different unroller.

In the dynamic unroll and partial unroll, performance will initially increase as code size increases (because dynamic branch is reduced), but when it reaches a threshold, the performance will start to degrade when code size increase (due to i-cache miss increase and loop body no long fit into LSD, etc). So a fixed threshold is usually helpful to find the performance sweet-spot.

In the fully unroll, if the loop can be fully unrolled, it will not likely to trigger LSD (not enough trip count), nor will it affect the icache-miss (fully unrolled loop is streight-line code, no temporal locality, even if it's embedded in an outer-loop, the backedge of the outer loop should be easy to predict right). So if we assume all backend optimizations is sane (e.g. SLP performs as well as loop vectorizer, RA is doing good job in large BB, etc). As a result, larger code size should always lead to better performance for fully unroll. So a threshold here is purely limiting the size of the text.

If my above analysis is reasonable, then I think probably two types of unroller should not share the same threshold? And fully unroller may better have a larger threshold?

I agree profile can help get a good balance here, but O2 build cannot benefit from it.

There are always cases where we generate sub-optimal code. For users striving for the outmost performance we provide higher optimization levels (+LTO, +PGO) and pragmas. We cannot just bump thresholds for every case we want to unroll/inline/whatever.

Sounds reasonable. How about we bump the threshold in O3, so that people who do not have profiler can still choose to fully unroll more aggressively?

Thanks,
Dehao

Sorry, I meant the profile I proposed in this patch.

Adding Constant/TripCount looks like simply bumping the threshold to me, except it also adds complexity to the code, so I'm not convinced we want this.

Michael

In the fully unroll, if the loop can be fully unrolled, it will not likely to trigger LSD (not enough trip count), nor will it affect the icache-miss (fully unrolled loop is streight-line code, no temporal locality, even if it's embedded in an outer-loop, the backedge of the outer loop should be easy to predict right). So if we assume all backend optimizations is sane (e.g. SLP performs as well as loop vectorizer, RA is doing good job in large BB, etc). As a result, larger code size should always lead to better performance for fully unroll. So a threshold here is purely limiting the size of the text.

This is not exactly true in practice. If we just bump up the threshold, we'll see both performance improvements and regressions.

I think probably two types of unroller should not share the same threshold?

This makes sense. However, I prefer not to bloat our army of thresholds without a guaranteed benefit.

How about we bump the threshold in O3, so that people who do not have profiler can still choose to fully unroll more aggressively?

For the change like this please submit a separate patch and include as much testing data as you can (including but not limited to SPEC, LLVM-testsuite, etc.). Please include runtime performance, compile time, and binary sizes.

Thanks,
Michael

In the fully unroll, if the loop can be fully unrolled, it will not likely to trigger LSD (not enough trip count), nor will it affect the icache-miss (fully unrolled loop is streight-line code, no temporal locality, even if it's embedded in an outer-loop, the backedge of the outer loop should be easy to predict right). So if we assume all backend optimizations is sane (e.g. SLP performs as well as loop vectorizer, RA is doing good job in large BB, etc). As a result, larger code size should always lead to better performance for fully unroll. So a threshold here is purely limiting the size of the text.

This is not exactly true in practice. If we just bump up the threshold, we'll see both performance improvements and regressions.

From our limited experiments, bumping up the fully unroll threshold by 2X only improves performance for both speccpu and our internal benchmarks. If we boost it by 10X, we do see perf regression on some coding/decoding benchmarks. We root-caused the problem to be SLP cannot vectorize fully-unrolled code while loop vectorizer can. @mkuper is working on SLP to solve it. Other than that, it appears even boosting the threshold to 10X is a pure win for performance.

Could you point us to the benchmarks you observed regression after boosting fully unroll threshold? We would be happy to take a look and learn why performance get worse and possibly improve it. Thanks!

I think probably two types of unroller should not share the same threshold?

This makes sense. However, I prefer not to bloat our army of thresholds without a guaranteed benefit.

What do you mean by "guaranteed benefit"?

If it means "positive speedup with no code size/compile time increase", this seems impossible as any threshold boost will lead to code size boost.

If it means "positive speedup" only, it seems to be already satisfied.

How about we bump the threshold in O3, so that people who do not have profiler can still choose to fully unroll more aggressively?

For the change like this please submit a separate patch and include as much testing data as you can (including but not limited to SPEC, LLVM-testsuite, etc.). Please include runtime performance, compile time, and binary sizes.

I'll send out a new patch for this is we decided to put this in O3. During the mean time, I collected more performance data:

  • update the data to remove the trip count logic and merely boost the fully unroll tripcount by 2X
benchmarkcode sizecompile timeperformance
447.dealII0.52%-0.24%-0.94%
453.povray0.45%-0.65%3.00%
433.milc0.20%2.01%0.47%
445.gobmk0.32%-1.12%0.32%
403.gcc0.05%0.58%0.25%
464.h264ref4.04%4.62%0.28%
  • build llvm testsuite with and without the change, it only affects the following 3 binaries. No noticeable compile time/run time has been observed.
binarycode size change
CMakeFiles/CheckTypeSize/CMAKE_SIZEOF_UNSIGNED_SHORT.bin0.1%
CMakeFiles/feature_tests.bin0%
CMakeFiles/TestEndianess.bin0.1%

Thanks,
Dehao

Thanks,
Michael

Could you point us to the benchmarks you observed regression after boosting fully unroll threshold?

I ran the standard LLVM testsuite in the past and I think I observed several runtime regressions. However my memory might play tricks on me, so if you've just remeasured it, you can ignore this.

What do you mean by "guaranteed benefit"?

I meant that while some compiletime/runtime tradeoff might be acceptable, we definitely need to be aware of it before we land such changes, and we do have to have numbers at hand for that. Ideally, that would be pure win (runtime performance improves, compile time and code size is the same), but yeah, unfortunately it's unfeasible.

update the data to remove the trip count logic and merely boost the fully unroll tripcount by 2X

What option do you mean by fully unroll tripcount threshold? -unroll-threshold?

danielcdh updated this revision to Diff 83895.Jan 10 2017, 4:46 PM

use a separate FullThreshold for fully unroller.

Could you point us to the benchmarks you observed regression after boosting fully unroll threshold?

I ran the standard LLVM testsuite in the past and I think I observed several runtime regressions. However my memory might play tricks on me, so if you've just remeasured it, you can ignore this.

What do you mean by "guaranteed benefit"?

I meant that while some compiletime/runtime tradeoff might be acceptable, we definitely need to be aware of it before we land such changes, and we do have to have numbers at hand for that. Ideally, that would be pure win (runtime performance improves, compile time and code size is the same), but yeah, unfortunately it's unfeasible.

Definitely, the result I have so far includes speccpu and llvm testsuite. We also have internal benchmarks with all positive performance impact and < 0.1% mean size increase, unfortunately we cannot show them here. Let me know if there's any other benchmarks you would like me to test for perf/code_size impacts.

update the data to remove the trip count logic and merely boost the fully unroll tripcount by 2X

What option do you mean by fully unroll tripcount threshold? -unroll-threshold?

I should have upload the updated patch to make it clear, sorry about that. The above numbers are collected with the updated patch.

Thanks,
Dehao

ping...

Thanks,
Dehao

It looks like this makes UnrollingPreferences::Threshold essentially unused? We already have a separate threshold for runtime/partial unrolling; it just isn't exposed as a command-line option.

It looks like this makes UnrollingPreferences::Threshold essentially unused? We already have a separate threshold for runtime/partial unrolling; it just isn't exposed as a command-line option.

It's still used here:

// Check for explicit Count.
// 1st priority is unroll count set by "unroll-count" option.
bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;
if (UserUnrollCount) {
  UP.Count = UnrollCount;
  UP.AllowExpensiveTripCount = true;
  UP.Force = true;
  if (UP.AllowRemainder && getUnrolledLoopSize(LoopSize, UP) < UP.Threshold)
    return true;
}

You are right, partial unroll and runtime unroll uses UP.PartialThreshold which was set the same as UP.Threshold. Any recommendations on how to make this less confusing?

My suggestion:

  1. Rename -unroll-threshold to -unroll-full-threshold.
  2. Rename UP.Threshold to UP.FullThreshold.
  3. Add an option -unroll-partial-threshold, and use it to initialize UP.PartialThreshold instead of unroll-full-threshold.
  4. Make sure all the uses of UP.PartialThreshold and UP.FullThreshold make sense.
  5. Separate out the change to modify the default full-unroll threshold into a different patch.

I agree with the plan proposed by Eli.

One thing I'm not 100% certain is "Rename -unroll-threshold to -unroll-full-threshold": I think we can keep '-unroll-threshold' for full unroll threshold and add '-unroll-partial-threshold' for the partial unrolling case. It might be better because all current uses of '-unroll-threshold' option will remain correct. But I don't feel strong about it, so whatever name you choose is fine with me.

Michael

I picked to reuse Threshold for brevity. Let me know if anyone objects.

https://reviews.llvm.org/D28831 sent out to separate -unroll-partial-threshold.

Thanks,
Dehao

danielcdh updated this revision to Diff 84774.Jan 17 2017, 4:23 PM

Rebase to only adjust the unroll threshold.

ping... (the patch now became much simpler).

Thanks,
Dehao

Using the higher threshold makes sense at -O3. Not so sure about -O2... maybe ask on llvmdev?

The new threshold looks reasonable to me. Please update title and summary?

danielcdh retitled this revision from Give higher full-unroll boosting when the loop iteration is small. to Increases full-unroll threshold..Feb 10 2017, 3:52 PM
danielcdh edited the summary of this revision. (Show Details)
hfinkel accepted this revision.Feb 10 2017, 5:08 PM

LGTM

This revision is now accepted and ready to land.Feb 10 2017, 5:08 PM
danielcdh edited the summary of this revision. (Show Details)Feb 13 2017, 8:28 AM
danielcdh updated this revision to Diff 88951.Feb 17 2017, 1:52 PM

Move the threshold update to O3

chandlerc accepted this revision.Feb 17 2017, 6:16 PM

LGTM, thanks for all the analysis!

lib/Transforms/Scalar/LoopUnrollPass.cpp
133–134

clang-format here?

danielcdh marked an inline comment as done.Feb 17 2017, 7:52 PM

Thanks for the reviews.

danielcdh updated this revision to Diff 89008.Feb 17 2017, 7:58 PM

clang-format and rebase

danielcdh closed this revision.Feb 17 2017, 7:58 PM