Page MenuHomePhabricator

[Attributor][MustExec] Deduce dereferenceable and nonnull attribute using MustBeExecutedContextExplorer
ClosedPublic

Authored by uenoku on Jul 29 2019, 9:43 AM.

Details

Summary

In D65186 and related patches, MustBeExecutedContextExplorer is introduced. This enables us to traverse instructions guaranteed to execute from function entry. If we can know the argument is used as dereferenceable or nonnull in these instructions, we can mark dereferenceable or nonnull in the argument definition:

  1. Memory instruction (similar to D64258)

Trace memory instruction pointer operand. Currently, only inbounds GEPs are traced.

define i64* @f(i64* %a) {
entry:
  %add.ptr = getelementptr inbounds i64, i64* %a, i64 1
; (because of inbounds GEP we can know that %a is at least dereferenceable(16))
  store i64 1, i64* %add.ptr, align 8
  ret i64* %add.ptr ; dereferenceable 8 (because above instruction stores into it)
}
  1. Propagation from callsite (similar to D27855)

If deref or nonnull are known in call site parameter attributes we can also say that argument also that attribute.

declare void @use3(i8* %x, i8* %y, i8* %z);
declare void @use3nonnull(i8* nonnull %x, i8* nonnull %y, i8* nonnull %z);

define void @parent1(i8* %a, i8* %b, i8* %c) {
  call void @use3nonnull(i8* %b, i8* %c, i8* %a) 
; Above instruction is always executed so we can say that@parent1(i8* nonnnull %a, i8* nonnull %b, i8* nonnull %c)
  call void @use3(i8* %c, i8* %a, i8* %b)
  ret void
}

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
xbolva00 added a comment.EditedAug 15 2019, 4:04 AM

Hello folks,

Yesterday I came up with idea to annonate allocation functions with deref_or_null.

malloc(40) -> annotate return value of malloc with deref_or_null(40) -> Good, I will work on this idea.

Now I present some ideas for Attributor (maybe following cases already work.. I didnt check them).

Use case 1:
p = malloc(40) deref_or_null(40)
p[1] = 100;
here, we can change deref_or_null(40) to deref(40) since if p was null, it is UB.

Use case 2:
p = malloc(40) // deref_or_null(40)
if (p) {

in this block p is now  deref(40)! cool.

}

Use case 3
p = malloc(40) // deref_or_null(40)
if (someotherbool) {

p[1] = 100; // here, we can change  deref_or_null(40) to  deref(40) since if p was null, it is UB.

}

What do you think? Can attributor do it? I think use case 1 is doable, 2 and 3 a bit harder ones.
Maybe @reames is interested too in this area? Please comment this idea :)

I think precise deref info can enable really neat things, see https://bugs.llvm.org/show_bug.cgi?id=43003.

Hello folks,

Yesterday I came up with idea to annonate allocation functions with deref_or_null.

malloc(40) -> annotate return value of malloc with deref_or_null(40) -> Good, I will work on this idea.

Now I present some ideas for Attributor (maybe following cases already work.. I didnt check them).

Use case 1:
p = malloc(40) deref_or_null(40)
p[1] = 100;
here, we can change deref_or_null(40) to deref(40) since if p was null, it is UB.

This should work with these patches, yes. (It will be deref(40) if nullptr is not a valid pointer else it is deref(2).)

Use case 2:
p = malloc(40) // deref_or_null(40)
if (p) {

in this block p is now  deref(40)! cool.

}

Yes, this will work at some point soon, it might already if the nonnull query is already using the "context instruction". However, there might not be a way to annotate the information. Passes will soon (after the series that ends in D66276) be able to build a Attributor and query a single property, e.g., the dereferenceability property of a pointer inside some control. We might also think about better dereferenceability annotations, maybe an intrinsic. This also ties in with the new deref and deref_globally (D61652).

Use case 3
p = malloc(40) // deref_or_null(40)
if (someotherbool) {

p[1] = 100; // here, we can change  deref_or_null(40) to  deref(40) since if p was null, it is UB.

}

Similar to above.

What do you think? Can attributor do it? I think use case 1 is doable, 2 and 3 a bit harder ones.

All are well in the scope of what we (will soon) have.

Maybe @reames is interested too in this area? Please comment this idea :)

I think precise deref info can enable really neat things, see https://bugs.llvm.org/show_bug.cgi?id=43003.

Once these patches are in we need to switch to deref and deref_globally (D61652). We should also consider using deref_globally (or sth similar) to indicate that all existing accesses will be dereferenceable, e.g.,
if we have an alloc or a malloc for which we can show it returns sth dereferenceable and the pointer is not used after the free.

Good to know, thanks for info!

Hi,
my another question :)
clang -std=c99 -Ofast -DNDEBUG -S -emit-llvm -mllvm -enable-nonnull-arg-prop -mllvm -attributor-disable=false code.c

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(%struct.tHashTableItem* nocapture readonly %symbol) local_unnamed_addr #2 {
entry:

%call = tail call noalias i8* @malloc(i64 64) #8
%0 = bitcast i8* %call to %struct.tHashTableItem*
%1 = bitcast %struct.tHashTableItem* %symbol to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(64) %call, i8* nonnull align 8 dereferenceable(64) %1, i64 64, i1 false)
%cmp = icmp eq i8* %call, null

Curently LLVM fails to propagate attributes to arguments thru bitcasts, it seems. With this patch help with this case?

xbolva00 added a comment.EditedAug 20 2019, 3:08 PM

Uhm.

Even simple case with "-mllvm -enable-nonnull-arg-prop" does not work.. strange. Hopefully with Attributor it would work.

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(i8* nocapture readonly %s) local_unnamed_addr #0 {
entry:

%call = tail call noalias i8* @calloc(i64 1, i64 64) #8
%call.i = tail call noalias i8* @calloc(i64 1, i64 64) #8
%Index.i = getelementptr inbounds i8, i8* %call.i, i64 24
%0 = bitcast i8* %Index.i to i32*
store i32 -1, i32* %0, align 8, !tbaa !14
%bcmp = tail call i32 @bcmp(i8* nonnull dereferenceable(7) %call.i, i8* nonnull dereferenceable(7) %s, i64 7)
uenoku updated this revision to Diff 216455.EditedAug 21 2019, 1:19 PM

Address some comments and change to structured deduction. I'll add new test mentioned in the comment later.

Hi,
my another question :)
clang -std=c99 -Ofast -DNDEBUG -S -emit-llvm -mllvm -enable-nonnull-arg-prop -mllvm -attributor-disable=false code.c

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(%struct.tHashTableItem* nocapture readonly %symbol) local_unnamed_addr #2 {
entry:

%call = tail call noalias i8* @malloc(i64 64) #8
%0 = bitcast i8* %call to %struct.tHashTableItem*
%1 = bitcast %struct.tHashTableItem* %symbol to i8*
tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 8 dereferenceable(64) %call, i8* nonnull align 8 dereferenceable(64) %1, i64 64, i1 false)
%cmp = icmp eq i8* %call, null

Curently LLVM fails to propagate attributes to arguments thru bitcasts, it seems. With this patch help with this case?

Uhm.

Even simple case with "-mllvm -enable-nonnull-arg-prop" does not work.. strange. Hopefully with Attributor it would work.

define dso_local noalias %struct.tHashTableItem* @symbolNewCopy(i8* nocapture readonly %s) local_unnamed_addr #0 {
entry:

%call = tail call noalias i8* @calloc(i64 1, i64 64) #8
%call.i = tail call noalias i8* @calloc(i64 1, i64 64) #8
%Index.i = getelementptr inbounds i8, i8* %call.i, i64 24
%0 = bitcast i8* %Index.i to i32*
store i32 -1, i32* %0, align 8, !tbaa !14
%bcmp = tail call i32 @bcmp(i8* nonnull dereferenceable(7) %call.i, i8* nonnull dereferenceable(7) %s, i64 7)

These cases are not so complex that I think this patch can handle.

uenoku marked 3 inline comments as done.Aug 21 2019, 1:35 PM

Happy to hear :)

uenoku updated this revision to Diff 216552.Aug 21 2019, 11:28 PM

Fix dererefenceable

uenoku updated this revision to Diff 216614.Aug 22 2019, 7:22 AM

Refactor

uenoku marked an inline comment as done.Aug 22 2019, 7:42 AM
uenoku added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
1683

In the current patch, context is traversed only in initialize and collect *interesting* AA. In updateImpl, these AAs will be merged into the state.

uenoku updated this revision to Diff 216645.Aug 22 2019, 9:34 AM

Fix broken patch.

jdoerfert added inline comments.Aug 22 2019, 2:45 PM
llvm/lib/Transforms/IPO/Attributor.cpp
750

I have the feeling the logic you add here works not only for arguments but other positions too, right? If so, we should make it a helper class that can be reused. In the generic case you can use getIRPosition().getCtxI() for the program point.

I'm also still unsure if we should not somehow "filter" the explored instructions based on the uses of the associated value. Maybe we could do the following:

  • In accumulatePredOnMustBeExecutedContext you take another method that yields "interesting" instructions we would like to visit if they are in the must-be-executed-context (MBECtx).
  • The function for deref would look at the uses of the associated value and ask for the users. If the user was found, it can transitively ask for its user (if it wants to). If the user was not found, we do not go further (would require path exploration).
  • If the asked for user is not in the visited set of the MBECtxIterator, we visit more instructions until we explored the context or we found the instruction we are looking for.

Does this make sense?

uenoku updated this revision to Diff 217078.Aug 25 2019, 11:07 PM

Apply to call site argument or floating value also.

I still think we should try to look only at (transitive) uses of the associated value instead of visiting all instructions and figuring out for each if it is interesting. You know what I mean?

llvm/include/llvm/Analysis/ValueTracking.h
259 ↗(On Diff #217078)

You can commit this separatly. It looks good to me.

llvm/lib/Transforms/IPO/Attributor.cpp
707

Make this just StateType

758

Why can't we do this in the initialize?

uenoku marked an inline comment as done.Aug 26 2019, 9:23 PM

I still think we should try to look only at (transitive) uses of the associated value instead of visiting all instructions and figuring out for each if it is interesting. You know what I mean?

Tracking transitive uses is also good to me but I'm not sure there is an API to (efficiently) determine whether use of the associated value belongs to the context. Isn't it same to iterator over the instructions in the context?

llvm/lib/Transforms/IPO/Attributor.cpp
758

In the registerAA, AA on-demand created may be overwritten. I'm not sure whether it is intended or bug. Then, I wanted to avoid to store the address of AAs in initialize.

I still think we should try to look only at (transitive) uses of the associated value instead of visiting all instructions and figuring out for each if it is interesting. You know what I mean?

Tracking transitive uses is also good to me but I'm not sure there is an API to (efficiently) determine whether use of the associated value belongs to the context. Isn't it same to iterator over the instructions in the context?

Given a use U, check if the user is an instruction dyn_cast<Instruction>(U.getUser()) and ask the iterator if it is contained (It.count(...)). If the result is false we can try to explore further but let's start by exploring the context to the fullest and then doing the use & user stuff

Is this blocked on anything right now? I would really like this capability :)

uenoku added a comment.EditedSep 20 2019, 7:09 AM

I got it. I'll rebase it within a week :)

uenoku updated this revision to Diff 222008.Sep 26 2019, 12:28 PM

Rebase and change the way to track (transitive) uses. Some test changes are missing.

I like this version a lot better. Selectively looking into the context is the right choice (I think). I added some comments, when the tests are updated I'll give it a last look over but this looks almost ready to me.

llvm/lib/Transforms/IPO/Attributor.cpp
713

Please describe the type of followUse in more detail here. (what arguments, what do they mean, that the update logic should go in there, what not to do, etc.)

750

I would have though we need to advance the explorer iterator somewhere?

2486

This code looks like the other followUse. Can we have a helper like getKnownNonNullAndDerefBytesForUse which we call from both? (Maybe that also is reusable/combinable with the existing logic we have)

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
189

I think we lack a return value only test, see below.

define i32* @f7_3() {
; ATTRIBUTOR: define nonnull dereferenceable(4) i32* @f7_3()
  %ptr = tail call i32* @unkown_ptr()
  store i32 10, i32* %ptr, align 16
  ret i32* %ptr
}
uenoku updated this revision to Diff 222354.Sep 30 2019, 12:00 AM

Address comment

uenoku updated this revision to Diff 222355.Sep 30 2019, 12:01 AM

Minor update.

Hopefully last round of comments.

llvm/lib/Transforms/IPO/Attributor.cpp
751

This also only works for the first time the explorer iterator is created, the contains call before was the right way (as long as the explorer works the way it does now).

You can do sth like:

for (const Use *U : Uses) {
  const Instruction *UserI = cast<Instruction>(U->getUser());
  auto EIt = Explorer.begin(CtxI), EEnd = Explorer.end(CtxI);
  bool Found = EIt.contains(UserI);
  while (!Found && ++EIt != EEnd)
     Found = EIt.getCurrentInst() == UserI;
  if (Found && Base::followUse(A, U, UserI))
    for (const Use &Us : UserI->uses())
              Uses.insert(&Us);
}
1553

This is only true, I think, if null is not a valid pointer.

1566

Reading this line it was at first not clear to me that I has to be an access for this to return a base pointer.

1607

Do we want this change or was it just for debugging? Shouldn't it already say "nonnull [fix]" if it is known?

1635

Change is not used. You could check if it is fixed/known after this update and not do the stuff below in that case.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
197

Is this derived or not? There is the same string after the FIXME and ATTRIBUTOR, right?

uenoku updated this revision to Diff 222626.Oct 1 2019, 8:16 AM
uenoku marked 8 inline comments as done.

Address comments.

llvm/lib/Transforms/IPO/Attributor.cpp
1566

What does it mean?

1607

It was just for debugging.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
197

This is derived.

Nice, +1

Is this the last remaining work before you turn the attributor on?

jdoerfert added a comment.EditedOct 1 2019, 9:22 AM

One minor and one not so minor comment inlined. That is the last problem I found and once fixed this is fine :)

Nice, +1

Is this the last remaining work before you turn the attributor on?

No, tuning is still needed, right now we focused on applicability and aggressive optimizations. I fixed all bugs I found in the TS and SPEC2006 the other day but I'll have to do that again with more benchmarks soon. This patch is important to fix various issues, one of which is the dereferenceability problem we have (D61652). I'll work on tuning and measuring the Attributor over the next two weeks, present results at the Dev-Meeting, and hopefully I'm able to propose to turn it on by then.

llvm/lib/Transforms/IPO/Attributor.cpp
1566

That means we might want to add a comment here or rename getBasePointerOfPointerOperand to something like getBasePointerOfAccessPointerOperand.

1571

Don't we have to check if offset is negative? Do we have a test?

something like A[-2] = 0; should not cause deref bytes. E.g., we might need to use 64 bit and signed version of deref bytes. but non-null can be set as soon as we know there is an access and null is not a valid pointer (probably need tests for these as well).

I thought we have a helper somewhere to deal with this offset and access size logic already?

uenoku updated this revision to Diff 223189.Oct 4 2019, 5:03 AM

Rename method name.

uenoku updated this revision to Diff 223206.Oct 4 2019, 6:36 AM

Minor update.

uenoku updated this revision to Diff 223223.Oct 4 2019, 7:54 AM

Add test for minus index.

uenoku marked an inline comment as done.Oct 4 2019, 9:31 AM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

jdoerfert accepted this revision.Oct 4 2019, 1:52 PM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I inlined one comment, if you agree with that one and the proposed fix, the rest looks good to me. If not, let me know.

llvm/lib/Transforms/IPO/Attributor.cpp
1575

I'm still unsure about this logic, correct me if I'm wrong but we have:

`Base = Offset + I.getPointer()`

and we know due to the access I that there are D dereferenceable bytes with

`D = DL.getTypeStoreSize(getPointerOperand(I)->getType()->getPointerElementType());`

Now, deref from Base should be:

`max(0, D + Offset)`

which is the same we have AADereferenceableFloating::updateImpl, but with an offset in the other direction, I think.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
199

I don't think we can deduce much about %p but only about the return, so I would expect

`FIXME: This should be define nonnull dereferenceable(8) i32* @test_for_minus_index(i32* nonnull %p)`

or an explanation why %p is deref.

This revision is now accepted and ready to land.Oct 4 2019, 1:52 PM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I mean, running two different deduction scheme(known, assumption) might cause an unpredictable result.

define i32* @test_for_minus_index(i32* %p) {
  %q = getelementptr inbounds i32, i32* %p, i32 -2
  store i32 1, i32* %q
  ret i32* %q
}

AANonNullArgument is composed of AAArgumentFromCallSiteArguments, AAFromMustBeExecutedContext.
Assume that gep is tracked in followUse.

Iteration 1 :

  • AAFromMustBeExecutedContext will traverse uses of %p and prepare uses of %q for next iteration.
  • AAArgumentFromCallSiteArguments will call indicatePessimisticFixpoint because the function is not internal function.

AANonNullArgument has already reached to pessimistic fixpoint so nonnull won't be deduced.

This example is so simple that we can debug them but it is hard to debug more complex ones.

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I mean, running two different deduction scheme(known, assumption) might cause an unpredictable result.

define i32* @test_for_minus_index(i32* %p) {
  %q = getelementptr inbounds i32, i32* %p, i32 -2
  store i32 1, i32* %q
  ret i32* %q
}

AANonNullArgument is composed of AAArgumentFromCallSiteArguments, AAFromMustBeExecutedContext.
Assume that gep is tracked in followUse.

Iteration 1 :

  • AAFromMustBeExecutedContext will traverse uses of %p and prepare uses of %q for next iteration.
  • AAArgumentFromCallSiteArguments will call indicatePessimisticFixpoint because the function is not internal function. AANonNullArgument has already reached to pessimistic fixpoint so nonnull won't be deduced.

    This example is so simple that we can debug them but it is hard to debug more complex ones.

I see. You can explore uses exhaustively though that is a "local" solution to a general problem.
I think we need to keep known & assumed together but we should provide a way for AAs that have multiple deduction strategies to exhaust them seperatly, e.g, AAArgumentFromCallSiteArguments is known to have 2 schemes so we should track their "fixpoints" separate somehow.
In addition, or as an alternative, we could allow updates for AAs in a fixpoint if they opt-in to it. They would do so if they can improve based on known-information around them.

uenoku added a comment.Oct 7 2019, 1:12 AM

Currently, any use is not tracked so nothing about A[1] or A[-2] is deduced.

As long as we have a test that is fine.

This would be solved once making it track gep instruction. But beforehand, I strongly suggest separating deduction for known/assumption respectively.

What do you mean by the separation part?

I mean, running two different deduction scheme(known, assumption) might cause an unpredictable result.

define i32* @test_for_minus_index(i32* %p) {
  %q = getelementptr inbounds i32, i32* %p, i32 -2
  store i32 1, i32* %q
  ret i32* %q
}

AANonNullArgument is composed of AAArgumentFromCallSiteArguments, AAFromMustBeExecutedContext.
Assume that gep is tracked in followUse.

Iteration 1 :

  • AAFromMustBeExecutedContext will traverse uses of %p and prepare uses of %q for next iteration.
  • AAArgumentFromCallSiteArguments will call indicatePessimisticFixpoint because the function is not internal function. AANonNullArgument has already reached to pessimistic fixpoint so nonnull won't be deduced.

    This example is so simple that we can debug them but it is hard to debug more complex ones.

I see. You can explore uses exhaustively though that is a "local" solution to a general problem.
I think we need to keep known & assumed together but we should provide a way for AAs that have multiple deduction strategies to exhaust them seperatly, e.g, AAArgumentFromCallSiteArguments is known to have 2 schemes so we should track their "fixpoints" separate somehow.
In addition, or as an alternative, we could allow updates for AAs in a fixpoint if they opt-in to it. They would do so if they can improve based on known-information around them.

Looks good. Anyway, the current code maintains soundness ( "general" solution might not be reached) so I'll commit it if there is no problem.

uenoku updated this revision to Diff 223854.Oct 8 2019, 7:30 AM

Rebase.

uenoku updated this revision to Diff 223867.Oct 8 2019, 7:57 AM
uenoku marked 2 inline comments as done.

Address the last comments.

llvm/lib/Transforms/IPO/Attributor.cpp
1575

I agree.

llvm/test/Transforms/FunctionAttrs/dereferenceable.ll
199

It is my mistake. Fixed.

This revision was automatically updated to reflect the committed changes.

dereferenceable attribute is not added to the arguments?

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

I would expect

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly dereferenceable(16) %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly dereferenceable(16) %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

https://godbolt.org/z/if9rle

uenoku added a comment.EditedOct 9 2019, 8:59 AM

dereferenceable attribute is not added to the arguments?

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

I would expect

define dso_local void @_Z3fooPaS_S_ii(i8* noalias nocapture writeonly dereferenceable(16) %0, i8* noalias nocapture readnone %1, i8* noalias nocapture readonly dereferenceable(16) %2, i32 %3, i32 %4) local_unnamed_addr #0 {

tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture nonnull writeonly align 1 dereferenceable(16) %0, i8* noalias nocapture nonnull readonly align 1 dereferenceable(16) %2, i64 16, i1 false) #2
ret void

}

https://godbolt.org/z/if9rle

Actually, dereferenceable is added to the arguments if opt is executed for the above IR. https://godbolt.org/z/CItEJG

As far as I see the log, the problem is that InstCombine is located after Attributor Pass in -O3. Therefore, dereferenceable is not propagated.
Please take a look at the stderr output in https://godbolt.org/z/l_6aVE.

Solutions are to change the order or to run the Attributor several times.

This comment was removed by xbolva00.

Yeah, you probably want to run it multiple times.

@jdoerfert