This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Apply NSW and NUW flags via poison value analysis
ClosedPublic

Authored by broune on Jul 14 2015, 10:22 PM.

Details

Summary

Make Scalar Evolution able to propagate NSW and NUW flags from instructions to SCEVs in some cases. This is based on reasoning about when poison from instructions with these flags would trigger undefined behavior. This gives a 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.

There does not seem to be clear agreement about when poison should be considered to propagate through instructions. In this analysis, poison propagates only in cases where that should be uncontroversial.

This change makes LSR able to create induction variables for expressions like &ptr[i + offset] for loops like this:

for (int i = 0; i < limit; ++i) {
  sum += ptr[i + offset];
}

Here ptr is a 64 bit pointer and offset is a 32 bit integer. For NVPTX, LSR currently creates an induction variable for i + offset instead, which is not as fast. Improving this situation is what brings the 13% speed-up on some Eigen3-based Google-internal microbenchmarks for NVPTX.

There are more details in this discussion on llvmdev.
June: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-June/thread.html#87234
July: http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-July/thread.html#87392

Patch by Bjarke Roune

Diff Detail

Event Timeline

broune updated this revision to Diff 29752.Jul 14 2015, 10:22 PM
broune retitled this revision from to [SCEV] Apply NSW and NUW flags via poison value analysis.
broune updated this object.
broune added reviewers: eliben, atrick, sanjoy.
broune added subscribers: llvm-commits, meheff, jingyue.
sanjoy edited edge metadata.Jul 15 2015, 12:09 AM

First round of comments inline. I'll do a more detailed review in the next couple of days.

lib/Analysis/ScalarEvolution.cpp
4205

LLVM convention is auto *.

4284

Same comment as earlier -- I don't think this assert adds much value. You could take Value &V if your really cared about not passing in nullptr, but if I were you I'd just remove the assert.

4287

Nit: propagate.

4307

Question: why isn't undefinedBehaviorIsGuaranteedIfPoison enough? IOW, why can't we have

if (undefinedBehaviorIsGuaranteedIfPoison(BinOp))
  return Flags;
else
  return SCEV::FlagAnyWrap;
4312

LLVM convention is auto *AddRec.

sanjoy added inline comments.Jul 15 2015, 12:09 AM
lib/Analysis/ScalarEvolution.cpp
4095–4096

I think isGuaranteedToExecuteForEveryIteration, propagatesPoison, getOperandThatCausesUndefinedBehaviorIfPoison and undefinedBehaviorIsGuaranteedIfPoison should live in ValueTracking.h, given that ValueTracking contains similar things like isSafeToSpeculativelyExecute.

4099

I don't think these asserts are adding much here. Do you think it will be cleaner to take I and L by reference instead, to indicate this invariant?

4114

Use isa. Also mayThrow implies isa<CallInst> || isa<ResumeInst> and since ResumeInst is a terminator, I think this check can just be isa<CallInst> with a comment that this checks for both infinite loops and throws.

4116

I'd swap this check with the previous one -- otherwise you'll always return false for calls.

4125

I agree these are fairly "obvious", but I'd like to run these by David Majnemer and Nuno Lopes to make sure that these rules are at least congruent on what they have in mind for poison's future.

4130

I don't think this assert adds much. It is fairly obvious that I is expected to be non null and if it is, we'll get a deterministic segfault in I->getOpcode().

4210

Same comment as above about nullness of I.

4263

Some comment as in isGuaranteedToExecuteForEveryIteration for CallInst and mayThrow.

4272

You can directly iterate over I->users().

broune updated this revision to Diff 29834.Jul 15 2015, 3:28 PM
broune edited edge metadata.
broune marked 12 inline comments as done.

Address Sanjoy's initial comments.

lib/Analysis/ScalarEvolution.cpp
4099

I would have preferred references, but all the other functions in this file take pointers, so I wanted to be consistent. Asserting non-null is itself not consistent with the rest of this file, so I took it out.

4114

Done.

4116

Done.

4125

Thank you. I'm curious what their thoughts are on it.

4130

I removed it.

4205

Done.

4210

I removed it.

4263

Done.

4272

Thanks for the tip.

4284

I removed the assert.

4287

Done.

4307

I added a comment on why the other conditions are necessary. There may be a way around isLoopInvariant, but I'm not 100% sure about that, so I left it in.

broune updated this revision to Diff 29858.Jul 15 2015, 5:47 PM

Added and used isGuaranteedToTransferExecutionToSuccessor (is there a better name?). Also slight improvement to comments.

I checked all the instructions in the langref to see if any others might also not terminate. All I found is that while the langref doesn't explicitly say so, some atomics like atomicrmw do not necessarily terminate if another thread keeps interfering. Looking at the C++14 standard, some thread is guaranteed to make progress but I could not find a statement that any given thread is guaranteed to make progress, so I made isGuaranteedToTransferExecutionToSuccessor conservative on that point.

Added and used isGuaranteedToTransferExecutionToSuccessor (is there a better name?). Also slight improvement to comments.

I checked all the instructions in the langref to see if any others might also not terminate. All I found is that while the langref doesn't explicitly say so, some atomics like atomicrmw do not necessarily terminate if another thread keeps interfering. Looking at the C++14 standard, some thread is guaranteed to make progress but I could not find a statement that any given thread is guaranteed to make progress, so I made isGuaranteedToTransferExecutionToSuccessor conservative on that point.

I think we'll end up needing some additional attribute to let us get the C++ semantics at the IR level, and we do want them, but we also need to support languages like Java where infinite loops are (sadly) well defined. Regarding C++, 1.10p27 says:

The implementation may assume that any thread will eventually do one of the following:
  terminate
  make a call to a library I/O function
  access or modify a volatile object, or
  perform a synchronization operation or an atomic operation

 [Note: This is intended to allow compiler transformations such as removal of empty loops, even
  when termination cannot be proven. — end note ]

And, thus, when we can assume C++ semantics, any thread is guaranteed to make progress, or call some external function, or access a volatile/atomic variable.

majnemer added inline comments.
lib/Analysis/ValueTracking.cpp
3369–3370

Left shift by poison is poison, not UB.

3418–3419

Call? Invoke?

sanjoy added a comment.EditedJul 15 2015, 7:35 PM

Added and used isGuaranteedToTransferExecutionToSuccessor (is there a better name?). Also slight improvement to comments.

I checked all the instructions in the langref to see if any others might also not terminate. All I found is that while the langref doesn't explicitly say so, some atomics like atomicrmw do not necessarily terminate if another thread keeps interfering. Looking at the C++14 standard, some thread is guaranteed to make progress but I could not find a statement that any given thread is guaranteed to make progress, so I made isGuaranteedToTransferExecutionToSuccessor conservative on that point.

And, thus, when we can assume C++ semantics, any thread is guaranteed to make progress, or call some external function, or access a volatile/atomic variable.

I don't think this is relevant here -- even assuming C++ semantics [edit: unless we can prove the call to be readonly / readnone] CallInst is not guaranteed to always return -- the called function could be stalled doing an infinite number of volatile accesses for instance.

broune updated this revision to Diff 29928.Jul 16 2015, 11:23 AM
broune marked 2 inline comments as done.

Addresses David Majnemer's comment on shl.

Added and used isGuaranteedToTransferExecutionToSuccessor (is there a better name?). Also slight improvement to comments.

I checked all the instructions in the langref to see if any others might also not terminate. All I found is that while the langref doesn't explicitly say so, some atomics like atomicrmw do not necessarily terminate if another thread keeps interfering. Looking at the C++14 standard, some thread is guaranteed to make progress but I could not find a statement that any given thread is guaranteed to make progress, so I made isGuaranteedToTransferExecutionToSuccessor conservative on that point.

I think we'll end up needing some additional attribute to let us get the C++ semantics at the IR level, and we do want them, but we also need to support languages like Java where infinite loops are (sadly) well defined. Regarding C++, 1.10p27 says:

The implementation may assume that any thread will eventually do one of the following:
  terminate
  make a call to a library I/O function
  access or modify a volatile object, or
  perform a synchronization operation or an atomic operation

 [Note: This is intended to allow compiler transformations such as removal of empty loops, even
  when termination cannot be proven. — end note ]

And, thus, when we can assume C++ semantics, any thread is guaranteed to make progress, or call some external function, or access a volatile/atomic variable.

From the LLVM atomics guide, we have a pass AtomicExpandPass that e.g. can expand atomicrmw into a loop with compare-exchange. There may be a reason that such a loop always terminates, but I'm not aware of one. The expanded loop does meet the requirement that it will continually perform an atomic operation (just not successfully). If that isn't guaranteed to terminate, and AtomicExpandPass is correct in choosing that implementation for atomicrmw, then it's not clear to me that we can assume that atomicrmw terminates.

It may be that we can assume that e.g. atomicrmw always terminates, I just haven't so far been able to convince myself of that, so I decided to be conservative. I also haven't looked into the Java semantics. If you're confident that it's not an issue, I'll go with that.

lib/Analysis/ValueTracking.cpp
3369–3370

You're right, thank you. From the langref on shl:

"If op2 is (statically or dynamically) equal to or larger than the number of bits in op1, the result is undefined."

I had read that as undefined behavior, but it only says undefined, which I'm thinking just means undef. That 0 << poison would be poison makes sense to me, so I'll go with that if no one objects.

3418–3419

I'm not sure what the question is. Are you suggesting that there are some intrinsics that it would be good to handle? This function is conservative, so returning false for Call and Invoke is correct.

majnemer added inline comments.Jul 16 2015, 1:54 PM
lib/Analysis/ValueTracking.cpp
3418–3419

I misread how propagatesFullPoison is supposed to work. I think the name needs some work.

broune added inline comments.Jul 16 2015, 2:13 PM
lib/Analysis/ValueTracking.cpp
3418–3419

What thing would you like the name to be clearer about? I'm happy to change it. I could do isGuaranteedToYieldFullPoisonIfGivenFullPoison; it's clearer even if not as succinct. Any suggestions?

Second round of review comments inline.

include/llvm/Analysis/ScalarEvolution.h
587

Minor & optional: I'd call this getNoWrapFlagsFromUB.

include/llvm/Analysis/ValueTracking.h
313

Is the indentation a little bit off here?

Actually, I'll just assume you'll run this change through clang-format before checking in, and won't make any further whitespace related comments. Please let me know if you'd prefer otherwise.

327

I'd prefer if you were a little more terse -- perhaps getGuaranteedNonPoisonOp?

(IOW, which operand is guaranteed to not be poison since if it were poison then the program is undefined)

339

Similarly, how about calling this isKnownNotPoison?

lib/Analysis/ScalarEvolution.cpp
4144

I'm not sure that this is correct. I think you need to prove that the instruction you used to justify UB if V was poison is what needs to execute on every loop iteration.

This is not a problem currently because undefinedBehaviorIsGuaranteedIfFullPoison only looks at a single basic block, but semantically, the following program will be problematic:

loop_header:
  %x = add nsw %p, %q
  ...

outside_the_loop:
  ;; no other uses of %x
  store i32 42, GEP(@global, %x)

You can conclude that the %x does not overflow in the last iteration, but that's it -- even if %x was poison in all the other iterations your program is well defined.

lib/Analysis/ValueTracking.cpp
3327

If you're dealing with terminator instructions here (I'm not sure that that is necessary) why is br okay? Can't a br form an infinite loop?

Thanks for the comments, Sanjoy. I'll update the code with name changes Monday.

include/llvm/Analysis/ScalarEvolution.h
587

SGTM, I'll make the change.

include/llvm/Analysis/ValueTracking.h
313

Sorry, I need to find a better way to set up my editor. I'll run clang-format before checking in.

327

I wasn't so happy with it myself and I like your suggestion. Maybe getGuaranteedNonFullPoisonOp ? I'll change it Monday.

339

Nice. isKnownNotFullPoison? I'm concerned that it would be too easy to miss the distinction between poison and full-poison and the bugs from that would be hard to shake out.

lib/Analysis/ScalarEvolution.cpp
4144

As you point out, UB is not guaranteed on poison in this example, so even if undefinedBehaviorIsGuaranteedIfFullPoison is made stronger by considering more basic blocks, it should still return false here, right?

lib/Analysis/ValueTracking.cpp
3327

Yes, br can form an infinite loop. This function should still return false for br, as each single execution of br does terminate and then transfers execution to its successor (even if it is its own successor), but I suspect that that's not what you're getting at with this question. :)

Zooming out a bit, what I'm really using this function for is as a component of proving that one instruction B strongly post-dominates another instruction A. Part of that is to prove that there are no infinite paths from A to B, since in such a path we'd never actually get to B, even if B (non-strongly) post-dominates A. In the general case, yes, we'd also need to take into account loops between A and B within the same CFG/function and prove that they terminate. The main reason that I'm only considering one basic block at a time is to make things simpler by avoiding the need for that, since I think that this change is already complex enough as it is. Even then, we still need to look out for single instructions where a single invocation can be enough to prevent strong post-dominance (even within a basic block), and this function serves that purpose.

You're right that this over-all feature could work with a function like this that did not consider terminators. I included terminators anyway just because it gives this function a cohesive responsibility hat makes it easier to think about and it's also what would/will be needed when reasoning about strong post-dominance across basic blocks. I don't have any strong objection to taking the terminators out for now, though.

sanjoy added inline comments.Jul 19 2015, 10:23 PM
lib/Analysis/ScalarEvolution.cpp
4144

In that case why do we bother with isGuaranteedToExecuteForEveryIteration?

IOW, why not

if (undefinedBehaviorIsGuaranteedIfFullPoison(BinOp))
  return Flags;
lib/Analysis/ValueTracking.cpp
3327

I think removing terminators and assert(!isa<Terminator>(I) && "..."); will be an improvement. If we later decide to make the algorithm more precise around control flow, then we can consider adding terminators.

sanjoy added inline comments.Jul 19 2015, 10:41 PM
lib/Analysis/ValueTracking.cpp
3327

On second thought, I take back what I said and agree with your reasoning -- I think the function is fine as is.

broune added inline comments.Jul 19 2015, 10:56 PM
lib/Analysis/ScalarEvolution.cpp
4144

That would prove that if BinOp is executed, then what it calculates will not wrap. There is then still the possibility that BinOp is not executed on a given iteration, in which case we have no information about wrapping of the SCEV for that iteration. Then we cannot apply the flag to the SCEV as other instructions in the loop that map to the same SCEV would then also get the flag on the shared SCEV, but we have not actually proven that the shared SCEV does not wrap.

Minor nits inline. At this point I think this change is ready to go in once the style / naming fixes are done. However:

  • I'd like to take a look at the final change before it goes in.
  • I'd also like Andy to take a look before this goes in.

Side comment and optional: have you bootstrapped clang with this change? That's a good sanity check for this sort of change. You may consider bootstrapping with ubsan / asan too to get some extra coverage: the extra control flow the sanitizers add tends to shake out a lot of bugs.

include/llvm/Analysis/ValueTracking.h
293

This is borderline bikeshedding, but the rest of the file specifies behavior as a verb -- Return true if ...

311

Minor: I'd just say "Note that this currently only looks at the loop header". Specifically, I'd avoid using the term "analysis" since that has a specific meaning within LLVM.

327

SGTM.

339

SGTM.

lib/Analysis/ScalarEvolution.cpp
4113

Nitpick: elsewhere you use SCEV not "scev".

4144

Ah, right. You also have a large comment explaining the very same thing -- sorry for making you repeat yourself.

4207

The Do an operation by itself if a no-wrap flag can be applied bit did not parse for me.

lib/Analysis/ValueTracking.cpp
3397

Can't you use for (Value *V : OBO->operands()) here?

broune updated this revision to Diff 30226.Jul 20 2015, 6:21 PM
broune marked 40 inline comments as done.

Comments addressed and ran clang-format.

In D11212#208037, @sanjoy wrote: [...]

Side comment and optional: have you bootstrapped clang with this change? That's a good sanity check for this sort of change. You may consider bootstrapping with ubsan / asan too to get some extra coverage: the extra control flow the sanitizers add tends to shake out a lot of bugs.

I'm not sure what this entails, but I made a guess: I took a release-with-asserts build of llvm, built with my patch, and used the binaries from that with cmake like so:

CXX=path/to/release/with/asserts/clang/bin/clang++ CC=path/to/release/with/asserts/clang/bin/clang++ cmake -G Ninja ../llvm -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=Yes

then I did ninja check and ninja check-asan. If that's not what you had in mind, feel free to let me know. I didn't find a guide on how to test LLVM via bootstrapping.

lib/Analysis/ScalarEvolution.cpp
4207

I clarified the comment.

broune updated this revision to Diff 30285.Jul 21 2015, 2:19 PM

Change comments to avoid referring to what this change does as "analysis".

atrick accepted this revision.Jul 27 2015, 12:44 AM
atrick edited edge metadata.

Looks good. Nice comments.

For the goal of reassociating array address computation, I would have tried a different approach, but I think this is still valid and generally adds more power to SCEV.

I trust that Sanjoy has tried to poke enough holes in the logic. Overall, it looks pretty solid to me. Thanks for the thorough review, Sanjoy.

This revision is now accepted and ready to land.Jul 27 2015, 12:44 AM
broune updated this revision to Diff 30751.Jul 27 2015, 3:47 PM
broune edited edge metadata.

Improve handling of case where V is a ConstantExpr in createSCEV.

sanjoy accepted this revision.Jul 27 2015, 5:30 PM
sanjoy edited edge metadata.

Some very minor non-semantic nits inline, otherwise looks good to me. Feel free to check in as is and fix the nits in a follow up change if you find that more convenient.

include/llvm/Analysis/ValueTracking.h
293

Please fix this in the comments for the other functions you added as well.

lib/Analysis/ScalarEvolution.cpp
4111

Nit: change "rec" to something more descriptive, perhaps "add recurrence"?

lib/Analysis/ValueTracking.cpp
3326

Minor nit: invokes can also throw (see https://llvm.org/bugs/show_bug.cgi?id=24185, especially https://llvm.org/bugs/show_bug.cgi?id=24185). Btw, I got the idea for testing for 24185 bug from reading this function.

broune updated this revision to Diff 30769.Jul 27 2015, 6:13 PM
broune edited edge metadata.
broune marked an inline comment as done.

Tiny update to comments.

broune marked an inline comment as done.Jul 27 2015, 6:16 PM

Thank you to Sanjoy and Andy for the review.

include/llvm/Analysis/ValueTracking.h
293

Argh, sorry about that. Done.

lib/Analysis/ValueTracking.cpp
3326

As I understand the bug, an invoke could throw somewhere other than to the landingpad successor in the CFG, if the landingpad is not a match for the exception thrown, so I updated the comment. I referred to the bug you mentioned.

jingyue updated this object.Jul 28 2015, 11:22 AM
jingyue closed this revision.Jul 28 2015, 11:23 AM