[GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors
Needs Review · Public

Authored by mkazantsev on Tue, Sep 5, 3:08 AM.

Details

Summary

This patch fixes a miscompile that happens when PRE hoists loads across guards and
other instructions that don't always pass control flow to their successors. PRE is now prohibited
from hoisting across such instructions, because there is no guarantee that a load that is valid after
such an instruction is also valid before it. For example, a load from under a guard may be
invalid before the guard in the following case:

int array[LEN];
...
guard(0 <= index && index < LEN);
use(array[index]);
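For illustration, here is a sketch of the transform this patch forbids, in the same pseudocode (the temporary tmp is hypothetical):

// Before PRE (safe): the load runs only if the guard passes.
guard(0 <= index && index < LEN);
use(array[index]);

// After an incorrect LoadPRE hoist: the access runs before the bounds
// check, so it may read out-of-bounds memory.
int tmp = array[index];   // hoisted above the guard -- miscompile
guard(0 <= index && index < LEN);
use(tmp);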

Diff Detail

mkazantsev created this revision.Tue, Sep 5, 3:08 AM
mkazantsev updated this revision to Diff 113820.Tue, Sep 5, 3:11 AM

Fixed formatting.

dberlin requested changes to this revision.Tue, Sep 5, 6:42 AM
  1. This seems wrong for assumes (I don't know enough about guards).

It is not incorrect to hoist through an assume, nor should doing so generate wrong code.
Assumes are essentially metadata, and documented as such.

Can you provide an example where the code generated is incorrect?

(Guards I don't know enough about to say anything.)

  2. You've done this in an N^2 way when it should be amortized-constant time (using OrderedInstructions).
This revision now requires changes to proceed.Tue, Sep 5, 6:42 AM
mkazantsev added a comment.EditedTue, Sep 5, 8:09 PM

Hi Daniel,

  1. The motivating example is the following situation:

int array[LEN];

void accessArray(int index) {
guard(0 <= index && index < LEN);
do_something(array[index]);
}

Here we guard on the condition that the array access is within bounds; otherwise we reject the compiled code and deoptimize. Hoisting the array access above the guard means that we try to access the array without knowing whether the index is within bounds. I will add a corresponding test.

  2. It seems that you're right about assumes; I will remove them.
  3. OK, I will try to do it in a faster manner.
mkazantsev updated this revision to Diff 113971.Wed, Sep 6, 2:25 AM
mkazantsev retitled this revision from [GVN] Prevent LoadPRE from hoisting through guards and assumes to [GVN] Prevent LoadPRE from hoisting through guards.
mkazantsev edited the summary of this revision. (Show Details)
  1. Removed assumes-related stuff.
  2. Added a motivating test and comment.
  3. Daniel, I didn't get what you mean by using OrderedInstructions for this. OrderedInstructions can only answer whether a particular guard dominates our load. It will still take linear time to gather the guards, and for those we already know they dominate the load, so we basically don't need to ask OrderedInstructions anything. I used any_of to speed things up, but I don't see a way to reduce the complexity.

Could you please explain what you mean by using OrderedInstructions to gain amortized-constant complexity?

dberlin added a comment.EditedWed, Sep 6, 1:36 PM

The problem:
Right now, for each load, you go through the load's block (LoadBB) and see if you hit a guard.
This information does not change for each load. At most it changes as GVN removes things (and even then, it doesn't).
The number of guards is smaller than the number of blocks.

Thus, you could trivially use OrderedInstructions to make this O(guards) * O(number of loads) instead of O(number of instructions in each load's block) * O(number of loads).
But you can do even better, and this is the constant time I refer to.
To make it constant time to check, and O(guards) to compute:

For each guard in the program:
   // Make FirstGuard point to the first guard in each block.
   auto Iter = FirstGuard.find(guard->getParent());
   if (Iter == FirstGuard.end())
     FirstGuard.insert({guard->getParent(), guard});
   else if (OrderedInstructions->dominates(Iter->second, guard))
     Iter->second = guard;

In processLoad, replace your entire loop with:

if the load's block doesn't exist in FirstGuard
    you are fine
if it does, check whether OrderedInstructions->dominates(FirstGuard[BB], load)

Basically, this is: build a table of where the first guard in each block is. Since you are trying to see whether there is a guard above a given instruction in its block, the only thing that matters is where the first guard in that block is, because you will eventually hit that guard if you walk up. OrderedInstructions will tell you whether that first guard is before or after the load in the block.

You already walk all the instructions in the main GVN loop in RPO order. You can fill in FirstGuard as you go, since the walk sees all the guards before anything one could try to lift above them, except in the case of loops.
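Put together, a minimal C++ sketch of this scheme (hypothetical names; it assumes the OrderedInstructions utility as it existed at the time, with the dominates operand order from the correction further below):

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Transforms/Utils/OrderedInstructions.h"
using namespace llvm;

// Maps each block to the first guard it contains; O(guards) to build.
static DenseMap<const BasicBlock *, const Instruction *> FirstGuard;

// Called once per guard while walking the function in RPO.
static void recordGuard(const Instruction *Guard, OrderedInstructions &OI) {
  auto It = FirstGuard.find(Guard->getParent());
  if (It == FirstGuard.end())
    FirstGuard[Guard->getParent()] = Guard;
  else if (OI.dominates(Guard, It->second))
    It->second = Guard; // This guard appears earlier in the block.
}

// Constant-time query in processLoad: is some guard above Load in its block?
static bool guardAbove(const Instruction *Load, OrderedInstructions &OI) {
  auto It = FirstGuard.find(Load->getParent());
  return It != FirstGuard.end() && OI.dominates(It->second, Load);
}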

mkazantsev planned changes to this revision.Wed, Sep 6, 10:45 PM

Thanks for the detailed clarification! I will do that.

Of course. Just in case you try to use the code, I'm pretty sure if (OrderedInstructions->dominates(Iter->second, guard))
should be if (OrderedInstructions->dominates(guard, Iter->second))

(At least one of the calls is reversed)

mkazantsev updated this revision to Diff 114311.Fri, Sep 8, 1:21 AM

Actually we don't even need OrderedInstructions to do that. It's enough to maintain a set of blocks in which we have seen guards. We iterate through blocks in reverse postorder, which guarantees that if a block has a single predecessor, that predecessor is visited before the block. We also traverse each block's instructions in program order. Using this, we just mark a block as having a guard when we meet one, and then we know for sure that:

  1. if the current block is marked, there is a guard above the current instruction, and we cannot hoist;
  2. if the only predecessor (recursively) is marked, it also blocks us from hoisting.

I've updated the patch using this solution. Please let me know if it makes sense to you.
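A minimal sketch of that idea (hypothetical names; the actual patch differs in details):

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Blocks in which a guard has been seen so far. Blocks are visited in RPO and
// instructions in program order, so when we reach an instruction, its own
// block is in the set iff a guard occurs above the instruction, and any chain
// of single predecessors has already been fully processed.
static SmallPtrSet<const BasicBlock *, 8> BlocksWithGuards;

static bool isBlockedByGuard(const Instruction *I) {
  for (const BasicBlock *BB = I->getParent(); BB;
       BB = BB->getSinglePredecessor())
    if (BlocksWithGuards.count(BB))
      return true; // A guard above us blocks hoisting.
  return false;
}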


Sure, this will work, but it requires clearing the guard set between iterations :)
Given how much faster either solution is than the original option, seems fine to me. I doubt it will ever be a real performance issue.

The expected size of the set of blocks with guards is actually small, so clearing it should be cheap. Could you please approve the patch for merge?

mkazantsev planned changes to this revision.Fri, Sep 8, 1:54 AM

Need to clear the guard set between iterations, not just once at the end.

mkazantsev updated this revision to Diff 114316.Fri, Sep 8, 1:57 AM

Cleaning between iterations now.

efriedma added inline comments.
lib/Transforms/Scalar/GVN.cpp
2050

Why does this specifically apply to guards, as opposed to anything where isGuaranteedToTransferExecutionToSuccessor() is false?
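For reference, a minimal sketch of the more general check being suggested; isGuaranteedToTransferExecutionToSuccessor() is the real ValueTracking API, while the wrapper name is hypothetical:

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instruction.h"

// True if I may throw, deopt, or otherwise fail to pass control to its
// successor, so that hoisting a load from below I to above I could execute
// the load on paths where it never ran before.
static bool hasImplicitControlFlow(const llvm::Instruction *I) {
  return !llvm::isGuaranteedToTransferExecutionToSuccessor(I);
}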

dberlin added inline comments.Fri, Sep 8, 12:55 PM
lib/Transforms/Scalar/GVN.cpp
2050

Yeah, it looks like GVN presumes that if it's straight-line code with no infinite loops, all instructions have the same "guaranteed to execute" properties.

As you correctly point out, since LLVM has implicit control flow, this is not always true.
I suspect GVN's PRE is quite broken here in that respect in theory.

However, most of the effect of isGuaranteed* is on memory instructions, and those almost certainly get corrected by memdep (i.e., it ends up respecting the same things that isGuaranteed* is respecting).
Going through the rest:

if (const auto *CRI = dyn_cast<CleanupReturnInst>(I))
  return !CRI->unwindsToCaller();
if (const auto *CatchSwitch = dyn_cast<CatchSwitchInst>(I))
  return !CatchSwitch->unwindsToCaller();
if (isa<ResumeInst>(I))
  return false;
if (isa<ReturnInst>(I))
  return false;
if (isa<UnreachableInst>(I))
  return false;

None of these will have successors, so there shouldn't be a predecessor we hit to check anyway.
You can hoist above them, just not sink below them.

So those should be fine.

The callsite checking, except for throwing, is probably also handled by memdep's insertion of memory dependencies.

So I suspect you may be able to rig up a testcase where memdep and isGuaranteed* end up disagreeing in a way that causes this to break. But I also suspect it's hard.

Thoughts?

(And using isGuaranteed* would handle guards, since guard is marked as throwing.)

> So I suspect you may be able to rig up a testcase where memdep and isGuaranteed* end up disagreeing in a way that causes this to break.

See https://reviews.llvm.org/D21041. :)

mkazantsev added inline comments.Mon, Sep 11, 12:10 AM
lib/Transforms/Scalar/GVN.cpp
2050

Hi Eli,

I tried to use isGuaranteedToTransferExecutionToSuccessor, and it works for my example, but another test starts failing. The test is test7 from Transforms/GVN/PRE/volatile.ll, which is:

; Does cross block PRE work with volatiles?
define i32 @test7(i1 %c, i32* noalias nocapture %p, i32* noalias nocapture %q) {
; CHECK-LABEL: test7
; CHECK-LABEL: entry.header_crit_edge:
; CHECK:       %y.pre = load i32, i32* %p
; CHECK-LABEL: skip:
; CHECK:       %y1 = load i32, i32* %p
; CHECK-LABEL: header:
; CHECK:      %y = phi i32
; CHECK-NEXT: %x = load volatile i32, i32* %q
; CHECK-NEXT: %add = sub i32 %y, %x
entry:
  br i1 %c, label %header, label %skip
skip:
  %y1 = load i32, i32* %p
  call void @use(i32 %y1)
  br label %header
header:
  %x = load volatile i32, i32* %q
  %y = load i32, i32* %p
  %add = sub i32 %y, %x
  %cnd = icmp eq i32 %add, 0
  br i1 %cnd, label %exit, label %header
exit:
  ret i32 %add
}

It now fails because this function returns false for volatile loads and stores, since they are allowed to trap. This logic was added in https://reviews.llvm.org/D21167, and Sanjoy's comment there asking why we do that remained unanswered. Could you please clarify why it was added, and is it OK if this test now fails?

Thanks,
Max

We consider volatile loads and stores as potentially trapping as a courtesy to our users. Strictly speaking, it's undefined behavior in C, but people sometimes intentionally trigger SIGSEGV using volatile loads, so we don't want to cause an unrelated crash before that happens. Breaking that testcase should be fine.

Eli, I presume an updated version with isGuaranteed... should fix your testcases from D21041, right?
If so, Max, can you add them to this patch?

I think it will fix those testcases... at least, it will fix the load testcase (I don't think this patch addresses the divide-by-zero problem).

> We consider volatile loads and stores as potentially trapping as a courtesy to our users. Strictly speaking, it's undefined behavior in C, but people sometimes intentionally trigger SIGSEGV using volatile loads, so we don't want to cause an unrelated crash before that happens. Breaking that testcase should be fine.

I disagree. This would be breaking optimizations *across* a volatile load, not breaking optimizations *of* a volatile load. I think the isGuaranteedTo... scheme is probably the right one, but I think we got our semantics for that function wrong for volatile loads.

Given we've not documented volatile loads as potentially trapping, I don't think we should *partially* implement that semantic. I'd actually like to have a faulting load concept in the IR, but hacking that in for volatile seems misguided.

For the moment, I'll suggest a compromise to unblock this review from the broader discussion. Can we use the isGuaranteedTo... scheme, but additionally whitelist skipping over volatile loads? That lets us make progress on this correctness issue while taking the volatile semantics discussion elsewhere.
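A minimal sketch of that compromise (the helper name is hypothetical; volatile loads/stores are whitelisted explicitly, and everything else defers to the ValueTracking query):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Like isGuaranteedToTransferExecutionToSuccessor(), but whitelists volatile
// loads/stores so PRE may still hoist across them while the volatile
// semantics discussion continues elsewhere.
static bool transfersExecutionForPRE(const Instruction *I) {
  if (const auto *LI = dyn_cast<LoadInst>(I))
    if (LI->isVolatile())
      return true;
  if (const auto *SI = dyn_cast<StoreInst>(I))
    if (SI->isVolatile())
      return true;
  return isGuaranteedToTransferExecutionToSuccessor(I);
}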

lib/Transforms/Scalar/GVN.cpp
2051

I'm not sure building this set in RPO is sufficient in the face of cycles. I believe the current PRE code would bail out of said cycles, but at a minimum, that assumption needs to be carefully documented and asserted for.

I.e., how do we know that BlocksWithGuards has been populated for each block we traverse backwards through?

In particular, if we extend this to isGuaranteedTo... and the scalar PRE case from https://reviews.llvm.org/D21041, it's not obvious to me this code would be correct.

The volatile load thing never really got discussed in depth, as far as I remember... I wanted to be conservative because volatile memory access can be weird, but I guess my interpretation in isGuaranteedToTransferExecutionToSuccessor() is a little suspect given the way alias analysis treats volatile operations.

> The volatile load thing never really got discussed in depth, as far as I remember... I wanted to be conservative because volatile memory access can be weird, but I guess my interpretation in isGuaranteedToTransferExecutionToSuccessor() is a little suspect given the way alias analysis treats volatile operations.

I think it would be good to revisit this. As you say, we'd also need to model this in AA (by essentially making all volatile accesses universally aliasing, because the trap handlers could be arbitrary functions). Moreover, it would essentially mean modeling volatile loads as both reading and writing memory.

What I consider to be "regular" uses of volatile, handling memory-mapped I/O and communicating with signal/interrupt handlers, I don't think require this "loads also write" treatment. These use cases might want universally-aliasing stores (because the stores might trigger some device or signal/interrupt handler to do something arbitrary), but can allow more-precise AA for volatile loads.

I don't know exactly what use cases we want to support with volatile accesses. If it's essentially just memory-mapped I/O (for device drivers and the like), then it might make sense to model them that way. If we have other use cases, then it might not make sense (and, as Philip says, we might want some other mechanism).

If we do want to make volatile accesses universally aliasing and writing, then I think we should remove them from the IR altogether and add some intrinsics for this use case instead.

dberlin added inline comments.Mon, Sep 11, 7:53 PM
lib/Transforms/Scalar/GVN.cpp
2051

It's definitely not sufficient in the case of cycles.

The only reason it's thought to be sufficient here is that you are guaranteed that any straight line predecessors have already been processed.

Since it's only walking back through straight line predecessors, ...

So yes, if it tried to use this info in random predecessors, it would not be correct the way it is being computed here.

mkazantsev added inline comments.Mon, Sep 11, 10:02 PM
lib/Transforms/Scalar/GVN.cpp
2051

Yes, we can only rely on this set if we have visited all the predecessors before. This is true for the case where we use it (because there is only one predecessor) and not true in general. I'll write a comment to make that clear.

mkazantsev edited the summary of this revision. (Show Details)

Reused isGuaranteedToTransferExecutionToSuccessor and added some clarifying comments. In this version of the patch, we explicitly allow hoisting across volatile loads/stores. Given the discussion here, we might want to modify isGuaranteedToTransferExecutionToSuccessor so that it returns true for volatile loads/stores. Once the ongoing discussion on volatiles comes to some conclusion, this part can be reworked.

Could you add the testcases from D21041?

mkazantsev retitled this revision from [GVN] Prevent LoadPRE from hoisting through guards to [GVN] Prevent LoadPRE from hoisting across instructions that don't pass control flow to successors.
mkazantsev edited the summary of this revision. (Show Details)

Added some tests, changed commit message.

dberlin added inline comments.Tue, Sep 12, 10:08 PM
lib/Transforms/Scalar/GVN.cpp
2076

It's worth noting:

The RPO iteration order only guarantees that checking straight-line predecessors is safe. [1]
It would not be safe to check other predecessors unless the program is loop-free
(i.e., such a thing would have to be computed separately from this loop).

[1] I can prove this if you like: a visit-order DFS walk of the dominator tree is a valid RPO order [2]. Any straight-line single predecessor must be your idom, because there is no path around it to your node. Because the dominator tree is acyclic and yields a valid RPO order, RPO must guarantee that node is visited before yours.
The only way something else could be visited first is if there were another path to your node (which would in turn prove that your idom is not really your idom).

[2] LLVM does not happen to keep it in that sort order, but you can see that, for example, NewGVN sorts it back into the "right" order.

mkazantsev marked an inline comment as done.

Factored the filling of the implicit control flow map out of the main iteration loop.

dberlin added inline comments.Wed, Sep 13, 7:39 AM
lib/Transforms/Scalar/GVN.cpp
2365

This is expensive to compute, you should pass it in.

2379

for (auto &I : BB)

(the &*BI will just be &I below)

2382

This map could be const BasicBlock * -> const Instruction *, FWIW.

mkazantsev marked 3 inline comments as done.

Nits.

efriedma added inline comments.Wed, Sep 13, 11:25 AM
test/Transforms/GVN/PRE/local-pre.ll
50

GVN got smarter, so this isn't actually testing what it's supposed to test anymore. Maybe try replacing "icmp eq i32 %p, %q" with "icmp eq i32 %p, %r", where "%r" is a parameter. And I guess pass %a to may_exit(), so it won't get moved if we come up with some new form of sinking in the future.

mkazantsev added inline comments.Wed, Sep 13, 8:03 PM
test/Transforms/GVN/PRE/local-pre.ll
50

What is the difference between icmp eq i32 %p, %q, where %q is a parameter, and icmp eq i32 %p, %r, where %r is a parameter? If you mean that we should branch on an unknown condition, then we could just add a parameter i5 %cond.

Anyway, I don't think that extending the test set should block us from merging this functional fix. I also don't see the conceptual difference between this test and what you describe: the transformation across may_exit still doesn't happen. Do you mind if we consider adding the new tests as an NFC change later?

mkazantsev added inline comments.Wed, Sep 13, 8:04 PM
test/Transforms/GVN/PRE/local-pre.ll
50

i1 %cond, of course :)

efriedma added inline comments.Thu, Sep 14, 12:30 PM
test/Transforms/GVN/PRE/local-pre.ll
50

i1 %cond probably does the same thing, yes. The important part is that we don't simplify sdiv i32 %p, %q to 1.

Fixing the underlying issue here doesn't necessarily block committing this patch, I guess... but it's probably something you want to address at some point, and it's very closely related to this patch.

mkazantsev added inline comments.Fri, Sep 15, 4:08 AM
test/Transforms/GVN/PRE/local-pre.ll
50

I want to clearly distinguish between fixing the miscompile with guards and the missed opportunity to turn this div into 1. Do we have a bug filed for this missed opportunity? You could file one if you think this case is important. But I don't think it is part of the problem being addressed by this patch.

efriedma added inline comments.Fri, Sep 15, 11:06 AM
test/Transforms/GVN/PRE/local-pre.ll
50

Sorry, probably should have written this out in the first place.

GVN currently performs an incorrect optimization for the following testcase:

define i32 @test2(i32 %p, i32 %q, i1 %r) {
block1:
  br i1 %r, label %block2, label %block3

block2:
  %a = sdiv i32 %p, %q
  br label %block4

block3:
  br label %block4

block4:
  %phi = phi i32 [ 0, %block3 ], [ %a, %block2 ]
  call void @may_exit(i32 %phi) nounwind
  %b = sdiv i32 %p, %q
  ret i32 %b
}
declare void @may_exit(i32) nounwind

(The problem: PRE inserts the sdiv into block3, before the call to may_exit(); if may_exit() never returns and %q is zero, the transformed program divides by zero where the original never would.)
mkazantsev marked an inline comment as done.Fri, Sep 15, 8:17 PM
mkazantsev added inline comments.
test/Transforms/GVN/PRE/local-pre.ll
50

Ah, now I get what you mean. :) Thanks for pointing it out! Fixed.

mkazantsev marked an inline comment as done.

Fixed one more case with PRE, updated tests accordingly.

dberlin added inline comments.Fri, Sep 15, 9:07 PM
lib/Transforms/Scalar/GVN.cpp
2170

Please use OrderedInstructions, which caches this, and invalidate it as needed.