This is an archive of the discontinued LLVM Phabricator instance.

[GVN PRE] Clear nsw/nuw for original values in LoadPRE
Needs ReviewPublic

Authored by skatkov on Nov 9 2017, 2:39 AM.

Download Raw Diff

Details

Reviewers

• dberlin
efriedma

Summary

When we construct Phi node in LoadPRE we should clear
nsw/num falgs in all original values. Otherwise it is possible
that we can see the poison value where it is not expected.

The added test shows an example of incorrect behavior of
jump threading pass due to GVN PRE did not clear the flag.

Thank you to Daniel Berlin for test.

Diff Detail

Event Timeline

skatkov created this revision.Nov 9 2017, 2:39 AM

efriedma added inline comments.Nov 9 2017, 12:34 PM

test/Transforms/GVN/PRE/pre-jt-load.ll
26	Transforming this load into a phi in GVN is fine without stripping nsw. That should be more obvious if you replace "%v" in one of the additions with a different value.

Errr, outside of an irrelevant store, it generates literally the same output and jump threading result as: https://reviews.llvm.org/rL317768

So i don't see how one is buggy but the other isn't.

Sorry, I didn't really look at the jump-threading testcase in the other patch. (In general, it's bad practice to write testcases which involve multiple passes because it confuses the issue you're actually trying to fix.) We don't need to strip the nsw in pre-jt-add.ll from the other patch, and we don't need to strip nsw for this testcase.

The PRE transform in the testcase in this patch can be split into two parts: one, we clone the load and move it into the predecessors, and two, we eliminate the fully redundant loads. Cloning the load is obviously safe: it's the same operation. Eliminating the fully redundant load is also safe: storing the value to memory and loading it does not change it. Stores do not erase poison; you just get poisoned memory.

The only case which causes a problem is if you take the result of an nsw add, and use it instead of the result of a non-nsw add.

I tend to agree with Eli that we do not need this patch but I disagree that the previous patch is not needed.
And there is a between them. And it is actually here:

The only case which causes a problem is if you take the result of an nsw add, and use it instead of the result of a non-nsw add.

In this case we do not introduce new behavior while preserving the nuw/nsw. Specifically if we went by the path entry->bb->merge we stored nuw/nsw value and load it in merge block. As soon as it is nuw, %cmp will be false.

In pre-jt-add.ll, the situation is different. if we execute the path entry->bb->merge we will use an add without nuw/nsw in cmp instruction but after phi creation without clearing nuw/nsw the same path will result in nsw/nuw add will be used in cmp instruction., So we changed the behavior specifically we use the result of nuw add instead of non-nuw add. So this is a bug.

If there is an agreement on that I propose to conclude that that patch is valid and this one should be abandoned.

Comments?

In D39835#921146, @efriedma wrote:

Sorry, I didn't really look at the jump-threading testcase in the other patch. (In general, it's bad practice to write testcases which involve multiple passes because it confuses the issue you're actually trying to fix.) We don't need to strip the nsw in pre-jt-add.ll from the other patch, and we don't need to strip nsw for this testcase.

The PRE transform in the testcase in this patch can be split into two parts: one, we clone the load and move it into the predecessors, and two, we eliminate the fully redundant loads. Cloning the load is obviously safe: it's the same operation. Eliminating the fully redundant load is also safe: storing the value to memory and loading it does not change it. Stores do not erase poison; you just get poisoned memory.

I want to be very clear that we understand what is going on here in case it changes your opinion (but i don't otherwise have one myself, so don't take this as "disagreeing").
We don't actually clone the load. At all. We don't even consider doing that. It's not really PRE, it just happens to be a thing that this GVN uses it's PRE infrastructure to notice.

This is a full redundancy elimination by merging existing values without cloning

What's happening is that store to load forwarding in each predecessor determines the value is already equivalent to each add, so replaces the load with a merge of those adds. That's why the stores are there.

The only reason it determines the adds are equivalent is because it ignores nsw during value numbering. Otherwise, it would not believe they were equivalent.

In both cases it's quite literally:
pred1:
1 = add nsw
pred2:
2 = add
merge:
3= <some redundant computation>
use 3

->
pred1:
1 = add nsw
pred2:
2 = add
merge:
3 = phi(1, 2)
use 3

What i think i'm hearing you say is that:

if <some redundant computation> was originally "add", and is replaced with phi(add nsw, add) or phi(add nsw, add nsw) that's not okay.
Anything else is okay.

If that is right, then both patches are necessary, but it requires a modified testcase for load pre.

Why:
The scalar PRE patch is obviously necessary in such a world.

So why is the load PRE?

Imagine the above is:
pred1:
1 = add nsw
store
pred2:
2 = add
store
merge:
3= load
4= add
5 = use 4

In such a case, we will first transform this into:
pred1:
1 = add nsw
store
pred2:
2 = add
store
merge:
3= phi(1, 2)
4= add
5 = use 4
Okay so far.

Now we may replace 4 with 3, because they are all just equivalent to the add.
(admittedly, i have not stared at GVN to know whether there are reasons it won't do this, but there is no reason it shouldn't do this :P)

That is just as illegal as the Scalar PRE case by the rules above, as replacing an add with an add nsw.

patchAndReplaceAlluses will not remove the nsw from the phi in this case.
We could teach it to do so.
The other option is this patch, and fixing load PRE when it creates the phi in the first place.

That is just as illegal as the Scalar PRE case by the rules above, as replacing an add with an add nsw.

I agree. And in fact, NewGVN has a bug here: it will transform the following.

@H = common global i32 0
@G = common global i32 0

define i32 @test(i1 %cond, i32 %v) nounwind {
entry:
  br i1 %cond, label %bb, label %bb1

bb:
  %add.1 = add nuw nsw i32 %v, -1
  store i32 %add.1, i32* @G, align 4
  br label %merge

bb1:
  %add.2 = add i32 %v, -1
  store i32 %add.2, i32* @G, align 4
  br label %merge

merge:
  %foo = phi i32 [ %add.2, %bb1 ], [ %add.1, %bb ]
  %foo2 = add i32 %v, -1
  %cmp2 = icmp sgt i32 %foo2, 0
  br i1 %cmp2, label %action, label %return

action:
  store i32 %foo, i32* @H, align 4
  br label %return

return:
  %p = phi i32 [0, %merge], [1, %action]
  ret i32 %p
}

But old GVN doesn't compute that equivalence, as far as I can tell.

Yeah, I filed a bug about this when we started the first review, its good to have a real testcasr

What is the resolution of the review? Should I land it?

lib/Transforms/Scalar/GVN.cpp
773	Side question, will usage of I->dropPoisonGeneratingFlags() better here?

ping...

We shouldn't land this without a testcase actually showing a miscompile.

In D39835#922938, @skatkov wrote:

What is the resolution of the review? Should I land it?

I'm very strongly of the opinion this patch should not land, ever. I don't believe this change is desirable or necessary. If I'm reading through the history here correctly, what we're worried about is a two phase transform: 1) we replace a load with a phi of previously stored values, and then 2) we replace that phi with a single add instruction. It's specifically that *second* transformation which exposes the worried about miscompile. My argument here is simple: a test case with a *manually written* phi in the same place as our PRE based implementation of FRE inserts one would expose the same miscompile. Given this, we might very well have a bug in whatever bug does the second transform, but there is nothing wrong with the FRE/PRE here.

In D39835#921468, @dberlin wrote:

In D39835#921146, @efriedma wrote:

Sorry, I didn't really look at the jump-threading testcase in the other patch. (In general, it's bad practice to write testcases which involve multiple passes because it confuses the issue you're actually trying to fix.) We don't need to strip the nsw in pre-jt-add.ll from the other patch, and we don't need to strip nsw for this testcase.

The PRE transform in the testcase in this patch can be split into two parts: one, we clone the load and move it into the predecessors, and two, we eliminate the fully redundant loads. Cloning the load is obviously safe: it's the same operation. Eliminating the fully redundant load is also safe: storing the value to memory and loading it does not change it. Stores do not erase poison; you just get poisoned memory.

I want to be very clear that we understand what is going on here in case it changes your opinion (but i don't otherwise have one myself, so don't take this as "disagreeing").
We don't actually clone the load. At all. We don't even consider doing that. It's not really PRE, it just happens to be a thing that this GVN uses it's PRE infrastructure to notice.

This is a full redundancy elimination by merging existing values without cloning

What's happening is that store to load forwarding in each predecessor determines the value is already equivalent to each add, so replaces the load with a merge of those adds. That's why the stores are there.

Everything up to here, I follow and agree with. Except, PRE just inserts a PHI. It doesn't "merge the adds". The output of the PRE/FRE here is a phi which looks like this:
merge: ; preds = %bb1, %bb

%foo = phi i32 [ %add.2, %bb1 ], [ %add.1, %bb ]

That's correct.

The only reason it determines the adds are equivalent is because it ignores nsw during value numbering. Otherwise, it would not believe they were equivalent.

Danny, this is where I completely loose you. As far as I can tell, GVN/PRE is not eliminating the phi. The output appears to be completely correct. Can you clarify here?

if <some redundant computation> was originally "add", and is replaced with phi(add nsw, add) or phi(add nsw, add nsw) that's not okay.
Anything else is okay.

Er, the only correct output here is phi(add, add nsw) or phi(add, add). phi(add nsw, add nsw) or phi(add nsw, add) would be incorrect. They're adding the potential poison use (from the nsw flag) along a path not originally present in the source. Today, right now, PRE produces phi(add, add nsw) which is correct.

Why:
The scalar PRE patch is obviously necessary in such a world.

So why is the load PRE?

Imagine the above is:
pred1:
1 = add nsw
store
pred2:
2 = add
store
merge:
3= load
4= add
5 = use 4

In such a case, we will first transform this into:
pred1:
1 = add nsw
store
pred2:
2 = add
store
merge:
3= phi(1, 2)
4= add
5 = use 4
Okay so far.

Now we may replace 4 with 3, because they are all just equivalent to the add.
(admittedly, i have not stared at GVN to know whether there are reasons it won't do this, but there is no reason it shouldn't do this :P)

That is just as illegal as the Scalar PRE case by the rules above, as replacing an add with an add nsw.

I can't tell from your example enough detail to know what you're aiming for, but if there's a bug here, it's in replacing the add, not forming the phi node.

patchAndReplaceAlluses will not remove the nsw from the phi in this case.
We could teach it to do so.
The other option is this patch, and fixing load PRE when it creates the phi in the first place.

The phi is entirely correct and could be hand written. If GVN is considering adds with different flags equivalent without appropriate backpatching, that's a bug in GVN's replacement. End of story.

Revision Contents

Path

Size

lib/

Transforms/

Scalar/

GVN.cpp

10 lines

test/

Transforms/

GVN/

PRE/

pre-jt-load.ll

39 lines

Diff 122214

lib/Transforms/Scalar/GVN.cpp

Show First 20 Lines • Show All 759 Lines • ▼ Show 20 Lines	static Value ConstructSSAForLoadSet(LoadInst LI,
SSAUpdate.Initialize(LI->getType(), LI->getName());		SSAUpdate.Initialize(LI->getType(), LI->getName());

for (const AvailableValueInBlock &AV : ValuesPerBlock) {		for (const AvailableValueInBlock &AV : ValuesPerBlock) {
BasicBlock *BB = AV.BB;		BasicBlock *BB = AV.BB;

if (SSAUpdate.HasValueForBlock(BB))		if (SSAUpdate.HasValueForBlock(BB))
continue;		continue;

		// We must clear nuw/nsw for original value. Otherwise we
		// can see the poison value where it was not originally.
		Value *V = AV.AV.Val.getPointer();
		if (V)
		if (auto *I = dyn_cast<Instruction>(V))
		if (isa<OverflowingBinaryOperator>(I)) {
		skatkovAuthorUnsubmitted Not Done Reply Inline Actions Side question, will usage of I->dropPoisonGeneratingFlags() better here? skatkov: Side question, will usage of I->dropPoisonGeneratingFlags() better here?
		I->setHasNoSignedWrap(false);
		I->setHasNoUnsignedWrap(false);
		}

SSAUpdate.AddAvailableValue(BB, AV.MaterializeAdjustedValue(LI, gvn));		SSAUpdate.AddAvailableValue(BB, AV.MaterializeAdjustedValue(LI, gvn));
}		}

// Perform PHI construction.		// Perform PHI construction.
return SSAUpdate.GetValueInMiddleOfBlock(LI->getParent());		return SSAUpdate.GetValueInMiddleOfBlock(LI->getParent());
}		}

Value AvailableValue::MaterializeAdjustedValue(LoadInst LI,		Value AvailableValue::MaterializeAdjustedValue(LoadInst LI,
▲ Show 20 Lines • Show All 1,875 Lines • Show Last 20 Lines

test/Transforms/GVN/PRE/pre-jt-load.ll

This file was added.

				; RUN: opt < %s -gvn -enable-pre -jump-threading -S \| FileCheck %s

				@H = common global i32 0
				@G = common global i32 0

				define i32 @test(i1 %cond, i32 %v) nounwind {
				; CHECK-LABEL: @test
				entry:
				br i1 %cond, label %bb, label %bb1

				bb:
				; CHECK: store
				; CHECK-NOT: br label %return
				%add.1 = add nuw nsw i32 %v, -1
				store i32 %add.1, i32* @G, align 4
				br label %merge

				bb1:
				; CHECK: store
				; CHECK-NOT: br label %return
				%add.2 = add i32 %v, -1
				store i32 %add.2, i32* @G, align 4
				br label %merge

				merge:
				%foo = load i32, i32* @G, align 4
				efriedmaUnsubmitted Not Done Reply Inline Actions Transforming this load into a phi in GVN is fine without stripping nsw. That should be more obvious if you replace "%v" in one of the additions with a different value. efriedma: Transforming this load into a phi in GVN is fine without stripping nsw. That should be more…
				%cmp = icmp sgt i32 %foo, 0
				br i1 %cmp, label %action, label %return

				action:
				; CHECK: store
				; CHECK-NEXT: br label %return
				store i32 %foo, i32* @H, align 4
				br label %return

				return:
				%p = phi i32 [0, %merge], [1, %action]
				ret i32 %p
				}