This is an archive of the discontinued LLVM Phabricator instance.

[Attributor] AAUndefinedBehavior: Check for branches on undef value.
ClosedPublic

Authored by baziotis on Dec 21 2019, 2:05 PM.

Details

Summary

A branch is considered UB if it depends on an undefined / uninitialized value.
At this point this handles simple UB branches in the form: br i1 undef, ...
We query AAValueSimplify to get a value for the branch condition, so the branch
can be more complicated than just: br i1 undef, ....

Diff Detail

Event Timeline

baziotis created this revision.Dec 21 2019, 2:05 PM
Herald added a project: Restricted Project. · View Herald Transcript
jdoerfert added inline comments.Dec 21 2019, 11:30 PM
llvm/lib/Transforms/IPO/Attributor.cpp
2049–2051

Split it in two calls since the pointer stuff and the control flow stuff (for branch, switch, ...) is conceptually different.

Note that there's somewhat relevant prior art,
e.g. llvm/lib/Transforms/Instrumentation/PoisonChecking.cpp (D64215)
and https://github.com/AliveToolkit/alive2.
Would be great to not have this much duplication, but a single all-powerful one, but not sure it's possible (yet?)

baziotis updated this revision to Diff 235044.Dec 22 2019, 4:47 AM
  • Separate branch instructions and memory accessing instructions.
baziotis marked an inline comment as done.Dec 22 2019, 5:07 AM

Note that there's somewhat relevant prior art,
e.g. llvm/lib/Transforms/Instrumentation/PoisonChecking.cpp (D64215)
and https://github.com/AliveToolkit/alive2.
Would be great to not have this much duplication, but a single all-powerful one, but not sure it's possible (yet?)

Thanks for the reference.
I guess you mean for things like this: https://github.com/AliveToolkit/alive2/blob/master/tests/unit/undef.opt, i.e. propagating
the undef.
For the PoisonChecking that you referenced, from a quick glance I could not see how it relates :/ Could you be more specific?
It seems that poison checking adds runtime checks.

llvm/lib/Transforms/IPO/Attributor.cpp
2036–2039

Note that here it's possibly wrong and I forgot to comment yesterday. I didn't know exactly how to do it but here's the problem.

  • If it has a value and it is not undef, then it's not UB -> OK
  • If it has a value and it's undef, then it is UB -> OK

But...

  • If it doesn't have a value, we consider it not UB. Well, I'm not familiar with the internals of AAValueSimplify, but looking comments around, there were some like "No value _yet_". Which means that right now we may not have a value but we could in the future.

And that value may be undef.
This is no problem for this patch as it tries to handle cases where undef is caught in (hasValue && isa<Undef>). But eventually, AAValueSimplify could uncover things for us here and we may lose them because we put the instruction to NoUBInsts.

uenoku added a comment.EditedDec 22 2019, 7:20 AM

In the current patch, I think an instruction may be inserted to both UBInsts and NoUBInsts in some cases when AAValueSimplify changes its assumption.

In my understanding, the idea of this deduction is
" We assume br instruction causes UB. If you can prove that the instruction *doesn't* cause UB, we remove that assumption".

So I think we don't need to have both UBInsts and NoUBInsts because if an instruction is not in NoUBInsts then it is assumed to cause UB.
( You can find that AAISDeadFunction is similar. If BB is not in AssumedLiveBlocks, then it is assumed to be Dead`).

In the current patch, I think an instruction might be inserted to both UBInsts and NoUBInsts in some cases when AAValueSimplify changes its assumption.

Yes, I forgot to add the check that guards this the previous patches (check the first if in InspectMemAccessInstForUB).

In my understanding, the idea of this deduction is
" We assume br instruction causes UB. If you can prove that the instruction *doesn't* cause UB, we remove that assumption".

Yes exactly. Well, my initial assumption was this:

SimplifiedV = ...;
if (!SimplifiedV.hasValue())
  ... // Do nothing for now, no value yet.
else {
  Val = SimplifiedV.getValue();
  if (Val is undef)
    UBInsts.insert;
  else
    NoUBInsts.insert;
}

Then, now that I see that again apparently I got confused somewhere so let me change this quickly. :P
Is this what you had in mind ?

So I think we don't need to have both UBInsts and NoUBInsts because if an instruction is not in NoUBInsts then it is assumed to cause UB.
( You can find that AAISDeadFunction is similar. If BB is not in AssumedLiveBlocks, then it is assumed to be Dead`).

As I noted above, these 2 sets should have no common elements. With that, having both of them just makes some things easier
(like stats, looping over UB instructions in manifest, checking if an instruction is UB).

uenoku added a comment.EditedDec 22 2019, 8:00 AM

In the current patch, I think an instruction might be inserted to both UBInsts and NoUBInsts in some cases when AAValueSimplify changes its assumption.

Yes, I forgot to add the check that guards this the previous patches (check the first if in InspectMemAccessInstForUB).

In my understanding, the idea of this deduction is
" We assume br instruction causes UB. If you can prove that the instruction *doesn't* cause UB, we remove that assumption".

Yes exactly. Well, my initial assumption was this:

SimplifiedV = ...;
if (!SimplifiedV.hasValue())
  ... // Do nothing for now, no value yet.
else {
  Val = SimplifiedV.getValue();
  if (Val is undef)
    UBInsts.insert;
  else
    NoUBInsts.insert;
}

Then, now that I see that again apparently I got confused somewhere so let me change this quickly. :P
Is this what you had in mind ?

So I think we don't need to have both UBInsts and NoUBInsts because if an instruction is not in NoUBInsts then it is assumed to cause UB.
( You can find that AAISDeadFunction is similar. If BB is not in AssumedLiveBlocks, then it is assumed to be Dead`).

As I noted above, these 2 sets should have no common elements. With that, having both of them just makes some things easier
(like stats, looping over UB instructions in manifest, checking if an instruction is UB).

bool isAssumedToCauseUB(I*){
   return !NoUBInsts.count(I);
}
...
SimplifiedV = ...;
if (!SimplifiedV.hasValue())
  // Do nothing. Assumption holds (Because the value might be simplified to `undef`)
else {
  Val = SimplifiedV.getValue();
  if (Val is undef)
    // Do nothing. Assumption holds.
  else
    NoUBInsts.insert;
}

I think this will work. It is no problem to have both 2 sets (but a bit redundant). If so,

SimplifiedV = ...;
if (!SimplifiedV.hasValue())
  // Assumption holds (Because the value might be simplified to `undef`)
  UBInsts.insert
else {
  Val = SimplifiedV.getValue();
  if (Val is undef)
    //Assumption holds.
    UBInsts.insert
  else
    NoUBInsts.insert;
    UBInsts.remove;
}
baziotis added a comment.EditedDec 22 2019, 8:03 AM
SimplifiedV = ...;
if (!SimplifiedV.hasValue())
  ... // Do nothing for now, no value yet.
else {
  Val = SimplifiedV.getValue();
  if (Val is undef)
    UBInsts.insert;
  else
    NoUBInsts.insert;
}

Note that while this is what makes sense to me, Johannes told me that if SimplifiedV gives None (i.e. it doesn't have value), then we can assume
that it is undef but I don't know why this is true.

Edit: Just saw your comment, which you seem to not assume that.
In your code, why insert it in UBInsts and then remove it? Since we query AAValueSimplify, isn't that going to call us again and thus at some point get the value (which in turn means we don't insert it and just wait for when we'll be called again).

Note that while this is what makes sense to me, Johannes told me that if SimplifiedV gives None (i.e. it doesn't have value), then we can assume
that it is undef but I don't know why this is true.

A state of SimplifiedV is Optional<Value*> representing its simplified associated value V. Initially, set to None. If a candidate V1 is found, it is set to Some(V1). If another candidate V2 is found, trying to unify V1 and V2.
When the simplified value is None, we haven't found a candidate yet, or there is no candidate(like int f(x) { return f(x);}). So you can choose any arbitrary value(=undef) in that assumption.

Edit: Just saw your comment, which you seem to not assume that.
In your code, why insert it in UBInsts and then remove it? Since we query AAValueSimplify, isn't that going to call us again and thus at some point get the value (which in turn means we don't insert it and just wait for when we'll be called again).

The simplified value may change (None -> concrete value) so we need to track.

Note that while this is what makes sense to me, Johannes told me that if SimplifiedV gives None (i.e. it doesn't have value), then we can assume
that it is undef but I don't know why this is true.

A state of SimplifiedV is Optional<Value*> representing its simplified associated value V. Initially, set to None. If a candidate V1 is found, it is set to Some(V1). If another candidate V2 is found, trying to unify V1 and V2.
When the simplified value is None, we haven't found a candidate yet, or there is no candidate(like int f(x) { return f(x);}). So you can choose any arbitrary value(=undef) in that assumption.

I agree but I assumed wrongly apparently. You see, with "we can assume it is undef" I thought that I should also add it to the UBInsts set. Up to that point, if I were to do that, it would be wrong since
if something is added to UBInsts, it would never be checked again. But apparently Johannes did not mean this.

Edit: Just saw your comment, which you seem to not assume that.
In your code, why insert it in UBInsts and then remove it? Since we query AAValueSimplify, isn't that going to call us again and thus at some point get the value (which in turn means we don't insert it and just wait for when we'll be called again).

The simplified value may change (None -> concrete value) so we need to track.

I agree yes. My point was: At some point we _will_ get a concrete value (or, candidate), aren't we? If so, why add it and then remove it from the set instead of just waiting until we get a value? Hence my initial code.
But now I realize that we have to add it since e.g. isAssumedToCauseUB() and stats depend on it. So, I think your version with only NoUBInsts set is better with an additional UBInstsSize (for stats).

I agree but I assumed wrongly apparently. You see, with "we can assume it is undef" I thought that I should also add it to the UBInsts set. Up to that point, if I were to do that, it would be wrong since
if something is added to UBInsts, it would never be checked again. But apparently Johannes did not mean this.

I got the point! I missed that UB is intended to be used for liveness. Sorry about that. I'll rethink the problem. But it seems for me that current implementation regards *assumed* UBInsts as *known* UBInsts. Because once I is assumed to have UB, we never visit I. I guess this will cause invalid deduction.

I agree yes. My point was: At some point, we _will_ get a concrete value (or, candidate), aren't we? If so, why add it and then remove it from the set instead of just waiting until we get a value? Hence my initial code.

I think we won't get a concrete value for None in the iterations.

baziotis added a comment.EditedDec 22 2019, 10:02 AM

I agree but I assumed wrongly apparently. You see, with "we can assume it is undef" I thought that I should also add it to the UBInsts set. Up to that point, if I were to do that, it would be wrong since
if something is added to UBInsts, it would never be checked again. But apparently Johannes did not mean this.

I got the point! I missed that UB is intended to be used for liveness. Sorry about that. I'll rethink the problem. But it seems for me that current implementation regards *assumed* UBInsts as *known* UBInsts. Because once I is assumed to have UB, we never visit I. I guess this will cause invalid deduction.

I agree yes. My point was: At some point, we _will_ get a concrete value (or, candidate), aren't we? If so, why add it and then remove it from the set instead of just waiting until we get a value? Hence my initial code.

I think we won't get a concrete value for None in the iterations.

Well, actually I am sorry for that. The reason I haven't uploaded a diff yet is that I came across what you just said. Basically, I changed the code to keep 2 sets, one for KnownNoUBInsts and another for KnownUBInsts. With that the code becomes quite better as we can do:

if (!SimplifiedV.hasValue()) {
  // No value yet, we can assume any value: assume this is undef BUT
  // this is not _known_ so we don't put in the known set.
} else {
  if (undef) {
    // insert in KnownUB
  } else {
    // insert in NoUB.
  }
}

That is better because:
a) We can use the KnownUB set for the stats
b) We can use KnownNoUB set for the isAssumedToCauseUB.

But I can't progress because as you said, for some reason we never get a concrete value in the iterations. So, in the manifest there are 2 cases:

  1. make unreachable only those in KnownUB. The problem with that is that exactly because we don't get a concrete value, in something like this:
define i1 @ret_undef() {
  ret i1 undef
}

define void @test() {
  %cond = call i1 @ret_undef()
  br i1 %cond, ...

the branch never makes it to the KnownUB.

  1. Make unreachable any instruction that isAssumedToCauseUB. Those are all the instructions that are not in KnownNoUB. That apart from the fact that it doesn't seem all that correct,

it also causes stack dumps. :P

Edit: Forgot to mention that of course we never re-process any instruction in either of the sets.

Well, scratch all that, this is wrong as well. For one, we can't assume that an instruction is UB just because it isn't in the KnownNoUBInsts. If we go with that implementation, we will consider UB instructions such as unconditional branches or really any instruction that it is never checked.

I think you can't use assumption (getAssumedSimplifiedValue) for the known information. You need to use only known information.

baziotis added a comment.EditedDec 22 2019, 10:31 AM

I think you can't use assumption (getAssumedSimplifiedValue) for the known information. You need to use only known information.

In what part you're referring to? case 2) of the manifest cases?

Edit: i.e. that I end up making blocks unreachable on assumed info?

uenoku added a comment.EditedDec 22 2019, 10:39 AM
if (!SimplifiedV.hasValue()) {
  // No value yet, we can assume any value: assume this is undef BUT
  // this is not _known_ so we don't put in the known set.
} else {
  These are also assumption. You can't use these as known
  if (undef) {
    // insert in KnownUB 
  } else {
    // insert in NoUB.
  }
}
if (!SimplifiedV.hasValue()) {
  // No value yet, we can assume any value: assume this is undef BUT
  // this is not _known_ so we don't put in the known set.
} else {
  These are also assumption. You can't use these as known
  if (undef) {
    // insert in KnownUB 
  } else {
    // insert in NoUB.
  }
}

Oh right, I have to call ValueSimplifyAA.isKnown().

I haven't read all of the discussion so it might as well be possible you converged on this already but I'll say it anyway:

Assumed information can used other assumed information, known information only known information.
You can make AAUndefBehavior track assumed information instead of known information but then we need to look at the not yet known facts in every updateImpl iteration again to make sure the assumed status is still justified.

I haven't read all of the discussion so it might as well be possible you converged on this already but I'll say it anyway:

Assumed information can used other assumed information, known information only known information.
You can make AAUndefBehavior track assumed information instead of known information but then we need to look at the not yet known facts in every updateImpl iteration again to make sure the assumed status is still justified.

Alright, that makes sense!
So, one quick question that I hope will help solve some issues: If !SimplifiedV.hasValue() but ValueSimplifyAA.isKnown(), then it is known that the value is undef?

I haven't read all of the discussion so it might as well be possible you converged on this already but I'll say it anyway:

Assumed information can used other assumed information, known information only known information.
You can make AAUndefBehavior track assumed information instead of known information but then we need to look at the not yet known facts in every updateImpl iteration again to make sure the assumed status is still justified.

Alright, that makes sense!
So, one quick question that I hope will help solve some issues: If !SimplifiedV.hasValue() but ValueSimplifyAA.isKnown(), then it is known that the value is undef?

It can't happen but it is semantically correct.

baziotis added a comment.EditedDec 23 2019, 1:25 AM

Alright, that makes sense!
So, one quick question that I hope will help solve some issues: If !SimplifiedV.hasValue() but ValueSimplifyAA.isKnown(), then it is known that the value is undef?

It can't happen but it is semantically correct.

It happens though :) Unless I messed up something. It happens with e.g. this code:

define i1 @ret_undef() {
  ret i1 undef
}

define void @cond_br() {
  %cond = call i1 @ret_undef()
  br i1 %cond, label %t, label %e
t:
  ret void
e:
  ret void
}

And in the Attributor:

if (!SimplifiedV.hasValue()) {
  if (ValueSimplifyAA.isKnown())
    dbgs() << "IS IT UNDEF?\n";
  ...
}

I see the message. Sorry btw that I don't know exactly how AAValueSimplify works. When I started this patch, I assumed it was in everyone's best interest
to not spend time in it right now, so I'm guessing from looking small pieces of its code.

uenoku added a comment.EditedDec 23 2019, 1:46 AM

Alright, that makes sense!
So, one quick question that I hope will help solve some issues: If !SimplifiedV.hasValue() but ValueSimplifyAA.isKnown(), then it is known that the value is undef?

It can't happen but it is semantically correct.

It happens though :) Unless I messed up something. It happens with e.g. this code:

define i1 @ret_undef() {
  ret i1 undef
}

define void @cond_br() {
  %cond = call i1 @ret_undef()
  br i1 %cond, label %t, label %e
t:
  ret void
e:
  ret void
}

And in the Attributor:

if (!SimplifiedV.hasValue()) {
  if (ValueSimplifyAA.isKnown())
    dbgs() << "IS IT UNDEF?\n";
  ...
}

I see the message. Sorry btw that I don't know exactly how AAValueSimplify works. When I started this patch, I assumed it was in everyone's best interest
to not spend time in it right now, so I'm guessing from looking small pieces of its code.

Sorry for my lack of words. I thought you were talking about in updateImpl. I think it can't happen in updates but can happen once reaches to a fix point.

Sorry for my lack of words. I thought you were talking about in updateImpl. It can't happen in updates but can happen once reaches to a fix point.

I am talking about updateImpl(), the code above is inside updateImpl(). It seems true though that if it happens, then a fixpoint has been reached.
Essentially was assumption is that AAValueSimplify seems to return behave weirdly when the value undef (i.e. it returns None while an undef value might be known). So with that,
if it happens, we can deduce that the instruction is UB in updateImpl() and that pretty much solves the previous problems.

uenoku added a comment.EditedDec 23 2019, 5:56 AM

Sorry for my lack of words. I thought you were talking about in updateImpl. It can't happen in updates but can happen once reaches to a fix point.

I am talking about updateImpl(), the code above is inside updateImpl(). It seems true though that if it happens, then a fixpoint has been reached.
Essentially was assumption is that AAValueSimplify seems to return behave weirdly when the value undef (i.e. it returns None while an undef value might be known). So with that,
if it happens, we can deduce that the instruction is UB in updateImpl() and that pretty much solves the previous problems.

Sorry for confusing you. I have missed that the shortcut was introduced(https://github.com/llvm/llvm-project/commit/2dad729f0c7b8665d362baecd8ff52449b26051d). I agree that None and isKnown() means undef.

SimplifiedV = ...;
  if (Simplified. isKnown() && (!SimplifiedV.hasValue() || (SimplifiedV.getValue() == undef))
    KnownUBInsts.insert;
}

Finally, I think this will work anyway! I should have suggested this to you first;) Really sorry!

Sorry for confusing you. I have missed that the shortcut was introduced(https://github.com/llvm/llvm-project/commit/2dad729f0c7b8665d362baecd8ff52449b26051d). I agree that None and isKnown() means undef.

SimplifiedV = ...;
  if (Simplified. isKnown() && (!SimplifiedV.hasValue() || (SimplifiedV.getValue() == undef))
    KnownUBInsts.insert;
}

Finally, I think this will work anyway! I should have suggested this to you first;) Really sorry!

No worries, thanks for your time. :) Yes it will work, I've tested it yesterday but I was just waiting for an affirmation on the aforementioned question.

baziotis updated this revision to Diff 235160.EditedDec 23 2019, 9:55 AM
  • Use only known sets: One for known UB instructions and one for known _not_ UB instructions. The analysis correctness depends on the NoUB one. The 2 sets are used so that no same instruction is re-processed. The knownUB is also used for stats and also to change blocks to unreachable in manifest (we make unreachable the blocks that contain the instructions in this set). Note: These 2 sets are disjoint.
  • We use AAValueSimplify but only if the value is known. There are 2 caveats:
    1. If the value is known but getAssumedSimplifiedValue() gives us None (no value), then we can assume it has reached a fixpoint and the value is undef.
    2. For some reason, for branches where undef is actually written (i.e. br i1 undef, ...), the AAValueSimplify doesn't tell us it is known. Thus, before we even start querying AAValueSimplify, we check the standard LLVM Value if it is undef.

Ongoing:

  1. Because of the point 2) above, I don't know whether we ever need the case (isKnown && hasValue && the value is undef), since that was supposed to handled the simple cases, but it doesn't because we don't get the isKnown part.
  2. I think it needs more tests.

Disclaimer: I read only part of the conversation.

I see the message. Sorry btw that I don't know exactly how AAValueSimplify works. When I started this patch, I assumed it was in everyone's best interest
to not spend time in it right now, so I'm guessing from looking small pieces of its code.

Sorry for my lack of words. I thought you were talking about in updateImpl. I think it can't happen in updates but can happen once reaches to a fix point.

FWIW, I think @uenoku comment is correct here but it might help to elaborate:

The way AAValueSimplify is build ensures that only once a fixpoint is reached the simplified value is "known". One could also say, once a simplified value is known we know a fixpoint had to be reached.
Now the interesting part here is that this means a fixpoint for the AAValueSimplify object. The Attributor will determine fixpoints for attributes eagerly, thus even if others are not there yet and still iterating. It will even inform them (via indicateOptimisticFixpoint) that they reached a fixpoint and can use the assumed value as known from now on. That is why you can see the "odd behavior" in a different objects updateImpl.

baziotis updated this revision to Diff 235209.Dec 24 2019, 1:26 AM

Added tests that are FIXMEs.

  • Test for instruction that should propagate undef.
  • Test for uninitialized value.

I see the message. Sorry btw that I don't know exactly how AAValueSimplify works. When I started this patch, I assumed it was in everyone's best interest
to not spend time in it right now, so I'm guessing from looking small pieces of its code.

Sorry for my lack of words. I thought you were talking about in updateImpl. I think it can't happen in updates but can happen once reaches to a fix point.

FWIW, I think @uenoku comment is correct here but it might help to elaborate:

The way AAValueSimplify is build ensures that only once a fixpoint is reached the simplified value is "known". One could also say, once a simplified value is known we know a fixpoint had to be reached.
Now the interesting part here is that this means a fixpoint for the AAValueSimplify object. The Attributor will determine fixpoints for attributes eagerly, thus even if others are not there yet and still iterating. It will even inform them (via indicateOptimisticFixpoint) that they reached a fixpoint and can use the assumed value as known from now on. That is why you can see the "odd behavior" in a different objects updateImpl.

Yes, thanks, it helped (also @uenoku helped a lot). But I still don't get why the following happens:

  1. For some reason, for branches where undef is actually written (i.e. br i1 undef, ...), the AAValueSimplify doesn't tell us it is known. Thus, before we even start querying AAValueSimplify, we check the standard LLVM Value if it is undef.

It seems to me that for constants or undef, this should be known. Looking at initialize() of AAValueSimplifyFloating, for constants and undef, it indicates a pessimistic fixpoint and I don't understand why.

uenoku added a comment.EditedDec 24 2019, 3:48 AM

It seems to me that for constants or undef, this should be known. Looking at initialize() of AAValueSimplifyFloating, for constants and undef, it indicates a pessimistic fixpoint and I don't understand why.

At first concept of AAValueSimplify, I set the semantics of optimistic state as that the value is simplified actually. So if the state is pessimistic, getAssumedSimplifiedValue returns the original associated value.
(https://reviews.llvm.org/D66967#1659664)

Base on that, if the associated value is constant or undef, I fixed the state as pessimistic because we can't simplify the value more.
But now it seems it's more useful to reach an optimistic fixpoint when the value is constant or undef, so I can agree to change it(D71852).

It seems to me that for constants or undef, this should be known. Looking at initialize() of AAValueSimplifyFloating, for constants and undef, it indicates a pessimistic fixpoint and I don't understand why.

At first concept of AAValueSimplify, I set the semantics of optimistic state as that the value is simplified actually. So if the state is pessimistic, getAssumedSimplifiedValue returns the original associated value.
(https://reviews.llvm.org/D66967#1659664)

Base on that, if the associated value is constant or undef, I fixed the state as pessimistic because we can't simplify the value more.

Aha, ok I got the idea.

But now it seems it's more useful to reach an optimistic fixpoint when the value is constant or undef, so I can agree to change it(D71852).

Great, thank you for the quick action! After reading the above again, maybe then we want to not break the conceptual idea of AAValueSimplify. Actually let me comment on the other revision.

But now it seems it's more useful to reach an optimistic fixpoint when the value is constant or undef, so I can agree to change it(D71852).

Great, thank you for the quick action! After reading the above again, maybe then we want to not break the conceptual idea of AAValueSimplify. Actually let me comment on the other revision.

We can change things if it makes sense. What I would prefer to do wrt. AAValueSimplify is to get D68934 in. I'll put a new revision online as soon as I can, make the diff easier to read and test it some more. Though the version that is online should be very close to what it'll be.

I inlined remaining minor comments from my side. @uenoku you should finish the review and accept once your happy.

llvm/lib/Transforms/IPO/Attributor.cpp
2057

Just count, no need to check it against 0.

2086

!XXX.size() -> XXX.empty()

baziotis marked an inline comment as done.Dec 24 2019, 2:29 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2057

Ok, I just did it because otherwise there's an implicit cast to bool and well.. whatever, let me change it. :P

baziotis updated this revision to Diff 235235.EditedDec 24 2019, 2:40 PM
  • Small changes.

@uenoku please update me if you want something changed. Also, after this diff I was planning to make another to use AAValueSimplify in the other instructions as well. If you can, update me whether it's better to do it here.

Oh, also whether I should wait for https://reviews.llvm.org/D71852 to be committed.

  • Small changes.

@uenoku please update me if you want something changed. Also, after this diff I was planning to make another to use AAValueSimplify in the other instructions as well. If you can, update me whether it's better to do it here.

Oh, also whether I should wait for https://reviews.llvm.org/D71852 to be committed.

I'd wait wrt. AAValueSimplify.
I'd do the following two first:

  1. UnreachableInst is UB (though no need to replace it in the manifest.
  2. Given an instruction, determine if that instruction or one later that is known to be executed is known to cause UB. We then hook that up to the AAIsDead.
uenoku added inline comments.Dec 24 2019, 9:27 PM
llvm/lib/Transforms/IPO/Attributor.cpp
2050

Why don't we need to check for KnownUBInsts?

baziotis marked an inline comment as done.Dec 25 2019, 2:50 AM

I'd wait wrt. AAValueSimplify.

I guess you meant "wait to be committed" (and not wait as to not use it in the memory accessing functions yet).

I'd do the following two first:

  1. UnreachableInst is UB (though no need to replace it in the manifest.
  2. Given an instruction, determine if that instruction or one later that is known to be executed is known to cause UB. We then hook that up to the AAIsDead.

Regarding 2), would it suffice to: Go through the next instructions and follow (alive) branches either by taking unconditional branches, or branches that have known value (e.g. using AAValueSimplify) true.
Note: Walk them in a DFS kind of way until we find a UB instruction (or none at all).

llvm/lib/Transforms/IPO/Attributor.cpp
2050

Oh yes, I've forgotten about that. I should have updated it when we ended up in using only known parts.
So, the correctness of this procedure is described in the comment of KnownNoUBInsts. Since the size KnownUBInsts is also monotonically increasing and bounded, then the "sum" of these 2 "functions" is also monotonically increasing and bounded.
Hence, we can (and should) include that (part of this reasoning was why inserting and removing from the set was going to give us problems).
Probably you have thought of all that but just to be sure we're on the same page. :)

uenoku added inline comments.Dec 25 2019, 3:40 AM
llvm/lib/Transforms/IPO/Attributor.cpp
2050

Ok, thanks.

2125–2126

Could you add a comment here to say that instruction in NoUBInst might cause UB?

uenoku accepted this revision.EditedDec 25 2019, 3:43 AM

LGTM otherwise

This revision is now accepted and ready to land.Dec 25 2019, 3:43 AM
baziotis marked an inline comment as done.Dec 25 2019, 7:46 AM

LGTM otherwise

Great! Your change on AAValueSimplify was committed along with my previous revisions, so let me do an updated diff based on that (and also address the comments).
I think it's better to do Johannes' suggestions on a different revision.

llvm/lib/Transforms/IPO/Attributor.cpp
2125–2126

Of course.

I'd wait wrt. AAValueSimplify.

I guess you meant "wait to be committed" (and not wait as to not use it in the memory accessing functions yet).

Yes. Also wrt. modifying AAVAlueSimply further. Using it is not a problem.

I'd do the following two first:

  1. UnreachableInst is UB (though no need to replace it in the manifest.
  2. Given an instruction, determine if that instruction or one later that is known to be executed is known to cause UB. We then hook that up to the AAIsDead.

Regarding 2), would it suffice to: Go through the next instructions and follow (alive) branches either by taking unconditional branches, or branches that have known value (e.g. using AAValueSimplify) true.
Note: Walk them in a DFS kind of way until we find a UB instruction (or none at all).

You want to use MustBeExecutedContextExplorer. You get it like this:

MustBeExecutedContextExplorer &Explorer =
    A.getInfoCache().getMustBeExecutedContextExplorer();

You iterate over it like you would with a container like this:

for (MustBeExecutedIterator &It : Explorer.range(Instruction))

There was a patch by @uenoku to deal with conditional and the merging of states when we are exploring but I don't see it in-tree and I forgot which one it was.
Nevertheless, it should immediately work for straight line code and code that is in "some merge block" after a conditional.

LGTM otherwise

Great! Your change on AAValueSimplify was committed along with my previous revisions, so let me do an updated diff based on that (and also address the comments).

Assuming I understand you correctly, you should.

I think it's better to do Johannes' suggestions on a different revision.

Yes.

@uenoku Can you commit this for @baziotis? (https://llvm.org/docs/DeveloperPolicy.html#commit-messages)

I'd wait wrt. AAValueSimplify.

I guess you meant "wait to be committed" (and not wait as to not use it in the memory accessing functions yet).

Yes. Also wrt. modifying AAVAlueSimply further. Using it is not a problem.

Ok, noted.

I'd do the following two first:

  1. UnreachableInst is UB (though no need to replace it in the manifest.
  2. Given an instruction, determine if that instruction or one later that is known to be executed is known to cause UB. We then hook that up to the AAIsDead.

Regarding 2), would it suffice to: Go through the next instructions and follow (alive) branches either by taking unconditional branches, or branches that have known value (e.g. using AAValueSimplify) true.
Note: Walk them in a DFS kind of way until we find a UB instruction (or none at all).

You want to use MustBeExecutedContextExplorer. You get it like this:

MustBeExecutedContextExplorer &Explorer =
    A.getInfoCache().getMustBeExecutedContextExplorer();

You iterate over it like you would with a container like this:

for (MustBeExecutedIterator &It : Explorer.range(Instruction))

There was a patch by @uenoku to deal with conditional and the merging of states when we are exploring but I don't see it in-tree and I forgot which one it was.
Nevertheless, it should immediately work for straight line code and code that is in "some merge block" after a conditional.

Aha, ok, thank you! I'll try it in the next revision.

@uenoku Can you commit this for @baziotis? (https://llvm.org/docs/DeveloperPolicy.html#commit-messages)

Sorry for being late, I would like to have already updated a diff so that @uenoku can commit it but I had some problems with compilation of LLVM when I pulled the last changes.

uenoku added a comment.EditedDec 25 2019, 9:14 AM

There was a patch by @uenoku to deal with conditional and the merging of states when we are exploring but I don't see it in-tree and I forgot which one it was.

It was D65593. I had forgotten too:). It is a good opportunity to rebase. I'll work on.
The idea is that if we know there is UB in both branches, we can say there is UB regardless of a condition value.

@uenoku Can you commit this for @baziotis? (https://llvm.org/docs/DeveloperPolicy.html#commit-messages)

Sure.

baziotis marked an inline comment as done.Dec 25 2019, 10:10 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2125–2126

I was thinking about it and it seems to me that having this set is kind of misleading (both the naming and the conceptual idea around it).
That is, it doesn't seem that there's a way (at least for now) to know for sure that an instruction is not UB, unless we have a constant.
So, I'm proposing to change the conceptual idea (and the naming) like this. An instruction can be in 3 categories:

  1. Known to cause UB (AAUndefinedBehavior could prove it) - make an actual set for it.
  2. Known to not cause UB (AAUndefinedBehavior could prove it because of a constant). Here would go only branch instructions, which could have a constant condition and not be UB. Memory accessing instructions can't AFAIK because if they have a constant, it will either be null (so UB) or undef (so, UB). Make another set for those (so that we don't re-process them in every update)
  3. Assumed to cause UB. Basically, every other instruction.

What we have now is sort of this scheme, but the KnownNoUBInsts are not actually known to not be UB as you mentioned and hence I think this makes the understanding of the code difficult.

(For reference, and you may as well skip that since this comment is already big, I think the current scheme is something like:
An instruction can be:

  1. Known to cause UB (AAUndefinedBehavior could prove it).
  2. Assumed to not cause UB. AAUndefinedBehavior could _not_ prove it but still it optimistically assumes it doesn't cause UB

(which is like "what??" since AAUndefinedBehavior is supposed to optimistically assume for UB).
...)
Anyway, looking forward to your opinion and sorry for the big comment (on an already accepted revision).

jdoerfert added inline comments.Dec 25 2019, 4:00 PM
llvm/lib/Transforms/IPO/Attributor.cpp
2125–2126

The two interesting categories are:

  1. Known to cause UB (and we proved it)
  2. Assumed to cause UB (every updateImpl invocation we found a reason to assume it. This is probably not what this Attribute does but it should eventually).

The third category which is tracked so we don't revisit instructions that do not fall into the first two is:

  1. Not assumed to cause UB. We failed to argue it causes UB in an updateImpl invocation. This includes things that cannot cause UB! (We track things that do cause UB not the other way around.)
baziotis marked an inline comment as done.Dec 25 2019, 4:20 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2125–2126

Ok, got it, thanks! FWIW, I didn't propose the "known no UB" set because I think it is a good idea to track "no UB" instructions.
Rather, that schemeis overoptimistic because for every instruction that we could not prove it is no UB, it assumes it is UB.
Yours is clearly better. Meaning we should have a reason to assume an instruction is UB.
I'll update it asap.

baziotis updated this revision to Diff 235310.Dec 25 2019, 6:11 PM
  • Clarification on the uses of the 2 sets.

I was trying to get AAValueSimplify to work on the memory accessing instructions as well but at some point along the way
I ran the whole Attributor suite. Important: Unfortunately, it seems that this patch breaks 3 other test cases. Specifically, the changeToUnreachable part. I couldn't
understand why, I'll come back tomorrow. Please feel free to propose ideas (it seems that AAIsDead has problems with it).

So, I tried to make a reduced test case that fails:

define void @fails() {
entry:
  br i1 undef, label %end, label %end

end:
  %phi = phi i32* [ null, %entry ], [ null, %entry ]
  %a = load i32, i32* %phi, align 4
  ret void
}

It's based on the test case IPConstantProp/PR26044.ll, which also fails. The interesting things are:

  1. If we remove the load, it doesn't fail.
  2. If we remove the phi it doesn't fail (that's also true for PR26044.ll).
  3. As noted yesterday, if we remove the changeToUnreachable in AAUndefinedBehavior::manifest(), it doesn't fail (all the test cases basically fail because of this part).
  4. Also, that AAIsDead seems to have a problem with it.

My guess is that the fact that we change a branch instruction to unreachable means that there's one less predecessor for the then and else blocks of the branch.
If one of these 2 hasn't been converted to unreachable and contains a phi, then we have problems as now the BB has one less predecessor than those listed in the phi.

uenoku added a comment.EditedDec 26 2019, 8:38 AM

So, I tried to make a reduced test case that fails:

define void @fails() {
entry:
  br i1 undef, label %end, label %end

end:
  %phi = phi i32* [ null, %entry ], [ null, %entry ]
  %a = load i32, i32* %phi, align 4
  ret void
}

It's based on the test case IPConstantProp/PR26044.ll, which also fails. The interesting things are:

  1. If we remove the load, it doesn't fail.
  2. If we remove the phi it doesn't fail (that's also true for PR26044.ll).
  3. As noted yesterday, if we remove the changeToUnreachable in AAUndefinedBehavior::manifest(), it doesn't fail (all the test cases basically fail because of this part).
  4. Also, that AAIsDead seems to have a problem with it.

My guess is that the fact that we change a branch instruction to unreachable means that there's one less predecessor for the then and else blocks of the branch.
If one of these 2 hasn't been converted to unreachable and contains a phi, then we have problems as now the BB has one less predecessor than those listed in the phi.

I think the problem here is that you are calling changeToUnrechable in manifest. This might cause unpredictable errors.
So you should cache instructions to be changed to unreachable and call changeToUnrechable after manifest(see below comment).
I tested this way locally and the error has been removed.

llvm/lib/Transforms/IPO/Attributor.cpp
5569–5570

Here

I create a patch(D71910) for this problem then with that patch, you can use like A.changeToUnreachableAfterManifest(I).

I think the problem here is that you are calling changeToUnrechable in manifest. This might cause unpredictable errors.

Thanks, I hadn't seen how manifest() fits into the big picture.

So you should cache instructions to be changed to unreachable and call changeToUnrechable after manifest(see below comment).
I tested this way locally and the error has been removed.
I create a patch(D71910) for this problem then with that patch, you can use like A.changeToUnreachableAfterManifest(I).

Much appreciated, thank you. I hadn't noticed and I wrote similar code thinking I was doing something wrong because I had to change parts outside AAUndefinedBehavior (because all the similar code changes uses not instructions).
I'll wait for it to be committed. Now the yesterday's code that uses AAValueSimplify on the memory accessing instructions should work.

baziotis updated this revision to Diff 235364.Dec 26 2019, 11:00 AM
  • Added one test to check propagation of null - it's not behaving as we'd like.
  • Abstracted the AAValueSimplify usage in AAUndefinedBehavior.

A couple of notes:

  • Now the same cases as before fail plus a couple more. But, the 3 cases that were failing before were crashing. Now they just give different result

which is expected. It probably is easy to change them.
The other cases however crash but because the number of iterations is not the specified. I may not be the most appropriate person to look into that.

  • I still don't understand why getPointerOperand() returns null on volatile instructions (although I have guess it is to prevent further processing). Is it correct what I do?
  • I still don't understand why getPointerOperand() returns null on volatile instructions (although I have guess it is to prevent further processing). Is it correct what I do?

getPointerOperand was added as a helper for dereferenceable(volatile store/load doesn't imply dereferenceable). And you can change it if you want.

I'd say I'm not sure whether volatile store/load for undef is UB.

  • I still don't understand why getPointerOperand() returns null on volatile instructions (although I have guess it is to prevent further processing). Is it correct what I do?

getPointerOperand was added as a helper for dereferenceable(volatile store/load doesn't imply dereferenceable). And you can change it if you want.

Aha ok, thanks.

I'd say I'm not sure whether volatile store/load for undef is UB.

Well, if we go by the book, which would be the LLVM IR ref manual, and optimize aggressively for UB, undef can be considered to have any bit pattern.
And we can choose it to have the null bit pattern, which is UB for both volatile and non-volatile.

As another note (and related to the diff update message), I realize that it's difficult for both of us to try and correct 16 test cases that currently fail in this revision.
I think it's better to remove the AAValueSimplify in the memory accessing instructions. That will make the failing test-cases only 3. I could then try to fix them
and it should be easier for you to review as well. What do you think?

Well, if we go by the book, which would be the LLVM IR ref manual, and optimize aggressively for UB, undef can be considered to have any bit pattern.
And we can choose it to have the null bit pattern, which is UB for both volatile and non-volatile.

Ok, thanks.

As another note (and related to the diff update message), I realize that it's difficult for both of us to try and correct 16 test cases that currently fail in this revision.
I think it's better to remove the AAValueSimplify in the memory accessing instructions. That will make the failing test-cases only 3. I could then try to fix them
and it should be easier for you to review as well.

Please split the patch.

baziotis updated this revision to Diff 235455.Dec 27 2019, 4:44 PM
  • Attributor::getPointerOperand() and getPointerOperandOfNonVolatile().
  • Removed AAValueSimplify for memory accessing instructions.
  • Updated test cases.

Notes:

  1. As it seems, there are multiple instances of getPointerOperand() across LLVM. We should probably be careful and not name a function

like this, hence I put getPointerOperand() as a static method of Attributor. You may want to check this, although it's somewhat old and probably outdated.

  1. @uenoku I updated the test cases according to what I thought they tried to test. Please verify that they're correct because I may very well have misinterpreted.
  • Attributor::getPointerOperand() and getPointerOperandOfNonVolatile().
  • Removed AAValueSimplify for memory accessing instructions.
  • Updated test cases.

Notes:

  1. As it seems, there are multiple instances of getPointerOperand() across LLVM. We should probably be careful and not name a function

like this, hence I put getPointerOperand() as a static method of Attributor. You may want to check this, although it's somewhat old and probably outdated.

  1. @uenoku I updated the test cases according to what I thought they tried to test. Please verify that they're correct because I may very well have misinterpreted.

Regarding 1, please change to like getPointerOperand(Instruction *I, bool AllowVolatile) and merge getPointerOperandOfNonVolatile into it.

llvm/lib/Transforms/IPO/Attributor.cpp
1994

Please assert with string.

baziotis updated this revision to Diff 235471.Dec 28 2019, 4:08 AM

Addressed comments

uenoku accepted this revision.Dec 28 2019, 6:43 AM

LGTM again, thank you! I'll commit.

LGTM again, thank you! I'll commit.

Thank you mate, I'll continue on a new patch.

baziotis edited the summary of this revision. (Show Details)Dec 28 2019, 7:13 AM
This revision was automatically updated to reflect the committed changes.