This is an archive of the discontinued LLVM Phabricator instance.

[Attributor][WIP] Connect AAIsDead with AAUndefinedBehavior
Needs ReviewPublic

Authored by baziotis on Dec 29 2019, 11:27 AM.

Details

Summary

In AAIsDeadFunction, don't include AliveSuccessors that are UB.

Diff Detail

Event Timeline

baziotis created this revision.Dec 29 2019, 11:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 29 2019, 11:27 AM

I think we need to query AAUB from AAIsDead in the updateImpl. See the TODO in AAIsDeadFunction::updateImpl(Attributor &A). Similarly, in AAUB we should go through the explorer context when we add something to the knownUB set. So if I is knownUB, make all instructions in the must-be-executed-context of I knownUB, thus insert all into the set. In AAIsDead we can then simply query isKnownUB and that will only need to look into the set (as before). In addition to the TODO mentioned before we need the same logic in the initialize from AAIsDeadFunction before we add the entry block as live.

llvm/lib/Transforms/IPO/Attributor.cpp
1982

We should avoid caching stuff in the AAs. isKnownToCauseUB can take an Attributor reference.

2719

As mentioned above, you can add the Attributor as a argument to calls.

baziotis marked an inline comment as done.EditedDec 30 2019, 6:11 AM

I think we need to query AAUB from AAIsDead in the updateImpl. See the TODO in AAIsDeadFunction::updateImpl(Attributor &A).

I didn't see that at all, thanks! I'll give it a try.

Similarly, in AAUB we should go through the explorer context when we add something to the knownUB set. So if I is knownUB, make all instructions in the must-be-executed-context of I knownUB, thus insert all into the set. In AAIsDead we can then simply query isKnownUB and that will only need to look into the set (as before).

If I understand it correctly, this will mark as UB all the instructions from the UB instruction and forwards (i.e. the must be executed context goes to the successors). That will probably complicate things as to see if a BB is dead, with the current code, it makes sense to see its first instruction right?

In addition to the TODO mentioned before we need the same logic in the initialize from AAIsDeadFunction before we add the entry block as live.

Oh yes, I had added and removed that basically because that I think can't work with assumed info easily. I'll add it with known info to make clear what I mean.

llvm/lib/Transforms/IPO/Attributor.cpp
2719

Ok, thanks, I'll do it that way.

baziotis updated this revision to Diff 235596.Dec 30 2019, 6:48 AM
  • Query AAUB in identifyAliveSuccessors() for BranchInst.

This is essentially a proposed PoC on how to use AAUB in AAIsDeadFunction::updateImpl(). The TODO there says "look for (assumed) UB to backwards propagate "deadness"".
In my understanding that seems kind of unorthodox since the whole process is moving forwards (with an "artificial" recursion so it's not easy to "go back" since we never actually return back). Because of that, I thought it may be better instead of that to just "not go forwards" on successors (and BBs) that have UB (not going forwards into them means they're not put in AssumedLiveBlocks, effectively keeping them dead).

  • Passed the attributor in isAssumedDead() etc. (that resulted in changes all over the place, I hope that's ok).
baziotis marked an inline comment as done.Dec 30 2019, 6:52 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
2721–2724

With the known info it's easy because we don't need to remember that we were based on assumed info (in the updateImpl()). But now that I see that again, it probably is as simple as inserting Front in the ToBeExploredFrom (always) and only make the EntryBlock live if it is not assumed to cause UB.

Similarly, in AAUB we should go through the explorer context when we add something to the knownUB set. So if I is knownUB, make all instructions in the must-be-executed-context of I knownUB, thus insert all into the set. In AAIsDead we can then simply query isKnownUB and that will only need to look into the set (as before).

If I understand it correctly, this will mark as UB all the instructions from the UB instruction and forwards (i.e. the must be executed context goes to the successors). That will probably complicate things as to see if a BB is dead, with the current code, it makes sense to see its first instruction right?

The must-be-executed-context is not defined to only contain "successors". It might right now but that will change eventually.


I think we should stick to known information right now.

I still believe this is more complex and less general than the approach I tried to describe. Maybe the following is a good compromise in the direction I think we should going but working already right now:

  1. Since the must-be-executed-context is not collecting predecessors yet, we need to change that. The code actually exists already, it just needs to be separated from some other improvements and put for review again. I'll look into that (or @uenoku you can if you want to).
  2. Collecting instructions that are executed always with one which causes UB in the "knownUB" set seems logical to me. After all, their execution is "eventually" leading to UB. Having them in the set gives us a single consistent way to check instead of iterating over must-be-executed contexts all the time. We can keep the context loop in isKnownUB for now but we should eventually not do that (thus we should add a TODO).
  3. During the liveness exploration we should check if the beginning of a block is known to cause UB, if so, we do not make it live. We can even check on the instruction level if we want to later. For now, we should be able to do all this in the assumeLive method by "not assuming it is live" if it leads to UB.

Are no tests affected?

baziotis added a comment.EditedDec 30 2019, 11:20 AM

Similarly, in AAUB we should go through the explorer context when we add something to the knownUB set. So if I is knownUB, make all instructions in the must-be-executed-context of I knownUB, thus insert all into the set. In AAIsDead we can then simply query isKnownUB and that will only need to look into the set (as before).

If I understand it correctly, this will mark as UB all the instructions from the UB instruction and forwards (i.e. the must be executed context goes to the successors). That will probably complicate things as to see if a BB is dead, with the current code, it makes sense to see its first instruction right?

The must-be-executed-context is not defined to only contain "successors". It might right now but that will change eventually.


I think we should stick to known information right now.

Ok, isAssumedToCauseUB() is overoptimistic right now.

I still believe this is more complex and less general than the approach I tried to describe. Maybe the following is a good compromise in the direction I think we should going but working already right now:

  1. Since the must-be-executed-context is not collecting predecessors yet, we need to change that. The code actually exists already, it just needs to be separated from some other improvements and put for review again. I'll look into that (or @uenoku you can if you want to).

Great, I didn't know this was planned.

  1. Collecting instructions that are executed always with one which causes UB in the "knownUB" set seems logical to me. After all, their execution is "eventually" leading to UB. Having them in the set gives us a single consistent way to check instead of iterating over must-be-executed contexts all the time. We can keep the context loop in isKnownUB for now but we should eventually not do that (thus we should add a TODO).

I surely agree. My comment was relative to the current MustBeExecutedContext. That is, since it currently only looks successors (and I didn't know it was planned to change), it would be difficult to add the right predecessors to the set when we found a UB instruction.
And that would then make more difficult the fact that to connect it with AAIsDead, I was looking at the start of the block (which could be predecessor of a UB instruction). Now it's clear, thanks!

  1. During the liveness exploration we should check if the beginning of a block is known to cause UB, if so, we do not make it live. We can even check on the instruction level if we want to later. For now, we should be able to do all this in the assumeLive method by "not assuming it is live" if it leads to UB.

This is sort of what I was going for (if you see the identifyAliveSuccessors for BranchInst, it marks alive only BBs whose first instruction is not UB, hence the whole forwards vs backwards etc.). But as you said, I made it too complicated as instead
of modifying all identifyAliveSuccessors() etc. I could just put the code in assumeLive() which I will now do. :)

Are no tests affected?

A lot, I'll upload in the next diff.

EDIT:

I think we should stick to known information right now.

There's one problem with that when I tried it in assumeLive(). It queries AAUB and blocks that are not yet known to cause UB are assumed live and are never re-tested.

Similarly, in AAUB we should go through the explorer context when we add something to the knownUB set. So if I is knownUB, make all instructions in the must-be-executed-context of I knownUB, thus insert all into the set. In AAIsDead we can then simply query isKnownUB and that will only need to look into the set (as before).

If I understand it correctly, this will mark as UB all the instructions from the UB instruction and forwards (i.e. the must be executed context goes to the successors). That will probably complicate things as to see if a BB is dead, with the current code, it makes sense to see its first instruction right?

The must-be-executed-context is not defined to only contain "successors". It might right now but that will change eventually.
I think we should stick to known information right now.

I still believe this is more complex and less general than the approach I tried to describe. Maybe the following is a good compromise in the direction I think we should going but working already right now:

  1. Since the must-be-executed-context is not collecting predecessors yet, we need to change that. The code actually exists already, it just needs to be separated from some other improvements and put for review again. I'll look into that (or @uenoku you can if you want to).

I'd like to work on it.

There's one problem with that when I tried it in assumeLive(). It queries AAUB and blocks that are not yet known to cause UB are assumed live and are never re-tested.

I see. Now this becomes a little more involved. What we can do is to call AAUB::isAssumedToCauseUB(BB.front()). Since we haven't visited BB yet, we will say true if there are instructions in there that might cause UB.
Now, that will require us to keep track of the instruction that transferred control to BB as we have to ask AAUB again. Not in assumeLive but in the updateImpl, before the the if(UsedAssumedInformation) check, we can filter the AliveSuccessors based on isAssumedToCauseUB(BB). If we filtered a block we need to set UsedAssumedInformation to true so we will revisit that instruction in the future. If isAssumedToCauseUB(BB) is true and isKnownToCauseUB(BB) as well, we can even remove the block without setting UsedAssumedInformation.

  1. Since the must-be-executed-context is not collecting predecessors yet, we need to change that. The code actually exists already, it just needs to be separated from some other improvements and put for review again. I'll look into that (or @uenoku you can if you want to).

I'd like to work on it.

Thx. There is a prototype patch out there that you can/should use as a starting point.

baziotis updated this revision to Diff 235657.Dec 30 2019, 3:54 PM

Thanks for your suggestion Johannes, fortunately that was an easy change. But it also impacts about 26 tests or so, which I won't be able to fix today.
However now the interaction between AAUB and AAIsDead becomes interesting. For example, for this code:

define internal i1 @ret_undef() {
  ret i1 undef
}

define void @cond_br_on_undef_interproc() {
  %cond = call i1 @ret_undef()
  br i1 %cond, label %t, label %e
t:
  ret void
e:
  ret void
}

the Attributor leaves... you guessed it, nothing. :)

jdoerfert added inline comments.Dec 30 2019, 9:27 PM
llvm/lib/Transforms/IPO/Attributor.cpp
3123

I think something like make_filter_range from llvm/include/llvm/ADT/STLExtras.h could make this nicer. Or maybe we just need to put it in a helper function, or both.

baziotis marked an inline comment as done.Dec 31 2019, 8:53 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Yes, using a helper would be better. make_filter_range seems cool, I didn't know it. TBH though, I think it gives less control and predictability. It's not as clear to me as in the current what happens under the hood (apart from the fact that we'll query 2 times the AAUB). Also, FWIW, the current will have a worst case of one allocation and one free and an average case of no allocation / free (assuming that most instructions have less than 8 successors).
I'll put it in a helper function and update me if you still think it's not that good and I'll change it to make filter_range.

baziotis marked an inline comment as done.Dec 31 2019, 9:05 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Btw, I forgot to explain the AssumedLiveBlocks.erase(BB) part. Basically, if an alive successor is UB and its parent BB has been put into AssumedLiveBlocks, then we should mark that BB dead by removing it. That however seems bad to me. Up to now, we only inserted to it and it feels that something can go badly if we start removing stuff (like, go to an endless loop or the not having a monotone procedure).
I'll take a look again, it was a quick change but feel free to update me if we should worry about that.

jdoerfert added inline comments.Dec 31 2019, 9:27 AM
llvm/lib/Transforms/IPO/Attributor.cpp
3123

I haven't read the code in detail, some observations:

if (AAUB.isKnownToCauseUB(AliveSuccessor) ||
         (Assumed = AAUB.isAssumedToCauseUB(AliveSuccessor))) {

Assumed is always true or uninitialized here. Reading it after results in true or UB. What you want is to check assumed and if true check known to determine if you used assumed information.


No need for count if you call erase. Just erase/insert stuff, the return value even tells you if it was in before.


I would like to understand the situation in which we think a block is UB *only after* it was put in AssumedLiveBlocks. We should not erase it but add an assertion so we can see if it triggers on the test. I hope it does not (and we can keep the assertion) as that would mean it works properly.

baziotis marked an inline comment as done.Dec 31 2019, 10:00 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Assumed is always true or uninitialized here. Reading it after results in true or UB. What you want is to check assumed and if true check known to determine if you used assumed information.

Yes, it should be initialized to false. Well, the reason the code has been written in this (maybe weird) way is because if Known returns true, we don't have to call assumed (which we will do if we always call assumed first). I think that if we just initialize to false it will be ok.


No need for count if you call erase. Just erase/insert stuff, the return value even tells you if it was in before.

Ty, I didn't know that.


I would like to understand the situation in which we think a block is UB *only after* it was put in AssumedLiveBlocks. We should not erase it but add an assertion so we can see if it triggers on the test. I hope it does not (and we can keep the assertion) as that would mean it works properly.

A simple way this can happen is with that:

define void @cond_br_on_undef_interproc() {
  br i1 undef, label %t, label %e
t:
  ret void
e:
  ret void
}

Note that br i1 undef, ... is in the entry block, which is assumed live in initialize() based on known info (and of course, at this point, it's not yet known it causes UB). So, this becomes:

define void @cond_br_on_undef_interproc() {
  unreachable
t:
  unreachable
e:
  unreachable
}

which is ok I guess, but we could have deleted the whole function (which we didn't since the block is still alive).
When other instructions go to the entry block, that gets more complicated.
I'm not yet sure what happens for non-entry blocks.

baziotis marked an inline comment as done.Dec 31 2019, 10:07 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Nit: Actually, we can remove Assumed completely and use UsedAssumedInformation which will probably make the code maybe too smart for our own good. :P

jdoerfert added inline comments.Dec 31 2019, 10:39 AM
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Note that br i1 undef, ... is in the entry block, which is assumed live in initialize() based on known info (and of course, at this point, it's not yet known it causes UB).

We need to apply the same logic (via the helper) in the initialize.

When other instructions go to the entry block, that gets more complicated.

That is forbidden.

I'm not yet sure what happens for non-entry blocks.

I think the logic in the initialize will allow us to place an assertion here.

Nit: Actually, we can remove Assumed completely and use UsedAssumedInformation which will probably make the code maybe too smart for our own good. :P

Variables do not cost anything. Make the core readable first.

baziotis marked an inline comment as done.Dec 31 2019, 11:13 AM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3123

When other instructions go to the entry block, that gets more complicated.

That is forbidden.

Sorry, I didn't mean to say "go" as "branch to". I meant when we add other instructions to the entry block.

We need to apply the same logic (via the helper) in the initialize.

Sorry, I didn't get that. Could you please specify what logic?
I was thinking something like (in the initialize):

  • If it is known to cause UB, just return.
  • Otherwise, insert Front in ToBeExplored and assumeLive() the block only if !AAUB.isAssumedToCauseUB(Front).

Unfortunately, I think that the fact that isAssumedToCauseUB() is over-optimistic will make all this more and more complicated.
Take for example this:

define i32 @example(i1* %alloc) {
  %cond = load i1, i1* %alloc
  br i1 %cond, label %t, label %e
t:
  ret i32 1
e:
  ret i32 2
}

(and assuming we're having the current thing in initialize()).

  1. The entry block will be marked live.
  2. AAIsDeadFunction::updateImpl() is called. The br is successor of %cond. But, because isAssumedToCauseUB() is overoptimistic, it will assume this UB. This in turn will insert %cond to KnownDeadEnds. Now we could say here that this is based on assumed info and another updateImpl() will correct but here's the catch (or you could call it something like a dead-lock):
  3. AAUB::updateImpl() will be called. But, this uses checkForAllInstructions(), which uses liveness, which in turn, uses isAssumedDead(). The br will be assumed dead (because its predecessor, %cond, is in KnownDeadEnds) and won't make it to the predicate which would eventually add it to AssumedNoUBInsts which would correct all this situation.

This means that the br never gets live and this procedure continues and we're left with:

define i32 @cond_br_on_undef_uninit(i1* nocapture nofree nonnull readonly dereferenceable(1) %alloc) #0 {
  %cond = load i1, i1* %alloc
  br i1 %cond, label %t, label %e

t:                                                ; preds = %0
  unreachable

e:                                                ; preds = %0
  unreachable
}
jdoerfert added inline comments.Dec 31 2019, 1:37 PM
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Sorry, I didn't get that. Could you please specify what logic?

The same as you apply now to filter "alive successors".


We should not mark entry live in the initilize if AAUB::isAssumedUB(EntryBB) is true. We probably need to redefine KnownDeadEnds such that we can add the entry block first instruction during initilize and it will not be assumed live. That will require some changes but should be possible. Let's talke that next year ;)

baziotis marked an inline comment as done.Dec 31 2019, 5:35 PM
baziotis added inline comments.
llvm/lib/Transforms/IPO/Attributor.cpp
3123

Let's talke that next year

Yes, happy new year. :)

Sorry for being late, this is an exam period so time is very limited. I saw again the problem and it changed a bit but it's similar. To be sure we're on the same page:
In this:

define i32 @example(i1* %alloc) {
  %cond = load i1, i1* %alloc
  br i1 %cond, label %t, label %e
t:
  ret i32 1
e:
  ret i32 2
}

The block of br initially is assumed live, but AAIsDeadFunction::updateImpl() is called, which, because the br is assumed UB, removes the block from the AssumedLiveBlocks.
AAUB looks only live instructions and the br now will never be live (also maybe the fact that we're removing - and it's the only place - messes with the monotony).

I think one solution is to only remove a block if it is known UB.

I forgot about this. Can we rebase it :) ?

I forgot about this. Can we rebase it :) ?

No problem. I just rebased and at first glance, the problem now does not exist, meaning the @example is output as is.
But a lot of things have changed in the Attributor so certainly I don't understand what is happening. I'll check again tomorrow.

baziotis updated this revision to Diff 245758.EditedFeb 20 2020, 3:21 PM
baziotis edited the summary of this revision. (Show Details)

I've been looking today again at this and it's the first time in the Attributor that I don't understand what is happening. This diff contains some prints and AAIsDeadFunction::manifest() is never called. I'll continue looking into this but any help is appreciated, I have lost episodes.

baziotis updated this revision to Diff 245764.EditedFeb 20 2020, 5:12 PM
  • Rebased with AAUB not manifesting

In this diff, note that I have commented the manifesting in AAUB, so that, for now, we can test UB blocks are deleted because of AAIsDead.
Btw, may I throw the idea that this be done in the end ? That is, blocks be deleted only because of AAIsDead.

Anyway, here are some examples:

define i32 @cond_br_on_undef_interproc(i1 %cond) {
  br i1 %cond, label %t, label %e
t:
  %a = load i32, i32* null
  ret i32 1
e:
  ret i32 2
}

In here, happens what we expect. The entry block is marked live initially and then in AAIsDeadFunction::updateImpl() the block with the UB is removed from alive blocks.
When the manifest phase in Attributor::run() comes, AAIsDeadFunction has 2/3 live blocks, so it is considered live (the AA) and its manifest ultimately deletes the block.

define i32 @example(i1* %alloc) {
  %cond = load i1, i1* %alloc
  br i1 %cond, label %t, label %e
t:
  ret i32 1
e:
  ret i32 2
}

Here, isAssumedToCauseUB() is overoptimistic and that means that the branch is assumed UB and its block is removed in AAIsDeadFunction::updateImpl().
The catch is that although assumed information is used, AAUB never corrects that since it looks at live instructions for UB.
That's a problem in and of itself but that was happening before. The weird thing is that in the manifest stage of run, AAIsDeadFunction is considered
dead because the entry point of the function is considered dead. Shouldn't that mean that the function has to be deleted?

Ping for when you have time. :)

As a note, I think that we shouldn't delete a block based on assumed info. Because then AAUB can never correct it.
This does not completely remove the problems but it should lead to a better path.

Tests?

llvm/lib/Transforms/IPO/Attributor.cpp
2105

What is happening here? Leftover?

2142

I would prefer if we pass the Explorer (or the Attributor) instead of caching it.

3117

I don't think retroactively removing blocks here is a good idea. Once assumed live it should stay that way.

baziotis updated this revision to Diff 248798.Mar 6 2020, 11:29 AM
baziotis edited the summary of this revision. (Show Details)

Don't delete blocks. Still not correct though.

baziotis marked 3 inline comments as done.Mar 6 2020, 11:35 AM

Tests?

A lot of tests fail but I'm not sure if it makes sense to include any tests that test the new behavior yet. Do you want invalid tests?
For example, this now:

define i32 @example(i1* %alloc) {
  %cond = load i1, i1* %alloc
  br i1 %cond, label %t, label %e
t:
  ret i32 1
e:
  ret i32 2
}

results in:

define i32 @example(i1* nocapture nofree nonnull readnone dereferenceable(1) %alloc) #0 {
  br i1 undef, label %t, label %e

t:                                                ; preds = %0
  unreachable

e:                                                ; preds = %0
  unreachable
}

And again, as far as I can understand, removing something from alive successors means it isn't getting updated by the AAUB because AAUB never sees it (as it is dead).
Honestly, unfortunately I can't see how this connection fits into the whole picture given the current structure.

llvm/lib/Transforms/IPO/Attributor.cpp
2105

No, I think a previous commit states that I put that to test only what AAIsDead deletes. I removed it now.

2142

Yes sorry, that was a leftover.

3117

I agree, as I had stated in a previous comment:

(also maybe the fact that we're removing - and it's the only place - messes with the monotony).

which as far as I can understand, still holds.

Let's postpone this until we have a chance to make the connection clear.

Let's postpone this until we have a chance to make the connection clear.

Ok, good.