This is an archive of the discontinued LLVM Phabricator instance.

[SafepointIRVerifier] Allow non-dereferencing uses of unrelocated or poisoned PHI nodes
ClosedPublic

Authored by DaniilSuchkov on Dec 8 2017, 6:27 AM.

Details

Summary

PHI that has at least one unrelocated input cannot cause any issues by itself,
though its uses should be carefully verified. With this patch PHIs are allowed
to have any inputs but when all inputs are unrelocated the PHI is marked as
unrelocated and if not all inputs are unrelocated then the PHI is marked as
poisoned. Poisoned pointers can be used only in three ways: to derive new
pointers, in PHIs or in comparisons against constants that are exclusively
derived from null.

Diff Detail

Repository
rL LLVM

Event Timeline

mkazantsev added inline comments.Dec 9 2017, 1:36 AM
lib/IR/SafepointIRVerifier.cpp
246 ↗(On Diff #126127)

Please clarify what is "a pointer derived from null". For example, is select %cond, null, %some derived from null? I think what you mean here is something like "... or against a constant pointer".

247 ↗(On Diff #126127)

How about "poisoned value is a value which is derived from both relocated and unrelocated values, or from another poisoned values"?

250 ↗(On Diff #126127)

You can always represent any constant pointer as gep null, %some_int_constant, so I think that this "exclusively derived from null" stuff is redundant.

255 ↗(On Diff #126127)

You can merge that free into P + Any = P if it makes sense.

260 ↗(On Diff #126127)

Maybe instead of using the term "merge pointers" stick to the term "derived pointer"? You use both, and I don't catch what is the difference between them.

261 ↗(On Diff #126127)

Maybe "A pointer derived from X and constant has the same type as X"? You could also include it into the rules above.

300 ↗(On Diff #126127)

as long as they are only used in safe instructions

532 ↗(On Diff #126127)

How about "Removing unrelocated" << I...

and below, "Removing poisoned " << I?

664 ↗(On Diff #126127)

"exclusively derived null pointer" -> "constant pointer"?

DaniilSuchkov added inline comments.Dec 9 2017, 5:18 AM
lib/IR/SafepointIRVerifier.cpp
246 ↗(On Diff #126127)

Actually I've missed one word. It should be "derived _exclusively_ from null".

247 ↗(On Diff #126127)

It's about origination of poisoned values.

250 ↗(On Diff #126127)

It refers to comment at line 639.

255 ↗(On Diff #126127)

I'd like to keep it this way.

260 ↗(On Diff #126127)

By "deriving" I mean f(x) -> x, and by merge f(x1, ..., xN) -> x.

300 ↗(On Diff #126127)

The idea is "_This_ instructions don't need verification. But nothing is said about their uses."

532 ↗(On Diff #126127)

It was intentional but I can't remember the reason so I'll fix it.

664 ↗(On Diff #126127)

Not all constant pointers are derived from null but anyway you spotted a typo, thank you.

mkazantsev added inline comments.Dec 11 2017, 4:03 PM
lib/IR/SafepointIRVerifier.cpp
247 ↗(On Diff #126127)

Even if it was like that, it has nothing to do with the rules below, since you don't explain where rules 2-4 come from. You only define how poison first appears from merging of U and R, but don't say how it's handled after that.

250 ↗(On Diff #126127)

I'm not asking to do something about it in this patch since it was there before, but it is fishy. If I can imagine a VM in which 0xFF is some special magical pointer that cannot be simply compared against normal pointers, then I can also imagine a VM where gep null, 0xFF is also some special magical pointer with same properties. Actually, I can define all such special numbers as derivatives from null.

From that perspective, how hard-coded constants are different from hard-coded offsets from null?

260 ↗(On Diff #126127)

I guess "deriving" is actually f(x1) -> x. And what you call "deriving" is just a particular case of "merging". Again, why do we need both?

300 ↗(On Diff #126127)

Ok, makes sense.

DaniilSuchkov marked 3 inline comments as done.

Comments slightly clarified.

lib/IR/SafepointIRVerifier.cpp
247 ↗(On Diff #126127)

This part is supposed to be a brief description. How it's handled is described bellow.

250 ↗(On Diff #126127)

I don't know either, but the idea is to keep this patch consistent with previous code. So I have to maintain the logic around "magic pointer constant". This patch is not about that issue, let's discuss it later.

260 ↗(On Diff #126127)

Because for deriving there is only one rule: it changes nothing and for merge everything is a bit more complicated. Both "gep/bitcast is merge" and "phi/select is deriving" looks misleading.

I agree that formally f(x1) -> x is a particular case of f(x1, ..., xN) -> x, but how to name it so that it won't be confusing?

anna added inline comments.Dec 15 2017, 7:18 AM
lib/IR/SafepointIRVerifier.cpp
250 ↗(On Diff #126127)

I'll try to clarify this. The GC relocates the *base* pointer and this is why we record the base pointer for every 'derived pointer'. After the GC relocates the base pointer at runtime, we can rematerialize the derived pointer because we have stored this information in the IR.

So, effectively it comes down to always identifying the "base" of a derived pointer. This is where the getBaseType comes in.
When we generate "magic const pointers" in IR (for example, using inttoptr magic const), the base here is that magic const.
The same idiom is GEP(null, magic const), but here the base pointer is null. Relocating a null is still a null.

So, this is why we have something like this:

%ptr = unrelocated non constant pointer
 compare (%ptr, inttoptr(magic_const)) <-- can't be reordered before a safepoint
but:
compare(%ptr, GEP(null, magic_const)) <-- can be reordered before a safepoint

Also, just as an aside, this is also why inttoptr of addrspace(1) is incorrect in the IRVerifier, but GEP(null, offset) in addrspace(1) is fine.

anna requested changes to this revision.Dec 15 2017, 9:33 AM

Comments inline.

lib/IR/SafepointIRVerifier.cpp
242 ↗(On Diff #126501)

I don't think we want to introduce one more term 'poisoned' here. Specifically, poisoning has different meanings in the optimizer (and sometimes in the GC), and can be confusing.
It looks like poisoned pointers are just derived pointers which will be *lexically* from multiple pointers. So something like:
gep, bitcasts -> derived pointer from one base
phis, selects -> derived pointers that are lexically from multiple base pointers (I say lexically, because we can have phis/selects that statically have derived from exactly one pointer).

Do we really need to make this distinction? I think it's more confusing. Pls see comment below on naming to make clearer.

262 ↗(On Diff #126501)

As mentioned, I dont think you need to explicitly state out the distinction here. Just maybe a single line at the beginning when explaining MultiSourceDerivedPtr (I prefer that instead of Poisoned naming), because we can have multiple sources and still not be poisoned.

267 ↗(On Diff #126501)

Nit: predecessor

279 ↗(On Diff #126501)

This is incorrect IR. We cannot have multiple *different* incoming phi values from exactly one predecessor.
With correct IR, I don't think we will have such false positives. If we do have, could you please add a test with FIXME?

260 ↗(On Diff #126127)

I tend to agree with Max here. We really cannot distinguish between both.
How about we focus just on the fact that we have multiple sources here? IOW, don't worry about GEPs and bitcasts because they have single source base.
These GEPs and bitcasts don't change the behaviour of unrelocated/relocated, so they shouldn;t affect this discusson.

This revision now requires changes to proceed.Dec 15 2017, 9:33 AM
mkazantsev added inline comments.Dec 15 2017, 1:12 PM
lib/IR/SafepointIRVerifier.cpp
242 ↗(On Diff #126501)

Yes, this may be confusing with poisoned pointers that are also in GC. I don't have a better idea for its name on top of my head, though. I'm OK to go with whatever you agree with.

279 ↗(On Diff #126501)

I don't get why it is incorrect. For example,

void *p = &a[10];
void *p1 = &p[20];
void *temp = p;
if (cond) {
  temp = p1;
  some_call();
}
void *p2 = temp;

Won't we have exactly this IR here?

250 ↗(On Diff #126127)

Ok, this makes sense to me.

260 ↗(On Diff #126127)

I think that "deriving" is a good name for both, because formally you apply some function f (where f can be gep, phi, bitcast or whatever) to a number of pointer arguments and have a new pointer (derived one) as result. It is unimportant -how- exactly you derived your pointer from your argument(s). You never fail verification while deriving. You can only fail verification when you misuse the derived pointer. And this is the important part.

anna added inline comments.Dec 15 2017, 7:11 PM
lib/IR/SafepointIRVerifier.cpp
279 ↗(On Diff #126501)

yup, you're right. What we have is different incoming values from different incoming blocks, something like: p2 = phi [p, def BB of p], [p1, safepoint block]
What's incorrect is, different incoming values from the same block. For example, p and p1 from their same def block: p2 = phi [p, def BB of p] [p1, def BB of p1].

DaniilSuchkov added inline comments.Dec 19 2017, 3:26 AM
lib/IR/SafepointIRVerifier.cpp
262 ↗(On Diff #126501)

We don't need MultiSourceDerivedPtr because we don't care at all from how many sources a value was derived, but we do care if all sources were (un)relocated or not. Value which was derived from multiple sources can be in any of three possible states (relocated, unrelocated, poisoned).

279 ↗(On Diff #126501)

To avoid confusion I'll make this comment a bit more clear.

DaniilSuchkov marked 6 inline comments as done.

"merge" now replaced with "derive" in comments, example in FIXME become a bit less confusing, added new test (with XFAIL) for that FIXME.

anna requested changes to this revision.Dec 21 2017, 9:09 AM

Logic looks right, some comments inline.

lib/IR/SafepointIRVerifier.cpp
409 ↗(On Diff #127496)

Pls add a comment here on why the poisoned defs are skipped. ValidUnrelocatedDefs are obvious.

481 ↗(On Diff #127496)

Don't we need to handle selects and identify poisoned versus unrelocated?

491 ↗(On Diff #127496)

Instead of the below code (logic is right, but too many conditionals), could we do something like this:

if (isNotExclusivelyConstantDerived(InValue)) {
  if (isValuePoisoned(InValue) || (HasRelocatedInputs &&  HasUnrelocatedInputs)) {
     PoisonedPointerDef = true;
     break;
   }
   if (BlockMap[InBB]->AvailableOut.count(InValue))
              HasRelocatedInputs = true;
    else
              HasUnrelocatedInputs = true;
}
if (!PoisonedPointerDef && HasUnrelocatedInputs) {
   assert(!HasRelocatedInputs && "Should be poisoned!");
   ValidUnrelocatedPointerDef = true;
}
This revision now requires changes to proceed.Dec 21 2017, 9:09 AM
DaniilSuchkov added inline comments.Dec 21 2017, 9:43 AM
lib/IR/SafepointIRVerifier.cpp
409 ↗(On Diff #127496)

You think I should repeat myself here? From comments above (where 'poisoned' pointers are introduced) it's clear why this defs are skipped.

481 ↗(On Diff #127496)

I'll do it in the next patch (in order to keep this one not too huge).

491 ↗(On Diff #127496)

But we still have to handle case when some inputs are relocated and some are not: your code won't work if the _last_ input makes this phi poisoned (because on each iteration it checks flags that might have changed on previous one). Thus we cannot remove this part:

if (HasUnrelocatedInputs) {
  if (HasRelocatedInputs)
    PoisonedPointerDef = true;
  else
    ValidUnrelocatedPointerDef = true;
}

So the only thing that might be changed is this part:

if (isValuePoisoned(InValue)) {
  // If any of inputs is poisoned, output is always poisoned too.
  HasRelocatedInputs = true;
  HasUnrelocatedInputs = true;
  break;
}

But if we'll change it to

if (isValuePoisoned(InValue)) {
  // If any of inputs is poisoned, output is always poisoned too.
  PoisonedPointerDef = true;
  break;
}

We'll have to somehow tell that if (mentioned before) not to touch this flag and it'll be even worse.
And that assert shouldn't be moved to PHI's branch, it's about all instructions.

Currently this part is pretty clear and straightforward: loop over phi's inputs sets two flags (that have clear meaning) and after that loop another two flags are changed accordingly.

anna accepted this revision.Dec 21 2017, 9:50 AM

lgtm w/ comment addressed.

lib/IR/SafepointIRVerifier.cpp
409 ↗(On Diff #127496)

Are all poisoned defs removed? Only valid ones right.

481 ↗(On Diff #127496)

pls add a TODO then.

491 ↗(On Diff #127496)

ah yes, it wont work for the last case.

This revision is now accepted and ready to land.Dec 21 2017, 9:50 AM
anna added inline comments.Dec 21 2017, 9:53 AM
lib/IR/SafepointIRVerifier.cpp
491 ↗(On Diff #127496)

also, pls point out as a comment where these rules come from (i.e. refer to header for reasoning of these rules).

DaniilSuchkov marked 3 inline comments as done.

Added some new comments.

This revision was automatically updated to reflect the committed changes.