This is an archive of the discontinued LLVM Phabricator instance.

[SafepointIRVerifier] Allow non-dereferencing uses of unrelocated or poisoned PHI nodes
ClosedPublic

Authored by DaniilSuchkov on Dec 8 2017, 6:27 AM.

Details

Summary

PHI that has at least one unrelocated input cannot cause any issues by itself,
though its uses should be carefully verified. With this patch PHIs are allowed
to have any inputs but when all inputs are unrelocated the PHI is marked as
unrelocated and if not all inputs are unrelocated then the PHI is marked as
poisoned. Poisoned pointers can be used only in three ways: to derive new
pointers, in PHIs or in comparisons against constants that are exclusively
derived from null.

Diff Detail

Event Timeline

mkazantsev added inline comments.Dec 9 2017, 1:36 AM
lib/IR/SafepointIRVerifier.cpp
246

Please clarify what is "a pointer derived from null". For example, is select %cond, null, %some derived from null? I think what you mean here is something like "... or against a constant pointer".

247

How about "poisoned value is a value which is derived from both relocated and unrelocated values, or from another poisoned values"?

250

You can always represent any constant pointer as gep null, %some_int_constant, so I think that this "exclusively derived from null" stuff is redundant.

255

You can merge that free into P + Any = P if it makes sense.

260

Maybe instead of using the term "merge pointers" stick to the term "derived pointer"? You use both, and I don't catch what is the difference between them.

261

Maybe "A pointer derived from X and constant has the same type as X"? You could also include it into the rules above.

301

as long as they are only used in safe instructions

538

How about "Removing unrelocated" << I...

and below, "Removing poisoned " << I?

670

"exclusively derived null pointer" -> "constant pointer"?

DaniilSuchkov added inline comments.Dec 9 2017, 5:18 AM
lib/IR/SafepointIRVerifier.cpp
246

Actually I've missed one word. It should be "derived _exclusively_ from null".

247

It's about origination of poisoned values.

250

It refers to comment at line 639.

255

I'd like to keep it this way.

260

By "deriving" I mean f(x) -> x, and by merge f(x1, ..., xN) -> x.

301

The idea is "_This_ instructions don't need verification. But nothing is said about their uses."

538

It was intentional but I can't remember the reason so I'll fix it.

670

Not all constant pointers are derived from null but anyway you spotted a typo, thank you.

mkazantsev added inline comments.Dec 11 2017, 4:03 PM
lib/IR/SafepointIRVerifier.cpp
247

Even if it was like that, it has nothing to do with the rules below, since you don't explain where rules 2-4 come from. You only define how poison first appears from merging of U and R, but don't say how it's handled after that.

250

I'm not asking to do something about it in this patch since it was there before, but it is fishy. If I can imagine a VM in which 0xFF is some special magical pointer that cannot be simply compared against normal pointers, then I can also imagine a VM where gep null, 0xFF is also some special magical pointer with same properties. Actually, I can define all such special numbers as derivatives from null.

From that perspective, how hard-coded constants are different from hard-coded offsets from null?

260

I guess "deriving" is actually f(x1) -> x. And what you call "deriving" is just a particular case of "merging". Again, why do we need both?

301

Ok, makes sense.

DaniilSuchkov marked 3 inline comments as done.

Comments slightly clarified.

lib/IR/SafepointIRVerifier.cpp
247

This part is supposed to be a brief description. How it's handled is described bellow.

250

I don't know either, but the idea is to keep this patch consistent with previous code. So I have to maintain the logic around "magic pointer constant". This patch is not about that issue, let's discuss it later.

260

Because for deriving there is only one rule: it changes nothing and for merge everything is a bit more complicated. Both "gep/bitcast is merge" and "phi/select is deriving" looks misleading.

I agree that formally f(x1) -> x is a particular case of f(x1, ..., xN) -> x, but how to name it so that it won't be confusing?

anna added inline comments.Dec 15 2017, 7:18 AM
lib/IR/SafepointIRVerifier.cpp
250

I'll try to clarify this. The GC relocates the *base* pointer and this is why we record the base pointer for every 'derived pointer'. After the GC relocates the base pointer at runtime, we can rematerialize the derived pointer because we have stored this information in the IR.

So, effectively it comes down to always identifying the "base" of a derived pointer. This is where the getBaseType comes in.
When we generate "magic const pointers" in IR (for example, using inttoptr magic const), the base here is that magic const.
The same idiom is GEP(null, magic const), but here the base pointer is null. Relocating a null is still a null.

So, this is why we have something like this:

%ptr = unrelocated non constant pointer
 compare (%ptr, inttoptr(magic_const)) <-- can't be reordered before a safepoint
but:
compare(%ptr, GEP(null, magic_const)) <-- can be reordered before a safepoint

Also, just as an aside, this is also why inttoptr of addrspace(1) is incorrect in the IRVerifier, but GEP(null, offset) in addrspace(1) is fine.

anna requested changes to this revision.Dec 15 2017, 9:33 AM

Comments inline.

lib/IR/SafepointIRVerifier.cpp
242

I don't think we want to introduce one more term 'poisoned' here. Specifically, poisoning has different meanings in the optimizer (and sometimes in the GC), and can be confusing.
It looks like poisoned pointers are just derived pointers which will be *lexically* from multiple pointers. So something like:
gep, bitcasts -> derived pointer from one base
phis, selects -> derived pointers that are lexically from multiple base pointers (I say lexically, because we can have phis/selects that statically have derived from exactly one pointer).

Do we really need to make this distinction? I think it's more confusing. Pls see comment below on naming to make clearer.

260

I tend to agree with Max here. We really cannot distinguish between both.
How about we focus just on the fact that we have multiple sources here? IOW, don't worry about GEPs and bitcasts because they have single source base.
These GEPs and bitcasts don't change the behaviour of unrelocated/relocated, so they shouldn;t affect this discusson.

262

As mentioned, I dont think you need to explicitly state out the distinction here. Just maybe a single line at the beginning when explaining MultiSourceDerivedPtr (I prefer that instead of Poisoned naming), because we can have multiple sources and still not be poisoned.

267

Nit: predecessor

279

This is incorrect IR. We cannot have multiple *different* incoming phi values from exactly one predecessor.
With correct IR, I don't think we will have such false positives. If we do have, could you please add a test with FIXME?

This revision now requires changes to proceed.Dec 15 2017, 9:33 AM
mkazantsev added inline comments.Dec 15 2017, 1:12 PM
lib/IR/SafepointIRVerifier.cpp
242

Yes, this may be confusing with poisoned pointers that are also in GC. I don't have a better idea for its name on top of my head, though. I'm OK to go with whatever you agree with.

250

Ok, this makes sense to me.

260

I think that "deriving" is a good name for both, because formally you apply some function f (where f can be gep, phi, bitcast or whatever) to a number of pointer arguments and have a new pointer (derived one) as result. It is unimportant -how- exactly you derived your pointer from your argument(s). You never fail verification while deriving. You can only fail verification when you misuse the derived pointer. And this is the important part.

279

I don't get why it is incorrect. For example,

void *p = &a[10];
void *p1 = &p[20];
void *temp = p;
if (cond) {
  temp = p1;
  some_call();
}
void *p2 = temp;

Won't we have exactly this IR here?

anna added inline comments.Dec 15 2017, 7:11 PM
lib/IR/SafepointIRVerifier.cpp
279

yup, you're right. What we have is different incoming values from different incoming blocks, something like: p2 = phi [p, def BB of p], [p1, safepoint block]
What's incorrect is, different incoming values from the same block. For example, p and p1 from their same def block: p2 = phi [p, def BB of p] [p1, def BB of p1].

DaniilSuchkov added inline comments.Dec 19 2017, 3:26 AM
lib/IR/SafepointIRVerifier.cpp
262

We don't need MultiSourceDerivedPtr because we don't care at all from how many sources a value was derived, but we do care if all sources were (un)relocated or not. Value which was derived from multiple sources can be in any of three possible states (relocated, unrelocated, poisoned).

279

To avoid confusion I'll make this comment a bit more clear.

DaniilSuchkov marked 6 inline comments as done.

"merge" now replaced with "derive" in comments, example in FIXME become a bit less confusing, added new test (with XFAIL) for that FIXME.

anna requested changes to this revision.Dec 21 2017, 9:09 AM

Logic looks right, some comments inline.

lib/IR/SafepointIRVerifier.cpp
409

Pls add a comment here on why the poisoned defs are skipped. ValidUnrelocatedDefs are obvious.

485

Don't we need to handle selects and identify poisoned versus unrelocated?

495

Instead of the below code (logic is right, but too many conditionals), could we do something like this:

if (isNotExclusivelyConstantDerived(InValue)) {
  if (isValuePoisoned(InValue) || (HasRelocatedInputs &&  HasUnrelocatedInputs)) {
     PoisonedPointerDef = true;
     break;
   }
   if (BlockMap[InBB]->AvailableOut.count(InValue))
              HasRelocatedInputs = true;
    else
              HasUnrelocatedInputs = true;
}
if (!PoisonedPointerDef && HasUnrelocatedInputs) {
   assert(!HasRelocatedInputs && "Should be poisoned!");
   ValidUnrelocatedPointerDef = true;
}
This revision now requires changes to proceed.Dec 21 2017, 9:09 AM
DaniilSuchkov added inline comments.Dec 21 2017, 9:43 AM
lib/IR/SafepointIRVerifier.cpp
409

You think I should repeat myself here? From comments above (where 'poisoned' pointers are introduced) it's clear why this defs are skipped.

485

I'll do it in the next patch (in order to keep this one not too huge).

495

But we still have to handle case when some inputs are relocated and some are not: your code won't work if the _last_ input makes this phi poisoned (because on each iteration it checks flags that might have changed on previous one). Thus we cannot remove this part:

if (HasUnrelocatedInputs) {
  if (HasRelocatedInputs)
    PoisonedPointerDef = true;
  else
    ValidUnrelocatedPointerDef = true;
}

So the only thing that might be changed is this part:

if (isValuePoisoned(InValue)) {
  // If any of inputs is poisoned, output is always poisoned too.
  HasRelocatedInputs = true;
  HasUnrelocatedInputs = true;
  break;
}

But if we'll change it to

if (isValuePoisoned(InValue)) {
  // If any of inputs is poisoned, output is always poisoned too.
  PoisonedPointerDef = true;
  break;
}

We'll have to somehow tell that if (mentioned before) not to touch this flag and it'll be even worse.
And that assert shouldn't be moved to PHI's branch, it's about all instructions.

Currently this part is pretty clear and straightforward: loop over phi's inputs sets two flags (that have clear meaning) and after that loop another two flags are changed accordingly.

anna accepted this revision.Dec 21 2017, 9:50 AM

lgtm w/ comment addressed.

lib/IR/SafepointIRVerifier.cpp
409

Are all poisoned defs removed? Only valid ones right.

485

pls add a TODO then.

495

ah yes, it wont work for the last case.

This revision is now accepted and ready to land.Dec 21 2017, 9:50 AM
anna added inline comments.Dec 21 2017, 9:53 AM
lib/IR/SafepointIRVerifier.cpp
495

also, pls point out as a comment where these rules come from (i.e. refer to header for reasoning of these rules).

DaniilSuchkov marked 3 inline comments as done.

Added some new comments.

This revision was automatically updated to reflect the committed changes.