PHI that has at least one unrelocated input cannot cause any issues by itself,
though its uses should be carefully verified. With this patch PHIs are allowed
to have any inputs but when all inputs are unrelocated the PHI is marked as
unrelocated and if not all inputs are unrelocated then the PHI is marked as
poisoned. Poisoned pointers can be used only in three ways: to derive new
pointers, in PHIs or in comparisons against constants that are exclusively
derived from null.
Details
Diff Detail
Event Timeline
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
246 | Please clarify what is "a pointer derived from null". For example, is select %cond, null, %some derived from null? I think what you mean here is something like "... or against a constant pointer". | |
247 | How about "poisoned value is a value which is derived from both relocated and unrelocated values, or from another poisoned values"? | |
250 | You can always represent any constant pointer as gep null, %some_int_constant, so I think that this "exclusively derived from null" stuff is redundant. | |
255 | You can merge that free into P + Any = P if it makes sense. | |
260 | Maybe instead of using the term "merge pointers" stick to the term "derived pointer"? You use both, and I don't catch what is the difference between them. | |
261 | Maybe "A pointer derived from X and constant has the same type as X"? You could also include it into the rules above. | |
300 | as long as they are only used in safe instructions | |
532 | How about "Removing unrelocated" << I... and below, "Removing poisoned " << I? | |
664 | "exclusively derived null pointer" -> "constant pointer"? |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
246 | Actually I've missed one word. It should be "derived _exclusively_ from null". | |
247 | It's about origination of poisoned values. | |
250 | It refers to comment at line 639. | |
255 | I'd like to keep it this way. | |
260 | By "deriving" I mean f(x) -> x, and by merge f(x1, ..., xN) -> x. | |
300 | The idea is "_This_ instructions don't need verification. But nothing is said about their uses." | |
532 | It was intentional but I can't remember the reason so I'll fix it. | |
664 | Not all constant pointers are derived from null but anyway you spotted a typo, thank you. |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
247 | Even if it was like that, it has nothing to do with the rules below, since you don't explain where rules 2-4 come from. You only define how poison first appears from merging of U and R, but don't say how it's handled after that. | |
250 | I'm not asking to do something about it in this patch since it was there before, but it is fishy. If I can imagine a VM in which 0xFF is some special magical pointer that cannot be simply compared against normal pointers, then I can also imagine a VM where gep null, 0xFF is also some special magical pointer with same properties. Actually, I can define all such special numbers as derivatives from null. From that perspective, how hard-coded constants are different from hard-coded offsets from null? | |
260 | I guess "deriving" is actually f(x1) -> x. And what you call "deriving" is just a particular case of "merging". Again, why do we need both? | |
300 | Ok, makes sense. |
Comments slightly clarified.
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
247 | This part is supposed to be a brief description. How it's handled is described bellow. | |
250 | I don't know either, but the idea is to keep this patch consistent with previous code. So I have to maintain the logic around "magic pointer constant". This patch is not about that issue, let's discuss it later. | |
260 | Because for deriving there is only one rule: it changes nothing and for merge everything is a bit more complicated. Both "gep/bitcast is merge" and "phi/select is deriving" looks misleading. I agree that formally f(x1) -> x is a particular case of f(x1, ..., xN) -> x, but how to name it so that it won't be confusing? |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
250 | I'll try to clarify this. The GC relocates the *base* pointer and this is why we record the base pointer for every 'derived pointer'. After the GC relocates the base pointer at runtime, we can rematerialize the derived pointer because we have stored this information in the IR. So, effectively it comes down to always identifying the "base" of a derived pointer. This is where the getBaseType comes in. So, this is why we have something like this: %ptr = unrelocated non constant pointer compare (%ptr, inttoptr(magic_const)) <-- can't be reordered before a safepoint but: compare(%ptr, GEP(null, magic_const)) <-- can be reordered before a safepoint Also, just as an aside, this is also why inttoptr of addrspace(1) is incorrect in the IRVerifier, but GEP(null, offset) in addrspace(1) is fine. |
Comments inline.
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
242 | I don't think we want to introduce one more term 'poisoned' here. Specifically, poisoning has different meanings in the optimizer (and sometimes in the GC), and can be confusing. Do we really need to make this distinction? I think it's more confusing. Pls see comment below on naming to make clearer. | |
260 | I tend to agree with Max here. We really cannot distinguish between both. | |
262 | As mentioned, I dont think you need to explicitly state out the distinction here. Just maybe a single line at the beginning when explaining MultiSourceDerivedPtr (I prefer that instead of Poisoned naming), because we can have multiple sources and still not be poisoned. | |
267 | Nit: predecessor | |
279 | This is incorrect IR. We cannot have multiple *different* incoming phi values from exactly one predecessor. |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
242 | Yes, this may be confusing with poisoned pointers that are also in GC. I don't have a better idea for its name on top of my head, though. I'm OK to go with whatever you agree with. | |
250 | Ok, this makes sense to me. | |
260 | I think that "deriving" is a good name for both, because formally you apply some function f (where f can be gep, phi, bitcast or whatever) to a number of pointer arguments and have a new pointer (derived one) as result. It is unimportant -how- exactly you derived your pointer from your argument(s). You never fail verification while deriving. You can only fail verification when you misuse the derived pointer. And this is the important part. | |
279 | I don't get why it is incorrect. For example, void *p = &a[10]; void *p1 = &p[20]; void *temp = p; if (cond) { temp = p1; some_call(); } void *p2 = temp; Won't we have exactly this IR here? |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
279 | yup, you're right. What we have is different incoming values from different incoming blocks, something like: p2 = phi [p, def BB of p], [p1, safepoint block] |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
262 | We don't need MultiSourceDerivedPtr because we don't care at all from how many sources a value was derived, but we do care if all sources were (un)relocated or not. Value which was derived from multiple sources can be in any of three possible states (relocated, unrelocated, poisoned). | |
279 | To avoid confusion I'll make this comment a bit more clear. |
"merge" now replaced with "derive" in comments, example in FIXME become a bit less confusing, added new test (with XFAIL) for that FIXME.
Logic looks right, some comments inline.
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
408 | Pls add a comment here on why the poisoned defs are skipped. ValidUnrelocatedDefs are obvious. | |
480 | Don't we need to handle selects and identify poisoned versus unrelocated? | |
490 | Instead of the below code (logic is right, but too many conditionals), could we do something like this: if (isNotExclusivelyConstantDerived(InValue)) { if (isValuePoisoned(InValue) || (HasRelocatedInputs && HasUnrelocatedInputs)) { PoisonedPointerDef = true; break; } if (BlockMap[InBB]->AvailableOut.count(InValue)) HasRelocatedInputs = true; else HasUnrelocatedInputs = true; } if (!PoisonedPointerDef && HasUnrelocatedInputs) { assert(!HasRelocatedInputs && "Should be poisoned!"); ValidUnrelocatedPointerDef = true; } |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
408 | You think I should repeat myself here? From comments above (where 'poisoned' pointers are introduced) it's clear why this defs are skipped. | |
480 | I'll do it in the next patch (in order to keep this one not too huge). | |
490 | But we still have to handle case when some inputs are relocated and some are not: your code won't work if the _last_ input makes this phi poisoned (because on each iteration it checks flags that might have changed on previous one). Thus we cannot remove this part: if (HasUnrelocatedInputs) { if (HasRelocatedInputs) PoisonedPointerDef = true; else ValidUnrelocatedPointerDef = true; } So the only thing that might be changed is this part: if (isValuePoisoned(InValue)) { // If any of inputs is poisoned, output is always poisoned too. HasRelocatedInputs = true; HasUnrelocatedInputs = true; break; } But if we'll change it to if (isValuePoisoned(InValue)) { // If any of inputs is poisoned, output is always poisoned too. PoisonedPointerDef = true; break; } We'll have to somehow tell that if (mentioned before) not to touch this flag and it'll be even worse. Currently this part is pretty clear and straightforward: loop over phi's inputs sets two flags (that have clear meaning) and after that loop another two flags are changed accordingly. |
lib/IR/SafepointIRVerifier.cpp | ||
---|---|---|
490 | also, pls point out as a comment where these rules come from (i.e. refer to header for reasoning of these rules). |
I don't think we want to introduce one more term 'poisoned' here. Specifically, poisoning has different meanings in the optimizer (and sometimes in the GC), and can be confusing.
It looks like poisoned pointers are just derived pointers which will be *lexically* from multiple pointers. So something like:
gep, bitcasts -> derived pointer from one base
phis, selects -> derived pointers that are lexically from multiple base pointers (I say lexically, because we can have phis/selects that statically have derived from exactly one pointer).
Do we really need to make this distinction? I think it's more confusing. Pls see comment below on naming to make clearer.