This is an archive of the discontinued LLVM Phabricator instance.

[LVI] Normalize pointer behavior
ClosedPublic

Authored by nikic on Nov 6 2019, 10:59 AM.

Details

Summary

Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: For integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses.

The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction, or in conservative approximation, at the end of the block.

This patch changes the treatment of dereferencability: Instead of including it inside the block value, we instead treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge-value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferencability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change).

Of course, we do not want to recompute dereferencability on each intersectAssume call, so we need a new cache for this. The dereferencability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead only walk the full BB once and cache all the dereferenced pointers. So the traversal is now performed only once per BB, instead of once per queried pointer value.

I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior.

Diff Detail

Event Timeline

nikic created this revision.Nov 6 2019, 10:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 6 2019, 10:59 AM
reames requested changes to this revision.Nov 6 2019, 6:50 PM

This is definitely heading in the right direction. Nicely done overall, see suggestions as inline comments.

llvm/lib/Analysis/LazyValueInfo.cpp
260–261

Minor: I think this can become a free function as opposed to a member function. It doesn't appear to be using member state.

630

Please add a todo about using address space property (i.e. null is defined) instead of the raw AS==0 check here.

790

Ok, I don't think this placement quite works. The challenge is that your changing the point of the block scan from the furthest reached block during the backwards walk (entry, or point of overdefine) to instead scan the query source block.

I'd suggest adjusting the two call sites you remove to use the new caching logic, and then only do the scan if a) context is terminator, or b) context is not in same block (i.e. full block case)

This revision now requires changes to proceed.Nov 6 2019, 6:50 PM
nikic marked an inline comment as done.Nov 7 2019, 12:27 AM
nikic added inline comments.
llvm/lib/Analysis/LazyValueInfo.cpp
790

The challenge is that your changing the point of the block scan from the furthest reached block during the backwards walk (entry, or point of overdefine) to instead scan the query source block.

I believe the behavior of the scan essentially stays the same: intersectAssume() is not only used to calculate the final result of the query, it is also invoked internally by LVI when computing edge values. So if you query a pointer in BB2 with incoming edge from BB1, then the first thing we do is calculate the edge value BB1 -> BB2, at which point we will intersectAssume() with the terminator of BB1.

I'd suggest adjusting the two call sites you remove to use the new caching logic, and then only do the scan if a) context is terminator, or b) context is not in same block (i.e. full block case)

We'd have to insert checks into getEdgeValue() (always) and getValueAt() / getValueInBlock() (if context is terminator), which are exactly the places where we already use intersectAssume().

nikic updated this revision to Diff 228253.Nov 7 2019, 8:24 AM
nikic marked 2 inline comments as done.

Make eraseValueFromPerBlockValueCache a free function, add todo.

reames accepted this revision.Nov 8 2019, 8:17 AM

LGTM. Very nice cleanup, thank you for doing this.

llvm/lib/Analysis/LazyValueInfo.cpp
790

You are correct. I'd misremembered the code and hadn't checked the callers before commenting. Objection withdrawn.

This revision is now accepted and ready to land.Nov 8 2019, 8:17 AM
This revision was automatically updated to reflect the committed changes.
nikic reopened this revision.Nov 18 2019, 12:56 AM

Reopening as this has been reverted in rG7a3ad48d6de0e79a92361252a815b894565b9a0f.

This revision is now accepted and ready to land.Nov 18 2019, 12:56 AM
lebedev.ri requested changes to this revision.Nov 25 2019, 2:07 AM
lebedev.ri added a subscriber: echristo.

Just marking as "changes needed" to denote the revert until there's more info.
@echristo any progress on reproducer?

This revision now requires changes to proceed.Nov 25 2019, 2:07 AM
nikic added a comment.Nov 25 2019, 2:35 PM

I'll rebase this over D70376 once that lands, which should hopefully fix the python compilation issue. To provide some context, we were not properly invalidating elements in the overdefined value cache if they are deleted, which might result in incorrect "overdefined" result if something later happens to be allocated at the same location. This just causes non-determinism, because "overdefined" is a conservative assumption. However, this patch introduces a second cache that uses the same structure as the overdefined cache, and here missing the invalidation may result in a miscompile instead.

I'm traveling right now, so will probably only get around to this next week.

nikic updated this revision to Diff 232187.Dec 4 2019, 11:53 AM

Rebase over D70376, fixing cache invalidation.

nikic added a comment.Dec 5 2019, 6:52 AM

Would be good if someone could take another quick look at it, as the caching part changed significantly. The fact that the LVI caching happens in a separate class from LVI (why?) makes things slightly awkward.

Ping for re-review.

reames requested changes to this revision.Dec 12 2019, 12:01 PM
reames added inline comments.
llvm/lib/Analysis/LazyValueInfo.cpp
163

Can you add a note to the comment about the None state being used to represent a block whose dereferenceability hasn't yet been memoized?

672

This function signature and usage seems needlessly complicated here. Can I suggest something along the lines of the following:
if (None == TheCache[BB].DereferenceablePointers) {

// populate cache

}

return TheCache[BB].DereferenceablePointers->count(V);

You could hide the population inside an ensureDereferenceabilityCached or getDereferenceablePointers interface if you wanted.

This revision now requires changes to proceed.Dec 12 2019, 12:01 PM
nikic marked an inline comment as done.Dec 12 2019, 12:10 PM
nikic added inline comments.
llvm/lib/Analysis/LazyValueInfo.cpp
672

The somewhat awkward API here is used to keep all the members of LazyValueInfoCache private -- I'd have to make parts of it public to allow directly populating the cache directly here.

nikic marked an inline comment as done.Dec 12 2019, 12:16 PM
nikic added inline comments.
llvm/lib/Analysis/LazyValueInfo.cpp
672

Or is the suggestion here to just move the population of the cache itself into LazyValueInfoCache as well? That would certainly be easiest.

nikic updated this revision to Diff 233674.Dec 12 2019, 1:23 PM

Improve cache interface: Use a single isPointerDereferenceInBlock() method which accepts a callback to initialize the cache. I think that gives a reasonable balance between hiding cache details and having natural control flow. What do you think?

reames accepted this revision.Dec 12 2019, 4:34 PM

LGTM again.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 13 2019, 12:14 AM
This revision was automatically updated to reflect the committed changes.
nikic reopened this revision.Jun 14 2020, 1:08 PM

Reopening this, as it was reverted (quite a while ago) as part of rG02a6b0bc3b54571f890a531e8c16b4c1173c55d0.

nikic updated this revision to Diff 272250.Jun 20 2020, 4:12 AM

Rebase.

Unfortunately I found that the compile-time impact of this change is a bit hit and miss: https://llvm-compile-time-tracker.com/compare.php?from=f87b785abee0da8939fdd5900a982311b4c25409&to=f3490beadb768e921e531fd61450a7c5bfa84f2d&stat=instructions Some benchmarks improve, some regress. The reason is that the new implementation performs only a single scan to find dereferenced pointers, but performs a GetUnderlyingObject() call on every query, which is quite expensive. It's probably necessary to cache underlying objects as well.

Furthermore this change seems to have a larger impact on optimization than I expected, with larger code-size changes for some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=f87b785abee0da8939fdd5900a982311b4c25409&to=f3490beadb768e921e531fd61450a7c5bfa84f2d&stat=size-text Apparently being able to eliminate a null branch at the end of the block where the pointer is dereferenced is quite valuable.

fhahn added a subscriber: fhahn.Jun 22 2020, 7:02 AM

Furthermore this change seems to have a larger impact on optimization than I expected, with larger code-size changes for some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=f87b785abee0da8939fdd5900a982311b4c25409&to=f3490beadb768e921e531fd61450a7c5bfa84f2d&stat=size-text Apparently being able to eliminate a null branch at the end of the block where the pointer is dereferenced is quite valuable.

Interesting! I am wondering if it might be worth to handle simplifications based on dereferenceability in a separate pass, rather than making the caching more complicated. I am not sure how much benefit we get from other places in LVI knowing that a ptr is not null, but I would expect it not too be too much.

I've put up a simple prototype that just keeps track of dereferenced objects based on dominance D82299 and eliminates null checks in a single traversal of a sorted list of dereferences/compares.

nikic updated this revision to Diff 272514.Jun 22 2020, 12:12 PM

Simplify the patch a bit, in particular remove the movement of isKnownNonZero that really has nothing to do with this patch. Also rename everything to use nonnull rather than dereferencable terminology, to make this more amenable to future extension.

nikic added a comment.Jun 22 2020, 1:19 PM

Thanks to D82261 GetUnderlyingObject() is now cheap enough than caching is not worth the bother. Most of the remaining cost there will be addressed by D81867. New compile-time numbers: http://llvm-compile-time-tracker.com/compare.php?from=37d3030711cc30564fb142154e4e8cabdc91724e&to=e518ba98b7fb90f2645b91edd8cdeae40ad6f4df&stat=instructions I profiled the sqlite case and found that while LVI cost does increase a bit, the main contribution are second-order effects. As such I'm happy enough with performance to reapply this.

Furthermore this change seems to have a larger impact on optimization than I expected, with larger code-size changes for some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=f87b785abee0da8939fdd5900a982311b4c25409&to=f3490beadb768e921e531fd61450a7c5bfa84f2d&stat=size-text Apparently being able to eliminate a null branch at the end of the block where the pointer is dereferenced is quite valuable.

Interesting! I am wondering if it might be worth to handle simplifications based on dereferenceability in a separate pass, rather than making the caching more complicated. I am not sure how much benefit we get from other places in LVI knowing that a ptr is not null, but I would expect it not too be too much.

I've put up a simple prototype that just keeps track of dereferenced objects based on dominance D82299 and eliminates null checks in a single traversal of a sorted list of dereferences/compares.

Thanks for looking into this! Some thoughts on the tradeoff between a separate pass and having this in LVI:

  • LVI gets nonnull information from three sources: isKnownNonZero for basic cases, pointer dereferences, and pointer comparisons. One important aspect here is that this information is combined, and not limited to dominating cases only. For example, if you have two incoming edges and the pointer is dereferenced in both predecessor blocks, LVI knows that is is non-null (even though neither dereference is dominating). Similarly, if the pointer is dereferenced in one predecessor, but the other one is guarded by a !== null check, we still get the same result.
  • CVP uses LVI nonnull information both for compares and for call nonnull attributes. Handling those two in a separate pass is easy.
  • JumpThreading is unfortunately outside my area of familiarity, but I believe this is the part where having nonnull-reasoning inside LVI rather than separately is important. Jump threading does a lot of reasoning about predicates on edges and queries LVI to do so. Dropping this functionality from LVI would probably make threading of !== null less effective.
nikic added a comment.Jul 5 2020, 10:33 AM

Any further input here? I'd like to reapply this, to unblock D69686.

This revision is now accepted and ready to land.Jul 24 2020, 3:25 AM
fhahn added a comment.Jul 24 2020, 7:24 AM

Thanks to D82261 GetUnderlyingObject() is now cheap enough than caching is not worth the bother. Most of the remaining cost there will be addressed by D81867. New compile-time numbers: http://llvm-compile-time-tracker.com/compare.php?from=37d3030711cc30564fb142154e4e8cabdc91724e&to=e518ba98b7fb90f2645b91edd8cdeae40ad6f4df&stat=instructions I profiled the sqlite case and found that while LVI cost does increase a bit, the main contribution are second-order effects. As such I'm happy enough with performance to reapply this.

Furthermore this change seems to have a larger impact on optimization than I expected, with larger code-size changes for some benchmarks: https://llvm-compile-time-tracker.com/compare.php?from=f87b785abee0da8939fdd5900a982311b4c25409&to=f3490beadb768e921e531fd61450a7c5bfa84f2d&stat=size-text Apparently being able to eliminate a null branch at the end of the block where the pointer is dereferenced is quite valuable.

Interesting! I am wondering if it might be worth to handle simplifications based on dereferenceability in a separate pass, rather than making the caching more complicated. I am not sure how much benefit we get from other places in LVI knowing that a ptr is not null, but I would expect it not too be too much.

I've put up a simple prototype that just keeps track of dereferenced objects based on dominance D82299 and eliminates null checks in a single traversal of a sorted list of dereferences/compares.

Thanks for looking into this! Some thoughts on the tradeoff between a separate pass and having this in LVI:

  • LVI gets nonnull information from three sources: isKnownNonZero for basic cases, pointer dereferences, and pointer comparisons. One important aspect here is that this information is combined, and not limited to dominating cases only. For example, if you have two incoming edges and the pointer is dereferenced in both predecessor blocks, LVI knows that is is non-null (even though neither dereference is dominating). Similarly, if the pointer is dereferenced in one predecessor, but the other one is guarded by a !== null check, we still get the same result.
  • CVP uses LVI nonnull information both for compares and for call nonnull attributes. Handling those two in a separate pass is easy.
  • JumpThreading is unfortunately outside my area of familiarity, but I believe this is the part where having nonnull-reasoning inside LVI rather than separately is important. Jump threading does a lot of reasoning about predicates on edges and queries LVI to do so. Dropping this functionality from LVI would probably make threading of !== null less effective.

The last point can indeed not really be solved by a separate patch unfortunately. I also won't have much time to look into the separate patch in the near future, so I don't think we should wait with this patch until then.

This revision was automatically updated to reflect the committed changes.