This is an archive of the discontinued LLVM Phabricator instance.

[CaptureTracking] Do not capture compares of same object
Needs ReviewPublic

Authored by caojoshua on Jun 6 2023, 12:20 AM.

Details

Summary

Compares of the same object do not leak any bits.

This patch introduces getUnderlyingObjectLookThrough. It looks at the
output of getUnderlyingObject. If it is a PHI, it looks at all the
incoming underlying objects. If all those objects are the same, or the
original PHI, we determine that there is a new underlying object. This
is similar to getUnderlyingObjects, but provides a more efficient way to
find a single underlying object.

This is an attempt at solving huge compile time regressions in
https://reviews.llvm.org/D152082. First, we only look through a single
PHI, not nested PHIs. Second, we only use one callsite. There are likely
other callsites that could take advantage of this over the vanilla
getUnderlyingObjects. We need to be careful about compile times. Adding
this to BasicAA::aliasCheck increases compile times by 3% on local
builds.

This can hopefully lead to improved rustc generated code in
https://github.com/rust-lang/rust/issues/111603. rustc generates
pointers comparisons that this patch can identify as non capturing.

Diff Detail

Event Timeline

caojoshua created this revision.Jun 6 2023, 12:20 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 12:20 AM

@nikic could you try running compile time on this? Uploaded to https://github.com/caojoshua/llvm-project/tree/underlyingicmp.

I tried a couple runs locally. Last run saw this patch increase mean compile time of CTMark by +0.003%. I trust your runs more than the runs on my local machine.

caojoshua updated this revision to Diff 528736.Jun 6 2023, 1:05 AM
caojoshua edited the summary of this revision. (Show Details)

Update commit msg

caojoshua published this revision for review.Jun 6 2023, 1:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2023, 1:08 AM
nikic added a comment.Jun 6 2023, 2:39 AM

@nikic could you try running compile time on this? Uploaded to https://github.com/caojoshua/llvm-project/tree/underlyingicmp.

I tried a couple runs locally. Last run saw this patch increase mean compile time of CTMark by +0.003%. I trust your runs more than the runs on my local machine.

I'm not seeing any compile-time impact either. I've also added your LLVM fork to llvm-compile-time-tracker, if you'd like to test patches directly.

nikic requested changes to this revision.Jun 7 2023, 1:18 AM

Using getUnderlyingObject() here is not quite right: The underlying object is a pure provenance notion, while icmp is a pure address comparison. What we need here is that the icmp can be expressed as Base + Offset1 == Base + Offset2. This is *nearly* what getUnderlyingObject() does in practice, but not quite.

In particular, getUnderlyingObject() can also look through ptrmask intrinsics. However, ptrmask(P, M) == P clearly leaks bits of the pointers.

I think you'll have to implement a separate utility here that only looks through GEP + casts, but not things like ptrmask.

This revision now requires changes to proceed.Jun 7 2023, 1:18 AM

Using getUnderlyingObject() here is not quite right: The underlying object is a pure provenance notion, while icmp is a pure address comparison. What we need here is that the icmp can be expressed as Base + Offset1 == Base + Offset2. This is *nearly* what getUnderlyingObject() does in practice, but not quite.

In particular, getUnderlyingObject() can also look through ptrmask intrinsics. However, ptrmask(P, M) == P clearly leaks bits of the pointers.

The case being because depending on M is indicates which bits in P are zero?
If so then I think you can only do this for equality comparisons, otherwise something like:
Base + Offset1 < Base + Offset2 can probably end up leaking some bits (depending on what is known
about Offset1/Offset2) because of overflows.
If that is a concern then you need to also ensure the ICmp is equality (its unchecked at the moment).

I think you'll have to implement a separate utility here that only looks through GEP + casts, but not things like ptrmask.

llvm/lib/Analysis/CaptureTracking.cpp
416

think the "to" is extra.

nikic added a comment.Jun 7 2023, 8:25 AM

Using getUnderlyingObject() here is not quite right: The underlying object is a pure provenance notion, while icmp is a pure address comparison. What we need here is that the icmp can be expressed as Base + Offset1 == Base + Offset2. This is *nearly* what getUnderlyingObject() does in practice, but not quite.

In particular, getUnderlyingObject() can also look through ptrmask intrinsics. However, ptrmask(P, M) == P clearly leaks bits of the pointers.

The case being because depending on M is indicates which bits in P are zero?
If so then I think you can only do this for equality comparisons, otherwise something like:
Base + Offset1 < Base + Offset2 can probably end up leaking some bits (depending on what is known
about Offset1/Offset2) because of overflows.
If that is a concern then you need to also ensure the ICmp is equality (its unchecked at the moment).

Yes, you are right, this also needs to check for equality comparisons. I believe it can be extended to unsigned predicates if the GEPs are inbounds, but let's start simple here...

caojoshua updated this revision to Diff 530266.Jun 10 2023, 5:00 PM
caojoshua retitled this revision from [CaptureTracking] Do not capture compares of same object to [CaptureTracking] Do not capture equality compares of same object.
caojoshua edited the summary of this revision. (Show Details)
  • Remove getLookthroughUnderlyingObject. Create a utility specifically for non leaking underlying object
  • only determine non-capture for equality comparisons

If so then I think you can only do this for equality comparisons, otherwise something like:
Base + Offset1 < Base + Offset2 can probably end up leaking some bits (depending on what is known
about Offset1/Offset2) because of overflows.
If that is a concern then you need to also ensure the ICmp is equality (its unchecked at the moment).

Thats a great point. I added a check to make sure its only equality comparisons.

Does not have to be part of this patch, but could we extend this to cover non-equality comparisons if all the GEPs are inbounds? From https://llvm.org/docs/LangRef.html#id234, inbounds GEPs are poison if there is wrapping

The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (nsw).
The successive addition of offsets (without adding the base address) does not wrap the pointer index type in a signed sense (nsw).
The successive addition of the current address, interpreted as an unsigned number, and an offset, interpreted as a signed number, does not wrap the unsigned address space and remains in bounds of the allocated object. As a corollary, if the added offset is non-negative, the addition does not wrap in an unsigned sense (nuw).

If so then I think you can only do this for equality comparisons, otherwise something like:
Base + Offset1 < Base + Offset2 can probably end up leaking some bits (depending on what is known
about Offset1/Offset2) because of overflows.
If that is a concern then you need to also ensure the ICmp is equality (its unchecked at the moment).

Thats a great point. I added a check to make sure its only equality comparisons.

Does not have to be part of this patch, but could we extend this to cover non-equality comparisons if all the GEPs are inbounds? From https://llvm.org/docs/LangRef.html#id234, inbounds GEPs are poison if there is wrapping

If inbounds is present then comparison of two pointers that are products of GEP from the same base an entirely remove base and make the comparison entirely depend on the offsets
so it should be safe to include nocapture even for non-eq comparisons. That being said, something like:

define i1 @src_gep(ptr %p, i64 %off) {
  %p_gep = getelementptr inbounds i64, ptr %p, i64 %off
  %cmp = icmp ugt ptr %p_gep, %p
  ret i1 %cmp
}

Already folds out the GEP entirely (and as a result makes %p unused) so not sure if you actually need to change anything to get the desired behavior.

The multiplication of an index by the type size does not wrap the pointer index type in a signed sense (nsw).
The successive addition of offsets (without adding the base address) does not wrap the pointer index type in a signed sense (nsw).
The successive addition of the current address, interpreted as an unsigned number, and an offset, interpreted as a signed number, does not wrap the unsigned address space and remains in bounds of the allocated object. As a corollary, if the added offset is non-negative, the addition does not wrap in an unsigned sense (nuw).

I just realized that my comment on getUnderlyingObject() wasn't right in the following sense: While generally we can't assume that a comparison with the same underlying object is non-capturing, in this particular context, if the value were used in something like ptrmask, we would already report a capture at that point, so it would actually be fine to just use getUnderlyingObject() here.

The purpose of the getUnderlyingObject() check is effectively only to make sure that we don't handle cases like (c ? alloca : other) == ..., where the other contribution does not go through CaptureTracking. Everything else is already handled by the other CaptureTracking checks.

So I think your previous version was fine in that respect, though if you also want to handle inbounds GEP chains you'll need the separate helper anyway.

Does not have to be part of this patch, but could we extend this to cover non-equality comparisons if all the GEPs are inbounds? From https://llvm.org/docs/LangRef.html#id234, inbounds GEPs are poison if there is wrapping

It's safe to handle inbounds GEP if the predicate is unsigned. Signed predicates can not be handled.

That being said, something like:

define i1 @src_gep(ptr %p, i64 %off) {
  %p_gep = getelementptr inbounds i64, ptr %p, i64 %off
  %cmp = icmp ugt ptr %p_gep, %p
  ret i1 %cmp
}

Already folds out the GEP entirely (and as a result makes %p unused) so not sure if you actually need to change anything to get the desired behavior.

I think the main interest here is in cases involving loop phis, in which case we don't reliably do this, I believe. (Though there is the "indexed compare" fold that does try.)

llvm/lib/Analysis/CaptureTracking.cpp
81

Unnecessary?

84

Don't use std::function

90

If we handle phis, we should also handle selects.

nikic added inline comments.Jun 11 2023, 2:09 AM
llvm/lib/Analysis/CaptureTracking.cpp
90

We also need to limit recursion.

A PR was submitted to rust to change pointer inductions to index inductions for array allocations, which if merged, eliminates my main motivation for this patch.

I'm questioning the benefits of this patch. GEP inductions are certainly common, but comparisons against a GEP with (start, end) pointers are much less common when you can just iterate over indices. I don't have another real world example that supports this patch. @nikic any opinions, do you think this patch is a good enhancement to LLVM?

caojoshua retitled this revision from [CaptureTracking] Do not capture equality compares of same object to [CaptureTracking] Do not capture compares of same object.
caojoshua edited the summary of this revision. (Show Details)

Go back to the original version with getUnderlyingObjectLookthrough(). Only no-capture for equality compares. Add tests to make sure we don't escape comparisons against captured objects like ptrmask.

I just realized that my comment on getUnderlyingObject() wasn't right in the following sense: While generally we can't assume that a comparison with the same underlying object is non-capturing, in this particular context, if the value were used in something like ptrmask, we would already report a capture at that point, so it would actually be fine to just use getUnderlyingObject() here.

Yes. Can confirm this with newly added tests. Compare object against ptrmask of itself still does not capture.