This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Add a boolean flag to `Address::getPointer` and `Lvalue::getPointer` that indicates whether the pointer is known not to be null
ClosedPublic

Authored by ahatanak on Jan 25 2023, 2:50 PM.

Details

Summary

The flag will be used for the arm64e work we plan to upstream in the future (see https://lists.llvm.org/pipermail/llvm-dev/2019-October/136091.html). Passing KnownNonNull_t::True allows clang to avoid emitting null checks on arm64e. Currently the flag has no effect on code generation on other targets.

This is the first patch and there are many more places where KnownNonNull_t::True can be passed. I plan to fix those places in future patches.

Diff Detail

Event Timeline

ahatanak created this revision.Jan 25 2023, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2023, 2:50 PM
ahatanak requested review of this revision.Jan 25 2023, 2:50 PM

I'm having a little trouble following what the meaning of an "Address" is with this patch. The Pointer member refers to an encoded value, and calling getPointer() emits IR to decode it? Could use a few comments to explain the API.

rjmccall added a comment.EditedJan 25 2023, 7:58 PM

I'm having a little trouble following what the meaning of an "Address" is with this patch. The Pointer member refers to an encoded value, and calling getPointer() emits IR to decode it? Could use a few comments to explain the API.

That's something of a future direction w.r.t this patch, but it's reasonable to talk about it, yeah.

Pointer authentication includes the ability to have signed data pointers. Function pointers can only be used opaquely in C, but data pointers are subject to a lot of operations that boil down to pointer arithmetic, both as part of l-value accesses (such as signedPtr->array[i] = 5) and just in abstract pointer-typed expressions (such as &ptr->array[i]). For l-values that are ultimately accessed, we really want to make sure that we do the authentication as close as possible to the actual access rather than authenticating it immediately and then having a raw, attackable pointer sitting around while we set up the access. For pointer expressions in general, if they ultimately need to produce another signed pointer (e.g. we assign into a __ptrauth-qualified l-value), for similar reasons we really want to set up all the arithmetic ahead of time for a secure auth-add-and-resign sequence instead of authenticating and having raw, attackable pointers sitting around as we do the arithmetic. We could do all that with a specialized pointer-with-signature emitter that covers literally every kind of pointer arithmetic, but that requires duplication of a *lot* of code, especially because we have a lot of sanitizers and other special logic also tied to those operations. It seemed much less invasive to add a state to Address where it can be a pair of a signed pointer and an offset, then make a few basic GEP-formation operations accumulate into the offset when the pointer is in that state. Of course, converting an Address into a normal pointer then becomes a non-trivial operation that needs a CGF.

Pointer authentication doesn't sign null pointers, so we have to do some explicit extra checks as part of this when the pointer is possibly null, and this patch is trying to make that easier to avoid.

clang/lib/CodeGen/Address.h
26

The usual idiom for this kind of boolean enum is that you just use an unscoped enum (not enum class), and you spell the enumerators something that reads well where it'll be used. So you'd make the enumerators KnownNotNull and (I guess) something like NotKnownNotNull or NullnessUnknown.

113

Apologies if this rehashes a conversation we had earlier, but does it make more sense to track this internally in Address? Some Address sources definitely known that they're non-null, and the places where we only know that *contextually* could definitely just pass it down to e.g. EmitPointerWithAlignment. That would make it a less invasive change for all the clients of getPointer().

ahatanak added inline comments.Jan 26 2023, 12:45 PM
clang/lib/CodeGen/Address.h
113

Yes, tracking it in Address and LValue is another way to fix the problem. The implementation in my local branch currently adds an IsKnownNonNull flag to the constructor of Address and method setKnownNonNull to both Address and LValue. The method is called when the pointer is known not to be null contextually.

rjmccall added inline comments.Jan 26 2023, 2:18 PM
clang/lib/CodeGen/Address.h
113

Okay. It might make sense to pass this into EmitPointerWithAlignment contextually instead of just relying on changing it afterwards, since it's possible that some of the work there can be optimized when the pointer is contextually not allowed to be null. (For example, C++ derived-to-base conversions, although I think we might optimize those already by using a different CastKind when there's a contextual requirement to be non-null (e.g. when it's the this argument of a method call). But I'm sure there are situations that doesn't kick in that would be optimized, like if you static_cast the pointer before calling a method on it.)

ahatanak updated this revision to Diff 492843.Jan 27 2023, 10:40 AM

Add a bit to Address and LValue that tracks whether the pointer is known not to be null.

There are more places where we know contextually the pointer can't be null and can set the bit to KnownNonNull or set the bit of an LValue or Address using an existing Address's KnownNonNull bit.

ahatanak updated this revision to Diff 492860.Jan 27 2023, 11:33 AM

Fix a few typos that were causing compile errors.

ahatanak added inline comments.Jan 27 2023, 2:34 PM
clang/lib/CodeGen/Address.h
147

This isn't safe. We cannot tell whether the new pointer is non-null or not based on whether it's currently non-null.

ahatanak updated this revision to Diff 492940.Jan 27 2023, 4:02 PM

Pass a KnownNonNull_t flag to Address::withPointer.

This seems reasonable to me. Eli, are your questions answered?

@efriedma do you have any comments?

The approach here makes sense.

clang/lib/CodeGen/Address.h
65–66

Do you need to do something with IsKnownNonNull in the Alignment.isZero() case?

68–69

This comment isn't right. The max alignment is, as far as I can tell, 1<<32 exactly. (But there's something weird going on with very large values... somehow int a[1LL<<32] __attribute((aligned(1ULL<<32))) = {}; ignores the alignment.)

clang/lib/CodeGen/CGBuilder.h
164

Do address-space casts preserve non-null?

277

Do non-inbounds GEPs preserve nonnull?

clang/lib/CodeGen/CGValue.h
181

Do we not have spare bits in the bitfield?

ahatanak updated this revision to Diff 496132.Feb 9 2023, 7:53 AM
ahatanak marked 4 inline comments as done.

Address review comments.

clang/lib/CodeGen/Address.h
65–66

We can probably get away with not doing anything here because currently the alignment is zero only when Address::invalid is called.

68–69

The following function generated by tablegen (and a few others directly or indirectly calling the function) returns a 32-bit int, but it should be returning a 64-bit int.

https://github.com/llvm/llvm-project/blob/main/clang/utils/TableGen/ClangAttrEmitter.cpp#L532

clang/lib/CodeGen/CGBuilder.h
164

I think we can preserve non-null here. We just can't assume that KnownNonNull indicates the pointer is non-zero when the address space isn't the default address space.

ahatanak added inline comments.Feb 9 2023, 8:25 AM
clang/lib/CodeGen/CGExpr.cpp
1723

KnownNonNull is being passed here because the address of a load must be non-null. Do we have to check that the function doesn't have null_pointer_is_valid here?

ahatanak updated this revision to Diff 496305.Feb 9 2023, 6:27 PM
  • Set the KnownNonNull bit correctly when Address::withAlignment is called.
  • Pass NotKnownNonNull to the call to Addr.withPointer when the address of thread local storage is created.
ahatanak added inline comments.Feb 9 2023, 6:34 PM
clang/lib/CodeGen/CGExpr.cpp
1723

I don't think we should pass KnownNonNull unless we know for sure the address of the global variable is non-null, which isn't always true (for example, when it's extern weak).

We can set KnownNonNull below this line if we know NullPointerIsValid is false.

efriedma accepted this revision.Feb 14 2023, 1:04 PM

LGTM

clang/lib/CodeGen/Address.h
68–69

Filed https://github.com/llvm/llvm-project/issues/60752 so we don't lose track of this.

clang/lib/CodeGen/CGExpr.cpp
1723

If you're assuming something is non-null based on the fact that loads from null trap, then yes, you need to check NullPointerIsValid.

If you want to see the current logic LLVM uses to figure out whether a global can be null, see evaluateICmpRelation in llvm/lib/IR/ConstantFold.cpp. But in this context, you probably don't need that.

This revision is now accepted and ready to land.Feb 14 2023, 1:04 PM
ahatanak updated this revision to Diff 497719.Feb 15 2023, 10:07 AM

Fix type and add comment. Remove unnecessary cast.

This revision was landed with ongoing or failed builds.Feb 15 2023, 10:17 AM
This revision was automatically updated to reflect the committed changes.
ahatanak added inline comments.Feb 15 2023, 5:11 PM
clang/lib/CodeGen/Address.h
68–69

I just realized we can't reduce the number of bits used for alignment here as we need 6 bits for alignment of 1 << 32.

Should we allocate additional memory when AlignLog is either 31 or 32? If the 5-bit alignment is equal to 0b11111, it would mean that there is an out-of-line storage large enough to hold the alignment and any other extra information that is needed. I think https://reviews.llvm.org/D117262#3267899 proposes a similar idea.

efriedma added inline comments.Feb 16 2023, 10:32 AM
clang/lib/CodeGen/Address.h
68–69

How much does sizeof(Address) actually matter, anyway? If it's going to get that nasty to implement the packing, I'm not sure it's worth the effort to optimize.

ahatanak added inline comments.
clang/lib/CodeGen/Address.h
68–69

I'm not sure, but apparently https://reviews.llvm.org/D117262 was needed to reduce the memory usage.

I don't see a significant increase in stack usage (a little over 1%) in the files in clang/lib/CodeGen when I build clang with -fstack-usage.

ahatanak added inline comments.Feb 16 2023, 6:23 PM
clang/lib/CodeGen/Address.h
68–69

@aeubanks do we have to use the specialization AddressImpl<T, true>?

https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/Address.h#L31

aeubanks added inline comments.Feb 17 2023, 9:32 AM
clang/lib/CodeGen/Address.h
68–69

looking back at the internal bug motivating the patch, it was because some we had to lower some thresholds in a stress test, so that in it of itself isn't super important. but Address is used a lot and it's nice to keep resource usage down (1% isn't trivial). does this noticeably affect max RSS? could run a potential patch through llvm-compile-time-tracker (I can help with that) to see the impact

ahatanak added inline comments.Feb 17 2023, 3:58 PM
clang/lib/CodeGen/Address.h
68–69

Note that 1% is the increase in the files in clang/lib/CodeGen. The files in other directories (e.g., clang/lib/Sema) aren't affected by the change as they don't use Address.

I see an increase of 0.33% in max RSS when I run the tests in clang/test/CodeGen*, but I'm not sure how reliable that is.

How do you run the potential patch through llvm-compile-time-tracker? The patch just changes the type to AddressImpl<void, false> A; on this line:

https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/Address.h#L92