This is an archive of the discontinued LLVM Phabricator instance.

[clang][UBSan] Passing underaligned pointer as a function argument is undefined behaviour
Needs ReviewPublic

Authored by lebedev.ri on Apr 7 2021, 7:48 AM.

Details

Summary

This is somewhat related to the following RFC by @rjmccall:
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html

C 6.3.2.3p7 (N1548) says:

A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned) for the referenced type, the behavior is undefined.

C++ [expr.reinterpret.cast]p7 (N4527) defines pointer conversions in terms
of conversions from void*:

An object pointer can be explicitly converted to an object pointer
of a different type. When a prvalue v of object pointer type is
converted to the object pointer type “pointer to cv T”, the result
is static_cast<cv T*>(static_cast<cv void*>(v)).

C++ [expr.static.cast]p13 says of conversions from void*:

A prvalue of type “pointer to cv1 void” can be converted to a
prvalue of type “pointer to cv2 T” .... If the original pointer value
represents the address A of a byte in memory and A satisfies the alignment
requirement of T, then the resulting pointer value represents the same
address as the original pointer value, that is, A. The result of any
other such pointer conversion is unspecified.

Currently clang does not optimize based on the fact that the pointers
passed into the function must be appropriately aligned for their pointee type.
We now have a motivation to change that. Namely, it was estabilished that
not doing so prevents LICM, and it is likely not the only penalized transform.
See D99249.

Now, somehow i don't think we can just start optimizing on that.
Let's first give users tools to weed out the cases where the pointer
passed into function's argument is underaligned.

I've yet to actually see just how much mayhem this causes,
at least on the LLVM stage-2 build.

Diff Detail

Event Timeline

lebedev.ri created this revision.Apr 7 2021, 7:48 AM
lebedev.ri requested review of this revision.Apr 7 2021, 7:48 AM
Herald added subscribers: Restricted Project, cfe-commits. · View Herald TranscriptApr 7 2021, 7:48 AM
jkorous added a subscriber: jkorous.Apr 7 2021, 9:54 AM

Fix usage of clang function argument -> llvm function argument mapping reading.

This is somewhat related to the following RFC by @rjmccall:
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html

It looks to me like this direction is in conflict with that RFC. In particular, it says:

It is not undefined behavior to create a pointer that is less aligned
than its pointee type. Instead, it is only undefined behavior to
access memory through a pointer that is less aligned than its pointee
type.

... and gives as an example that passing an underaligned pointer to a function that doesn't perform an underaligned access is OK under the proposed rule. Putting this behind a -fstrict-* flag (as this patch does) seems OK to me, but I think enabling it by default on any target would be inconsistent with the RFC and would need broader discussion and a change in policy.

I think it is reasonable to add stricter UBSan checks for underaligned pointers, but that should be available regardless of what properties our implementation happens to treat as defined; part of the point of UBSan is to check for portability issues and to check whether problems would arise as a prerequisite for enabling the corresponding -fstrict flag. So I do not think the checks should be gated behind the -fstrict flag. I don't think this is analogous to -fwrapv, which actually opts into a different set of language rules where integer overflow is defined; this is more like -fstrict-enums, for which we will still sanitize even in cases where we happen to choose to not optimize, or the sanitizer for floating-point division by zero, for which we still sanitize (because it's formally UB) but don't include in -fsanitize=undefined (because our implementation defines it). Generally-speaking, things controlled by -fstrict-* flags are checked by UBSan and in particular by -fsanitize=undefined regardless of whether the -fstrict-* flag is specified, and I think there are some cases where -fsanitize=undefined already diagnoses misalignment that is valid under John's RFC, so adding the new checks to the -fsanitize=undefined sanitizer group (regardless of -fstrict) seems consistent to me.

If we want to add a sanitizer that checks for misaligned pointers existing, instead of checking (as our current alignment sanitizer does) for misaligned accesses, then I think it would be more precise (and add a lot fewer checks) to enforce the language rule as written, that is, check for pointer and reference casts to a more-aligned type, rather than checking function arguments. I assume your intent here is to eventually add LLVM parameter attributes indicating the function argument is correctly-aligned, which is why you're checking on call boundaries?

Let's first give users tools to weed out the cases where the pointer
passed into function's argument is underaligned.

Isn't there already -Walign-mismatch in clang? We definitely have 1 unresolved case of that affecting the Linux kernel ATM: https://github.com/ClangBuiltLinux/linux/issues/1328

clang/docs/ReleaseNotes.rst
52

s/was//

lebedev.ri added a comment.EditedApr 7 2021, 11:36 AM

This is somewhat related to the following RFC by @rjmccall:
https://lists.llvm.org/pipermail/llvm-dev/2016-January/094012.html

It looks to me like this direction is in conflict with that RFC. In particular, it says:

It is not undefined behavior to create a pointer that is less aligned
than its pointee type. Instead, it is only undefined behavior to
access memory through a pointer that is less aligned than its pointee
type.

... and gives as an example that passing an underaligned pointer to a function that doesn't perform an underaligned access is OK under the proposed rule.

Yes. We now have a motivational reason to do so, it wouldn't be "just expose the UB for the sake of miscompiling the program".

Putting this behind a -fstrict-* flag (as this patch does) seems OK to me, but I think enabling it by default on any target would be inconsistent with the RFC and would need broader discussion and a change in policy.

I suppose so.

I think it is reasonable to add stricter UBSan checks for underaligned pointers, but that should be available regardless of what properties our implementation happens to treat as defined; part of the point of UBSan is to check for portability issues and to check whether problems would arise as a prerequisite for enabling the corresponding -fstrict flag. So I do not think the checks should be gated behind the -fstrict flag. I don't think this is analogous to -fwrapv, which actually opts into a different set of language rules where integer overflow is defined; this is more like -fstrict-enums, for which we will still sanitize even in cases where we happen to choose to not optimize, or the sanitizer for floating-point division by zero, for which we still sanitize (because it's formally UB) but don't include in -fsanitize=undefined (because our implementation defines it). Generally-speaking, things controlled by -fstrict-* flags are checked by UBSan and in particular by -fsanitize=undefined regardless of whether the -fstrict-* flag is specified, and I think there are some cases where -fsanitize=undefined already diagnoses misalignment that is valid under John's RFC, so adding the new checks to the -fsanitize=undefined sanitizer group (regardless of -fstrict) seems consistent to me.

Aha. I was also very uneasy about doing the check behind the -fstrict.
Will change it to be unconditional!

If we want to add a sanitizer that checks for misaligned pointers existing, instead of checking (as our current alignment sanitizer does) for misaligned accesses, then I think it would be more precise (and add a lot fewer checks) to enforce the language rule as written, that is, check for pointer and reference casts to a more-aligned type, rather than checking function arguments. I assume your intent here is to eventually add LLVM parameter attributes indicating the function argument is correctly-aligned, which is why you're checking on call boundaries?

It's complicated. I would like this to be more strict, and catch any misaligned pointers, yes.
But, two things:

  1. it's going to be much more noisy than only dealing with function arguments :)
  2. even if we are okay with that, handling this *just* on the callers side (when the pointer is created) is a wrong solution here. we still need to do this on the callee side. because we can't be sure that the caller will be compiled with sanitizer, but we *know* that the callee will be micompiled by us.

So at most i could handle it in more cases, but this one should stay..
Does this make sense?

Let's first give users tools to weed out the cases where the pointer
passed into function's argument is underaligned.

Isn't there already -Walign-mismatch in clang? We definitely have 1 unresolved case of that affecting the Linux kernel ATM: https://github.com/ClangBuiltLinux/linux/issues/1328

Front-end warnings won't really help here.

lebedev.ri marked an inline comment as done.

Unguard the UBSan check under -fstrict*.

I assume your intent here is to eventually add LLVM parameter attributes indicating the function argument is correctly-aligned, which is why you're checking on call boundaries?

Yep, D99791.

Yes. We now have a motivational reason to do so, it wouldn't be "just expose the UB for the sake of miscompiling the program".

Nobody thinks you're proposing this just for the sake of miscompiling. We understand that there are optimization benefits to having this information. We just don't think you get to have this information in general, because of the de facto rules of the language. Knowing that argument pointers don't alias would also be valuable for optimization, but C doesn't guarantee that; this is not that different, it's just that the language rule is a self-imposed rule of our implementation instead of being written up in a standard.

Are you certain that knowing argument pointer alignment is the only way of achieving this optimization? That's surprising to me because I would expect that most optimizations would also need dereferenceability, which of course you can't get. And if you can see (and hoist) a load to infer dereferenceability then you can also infer alignment from that.

ychen added a subscriber: ychen.Apr 7 2021, 1:01 PM
vitalybuka resigned from this revision.Apr 9 2021, 1:37 PM