This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Introduce misc No Integer To Pointer Cast check
ClosedPublic

Authored by lebedev.ri on Nov 9 2020, 1:48 AM.

Details

Summary

While casting an (integral) pointer to an integer is obvious - you just get
the integral value of the pointer, casting an integer to an (integral) pointer
is deceivingly different. While you will get a pointer with that integral value,
if you got that integral value via a pointer-to-integer cast originally,
the new pointer will lack the provenance information from the original pointer.

So while (integral) pointer to integer casts are effectively no-ops,
and are transparent to the optimizer, integer to (integral) pointer casts
are *NOT* transparent, and may conceal information from optimizer.

While that may be the intention, it is not always so. For example,
let's take a look at a routine to align the pointer up to the multiple of 16:
The obvious, naive implementation for that is:

char* src(char* maybe_underbiased_ptr) {
  uintptr_t maybe_underbiased_intptr = (uintptr_t)maybe_underbiased_ptr;
  uintptr_t aligned_biased_intptr = maybe_underbiased_intptr + 15;
  uintptr_t aligned_intptr = aligned_biased_intptr & (~15);
  return (char*)aligned_intptr; // warning: avoid integer to pointer casts [misc-no-inttoptr]
}

The check will rightfully diagnose that cast.

But when provenance concealment is not the goal of the code, but an accident,
this example can be rewritten as follows, without using integer to pointer cast:

char*
tgt(char* maybe_underbiased_ptr) {
    uintptr_t maybe_underbiased_intptr = (uintptr_t)maybe_underbiased_ptr;
    uintptr_t aligned_biased_intptr = maybe_underbiased_intptr + 15;
    uintptr_t aligned_intptr = aligned_biased_intptr & (~15);
    uintptr_t bias = aligned_intptr - maybe_underbiased_intptr;
    return maybe_underbiased_ptr + bias;
}

See also:

Diff Detail

Event Timeline

lebedev.ri created this revision.Nov 9 2020, 1:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 9 2020, 1:48 AM
lebedev.ri requested review of this revision.Nov 9 2020, 1:48 AM
nlopes added a comment.Nov 9 2020, 2:04 AM

Nice!
BTW, another popular idiom is to store data in the last few bits of the pointer (e.g., LLVM's own PointerIntPair). I guess that one can also be implement by casting the ptr to char* and doing operations over that.

Out of curiosity: why is this in clang-tidy and not in clang?

lebedev.ri added a comment.EditedNov 9 2020, 2:11 AM

Nice!

Why thank you.

BTW, another popular idiom is to store data in the last few bits of the pointer (e.g., LLVM's own PointerIntPair). I guess that one can also be implement by casting the ptr to char* and doing operations over that.

Yep, there is a number of such patterns.
My basic idea behind this patch is that it is very much non-obvious whether

  1. such a cast was actually intended to be there
  2. the provenance obscurement is intentional

so let's just diagnose everything, and the user can either silence it, or rewrite it without the cast.

Out of curiosity: why is this in clang-tidy and not in clang?

Why should this be in clang?
I can't imagine it will pass the bar of no false-positives,
and silencing clang diag is not very convient..

Nice! I wonder if this could be an off-by-default (maybe enabled by -Wpedantic) warning in clang instead?

Diagnostics like this could be very useful for CHERI since our capabilities are in some way tracking provenance in hardware: we have bounds, permissions and a validity bit for all pointers.
All pointers must be derived from another valid pointer (CHERI capability) otherwise the validity bit is cleared.
Casting from an integer to a pointer always results in a value without provenance (lacking a tag bit in hardware) and will result in traps when dereferenced.
We do however treat uintptr_t as carrying provenance (casts retain the same permissions+bounds+validity bit in hardware as the original pointer).

Out compiler has some diagnostics to flag int -> pointer casts, so we warn about the example code when using long as the intermediate value (since that will cause run-time crashes if the value is dereferenced), but we do not warn for uintptr_t:
CHERI compiler explorer

I just checked whether we emit diagnotics for the testscases here and discovered that we only diagnose explicit casts from (non-uintptr_t) int -> pointer as lacking provenance, but are missing warnings for the -Wint-conversion case.

CC'ing @rsmith regarding the suggestion/question of also having a clang diagnostic for this.

Eugene.Zelenko added a project: Restricted Project.

I did not consider false positives, but it provides better visibility. It is always on.

CC'ing @rsmith regarding the suggestion/question of also having a clang diagnostic for this.

I'm not Richard, but I have some thoughts just the same. :-) The C committee is currently exploring questions of pointer provenance that may have impact in this area. You can see the proposed text for a TS on pointer provenance here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2577.pdf The paper goes into a lot of detail about various provenance choices and how those choices impact existing code, implementations, optimization opportunities, etc. and my concern with adding a provenance check to the Clang frontend at this time is that LLVM may want to implement changes from that TS which may cause unfortunate code churn for users who enable the warning in Clang. I think keeping this check in clang-tidy gives us good utility for catching bugs today, but also gives us a safer place to react to changes in the C standard. At some point, I expect the question of how to track provenance to settle down within the committee and our implementation, and that would be a good time to consider lifting the analysis into Clang. CCing @nlopes @hfinkel @jeroen.dobbelaere and @jdoerfert to raise awareness on the provenance TS in case it wasn't yet on their radar.

Ping.
If there's no pushback within next 24h i will commit this and wait for post-commit review (if any).
Thanks.

Ping.
If there's no pushback within next 24h i will commit this and wait for post-commit review (if any).
Thanks.

FWIW, that's not the way we usually do things, so please don't push and hope for post-commit review on something people have expressed questions/concerns over.

clang-tools-extra/clang-tidy/misc/MiscTidyModule.cpp
40 ↗(On Diff #303775)

Is misc really the right place for this? If this is related to optimizer behavior, perhaps it should live in performance?

clang-tools-extra/clang-tidy/misc/NoIntToPtrCheck.cpp
26 ↗(On Diff #303775)

This doesn't really tell the user what's wrong with their code or how to fix it. There are valid use cases for casting an integer into a pointer, after all. For instance, if I'm doing this cast because I'm trying to read a register bank:

volatile int *registers = (volatile int *)0xFEEDFACE;

what is wrong with my code and what should I do to silence this diagnostic?

Intuitively, one would think that use of intptr_t or uintptr_t would have some impact on this given the standard's requirements for that type in C17 7.20.1.4p1. Could there be a certain cast path that can be used to silence the diagnostic because it's safe enough for the optimizer's needs?

clang-tools-extra/docs/clang-tidy/checks/misc-no-inttoptr.rst
6 ↗(On Diff #303775)
char buffer[16];
sprintf(buffer, "%p", ptr);

intptr_t val;
sscanf(buffer, "%" SCNxPTR, &val);

Do you think this check should catch such constructs under the same reasoning as other conversions that hide provenance?

12 ↗(On Diff #303775)

Does this run afoul of the C++ standard's requirements for pointer traceability: http://eel.is/c++draft/basic.stc.dynamic.safety ?

clang-tools-extra/test/clang-tidy/checkers/misc-no-inttoptr.cpp
3 ↗(On Diff #303775)

can not -> cannot
(same below)

aqjune added a subscriber: aqjune.Nov 27 2020, 11:42 AM
aqjune added inline comments.
clang-tools-extra/docs/clang-tidy/checks/misc-no-inttoptr.rst
12 ↗(On Diff #303775)

This is fine because the compiled IR is allowed to be more defined than the C++ source.

In C++, accessing a pointer that is derived from an integer is UB if it is out-of-bounds location of the object pointed by the integer.
In LLVM IR (I mean in the suggested memory model), it is relaxed to have well-defined behavior if the pointer is pointing to a different, but in-bounds location of an object.
This loses the precision of points-to analysis if inttoptr is involved, but makes more arithmetic optimizations on integers valid.

lebedev.ri updated this revision to Diff 308683.Dec 1 2020, 9:09 AM
lebedev.ri marked 6 inline comments as done.

@aaron.ballman @aqjune thank you for taking a look!

Rebased, mostly addressed review notes:

  • Moved into performance module
  • Don't diagnose casts of integer literals

There are a couple of questions from the previous iteration of the review that aren't answered yet:

  • the diagnostic wording doesn't actually say what's wrong with the code or how to fix it; how should users silence the diagnostic if they've found a case where it's a false positive for some reason?
  • do you expect this check to catch other hidden provenance gotchas like grabbing values through printf/scanf, fread, etc)? (Not for this patch iteration, just wondering if you expect to extend the check in the future.)
clang-tools-extra/clang-tidy/performance/NoIntToPtrCheck.cpp
22

Do we also want to exclude the case where the destination is a volatile pointer on the assumption that something out of the ordinary is going on. e.g.,

intptr_t addr;
if (something) {
  addr = SOME_CONSTANT;
} else {
  addr = SOME_OTHER_CONSTANT;
}
volatile int *register_bank = (volatile int *)addr;
clang-tools-extra/clang-tidy/performance/PerformanceTidyModule.cpp
53

Does anyone else have opinions on inttoptr vs int-to-ptr? I feel like the latter is easier to read than the former, but I'm wondering if I'm just strange.

clang-tools-extra/docs/clang-tidy/checks/list.rst
23

It looks like there's a whole bunch of unrelated changes happening here?

Ugh, i really hate that inline comments aren't submitted upon uploading the new diff, it catches me off-guard each time...

clang-tools-extra/clang-tidy/misc/MiscTidyModule.cpp
40 ↗(On Diff #303775)

Yeah, i'm not sure. performance does seem like a marginally better place for this, so let's go with that.

clang-tools-extra/clang-tidy/misc/NoIntToPtrCheck.cpp
26 ↗(On Diff #303775)

A cast from literal is a good point, we should not warn on those,
especially because those are most likely some predefined hardcoded hw registers.

Intuitively, one would think that use of intptr_t or uintptr_t would have some impact on this given the standard's requirements for that type in C17 7.20.1.4p1. Could there be a certain cast path that can be used to silence the diagnostic because it's safe enough for the optimizer's needs?

Do you mean

uintptr_t x = <>;
void *y = (void*)(x)

or

void *x = <>;
void *y = (void*)((uintptr_t)x)

The former is literally the case this check should warn on.
As for the latter, also not really, because like i have already stated,
such cast pair is used to be treated as opaque (and dropped by the optimizer),
but that is changing.

TLDR: no

clang-tools-extra/docs/clang-tidy/checks/misc-no-inttoptr.rst
6 ↗(On Diff #303775)

I'm open to suggestions as to how to title it more precisely given the idea underneath.

To recap, the check should warn on all the casts that may involve an unintentional,
subtle, unobvious cast from an integer to a pointer.
That example doesn't strike me as potentially unintentional,
while (void*)((uintptr_t)x & -16) does.

The obvious cast i guess i'd be fine with would be synthesize_pointer_from_integer_cast<>(),
but there isn't one now.

aaron.ballman added inline comments.Dec 7 2020, 10:22 AM
clang-tools-extra/clang-tidy/misc/NoIntToPtrCheck.cpp
26 ↗(On Diff #303775)

Okay, thank you for the explanation. I think it's fine to not have a way to silence the diagnostic (users can use // NOLINT, pragmas, etc), but I think the diagnostic should be made a bit more verbose so it's clear why. How about: integer to pointer cast pessimizes optimization opportunities?

clang-tools-extra/docs/clang-tidy/checks/misc-no-inttoptr.rst
6 ↗(On Diff #303775)

Okay, I think the check is fine as-is then. This isn't a "catch all the ways users can lose provenance information in a way that matters" check, it's a "catch the most common ways users do that" check. If we need a stronger variant of the check, we can add it to the static analyzer at that point.

lebedev.ri marked an inline comment as done.

@aaron.ballman thank you for taking a look!
Addressing review notes:

  • s/inttoptr/int-to-ptr/
  • Adjust diagnostic to be more descriptive of the problem
lebedev.ri marked an inline comment as done.Dec 7 2020, 11:53 AM
aaron.ballman accepted this revision.Dec 8 2020, 9:38 AM

LGTM!

clang-tools-extra/clang-tidy/performance/NoIntToPtrCheck.cpp
22

On reflection, I think we can make an adjustment here if there's user feedback that it's a frequent false positive.

This revision is now accepted and ready to land.Dec 8 2020, 9:38 AM

LGTM!

Thank you for the review!

This revision was landed with ongoing or failed builds.Dec 8 2020, 11:56 AM
This revision was automatically updated to reflect the committed changes.