This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/StaticAnalyzer/Checkers/
-
StaticAnalyzer/
-
Checkers/
2/2
ArrayBoundCheckerV2.cpp
-
test/Analysis/
-
Analysis/
-
out-of-bounds-new.cpp

Differential D159107

[analyzer] ArrayBoundCheckerV2 should disallow forming lvalues to out-of-bounds locations
AbandonedPublic

Authored by steakhal on Aug 29 2023, 8:21 AM.

Download Raw Diff

Details

Reviewers

NoQ
xazax.hun
donat.nagy
Szelethus

Summary

After this patch, we no longer need the check::Location callback, as we
caught bugs earlier when forming bad references - aka. before loading or
binding anything to it.

(CWE-122 Heap Based Buffer Overflow: CWE-805-class-loop)

Depends on D159106

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steakhal created this revision.Aug 29 2023, 8:21 AM

Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 8:21 AM

Herald added subscribers: manas, ASDenysPetrov, martong and 5 others. · View Herald Transcript

steakhal requested review of this revision.Aug 29 2023, 8:21 AM

Herald added a project: Restricted Project. · View Herald TranscriptAug 29 2023, 8:21 AM

Herald added a subscriber: cfe-commits. · View Herald Transcript

steakhal added a child revision: D159108: [analyzer] CStringChecker should check the first byte of the destination of strcpy, strncpy.Aug 29 2023, 8:21 AM

Harbormaster completed remote builds in B255542: Diff 554353.Aug 29 2023, 11:32 AM

Good direction of development, this will be useful for providing better bug reports (in addition to ensuring correct behavior some situations).

Note that it's also possible to dereference pointers with the operator ->, which is represented by MemberExprs in the AST; we should probably handle that as if it was a UO_Deref.

There is also a small corner case that for an array some_type arr[N] it's well-defined to form the past-the-end pointer as &arr[N] (instead of arr + N) -- while any other use of arr[N] is undefined behavior. If this occurs in practice, then we'll probably need some additional logic to handle it. (Note that the check::Location implementation dodged this question, because it didn't report anything when the program formed &arr[N], but later created a bug report when this pointer value was dereferenced.)

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
34	Which testcase would break without the `check::Bind` callback? (Not action needed, I'm just curious.)

In D159107#4627903, @donat.nagy wrote:

Good direction of development, this will be useful for providing better bug reports (in addition to ensuring correct behavior some situations).
Note that it's also possible to dereference pointers with the operator ->, which is represented by MemberExprs in the AST; we should probably handle that as if it was a UO_Deref.

+1, +1

There is also a small corner case that for an array some_type arr[N] it's well-defined to form the past-the-end pointer as &arr[N] (instead of arr + N) -- while any other use of arr[N] is undefined behavior. If this occurs in practice, then we'll probably need some additional logic to handle it. (Note that the check::Location implementation dodged this question, because it didn't report anything when the program formed &arr[N], but later created a bug report when this pointer value was dereferenced.)

Ah, this disappoints me. You are right. This delayed mechanism definitely needs a more careful approach.

Given these concerns and the quality of the diagnostics, I don't think it's something easy to fix. I'll abandon this for now, to focus on the low-hanging patches to upstream.

Thanks for the thoughtful review! @donat.nagy

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
34	Good question. TO my surprise, no tests would be broken if we removed this callback. Nice catch. I thought it would break the new test added in D159106, but it would still pass.

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

I'd be happy to see (a slightly extended variant of) this commit merged, because I could provide much better warning messages if I can access the concrete subscript/dereference operations. Of course if you don't have time to work on this I can put this up for review myself (probably after your other commits are handled).

In D159107#4630573, @donat.nagy wrote:

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

My instinct suggests that there is more behind the curtain.
For example, on an expression &arr[N] (of type int arr[10]), we would constrain N. We could say that we allow the one before and one after elements, thus introduce a constraint: N: [-1, 11]. This -1 later could participate in casts, and end up interpreted as a really large unsigned number. I should also think about how would we detect off-by-one errors, which is a common category.

I'd be happy to see (a slightly extended variant of) this commit merged, because I could provide much better warning messages if I can access the concrete subscript/dereference operations. Of course if you don't have time to work on this I can put this up for review myself (probably after your other commits are handled).

The thing is, that I only have this week, aka. 1.5 days before I leave for vacation. :D
In the future, (no ETA), we likely come back to the Juliet improvements.

In D159107#4630764, @steakhal wrote:

In D159107#4630573, @donat.nagy wrote:

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

My instinct suggests that there is more behind the curtain.

As a rough solution we can simply say that this checker ignores &arr[idx] (because it's just a different way of writing arr + idx) and only checks expressions where "real" dereference happens. This way the checker would won't emit false positives on past-the-end pointers and still report every out of bound memory access (including off-by-one errors).

This can be refined by adding a check that which validates that in an expression like &arr[idx] the index satisfies 0 <= idx && idx <= array_size. This is conceptually independent (but can use the same implementation as the real memory access check) and would add some useful reports and constraints without breaking anything.

In D159107#4630764, @steakhal wrote:

For example, on an expression &arr[N] (of type int arr[10]), we would constrain N. We could say that we allow the one before and one after elements, thus introduce a constraint: N: [-1, 11]. This -1 later could participate in casts, and end up interpreted as a really large unsigned number. I should also think about how would we detect off-by-one errors, which is a common category.

Pointer arithmetic is very restricted in the standard, e.g. http://eel.is/c++draft/basic.compound says that

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value for that type, or
(3.4) an invalid pointer value.

As this explicitly rules out before-first-element pointers and they are not too common (I don't recall any code that used them), I don't think that they deserve a special case (just report them if we see them).

In D159107#4630764, @steakhal wrote:

The thing is, that I only have this week, aka. 1.5 days before I leave for vacation. :D
In the future, (no ETA), we likely come back to the Juliet improvements.

Of course, it's completely reasonable to focus on a limited set of changes before the vacation.

In the meantime, I'll probably work on one or more commits that would (1) switch to using the concrete check::PostStmt callbacks instead of check::Location (and Bind) like this commit (2) improve the diagnostics, add more details to the messages. When will you return from the vacation? Should I wait for you with the review of these changes (if I can write and test them before your return)?

In D159107#4631069, @donat.nagy wrote:

In D159107#4630764, @steakhal wrote:

In D159107#4630573, @donat.nagy wrote:

I don't think that the &arr[N] issue is too serious: we can just increment the array extent when the parent expression of the array subscript operator is the unary operator &. If the past-the-end pointer ends up dereferenced later, the current code is sufficient to report it as a bug (as the checker monitors all dereferences).

My instinct suggests that there is more behind the curtain.

As a rough solution we can simply say that this checker ignores &arr[idx] (because it's just a different way of writing arr + idx) and only checks expressions where "real" dereference happens. This way the checker would won't emit false positives on past-the-end pointers and still report every out of bound memory access (including off-by-one errors).

This can be refined by adding a check that which validates that in an expression like &arr[idx] the index satisfies 0 <= idx && idx <= array_size. This is conceptually independent (but can use the same implementation as the real memory access check) and would add some useful reports and constraints without breaking anything.

It makes sense, but it would need to special case on the parent or child AST node, which isn't really clean. But I think we think the same on this subject.
(Note that the & is a parent of the arr[N], where the checker would trigger, thus we would need to check the parent node to ignore such cases; and that is not really supported in the AST, and using the ParentMap for it is expensive. Remember, that this would happen for any SubscriptExpr as many times they are visited.)

In D159107#4630764, @steakhal wrote:

For example, on an expression &arr[N] (of type int arr[10]), we would constrain N. We could say that we allow the one before and one after elements, thus introduce a constraint: N: [-1, 11]. This -1 later could participate in casts, and end up interpreted as a really large unsigned number. I should also think about how would we detect off-by-one errors, which is a common category.

Pointer arithmetic is very restricted in the standard, e.g. http://eel.is/c++draft/basic.compound says that

Every value of pointer type is one of the following:
(3.1) a pointer to an object or function (the pointer is said to point to the object or function), or
(3.2) a pointer past the end of an object ([expr.add]), or
(3.3) the null pointer value for that type, or
(3.4) an invalid pointer value.

As this explicitly rules out before-first-element pointers and they are not too common (I don't recall any code that used them), I don't think that they deserve a special case (just report them if we see them).

Yes. They are really restricted, and AFAIK only past-the-end pointers are allowed, unlike before-the-first element. Such code might be written for reverse iterations.
But we know that hardware would work like that, and people might rely on it.

In D159107#4630764, @steakhal wrote:

The thing is, that I only have this week, aka. 1.5 days before I leave for vacation. :D
In the future, (no ETA), we likely come back to the Juliet improvements.

Of course, it's completely reasonable to focus on a limited set of changes before the vacation.

In the meantime, I'll probably work on one or more commits that would (1) switch to using the concrete check::PostStmt callbacks instead of check::Location (and Bind) like this commit (2) improve the diagnostics, add more details to the messages. When will you return from the vacation? Should I wait for you with the review of these changes (if I can write and test them before your return)?

I don't think you should wait for me. I'll be off for a week though, but I might not have the bandwidth to do reviews even after that.
Post it once you are fine with it. Prove it "behaves well", and just wait for someone to review it. It might be noq, xazax or me. We will see.

steakhal removed a child revision: D159108: [analyzer] CStringChecker should check the first byte of the destination of strcpy, strncpy.Sep 11 2023, 5:13 AM

GitHub <noreply@github.com> mentioned this in rGdfdedaf6dae0: [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (#72107).Tue, Dec 5, 7:17 AM

Revision Contents

Path

Size

clang/

lib/

StaticAnalyzer/

Checkers/

ArrayBoundCheckerV2.cpp

30 lines

test/

Analysis/

out-of-bounds-new.cpp

60 lines

Diff 554353

clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp

Show All 24 Lines
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <optional>		#include <optional>

using namespace clang;		using namespace clang;
using namespace ento;		using namespace ento;
using namespace taint;		using namespace taint;

namespace {		namespace {
class ArrayBoundCheckerV2 : public Checker<check::Location, check::Bind> {		class ArrayBoundCheckerV2
		: public Checker<check::Bind, check::PostStmt<ArraySubscriptExpr>,
		donat.nagyUnsubmitted Done Reply Inline Actions Which testcase would break without the `check::Bind` callback? (Not action needed, I'm just curious.) donat.nagy: Which testcase would break without the `check::Bind` callback? (Not action needed, I'm just…
		steakhalAuthorUnsubmitted Done Reply Inline Actions Good question. TO my surprise, no tests would be broken if we removed this callback. Nice catch. I thought it would break the new test added in D159106, but it would still pass. steakhal: Good question. TO my surprise, no tests would be broken if we removed this callback. Nice catch.
		check::PostStmt<UnaryOperator>> {
mutable std::unique_ptr<BugType> BT;		mutable std::unique_ptr<BugType> BT;
mutable std::unique_ptr<BugType> TaintBT;		mutable std::unique_ptr<BugType> TaintBT;

enum OOB_Kind { OOB_Precedes, OOB_Excedes };		enum OOB_Kind { OOB_Precedes, OOB_Excedes };

void reportOOB(CheckerContext &C, ProgramStateRef errorState,		void reportOOB(CheckerContext &C, ProgramStateRef errorState,
OOB_Kind kind) const;		OOB_Kind kind) const;
void reportTaintOOB(CheckerContext &C, ProgramStateRef errorState,		void reportTaintOOB(CheckerContext &C, ProgramStateRef errorState,
SVal TaintedSVal) const;		SVal TaintedSVal) const;

static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);		static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);

void impl(SVal Loc, bool isLoad, const Stmt *S, CheckerContext &C) const;		void impl(SVal Loc, bool isLoad, const Stmt *S, CheckerContext &C) const;

public:		public:
void checkLocation(SVal l, bool isLoad, const Stmt *S,
CheckerContext &C) const;
void checkBind(SVal Loc, SVal, const Stmt *S, CheckerContext &) const;		void checkBind(SVal Loc, SVal, const Stmt *S, CheckerContext &) const;
		void checkPostStmt(const ArraySubscriptExpr *E, CheckerContext &C) const;
		void checkPostStmt(const UnaryOperator *E, CheckerContext &C) const;
};		};

// FIXME: Eventually replace RegionRawOffset with this class.		// FIXME: Eventually replace RegionRawOffset with this class.
class RegionRawOffsetV2 {		class RegionRawOffsetV2 {
private:		private:
const SubRegion *baseRegion;		const SubRegion *baseRegion;
NonLoc byteOffset;		NonLoc byteOffset;

▲ Show 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	auto BelowThreshold =
SVB.evalBinOpNN(State, BO_LT, Value, Threshold, SVB.getConditionType()).getAs<NonLoc>();		SVB.evalBinOpNN(State, BO_LT, Value, Threshold, SVB.getConditionType()).getAs<NonLoc>();

if (BelowThreshold)		if (BelowThreshold)
return State->assume(*BelowThreshold);		return State->assume(*BelowThreshold);

return {nullptr, nullptr};		return {nullptr, nullptr};
}		}

void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
const Stmt* LoadS,
CheckerContext &checkerContext) const {
if (isLoad)
impl(location, isLoad, LoadS, checkerContext);
}

void ArrayBoundCheckerV2::checkBind(SVal Loc, SVal, const Stmt *S,		void ArrayBoundCheckerV2::checkBind(SVal Loc, SVal, const Stmt *S,
CheckerContext &C) const {		CheckerContext &C) const {
impl(Loc, /isLoad=/false, S, C);		impl(Loc, /isLoad=/false, S, C);
}		}

void ArrayBoundCheckerV2::impl(SVal location, bool isLoad, const Stmt *LoadS,		void ArrayBoundCheckerV2::impl(SVal location, bool isLoad, const Stmt *LoadS,
CheckerContext &checkerContext) const {		CheckerContext &checkerContext) const {

▲ Show 20 Lines • Show All 210 Lines • ▼ Show 20 Lines	if (const auto *ERegion = dyn_cast<ElementRegion>(Region)) {
// it may be useful to review it in the future.		// it may be useful to review it in the future.
return RegionRawOffsetV2(SRegion, Offset);		return RegionRawOffsetV2(SRegion, Offset);
}		}
return std::nullopt;		return std::nullopt;
}		}
return std::nullopt;		return std::nullopt;
}		}

		void ArrayBoundCheckerV2::checkPostStmt(const ArraySubscriptExpr *E,
		CheckerContext &C) const {
		// TODO: Specialize the diagnostic message.
		// It might not be a "load" operation.
		impl(C.getSVal(E), /isLoad=/true, E, C);
		}
		void ArrayBoundCheckerV2::checkPostStmt(const UnaryOperator *E,
		CheckerContext &C) const {
		if (E->getOpcode() != UO_Deref)
		return;
		// TODO: Specialize the diagnostic message.
		// It might not be a "load" operation.
		impl(C.getSVal(E), /isLoad=/true, E, C);
		}

void ento::registerArrayBoundCheckerV2(CheckerManager &mgr) {		void ento::registerArrayBoundCheckerV2(CheckerManager &mgr) {
mgr.registerChecker<ArrayBoundCheckerV2>();		mgr.registerChecker<ArrayBoundCheckerV2>();
}		}

bool ento::shouldRegisterArrayBoundCheckerV2(const CheckerManager &mgr) {		bool ento::shouldRegisterArrayBoundCheckerV2(const CheckerManager &mgr) {
return true;		return true;
}		}

clang/test/Analysis/out-of-bounds-new.cpp

Show First 20 Lines • Show All 170 Lines • ▼ Show 20 Lines	switch (rng()) {
case -1:		case -1:
*--p = ManyInts{3,2,1}; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}		*--p = ManyInts{3,2,1}; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
break;		break;
case 1:		case 1:
*++p = ManyInts{3,2,1}; // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}		*++p = ManyInts{3,2,1}; // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}
break;		break;
}		}
}		}

		void test_memberwise_copy() {
		ManyInts dst[1];
		ManyInts src[2] = {5};

		dst[0] = src[0]; // ok
		dst[0].a = src[0].a; // ok

		// Positive indexing:
		switch (rng()) {
		default: break;
		case 0: {
		auto &z = dst[1]; // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}
		break;
		}
		case 1: {
		dst[1].a = src[1].a; // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}
		break;
		}
		case 2: {
		auto &z = *(dst + 1); // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}
		break;
		}
		case 3: {
		auto &z = *dst;
		auto &zz = 1[&z]; // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}
		break;
		}
		case 4: {
		dst[1] = src[1]; // expected-warning {{Out of bound memory access (access exceeds upper limit of memory block)}}
		break;
		}
		}

		// Negative indexing:
		switch (rng()) {
		default: break;
		case 0: {
		auto &z = dst[-1]; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
		break;
		}
		case 1: {
		dst[-1].a = src[1].a; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
		break;
		}
		case 2: {
		auto &z = *(dst - 1); // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
		break;
		}
		case 3: {
		auto &z = *dst;
		auto &zz = (-1)[&z]; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
		break;
		}
		case 4: {
		dst[-1] = src[1]; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
		break;
		}
		}
		}