This is an archive of the discontinued LLVM Phabricator instance.

Add a "remark" to report on array accesses
Needs ReviewPublic

Authored by void on Feb 15 2023, 2:01 PM.

Details

Reviewers
kees
Summary

This will report a ton of information. It's basically only good for
piping to a file and using Perl to gather any useful information.

Diff Detail

Event Timeline

void created this revision.Feb 15 2023, 2:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 2:01 PM
void requested review of this revision.Feb 15 2023, 2:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 15 2023, 2:01 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
void added a reviewer: kees.Feb 15 2023, 2:01 PM

What would the information be useful for?

kees added a comment.Feb 15 2023, 10:24 PM

This information will be useful for evaluating the coverage of the bounds checker for a given program, which in turn can help guide both improvements to the bounds checker (e.g. adding knowledge from __builtin_dynamic_object_size()) and improvements to the built code base (e.g. refactoring data structures to take advantage of known compile-time or run-time bounds that will be covered by the bounds checker). It answers the question, "What percentage of array accesses are being bounds checked?" (With the implied goal of reaching 100% going forward.)

I'd like to understand what the overhead is for this. How much overhead does this add when the remark is disabled? How much overhead does this add when the remark is enabled?

This will report a ton of information. It's basically only good for piping to a file and using Perl to gather any useful information.

This is why I'm worried about overhead -- there can be *a lot* of array accesses in a complex TU and spitting data out to (potentially) stdout/stderr is *slow*.

void added a comment.EditedFeb 16 2023, 12:24 PM

I'd like to understand what the overhead is for this. How much overhead does this add when the remark is disabled? How much overhead does this add when the remark is enabled?

This will report a ton of information. It's basically only good for piping to a file and using Perl to gather any useful information.

This is why I'm worried about overhead -- there can be *a lot* of array accesses in a complex TU and spitting data out to (potentially) stdout/stderr is *slow*.

I should have indicated that I'm not expecting this to go into mainline Clang. As you mentioned, this would be a lot of overhead and is really only useful in the way @kees mentioned. I created this "arc" patch to make it easier for him to import it. :-) Sorry for the confusion.

That said, I do value any comments you have on the patch, but feel free to skip this and follow up patches.

kees added a comment.Feb 18 2023, 11:21 AM

This appears to be working for me. For before/after changes, the other half is still needed, i.e. a "accessing array of unknown size" and eventually splitting the dynamic sizing check off of that one (once -fsanitize=bounds checks __builtin_dynamic_object_size).

For example, comparing various development builds over time, if some source had 49 array accesses:

initial code: fixed:5 unknown:44
code refactored: fixed:10 unknown:39
bdos added to bounds checker: fixed:10 dynamic:4 unknown:35
code refactoring: fixed:10 dynamic:28 unknown:11

kees added a comment.Feb 18 2023, 11:55 AM

Here's a test-case. I'd expect 6 remarks from building this:

/* Build with -Wall -O2 -fstrict-flex-arrays=3 -fsanitize=bounds -Rarray-bounds */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h>

#define report_size(p, index)      do {    \
    const size_t bdos = __builtin_dynamic_object_size(p, 1); \
    \
    if (__builtin_constant_p(bdos)) { \
        if (bdos == SIZE_MAX) { \
            printf(#p " has unknowable size\n"); \
        } else { \
            printf(#p " has a fixed size: %zu\n", bdos); \
        } \
    } else { \
        printf(#p " has a dynamic size: %zu\n", bdos); \
    } \
    printf(#p "[" #index "] assignment: %d\n", (p)[index] = 15); \
} while (0)

struct fixed {
    unsigned long flags;
    size_t foo;
    int array[16];
};

/* should emit "fixed" */
void do_fixed(struct fixed *p, int index)
{
    report_size(p->array, 0);
    report_size(p->array, index);
}

struct flex {
    unsigned long flags;
    size_t foo;
    int array[];
};

/* should emit "dynamic" */
void do_dynamic(unsigned char count, int index)
{
    /* malloc() is marked with __attribute__((alloc_size(1))) */
    struct flex *p = malloc(sizeof(*p) + count * sizeof(*p->array));
    report_size(p->array, 0);
    report_size(p->array, index);
    free(p);
}

/* should emit "unknowable" */
void do_unknown(struct flex *p, int index)
{
    report_size(p->array, 0);
    report_size(p->array, index);
}

Currently, it only emits once for the compile-time known index with a compile-time known array size:

array.c:31:17: remark: accessing fixed sized array 'int[16]' by 0 [-Rarray-bounds]                  
    report_size(p->array, 0);                                                                                       ^
void added a comment.Feb 21 2023, 1:18 PM

Here's a test-case. I'd expect 6 remarks from building this:

/* Build with -Wall -O2 -fstrict-flex-arrays=3 -fsanitize=bounds -Rarray-bounds */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <malloc.h>

#define report_size(p, index)      do {    \
    const size_t bdos = __builtin_dynamic_object_size(p, 1); \
    \
    if (__builtin_constant_p(bdos)) { \
        if (bdos == SIZE_MAX) { \
            printf(#p " has unknowable size\n"); \
        } else { \
            printf(#p " has a fixed size: %zu\n", bdos); \
        } \
    } else { \
        printf(#p " has a dynamic size: %zu\n", bdos); \
    } \
    printf(#p "[" #index "] assignment: %d\n", (p)[index] = 15); \
} while (0)

struct fixed {
    unsigned long flags;
    size_t foo;
    int array[16];
};

/* should emit "fixed" */
void do_fixed(struct fixed *p, int index)
{
    report_size(p->array, 0);
    report_size(p->array, index);
}

struct flex {
    unsigned long flags;
    size_t foo;
    int array[];
};

/* should emit "dynamic" */
void do_dynamic(unsigned char count, int index)
{
    /* malloc() is marked with __attribute__((alloc_size(1))) */
    struct flex *p = malloc(sizeof(*p) + count * sizeof(*p->array));
    report_size(p->array, 0);
    report_size(p->array, index);
    free(p);
}

/* should emit "unknowable" */
void do_unknown(struct flex *p, int index)
{
    report_size(p->array, 0);
    report_size(p->array, index);
}

Currently, it only emits once for the compile-time known index with a compile-time known array size:

array.c:31:17: remark: accessing fixed sized array 'int[16]' by 0 [-Rarray-bounds]                  
    report_size(p->array, 0);                                                                                       ^

Right. I'll be working on the rest of these soon. Probably the FAM's will be next followed by the "dynamic" size, as that's trickier due to lack of support in Clang for some of the required features.

void updated this revision to Diff 499321.Feb 21 2023, 4:14 PM

Report when there's a non-constant access:

array_access_report.c:32:17: remark: accessing fixed sized array 'int[16]' by 'index' [-Rarray-bounds]

report_size(p->array, index);
            ^
kees added a comment.Feb 23 2023, 2:56 PM

This gets me all 6 reports. The details about the array and the index don't really matter for the basic metrics:

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/Diagnostic
SemaKinds.td
index ba831c026342..29d2167b504b 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9451,7 +9451,7 @@ def note_array_declared_here : Note<                                       
   "array %0 declared here">;
                                                  
 def remark_array_access : Remark<
-  "accessing %select{fixed|dynamic}0 sized array %1 by %2">,
+  "accessing %select{fixed|unknown|dynamic}0 sized array %1 by %2">,
   InGroup<ArrayBoundsRemarks>;                                                                     
                                                  
 def warn_inconsistent_array_form : Warning<
diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp
index 9ced29a5f5d0..1c6aa7f05c7f 100644   
--- a/clang/lib/Sema/SemaChecking.cpp                                                               
+++ b/clang/lib/Sema/SemaChecking.cpp
@@ -16207,8 +16207,26 @@ void Sema::CheckArrayAccess(const Expr *BaseExpr, const Expr *IndexExpr,   
     return;                                      
                                                  
   Expr::EvalResult Result;
-  if (!IndexExpr->EvaluateAsInt(Result, Context, Expr::SE_AllowSideEffects))
+  if (!IndexExpr->EvaluateAsInt(Result, Context, Expr::SE_AllowSideEffects)) {
+    SmallString<128> sizeString;
+    llvm::raw_svector_ostream OS(sizeString);
+
+    OS << "'";
+    IndexExpr->printPretty(OS, nullptr, getPrintingPolicy());
+    OS << "'";
+
+    if (!IsUnboundedArray) {
+      Context.getDiagnostics().Report(
+          BaseExpr->getBeginLoc(), diag::remark_array_access)
+              << 0 << ArrayTy->desugar() << OS.str();
+    } else {
+      Context.getDiagnostics().Report(
+          BaseExpr->getBeginLoc(), diag::remark_array_access)
+              << 1 << "something" << OS.str();
+    }
+
     return;
+  }
  
   llvm::APSInt index = Result.Val.getInt();
   if (IndexNegated) {
@@ -16219,6 +16237,11 @@ void Sema::CheckArrayAccess(const Expr *BaseExpr, const Expr *IndexExpr,
   if (IsUnboundedArray) {
     if (EffectiveType->isFunctionType())
       return;
+
+    Context.getDiagnostics().Report(
+        BaseExpr->getBeginLoc(), diag::remark_array_access)
+            << 1 << "something" << "whatever";
+
     if (index.isUnsigned() || !index.isNegative()) {
       const auto &ASTC = getASTContext();
       unsigned AddrBits = ASTC.getTargetInfo().getPointerWidth(

Using "desugar" on a flexible array appears to blow up. :)

void updated this revision to Diff 502760.Mar 6 2023, 12:35 PM

Revise with Kees's code.

void updated this revision to Diff 502766.Mar 6 2023, 1:05 PM

Slight typo

kees added a comment.Sep 6 2023, 10:55 AM

Can you refresh this patch to work with https://reviews.llvm.org/D148381 ? My testing seems to imply that it doesn't know the size of the array. I assume the if (!IsUnboundedArray) check is incomplete now. i.e. for a __counted_by array, I see the "unknown" remark:

array-bounds.c:341:2: remark: accessing unknown sized array by 'index - 1' [-Rarray-bounds]
  341 |         TEST_ACCESS(p, array, index, SHOULD_TRAP);               
      |         ^

which is from the array-bounds.c test cases:

TEST_SIGNAL(counted_by_enforced_by_sanitizer, SIGILL)
{
        struct annotated *p;
        int index = MAX_INDEX + unconst;

        p = alloc_annotated(index);

        REPORT_SIZE(p->array);
        TEST_ACCESS(p, array, index, SHOULD_TRAP);
}
void added a comment.Sep 26 2023, 2:07 PM

Can you refresh this patch to work with https://reviews.llvm.org/D148381 ? My testing seems to imply that it doesn't know the size of the array. I assume the if (!IsUnboundedArray) check is incomplete now. i.e. for a __counted_by array, I see the "unknown" remark:

array-bounds.c:341:2: remark: accessing unknown sized array by 'index - 1' [-Rarray-bounds]
  341 |         TEST_ACCESS(p, array, index, SHOULD_TRAP);               
      |         ^

which is from the array-bounds.c test cases:

TEST_SIGNAL(counted_by_enforced_by_sanitizer, SIGILL)
{
        struct annotated *p;
        int index = MAX_INDEX + unconst;

        p = alloc_annotated(index);

        REPORT_SIZE(p->array);
        TEST_ACCESS(p, array, index, SHOULD_TRAP);
}

Are you going to be okay with something like this:

array-bounds.c:341:2: remark: accessing 'sizeof(p) + p->count * sizeof(*p->array)' sized array by 'index - 1' [-Rarray-bounds]
   341 |         TEST_ACCESS(p, array, index, SHOULD_TRAP);
       |         ^

or can it be simplified into something like this:

array-bounds.c:341:2: remark: accessing flexible array, counted by 'count', with 'index - 1' [-Rarray-bounds]
   341 |         TEST_ACCESS(p, array, index, SHOULD_TRAP);
       |         ^
kees added a comment.Sep 26 2023, 7:16 PM

or can it be simplified into something like this:

array-bounds.c:341:2: remark: accessing flexible array, counted by 'count', with 'index - 1' [-Rarray-bounds]
   341 |         TEST_ACCESS(p, array, index, SHOULD_TRAP);
       |         ^

Yeah, this is totally enough. Thanks!