This will report a ton of information. It's basically only good for
piping to a file and using Perl to gather any useful information.
Details
- Reviewers
kees
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This information will be useful for evaluating the coverage of the bounds checker for a given program, which in turn can help guide both improvements to the bounds checker (e.g. adding knowledge from __builtin_dynamic_object_size()) and improvements to the built code base (e.g. refactoring data structures to take advantage of known compile-time or run-time bounds that will be covered by the bounds checker). It answers the question, "What percentage of array accesses are being bounds checked?" (With the implied goal of reaching 100% going forward.)
I'd like to understand what the overhead is for this. How much overhead does this add when the remark is disabled? How much overhead does this add when the remark is enabled?
This will report a ton of information. It's basically only good for piping to a file and using Perl to gather any useful information.
This is why I'm worried about overhead -- there can be *a lot* of array accesses in a complex TU and spitting data out to (potentially) stdout/stderr is *slow*.
I should have indicated that I'm not expecting this to go into mainline Clang. As you mentioned, this would be a lot of overhead and is really only useful in the way @kees mentioned. I created this "arc" patch to make it easier for him to import it. :-) Sorry for the confusion.
That said, I do value any comments you have on the patch, but feel free to skip this and follow up patches.
This appears to be working for me. For before/after changes, the other half is still needed, i.e. a "accessing array of unknown size" and eventually splitting the dynamic sizing check off of that one (once -fsanitize=bounds checks __builtin_dynamic_object_size).
For example, comparing various development builds over time, if some source had 49 array accesses:
initial code: fixed:5 unknown:44
code refactored: fixed:10 unknown:39
bdos added to bounds checker: fixed:10 dynamic:4 unknown:35
code refactoring: fixed:10 dynamic:28 unknown:11
Here's a test-case. I'd expect 6 remarks from building this:
/* Build with -Wall -O2 -fstrict-flex-arrays=3 -fsanitize=bounds -Rarray-bounds */ #include <stdint.h> #include <stdio.h> #include <string.h> #include <malloc.h> #define report_size(p, index) do { \ const size_t bdos = __builtin_dynamic_object_size(p, 1); \ \ if (__builtin_constant_p(bdos)) { \ if (bdos == SIZE_MAX) { \ printf(#p " has unknowable size\n"); \ } else { \ printf(#p " has a fixed size: %zu\n", bdos); \ } \ } else { \ printf(#p " has a dynamic size: %zu\n", bdos); \ } \ printf(#p "[" #index "] assignment: %d\n", (p)[index] = 15); \ } while (0) struct fixed { unsigned long flags; size_t foo; int array[16]; }; /* should emit "fixed" */ void do_fixed(struct fixed *p, int index) { report_size(p->array, 0); report_size(p->array, index); } struct flex { unsigned long flags; size_t foo; int array[]; }; /* should emit "dynamic" */ void do_dynamic(unsigned char count, int index) { /* malloc() is marked with __attribute__((alloc_size(1))) */ struct flex *p = malloc(sizeof(*p) + count * sizeof(*p->array)); report_size(p->array, 0); report_size(p->array, index); free(p); } /* should emit "unknowable" */ void do_unknown(struct flex *p, int index) { report_size(p->array, 0); report_size(p->array, index); }
Currently, it only emits once for the compile-time known index with a compile-time known array size:
array.c:31:17: remark: accessing fixed sized array 'int[16]' by 0 [-Rarray-bounds] report_size(p->array, 0); ^
Right. I'll be working on the rest of these soon. Probably the FAM's will be next followed by the "dynamic" size, as that's trickier due to lack of support in Clang for some of the required features.
Report when there's a non-constant access:
array_access_report.c:32:17: remark: accessing fixed sized array 'int[16]' by 'index' [-Rarray-bounds]
report_size(p->array, index); ^
This gets me all 6 reports. The details about the array and the index don't really matter for the basic metrics:
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/Diagnostic SemaKinds.td index ba831c026342..29d2167b504b 100644 --- a/clang/include/clang/Basic/DiagnosticSemaKinds.td +++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td @@ -9451,7 +9451,7 @@ def note_array_declared_here : Note< "array %0 declared here">; def remark_array_access : Remark< - "accessing %select{fixed|dynamic}0 sized array %1 by %2">, + "accessing %select{fixed|unknown|dynamic}0 sized array %1 by %2">, InGroup<ArrayBoundsRemarks>; def warn_inconsistent_array_form : Warning< diff --git a/clang/lib/Sema/SemaChecking.cpp b/clang/lib/Sema/SemaChecking.cpp index 9ced29a5f5d0..1c6aa7f05c7f 100644 --- a/clang/lib/Sema/SemaChecking.cpp +++ b/clang/lib/Sema/SemaChecking.cpp @@ -16207,8 +16207,26 @@ void Sema::CheckArrayAccess(const Expr *BaseExpr, const Expr *IndexExpr, return; Expr::EvalResult Result; - if (!IndexExpr->EvaluateAsInt(Result, Context, Expr::SE_AllowSideEffects)) + if (!IndexExpr->EvaluateAsInt(Result, Context, Expr::SE_AllowSideEffects)) { + SmallString<128> sizeString; + llvm::raw_svector_ostream OS(sizeString); + + OS << "'"; + IndexExpr->printPretty(OS, nullptr, getPrintingPolicy()); + OS << "'"; + + if (!IsUnboundedArray) { + Context.getDiagnostics().Report( + BaseExpr->getBeginLoc(), diag::remark_array_access) + << 0 << ArrayTy->desugar() << OS.str(); + } else { + Context.getDiagnostics().Report( + BaseExpr->getBeginLoc(), diag::remark_array_access) + << 1 << "something" << OS.str(); + } + return; + } llvm::APSInt index = Result.Val.getInt(); if (IndexNegated) { @@ -16219,6 +16237,11 @@ void Sema::CheckArrayAccess(const Expr *BaseExpr, const Expr *IndexExpr, if (IsUnboundedArray) { if (EffectiveType->isFunctionType()) return; + + Context.getDiagnostics().Report( + BaseExpr->getBeginLoc(), diag::remark_array_access) + << 1 << "something" << "whatever"; + if (index.isUnsigned() || !index.isNegative()) { const auto &ASTC = getASTContext(); unsigned AddrBits = ASTC.getTargetInfo().getPointerWidth(
Using "desugar" on a flexible array appears to blow up. :)
Can you refresh this patch to work with https://reviews.llvm.org/D148381 ? My testing seems to imply that it doesn't know the size of the array. I assume the if (!IsUnboundedArray) check is incomplete now. i.e. for a __counted_by array, I see the "unknown" remark:
array-bounds.c:341:2: remark: accessing unknown sized array by 'index - 1' [-Rarray-bounds] 341 | TEST_ACCESS(p, array, index, SHOULD_TRAP); | ^
which is from the array-bounds.c test cases:
TEST_SIGNAL(counted_by_enforced_by_sanitizer, SIGILL) { struct annotated *p; int index = MAX_INDEX + unconst; p = alloc_annotated(index); REPORT_SIZE(p->array); TEST_ACCESS(p, array, index, SHOULD_TRAP); }
Are you going to be okay with something like this:
array-bounds.c:341:2: remark: accessing 'sizeof(p) + p->count * sizeof(*p->array)' sized array by 'index - 1' [-Rarray-bounds] 341 | TEST_ACCESS(p, array, index, SHOULD_TRAP); | ^
or can it be simplified into something like this:
array-bounds.c:341:2: remark: accessing flexible array, counted by 'count', with 'index - 1' [-Rarray-bounds] 341 | TEST_ACCESS(p, array, index, SHOULD_TRAP); | ^
The GitHub repo with this and the counted_by change: https://github.com/bwendling/llvm-project/tree/array-bounds-remarks-with-counted-by