As part of the conversation about the scoped-noalias metadata, our need for a validator / sanitizer for aliasing was heightened. Two use cases were discussed:
- To validate the user's use of 'restrict' on pointers.
- To validate LLVM's AA infrastructure.
This implementation is certainly not ready to be committed (although it is functional, and I've found AA bugs with it when self-hosting Clang with it enabled), but I'm posting it to start a conversation on what we want and how it should be implemented.
I originally wrote this in response to bugs appearing when I enabled the use of AA during code generation (and the current implementation is certainly biased toward that use case). There are two modes implemented:
- A mode where instrumentation is inserted to check NoAlias results on load/store and store/store pairs. Checking "all" such pairs in a function is impractical for large functions, so the current implementation checks only those pairs that occur between likely scheduling barriers. It also inserts the checks late (after almost all optimizations) because my use case was focused on bugs that appeared from using AA during codegen. The checks could certainly be inserted earlier as well. Compiling with -mllvm -codegen-validate-aa enables this.
- A mode where uses of NoAlias results during instruction scheduling are recorded in a file. When compiling again later, this file can be read so that only those specific pairs are instrumented and checked. This has much lower overhead, but is also less sensitive (because instruction scheduling only uses AA when other methods fail to yield a definitive result and the result might be relevant to scheduling). Compiling with -mllvm -record-aa-sched-mi=SOME_DIRECTORY will cause the AA pairs used during instruction scheduling to be recorded, and then compiling with -mllvm -codegen-validate-aa -mllvm -use-recorded-aa=SOME_DIRECTORY will cause the instrumentation to be inserted.
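To make concrete what a check on one NoAlias pair amounts to at run time, here is a minimal sketch in C. It is not taken from the patch: the names ranges_overlap and check_noalias_pair are hypothetical, and the actual instrumentation is emitted as IR (and traps on failure) rather than being written as C functions. The only idea it illustrates is that two accesses reported as NoAlias should never touch overlapping half-open byte ranges.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not from the patch): true when the half-open byte
   ranges [a, a + a_size) and [b, b + b_size) share at least one byte. */
static bool ranges_overlap(const void *a, size_t a_size,
                           const void *b, size_t b_size) {
    if (a_size == 0 || b_size == 0)
        return false; /* an empty access cannot overlap anything */
    uintptr_t a_begin = (uintptr_t)a, a_end = a_begin + a_size;
    uintptr_t b_begin = (uintptr_t)b, b_end = b_begin + b_size;
    return a_begin < b_end && b_begin < a_end;
}

/* Hypothetical check for one pair of accesses that AA reported as NoAlias.
   Prints a report and returns true if the two ranges actually overlap.
   This sketch only reports; it does not trap. */
static bool check_noalias_pair(const char *what,
                               const void *a, size_t a_size,
                               const void *b, size_t b_size) {
    if (!ranges_overlap(a, a_size, b, b_size))
        return false;
    fprintf(stderr, "ALIAS: %s: [%p %p) <-> [%p %p)\n", what,
            (void *)a, (void *)((const char *)a + a_size),
            (void *)b, (void *)((const char *)b + b_size));
    return true;
}
```

Under these assumptions, calling check_noalias_pair("foo", p, sizeof(int), q, sizeof(int)) with p == q reports a violation, while two distinct int objects do not.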
To focus more on user errors, we'd probably want the checks inserted earlier (prior to inlining), the checks to encode source-level locations (instead of or in addition to IR instructions), and to use the sanitizer runtimes to report errors (instead of manually building sprintf/write calls). Also, it currently has no test cases ;)
To get a feel for what this current implementation does:
$ cat /tmp/ta.c
__attribute__((noinline)) void foo(int * restrict a, int * restrict b) {
  *b = *a;
}
int main() {
  int a = 5; foo(&a, &a);
}
$ clang -O3 -o /tmp/ta /tmp/ta.c -mllvm -codegen-validate-aa
$ /tmp/ta
ALIAS: /tmp/ta.c: foo: '%0 = load i32* %a, align 4, !tbaa !1' (in 'entry') and 'store i32 %0, i32* %b, align 4, !tbaa !1' (in 'entry'): [0x7fff55658a24 0x7fff55658a28) <-> [0x7fff55658a24 0x7fff55658a28)
Illegal instruction (core dumped)
Thanks in advance!