This is an archive of the discontinued LLVM Phabricator instance.

An Aliasing Validator/Sanitizer
AcceptedPublic

Authored by hfinkel on Jul 9 2014, 4:55 PM.

Details

Summary

As part of the conversation about the scoped-noalias metadata, our need for a validator / sanitizer for aliasing was heightened. Two use cases were discussed:

  1. To validate the user's use of 'restrict' on pointers.
  2. To validate LLVM's AA infrastructure.

This implementation is certainly not ready to be committed (although it is functional, and I've found AA bugs with it when self-hosting Clang with it enabled), but I'm posting it to start a conversation on what we want and how it should be implemented.

I originally wrote this in response to bugs appearing when I enabled the use of AA during code generation (and the current implementation is certainly biased toward that use case). There are two modes implemented:

  1. A mode where instrumentation is inserted to check NoAlias results on load/store and store/store pairs. Checking "all" such pairs in a function is impractical for large functions, so the current implementation checks only such pairs that occur in between likely scheduling barriers. Also, it inserts the checks late (after almost all optimizations) because my use case was focused on bugs that appeared from using AA during codegen. The checks could certainly be inserted earlier as well. Compiling with -mllvm -codegen-validate-aa enables this.
  2. A mode when uses of NoAlias results during instruction scheduling are recorded in a file. When compiling again later, this file can be read and only those specific pairs are instrumented to be checked. This has much lower overhead, but is also less sensitive (because instruction scheduling only uses AA when other methods fail to yield a definitive result and the result might be relevant to scheduling). Compiling with -mllvm -record-aa-sched-mi=SOME_DIRECTORY will cause the AA pairs used during instruction scheduling to be recorded and then compiling with -mllvm -codegen-validate-aa -mllvm -use-recorded-aa=SOME_DIRECTORY will cause the instrumentation to be inserted.

To focus more on user errors, we'd probably want the checks inserted earlier (prior to inlining), the checks to encode source-level locations (instead of or in addition to IR instructions), and to use the sanitizer runtimes to report errors (instead of manually building sprintf/write calls). Also, it currently has no test cases ;)

To get a feel for what this current implementation does:

$ cat /tmp/ta.c
attribute((noinline)) void foo(int * restrict a, int * restrict b) {

*b = *a;

}

int main() {

int a = 5;
foo(&a, &a);

}

$ clang -O3 -o /tmp/ta /tmp/ta.c -mllvm -codegen-validate-aa

$ /tmp/ta
ALIAS: /tmp/ta.c: foo: '%0 = load i32* %a, align 4, !tbaa !1' (in 'entry') and 'store i32 %0, i32* %b, align 4, !tbaa !1' (in 'entry'): [0x7fff55658a24 0x7fff55658a28) <-> [0x7fff55658a24 0x7fff55658a28)
Illegal instruction (core dumped)

Thanks in advance!

Diff Detail

Event Timeline

hfinkel updated this revision to Diff 11232.Jul 9 2014, 4:55 PM
hfinkel retitled this revision from to An Aliasing Validator/Sanitizer.
hfinkel updated this object.
hfinkel edited the test plan for this revision. (Show Details)
hfinkel added a subscriber: Unknown Object (MLST).

Ping. (Looking for feedback here... figuring out what and when is useful to check is going to require some discussion)

atrick accepted this revision.Aug 23 2014, 3:53 PM
atrick edited edge metadata.

Hal,

I think both these validation modes will be extremely useful. The first (restrict validation) tests the user annotations. The second (AA validation) tests our AA implementation.

For AA validation we want to be able to recompile a potentially broken program with exactly the same optimization and inlining, at least up to the point where we insert instrumentation.

Your design looks good to me. It can be extended to validate AA across any codegen (or ISEL) pass, as long as memory operands are intact. Validating AA decisions from earlier passes, before the IR is frozen would require a different approach. (If the pass modifies IR while recording decisions we have a problem). To deal with that (in the future) we could use an instruction numbering analysis to identify pairs.

I don't have any conerns at the moment about your design or implementation.

This revision is now accepted and ready to land.Aug 23 2014, 3:53 PM

Looks like patch was not committed.