This is an archive of the discontinued LLVM Phabricator instance.

asan: optimization experiments
ClosedPublic

Authored by dvyukov on Mar 10 2015, 6:21 AM.

Details

Reviewers
kcc
Summary

The experiments can be used to evaluate potential optimizations that remove
instrumentation (assess false negatives). Instead of completely removing
some instrumentation, you set Exp to a non-zero value (mask of optimization
experiments that want to remove instrumentation of this instruction).
If Exp is non-zero, this pass will emit special calls into runtime
(e.g. asan_report_exp_load1 instead of asan_report_load1). These calls
make runtime terminate the program in a special way (with a different
exit status). Then you run the new compiler on a buggy corpus, collect
the special terminations (ideally, you don't see them at all -- no false
negatives) and make the decision on the optimization.

The exact reaction to experiments in runtime is not implemented in this patch.
It will be defined and implemented in a subsequent patch.

Diff Detail

Event Timeline

dvyukov updated this revision to Diff 21571.Mar 10 2015, 6:21 AM
dvyukov retitled this revision from to asan: optimization experiments.
dvyukov updated this object.
dvyukov edited the test plan for this revision. (Show Details)
dvyukov added a reviewer: kcc.
dvyukov added a subscriber: Unknown Object (MLST).
kcc edited edge metadata.Mar 10 2015, 11:14 AM

You'll also need a test intest/Instrumentation/AddressSanitizer/

AddressSanitizer.cpp
1077 ↗(On Diff #21571)

The name is misleading. The address is not unusual, but size or alignment is.

lib/asan/asan_report.cc
945

the rt is used with other compilers. I would prefer to have a full comment here

lib/asan/asan_rtl.cc
169

can you avoid so much duplication here?

dvyukov updated this revision to Diff 21689.Mar 11 2015, 5:21 AM
dvyukov edited edge metadata.

PTAL

AddressSanitizer.cpp
1077 ↗(On Diff #21571)

Done

lib/asan/asan_report.cc
945

Extended it to some degree. Will extend more when we decide how to handle experiments in runtime, otherwise I don't know how to extend it.

lib/asan/asan_rtl.cc
169

Done

kcc added inline comments.Mar 11 2015, 3:16 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
926

Do I understand correctly that this patch does not set Exp to anything interesting by default,
and that will be added in the following patches?

lib/asan/asan_thread.h
121 ↗(On Diff #21689)

That's gross. Why not just pass an extra parameter??

PTAL

lib/Transforms/Instrumentation/AddressSanitizer.cpp
926

Yes, this patch does not add any active experiments.
No, there won't be following patches with experiments. There is no reason to commit any experiments. You implement an experiment locally, evaluate it and then either commit the optimization or commit nothing.

lib/asan/asan_thread.h
121 ↗(On Diff #21689)

Why is asan_load/store such a huge macro today and not a normal function?
The only reason I see is that the function is performance critical and we don't want to obtain pc/bp/sp and call another function (
asan_exp_load/store). I've made the current implementation based on this.
If the function is not performance critical, then we need wipe the entire macro and replace it with a function.

kcc added inline comments.Mar 12 2015, 1:25 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
926

Then we misunderstood each other from the beginning.
My idea was to actually commit everything to trunk and let *all* users use it for many months,
and make it clear that if any experimental report fires they should let *us* know.
I doubt you can get good enough coverage with a one-off experiment -- there are not too many bugs lying around in any single code base to have assuring statistics.
This is also why I thought we need to have just one experiment (no experiment IDs and such) so that the code is simpler and there is no additional performance / code size penalty for passing an extra param

lib/asan/asan_thread.h
121 ↗(On Diff #21689)

Yes, we certainly do not want to obtain pc/bp on the main path.
We probably can replace this macro with a force-inline function though. (A bit fragile, but I think we already rely on force inline elsewhere)
Or to be 100% safe put the function body into one macro and define two variants of functions using that body macro.

dvyukov updated this revision to Diff 21945.Mar 13 2015, 11:38 AM

PTAL

lib/Transforms/Instrumentation/AddressSanitizer.cpp
926

As you wish. The actual experiment will be in a separate change then.

But I doubt there are many users who use tip clang _and_ run it on a huge codebase with lots of bugs _and_ who know about experiments _and_ will be willing to report back _and_ will alter their infrastructure to detect experimental crashes. And in the end an experiment will somehow end up in a clang release as well.

Experiment ID is a must, otherwise you will need to ask users to send you a full reproducer (something they won't be able to do), and then you will have to stare at the code trying to figure out what llvm byte code it produced and how asan pass handled it. You won't be able to just compile it, because it uses a custom build system and the bug is not reproducible without a non-trivial input.

lib/asan/asan_thread.h
121 ↗(On Diff #21689)

done

kcc added a comment.Mar 13 2015, 2:40 PM

Experiment ID is a must, otherwise you will need to ask users to send you a full reproducer (something they won't be able to do), and then you will have to stare at the code trying to figure out what llvm byte code it produced and how asan pass handled it. You won't be able to just compile it, because it uses a custom build system and the bug is not reproducible without a non-trivial input.

I don't understand that. If we have just one experiment at any given time -- ID is completely useless.
If we have multiple experiments in flight it may be a bit useful, but we will still need reproducer to be able to perform complete analysis.
The downside of Exp parameter is added complexity, code size and performance loss (probably not huge though)

kcc added a comment.Mar 13 2015, 2:41 PM

Also, if we have, say, 2 or 3 experiments at any given time, we may get away with asan_exp1_* and asan_exp1_* instead of passing parameters

In D8198#140697, @kcc wrote:

Experiment ID is a must, otherwise you will need to ask users to send you a full reproducer (something they won't be able to do), and then you will have to stare at the code trying to figure out what llvm byte code it produced and how asan pass handled it. You won't be able to just compile it, because it uses a custom build system and the bug is not reproducible without a non-trivial input.

I don't understand that. If we have just one experiment at any given time -- ID is completely useless.
If we have multiple experiments in flight it may be a bit useful, but we will still need reproducer to be able to perform complete analysis.
The downside of Exp parameter is added complexity, code size and performance loss (probably not huge though)

Even if you are doing a single change, you still may want several experiments within it: e.g. several levels of aggressiveness and/or application to stack, globals, generic pointers. Doing it in several multi-month consecutive experiments is unbearable.
For example, in the optimization I've slightly changed handling of globals as well. After the experiment I want to get info for stack/globals separately.

kcc accepted this revision.Mar 16 2015, 1:58 PM
kcc edited edge metadata.

LGTM

This revision is now accepted and ready to land.Mar 16 2015, 1:58 PM

Committed in 232501/232502