This is an archive of the discontinued LLVM Phabricator instance.

asan: do not instrument direct inbounds accesses to stack variables
AbandonedPublic

Authored by dvyukov on Feb 12 2015, 4:50 AM.

Details

Reviewers
kcc
nlopes
Summary

This is most likely wrong.

But it eliminates 33% of instrumentation on webrtc/modules_unittests (number of memory accesses goes down from 290152 to 193998) and reduces binary size by 15% (from 74M to 64M) _and_ all existing asan tests pass.
This means that either our tests are bad or we are missing a good optimization opportunity.

Currently we instrument even the following code, which looks wrong:

define void @foo() uwtable sanitize_address {
entry:
  %a = alloca i32, align 4
  store i32 42, i32* %a, align 4
  ret void
}

I've found one case which fails with this change. We do not instrument the store in this case:

define void @bar() sanitize_address {
entry:
  %a = alloca [10 x i32], align 4
  %e = getelementptr inbounds [10 x i32]* %a, i32 0, i64 12
  store i32 42, i32* %e, align 4
  ret void
; CHECK-LABEL: define void @bar
; CHECK: __asan_report
; CHECK: ret void
}

However, the compiler should be able to recognize such cases (and it does: it issues a warning for this code).

I am a bit lost.
There is Value::isGEPWithNoNotionalOverIndexing, which looks like what we need (it should eliminate array accesses with constant inbounds indices and struct field accesses). However, it does not give the desired effect.
Value::stripInBoundsConstantOffsets also looks like what we need, and it does give the desired effect. However, there is a note that "Value::isGEPWithNoNotionalOverIndexing is not equivalent to, a subset of, or a superset of the "inbounds" property", which suggests that stripInBoundsConstantOffsets is not what we want.
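To make the second option concrete, here is a minimal C++ sketch (the helper name and plumbing are illustrative, not the patch itself) of the stripInBoundsConstantOffsets-based check described above; as the note suggests, it is questionable on its own:

#include "llvm/IR/Instructions.h"
#include "llvm/IR/Value.h"

using namespace llvm;

// Hypothetical helper: treat an access as trivially safe if stripping
// constant inbounds GEP offsets lands directly on an alloca.  This mirrors
// the stripInBoundsConstantOffsets idea above, and it is not sound by
// itself, because "inbounds" does not guarantee the offset stays within
// the alloca (see the bar() example above).
static bool isDirectInboundsStackAccess(Value *Addr) {
  Value *Base = Addr->stripInBoundsConstantOffsets();
  return isa<AllocaInst>(Base);
}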

Also, is it possible to refer to an AllocaInst outside of its lifetime? It is not possible in C/C++ due to scoping rules; you simply can't mention a name that is out of scope. However, I am not sure about LLVM and its transformations.

I am looking for advice here.

Diff Detail

Event Timeline

dvyukov updated this revision to Diff 19817.Feb 12 2015, 4:50 AM
dvyukov retitled this revision from to asan: do not instrument direct inbounds accesses to stack variables.
dvyukov updated this object.
dvyukov edited the test plan for this revision. (Show Details)
dvyukov added reviewers: kcc, chandlerc.

To make it clear: I want to eliminate instrumentation of accesses to locals within their lifetime (so it is not a use-after-scope) that are known to be inbounds, that is, direct accesses, field accesses, and array element accesses with constant indices (so it is not an out-of-bounds access either).
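As a hypothetical source-level illustration (not taken from the patch or its tests), these are the kinds of accesses this change targets: every offset is a compile-time constant and provably inside the local.

struct S { int a; int b; };

void examples() {
  int x;
  x = 1;         // direct access to a local
  S s;
  s.b = 2;       // field access at a constant offset
  int arr[10];
  arr[3] = 3;    // array element with a constant, provably in-range index
  // By contrast, arr[i] with a runtime index, or arr[12], would still be
  // instrumented.
}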

kcc added inline comments.Feb 17 2015, 5:34 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
920

mega cool.
Let's put it under a flag e.g. asan-opt-stack-inbounds (off by default) and play with it more.

924

remove this then?

dvyukov added inline comments.Feb 17 2015, 10:48 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
920

Is it possible to make it work correctly? I am lost in all possible llvm predicates and their meaning. I don't understand how "int x[10]; x[12] = 1" can be an "in bounds constant offset".

stripInBoundsConstantOffsets is not what you want, see the definition of "inbounds" ( http://llvm.org/docs/LangRef.html#getelementptr-instruction)
"If the inbounds keyword is present, the result value of the getelementptr is a poison value if the base pointer is not an in bounds address of an allocated object, or if any of the addresses that would be formed by successive addition of the offsets implied by the indices to the base address with infinitely precise signed arithmetic are not an in bounds address of that allocated object. The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end. In cases where the base is a vector of pointers the inbounds keyword applies to each of the computations element-wise."

I recall Nuno Lopes working on reducing the number of bounds checks, but all I can find is this:
./lib/Transforms/Instrumentation/BoundsChecking.cpp

dvyukov updated this revision to Diff 20592.Feb 24 2015, 6:01 AM

Rewrote to use ObjectSizeOffsetEvaluator::compute to find inbounds accesses.

Thank you, Anna!

Changed the code to use ObjectSizeOffsetEvaluator::compute to find inbounds accesses.
This patch still eliminates ~33% of accesses and binaries are 15% smaller. All existing and new tests pass.

Anna, does it fix the issue that you are trying to solve in the other patch?

Kostya, PTAL.

How are you getting the speedup/size improvement measurements? Are these at -O1 or -O0?

When comparing to my patch, the intent of this patch is to be more aggressive in removing bounds checking. My patch does not introduce any improvement at optimization levels higher than -O0 and is trying to simulate mem2reg. On the other hand, the analysis here is more aggressive in the checks it removes. For example, my patch would not remove provably in-bounds array accesses with constant offsets, and it does not do anything for non-alloca values.

On the other hand, with my patch, the allocas that are known not to have instrumented accesses do not get poisoned. Currently, this patch only targets removal of bounds checking. Also, I am not sure what compile-time overhead this brings and how reliable it is, since I don't think ObjectSizeOffsetEvaluator is used much.

My preference is to have this on top of non-promotable allocas.

lib/Transforms/Instrumentation/AddressSanitizer.cpp
18

These seem out of place - should be moved after ADT.

205

This makes me nervous. I don't think ObjectSizeOffsetEvaluator is used much. This should probably go through more testing, though I am not sure how to catch issues here since we are removing checking.

435

const?

1515

/*RoundToAlign=*/true

2064

Why not use getTypeStoreSize()?

uint64_t getTypeStoreSizeInBits(Type *Ty) const {
  return 8 * getTypeStoreSize(Ty);
}

Also, is overflow possible in these calculations?

Also, I believe removal of all unnecessary checks is a bigger task. If we want to ensure that all possible out of bounds accesses are instrumented, we'd have to do something similar to what Chandler suggested in the other thread. That would eliminate the false negatives that are now possible at -O1 and higher. We'd need to have an instrumentation pass that runs early on and instruments accesses before the optimizer kicks in. Later, we would work with LLVM analysis to remove those checks.

Here is the answer from Nuno on bounds check removal. Enjoy :)

"The file you mention is the instrumentation pass. It is pretty dumb: it basically instruments any memory write (it just performs one optimization to reduce the cost of the check if certain conditions hold; but it always introduces the check). So this one is not interesting for AddressSanitizer (the likelihood it can detect anything that ASan cannot detect is very small; the idea was to have something with a very small overhead to deploy on release builds). It can detect overflows that Valgrind cannot btw.

The part about removing checks is done by several passes which are run by default (with -O2). Instcombine can remove some checks, then Transforms/Scalar/CorrelatedValuePropagation.cpp performs range analysis to delete checks that are always safe. This analysis existed previously, but it had a bunch of problems and shortcomings.
The range analysis is in Analysis/LazyValueInfo.cpp. It's still fundamentally weak, because of the way it traverses basic-blocks to constrain the range due to branching conditions. We have discussed this quite a bit, but the major improvements were never implemented (ask me if you want more details).

There's also Analysis/MemoryBuiltins.cpp. This analysis provides (static or dynamic) information about the allocated size of an object (and knows about malloc, stack, etc).

A very important aspect of reducing the overhead of bounds checks is to hoist them out of loops. By default, LLVM cannot and won't do this. The reason is that it's not legal to move a function call that has side-effects (namely terminating the application), because LLVM has precise exception semantics. However, for bounds checks we don't really care; if we know that the program will crash inside the loop, why not crash it sooner? (Well, there's a price to pay, of course: the state of the program will be different when looking at it through a debugger, but I think that's ok.)
Again, right now LLVM cannot do this transformation. At the time, I proposed that we introduce an "antecipable" trap in the IR: something the compiler could move freely up to the beginning of the function (or of the program), as long as we know it would only execute if it executed in the original program.
An example makes it clear I guess:

for (i = 0; i < n; ++i) {
  if (a[i] out of bounds)
    antecipable_trap();
  a[i] = …;
}

Transform to:

if (any of a[0..n-1] out of bounds)
  antecipable_trap();

for (i = 0; i < n; ++i) {
  a[i] = …;
}

This is an obvious transformation. Without it, vectorization will not kick in at all (since it is mostly oblivious to multiple-exit loops). Anyway, this is something for the backend guys to work on. It isn't very hard to implement; it just hasn't been done.

The other thing that has been committed recently to LLVM (last month) is a loop splitter (Transforms/Scalar/InductiveRangeCheckElimination.cpp). The idea is that if we can prove that in the first x iterations of the loop, everything is in bounds, then we can duplicate the loop, and remove all checks from the first loop. This increases code size, but can enable many optimizations on the first x iterations (which we hope will be the majority, but that's only a hope).

Finally, how does all of this apply to ASan? I don't know. Sorry, I never took a look at how ASan instruments stuff. If it exposes, say, the bounds checks in the IR (i.e., if the comparison is inlined in the IR), then current transformation passes will look at that and try to optimize them away (with the caveat that there is still work required on the backend). If ASan just introduces a function call (e.g., check_pointer_is_ok(%p)), then the optimizers have no clue what that function does and will not touch it. Of course it is possible to patch, say, the CVP pass to know about these ASan functions and then reuse the same range analysis. Or, alternatively, build a simple pass that only runs when ASan is used and that queries current analyses.
So, yes, building analyses from scratch doesn't make sense IMHO. It should be possible to reuse what LLVM already has. Then, how to use the information is a different story, but grepping for ASan functions in the IR and using current analyses shouldn't be a hard task (again, I have no clue how ASan works).
ASan also has stronger guarantees than my bounds checker. For example, I don't check if the object has been deleted in the meantime, and ASan does. So the analysis has to work a bit harder (basically check the liveness of objects as well as the size; but if we only manage to get one piece of information and not the other, it is probably still possible to optimize the check).

Ok, so I think the email is already quite long, but feel free to ask me for more details. I just tried to give an overview of what LLVM can and cannot do. I'm sorry I don't know more about ASan to give you a more concrete answer.

Nuno"
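To make the loop-splitting idea from Nuno's note concrete, here is a hand-written C++ sketch of the shape such a split takes (illustrative only: the function names and the trap call are placeholders, and the real InductiveRangeCheckElimination pass works on IR, not on source code):

// Before: every iteration carries a bounds check.
void store_all_checked(int *a, int len, int n) {
  for (int i = 0; i < n; ++i) {
    if (i >= len)             // bounds check on every iteration
      __builtin_trap();
    a[i] = 0;
  }
}

// After splitting: the first min(n, len) iterations are provably in bounds,
// so the main loop runs check-free; the remainder keeps the check.
void store_all_split(int *a, int len, int n) {
  int safe = n < len ? n : len;
  int i = 0;
  for (; i < safe; ++i)       // check-free main loop
    a[i] = 0;
  for (; i < n; ++i) {        // slow path: checks remain
    if (i >= len)
      __builtin_trap();
    a[i] = 0;
  }
}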

kcc added inline comments.Feb 24 2015, 6:16 PM
lib/Transforms/Instrumentation/AddressSanitizer.cpp
205

ouch. Indeed, don't make it true by default for now.
Unfortunately, I don't know any good way to test complex optimizations that eliminate checks,
and so I don't know what it will take us to enable this by default.
But at least having this code in trunk will simplify the experiments.

dvyukov updated this revision to Diff 20673.Feb 25 2015, 7:09 AM
dvyukov updated this revision to Diff 20674.Feb 25 2015, 7:12 AM

uploaded new patch

lib/Transforms/Instrumentation/AddressSanitizer.cpp
18

Done

205

changed default value to false

435

done

1515

done

2064

done

These are int64's. I don't see how overflow can happen.

> How are you getting the speedup/size improvement measurements? Are these at -O1 or -O0?

I compiled a large program with -O2 and measured the binary size difference.
Then I compiled with -O2 and the call threshold set to -1, and counted the reports with objdump -d | grep "call.*__asan_report_" | wc -l.

> When comparing to my patch, the intent of this patch is to be more aggressive in removing bounds checking. My patch does not introduce any improvement at optimization levels higher than -O0 and is trying to simulate mem2reg. On the other hand, the analysis here is more aggressive in the checks it removes. For example, my patch would not remove provably in-bounds array accesses with constant offsets, and it does not do anything for non-alloca values.

> On the other hand, with my patch, the allocas that are known not to have instrumented accesses do not get poisoned. Currently, this patch only targets removal of bounds checking. Also, I am not sure what compile-time overhead this brings and how reliable it is, since I don't think ObjectSizeOffsetEvaluator is used much.

> My preference is to have this on top of non-promotable allocas.

I am perfectly OK with it. But what do I know?

> Also, I believe removal of all unnecessary checks is a bigger task.

I would say it is an infinite task. Besides loops, there is also access coalescing, inter-BB deduplication, range deduplication (if we have already checked [this, this+X), then we may omit subsequent checks of everything in [this, this+X)), figuring out which calls can actually free which objects, etc.

nlopes requested changes to this revision.Feb 25 2015, 8:40 AM
nlopes added a reviewer: nlopes.
nlopes added a subscriber: nlopes.

You should use ObjectSizeOffsetVisitor instead of ObjectSizeOffsetEvaluator. The interface is similar, but the former gives up when the object size is not constant, while the latter may insert new instructions into the code. ObjectSizeOffsetVisitor is well tested (it's used by alias analysis, for example).

Second, the inbounds check is unsound (it may overflow -- check the proof here: http://rise4fun.com/Z3/PVrF). The correct set of checks is the following three:

  • Offset >= 0 (signed; you have this one)
  • Size >= Offset (unsigned)
  • Size - Offset >= AccessSize (unsigned)

Finally, you cannot derive the access size from "DL->getTypeStoreSize(OrigTy)". For example, just because an array has elements of size 4, it doesn't mean that all stores must be of size 4: you could have done a bitcast and stored 8 bytes. Basically, you need to get hold of the element being stored and check its store size.

After fixing these 3 issues, I think the patch becomes correct and desirable for commit.
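For concreteness, here is a minimal C++ sketch of how these three checks, plus the store-size fix, could be encoded on top of ObjectSizeOffsetVisitor (the helper name and surrounding plumbing are assumptions, and the visitor's interface has varied across LLVM versions, so this is not necessarily the exact committed code):

#include "llvm/Analysis/MemoryBuiltins.h"

using namespace llvm;

// Hypothetical helper: true if an access of NeededBytes at Addr provably
// stays inside its underlying object, so ASan may skip the check.
// NeededBytes should be the store size of the value actually being stored
// (e.g. DL.getTypeStoreSize(SI->getValueOperand()->getType())), not a size
// derived from the pointer's pointee type.
static bool isSafeAccess(ObjectSizeOffsetVisitor &ObjSizeVis, Value *Addr,
                         uint64_t NeededBytes) {
  SizeOffsetType SizeOffset = ObjSizeVis.compute(Addr);
  if (!ObjSizeVis.bothKnown(SizeOffset))
    return false;
  uint64_t Size = SizeOffset.first.getZExtValue();
  int64_t Offset = SizeOffset.second.getSExtValue();
  // The three checks from the review, written to avoid overflow:
  //   Offset >= 0 (signed), Size >= Offset (unsigned),
  //   Size - Offset >= NeededBytes (unsigned).
  return Offset >= 0 && Size >= uint64_t(Offset) &&
         Size - uint64_t(Offset) >= NeededBytes;
}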

This revision now requires changes to proceed.Feb 25 2015, 8:40 AM
dvyukov updated this revision to Diff 21107.Mar 3 2015, 8:05 AM
dvyukov edited edge metadata.

Thanks for the review, Nuno!
All three of your comments are addressed. Please take another look.

FWIW: This optimization looks right to me.
GVN uses similar logic to optimize loads that it can prove come directly from allocation functions.

lib/Transforms/Instrumentation/AddressSanitizer.cpp
2055

You may want to look at whether this does a better or worse job than GetPointerBaseWithConstantOffset.

(This is what GVN, DeadStoreElimination, etc. use to do what you are doing here: compute the pointer base and constant offset, make sure they match, then use getTypeSizeInBits to compare the sizes.)
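As a rough illustration of the alternative mentioned here (the exact signature of GetPointerBaseWithConstantOffset has shifted between LLVM versions, and the helper below is hypothetical, so treat this as a sketch rather than the patch's code):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Instructions.h"

using namespace llvm;

// Hypothetical alternative: walk back to the pointer base plus a constant
// byte offset and compare against the size of a static alloca, instead of
// going through ObjectSizeOffsetVisitor.
static bool isSafeAllocaAccess(Value *Addr, uint64_t NeededBytes,
                               const DataLayout &DL) {
  int64_t Offset = 0;
  Value *Base = GetPointerBaseWithConstantOffset(Addr, Offset, DL);
  auto *AI = dyn_cast<AllocaInst>(Base);
  if (!AI || !AI->isStaticAlloca())
    return false;
  uint64_t Size = DL.getTypeAllocSize(AI->getAllocatedType());
  return Offset >= 0 && Size >= uint64_t(Offset) &&
         Size - uint64_t(Offset) >= NeededBytes;
}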

chandlerc accepted this revision.Mar 3 2015, 4:55 PM
chandlerc edited edge metadata.

This looks very good to me as well. I think you can submit now. The remaining issues are, I think, quite minor.

However, I would love to find some form of naming that doesn't so easily confuse the GEP "inbounds" keyword with this check. Sadly, I can't come up with any good ideas. Maybe just talk about "safe" in the APIs and explain what the object size check is computing, etc.? Anyway, a minor point.

If this causes a compile-time regression, it should be easy to track down. You might do a quick sanity check on some big inputs, Dmitry, since it seems like you have them.

Danny, I'm pretty sure that the object size code is newer and should at least in theory be more powerful than just getting the base address. I could be wrong though. Still, easy to switch to that in a follow-up if needed.

lib/Transforms/Instrumentation/AddressSanitizer.cpp
410–412

Surely clang-format puts this 'const' somewhere else...

dvyukov updated this revision to Diff 21178.Mar 4 2015, 1:29 AM
dvyukov edited edge metadata.

applied clang-format

Chandler,

I've applied clang-format.

Renamed s/inbounds/safe/. A comment above the function explains the meaning of "safe".

Regarding compilation speed: on top of removing instrumentation and reducing binary size, it also speeds up compilation (less stuff for the backend to deal with). I've tested on LLVM's largest source file, lib/Target/X86/X86ISelLowering.cpp.

current without asan -O0:
real 0m4.847s
real 0m4.813s
real 0m4.776s
(I cross-checked that the new compiler has the same performance)

current -fsanitize=address -O0
real 0m7.191s
real 0m7.257s
real 0m7.183s

new -fsanitize=address -O0
real 0m6.619s
real 0m6.798s
real 0m6.722s
(~-6.4%)

current -fsanitize=address -O1
real 0m17.466s
real 0m17.553s
real 0m17.449s

new -fsanitize=address -O1
real 0m15.908s
real 0m15.853s
real 0m15.329s
(~-12.15%)

current -fsanitize=address -O2
real 0m24.337s
real 0m24.246s
real 0m24.604s

new -fsanitize=address -O2
real 0m23.662s
real 0m23.430s
real 0m23.400s
(~-3.49%)

Object file size:
w/o asan: 1228800
current with asan: 3658480 (+197.73%)
new with asan: 3444760 (-5.84%)

Amount of instrumentation (since it is just an object file, I grepped for "callq"):
w/o asan: 8235
current with asan: 37342
new with asan: 33706 (-12.5%)

dvyukov updated this revision to Diff 21193.Mar 4 2015, 5:25 AM

Reformatted the code with -style=Google, because otherwise the linter in check-all barks.

Committed in rev 231241.
We also need some plan for testing this optimization and enabling it by default.

nlopes resigned from this revision.Jul 26 2017, 5:58 AM
dvyukov abandoned this revision.Apr 15 2021, 1:19 AM