Page MenuHomePhabricator

Skip promotable allocas to improve performance at -O0
ClosedPublic

Authored by zaks.anna on Feb 18 2015, 5:19 PM.

Details

Summary

Currently, the ASan executables built with -O0 are unnecessarily slow. The main reason is that ASan instrumentation pass inserts redundant checks around promotable allocas. These allocas do not get instrumented under -O1 because they get converted to virtual registered by mem2reg. With this patch, ASan instrumentation pass will only instrument non promotable allocas, giving us a speedup of 39% on a collection of benchmarks with -O0. (There is no measurable speedup at -O1.)

Comments/Questions:

  • We've initially discussed creating another pass to annotate the uninteresting load and stores. However, the check is simple and having a separate pass seems like an overkill.
  • A bunch of ASan llvm tests are written using promotable allocas. I've added volatile stores or marked the existing stores as volatile to make the allocas non-promotable.
  • Two of the runtime tests also seem to depend on being run with -O0. With this modification, I am getting the same behavior as when the tests are run with higher optimization levels. I've turned off the mode on those tests since I am not sure if a better fix is available. They fail on comparing values returned by __asan_region_is_poisoned.
  • Should "isInterestingAlloca()" be hoisted out of "FunctionStackPoisoner" and used in "isInterestingMemoryAccess"?

Diff Detail

Event Timeline

zaks.anna updated this revision to Diff 20239.Feb 18 2015, 5:19 PM
zaks.anna retitled this revision from to Skip promotable allocas to improve performance at -O0.
zaks.anna updated this object.
zaks.anna edited the test plan for this revision. (Show Details)
zaks.anna added reviewers: kcc, samsonov, kubamracek.
zaks.anna added a subscriber: Unknown Object (MLST).
samsonov edited edge metadata.Feb 19 2015, 6:41 PM

Overall, this looks good.

A bunch of ASan llvm tests are written using promotable allocas. I've added volatile stores or marked the existing stores as volatile to make the allocas non-promotable.

I'm OK with that

Two of the runtime tests also seem to depend on being run with -O0. With this modification, I am getting the same behavior as when the tests are run with higher optimization levels. I've turned off the mode on those tests since I am not sure if a better fix is available. They fail on comparing values returned by __asan_region_is_poisoned.

Now, this part is interesting. Looks like these tests are failing right now, if we increase optimization level from -O0. I'd say, we should better make these tests more robust instead - if we will force the necessary stack layout there on higher optimization levels, they would work fine after landing this patch as well.

Should "isInterestingAlloca()" be hoisted out of "FunctionStackPoisoner" and used in "isInterestingMemoryAccess"?

Yes, I think it's fine to skip instrumenting memory accesses to "uninteresting" allocas that don't have redzones around them.

kcc added a reviewer: dvyukov.Feb 19 2015, 6:44 PM

Dmitry was digging somewhere nearby. Just FYI

kcc added a comment.Feb 19 2015, 7:08 PM

With this patch, ASan instrumentation pass will only instrument non promotable allocas, giving us a speedup of 39% on a collection of benchmarks with -O0

I wonder, what if we apply this only to those allocas that are indexed with constants?
I.e. only if we can statically prove the correctness of access.
See also Dmitry's http://reviews.llvm.org/D7583

dvyukov edited edge metadata.Feb 20 2015, 9:52 AM

Smart people, while you are here I would like to ask you to look at:
http://reviews.llvm.org/D7583
It is somewhat similar to this change. I want to eliminate
instrumentation of locals within its lifetime (so it is not a
use-after-scope) and that is known to be inbounds (that is, direct
accesses, field accesses and array element accesses with constant
indices; so it is not a out-of-bounds).
By experimentation I found that both isGEPWithNoNotionalOverIndexing
and stripInBoundsConstantOffsets don't do what I want.
isGEPWithNoNotionalOverIndexing does not eliminate enough of
unnecessary instrumentation. While stripInBoundsConstantOffsets
eliminates a bit more than necessary.
Thanks in advance!

kcc added a comment.Feb 23 2015, 3:37 PM

Anna, can you repeat your measurements with Dmitry's D7583?
If it gives speedup close to your 39%, we should go this way (but first, implement it properly, of course)

The purpose of this patch is to only remove checks for accesses that are known to be safe. All array accesses are still getting checked. For example, in the following code snippet, only stores/loads to idx will not be checked:
char test_promotable_allocas() {

char r[3];
char a[3];
int idx = 10;
a[10] = 0;
r[idx] = 10;
return a[10] + r[idx];

}

Here is the relevant code from isAllocaPromotable:
else if (const GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(U)) {

if (GEPI->getType() != Type::getInt8PtrTy(U->getContext(), AS))
  return false;
if (!GEPI->hasAllZeroIndices())
  return false;
if (!onlyUsedByLifetimeMarkers(GEPI))
  return false;

On Thu, Feb 19, 2015 at 6:59 PM, Chandler Carruth <chandlerc@google.com> wrote:

I don't really like this approach because it will introduce false negatives. We have even been helped in >debugging "miscompiles" of incorrect code by running asan at -O0 to point out where the stack variables.

Chandler, are there specific cases you are worried about here? False negatives are possible at higher optimization levels (and your proposed approach to solve them sounds reasonable); with this patch, I am not trying to get parity with -O1, but just reduce the number of unnecessary checks.

Sorry for the delayed response - I was on vacation late last week.

kcc added a comment.Feb 23 2015, 4:01 PM

Indeed, Chandler, I actually don't see why this could lead to false positives? What do we miss?

lib/Transforms/Instrumentation/AddressSanitizer.cpp
803–804

You will now execute all other if statements even if the first one is true. At leats change the following ifs to "else if", or preserve the structure with 'return' and have another function that checks isAllocaPromotable

826

exchange the order of ifs?

zaks.anna updated this revision to Diff 20566.Feb 23 2015, 6:57 PM
zaks.anna edited edge metadata.

I've addressed the outstanding comments:

  • Refactored the code to reuse isInterestingAlloca() from within isInterestingMemoryAccess().
  • Addressed Kostya's latest comments about "if" re-structuring as well.
  • Kuba is helping out by modifying the compiler-rt tests so that they are not sensitive to optimization levels :)

Kostya,
As I've commented on the other thread, I don't think http://reviews.llvm.org/D7583 is currently doing the right thing (skipping "inbounds" geps). Though, I do believe we can achieve optimizations by building on top of this and excluding allocas that only have constant and provably in bounds accesses.

kcc accepted this revision.Feb 25 2015, 12:41 PM
kcc edited edge metadata.

LGTM

This revision is now accepted and ready to land.Feb 25 2015, 12:41 PM

Committed in r230724.

zaks.anna closed this revision.Jun 25 2015, 6:14 PM