This is an archive of the discontinued LLVM Phabricator instance.

[asan] Reports suppressions for ASan recovery mode (compiler-rt part).
ClosedPublic

Authored by m.ostapenko on Nov 30 2015, 7:59 AM.

Details

Summary

This is the compiler-rt part of suppression functionality for ASan recovery mode (here is the LLVM core part: http://reviews.llvm.org/D15079).

In current implementation, I'm trying to reuse UBSan approach with SourceLocations, so moving SourceLocation class definition from ubsan to sanitizer_common sources.

Diff Detail

Repository
rL LLVM

Event Timeline

m.ostapenko retitled this revision from to [asan] Reports suppressions for ASan recovery mode (compiler-rt part)..
m.ostapenko updated this object.
m.ostapenko added reviewers: kcc, eugenis, samsonov.
m.ostapenko set the repository for this revision to rL LLVM.
m.ostapenko added subscribers: ygribov, llvm-commits.
ygribov added inline comments.Nov 30 2015, 9:12 AM
lib/asan/asan_flags.inc
143 ↗(On Diff #41396)

Do we need this flag? AFAIK UBSan suppresses dups unconditionally.

lib/asan/asan_rtl.cc
135 ↗(On Diff #41396)

Whitespace padding too large.

test/asan/TestCases/Posix/halt_on_error-torture-2.cc
1 ↗(On Diff #41396)

I wonder if this heavy multithreaded test is worth it.

m.ostapenko added inline comments.Dec 1 2015, 8:37 AM
lib/asan/asan_flags.inc
143 ↗(On Diff #41396)

I added this flag just to preserve your halt_on_error-torture.cc stress testcase, that is designed to deal with multiple reports.

m.ostapenko updated this revision to Diff 41517.Dec 1 2015, 8:41 AM
m.ostapenko removed rL LLVM as the repository for this revision.

Updating the patch according to nits in related discussion (http://reviews.llvm.org/D15079).

kcc added inline comments.Dec 1 2015, 2:30 PM
lib/asan/asan_flags.inc
141 ↗(On Diff #41517)

maybe use names similar to ones in tsan?
e.g. suppress_equal_pcs?

lib/asan/asan_rtl.cc
117 ↗(On Diff #41517)

can't you make it __sanitizer::atomic_uint32_t ?

121 ↗(On Diff #41517)

Bad.
We should not duplicate this kind of functionality.
Why not call Die() here?

129 ↗(On Diff #41517)

this is actually racy, isn't it? :)

test/asan/TestCases/halt_on_error-2.c
6 ↗(On Diff #41517)

I would just add
CHECK-NOT: ERROR
as a 4-th CHECK line

Also, add a run-line that actually produces 30 error messages and checks for that.

ygribov added inline comments.Dec 1 2015, 8:39 PM
lib/asan/asan_flags.inc
141 ↗(On Diff #41517)

And also make the flag common.

m.ostapenko added inline comments.Dec 3 2015, 4:58 AM
lib/asan/asan_flags.inc
141 ↗(On Diff #41517)

Ok.

lib/asan/asan_rtl.cc
117 ↗(On Diff #41517)

Right, thank you.

121 ↗(On Diff #41517)

Ok.

129 ↗(On Diff #41517)

Yes, but do we actually need to be atomic here? I mean, if we hit the race, we would just jump over one entry in array, so it would contain 0. I don't think exact precision is critical here. Atomic operation may introduce unnecessary slowdown... not sure about this.

test/asan/TestCases/halt_on_error-2.c
6 ↗(On Diff #41517)

Ok.

m.ostapenko updated this revision to Diff 41741.Dec 3 2015, 5:00 AM

Updating according to nits.

m.ostapenko marked 7 inline comments as done.Dec 3 2015, 5:02 AM
kcc added inline comments.Dec 3 2015, 3:56 PM
lib/asan/asan_rtl.cc
128 ↗(On Diff #41741)

phrasing? extra would?

Also, it's enough to iterate from pc_num-1 to avoid the race

test/asan/TestCases/Posix/halt_on_error-torture.cc
11 ↗(On Diff #41741)

What's ASAN_SUPPRESS_EQUAL_PCS for?

test/asan/TestCases/halt_on_error-2.c
7 ↗(On Diff #41741)

Instead of creating a separate test it better to use a different RUN line with another --check-prefix.

m.ostapenko updated this revision to Diff 41883.Dec 4 2015, 8:34 AM

Updating the diff.

It seems we cannot get rid of races without using mutex, because we should protect access to different parts of AsanBuggyPcPool array. However, using a mutex here is undesirable due to performance issue. So, for now, I see such alternatives how to implement suppression:

  1. Use mutex. This is probably an overkill.
  2. Use atomic pc_num counter and non-atomic accesses to AsanBuggyPcPool. Here, we would have races on accesses to AsanBuggyPcPool, that may lead to duplication in reports (e.g. when 2 threads with same PC are searching in AsanBuggyPcPool and no one hasn't updated the array yet). Also, we'll need to add one additional check for AsanBuggyPcPool overflow before updating the pool because it is possible that multiple threads would try to update it simultaneously.

This patch implements the second approach for now, thought it's OK to have few duplications in favor of performance.

Konstantin, what do you think about it?

kcc added a reviewer: dvyukov.Dec 4 2015, 12:58 PM
In D15080#302479, @m.ostepenko wrote:

Updating the diff.

It seems we cannot get rid of races without using mutex,

Oh, sure we can. :)

?> because we should protect access to different parts of AsanBuggyPcPool array. However, using a mutex here is undesirable due to performance issue. So, for now, I see such alternatives how to implement suppression:

  1. Use mutex. This is probably an overkill.

Yep, no mutex please.

  1. Use atomic pc_num counter and non-atomic accesses to AsanBuggyPcPool. Here, we would have races on accesses to AsanBuggyPcPool, that may lead to duplication in reports (e.g. when 2 threads with same PC are searching in AsanBuggyPcPool and no one hasn't updated the array yet). Also, we'll need to add one additional check for AsanBuggyPcPool overflow before updating the pool because it is possible that multiple threads would try to update it simultaneously.

This patch implements the second approach for now, thought it's OK to have few duplications in favor of performance.

Summoning lock-free synchronization guru

Konstantin, what do you think about it?

because we should protect access to different parts of AsanBuggyPcPool array.

I mean, before we can update AsanBuggyPcPool[pc_num], we need to iterate over the whole [0, pc_num - 1] range of array. To avoid reports duplication, we should ensure, that nobody modifies the array during lookup and I'm not sure we can achieve this w/o mutex (or something that blocks the for loop). But let's wait for Dmitry for comments.

m.ostapenko updated this revision to Diff 42169.Dec 8 2015, 7:01 AM
m.ostapenko edited edge metadata.

Okay, let's do another try.

New patch implements a lock-free approach (thanks Yura for idea!) based on CAS. The only drawback of the method is that we should iterate over AsanBuggyPcPool from it's beginning each time we perform lookup, that may become a performance bottleneck. However, this way we managed to avoid races on AsanBuggyPcPool without using mutex and the only race we have is on pc_num counter, used for fast path (early returning) only. So, I wonder, if something like this can resolve the issue?

dvyukov added inline comments.Dec 8 2015, 7:14 AM
lib/asan/asan_rtl.cc
116 ↗(On Diff #42169)

This is unnecessary complex and racy. And also produces excessive contention on AsanBuggyPcPool due to compare_exchange even when we are suppressing already reported pc. What you need is:

static bool SuppressErrorReport(uptr pc) {
	for (unsigned i = 0; i < kAsanBuggyPcPoolSize; i++) {
		u64 cmp = atomic_load_relaxed(&AsanBuggyPcPool[i]);
		if (cmp == 0 && atomic_compare_exchange_strong(&AsanBuggyPcPool[i], cmp, pc, memory_order_relaxed))
			return false;
		if (cmp == pc)
			return true;
	}
	Die();
}
ygribov added inline comments.Dec 8 2015, 7:27 AM
lib/asan/asan_rtl.cc
123 ↗(On Diff #42169)

You already have this check after loop. Why do you need this preliminary check? It'll slow fast path.

129 ↗(On Diff #42169)

I'm not familiar with atomics but I think that CAS is much slower than ordinary load. Can you do loads for fast path and only switch to CAS when reaching NULL?

136 ↗(On Diff #42169)

It's better to avoid this extra atomic by using value returned by atomic_compare_exchange.

146 ↗(On Diff #42169)

I believe we can get rid of pc_num completely.

147 ↗(On Diff #42169)

This function seems to always return false.

166 ↗(On Diff #42169)

I wonder if it makes sense to move suppression check to ReportGenericError. It'll remove duplicate code and also automatically handle other cases (ReportDoubleFree, etc.).

test/asan/TestCases/halt_on_error-2.c
4 ↗(On Diff #42169)

Why no grep here?

7 ↗(On Diff #42169)

Can you dump to %t.log or something like that? Also you don't need true.

25 ↗(On Diff #42169)

You can control this with cmdline args and avoid double compilation.

Oops, collided with Dima.

m.ostapenko added inline comments.Dec 8 2015, 8:40 AM
test/asan/TestCases/halt_on_error-2.c
4 ↗(On Diff #42169)

I would just add
CHECK-NOT: ERROR
as a 4-th CHECK line

Just addressing this comment.

m.ostapenko updated this revision to Diff 42186.Dec 8 2015, 9:02 AM

Dmitry, Yura, thanks for your hits. Updating the patch.

dvyukov edited edge metadata.Dec 8 2015, 9:29 AM

The lock-free part LGTM.

kcc added inline comments.Dec 8 2015, 11:34 PM
lib/asan/asan_report.cc
34 ↗(On Diff #42186)

use static for all global vars here, please.

lib/sanitizer_common/sanitizer_flags.inc
197 ↗(On Diff #42186)

no spaces around =

test/asan/TestCases/halt_on_error-2.c
1 ↗(On Diff #42186)

Can we rename this test to have suppress_equal_pcs in the name (e.g. halt_on_error_suppress_equal_pcs.c).
Having _2 in the name is not helpful.

5 ↗(On Diff #42186)

I want to have two more RUN lines here:
one that has suppress_equal_pcs=true and checks the suppress_equal_pcs=true is the same as default.
another that has suppress_equal_pcs=false and checks that this is different from default.

m.ostapenko edited edge metadata.

Addressing the nits.

kcc accepted this revision.Dec 9 2015, 8:10 AM
kcc edited edge metadata.

LGTM with one small nit

lib/sanitizer_common/sanitizer_flags.inc
197 ↗(On Diff #42268)

add "(asan only)" to the descr string

This revision is now accepted and ready to land.Dec 9 2015, 8:10 AM
ygribov added inline comments.Dec 9 2015, 9:26 AM
lib/sanitizer_common/sanitizer_flags.inc
197 ↗(On Diff #42268)

Do we want to enable this for UBSan as well btw?

kcc added inline comments.Dec 9 2015, 9:33 AM
lib/sanitizer_common/sanitizer_flags.inc
197 ↗(On Diff #42268)

It hasn't been a problem for us in the past.
Besides, ubsan has a mode with no runtime, so this flag will not work there.
But if you think we need it there, sure, why not.
(The code will need to be moved to sanitizer_common then)

samsonov added inline comments.Dec 9 2015, 11:24 AM
lib/sanitizer_common/sanitizer_flags.inc
197 ↗(On Diff #42268)

UBSan already has a somewhat robust deduplication based on source locations.

ygribov added inline comments.Dec 9 2015, 3:17 PM
lib/sanitizer_common/sanitizer_flags.inc
197 ↗(On Diff #42268)

Yeah, I meant a flag to disable those.

This revision was automatically updated to reflect the committed changes.