This is an archive of the discontinued LLVM Phabricator instance.

Enhance libsanitizer support for invalid-pointer-pair.
AbandonedPublic

Authored by marxin on Oct 16 2017, 1:36 PM.

Details

Reviewers
kcc
llvm-commits
Summary

Hi.

Following patch adds support of all memory origins in CheckForInvalidPointerPair function.
For small difference of pointers, it's directly done in shadow memory (the limit was set to 2048B).
Then we search for origin of first pointer and verify that the second one has the same origin. If so,
we verify that it points either to a same variable (in case of stack memory or a global variable), or to
a same heap segment.

Diff Detail

Event Timeline

marxin created this revision.Oct 16 2017, 1:36 PM
kcc edited edge metadata.

Thanks for the patch, but...

  • please follow the coding style (some comments inline)
  • please add tests
lib/asan/asan_descriptions.cc
227

the idiom used in this code is

if (T *x = getX()) {
  use(x)
}
353

We don't use 'unsigned' in this code for this purpose.

361

Please make sure to format all the code.

kcc added inline comments.Oct 16 2017, 1:41 PM
lib/asan/asan_report.cc
329

No goto please.

Thanks, I'll add some tests. May I please ask you how to run a single test-case (I guess using llvm-lit) ?

Besides the patch, we'd like to coordinate on options enabling this in the compilers.
From what I can see, in clang it is currently an experimental asan-detect-invalid-pointer-pair param,
what Martin has implemented is handle it as another kind of sanitization (separate kinds for pointer compare and pointer subtract),
-fsanitize=pointer-subtract or -fsanitize=pointer-compare. One can then do:
-fsanitize=address,pointer-subtract,pointer-compare etc., if address is not specified but one of these is, then
right now we error out. I could imagine pointer-subtract or pointer-compare to be supported even without address
sanitization, where we'd link against libasan and add poisoning/unpoisoning/global variable registration etc., but not
memory dereference checks, the problem is that the compiler then doesn't know if it should do the registration etc.
in kernel-address or user address style.

marxin updated this revision to Diff 119294.Oct 17 2017, 5:00 AM

I've updated patch where I fixed:

  • various coding style issues (as reported by make check-asan)
  • goto has been removed, not that due to ThreadRegistryLock, there are some extra scopes that guarantee that we don't reach a dead lock
  • new tests have been aded

Shouldn't detect_invalid_pointer_pairs=1 be the default? I mean, the user already makes the decision to enable it through a compiler option/parameter and the generated code adds non-zero amount of .text space and some runtime cost, and there is no extra runtime costs for this if code isn't instrumented, so detect_invalid_pointer_pairs=0 by default only causes false assumption that code is error clean.

marxin marked 4 inline comments as done.Oct 17 2017, 9:50 AM

Shouldn't detect_invalid_pointer_pairs=1 be the default? I mean, the user already makes the decision to enable it through a compiler option/parameter and the generated code adds non-zero amount of .text space and some runtime cost, and there is no extra runtime costs for this if code isn't instrumented, so detect_invalid_pointer_pairs=0 by default only causes false assumption that code is error clean.

Works for me. I'll update the patch.

So I've got patch for both default value of the option and type of the option. Currently it's int, but we don't use any levels different from 0/1.
Are both changes desired?

kcc added a comment.Oct 17 2017, 5:09 PM

Thanks, I'll add some tests. May I please ask you how to run a single test-case (I guess using llvm-lit) ?

Frankly, dunno. The entire ninja check-llvm takes 40 seconds for me
Doesn't this work?

llvm-lit <test path>

[disclaimer: this is the week of the LLVM developer conference, please expect delayed replies]

Besides the patch, we'd like to coordinate on options enabling this in the compilers.
From what I can see, in clang it is currently an experimental asan-detect-invalid-pointer-pair param,
what Martin has implemented is handle it as another kind of sanitization (separate kinds for pointer compare and pointer subtract),
-fsanitize=pointer-subtract or -fsanitize=pointer-compare. One can then do:
-fsanitize=address,pointer-subtract,pointer-compare etc., if address is not specified but one of these is, then
right now we error out.

Yes, something like this sounds good.

I've prototyped this feature quite a while ago and tried it on a large code base and got a lot of false positives (or things that looked like false positives).
One problem is that C++'s std::less explicitly allows comparing two unrelated pointers and this is used in std::set and such.
I suspect that proper implementation in LLVM will need to have the instrumentation done in Clang, not in LLVM
See also https://bugs.llvm.org/show_bug.cgi?id=18989

Unfortunately, my team is unlikely to have time for this work in Q4 -- but we will gladly review patches.

I could imagine pointer-subtract or pointer-compare to be supported even without address
sanitization, where we'd link against libasan and add poisoning/unpoisoning/global variable registration etc., but not
memory dereference checks, the problem is that the compiler then doesn't know if it should do the registration etc.
in kernel-address or user address style.

We may do it in future but I wouldn't bother about pointer checks w/o asan for now -- not until we see how this feature works.

kcc added a comment.Oct 17 2017, 5:19 PM

Shouldn't detect_invalid_pointer_pairs=1 be the default? I mean, the user already makes the decision to enable it through a compiler option/parameter and the generated code adds non-zero amount of .text space and some runtime cost, and there is no extra runtime costs for this if code isn't instrumented, so detect_invalid_pointer_pairs=0 by default only causes false assumption that code is error clean.

Maybe later.
The reason for these two-stage flags is that first we need to ensure that the instrumentation itself does not introduce any trouble (excessive code size, too frequent callbacks, etc).
Only then, we want to enable the run-time feature and we can do it selectively at process startup vi __asan_default_options or ASAN_OPTIONS

kcc added a comment.Oct 17 2017, 5:41 PM

Please use the arc tool to submit the patch or otherwise paste the patch with full context.
Please always CC llvm-commits for changes in compiler-rt (phabricator doesn't do it automatically)

(My next response is likely to be on the next week)

lib/asan/asan_descriptions.cc
225

You can simplify the interface and the implementation if you return shadow_addr or nullptr if it's not found.

229

no else after return

347

indentation doesn't look right.
If in doubt, use clang-format

353

int is even worse.
In proper C++ the only good type for iteration is size_t.
asan is not a proper C++ sadly, so we use uptr.

lib/asan/asan_descriptions.h
150

Returns true

152

naming nit pick:

points inside the same var

or

points inside a same var

?

Not native speaker myself, but I think *the* is right.

lib/asan/asan_report.cc
300

a macro is better than a goto, but still bad. Can this be expressed in just C++?
(e.g. wrap the main logic into a separate function and return from it on error).

329

ouch. We have a lock in a function that is called on every pointer cmp?
This is bad.
We need to avoid the lock.
Probably by relaxing the check a bit (e.g. never compare pointers from other thread's stacks).

Also, your tests don't seem to have threads, so this code is probably untested.

lib/asan/asan_thread.cc
357

Please use RoundUpTo() or some such

364

no need for {}

test/asan/TestCases/invalid-pointer-pairs-compare-1.cc
1 โ†—(On Diff #119294)

What does the file name mean (-1.cc vs -2.cc)?
If these are positive vs negtive, either express this in the name or (better) put everything into one file.
E.g. put all positives in the beginning and use CHECK-NOT to ensure we don't report anything there

test/asan/TestCases/invalid-pointer-pairs-subtract-1.cc
1 โ†—(On Diff #119294)

same here.

(just saw this by accident)

@marxin @kcc

May I please ask you how to run a single test-case (I guess using llvm-lit) ?

I've added LIT_FILTER environment variable for such cases. So the command line becomes e.g. env LIT_FILTER=mytestcase.cc ninja check-clang

marxin marked 11 inline comments as done.Oct 17 2017, 11:41 PM
In D38971#900253, @kcc wrote:

Thanks, I'll add some tests. May I please ask you how to run a single test-case (I guess using llvm-lit) ?

Frankly, dunno. The entire ninja check-llvm takes 40 seconds for me
Doesn't this work?

llvm-lit <test path>

Agree that make check-sanitizer is really fast.

lib/asan/asan_descriptions.cc
225

Good point!

353

I basically copied what's done in GlobalAddressDescription::Print:

for (int i = 0; i < size; i++) {

I'm changing that to uptr.

lib/asan/asan_descriptions.h
152

Agree, your name is better.

lib/asan/asan_report.cc
329

Unfortunately yes. Do you mean to call FindThreadByStackAddress (which currently needs lock as it calls FindThreadContextLocked)?
Then comparing the variables if threads do match?

I can come up with a test-case for that.

marxin marked 3 inline comments as done.Oct 17 2017, 11:42 PM
In D38971#900315, @kcc wrote:

Please use the arc tool to submit the patch or otherwise paste the patch with full context.
Please always CC llvm-commits for changes in compiler-rt (phabricator doesn't do it automatically)

Done that with new version of the patch.

(My next response is likely to be on the next week)

Works for me.

marxin updated this revision to Diff 119434.Oct 17 2017, 11:44 PM
marxin added a subscriber: llvm-commits.

Fixed most of notes spotted by @kcc.

Analyzing errors spotted on GCC itself, I would suggest to have 3 values for detect_invalid_pointer_pairs:

  • 0 = disabled
  • 1 = all except comparison with zero (0x0)
  • 2 = all

Thoughts?

kcc added a comment.Oct 24 2017, 4:53 PM

Analyzing errors spotted on GCC itself, I would suggest to have 3 values for detect_invalid_pointer_pairs:

  • 0 = disabled
  • 1 = all except comparison with zero (0x0)
  • 2 = all

Thoughts?

What does the standard say about comparisons with zero and subtracting zero from a pointer?

lib/asan/asan_report.cc
308

isn't this the same as

return __asan_region_is_poisoned(left, offset);
329

We should not call anything that grabs a lock here.
Let's start from simply ignoring any pointers from threads other than the current.

What does the standard say about comparisons with zero and subtracting zero from a pointer?

I summarize that for both languages C and C++

C:

  • Relational operators: ยง6.5.8 - point 5 - there's described that it's defined basically for same objects; last sentence says:

In all other cases, the behavior is undefined.

  • Additive operators - ยง6.5.6 - point 9 - it basically says that it's defined if it's representative in ptrdiff_t type

C++:

  • Relational operators: ยง5.9.3 - it defined explicitly which situations are defined - comparison of pointers to a different object

and also comparison with null pointer is not listed. Thus I consider it as invalid.

  • Additive operators: ยง5.7.6 - quite clear:

Unless both pointers point to elements of the same array object, or

one past the last element of the array object, the behavior is undefined.

That said, we should probably skip instrumentation of additive operators in C. What do you think @kcc and @jakubjelinek
about 3 levels of detect_invalid_pointer_pairs ?

kcc added a comment.Oct 25 2017, 4:28 PM

I summarize that for both languages C and C++

Ok, it might be worth having a separate flag.
But please do it in a separate patch.

I'll do it in a separate patch.
Can you please @kcc explain me how should I use the lock for stack access in order to have it less locking?
I'm planning also to come up with a multi-threaded test-case.

kcc added a comment.Oct 26 2017, 10:02 AM

I'll do it in a separate patch.
Can you please @kcc explain me how should I use the lock for stack access in order to have it less locking?
I'm planning also to come up with a multi-threaded test-case.

Please just don't do it in this patch -- no locking, no checking of other thread's stacks.

In D38971#908129, @kcc wrote:

I'll do it in a separate patch.
Can you please @kcc explain me how should I use the lock for stack access in order to have it less locking?
I'm planning also to come up with a multi-threaded test-case.

Please just don't do it in this patch -- no locking, no checking of other thread's stacks.

Ok, so now we check pointers in shadow memory for size <= 2048. If we don't want to grab the lock, FindThreadByStackAddress can't be used.
And thus AddrIsInStack can't be used. So do you prefer not to use it and stay with the shadow memory iteration?

kcc added a comment.Oct 30 2017, 4:07 PM

We can still use the stack bounds for the *current* thread, right?
(maybe will need some extra plumbing for this)

marxin updated this revision to Diff 121121.Nov 1 2017, 6:10 AM
marxin marked 2 inline comments as done.

Updated version where we do not grab thread registry lock. New threads test added.

marxin marked 5 inline comments as done.Nov 1 2017, 6:11 AM

Hello. I would like to gently ping the review. Hope we're close to be accepted?

kcc added a comment.Nov 7 2017, 8:25 AM

[sorry, I've been swamped]
Alexey, please make your round of reviews.

lib/asan/asan_descriptions.cc
344

I can't read this.
Please refactor in a more readable way.
I suggest to do

a = globals[i]
b = globals[j]

and implement lambda(s) to compute recurring subexpression(s)

lib/asan/asan_report.cc
307

define a

const uptr kMaxOffset = 2048; // Explain why 2048
marxin updated this revision to Diff 122031.Nov 7 2017, 11:04 PM

Updated version, as we mix __asan_globals and GlobalAddressDescription in GlobalAddressDescription::PointsInsideTheSameVariable, I decided not to use lambdas.
Even though, hope it's readable.

marxin marked 2 inline comments as done.Nov 7 2017, 11:05 PM
kcc added a comment.Nov 13 2017, 4:26 PM

Add couple more comments, sorry for delay.
Alexey, your turn now.

lib/asan/asan_report.cc
316

Ouch. Please don't do this.

uptr a;
if((a = foo())) ...

It's easy to misinterpret such code.
Instead, do

if (uptr a = foo()) ...
317

here and below instead of

if (a && b)  return true; else return false;

prefer

return a && b;
marxin updated this revision to Diff 122793.Nov 14 2017, 12:50 AM
marxin marked an inline comment as done.

Notes fixed in the new version.

marxin marked an inline comment as done.Nov 14 2017, 12:51 AM
alekseyshl edited edge metadata.Nov 20 2017, 5:12 PM

Very sorry for the delay!

lib/asan/asan_descriptions.cc
344

Move it to the outer loop

345

const __asan_global &b and the same for a

lib/asan/asan_descriptions.h
149

this description points inside the same

lib/asan/asan_report.cc
307

uptr offset = right - left;

319

What are we trying to save here? Why not just being explicit:

if (uptr shadow_offset_left = t->GetStackFrameVariableBeginning(left)) {
  uptr shadow_offset_right = t->GetStackFrameVariableBeginning(right);
  return shadow_offset_right == 0 || shadow_offset_left != shadow_offset_right;
}
332

I wonder why "right - 1"?

lib/asan/asan_thread.cc
369

It returns a pointer to one of the redzones, not to the variable beginning (as the function name suggests), right?

lib/asan/asan_thread.h
93

Returns a pointer to the start of the stack variable's shadow memory.

94

GetStackVariableShadowStart

marxin updated this revision to Diff 123746.Nov 21 2017, 2:36 AM

Updated according to provided review.
Unfortunately I'm leaving for holidays tomorrow. @jakubjelinek will fix notes if there will be any.

marxin marked 8 inline comments as done.Nov 21 2017, 2:37 AM
marxin marked an inline comment as done.Nov 21 2017, 2:39 AM
marxin added inline comments.
lib/asan/asan_report.cc
332

Because it's possible to have a valid pointer comparison of a pointer that points right after a valid location.
In case of global variables, for a pointer pointing right after a variable, I want to have it listed in GlobalAddressDescription.
I added test-case for that:

+ bar(&global[0], &global[10000]);

alekseyshl added inline comments.Nov 21 2017, 2:01 PM
lib/asan/asan_report.cc
340

Change it to // style comment.

lib/asan/asan_thread.cc
369

That was a question, actually. This function does not what its name suggests. At least, acknowledge it in the comment and explain why it is ok.

lib/asan/asan_thread.h
93

It's marked as done, but I do not see the change.

test/asan/TestCases/invalid-pointer-pairs-compare-errors.cc
18

int main() {

101

free(heap1);

test/asan/TestCases/invalid-pointer-pairs-subtract-errors.cc
45

free(heap1);
free(heap2);

test/asan/TestCases/invalid-pointer-pairs-subtract-success.cc
28

In all the tests, change /* */ comments to //

test/asan/TestCases/invalid-pointer-pairs-threads.cc
8

This test should be in test/asan/TestCases/Posix/

34

This and sleep in threads, use synchronization instead (for example, two barriers of count 3, barrier_assign and barrier_check, wait for _assign here and wait for _check after the pointer check).

That will be less flaky, not racy and much faster than 1 second.

Phabriactor doesn't allow me to update this patch, so I've uploaded it to D40600 instead. If you have a way to update the diff here, please do.

lib/asan/asan_descriptions.h
152

The comment also has a "inside a same" when it should be "inside the same".

D40600 is committed, I guess, we can abandon this one now.

marxin abandoned this revision.Mar 14 2018, 1:49 AM
marxin marked an inline comment as done.