This is an archive of the discontinued LLVM Phabricator instance.

[asan] Add an interceptor for strtok
ClosedPublic

Authored by mrigger on Feb 26 2017, 11:15 AM.

Diff Detail

Event Timeline

mrigger created this revision.Feb 26 2017, 11:15 AM
ygribov edited edge metadata.Feb 27 2017, 1:45 AM

I suggest to regenerate patch with -U99999, to get more context.

asan_interceptors.cc
549 ↗(On Diff #89814)

Please use ASAN_READ_STRING instead, similar to other string interceptors. str_size should be checked only if strict_strings is enabled, otherwise you can only check range up to first delimiter (inclusively).

tests/asan_str_test.cc
118 ↗(On Diff #89814)

Better avoid aliasing, strtok arguments are restricted (more below).

121 ↗(On Diff #89814)

delim is somewhat misleading name cause it's also used in other context here.

123 ↗(On Diff #89814)

What does this add to previous tests?

mrigger updated this revision to Diff 89941.Feb 27 2017, 2:57 PM

Thanks a lot for your feedback @ygribov! I improved the test case based on it. Additionally, I changed the interceptor to use ASAN_READ_STRING_OF_LEN instead of directly calling ASAN_READ_RANGE. However, I think I have not yet understood why using strict_strings should only verify the string until the first delimiter. If I select the passed n to be smaller than strlen(str) + 1 then the test case fails, since it does not detect the overflow. What did I understand wrong? Should I instead implement the test case in projects/compiler-rt/test/asan/TestCases/ and set RUN: %env_asan_opts=strict_string_checks=true? If yes, should I select n to be strlen(str) (like the other interceptors) or 1?

kcc added inline comments.Feb 27 2017, 5:22 PM
lib/asan/asan_interceptors.cc
541 ↗(On Diff #89941)

This should go to lib/sanitizer_common/sanitizer_common_interceptors.inc so that other sanitizers can benefit

However, I think I have not yet understood why using strict_strings should only verify the string until the first delimiter.

I actually learned this lesson the hard way. C/C++ standards require arguments of standard string functions (strcpy, strchr`, etc.) to be strings i.e. zero-terminated arrays. But unfortunately a lot of existing code seems to ignore this rule and also call string functions on arbitrary C arrays. For example

char a[10];
memcpy(a, "11112", 5);  // Non-zero terminated
strchr(a, '2');  // Works in practice, even though a isn't a string, because strchr only checks first 5 chars

By default we'd like to not warn about uninteresting errors so we only check string prefix (unless user explicitly asks us with strict_strings runtime flag).

If I select the passed n to be smaller than strlen(str) + 1 then the test case fails, since it does not detect the overflow. What did I understand wrong?

Yes, I guess the test needs to be updated. You can take a look at how other string tests are done (e.g. test/asan/TestCases//strstr*.c, you'll probly need __asan_poison_memory_region hack).

mrigger updated this revision to Diff 91474.Mar 11 2017, 8:32 AM

Thanks for feedback @ygribov and @kcc and sorry for the delay in addressing the issues in the change! I moved the implementation of the interceptor to lib/sanitizer_common/sanitizer_common_interceptors.inc, fixed the issues that existed when having strict_string_checks disabled, and improved the test case.

kcc edited edge metadata.

LGTM, but I'd like Aleksey to take a closer look.
and thanks!

alekseyshl requested changes to this revision.Mar 14 2017, 6:17 PM
alekseyshl added inline comments.
lib/sanitizer_common/sanitizer_common_interceptors.inc
475

#if SANITIZER_INTERCEPT_STRTOK and define it to 1 for now.

485

Why are we not checking result string here? When non-null, it's expected to be a part of the original str and to be readable and valid, right?

489

What if your first call to strtok returns nullptr?

491

result_length + 1); and indent it to align with other arguments

494

First, it should be ..., del_length, del_length + 1); and second, it does not make sense to use COMMON_INTERCEPTOR_READ_STRING_OF_LEN here,

COMMON_INTERCEPTOR_READ_RANGE(ctx, delimiters, REAL(strlen)(delimiters) + 1);

would be enough.

499

I'd move it below strcasestr interceptor (strstr and strcasestr are related).

test/asan/TestCases/strtok.c
2

Why not have strtok-1.c, strtok-2.c etc. files? Tests would be quite a bit more readable.

23

Why other tests do not have comments? Please split them into separate files and add a top line comment explaining what exactly is tested.

This revision now requires changes to proceed.Mar 14 2017, 6:17 PM
kcc added inline comments.Mar 14 2017, 6:29 PM
test/asan/TestCases/strtok.c
23

Not worth it, IMHO
But what I did miss is that this test does "%clang_asan %s -o %t" 5 times, while 1 is enough.

alekseyshl added inline comments.Mar 14 2017, 6:44 PM
test/asan/TestCases/strtok.c
13

And, if we go with one file, please wrap these long lines like this:

RUN: ... | \
RUN: FileCheck ...

23

Ok, fine, one file is acceptable too, but still, comments would be much welcomed!

mrigger updated this revision to Diff 91904.Mar 15 2017, 11:17 AM
mrigger edited edge metadata.

Thanks for the feedback @alekseyshl! Check the in-line comments to see which issues I already fixed, and to which ones I still have questions.

lib/sanitizer_common/sanitizer_common_interceptors.inc
475

Done.

485

Other interceptors also do not check the result of the libc functions. See for example strchr or strpbrk. I think all interceptors are based on the assumption that the libc functions do not contain any errors.

489

Good catch. I fixed the error and added a test case.

491

Done.

494

I implemented it like you suggested in my first revision. The current implementation is based on @ygribov's feedback and his explanations about strict_strings. Did I understand them wrong?

499

Done.

test/asan/TestCases/strtok.c
2

I agree that the tests would be more readable. However, I observed that other larger *.str test cases follow this structure; for example, see strtol_strict, strncasecmp_strict, and strncmp_strict.

13

The other test cases, from which I copied this pattern, are all written in one line. Should I still break the existing "convention"?

23

I added a comment for each test case. Additionally, the test is now only compiled once.

alekseyshl requested changes to this revision.Mar 15 2017, 11:43 AM

Awesome, thank you for doing it!

lib/sanitizer_common/sanitizer_common_interceptors.inc
485

Ah, right, thanks.

494

You're explicitly calling strlen on that string, it will look for the \0 anyway. Check what COMMON_INTERCEPTOR_READ_STRING_OF_LEN is doing, substitute your arguments.

Your last argument being just 1 should definitely be fixed, now you're checking just the first char, right?

test/asan/TestCases/strtok.c
13

Style evolves. This is not critical, especially with your last change, lines are shorter and easier to parse, but still, I prefer to keep all lines under 80 chars. If it is not too big of a problem, I'd appreciate breaking those exceeding 80 chars at FileCheck, as I suggested, and please indent FileCheck:

RUN: FileCheck ...

This revision now requires changes to proceed.Mar 15 2017, 11:43 AM
mrigger added inline comments.Mar 15 2017, 1:18 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
494

Note, that we use the not-intercepted strlen. Following, an out-of-bounds access might not result in an error here.

Yes, we only check the first delimiter (for strict_string_checks=0), as @ygribov suggested, if I understood him right. @ygribov, can you please comment to clear things up?

Here is a demonstration of the behavior on an example:

#include <stdio.h>
#include <string.h>

int main() {
    char buf[] = "abcb";
    char del[] = {'b'}; // missing NULL terminator
    char* result = strtok(buf, del);
    printf("%s\n", result);
}

Executing the program with ASAN_OPTIONS=strict_string_checks=1 ./a.out lets ASan output an error. However, executing with ASAN_OPTIONS=strict_string_checks=0 ./a.out does not result in an error.

Here is also the documentation for strict_string_checks:

	strict_string_checks
		- If set check that string arguments are properly null-terminated

I'm happy to change it, but I want to be sure that we implement the expected behavior.

alekseyshl added inline comments.Mar 15 2017, 1:53 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
494

The code I proposed uses non-intercepted strlen as well:

COMMON_INTERCEPTOR_READ_RANGE(ctx, delimiters, REAL(strlen)(delimiters) + 1);

The problem I'm trying to point out is not about the first delimiter (that's fine, but it's about the other COMMON_INTERCEPTOR_READ_STRING_OF_LEN use, the one above), it's that you pass n=1 to your second macro, this one:

COMMON_INTERCEPTOR_READ_STRING_OF_LEN(ctx, delimiters, del_length, 1);

which means "1 char". I believe, it was not your intention.

ygribov added inline comments.Mar 15 2017, 2:56 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
494

Sorry guys, I was out. Frankly I agree with @alekseyshl in that we shouldn't unconditionally call strlen on str and delimeters. They are not necessarily null-terminated so by calling REAL(strlen) when strict_strings is off, you risk triggering a runtime error.

See my suggestions below for how to change the code to avoid strlens.

500

Bind star to ctx?

502

I'd suggest to handle !intercept_strtok case here as this will allow to reduce function nesting. It's minor though.

508

Strtok will return NULL or zero-terminated string so you should be able to do plain COMMON_INTERCEPTOR_READ_RANGE on str (probably only when strict_strings is off as otherwise str has been completely checked on the first call).

510

I agree with Alex - we shouldn't call strlen here, because if str isn't zero-terminated we risk to trigger runtime error. The best we can do is

COMMON_INTERCEPTOR_READ_STRING(ctx, str, 1);

Note that 1 is suboptimal here - we could have used strspn to find the first occurence of delimeters and use that to better estimate the length.

515

Same to line 508

COMMON_INTERCEPTOR_READ_RANGE(ctx, result, REAL(strlen)(result) + 1);

?

516

I think else-branch shouldn't be neglected - you can safely do

COMMON_INTERCEPTOR_READ_RANGE(ctx, str, REAL(strlen) + 1)

(we know that none of delimeters has been found so str must have been scanned to the terminating 0).

519

Same issue with strlen here. Just do

COMMON_INTERCEPTOR_READ_STRING(ctx, delimeters, 1);

(1 is again suboptimal).

ygribov added inline comments.Mar 15 2017, 2:59 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
494

Your last argument being just 1 should definitely be fixed,
now you're checking just the first char, right?

We can't easily check for more when strict_strings is off (unless we reimplement most of strtok in interceptor).

mrigger updated this revision to Diff 92059.Mar 16 2017, 2:35 PM
mrigger edited edge metadata.

Thanks again for the review @ygribov and @alekseyshl! I believe that I've addressed all the issues that you raised. @ygribov, I followed your suggestions and also split the strict_string_checks and !strict_string_checks paths since I think that the code is more readable like this.

lib/sanitizer_common/sanitizer_common_interceptors.inc
500

Sorry, I didn't get this comment. Should I change something in the code?

test/asan/TestCases/strtok.c
13

Done. Note that I also removed intercept_strtok=false since it is not necessary in the current implementation.

alekseyshl accepted this revision.Mar 17 2017, 2:29 PM
alekseyshl added inline comments.
lib/sanitizer_common/sanitizer_common_interceptors.inc
523

Shouldn't this also use COMMON_INTERCEPTOR_READ_RANGE?

529

If I get the earlier @ygribov's suggestion correctly, we should check the else branch too:

if (result != nullptr) {
  COMMON_INTERCEPTOR_READ_RANGE(ctx, result, REAL(strlen)(result) + 1);
} else if (str != nullptr) {
  // No delimiter were found, it's safe to assume that the entire str was scanned.
  COMMON_INTERCEPTOR_READ_RANGE(ctx, str, REAL(strlen)(str) + 1);
}
This revision is now accepted and ready to land.Mar 17 2017, 2:29 PM
mrigger updated this revision to Diff 92215.Mar 17 2017, 4:00 PM

Thanks for all the help! Is there still something that we should improve before merging the change?

lib/sanitizer_common/sanitizer_common_interceptors.inc
523

Good point! It is clearer to use COMMON_INTERCEPTOR_READ_RANGE here.

529

Right, I forgot about that case.

Looks good, thanks!

Can someone with commit access commit the change?

Okay, thanks!

This revision was automatically updated to reflect the committed changes.