This change addresses https://github.com/google/sanitizers/issues/766. I tested the change with make check-asan and the newly added test case.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I suggest to regenerate patch with -U99999, to get more context.
asan_interceptors.cc | ||
---|---|---|
549 ↗ | (On Diff #89814) | Please use ASAN_READ_STRING instead, similar to other string interceptors. str_size should be checked only if strict_strings is enabled, otherwise you can only check range up to first delimiter (inclusively). |
tests/asan_str_test.cc | ||
118 ↗ | (On Diff #89814) | Better avoid aliasing, strtok arguments are restricted (more below). |
121 ↗ | (On Diff #89814) | delim is somewhat misleading name cause it's also used in other context here. |
123 ↗ | (On Diff #89814) | What does this add to previous tests? |
Thanks a lot for your feedback @ygribov! I improved the test case based on it. Additionally, I changed the interceptor to use ASAN_READ_STRING_OF_LEN instead of directly calling ASAN_READ_RANGE. However, I think I have not yet understood why using strict_strings should only verify the string until the first delimiter. If I select the passed n to be smaller than strlen(str) + 1 then the test case fails, since it does not detect the overflow. What did I understand wrong? Should I instead implement the test case in projects/compiler-rt/test/asan/TestCases/ and set RUN: %env_asan_opts=strict_string_checks=true? If yes, should I select n to be strlen(str) (like the other interceptors) or 1?
lib/asan/asan_interceptors.cc | ||
---|---|---|
541 ↗ | (On Diff #89941) | This should go to lib/sanitizer_common/sanitizer_common_interceptors.inc so that other sanitizers can benefit |
However, I think I have not yet understood why using strict_strings should only verify the string until the first delimiter.
I actually learned this lesson the hard way. C/C++ standards require arguments of standard string functions (strcpy, strchr`, etc.) to be strings i.e. zero-terminated arrays. But unfortunately a lot of existing code seems to ignore this rule and also call string functions on arbitrary C arrays. For example
char a[10]; memcpy(a, "11112", 5); // Non-zero terminated strchr(a, '2'); // Works in practice, even though a isn't a string, because strchr only checks first 5 chars
By default we'd like to not warn about uninteresting errors so we only check string prefix (unless user explicitly asks us with strict_strings runtime flag).
If I select the passed n to be smaller than strlen(str) + 1 then the test case fails, since it does not detect the overflow. What did I understand wrong?
Yes, I guess the test needs to be updated. You can take a look at how other string tests are done (e.g. test/asan/TestCases//strstr*.c, you'll probly need __asan_poison_memory_region hack).
Thanks for feedback @ygribov and @kcc and sorry for the delay in addressing the issues in the change! I moved the implementation of the interceptor to lib/sanitizer_common/sanitizer_common_interceptors.inc, fixed the issues that existed when having strict_string_checks disabled, and improved the test case.
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
475 ↗ | (On Diff #91474) | #if SANITIZER_INTERCEPT_STRTOK and define it to 1 for now. |
485 ↗ | (On Diff #91474) | Why are we not checking result string here? When non-null, it's expected to be a part of the original str and to be readable and valid, right? |
489 ↗ | (On Diff #91474) | What if your first call to strtok returns nullptr? |
491 ↗ | (On Diff #91474) | result_length + 1); and indent it to align with other arguments |
494 ↗ | (On Diff #91474) | First, it should be ..., del_length, del_length + 1); and second, it does not make sense to use COMMON_INTERCEPTOR_READ_STRING_OF_LEN here, COMMON_INTERCEPTOR_READ_RANGE(ctx, delimiters, REAL(strlen)(delimiters) + 1); would be enough. |
499 ↗ | (On Diff #91474) | I'd move it below strcasestr interceptor (strstr and strcasestr are related). |
test/asan/TestCases/strtok.c | ||
1 ↗ | (On Diff #91474) | Why not have strtok-1.c, strtok-2.c etc. files? Tests would be quite a bit more readable. |
22 ↗ | (On Diff #91474) | Why other tests do not have comments? Please split them into separate files and add a top line comment explaining what exactly is tested. |
test/asan/TestCases/strtok.c | ||
---|---|---|
22 ↗ | (On Diff #91474) | Not worth it, IMHO |
Thanks for the feedback @alekseyshl! Check the in-line comments to see which issues I already fixed, and to which ones I still have questions.
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
475 ↗ | (On Diff #91474) | Done. |
485 ↗ | (On Diff #91474) | Other interceptors also do not check the result of the libc functions. See for example strchr or strpbrk. I think all interceptors are based on the assumption that the libc functions do not contain any errors. |
489 ↗ | (On Diff #91474) | Good catch. I fixed the error and added a test case. |
491 ↗ | (On Diff #91474) | Done. |
494 ↗ | (On Diff #91474) | I implemented it like you suggested in my first revision. The current implementation is based on @ygribov's feedback and his explanations about strict_strings. Did I understand them wrong? |
499 ↗ | (On Diff #91474) | Done. |
test/asan/TestCases/strtok.c | ||
1 ↗ | (On Diff #91474) | I agree that the tests would be more readable. However, I observed that other larger *.str test cases follow this structure; for example, see strtol_strict, strncasecmp_strict, and strncmp_strict. |
12 ↗ | (On Diff #91474) | The other test cases, from which I copied this pattern, are all written in one line. Should I still break the existing "convention"? |
22 ↗ | (On Diff #91474) | I added a comment for each test case. Additionally, the test is now only compiled once. |
Awesome, thank you for doing it!
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
485 ↗ | (On Diff #91474) | Ah, right, thanks. |
494 ↗ | (On Diff #91474) | You're explicitly calling strlen on that string, it will look for the \0 anyway. Check what COMMON_INTERCEPTOR_READ_STRING_OF_LEN is doing, substitute your arguments. Your last argument being just 1 should definitely be fixed, now you're checking just the first char, right? |
test/asan/TestCases/strtok.c | ||
12 ↗ | (On Diff #91474) | Style evolves. This is not critical, especially with your last change, lines are shorter and easier to parse, but still, I prefer to keep all lines under 80 chars. If it is not too big of a problem, I'd appreciate breaking those exceeding 80 chars at FileCheck, as I suggested, and please indent FileCheck: RUN: FileCheck ... |
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
494 ↗ | (On Diff #91474) | Note, that we use the not-intercepted strlen. Following, an out-of-bounds access might not result in an error here. Yes, we only check the first delimiter (for strict_string_checks=0), as @ygribov suggested, if I understood him right. @ygribov, can you please comment to clear things up? Here is a demonstration of the behavior on an example: #include <stdio.h> #include <string.h> int main() { char buf[] = "abcb"; char del[] = {'b'}; // missing NULL terminator char* result = strtok(buf, del); printf("%s\n", result); } Executing the program with ASAN_OPTIONS=strict_string_checks=1 ./a.out lets ASan output an error. However, executing with ASAN_OPTIONS=strict_string_checks=0 ./a.out does not result in an error. Here is also the documentation for strict_string_checks: strict_string_checks - If set check that string arguments are properly null-terminated I'm happy to change it, but I want to be sure that we implement the expected behavior. |
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
494 ↗ | (On Diff #91474) | The code I proposed uses non-intercepted strlen as well: COMMON_INTERCEPTOR_READ_RANGE(ctx, delimiters, REAL(strlen)(delimiters) + 1); The problem I'm trying to point out is not about the first delimiter (that's fine, but it's about the other COMMON_INTERCEPTOR_READ_STRING_OF_LEN use, the one above), it's that you pass n=1 to your second macro, this one: COMMON_INTERCEPTOR_READ_STRING_OF_LEN(ctx, delimiters, del_length, 1); which means "1 char". I believe, it was not your intention. |
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
500 ↗ | (On Diff #91904) | Bind star to ctx? |
502 ↗ | (On Diff #91904) | I'd suggest to handle !intercept_strtok case here as this will allow to reduce function nesting. It's minor though. |
508 ↗ | (On Diff #91904) | Strtok will return NULL or zero-terminated string so you should be able to do plain COMMON_INTERCEPTOR_READ_RANGE on str (probably only when strict_strings is off as otherwise str has been completely checked on the first call). |
510 ↗ | (On Diff #91904) | I agree with Alex - we shouldn't call strlen here, because if str isn't zero-terminated we risk to trigger runtime error. The best we can do is COMMON_INTERCEPTOR_READ_STRING(ctx, str, 1); Note that 1 is suboptimal here - we could have used strspn to find the first occurence of delimeters and use that to better estimate the length. |
515 ↗ | (On Diff #91904) | Same to line 508 COMMON_INTERCEPTOR_READ_RANGE(ctx, result, REAL(strlen)(result) + 1); ? |
516 ↗ | (On Diff #91904) | I think else-branch shouldn't be neglected - you can safely do COMMON_INTERCEPTOR_READ_RANGE(ctx, str, REAL(strlen) + 1) (we know that none of delimeters has been found so str must have been scanned to the terminating 0). |
519 ↗ | (On Diff #91904) | Same issue with strlen here. Just do COMMON_INTERCEPTOR_READ_STRING(ctx, delimeters, 1); (1 is again suboptimal). |
494 ↗ | (On Diff #91474) | Sorry guys, I was out. Frankly I agree with @alekseyshl in that we shouldn't unconditionally call strlen on str and delimeters. They are not necessarily null-terminated so by calling REAL(strlen) when strict_strings is off, you risk triggering a runtime error. See my suggestions below for how to change the code to avoid strlens. |
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
494 ↗ | (On Diff #91474) |
We can't easily check for more when strict_strings is off (unless we reimplement most of strtok in interceptor). |
Thanks again for the review @ygribov and @alekseyshl! I believe that I've addressed all the issues that you raised. @ygribov, I followed your suggestions and also split the strict_string_checks and !strict_string_checks paths since I think that the code is more readable like this.
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
500 ↗ | (On Diff #91904) | Sorry, I didn't get this comment. Should I change something in the code? |
test/asan/TestCases/strtok.c | ||
12 ↗ | (On Diff #91474) | Done. Note that I also removed intercept_strtok=false since it is not necessary in the current implementation. |
lib/sanitizer_common/sanitizer_common_interceptors.inc | ||
---|---|---|
523 ↗ | (On Diff #92059) | Shouldn't this also use COMMON_INTERCEPTOR_READ_RANGE? |
529 ↗ | (On Diff #92059) | If I get the earlier @ygribov's suggestion correctly, we should check the else branch too: if (result != nullptr) { COMMON_INTERCEPTOR_READ_RANGE(ctx, result, REAL(strlen)(result) + 1); } else if (str != nullptr) { // No delimiter were found, it's safe to assume that the entire str was scanned. COMMON_INTERCEPTOR_READ_RANGE(ctx, str, REAL(strlen)(str) + 1); } |