This is an archive of the discontinued LLVM Phabricator instance.

Moar string interceptors: strstr, strcasestr, strcspn, strpbrk
AbandonedPublic

Authored by m.guseva on Oct 31 2014, 7:47 AM.

Details

Summary

This is a patch for Issue 346: moar string interceptors: strstr, strcasestr, strcspn, strpbrk

New interceptors check all bytes in second argument but in first argument check range is based on return value. Some of existing interceptors also uses this approach (like asan strchr) while other just check the full length of s1(like strstr in tsan). I prefered the first approach as more accurate. What do you find more convinient for these interceptors?

To solve overlap of asan-specific replace_str flag and new common replace_strstr/replace_strcspn flags I've chosen more priority for replace_str flag: when it is false all str interceptors(including new ones) are disabled.

Diff Detail

Event Timeline

m.guseva updated this revision to Diff 15621.Oct 31 2014, 7:47 AM
m.guseva retitled this revision from to Moar string interceptors: strstr, strcasestr, strcspn, strpbrk.
m.guseva updated this object.
m.guseva edited the test plan for this revision. (Show Details)
m.guseva updated this object.Oct 31 2014, 8:12 AM
m.guseva added reviewers: glider, kcc, samsonov.
m.guseva added subscribers: ygribov, Unknown Object (MLST).
glider edited edge metadata.Nov 12 2014, 4:42 AM

Sorry, looks like this has slipped through the cracks. I'll take a look now.

Can you please upload the whole context using `svn diff
--diff-cmd=diff -x -U999999` next time?

There are several places where the shadow checks are too aggressive compared to what the glibc implementations of these functions do.
Because the standards are unlikely to define which optimizations are applicable to each function, I think we'd better be more conservative here.

lib/sanitizer_common/sanitizer_common_interceptors.inc
260

If haystack is shorter than the needle, there's no point in reading the whole contents of needle.
The generic glibc implementation of strstr() indeed makes this optimization (not sure about the SSE-based ones).

302

Only in the case when s1 is not an empty string, otherwise there's no point in reading s2.

312

I never realized all these functions crash if we pass NULL as one of the arguments.
Fortunately we don't need to handle that.

320

This is a bit pessimistic. For the following call:

strpbrk("foobar", "fb")

only the first byte of |s2| will be read.

lib/sanitizer_common/sanitizer_flags.cc
161 ↗(On Diff #15621)

It's not clear why replace_strcspn controls the behavior of strpbrk()
I suggest to either rename the flag to "replace_strcspn_strpbrk" or introduce the "replace_strpbrk" flag.
My dream is to ultimately have a dynamically generated "replace_xyz" flag for almost every interceptor, so the second option is better IMO.

Because the standards are unlikely to define which optimizations are applicable to each function,
I think we'd better be more conservative here.

That's questionable. Some users (e.g. me) would very much prefer to get a warning even about potential errors. Can we have a flag for this like we already have for strict_memcmp and alloc_dealloc_mismatch?

joerg added a subscriber: joerg.Nov 12 2014, 12:23 PM

The interceptors should enforce the contract. For all string functions without explicit size argument, this means that the input is a valid nul-terminated string. Different implementations take different optimisation short cuts, but depending on such implementation details is part of what the interceptors should expose.

The interceptors should enforce the contract.

I believe sanitizer folks are in unfortunate situation where they have to deal with tons of legacy crap which relies heavily on glibc quirks (there's a lot of code like this, some in binary form with no ability to recompile). So I can feel their pain. On the other hand we shouldn't force users which deal with saner codebases (me or Joerg) to use overly conservative tool.

kcc edited edge metadata.Nov 12 2014, 1:20 PM

As long as the aggressive behavior can be disabled by a flag (similar to what we have to do with strict_memcmp, or just by disabling the whole interceptor logic) I am ok with this.

As long as the aggressive behavior can be disabled by a flag ... I am ok with this.

I guess we can do this for already existing interceptors as well? What setting should be default?

kcc added a comment.Nov 12 2014, 2:06 PM
In D6056#13, @ygribov wrote:

As long as the aggressive behavior can be disabled by a flag ... I am ok with this.

I guess we can do this for already existing interceptors as well?

Well, if you want to ...

What setting should be default?

I am fine with the most aggressive default

m.guseva updated this revision to Diff 17561.Dec 22 2014, 7:32 AM
m.guseva edited edge metadata.

Hi all
I've uploaded another patch for new string interceptors. As it was discussed in comments above I've introduced new flag "strict_str" in order to control aggressive checks for strings arguments. So I've also updated all existing interceptors with "const char*" arguments and added strict checks wherever they were missed.

kcc added a comment.Dec 22 2014, 1:00 PM

I like the whole thing, the more interceptors the better.
Statistically, such large changes in interceptors are risky, so please don't commit before the end of holidays in RU, even if you get LGTM before that.
Please also add a documentation blob to the commit message so that we can later copy-paste it into
https://code.google.com/p/address-sanitizer/wiki/Flags

glider, please apply this patch and try it on chrome.

lib/asan/asan_rtl.cc
130 ↗(On Diff #17561)

Is this necessary?
Such things are sometimes unavoidable, but they complicate the code, the understanding and the documentation.
I'd rather prefer users to add more flags...

test/asan/TestCases/atoi_strict.c
4

also, add a 3-rd run with no flag so that we test the default behaviour.
same for other tests.

m.guseva added inline comments.Dec 23 2014, 12:26 AM
lib/asan/asan_rtl.cc
130 ↗(On Diff #17561)

If we remove it there will be still some uncertainty in documentation. From user's point of view it's not clear which interceptors replace_str controls and which not. I supposed it should control all of them and overwrite other replace_str* flags in non default case. Otherwise I think it must be specified that replace_str controls only some subset of string interceptiors while for rest there are custom flags. Shouldn't we to fix the replace_str description for that case?

m.guseva updated this revision to Diff 17587.Dec 23 2014, 1:23 AM
m.guseva added inline comments.
test/asan/TestCases/atoi_strict.c
4

Added in Diff 17587.

glider added inline comments.Dec 24 2014, 9:36 AM
lib/msan/msan_interceptors.cc
96

Am I right that you're only using this macro with n=0?

lib/sanitizer_common/sanitizer_common_interceptors.inc
328

Please put a space before the colon.

2431

I suggest putting the UNUSED function under the #ifdef below and remove the UNUSED attribute.

2457

You don't need IsValidStrtolBase anymore:

bool is_valid_base = (base == 0) || (2 <= base && base <= 36);
if (is_valid_base) {
  ...
}
COMMON_INTERCEPTOR_READ_STRING(ctx, nptr, is_valid_base ? ...)
4796

Chrome's net_unittests crash for me with the following stacktrace:

Program received signal SIGSEGV, Segmentation fault.
__sanitizer::internal_strlen (s=s@entry=0x0)
    at /usr/local/google/ssd/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_libc.cc:155
155	  while (s[i]) i++;
(gdb) bt
#0  __sanitizer::internal_strlen (s=s@entry=0x0)
    at /usr/local/google/ssd/llvm/projects/compiler-rt/lib/sanitizer_common/sanitizer_libc.cc:155
#1  0x000000000059f080 in __interceptor_dlopen (filename=filename@entry=0x0, 
    flag=flag@entry=1)
    at /usr/local/google/ssd/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:4759
#2  0x00007ffff5167ec2 in pr_FindSymbolInProg (
    name=0x7ffff517f29f "nspr_use_zone_allocator") at prmem.c:98
#3  0x00007ffff5167fad in _PR_InitZones () at prmem.c:154
...

We'll need to add yet another dlopen() test once we figure out what's wrong.

lib/sanitizer_common/sanitizer_flags.cc
167 ↗(On Diff #17587)

I think we need to be consistent in naming the flags that disable interceptors.
There's already "intercept_tls_get_addr", so we either need to change it (no need to do that in this CL) or rename your flags into "intercept_something".
(Note that "intrin" in "replace_intrin" isn't a function name, so you don't need to be consistent with that).

173 ↗(On Diff #17587)

I think "strict_string_checks" is more informative, but that's bikeshedding.

lib/tsan/rtl/tsan_interceptors.cc
245

nit: spare whitespace before ')'

test/asan/TestCases/atoi_strict.c
15

Can you please add a space after the 'CHECK:' label here and below?

test/asan/TestCases/strstr-1.c
14

Note that you can use #ifdef directives and different check prefixes to combine related test cases into one file.

test/asan/Unit/lit.site.cfg.in
33

Why do you need this line?

test/sanitizer_common/TestCases/strcasestr.c
4

Why? Is that because strcasestr() is unavailable on Windows?

ygribov added inline comments.Dec 24 2014, 12:15 PM
lib/msan/msan_interceptors.cc
96

I think Maria wanted added n to anticipate future work on conservative length checking.

lib/sanitizer_common/sanitizer_common_interceptors.inc
4796

Looks like COMMON_INTERCEPTOR_READ_STRING needs a check for NULL.

test/asan/Unit/lit.site.cfg.in
33

AFAIR unit tests are not strict_str-safe.

glider added inline comments.Dec 25 2014, 1:31 AM
test/asan/Unit/lit.site.cfg.in
33

Is that something we may want to fix?
If so, please file a bug and add a TODO

m.guseva added inline comments.Dec 25 2014, 7:44 AM
lib/msan/msan_interceptors.cc
96

ygribov
I think Maria wanted added n to anticipate future work on conservative length checking.

Exactly.

lib/sanitizer_common/sanitizer_common_interceptors.inc
2431

These functions were moved from asan_interceptors.cc source where they are used in strtol-related interceptors. Some of them(strtol\atoi\atol) are not under any if directive. Is there any better place where to put this helpers to be shared by asan and common interceptors?

4796

Here is my bug. It needs check for NULL before COMMON_INTERCEPTOR_READ_STRING. Cause dlopen accepts null filename pointer as a normal argument and then there is nothing to check here. In other interceptors (e.g strsmthing) NULL pointer arg is an error case. So the program should fail with segfault. And it will in internal_strlen. I'll recheck other modified interceptors for such cases.

lib/sanitizer_common/sanitizer_flags.cc
167 ↗(On Diff #17587)

The "replace_" flags were suggested in original issue. I agree it should be aligned with existing one.

test/asan/Unit/lit.site.cfg.in
33

ygribov
AFAIR unit tests are not strict_str-safe.

Right. I saw fails caused by checks relied on standard optimizations in string functions. And I didn't want to remove these tests at all cause they seems meaningful.

glider added inline comments.Dec 26 2014, 12:54 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
2431

Oh, sorry, I've missed that.
It's fine to keep these functions as is (I still think IsValidStrtolBase is unnecessary and can be removed)

lib/sanitizer_common/sanitizer_flags.cc
167 ↗(On Diff #17587)

Let's see what Kostya and Alexey think.

kcc added a comment.Jan 15 2015, 2:38 PM

Sorry for the delay in the review. Frankly, I've lost the track of who is waiting for whose reply.
Large changes are hard to review, I wonder If you could split it into a few smaller ones?

I thought Maria was going to upload a new revision.

m.guseva updated this revision to Diff 18524.Jan 21 2015, 9:09 AM

Hi all
I've prepared new patch with fixes suggested in glider's review. I've also done merge with recent trunk. Please take a look.

Regarding splitting the patch:
indeed there two main changes in this patch:

  1. new strict_string_checks run-time flag introduced and applied for existing common, asan, msan and tsan interceptors. New asan tests added for this flag.
  2. new interceptors added as it was suggested in original request(strstr, strcasestr, strcspn, strpbrk) with strict_string_checks flag supported. New common and asan tests added for them.

If it wil be easier I can split patch in these two correspondingly. If so should I attach both to this revision? Or create new one?

m.guseva added inline comments.Jan 21 2015, 9:18 AM
test/sanitizer_common/TestCases/strcasestr.c
4

Yes. I've referenced printf-*.c tests as printf is not intercepted on Windows. Althouth there is a difference because strcasestr is unavailable at all on Windows so it won't even compile. What do you think is more correct way to exclude it?

kcc added a comment.Jan 21 2015, 12:43 PM

Yes, please split it.
If there is a good logical split point -- always try to do that.
Code reviews often have O(N^2) complexity, you don't want N to be too large.

kcc added a comment.Jan 21 2015, 12:43 PM

(Just create two fresh patches and abandon this one)

Submitted first separate patch: New "strict_string_checks" run-time flag http://reviews.llvm.org/D7123
The second one(new interceptors) depends on this so I'll submit it when the first review passed.

m.guseva abandoned this revision.Jan 27 2015, 12:00 AM

Ok, now I am abandoning this revision - let's move all discussions to new splitted patches. So I'm looking for your feedback in http://reviews.llvm.org/D7123 regarding strict_string_checks flag.