This is an archive of the discontinued LLVM Phabricator instance.

More string interceptors: strstr, strcasestr, strspn, strcspn, strpbrk
ClosedPublic

Authored by m.guseva on Apr 14 2015, 8:00 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

m.guseva updated this revision to Diff 23733.Apr 14 2015, 8:00 AM
m.guseva retitled this revision from to More string interceptors: strstr, strcasestr, strcspn, strpbrk.
m.guseva updated this object.
m.guseva edited the test plan for this revision. (Show Details)
m.guseva added reviewers: samsonov, glider, kcc, dvyukov.
m.guseva added subscribers: Unknown Object (MLST), kubamracek, joerg, ygribov.
dvyukov added inline comments.Apr 14 2015, 9:45 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
269 ↗(On Diff #23733)

delete empty line (two consequent empty lines)

271 ↗(On Diff #23733)

delete empty line (that's how it is written everywhere in this file)

277 ↗(On Diff #23733)

Is "r - s1 + len2 + 1" correct?
I think it should be "r - s1 + len2", because strstr can read only len2 bytes to find a match.

279 ↗(On Diff #23733)

I am not sure such complex computations are necessary. People should not rely on such things.
If anything, I think, the fifth argument should be 1, because if I call strstr("aaaaaaaaaaaa", "bcdef") I can "expect" that the function will read only the first symbol "b" from the second string.

283 ↗(On Diff #23733)

add parens around COMMON_INTERCEPTOR_NOTHING_IS_INITIALIZED

288 ↗(On Diff #23733)

drop {}

294 ↗(On Diff #23733)

#define INIT_STRSTR \

COMMON_INTERCEPT_FUNCTION(strstr); \
COMMON_INTERCEPT_FUNCTION(strcasestr);

and move below

306 ↗(On Diff #23733)

remove

310 ↗(On Diff #23733)

remove

312 ↗(On Diff #23733)

delete empty line

315 ↗(On Diff #23733)

delete empty line

322 ↗(On Diff #23733)

I don't believe this is necessary. Make it just:
READ_RANGE(s2, REAL(strlen)(s2) + 1);

328 ↗(On Diff #23733)

move below (see how it is formatted in other case in this file)

337 ↗(On Diff #23733)

I don't believe this complex code is necessary. Make it just:
READ_RANGE(s2, REAL(strlen)(s2) + 1);

4945 ↗(On Diff #23733)

delete

lib/sanitizer_common/sanitizer_flags.inc
156 ↗(On Diff #23733)

do we need all these additional flags? why?

lib/sanitizer_common/sanitizer_platform_interceptors.h
58 ↗(On Diff #23733)

this is unused, delete

60 ↗(On Diff #23733)

delete

m.guseva added inline comments.Apr 15 2015, 9:21 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
279 ↗(On Diff #23733)

Do you mean the following: (ctx, s2, len2, r ? len2 + 1 : (len2 <= len1 ? 1 : len1 + 1)); ?
In that case we may not check real read of bigger length from s2 in case there is some not null intersection of s1 and s2.
The most accurate result would be if repeat explicitly strstr computations and find that real length. But I've just added non-strict check assuming the worst case.

322 ↗(On Diff #23733)

Why not?
Indeed we don't need to read the full length of s2. User may expect it in non-strict case.

337 ↗(On Diff #23733)

The same - please explain. My intention was to avoid strict checks when possible.

lib/sanitizer_common/sanitizer_flags.inc
156 ↗(On Diff #23733)

It was discussed in initial patch for this http://reviews.llvm.org/D6056#92330: glider suggested to have individual flags. I agree, it's more flexible.

lib/sanitizer_common/sanitizer_platform_interceptors.h
58 ↗(On Diff #23733)

It's my bug, SANITIZER_INTERCEPT_STRCASESTR must be used in sanitizer_common_interceptors.inc cause indeed there is no strcasestr on Windows. I'll fix it.

The rest is always 1 so I agree it could be removed. I wonder why there are other defines allways equal to 1 in sanitizer_platform_interceptors.h?

m.guseva updated this object.Apr 15 2015, 9:23 AM
dvyukov added inline comments.Apr 20 2015, 7:02 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
279 ↗(On Diff #23733)

The worst case is strlen(s2)+1. An implementation is allowed to read that much.
The non-strict mode should consider the _best_ case if anything. But I would just delete this complexity and use strlen.

322 ↗(On Diff #23733)

Because these are very complex expressions with lots of corner cases and potential off-by-ones.
User should not expect that the function won't read the whole string. There are not such guarantees in the documentation. Documentation refers to arguments as to strings. Strings are addressable zero-terminated regions of memory.

lib/sanitizer_common/sanitizer_flags.inc
156 ↗(On Diff #23733)

Okay
I am not very strong about this

lib/sanitizer_common/sanitizer_platform_interceptors.h
58 ↗(On Diff #23733)

I mean delete them because they are unused, not because they are always 1.
I guess we have always 1 defines for flexibility and consistency.

m.guseva added inline comments.Apr 21 2015, 8:29 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
279 ↗(On Diff #23733)

The non-strict mode should consider the _best_ case if anything.

I understand the non-strict mode as ability to check less then full string length when it's possible. It shouldn't rely on best case, it should check at least as much as needed for getting the result. I think it's better to check more bytes than to miss incorrect read. If proposed read check is inaccurate I agree it's better to replace with full length check.

322 ↗(On Diff #23733)

User should not expect that the function won't read the whole string. There are not such guarantees in the documentation.

The "strict_string_checks" flag was introduced exactly cause user may expect such behaviour despite of documentation. Which are corner cases here? Can I just cover them?

ygribov added inline comments.Apr 21 2015, 10:51 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
322 ↗(On Diff #23733)

I guess Dmitry means that there is a tradeoff between functionality and code complexity. And he'd prefer to drop overly complex checks in favor of simplicity.

dvyukov added inline comments.Apr 22 2015, 12:38 AM
lib/sanitizer_common/sanitizer_common_interceptors.inc
322 ↗(On Diff #23733)

Yes, what Yury said. "r ? internal_strchr(s2, *r) - s2 + 1 : *s1 ? REAL(strlen)(s2) + 1 : 0" is complex without providing any obvious benefit.

If anything, non-strict mode check at _most_ (rather than at least) as much as needed for getting the result. But what is required to get the result? It depends on the implementation of the function in libc.
For example, I, as a user, can argue than strstr("aaaaaaaaaaaa", "bcdef") must read no more than 1 byte from the second string. While your non-strict implementation checks more, thus will cause a crash which looks non-legitimate to me.
The non-strict checks were introduced based on real cases that we observed and were unable to fix. Adding them proactively looks wrong to me. If a user hits such a case, then the first thing he should try is to fix the code (formally it is illegal). This should cover majority of the cases. If fixing the code is not possible, then he can use the flag to disable the interceptor. Meanwhile we can figure out whether we want to relax checking in the interceptor or not.

m.guseva updated this revision to Diff 26522.May 26 2015, 10:46 AM

Hi,

I've uploaded new patch with fixes suggested by Dmitry. Could you please take a look? Thank you.

dvyukov accepted this revision.May 26 2015, 11:12 AM
dvyukov edited edge metadata.

thanks, the code is now much easier to understand
LGTM

lib/sanitizer_common/sanitizer_common_interceptors.inc
276 ↗(On Diff #26522)

remove whitespace after s2

318 ↗(On Diff #26522)

please add strspn interceptor for completeness

This revision is now accepted and ready to land.May 26 2015, 11:12 AM
m.guseva updated this revision to Diff 26589.May 27 2015, 6:16 AM
m.guseva retitled this revision from More string interceptors: strstr, strcasestr, strcspn, strpbrk to More string interceptors: strstr, strcasestr, strspn, strcspn, strpbrk.
m.guseva edited edge metadata.

Thank you. I've uploaded new patch with strspn interceptors added. I moved both strspn and strcspn under the same flag intercept_strspn and one macro SANITIZER_INTERCEPT_STRSPN.

I'll submit tomorrow (if no objections from Dima) so that we are around to cope with bugs.

This revision was automatically updated to reflect the committed changes.