This is an archive of the discontinued LLVM Phabricator instance.

Add new interceptor for strtonum(3)
ClosedPublic

Authored by krytarowski on Nov 14 2018, 7:53 AM.

Details

Summary

strtonum(3) reliably convertss string value to an integer.
This function is used in OpenBSD compat namespace
and is located inside NetBSD's libc.

Add a dedicated test for this interface.

It's a reworked version of the original code by Yang Zheng.

Diff Detail

Repository
rL LLVM

Event Timeline

krytarowski created this revision.Nov 14 2018, 7:53 AM
krytarowski edited the summary of this revision. (Show Details)Nov 15 2018, 7:14 PM
krytarowski added a subscriber: tomsun.0.7.
vitalybuka added inline comments.Nov 19 2018, 1:16 PM
test/sanitizer_common/TestCases/NetBSD/strtonum.cc
13

what is going to happen with following?
strtonum("100 long suffix....", 0, 1000, &errstr)
if this stops parsing just after 100 then COMMON_INTERCEPTOR_READ_RANGE(ctx, nptr, REAL(strlen)(nptr) + 1) is performance bottleneck
maybe we should get actual read size with StrtolFixAndCheck

we had issues with other strto* with strict_string_checks=1 on parsers like python
it just calls such methods in the middle of a large file and you get O(N) -> O(N^2)

vitalybuka added inline comments.Nov 19 2018, 1:17 PM
test/sanitizer_common/TestCases/NetBSD/strtonum.cc
13

maybe we should get actual read size with StrtolFixAndCheck

correction: get actual read size with different strto* call and pass it into StrtolFixAndCheck

vitalybuka requested changes to this revision.Nov 29 2018, 1:59 PM
This revision now requires changes to proceed.Nov 29 2018, 1:59 PM
krytarowski edited the summary of this revision. (Show Details)
  • rework the strtonum(3) interceptor according to comments from review
krytarowski marked an inline comment as done.Dec 5 2018, 11:40 PM
krytarowski added inline comments.
test/sanitizer_common/TestCases/NetBSD/strtonum.cc
13

strtonum("100 long suffix....", 0, 1000, &errstr) it will stop parsin after 100 on ' '.

src.illumos.org/source/xref/openbsd-src/lib/libc/stdlib/strtonum.c

krytarowski marked an inline comment as done.Dec 6 2018, 8:16 AM
krytarowski added inline comments.
lib/sanitizer_common/sanitizer_common_interceptors.inc
7268

I had to define on top these variables, as compiler due to goto was complaining that the code is invalid otherwise.

If this patch looks fine, I will follow up with this approach in strtoi(3)/strtou(3) https://reviews.llvm.org/D54702

vitalybuka added inline comments.Dec 6 2018, 12:28 PM
test/sanitizer_common/TestCases/NetBSD/strtonum.cc
13

According implementation form src.illumos.org/source/xref/openbsd-src/lib/libc/stdlib/strtonum.c
it uses strtoll
So if we intercept strtoll, we just need to unpoison errstrp, it it was in existing implementation. What is the problem this patch is solving?

krytarowski marked an inline comment as done.Dec 6 2018, 12:33 PM
krytarowski added inline comments.
test/sanitizer_common/TestCases/NetBSD/strtonum.cc
13

Do you mean what the interceptor code is solving? Reading input string only as much as the strtonum(3) function and unpoisioning errstr.

vitalybuka added inline comments.Dec 6 2018, 12:43 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
7266–7287
INTERCEPTOR(long long, strtonum, const char *nptr, long long minval,
            long long maxval, const char **errstr) {
  void *ctx;
  COMMON_INTERCEPTOR_ENTER(ctx, strtonum, nptr, minval, maxval, errstr);
  
  char *real_endptr;
  long long ret = REAL(strtoll)(nptr, &real_endptr, 10);
  StrtolFixAndCheck(ctx, nptr, nullptr, real_endptr, 10);

  ret = REAL(strtonum)(nptr, minval, maxval, errstr);
  if (errstr) {
    COMMON_INTERCEPTOR_WRITE_RANGE(ctx, errstr, sizeof(const char *));
     if (*errstr)
      COMMON_INTERCEPTOR_WRITE_RANGE(ctx, *errstr, REAL(strlen)(*errstr) + 1);
  }
  return ret;
}
vitalybuka added inline comments.Dec 6 2018, 12:46 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
7266–7267

Patch says that base already has SANITIZER_INTERCEPT_STRTONUM
but master has non of it

vitalybuka added inline comments.Dec 6 2018, 12:58 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
7266–7267

Phabricator does something weird:
Base vs Diff 1 : it's a new interceptor
Base vs Diff 2 : it's modified interceptor

I guess I was confused by this, I assumed that you are changing interceptor, not adding new one.
So in Diff 1 I don't like REAL(strlen). So I proposed to use something to get actual read len and use StrtolFixAndCheck.

I think it's better to just to call parsing twice (line in snippets I've sent), instead of exposing error handling into interceptor, like in Diff 2.

However I am not sure what is going to happen on BSD when we call REAL(strtonum). Is this going to hit strtoll interceptor, which is unfortunately not in common yet?

Hmm, I will try to reupload diff. It's a new interceptor.

  • eliminate goto
  • fix diff
krytarowski marked an inline comment as done.Dec 6 2018, 7:39 PM
krytarowski added inline comments.
lib/sanitizer_common/sanitizer_common_interceptors.inc
7266–7287

Hmm.. this is evaluating the input string twice... but performance impact is probably marginal, so I will switch to it.

krytarowski edited the summary of this revision. (Show Details)
  • add a new version
vitalybuka accepted this revision.Dec 7 2018, 1:15 PM
vitalybuka added inline comments.
lib/sanitizer_common/sanitizer_common_interceptors.inc
7275

why not strtoll, it returns the same type as interceptor and it's already used in strtonum implementation?

test/sanitizer_common/TestCases/NetBSD/strtonum.cc
45

asserts?

This revision is now accepted and ready to land.Dec 7 2018, 1:15 PM
krytarowski marked 2 inline comments as done.Dec 7 2018, 1:19 PM
krytarowski added inline comments.
lib/sanitizer_common/sanitizer_common_interceptors.inc
7275

It doesn't build, as strtoll isn't a valid interceptor for all sanitizers.

test/sanitizer_common/TestCases/NetBSD/strtonum.cc
45

I will switch to asserts.

vitalybuka added inline comments.Dec 7 2018, 1:49 PM
lib/sanitizer_common/sanitizer_common_interceptors.inc
7275

Ah right, to call REAL we need interceptor or some workaround.
LGTM as is, but we should move these from asan/msan to common at some point

This revision was automatically updated to reflect the committed changes.