This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add strtok_r implementation.
ClosedPublic

Authored by cgyurgyik on Aug 11 2020, 2:02 PM.

Diff Detail

Event Timeline

cgyurgyik created this revision.Aug 11 2020, 2:02 PM
cgyurgyik requested review of this revision.Aug 11 2020, 2:02 PM
sivachandra added inline comments.Aug 12 2020, 8:54 AM
libc/spec/posix.td
17

Other restricted pointer types are named like RestrictSigSetType. For consistency, we should use just one style everywhere.

libc/src/string/strtok_r.cpp
17

Would it make sense to implement strtok and strtok_r over a common function which is equivalent in functionality to strtok_r?

36

Can we operator on src instead of saveptr?

if (*src) {
  *src = '\0';
  ++src;
}
*saveptr = src;
cgyurgyik added inline comments.Aug 12 2020, 9:39 AM
libc/spec/posix.td
17

Are you saying at the file level, or the libc level?
For example, in .../spec/stdc.td, it is named in the following manner:
RestrictedPtrType CharRestrictedPtr = RestrictedPtrType<CharType>. Which would we prefer?

cgyurgyik marked an inline comment as not done.Aug 12 2020, 9:39 AM
sivachandra added inline comments.Aug 12 2020, 12:13 PM
libc/spec/posix.td
17

Keep it as in this patch then. We can cleanup separately.

cgyurgyik marked an inline comment as done.Aug 12 2020, 12:19 PM
cgyurgyik added inline comments.
libc/spec/posix.td
17

Ok I'll make a note for myself to fix this in a later revision. Currently seeing if there's a clean way to use a single function for both.

abrachet added inline comments.
libc/src/string/strtok_r.cpp
17

Making the prototype for this function restrict but not the function itself doesn't allow for optimizations. src and delimiter_string should be marked __restrict here and I imagine in a couple of other string functions too

cgyurgyik marked an inline comment as done.Aug 12 2020, 1:56 PM
cgyurgyik added inline comments.
libc/src/string/strtok_r.cpp
17

Yeah you're right, I haven't been using __restrict at all. I should've probably asked about this at an earlier point, wasn't sure if there was a reason to use it or not. Will fix this here, and other places in future revisions as well.

cgyurgyik added inline comments.Aug 12 2020, 3:30 PM
libc/src/string/strtok_r.cpp
17

I'm struggling to get a static inline function working for both a static char * and a char ** without running into lifetime issues with the saveptr. Was there something you had in mind? Maybe I'm missing something simple here.

cgyurgyik added inline comments.Aug 12 2020, 3:40 PM
libc/src/string/strtok_r.cpp
17

One way I could do this is to have the utility function return a struct {char *token, char *strtok_string}, and then set the strtok_string to the respective value, static char * for strtok, and dereferenced char** for strtok_r. Then we can return the struct.token.

abrachet added inline comments.Aug 12 2020, 4:48 PM
libc/src/string/strtok_r.cpp
17

strtok can just be

char *LLVM_LIBC_ENTRYPOINT(strtok)(char *src, const char *delimiter_string) {
  static char *strtok_str;
  return __llvm_libc::strtok_r(src, delimiter_string, &strtok_str);
}

There is no lifetime issue.

cgyurgyik added inline comments.Aug 12 2020, 5:44 PM
libc/src/string/strtok_r.cpp
17

Yep... that's the simple solution I avoided for much too long. I was trying to do the reverse. Will fix. Thanks abrachet :)

cgyurgyik updated this revision to Diff 285229.Aug 12 2020, 6:14 PM

Address siva's and abrachet's comments.

cgyurgyik marked 5 inline comments as done.Aug 12 2020, 6:15 PM
cgyurgyik added inline comments.
libc/src/string/strtok_r.cpp
17

Left a TODO for myself to address this in a separate revision, since this applies to other functions as well.

LGTM but not yet accepting as I asked for some extension to the tests.

libc/src/string/string_utils.h
40

static?

libc/src/string/strtok_r.cpp
17

SGTM. And sorry I missed these in previous patches.

libc/test/src/string/strtok_r_test.cpp
12

While having these differently named tests is fine, I am mildly concerned with the fact that when I first read them, the tests seems incomplete. For example, most tests are making only a single call to strtok_r. The results of subsequent calls to strtok_r are not tested. I suggest putting all of this in one test called Various with each of the tests you have in their own comment annotated scoped block. For example:

Test(StrTokRTest, Various) {
  { // Empty source string and delimiter string
    char empty[] = "";
    char *reserve = nullptr;
    ASSERT_STREQ(__llvm_libc::strtok_r(empty, "",  &reserve), nullptr);
    // Another call to ensure that a bad state was not written in to |reserve|.
    ASSERT_STREQ(__llvm_libc::strtok_r(empty, "",  &reserve), nullptr);
    ASSERT_STREQ(__llvm_libc::strtok_r(nullptr, "",  &reserve), nullptr);
  }
  ...
}

That said, putting everything in one test is my personal preference. You can choose to the keep the different test classes as you have currently, but include follow up calls to strtok_r as suggested in the above example.

cgyurgyik updated this revision to Diff 285335.Aug 13 2020, 5:05 AM

Address siva's comments.

cgyurgyik added inline comments.Aug 13 2020, 5:08 AM
libc/src/string/string_utils.h
40

Thanks :). This was me trying to debug some quirky lifetime issues with the not-so-elegant approach I was taking beforehand.

libc/test/src/string/strtok_r_test.cpp
12

Thanks. Done. I'm still personally a fan of trying to separate behaviors by test, but I tried to group some of the bigger entities here.

sivachandra accepted this revision.Aug 13 2020, 12:06 PM
This revision is now accepted and ready to land.Aug 13 2020, 12:06 PM
This revision was automatically updated to reflect the committed changes.