This is an archive of the discontinued LLVM Phabricator instance.

[libc] Add strtol, strtoul, and strtoull
ClosedPublic

Authored by michaelrj on Aug 12 2021, 2:38 PM.

Details

Summary

Updates the internal string conversion function so that it
uses the new Limits.h added in a previous commit for max and min values,
and has a templated type. This makes implementing the other strto*
functions very simple.

Diff Detail

Event Timeline

michaelrj created this revision.Aug 12 2021, 2:38 PM
michaelrj requested review of this revision.Aug 12 2021, 2:38 PM

Can you add implementation of strtol and strtoul and tests for them in this same patch? They will help us catch problems in the corner cases if any (one of which I have commented about inline).

libc/src/__support/str_conv_utils.h
78–81

I am not sure this is doing what you want. Lets take the example of T being the type int. I think you want NEGATIVE_MAX to be INT_MAX + 1. But, the above code is effectively:

unsigned long long const NEGATIVE_MASK = INT_MIN;

Which makes NEGATIVE_MASK a number much larger than INT_MAX + 1. This should fix it:

unsigned long long const NEGATIVE_MAX =
    (__llvm_libc::cpp::NumericLimits<T>::min() != 0)
        ? static_cast<unsigned long long>(__llvm_libc::cpp::NumericLimits<T>::max()) + 1
        : __llvm_libc::cpp::NumericLimits<T>::max();
119–127

We should prefer T(...) or static_cast<T>(...) over c-style casts. Also, use the ternary operator instead of an if-else statement?

michaelrj updated this revision to Diff 366365.Aug 13 2021, 3:36 PM
michaelrj marked 2 inline comments as done.

add strtol, strtoul, and strtoull, as well as address comments.

michaelrj retitled this revision from [libc] change internal strtoll to be templated to [libc] Add strtol, strtoul, and strtoull.Aug 13 2021, 3:36 PM
michaelrj edited the summary of this revision. (Show Details)
sivachandra added inline comments.Aug 16 2021, 3:21 PM
libc/src/__support/str_conv_utils.h
114–115

There is change in behavior from the earlier code. Even after detecting that the result is out of range we advance the pointer now, which was not the case previously because of the break statements. But I think this change is doing it right - it is reading out the full number even if it is out of range. So, may be add a test for it?

Also, can we reduce the loop body once overflow is detected? As in, before line 92, can we have something like:

if (result == ABS_MAX)
  continue;
129

When the sign is negative and result is ABS_MAX and the return type is a signed integer type, then I think we have to handle that specially. Consider this: T is long, result_sign is - and result is ABS_MAX which is effectively LONG_MAX + 1. Since LONG_MAX + 1 cannot be represent as a long type, the static_cast here becomes an implementation defined operation.

I think previously also it was implementation defined.

libc/test/src/stdlib/strtoll_test.cpp
179

Why did you change this to unsigned here and other similar places?

michaelrj updated this revision to Diff 366761.Aug 16 2021, 4:09 PM
michaelrj marked 2 inline comments as done.

fix the static cast, add a test for long numbers, and a shortcut for the loop.

libc/src/__support/str_conv_utils.h
114–115

Ah, yes the change in behavior was intentional, I noticed that the str_end value was different between my implementation and the existing strtoll. I have added a test to check that in all of the strto* functions. I also added the loop shortcut.

129

You're right, I've added an extra case and a comment explaining it.

libc/test/src/stdlib/strtoll_test.cpp
179

I changed it because the compiler was giving warnings for comparing int base with the unsigned digits (e.g. first_digit < base) and so I made it unsigned, and decided to make it consistent across all of the strto* tests. It doesn't affect anything.

sivachandra accepted this revision.Aug 16 2021, 10:04 PM
sivachandra added inline comments.
libc/src/__support/str_conv_utils.h
78

This can be constexpr.

94

Add a comment here explaining the early return; point out that we read out the full number even if it is out of range.

129

Can we may be handle the overflow case like this:

if (result == ABS_MAX) {
  if (result_sign == '+' || is_unsigned) 
     return __llvm_libc::cpp::NumericLimits<T>::max();
  else // T is signed and there is a negative overflow
     return __llvm_libc::cpp::NumericLimits<T>::min();
}

For better readability, before line 78 you can define the following and use it above and also on line 79.

constexpr bool is_unsigned = (__llvm_libc::cpp::NumericLimits<T>::min() == 0);

Along the same lines, you can probably also define:

const bool is_positive = (result_sign == '+');
libc/src/stdlib/CMakeLists.txt
93

Fix here and in many other files.

This revision is now accepted and ready to land.Aug 16 2021, 10:04 PM
michaelrj marked 4 inline comments as done.

cleaned up over/underflow handling. Fixed an edge case with passing LLONG_MAX followed by other numbers to strtoll.

libc/src/__support/str_conv_utils.h
94

I did that, in addition I added setting errno, since it's possible to hit ABS_MAX exactly, and then have more numbers. With the code as it was previously that would not set ERANGE in that case. I've added a test to strtoll_test.cpp to handle that.

sivachandra accepted this revision.Aug 17 2021, 11:46 AM
sivachandra added inline comments.
libc/src/__support/str_conv_utils.h
94

Good catch! You can probably move setting of errno from here (and below) to inside the conditional at line 120.

127

You don't need parenthesis around is_positive.

michaelrj marked 2 inline comments as done.

small tweaks, respond to comments.

libc/src/__support/str_conv_utils.h
94

Thank you, and we can't do that because that would set errno if the input is exactly the maximum as well as if it's greater than the maximum.

sivachandra added inline comments.Aug 17 2021, 12:36 PM
libc/src/__support/str_conv_utils.h
94

Ah, ok!

This revision was automatically updated to reflect the committed changes.
michaelrj marked an inline comment as done.