This is an archive of the discontinued LLVM Phabricator instance.

[libc] Better IntegerToString API
ClosedPublic

Authored by gchatelet on Aug 3 2023, 2:55 AM.

Details

Summary

This patch is an alternative to D155902. It provides the following benefits:

  • No buffer manual allocation and error handling for the general case
  • More flexible API : width specifier, sign and prefix handling
  • Simpler code

The more flexible API removes the need for manually tweaking the buffer afterwards, and so prevents relying on implementation details of IntegerToString.

Diff Detail

Event Timeline

gchatelet created this revision.Aug 3 2023, 2:55 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 3 2023, 2:55 AM
gchatelet requested review of this revision.Aug 3 2023, 2:55 AM

I will submit the following part of this patch individually:

  • support for make_signed
  • construction of string from string_view
jhuber6 added inline comments.Aug 3 2023, 6:54 AM
libc/src/__support/integer_to_string.h
113

I'm not a fan of the name, since this isn't a string it's a config for the radix. Could we do something like radix::Decimal, etc?

gchatelet added inline comments.Aug 3 2023, 7:58 AM
libc/src/__support/integer_to_string.h
113

The intent was that it could be read naturally : IntegerTo<DecString::WithPrefix> -> Integer to dec string with prefix.
Now I'm fine with another approach.

I'd like to gather a few other opinions / suggestions before changing though.

gchatelet updated this revision to Diff 547135.Aug 4 2023, 1:44 AM
  • Fix documentation
  • Fix missing dependency in bazel build
gchatelet updated this revision to Diff 547819.Aug 7 2023, 9:25 AM
  • Fix documentation
  • Fix missing dependency in bazel build
  • Revert name to IntegerToString, move radix to namespace, extract buffer handling class, default to decimal conversion.
gchatelet updated this revision to Diff 547822.Aug 7 2023, 9:34 AM
  • Remove TODO, the compiler optimizes out string formatting and allocation because its on the stack
gchatelet marked an inline comment as done.Aug 7 2023, 9:37 AM

@jhuber6 is it better like this?
@michaelrj @lntue any suggestions?

Just a few nits, I like it overall but will defer accepting to the standard libc contributors.

libc/src/__support/FPUtil/fpbits_str.h
51–53
libc/src/__support/StringUtil/signal_to_string.cpp
28–29

Do we need a helper for what is already a one-liner? I think it should fit without needing to be split to the next line.

libc/src/__support/integer_to_string.h
113

I just feel like it's clearer if you have a function called IntegerToString which takes a radix config by template, e.g. IntegerToString<radix::Decimal>. It's more words but I just find it a little weird to

206–208

Unrelated, but this really should probably go in some common libc header since there's another use in rpc_client.h.

Overall LGTM, I like the idea of the buffer being provided by the class and therefor guaranteed to be the right size. I left a few comments, but I think this is the right design moving forward.

libc/src/__support/integer_to_string.h
91

why limit this to only these bases? Without the asserts this class would support any base from 2-36. Additionally, that would mirror how the string to integer functions work.

210

why are signs only supported in base 10?

303

is there handling for value being the lowest negative value for a signed type (e.g. LONG_MIN)? In that case -value is equivalent to value.

gchatelet marked 7 inline comments as done.Aug 8 2023, 4:55 AM
gchatelet added inline comments.
libc/src/__support/StringUtil/signal_to_string.cpp
28–29

I agree it's not useful anymore.

libc/src/__support/integer_to_string.h
91

Although the API would absolutely work with other bases I couldn't come up with legitimate uses so I decided to restrict them.
I will revert to the previous state and provide a way to access other bases.

113

I just feel like it's clearer if you have a function called IntegerToString which takes a radix config by template, e.g. IntegerToString<radix::Decimal>. It's more words but I just find it a little weird to

IntegerToString<radix::XXX> it is then.

206–208

Unrelated, but this really should probably go in some common libc header since there's another use in rpc_client.h.

We definitely need it. I plan to provide it in libc/src/__support/CPP/algorithms.h.

210

This is consistent with :

  • printf/scanf specification
  • hex/binary calculator

It is technically doable but I don't see a use case for it. We can always lift this restriction afterwards if it proves useful.

303

is there handling for value being the lowest negative value for a signed type (e.g. LONG_MIN)? In that case -value is equivalent to value.

Yes, this is covered by the tests.

EXPECT(type, INT8_MIN, "-128");
EXPECT(type, INT16_MIN, "-32768");
EXPECT(type, INT32_MIN, "-2147483648");
EXPECT(type, INT64_MIN, "-9223372036854775808");

C++17 does not enforce a single representation for signed integers but C++20 will enforce two's complement as the only valid representation of signed integer : "However, all C++ compilers use two's complement representation, and as of C++20, it is the only representation allowed by the standard".

Also "The builtin unary minus operator calculates the negative of its promoted operand"

In the case of int8_t, -INT8_MIN will actually perform -static_cast<int>(INT8_MIN) which will end up being int(128) then this value is converted back to uint8_t.
Then the standard says "if the target type can represent the value, the value is unchanged" this is the case here, 128 is legal in uint8_t so this will be the value passed down to write_number.

The same reasoning works for int16_t, int32_t and int64_t except that type promotion does not always kick in. In that case I believe that the second rule applies. It basically states that modulo two arithmetic applies for the conversion from signed to unsigned. Practically speaking for int64_t this is equivalent to std::bit_cast<uint64_t>(INT64_MIN) leaving the bit pattern unchanged.

INT64_MIN interpreted as int64_t

hex : 8000 0000 0000 0000
dec : -9 223 372 036 854 775 808

interpreted as uint64_t

hex : 8000 0000 0000 0000
dec : 9 223 372 036 854 775 808
gchatelet updated this revision to Diff 548157.Aug 8 2023, 4:56 AM
gchatelet marked 6 inline comments as done.
  • Fix documentation
  • Fix missing dependency in bazel build
  • Revert name to IntegerToString, move radix to namespace, extract buffer handling class, default to decimal conversion.
  • Remove TODO, the compiler optimizes out string formatting and allocation because its on the stack
  • Address comments
gchatelet added inline comments.Aug 8 2023, 6:23 AM
libc/src/__support/integer_to_string.h
303

Tests do fail in certain situations so there's probably something fishy. I'm investigating.

gchatelet added inline comments.Aug 8 2023, 7:02 AM
libc/src/__support/integer_to_string.h
303

So according to -fsanitize=signed-integer-overflow it is not generally safe to negate a signed number
"Signed integer overflow, where the result of a signed integer computation cannot be represented in its type."

This stackoverflow seems to confirm it.

By special casing the min value we can still have nicely optimized code though on all platforms I've tested (x86, aarch64, rv64)
https://godbolt.org/z/zjcETfP1v

I'll fix the code. Thx for raising the question @michaelrj !

gchatelet updated this revision to Diff 548223.Aug 8 2023, 7:52 AM
  • Fix signed negate overflow
  • Fix missing + for 0 WithSign
gchatelet updated this revision to Diff 548241.Aug 8 2023, 8:23 AM
  • Fix documentation
  • Fix missing dependency in bazel build
  • Revert name to IntegerToString, move radix to namespace, extract buffer handling class, default to decimal conversion.
  • Remove TODO, the compiler optimizes out string formatting and allocation because its on the stack
  • Address comments
  • Fix signed negate overflow
  • rebase
gchatelet updated this revision to Diff 548243.Aug 8 2023, 8:40 AM
  • Inline buffer_size in error_to_string.cpp

I've tested it on ARM, aarch64, x86 and riscv. It seems to be good to go.
LGTM anyone?

jhuber6 accepted this revision.Aug 8 2023, 9:06 AM

LGTM

This revision is now accepted and ready to land.Aug 8 2023, 9:06 AM
michaelrj accepted this revision.Aug 8 2023, 10:26 AM

LGTM, thank you for doing this

This revision was landed with ongoing or failed builds.Aug 8 2023, 1:39 PM
This revision was automatically updated to reflect the committed changes.
gchatelet reopened this revision.Aug 8 2023, 1:55 PM

The libc-aarch64-ubuntu-fullbuild-dbg buildbot failed in LlvmLibcPThreadattrTest.SetAndGetStackSize and LlvmLibcPThreadattrTest.SetAndGetStack
https://lab.llvm.org/buildbot/#/builders/223/builds/25141/steps/8/logs/stdio

I've reverted and will reland once fixed.

This revision is now accepted and ready to land.Aug 8 2023, 1:55 PM