This is an archive of the discontinued LLVM Phabricator instance.

Do not use non-POSIX strtoll_l and strtoull_l
AbandonedPublic

Authored by georgthegreat on Mar 26 2022, 12:40 PM.

Details

Reviewers
ldionne
EricWF
Group Reviewers
Restricted Project
Summary

Do not use non-POSIX strtoll_l and strtoull_l

strtoll_l() and strtoull_l() are not specified in POSIX and e. g. musl libc
does not provide these methods.

The only usage of the methods is in locale, where they are used with "C"
locale only. I believe there hardly is any implementation that implement integers
parsing in a locale-dependent way.

As I see, blame for the usaging of this method goes all the way down to initial libcxx export.

Diff Detail

Event Timeline

georgthegreat created this revision.Mar 26 2022, 12:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2022, 12:40 PM
georgthegreat requested review of this revision.Mar 26 2022, 12:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 26 2022, 12:40 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
georgthegreat edited the summary of this revision. (Show Details)Mar 26 2022, 12:48 PM
georgthegreat edited the summary of this revision. (Show Details)

I don't really have strong push back per-se, however the fact that strotoull_l & friends exist seems to suggest that perhaps it's useful to have locale-dependent versions of those functions?

@mstorsjo You've done some work related to locales, do you have an opinion on this?

I don't really have strong push back per-se, however the fact that strotoull_l & friends exist seems to suggest that perhaps it's useful to have locale-dependent versions of those functions?

@mstorsjo You've done some work related to locales, do you have an opinion on this?

Not much opinion - I didn't touch much of the implementation other than staring at test details... Offhand, from what I remember from some localization work ages ago, I think it's possible to express numbers with other unicode chars than the regular '0' - '9' (in Arabic/Farsi maybe, although I might misremember). So it's at least theoretically possible that a strtoull_l, with such a locale, could parse such a string (but it'd be nice to have an example), so that could be a reason for there existing such a function.

However, as @georgthegreat points out, this is only ever called with the "C" locale here, so then it should probably(?) be equivalent to just calling the plain strtoull. So I think this patch could be considered fine. But I don't feel super authoritative on the matter...

georgthegreat added a subscriber: dalias.EditedMar 28 2022, 1:12 PM

this is only ever called with the "C" locale here, so then it should probably(?) be equivalent to just calling the plain strtoull

Unfortunately, it is not.
As I see in glibc sources, strtoll is implemented (through a extensive chain of weak_aliases and macro definitions) as follows:

INT
INTERNAL (strtol) (const STRING_TYPE *nptr, STRING_TYPE **endptr,
           int base, int group)
{
  return INTERNAL (__strtol_l) (nptr, endptr, base, group, _NL_CURRENT_LOCALE);
}

with NL_CURRENT_LOCALE used for extracting the current locale from tls:

#define _NL_CURRENT_LOCALE  (__libc_tsd_get (locale_t, LOCALE))

So this PR indeed changes to behavior to take current locale into account.

My whole research started from an attempt to remove a patch locally applied to our copy of musl.
@mstorsjo point regarding Arabic / Farsi locales (though I unable to test it) is a good point to ask what musl developers think.
@dalias could you comment on the reasons for musl to have no strto*_l methods for integers?

The lack of _l methods in musl looks intentional:

https://git.musl-libc.org/cgit/musl/tree/include/stdlib.h#n161

musl has these methods for floating point types in glibc compatibility mode, yet there is no trace of locale-dependent integer parsing.

I wonder whether this is the right approach. There's include/__support/musl/xlocale.h which defines strtoll_l (it drops the locale and calls strtoll). So this should work on musl. Did you enable the CMake option LIBCXX_HAS_MUSL_LIBC?

Thanks for pointing this out, @Mordante.
It turns out that our local version of musl/xlocale.h was lacking these two proxy functions for unknown reason.
Once I have restored them, the things got fixed.

While this fix allows to workaround the problem, I still see no reason for libcxx to provide strtoull_l.
It looks like it should be either provided by the underlying libc, or must not be used at all.

Current proxy from musl/xlocale.h does exactly the same this as I am proposing here, but does it libc-dependent way which only makes the logic more confusing.

Good to hear it now works on your platform!

I did some digging and I'm not sure whether it's safe to switch to stroull even when the Standard requires this.
__num_get_unsigned_integral is used in Stage 3 of do_get http://eel.is/c++draft/facet.num.get.virtuals#3.3
This function uses the locale of the ios_base, which may differ from the global locale.

Based C17's FDIS https://web.archive.org/web/20181230041359/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf
7.22.1.4 The strtol, strtoll, strtoul, and strtoull functions
§ 6 In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.

So this leaves room for strtoull to accept its input in a locale specific way.

So before changing this I like to make sure this is really safe. I don't know how well our test coverage is this area is, but my experience is that our locale validation isn't as extensive as it should be. (For example, last weekend I discovered we don't test time_put's alternative digits with a locale that uses alternative digits. I'm working on a patch to improve this.)

Maybe @howard.hinnant can shed more light on why the code uses strtoll_l.

ldionne requested changes to this revision.Mar 31 2022, 5:48 AM

Thanks a lot everyone for the discussion. So -- it looks like we don't want to move forward with this patch, at least not until we have a better understanding of what support we're losing concretely. I'll mark this as "request changes" to make that clear. If we don't have a push for this within a couple of weeks, I may suggest abandoning the revision to clear up the review queue.

This revision now requires changes to proceed.Mar 31 2022, 5:48 AM
georgthegreat abandoned this revision.Mar 31 2022, 6:26 AM

I think this can be abandoned right now, I have no rationale to persist with merging this PR, as my initial problem with musl was fixed.