This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Fix simple cases of locale name construction
ClosedPublic

Authored by Jake-Egan on Feb 10 2022, 7:20 AM.

Details

Summary

When using the following constructors:

locale(const locale& other, const char* std_name, category cat);
locale(const locale& other, const string& std_name, category cat);
locale(const locale& other, const locale& one, category cats);

The new locale name is always "*". Locale names formed from parts of two named locales (that is, C++ locales having names) are supposed to have names in turn (see C++20 subclause 28.3.1.1 [locale.general] paragraph 8). This patch fixes the name construction for cases when either of locales are unnamed, when the category is locale::none, and when the two locale names are the same.

Diff Detail

Event Timeline

Jake-Egan requested review of this revision.Feb 10 2022, 7:20 AM
Jake-Egan created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2022, 7:20 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Jake-Egan edited the summary of this revision. (Show Details)Feb 10 2022, 7:22 AM
Jake-Egan edited the summary of this revision. (Show Details)Feb 10 2022, 7:30 AM
ldionne requested changes to this revision.Feb 10 2022, 8:34 AM
ldionne added a subscriber: ldionne.

Thanks for the patch! locales are not my strength, so I left some comment but TBH I don't fully grok why we'd want to make this change. Are we non-conforming at the moment?

libcxx/src/locale.cpp
315

You seem to have gone from 4 spaces to 2 spaces in the indentation? Please indent consistently with the rest of the code.

323–328

Ditto below.

552

The Standard says for e.g. locale( const locale& other, const locale& one, category cat );:

Constructs a copy of other except for all the facets identified by the cat argument, which are copied from one. If both other and one have names, then the resulting locale also has a name.

Here, it seems that if other doesn't have a name but one does, the resulting locale will have a name. Is that intended?

561

This is nitpicky, but this should be indented 4 spaces to be consistent with surrounding code.

libcxx/test/std/localization/locales/locale/locale.cons/name_construction.pass.cpp
25–27

Perhaps this would be easier to follow below?

std::locale us(LOCALE_en_US_UTF_8);
std::locale fr(LOCALE_fr_FR_UTF_8);
std::locale unnamed(std::locale(), new std::ctype<char>);
This revision now requires changes to proceed.Feb 10 2022, 8:34 AM

Thanks for the patch! locales are not my strength, so I left some comment but TBH I don't fully grok why we'd want to make this change. Are we non-conforming at the moment?

Yes, because locales formed from parts of two named locales (that is, C++ locales having names) have names in turn (see C++20 subclause 28.3.1.1 [locale.general] paragraph 8). Locales with names are supposed to compare equal if their names compare equal (see C++20 subclause 28.3.1.5 [locale.operators] paragraph 1). If the implementation wants to give named locales that are formed differently the same name (*), then it is required to report that they are equal. It seems an interesting experiment in conforming but unhelpful implementation methods to retain the * and correct the equality comparison for conformance, thus we did not propose that direction for this patch.

#include <locale>
#include <cassert>
#include <stdio.h>
int main(void) {
  std::locale America("en_US.UTF-8");
  std::locale Generic("C");
  fprintf(stderr, "%s\n", America.name().c_str());
  fprintf(stderr, "%s\n", Generic.name().c_str());

  std::locale Financial(Generic, America, std::locale::monetary);
  fprintf(stderr, "%s\n", Financial.name().c_str());

  std::locale Ascii(America, Generic, std::locale::collate | std::locale::ctype);
  fprintf(stderr, "%s\n", Ascii.name().c_str());

  assert((Financial.name() == Ascii.name()) == (Financial == Ascii));
}

Compiler Explorer link: https://godbolt.org/z/vKj9qMMT4

libcxx/src/locale.cpp
552

Sounds like the specification issue referred to by https://cplusplus.github.io/LWG/lwg-active.html#2295.

libcxx/src/locale.cpp
552

Sounds like the specification issue referred to by https://cplusplus.github.io/LWG/lwg-active.html#2295.

Interestingly enough, libc++ chooses to keep the name for the nullptr facet case. So, it seems more consistent for libc++ to prefer to keep the name in the locale::none case.

If the implementation wants to give named locales that are formed differently the same name (*), then it is required to report that they are equal. It seems an interesting experiment in conforming but unhelpful implementation methods to retain the * and correct the equality comparison for conformance, thus we did not propose that direction for this patch.

I also opened a new issue for libc++ locales that compare equal but behave differently: #53797

libcxx/src/locale.cpp
552

Here, it seems that if other doesn't have a name but one does, the resulting locale will have a name. Is that intended?

Oh, you are talking of the locale::all case more than the locale::none one. I agree that the locale::all case is a problem. For example, the standard categories do not necessarily cover all of the categories present in the C implementation. locale:all should only modify the standard categories in terms of the name.

553–554

See above comment.

564

This call to setlocale is not okay. It modifies the current C locale for the program.

libcxx/src/locale.cpp
564

For example, on Linux, the name of the locale can be discovered piece-wise by using _NL_LOCALE_NAME:

locale_t p = newlocale(LC_ALL, "", nullptr);
fprintf(stderr, "%s\n", nl_langinfo_l(_NL_LOCALE_NAME(LC_NUMERIC), p));
578

All signs are that the AIX format is space-separated.

583
588–590
592

Safer to check that the length is not negative because of unexpected format.

595–596

Avoid modifying the string all the time when updating a scan position will do.

602

Although AIX is happy with having 5 here and getting the 6th added after this check, z/OS seems to require 8 (7 at the point of this check).

604
607
631

Same comment as for the previous loop.

634

Same comment as for the previous loop.

641

Should there not be a check that there are at least enough for the indexing based on the category?

681
685–687

These appear to be documented as GNU extensions. The #else block here is unlikely to be appropriate on Darwin, BSD, or Windows.

libcxx/src/locale.cpp
563

It is insufficient to only do this for "".
On Linux, the environment is queried for each category that is specified with an empty string in the expanded LC_ALL format, e.g. (https://godbolt.org/z/cvYMcMbx4):

setlocale(LC_ALL, "LC_CTYPE=;LC_NUMERIC=;LC_TIME=;LC_COLLATE=en_US.UTF-8;LC_MONETARY=;LC_MESSAGES=;LC_PAPER=;LC_NAME=;LC_ADDRESS=;LC_TELEPHONE=;LC_MEASUREMENT=;LC_IDENTIFICATION=");
Jake-Egan planned changes to this revision.Feb 15 2022, 12:00 PM

For info: getlocalename_l is the proposed/new interface to get (per category) locale names in a thread-safe manner.

Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 1:57 PM
Jake-Egan edited the summary of this revision. (Show Details)

Reducing this patch to the simpler cases for easier review. Will address the more complicated cases in
another patch.

Jake-Egan marked 3 inline comments as done.Mar 16 2023, 12:19 PM
Jake-Egan retitled this revision from [libc++] Fix locale name construction to [libc++] Fix simple cases of locale name construction.Mar 16 2023, 12:22 PM
Jake-Egan edited the summary of this revision. (Show Details)

Fix indent.

Jake-Egan edited the summary of this revision. (Show Details)Mar 16 2023, 12:29 PM
Jake-Egan updated this revision to Diff 505915.Mar 16 2023, 1:30 PM

Fix indent 2.

Jake-Egan updated this revision to Diff 509646.Mar 30 2023, 6:18 AM

XFAIL apple backdeployment targets.

Jake-Egan updated this revision to Diff 510121.Mar 31 2023, 1:32 PM
Jake-Egan updated this revision to Diff 510492.Apr 3 2023, 7:20 AM

Can you add the relevant part of your reply to the commit message, that makes looking at the reason for the change easier in the future.

Yes, because locales formed from parts of two named locales (that is, C++ locales having names) have names in turn (see C++20 subclause 28.3.1.1 [locale.general] paragraph 8). Locales with names are supposed to compare equal if their names compare equal (see C++20 subclause 28.3.1.5 [locale.operators] paragraph 1). If the implementation wants to give named locales that are formed differently the same name (*), then it is required to report that they are equal. It seems an interesting experiment in conforming but unhelpful implementation methods to retain the * and correct the equality comparison for conformance, thus we did not propose that direction for this patch.
libcxx/src/locale.cpp
179

Nit why pass by value?
Why a member function instead of a non-member function?

Jake-Egan edited the summary of this revision. (Show Details)Apr 18 2023, 8:13 AM
Jake-Egan updated this revision to Diff 514653.Apr 18 2023, 9:00 AM
  • Updated the description
  • Use pass by reference
  • Use non-member function
Jake-Egan marked an inline comment as done.Apr 18 2023, 9:00 AM
Jake-Egan updated this revision to Diff 520462.May 8 2023, 12:12 PM
  • Trigger CI
Jake-Egan updated this revision to Diff 520672.May 9 2023, 6:09 AM
Jake-Egan updated this revision to Diff 520674.May 9 2023, 6:19 AM

I see several comments off @ldionne and @hubert.reinterpretcast not marked as done. Are they done?

Jake-Egan marked 5 inline comments as done.May 15 2023, 6:35 AM

I see several comments off @ldionne and @hubert.reinterpretcast not marked as done. Are they done?

This patch used to be a lot larger, but I've reduced it to a few cases. All the comments regarding the reduced patch have been addressed. I'll be posting another patch that addresses the rest of the comments.

ldionne accepted this revision.Jul 3 2023, 7:21 AM

Let's rebase and ship, sorry for the delay. This LGTM now, it's indeed much smaller than initially.

libcxx/src/locale.cpp
131–141
This revision is now accepted and ready to land.Jul 3 2023, 7:21 AM
Jake-Egan marked an inline comment as done.Jul 5 2023, 6:05 PM
Jake-Egan updated this revision to Diff 537780.Jul 6 2023, 10:21 AM

Remove whitespace

This revision was automatically updated to reflect the committed changes.