This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Use the default initializer for char_type in std::num_get::do_get.
ClosedPublic

Authored by ldionne on Apr 6 2021, 7:33 PM.

Details

Summary

This is to fix the following error that occurred when the char type has the shown constructors.

root@amdsuplus1:/test# /llvm-install/bin/clang++ -o /dev/null -c test.cc -stdlib=libc++
In file included from test.cc:1:
/llvm-install/bin/../include/c++/v1/locale:1072:15: error: conversion from 'int' to 'std::num_get<Char>::char_type' (aka 'Char') is ambiguous
    char_type __thousands_sep = 0;
              ^                 ~
/llvm-install/bin/../include/c++/v1/locale:583:14: note: in instantiation of member function 'std::num_get<Char>::do_get' requested here
    explicit num_get(size_t __refs = 0)
             ^
test.cc:49:18: note: in instantiation of member function 'std::num_get<Char>::num_get' requested here
void foo() { new std::num_get< Char >; }
                 ^
test.cc:6:3: note: candidate constructor
  Char(char);
  ^
test.cc:7:3: note: candidate constructor
  Char(unsigned);
  ^
1 error generated.
root@amdsuplus1:/test# cat test.cc
#include <locale>
class Char {
public:
  typedef int32_t a;
  Char();
  Char(char);
  Char(unsigned);
  explicit Char(a);
  operator a() const;
};
template <> struct std::char_traits< Char > {
  typedef Char char_type;
  typedef int int_type;
  typedef streamoff off_type;
  typedef streampos pos_type;
  static char_type *find(const char_type *, size_t, char_type);
  static char_type to_char_type(int_type);
  static int_type to_int_type(char_type);
  static bool eq_int_type(int_type, int_type);
  static int_type eof();
};
namespace std {
template <> class ctype< Char > : public locale::facet {
public:
  static locale::id id;
  Char toupper(Char) const;
  char widen(const char *, const char *, Char *) const;
};
template <> class numpunct< Char > : public std::locale::facet {
public:
  typedef basic_string< Char > string_type;
  static locale::id id;
  Char decimal_point() const;
  Char thousands_sep() const;
  string grouping() const;
  string_type truename() const;
  string_type falsename() const;
};
template <> class basic_string< Char > {
public:
  typedef allocator< Char >::const_reference const_reference;
  const_reference operator[](int) const;
  int copy(Char *, int) const;
  int size() const;
  bool empty() const;
};
}

void foo() { new std::num_get< Char >; }

root@amdsuplus1:/test# /llvm-install/bin/clang++ --version
clang version 13.0.0 (https://github.com/llvm/llvm-project.git 1696b8ae96b2d8bcbf90894bd344a8a090f43c84)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /llvm-install/bin

Diff Detail

Event Timeline

oraluben created this revision.Apr 6 2021, 7:33 PM
oraluben requested review of this revision.Apr 6 2021, 7:33 PM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald TranscriptApr 6 2021, 7:33 PM

I submit this patch before asking because I thought this is a trivial and easy-to-fix issue, please correct me if I underestimated this problem.

Could you please minimize your example (if possible) and make a test out of it, please?

Have you, by any chance, found the part of standard that requires the standard library to accept it?

libcxx/include/locale
1088

Wouldn't this be more generic?

I doubt that char_type must be constructible from char (at least, no more than from int).
And I guess that a type that is not default-initializable would fail in other places anyway.

oraluben updated this revision to Diff 335742.Apr 7 2021, 12:57 AM

Follow other initializations of __thousands_sep in <locale>.

Could you please minimize your example (if possible) and make a test out of it, please?

This is already the minimal version I've got. Will update the test later.

Have you, by any chance, found the part of the standard that requires the standard library to accept it?

Not really, I think this should be revised based on two facts:

  1. delete any of char/unsigned int constructors will make the code accepted by libc++;
  2. the code works fine with libstdc++.

I updated the revision based on other char_type __thousands_sep initializations in the file. All other initializations use the default constructors and seem to work fine.

This will mean we have a UB. The fact that other places kept __thousands_sep uninitialized is that it's set by __stage2_int_prep/__stage2_float_prep.

oraluben updated this revision to Diff 335749.Apr 7 2021, 1:35 AM

This will mean we have a UB. The fact that other places kept __thousands_sep uninitialized is that it's set by __stage2_int_prep/__stage2_float_prep.

Thanks. Then char_type() would be much better. If there's no default constructor somewhere else is indeed failing.

oraluben marked an inline comment as done.Apr 7 2021, 4:32 AM

@curdeius How to add a compile-only test case? I think an executable test is not necessary for this case.

@curdeius How to add a compile-only test case? I think an executable test is not necessary for this case.

A (passing) compile-only test has an extension .compile.pass.cpp. Have a look at other tests like this. You can also get rid of main altogether in such tests.

oraluben updated this revision to Diff 335800.EditedApr 7 2021, 6:54 AM
oraluben retitled this revision from [libc++] Use the more precise constructor for char_type in std::num_get::do_get. to [libc++] Use the default initializer for char_type in std::num_get::do_get..

I checked the standard and [locale.category] says only num_­get<char>, num_­get<wchar_­t> are valid, so the test case is not a valid C++ program.
However, num_­get<C, InputIterator> is also valid for "any other implementation-defined character types that meet the requirements for a character on which any of the iostream components can be instantiated".

Quuxplusone requested changes to this revision.Apr 7 2021, 7:34 AM
Quuxplusone added a subscriber: Quuxplusone.
Quuxplusone added inline comments.
libcxx/include/locale
1088

I would like to see some evidence that CharT is permitted to be anything other than char or wchar_t. If it is, then there ought to be a line in the standard that permits it, and there ought to be some libc++ tests that test it.

I went through my local libcxx/include/locale and grepped for TEMPLATE_VIS. At the top of each of these templates, I added the line

static_assert(is_same<_CharT, char>::value || is_same<_CharT, wchar_t>::value, "");

Then I ran all the tests in libcxx/test/*/localization/. They all passed.

So I'd be favorably inclined toward a patch that just added those static_asserts (putting the burden of proof on some user to explain why class-typed CharT should be permitted at all). If this current PR is intended to formally support class-typed CharT in <locale>, then I would ask for testing at the level of our existing tests for char and wchar_t.

Also, if you want to support types other than char and wchar_t, I think you should have a "concept" for what types you're planning to support. The obvious extension would be "character types," or even "integral types"; but you want to go further, into class types. But not all class types, obviously! You can't have a std::num_get<std::mutex>! So, what operations/affordances do you think we actually need out of a "character-like type"? Personally, I would have put "it has a zero value" very high on that list. What if someone gives us a move-only CharT type? or a CharT with an overloaded operator& or operator,? Where's the line drawn, and can you write it down so we all agree on it?

This PR is basically trivial — just changing 0 to char_type() in one place — but it is also a readability regression, in the sense that x = 0 is clearly better code than x = char_type(). All else being equal, we'd obviously prefer to write 0. So if we're going to make this trivial change, we should have a technical reason to do it.

This revision now requires changes to proceed.Apr 7 2021, 7:34 AM

[defns.character.container] allows character type to be classes and [defns.character] allows a character to be any value that provides the definitions required in related libraries, which in <locale> is, "any other implementation-defined character types that meet the requirements for a character on which any of the iostream components can be instantiated", in [locale.category-6]. iostream ([iostreams.limits.pos-2]) only requires "any other implementation-defined character types that meet the requirements for a character on which any of the iostream components can be instantiated".
Since I can instantiated a std::basic_ios<Char>, I think that means Char in the example is a valid char type for <locale>.
FRI: libstdc++ ('s basic_ios) does require such Char to be default-initializable while libc++'s basic_ios doesn't and libc++'s <locale> does.

But I have to say I'm not an expert in C++, I found this case trying to compile tntnet, a package in Debian repository, against libc++. From the perspective of a C++ user, I think this kind of case that using user-defined class as char type is not very rare.

Quuxplusone added inline comments.Apr 8 2021, 8:04 AM
libcxx/include/locale
1088

I retract my objection here, since this seems like a deep can of worms, and the patch is trivial. My current impression is that (1) libc++ is not required to support this use-case; (2) but it is permitted to support "implementation-defined character types" which are not "ordinary character types," which I agree must be meant to signify class types that behave like ordinary character types to some implementation-defined degree; and (3) according to the OP, libstdc++'s implementation-defined rules permit OP's class Char where libc++'s rules currently don't, but this PR gets us closer to agreeing with libstdc++. So, okay, huge mess, PR doesn't make it worse, LGTM. :)

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/char.compile.pass.cpp
62 ↗(On Diff #335800)

I would like this to be a .pass.cpp, not a .compile.pass.cpp, because I'd like us to test that the code not only compiles but also links (e.g. it doesn't accidentally attempt to use any undefined template specializations) and runs with the intended behavior.

I'm surprised CI passes; I would have thought this file needed UNSUPPORTED: libcpp-has-no-localization. Maybe once you make it link, it will need that.

ldionne commandeered this revision.Sep 8 2023, 7:16 AM
ldionne edited reviewers, added: oraluben; removed: ldionne.

[Github PR transition cleanup]

Commandeering to finish.

Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2023, 7:16 AM
ldionne updated this revision to Diff 556260.Sep 8 2023, 7:59 AM
ldionne marked an inline comment as done.
ldionne edited the summary of this revision. (Show Details)

Rebase and make it a basic runtime test.

ldionne accepted this revision.Sep 8 2023, 7:59 AM
ldionne marked an inline comment as done.

Will ship once CI is complete.

This revision was not accepted when it landed; it landed in state Needs Review.Sep 11 2023, 11:16 AM
This revision was automatically updated to reflect the committed changes.
bcain added a subscriber: bcain.Sep 18 2023, 7:39 AM
bcain added inline comments.
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp
80

I'm investigating a failure that occurs when on this test case in our downstream. I'm not quite certain but it's possible the ambiguity is due to the fact that we are using a 32-bit architecture?

libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:80:39: error: ambiguous conversion for functional-style cast from 'int' to 'Char'
   80 |   Char toupper(Char c) const { return Char(std::toupper(c.underlying_)); }
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ldionne added inline comments.Sep 21 2023, 2:38 PM
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp
80

Yes, that could be. I guess

Char(unsigned i) : underlying_(i) {}
explicit Char(std::int32_t i) : underlying_(i) {}

must be ambiguous somehow? Can you open a PR that fixes your problem? (kinda hard without being able to reproduce locally)

bcain added inline comments.Oct 4 2023, 9:54 PM
libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp
80

This happens to be effective in getting the test to pass but I'm afraid it might not be a suitable change.

diff --git a/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp b/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp
index d7b4b816d975..9d5d6bc03f7f 100644
--- a/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp
@@ -21,8 +21,12 @@
 struct Char {
   Char() = default;
   Char(char c) : underlying_(c) {}
+#if UINTPTR_MAX == 0xffffffffffffffff
   Char(unsigned i) : underlying_(i) {}
   explicit Char(std::int32_t i) : underlying_(i) {}
+#else
+  // 32-bit or other platform, remove potentially ambiguous constructors
+#endif
   operator std::int32_t() const { return underlying_; }
 
   char underlying_;

BTW here is the unabridged error for more context.

# command stderr:
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:80:39: error: ambiguous conversion for functional-style cast from 'int' to 'Char'
   80 |   Char toupper(Char c) const { return Char(std::toupper(c.underlying_)); }
      |                                       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:23:3: note: candidate constructor
   23 |   Char(char c) : underlying_(c) {}
      |   ^
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:24:3: note: candidate constructor
   24 |   Char(unsigned i) : underlying_(i) {}
      |   ^
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:25:12: note: candidate constructor
   25 |   explicit Char(std::int32_t i) : underlying_(i) {}
      |            ^
In file included from /local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:16:
In file included from /local/mnt/workspace/hex/obj_runtimes_hex88_qurt_v75_ON_ON_shared/include/c++/v1/locale:200:
/local/mnt/workspace/hex/obj_runtimes_hex88_qurt_v75_ON_ON_shared/include/c++/v1/__iterator/istreambuf_iterator.h:82:17: error: ambiguous conversion for static_cast from 'int_type' (aka 'int') to 'char_type' (aka 'Char')
   82 |         {return static_cast<char_type>(__sbuf_->sgetc());}
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/local/mnt/workspace/hex/obj_runtimes_hex88_qurt_v75_ON_ON_shared/include/c++/v1/locale:323:22: note: in instantiation of member function 'std::istreambuf_iterator<Char>::operator*' requested here
  323 |         _CharT __c = *__b;
      |                      ^
/local/mnt/workspace/hex/obj_runtimes_hex88_qurt_v75_ON_ON_shared/include/c++/v1/locale:930:37: note: in instantiation of function template specialization 'std::__scan_keyword<std::istreambuf_iterator<Char>, const std::basic_string<Char> *, std::ctype<Char>>' requested here
  930 |     const string_type* __i = _VSTD::__scan_keyword(__b, __e, __names, __names+2,
      |                                     ^
/local/mnt/workspace/hex/obj_runtimes_hex88_qurt_v75_ON_ON_shared/include/c++/v1/locale:610:14: note: in instantiation of member function 'std::num_get<Char>::do_get' requested here
  610 |     explicit num_get(size_t __refs = 0)
      |              ^
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:113:45: note: in instantiation of member function 'std::num_get<Char>::num_get' requested here
  113 |   std::locale l(std::locale::classic(), new std::num_get<Char>);
      |                                             ^
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:23:3: note: candidate constructor
   23 |   Char(char c) : underlying_(c) {}
      |   ^
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:24:3: note: candidate constructor
   24 |   Char(unsigned i) : underlying_(i) {}
      |   ^
/local/mnt/workspace/hex/llvm-project/libcxx/test/std/localization/locale.categories/category.numeric/locale.num.get/user_defined_char_type.pass.cpp:25:12: note: candidate constructor
   25 |   explicit Char(std::int32_t i) : underlying_(i) {}
      |            ^
2 errors generated.