Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

[libcxx] basic_ios<wchar_t> cannot store fill character WCHAR_MAX
Needs ReviewPublic

Authored by xingxue on Apr 27 2022, 1:59 PM.

Details

Reviewers
SeanP
hubert.reinterpretcast
daltenty
ldionne
cebowleratibm
zibi
Group Reviewers
Restricted Project
Summary

On AIX and zOS in 64-bit mode, wchar_t is 4 bytes unsigned and wint_t is also 4 bytes. libcxx std::basic_ios uses WEOF to indicate that the fill value is uninitialized, which cannot be distinguished from WCHAR_MAX by std::char_traits<wchar_t>::eq_int_type.

The constraints of the C++ standard impose:

  • Cannot map any wchar_t value to WEOF.
  • All wchar_t values must map uniquely.

Compatibility with the platform C ABI imposes:

  • wint_t and wchar_t have the same width on AIX in 64-bit mode.

The library is implemented correctly with respect to the C++ standard, however, an implementation cannot be compliant where wchar_t is unsigned and the same size as wint_t, consequently I've added a __fill_set_ member to indicate whether or not the fill character has been initialized.

I plan to have IBM follow up with the C++ standard to have the issue dealt with. In particular, C++ imposes requirements on C types not also imposed by C (thereby breaking compatibility with C on platforms with preexisting C implementations).

The change is enabled in both 32 and 64 bit on AIX and z/OS for compatibility with the current IBM system provided libc++ on the targets. Unfortunately this member could have been added more selectively, however, in interest of compatibility I need to propose the patch as-is.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ldionne requested changes to this revision.Jun 8 2022, 7:41 AM

In its current state, this patch would cause any new target where wint_t can't represent all values of CharType to have the same bug. IIUC, the same is true on existing targets like x86_64 Darwin with a custom CharType that has the same size as int_type -- is that correct? If so, I think we should instead fix the problem by default, but provide an escape hatch for existing targets where this would break the ABI to opt out. For example:

template <class _Traits>
struct _OptionalFill {
  _OptionalFill() : __set_(false) { }
  _OptionalFill& operator=(typename _Traits::int_type __x) { __set_ = true; __fill_ = __x; return *this; }
  bool __is_set() const { return __set_; }
  typename _Traits::int_type __fill() const { return __fill_; }

private:
  typename _Traits::int_type __fill_;
  bool __set_;
};

template <class _Traits>
struct _SentinelValueFill {
  _SentinelValueFill() : __fill_(_Traits::eof()) { }
  _SentinelValueFill& operator=(typename _Traits::int_type __x) { __fill_ = __x; return *this; }
  bool __is_set() const { return __fill_ == _Traits::eof(); }
  typename _Traits::int_type __fill() const { return __fill_; }

private:
  typename _Traits::int_type __fill_;
};

template <class _CharT, class _Traits>
class basic_ios {
  // ...

private:
  basic_ostream<char_type, traits_type>* __tie_;

#if defined(_WIN32) || defined(whatever)
  static constexpr bool _OptOutForABICompat = true;
#else
  static constexpr bool _OptOutForABICompat = false;
#endif

  // Here, basically always use an appropriate representation for the fill, except on
  // platforms where we'd rather keep the bug in place than break the ABI.
  using _FillType = _If<sizeof(char_type) >= sizeof(typename traits_type::int_type) && !_OptOutForABICompat, _OptionalFill<_Traits>, _SentinelValueFill<_Traits> >;
  _FillType __fill_;
};

Thoughts?

Also, please make sure to follow https://www.llvm.org/docs/Phabricator.html#requesting-a-review-via-the-command-line when uploading patches, it ensures that there's context in Phabricator, which is useful for reviewing :-)

libcxx/include/ios
707

Do we need to set the __fill_ when we use __fill_set_?

782

When we use __fill_set_, isn't this condition redundant?

libcxx/test/std/input.output/iostream.format/std.manip/setfill_wcharmax.pass.cpp
9–12

Do I understand correctly that all of those targets currently have a bug that one can't set a fill char to WCHAR_MAX, however fixing that requires breaking the ABI?

17
35

Let's use // UNSUPPORTED: no-wide-characters instead.

This revision now requires changes to proceed.Jun 8 2022, 7:41 AM
using _FillType = _If<sizeof(char_type) >= sizeof(typename traits_type::int_type) && !_OptOutForABICompat, _OptionalFill<_Traits>, _SentinelValueFill<_Traits> >;

The problem isn't necessarily exposed based on the type size. For example: if wchar is 4 bytes signed then WCHAR_MAX is 0x7fffffff and WEOF is 0xffffffff so the sentinel value can be represented.

I suggest:

using _FillType = _If<traits_type::eq_int_type(std::numeric_limits<char_type>::max(), traits_type::eof) && !_OptOutForABICompat, _OptionalFill<_Traits>, _SentinelValueFill<_Traits> >;
libcxx/include/ios
707

I agree, we don't have to initialize __fill_ if we have __fill_set_. I was trying to mitigate the need to check __fill_set_ elsewhere but I think we will always need to on the first call.

782

I wrote it this way to mitigate the need to check the __fill_set_ member to minimize the impact of the ABI breakage. The only case you really need to check __fill_set is if the fill charafter was is eof.

Reflecting now I think the effort was misguided because you're always going to get in here on the first call to fill. Probably better to keep it simple than get cute.

libcxx/test/std/input.output/iostream.format/std.manip/setfill_wcharmax.pass.cpp
9–12

Those were all the targets the failed the new test via the CI when I submitted the first version of the patch. I believe these targets will need to break the ABI if they want the fix.

template <class _Traits>
struct _OptionalFill {
  _OptionalFill() : __set_(false) { }
  _OptionalFill& operator=(typename _Traits::int_type __x) { __set_ = true; __fill_ = __x; return *this; }
  bool __is_set() const { return __set_; }
  typename _Traits::int_type __fill() const { return __fill_; }

private:
  typename _Traits::int_type __fill_;
  bool __set_;
};

Thoughts?

From the AIX and z/OS perspective, the replacement of the two members with a member having two fields is not 100% binary compatible for all usage. At least if the basic_ios specialization is used as a non-virtual base class, any derived class members that could be allocated in the padding following the bool would now move. This could be preemptively avoided if the new class type is given a user-provided copy constructor or destructor and the new member made [[no_unique_address]]. I don't know if [[no_unique_address]] is usable is some form under C++98/03 though.

template <class _Traits>
struct _OptionalFill {
  _OptionalFill() : __set_(false) { }
  _OptionalFill& operator=(typename _Traits::int_type __x) { __set_ = true; __fill_ = __x; return *this; }
  bool __is_set() const { return __set_; }
  typename _Traits::int_type __fill() const { return __fill_; }

private:
  typename _Traits::int_type __fill_;
  bool __set_;
};

Thoughts?

From the AIX and z/OS perspective, the replacement of the two members with a member having two fields is not 100% binary compatible for all usage. At least if the basic_ios specialization is used as a non-virtual base class, any derived class members that could be allocated in the padding following the bool would now move. This could be preemptively avoided if the new class type is given a user-provided copy constructor or destructor and the new member made [[no_unique_address]]. I don't know if [[no_unique_address]] is usable is some form under C++98/03 though.

For background, the (customized) libc++ that is provided on AIX used the form where the bool member is injected directly in basic_ios.

I drafted an update with the _OptionalFill / _SentinelFill suggestion and have confirmed that there are layout differences on AIX in derived classes when the first derived member is 1 or 2 byte aligned. Alternatively, even if all inheritance of basic_ios is virtual, there's still potential layout differences where a user derived class has multiple virtual bases and the second virtual base wants to start inside the padding. I did try [[no_unique_address]], however, the syntax is not accepted with -std=c++03. Unless we want to version the 03 code from the C++11 forward code, I don't think there's a way to get a layout compatible type using this approach.

One idea Hubert and I discussed offline is to use a member declaration bool __is_set[Boolean expression] and have it deliberately zero-size when the __is_set member isn't needed. We can still use _OptionalFill / _SentinelFill to control access to the member.

Another obstacle is that the proper check to enable to use of _OptionalFill looks like:

using _FillType = _If<traits_type::eq_int_type(
                          std::numeric_limits<char_type>::max(),
                          traits_type::eof()),
                      _OptionalFill, _SentinelValueFill>;

but note that eq_int_type and eof can only participate in constant expressions where they are constexpr in C++11, so I again have a challenge in delivering a solution that works in C++03. I'll need to use a wchar specialization and use something like WEOF < (wint_t)WCHAR_MIN || (win_t)WCHAR_MAX < WEOF.

Given my limited experience with LLVM libc++ some guidance along which fundamental approach we should take would be appreciated.

cebowleratibm retitled this revision from [libcxx][AIX][z/OS] basic_ios<wchar_t> cannot store fill character WCHAR_MAX to [libcxx] basic_ios<wchar_t> cannot store fill character WCHAR_MAX.

The new approach injects the new __fill_set_ only for wchar_t and only if WEOF is in the range [WCHAR_MIN, WCHAR_MAX]. I didn't use numeric_limits because the solution needs to work in C++03.

On AIX I need to force the injection of the member to retain compatibility, even when it's not actually needed.

I disabled the ABI break on _WIN32 targets, and I suspect I need to further customize the compatibility opt out. The default behaviour for new targets is to adopt the fix for wchar_t if wchar_t is impacted. I can't fix this for other char types without constexpr numeric limits and char_traits and we need to keep the layout compatible between C++03 and C++11.

cebowleratibm marked 3 inline comments as done.Jun 14 2022, 5:52 PM
cebowleratibm added inline comments.Jun 14 2022, 6:00 PM
libcxx/include/ios
699

I think I should respect _LIBCPP_HAS_NO_WIDE_CHARACTERS

cebowleratibm added inline comments.Jun 15 2022, 2:49 PM
libcxx/include/ios
697

I made a mistake, for AIX libc++ compatibility I need to inject the bool member even for non wchar_t types.

  1. The selection of the fill set member is moved to the __config.
  2. This patch retains ABI compatibility for any of the libc++ CI that reported failures on this issue.
  3. Always inject the fill set member for AIX to retain compatibility for AIX.
  4. This only fixes basic_ios<wchar_t>. It is not compliant to instantiate basic_ios with a char type where char_traits::eof is not a sentinel value. #wchar_t is an exception because C++ shouldn't be adding constraints beyond what C guarantees.

I haven't received significant feedback since I overhauled the patch in response to @ldionne 's initial feedback. The new design implements the fix as needed for new targets, which I think was the primary reason for the revision.

cebowleratibm marked 2 inline comments as done.Jun 22 2022, 2:05 PM
libcxx/include/__config
319

It is my understanding that z/OS is also in the same situation.

322

It seems to me that _M_ARM and _M_ARM64 is associated with Windows and not Linux.

libcxx/include/ios
709

If wchar_t is 16-bit unsigned, wint_t is 32-bit signed and WEOF is -1, then WEOF is less than (wint_t)WCHAR_MAX and value becomes true even though the new member is unneeded.

libcxx/include/ios
709

This should detect when wint_t is not wider than wchar_t:

WCHAR_MAX >= WINT_MAX || 1 + (wint_t)WCHAR_MAX == (wint_t)WCHAR_MIN

but using sizeof might be reasonable.

cebowleratibm planned changes to this revision.Jun 24 2022, 1:46 PM
cebowleratibm marked an inline comment as done.

Writing revision to use SentinelFill/OptionalFill classes.

libcxx/include/__config
319

zOS hasn't provided a 32-bit libc++ solution yet. They're 64-bit injected a bool a __fill_set_ member. The compile time selection of the fill_set implementation will do the right thing for zOS 32/64 bit without need for explicit overrides.

libcxx/include/ios
709

I agree. Turns out signed wchar also has a problem with -1 and @ldionne 's suggestion was the right way to go. I'm going to use sizeof and post a revision using OptionalFill/SentinelFill suggestion.

cebowleratibm added inline comments.Jun 24 2022, 4:43 PM
libcxx/include/ios
709

I forgot the reason we chose not to use the OptionalFill/SentinelFill suggestion is that it may affect the layout of derived classes, compared to appending the __is_set_ member directly in basic_ios

I'm now planning a tweaked version of the current revision that uses sizeof but continues to use the zero-length/size 1 array member.

Any basic_ios where sizeof(char_type) >= sizeof(int_type) has at least some char value that overlaps with the eof character. I've modified the proposed test to demonstrate signed char types are affected at -1.

I've proposed a __has_fill_ member directly in basic_ios to retain layout compatibility of derived classes on what IBM has done (downstream) with prior libc++ on AIX.

The patch as proposed also works for zOS. There's no need for explicit change in the config as the sizeof check get the desired layout in both 32 and 64 bit.

cebowleratibm marked 3 inline comments as done.Jun 25 2022, 10:32 AM

Quick testcase fix.

daltenty added inline comments.Jul 12 2022, 10:54 AM
libcxx/include/__config
319

What about adding an ABI guard around this whole block? Something like:

#if _LIBCPP_ABI_BASIC_IOS_HAS_FILL_SET_IF_NEEDED

And defining that for the unstable ABI (_LIBCPP_ABI_VERSION >= 2) in the relevant block here? That would let folks not requiring the stable ABI on the below platforms pickup the fix automatically if needed.

daltenty added inline comments.Jul 12 2022, 10:56 AM
libcxx/include/__config
319

edit: or rather the reverse check:

#ifndef _LIBCPP_ABI_BASIC_IOS_HAS_FILL_SET_IF_NEEDED
cebowleratibm planned changes to this revision.Jul 15 2022, 7:51 AM

Plan to revise the config to pick up the "fill set member as needed" mode in the config for all targets when _LIBCPP_ABI_VERSION >= 2.

libcxx/include/__config
319

There are 3 states:

  1. The fill set member always exists (AIX compat)
  2. The fill set member never exists (non AIX target compat)
  3. Add the fill set member only if needed. (the desirable path)

I chose a single macro _LIBCPP_BASIC_IOS_HAS_FILL_SET to represent these states. The third state is achieved by leaving the macro undefined, which is to say: there's no force override so just do the right thing.

If I introduce another macro _LIBCPP_ABI_BASIC_IOS_HAS_FILL_SET_IF_NEEDED then the config is conflicted if both macros are defined. I thought it better to avoid the potential for things to go wrong.

You have an excellent point that we can enable the "as needed" mode on all targets when _LIBCPP_ABI_VERSION >= 2. If will revise the patch for this.

  • Only add fill set as needed for _LIBCPP_ABI > 1 for all targets
  • zOS 64-bit always needs fill set for compatibility
  • tweaked the test to use the char value for eof as it exposes the problem for both signed and unsigned char types
cebowleratibm marked 2 inline comments as done.

Fix formatting and a typo in the __config

xingxue added inline comments.Jul 19 2022, 8:09 AM
libcxx/include/__config
333

Delete the extra ).

libcxx/include/ios
800–801
813

Formatting.

cebowleratibm planned changes to this revision.Jul 19 2022, 10:21 AM
cebowleratibm marked 3 inline comments as done.

Thanks for catching the formatting problems Xing. I'll need to look at my workflow to see why I didn't catch those.

Formatting and typo.

New test fails on macos targets (as it should) but I needed to correct a typo on the XFAIL line.

The new test passes/fails based on wchar_t/wint_t types, which is tedious to manage across targets with XFAIL so I've decided to enable it only for AIX/zOS targets where it is expected to pass.

xingxue commandeered this revision.Sep 26 2023, 8:36 AM
xingxue edited reviewers, added: cebowleratibm; removed: xingxue.

I am taking over this revision because @cebowleratibm is working on a different project.

Hi @ldionne, It would be appreciated if you could take a look at this patch and provide guidance on how to proceed.

From the AIX and z/OS perspective, the replacement of the two members with a member having two fields is not 100% binary compatible for all usage. At least if the basic_ios specialization is used as a non-virtual base class, any derived class members that could be allocated in the padding following the bool would now move. This could be preemptively avoided if the new class type is given a user-provided copy constructor or destructor and the new member made [[no_unique_address]]. I don't know if [[no_unique_address]] is usable is some form under C++98/03 though.

We may be able to use some form of packed too.

From the AIX and z/OS perspective, the replacement of the two members with a member having two fields is not 100% binary compatible for all usage. At least if the basic_ios specialization is used as a non-virtual base class, any derived class members that could be allocated in the padding following the bool would now move. This could be preemptively avoided if the new class type is given a user-provided copy constructor or destructor and the new member made [[no_unique_address]]. I don't know if [[no_unique_address]] is usable is some form under C++98/03 though.

We may be able to use some form of packed too.

Thanks for the suggestion, I will look into it.

xingxue updated this revision to Diff 557935.EditedOct 30 2023, 7:16 AM

Updated the patch using the approach suggested by @ldionne in comment https://reviews.llvm.org/D124555#3566746 and by @hubert.reinterpretcast in comment https://reviews.llvm.org/D124555#4651438. The layout of this implementation is compatible with what IBM has done (downstream) with prior libc++ on AIX (32-bit and 64-bit) and zOS (64-bit only).

xingxue updated this revision to Diff 557966.Wed, Nov 1, 3:01 PM

Changes:

  • fix the problem in C++03 mode
  • use the condition as suggested for determining the fill type
xingxue updated this revision to Diff 557987.Thu, Nov 2, 4:47 AM

Fixed a typo.

xingxue updated this revision to Diff 557988.Thu, Nov 2, 6:57 AM

Use _LIBCPP_PACKED instead of __attribute__((packed)).

xingxue updated this revision to Diff 557989.Thu, Nov 2, 9:08 AM

Fixed the header file order.

xingxue updated this revision to Diff 557998.Fri, Nov 3, 8:05 AM

Removed _LIBCPP_ABI_VERSION check.