This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Add an option to disable wide character support in libc++
ClosedPublic

Authored by ldionne on Oct 6 2021, 1:20 PM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Commits
rGf4c1258d5633: [libc++] Add an option to disable wide character support in libc++
Summary

Some embedded platforms do not wish to support the C library functionality
for handling wchar_t because they have no use for it. It makes sense for
libc++ to work properly on those platforms, so this commit adds a carve-out
of functionality for wchar_t.

Unfortunately, unlike some other carve-outs (e.g. random device), this
patch touches several parts of the library. However, despite the wide
impact of this patch, I still think it is important to support this
configuration since it makes it much simpler to port libc++ to some
embedded platforms.

Diff Detail

Event Timeline

ldionne created this revision.Oct 6 2021, 1:20 PM
ldionne requested review of this revision.Oct 6 2021, 1:20 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2021, 1:20 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript

This is a monstruous diff, but fortunately it's pretty mechanical.

Calling out the only two things that are tricky in this patch IMO. If someone's willing to take a look at this patch, it would be great.

libcxx/include/__mbstate_t.h
2

I'm calling out attention on this file since I'm not really satisfied by the solution, but don't know how to make it better. Suggestions welcome.

This patch was developed on macOS so obviously some of this might not fly out-of-the-box on Linux, but we should find a way to make it work on all platforms, all the time.

libcxx/src/locale.cpp
4552

Calling this out too, I'm not entirely sure how to handle this properly when wchar_t isn't supported.

ldionne updated this revision to Diff 378261.Oct 8 2021, 8:58 AM

Rebase onto main.

We'll want to land this fairly quickly because of the size of the patch, which lends itself nicely
to bit rotting.

ldionne updated this revision to Diff 378295.Oct 8 2021, 10:29 AM

Fix some failures.

I agree we should try to land this as quickly as possible, this patch seems a huge effort!
I mainly looked at the buildsystem related, format related and locale files. I mostly ignored the tests; I expect Buildkite to help there.

I didn't see any huge issues.

libcxx/CMakeLists.txt
123

Can you add this option to docs/BuildingLibcxx.rst?

Is this feature worth mentioning in the release notes?

libcxx/include/__format/formatter_bool.h
94

Can you modify formatter_char.h in this directory and disable formatter<char, wchar_t> and formatter<wchar_t, wchar_t>?

libcxx/include/regex
1073

I'm not fond of formatting changes in this patch. I would strongly recommend to just commit the format changes now so this patch looks cleaner.

Somehow two of my comments weren't submitted and were lost. So here they are again.

libcxx/src/locale.cpp
4573

I we should return false instead of reaching _LIBCPP_UNREACHABLE();.

When I understand the code correctly this is only an issue when the supplied "string" contains more than one character. This basically means numpunct's decimal_point or thousands_sep isn't a single character. I think it would suffice when adding this return and handle the returned false in the callers.

When vendors report that this doesn't work properly we can discuss with them what a better solution would be, for example using a lookup table to translate their "string" to a char. For now that seems over engineering.

4667–4668

I think we should validate the return value here and two lines below.
Something like done in moneypunct_byname<char, false>

if (!checked_string_to_char_convert(__decimal_point_,
                                    lc->mon_decimal_point,
                                    loc.get()))
  __decimal_point_ = base::do_decimal_point();
ldionne updated this revision to Diff 378349.Oct 8 2021, 1:56 PM
ldionne marked 5 inline comments as done.

Address review comments. This should still fail on Linux, I'll address that next week.

libcxx/CMakeLists.txt
123

Did both, good suggestions!

As a self note, I don't really like the duplication of information between the CMake file and the Sphinx documentation for build options. They are often out-of-sync and not everything is everywhere. But I just wanted to raise awareness about the situation -- this isn't the right place to fix it.

libcxx/include/regex
1073

You're right, sorry this slipped through. It should be one of the only places where there are such changes, and I'll avoid touching the formatting at all.

libcxx/src/locale.cpp
4667–4668

Thanks, honestly <locale> is gibberish to me.

Mordante added inline comments.Oct 9 2021, 7:41 AM
libcxx/CMakeLists.txt
123

I agree I think we mentioned this duplication issue before.

libcxx/include/__mbstate_t.h
32

This fixes this compilation error on my system.

libcxx/include/locale
218

After fixing mbstate_t this header give me issues.
Commenting out a set of functions fixed my build issues.
For your convenience I created D111492 with my local modifications which is building here
https://buildkite.com/llvm-project/libcxx-ci/builds/5848

ldionne updated this revision to Diff 378735.Oct 11 2021, 10:51 AM
ldionne marked 2 inline comments as done.

Address CI failures (hopefully).

libcxx/include/locale
218

Yup thanks, I fixed it locally.

ldionne accepted this revision as: Restricted Project.Oct 12 2021, 2:45 AM

I'm going to ship this now since this is a giant patch and I want to reduce the likelihood of conflicts with other patches. If there are additional comments, I'll fix them post-commit.

This revision was not accepted when it landed; it landed in state Needs Review.Oct 12 2021, 3:08 AM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Oct 13 2021, 10:01 AM

This is a cool feature!

I showed it to someone (not at work) and they provided the following feedback (see comment on wchar.h). I think it's a good point so I'm forwarding it, but up to you what to do with it.

libcxx/include/wchar.h
113

"""
I know it's not my job to judge, but these lines might lead to unnecessary problems:
I assume there's quite a few build scripts that look at this and say, "hey, I've got wchar.h, why not include it"
... this is not just an assumption. libc++ does the same thing with locale.h and ncurses's c++ example files greedily include it if they find it
"""

Quuxplusone added inline comments.
libcxx/include/wchar.h
113

+1 in that this is also inconsistent with how we treat #include <ranges> in pre-C++20 modes, and so on. For better or worse, the precedent is pretty consistently "when a header isn't supported, including it should be a no-op."

Mordante added inline comments.Oct 13 2021, 10:33 AM
libcxx/include/wchar.h
113

I know we discussed this for the incomplete ranges and format header and decided not to do an error.

However there are more CMake options that turn including a specific header in an error. For example when disabling _LIBCPP_HAS_NO_LOCALIZATION including locale.h will result in an error. I don't have a strong opinion on whether or not we should remove the #error in wchar.h, but if we remove it I think we should evaluate the other #errors in libc++.