This is an archive of the discontinued LLVM Phabricator instance.

Make libcxx work according to Clang C++ Status if -fchar8_t is passed
AbandonedPublic

Authored by ldionne on Nov 27 2020, 12:39 AM.

Details

Reviewers
mclow.lists
EricWF
miscco
georgthegreat
Group Reviewers
Restricted Project
Summary

At the time libcxx does not work as expected with -std=c++17 -fchar8_t.

In my POV, this option was indended to enable the entire P0482, as it has landed into the standard.

At least Clang C++ Status says so:

Prior to Clang 8, this feature (i. e. entire proposal – Yuriy) is not enabled by -std=c++20, but can be enabled with -fchar8_t.

This PR makes libcxx to define u8string / u8string_view regarding the standard feature flag, disregarding wrether the standard was switched to C++20.

Diff Detail

Event Timeline

georgthegreat created this revision.Nov 27 2020, 12:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 27 2020, 12:40 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
georgthegreat requested review of this revision.Nov 27 2020, 12:40 AM
georgthegreat added inline comments.Nov 27 2020, 12:42 AM
libcxx/include/__config
956

As of this PR, _LIBCPP_NO_HAS_CHAR8_T became a simple negation of __cpp_char8_t.

The configuration might become easier to read upon replacing all invocation of these options with feature testing macro.

georgthegreat retitled this revision from Make libcxx work as expected with -fchar8_t to Make libcxx work according to clang user manual if -fchar8_t is passed.Nov 27 2020, 12:44 AM
georgthegreat edited the summary of this revision. (Show Details)
georgthegreat retitled this revision from Make libcxx work according to clang user manual if -fchar8_t is passed to Make libcxx work according to Clang C++ Status if -fchar8_t is passed.Nov 27 2020, 12:56 AM
georgthegreat edited the summary of this revision. (Show Details)
miscco accepted this revision.Nov 27 2020, 1:33 AM
miscco added a subscriber: miscco.

I am super in favor of using feature macros everywhere. That said I guess we need to thumbs up from the maintainers

NOTE: I am not a maintainer, so please wait for an approval of one
georgthegreat edited the summary of this revision. (Show Details)Nov 27 2020, 1:52 AM

Another thing to consider is that -fchar8_t changes the behavior of u8"" literals, hence they can no longer be assigned to old good std::string-types.
At the time libcxx provides no alternative.

ldionne requested changes to this revision.Nov 27 2020, 7:15 AM
ldionne added a subscriber: ldionne.

So.. my issue with this is that it's a step towards enabling C++20 features in C++17 mode, which is just a bad idea to begin with. We have plenty of problems caused by trying to support features in older standards, and I just don't see the point of it. If you need char8_t, just use -std=c++20 -- what's the issue?

So, actually, a change that would be more in line with what I'd like is instead:

#if _LIBCPP_STD_VER <= 17

Then, we'd assume that if you're building as C++20, then you do have char8_t support. This would only break on older compilers, which we currently claim to support but don't actually support properly (we don't test on a regular basis, etc).

I'm definitely willing to have a conversation about this, though, so I'm eager to hear what you have to say about the above.

This revision now requires changes to proceed.Nov 27 2020, 7:15 AM

@ldionne, this PR appeared as a part of a large monorepo migration to a new standard.
If I just change -std=c++17 to -std=c++20, thousands of new errors appear (even upon switching off all the major deprecation warnings), so I am trying to split the switch into smaller parts and fix them one by one.
New standard changes behavior of u8"" literals in an incompatible way, so I have to remove these literals from the codebase and prevent new std::string = u8"blah" lines from appearance.

So, I am going to apply this change to a local copy of libcxx, no matter if this diff will land into upstream. I just can not think of any other solution. Switching entire codebase in one commit is not an available option.

Now, to the changes itself.
I think that WG21 invented feature testing macros for a reason.
At the time checking _LIBCPP_STD_VER just does not work, because it corresponds to a different set of features, depending on which compiler is used (i. e. clang12 will implement more of C++20 compared to clang11).
IMO, an ideal implementation must not check _LIBCPP_STD_VER, but rather check if a specific feature is available and enabled in the current compiler, no matter how it was switched on.

NB: in the frontend world of javascript / css, a similar technique, codenamed as feature detection is widely used.
See, e. g.:

https://en.wikipedia.org/wiki/Feature_detection_(web_development)
https://developer.mozilla.org/en-US/docs/Learn/Tools_and_testing/Cross_browser_testing/Feature_detection

Found the official explanation and motivation regarding the feature testing macros.

It lacks a proper distinction between compiler and standard library, simply using term Implementers. However, as I understand it:

C++ compiler defines (via some internal mechanism, invisible in the code being compiled) __cpp_char8_t, if corresponding proposal is supported (that is, if char8_t is available, is a distinct type and so on).
C++ standard library defines __cpp_lib_char8_t if it implements the library part of the corresponding proposal (that is, if std::u8string{,_view} typedefs are available). __cpp_lib_char8_t requires _cpp_char8_t to be defined.
C++ programmer must check the availability of u8string_view by checking __cpp_lib_char8_t, not by checking the standard version.

georgthegreat requested review of this revision.Nov 29 2020, 10:22 AM
mclow.lists added inline comments.Nov 30 2020, 8:37 AM
libcxx/include/__config
956

I'm a big fan of keeping the libc++-specific macro.

miscco added inline comments.Nov 30 2020, 8:56 AM
libcxx/include/__config
956

Could you name a benefit that vendor specific macros have over the universal ones?

mclow.lists added inline comments.Nov 30 2020, 10:04 AM
libcxx/include/__config
956

Yes.

  1. It's more readable.
  2. Until this patch, it was different from __cpp_char8_t. It may need to be so again in the future.
  1. My thoughts are quite different. This macro requires double negation upon usage: #ifndef _LIBCPP_NO_HAS_CHAR8_T. I personally find troubles reading any negation. #ifdef __cpp_char8_t is much shorter and easier to understand. And it is standartized!

As for the 2, I can not understand why libcxx should have any logic depending on the compiler feature support. Could you elaborate / invent artificial scenario when such logic is necessary?

Could you elaborate / invent artificial scenario when such logic is necessary?

I don't have to invent any such logic. It's right there in the existing code.

and if "and it is standardized", that's a moot point. The question is "what is the most useful way for libc++"?

And I believe that using _LIBCPP_NO_HAS_CHAR8_T, like it does with _LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER, _LIBCPP_HAS_NO_LONG_LONG, _LIBCPP_HAS_NO_NULLPTR, _LIBCPP_HAS_NO_AUTO_TYPE, _LIBCPP_HAS_NO_CXX14_CONSTEXPR, _LIBCPP_HAS_NO_VARIABLE_TEMPLATES, _LIBCPP_HAS_NO_NOEXCEPT, _LIBCPP_HAS_NO_ASAN, _LIBCPP_HAS_NO_CXX20_CHRONO_LITERALS, _LIBCPP_HAS_NO_CXX14_CONSTEXPR, _LIBCPP_HAS_NO_VARIABLE_TEMPLATES, _LIBCPP_HAS_NO_ASAN, _LIBCPP_HAS_NO_CXX14_CONSTEXPR, _LIBCPP_HAS_NO_VARIABLE_TEMPLATES, _LIBCPP_HAS_NO_ASAN, _LIBCPP_HAS_NO_VECTOR_EXTENSION, _LIBCPP_HAS_NO_UNICODE_CHARS, _LIBCPP_HAS_NO_VARIABLE_TEMPLATES, _LIBCPP_HAS_NO_ASAN, _LIBCPP_HAS_NO_VECTOR_EXTENSION, _LIBCPP_HAS_NO_INT128, _LIBCPP_HAS_NO_LIBRARY_ALIGNED_ALLOCATION, _LIBCPP_HAS_NO_LIBRARY_ALIGNED_ALLOCATION, _LIBCPP_HAS_NO_LIBRARY_ALIGNED_ALLOCATION, _LIBCPP_HAS_NO_GLOBAL_FILESYSTEM_NAMESPACE, _LIBCPP_HAS_NO_STDIN, _LIBCPP_HAS_NO_STDOUT, _LIBCPP_HAS_NO_THREAD_UNSAFE_C_FUNCTIONS, _LIBCPP_HAS_NO_ATOMIC_HEADER, _LIBCPP_HAS_NO_BUILTIN_ADDRESSOF, _LIBCPP_HAS_NO_BUILTIN_IS_CONSTANT_EVALUATED, _LIBCPP_HAS_NO_OFF_T_FUNCTIONS, _LIBCPP_HAS_NO_DEDUCTION_GUIDES, _LIBCPP_HAS_NO_IS_AGGREGATE, _LIBCPP_HAS_NO_COROUTINES, _LIBCPP_HAS_NO_SPACESHIP_OPERATOR, _LIBCPP_HAS_NO_PRAGMA_PUSH_POP_MACRO, _LIBCPP_HAS_NO_FGETPOS_FSETPOS (and others).

mclow.lists added a comment.EditedNov 30 2020, 11:57 AM

(reading back)

So.. my issue with this is that it's a step towards enabling C++20 features in C++17 mode, which is just a bad idea to begin with.

I have to concur with Louis here. P0482 was adopted for C++20. Not 17, not 14, not 11, not 03.
We (libc++) should provide it for C++20.

And we should be careful to define __cpp_lib_char8_t only when we provide the feature (which we do, based on _LIBCPP_NO_HAS_CHAR8_T)

Just fix the logical expression without introducing radical changes to libc++ logic

ldionne commandeered this revision.Dec 1 2020, 11:50 AM
ldionne edited reviewers, added: georgthegreat; removed: ldionne.

@ldionne, this PR appeared as a part of a large monorepo migration to a new standard.
If I just change -std=c++17 to -std=c++20, thousands of new errors appear (even upon switching off all the major deprecation warnings), so I am trying to split the switch into smaller parts and fix them one by one.

The usual way I've seen migrations being done is by fixing the code such that it works both in C++17 and C++20. You iterate locally, fixing issues and committing them until the whole code base compiles both as C++17 and C++20. You then switch the compiler to -std=c++20, and can get rid of any C++17 specific workarounds you introduced.

Requiring the compiler and standard library to provide some sort of mode -std=c++17-but-with-char8_t is the wrong way to do this. What if you did not even have the -fchar8_t flag? Would you implement it just to ease your transition? The bottom line is that we don't want to backport arbitrary C++20 features to C++17 to ease code base migrations -- that's just not the right way to make these migrations.

If you're wondering why I'm having such a strong stance, take a look at libc++'s attempt to provide C++11 features in C++03. While it had value initially and I understand why that choice was made at the beginning, it has become a terrible mess that we're stuck with. There's bugs and ABI incompatibilities in it, and it adds a lot of complexity for relatively little benefit since C++11 has been around for almost 10 years. Of course, that is made worse by the fact that C++03 and C++11 were greatly different (so 11 features were difficult to implement in 03), however the principle is the same. The library and language are evolved as a coherent whole, and trying to pull features in or out of a given standard mode is always a challenge, and it adds a maintenance burden.

I think that WG21 invented feature testing macros for a reason.

Indeed, but also note that feature test macros were a highly contentious issue, and not all implementers were very favourable to their introduction. But they were standardized, and so libc++ does implement them. However, implementing feature test macros does not imply that the library should start giving meaning to arbitrary combinations of flags (in this case -std=c++17 -fchar8_t). The trap we really want to avoid is what I illustrated above with C++11 features in C++03 mode.

I think that using the compiler's feature test macros in libc++ to conditionally enable library features is reasonable in some cases (e.g. when the library feature is fairly self contained, etc), and for limited times (e.g. until the oldest compiler we support implements the feature). Once our support window moves and the oldest compiler we support definitely supports the feature in the right standard mode, I think it's a lot better to get rid of these conditionals and instead assume the compiler feature works when the right standard is set. This reduces the complexity of the code by removing #ifdefs, which is the source of a lot of our woes in a codebase like libc++. It also reduces the strain on the test suite, which has to be annotated for everything that we conditionally support.

Other unrelated comments for the next time:

  1. Please make sure the LLVM monorepo is added to your Phabricator revision, otherwise CI doesn't kick in.
  2. Please add context to the diff you upload to Phabricator.

I'm going to commandeer this revision to suggest an approach that should solve your issue, while being OK with me (I can't speak for Marshall, though). Basically, I will:

  1. Conditionalize __cpp_lib_char8_t on whether __cpp_char8_t is defined, meaning the compiler supports the feature. I won't look at the Standard version.
  2. Use __cpp_lib_char8_t consistently instead of _LIBCPP_NO_HAS_CHAR8_T -- my personal opinion is that using standardized macros when we can is easier to grok than libc++ defining its own. This seems to be an opinion shared by not all, but some people.
  3. Once we've straightened our compiler requirements (which are currently way outdated), I will switch to defining __cpp_lib_char8_t solely based on the version of __cplusplus, and will update/simplify the conditionals that use __cpp_lib_char8_t inside the library whenever possible.

I'd like to know what you folks think about this plan.

ldionne set the repository for this revision to rG LLVM Github Monorepo.Dec 1 2020, 11:50 AM
ldionne updated this revision to Diff 308728.Dec 1 2020, 11:51 AM
ldionne edited the summary of this revision. (Show Details)

Implement my suggestion -- please let me know if that works with everyone.

georgthegreat accepted this revision.Dec 1 2020, 12:44 PM

You iterate locally, fixing issues and committing them until the whole code base compiles both as C++17 and C++20.

This was the initial plan. The problem arises with u8 literals, as C++20 changes behavior of u8"" literals in an incompatible way. Hence there is no way to use these literals in a codebase which is compatible with both C++17 and C++20. My idea was to remove the literals once, then enable -fchar8_t feature flag to prevent new improperly used literals from appearance.

I have implemented the test to check if everything works and it turned out that is just does not.

The bottom line is that we don't want to backport arbitrary C++20 features to C++17.

I am sorry if this PR appeared to suggest such thing.

My intention was to show that simply testing __cplusplus value is not enough to test if certain feature is supported by compiler or not (I assume that if certain compiler does not fully support specific feature, the library should not implement this feature for this compiler). Though clang11 supports -std=c++20 , it does not support coroutines, modules, operator spaceship and so on. Should the library (try to) implement these feature based on __cplusplus value? I think it should not. So, the options are (values are :

#if __cpluscplus > 202002L && clang_version_major > 12 && clang_version_minor > 1 //clang versions prior 12.1 do not implement feature support

//continue testing every known compiler in the world

or

#if defined(__cpp_feature) && __cpp_feature > 201702 // assume that compiler follows the standard in preprocessor macro definitions

I would choose the latter without making attempts to switch back to testing __cplusplus. There should be a single source of truth.
I am fine with the changes (and, as I said, non-applying them would not block the migration process; I will apply the diff which solves my issue to a local copy of libcxx).

You iterate locally, fixing issues and committing them until the whole code base compiles both as C++17 and C++20.

This was the initial plan. The problem arises with u8 literals, as C++20 changes behavior of u8"" literals in an incompatible way. Hence there is no way to use these literals in a codebase which is compatible with both C++17 and C++20. My idea was to remove the literals once, then enable -fchar8_t feature flag to prevent new improperly used literals from appearance.

I have implemented the test to check if everything works and it turned out that is just does not.

Did you try something like what https://stackoverflow.com/a/56833374/627587 suggests?

I would choose the latter without making attempts to switch back to testing __cplusplus. There should be a single source of truth.

The issue with keeping these conditionals based on __cpp_feature instead of the actual standard version is that in the long term, you end up with a lot of these conditionals. There's then an expectation that the code base works with arbitrary combinations of those being defined, which isn't realistic. Instead, if the majority of conditionals use the standard version, you know for sure that those code paths are being tested, and they're working as intended. It also reduces the cognitive burden since you can take for granted that -std=c++20 means "everything in C++20 will work", with very few exceptions.

I am fine with the changes (and, as I said, non-applying them would not block the migration process; I will apply the diff which solves my issue to a local copy of libcxx).

I think it's fairly important for us to decide which way we're taking the project, since similar questions will happen in the future. We're not only having a discussion about char8_t here, we're having a design discussion about how to handle feature test macros defined by the compiler.

Did you try something like what https://stackoverflow.com/a/56833374/627587 suggests?

I did not, but as the world is larger that std::string / std::string_view, every text processing function will need a duplicate.
Hence I did not consider this to be a working solution.

I think it's fairly important for us to decide which way we're taking the project, since similar questions will happen in the future. We're not only having a discussion about char8_t here, we're having a design discussion about how to handle feature test macros defined by the compiler.

Could you, please, fix the decisions made in libcxx/docs/DesignDocs or somewhere around?

I think this is the wrong direction - getting rid of _LIBCPP_NO_HAS_CHAR8_T
We're making the feature macro __cpp_lib_char8_t always match __cpp_char8_t
There are two macros for a reason.

I think this is the wrong direction - getting rid of _LIBCPP_NO_HAS_CHAR8_T
We're making the feature macro __cpp_lib_char8_t always match __cpp_char8_t
There are two macros for a reason.

Just to make sure I understand, your preference would be this, right? Please correct me if I'm wrong:

#if _LIBCPP_STD_VER > 17 && defined(__cpp_char8_t)
#  define __cpp_lib_char8_t
#endif

In other words, define the library parts of char8_t only when the standard is >= C++20 *and* the compiler supports char8_t. Is that correct? That would basically be the status quo.

Like I said above, I do agree strongly that we shouldn't try to enable C++20 features in C++17, however in this case the compiler has already decided to make this possible (by providing -fchar8_t that can be enabled in C++17 mode), for compatibility reasons documented in the paper. In this instance, I believe that it's worse to have the library be inconsistent with the compiler than to digress from the design principle of not backporting stuff. Thoughts?

Consider the following program:

#include <iostream>

int main()
{
#ifdef __cpp_char8_t
	std::cout <<  __cpp_char8_t << std::endl;
#endif
}

Now, build it five times:

clang++ -std=c++2a -fchar8_t junk.cpp && ./a.out
clang++ -std=c++17 -fchar8_t junk.cpp && ./a.out
clang++ -std=c++14 -fchar8_t junk.cpp && ./a.out
clang++ -std=c++11 -fchar8_t junk.cpp && ./a.out
clang++ -std=c++03 -fchar8_t junk.cpp && ./a.out

What output do you get?

btw, I am ok to limit this behavior to -std=c++17. Condition in <version> might be changed as follows:

# #if _LIBCPP_STD_VER >= 17 && defined(__cpp_char8_t)
#   define __cpp_lib_char8_t                            201811L
# endif

Thus the weird case -std=c++20 -fnochar8_t will be somewhat supported.

ldionne abandoned this revision.Dec 3 2020, 9:38 AM

After speaking with Marshall offline, I am now of the opinion that we don't want to do this, in any form.

  • We don't want to support char8_t in -std=c++17 or older standards, ever. That's according to our policy.
  • We do support -std=c++2a -fno-char8_t, via the check for whether the compiler supports char8_t (using __cpp_char8_t). That supports the migration story, where someone migrates to C++20 by using -std=c++2a but can't migrate the char8_t part, so they use -std=c++2a -fno-char8_t.

Whether we should replace _LIBCPP_HAS_NO_CHAR8_T by !__cpp_lib_char8_t is a different topic. My personal opinion is that we should, in order to remove custom macros as much as we can, but we're still talking about that.

Should not we remove || !defined(__cpp_char8_t) part of LIBCPP_NO_HAS_CHAR8_T then?
This looks like an attempt to support compilation with -std=c++20 -fnochar8_t which is the most weird mode ever (and it is not tested via CI)

Should not we remove || !defined(__cpp_char8_t) part of LIBCPP_NO_HAS_CHAR8_T then?
This looks like an attempt to support compilation with -std=c++20 -fnochar8_t which is the most weird mode ever (and it is not tested via CI)

On the contrary, the intent is that if you can't migrate to -std=c++20 because char8_t is giving you trouble, then you can use -std=c++20 -fno-char8_t to migrate to C++20 except for the char8_t part. So this is purposefully supported, or at least that's the intent. Does that make sense?

Ok, I got you point.
I would not use this approach for migration though.

char8_t is more or less easy / takes predictable time to fix. I can not say the same about other changes in C++20.