This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Consistently enable __CORRECT_ISO_CPP_WCHAR_H_PROTO in mbstate.
ClosedPublic

Authored by rupprecht on May 5 2023, 7:16 PM.

Details

Summary

In libc++'s wchar.h, before we forward to the system wchar.h, we set __CORRECT_ISO_CPP_WCHAR_H_PROTO to ensure it defines the correct signature (e.g. extern "C++" const wchar_t *wmemchr and not extern wchar_t *wmemchr). After D148542, there are cases where we include the system wchar.h from within __mbstate_t.h without setting that, and so we get a function type mismatch if we transitively include wchar.h through multiple headers in a modules-enabled build. Consistently setting it here resolves those build errors.

Alternative 1: we could put this in __config instead. I chose to put it here for a more limited scope.

Alternative 2: we could patch wchar.h itself to work correctly and remove references __CORRECT_ISO_CPP_WCHAR_H_PROTO from libc++ entirely. It does already set it, but with an additional condition that it is being built by GCC >= 4.4. Clang does pretend to be GCC via __GNUC__ etc. which can be controlled via -fgnuc-version command line flags, but that might have other consequences.

Diff Detail

Event Timeline

rupprecht created this revision.May 5 2023, 7:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 7:16 PM
rupprecht requested review of this revision.May 5 2023, 7:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2023, 7:16 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
dxf added a subscriber: dxf.May 5 2023, 7:21 PM

I can't reproduce this outside of a modules-enabled build, and unfortunately I don't know how to write a modules test case. Anyway, here's the stderr I see:

wchar.h:316:29: error: functions that differ only in their return type cannot be overloaded
extern "C++" const wchar_t *wmemchr (const wchar_t *__s, wchar_t __c,
                   ~~~~~~~~~^
wchar.h:320:17: note: previous declaration is here
extern wchar_t *wmemchr (const wchar_t *__s, wchar_t __c, size_t __n)
       ~~~~~~~~~^

Those two definitions are the difference between __CORRECT_ISO_CPP_WCHAR_H_PROTO being defined vs not defined.

MaskRay accepted this revision.EditedMay 5 2023, 9:17 PM
MaskRay added a subscriber: MaskRay.

D148542 introduced a bug.

  • iosfwd includes __std_mbstate_t.h, which includes __mbstate_t.h, which includes glibc wchar.h without defining __CORRECT_ISO_CPP_WCHAR_H_PROTO.
  • cwchar includes wchar.h, which defines __CORRECT_ISO_CPP_WCHAR_H_PROTO and then includes glibc wchar.h.

If glibc wchar.h doesn't define __CORRECT_ISO_CPP_WCHAR_H_PROTO itself, we will get mismatching declarations.
Before 2019-12, glibc wchar.h didn't define __CORRECT_ISO_CPP_WCHAR_H_PROTO (https://sourceware.org/bugzilla/show_bug.cgi?id=25232).

The patch looks good to me. I think we need the #define __CORRECT_ISO_CPP_WCHAR_H_PROTO everywhere before # include_next <wchar.h>.

I have some write-up last year: https://maskray.me/blog/2022-05-15-c-standard-library-headers-in-c++

Thanks for working on this. I'm not too familiar with this part, so I really would like @ldionne to have a look too.

libcxx/include/__mbstate_t.h
33

This looks wrong. When users include both wchar.h and uchar.h the system may try to define __CORRECT_ISO_CPP_WCHAR_H_PROTO twice.

MaskRay added inline comments.May 6 2023, 12:05 PM
libcxx/include/__mbstate_t.h
33

This is fine. For

#  define __CORRECT_ISO_CPP_WCHAR_H_PROTO
#  define __CORRECT_ISO_CPP_WCHAR_H_PROTO

The two replacement lists (empty) are identical and are allowed. Actually, this is what will happen with a glibc newer than 2019-12 (https://sourceware.org/bugzilla/show_bug.cgi?id=25232). glibc wchar.h defines the macro __CORRECT_ISO_CPP_WCHAR_H_PROTO as well.

MaskRay added a comment.EditedMay 6 2023, 4:53 PM

The below demonstrate the glibc<2.26 breakage caused by D148542.

glibc before 2.26 did not provide [bits/types/mbstate_t.h](https://sourceware.org/git/?p=glibc.git;a=commit;h=199fc19d3aaaf57944ef036e15904febe877fc93).
If we include iosfwd before cwchar, we will include glibc wchar.h without defining __CORRECT_ISO_CPP_WCHAR_H_PROTO and will get the "incorrect" declarations.

#ifdef X
#include <iosfwd>
#include <cwchar>
#else
#include <cwchar>
#include <iosfwd>
#endif
int main() {
  wchar_t *v1;
  const wchar_t *cv2 = L"/";
  v1 = wcsstr(cv2, L"/");  // should fail
}
% clang --sysroot=glibc-2.25 -stdlib=libc++ -c test.cc
test.cc:11:8: error: assigning to 'wchar_t *' from 'const wchar_t *' discards qualifiers
  v1 = wcsstr (cv2, L"/");
       ^~~~~~~~~~~~~~~~~~
1 error generated.
% clang --sysroot=glibc-2.25 -stdlib=libc++ -c test.cc -DX  # no error

If we change the libc++ module.modulemap file (in my build, /tmp/Rel/include/c++/v1/module.modulemap) to be a textual header, or just remove the entry from the modulemap file,

module __mbstate_t         { private textual header "__mbstate_t.h"       export * }

the following command will fail. (Yes, we use -x c++ even when compiling a modulemap file, otherwise Clang will consider the modulemap file an object file for linking.)

% clang --sysroot=glibc-2.25 -stdlib=libc++ -c -Xclang -emit-module -fmodules -fmodule-name=std -Xclang=-fmodules-local-submodule-visibility -xc++ /tmp/Rel/include/c++/v1/module.modulemap -o std.pcm
...
In file included from /tmp/Rel/bin/../include/c++/v1/__mbstate_t.h:35:
glibc-2.25/include/wchar.h:227:17: error: functions that differ only in their return type cannot be overloaded
extern wchar_t *wcschr (const wchar_t *__wcs, wchar_t __wc)
       ~~~~~~~~~^
glibc-2.25/include/wchar.h:224:29: note: previous declaration is here
extern "C++" const wchar_t *wcschr (const wchar_t *__wcs, wchar_t __wc)
                   ~~~~~~~~~^

IMO we should just #include <wchar.h> and <uchar.h> instead. That way we don't have to do anything funky in multiple headers. In fact, our wchar.h fixes up some interfaces for non-C++-friendly C libraries.

Mordante added inline comments.May 7 2023, 3:38 AM
libcxx/include/__mbstate_t.h
33

Ah yes good point.

bgraur added a subscriber: bgraur.May 8 2023, 2:45 AM

IMO we should just #include <wchar.h> and <uchar.h> instead. That way we don't have to do anything funky in multiple headers. In fact, our wchar.h fixes up some interfaces for non-C++-friendly C libraries.

The libc++ wchar.h file includes __mbstate_t.h though:

// In libcxx/include/wchar.h:
#  if __has_include_next(<wchar.h>)
#    include_next <wchar.h>
#  else
#    include <__mbstate_t.h> // make sure we have mbstate_t regardless of the existence of <wchar.h>
#  endif

Wouldn't that cause a circular reference between mbstate and wchar? Sorry, this kind of layering between different headers is beyond my expertise.
That said, I did try it, and this works for me:

#elif !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS) && __has_include(<wchar.h>)
#   include <wchar.h> // fall back to the C standard provider of mbstate_t

Just not sure if that is supposed to work generally, and it's a less trivial change than I was planning to make. If that's what you meant, I can update the patch.

The below demonstrate the glibc<2.26 breakage caused by D148542.

glibc before 2.26 did not provide [bits/types/mbstate_t.h](https://sourceware.org/git/?p=glibc.git;a=commit;h=199fc19d3aaaf57944ef036e15904febe877fc93).
If we include iosfwd before cwchar, we will include glibc wchar.h without defining __CORRECT_ISO_CPP_WCHAR_H_PROTO and will get the "incorrect" declarations.

#ifdef X
#include <iosfwd>
#include <cwchar>
#else
#include <cwchar>
#include <iosfwd>
#endif
int main() {
  wchar_t *v1;
  const wchar_t *cv2 = L"/";
  v1 = wcsstr(cv2, L"/");  // should fail
}

Thanks, that repro explains the problem this patch is fixing and works outside of a modules build.

libcxx/include/__mbstate_t.h
33

If preferred, I suppose we could write it as this:

#ifdef __cplusplus && !defined(__CORRECT_ISO_CPP_WCHAR_H_PROTO)
#  define __CORRECT_ISO_CPP_WCHAR_H_PROTO
#endf

But I agree with Ray that it should be fine to leave it as-is; the duplicate identical definitions shouldn't conflict.

rupprecht updated this revision to Diff 520827.May 9 2023, 2:51 PM
  • Use #include instead of setting __CORRECT_ISO_CPP_WCHAR_H_PROTO again

IMO we should just #include <wchar.h> and <uchar.h> instead. That way we don't have to do anything funky in multiple headers. In fact, our wchar.h fixes up some interfaces for non-C++-friendly C libraries.

Updated the patch to do this instead. PTAL.

MaskRay added inline comments.May 9 2023, 3:08 PM
libcxx/include/__mbstate_t.h
34

Is __has_include(<wchar.h>) always true? libc++ ships a copy of wchar.h that dispatches to the C library wchar.h (through #include_next <wchar.h>).

Does any existing test fail when we run the test suite with glibc-2.25, or do we need to add a test case for this?

I think the fix we're looking for is simply to do:

#ifdef __cplusplus
#  define __CORRECT_ISO_CPP_WCHAR_H_PROTO
#endif

// current if-else chain
libcxx/include/__mbstate_t.h
34

Right, the problem is that __has_include(<wchar.h>) is always true because libc++ ships its own.

rupprecht updated this revision to Diff 520982.May 10 2023, 6:59 AM

Restore to v1 of the patch

Does any existing test fail when we run the test suite with glibc-2.25, or do we need to add a test case for this?

No existing test case that I'm aware of. Let me see if the repro Ray posted can be added to the test suite.

I think the fix we're looking for is simply to do:

#ifdef __cplusplus
#  define __CORRECT_ISO_CPP_WCHAR_H_PROTO
#endif

// current if-else chain

You're in luck, that was the original version of this patch :)

rupprecht updated this revision to Diff 520991.May 10 2023, 7:52 AM
  • Add some tests, and temporarily comment out the fix to see if CI catches it.
ldionne added inline comments.May 10 2023, 9:03 AM
libcxx/test/std/strings/c.strings/cwchar_include_order1.compile.verify.cpp
1 ↗(On Diff #520991)

I think linking to the bug report somewhere in the tests would be great!

  • Add XFAIL: no-wide-characters to fix CI, and update the comment with bug details
rupprecht marked an inline comment as done.May 10 2023, 10:12 AM

Sadly, when I commented out the fix & ran it through CI, the only failure was on the "no wide characters" bot because I didn't have the right XFAIL line for that. I guess we don't have any bots with an old enough glibc?

I accidentally uploaded two tests with the same include ordering, but they both should have caused a test failure. I'm going to run it through one more time to see if a second CI run catches it, and then I'll un-comment the fix.

libcxx/test/std/strings/c.strings/cwchar_include_order1.compile.verify.cpp
1 ↗(On Diff #520991)

I didn't have one filed yet, so I created https://llvm.org/PR62638 and linked it here for context.

iana added inline comments.May 10 2023, 10:20 AM
libcxx/include/__mbstate_t.h
34

Can we add a comment to say why we always define this, and not just if we're going to include_next wchar.h?

rupprecht marked an inline comment as done.
  • Add a comment explaining __CORRECT_ISO_CPP_WCHAR_H_PROTO, and uncomment the actual fix now that I'm done testing CI
iana added inline comments.May 10 2023, 12:32 PM
libcxx/include/__mbstate_t.h
30–33

Can you also mention why this isn't done only in the #elif !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS) && __has_include_next(<wchar.h>) branch? It looks like that macro should only affect <wchar.h>, but I think you were saying that it's also needed for <bits/types/mbstate_t.h> I think?

ldionne accepted this revision.May 10 2023, 12:42 PM

The patch makes sense to me in its current state. If it fixes your issues and the CI is green, feel free to go ahead.

This revision is now accepted and ready to land.May 10 2023, 12:42 PM
This revision was landed with ongoing or failed builds.May 10 2023, 3:43 PM
This revision was automatically updated to reflect the committed changes.
rupprecht added inline comments.May 10 2023, 4:01 PM
libcxx/include/__mbstate_t.h
30–33

(just noticed this comment too, right after submitting)

I don't think it matters much one way or the other -- I'm fairly certain that _LIBCPP_HAS_NO_WIDE_CHARACTERS only affects wchar.h, and none of the other include/include_next entries here will transitively include wchar.h, so it would be safe to put it within the if/elif chain. It's slightly more readable to have it outside of the block IMHO, and I don't think there's any harm in having it there -- it should be safe to even have it in libc++'s __config header.

But anyway, proposed D150322 to do that. Happy to land that if you'd prefer that.