This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
libcxx/
-
include/
1/1
wchar.h
-
src/
-
CMakeLists.txt
-
support/ibm/
-
ibm/
16/16
mbsnrtowcs.cpp
11/11
wcsnrtombs.cpp

Differential D98207

[SystemZ][z/OS] Missing wchar functions libc++
ClosedPublic

Authored by muiez on Mar 8 2021, 11:48 AM.

Download Raw Diff

Details

Reviewers

curdeius
ldionne
Jonathan.Crowther
DanielMcIntosh-IBM
• Quuxplusone

Group Reviewers

Restricted Project

Commits

rG4247381e26dd: [SystemZ][z/OS] Missing wchar functions libc++
rGfebbf68b423b: [SystemZ][z/OS] Missing wchar functions libc++

Summary

The aim is to add the missing z/OS specific implementations for mbsnrtowcs and wcsnrtombs, as part of libc++.

Diff Detail

Event Timeline

muiez created this revision.Mar 8 2021, 11:48 AM

Herald added a subscriber: mgorny. · View Herald TranscriptMar 8 2021, 11:48 AM

muiez requested review of this revision.Mar 8 2021, 11:48 AM

Harbormaster completed remote builds in B92720: Diff 329086.Mar 8 2021, 2:06 PM

If I understand correctly, these are almost the same as the files in support/solaris/ but:

without the _l suffix, and hence without the loc parameter
calling functions without _l and without loc
not calling FIX_LOCALE(loc;)

libcxx/include/wchar.h
178–179	Could you please fix this comment while here?

In D98207#2613281, @curdeius wrote:

If I understand correctly, these are almost the same as the files in support/solaris/ but:

without the _l suffix, and hence without the loc parameter

calling functions without _l and without loc

not calling FIX_LOCALE(loc;)

Yes, that is correct.

Fixed comment

muiez marked an inline comment as done.Mar 9 2021, 6:20 AM

Harbormaster completed remote builds in B92866: Diff 329315.Mar 9 2021, 7:57 AM

Ping :)

LGTM.

This revision is now accepted and ready to land.Mar 12 2021, 11:34 AM

Peanut gallery says: What's the significance of the .inc file extension (as opposed to .cpp)?
(Either way, I'm definitely not qualified to mark this as accepted, neither on the z/OS front nor on the is-this-license-acceptable front.)

FYI, this has a precedence in support/solaris. The licence is the same. The files differ only by _l suffix and so the use (or not) of locale paremeter.

Closed by commit rGfebbf68b423b: [SystemZ][z/OS] Missing wchar functions libc++ (authored by muiez). · Explain WhyMar 12 2021, 12:41 PM

This revision was automatically updated to reflect the committed changes.

muiez added a commit: rGfebbf68b423b: [SystemZ][z/OS] Missing wchar functions libc++.

Herald added a project: Restricted Project. · View Herald TranscriptMar 12 2021, 12:41 PM

Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript

Herald added a subscriber: libcxx-commits. · View Herald Transcript

Sorry for not having said that before, but for the next time please wait for a libc++ group approval before landing.

In D98207#2623292, @curdeius wrote:

Sorry for not having said that before, but for the next time please wait for a libc++ group approval before landing.

Thanks for the note. I saw 'This revision is now accepted and ready to land' and landed after that. I'll keep this in mind for next time.

Sorry, but we can't take this contribution unless it is made under the LLVM license. I'm going to have to revert this. I'll give you 24h to reply to this before reverting.

@ldionne, then something needs to be done with support/solaris that has almost the same license, no?
https://github.com/llvm/llvm-project/tree/main/libcxx/src/support/solaris

In D98207#2629651, @curdeius wrote:

@ldionne, then something needs to be done with support/solaris that has almost the same license, no?
https://github.com/llvm/llvm-project/tree/main/libcxx/src/support/solaris

Yes indeed, I believe those would need to be LLVM licensed. Those support files were added back in 2012 and there's a README explaining it (https://github.com/llvm/llvm-project/blob/main/libcxx/src/support/solaris/README), so I'm not sure what the thought was at the time.

However, regarding new contributions at least, the LLVM developer policy is pretty clear about what licensing is acceptable: https://llvm.org/docs/DeveloperPolicy.html#copyright-license-and-patents.

In D98207#2629675, @ldionne wrote:

In D98207#2629651, @curdeius wrote:

@ldionne, then something needs to be done with support/solaris that has almost the same license, no?
https://github.com/llvm/llvm-project/tree/main/libcxx/src/support/solaris

Yes indeed, I believe those would need to be LLVM licensed. Those support files were added back in 2012 and there's a README explaining it (https://github.com/llvm/llvm-project/blob/main/libcxx/src/support/solaris/README), so I'm not sure what the thought was at the time.

However, regarding new contributions at least, the LLVM developer policy is pretty clear about what licensing is acceptable: https://llvm.org/docs/DeveloperPolicy.html#copyright-license-and-patents.

Thanks for the notice. After looking through the LLVM developer policy, there is the following statement:

The scope of relicensing is all code that is considered part of the LLVM project, including the main LLVM repository, runtime libraries (compiler_rt, OpenMP, etc), Polly, and all other subprojects. There are a few exceptions:

Code imported from other projects (e.g. Google Test, Autoconf, etc) will remain as it is. This code isn’t developed as part of the LLVM project, it is used by LLVM.

The copyright in this patch (and in solaris) mentions that the code is from the FreeBSD libc project. Maybe this falls into the bullet above? If not, are there any suggestions/alternatives to move forward?

ldionne added a subscriber: chandlerc.Mar 18 2021, 9:56 AM

In D98207#2629795, @muiez wrote:

[...]

Thanks for the notice. After looking through the LLVM developer policy, there is the following statement:

The scope of relicensing is all code that is considered part of the LLVM project, including the main LLVM repository, runtime libraries (compiler_rt, OpenMP, etc), Polly, and all other subprojects. There are a few exceptions:

Code imported from other projects (e.g. Google Test, Autoconf, etc) will remain as it is. This code isn’t developed as part of the LLVM project, it is used by LLVM.

The copyright in this patch (and in solaris) mentions that the code is from the FreeBSD libc project. Maybe this falls into the bullet above? If not, are there any suggestions/alternatives to move forward?

Indeed, but I read that as talking about third-party projects that we check-in directly into the LLVM monorepo but we don't ever modify (for example Google Benchmark). This is not the same thing here, as arguably libcxx/src/support/ibm/wcsnrtombs.inc is now considered a libc++ file after this patch and we'd modify it if there was a bug in it, for example.

I am not a lawyer, I'm just trying to make sure we're following the LLVM contribution policy. I know @chandlerc has done a lot of work related to licensing, so perhaps he can provide more information here.

Ping. Do you see a solution or should I revert this?

In D98207#2642506, @ldionne wrote:

Ping. Do you see a solution or should I revert this?

I was hoping for @chandlerc to provide insight; however, if that is not the case, it could be reverted. I will try to rewrite the functions.

ldionne added a reverting change: rG1e337b1dd903: [libc++] Revert "[SystemZ][z/OS] Missing wchar functions libc++".Mar 22 2021, 2:36 PM

Thanks. Reverted in:

commit 1e337b1dd90369763ebe6aa4eb3e563c61ddb213
Author: Louis Dionne <ldionne.2@gmail.com>
Date:   Mon Mar 22 17:17:58 2021 -0400

    [libc++] Revert "[SystemZ][z/OS] Missing wchar functions libc++"

    This reverts commit febbf68b423b14d55a14980d2ba3ec37ef1e31dc because it
    added files that were not under the LLVM license.

    See https://reviews.llvm.org/D98207 for details.

FWIW, what I care about is just that the contribution is made under an acceptable license for LLVM, I otherwise have no problems with the patch itself.

I'm going to request changes on this so it shows up properly in the review queue.

This revision now requires changes to proceed.Mar 23 2021, 10:27 AM

Added missing functions under LLVM license (FYI: implementations are similar to those in libcxx/src/support/win32/support.cpp).

Ping (ready for new round of review, FYI @ldionne)

Clang tidy

Harbormaster completed remote builds in B98723: Diff 337499.Apr 14 2021, 12:33 PM

Harbormaster completed remote builds in B98733: Diff 337514.Apr 14 2021, 3:30 PM

Ping :)

Sorry for the nit picking and I know this is coming from existing Windows implementation but we have the opportunity to clean it up.
Most comments refer to both functions.

libcxx/src/support/ibm/mbsnrtowcs.cpp
2	Can you replace first 2 lines with ===----------------------- mbsnrtowcs.cpp -------------------------------=== This will make consistent with majority of other source files.
18	Can you clean up this description so it's easer to read? For example: `and no "out" parameters are updated`. Does it refer to `dst` or `src`? Also if you refer to the parameter name please annotate it to make it stand out.
27	Any reason to keep this commented out line around?
45	Apparently comments need to be proper sentences. Also fix the comment so it actually refer to `char_size` and not result. It looks like result will never be set to >0 value..

Improved readability of comments and documentation.

muiez marked 4 inline comments as done.Apr 26 2021, 11:34 AM

LGTM, just wait for libc++ approval.

libcxx/src/support/ibm/wcsnrtombs.cpp
44	Can you fix this as well?

Addressed comment.

muiez marked an inline comment as done.Apr 26 2021, 1:58 PM

Harbormaster completed remote builds in B100991: Diff 340595.Apr 26 2021, 7:39 PM

Harbormaster completed remote builds in B101016: Diff 340637.Apr 26 2021, 10:02 PM

No changes. Used to kick-off CI again.

Harbormaster completed remote builds in B101165: Diff 340844.Apr 27 2021, 10:12 AM

Ping

muiez added a reviewer: Jonathan.Crowther.May 11 2021, 7:46 AM

Ping, patch is ready for new round of review (@ldionne). Thanks

Ping.

If this is taken from windows, I will point out that, on windows wide characters are only 16 bits (on most systems, wchar_t is 32 bits). This may result in some assumptions in the implementation which aren't true for z/OS. I'm particularly concerned about line 39 potentially writing more than dest_remaining characters (TBH, I'm a little surprised this part works correctly on windows). This could result in writing past the end of dst (i.e. a buffer overflow). This would then be made even worse by dest_remaining -= char_size not setting dest to 0 at that point, and thus the ! dest_remaining check not triggering. Assuming it has the correct license, using the z/OS implementation of wcsrtombs as a reference might be better (unlike wcsnrtombs which is a POSIX extention, wcsrtombs is a part of the C and C++ standards).

If you don't end up re-writing this based on the z/OS implementation of wcsrtombs, I've made several comments regarding how to clean up the control flow. Most are small, but together I think they will result in a significant improvement to readability.

libcxx/src/support/ibm/wcsnrtombs.cpp
36–37	Any reason this can't just be a part of the loop condition? i.e. `while (source_converted != max_source_chars && dest_remaining != 0)`
46–56	As per LLVM coding standards we should probably invert this condition so it is an early-exit. i.e. `if (char_size <= 0) { have_result = true; break; }`
49	This one takes a little bit more work, but I think it's cleaner to return here instead of break. (Also, return is an even earlier exit than break, so I think that would be preferred by the LLVM coding standards). We know control will jump to line 59, and the `if (terminator_found)` on line 60 is only true if we got to line 59 from this `break`. Thus, we can replace line 47 and 48 with if (dst) *src = NULL; return dest_converted; Then, get rid of the `terminator_found` state variable, and simplify lines 59-64. I think this will also probably enable further simplification of the control flow in a few places, but at this point I've recommended so many changes it's hard to keep track.
67–69	If I'm not mistaken, when `char_size < 0`, `have_result` is guaranteed to be true (if the loop doesn't run then `char_size == 0`, and if `char_size <= 0` in the loop, then we set `have_result = true` and immediately exit). Thus, we can get rid of the `have_result` state variable.

This revision now requires changes to proceed.Jun 9 2021, 10:49 AM

Immediately after hitting submit I notice a few more things. Added them as comments too

libcxx/src/support/ibm/mbsnrtowcs.cpp
40–41	This could also probably be made part of the loop condition. (Don't forget to invert it properly when you do)
48	again, char_size is unsigned, so this won't behave as you've described in the error case.
48	As in `wcsnrtombs`, this should also probably be inverted to convert it to an early-exit
51	Probably easier to increment `dst` directly here. You'll have to do so behind an extra `if (dst)`, but in exchange you can get rid of the ternary operator on line 45, and get rid of a state variable.
59	I think that we may be able to get rid of `have_result` here too, but I'm not as sure about this one, since `result` is initialized to 0, and `terminated_sequence == static_cast<size_t>(0)`.
libcxx/src/support/ibm/wcsnrtombs.cpp
45–46	char_size is actually not `-1` but rather `(size_t)-1`, and size_t is an unsigned type, so this test won't behave as you've described in the error case.
54	Since dst is a pointer and not a pointer-to-a-pointer, wouldn't it be easier to just add `char_size` directly to `dst` and do away with `dest_converted`? You don't even need to add an extra `if (dst)` since you have one on line 51

DanielMcIntosh-IBM added inline comments.Jun 9 2021, 12:03 PM

libcxx/src/support/ibm/wcsnrtombs.cpp
54	Never mind this part, I forgot we return `dest_converted`

Updated implementation to fix error handling, account for dst buffer overflow and refactoring.

removed todo comment

muiez marked 5 inline comments as done.Sep 2 2021, 2:26 PM

muiez added inline comments.

libcxx/src/support/ibm/mbsnrtowcs.cpp
51	I'd prefer to not add an extra conditional within the loop due to a performance tradeoff. Also, as mentioned below, `dest_converted` is returned so we need that state variable either way.
59	Yup, by using `result` directly within the loop (since it gets updated), I was able to remove `have_result`.

Harbormaster completed remote builds in B122398: Diff 370407.Sep 2 2021, 3:22 PM

I'm going to get out of your way and remove this from the review queue. The bits in libcxx/include LGTM, and I'll let you folks review the libcxx/src/support/ibm between yourselves cause it's your platform anyways. Once you're all happy with it, you can ship it assuming:

The CI is green (BTW I'm going to request for the nth time that we add a CI bot on z/OS)
You don't make additional modifications outside of libcxx/src/support/ibm without pinging me for a quick look

Thanks!

Minor refactor

Harbormaster completed remote builds in B123048: Diff 371334.Sep 8 2021, 9:02 AM

Check for ps nullptr

As discussed over slack, there is an edge case that needs to be addressed (the case where ps == nullptr on line 55 of wcsnrtombs and/or on line 61 of mbsnrtowcs).
Once that's fixed, LGTM. I've left several comments for a few more ways it could be tidied up, but none of them are functional changes.

libcxx/src/support/ibm/mbsnrtowcs.cpp
30–36	Similar to my comments for wcsnrtombs.cpp, we might want to move some of these so they have a narrower scope. (buff to line 60, mbstate_tmp to line 61, and source_remaining is discussed on line 42)
43	By tweaking this condition a bit we can get rid of `source_remaining` as a state variable: `source_converted < src_size_bytes && (!dst \|\| dest_converted < max_dest_chars)` You may still want to keep `source_remaining` around, but declare it inside the loop so it's no longer used as a state variable (`size_t source_remaining = src_size_bytes - source_converted;`). This also allows you to make it obvious that `source_converted` and `dest_converted` are only declared outside the for loop so that we can read from them after: After moving all the declarations except `terminated_sequence`, `invalid_sequence`, `incomplete_sequence`, `source_converted`, and `dest_converted`, the only non-const variables you've declared before the loop are `source_converted`, and `dest_converted`. If you move the zero-initialization to the init statement of the for loop, this signals that all writes to these two variables happen within the for loop, and after we only read from them (even better, we'd actually be doing all the writing within the init and increment of the loop statement, showing the reader right away what the pre-conditions and post-conditions are for each loop iteration).
49	See my comment in `wcsnrtombs.cpp`.
81	If you move source_remaining into a for loop init statement, this should probably become the increment statement of the for loop. (probably along with the next 2 lines)
libcxx/src/support/ibm/wcsnrtombs.cpp
28–34	Is there some reason we need to preserve all of these across loop iterations? For example, could we move mbstate_tmp's declaration to line 55 to limit the scope, and thereby reduce the number of variables the reader needs to keep track of. What about moving the declaration of char_size to line 42, and buff to line 54?
38–41	If you change this around so it's more like mbsnrtowcs, you can get rid of `dest_remaining` as a state variable: instead of `dest_remaining`, you set `dst_size_bytes = static_cast<size_t>(-1)`, then the loop condition can be changed to `source_converted < max_source_chars && (!dst \|\| dest_converted < dst_size_bytes)`. You may still want to keep `dest_remaining` around, but declare it inside the loop so it's no longer used as a state variable (`size_t dest_remaining = dst_size_bytes - dest_converted;`). This also gets rid of line 87 and 88, allowing you to cleanly turn line 86 and 89 into the increment statement in a for loop. If you move the declarations as I suggested, then you can go one step further and move the zero-initialization of `source_converted` and `dest_converted` to the init statement of the for loop for all the reasons I explained in my comments on line 42 of mbsnrtowcs.
44	I think we could make things a bit more clear here by inverting this if and reducing the level of nesting a little. i.e. if (dst == nullptr) { char_size = wcrtomb(nullptr, c, ps); } else if (dest_remaining >= static_cast<size_t>(MB_CUR_MAX)) { char_size = wcrtomb(dst + dest_converted, c, ps); } else { //... }

This revision now requires changes to proceed.Sep 8 2021, 2:35 PM

Harbormaster completed remote builds in B123111: Diff 371435.Sep 8 2021, 3:39 PM

More refactoring [NFC]

libcxx/src/support/ibm/mbsnrtowcs.cpp
43	Yup, I tweaked the condition to remove the `source_remaining` state variable. On the other hand, I attempted to use a for loop to include the init, condition & increment; however, this turned out to be tedious (i.e lots of variables, conditions, increments, etc.). For example we'd atleast have the following: for( size_t source_converted = 0; source_converted < src_size_bytes && (!dst \|\| dest_converted < max_dest_chars); source_converted += result, ++dest_converted) { } Nonetheless, since we return `dest_converted` and must declare it out of the loop, then it might be more concise to initialize it upon declaration too. Also, since writing (*src) is conditional, a while loop is more appropriate.

Harbormaster completed remote builds in B123305: Diff 371715.Sep 9 2021, 3:16 PM

DanielMcIntosh-IBM accepted this revision.Sep 10 2021, 12:52 PM

This revision is now accepted and ready to land.Sep 10 2021, 12:52 PM

Refactor to use for loop

Harbormaster completed remote builds in B123500: Diff 371999.Sep 10 2021, 2:39 PM

Closed by commit rG4247381e26dd: [SystemZ][z/OS] Missing wchar functions libc++ (authored by muiez). · Explain WhySep 13 2021, 6:44 AM

This revision was automatically updated to reflect the committed changes.

muiez added a commit: rG4247381e26dd: [SystemZ][z/OS] Missing wchar functions libc++.

Revision Contents

Path

Size

libcxx/

include/

wchar.h

6 lines

src/

CMakeLists.txt

2 lines

support/

ibm/

mbsnrtowcs.cpp

96 lines

wcsnrtombs.cpp

96 lines

Diff 370407

libcxx/include/wchar.h

Show First 20 Lines • Show All 164 Lines • ▼ Show 20 Lines

wchar_t* __libcpp_wmemchr(const wchar_t* __s, wchar_t __c, size_t __n) {return (wchar_t*)wmemchr(__s, __c, __n);}

inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_PREFERRED_OVERLOAD

const wchar_t* wmemchr(const wchar_t* __s, wchar_t __c, size_t __n) {return __libcpp_wmemchr(__s, __c, __n);}

inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_PREFERRED_OVERLOAD

wchar_t* wmemchr( wchar_t* __s, wchar_t __c, size_t __n) {return __libcpp_wmemchr(__s, __c, __n);}

}

#endif

#if defined(__cplusplus) && defined(_LIBCPP_MSVCRT_LIKE)

#if defined(__cplusplus) && (defined(_LIBCPP_MSVCRT_LIKE) || defined(__MVS__))

extern "C" {

size_t mbsnrtowcs(wchar_t *__restrict dst, const char **__restrict src,

size_t nmc, size_t len, mbstate_t *__restrict ps);

size_t wcsnrtombs(char *__restrict dst, const wchar_t **__restrict src,

size_t nwc, size_t len, mbstate_t *__restrict ps);

} // extern "C++"

} // extern "C"

curdeiusUnsubmitted

Done

size_t nwc, size_t len, mbstate_t *__restrict ps);

- } // extern "C++"

+ } // extern "C"

#endif // __cplusplus && (_LIBCPP_MSVCRT || __MVS__)

Could you please fix this comment while here?

curdeius: Could you please fix this comment while here?

#endif // __cplusplus && _LIBCPP_MSVCRT