This patch is to fix issue related to charconv , the problem is in EBCDIC table on z/OS, nth letters are not continuous, they are grouped into a few groups and each group has 9 character. so need new logic in EBCDIC mode to return a correct integer value for nth letters.
Details
- Reviewers
muiez abhina.sreeskantharajan SeanP zibi Mordante • Quuxplusone - Group Reviewers
Restricted Project
Diff Detail
Event Timeline
libcxx/include/charconv | ||
---|---|---|
92 | I think we need a guard here as well to fix the CI failure. |
LGMT modulo some nits. Please wait for a libc++ approval before committing.
libcxx/include/charconv | ||
---|---|---|
436 | Since none of the code used in this header is constexpr, it has no effect here. | |
437 | Please use the style already used. I also think the name isn't that clear, maybe __letter_value? | |
443 | I'm not familiar with EBCDIC so I just assume this is correct. | |
448 | Please remove one blank line. |
address comment: change function name from LetterNum to letter_value, and remove extra blank line , also remove constexpr.
libcxx/include/__locale | ||
---|---|---|
18–19 | I don't understand why that's necessary. The intent is for __locale not to be includable when _LIBCPP_HAS_NO_LOCALIZATION is enabled. | |
libcxx/include/charconv | ||
436 | I'm not sure I understand what that function is actually doing. Is it taking a character and returning an integer that represents the position of that character in a usual human-understandable ordering from a to z? It doesn't handle uppercase characters and that's why you use tolower below, right? In that case, I think it would make sense to rename it to something like __alphabetical_index or something like that, and it should return a real integer, not the character type taken as an input. In other words, if that's a function from "characters" to "integers", let's be explicit about it -- I suspect it might be useful in other places. We might also want to make it support uppercase letters. |
libcxx/include/charconv | ||
---|---|---|
436 | function returns re-ordered index, unlike in ASCIi the letter is continuous, EBCDIC letters are grouped in 3 groups for both lower case and upper case. upper case letter and lower letter in EBCDIC table are different value but similar pattern after order [a-z] and [A-Z]. I will change name to __alphabetical_index . change return type to int. |
libcxx/include/charconv | ||
---|---|---|
436 | forgot to mention, use tolower function to reduce duplicate code, since both fall into similar patter , just different letter value. |
libcxx/include/charconv | ||
---|---|---|
437 | FWIW, I found __letter_value confusing on first reading, because I would think that the "value" of a should be 10 and the "value" of f should be 15. In fact, this function returns 0 for a and 6 for f; i.e., it's returning the letter's zero-indexed index in the alphabet. |
@ldionne @Quuxplusone I have addressed comment, Can you help review again. Thanks for the feedback.
libcxx/include/charconv | ||
---|---|---|
443 | fyi ... if you want a reference to EBCDIC code points. http://en.wikipedia.org/wiki/Ebcdic. Note EBCDIC has these "variant" code points that change value in different code pages (eg. 1047 vs others). Some characters that change code point are '[', ']', '{', '}', '#' which is just awesome for C/C++ programming. |
ping :)
libcxx/include/charconv | ||
---|---|---|
443 | Thanks Sean. good information for anyone who wants to know about about EBCDIC. |
libcxx/include/__locale | ||
---|---|---|
18–19 | @NancyWang2222: I don't think this comment from Louis has been addressed/answered. What goes wrong if you remove this diff? | |
libcxx/include/charconv | ||
472–473 | Is this the only reason you #include <locale> at the top of this file? I don't think we should do that. We should keep the nice fast arithmetic (the old code) for ASCII platforms, at least. Is there any way to express the EBCDIC codepath in speedy terms, or must we call out to tolower when #ifdef __MVS__? #if defined(__MVS__) && !defined(__NATIVE_ASCII_F) if ('a' <= __c && __c <= 'i') return {__c - 'a' + 10 < __base, __c - 'a' + 10}; else if ('j' <= __c && __c <= 'r') return {__c - 'j' + 19 < __base, __c - 'j' + 19}; else if ('s' <= __c && __c <= 'z') return {__c - 's' + 28 < __base, __c - 's' + 28}; else if ('A' <= __c && __c <= 'I') return {__c - 'A' + 10 < __base, __c - 'A' + 10}; else if ('J' <= __c && __c <= 'R') return {__c - 'J' + 19 < __base, __c - 'J' + 19}; else if ('S' <= __c && __c <= 'Z') return {__c - 'S' + 28 < __base, __c - 'S' + 28}; else return {false, 0}; #else else if ('a' <= __c && __c < 'a' + __base - 10) return {true, __c - 'a' + 10}; else return {'A' <= __c && __c < 'A' + __base - 10, __c - 'A' + 10}; #endif |
I don't understand why that's necessary. The intent is for __locale not to be includable when _LIBCPP_HAS_NO_LOCALIZATION is enabled.