This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ]:[z/OS]:[libcxx]: fix nthLetter issue for charconv header
Needs RevisionPublic

Authored by NancyWang2222 on Feb 2 2022, 2:28 PM.


Group Reviewers
Restricted Project

This patch is to fix issue related to charconv , the problem is in EBCDIC table on z/OS, nth letters are not continuous, they are grouped into a few groups and each group has 9 character. so need new logic in EBCDIC mode to return a correct integer value for nth letters.

Diff Detail

Event Timeline

NancyWang2222 requested review of this revision.Feb 2 2022, 2:28 PM
NancyWang2222 created this revision.
NancyWang2222 edited the summary of this revision. (Show Details)
NancyWang2222 retitled this revision from [SystemZ]:[libcxx]: fix nthLetter issue for charconv header to [SystemZ]:[z/OS]:[libcxx]: fix nthLetter issue for charconv header .Feb 2 2022, 2:51 PM

move #include <locale> up one line according to alphabetical order

only include <locale> when _LIBCPP_HAS_NO_LOCALIZATION is not defined

put guard in __locale when include locale.h


I think we need a guard here as well to fix the CI failure.

need a guard in libcxx/include/charconv as well.

include <cctype> so tolower function can be recognized when localization is disabled

This revision is now accepted and ready to land.Feb 4 2022, 7:21 AM
Mordante accepted this revision as: Mordante.Feb 9 2022, 11:02 AM
Mordante added 1 blocking reviewer(s): Restricted Project.
Mordante added a subscriber: Mordante.

LGMT modulo some nits. Please wait for a libc++ approval before committing.


Since none of the code used in this header is constexpr, it has no effect here.


Please use the style already used. I also think the name isn't that clear, maybe __letter_value?


I'm not familiar with EBCDIC so I just assume this is correct.


Please remove one blank line.

This revision now requires review to proceed.Feb 9 2022, 11:02 AM
NancyWang2222 added inline comments.Feb 9 2022, 6:39 PM

Thanks will remove it


OK. I will use __letter_value


it was tested on z/OS which has both ASCII and EBCDIC


will remove blank line. thanks

address comment: change function name from LetterNum to letter_value, and remove extra blank line , also remove constexpr.

I rebased with latest master. hope it will fix error.

NancyWang2222 marked 2 inline comments as done.Feb 10 2022, 9:47 AM
NancyWang2222 marked 2 inline comments as done.
abhina.sreeskantharajan added a project: Restricted Project.Feb 11 2022, 6:25 AM

match the format suggested by git-clang-format

ldionne added inline comments.

I don't understand why that's necessary. The intent is for __locale not to be includable when _LIBCPP_HAS_NO_LOCALIZATION is enabled.


I'm not sure I understand what that function is actually doing. Is it taking a character and returning an integer that represents the position of that character in a usual human-understandable ordering from a to z? It doesn't handle uppercase characters and that's why you use tolower below, right?

In that case, I think it would make sense to rename it to something like __alphabetical_index or something like that, and it should return a real integer, not the character type taken as an input. In other words, if that's a function from "characters" to "integers", let's be explicit about it -- I suspect it might be useful in other places. We might also want to make it support uppercase letters.

remove extra space

NancyWang2222 added inline comments.Feb 11 2022, 1:15 PM

function returns re-ordered index, unlike in ASCIi the letter is continuous, EBCDIC letters are grouped in 3 groups for both lower case and upper case. upper case letter and lower letter in EBCDIC table are different value but similar pattern after order [a-z] and [A-Z]. I will change name to __alphabetical_index . change return type to int.

NancyWang2222 added inline comments.Feb 11 2022, 1:20 PM

forgot to mention, use tolower function to reduce duplicate code, since both fall into similar patter , just different letter value.

Quuxplusone added inline comments.

FWIW, I found __letter_value confusing on first reading, because I would think that the "value" of a should be 10 and the "value" of f should be 15. In fact, this function returns 0 for a and 6 for f; i.e., it's returning the letter's zero-indexed index in the alphabet.
It might be too cutesy, but if this were my code, I'd name this function simply __letter_minus_a. It basically returns (__c - 'a'), modulo any relativistic effects due to EBCDIC.

NancyWang2222 added inline comments.Feb 11 2022, 2:18 PM

@ldionne suggests __alphabetical_index which match what function does. i will use that name. we order letters to be new alphabetical index

change function name letter_value to alphabetical_index

@ldionne @Quuxplusone I have addressed comment, Can you help review again. Thanks for the feedback.

NancyWang2222 set the repository for this revision to rG LLVM Github Monorepo.Feb 14 2022, 8:59 AM
NancyWang2222 marked an inline comment as done.Feb 14 2022, 6:51 PM
SeanP added inline comments.Feb 15 2022, 10:46 AM

fyi ... if you want a reference to EBCDIC code points. Note EBCDIC has these "variant" code points that change value in different code pages (eg. 1047 vs others). Some characters that change code point are '[', ']', '{', '}', '#' which is just awesome for C/C++ programming.

NancyWang2222 marked an inline comment as done.Feb 16 2022, 9:07 AM

ping :)


Thanks Sean. good information for anyone who wants to know about about EBCDIC.

NancyWang2222 marked an inline comment as done.Feb 18 2022, 11:35 AM

Can I have 2nd review from libcxx? Thanks

please kindly review this patch again. thanks.

Quuxplusone requested changes to this revision.Mar 1 2022, 8:31 AM
Quuxplusone added inline comments.

@NancyWang2222: I don't think this comment from Louis has been addressed/answered. What goes wrong if you remove this diff?
(But I fully expect this will be moot and you can remove this diff, after adopting my suggested rewrite below.)


Is this the only reason you #include <locale> at the top of this file? I don't think we should do that. We should keep the nice fast arithmetic (the old code) for ASCII platforms, at least. Is there any way to express the EBCDIC codepath in speedy terms, or must we call out to tolower when #ifdef __MVS__?
IOW, I suggest lines 474-476 become this instead:

#if defined(__MVS__) && !defined(__NATIVE_ASCII_F)
    if ('a' <= __c && __c <= 'i')
        return {__c - 'a' + 10 < __base, __c - 'a' + 10};
    else if ('j' <= __c && __c <= 'r')
        return {__c - 'j' + 19 < __base, __c - 'j' + 19};
    else if ('s' <= __c && __c <= 'z') 
        return {__c - 's' + 28 < __base, __c - 's' + 28};
    else if ('A' <= __c && __c <= 'I')
        return {__c - 'A' + 10 < __base, __c - 'A' + 10};
    else if ('J' <= __c && __c <= 'R')
        return {__c - 'J' + 19 < __base, __c - 'J' + 19};
    else if ('S' <= __c && __c <= 'Z') 
        return {__c - 'S' + 28 < __base, __c - 'S' + 28};
        return {false, 0};
    else if ('a' <= __c && __c < 'a' + __base - 10)
        return {true, __c - 'a' + 10};
        return {'A' <= __c && __c < 'A' + __base - 10, __c - 'A' + 10};
This revision now requires changes to proceed.Mar 1 2022, 8:31 AM