This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ]:[z/OS]:[libcxx]: fix isascii function to work for z/OS
Needs RevisionPublic

Authored by NancyWang2222 on Feb 2 2022, 3:05 PM.

Details

Summary

current logic in isascii function assumes character is ASCII. it is true when character is between 0 - 255. we need to change logic so it will work on both ASCII and EBCDIC mode.

Diff Detail

Event Timeline

NancyWang2222 requested review of this revision.Feb 2 2022, 3:05 PM
NancyWang2222 created this revision.
muiez accepted this revision.Feb 3 2022, 7:24 AM

LGTM

This revision is now accepted and ready to land.Feb 3 2022, 7:24 AM
muiez set the repository for this revision to rG LLVM Github Monorepo.Feb 3 2022, 7:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2022, 7:27 AM
Herald added 1 blocking reviewer(s): Restricted Project. · View Herald Transcript
This revision now requires review to proceed.Feb 3 2022, 7:27 AM
SeanP added inline comments.Feb 3 2022, 9:12 AM
libcxx/include/__locale
571

Rather than making this a static member function, I suggest creating an anonymous namespace outside the class and putting the function in there and as an inline function. You can then get rid of the duplicate definition in the other class.

I suspect to that for MVS we need to have one check for EBCDIC ([0,255]) and another for ASCII ([0,127]).

SeanP added inline comments.Feb 3 2022, 1:59 PM
libcxx/include/__locale
571

Nancy reminded me off-line that the char traits tables for MVS work on the range 0-255 for both ASCII and EBCDIC mode on MVS. The code, as is, is good.

Mordante added inline comments.
libcxx/include/__locale
571

Names in the library should be uglified, so for example __isbasic. I also like a better name, isascii is quite clear, isbasic not really. How about __is_valid_in_encoding? Maybe a bit long, but it's clear what you test.

573

Should this be #if defined(__MVS__) && !defined(__NATIVE_ASCII_F) like you did in D118849?

686

Is this intended to always return true, if you please just return true. You can use a (void) __c; to silence compiler diagnostics.

SeanP added inline comments.Feb 9 2022, 11:45 AM
libcxx/include/__locale
573

Thanks for thinking about ebcdic. The purpose of this function is to make sure the value in the range supported by the __tab indexing. On z/OS, the __tab is 256 chars regardless of ascii or ebcdic so the 0-255 range check works for both ascii & ebcdic.

I like the proposed new name.

NancyWang2222 added inline comments.Feb 9 2022, 1:37 PM
libcxx/include/__locale
571

I will use new function name.

686

clang will emit error if argument c is not used in function. that is why I checked c.

NancyWang2222 added inline comments.Feb 9 2022, 2:07 PM
libcxx/include/__locale
686

I will add void(__c);

NancyWang2222 updated this revision to Diff 407303.EditedFeb 9 2022, 2:34 PM

change isbasic function name and fix return __c ? true: true statement

fix function name for locale.cpp as well.

run git-clang-format for the changes

In general it looks good, some minor nits. I'd like to see it pass the CI before approving. I expect the ASAN build to fail since a recent compiler update in the build-nodes causes failures.

libcxx/include/__locale
571

@NancyWang2222 when you addressed a comment can you mark it as done? That makes it easier to see which comments haven't been addressed yet.

libcxx/src/locale.cpp
1026

I don't mind the formatting in the other #if blocks, but then please update the indention here also.
Please also look at the other indention changes you made.

NancyWang2222 marked 5 inline comments as done.Feb 10 2022, 9:31 AM
NancyWang2222 added inline comments.
libcxx/src/locale.cpp
1026

sure . I will update it.

NancyWang2222 marked an inline comment as done.

fix indentation for the function modified.

NancyWang2222 marked an inline comment as done.Feb 10 2022, 10:11 AM
Mordante accepted this revision as: Mordante.Feb 11 2022, 8:55 AM

LGTM, please wait for a libc++ approval before committing.

ldionne requested changes to this revision.Feb 11 2022, 9:55 AM
ldionne added a subscriber: ldionne.
ldionne added inline comments.
libcxx/include/__locale
571

Here and below.

686

Why is this intended to always return true?

This revision now requires changes to proceed.Feb 11 2022, 9:55 AM
NancyWang2222 added inline comments.Feb 11 2022, 11:17 AM
libcxx/include/__locale
571

i will add.

686

isascii function is not right test for z/OS, char type is between [0-255] which is valid character indexing in __tab function on z/OS for both ASCII and EBCDIC.

add _LIBCPP_HIDE_FROM_ABI for function __is_valid_in_encoding

NancyWang2222 marked an inline comment as done.

change function name and return type

please ignore comment.

NancyWang2222 marked an inline comment as done.Feb 14 2022, 7:02 AM

@ldionne hi Louis, Can you help review again? Thanks everyone for providing feedback.

NancyWang2222 set the repository for this revision to rG LLVM Github Monorepo.Feb 14 2022, 8:59 AM

kindly ping for review or approval from libcxx group

can I have 2nd review from libcxx?

Quuxplusone requested changes to this revision.Feb 19 2022, 6:48 AM
Quuxplusone added a subscriber: Quuxplusone.
Quuxplusone added inline comments.
libcxx/src/locale.cpp
1029

Changes like this one make me nervous. The original purpose of isascii(x) seems to have been a bounds check: it says "yes, x is in the range 0 to 127 inclusive." That doesn't really have anything to do with the ASCII versus EBCDIC question — hence the mass renaming to __is_valid_in_encoding, on which I'm neutral. However, it is very important to all of these table lookups: If x isn't in the range 0 to 127, e.g. if x is -1 or 10000, then these lookups will have UB.

Can you explain why you want to make this change? Specifically, what behaves wrongly on your platform if we don't do any of this?

This revision now requires changes to proceed.Feb 19 2022, 6:48 AM
SeanP added inline comments.Feb 22 2022, 7:12 AM
libcxx/src/locale.cpp
1029

The tables on z/OS have the range 0-255. This is true for both ASCII and EBCDIC. The tables need to be 0-255 since EBCDIC characters are spread over this entire range (eg. '9' is 0xf9). The system uses the same range in for the ASCII tables which is why the code uses 0-255 for z/OS.

libcxx/include/__locale
686

IIUC now (based on @SeanP's comment below), __is_valid_in_encoding(x) means no more or less than "0 <= x && x < N, where N is the number of elements in the array that was passed to the constructor on line 681." For most platforms this N is 128; for z/OS in particular, this N is 256.

Is this N always the same as ctype::table_size from lines 777–779? What goes wrong if we make this function simply

static bool __is_in_range(size_t __c) { return __c < table_size; }

? See
https://en.cppreference.com/w/cpp/locale/ctype_char/classic_table
https://en.cppreference.com/w/cpp/locale/ctype_char/table

Personally (although @ldionne is welcome to overrule me) I'd like to move away from the existing code's use of isascii(x) as a shorthand for x < 128, because this bounds check doesn't really have anything to do with ASCII, and it's silly to have to read a man page just to find out it means x < 128. If we mean x < table_size, we should just say that.

libcxx/src/locale.cpp
1029

Ah, thanks. I've resumed the comment thread on __is_valid_in_encoding above.

SeanP added inline comments.Feb 22 2022, 9:28 AM
libcxx/include/__locale
686

Good observation. However, I noticed that table_size==256. I don't think that we can assume that ctype<char>::classic_table and __tab_ have the same range. The code as it is will return false for anything out of range of __tab_ for ctype<char>::classic_table. If they should have the same size then we would have to update table_size for non-z/OS platforms. It is currently set to 256 for all platforms. I worry this will break people.

If you think the __is_in_range() fucntion should return table_size, I'd rather do that in a separate patch to avoid mixing it in with making things work for z/OS.

ping for the review. Thanks

Quuxplusone added inline comments.Mar 1 2022, 8:15 AM
libcxx/include/__locale
686

My current thought is that there are at least two, maybe three, changes going on in this PR:
(1) Rename isascii(x) to __is_valid_in_encoding(x). I agree that isascii is the wrong name, but I still hypothesize that the right name is nothing specifically to do with "encoding," but more like __is_valid_table_index. Sometimes the table in question is __tab_; sometimes it's classic_table(); do we want two different functions depending on which table we're talking about? Are all these tables guaranteed to have the same size? IMHO this renaming should be its own PR.
(2) Specifically for z/OS, change the behavior of __is_valid_in_encoding (or whatever its name(s) ends up as). This seems uncontroversial to me, but sadly is blocked on (1).
(3?) Other ad-hoc EBCDICifications, like replacing uses of x - 'a'? I see some non-functional diffs involving L'A' - L'a' and so on, but I'm not sure if there might be some functional diffs hiding among them. Anyway, I would imagine these would be uncontroversial, if they exist.