current logic in isascii function assumes character is ASCII. it is true when character is between 0 - 255. we need to change logic so it will work on both ASCII and EBCDIC mode.
Details
- Reviewers
muiez abhina.sreeskantharajan SeanP zibi fanbo-meng Mordante ldionne • Quuxplusone - Group Reviewers
Restricted Project
Diff Detail
Unit Tests
Event Timeline
libcxx/include/__locale | ||
---|---|---|
571 | Rather than making this a static member function, I suggest creating an anonymous namespace outside the class and putting the function in there and as an inline function. You can then get rid of the duplicate definition in the other class. I suspect to that for MVS we need to have one check for EBCDIC ([0,255]) and another for ASCII ([0,127]). |
libcxx/include/__locale | ||
---|---|---|
571 | Nancy reminded me off-line that the char traits tables for MVS work on the range 0-255 for both ASCII and EBCDIC mode on MVS. The code, as is, is good. |
libcxx/include/__locale | ||
---|---|---|
571 | Names in the library should be uglified, so for example __isbasic. I also like a better name, isascii is quite clear, isbasic not really. How about __is_valid_in_encoding? Maybe a bit long, but it's clear what you test. | |
573 | Should this be #if defined(__MVS__) && !defined(__NATIVE_ASCII_F) like you did in D118849? | |
685 | Is this intended to always return true, if you please just return true. You can use a (void) __c; to silence compiler diagnostics. |
libcxx/include/__locale | ||
---|---|---|
573 | Thanks for thinking about ebcdic. The purpose of this function is to make sure the value in the range supported by the __tab indexing. On z/OS, the __tab is 256 chars regardless of ascii or ebcdic so the 0-255 range check works for both ascii & ebcdic. I like the proposed new name. |
libcxx/include/__locale | ||
---|---|---|
685 | I will add void(__c); |
In general it looks good, some minor nits. I'd like to see it pass the CI before approving. I expect the ASAN build to fail since a recent compiler update in the build-nodes causes failures.
libcxx/include/__locale | ||
---|---|---|
571 | @NancyWang2222 when you addressed a comment can you mark it as done? That makes it easier to see which comments haven't been addressed yet. | |
libcxx/src/locale.cpp | ||
1032 | I don't mind the formatting in the other #if blocks, but then please update the indention here also. |
libcxx/src/locale.cpp | ||
---|---|---|
1032 | sure . I will update it. |
@ldionne hi Louis, Can you help review again? Thanks everyone for providing feedback.
libcxx/src/locale.cpp | ||
---|---|---|
1036 | Changes like this one make me nervous. The original purpose of isascii(x) seems to have been a bounds check: it says "yes, x is in the range 0 to 127 inclusive." That doesn't really have anything to do with the ASCII versus EBCDIC question — hence the mass renaming to __is_valid_in_encoding, on which I'm neutral. However, it is very important to all of these table lookups: If x isn't in the range 0 to 127, e.g. if x is -1 or 10000, then these lookups will have UB. Can you explain why you want to make this change? Specifically, what behaves wrongly on your platform if we don't do any of this? |
libcxx/src/locale.cpp | ||
---|---|---|
1036 | The tables on z/OS have the range 0-255. This is true for both ASCII and EBCDIC. The tables need to be 0-255 since EBCDIC characters are spread over this entire range (eg. '9' is 0xf9). The system uses the same range in for the ASCII tables which is why the code uses 0-255 for z/OS. |
libcxx/include/__locale | ||
---|---|---|
685 | IIUC now (based on @SeanP's comment below), __is_valid_in_encoding(x) means no more or less than "0 <= x && x < N, where N is the number of elements in the array that was passed to the constructor on line 681." For most platforms this N is 128; for z/OS in particular, this N is 256. Is this N always the same as ctype::table_size from lines 777–779? What goes wrong if we make this function simply static bool __is_in_range(size_t __c) { return __c < table_size; } ? See Personally (although @ldionne is welcome to overrule me) I'd like to move away from the existing code's use of isascii(x) as a shorthand for x < 128, because this bounds check doesn't really have anything to do with ASCII, and it's silly to have to read a man page just to find out it means x < 128. If we mean x < table_size, we should just say that. | |
libcxx/src/locale.cpp | ||
1036 | Ah, thanks. I've resumed the comment thread on __is_valid_in_encoding above. |
libcxx/include/__locale | ||
---|---|---|
685 | Good observation. However, I noticed that table_size==256. I don't think that we can assume that ctype<char>::classic_table and __tab_ have the same range. The code as it is will return false for anything out of range of __tab_ for ctype<char>::classic_table. If they should have the same size then we would have to update table_size for non-z/OS platforms. It is currently set to 256 for all platforms. I worry this will break people. If you think the __is_in_range() fucntion should return table_size, I'd rather do that in a separate patch to avoid mixing it in with making things work for z/OS. |
libcxx/include/__locale | ||
---|---|---|
685 | My current thought is that there are at least two, maybe three, changes going on in this PR: |
Rather than making this a static member function, I suggest creating an anonymous namespace outside the class and putting the function in there and as an inline function. You can then get rid of the duplicate definition in the other class.
I suspect to that for MVS we need to have one check for EBCDIC ([0,255]) and another for ASCII ([0,127]).