Details
- Reviewers
mclow.lists jroelofs EricWF
Diff Detail
Event Timeline
_LIBCPP_PROVDES_DEFAULT_RUNE_TABLE -->
_LIBCPP_PROVIDES_DEFAULT_RUNE_TABLE
Hi mclow.lists, jroelofs, EricWF,
REPOSITORY
rL LLVM
Files:
include/__locale
Index: include/__locale
- include/__locale
+++ include/__locale
@@ -329,7 +329,21 @@
class _LIBCPP_TYPE_VIS ctype_base
{
public:
-#ifdef GLIBC
+#ifdef _LIBCPP_PROVDES_DEFAULT_RUNE_TABLE
+ typedef unsigned short mask;
+ static const mask space = 1 << 0;
+ static const mask print = 1 << 1;
+ static const mask cntrl = 1 << 2;
+ static const mask upper = 1 << 3;
+ static const mask lower = 1 << 4;
+ static const mask alpha = 1 << 5;
+ static const mask digit = 1 << 6;
+ static const mask punct = 1 << 7;
+ static const mask xdigit = 1 << 8;
+ static const mask blank = 1 << 9;
+ static const mask alnum = alpha | digit;
+ static const mask graph = alnum | punct;
+#elif defined(GLIBC)
typedef unsigned short mask; static const mask space = _ISspace; static const mask print = _ISprint;
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
It was my belief that the underlying OS/C library would provide these defines.
My goal was to get the same results as if you were calling islower, etc in the C library.
Is that not true for Android, etc?
@mclow.lists AFAICT, for Newlib, it's not possible for isblank to match between the c and c++ libs here (without the separate table) because '\t' doesn't have _B set in the table, but isblank(int) is defined like this:
int isblank(int c) { return ((__ctype_ptr__[c+1] & _B) || (c == '\t')); }
so ctype<char>::is(ctype_base::blank, '\t') would return false, whereas isblank('\t') would return true. Using a libc++-provided table, this problem goes away. I think Bionic has the same issue.
We could put in exhaustive checks that the two match, but I'm not convinced that failing that sort of test really means anything... we're allowed to differ here, right?
There was another case where I thought we absolutely needed a separate table (having to do with Newlib's masks not being true bitmasks), but my notes on the subject are only confusing me more :(
I dug through my notes some more, found the bug with Newlib/Bionic's non-masks. This testcase demonstrates the issue:
#include <locale> #include <assert.h> class CheckUpper : public std::ctype_byname<wchar_t> { public: CheckUpper() : std::ctype_byname<wchar_t>("C",0) { } void check() { assert(false == do_is(ctype_base::upper, (L"a")[0])); } }; int main() { CheckUpper().check(); }
The mask for alpha/upper/lower is defined as this in the Newlib section of the ctype_base mask definitions in <__locale>:
static const mask upper = _U; static const mask lower = _L; static const mask alpha = _U | _L;
Here's the definition of do_is in src/locale.cpp:
1260 bool 1261 ctype_byname<wchar_t>::do_is(mask m, char_type c) const 1262 { 1263 #ifdef _LIBCPP_WCTYPE_IS_MASK 1264 return static_cast<bool>(iswctype_l(c, m, __l)); 1265 #else 1266 bool result = false; 1267 wint_t ch = static_cast<wint_t>(c); 1268 if (m & space) result |= (iswspace_l(ch, __l) != 0); 1269 if (m & print) result |= (iswprint_l(ch, __l) != 0); 1270 if (m & cntrl) result |= (iswcntrl_l(ch, __l) != 0); 1271 if (m & upper) result |= (iswupper_l(ch, __l) != 0); 1272 if (m & lower) result |= (iswlower_l(ch, __l) != 0); 1273 if (m & alpha) result |= (iswalpha_l(ch, __l) != 0); 1274 if (m & digit) result |= (iswdigit_l(ch, __l) != 0); 1275 if (m & punct) result |= (iswpunct_l(ch, __l) != 0); 1276 if (m & xdigit) result |= (iswxdigit_l(ch, __l) != 0); 1277 if (m & blank) result |= (iswblank_l(ch, __l) != 0); 1278 return result; 1279 #endif 1280 }
In this example, line 1273 sets the result to true, but lower_a is not an upper-case character, so § 22.3.3.1.1 doesn't hold true for isupper:
...
template <class charT> bool isupper (charT c, const locale& loc);
template <class charT> bool islower (charT c, const locale& loc);
template <class charT> bool isalpha (charT c, const locale& loc);
...
1 Each of these functions isF returns the result of the expression:
use_facet< ctype<charT> >(loc).is(ctype_base::F, c)
where F is the ctype_base::mask value corresponding to that function (22.4.1)
A few months ago, I put in a patch that changes all of the lines like if (m & alpha) ... to if ((m & alpha) == alpha) to avoid this problem, and shortly thereafter reverted the patch thinking that it was non-conforming. Now I'm not so sure I should have reverted it.
I'm moving the discussion of the bitmask bug to PR22858. That's a separate issue than what this patch addresses.
The #else case (waaaaaaay down at the bottom) already does this... my mistake.
It was my belief that the underlying OS/C library would provide these defines.
My goal was to get the same results as if you were calling islower, etc in the C library.Is that not true for Android, etc?
For Android and newlib the problem is that the mask of the table isn't wide enough, so we run in to cases where we don't have m & n == 0 for any m and n in the ctype_base masks. _CTYPE_R, for example, is _CTYPE_P | _CTYPE_U | _CTYPE_L | _CTYPE_D | _CTYPE_B, rather than just 1 << 1.
For those implementations that have this issue, we can just fall through to the #else case and use the defaults.
jroelofs: you should submit that test case to keep us honest. I think we already have some tests that do that, but they're in tests that require more locale support (any) than some platforms (bionic, newlib) provide, so they don't get run.