This is an archive of the discontinued LLVM Phabricator instance.

Add mask values for the default rune table.
AbandonedPublic

Authored by danalbert on Mar 6 2015, 2:38 PM.

Details

Diff Detail

Event Timeline

danalbert updated this revision to Diff 21402.Mar 6 2015, 2:38 PM
danalbert retitled this revision from to Add mask values for the default rune table..
danalbert updated this object.
danalbert edited the test plan for this revision. (Show Details)
danalbert added reviewers: mclow.lists, jroelofs, EricWF.
danalbert set the repository for this revision to rL LLVM.
danalbert added a subscriber: Unknown Object (MLST).
EricWF edited edge metadata.Mar 6 2015, 3:44 PM

_LIBCPP_PROVDES_DEFAULT_RUNE_TABLE -->
_LIBCPP_PROVIDES_DEFAULT_RUNE_TABLE
Hi mclow.lists, jroelofs, EricWF,

REPOSITORY

rL LLVM

http://reviews.llvm.org/D8129

Files:

include/__locale

Index: include/__locale

  • include/__locale

+++ include/__locale
@@ -329,7 +329,21 @@
class _LIBCPP_TYPE_VIS ctype_base
{
public:
-#ifdef GLIBC
+#ifdef _LIBCPP_PROVDES_DEFAULT_RUNE_TABLE
+ typedef unsigned short mask;
+ static const mask space = 1 << 0;
+ static const mask print = 1 << 1;
+ static const mask cntrl = 1 << 2;
+ static const mask upper = 1 << 3;
+ static const mask lower = 1 << 4;
+ static const mask alpha = 1 << 5;
+ static const mask digit = 1 << 6;
+ static const mask punct = 1 << 7;
+ static const mask xdigit = 1 << 8;
+ static const mask blank = 1 << 9;
+ static const mask alnum = alpha | digit;
+ static const mask graph = alnum | punct;
+#elif defined(GLIBC)

typedef unsigned short mask;
static const mask space  = _ISspace;
static const mask print  = _ISprint;

EMAIL PREFERENCES

http://reviews.llvm.org/settings/panel/emailpreferences/
mclow.lists edited edge metadata.Mar 7 2015, 4:40 PM

It was my belief that the underlying OS/C library would provide these defines.
My goal was to get the same results as if you were calling islower, etc in the C library.

Is that not true for Android, etc?

jroelofs edited edge metadata.Mar 8 2015, 10:58 AM

@mclow.lists AFAICT, for Newlib, it's not possible for isblank to match between the c and c++ libs here (without the separate table) because '\t' doesn't have _B set in the table, but isblank(int) is defined like this:

int isblank(int c) {
  return ((__ctype_ptr__[c+1] & _B) || (c == '\t'));
}

so ctype<char>::is(ctype_base::blank, '\t') would return false, whereas isblank('\t') would return true. Using a libc++-provided table, this problem goes away. I think Bionic has the same issue.

We could put in exhaustive checks that the two match, but I'm not convinced that failing that sort of test really means anything... we're allowed to differ here, right?

There was another case where I thought we absolutely needed a separate table (having to do with Newlib's masks not being true bitmasks), but my notes on the subject are only confusing me more :(

I dug through my notes some more, found the bug with Newlib/Bionic's non-masks. This testcase demonstrates the issue:

#include <locale>
#include <assert.h>

class CheckUpper : public std::ctype_byname<wchar_t> {
public:
  CheckUpper() : std::ctype_byname<wchar_t>("C",0) { }
  void check() {
    assert(false == do_is(ctype_base::upper, (L"a")[0]));
  }
};

int main() {
  CheckUpper().check();
}

The mask for alpha/upper/lower is defined as this in the Newlib section of the ctype_base mask definitions in <__locale>:

static const mask upper = _U;
static const mask lower = _L;
static const mask alpha = _U | _L;

Here's the definition of do_is in src/locale.cpp:

1260 bool
1261 ctype_byname<wchar_t>::do_is(mask m, char_type c) const
1262 {
1263 #ifdef _LIBCPP_WCTYPE_IS_MASK
1264     return static_cast<bool>(iswctype_l(c, m, __l));
1265 #else
1266     bool result = false;
1267     wint_t ch = static_cast<wint_t>(c);
1268     if (m & space) result |= (iswspace_l(ch, __l) != 0);
1269     if (m & print) result |= (iswprint_l(ch, __l) != 0);
1270     if (m & cntrl) result |= (iswcntrl_l(ch, __l) != 0);
1271     if (m & upper) result |= (iswupper_l(ch, __l) != 0);
1272     if (m & lower) result |= (iswlower_l(ch, __l) != 0);
1273     if (m & alpha) result |= (iswalpha_l(ch, __l) != 0);
1274     if (m & digit) result |= (iswdigit_l(ch, __l) != 0);
1275     if (m & punct) result |= (iswpunct_l(ch, __l) != 0);
1276     if (m & xdigit) result |= (iswxdigit_l(ch, __l) != 0);
1277     if (m & blank) result |= (iswblank_l(ch, __l) != 0);
1278     return result;
1279 #endif
1280 }

In this example, line 1273 sets the result to true, but lower_a is not an upper-case character, so § 22.3.3.1.1 doesn't hold true for isupper:

...
template <class charT> bool isupper (charT c, const locale& loc);
template <class charT> bool islower (charT c, const locale& loc);
template <class charT> bool isalpha (charT c, const locale& loc);
...
1 Each of these functions isF returns the result of the expression:
use_facet< ctype<charT> >(loc).is(ctype_base::F, c)
where F is the ctype_base::mask value corresponding to that function (22.4.1)

A few months ago, I put in a patch that changes all of the lines like if (m & alpha) ... to if ((m & alpha) == alpha) to avoid this problem, and shortly thereafter reverted the patch thinking that it was non-conforming. Now I'm not so sure I should have reverted it.

I'm moving the discussion of the bitmask bug to PR22858. That's a separate issue than what this patch addresses.

danalbert abandoned this revision.Mar 10 2015, 1:50 PM

The #else case (waaaaaaay down at the bottom) already does this... my mistake.

It was my belief that the underlying OS/C library would provide these defines.
My goal was to get the same results as if you were calling islower, etc in the C library.

Is that not true for Android, etc?

For Android and newlib the problem is that the mask of the table isn't wide enough, so we run in to cases where we don't have m & n == 0 for any m and n in the ctype_base masks. _CTYPE_R, for example, is _CTYPE_P | _CTYPE_U | _CTYPE_L | _CTYPE_D | _CTYPE_B, rather than just 1 << 1.

For those implementations that have this issue, we can just fall through to the #else case and use the defaults.

jroelofs: you should submit that test case to keep us honest. I think we already have some tests that do that, but they're in tests that require more locale support (any) than some platforms (bionic, newlib) provide, so they don't get run.