This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Fix definition of regex_traits::__regex_word on big-endian glibc systems
ClosedPublic

Authored by dsanders on Feb 11 2016, 2:10 AM.

Details

Summary

On glibc, the bits used for the various character classes is endian dependant
(see _ISbit() in ctypes.h) but __regex_word does not account for this and uses
a spare bit that isn't spare on big-endian. On big-endian, it overlaps with the
bit for graphic characters which causes '-', '@', etc. to be considered a word
character.

Fixed this by defining the value using _ISbit(15) on glibc systems.

Fixes PR26476.

Diff Detail

Event Timeline

dsanders updated this revision to Diff 47611.Feb 11 2016, 2:10 AM
dsanders retitled this revision from to [libcxx] Fix definition of regex_traits::__regex_word on big-endian glibc systems.
dsanders updated this object.
dsanders added reviewers: mclow.lists, hans.
dsanders added a subscriber: cfe-commits.
dsanders added inline comments.Feb 11 2016, 2:17 AM
include/regex
980

The static_cast is necessary to silence a false-positive warning on little-endian. _ISbit(15) expands to:

((15) < 8 ? ((1 << (15)) << 8) : ((1 << (15)) >> 8))

which simplifies to:

0 ? 0x800000 : 0x80

Clang warns about the truncation of the 0x800000 to char_class_type (unsigned short) even though this value doesn't matter.

mclow.lists edited edge metadata.Feb 11 2016, 7:37 AM

In r260527, I added some tests to catch this if it happens again.

If those tests fail w/o this patch and succeed with, then I'm happy with applying it.

In r260527, I added some tests to catch this if it happens again.

If those tests fail w/o this patch and succeed with, then I'm happy with applying it.

The tests in r260527, don't fail without this patch because it doesn't check 'graph'. It also doesn't fail when I add it. The reason for this is that std::ctype_base::graph isn't the same as _ISgraph:

std::regex_traits<char>::__regex_word = 128
std::ctype_base::graph = 1036
_ISgraph = 128

ctype_base is defining graph by combining alnum and punct instead of using the _IS* macro like the other bits.

I'm not sure where it's getting the definition of classic_table. Presumably it's coming from glibc because adding a #error next to the optional one in libcxx doesn't cause build failures.

mclow.lists added a comment.EditedFeb 11 2016, 11:29 AM

ctype_base is defining graph by combining alnum and punct instead of using the _IS* macro like the other bits.

Because that's how the standard (22.4.1) shows to do it :-)

I'm not sure where it's getting the definition of classic_table. Presumably it's coming from glibc ...

Exactly. The classic_table is there for C libraries (like Bionic) that don't have their own rune table.

I haven't fully proven this yet (because I haven't found the table), but I think that C and C++ are sharing the same table and we're colliding with a bit that only makes sense to C. This would explain why your tests don't fail but regex_word still collides with _ISgraph in the table.

I've confirmed that glibc is testing for _ISgraph, and libstdc++ defines ctype_base as per the standard. I just need to find the table.

The table is apparently inside locale-archive and it looks like C and C++ both share this table. At this point I'm satisfied that the table also contains information that C++ isn't interested in and it's this data that regex_word collides with.

Are you happy to lift the condition that your tests fail without my patch?

Sorry for the early ping but I need to fix this for rc3 (which Hans is hoping to tag mid-week) and I'm stuck as long as the requirement that the C++ tests in r260527 fail without my patch is in place.

I'm happy to make this '#ifdef mips' if that helps.

mclow.lists accepted this revision.Feb 16 2016, 2:37 PM
mclow.lists edited edge metadata.

Let's wrap it in a MIPS ifdef for the moment, and then, post-release we can see if that can be relaxed.

Given that, LGTM.

This revision is now accepted and ready to land.Feb 16 2016, 2:37 PM

Thanks. I'll commit using a MIPS ifdef in the morning and merge once it's through the buildbots.

dsanders closed this revision.Feb 17 2016, 5:20 AM