This is an archive of the discontinued LLVM Phabricator instance.


Differential D7111

[libcxx] Fix match_results for alternatives
ClosedPublic

Authored by K-ballo on Jan 21 2015, 3:47 PM.


Details

Reviewers

mclow.lists
EricWF

Summary

Initialize submatch results to unmatched. Submatches that belong to failed alternatives are never modified (they might not even be looked at), so without this initialization their pair of iterators would stay value-initialized instead of pointing to the end of the searched sequence. Fixes PR22061.
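The behavior described in the summary can be checked with a small sketch like the one below. It reuses the pattern and input from the updated test; the helper name `unmatched_alternative_points_to_end` is hypothetical and is not part of the patch. It assumes a standard library that already contains a fix of this kind.

```cpp
#include <regex>

// Hypothetical helper (not part of the patch): searches "abcdefghijk" with an
// alternation whose second branch, (z), cannot match, and reports whether
// m[3] is an unmatched sub_match with both iterators at the end of the input.
bool unmatched_alternative_points_to_end(
    std::regex_constants::syntax_option_type syntax)
{
    const char s[] = "abcdefghijk";
    const char* const last = s + 11; // end of the searched sequence
    std::match_results<const char*> m;
    if (!std::regex_search(s, m, std::regex("cd((e)fg)hi|(z)", syntax)))
        return false;
    // Whole match plus three marked subexpressions.
    return m.size() == 4
        && !m[3].matched
        && m[3].first == last
        && m[3].second == last;
}
```

Calling the helper once with `std::regex_constants::ECMAScript` and once with `std::regex_constants::extended` covers both grammars that the review discusses.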

Event Timeline

K-ballo updated this revision to Diff 18563 on Jan 21 2015, 3:47 PM.

K-ballo retitled this revision to [libcxx] Fix match_results for alternatives.

K-ballo updated this object.

K-ballo edited the test plan for this revision.

K-ballo added reviewers: mclow.lists, EricWF.

K-ballo added a subscriber: Unknown Object (MLST).

mclow.lists added a comment.

The fix looks reasonable, it fixes the OP's problem, and the new test fails w/o the fix.

However, the change to <regex> is in two places, and I don't see how this test exercises both code paths.

Other than that (and the nits in the test) this looks good.

test/std/re/re.results/re.results.acc/index.pass.cpp
27	How about `assert(m.size() == 6)` here? - or `>5` since we're checking elements 0..5
50	Should this be `m[5]` ?

In D7111#112810, @mclow.lists wrote:

> The fix looks reasonable, it fixes the OP's problem, and the new test fails w/o the fix.
>
> However, the change to <regex> is in two places, and I don't see how this test exercises both code paths.

Good point, I'll parametrize the test and invoke it for both ECMAScript and extended POSIX.

test/std/re/re.results/re.results.acc/index.pass.cpp
27	`assert(m.size() == 4)` actually, the main match plus three subexpressions. Any out of range access shall return an unmatched result, and the original test was exercising that.
50	Indeed, good catch.

Address review comments.

mclow.lists accepted this revision on Jan 24 2015, 12:48 PM.

mclow.lists edited edge metadata.

mclow.lists added inline comments.

test/std/re/re.results/re.results.acc/index.pass.cpp
28	Ok, then I'm really confused. The code for `match_results::operator[]` is `return __n < __matches_.size() ? __matches_[__n] : __unmatched_;`. So how does the change in `<regex>` affect this result?

This revision is now accepted and ready to land as of Jan 24 2015, 12:48 PM.

K-ballo added inline comments on Jan 24 2015, 4:40 PM.

test/std/re/re.results/re.results.acc/index.pass.cpp
28	The changes affect unmatched alternative subexpressions. Consider instead the expression `"(z)|cd((e)fg)hi"`: the `(z)` subexpression would then be `m[1]` instead of `m[3]`. In both cases the subscript is within range, so the out-of-range `__unmatched_` result is not referenced.
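To make the in-range versus out-of-range distinction concrete, here is a minimal sketch using the expression from the comment above. The `SubmatchReport` type and `inspect_leading_alternative` function are hypothetical names introduced only for illustration. `m[1]` is an in-range subscript that refers to the failed `(z)` alternative, while `m[m.size()]` takes the out-of-range path of `operator[]`.

```cpp
#include <cstddef>
#include <regex>

// Hypothetical summary type, for illustration only.
struct SubmatchReport {
    std::size_t size;          // m.size(): whole match plus marked subexpressions
    bool first_group_matched;  // m[1].matched, an in-range subscript
    bool first_group_at_end;   // m[1].first and m[1].second both equal the end
    bool out_of_range_matched; // m[m.size()].matched, an out-of-range subscript
};

SubmatchReport inspect_leading_alternative()
{
    const char s[] = "abcdefghijk";
    const char* const last = s + 11; // end of the searched sequence
    std::match_results<const char*> m;
    SubmatchReport r{0, false, false, false};
    // (z) is the first alternative and fails; the second alternative matches.
    if (std::regex_search(s, m, std::regex("(z)|cd((e)fg)hi"))) {
        r.size = m.size();
        r.first_group_matched = m[1].matched;
        r.first_group_at_end = m[1].first == last && m[1].second == last;
        r.out_of_range_matched = m[m.size()].matched;
    }
    return r;
}
```

On a conforming implementation both the in-range `m[1]` and the out-of-range subscript report `matched == false`, but only the in-range element is stored in the match results, which is the element the patch initializes.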

SVN r227384

Revision Contents

Path	Size
include/regex	14 lines
test/std/re/re.results/re.results.acc/index.pass.cpp	9 lines

Diff 18688

include/regex

[5,595 lines not shown]
 basic_regex<_CharT, _Traits>::__match_at_start_ecma(
         const _CharT* __first, const _CharT* __last,
         match_results<const _CharT*, _Allocator>& __m,
         regex_constants::match_flag_type __flags, bool __at_first) const
 {
     vector<__state> __states;
     __node* __st = __start_.get();
     if (__st)
     {
+        sub_match<const _CharT*> __unmatched;
+        __unmatched.first = __last;
+        __unmatched.second = __last;
+        __unmatched.matched = false;
+
         __states.push_back(__state());
         __states.back().__do_ = 0;
         __states.back().__first_ = __first;
         __states.back().__current_ = __first;
         __states.back().__last_ = __last;
-        __states.back().__sub_matches_.resize(mark_count());
+        __states.back().__sub_matches_.resize(mark_count(), __unmatched);
         __states.back().__loop_data_.resize(__loop_count());
         __states.back().__node_ = __st;
         __states.back().__flags_ = __flags;
         __states.back().__at_first_ = __at_first;
         do
         {
             __state& __s = __states.back();
             if (__s.__node_)
[123 lines not shown]
 basic_regex<_CharT, _Traits>::__match_at_start_posix_subs(
[lines not shown]
     vector<__state> __states;
     __state __best_state;
     ptrdiff_t __j = 0;
     ptrdiff_t __highest_j = 0;
     ptrdiff_t _Np = _VSTD::distance(__first, __last);
     __node* __st = __start_.get();
     if (__st)
     {
+        sub_match<const _CharT*> __unmatched;
+        __unmatched.first = __last;
+        __unmatched.second = __last;
+        __unmatched.matched = false;
+
         __states.push_back(__state());
         __states.back().__do_ = 0;
         __states.back().__first_ = __first;
         __states.back().__current_ = __first;
         __states.back().__last_ = __last;
-        __states.back().__sub_matches_.resize(mark_count());
+        __states.back().__sub_matches_.resize(mark_count(), __unmatched);
         __states.back().__loop_data_.resize(__loop_count());
         __states.back().__node_ = __st;
         __states.back().__flags_ = __flags;
         __states.back().__at_first_ = __at_first;
         const _CharT* __current = __first;
         bool __matched = false;
         do
         {
[827 lines not shown]
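The one-line change in each function relies on the difference between the two `std::vector::resize` overloads. The sketch below isolates that difference outside of libc++ internals; `SubMatch`, `make_value_initialized`, and `make_unmatched` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>
#include <regex>
#include <vector>

using SubMatch = std::sub_match<const char*>;

// resize(n) value-initializes the new elements, so for pointer iterators
// both `first` and `second` end up null and `matched` is false.
std::vector<SubMatch> make_value_initialized(std::size_t n)
{
    std::vector<SubMatch> subs;
    subs.resize(n);
    return subs;
}

// resize(n, value) copies `value` into every new element, so each entry
// starts out as an unmatched sub_match positioned at `last`.
std::vector<SubMatch> make_unmatched(std::size_t n, const char* last)
{
    SubMatch unmatched;
    unmatched.first = last;
    unmatched.second = last;
    unmatched.matched = false;

    std::vector<SubMatch> subs;
    subs.resize(n, unmatched);
    return subs;
}
```

With the fill-value overload, any element the matcher never touches already holds the state that an unmatched subexpression is expected to report.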

test/std/re/re.results/re.results.acc/index.pass.cpp

[11 lines not shown]
 // class match_results<BidirectionalIterator, Allocator>

 // const_reference operator[](size_type n) const;

 #include <regex>
 #include <cassert>

 void
-test()
+test(std::regex_constants::syntax_option_type syntax)
 {
     std::match_results<const char*> m;
     const char s[] = "abcdefghijk";
-    assert(std::regex_search(s, m, std::regex("cd((e)fg)hi")));
+    assert(std::regex_search(s, m, std::regex("cd((e)fg)hi|(z)", syntax)));
+
+    assert(m.size() == 4);

     assert(m[0].first == s+2);
     assert(m[0].second == s+9);
     assert(m[0].matched == true);

     assert(m[1].first == s+4);
     assert(m[1].second == s+7);
     assert(m[1].matched == true);

     assert(m[2].first == s+4);
     assert(m[2].second == s+5);
     assert(m[2].matched == true);

     assert(m[3].first == s+11);
     assert(m[3].second == s+11);
     assert(m[3].matched == false);

     assert(m[4].first == s+11);
     assert(m[4].second == s+11);
     assert(m[4].matched == false);
 }

 int main()
 {
-    test();
+    test(std::regex_constants::ECMAScript);
+    test(std::regex_constants::extended);
 }