Page MenuHomePhabricator

Regex backreference [1/3] Fixes backreferences for extended grammar.

Authored by Mordante on May 25 2019, 11:54 AM.



The regex backreferences were not properly parsed and used when using
the extended grammar. This change parses them.

The issue was found while working on bug 34297.

Diff Detail

Event Timeline

Mordante created this revision.May 25 2019, 11:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2019, 11:54 AM

Maybe add a test to ensure that the correct error is thrown?

I added a unit test in part 3 of the series [1]. Would you prefer to have the test for this fix in this patch?


Ah yes, I didn't see that patch. I will defer to the code-owners on this, but my $0.02 are, if this patch were a feature or added/removed functionality I would say add the tests to this patch. But, because this patch only fixes a small bug, it should be OK if the tests come later. If for whatever reason the other patch does not make it, you will have to remember to make another patch with the tests.

This looks quite reasonable to me, but I have to agree with Zoe here; since you've added behavior in this patch, the test for it should be in this patch as well.

Especially since I'm not convinced that is actually a bug, rather than a "This might be nice"

mclow.lists added inline comments.Jun 26 2019, 10:49 AM

Alternately, you could remove these two lines from this patch, and put them into D62453 (where the tests are).


Why two checks for m.length(0) here?

Why not just:

assert(static_cast<size_t>(m.length(0)) == std::char_traits<char>::length(s));

since you know that length(s) > 0
(Obviously, this applies to several places)

Mordante added inline comments.Jun 27 2019, 12:11 PM

I agree your proposed style would also work. I emulated the other tests in this file, they all use this assert.
It feels wrong to me to change the new tests to use your proposed style
assert(static_cast<size_t>(m.length(0)) == std::char_traits<char>::length(s));
This would make the style of the new test inconsistent with the style of the existing tests.
Do you prefer to keep the new tests as I wrote them or shall I create a separate patch to modify the style of the existing test?

mclow.lists added inline comments.Jul 3 2019, 1:37 PM

Just keep it the same as the others, and I'll clean them all up sometime.

Mordante updated this revision to Diff 208738.Jul 9 2019, 10:46 AM
Mordante edited the summary of this revision. (Show Details)

Removes the throw and move that to D62453.

Mordante marked an inline comment as done.Jul 9 2019, 10:47 AM
zoecarver added inline comments.Aug 17 2019, 9:59 AM

Is the intent here that __first = ++__temp will be executed regardless of the value of __val, or that it will only be executed if __val is greater than 1 and less than 9? If the former, I think you should unindent the statement, as it makes the intent confusing. If the latter, then you need braces around the if statement.

Mordante marked 2 inline comments as done.Aug 17 2019, 10:58 AM
Mordante added inline comments.

Good catch, still not entirely used to not attaching braces to all ifs. So forgot them when needed. Will upload a new patch later today.

Mordante updated this revision to Diff 215750.Aug 17 2019, 12:50 PM
Mordante marked an inline comment as done.

Fixes the issue found by @zoecarver .

mclow.lists added inline comments.Aug 21 2019, 7:56 AM

This looks like a copy/pasta of __first = __parse_BACKREF(__first, __last) to me.
How is it different?

Mordante marked 2 inline comments as done.Aug 24 2019, 2:41 AM
Mordante added inline comments.

It is a copy-paste of the body of __parse_BACKREF . The __parse_BACKREF is used in the basic regex (RE) part of the code. It searches for an entire backreference ( \n where n is a single digit). This code is part of the extended regex (ERE) and already has found a \ so only needs to find the digit.

I can move this block to a seperate function and adjust __parse_BACKREF. However I felt this would be a bit awkward due to updating __first. We can do it like this:

  __first = ++__temp;

or this: __first = __validate_backref(__first, __temp);
But then the __validate_backref has a __first, __current instead of a __first, __last set of iterator arguments.

Do you want me to move the common part to a separate function? If yes which of the two APIs do you prefer or do you have an alternative? Then I also need to adjust D62453 to move the __throw_regex_error the new function.

Mordante marked an inline comment as done.
mclow.lists added a comment.EditedSep 11 2019, 9:34 AM

I'm not a big fan of duplicated code; but especially in this case, since in D62453 you edit the duplicated code to add a throw; but you have to do it twice - once in __parse_backref and once in the new code. Why is that preferable?

I think there is a minor misunderstanding. I don't mind to avoid duplicating code. But __parse_BACKREF can't be used in both places. So the question was which of the proposed APIs you prefer. I'll post patch with my preferred solution. I'll wait with updating D62453 until this patch is approved.

Mordante updated this revision to Diff 220716.Sep 18 2019, 11:47 AM

Adds a helper function __test_back_ref to avoid code duplication.

Friendly ping

Mordante updated this revision to Diff 228575.Nov 9 2019, 8:11 AM

Rebased on master

@mclow.lists Do you still have a concern about code duplication? If not, I would go ahead and commit both this and, and close

Friendly ping.

ldionne accepted this revision.Feb 19 2020, 12:58 PM

Thanks for the patch and thanks for being patient!

This revision is now accepted and ready to land.Feb 19 2020, 12:58 PM
This revision was automatically updated to reflect the committed changes.

Thanks for the review and committing! (BTW I have commit access.)