This is an archive of the discontinued LLVM Phabricator instance.

Support escaping in TrigramIndex.
ClosedPublic

Authored by krasin on Dec 1 2016, 12:50 PM.

Details

Summary

This is a follow up to r288303, where I have introduced TrigramIndex
to speed up SpecialCaseList for the cases when all rules are
simple wildcards, like *hello*wor.d*.

Here, I add support for escaping, so that it's possible to
specify rules like *c\+\+abi*.

Event Timeline

krasin updated this revision to Diff 79973.Dec 1 2016, 12:50 PM
krasin retitled this revision from to Support escaping in TrigramIndex..
krasin updated this object.
krasin added a reviewer: pcc.
krasin added a subscriber: llvm-commits.
pcc added inline comments.Dec 1 2016, 7:57 PM
lib/Support/TrigramIndex.cpp
60–68

This part doesn't look right. Neither 'n', 't' nor '0' are special. See: http://llvm-cs.pcc.me.uk/lib/Support/regcomp.c#379

I confirmed it with this passing test:

diff --git a/llvm/unittests/Support/RegexTest.cpp b/llvm/unittests/Support/RegexTest.cpp
index 5e3ce39..e151f47 100644
--- a/llvm/unittests/Support/RegexTest.cpp
+++ b/llvm/unittests/Support/RegexTest.cpp
@@ -171,4 +171,16 @@ TEST_F(RegexTest, MatchInvalid) {
   EXPECT_FALSE(r1.match("X"));
 }
 
+TEST_F(RegexTest, foo) {
+  Regex r1("\\t");
+  EXPECT_TRUE(r1.match("t"));
+  EXPECT_FALSE(r1.match("\t"));
+  Regex r2("\\n");
+  EXPECT_TRUE(r2.match("n"));
+  EXPECT_FALSE(r2.match("\n"));
+  Regex r3("\\0");
+  EXPECT_TRUE(r3.match("0"));
+  EXPECT_FALSE(r3.match("x"));
+}
+
 }
krasin updated this revision to Diff 80103.Dec 2 2016, 11:00 AM

Fix handling escaped symbols: they are all self-escaped in LLVM.

krasin marked an inline comment as done.Dec 2 2016, 11:01 AM
krasin added inline comments.
lib/Support/TrigramIndex.cpp
60–68

You're right. Thank you for the catch. Fixed.

pcc added inline comments.Dec 2 2016, 11:41 AM
lib/Support/TrigramIndex.cpp
60–68

What about '1' through '9'?

diff --git a/llvm/unittests/Support/RegexTest.cpp b/llvm/unittests/Support/RegexTest.cpp
index 5e3ce39..bc4995f 100644
--- a/llvm/unittests/Support/RegexTest.cpp
+++ b/llvm/unittests/Support/RegexTest.cpp
@@ -171,4 +171,19 @@ TEST_F(RegexTest, MatchInvalid) {
   EXPECT_FALSE(r1.match("X"));
 }
 
+TEST_F(RegexTest, foo) {
+  Regex r1("\\t");
+  EXPECT_TRUE(r1.match("t"));
+  EXPECT_FALSE(r1.match("\t"));
+  Regex r2("\\n");
+  EXPECT_TRUE(r2.match("n"));
+  EXPECT_FALSE(r2.match("\n"));
+  Regex r3("\\0");
+  EXPECT_TRUE(r3.match("0"));
+  EXPECT_FALSE(r3.match("x"));
+  Regex r4("(a)\\1");
+  EXPECT_TRUE(r4.match("aa"));
+  EXPECT_FALSE(r4.match("a1"));
+}
+
 }
krasin updated this revision to Diff 80120.Dec 2 2016, 12:21 PM
krasin marked an inline comment as done.

Add backreferences back (except \0).

krasin added inline comments.Dec 2 2016, 12:23 PM
lib/Support/TrigramIndex.cpp
60–68

I have added \1 to \9 back, and also added tests for these cases.

pcc added inline comments.Dec 2 2016, 12:36 PM
lib/Support/TrigramIndex.cpp
59–68

Perhaps more simply:

if (Escaped && Char >= '1' && Char <= '9') {
  Defeated = true;
  return;
}
krasin updated this revision to Diff 80136.Dec 2 2016, 2:44 PM

switch -> if

krasin marked an inline comment as done.Dec 2 2016, 2:45 PM
pcc added inline comments.Dec 2 2016, 2:53 PM
lib/Support/TrigramIndex.cpp
30

Nit: nullptr.

unittests/Support/TrigramIndexTest.cpp
113

I think this is being defeated because of your use of parens, rather than the \1 (which I think should really be \\1).

119

Same here.

krasin updated this revision to Diff 80140.Dec 2 2016, 3:28 PM

NULL -> nullptr; fix the backreferences test cases.

krasin marked 3 inline comments as done.Dec 2 2016, 3:28 PM
pcc accepted this revision.Dec 2 2016, 3:38 PM
pcc edited edge metadata.

LGTM

This revision is now accepted and ready to land.Dec 2 2016, 3:38 PM
krasin closed this revision.Dec 2 2016, 3:40 PM