This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] (trivial) bracket-matching
ClosedPublic

Authored by sammccall on May 18 2022, 10:24 AM.

Details

Summary

Error-tolerant bracket matching enables our error-tolerant parsing strategies.
The implementation here is *not* yet error tolerant: this patch sets up the APIs
and plumbing, and describes the planned approach.

Diff Detail

Event Timeline

sammccall created this revision.May 18 2022, 10:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 10:24 AM
Herald added a subscriber: mgorny. · View Herald Transcript
sammccall requested review of this revision.May 18 2022, 10:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2022, 10:24 AM

The Bracket struct & translation between token/bracket streams is not really justified by the implementation in this patch.
Next patch will run nontrivial algorithms on the bracket strings.

Happy to scope this patch down (avoid that struct) or up (include the full algorithm) if you prefer, but I think this keeps the diffs clearest.

hokein accepted this revision.May 20 2022, 1:46 PM

Thanks, this looks a good start version!

clang-tools-extra/pseudo/include/clang-pseudo/Bracket.h
9

brackes => braces

clang-tools-extra/pseudo/lib/Bracket.cpp
84

this is not really parsing, maybe findBrackets?

clang-tools-extra/pseudo/unittests/BracketTest.cpp
103

For this simple case, just using ^ is fine.

I think in the near future, we would need more to verify that e.g. which two brackets are paired. And since we're defining some some common functions used by the tests, I wonder what's the plan here (using something like { $1^( $1^)?)

This revision is now accepted and ready to land.May 20 2022, 1:46 PM
sammccall added inline comments.May 20 2022, 2:19 PM
clang-tools-extra/pseudo/unittests/BracketTest.cpp
103

I think in the near future, we would need more to verify that e.g. which two brackets are paired.

I thought about the simplest way to specify these tests, and I think the ^ is sufficient.

The combination of:
a) the set of brackets that are paired
b) for each bracket, knowing whether it is paired forwards or backwards
c) certainty that brackets are well-nested
fully determines the bracket pairing.

The test specifies a) and we check it, and b) and c) can be verified with no extra information.

Does this make sense? It's a little implicit, but makes the testcases much more readable than having to specifiy $1^ etc.
If so, I'll explain this in a comment.

hokein added inline comments.May 23 2022, 2:20 AM
clang-tools-extra/pseudo/unittests/BracketTest.cpp
103

Sure, using ^ is sufficient, and agree that make test cases more readable.

I was a bit worried about the error-repair cases. For these cases, I thought b) (e.g. we know this { should be paired with the last }, not the next one) is more important, it is probably better explicitly specify in the test, rather than doing it silently. I think we can use trailing comments to explain that while keeping ^.

class A ^{ // A
  void foo() {
^} // A
sammccall marked 4 inline comments as done.May 24 2022, 6:13 AM
sammccall added inline comments.
clang-tools-extra/pseudo/unittests/BracketTest.cpp
103

Added a comment explaining how it's sufficient, and that we should use comments if needed.

This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.