Delay consuming tokens until we are certain that the next token is not a top-level
block. Otherwise we bail out as if we saw an @end for better diagnostic and recovery.
Details
- Reviewers
aaron.ballman rjmccall
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | Why is there a ConsumeToken() call at all here? The token is already being consumed in line 729. |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | Didn't notice this, thanks for reminding! |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | I don't think any language expertise is required here — just seems like a straightforward bug on an error path that's probably not exercised all that often. Maybe somebody moved the ConsumeToken and forgot to fix this case or something. |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | OH! This is consuming the identifier for the implementation/interface name itself. e.g., @interface Frobble The consume on line 709 gets the @, the consume on line 729 gets the interface, and the consume on line 749 is getting the Frobble. That makes sense to me now. |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | What concerns me about this fix is that we don't typically check whether the token is EOF or not before consuming; that's usually an anti-pattern, isn't it? Wouldn't it make sense for this to use SkipUntil(tok::identifier) instead? |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | Okay, so now I can bring a little language expertise to bear. :) We're in the middle of parsing an ObjC block (e.g. @interface), and we see @interface or @implementation, which starts a new block. You can never nest these ObjC blocks, so the parser is reasonably assuming that the second @keyword is an attempt to start a new block and the user just forgot to terminate the last block with @end. Unfortunately, the actual recovery done by the parser doesn't seem to match the diagnostic and the fixit — it's trying to swallow @interface Foo (or whatever) and then continue the loop as if it were part of the current block, which is definitely not the right thing to do. The right way to recover here is to act like we actually saw @end and break out of the loop, leaving Tok on the @ so that the parser will pick up parsing @interface normally after we return. To do that, we just need to get the ObjC keyword by peeking at the next token instead of consuming. Also, we should take this recovery path on every @ keyword that's only allowed at the top level (so @class, @compatibility_alias, @interface, @implementation, and @protocol). |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | It's really great to learn things here! I don't know two much about ObjC. I seached google trying to find some standard or specs for ObjC but only docs like tutorials teaching how to use it can be found, so I might not be able to give a good enough fix for this issue. I'll give it a try though. |
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | You should be able to follow the guidance here without needing to know much more about ObjC, just understanding how the parser works. The key is that you need to delay consuming tokens until you're certain you're going to commit to parsing this @-directive as part of the current declaration. Start with the line SourceLocation AtLoc = ConsumeToken(); Instead of consuming the @ and then looking at Tok to see what the keyword is, you can get the location of Tok without consuming it, then use NextToken() to peek ahead to the next token to see the keyword. Be sure to sink the right number of ConsumeToken calls down onto all of the paths that *aren't* bailing out. |
Delay consuming tokens and bail out early if nested top-level block are met
for better diag & recovery.
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
749 | Actually clang/lib/Parse/ParseObjc.cpp has ObjC part of the syntax well documented, plus well written code, it's really helpful. |
clang/docs/ReleaseNotes.rst | ||
---|---|---|
125 ↗ | (On Diff #545069) | "Fixed a crash when parsing top-level ObjC blocks that aren't properly terminated. |
clang/lib/Parse/ParseObjc.cpp | ||
725 | This use of goto is a little too spaghetti-ish for me — we're breaking out of a loop and then branching into one case of an if/else chain. There are two reasonable ways to do this with more structure. The first is that we can make sure that we emit the "missing @end" diagnostic before we break out; to avoid code repetition, you can extract that into a helper function. The second is that we could set a flag to tell the code after the loop that we need to emit that diagnostic instead of trying to recreate the case analysis that we did in the loop. If you make a little helper function like isTopLevelObjCKeyword then you can just break out instead of having the awkwardness of being inside a switch. | |
737 | I would suggest pulling this up to right after (or even before) the isTopLevelObjCKeyword check. Then after the check you know that you're dealing with a keyword that you actually want to handle, and you can consume both the @ and the keyword token in one place. | |
827 | I don't think this case is actually reachable — we handle code-completion within the loop and then return. |
LGTM. Small wording suggestion for the comment, but feel free to commit with that change.
clang/lib/Parse/ParseObjc.cpp | ||
---|---|---|
742 | "If we see something like '@interface' that's only allowed at the top level, |
This use of goto is a little too spaghetti-ish for me — we're breaking out of a loop and then branching into one case of an if/else chain. There are two reasonable ways to do this with more structure. The first is that we can make sure that we emit the "missing @end" diagnostic before we break out; to avoid code repetition, you can extract that into a helper function. The second is that we could set a flag to tell the code after the loop that we need to emit that diagnostic instead of trying to recreate the case analysis that we did in the loop.
If you make a little helper function like isTopLevelObjCKeyword then you can just break out instead of having the awkwardness of being inside a switch.