This is an archive of the discontinued LLVM Phabricator instance.

[Parser][ObjC] Fix parser crash on nested top-level block with better recovery path
ClosedPublic

Authored by danix800 on Jul 25 2023, 4:45 PM.

Details

Summary

Delay consuming tokens until we are certain that the next token is not a top-level
block. Otherwise we bail out as if we saw an @end for better diagnostic and recovery.

Fixes https://github.com/llvm/llvm-project/issues/64065

Diff Detail

Event Timeline

danix800 created this revision.Jul 25 2023, 4:45 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2023, 4:45 PM
danix800 requested review of this revision.Jul 25 2023, 4:45 PM
tbaeder added inline comments.
clang/lib/Parse/ParseObjc.cpp
749

Why is there a ConsumeToken() call at all here? The token is already being consumed in line 729.

danix800 added inline comments.Jul 26 2023, 1:35 AM
clang/lib/Parse/ParseObjc.cpp
749

Didn't notice this, thanks for reminding!

aaron.ballman added inline comments.Jul 26 2023, 11:04 AM
clang/lib/Parse/ParseObjc.cpp
749

I have the same question as @tbaeder -- what token is this intending to consume? CC @rjmccall for Obj-C expertise

rjmccall added inline comments.Jul 26 2023, 11:36 AM
clang/lib/Parse/ParseObjc.cpp
749

I don't think any language expertise is required here — just seems like a straightforward bug on an error path that's probably not exercised all that often. Maybe somebody moved the ConsumeToken and forgot to fix this case or something.

aaron.ballman added inline comments.Jul 26 2023, 11:37 AM
clang/lib/Parse/ParseObjc.cpp
749

OH! This is consuming the identifier for the implementation/interface name itself. e.g.,

@interface Frobble

The consume on line 709 gets the @, the consume on line 729 gets the interface, and the consume on line 749 is getting the Frobble. That makes sense to me now.

aaron.ballman added inline comments.Jul 26 2023, 12:01 PM
clang/lib/Parse/ParseObjc.cpp
749

What concerns me about this fix is that we don't typically check whether the token is EOF or not before consuming; that's usually an anti-pattern, isn't it? Wouldn't it make sense for this to use SkipUntil(tok::identifier) instead?

rjmccall added inline comments.Jul 26 2023, 12:31 PM
clang/lib/Parse/ParseObjc.cpp
749

Okay, so now I can bring a little language expertise to bear. :)

We're in the middle of parsing an ObjC block (e.g. @interface), and we see @interface or @implementation, which starts a new block. You can never nest these ObjC blocks, so the parser is reasonably assuming that the second @keyword is an attempt to start a new block and the user just forgot to terminate the last block with @end. Unfortunately, the actual recovery done by the parser doesn't seem to match the diagnostic and the fixit — it's trying to swallow @interface Foo (or whatever) and then continue the loop as if it were part of the current block, which is definitely not the right thing to do.

The right way to recover here is to act like we actually saw @end and break out of the loop, leaving Tok on the @ so that the parser will pick up parsing @interface normally after we return. To do that, we just need to get the ObjC keyword by peeking at the next token instead of consuming.

Also, we should take this recovery path on every @ keyword that's only allowed at the top level (so @class, @compatibility_alias, @interface, @implementation, and @protocol).

danix800 added inline comments.Jul 27 2023, 10:23 AM
clang/lib/Parse/ParseObjc.cpp
749

It's really great to learn things here! I don't know two much about ObjC. I seached google trying to find some standard or specs for ObjC but only docs like tutorials teaching how to use it can be found, so I might not be able to give a good enough fix for this issue. I'll give it a try though.

danix800 planned changes to this revision.Jul 27 2023, 10:57 AM
rjmccall added inline comments.Jul 27 2023, 11:03 AM
clang/lib/Parse/ParseObjc.cpp
749

You should be able to follow the guidance here without needing to know much more about ObjC, just understanding how the parser works. The key is that you need to delay consuming tokens until you're certain you're going to commit to parsing this @-directive as part of the current declaration.

Start with the line SourceLocation AtLoc = ConsumeToken(); Instead of consuming the @ and then looking at Tok to see what the keyword is, you can get the location of Tok without consuming it, then use NextToken() to peek ahead to the next token to see the keyword.

Be sure to sink the right number of ConsumeToken calls down onto all of the paths that *aren't* bailing out.

danix800 updated this revision to Diff 545056.Jul 28 2023, 1:42 AM
danix800 retitled this revision from [Parser][ObjC] Stop parsing on eof to [Parser][ObjC] Fix parser crash on nested top-level block with better recovery path.
danix800 edited the summary of this revision. (Show Details)

Delay consuming tokens and bail out early if nested top-level block are met
for better diag & recovery.

danix800 updated this revision to Diff 545069.Jul 28 2023, 1:58 AM

Update clang/docs/ReleaseNotes.rst.

danix800 added inline comments.Jul 28 2023, 2:06 AM
clang/lib/Parse/ParseObjc.cpp
749

Actually clang/lib/Parse/ParseObjc.cpp has ObjC part of the syntax well documented, plus well written code, it's really helpful.

danix800 edited the summary of this revision. (Show Details)Jul 28 2023, 2:15 AM
rjmccall added inline comments.Jul 28 2023, 10:54 AM
clang/docs/ReleaseNotes.rst
125 ↗(On Diff #545069)

"Fixed a crash when parsing top-level ObjC blocks that aren't properly terminated.
Clang should now also recover better when an @end is missing between blocks."

clang/lib/Parse/ParseObjc.cpp
725

This use of goto is a little too spaghetti-ish for me — we're breaking out of a loop and then branching into one case of an if/else chain. There are two reasonable ways to do this with more structure. The first is that we can make sure that we emit the "missing @end" diagnostic before we break out; to avoid code repetition, you can extract that into a helper function. The second is that we could set a flag to tell the code after the loop that we need to emit that diagnostic instead of trying to recreate the case analysis that we did in the loop.

If you make a little helper function like isTopLevelObjCKeyword then you can just break out instead of having the awkwardness of being inside a switch.

737

I would suggest pulling this up to right after (or even before) the isTopLevelObjCKeyword check. Then after the check you know that you're dealing with a keyword that you actually want to handle, and you can consume both the @ and the keyword token in one place.

827

I don't think this case is actually reachable — we handle code-completion within the loop and then return.

danix800 updated this revision to Diff 545343.Jul 29 2023, 12:48 AM

Cleanup spaghetti-ish code.

rjmccall accepted this revision.Aug 2 2023, 12:02 PM

LGTM. Small wording suggestion for the comment, but feel free to commit with that change.

clang/lib/Parse/ParseObjc.cpp
742

"If we see something like '@interface' that's only allowed at the top level,
bail out as if we saw an '@end'. We'll diagnose this below."

This revision is now accepted and ready to land.Aug 2 2023, 12:02 PM
danix800 closed this revision.Aug 2 2023, 10:01 PM