This is an archive of the discontinued LLVM Phabricator instance.

Keep invalid Switch in the AST
Needs ReviewPublic

Authored by ogoffart on Nov 7 2016, 6:40 AM.

Details

Summary
  • When the condition is invalid, replace it by an OpaqueValueExpr
  • When parsing an invalid CaseStmt, don't drop the sub statement, just return it instead.
  • In Sema::ActOnStartOfSwitchStmt, always keep the SwitchStmt, even if it has duplicate case or defaults statement or that the condition cannot be converted to an integral type.

Diff Detail

Event Timeline

ogoffart updated this revision to Diff 77030.Nov 7 2016, 6:40 AM
ogoffart retitled this revision from to Keep invalid Switch in the AST.
ogoffart updated this object.
ogoffart added reviewers: rsmith, cfe-commits.

I believe only the change in ActOnFinishSwitchStmt might be controversial.
Is it breaking an invariant than having switches kept in the AST?

aaron.ballman edited reviewers, added: aaron.ballman; removed: cfe-commits.Jul 14 2017, 5:28 AM
aaron.ballman added a subscriber: cfe-commits.
aaron.ballman edited edge metadata.Jul 14 2017, 5:30 AM

You've explained how you are accomplishing this but not why. I don't think Clang typically keeps erroneous AST nodes in the tree. What kind of problem is this intended to solve?

The problem i'm trying to solve is precisely to keep as much as possible of the valid AST in the main AST, despite errors.
I've already done some work with r249982, r272962 and more, and there is still a lot to do. But the goal is to keep as much as possible of it.

The reason i'm working on this is highlighting of code where some code might be potentially invalid (because you are editing it, or because the tool don't have access to all headers)
Things like if statement from my previous patch, or switch statement like this patch are the things which have the more impact, because not keeping them removes highlighting for potentially big blocks of code.

This is useful for example for IDE such as KDevelop which use clang for highlighting, or my own tool [code.woboq.org]

A random example is https://code.woboq.org/linux/linux/arch/arm/kernel/module.c.html?style=kdevelop#101
(generated with this patch applied.) There are errors because this is an arm file built with the option for an intel kernel. Yet, most of the file is properly highlighted. If this patch was not applied, the whole switch statement would be removed from the AST and nothing within would be hightlighted, only because some case label are invalid.

The problem i'm trying to solve is precisely to keep as much as possible of the valid AST in the main AST, despite errors.
I've already done some work with r249982, r272962 and more, and there is still a lot to do. But the goal is to keep as much as possible of it.

Thank you for the explanation -- I like the goal, but am definitely concerned about the assumptions it might break by doing this (in general, not specific to this patch). We try to recover gracefully whenever possible, but still, a whole lot of frontend code relies on knowing when something is invalid to prevent the compiler from doing even worse things. I'm not certain the best balance to strike with that, but suspect it's going to adversely impact tools like clang-tidy which tend to assume that AST matchers match *valid* code and not invalid code.

lib/Parse/ParseStmt.cpp
1297

succeed -> succeeds

Are the concerns pointed out in the FIXME addressed by code not in this patch?

lib/Sema/SemaStmt.cpp
669

This makes the condition result valid when it isn't. Users of this condition result may expect a valid condition result to return nonnull values when calling get(), which makes me uncomfortable.

Thanks for your review and i'll try to address the concerns.

I believe tools that really need valid code relies on the diagnostics and bail out on error. On the other hand, tools that may work on code containing error do a best effort to work on the remaining AST .
And patches like this one improve the remaining AST, so tools like clang-tidy will be able to also do valid transformation inside the switch statement, which could not have been possible when the whole body is gone.

The AST stays "valid" in the sense that all the nodes exist (no nullptr) and so conform to the expectation of the code. The condition of a switch may now be an OpaqueValueExpr which should not disturb the matchers.

lib/Parse/ParseStmt.cpp
1297

The FIXME is pointing out problems occuring if the parser found 'default' or 'case' statement, but cannot connect it to the corresponding 'switch' statement (because that switch statement did not exist as it was removed from the AST).

Now that we always keep the 'switch' statement, this is no longer a problem.

lib/Sema/SemaStmt.cpp
669

Get return a non-null value.
That's why i'm constructing an OpaqueValueExpr placeholder expression.

The ConditionVar (nullptr in the line bellow) can be null. It is null in valid code most of the time actually, when one does not declare a new variable in in condition.

But the result is that users of this condition will get a OpaqueValueExpr when calling get and should not be disturbed by that as they will just take that as an expression.

This looks reasonable to me, but you should wait for @rsmith to sign off before committing.

lib/Sema/SemaStmt.cpp
669

Ah, sorry, I misread the code in my haste.

rsmith added inline comments.Oct 10 2017, 3:07 PM
lib/Parse/ParseStmt.cpp
1297

I'm uncomfortable about this; this change couples Parser to the implementation details of Sema. How about this: remove this assert and change ActOnFinishSwitchStmt to take a StmtResult instead (which might be invalid). Then you can tell from within ActOnFinishSwitchStmt whether to check the case statements against the condition based on whether the switch is in fact invalid.

(Alternatively: change ActOnStartSwitchStmt to return void and make ActOnFinishSwitchStmt pick up the switch statement from the SwitchStack.)

lib/Sema/SemaStmt.cpp
672

Won't this result in warnings or errors later on if we have case labels with expressions of other types? (Eg, narrowing warnings/errors)

Please instead (somehow) track that the switch condition is invalid and skip those checks -- perhaps either by returning an invalid-but-not-null statement here and passing that back into ActOnFinishSwitchStmt, or by tracking an "invalid" flag on the SwitchStack entry.

1173–1177

Hmm. Removing this will result in us producing invalid ASTs in some cases (with duplicate case or default labels). That's a condition that it would be reasonable for AST consumers to assert on currently, so this is concerning.

That said... it's inevitable that this work to keep more invalid constructs in the AST will result in such changes. Perhaps what we need is just a marker to say "beyond this point the AST does not necessarily correspond to any valid source code" for Stmt nodes, analogous to the Invalid marker on declarations. (Maybe a wrapper InvalidStmt node, so that tree traversals can easily avoid walking through it.)

Let's try this change out as-is. It may be that this concern is baseless.

ogoffart updated this revision to Diff 119142.Oct 16 2017, 5:49 AM

Updated the patch so that ActOnStartOfSwitchStmt returns void, and ActOnFinishSwitchStmt will skip some checks in case of error

rsmith added inline comments.Feb 28 2018, 4:49 PM
lib/Sema/SemaStmt.cpp
823

It's fragile to assume that the only way you can see an OpaqueValueExpr here is by it being created in ActOnStartOfSwitchStmt. We could tunnel this information through in another way, though, such as by tracking a bool in the SwitchStack in addition to the statement.

However, perhaps it's time to bite the bullet and add actual support for error nodes in the AST. For example, we could add a new kind of placeholder type for an erroneous expression, and build syntactic expression trees with that type when we encounter errors.

aaron.ballman added inline comments.Feb 28 2018, 6:45 PM
lib/Sema/SemaStmt.cpp
823

FWIW, I would find error nodes in the AST to be extremely useful.