This is an archive of the discontinued LLVM Phabricator instance.

Fix compiler crash when an expression parsed in the tentative parsing and must be claimed in the another evaluation context.
ClosedPublic

Authored by ABataev on Jun 1 2020, 7:30 AM.

Details

Summary

Clang crashes when trying to finish function body. MaybeODRUseExprs is
not empty because of const static data member parsed in outer evaluation
context, upon call for isTypeIdInParens() function. It builds
annot_primary_expr, later parsed in ParseConstantExpression() in
inner constant expression evaluation context.

Diff Detail

Event Timeline

ABataev created this revision.Jun 1 2020, 7:30 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2020, 7:30 AM

Narrowly this seems to fix the immediate problem, but I feel like we're in trouble if tentative parsing is changing the semantic context in ways that persist. In particular, I'm concerned that we could end up tentatively parsing an ODR use as part of an expression and then completely discarding it, causing Sema to think that there's an ODR use later because it never sees an L2R conversion (because the expression is not actually used). Probably the most architectural thing would be for tentative expression parsing to push a possibly-unevaluated context, and then when we claim an expression annotation token we can do the retroactive work necessary to make it an expression in the proper context. We already have most of the logic to support that because of C99 sizeof, which is usually not evaluated but can be in the narrow circumstance of a VLA.

If we do decide to solve this more narrowly, then we should audit our use of the tentative-parsing queries to make sure that we're pushing contexts consistently, and we should leave comments in places like this to make sure that maintainers understand the subtleties.

Narrowly this seems to fix the immediate problem, but I feel like we're in trouble if tentative parsing is changing the semantic context in ways that persist. In particular, I'm concerned that we could end up tentatively parsing an ODR use as part of an expression and then completely discarding it, causing Sema to think that there's an ODR use later because it never sees an L2R conversion (because the expression is not actually used). Probably the most architectural thing would be for tentative expression parsing to push a possibly-unevaluated context, and then when we claim an expression annotation token we can do the retroactive work necessary to make it an expression in the proper context. We already have most of the logic to support that because of C99 sizeof, which is usually not evaluated but can be in the narrow circumstance of a VLA.

If we do decide to solve this more narrowly, then we should audit our use of the tentative-parsing queries to make sure that we're pushing contexts consistently, and we should leave comments in places like this to make sure that maintainers understand the subtleties.

So, you suggest to not create annot_primary_expr during tentative parsing and revert parsing completely, right?

Narrowly this seems to fix the immediate problem, but I feel like we're in trouble if tentative parsing is changing the semantic context in ways that persist. In particular, I'm concerned that we could end up tentatively parsing an ODR use as part of an expression and then completely discarding it, causing Sema to think that there's an ODR use later because it never sees an L2R conversion (because the expression is not actually used). Probably the most architectural thing would be for tentative expression parsing to push a possibly-unevaluated context, and then when we claim an expression annotation token we can do the retroactive work necessary to make it an expression in the proper context. We already have most of the logic to support that because of C99 sizeof, which is usually not evaluated but can be in the narrow circumstance of a VLA.

If we do decide to solve this more narrowly, then we should audit our use of the tentative-parsing queries to make sure that we're pushing contexts consistently, and we should leave comments in places like this to make sure that maintainers understand the subtleties.

So, you suggest to not create annot_primary_expr during tentative parsing and revert parsing completely, right?

Not creating the annotation doesn't help if we're still making Sema calls. Also, I assume we're making the annotation token intentionally, probably to avoid re-doing the lookup. But I do think we could recognize that we're doing this, push an unevaluated context in tentative parsing, and then call TransformToPotentiallyEvaluated when we see the token in expression parsing.

Narrowly this seems to fix the immediate problem, but I feel like we're in trouble if tentative parsing is changing the semantic context in ways that persist. In particular, I'm concerned that we could end up tentatively parsing an ODR use as part of an expression and then completely discarding it, causing Sema to think that there's an ODR use later because it never sees an L2R conversion (because the expression is not actually used). Probably the most architectural thing would be for tentative expression parsing to push a possibly-unevaluated context, and then when we claim an expression annotation token we can do the retroactive work necessary to make it an expression in the proper context. We already have most of the logic to support that because of C99 sizeof, which is usually not evaluated but can be in the narrow circumstance of a VLA.

If we do decide to solve this more narrowly, then we should audit our use of the tentative-parsing queries to make sure that we're pushing contexts consistently, and we should leave comments in places like this to make sure that maintainers understand the subtleties.

So, you suggest to not create annot_primary_expr during tentative parsing and revert parsing completely, right?

Not creating the annotation doesn't help if we're still making Sema calls. Also, I assume we're making the annotation token intentionally, probably to avoid re-doing the lookup. But I do think we could recognize that we're doing this, push an unevaluated context in tentative parsing, and then call TransformToPotentiallyEvaluated when we see the token in expression parsing.

Ah, got it, will try to implement it.

ABataev updated this revision to Diff 267743.Jun 1 2020, 2:51 PM

Reworked after comments.

rjmccall added inline comments.Jun 1 2020, 7:14 PM
clang/lib/Parse/ParseExpr.cpp
1009 ↗(On Diff #267743)

Pushing an unevaluated context here isn't correct. Here we're parsing the expression for real and should already be in the right context for TransformToPotentiallyEvaluated.

clang/lib/Parse/ParseTentative.cpp
1279 ↗(On Diff #267743)

Suggestion:

// Tentative parsing may not be done in the right evaluation context
// for the ultimate expression.  Enter an unevaluated context to prevent
// Sema from immediately e.g. treating this lookup as a potential ODR-use.
// If we generate an expression annotation token and the parser actually
// claims it as an expression, we'll transform the expression to a
// potentially-evaluated one then.
ABataev marked an inline comment as done.Jun 2 2020, 5:09 AM
ABataev added inline comments.
clang/lib/Parse/ParseExpr.cpp
1009 ↗(On Diff #267743)

TransformToPotentiallyEvaluated expects that the innermost context is reevaluated. By the time we approach this branch, we're already out of the original unevaluated context. So, we need to create fake unevaluated context to mimic the original one. Function TransformToPotentiallyEvaluated copies the context state from the previous context to this new one and rebuilds the expression correctly. Instead, I can add a new function that transforms the expression in the current context and use it in TransformToPotentiallyEvaluated

ABataev retitled this revision from Fix compiler crash when trying to parse alignment argument as a constant expression. to Fix compiler crash when tentative parsing is unsuccessful..Jun 2 2020, 5:15 AM
ABataev retitled this revision from Fix compiler crash when tentative parsing is unsuccessful. to Fix compiler crash when an expression parsed in the tentative parsing and must be claimed in the another evaluation context..
rjmccall added inline comments.Jun 2 2020, 9:46 AM
clang/lib/Parse/ParseExpr.cpp
1009 ↗(On Diff #267743)

TransformToPotentiallyEvaluated expects that the innermost context is reevaluated. By the time we approach this branch, we're already out of the original unevaluated context.

Oh, I see. TransformToPotentiallyEvaluated expects that it will still be in a (temporary) unevaluated context and then looks through that context to build it in the surrounding context. (In fact, it actually changes the current context to be that outer context and never changes it back.) So we need to push an unevaluated context to balance things out.

I actually really dislike that design approach; the caller should be expected to have popped off the evaluation context under which the operand was parsed, and we should rebuild in the current context. There's way too much subtle reasoning about the exact global state here. But given that it is what it is, please leave a comment explaining why pushing a context is necessary here.

ABataev updated this revision to Diff 267927.Jun 2 2020, 10:23 AM

Rebase + fixes

This revision is now accepted and ready to land.Jun 2 2020, 10:34 AM
This revision was automatically updated to reflect the committed changes.

This is at best only a partial fix. Sema::NC_ContextIndependentExpr is supposed to be used (unsurprisingly) only if we form a context-independent annotation, but here we're forming a context-dependent expression that depends on whether it appears in an unevaluated context. I think the better approach would be to fix the case in Sema::ClassifyName that violates context-independence instead (there's a FIXME there for this issue).

This fix changed us from producing a bad AST if the member reference was not supposed to be evaluated, to producing a bad AST if the member reference was supposed to be annotated and is type-dependent -- we now crash in CodeGen on this invalid code:

struct C { void g(); };
template<typename T> struct A {
  T x;
  static void f() {
    (x.g());
  }
};
void h() { A<C>::f(); }

... because we now incorrectly form an unevaluated DeclRefExpr for x when disambiguating between a cast and a parenthesized expression (and we don't fix it due to the added "type-dependent" check).

I'm going to try to fix this a different way, by fixing the bad case in Sema::ClassifyName instead.

I'm going to try to fix this a different way, by fixing the bad case in Sema::ClassifyName instead.

Done in llvmorg-12-init-1234-g23d6525cbdc.