This is an archive of the discontinued LLVM Phabricator instance.

[clang] Do not crash on arrow operator on dependent type.
ClosedPublic

Authored by adamcz on Mar 16 2022, 9:44 AM.

Diff Detail

Event Timeline

adamcz created this revision.Mar 16 2022, 9:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 9:44 AM
adamcz requested review of this revision.Mar 16 2022, 9:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 9:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Thanks for looking at this. It seems an issue caused by https://reviews.llvm.org/D120812 (sorry), where we build a recovery-expr to model declrefexpr pointing to an invalid decl.

clang/lib/Sema/TreeTransform.h
14709

It looks like we are somehow doing a transformation on a dependent-type expression (I think it is the recoveryExpression) during the template instantiation, which leads to the crash.

clang/test/SemaCXX/arrow-operator.cpp
79

The AST nodes looks weird and inconsistent in the primary template and instantiated template.

VarDecl x AST node in the primary template look like (with a Valid bit!)

`-DeclStmt 
   `-VarDecl  x 'A<int> &'

while in the instantiated template (with an Invalid bit!):

`-DeclStmt
   `-VarDecl invalid x 'A<int> &'

The difference of valid bit results in two different forms of the expression x->call() (a normal CXXMemberCallExpr vs. a dependent-type CallExpr wrapping a RecoveryExpr), I think this is likely the root cause of the crash.

If we invalidate the VarDecl x in the primary template for this case, the issue should be gone. This seems a reasonable fix to me -- a reference-type VarDecl is invalid, when it doesn't have an initializer (either 1. it is not written in the source code or 2. it is too malformed to preserve ). Clang AST already does 1. We're missing 2, I think it is in Sema::ActOnInitializerError.

81

if we mark vardecl x invalid, I think the bogus requires an initializer diagnostic will be gone.

adamcz updated this revision to Diff 416561.Mar 18 2022, 10:36 AM

changed to marking VarDecl as invalid if it's ref type and initializer is invalid

Thanks!

I updated the change. Let me know if this is what you had in mind.
I kept the original test too, can't hurt right?

hokein accepted this revision.Mar 22 2022, 2:01 AM

Thanks, looks good!

please update the description of the patch accordingly (saying we invalidate the vardecl)

clang/test/SemaCXX/arrow-operator.cpp
79

I think the comment x is dependent should associate to x->call() statement nor the vardecl.

This revision is now accepted and ready to land.Mar 22 2022, 2:01 AM

I reduced something very similar recently as https://github.com/clangd/clangd/issues/1073

This patch does not fix it, but looks closely related, want to take a look?

adamcz updated this revision to Diff 417331.Mar 22 2022, 10:14 AM

Reverted to previous version + new test

I reduced something very similar recently as https://github.com/clangd/clangd/issues/1073

This patch does not fix it, but looks closely related, want to take a look?

Very similar. My original fix would've fixed that too. I'm reverting this change to that version and adding this as a test.

Based on my conversation with Sam I'm also reverting the mark-ref-VarDecl-as-invalid-when-on-invalid-init, as I no longer believe this is appropriate. The idea is that, init or not, we know the type of VarDecl so it's not invalid (unless it's auto or something).

PTAL

Sorry, I thought this crash is fixed as a bonus effort by improving the AST for reference-type var-decl.

Beyond the crash cased by the dependent-type RecoveryExpr built in Sema::BuildDeclarationNameExpr, I think there is another issue which is worth fixing: whether we should mark the reference-type var-decl valid if it doesn't have an initializer (either the initializer is not provided in the source code or too broken to preserve). The current AST behavior is inconsistent:

int &a; // var-decl a is invalid without an initializer
int &b = undefine[1111]; // var-decl b is valid without an initializer
int &c = undefine; // var-decl c is valid with a recovery-expr initializer

From the AST point of view, there is no difference between a and b, they both don't have an initializer, but their valid bits are different. IMO marking b invalid is an improvement, the valid bit is consistent for refer-type var-decls without initializers; it avoids the bogus requires an initializer diagnostic during template instantiations; as a bonus, it seems to "fix" the crash (at least for the original testcase).

Another option would be to mark a valid regardless of the initializer. Pros: preserve more AST nodes for a; the valid bit are consistent among three cases. Cons: unknown? whether it'll break some subtle invariants in clang; the bogus requires an initializer diagnostic is still there.

But yeah, this is a separate issue, we should fix the crash first.

clang/lib/Sema/TreeTransform.h
14708

I'm not sure this is the only place triggering the crash, it looks like that we're fixing a symptom.

While here, First refers to a dependent-type RecoveryExpr (which is built from the code path: TransformDeclRefExpr -> RebuildDeclRefExpr-> Sema::BuildDeclarationNameExpr). So I think we have a high chance that the RecoveryExpr will spread multiple places in the TreeTransform and cause similar violations in other places.

A safe fix will be to not build the RecoveryExpr in TreeTransform::TransformDeclRefExpr -- Sema::BuildDeclarationNameExpr has a AcceptInvalidDecl parameter (by default it is false), we could reuse it to control whether we should build a RecoveryExpr.

Sorry, I thought this crash is fixed as a bonus effort by improving the AST for reference-type var-decl.

Beyond the crash cased by the dependent-type RecoveryExpr built in Sema::BuildDeclarationNameExpr, I think there is another issue which is worth fixing: whether we should mark the reference-type var-decl valid if it doesn't have an initializer (either the initializer is not provided in the source code or too broken to preserve). The current AST behavior is inconsistent:

Agree about the inconsistency. I think ideally we'd move in the direction of *not* marking such things invalid. Here's the quote from Richard:

Generally I think we should be moving towards finer-grained "invalid" / "contains errors" bits, so that we can preserve as much of the AST as possible and produce accurate diagnostics for independent errors wherever possible. To that end, I think generally the "invalid" bit on a declaration should concern *only* the "declaration" part of that declaration (how it's seen from other contexts that don't "look too closely"). So in particular:

  • The initializer of a variable should play no part in its "invalid" bit. If the initializer expression is marked as "contains errors", then things that care about the initializer (in particular, constant evaluation and any diagnostics that look into the initializer) may need to bail out, but we should be able to form a DeclRefExpr referring to that variable -- even if (say) the variable is constexpr. Exception: if the variable has a deduced type and the type can't be deduced due to an invalid initializer, then the declaration should be marked invalid.
  • The body of a function should play no part in its "invalid" bit. (This came up in a different code review recently.)

But practically, it's at least as important to make changes that make diagnostics better and not worse, and don't introduce new crashes. This is hard to do while keeping scope limited, so I'm OK with bending principle too.

Another option would be to mark a valid regardless of the initializer. Pros: preserve more AST nodes for a; the valid bit are consistent among three cases. Cons: unknown? whether it'll break some subtle invariants in clang; the bogus requires an initializer diagnostic is still there.
But yeah, this is a separate issue, we should fix the crash first.

+1

clang/lib/Sema/TreeTransform.h
14708

I agree with this FWIW: in principle it makes sense to have RecoveryExpr produced in template instantiation, in practice we probably haven't weakened the assumptions in template instantiation enough to do so safely, in the way that we have for "regular" sema.

We could try to do so in an ongoing way, but at least for syntax based tools like clangd the payoff won't be large as long as we keep mostly ignoring template instantiations.

That said, the current form of the patch is simple and fixes the crash in an obvious way, if this really is a more general problem then we'll see it again and have more data to generalize.

Oops, forgot conclusion:
Thanks for incorporating my testcase, and the patch LGTM in its current form (most of the alternatives discussed also sound OK).
I'll leave to Haojian to stamp.

hokein accepted this revision.Mar 23 2022, 1:49 AM

OK, let's move forward with it. Thanks for the investigation and the fix!

I will take a look on the invalid-bit problem, and monitor the crash report more closely.

clang/lib/Sema/TreeTransform.h
14708

yeah, I'm more worry about this is a more general problem in template instantiation.
I agree that having RecoveryExpr produced in template instantiation makes sensible (mostly for diagnostics), but it seems to me that we're opening a can of worms, and I'm not sure this is a good tradeoff -- from our experience, tracing and fixing these kind of crashes is quite painful and requires large amount of effort (personally, I will be more conservative).

Given the current patch is a definitely crash fix, I'm fine with it.

14708

nit: please add some comments explaining why we hit a dependent-type express here.

adamcz updated this revision to Diff 418219.Mar 25 2022, 7:47 AM
adamcz marked an inline comment as done.

added a comment

Thanks for all the comments!

This revision was landed with ongoing or failed builds.Mar 25 2022, 7:53 AM
This revision was automatically updated to reflect the committed changes.