Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Thanks for looking at this. It seems an issue caused by https://reviews.llvm.org/D120812 (sorry), where we build a recovery-expr to model declrefexpr pointing to an invalid decl.
clang/lib/Sema/TreeTransform.h | ||
---|---|---|
14765 | It looks like we are somehow doing a transformation on a dependent-type expression (I think it is the recoveryExpression) during the template instantiation, which leads to the crash. | |
clang/test/SemaCXX/arrow-operator.cpp | ||
79 | The AST nodes looks weird and inconsistent in the primary template and instantiated template. VarDecl x AST node in the primary template look like (with a Valid bit!) `-DeclStmt `-VarDecl x 'A<int> &' while in the instantiated template (with an Invalid bit!): `-DeclStmt `-VarDecl invalid x 'A<int> &' The difference of valid bit results in two different forms of the expression x->call() (a normal CXXMemberCallExpr vs. a dependent-type CallExpr wrapping a RecoveryExpr), I think this is likely the root cause of the crash. If we invalidate the VarDecl x in the primary template for this case, the issue should be gone. This seems a reasonable fix to me -- a reference-type VarDecl is invalid, when it doesn't have an initializer (either 1. it is not written in the source code or 2. it is too malformed to preserve ). Clang AST already does 1. We're missing 2, I think it is in Sema::ActOnInitializerError. | |
81 | if we mark vardecl x invalid, I think the bogus requires an initializer diagnostic will be gone. |
Thanks!
I updated the change. Let me know if this is what you had in mind.
I kept the original test too, can't hurt right?
Thanks, looks good!
please update the description of the patch accordingly (saying we invalidate the vardecl)
clang/test/SemaCXX/arrow-operator.cpp | ||
---|---|---|
79 | I think the comment x is dependent should associate to x->call() statement nor the vardecl. |
I reduced something very similar recently as https://github.com/clangd/clangd/issues/1073
This patch does not fix it, but looks closely related, want to take a look?
Very similar. My original fix would've fixed that too. I'm reverting this change to that version and adding this as a test.
Based on my conversation with Sam I'm also reverting the mark-ref-VarDecl-as-invalid-when-on-invalid-init, as I no longer believe this is appropriate. The idea is that, init or not, we know the type of VarDecl so it's not invalid (unless it's auto or something).
PTAL
Sorry, I thought this crash is fixed as a bonus effort by improving the AST for reference-type var-decl.
Beyond the crash cased by the dependent-type RecoveryExpr built in Sema::BuildDeclarationNameExpr, I think there is another issue which is worth fixing: whether we should mark the reference-type var-decl valid if it doesn't have an initializer (either the initializer is not provided in the source code or too broken to preserve). The current AST behavior is inconsistent:
int &a; // var-decl a is invalid without an initializer int &b = undefine[1111]; // var-decl b is valid without an initializer int &c = undefine; // var-decl c is valid with a recovery-expr initializer
From the AST point of view, there is no difference between a and b, they both don't have an initializer, but their valid bits are different. IMO marking b invalid is an improvement, the valid bit is consistent for refer-type var-decls without initializers; it avoids the bogus requires an initializer diagnostic during template instantiations; as a bonus, it seems to "fix" the crash (at least for the original testcase).
Another option would be to mark a valid regardless of the initializer. Pros: preserve more AST nodes for a; the valid bit are consistent among three cases. Cons: unknown? whether it'll break some subtle invariants in clang; the bogus requires an initializer diagnostic is still there.
But yeah, this is a separate issue, we should fix the crash first.
clang/lib/Sema/TreeTransform.h | ||
---|---|---|
14760 | I'm not sure this is the only place triggering the crash, it looks like that we're fixing a symptom. While here, First refers to a dependent-type RecoveryExpr (which is built from the code path: TransformDeclRefExpr -> RebuildDeclRefExpr-> Sema::BuildDeclarationNameExpr). So I think we have a high chance that the RecoveryExpr will spread multiple places in the TreeTransform and cause similar violations in other places. A safe fix will be to not build the RecoveryExpr in TreeTransform::TransformDeclRefExpr -- Sema::BuildDeclarationNameExpr has a AcceptInvalidDecl parameter (by default it is false), we could reuse it to control whether we should build a RecoveryExpr. |
Agree about the inconsistency. I think ideally we'd move in the direction of *not* marking such things invalid. Here's the quote from Richard:
But practically, it's at least as important to make changes that make diagnostics better and not worse, and don't introduce new crashes. This is hard to do while keeping scope limited, so I'm OK with bending principle too.
Another option would be to mark a valid regardless of the initializer. Pros: preserve more AST nodes for a; the valid bit are consistent among three cases. Cons: unknown? whether it'll break some subtle invariants in clang; the bogus requires an initializer diagnostic is still there.
But yeah, this is a separate issue, we should fix the crash first.
+1
clang/lib/Sema/TreeTransform.h | ||
---|---|---|
14760 | I agree with this FWIW: in principle it makes sense to have RecoveryExpr produced in template instantiation, in practice we probably haven't weakened the assumptions in template instantiation enough to do so safely, in the way that we have for "regular" sema. We could try to do so in an ongoing way, but at least for syntax based tools like clangd the payoff won't be large as long as we keep mostly ignoring template instantiations. That said, the current form of the patch is simple and fixes the crash in an obvious way, if this really is a more general problem then we'll see it again and have more data to generalize. |
Oops, forgot conclusion:
Thanks for incorporating my testcase, and the patch LGTM in its current form (most of the alternatives discussed also sound OK).
I'll leave to Haojian to stamp.
OK, let's move forward with it. Thanks for the investigation and the fix!
I will take a look on the invalid-bit problem, and monitor the crash report more closely.
clang/lib/Sema/TreeTransform.h | ||
---|---|---|
14760 | yeah, I'm more worry about this is a more general problem in template instantiation. Given the current patch is a definitely crash fix, I'm fine with it. | |
14760 | nit: please add some comments explaining why we hit a dependent-type express here. |
I'm not sure this is the only place triggering the crash, it looks like that we're fixing a symptom.
While here, First refers to a dependent-type RecoveryExpr (which is built from the code path: TransformDeclRefExpr -> RebuildDeclRefExpr-> Sema::BuildDeclarationNameExpr). So I think we have a high chance that the RecoveryExpr will spread multiple places in the TreeTransform and cause similar violations in other places.
A safe fix will be to not build the RecoveryExpr in TreeTransform::TransformDeclRefExpr -- Sema::BuildDeclarationNameExpr has a AcceptInvalidDecl parameter (by default it is false), we could reuse it to control whether we should build a RecoveryExpr.