This is an archive of the discontinued LLVM Phabricator instance.

[Sema] Fix crash on __fp16 parameters in template instantiations
ClosedPublic

Authored by ilya-biryukov on Mar 20 2023, 7:54 AM.

Details

Summary

Fixes #61441.

Currently, Clang stores nullptr in the parameter lists inside
FunctionProtoTypeLoc if __fp16 is used without pointer qualifiers.

Any code path that calls Declarator::setInvalidType() before
GetFullTypeForDeclarator will lead to the same problem downstream.

The relevant code is:

cpp
if (D.isInvalidType())
  return Context.getTrivialTypeSourceInfo(T);

return GetTypeSourceInfoForDeclarator(state, T, TInfo);

GetTypeSourceInfoForDeclarator sets the parameter Decl, but we can't
call it when isInvalidType() == true as this causes other assertion
failures that seem harder to fix.

Diff Detail

Event Timeline

ilya-biryukov created this revision.Mar 20 2023, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2023, 7:54 AM
ilya-biryukov requested review of this revision.Mar 20 2023, 7:54 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2023, 7:54 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
shafik added a subscriber: shafik.Mar 20 2023, 9:03 AM

Please edit the summary to indicate this fixes: https://github.com/llvm/llvm-project/issues/61441

I am happy someone had more time to dig into this problem after my initial investigation.

ilya-biryukov edited the summary of this revision. (Show Details)Mar 20 2023, 9:05 AM

Please edit the summary to indicate this fixes: https://github.com/llvm/llvm-project/issues/61441
I am happy someone had more time to dig into this problem after my initial investigation.

Thanks, done!
I wasn't aware of the upstream bug and your prior investigation. It would have definitely saved some time.

shafik added inline comments.Mar 20 2023, 9:07 AM
clang/test/SemaCXX/crash-params.cpp
3

If this test really does not fit into any other test set the please rename the test file GH61441.cpp so we know it is a regression test for that github bug report.

Otherwise if we add a test from a bug report into an existing test file then we wrap it something like namespace GH61441 so we know that test is a regression test for github bug 61441.

shafik added inline comments.Mar 20 2023, 9:12 AM
clang/lib/Sema/SemaDeclCXX.cpp
1714

This comment seems to no longer be true since we don't emit a diagnostic for the !PD case.

Please edit the summary to indicate this fixes: https://github.com/llvm/llvm-project/issues/61441
I am happy someone had more time to dig into this problem after my initial investigation.

Thanks, done!
I wasn't aware of the upstream bug and your prior investigation. It would have definitely saved some time.

That is unfortunate, I definitely would have preferred for you to have benefitted from my work.

shafik added a reviewer: Restricted Project.Mar 20 2023, 9:22 AM

Adding clang-language-wg for more visibility

ilya-biryukov marked an inline comment as done.
  • rename test file to GH61441.cpp, add a comment this checks for no crash
  • Update outdated comment
clang/lib/Sema/SemaDeclCXX.cpp
1714

Good catch! Updated the comment.

clang/test/SemaCXX/crash-params.cpp
3

I am not very good at navigating the test files and couldn't find a good existing file for the test.
Ended up creating GH61441.cpp.

kadircet accepted this revision.Mar 20 2023, 10:52 AM

as discussed offline, this feels a little fishy and we should probably try and not put nulls into the parameter lists at all (and mark the functiontype as invalid instead), but since i don't know how to do that change myself it doesn't feel fair to ask it from you :)
LG and addresses a common crash we see on clangd, so let's ship it.

This revision is now accepted and ready to land.Mar 20 2023, 10:52 AM
aaron.ballman added a subscriber: aaron.ballman.

This feels like it's heading in the wrong direction -- the AST should not have holes in it. An invalid type should be replaced by a valid type (after diagnosing the invalid type, of course) so that we can keep as much of the AST around as possible (for example, we typically stub in int and continue compilation, as in: https://godbolt.org/z/MvxjGovGh), which should then result in a non-null ParmVarDecl. This way, we don't need to sprinkle nullptr checks all over the compiler when inspecting a function's parameters.

This feels like it's heading in the wrong direction -- the AST should not have holes in it. An invalid type should be replaced by a valid type (after diagnosing the invalid type, of course) so that we can keep as much of the AST around as possible (for example, we typically stub in int and continue compilation, as in: https://godbolt.org/z/MvxjGovGh), which should then result in a non-null ParmVarDecl. This way, we don't need to sprinkle nullptr checks all over the compiler when inspecting a function's parameters.

Frankly, I would like to see us just insert the __fp16 type instead. We've already diagnosed, so we won't go to codegen, which is where the parameter issue is going to cause a problem. I can't imagine we have ANY code that depends on __fp16 and would break if it is a parameter.

ilya-biryukov added a comment.EditedMar 20 2023, 12:21 PM

This feels like it's heading in the wrong direction -- the AST should not have holes in it. An invalid type should be replaced by a valid type (after diagnosing the invalid type, of course) so that we can keep as much of the AST around as possible (for example, we typically stub in int and continue compilation, as in: https://godbolt.org/z/MvxjGovGh), which should then result in a non-null ParmVarDecl. This way, we don't need to sprinkle nullptr checks all over the compiler when inspecting a function's parameters.

I actually prototyped doing this and it caused extra diagnostics with no locations during template instantiations.
Moreover, this happens not only with __fp16, but for all cases when Declarator::setInvalidType is called (and that's quite a lot of cases).
We also have checks for nullptr inside parameter lists in many places of the codebase, e.g. see a usage in SemaTemplateInstantiateDecl.cpp.

I fully agree that the direction taken is not very good, but it has not been chosen by this patch.
A more involved refactoring that would ensure parameter lists never have nulls is definitely very welcome, this change intends to quickly fix a crash we're experiencing.

This feels like it's heading in the wrong direction -- the AST should not have holes in it. An invalid type should be replaced by a valid type (after diagnosing the invalid type, of course) so that we can keep as much of the AST around as possible (for example, we typically stub in int and continue compilation, as in: https://godbolt.org/z/MvxjGovGh), which should then result in a non-null ParmVarDecl. This way, we don't need to sprinkle nullptr checks all over the compiler when inspecting a function's parameters.

Frankly, I would like to see us just insert the __fp16 type instead. We've already diagnosed, so we won't go to codegen, which is where the parameter issue is going to cause a problem. I can't imagine we have ANY code that depends on __fp16 and would break if it is a parameter.

+1 to that, __fp16 is a better choice here, e.g. that's something I would expect to see in AST dumps.
We just need a fix that avoids nulls inside parameter lists.
Note that the types are already handled properly, but for some reason that I didn't dig out we choose to defer setting the parameter decls until a call to GetTypeSourceInfoForDeclarator.

As I noted in the bug report not doing D.setInvalidType(); does fix this bug and seems harmless since the error diagnostic should prevent us from getting to codegen but it is not clear to me if this has other negative impacts. Reading your replies it is not obvious you looked at this path or not.

I would not mind a quick fix but I think we should have a plan for a more correct fix before we do that.

As I noted in the bug report not doing D.setInvalidType(); does fix this bug and seems harmless since the error diagnostic should prevent us from getting to codegen but it is not clear to me if this has other negative impacts. Reading your replies it is not obvious you looked at this path or not.

There are too many other code paths that set setInvalidType(), I'll try to add more tests that capture those too.
It would be easy to avoid calling setInvalidType() for __fp16 in particular, but checking for nulls actually saves us from other potential crashes .

I would not mind a quick fix but I think we should have a plan for a more correct fix before we do that.

Based on conversations in this thread, I think the right way is to make sure parameter lists never have nulls.
One way to fix this is to make sure we always fill parameters with non-null values, even if GetTypeSourceInfoForDeclarator does not get called.
An alternative way would be to call GetTypeSourceInfoForDeclarator even for invalid types (IIUC, it can perfectly handle function types, but other types also fail there).

I'll try to explore some these fixes and try to come up with a follow-up change.

  • Add a test for block pointers
This revision was landed with ongoing or failed builds.Mar 21 2023, 6:09 AM
This revision was automatically updated to reflect the committed changes.

I landed the fix to unbreak our crashes and explored a bit of the alternative solution.

Digging a bit deeper, trying to always set non-null parameters causes ~30 test failures, but the ones I looked at so far look more localized and should be fixable.
Some started producing more diagnostics (I assume those are just noise, but I haven't checked closely), it might be trickier to sort those out.

I'll try to spend some more time on it and come up with a better fix some time this week.

As I noted in the bug report not doing D.setInvalidType(); does fix this bug and seems harmless since the error diagnostic should prevent us from getting to codegen but it is not clear to me if this has other negative impacts. Reading your replies it is not obvious you looked at this path or not.

There are too many other code paths that set setInvalidType(), I'll try to add more tests that capture those too.
It would be easy to avoid calling setInvalidType() for __fp16 in particular, but checking for nulls actually saves us from other potential crashes .

I would not mind a quick fix but I think we should have a plan for a more correct fix before we do that.

Based on conversations in this thread, I think the right way is to make sure parameter lists never have nulls.
One way to fix this is to make sure we always fill parameters with non-null values, even if GetTypeSourceInfoForDeclarator does not get called.
An alternative way would be to call GetTypeSourceInfoForDeclarator even for invalid types (IIUC, it can perfectly handle function types, but other types also fail there).

I'll try to explore some these fixes and try to come up with a follow-up change.

Thank you for offering to do that in a follow-up, but please, next time wait for there to be agreement on the patch before landing it. Multiple reviewers expressed concerns that this is papering over a larger bug and didn't have the chance to respond to your comments. Reverting recently broken patches is fine because that gets us back into a better state, but pushing temporary hacks is a different situation entirely (those have a tendency to become permanent and cause problems in the future because the design is more confused). (No need to revert your changes that have already landed, that'll just cause churn, but those changes should be reverted in your follow-up patch.)

Note that the types are already handled properly, but for some reason that I didn't dig out we choose to defer setting the parameter decls until a call to GetTypeSourceInfoForDeclarator.

It's been this way since the function was introduced: https://github.com/llvm/llvm-project/commit/fc8b02281020fa97d34fa98326080052ca0bb565, so this is a very long-standing way of handling things. I did a bit of debugging and can see that the tree transform is given a function prototype that has no ParmVarDecl objects to transform, so the transformed lambda call operator gets null ParmVarDecl objects as well.

GetTypeSourceInfoForDeclarator sets the parameter Decl, but we can't call it when isInvalidType() == true as this causes other assertion failures that seem harder to fix.

The implementation there is inconsistent and I suspect that might be part of the issue. Notice how this code block https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaType.cpp#L5447 doesn't set the declarator type as invalid despite issuing an error diagnostic, while this block https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaType.cpp#L5468 does. IIUC, setting the declarator type to be invalid is done to avoid additional diagnostics later (https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaType.cpp#L5608), but GetTypeSourceInfoForDeclarator() doesn't generate any additional diagnostics.

Thank you for offering to do that in a follow-up, but please, next time wait for there to be agreement on the patch before landing it. Multiple reviewers expressed concerns that this is papering over a larger bug and didn't have the chance to respond to your comments. Reverting recently broken patches is fine because that gets us back into a better state, but pushing temporary hacks is a different situation entirely (those have a tendency to become permanent and cause problems in the future because the design is more confused). (No need to revert your changes that have already landed, that'll just cause churn, but those changes should be reverted in your follow-up patch.)

Sorry about that. This looked like a small change and I tried to rush it in to address the crash we had in production.
I wouldn't mind a revert from someone else on the basis of not getting agreement here.
FWIW, new cases started showing up not handled by this patch, so the discussion was fully warranted and I think my original approach does not even help with that.

I tried to look at the problem more closely and I think there is a way to avoid getting null pointers entirely, but we'll have to make the recovery a bit more robust in case of errors.
See D146971 for the first attempt at this approach, let's continue the discussion there.