This is an archive of the discontinued LLVM Phabricator instance.

[modules] Fix miscompilation when using two RecordDecl definitions with the same name.
ClosedPublic

Authored by vsapsai on Jul 28 2021, 1:07 PM.

Details

Summary

When deserializing a RecordDecl we don't enforce that redeclaration
chain contains only a single definition. So if the canonical decl is not
a definition itself, RecordType::getDecl can return different objects
before and after an include. It means we can build CGRecordLayout for
one RecordDecl with its set of FieldDecl but try to use it with
FieldDecl belonging to a different RecordDecl. With assertions enabled
it results in

Assertion failed: (FieldInfo.count(FD) && "Invalid field for record!"),
function getLLVMFieldNo, file llvm-project/clang/lib/CodeGen/CGRecordLayout.h, line 199.

and with assertions disabled a bunch of fields are treated as their
memory is located at offset 0.

Fix by keeping the first encountered RecordDecl definition and marking
the subsequent ones as non-definitions. Also need to merge FieldDecl
properly, so that getPrimaryMergedDecl works correctly and during name
lookup we don't treat fields from same-name RecordDecl as ambiguous.

rdar://80184238

Diff Detail

Event Timeline

vsapsai created this revision.Jul 28 2021, 1:07 PM
vsapsai requested review of this revision.Jul 28 2021, 1:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2021, 1:07 PM
vsapsai added subscribers: Restricted Project, rjmccall.Jul 28 2021, 1:09 PM

Added John McCall in case I'm missing some CodeGen pieces.

vsapsai added inline comments.Jul 28 2021, 1:19 PM
clang/lib/Serialization/ASTReaderDecl.cpp
832 ↗(On Diff #362508)

Here is the perfect place to compare if RD and OldDef are equivalent and emit diagnostic if they are not. I've tried StructuralEquivalenceContext for this purpose but it compares canonical decls which doesn't work in this case. I think the best approach for this task would be ODR hash comparison proposed in https://reviews.llvm.org/D71734 It will need some tweaks to work with the current patch but overall the plan is to use ODR hash instead of any other decl comparison.

For what it's worth, I think the right way to handle this in C is to properly implement the "compatible types" rule instead of trying to invent something ODR-ish: in C, struct definitions in different translation units are different types, but if they're structurally equivalent then they're compatible and you can implicitly pass an object of some type to a function accepting a compatible type. This would mean that we could have multiple, different types with the same name, and we'd need name lookup to deduplicate compatible types, but we wouldn't need to do any cross-module ODR-like struct merging.

But assuming we want to keep the current ODR-in-C approach, this looks OK. There might be some places that assume the lexical and semantic DeclContext for a C FieldDecl are the same (etc) but I don't think there's a good way to find such things other than by testing this patch broadly.

clang/lib/Serialization/ASTReaderDecl.cpp
832 ↗(On Diff #362508)

Just a minor note: it's not safe to emit diagnostics from here in general; in particular, emitting a diagnostic that refers to a declaration can trigger deserialization, which can reenter the AST reader in unfortunate ways and crash. But we can make a note to do the structural equivalence check here and then actually perform the check when we finish deserialization (with the other merging checks).

For what it's worth, I think the right way to handle this in C is to properly implement the "compatible types" rule instead of trying to invent something ODR-ish: in C, struct definitions in different translation units are different types, but if they're structurally equivalent then they're compatible and you can implicitly pass an object of some type to a function accepting a compatible type. This would mean that we could have multiple, different types with the same name, and we'd need name lookup to deduplicate compatible types, but we wouldn't need to do any cross-module ODR-like struct merging.

I agree that implementing the "compatible types" looks better as it models the language more faithfully. And in the long run we might need to do that anyway. Is there any work done for "compatible types" already? Or I can start by creating a new type for a new definition with the same name and see how it breaks the lookup?

From pragmatic perspective we are pretty invested into this ODR-ish approach and it's not clear how much work switching to "compatible types" would take. So I'd like to continue with the definition merging and evaluate the effort for "compatible types". That's why I'm curious what work is done already.

But assuming we want to keep the current ODR-in-C approach, this looks OK. There might be some places that assume the lexical and semantic DeclContext for a C FieldDecl are the same (etc) but I don't think there's a good way to find such things other than by testing this patch broadly.

Are there any known signs for mixing lexical and semantic DeclContext? I plan to test the change on our internal codebase, hopefully it'll help to catch any remaining issues.

clang/lib/Serialization/ASTReaderDecl.cpp
832 ↗(On Diff #362508)

Thanks for pointing it out, I didn't realize diagnostic can trigger deserialization. Was planning to do something like

if (OldDef->getODRHash() != RD->getODRHash())
  Reader.PendingRecordOdrMergeFailures[OldDef].push_back(RD);

For what it's worth, I think the right way to handle this in C is to properly implement the "compatible types" rule instead of trying to invent something ODR-ish: in C, struct definitions in different translation units are different types, but if they're structurally equivalent then they're compatible and you can implicitly pass an object of some type to a function accepting a compatible type. This would mean that we could have multiple, different types with the same name, and we'd need name lookup to deduplicate compatible types, but we wouldn't need to do any cross-module ODR-like struct merging.

I agree that implementing the "compatible types" looks better as it models the language more faithfully. And in the long run we might need to do that anyway. Is there any work done for "compatible types" already? Or I can start by creating a new type for a new definition with the same name and see how it breaks the lookup?

From pragmatic perspective we are pretty invested into this ODR-ish approach and it's not clear how much work switching to "compatible types" would take. So I'd like to continue with the definition merging and evaluate the effort for "compatible types". That's why I'm curious what work is done already.

I don't think there's been any real work done on a cross-TU implementation of compatible types. I also don't want that idea to get in the way of this patch, which seems like a clear improvement following our current approach.

But assuming we want to keep the current ODR-in-C approach, this looks OK. There might be some places that assume the lexical and semantic DeclContext for a C FieldDecl are the same (etc) but I don't think there's a good way to find such things other than by testing this patch broadly.

Are there any known signs for mixing lexical and semantic DeclContext? I plan to test the change on our internal codebase, hopefully it'll help to catch any remaining issues.

The kinds of things I saw go wrong when we were bringing this up on the C++ side were generally in code that would walk the list of (say) fields of a record building up some information, and then attempt to look up a given FieldDecl* in that data structure. That can fail if fields get merged, because the lookup key may be a different redeclaration of the same field than the one found by walking the class's members. The fix is usually to add getCanonicalDecl calls in the right places. The sign of this kind of bug happening was usually a crash or assert, usually pretty close to where the problem was.

clang/lib/Serialization/ASTReaderDecl.cpp
832 ↗(On Diff #362508)

That seems reasonable to me.

Are there any known signs for mixing lexical and semantic DeclContext? I plan to test the change on our internal codebase, hopefully it'll help to catch any remaining issues.

The kinds of things I saw go wrong when we were bringing this up on the C++ side were generally in code that would walk the list of (say) fields of a record building up some information, and then attempt to look up a given FieldDecl* in that data structure. That can fail if fields get merged, because the lookup key may be a different redeclaration of the same field than the one found by walking the class's members. The fix is usually to add getCanonicalDecl calls in the right places. The sign of this kind of bug happening was usually a crash or assert, usually pretty close to where the problem was.

Thanks, that's helpful.

vsapsai planned changes to this revision.Jul 29 2021, 3:38 PM

Discovered ambiguous name lookup for IndirectFieldDecl in anonymous structs.

vsapsai updated this revision to Diff 363891.Aug 3 2021, 3:28 PM

Handle nested anonymous structs and IndirectFieldDecl; more tests to cover unions and bitfields.

vsapsai updated this revision to Diff 363894.Aug 3 2021, 3:33 PM

Add missing changes back.

vsapsai added inline comments.Aug 3 2021, 3:55 PM
clang/lib/Serialization/ASTReaderDecl.cpp
3346–3347 ↗(On Diff #363894)

In D71734 we have

if (auto *RD = dyn_cast<RecordDecl>(DC))
  if (!RD->getASTContext().getLangOpts().CPlusPlus)
    return RD->getCanonicalDecl()->getDefinition();

I've verified that in C++ unions are also CXXRecordDecl, so I think CPlusPlus check is not required. Locally I'm testing with

if (auto *RD = dyn_cast<RecordDecl>(DC)) {
  assert(!RD->getASTContext().getLangOpts().CPlusPlus &&
         "Unexpected RecordDecl in C++");
  return RD->getDefinition();
}

to get extra reassurance. But don't think it should be in the final version.

Tested clang with this change on internal code and there were no regressions. Also have done limited testing of runtime behavior of projects built with this clang - no errors encountered. So the testing so far hasn't found any issues.

rsmith accepted this revision.Aug 30 2021, 12:39 PM

LGTM

clang/lib/Serialization/ASTReaderDecl.cpp
3328–3331 ↗(On Diff #363894)

I believe there's no need to have logic matching this case in C because the only way that a class definition can be added by an update record is due to template instantiation. So we can use the simpler logic below for C.

This revision is now accepted and ready to land.Aug 30 2021, 12:39 PM

Thanks for the review!

clang/lib/Serialization/ASTReaderDecl.cpp
3328–3331 ↗(On Diff #363894)

Thanks, it's good to know.