This is an archive of the discontinued LLVM Phabricator instance.

[ODRHash] Hash `RecordDecl` and diagnose discovered mismatches.
ClosedPublic

Authored by vsapsai on Dec 19 2019, 4:29 PM.

Details

Summary

When two modules contain struct/union with the same name, check the
definitions are equivalent and diagnose if they are not. This is similar
to CXXRecordDecl where we already discover and diagnose mismatches.

rdar://problem/56764293

Diff Detail

Event Timeline

bruno created this revision.Dec 19 2019, 4:29 PM
bruno edited the summary of this revision. (Show Details)Dec 19 2019, 4:30 PM
jdoerfert resigned from this revision.Dec 19 2019, 4:53 PM
martong added inline comments.Dec 20 2019, 2:28 AM
clang/lib/Serialization/ASTReader.cpp
9524

Would it be possible to check the structural non-equivalency with ODRHash?
I wonder if the ODRHash of the two Decls are different in this case or not.
ASTStructuralEquivalency and ODRHash seems to serve a very similar use case, however, AFAIK in the rest of the ASTReader code ODRHash is used exclusively.

(On a long term, I am thinking about that perhaps ASTImporter could use ODRHash instead of ASTStructuralEquivalency.)

bruno updated this revision to Diff 236964.Jan 8 2020, 9:50 PM

Change the approach: handle type merging for tag types using the ODRHash mechanism

bruno marked an inline comment as done.Jan 8 2020, 9:53 PM
clang/lib/Serialization/ASTReader.cpp
9524

Great idea. Updated the patch with another approach!

bruno edited the summary of this revision. (Show Details)Jan 8 2020, 9:54 PM
bruno added a reviewer: rtrieu.
bruno updated this revision to Diff 236965.Jan 8 2020, 9:58 PM

Remove some FIXMEs that are now done.

rtrieu added inline comments.Jan 9 2020, 6:49 PM
clang/lib/Serialization/ASTReader.cpp
9699

Is this just a copy of the other loop? That's a lot of code duplication.

bruno marked an inline comment as done.Jan 10 2020, 9:50 AM
bruno added inline comments.
clang/lib/Serialization/ASTReader.cpp
9699

There's a lot of copy but a lot of CXXRecord specific logic got trimmed out and there are small differences in the approach, so a direct refactor of it won't work, but there's certainly more small pieces that can be factored out, will follow up with that.

bruno updated this revision to Diff 237447.Jan 10 2020, 3:34 PM

Updated the patch to refactor mode code.

  • Move off everything that can be reused for RecordDecl.
  • Early exist when not C++.

Once the patch gets approved the idea is to upstream some of the refactoring
first so its easier to spot the actual non NFC part.

rtrieu added inline comments.Jan 10 2020, 6:16 PM
clang/lib/Serialization/ASTReader.cpp
9699

I like this refactoring much better. Probably more refactoring could be done to this function since it's so long, but that's an issue for another time.

Think I'll need to make another review pass but here are my comments so far:

  • Why are you adding ODR hash support for RecordDecl and not TagDecl? We already have support for EnumDecl, so TagDecl seems like a good candidate to cover both. Honestly, I don't know if it is possible or a good idea but it looks plausible enough to consider.
  • Are anonymous structs working? Worth adding test cases.
  • Are unions working? Didn't notice any code specifically for them but RecordDecl covers both structs and unions, so they should be working and we need to test that.
  • Few testing additions. These cases might be already covered or might be low value, so take these suggestions with a grain of salt:
    • test self-referential structs like struct Node { struct Node *next; };
    • test comparing structs and forward declarations, e.g., struct S; and struct S { ... };, and another couple struct S { ... }; and struct S; struct S { ... }; The motivation is to make sure we aren't stumped when we cannot find struct definition or when the definition is in unexpected place.
clang/lib/AST/Decl.cpp
4896

To be consistent with the existing code, it is better to call it as setHasODRHash(true)

Heads up in case it affects you refactoring work:
While looking through this code, I found a bug I previously made. I fixed it with a small change, but that lies in the middle of your refactoring during FieldDecl handling. The change is here:

https://reviews.llvm.org/rGa60e8927297005898b10a46300d929ba5cf7833c

bruno added a comment.Jan 31 2020, 5:46 PM
  • Why are you adding ODR hash support for RecordDecl and not TagDecl? We already have support for EnumDecl, so TagDecl seems like a good candidate to cover both. Honestly, I don't know if it is possible or a good idea but it looks plausible enough to consider.

The reason is that in C++ ODR diagnostics for TagDecl are handled as part of CXXRecordDecl, I didn't want add a path in TagDecl for C/ObjC only.

  • Are anonymous structs working? Worth adding test cases.

They are at the top level but not nested ones, I fixed that and should upload a new patch with that next.

  • Are unions working? Didn't notice any code specifically for them but RecordDecl covers both structs and unions, so they should be working and we need to test that.

Done in upcoming patch.

  • Few testing additions. These cases might be already covered or might be low value, so take these suggestions with a grain of salt:
    • test self-referential structs like struct Node { struct Node *next; };
    • test comparing structs and forward declarations, e.g., struct S; and struct S { ... };, and another couple struct S { ... }; and struct S; struct S { ... }; The motivation is to make sure we aren't stumped when we cannot find struct definition or when the definition is in unexpected place.

Ditto.

Heads up in case it affects you refactoring work:
While looking through this code, I found a bug I previously made. I fixed it with a small change, but that lies in the middle of your refactoring during FieldDecl handling. The change is here:

https://reviews.llvm.org/rGa60e8927297005898b10a46300d929ba5cf7833c

Thanks for the heads up @rtrieu, I end up including that as part of rG90f58eaeff5f

bruno updated this revision to Diff 241855.Jan 31 2020, 5:49 PM
  • Update patch after Volodymyr's review.
  • Refactor bits done as part of rG90f58eaeff5f, update it to remove that part.
rsmith added inline comments.Feb 4 2020, 6:20 PM
clang/include/clang/AST/Decl.h
4268–4269

We shouldn't store this here; this will make all CXXRecordDecls larger to store a field they will never look at.

We have 27 spare trailing bits in RecordDeclBitfields (28 if you don't add the HasODRHash bit and use 0 to mean "hash not computed"). Can we store this there instead of here?

clang/lib/AST/ODRHash.cpp
559–560

This is, at least in principle, not reliable. After definition merging, we could have picked a different definition of this record type as "the" definition. (It's probably not straightforward to get this to happen in practice, and maybe not even possible, but I'm not certain of that.)

Maybe we could add to TagDecl something like

bool isThisDeclarationADemotedDefinition() const {
  return !isThisDeclarationADefinition() && BraceRange.isValid();
}

and then check isThisDeclarationADefinition() || isThisDeclarationADemotedDefinition() here? (I'm not sure we always update BraceRange properly for demoted definitions either, so maybe that's not quite right.)

clang/lib/Serialization/ASTReader.cpp
9666–9668

I don't understand why this assertion would be correct if the above branch can ever be taken. It's possible for two different types to have the same hash, and it looks like we'll assert here if that happens. Should we be branching on hasSameType above instead of comparing hashes?

bruno marked 3 inline comments as done.Feb 13 2020, 8:00 AM

Thanks for taking a look Richard, comments inline.

clang/include/clang/AST/Decl.h
4268–4269

Makes sense!

clang/lib/AST/ODRHash.cpp
559–560

There are two things I realized while looking at this:

  • What I really intended here was to hash underlying anonymous structs/unions, not just any underlying RecordDecl. Will change the logic in ODRHash::AddRecordDecl do reflect that properly.
  • This patch was skipping RecordDecls without definitions during ODR diagnostics, I've changed to instead decide at merge time if we want to register those Decls for later diagnose, and the new rule is clang skips RecordDecls that disagree on whether they have a definition. This should still allow clang to diagnose declarations without defintions that have different attributes (which I intend to add for specific attributes in future patches).

Will update the patch to reflect that. WDYT?

clang/lib/Serialization/ASTReader.cpp
9666–9668

This is trying to cover two things. The first one is to handle nested anonymous unions, which might have the same type, but underlying mismatching fields:

#if defined(FIRST)
struct AU {
  union {
    int a;
  };
};
#elif defined(SECOND)
struct AU {
  union {
    char a;
  };
};
#else
struct AU au;

The second is to allow for @interfaces (downstream patches at the moment) to reuse logic to diagnose fields in ODRDiagField without having to add its own check for the underlying types. Does that makes sense?

bruno updated this revision to Diff 244435.Feb 13 2020, 8:02 AM

Address Richard's review.

rsmith added inline comments.Feb 25 2020, 9:47 AM
clang/lib/AST/DeclCXX.cpp
487 ↗(On Diff #244435)

I think this change is no longer necessary.

clang/lib/Serialization/ASTReader.cpp
9666–9668

I still don't understand.

Either the types are always the same before the previous if and can only differ in type sugar (in which case this change is unnecessary), or the types can be different in more than just sugar (in which case this assert is wrong because there's no guarantee that different types will have different hashes).

What am I missing?

clang/lib/Serialization/ASTWriterDecl.cpp
496

It would be better to avoid writing this at all for a CXXRecordDecl, to avoid adding 6 unused bits per class to the bitcode. (Please also look at isa<CXXRecordDecl>(D) not at the LangOpts here.)

vsapsai commandeered this revision.Jun 18 2021, 11:59 AM
vsapsai edited reviewers, added: bruno; removed: vsapsai.
bruno added inline comments.Jun 18 2021, 12:27 PM
clang/lib/Serialization/ASTReader.cpp
9666–9668

@vsapsai: just to follow up here a bit. I don't remember the full context anymore, but it should be fine to reintroduce this in a later patch with a better explanation to @rsmith, or change the approach if it makes sense. Thanks for taking this over!

vsapsai updated this revision to Diff 353081.Jun 18 2021, 12:50 PM
vsapsai marked an inline comment as done.

Rebase on top of "main" and address review comments.

vsapsai added inline comments.Jun 18 2021, 1:10 PM
clang/lib/AST/DeclCXX.cpp
487 ↗(On Diff #244435)

Reverted the change.

clang/lib/Serialization/ASTReader.cpp
9666–9668

I think it makes the most sense to remove the assertion. Based on the local code, it doesn't make much sense, as equality of field names doesn't imply the equality of field types. On the higher level, we were checking field types in a different place, specifically we populated PendingOdrMergeChecks in ASTDeclReader::findExisting and diagnosed it earlier in this mega-method ASTReader::diagnoseOdrViolations (error constant err_module_odr_violation_missing_decl). Also you can notice that the diagnostic for

struct S { int x; };

struct S { float x; };

is different, it is

'S::x' from module 'SecondModule' is not present in definition of 'S' in module 'FirstModule'

compared to

'S' has different definitions in different modules; first difference is definition in module 'FirstModule' found field 'x' with type 'int'

that we emit in ODRDiagField.

It is possible to try to make unions behave the same way as structs (and keep the assertion) but I'm not sure it is worth it. Looks like the major difference is that not all tag types are true Redeclerable and I don't know if it is something we should change.

clang/lib/Serialization/ASTWriterDecl.cpp
496

Changed to write ODR hash for everything except CXXRecordDecl (right now everything is RecordDecl). Also updated ASTReaderDecl.cpp. Not 100% sure I understand your comment correctly, so if my change is way off, please let me know.

vsapsai added inline comments.Jul 21 2021, 12:09 PM
clang/lib/Serialization/ASTReaderDecl.cpp
834

During investigation of a different issue noticed there is no error for the following test case

#if defined(FIRST)
typedef struct FW FW;
struct FW {
  int x;
};
#elif defined(SECOND)
struct FW {
  float a;
};
#else
struct FW fw;
#endif

And that is because Canon is a forward declaration from the typedef and we don't compare non-definition with definition.

Found another case that doesn't emit an error

#if defined(FIRST)
struct Indirect {
  int x;
};
struct Direct {
  struct Indirect i;
};
#elif defined(SECOND)
struct Indirect {
  double a;
};
struct Direct {
  struct Indirect i;
};
#else
struct Direct d;
#endif

According to my debugging there is no error because Direct fields aren't deserialized in -fsyntax-only mode and therefore Indirect definitions aren't compared. But during IRGen there is diagnostic and that's because calculating record layout triggers full deserialization. Also there is diagnostic in C++ because we are dealing with default initialization and DeclareImplicitDefaultConstructor iterates through all the fields deserializing them.

I believe the best user experience is consistent diagnostic, so we should emit the error even with -fsyntax-only. If anybody has any objections, please let me know, it would save time.

vsapsai planned changes to this revision.Aug 12 2021, 1:08 PM

Need to detect mismatches in nested structs.

vsapsai updated this revision to Diff 482981.Dec 14 2022, 1:42 PM

Rebase and use ODRDiagsEmitter. Add more tests.

Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2022, 1:42 PM
vsapsai retitled this revision from [Modules] Handle tag types and complain about bad merges in C/Objective-C mode to [ODRHash] Hash `RecordDecl` and diagnose discovered mismatches..Dec 14 2022, 2:07 PM
vsapsai edited the summary of this revision. (Show Details)
vsapsai added reviewers: ChuanqiXu, jansvoboda11.
vsapsai added a subscriber: Restricted Project.

If anybody is curious about anonymous structs/unions, the relevant change is D140055.

Ping. Also checked the change on a bunch of internal Objective-C/C code using modules - no new errors.

ChuanqiXu added inline comments.Jan 16 2023, 11:30 PM
clang/lib/AST/Decl.cpp
4897–4898

I am curious for the operation. It looks dangerous. How can we be sure that the hash value is still meaningful after remove its lower 6 bits?

clang/lib/AST/ODRDiagsEmitter.cpp
1550–1551

The implementation of this function looks redundant with other overloads. Of course this is not the problem of the patch. I think we can add a FIXME at least.

clang/lib/AST/ODRHash.cpp
598–599

For the current implementation, if it makes sense to add an assertion !isa<CXXRecordDecl>(Record);

vsapsai added inline comments.Jan 17 2023, 8:10 PM
clang/lib/AST/Decl.cpp
4897–4898

Hash value itself has no meaning. The risk with truncation is that RecordDecls with different hashes can end up with equal truncated hashes. It means we'd miss some mismatches. Currently we are missing all mismatches, so some looks like an improvement.

This applies only to C and Objective-C but not to C++ as CXXRecordDecl stores its own ODR hash separately. So some imperfection in Objective-C seems more desirable than adding 4 bytes in memory consumption per each struct and class, including C++ classes.

clang/lib/AST/ODRDiagsEmitter.cpp
1550–1551

Do you have any specific ideas to reduce redundancy? PopulateHashes probably can be made the same for different entities but the rest has many annoying differences. Diagnosing mismatches for different entities consists of the same pieces (that are already put into reusable methods) but the composition of such pieces is different for different entities.

I've tried to push complex logic into reusable methods and repeat the simple stuff. For example, for std::string FirstModule = getOwningModuleNameForDiagnostic(FirstRecord); obtaining a module name is non-trivial and that's why it is in a reusable method. But the same assignment doesn't seem to carry the problems of copy-pasted code.

By the way, Phabricator shows big [visually] contiguous chunks as being copied. But in fact these big chunks consist of smaller pieces that are taken from different places. So I think that reusing the same pieces multiple times and composing them in different ways is actually useful.

clang/lib/AST/ODRHash.cpp
598–599

Yes, that's a good idea as calling AddRecordDecl with CXXRecordDecl looks like an easy mistake to make.

vsapsai updated this revision to Diff 490036.Jan 17 2023, 10:04 PM

Rebase and add assertion in ODRHash::AddRecordDecl.

vsapsai marked an inline comment as done.Jan 17 2023, 10:04 PM
ChuanqiXu accepted this revision.Jan 17 2023, 10:48 PM

LGTM generally. It'd better to mention this in the Potentially Breaking Changes section of ReleaseNotes.

clang/lib/AST/Decl.cpp
4714

nit: setODRHash(0) may be more consistent with above style.

4897–4898

It sounds not bad.

clang/lib/AST/ODRDiagsEmitter.cpp
1550–1551

No specific ideas really. I feel it is scary to see so many copied chunks and we should be able to extract them into smaller common functions. But this is not the problem of the patch. I don't want to block it for such reasons.

This revision is now accepted and ready to land.Jan 17 2023, 10:48 PM
vsapsai updated this revision to Diff 490354.Jan 18 2023, 6:35 PM

setODRHash(0) for consistency, mention the change in Potentially Breaking Changes.

vsapsai marked an inline comment as done.Jan 18 2023, 8:01 PM

LGTM generally. It'd better to mention this in the Potentially Breaking Changes section of ReleaseNotes.

Thanks for all your efforts during the review! Mentioned the change in the release notes.

clang/lib/AST/Decl.cpp
4714

You are right, changed it.

This revision was landed with ongoing or failed builds.Jan 19 2023, 1:59 PM
This revision was automatically updated to reflect the committed changes.
vsapsai marked an inline comment as done.