This is an archive of the discontinued LLVM Phabricator instance.

[ODRHash] Hash attributes on declarations.
AcceptedPublic

Authored by vsapsai on Oct 7 2022, 11:24 AM.

Details

Summary

Arguments not for all attributes are covered. More can be added later as
needed.

rdar://100482330

Diff Detail

Event Timeline

vsapsai created this revision.Oct 7 2022, 11:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 11:24 AM
Herald added a subscriber: ributzka. · View Herald Transcript
vsapsai requested review of this revision.Oct 7 2022, 11:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2022, 11:24 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
vsapsai added inline comments.Oct 7 2022, 11:32 AM
clang/lib/AST/ODRHash.cpp
479–480

I'm not sure isImplicit is the best indicator of attributes to check, so suggestions in this area are welcome. I think we can start strict and relax some of the checks if needed.

If people have strong opinions that some attributes shouldn't be ignored, we can add them to the tests to avoid regressions. Personally, I believe that alignment and packed attributes should never be silently ignored.

clang/test/Modules/odr_hash.cpp
3640

As we land hashing for C and Objective-C, we can move these tests to their own file. But for now I think it makes sense to keep everything in odr_hash.cpp. Though I don't have a strong preference.

I guess the reason why we didn't check odr violation for attributes is that the attributes was not a part of declarations in C++ before. But it is odd to skip the check for attributes from the perspective of users. So I think this one should be good. The only concern is that it misses too many checks for attributes. Do you plan to add it in near future or in longer future? And I guess we need to mention this in ReleaseNotes in Potential Breaking Changes section.

clang/lib/AST/ODRDiagsEmitter.cpp
341–358

I feel like we can merge these two statements.

clang/lib/AST/ODRHash.cpp
479–480

Agreed. I feel isImplicit is enough for now.

aaron.ballman added inline comments.
clang/include/clang/AST/ODRDiagsEmitter.h
119 ↗(On Diff #466133)
clang/lib/AST/ODRDiagsEmitter.cpp
159

Typedefs can have attributes, so should this be updated as well? (alignment, ext vector types, mode attributes, etc all come to mind)

306

Do we want to have more control over the printing policy here? e.g., do we really want to claim an ODR difference if one module's printing policy specifies indentation of 8 and another specifies indentation of 4? Or if one module prints restrict while another prints __restrict, etc?

315

You can probably drop the getIdentifier() here as the diagnostics engine knows how to print named declarations properly already.

322

Same here.

clang/lib/AST/ODRHash.cpp
479–480

The tricky part is -- sometimes certain attributes add additional implicit attributes and those implicit attributes matter (https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L9380). And some attributes seem to just do the wrong thing entirely: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaDeclAttr.cpp#L7344

So I think isImplicit() is a good approximation, but I'm more wondering what the principle is for whether an attribute should or should not be considered part of the ODR hash. Type attributes, attributes that impact object layout, etc all seem like they should almost certainly be part of ODR hashing. Others are a bit more questionable though.

I think this is something that may need a per-attribute flag in Attr.td so attributes can opt in or out of it because I'm not certain what ODR issues could stem from [[maybe_unused]] or [[deprecated]] disagreements across module boundaries.

494

100% agreed.

Most of my concerns change based on the feedback others have given, so after those are dealt with, I'll do another depever view.

clang/include/clang/Basic/DiagnosticASTKinds.td
852 ↗(On Diff #466133)

Wowzer these are tough to read... can you provide a magic-decoder ring for me of some sort?

clang/lib/AST/ODRHash.cpp
479–480

I don't think 'isImplicit' is particularly good. I think the idea of auto-adding 'type' attributes and having the 'rest' be analyzed to figure out which are important.

Alternatively, I wonder if we're better off just adding ALL attributes and seeing what the fallout is. We can later decide when we don't want them to be ODR significant (which, might be OTHERWISE meaningful later!).

I guess the reason why we didn't check odr violation for attributes is that the attributes was not a part of declarations in C++ before. But it is odd to skip the check for attributes from the perspective of users. So I think this one should be good.

I think we haven't seen many problems with attribute mismatches because in existing code attributes are often hidden behind macros. And we have sugar MacroQualifiedType that causes the definitions

#define NODEREF __attribute__((noderef))
struct StructWithField {
  int NODEREF *i_ptr;
};
struct StructWithField {
  int __attribute__((noderef)) *i_ptr;
};

to be treated as incompatible.

But the keyword alignas is used without macros and more attributes can be used in multiple compilers without macros. That's why I think it is useful to detect and to diagnose mismatches in attributes.

The only concern is that it misses too many checks for attributes. Do you plan to add it in near future or in longer future?

In the near future I was planning to add various Objective-C and Swift attributes. For other attributes I don't know which are high-value. I definitely want to check attributes affecting the memory layout (alignment, packing) and believe I've addressed them (totally could have missed something).

And I guess we need to mention this in ReleaseNotes in Potential Breaking Changes section.

Good idea, will do.

clang/lib/AST/ODRDiagsEmitter.cpp
341–358

Sorry, I don't really get what two statements you are talking about. Is it

  • !LHS || !RHS || LHS->getKind() != RHS->getKind()
  • ComputeAttrODRHash(LHS) != ComputeAttrODRHash(RHS)

?

vsapsai added inline comments.Oct 11 2022, 7:47 PM
clang/include/clang/Basic/DiagnosticASTKinds.td
852 ↗(On Diff #466133)

Guess I went too far with avoiding repetition (was feeling soo smart doing it). I'll go back to the messages with a little bit more repetition but which should be easier to read and understand.

clang/lib/AST/ODRDiagsEmitter.cpp
159

It should be tested for sure, thanks for pointing it out.

306

FullAttributeText is used for diagnostics and not for the mismatch detection, so we shouldn't complain about restrict/__ restrict mismatches (DifferentAlignmentKeywords tests that __attribute__((aligned(8))) and alignas(8) are not a mismatch).

We use the same ASTContext to print both attributes, so that shouldn't be confusing. Diagnostic can be unstable across clang versions and probably across language modes. But I think that can happen to other diagnostic messages too, so think that's acceptable.

315

Thanks for the hint, will do.

clang/lib/AST/ODRHash.cpp
479–480

One criteria to decide which attributes should be hashed is if they affect IRGen. But that's not exhaustive and I'm not sure how practical it is.

The rule I'm trying to follow right now is if declarations with different attributes can be substituted instead of each other. For example, structs with different memory layout cannot be interchanged, so it is reasonable to reject them.

But maybe we should review attributes on case-by-case basis. For example, for [[deprecated]] I think the best for developers is not to complain about it but merge the attributes and then have regular diagnostic about using a deprecated entity.

479–480

One option was to hash isa<InheritedAttr> attributes as they are "sticky" and should be more significant, so shouldn't be ignored. But I don't think this argument is particularly convincing.

At this stage I think we can still afford to add all attributes and exclude some as-needed because modules adoption is still limited. Though I don't have strong feelings about this approach, just getting more restrictive later can be hard.

In the near future I was planning to add various Objective-C and Swift attributes. For other attributes I don't know which are high-value. I definitely want to check attributes affecting the memory layout (alignment, packing) and believe I've addressed them (totally could have missed something).

It looks like you're planning to add them one by one, or do I misunderstand? It looks not so good to add them one by one. Maybe it'll be a better idea to add them in the Attr.td?

clang/lib/AST/ODRDiagsEmitter.cpp
341–358

Sorry for the ambiguity. Since LHS->getKind() != RHS->getKind() is covered by ComputeAttrODRHash(LHS) != ComputeAttrODRHash(RHS). I feel like it is reasonable to:

if (!LHS || !RHS || ComputeAttrODRHash(LHS) != ComputeAttrODRHash(RHS)) {
    DiagError();
    DiagNote();
    DiagNote();
    DiagNoteAttrLoc();
    return true;
}
clang/lib/AST/ODRHash.cpp
479–480

It looks not a bad idea to add all attributes as experiments, @vsapsai how do you feel about this?

aaron.ballman added inline comments.Oct 12 2022, 6:31 AM
clang/lib/AST/ODRDiagsEmitter.cpp
306

Oh, that's great, thank you!

clang/lib/AST/ODRHash.cpp
479–480

My intuition is that we're going to want this to be controlled from Attr.td on a case-by-case basis and to automatically generate the ODR hash code for attribute arguments.

We can either force every attribute to decide explicitly (this seems pretty disruptive as a final design but would be a really good way to ensure we audit all attributes if done as an experiment), or we can pick a default behavior to ODR hash/not. I suspect we're going to want to pick a default behavior based on whatever the audit tells us the most common situation is.

I think we're going to need something a bit more nuanced than "this attribute matters for ODR" because there are attribute arguments that don't always contribute to the entity (for example, we have "fake" arguments that are used to give the semantic attribute more information, there may also be cases where one argument matter and another argument does not such as enable_if where the condition matters greatly but the message doesn't matter much). So we might need a marking on the attribute level *and* on the parameter level to determine what all factors into attribute identity for generating the ODR hashing code. Hopefully we can avoid needing more granularity than that.

ChuanqiXu added inline comments.Oct 13 2022, 1:14 AM
clang/lib/AST/ODRHash.cpp
479–480

I feel the default behavior would be to do ODR hashes. I just took a quick look in Attr.td. And I feel most of them affecting the code generation. For example, there're a lot of attributes which is hardware related.

because there are attribute arguments that don't always contribute to the entity

When you're talking about "entities", I'm not sure if you're talking the "entities" in the language spec or a general abstract ones. I mean, for example, the always_inline attribute doesn't contribute to the declaration from the language perspective. But it'll be super odd if we lost it. So I feel it may be better to change the requirement to be "affecting code generation"

Also, to be honest, I am even not sure if we should take "affecting code generation " as the requirement. For example, for the preferred_name attribute, this is used for better printing names. It doesn't affect code generation nor the entity you mentioned. But if we drop it, then the printing name may not be what users want. I'm not sure if this is desired.

aaron.ballman added inline comments.Oct 13 2022, 8:17 AM
clang/lib/AST/ODRHash.cpp
479–480

I feel the default behavior would be to do ODR hashes. I just took a quick look in Attr.td. And I feel most of them affecting the code generation. For example, there're a lot of attributes which is hardware related.

Hmmm, I'm seeing a reasonably even split. We definitely have a ton of attributes like multiversioning, interrupts, calling conventions, availability, etc that I think all should be part of an ODR hash. We also have a ton of other ones that don't seem like they should matter for ODR hashing like hot/cold, consumed/returns retained, constructor/destructor, deprecated, nodiscard, maybe_unused, etc.

Then we have the really fun ones where I think we'll want them to be in the ODR hash but maybe we don't, like [[clang::annotate]], restrict, etc.

So I think we really do need an actual audit of the attributes to decide what the default should be (if there should even be one).

When you're talking about "entities", I'm not sure if you're talking the "entities" in the language spec or a general abstract ones. I mean, for example, the always_inline attribute doesn't contribute to the declaration from the language perspective. But it'll be super odd if we lost it. So I feel it may be better to change the requirement to be "affecting code generation"

Not in a language spec meaning -- I meant mostly that there's some notion of identify for the one definition rule and not all attribute arguments contribute to that identity the same way.

I don't know that "affecting code gen" really captures the whole of it. For example, whether a function is/is not marked hot or cold affects codegen, but really has nothing to do with the identity of the function (it's the same function whether the hint is there or not). preferred_name is another example -- do we really think a structure with that attribute is fundamentally a different thing than one without that attribute? I don't think so.

To me, it's more about what cross-TU behaviors you'll get if the attribute is only present on one side. Things like ABI tags, calling conventions, attributes that behave like qualifiers, etc all contribute to the identity of something where a mismatch will impact the correctness of the program. But things like optimization hints and diagnostic hints don't seem like they contribute to the identity of something because they don't impact correctness.

vsapsai updated this revision to Diff 467564.Oct 13 2022, 12:14 PM
  • Small fixes to address review comments.
  • Try to make diagnostics more understandable.
  • Check attributes on typedefs.
vsapsai marked 3 inline comments as done.Oct 13 2022, 12:30 PM

Given all the discussion about which attributes should be added to ODR hash, I think it is useful at this point to have TableGen infrastructure to get this information from Attr.td. So I'll work on that.

clang/lib/AST/ODRDiagsEmitter.cpp
159

Typedefs are done. But while adding support for them I've realized we don't check method parameters. So it requires a little bit more work.

341–358

There are 2 separate cases to improve the diagnostic. In the first case we'd have

... first difference is definition in module 'FirstModule' found attribute 'enum_extensibility'
attribute specified here
but in 'SecondModule' found no attribute

And if we reach the second ones, it implies the kind of attributes is the same and the only difference is attribute arguments, so the diagnostics are like

first difference is definition in module 'FirstModule' found attribute ' __attribute__((enum_extensibility("open")))'
attribute specified here
but in 'SecondModule' found different attribute argument ' __attribute__((enum_extensibility("closed")))'
attribute specified here

From my limited experience, it is actually useful to have more details than trying to figure out the difference in attributes.

vsapsai updated this revision to Diff 467570.Oct 13 2022, 12:32 PM

Fix the starting point of the diff.

vsapsai marked an inline comment as done.Oct 13 2022, 12:34 PM
vsapsai added inline comments.
clang/include/clang/Basic/DiagnosticASTKinds.td
852 ↗(On Diff #466133)

It should be easier to understand now (easier doesn't mean easy).

ChuanqiXu accepted this revision.Oct 13 2022, 7:26 PM

LGTM basically. Our discussion is mainly about the future work in Attr.td and not this patch. So I think they are not blocking issues.

clang/lib/AST/ODRDiagsEmitter.cpp
341–358

Agreed. I just wanted to say the codes can be further be shorten by making the diagnostic kinds contains more cases:

%select {
|
|
...
}

We have some such examples before. But this is not required. Since it actually moves the complexity from the source codes to the table gen. And the implementation doesn't look bad.

clang/lib/AST/ODRHash.cpp
479–480

I got your point. But from my developing experience, I want to say the optimization hints matters too. And I am wondering how about checking the ODR for all attributes. It would emit error if the attributes that affecting the correctness mismatches. And if the other attributes mismatches, a warning enough. (If we worry about the compile time performance, I feel it won't take too long. And we can have an option to control it actually.)

This revision is now accepted and ready to land.Oct 13 2022, 7:26 PM

To fix pre-merge errors like

error: 'error' diagnostics seen but not expected:
  File .../clang/test/Modules/Output/odr_hash.cpp.tmp/Inputs/first.h Line 3797: ... found 'AlignedDoublePtr' with attribute ' __attribute__((align_value(0xb7dc9b8)))'

posted D135931.

Given all the discussion about which attributes should be added to ODR hash, I think it is useful at this point to have TableGen infrastructure to get this information from Attr.td. So I'll work on that.

Thank you, I think that's a good approach for the moment. Once you have the basic framework in place, we can opt in the uncontroversial attributes (like, I think anything inheriting from TypeAttr or DeclOrTypeAttr should be automatically opted in and any Argument whose Fake flag is set should not contribute to the ODR hash) and then we can do a follow-up to figure out how to distinguish "definitely an ODR violation" attributes from "not an ODR violation but still something we want to find a way to alert users about".

clang/lib/AST/ODRHash.cpp
479–480

I got your point. But from my developing experience, I want to say the optimization hints matters too.

That's why I'd like to have an idea about what our policy is. To me, ODR hashing should be restricted to finding ODR violations. It sounds like you'd like ODR hashing to also optionally find other kinds of differences that aren't necessarily ODR violations but still surprises nonetheless. I think that's reasonable, but I think we should ensure the diagnostics distinguish between "this is definitely UB" and "this might not have the optimization or diagnostic properties you expect but is still correct".

And I am wondering how about checking the ODR for all attributes. It would emit error if the attributes that affecting the correctness mismatches. And if the other attributes mismatches, a warning enough. (If we worry about the compile time performance, I feel it won't take too long. And we can have an option to control it actually.)

I don't think we should ODR-check all attributes blindly. There's no ODR violation for one function having [[nodiscard]] in a TU and another function not having it. In fact, attributes are sometimes used additively so that you can put extra constraints locally that aren't there globally. e.g., I've seen real code doing:

// Header.h
int some_func();

// Source.cpp
#include "Header.h"

// We've had too many bugs from people ignoring the results,
// which really matters in this particular file. Redeclare the API
// with extra diagnostic checking for this TU.
[[nodiscard]] int some_func();

...

It shouldn't cause ODR violation diagnostics if some_func() is defined in another source file without the attribute.

ChuanqiXu added inline comments.Oct 14 2022, 9:26 AM
clang/lib/AST/ODRHash.cpp
479–480

Oh, yeah, you're right. Let's try to make the policy clearly when we started to it further.

vsapsai updated this revision to Diff 470915.Oct 26 2022, 1:22 PM

Rebase and diagnose attribute mismatch for function parameters.