This is an archive of the discontinued LLVM Phabricator instance.

PR50767: clear non-distinct debuginfo for function with nodebug definition after undecorated declaration
ClosedPublic

Authored by brunodefraine on Jun 23 2021, 4:30 AM.

Details

Summary

Fix suggested by Yuanfang Chen:

Non-distinct debuginfo is attached to the function due to the undecorated declaration. Later, when seeing the function definition and nodebug attribute, the non-distinct debuginfo should be cleared.

Diff Detail

Event Timeline

brunodefraine requested review of this revision.Jun 23 2021, 4:30 AM
brunodefraine created this revision.

Fix issues from Windows/clang-format buildbot. Fix mistake in code comment.

@aaron.ballman Do we have attributes/infrastructure for attributes that have to be the same from their first declaration or at least from their first call? I'm wondering if it might be simpler/better to require nodebug to be provided earlier, rather than fixing it up after the fact like this.

(for instance, requiring it to be attributed on the declaration would ensure we don't create call_site descriptions for calls to nodebug functions - which would save some debug info size)

@aaron.ballman Do we have attributes/infrastructure for attributes that have to be the same from their first declaration or at least from their first call? I'm wondering if it might be simpler/better to require nodebug to be provided earlier, rather than fixing it up after the fact like this.

(for instance, requiring it to be attributed on the declaration would ensure we don't create call_site descriptions for calls to nodebug functions - which would save some debug info size)

We don't have any attributes that have to be the same "from their first call" that I'm aware of. But we do have attributes that need to be the same on all declarations. IIRC, the section and code_seg attributes both have to be specified on the initial declaration in some circumstances, and [[noreturn]] needs to be on the first declaration we see. We don't have any tablegen machinery to enforce this, the logic needs to be manually written. I think it'd live somewhere around Sema::mergeDeclAttributes() or Sema::MergeFunctionDecl() most likely.

Fixing up after the fact would be a bit strange though. A declaration is typically considered marked by an attribute from the point of declaration including the attribute onward, not backwards. e.g., https://godbolt.org/z/4E1Ya764n I won't say we *never* do this back propagation, but I would say it's a sign that the attribute is venturing into novel, perhaps dangerous territory. For example with this patch, an AST matcher won't see the nodebug attribute on the initial declaration of t1() and so it won't react to it the same way that codegen is being changed.

@aaron.ballman Do we have attributes/infrastructure for attributes that have to be the same from their first declaration or at least from their first call? I'm wondering if it might be simpler/better to require nodebug to be provided earlier, rather than fixing it up after the fact like this.

(for instance, requiring it to be attributed on the declaration would ensure we don't create call_site descriptions for calls to nodebug functions - which would save some debug info size)

We don't have any attributes that have to be the same "from their first call" that I'm aware of. But we do have attributes that need to be the same on all declarations. IIRC, the section and code_seg attributes both have to be specified on the initial declaration in some circumstances, and [[noreturn]] needs to be on the first declaration we see. We don't have any tablegen machinery to enforce this, the logic needs to be manually written. I think it'd live somewhere around Sema::mergeDeclAttributes() or Sema::MergeFunctionDecl() most likely.

Yeah, all that sounds reasonable to me - @brunodefraine could you look into supporting nodebug in a similar way as @aaron.ballman has described here?

Fixing up after the fact would be a bit strange though. A declaration is typically considered marked by an attribute from the point of declaration including the attribute onward, not backwards. e.g., https://godbolt.org/z/4E1Ya764n I won't say we *never* do this back propagation, but I would say it's a sign that the attribute is venturing into novel, perhaps dangerous territory. For example with this patch, an AST matcher won't see the nodebug attribute on the initial declaration of t1() and so it won't react to it the same way that codegen is being changed.

Yeah, fair concerns for sure - thanks for the tips!

Yeah, all that sounds reasonable to me - @brunodefraine could you look into supporting nodebug in a similar way as @aaron.ballman has described here?

Since the debuginfo for use() is slightly affected by the nodebug version of t1() that follows it, I can see how this back propagation is perhaps dangerous. Checking that nodebug is the same on all declarations of a function is a way to prevent this.

But when discussing the PR, @probinson wrote "I'm inclined to think we want this to work" and I can see what he means from the use case where I observed the bug. If you don't want debuginfo for the implementation of t1(), it should be fine to annotate just the function definition in an implementation file, not the declaration in a header, since the debuginfo of the implementation is not of the caller's concern. But nodebug as it exists does affect the debuginfo of callers as well, so I cannot really express that I don't want debuginfo for the implementation of a function and leave its callers unaffected?

Yeah, all that sounds reasonable to me - @brunodefraine could you look into supporting nodebug in a similar way as @aaron.ballman has described here?

Since the debuginfo for use() is slightly affected by the nodebug version of t1() that follows it, I can see how this back propagation is perhaps dangerous. Checking that nodebug is the same on all declarations of a function is a way to prevent this.

But when discussing the PR, @probinson wrote "I'm inclined to think we want this to work" and I can see what he means from the use case where I observed the bug. If you don't want debuginfo for the implementation of t1(), it should be fine to annotate just the function definition in an implementation file, not the declaration in a header, since the debuginfo of the implementation is not of the caller's concern. But nodebug as it exists does affect the debuginfo of callers as well, so I cannot really express that I don't want debuginfo for the implementation of a function and leave its callers unaffected?

I can see the convenience there, to be sure, being able to put the attribute directly on the function you want to debug - but consistency in how attributes are handled (admitedly this isn't a strong consistency - some are handled this way, some aren't) & consistently seeing the same state for an attribute for a given function seems useful.

@probinson - thoughts?

Yeah, all that sounds reasonable to me - @brunodefraine could you look into supporting nodebug in a similar way as @aaron.ballman has described here?

Since the debuginfo for use() is slightly affected by the nodebug version of t1() that follows it, I can see how this back propagation is perhaps dangerous. Checking that nodebug is the same on all declarations of a function is a way to prevent this.

But when discussing the PR, @probinson wrote "I'm inclined to think we want this to work" and I can see what he means from the use case where I observed the bug. If you don't want debuginfo for the implementation of t1(), it should be fine to annotate just the function definition in an implementation file, not the declaration in a header, since the debuginfo of the implementation is not of the caller's concern. But nodebug as it exists does affect the debuginfo of callers as well, so I cannot really express that I don't want debuginfo for the implementation of a function and leave its callers unaffected?

I can see the convenience there, to be sure, being able to put the attribute directly on the function you want to debug - but consistency in how attributes are handled (admitedly this isn't a strong consistency - some are handled this way, some aren't) & consistently seeing the same state for an attribute for a given function seems useful.

@probinson - thoughts?

To me, this is the key bit:

But nodebug as it exists does affect the debuginfo of callers as well, so I cannot really express that I don't want debuginfo for the implementation of a function and leave its callers unaffected?

because it does affect the callers, the programmer introducing the API should be aware of that. Making this case an error helps them to understand that this attribute is actually a part of their API and not an implementation detail, and silently applying the attribute may cause hard-to-debug problems for them after deployment of their API.

Yeah, all that sounds reasonable to me - @brunodefraine could you look into supporting nodebug in a similar way as @aaron.ballman has described here?

Since the debuginfo for use() is slightly affected by the nodebug version of t1() that follows it, I can see how this back propagation is perhaps dangerous. Checking that nodebug is the same on all declarations of a function is a way to prevent this.

But when discussing the PR, @probinson wrote "I'm inclined to think we want this to work" and I can see what he means from the use case where I observed the bug. If you don't want debuginfo for the implementation of t1(), it should be fine to annotate just the function definition in an implementation file, not the declaration in a header, since the debuginfo of the implementation is not of the caller's concern. But nodebug as it exists does affect the debuginfo of callers as well, so I cannot really express that I don't want debuginfo for the implementation of a function and leave its callers unaffected?

I can see the convenience there, to be sure, being able to put the attribute directly on the function you want to debug - but consistency in how attributes are handled (admitedly this isn't a strong consistency - some are handled this way, some aren't) & consistently seeing the same state for an attribute for a given function seems useful.

@probinson - thoughts?

To me, this is the key bit:

But nodebug as it exists does affect the debuginfo of callers as well, so I cannot really express that I don't want debuginfo for the implementation of a function and leave its callers unaffected?

because it does affect the callers, the programmer introducing the API should be aware of that. Making this case an error helps them to understand that this attribute is actually a part of their API and not an implementation detail, and silently applying the attribute may cause hard-to-debug problems for them after deployment of their API.

Yeah - the effect on callers is fairly minimal when it comes to DWARF. It changes whether DW_TAG_call_sites are emitted for the function - those call sites can help describe tail calls (which isn't nothing - in the absence of this info you can have missing stack frames, I think) but also mostly helps by describing "entry values" (if a caller stashes a value before calling a function, the call site can record where the extra copy was stored - then the callee might be able to describe the value of its parameters in terms of those entry values even if the callee's copy has been clobbered) - that these features would be dropped only in the translation unit that's compiling the implementation of the function, but not in every calling TU, is not a great loss - adding the attribute to the header would remove this extra information from everywhere (but at the expense of a longer build, having to rebuild all the dependencies).

Hmm - yeah, nodebug is more the sort of thing you put on a function because it's not an abstraction you want to debug - so I /think/ it makes sense to put it on the declaration in the header, rather than only on the implementation (I was going to say I thought it'd be the thing someone might do more iteratively/interactively - but that's "optnone" - where you'd want to put it on a function during a test/debug loop and wouldn't want to have to rebuild the whole project because it's more one-off, but "nodebug" isn't intended for that same sort of one-off use case)

Eh, I could see it going either way...

Following suggestion by @dblaikie and @aaron.ballman I'm preparing a new diff where a function redeclaration that adds nodebug is flagged as an error in Sema::mergeDeclAttributes.

This triggers test failures because of violations in builtin header files, for example:

In file included from clang/test/CodeGen/ms-intrinsics.c:18:
build/lib/clang/13.0.0/include/intrin.h:451:24: error: function declared with 'nodebug' attribute was previously declared without the 'nodebug' attribute
static __inline__ void __DEFAULT_FN_ATTRS __movsb(unsigned char *__dst,
                       ^
build/lib/clang/13.0.0/include/intrin.h:37:62: note: expanded from macro '__DEFAULT_FN_ATTRS'
#define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__))
                                                             ^
build/lib/clang/13.0.0/include/intrin.h:80:6: note: previous declaration is here
void __movsb(unsigned char *, unsigned char const *, size_t);
     ^

From what I understand from clang/lib/Headers/intrin.h the functions are declared in general at the beginning of the headers, then implemented for specific architectures as inline function containing assembly. I do not know if it is appropriate to mark the general declarations as nodebug...

That I already encounter this in builtin headers is probably an indication that the new error will break some code in the wild. A pragmatic solution would be to only raise the error in Sema::mergeDeclAttributes if Old->isUsed(), i.e. an earlier declaration without nodebug can be forgiven if not yet used. I think that would still make it impossible to trigger the crash?

dblaikie accepted this revision.Jun 28 2021, 8:45 AM

That I already encounter this in builtin headers is probably an indication that the new error will break some code in the wild.

Fair point, breaking backwards compatibility isn't great.

A pragmatic solution would be to only raise the error in Sema::mergeDeclAttributes if Old->isUsed(), i.e. an earlier declaration without nodebug can be forgiven if not yet used. I think that would still make it impossible to trigger the crash?

I'm not too fussed about the original patch - just figured it might be nicer/cleaner to require a consistent view of the attribute value. I'm not sure it's worth splitting hairs on "it can be inconsistent but only in this narrow way".

I can appreciate that only invalidating the cases that were crashes before avoids increasing the feature surface area more than necessary - but I'm not sure it makes for a more ergonomic feature. (though, equally - the idea that adding a definition to a translation unit would modify the DWARF produced for calls to the function that were previously fine is weird too - but I guess we already had that property if you added the definition at the start of the translation unit)

Let's just go with the original patch then - thanks for your patience/working through the alternatives!

This revision is now accepted and ready to land.Jun 28 2021, 8:45 AM

Slightly improved test case (both C and C++).