This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo][CodeView] Change in line tables only mode to emit parent/context scopes for functions, using declarations for types
ClosedPublic

Authored by akhuang on Jan 13 2021, 3:23 PM.

Details

Summary

In line-tables-only mode, we used to emit qualified names as the display name for functions when using CodeView.
This patch changes to emitting the parent scopes instead, with forward declarations for class types.
Also remove the linkage name in line-tables-only to save size.
The total object file size ends up being slightly smaller than if we use the full qualified names.

Diff Detail

Event Timeline

akhuang created this revision.Jan 13 2021, 3:23 PM
akhuang requested review of this revision.Jan 13 2021, 3:23 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 13 2021, 3:23 PM

What parts of this are motivated by CodeView requirements (do functions have to have unique names in CV?)?
It'd be a bit unfortunate if we have divergence in -gmlt behavior between DWARF and CodeView due to different usage requirements, rather than format requirements - gmlt in DWARF I think uses only the unqualified name, though eventually combined with -fdebug-info-for-profiling which put the mangled name on such functions to make it clear for profilers. (though the unqualified name only behavior is still handy for the sanitizers - where the unqualified/ambiguous name is a "good enough" point for size V accuracy)

What parts of this are motivated by CodeView requirements (do functions have to have unique names in CV?)?
It'd be a bit unfortunate if we have divergence in -gmlt behavior between DWARF and CodeView due to different usage requirements, rather than format requirements - gmlt in DWARF I think uses only the unqualified name, though eventually combined with -fdebug-info-for-profiling which put the mangled name on such functions to make it clear for profilers. (though the unqualified name only behavior is still handy for the sanitizers - where the unqualified/ambiguous name is a "good enough" point for size V accuracy)

This was kind of motivated by the issue in https://bugs.llvm.org/show_bug.cgi?id=48432, even though this patch doesn't solve that particular bug. I guess the general issue is that in CV the only way to make the functions unique is by their display name or type. So previously we were going in the direction of differentiating functions by the display name (like using the qualified name), but it seems like it's more size-efficient to use types instead, which is what this patch does.

What parts of this are motivated by CodeView requirements (do functions have to have unique names in CV?)?
It'd be a bit unfortunate if we have divergence in -gmlt behavior between DWARF and CodeView due to different usage requirements, rather than format requirements - gmlt in DWARF I think uses only the unqualified name, though eventually combined with -fdebug-info-for-profiling which put the mangled name on such functions to make it clear for profilers. (though the unqualified name only behavior is still handy for the sanitizers - where the unqualified/ambiguous name is a "good enough" point for size V accuracy)

This was kind of motivated by the issue in https://bugs.llvm.org/show_bug.cgi?id=48432, even though this patch doesn't solve that particular bug. I guess the general issue is that in CV the only way to make the functions unique is by their display name or type. So previously we were going in the direction of differentiating functions by the display name (like using the qualified name), but it seems like it's more size-efficient to use types instead, which is what this patch does.

OK - thanks for walking me through it, that sounds good to me.

Might be worth rephrasing from "emit type information for function scopes" to "emit parent/context scopes for functions, using declarations for any scopes that are types". (my own fault, but reading this quickly, I assumed it was about emitting type information for the parameters (DWARF gmlt drops the parameters/types))

How does any of this deal with overloading? I guess for either solution (qualified name or real scopes) you have to include all the parameter type info too to avoid the functions being treated as identical/duplicate by CV?

What parts of this are motivated by CodeView requirements (do functions have to have unique names in CV?)?

Might be worth rephrasing from "emit type information for function scopes" to "emit parent/context scopes for functions, using declarations for any scopes that are types". (my own fault, but reading this quickly, I assumed it was about emitting type information for the parameters (DWARF gmlt drops the parameters/types))

That sounds good

How does any of this deal with overloading? I guess for either solution (qualified name or real scopes) you have to include all the parameter type info too to avoid the functions being treated as identical/duplicate by CV?

Yep, this patch doesn't deal with overloading or lambdas. For lambdas I have a separate patch, and for overloading I haven't really thought much about yet. Seems like we'd have to include all the parameter type info somehow (or include all the parameters in the display name).

akhuang retitled this revision from [DebugInfo][CodeView] Change in line tables only mode to emit type information for function scopes, rather than using the qualified name. to [DebugInfo][CodeView] Change in line tables only mode to emit parent/context scopes for functions, using declarations for types.Jan 13 2021, 4:46 PM
akhuang edited the summary of this revision. (Show Details)
rnk added a comment.Jan 13 2021, 5:42 PM

How does any of this deal with overloading? I guess for either solution (qualified name or real scopes) you have to include all the parameter type info too to avoid the functions being treated as identical/duplicate by CV?

Yep, this patch doesn't deal with overloading or lambdas. For lambdas I have a separate patch, and for overloading I haven't really thought much about yet. Seems like we'd have to include all the parameter type info somehow (or include all the parameters in the display name).

Yes, I encouraged Amy to approach this incrementally:

  • emit forward declarations for types used in function scopes (this patch)
  • make lambda type names more unique (next)
  • try discarding the linkage name from our type info to save size
  • add overload type information

In particular, I wanted to measure the size impact of adding function type info in isolation. I suspect that the type info we need to make overload signatures unique is very small. My theory is that most debug info size comes from complete descriptions of class types.

rnk added inline comments.Jan 13 2021, 5:48 PM
clang/lib/CodeGen/CGDebugInfo.cpp
1050–1052

I see Amy included the linkage name step as part of this patch, so disregard that comment.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
370–373 ↗(On Diff #316525)

Do we still need the LLVM part of this? We needed it before because we were operating on strings like foo::<unnamed-tag>::operator(). Can we leave it as is for now, or handle it as a separate patch?

(I'm generally good with this - llvm & clang changes should be committed separately & maybe the linkage part too)

I'll leave the rest of the review/final approval up to @rnk

akhuang added inline comments.Jan 14 2021, 1:59 PM
clang/lib/CodeGen/CGDebugInfo.cpp
1050–1052

oops, didn't realize that I included it here. Anyway, I can commit this separately.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
370–373 ↗(On Diff #316525)

Oh, right. I don't think we need this anymore..

akhuang updated this revision to Diff 316779.Jan 14 2021, 2:02 PM
akhuang edited the summary of this revision. (Show Details)

Remove llvm change.

akhuang edited the summary of this revision. (Show Details)Jan 14 2021, 2:49 PM
rnk accepted this revision.Jan 14 2021, 5:24 PM

lgtm

(I'm generally good with this - llvm & clang changes should be committed separately & maybe the linkage part too)

I'll leave the rest of the review/final approval up to @rnk

Thanks for checking the direction, David, I felt it was worth getting your input.

This revision is now accepted and ready to land.Jan 14 2021, 5:24 PM
SeTSeR added a subscriber: SeTSeR.Jul 30 2021, 12:13 PM

Hello! As far as I understand, Clang does not preserve info about namespace for functions declared in namespaces with -gline-numbers-only flag. Is there any way to retrieve this info for DWARF? I thought FIXME at line 261 in CGDebugInfo was addressing exactly this issue.

rnk added a comment.Jul 30 2021, 12:55 PM

Hello! As far as I understand, Clang does not preserve info about namespace for functions declared in namespaces with -gline-numbers-only flag. Is there any way to retrieve this info for DWARF? I thought FIXME at line 261 in CGDebugInfo was addressing exactly this issue.

No, I don't think there is a way to achieve this.

The current -gcodeview -gline-tables-only behavior as I understand it is to emit regular debug info, but skip two key source of info:

  • never emit complete class declarations, always use forward decls
  • never emit variable locations, only emit source locations

For DWARF, I think we also try to skip namespace scopes, class forward decls, function types, and other things like that, but I forget the details.

I happen to know that many users of -gline-tables-only are very space constrained: we can't simply port this change from CV to DWARF. We'd have to add a new flag. The bar for such a flag is high. If someone provides a well-motivated use case, we could consider it.

In D94639#2917445, @rnk wrote:

Hello! As far as I understand, Clang does not preserve info about namespace for functions declared in namespaces with -gline-numbers-only flag. Is there any way to retrieve this info for DWARF? I thought FIXME at line 261 in CGDebugInfo was addressing exactly this issue.

No, I don't think there is a way to achieve this.

The current -gcodeview -gline-tables-only behavior as I understand it is to emit regular debug info, but skip two key source of info:

  • never emit complete class declarations, always use forward decls
  • never emit variable locations, only emit source locations

For DWARF, I think we also try to skip namespace scopes, class forward decls, function types, and other things like that, but I forget the details.

I happen to know that many users of -gline-tables-only are very space constrained: we can't simply port this change from CV to DWARF. We'd have to add a new flag. The bar for such a flag is high. If someone provides a well-motivated use case, we could consider it.

Thank you for the detailed answer. I'll discuss our use case with our team. Should I create a separate ticket for this? Or maybe it would be better if I submitted the PR adding this flag?

rnk added a comment.Aug 4 2021, 9:25 AM

Thank you for the detailed answer. I'll discuss our use case with our team. Should I create a separate ticket for this? Or maybe it would be better if I submitted the PR adding this flag?

It's free to upload patches, and they help drive discussion forward, but I think the main thing is to be really clear about what problem this solves and what the costs are. An easy metric for the cost would be the binary size of clang with -g1 and how much it increases in the new mode.