This is an archive of the discontinued LLVM Phabricator instance.

Set AnonymousTagLocations false for ASTContext if column info is expected not to be used
Needs RevisionPublic

Authored by twoh on Sep 19 2017, 4:31 PM.

Details

Summary

Currently, location (line, column number) information of anonymous types are printed even when the column location infomation is expected to be turned off. These information are even included in the final binary as strings, and make clang to generate different output when preprocessed source file is given as an input instead of the original source code (because of the macro expansion). This inhibits the use of clang with a build system that supports both distcc and caching, as distcc feeds preprocessed source file to remote build machines.

This patch turns off location information of anonymous types if -debug-column-info is not provided. I renamed -dwarf-column-info to -debug-column-info as it is not only for dwarf now (and the corresponding codegen option name was DebugColumnInfo, as shown in lib/Frontend/CompilerInvocation.cpp:521).

Diff Detail

Event Timeline

twoh created this revision.Sep 19 2017, 4:31 PM
twoh edited the summary of this revision. (Show Details)Sep 19 2017, 6:56 PM
twoh added reviewers: echristo, doug.gregor.
twoh added a subscriber: llvm-commits.
twoh added a comment.Sep 27 2017, 11:30 AM

Ping, Thanks!

twoh added a comment.Nov 1 2017, 1:29 PM

Ping. Thanks!

Meta comment: please split the rename out. I don't mind the rename, but it makes the change much harder to review.

Needs a testcase for the actual behavior you're trying to ensure here.

Thanks!

twoh updated this revision to Diff 164564.Sep 7 2018, 10:54 PM
twoh edited the summary of this revision. (Show Details)

Addressing comments from @echristo. Reverted option name change, and added a test case. Sorry I haven't work on this code for a while so it took time to invent a test case.

I think Adrian has looked at this more recently than I have. Adding him here.

I have a vague recollection that this column info hack was added to disambiguate two types defined on the same line (which is something that happened more often than one would think because of macro expansion). Did you do the git archeology to ensure that the original reason for doing it this way has been resolved? FWIW, I'm fine with doing this change for the Darwin platforms because column info is turned on by default, so this shouldn't affect us.

Oh wait, this patch is just for dumping the ASTs? Can you elaborate why this makes it into a binary then?

twoh added a comment.Oct 22 2018, 10:47 AM

@aprantl It is a debug info. If you compile test/CodeGenCXX/debug-info-anonymous.cpp file with clang -g2 -emit-llvm -S , you will find debug metadata something like distinct !DISubprogram(name: "foo<X::(anonymous enum at /home/twoh/llvms/llvm/tools/clang/test/CodeGenCXX/debug-info-anonymous.cpp:9:3)>", linkageName: "_Z3fooIN1XUt_EEiT_" ..., which will eventually be included in .debug_info section.

@aprantl It is a debug info. If you compile test/CodeGenCXX/debug-info-anonymous.cpp file with clang -g2 -emit-llvm -S , you will find debug metadata something like distinct !DISubprogram(name: "foo<X::(anonymous enum at /home/twoh/llvms/llvm/tools/clang/test/CodeGenCXX/debug-info-anonymous.cpp:9:3)>", linkageName: "_Z3fooIN1XUt_EEiT_" ..., which will eventually be included in .debug_info section.

For comparison, GCC names these "foo<X::<anonymous enum> >" - this is somewhat of a compatibility problem, since the template function definition's names won't match between GCC and Clang, but I guess this is by far not the only instance of that. GCC keeps that naming scheme even when it's clearly ambiguous (ie: it's not just making a short name when there's no other X::anonymous enum, it does it even when there's two of them, etc)

twoh added a comment.Oct 25 2018, 8:34 AM

@dblaikie I see. The problem we're experiencing is that with Clang's naming scheme we end up having different function name between the original source and the preprocessed source (as macro expansion changes the column number). This doesn't work well for me if I want to use distcc on top of our caching system, as the caching scheme expects the output to be same as far as the original source has not been changed (regardless of it's compiled directly or first preprocessed then compiled), but the distcc preprocesses the source locally then sends it to the remote build machine (and we do not turn distcc on for all workflow). I wonder if you have any suggestion to resolve this issue? Thanks!

@dblaikie I see. The problem we're experiencing is that with Clang's naming scheme we end up having different function name between the original source and the preprocessed source (as macro expansion changes the column number).

I imagine that's not achievable, though - as soon as you have a multi-line macro at least, it seems like it'd be difficult to preserve (I guess you could put loc directives between every line in the macro to ensure every line was attributed to the line the macro was written on, maybe?)

This doesn't work well for me if I want to use distcc on top of our caching system, as the caching scheme expects the output to be same as far as the original source has not been changed (regardless of it's compiled directly or first preprocessed then compiled), but the distcc preprocesses the source locally then sends it to the remote build machine (and we do not turn distcc on for all workflow). I wonder if you have any suggestion to resolve this issue? Thanks!

Best thing to do, I think, if possible, is teach the distributed build system not to fully preprocess but to use something like Clang's -frewrite-includes - this handles includes, but leaves macro definitions and uses in-place. This would preserve Clang's warning semantics (where macros can help inform Clang's diagnostic choices, improving diagnostic quality (lowering false positives, etc)) and other things like debug info.

rsmith requested changes to this revision.Oct 25 2018, 12:20 PM
rsmith added a subscriber: rsmith.
rsmith added inline comments.
lib/Frontend/CompilerInstance.cpp
491–494

It's not reasonable for a debug info flag to affect the printing policy used throughout the compiler.

This revision now requires changes to proceed.Oct 25 2018, 12:20 PM
twoh added a comment.Oct 29 2018, 11:17 AM

@rsmith @dblaikie Thank you for the comments! It seems that this is not the appropriate way to handle the issue. I'll find different way to resolve the problem.

twoh added a subscriber: wenlei.Apr 22 2019, 10:22 AM

Hello @rsmith, @wenlei and I took another look at this, and we couldn't find any use of AnonymousTagLocations outside of debug info. If that's actually the case, wouldn't it make sense to have DebugColumnInfo to control the field even if AnonymousTagLocations the part of generic printing policy?

Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2019, 10:22 AM
Herald added a subscriber: dexonsmith. · View Herald Transcript
twoh added a comment.Apr 30 2019, 8:42 AM

Friendly ping for comments. Thanks!

Seems like the right thing would be for the DWARF code that wants a rendered type name to pass its own printing policy, rather than changing some relatively global one.

(though also I have my doubts about the whole approach - macro expansion can change the line number as well as the column number, so only suppressing column number would be insufficient - and this also reduces the usability for users (because the file/line/column of an anonymous type is a useful debugging aid). Seems to me like this should be opt-in if it's supported at all - though ideally build systems would use -frewrite-includes rather than full preprocessing, and then macros and lines/columns would be preserved, I think)

Seems like the right thing would be for the DWARF code that wants a rendered type name to pass its own printing policy, rather than changing some relatively global one.

(though also I have my doubts about the whole approach - macro expansion can change the line number as well as the column number, so only suppressing column number would be insufficient - and this also reduces the usability for users (because the file/line/column of an anonymous type is a useful debugging aid). Seems to me like this should be opt-in if it's supported at all - though ideally build systems would use -frewrite-includes rather than full preprocessing, and then macros and lines/columns would be preserved, I think)

Line number is ok because preprocessor also inserts #linemarker, and later on these markers are used to recover the original line number before macro expansion. Column is problematic, as there's nothing like a column marker (if possible at all) so there's no way to see through the column change from macro expansion. We tried -frewrite-includes too, but it has its own issues - without macro expansion, it bloats the file size that needs to send to distcc remote machine significantly.

Seems like the right thing would be for the DWARF code that wants a rendered type name to pass its own printing policy, rather than changing some relatively global one.

(though also I have my doubts about the whole approach - macro expansion can change the line number as well as the column number, so only suppressing column number would be insufficient - and this also reduces the usability for users (because the file/line/column of an anonymous type is a useful debugging aid). Seems to me like this should be opt-in if it's supported at all - though ideally build systems would use -frewrite-includes rather than full preprocessing, and then macros and lines/columns would be preserved, I think)

Line number is ok because preprocessor also inserts #linemarker, and later on these markers are used to recover the original line number before macro expansion. Column is problematic, as there's nothing like a column marker (if possible at all) so there's no way to see through the column change from macro expansion. We tried -frewrite-includes too, but it has its own issues - without macro expansion, it bloats the file size that needs to send to distcc remote machine significantly.

Do you have some data on that growth/size comparison & its overall performance impact on the build? I'd be curious.

But, yeah, fair point about the line markers - I'd thought a multiline macro would break that, but seems not (it gets stamped out by the preprocessor all on one line anyway).

Still - figure this should be opt-in and done at the debug info level, not by changing a global printing policy. (& I'd still suggest using -frewrite-includes because Clang changes its diagnostic behavior based on the presence of macros to reduce diagnostic false positives, so you'll be degrading the usability in other ways by compiling fully preprocessed code)