This is an archive of the discontinued LLVM Phabricator instance.

Implement llvm.commandline named metadata
ClosedPublic

Authored by scott.linder on Nov 13 2018, 12:06 PM.

Details

Summary

Implement a new special named metadata, llvm.commandline, to support frontends embedding their command-line options in IR/ASM/code-objects. Specifically this supports adding the -frecord-gcc-switches option to Clang.

This differs from the GCC implementation in a few ways, and I am looking for feedback on whether these differences are reasonable/correct.

In GCC there is only one command-line possible per compilation-unit, in LLVM it mirrors llvm.ident and multiple are allowed.

For the ELF section (".GCC.command.line"), in the case of GCC the options are null-terminated (e.g. '-foo\0-bar\0-baz\0'); in LLVM, in order to differentiate multiple merged command-lines, the null byte instead separates entire command-lines (e.g. '\0compiler1 -foo -bar -baz\0compiler2 -qux\0'). This is the biggest departure in terms of implementation, but in the face of section merging I don't understand how the GCC approach works in any meaningful way. The advantage I see to the GCC approach is unambiguously separating individual arguments without relying on shell-style parsing.

Diff Detail

Repository
rL LLVM

Event Timeline

scott.linder created this revision.Nov 13 2018, 12:06 PM

I know you are just trying to match gcc behavior, but do you know why they use SHT_PROGBITS instead of something like SHT_NOTE for the information? On OpenVMS, we use .note entries to record compilation date/time, compiler version, command line, etc. information that is later used by our linker to print out detail link maps. Or is the intention that the information get propagated all the way to the final executable or shared library?

I know you are just trying to match gcc behavior, but do you know why they use SHT_PROGBITS instead of something like SHT_NOTE for the information? On OpenVMS, we use .note entries to record compilation date/time, compiler version, command line, etc. information that is later used by our linker to print out detail link maps. Or is the intention that the information get propagated all the way to the final executable or shared library?

I don't have a good answer as to why GCC decided to use SHT_PROGBITS, and you're correct that I simply followed their lead. I think SHT_NOTE seems more appropriate, although at AMD we do want to propagate the section to the final executable/shared object; are the two incompatible? It seems fine if tools like strip remove the section.

I don't really see a need for a new directive, it's convenient but does nothing you couldn't do with existing directives.
It does have some downside, in that it's supported only by the LLVM integrated assembler.

I don't really see a need for a new directive, it's convenient but does nothing you couldn't do with existing directives.
It does have some downside, in that it's supported only by the LLVM integrated assembler.

That's very reasonable; I will update the patch to remove the new directive.

Doesn't clang already has -grecord-gcc-switches option (https://clang.llvm.org/docs/ClangCommandLineReference.html#debug-information-flags) ?

Yes, but the -g and -f variants have different purposes. This patch adds the -f version to embed the flags in e.g. an ELF section, rather than the -g version which embeds it in e.g. DWARF.

I know you are just trying to match gcc behavior, but do you know why they use SHT_PROGBITS instead of something like SHT_NOTE for the information? On OpenVMS, we use .note entries to record compilation date/time, compiler version, command line, etc. information that is later used by our linker to print out detail link maps. Or is the intention that the information get propagated all the way to the final executable or shared library?

I don't have a good answer as to why GCC decided to use SHT_PROGBITS, and you're correct that I simply followed their lead. I think SHT_NOTE seems more appropriate, although at AMD we do want to propagate the section to the final executable/shared object; are the two incompatible? It seems fine if tools like strip remove the section.

Got it. Yes, SHT_NOTEs probably don't survive into the final image so SHT_PROGBITS is what was chosen. I wonder if the linker special cases on the section name to keep it from being part of a load section? Seems like a waste to read that stuff into memory at image activation time.

Doesn't clang already has -grecord-gcc-switches option (https://clang.llvm.org/docs/ClangCommandLineReference.html#debug-information-flags) ?

Yes, but that is somewhat different. The "-g" option seems to just append the command line to the producer string, yes? It is harder to parse out later and it can be stripped out. gcc has both -g and -f forms.

Yes, but that is somewhat different. The "-g" option seems to just append the command line to the producer string, yes? It is harder to parse out later and it can be stripped out. gcc has both -g and -f forms.

Thanks, I was confused given the similarity of option strings.

Remove new ".commandline" ASM directive in favor of standard ones (e.g. .section, .ascii)

Got it. Yes, SHT_NOTEs probably don't survive into the final image so SHT_PROGBITS is what was chosen. I wonder if the linker special cases on the section name to keep it from being part of a load section? Seems like a waste to read that stuff into memory at image activation time.

As far as I understand it the SHT_PROGBITS won't cause the section to be loaded if it is not also SHF_ALLOC, but I have not actually confirmed this.

scott.linder edited the summary of this revision. (Show Details)Nov 27 2018, 11:15 AM

I'm not certain who else might want to review this, but I'm adding some who have reviewed the touched files recently.

I'll ask the larger question of why did you pick ".LLVM.command.line" as the section name? While distasteful, there might be existing scripts that do things like:

readelf -p .GCC.command.line a.out

to dump out the arguments like

String dump of section '.GCC.command.line':

[     0]  a.c
[     4]  -mtune=generic
[    13]  -march=x86-64
[    21]  -O2
[    25]  -frecord-gcc-switches

Yes, the format of contents is not well-defined (as mentioned in the gcc man pages), but for somebody switching from a gcc-based toolchain to an LLVM-based toolchain, would they expect the sections to have the same names?

Your description says this would allow clang to add -frecord-gcc-switches (and not -frecord-llvm-switches) so would someone expect that clang would follow the module and create a '.GCC.command.line' section name?

Or perhaps also create a .GCC.command.line section that would contain 'see .LLVM.command.line for additional command line descriptions' so at least existing scripts might not get tripped up?

Or am I overthinking this from working in a high "upward-compatibility, don't break anything" environment for 30+ years?

I'll ask the larger question of why did you pick ".LLVM.command.line" as the section name? While distasteful, there might be existing scripts that do things like:

readelf -p .GCC.command.line a.out

to dump out the arguments like

String dump of section '.GCC.command.line':

[     0]  a.c
[     4]  -mtune=generic
[    13]  -march=x86-64
[    21]  -O2
[    25]  -frecord-gcc-switches

Yes, the format of contents is not well-defined (as mentioned in the gcc man pages), but for somebody switching from a gcc-based toolchain to an LLVM-based toolchain, would they expect the sections to have the same names?

Your description says this would allow clang to add -frecord-gcc-switches (and not -frecord-llvm-switches) so would someone expect that clang would follow the module and create a '.GCC.command.line' section name?

Or perhaps also create a .GCC.command.line section that would contain 'see .LLVM.command.line for additional command line descriptions' so at least existing scripts might not get tripped up?

Or am I overthinking this from working in a high "upward-compatibility, don't break anything" environment for 30+ years?

Looking at it again I think I agree that changing the name is not as useful as I had intended, and probably does more harm than it is worth. I originally wanted to avoid breaking the "ABI" of the section, for tools which go beyond inspecting it visually and rely on a certain binary layout (as that layout is changing), but if that format is not well-defined anyway I think preserving the name is reasonable. I will update both patches, and will be explicit in the Clang docs about the change in format.

scott.linder edited the summary of this revision. (Show Details)

Use the same ELF section name as GCC

Does the verifier bail out after the first failure? If not, you can combine the failure tests all into one test.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2021 ↗(On Diff #177579)

You can reduce some indentation by consistently using the early-return idiom:

const NamedMDNode *NMD = ...;
if (!NMD || !NMD->getNumOperands())
  return;
// rest of function
test/Linker/commandline.ll
8 ↗(On Diff #177579)

You can make sure that each of the strings has one of the expected MD numbers:

CHECK-DAG: !{{[0-2]}} = !{!"compiler -v1"}

etc.

test/Verifier/commandline-meta1.ll
7 ↗(On Diff #177579)

I think it would be ok (and more obvious) if you omit the valid entry.

test/Verifier/commandline-meta2.ll
9 ↗(On Diff #177579)

Simpler and more obvious if you omit the valid entries.

JohnReagan added inline comments.Dec 12 2018, 11:23 AM
lib/CodeGen/TargetLoweringObjectFileImpl.cpp
799 ↗(On Diff #177579)

So now that we've switched to a section name with GCC, do you want a comment so any future contributor won't be confused? If you only look at the code without these review comments (or the corresponding clang edits), you might wonder about the overall design goals. Something like

// Use ".GCC.command.line" since this feature is to support 
// clang's -frecord-gcc-switches which in turn attempts to 
// mimic GCC's switch of the same name.
scott.linder marked 5 inline comments as done.

Address feedback. The verifier appears to bail on the first failure, at least with llc/llvm-as.

LGTM although I think John should have a chance to get in a last word.

This revision was not accepted when it landed; it landed in state Needs Review.Dec 14 2018, 7:41 AM
This revision was automatically updated to reflect the committed changes.
MaskRay added inline comments.
cfe/trunk/test/Driver/debug-options.c
161

As the continuation line doesn't use RUN:, this command will always fail. We don't detect this error because -### subsumes many errors and return with 0. Caught by D156363

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 26 2023, 1:42 PM
Herald added a subscriber: pengfei. · View Herald Transcript
Matt added a subscriber: Matt.Aug 9 2023, 9:24 PM