This is an archive of the discontinued LLVM Phabricator instance.

Implement -frecord-command-line (-frecord-gcc-switches)
ClosedPublic

Authored by scott.linder on Nov 13 2018, 12:08 PM.

Details

Reviewers
rjmccall
Summary

See the parent revision https://reviews.llvm.org/D54487 for differences between this implementation and the equivalent GCC option.

Diff Detail

Event Timeline

scott.linder created this revision.Nov 13 2018, 12:08 PM
troyj added a subscriber: troyj.Nov 13 2018, 12:39 PM

I realize that you're probably striving for option compatibility with gcc, but continuing to name it -frecord-gcc-switches when it actually records Clang switches seems weird to me. It almost sounds like something that would dump gcc equivalents of all Clang options, or maybe let you know which Clang options you've used that match gcc options. Either way, by the name -- if you aren't familiar with the gcc option -- it doesn't read like it records Clang options.

Would it be that bad to name it -frecord-clang-switches? Or just -frecord-switches?

I realize that you're probably striving for option compatibility with gcc, but continuing to name it -frecord-gcc-switches when it actually records Clang switches seems weird to me. It almost sounds like something that would dump gcc equivalents of all Clang options, or maybe let you know which Clang options you've used that match gcc options. Either way, by the name -- if you aren't familiar with the gcc option -- it doesn't read like it records Clang options.

Would it be that bad to name it -frecord-clang-switches? Or just -frecord-switches?

I agree, and this was my original plan, but then I noticed that Clang already implements -grecord-gcc-switches and so I decided to mirror the naming for the -f variant as well.

If anything I think dropping the -gcc- altogether would make the most sense. I don't understand why GCC includes it in the first place.

I realize that you're probably striving for option compatibility with gcc, but continuing to name it -frecord-gcc-switches when it actually records Clang switches seems weird to me. It almost sounds like something that would dump gcc equivalents of all Clang options, or maybe let you know which Clang options you've used that match gcc options. Either way, by the name -- if you aren't familiar with the gcc option -- it doesn't read like it records Clang options.

Would it be that bad to name it -frecord-clang-switches? Or just -frecord-switches?

I agree, and this was my original plan, but then I noticed that Clang already implements -grecord-gcc-switches and so I decided to mirror the naming for the -f variant as well.

If anything I think dropping the -gcc- altogether would make the most sense. I don't understand why GCC includes it in the first place.

Well, having something in the name that make it clear that this is about command line switches and not switch statements seems sensible. It's a little unfortunate to give a useful feature a vendor-specific name, but I think the right approach is probably to accept that for compatibility and then also accept a more neutral and descriptive name.

The code seems fine.

scott.linder retitled this revision from Implement -frecord-gcc-switches to Implement -frecord-command-line (-frecord-gcc-switches).

Change canonical option name to -frecord-command-line and add an alias from -frecord-gcc-switches

Do the same for -grecord-command-line/-grecord-gcc-switches

rjmccall added inline comments.Nov 14 2018, 4:30 PM
docs/ClangCommandLineReference.rst
797
  1. Is this section always called .LLVM.command.line, or does it differ by target object-file format?
  2. For that matter, is this actually supported by all object-file formats?
  3. Please describe the format of the section. Is it just a flat C string? Is it nul-terminated? Are different options combined with spaces, and if so, does that mean that interior spaces in command line arguments are impossible to distinguish? If the latter, I assume that this is required behavior for backward compatibility; please at least apologize for that in the documentation. :)
scott.linder added inline comments.Nov 15 2018, 6:36 AM
docs/ClangCommandLineReference.rst
797

For (1) I think the answer should be that we emit .LLVM.command.line (or whatever name we land on) for every object-file format with the concept of names for sections.

For (2), I have only implemented this for ELF so far in LLVM. I don't know how reasonable that is, and if it isn't I can look at adding it to other common object-file formats that LLVM supports.

For (3), my current proposal for the format is the same as the .comment section for idents: null-surrounded strings. Interior spaces are escaped, in the same manner as for the -g variant. There may still be more thought to put into the format, as the GCC version null-terminates each individual option; the reason I avoided this is that during object linking the options of many command-lines simply get mixed together, which seems less than useful.

As an example, for ELF clang -frecord-command-line -S "foo .c" produces the ASM:

.section        .LLVM.command.line,"MS",@progbits,1
.zero   1
.ascii  "/usr/local/bin/clang-8 -frecord-command-line -S foo\\ .c"
.zero   1

And for multiple command-lines in a single object (e.g. linking three objects with recorded command-lines) this would be:

.section        .LLVM.command.line,"MS",@progbits,1
.zero   1
.ascii  "/usr/local/bin/clang-8 -frecord-command-line -c foo\\ .c"
.zero   1
.ascii  "/usr/local/bin/clang-8 -frecord-command-line -some -unique -options -c bar.c"
.zero   1
.ascii  "/usr/local/bin/clang-8 -frecord-command-line -something -else -c baz.c"
.zero   1

I will try to capture more of this in the documentation.

Looking a bit further, it seems some other object-file formats have conventions for naming (e.g. Mach-O sections are _snake_case) and so .LLVM.command.line may not end up being what we want universally.

Additionally it seems things like .ident are not supported by all object-file formats in LLVM; using Mach-O as an example again, it uses the default of "no .ident support", and I don't see the ident in the .S or .o

rjmccall added inline comments.Nov 15 2018, 1:01 PM
docs/ClangCommandLineReference.rst
797

Okay. I don't particularly need you to implement this on additional targets, but yes, please capture all this layout information in the documentation and make sure that the driver enforces any meaningful restrictions about which targets support it.

Update documentation for new option and error in the driver when generating for unsupported object-file format.

rjmccall added inline comments.Nov 16 2018, 1:02 PM
docs/ClangCommandLineReference.rst
799

How about something like "After linking, the section may contain multiple command lines, which will be individually terminated by null bytes. Separate arguments within a command line are combined with spaces; spaces and backslashes within an argument are escaped with backslashes."

And please write out "(ELF Only)" as something more like "This option is only currently supported on ELF targets."

Update documentation for new option

rjmccall accepted this revision.Nov 27 2018, 11:55 AM

Thanks, LGTM.

This revision is now accepted and ready to land.Nov 27 2018, 11:55 AM

There may be changes to some details in the LLVM patch; once they are finalized I will update the Clang patch.

Update documentation to match section naming change, and to explicitly note that the section format differs from the equivalent GCC format.

So we're using the same command line option as GCC to produce something in the same section as GCC but formatting that section incompatibly with GCC? That combination of choices does not seem like a good idea.

So we're using the same command line option as GCC to produce something in the same section as GCC but formatting that section incompatibly with GCC? That combination of choices does not seem like a good idea.

From the GCC man page -frecord-gcc-switches "is only implemented on some targets and the exact format of the recording is target and binary file format dependent, but it usually takes the form of a section containing ASCII text"

In the LLVM portion of this review it was decided that the implications of this is that well-behaved consumers should not make assumptions about the format. They should also not make assumptions about the name of the section, but we are effectively backwards-compatible in that we are a section with ASCII text, so tools which just e.g. dump the section for uses will continue to work with our implementation.

So we're using the same command line option as GCC to produce something in the same section as GCC but formatting that section incompatibly with GCC? That combination of choices does not seem like a good idea.

From the GCC man page -frecord-gcc-switches "is only implemented on some targets and the exact format of the recording is target and binary file format dependent, but it usually takes the form of a section containing ASCII text"

In the LLVM portion of this review it was decided that the implications of this is that well-behaved consumers should not make assumptions about the format. They should also not make assumptions about the name of the section, but we are effectively backwards-compatible in that we are a section with ASCII text, so tools which just e.g. dump the section for uses will continue to work with our implementation.

Ah, okay. I didn't realize the format was target-specific on GCC. That's fine.

rjmccall accepted this revision.Dec 13 2018, 1:35 PM