This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][NFC] Refactor CopyConfig structure - categorize options.
ClosedPublic

Authored by avl on May 11 2021, 2:16 PM.

Details

Summary

This patch continues refactoring done by D99055. It puts format specific
options into the correponding CopyConfig structures.

Diff Detail

Unit TestsFailed

Event Timeline

avl created this revision.May 11 2021, 2:16 PM
avl requested review of this revision.May 11 2021, 2:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 11 2021, 2:16 PM
jhenderson added inline comments.May 14 2021, 1:55 AM
llvm/tools/llvm-objcopy/ConfigManager.cpp
561–562

const_cast is always a bit scary to me. Would it make more sense to make NewSymbolVisibility and SymbolsToAdd mutable in the ELFConfig?

llvm/tools/llvm-objcopy/ELF/ELFConfig.h
22
25

Unlike the Section and Symbol matchers comments below, this is more of a sentence, so deserves a .. Might as well fix whilst you're moving the code.

avl added inline comments.May 14 2021, 4:29 AM
llvm/tools/llvm-objcopy/ConfigManager.cpp
561–562

using mutable inside ELFConfig will allow to change the fields by consumer:

struct ELFConfig {
  mutable uint8_t NewSymbolVisibility;
};

void consumer (const ELFConfig& ELFConfig ) {
   ELFConfig.NewSymbolVisibility = STV_HIDDEN; // <<<<< incorrect usage
}

That would be not proper usage. The ConfigManager::getELFConfig() uses lazy initialization pattern. It is usually OK to use const_cast for lazy initialization. it is used in other places too:

llvm/CodeGen/ScheduleDAG.h

unsigned getDepth() const {
  if (!isDepthCurrent)
    const_cast<SUnit *>(this)->ComputeDepth();
  return Depth;
}

So I think it is better to not use mutable inside ELFConfig.

avl updated this revision to Diff 346703.May 20 2021, 5:12 AM

addressed comments.

Something is a bit odd with the diff.

llvm/tools/llvm-objcopy/ConfigManager.cpp
562–563

I really don't like this boolean here. We have Optional for a reason, and it does exactly this without the need of the additional boolean, right?

My gut says that our "unsupported option" handling needs redoing. Some of the options are applicable for multiple formats, but not yet implemented. Such options belong in the CommonConfig, which means we can cleanly reference them from within the file-format specific configs. I don't think we should be warning for other options. E.g. if a user uses an option that is tied to ELF, such as the new symbol visibility option, because the concept is not applicable in other formats, it shouldn't be checked for. This then means we can avoid referencing file format specific configs from each other, which I think is a good design.

What do you think?

avl added inline comments.May 22 2021, 8:09 AM
llvm/tools/llvm-objcopy/ConfigManager.cpp
562–563

I really don't like this boolean here. We have Optional for a reason, and it does exactly this without the need of the additional boolean, right?

Let`s temporarily delay this point.

My gut says that our "unsupported option" handling needs redoing. Some of the options are applicable for multiple formats, but not yet implemented. Such options belong in the CommonConfig, which means we can cleanly reference them from within the file-format specific configs. I don't think we should be warning for other options. E.g. if a user uses an option that is tied to ELF, such as the new symbol visibility option, because the concept is not applicable in other formats, it shouldn't be checked for. This then means we can avoid referencing file format specific configs from each other, which I think is a good design.

What do you think?

My understanding is: whether such change is OK or not depends on the usage idea of the llvm-objcopy tool(and friends llvm-strip...). Is it supposed to work with multiple formats at a time?

I think, the intention of such checks :

if (MachO.StripSwiftSymbols || MachO.KeepUndefined)
  return createStringError(llvm::errc::invalid_argument,
                           "option not supported by llvm-objcopy for ELF");

is to avoid multi-format usage:

llvm-strip -S -T ELF-file MachO-file
llvm-strip: error: option not supported by llvm-objcopy for ELF

If we will avoid format-specific checks then we would allow multi-format usages.

Is it OK?

If we would like to preserve the idea of avoiding multi-format usages and we are fine with redoing "unsupported options" handling then it might be better to implement that idea explicitly(if I did not miss something):

  1. Before starting to parse options, detect the format of the input file(single source file, first source file, first file inside an archive).
  2. Parse options for the detected format and display warning if there were specified options for another format (other than we detected at step 1.).
  3. while handling source file(s), ignore(or fail?) files which have a format different from what we detected at stage 1.
  4. while handling source file(s), check options from CommonConfig: If there are some unimplemented yet - then display the warning message.

Such an approach, explicitly force only single format usage(which looks like a good thing). Also, it avoids the necessity to implement lazy options, which complicate interfaces and implementation.

What do you think?

Taking the conversation out of line.

My gut says that our "unsupported option" handling needs redoing. Some of the options are applicable for multiple formats, but not yet implemented. Such options belong in the CommonConfig, which means we can cleanly reference them from within the file-format specific configs. I don't think we should be warning for other options. E.g. if a user uses an option that is tied to ELF, such as the new symbol visibility option, because the concept is not applicable in other formats, it shouldn't be checked for. This then means we can avoid referencing file format specific configs from each other, which I think is a good design.

What do you think?

My understanding is: whether such change is OK or not depends on the usage idea of the llvm-objcopy tool(and friends llvm-strip...). Is it supposed to work with multiple formats at a time?

Short answer: yes the tools are supposed to work with multiple formats in a single run. llvm-objcopy/llvm-strip need to be compatible with GNU objcopy/strip. I just ran a quick experiment and GNU objcopy is able to handle inputs with different file formats in the same run (I specifically tested COFF and ELF). As such, llvm-objcopy needs to be able to handle the same cases.

For reference, I have heard of instances in the past where people create a single archive containing files of different formats, so that they can use it in a cross-platform manner. The specific example I know of was for an ELF32 target and an ELF64 target, but I believe the principle is equally applicable for COFF, Mach-O etc.

I did a quick experiment with GNU objcopy and couldn't immediately identify any options that were specific to ELF (I didn't look too hard though), but there are some PE-specific options, and these don't generate errors when run on an ELF object. As such, my earlier comment about warning/error-ing for unimplemented options that apply, and being silent for other options would be the most GNU compatible.

llvm/tools/llvm-objcopy/ConfigManager.cpp
562–563

I really don't like this boolean here. We have Optional for a reason, and it does exactly this without the need of the additional boolean, right?

Let`s temporarily delay this point.

My gut says that our "unsupported option" handling needs redoing. Some of the options are applicable for multiple formats, but not yet implemented. Such options belong in the CommonConfig, which means we can cleanly reference them from within the file-format specific configs. I don't think we should be warning for other options. E.g. if a user uses an option that is tied to ELF, such as the new symbol visibility option, because the concept is not applicable in other formats, it shouldn't be checked for. This then means we can avoid referencing file format specific configs from each other, which I think is a good design.

What do you think?

My understanding is: whether such change is OK or not depends on the usage idea of the llvm-objcopy tool(and friends llvm-strip...). Is it supposed to work with multiple formats at a time?

I think, the intention of such checks :

if (MachO.StripSwiftSymbols || MachO.KeepUndefined)
  return createStringError(llvm::errc::invalid_argument,
                           "option not supported by llvm-objcopy for ELF");

is to avoid multi-format usage:

llvm-strip -S -T ELF-file MachO-file
llvm-strip: error: option not supported by llvm-objcopy for ELF

If we will avoid format-specific checks then we would allow multi-format usages.

Is it OK?

If we would like to preserve the idea of avoiding multi-format usages and we are fine with redoing "unsupported options" handling then it might be better to implement that idea explicitly(if I did not miss something):

  1. Before starting to parse options, detect the format of the input file(single source file, first source file, first file inside an archive).
  2. Parse options for the detected format and display warning if there were specified options for another format (other than we detected at step 1.).
  3. while handling source file(s), ignore(or fail?) files which have a format different from what we detected at stage 1.
  4. while handling source file(s), check options from CommonConfig: If there are some unimplemented yet - then display the warning message.

Such an approach, explicitly force only single format usage(which looks like a good thing). Also, it avoids the necessity to implement lazy options, which complicate interfaces and implementation.

What do you think?

avl added a comment.EditedMay 25 2021, 2:16 AM

Short answer: yes the tools are supposed to work with multiple formats in a single run. llvm-objcopy/llvm-strip need to be compatible with GNU objcopy/strip. I just ran a quick experiment and GNU objcopy is able to handle inputs with different file formats in the same run (I specifically tested COFF and ELF). As such, llvm-objcopy needs to be able to handle the same cases.

For reference, I have heard of instances in the past where people create a single archive containing files of different formats, so that they can use it in a cross-platform manner. The specific example I know of was for an ELF32 target and an ELF64 target, but I believe the principle is equally applicable for COFF, Mach-O etc.

I see. From previous conversation I had an impression that we do not expect multiple inputs in different formats(https://reviews.llvm.org/D99055#2701025). Also, There would be problems with parsing lazy options for the case "multiple inputs in different formats". From that point of view, restriction on avoiding usages of "multiple inputs in different formats" might make sense. Probably, useful restriction would be - avoid using of format specific options when multiple inputs in different formats are specified.

Actually, current format specific checks verify exactly this : "avoid using of format specific options when multiple inputs in different formats are specified". That probably means that we need to keep them? (To prevent incorrect lazy options parsing)

Short answer: yes the tools are supposed to work with multiple formats in a single run. llvm-objcopy/llvm-strip need to be compatible with GNU objcopy/strip. I just ran a quick experiment and GNU objcopy is able to handle inputs with different file formats in the same run (I specifically tested COFF and ELF). As such, llvm-objcopy needs to be able to handle the same cases.

For reference, I have heard of instances in the past where people create a single archive containing files of different formats, so that they can use it in a cross-platform manner. The specific example I know of was for an ELF32 target and an ELF64 target, but I believe the principle is equally applicable for COFF, Mach-O etc.

I see. From previous conversation I had an impression that we do not expect multiple inputs in different formats(https://reviews.llvm.org/D99055#2701025). Also, There would be problems with parsing lazy options for the case "multiple inputs in different formats". From that point of view, restriction on avoiding usages of "multiple inputs in different formats" might make sense. Probably, useful restriction would be - avoid using of format specific options when multiple inputs in different formats are specified.

Anyway, if "multiple formats in a single run" is legal then it looks like we need to avoid format specific checks as you suggested. Will update the diff.

Yeah, I made that comment without experimenting, and then dredeged up from my memory the multi-platform archive too.

avl added a comment.May 25 2021, 2:26 AM

I edited my previous comment with this part:

Actually, current format specific checks verify exactly this : "avoid using of format specific options when multiple inputs in different formats are specified". That probably means that we need to keep them? (To prevent incorrect lazy options parsing)

I edited my previous comment with this part:

Actually, current format specific checks verify exactly this : "avoid using of format specific options when multiple inputs in different formats are specified". That probably means that we need to keep them? (To prevent incorrect lazy options parsing)

I don't think it's a goal of llvm-objcopy to prevent users from using format-specific options when those formats are not present, or when other formats are present. GNU objcopy doesn't do this, so doing it ourselves is probably a bad idea, as it makes the tool less compatible with GNU, for no real benefit. The exception is for unimplemented options, as discussed earlier.

I dug back into the history of the lazy option parsing, and it was my original suggestion, so that we avoided the ELF-specific bits from happening for other formats. However, the only two options that are done here are --new-symbol-visibility (which is ELF specific, and doesn't need to be lazily parsed, since it's pretty trivial), and --add-symbol (which is generic, but only implemented for ELF currently). The problem with --add-symbol is not, as I previously thought, the syntax (it's identical for all file formats as far as I can tell in GNU objcopy), but rather what to do when this option is encountered. Each individual format will need its own parsing, and obviously there's no point in parsing the option for file formats that aren't used. That being said, doing that parsing at the moment is probably harmless (there are no errors in the parseNewSymbolInfo code that are format-specific).

Perhaps we should just rewrite how --add-symbol is processed. Rather than have an ELF-specific NewSymbolInfo struct, we could have a generic one which contains the symbol name, value, section and input flags in (the latter possibly converted to a generic enum value), with that bit being handled at the same time as the rest of the command-line options, and then the processing of the flags done in ELFObjcopy when the list of these is actually about to be used. That should allow us to completely drop all lazy parsing, I think.

What do you think?

avl added a comment.May 26 2021, 2:50 AM

I don't think it's a goal of llvm-objcopy to prevent users from using format-specific options when those formats are not present, or when other formats are present. GNU objcopy doesn't do this, so doing it ourselves is probably a bad idea, as it makes the tool less compatible with GNU, for no real benefit. The exception is for unimplemented options, as discussed earlier.

the problem is in how following case might be handled:

--add-symbol sym1=0x100,global --add-symbol sym2=0x200,elf-specific-flag --add-symbol sym3=0x300,coff-specific-flag

If we allow multiple files with different formats and allow format specific options then above legal situation
could not be handled: while parsing options, for elf there would be detected unknown "coff-specific" value,
for the coff there would be detected unknown "elf-specific" value. This would result in a error for both input files.
i.e. it looks like weak command-line interface which does not work for legal case.

From the other side, GNU objcopy does not solve this case also and the whole case looks quite specific.
so, if we are fine with such a behavior then let it be so.

Perhaps we should just rewrite how --add-symbol is processed. Rather than have an ELF-specific NewSymbolInfo struct, we could have a generic one which contains the symbol name, value, section and input flags in (the latter possibly converted to a generic enum value), with that bit being handled at the same time as the rest of the command-line options, and then the processing of the flags done in ELFObjcopy when the list of these is actually about to be used. That should allow us to completely drop all lazy parsing, I think.

What do you think?

I like this suggestion which allows to remove lazy options. If we do not need to support lazy options then we could simplificate CopyConfig structures. We might use one CopyConfig for all formats:

struct CopyConfig {
  // common options
  
  // ELF options
  
  ... 
};

Would it be OK or you think we need separate ELFConfig, COFFConfig(like it is currently done)...?

I don't think it's a goal of llvm-objcopy to prevent users from using format-specific options when those formats are not present, or when other formats are present. GNU objcopy doesn't do this, so doing it ourselves is probably a bad idea, as it makes the tool less compatible with GNU, for no real benefit. The exception is for unimplemented options, as discussed earlier.

the problem is in how following case might be handled:

--add-symbol sym1=0x100,global --add-symbol sym2=0x200,elf-specific-flag --add-symbol sym3=0x300,coff-specific-flag

If we allow multiple files with different formats and allow format specific options then above legal situation
could not be handled: while parsing options, for elf there would be detected unknown "coff-specific" value,
for the coff there would be detected unknown "elf-specific" value. This would result in a error for both input files.
i.e. it looks like weak command-line interface which does not work for legal case.

From the other side, GNU objcopy does not solve this case also and the whole case looks quite specific.
so, if we are fine with such a behavior then let it be so.

The existing ELF behaviour already recognises but does nothing with certain flags that are COFF specific. I don't think we need to change this behaviour (i.e. COFF specific flags are ignored for ELF objects and (in the future) vice versa).

Perhaps we should just rewrite how --add-symbol is processed. Rather than have an ELF-specific NewSymbolInfo struct, we could have a generic one which contains the symbol name, value, section and input flags in (the latter possibly converted to a generic enum value), with that bit being handled at the same time as the rest of the command-line options, and then the processing of the flags done in ELFObjcopy when the list of these is actually about to be used. That should allow us to completely drop all lazy parsing, I think.

What do you think?

I like this suggestion which allows to remove lazy options. If we do not need to support lazy options then we could simplificate CopyConfig structures. We might use one CopyConfig for all formats:

struct CopyConfig {
  // common options
  
  // ELF options
  
  ... 
};

Would it be OK or you think we need separate ELFConfig, COFFConfig(like it is currently done)...?

I don't have a particular preference as to whether format-specific options are in separate configs or not. I think if we're thinking in terms of a library, it might make sense to split them up, so that a person who wants to modify COFF objects doesn't have to populate the ELF stuff, but certainly I'm open to alternatives. If we do keep separate configs, we should ensure the option split is based on whether options are truly format specific, or just not implemented, as discussed earlier (unimplemented generic options should still be in the generic config class).

avl updated this revision to Diff 371182.Sep 7 2021, 1:59 PM

addressed comments: moved format specific options into the specific configs.

avl edited the summary of this revision. (Show Details)Sep 7 2021, 2:09 PM
avl updated this revision to Diff 371184.Sep 7 2021, 2:13 PM
avl edited the summary of this revision. (Show Details)

addressed comments: moved format specific options into the specific configs.

avl updated this revision to Diff 371186.Sep 7 2021, 2:18 PM

addressed comments: moved format specific options into the specific configs.

avl added a comment.Sep 7 2021, 3:00 PM

@jhenderson James, Would you mind to continue reviewing this refactoring effort, please? I returned back to this task and did changes which we discussed last - https://reviews.llvm.org/D102277#inline-974299. i.e. Format specific options are moved into the specific configs. Checks for format specific options are removed from get*Config() methods.

jhenderson accepted this revision.Sep 8 2021, 12:20 AM

One nit, otherwise this looks good to me.

llvm/tools/llvm-objcopy/MachO/MachOObjcopy.cpp
147

If the CommonConfig isn't used anymore, it seems like you could drop the parameter here?

This revision is now accepted and ready to land.Sep 8 2021, 12:20 AM
avl added a comment.Sep 8 2021, 7:50 AM

Thank you for the review. Will address comment and integrate.

This revision was landed with ongoing or failed builds.Sep 8 2021, 9:17 AM
This revision was automatically updated to reflect the committed changes.