This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Refactor CopyConfig structure.
ClosedPublic

Authored by avl on Mar 21 2021, 11:40 PM.

Details

Summary

This patch prepares llvm-objcopy to move its implementation
into a separate library. To make it possible it is necessary
to minimize internal dependencies. Current CopyConfig has the following
drawbacks:

  1. Current CopyConfig has parsed and unparsed options at the same time. Thus, parsing functionality is located in different places. Instead, It would be good to parse options once and then have parsed options inside CopyConfig. This patch does all parsing job in the parse*() routines.
  1. Current implementation has an attempt to hide format-specific options. This patch makes it in a more general way: separate format-specific options using subclasses and allow to create the format-specific views at all options.

    So, this patch creates some interfaces representing common and format-specific options:

    CopyConfigBase CopyConfigELF CopyConfigCOFF CopyConfigMachO CopyConfigWasm

    And a class that allows to switch the view on the full options set:

    MultiFormatCopyConfig.

Diff Detail

Event Timeline

avl created this revision.Mar 21 2021, 11:40 PM
avl requested review of this revision.Mar 21 2021, 11:40 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 21 2021, 11:40 PM
jhenderson requested changes to this revision.Mar 22 2021, 1:45 AM

It also makes format of error mesages for --add-symbol option matching with format of error mesages for other options.

Could you do this in a separate patch upfront, please? As things stand, this patch is not NFC, because of this...! I have some concerns about the specific nature of the non-NFC part, which I'd like to comment on separately.

As for the main part of this change, I think this is essentially trying to redo D67139, which had a lot of back and forth discussion on it. Apart from anything else, the *Objcopy.cpp files are not where to put command-line parsing logic, which was an item from that review. Please go over the comments from D67139, and make sure to take the points raised in the comments on board. Once you have done that, feel free to propose a new solution (which could be similar to this, or something else entirely), if you feel it is important.

This revision now requires changes to proceed.Mar 22 2021, 1:45 AM
avl added a comment.Mar 22 2021, 2:31 AM

Could you do this in a separate patch upfront, please? As things stand, this patch is not NFC, because of this...! I have some concerns about the specific nature of the non-NFC part, which I'd like to comment on separately.

It would be a bit artificial... Since the change on how error message is displayed happens because option processing was moved form common part into the specific part. i.e. it could not be separated from this change. But I might be done by adding extra code:

Expected<elf::ELFCopyConfig> ELFConfig = elf::parseConfig(*this);
if (!ELFConfig)
  return createFileError(Config.InputFilename, ELFConfig.takeError());

Would create a separate review.

avl added a comment.Mar 22 2021, 2:31 AM

As for the main part of this change, I think this is essentially trying to redo D67139, which had a lot of back and forth discussion on it. Apart from anything else, the *Objcopy.cpp files are not where to put command-line parsing logic, which was an item from that review. Please go over the comments from D67139, and make sure to take the points raised in the comments on board. Once you have done that, feel free to propose a new solution (which could be similar to this, or something else entirely), if you feel it is important.

Thanks, will look through it and get back.

avl updated this revision to Diff 332586.Mar 23 2021, 3:10 AM

addressed comments:

Restored ELFConfig.h/cpp as a separate file keeping options parsing code.
Left usage of parsing code of elf-specific options inside /ELF directory.

D67139 suggested parsing specific options lazily.
i.e. To parse them only in case when they are necessary.
But it places format-specific options in a common CopyConfig structure.
This does not look good since all users of ELFConfig are inside of
/ELF directory(Thus we do not need to make them visible to others).
This patch hides elf-specific processing inside /ELF directory
and parses format-specific options lazily.

avl edited the summary of this revision. (Show Details)Mar 23 2021, 3:17 AM
avl added a comment.Mar 23 2021, 2:59 PM

Looking at the design for ELF parseable options more, I do not really understand why parsing of ELF options should be delayed. It seems that they might be set at the same time when all other options parsed(i.e. as it was before D67139):

remove

std::vector<StringRef> SymbolsToAdd;
Optional<StringRef> NewSymbolVisibility

from CopyConfig. So that CopyConfig never contains unparsed strings.

add specific ELF fields to CopyConfig:

CopyConfig {

  // ELF specific options.
  Optional<uint8_t> NewSymbolVisibility;
  std::vector<NewSymbolInfo> SymbolsToAdd;
}

parseObjcopyOptions() {
  // set NewSymbolVisibility and SymbolsToAdd
  // if corresponding options exist in arguments.
}

What do you think?

In D99055#2644077, @avl wrote:

addressed comments:

Restored ELFConfig.h/cpp as a separate file keeping options parsing code.
Left usage of parsing code of elf-specific options inside /ELF directory.

D67139 suggested parsing specific options lazily.
i.e. To parse them only in case when they are necessary.
But it places format-specific options in a common CopyConfig structure.
This does not look good since all users of ELFConfig are inside of
/ELF directory(Thus we do not need to make them visible to others).
This patch hides elf-specific processing inside /ELF directory
and parses format-specific options lazily.

Lazy parsing previously also had the advantage that the parsing only needed to happen once at most, even if there were multiple input files. It looks like this change causes the ELF specific config to be calculated once per input, which seems suboptimal? Aside from that, this looks like a general improvement to me. What do others think?

avl added a comment.Mar 24 2021, 3:59 AM

after some more thinking, it seems to me that nor this patch nor current llvm version is not a right solution.

Current CopyConfig has parsed and unparsed options at the same time - this is not good in general.
Unparsed ELF options are visible to other formats.
Parsed ELF options are visible to other formats.
Delayed options parsing makes CopyConfig to be invalid for some time periods(until target options are parsed).
Current CopyConfig might be in inconsistent state (if several target would be specified at the same time).

I think that the better way would be something like this:

CopyConfig has all options in parsed state. It is filled in parseObjcopyOptions().

CopyConfig {
   StringRef InputFilename;

  // ELF specific options.
  Optional<uint8_t> NewSymbolVisibility;
  std::vector<NewSymbolInfo> SymbolsToAdd;
}

Target ElfCopyConfig has only options which are supported by the target.

ElfCopyConfig {
   StringRef InputFilename;

  Optional<uint8_t> NewSymbolVisibility;
  std::vector<NewSymbolInfo> SymbolsToAdd;
}

Error executeObjcopyOnBinary(const ElfCopyConfig &Config,
                             object::ELFObjectFileBase &In, raw_ostream &Out)

CopyConfig has methods converting it to the target config:

CopyConfig {
  ElfCopyConfig getELFConfig();
}

static Error executeObjcopyOnBinary(CopyConfig &Config, object::Binary &In,
                                    raw_ostream &Out) {
  if (auto *ELFBinary = dyn_cast<object::ELFObjectFileBase>(&In)) {
    return elf::executeObjcopyOnBinary(Config.getELFConfig(), *ELFBinary, Out);
}

Would implement that approach and show for review.

avl updated this revision to Diff 333535.Mar 26 2021, 4:51 AM

Changed a patch to solve following problems:

  1. Current CopyConfig has parsed and unparsed options at the same time. Thus, parsing functionality is located in different places. Instead, It would be good to parse options once and then have parsed options inside CopyConfig. This patch does all parsing job in the parse*() routines.
  2. Current trunc implementation has an attempt to hide format-specific options. This patch makes it in a more general way: separate format-specific options using subclasses and allow to create the format-specific views at all options.

    So, this patch creates some interfaces representing common and format-specific options:

    CopyConfigBase CopyConfigELF CopyConfigCOFF CopyConfigMachO CopyConfigWasm

    And a class that allows to switch the view on the full options set:

    MultiFormatCopyConfig.

    I am a bit unsure whether it is finally good solution. From one side, It creates a good separation of concerns: clearly categorize options and separate them from other formats. From other side, the implementation looks a bit complicated.

    @jhenderson @alexshap @MaskRay What is your opinion?
avl retitled this revision from [llvm-objcopy][NFC] remove processing of ELF specific options from common CopyConfig structure. to [llvm-objcopy][NFC] Refactor CopyConfig structure..Mar 26 2021, 4:55 AM
avl edited the summary of this revision. (Show Details)
avl retitled this revision from [llvm-objcopy][NFC] Refactor CopyConfig structure. to [llvm-objcopy] Refactor CopyConfig structure..Mar 26 2021, 4:55 AM
jhenderson resigned from this revision.Mar 29 2021, 1:03 AM

Thanks for the update. I do have some concerns about the multiple inheritance stuff going on, if I'm honest, as usually this is a design smell, but I haven't got the time right now to think about it. Unfortunately, I'm not likely to get more of a chance to later this week either, as from Thursday I'm off for just over two weeks.

I do still think you should normalise the error messages in a separate prerequisite change. Small changes are good, especially if it helps make something else a pure refactor.

If I don't come back around to this before I go, please feel free to go with the approval of others. I'm resigning from reviewing to remove my "request changes" flag, but I may come back to it later.

Thanks for the update. I do have some concerns about the multiple inheritance stuff going on, if I'm honest, as usually this is a design smell

+1 I can take a look more closely to try to tease out whether there's some alternative that might be nicer, but maybe @rupprecht or @MaskRay might have more context here already. (at a glance the use of virtual inheritance here seems plausible - since these are just structs - if all the config property access went through virtual functions then it could probably be done without virtual inheritance, but with virtual functions and maybe some CRTP mix-in kind of implementations, but that may not be any better - but I wouldn't mind a few other eyes on this, maybe there's some other ways forward here)

I do still think you should normalise the error messages in a separate prerequisite change. Small changes are good, especially if it helps make something else a pure refactor.

Sounds pretty reasonable.

avl added a comment.Mar 30 2021, 7:38 AM

I do have some concerns about the multiple inheritance stuff going on, if I'm honest, as usually this is a design smell

The idea here is to reuse only data. The most weaknesses of multiplue inheritance in C++
came with handling of inherited methods. virtual inheritance of data members looks
safe(since they could not be redefined). Though, copy and move operators should be properly addressed.
But if there is something bad with this approach I am open to change it.

I do still think you should normalise the error messages in a separate prerequisite change. Small changes are good, especially if it helps make something else a pure refactor.

will create a separate review.

avl added a comment.Mar 30 2021, 7:50 AM

if all the config property access went through virtual functions then it could probably be done without virtual inheritance, but with virtual functions and maybe some CRTP mix-in kind of implementations, but that may not be any better

I thought about similar approach : virtual functions and maybe some CRTP mix-in kind of implementations.
But it looked much more heavyweight(at least the implementation which I was thinking of).

struct CopyConfig {};

template < class T >
struct BaseMixin : public T {
  virtual int getCommonOpt() = 0;
};

template < class T >
struct ELFMixin : public T {
  virtual int getElfOpt() = 0;
};

template < class T >
struct COFFMixin : public T {};

template < class T >
struct MachOMixin : public T {};

template < class T >
struct WasmMixin : public T {};


struct CopyConfigELF : public BaseMixin<CopyConfig>,
                       public ELFMixin<CopyConfig> {};

struct CopyConfigCOFF : public BaseMixin<CopyConfig>,
                        public COFFMixin<CopyConfig> {};

struct CopyConfigMachO : public BaseMixin<CopyConfig>,
                         public MachOMixin<CopyConfig> {};

struct CopyConfigWasm : public BaseMixin<CopyConfig>,
                        public WasmMixin<CopyConfig> {};
                        
struct CopyConfigFull : public BaseMixin<CopyConfig>,
                        public ELFMixin<CopyConfig>,
                        public COFFMixin<CopyConfig>,
                        public MachOMixin<CopyConfig>,
                        public WasmOMixin<CopyConfig>
                         {}


struct MultiFormatCopyConfig  {

  MultiFormatCopyConfig() : ElfProxy(Opts) {
  }
                               
  virtual const CopyConfigELF & getELFConfig() const {
      return ElfProxy;
  }

  protected:
  class CopyConfigELFProxy : public CopyConfigELF{
    CopyConfigELFProxy(CopyConfigFull& Opts) : Opts(Opts){}
    int getElfOpt() override {
      return Opts.getElfOpt();
    }
     int getCommonOpt() override {
      return Opts.getCommonOpt();
     }
    CopyConfigFull& Opts;
  };


  CopyConfigFull Opts;
  CopyConfigELFProxy ElfProxy;
}

So the general point of this change is that previously all the options were in one struct which meant that, say, the MachO implementation could accidentally end up using an ELF option value, etc? And this way the implementations can accept the common options (the base class) and their implementation-specific options?

Any sense of whether that sort of cross-use was/is a significant source of bugs, or whether it'd be OK to just have chunks of the config specify "these are for ELF", "these are for MachO", etc (just in comments, or perhaps using ELF/MachO as a prefix, or having them in an ELF/MachO/etc subobject - while still having them all together/without the ability to create a common-and-MachO-view, common-and-ELF-view, etc) - I guess this came out of discussions in other reviews around the "it is necessary to minimize internal dependencies."? Might be good to have more of that context here to motivate/explain the benefits.

avl added a comment.Mar 31 2021, 7:35 AM

So the general point of this change is that previously all the options were in one struct which meant that, say, the MachO implementation could accidentally end up using an ELF option value, etc? And this way the implementations can accept the common options (the base class) and their implementation-specific options?

Any sense of whether that sort of cross-use was/is a significant source of bugs, or whether it'd be OK to just have chunks of the config specify "these are for ELF", "these are for MachO", etc (just in comments, or perhaps using ELF/MachO as a prefix, or having them in an ELF/MachO/etc subobject - while still having them all together/without the ability to create a common-and-MachO-view, common-and-ELF-view, etc) - I guess this came out of discussions in other reviews around the "it is necessary to minimize internal dependencies."? Might be good to have more of that context here to motivate/explain the benefits.

my immediate need(driven by D88827) is to remove dependencies to the parts which are not suitable for the library and to remove things that complicate the library interface.
In the current llvm implementation these are :

  1. CopyConfig has two versions of some options. Parsed and unparsed. f.e. These are SymbolsToAdd and ELF.SymbolsToAdd. It is redundant for the library interface to have two of them.
  1. Current implementation uses lazy initialization of CopyConfig.ELF. i.e. it parses ELF-specific options later than others. It would be inconvenient to have potentially uninitialized parts in the library interface. It is also would be incorrect to initialize this part inside a library(Since the library should not depend on the user interface of the concrete tool)

another thing is that D67139 suggested dividing options into format-specific parts. Thus this patch tries to resolve points 1 and 2 and to follow D67139 idea. Speaking
of whether such format-specific grouping is necessary and whether it is worth efforts: I think that probably it is not a big problem that format specific parts would see
other options(it would be nice to have, but if it requires many things to do then we might leave without it). So to resolve 1 and 2 it would be just fine to have:

struct CopyConfig {
  // common options 
  StringRef InputFilename;
  ...
  
  // ELF specific options
  std::vector<NewSymbolInfo> SymbolsToAdd;
  
  // MachO specific options
  bool KeepUndefined = false; 
};

When that structure would be created, all input options would be put in CopyConfig in the parsed state.
Later this structure would be passed into format-specific part:

Error elf::executeObjcopyOnBinary(const CopyConfig &Config,
                             object::Binary &In, raw_ostream &Out);

That solution would be fine from the point of view "preparation to move objcopy implementation to the separate library". But it would revert the idea done by D67139.

avl added a comment.Apr 1 2021, 4:08 AM

So the general point of this change is that previously all the options were in one struct which meant that, say, the MachO implementation could accidentally end up using an ELF option value, etc? And this way the implementations can accept the common options (the base class) and their implementation-specific options?

My understanding is that above is point which D67139 suggested. This patch(D99055) tried to preserve it and implemented using other design approach.

Any sense of whether that sort of cross-use was/is a significant source of bugs, or whether it'd be OK to just have chunks of the config specify "these are for ELF", "these are for MachO", etc (just in comments, or perhaps using ELF/MachO as a prefix, or having them in an ELF/MachO/etc subobject - while still having them all together/without the ability to create a common-and-MachO-view, common-and-ELF-view, etc) - I guess this came out of discussions in other reviews around the "it is necessary to minimize internal dependencies."?

I think we do not have many bugs connected with cross-use. From that point of view it is easier and might be OK to use simpler approach ("just have chunks of the config specify "these are for ELF", "these are for MachO", ...").

From the point of view of clear design it is better to have solution which will avoid cross usages. We might use a solution from this patch(D99055) or delay it until better solution would be found.

What would be the better ?

avl updated this revision to Diff 334944.Apr 2 2021, 6:20 AM

The one more argument for design presented by this patch is that: it is assumed that part of the interface
would be moved into the future ObjCopy library. The part which would be moved is quite small and clear.
Second part would be left in the llvm-objcopy tool. If we do not like its implementation(ParsedMultiFormatConfig)
we still might be able to reimplement llvm-objcopy part using another approach, but the library interface
will remain the same. Currently, indeed, we do not have many cross-format bugs. But, when this code
would be done as a library then it becomes more possibilities to do such cross-format bugs.
So, it is generally helpful if the interface of the library would prevent cross-format bugs.
I refactored this patch so that it would be more clear which part is supposed to go into the library
and which part is supposed to be left inside llvm-objcopy:

library part:

CopyConfigBase.h
CopyConfigELF.h
CopyConfigCOFF.h
CopyConfigMachO.h
CopyConfigWasm.h
MultiFormatCopyConfig.h

llvm-objcopy tool part:

ParsedConfig.cpp

avl added a comment.Apr 14 2021, 5:06 AM

ping. Colleagues, what is your opinion on the way how this refactoring should be done?

I think the solution presented in this patch is a good way, especially from the library interface perspective.
It separates options and passes only relevant options to format specific parts. So my suggestion is to integrate this patch.

But if that approach is not good I am fine to implement a simpler approach: when all options are in the same structure, the structure is divided into the chunks of the config(by comments) "these are for ELF", "these are for MachO".

In D99055#2688559, @avl wrote:

ping. Colleagues, what is your opinion on the way how this refactoring should be done?

I think the solution presented in this patch is a good way, especially from the library interface perspective.
It separates options and passes only relevant options to format specific parts. So my suggestion is to integrate this patch.

But if that approach is not good I am fine to implement a simpler approach: when all options are in the same structure, the structure is divided into the chunks of the config(by comments) "these are for ELF", "these are for MachO".

I think @jhenderson is out of office for a bit - hoping when he's back he can weigh in on some of this. Give it another week or two hopefully.

I'm back from my time off, although have a massive backlog to work my way through, so haven't had too much time to revisit the history of this and related patches, so far. I took a step back and was wondering whether another approach would be significantly simpler. There's likely problems with my below suggesting, which may have already been covered elsewhere, but hopefully any explanations of why any issues are indeed issues will help me understand things better. Apologies if you've addressed some of the issues before.

Basically, the idea is to change the executeObjcopyOn* functions to take two separate configs, a base one and a format-specific one. Each file format would (where necessary) have a simple struct that contains the format-specific values, and then the base one would have the options shared by more than one format. This would of course require updating all references to the Config to reference the correct version as required, but if it leads to a nicer design overall, I think it is worth it.

To assist with the lazy parsing, a factory-like class could be responsible for providing the appropriate config when requested. The code might like something like this:

class ConfigManager {
public:
  ConfigManager(/*...*/); // Constructs Common. Stores raw argument data needed to construct other configs.

  const CommonConfig &getConfig() const { return Common; }
  const ELFConfig &getELFConfig() {
    if (!Elf)
      Elf = ...;
    return *Elf;
  }
  /* Same for other formats. */
  
private:
  CommonConfig Common;
  Optional<COFFConfig> Coff;
  Optional<ELFConfig> Elf;
  Optional<MachOConfig> MachO;
  Optional<WasmConfig> Wasm;
};

static Error executeObjcopyOnBinary(ConfigManager &Configs, object::Binary &In,
                                    raw_ostream &Out) {
  if (auto *ELFBinary = dyn_cast<object::ELFObjectFileBase>(&In)) {
    if (Error E = Config.parseELFConfig())
      return E;
    return elf::executeObjcopyOnBinary(Configs.getCommon(), Configs.getELFConfig(), *ELFBinary, Out);
  }
  /* etc */
}

What do you think?

avl added a comment.Apr 19 2021, 1:17 PM

What do you think?

This solution has some concerns caused by lazy options parsing.
I do not understand whether/why we need lazy parsing.
Thus, depending on the reason for lazy parsing, these concerns might be acceptable price or might be not. To support lazy parsing it is necessary
to keep unparsed data, move some parsing functionality into parseELFConfig().
In case it would be necessary to create another tool(llvm-another-tool) then
we need to implement a child of ConfigManager and implement another version of
parseELFConfig(). All this is doable but looks like unnecessary complications.
If it would be not necessary to support lazy parsing then the solution would look simpler.
Do we need to support lazy parsing?

The same solution without lazy parsing looks simpler:

class ConfigManager {
public:
  ConfigManager(/*...*/); // initializes all options into parsed state.

  const CommonConfig &getConfig() const { return Common; }
  const ELFConfig &getELFConfig() const { return Elf; }
  /* Same for other formats. */
  
private:
  CommonConfig Common;
  COFFConfig   Coff;
  ELFConfig    Elf;
  MachOConfig  MachO;
  WasmConfig   Wasm;
};

static Error executeObjcopyOnBinary(ConfigManager &Configs, object::Binary &In,
                                    raw_ostream &Out) {
  if (auto *ELFBinary = dyn_cast<object::ELFObjectFileBase>(&In))
    return elf::executeObjcopyOnBinary(Configs.getCommon(), Configs.getELFConfig(), *ELFBinary, Out);
  /* etc */
}
In D99055#2699528, @avl wrote:

What do you think?

This solution has some concerns caused by lazy options parsing.
I do not understand whether/why we need lazy parsing.
Thus, depending on the reason for lazy parsing, these concerns might be acceptable price or might be not. To support lazy parsing it is necessary
to keep unparsed data, move some parsing functionality into parseELFConfig().
In case it would be necessary to create another tool(llvm-another-tool) then
we need to implement a child of ConfigManager and implement another version of
parseELFConfig(). All this is doable but looks like unnecessary complications.
If it would be not necessary to support lazy parsing then the solution would look simpler.
Do we need to support lazy parsing?

I've not checked, but if memory serves me correctly, there are some options (--add-symbol, I think, at least) that are parsed differently depending on the file format. As such, we can't parse them upfront - if we did, one of the two parse*Config functions would fail. We'd only want that to happen if a user was providing both file formats and the option. Additionally, we don't want to warn about unsupported options for specific file formats, if we don't use those file formats.

avl added a comment.Apr 20 2021, 2:55 AM

I've not checked, but if memory serves me correctly, there are some options (--add-symbol, I think, at least) that are parsed differently depending on the file format. As such, we can't parse them upfront - if we did, one of the two parse*Config functions would fail. We'd only want that to happen if a user was providing both file formats and the option.

It seems the same problem might occur even with lazy parsing. What if we have an archive containing object files with different formats. We might not be able to specify "--add-symbol" option which would be successfully parsed for both of them:

if (auto *ELFBinary = dyn_cast<object::ELFObjectFileBase>(&In)) {
  if (Error E = Config.parseELFConfig()) <<<< one of this parsing routines might fail because of wrong format in source unparsed string.
    return E;
  return elf::executeObjcopyOnBinary(Configs.getCommon(), Configs.getELFConfig(), *ELFBinary, Out);
} else if (auto *COFFBinary = dyn_cast<object::COFFObjectFile>(&In)) {
  if (Error E = Config.parseCOFFConfig())  <<<< one of this parsing routines might fail because of wrong format in source unparsed string.
    return E;
  return coff::executeObjcopyOnBinary(Config, *COFFBinary, Out);
}

Probably we should avoid options having the same name but parsed differently for various formats?

In D99055#2700994, @avl wrote:

I've not checked, but if memory serves me correctly, there are some options (--add-symbol, I think, at least) that are parsed differently depending on the file format. As such, we can't parse them upfront - if we did, one of the two parse*Config functions would fail. We'd only want that to happen if a user was providing both file formats and the option.

It seems the same problem might occur even with lazy parsing. What if we have an archive containing object files with different formats. We might not be able to specify "--add-symbol" option which would be successfully parsed for both of them:

if (auto *ELFBinary = dyn_cast<object::ELFObjectFileBase>(&In)) {
  if (Error E = Config.parseELFConfig()) <<<< one of this parsing routines might fail because of wrong format in source unparsed string.
    return E;
  return elf::executeObjcopyOnBinary(Configs.getCommon(), Configs.getELFConfig(), *ELFBinary, Out);
} else if (auto *COFFBinary = dyn_cast<object::COFFObjectFile>(&In)) {
  if (Error E = Config.parseCOFFConfig())  <<<< one of this parsing routines might fail because of wrong format in source unparsed string.
    return E;
  return coff::executeObjcopyOnBinary(Config, *COFFBinary, Out);
}

Probably we should avoid options having the same name but parsed differently for various formats?

Yes, as mentioned, this is a problem if we have multiple inputs in different formats (including as archive members). I don't think we need to worry about encountering archives with different file format members though, as they will typically not be usable at link time, I'd think. If users want to use the option, they'll have to do it on the individual files in separate invocations.

Perhaps using a different option name would be useful, if we had that choice, but we don't always, either because of compatibility with existing tools, or to avoid unnecessarily breaking people's existing build scripts that use llvm-objcopy.

avl added a comment.Apr 20 2021, 3:44 AM

Perhaps using a different option name would be useful, if we had that choice, but we don't always, either because of compatibility with existing tools, or to avoid unnecessarily breaking people's existing build scripts that use llvm-objcopy.

I see. will change the patch to follow this pattern:

class ConfigManager {
public:
  ConfigManager(/*...*/); // Constructs Common. Stores raw argument data needed to construct other configs.

  const CommonConfig &getConfig() const { return Common; }
  const ELFConfig &getELFConfig() {
    if (!Elf)
      Elf = ...;
    return *Elf;
  }
  /* Same for other formats. */
  
private:
  CommonConfig Common;
  Optional<COFFConfig> Coff;
  Optional<ELFConfig> Elf;
  Optional<MachOConfig> MachO;
  Optional<WasmConfig> Wasm;
};

static Error executeObjcopyOnBinary(ConfigManager &Configs, object::Binary &In,
                                    raw_ostream &Out) {
  if (auto *ELFBinary = dyn_cast<object::ELFObjectFileBase>(&In)) {
    if (Error E = Config.parseELFConfig())
      return E;
    return elf::executeObjcopyOnBinary(Configs.getCommon(), Configs.getELFConfig(), *ELFBinary, Out);
  }
  /* etc */
}

According to the separation by CommonConfig and ELFConfig: Would it be useful to separate options like in this patch:

struct CommonConfig {
  // Main input/output options
  StringRef InputFilename;
  FileFormat InputFormat = FileFormat::Unspecified;
  StringRef OutputFilename;
  FileFormat OutputFormat = FileFormat::Unspecified;

  // Advanced options
  StringRef AddGnuDebugLink;

  DiscardType DiscardMode = DiscardType::None;

  // Repeated options
  std::vector<StringRef> AddSection;
  std::vector<StringRef> DumpSection;

  // Section matchers
  NameMatcher OnlySection;
  NameMatcher ToRemove;

  // Symbol matchers
  NameMatcher SymbolsToRemove;
  NameMatcher UnneededSymbolsToRemove;

  // Map options
  StringMap<SectionFlagsUpdate> SetSectionFlags;
  StringMap<StringRef> SymbolsToRename;

  // Boolean options
  bool DeterministicArchives = true;
  bool OnlyKeepDebug = false;
  bool PreserveDates = false;
  bool StripAll = false;
  bool StripAllGNU = false;
  bool StripDebug = false;
  bool StripUnneeded = false;
}

struct ELFConfig {
  // Only applicable when --output-format!=binary (e.g. elf64-x86-64).
  Optional<MachineInfo> OutputArch;

  // Advanced options
  // Cached gnu_debuglink's target CRC
  uint32_t GnuDebugLinkCRC32;
  Optional<StringRef> ExtractPartition;
  StringRef SplitDWO;
  StringRef SymbolsPrefix;
  StringRef AllocSectionsPrefix;
  Optional<uint8_t> NewSymbolVisibility;
  std::vector<NewSymbolInfo> SymbolsToAdd;

  // Section matchers
  NameMatcher KeepSection;

  // Map options
  StringMap<SectionRename> SectionsToRename;
  StringMap<uint64_t> SetSectionAlignment;

  // Symbol matchers
  NameMatcher SymbolsToGlobalize;
  NameMatcher SymbolsToKeep;
  NameMatcher SymbolsToLocalize;
  NameMatcher SymbolsToWeaken;
  NameMatcher SymbolsToKeepGlobal;

  // ELF entry point address expression. The input parameter is an entry point
  // address in the input ELF file. The entry address in the output file is
  // calculated with EntryExpr(input_address), when either --set-start or
  // --change-start is used.
  std::function<uint64_t(uint64_t)> EntryExpr;

  // Boolean options
  bool AllowBrokenLinks = false;
  bool ExtractDWO = false;
  bool ExtractMainPartition = false;
  bool KeepFileSymbols = false;
  bool LocalizeHidden = false;
  bool StripDWO = false;
  bool StripNonAlloc = false;
  bool StripSections = false;
  bool Weaken = false;
  bool DecompressDebugSections = false;

  DebugCompressionType CompressionType = DebugCompressionType::None;
};

Or like in current llvm code :

struct ELFConfig {
  Optional<uint8_t> NewSymbolVisibility;
  std::vector<NewSymbolInfo> SymbolsToAdd;
}

?

Maybe it should be a two-part process? In this first patch, keep the split more-or-less the same as it is now, so that the diff is simpler, and we can focus on the more important aspects of the patch (switching to the new configuration structure), and then a second patch can move the options between structures so that they're in the most appropriate one.

avl updated this revision to Diff 340083.Apr 23 2021, 9:33 AM

Implemented as suggested.

It seems to me that this version(using encapsulation instead of inheritance)
became more complex because of the necessity to repeatedly specify the name of the config
and to pass pair of configs. To minimize the number of changes we might use inheritance
like in this change(https://reviews.llvm.org/D99055?id=334944).
If the complexity is not a problem, then let`s use a variant from this last version of the patch.

I've skimmed bits of this briefly for now - I am quite busy and don't have the time for a more in-depth review at this point. It would be really helpful to get another pair of eyes on this approach too, to see what others think.

llvm/tools/llvm-objcopy/ConfigManager.cpp
709

Optional suggestion, which might help reduce the repetitive Config.Common and reduce the changes here:

ConfigManager ConfigMgr;
CommonConfig &Confg = ConfigMgr.Common;

The same could be done in other places too.

llvm/tools/llvm-objcopy/ELF/ELFObjcopy.cpp
514

I'd be inclined to call these something like Config and ElfCfg (or even ElfConfig if you don't mind the hiding of the class type within this function). This reduces the changes needed here at least. Thinking about it, even if you want to name CommonConfig something reflecting the Common aspect, I'd call the variable CommonCfg not simply Common, and Elf similarly becomes ElfCfg. The reason is that Elf normally implies some sort of Elf object, not a config class. Renaming Common also helps describe its purpose more clearly.

I've skimmed bits of this briefly for now - I am quite busy and don't have the time for a more in-depth review at this point. It would be really helpful to get another pair of eyes on this approach too, to see what others think.

I've been watching, but my understanding is limited, and from what I've seen it doesn't seem like it's been that beneficial to try to split these things up when they all have to be known to the frontend anyway. So one big struct with some comments about which bits are for which architecture sounds like it would be OK to me - but perhaps I'm missing some things.

llvm/tools/llvm-objcopy/ConfigManager.h
27

Why is there an inheritance relationship here - if there's only one class that implements MultiFormatConfig - why isn't ConfigManager a non-virtual API?

avl added inline comments.Apr 26 2021, 1:33 PM
llvm/tools/llvm-objcopy/ConfigManager.h
27

MultiFormatConfig interface is used when passed to routine working with archives:

Expected<std::vector<NewArchiveMember>>
createNewArchiveMembers(const MultiFormatConfig &Config,
                        const object::Archive &Ar);

It would be bad to pass whole ConfigManager here since ConfigManager has elements not used by createNewArchiveMembers. Additionally ConfigManager is a llvm-objcopy specific thing. For some another tool(llvm-another-tool) there might be another implementation of config manager, but it needs to be presented as a universal interface for createNewArchiveMembers. MultiFormatConfig is a such interface.

avl updated this revision to Diff 340631.Apr 26 2021, 1:38 PM

renamed variables as requested.

dblaikie added inline comments.Apr 26 2021, 1:46 PM
llvm/tools/llvm-objcopy/ConfigManager.h
27

It seems like an abstraction without a use case yet, so it should wait until there is a use case?

I understand some of this is refactoring to make this more amenable to be a general purpose library - but if all the uses of the archive support currently want a big multi-format config, it seems OK for that to be a single non-virtual thing at the moment. Then when someone has a need to build a format-specific tool with archive support (hopefully we don't do too much of that in LLVM - it's nice to be generic over all our supported object formats where possible) they could add this abstraction?

(what's the idea, that they could implement ConfigManager with a version that returns errors for everything other than the object format they support (ie: an ELF-only use of this would return a real ELF config, but getCOFF/MachO/WasmConfig would return Error about how those formats aren't supported)

avl updated this revision to Diff 340634.Apr 26 2021, 1:46 PM

added several missed renamings.

avl added inline comments.Apr 26 2021, 2:34 PM
llvm/tools/llvm-objcopy/ConfigManager.h
27

It seems like an abstraction without a use case yet, so it should wait until there is a use case?

I understand some of this is refactoring to make this more amenable to be a general purpose library - but if all the uses of the archive support currently want a big multi-format config, it seems OK for that to be a single non-virtual thing at the moment. Then when someone has a need to build a format-specific tool with archive support (hopefully we don't do too much of that in LLVM - it's nice to be generic over all our supported object formats where possible) they could add this abstraction?

(what's the idea, that they could implement ConfigManager with a version that returns errors for everything other than the object format they support (ie: an ELF-only use of this would return a real ELF config, but getCOFF/MachO/WasmConfig would return Error about how those formats aren't supported)

My understanding is that there is a problem connected with the fact that there are options that could not be parsed properly without knowing the final format(the same options would be parsed differently for various formats). i.e. we could not parse&fill all options before we make a call to createNewArchiveMembers(). If such options are specified in command line and if archive contains files with different formats then we could not handle such option - it would fail for one or another format(https://reviews.llvm.org/D99055#2701025). Thus we assume that only one format might be requested.

The current solution is to have a callback from createNewArchiveMembers()(from the point where the format is known).
i.e. when it is known that the final format is ELF, we call getELFConfig() which parses lazy options and checks for unsupported options(using code from ConfigManager). Such a call back allows separating createNewArchiveMembers from the code which parses the source options. i.e. when llvm-objcopy code would be divided into the common library part and specific llvm-objcopy part then createNewArchiveMembers() and MultiFormatConfig will go into the library part. ConfigManager will stay in llvm-objcopy specific part. Without such separation, it would be necessary to put ConfigManager into the library part. Putting specific part into the common part looks wrong.

I think before you go too much further with this refactoring, it would be a good idea if we have subsequent patches ready to see which make use of this. Without a concrete use-case, we are making change for no particular reason. With the subsequent patches, we can then see whether the refactoring here and elsewhere actually makes sense for the final use-case.

avl added a comment.Apr 27 2021, 3:24 AM

I think before you go too much further with this refactoring, it would be a good idea if we have subsequent patches ready to see which make use of this. Without a concrete use-case, we are making change for no particular reason. With the subsequent patches, we can then see whether the refactoring here and elsewhere actually makes sense for the final use-case.

https://reviews.llvm.org/D86539 is the example of possible use(it is a bit outdated, but still shows a usecase).
Please, consider following fragment:

template <class ELFT>
Error writeOutputFile(const Options &Options, ELFObjectFile<ELFT> &InputFile,
                      DataBits &OutBits) {
  DwarfutilConfigManager Config;

  Config.InputFilename = Options.InputFileName;
  Config.OutputFilename = Options.OutputFileName;
  if (Options.Compress)
    Config.CompressionType = DebugCompressionType::GNU;
  Config.DecompressDebugSections = Options.Decompress;

  if (Options.StripAll) {
    Config.StripDebug = true;
  } else if (Options.Decompress) {
    Config.DecompressDebugSections = true;
  } else {
    if (Expected<ELFObjectFile<ELFT>> MemFile =
            ELFObjectFile<ELFT>::create(MemoryBufferRef(OutBits, ""))) {
      if (Options.StripUnoptimizedDebug) {
        // Delete all debug sections. Debug sections which have optimized
        // pair should be deleted as they would be replaced with paired
        // optimized version. Debug section which does not have optimized pair
        // should be deleted since they could become incorrect after debug
        // info has been optimized.
        Error Err =
            Config.ToRemove.addMatcher(objectcopy::NameOrPattern::create(
                "\\.debug*", objectcopy::MatchStyle::Wildcard,
                [](Error Err) -> Error { return Err; }));
        error(ToolName, std::move(Err));

        Err = Config.ToRemove.addMatcher(objectcopy::NameOrPattern::create(
            "\\.gdb_index", objectcopy::MatchStyle::Wildcard,
            [](Error Err) -> Error { return Err; }));
        error(ToolName, std::move(Err));
      } else {
        // Delete debug sections which have optimized pair as they should be
        // replaced with paired optimized version.
        for (const auto &Section : MemFile->sections()) {
          if (Expected<StringRef> SectionName = Section.getName()) {
            if (SectionName->startswith(".debug") ||
                *SectionName == ".gdb_index") {
              Error Err =
                  Config.ToRemove.addMatcher(objectcopy::NameOrPattern::create(
                      *SectionName, objectcopy::MatchStyle::Wildcard,
                      [](Error Err) -> Error { return Err; }));
              error(ToolName, std::move(Err));
            }
          }
        }
      }

      // Add new debug sections.
      for (const SectionRef &Sec : MemFile->sections()) {
        if (Expected<StringRef> SecName = Sec.getName())
          if (SecName->startswith(".debug") || *SecName == ".gdb_index") {
            if (Expected<StringRef> SecData = Sec.getContents()) {
              Config.NewDebugSections.insert(
                  std::make_pair(*SecName, *SecData));
            }
          }
      }
    } else {
      error(ToolName, createStringError(std::errc::invalid_argument,
                                        "unknown command line option."));
    }
  }

  objectcopy::FileBuffer FB(Config.OutputFilename);
  return objectcopy::elf::executeObjcopyOnBinary(Config.getCommonConfig(), Config.getElfConfig(), InputFile, FB);  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
}

With the design suggested by the current(D99055) patch there would be created local DwarfutilConfigManager which implements MultiFormatConfig interface. The objectcopy::elf::executeObjcopyOnBinary function from object library would get the options by call to getCommonConfig(), getElfConfig(). To have access to objectcopy functions the https://reviews.llvm.org/D88827 is created. The goal of https://reviews.llvm.org/D88827 is to move object copy functions from the specific tool to the object library area. So that object copy functions would be available to the alternative usages(not only from llvm-objcopy tool):

Error executeObjcopyOnIHex(const CommonConfig &CommonCfg, const ElfConfig &ElfCfg, MemoryBuffer &In,
                           raw_ostream &Out);
Error executeObjcopyOnRawBinary(const CommonConfig &CommonCfg, const ElfConfig &ElfCfg, MemoryBuffer &In,
                                raw_ostream &Out);
Error executeObjcopyOnBinary(const CommonConfig &CommonCfg, const ElfConfig &ElfCfg,
                             object::ELFObjectFileBase &In, raw_ostream &Out);

Error executeObjcopyOnBinary(const CommonConfig &CommonCfg, const COFFConfig &COFFCfg,
                             object::COFFObjectFile &In, raw_ostream &Out);

Error executeObjcopyOnBinary(const CommonConfig &CommonCfg, const MachOConfig &MachOCfg,
                             object::MachOObjectFile &In, raw_ostream &Out);
Error executeObjcopyOnMachOUniversalBinary(
    const MultiFormatConfig &Config, const object::MachOUniversalBinary &In, raw_ostream &Out);

Error executeObjcopyOnBinary(const CommonConfig &CommonCfg, const MachOConfig &MachOCfg,
                             object::WasmObjectFile &In, raw_ostream &Out);

Error executeObjcopyOnArchive(const MultiFormatConfig &Config, const object::Archive &Ar)
dblaikie added inline comments.May 7 2021, 6:24 PM
llvm/tools/llvm-objcopy/ConfigManager.h
27

I'm not sure how generic the common part is if it has to call for/have specifically enumerated options for each specific part, though? I'm not sure how much value that adds.

But also I realize this code review seems like it's gone on/been through so many different fairly expensive variations, that I don't want to come in half-informed and complicate this review further. I'd like to try to simplify/streamline the review a bit if I can help with tie-breakers or other things that might enable progress.

@jhenderson (@rupprecht?) - is there some way we could help settle this work a bit more than it seems to have been able to be so far? (happy to video chat, etc, if that might be helpful)

avl added inline comments.May 9 2021, 3:36 PM
llvm/tools/llvm-objcopy/ConfigManager.h
27

I'm not sure how generic the common part is if it has to call for/have specifically enumerated options for each specific part, though? I'm not sure how much value that adds.

The necessity to call for the specific part is driven by the fact that there exist options that could not be parsed without knowing the final format. That is the reason why parsing such options is delayed until the final format is known.

Probably, instead of delayed parsing, we might detect the final format by pre-check. i.e. before starting to parse options we inspect the files and detect the format. Then we would be able to parse options properly and create a fully parsed CopyConfig from the scratch. No extra call for specific options would be required.

In that case CopyConfig might be single structure containing all options(without lazy initialized parts).

dblaikie added inline comments.May 9 2021, 7:24 PM
llvm/tools/llvm-objcopy/ConfigManager.h
27

The necessity to call for the specific part is driven by the fact that there exist options that could not be parsed without knowing the final format. That is the reason why parsing such options is delayed until the final format is known.

*nod* Though the current code doesn't have this abstraction and seems to get along OK, yeah? It does that by having some options be implicitly invalid until some operation is called that parses and populations those options (based on which file format is in use)?

But also I realize this code review seems like it's gone on/been through so many different fairly expensive variations, that I don't want to come in half-informed and complicate this review further. I'd like to try to simplify/streamline the review a bit if I can help with tie-breakers or other things that might enable progress.

@jhenderson (@rupprecht?) - is there some way we could help settle this work a bit more than it seems to have been able to be so far? (happy to video chat, etc, if that might be helpful)

Sorry, I've been snowed under with other reviews, combined with workloads elsewhere, so I haven't been able to keep focused on this patch and given it the time it needs. It's also meant that as I've come back to it every so often, I see things differently, hence the occasional changes in opinion.

FWIW, I think the current design generally looks reasonably good to me, but I'm open to other design ideas that satisfy the goals of both the new use-case and the existing behaviour. If an alternative looks simpler/more effective, please feel free to shout, though it might be easier to write it in a different patch so that we can compare and contrast competing approaches.

@avl, when I meant see a use-case, I meant literally see a patch that makes use of this refactoring in its current form. If a later patch can't be cleanly based on this patch, then it indicates a problem with the current design.

Aside: the big if (/*various option checks*/ for the file formats is always something that's bothered me, and I wonder if we really need them. Perhaps that's a different conversation to be had, but it feels like not having them would somewhat simplify some of this code too.

llvm/tools/llvm-objcopy/ConfigManager.cpp
709

I specifically meant the name Config in my example (well, with a typo...), as it stops you needing to constantly change Config to CommonCfg in this function. The same sort of thing goes throughout this patch. In other words, Config would represent the common stuff, and then ELFConfig etc would represent the format-specific stuff.

1060–1061

Same here as above - use Config to refer to the CommonConfig to save having to rename everything. Same goes thoughout this patch.

avl added a comment.May 10 2021, 3:46 AM

FWIW, I think the current design generally looks reasonably good to me, but I'm open to other design ideas that satisfy the goals of both the new use-case and the existing behaviour. If an alternative looks simpler/more effective, please feel free to shout, though it might be easier to write it in a different patch so that we can compare and contrast competing approaches.

The current design is also OK from my point of view. The only thing is that it seems to me that using inheritance instead of encapsulation for CopyConfig structures would make usage more convenient/simple(https://reviews.llvm.org/D99055?id=334944).

@avl, when I meant see a use-case, I meant literally see a patch that makes use of this refactoring in its current form. If a later patch can't be cleanly based on this patch, then it indicates a problem with the current design.

It would take a time to prepare such a patch. The fully ready use case - is some fully implemented tool, it requires a lot of other refactorings and changes. Anyway, would change https://reviews.llvm.org/D86539 to use refactoring from this patch.

llvm/tools/llvm-objcopy/ConfigManager.cpp
709

OK.

avl added inline comments.May 10 2021, 4:40 AM
llvm/tools/llvm-objcopy/ConfigManager.h
27

Right. This - "having some options be implicitly invalid until some operation is called that parses and populations those options" - makes it bad for general interface. Because the interface contains extra and partially incomplete options. "MultiFormatConfig" abstraction hides these details. Thus, unparsed and unintialized options are not seen inside createNewArchiveMembers. In other words, implementation details of options processing of llvm-objcopy are hidden behind MultiFormatConfig interface. This looks good for general interface.

...
FWIW, I think the current design generally looks reasonably good to me,

Nah, that's god enough for me - wanted to help ensure this review was converging, that's all. I'll step back.

avl updated this revision to Diff 344350.May 11 2021, 4:05 AM

renamed as requested.

I've done another skim. I should be able to give this a fuller review again tomorrow.

llvm/tools/llvm-objcopy/ELF/ELFConfig.h
1

Revert the "C++" bit of this change. The old-style was correct - see https://llvm.org/docs/CodingStandards.html#file-headers. In fact, looking at the other headers in this change, please add this to them too.

Also "ELFConfig.h" not "ElfConfig.h" is more correct, as "ELF" is an accronym, so should be all-caps in general.

29–31

Similar comment: "ELF specific" and ELFConfig are the correct spellings.

llvm/tools/llvm-objcopy/MachOConfig.h
15 ↗(On Diff #344350)
llvm/tools/llvm-objcopy/MultiFormatConfig.h
13–15

vector and Allocator.h look unnecessary to me.

llvm/tools/llvm-objcopy/llvm-objcopy.cpp
12–14

Is there a particular reason you haven't put these files inside the COFF/ELF/MachO/wasm subdirectories?

avl added inline comments.May 11 2021, 5:17 AM
llvm/tools/llvm-objcopy/llvm-objcopy.cpp
12–14

They all supposed to go away from the llvm/tools/llvm-objcopy directory. They would be put into the include/llvm/Object/ObjCopy? directory later. So I put it together for now. If it is necessary we might temporarily place them into COFF/ELF/MachO/wasm subdirectories until library review is not ready.

jhenderson added inline comments.May 11 2021, 5:35 AM
llvm/tools/llvm-objcopy/llvm-objcopy.cpp
12–14

I don't think it's particularly hard to move these headers once the code is pulled into a library at a later date. Given that there will be a presumed interim period when this patch will have landed, but the library does not exist yet, we should have everything make sense in the interim. That would imply the file format specific files should be in their corresponding subdirectories. After all, the files that are already in those directories will need to move too, right?

avl added inline comments.May 11 2021, 9:20 AM
llvm/tools/llvm-objcopy/llvm-objcopy.cpp
12–14

yes, that`s right. will put above into the specific directories.

avl updated this revision to Diff 344438.May 11 2021, 9:27 AM

addressed comments.

Do you actually need WasmConfig and COFFConfig? If we can avoid adding them, that would be nice.

llvm/tools/llvm-objcopy/COFF/COFFConfig.h
10

Need to fix the include guards. Same in the other headers, as directed by clang-tidy.

llvm/tools/llvm-objcopy/ELF/ELFConfig.h
1

Still need to change ElfConfig.h to ELFConfig.h in the comment on the first line of the file.

llvm/tools/llvm-objcopy/MultiFormatConfig.h
2

Nit: missing the "C++" bit in this line of the comment.

llvm/tools/llvm-objcopy/llvm-objcopy.cpp
273–274

Perhaps add a CommonConfig &Config local variable to this function? Would allow you to avoid repeated getCommonConfig() calls.

326

Similar to elsewhere, consider adding a CommonConfig &Config local variable to reduce the repetitive ConfigMgr.Common statements, and reduce changes in this diff.

avl added a comment.May 14 2021, 3:58 AM

Do you actually need WasmConfig and COFFConfig? If we can avoid adding them, that would be nice.

Other than specific options set we also need to do following calls for validation:

virtual Expected<const COFFConfig &> getCOFFConfig() const = 0;
virtual Expected<const WasmConfig &> getWasmConfig() const = 0;

I think it is good to have these calls to be done in the same way as other getSpecificConfig() methods. Thus, It looks like we still need to have WasmConfig and COFFConfig structures.

avl updated this revision to Diff 345727.May 16 2021, 2:02 PM

addressed comments.

jhenderson accepted this revision.May 17 2021, 1:04 AM

Okay, I think we're there, LGTM! Might want to give it a couple of days before landing, in case others have any comments.

This revision is now accepted and ready to land.May 17 2021, 1:04 AM
avl added a comment.May 17 2021, 6:01 AM

Thank you for the review.

This revision was automatically updated to reflect the committed changes.