Page MenuHomePhabricator

[llvm-objcopy] NFC: Refactor main objcopy method that takes an ELFReader to a generic Reader.
AbandonedPublic

Authored by rupprecht on Jul 31 2018, 4:37 PM.

Details

Summary

Refactor the main transformation method (ExecuteElfObjcopyOnBinary) to be agnostic of which Reader subclass (currently ELFReader) is passed in.

This adds RTTI to the Reader class so that we can infer the ELF output type from the input reader. As a TODO, we should explicitly require this to be set by the --binary-architechture flag if the Reader is a BinaryReader.

This is a small refactoring in preparation for implementing BinaryReader (see also D41687). It should have no functional change.

Diff Detail

Event Timeline

rupprecht created this revision.Jul 31 2018, 4:37 PM
jhenderson added inline comments.Aug 1 2018, 2:26 AM
tools/llvm-objcopy/Object.cpp
929

The aim is to support this at some point in the future, right? If so, I'd say "currently unsupported" to make it clear that it's not a permanent state of affairs necessarily.

tools/llvm-objcopy/llvm-objcopy.cpp
304

I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago, but I don't really remember what we agreed etc. My current instinct is that a Writer (and the functions used to create it) shouldn't need to know about anything to do with the Reader. This change couples the two together, which I don't really like. I'd rather it be decided outside the CreateWriter function and passed in (as was the case before).

If the problem is the assumption of needing an "ElfType", maybe we should generalise that enum to not be specific to do with ELFs?

rupprecht marked an inline comment as done.Aug 1 2018, 11:54 AM
rupprecht added inline comments.
tools/llvm-objcopy/llvm-objcopy.cpp
304

I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago, but I don't really remember what we agreed etc.

I've read through the comments at D41687 (Jake pointed me to that one as a starter for this change), but let me know if there's some other discussion thread you had that I should read first.

My current instinct is that a Writer (and the functions used to create it) shouldn't need to know about anything to do with the Reader. This change couples the two together, which I don't really like. I'd rather it be decided outside the CreateWriter function and passed in (as was the case before).

That's true for CreateWriter itself, but currently ExecuteElfObjcopyOnBinary (which is constructs a writer via CreateWriter) has to look at the ELFReader it has created so it knows what elf type to pass.

Also -- this problem could be pushed back up another layer, i.e. Writer could be passed into ExecuteElfObjcopyOnBinary instead of ExecuteElfObjcopyOnBinary calling CreateWriter itself... although there is currently another chain of ExecuteElfObjcopyOnBinary -> HandleArgs -> SplitDWOToFile -> CreateWriter that would also need to be plumbed.

If the problem is the assumption of needing an "ElfType", maybe we should generalise that enum to not be specific to do with ELFs?

Reading from a binary input should still have an elf type, it will just come from the -B command line flag, instead of being inferred from the object file magic header.

However -- refactoring to your suggestion so that CreateWriter doesn't need to know about Reader, I've chosen to change it to Optional<ElfType>. If the input and output are both "binary" (I'm not sure if that's a valid use of objcopy, but it's at least allowed by command line flags), then there is no elf type.

rupprecht updated this revision to Diff 158593.Aug 1 2018, 11:54 AM
  • Move Reader dependency out of CreateWriter
rupprecht updated this revision to Diff 158595.Aug 1 2018, 11:59 AM
  • Add llvm_unreachable to handle missing optional
rupprecht edited the summary of this revision. (Show Details)Aug 1 2018, 12:00 PM

I forsee a slight conflict between this and another change but it isn't clear how that will work out and I'm not sure what the best solution is. This change overall LGTM so maybe we'll just let things shake out and deal with that conflict as we have more concrete plans.

tools/llvm-objcopy/llvm-objcopy.cpp
304

My feeling is that the reader and writer should be wholly independent but that you might use information from the reader (or other sources) in order to select which writer you use by default. For instance it should be the case that the same output writer type is chosen for the same input reader by default. If -I binary is used however that goes out the window and we force the user to specify. Indeed if the user specifies anything we should listen to the user.

For instance I thought of a use case for converting from ELF64 to ELF32 (which is something I previously said would never happen). The multi boot specification specifies how to load ELF32 but not ELF64 but all existing tools only support producing ELF64 binaries for 64-bit architectures. Generally this is solved by creating a small 32-bit binary that contains the 64-bit binary as a blob (a la -O binary in most cases) but I *think* there's no real reason you can't just convert a 64-bit binary (containing only data and program headers and some meaningless sections like .text) and directly place it in a 32-bit binary. This would work by having the user specify -O elf32-x86_64 or something like that.

I like passing Optional<ElfType> as a means of accomplishing this.

576

This change (which is not necessarily wrong) conflicts with one of the two proposals I made for how --dump-section should work in another change. After a bit of thought I think "ExecuteElfObjcopyOnBlob" might be needed or something. The --dump-section change hasn't really shaken out just yet so we'll see.

rupprecht updated this revision to Diff 158637.Aug 1 2018, 2:24 PM
  • Rename binary -> blob
rupprecht added inline comments.Aug 1 2018, 2:30 PM
tools/llvm-objcopy/llvm-objcopy.cpp
576

I renamed this to ExecuteElfObjcopyOnBlob since it's no longer always a binary -- is that what you meant?
Do you have a link to your proposals on --dump-section?

tools/llvm-objcopy/llvm-objcopy.cpp
291

it looks like now or even in the future None here is not "recoverable" anyway
(down the line we exit with an error),
maybe it would be better to return the enum directly instead ?
(and if the output format is "binary" it will error("Unsupported output format ...")).

jakehehrlich added inline comments.Aug 1 2018, 2:47 PM
tools/llvm-objcopy/llvm-objcopy.cpp
576

No I meant for ExecuteElfObjcopyOnBinary to remain unchanged and to not take a Reader at all but instead for a separate almost identical function ExecuteElfObjcopyOnBlob to exist so that current invocations to ExecuteElfObjcopyOnBinary would have to decide which one to call. The issue is that sometimes there are options (like --dump-section) which only really make sense on the original data. Depending on how that all shakes out we might want to handle that option differently depending on if -I binary or -I elf* was used. One of the options (and the option I'm currently in favor of) for that change would allow us to treat this all uniformly however.

rupprecht added inline comments.Aug 1 2018, 3:26 PM
tools/llvm-objcopy/llvm-objcopy.cpp
291

The unrecoverable error that's thrown when this optional doesn't have a value comes immediately after a check that Config.OutputFormat == "binary". So returning None is valid in that case.

576

Sorry, I'm still not following. Do you mean this method should be duplicated, once for ELF binary, once for non-ELF binaries?

// Original method, always handles ELF binary input
static void ExecuteElfObjcopyOnBinary(const CopyConfig &Config, FileBuffer &Binary,
                                      Buffer &Out) {
  ELFReader Reader(&Binary);
  std::unique_ptr<Object> Obj = Reader.create();

  HandleArgs(Config, *Obj, Reader, Reader.getElfType());
  // ... maybe some special logic only for ELF binaries ...

  std::unique_ptr<Writer> Writer =
      CreateWriter(Config, *Obj, Out, Reader.getElfType());
  Writer->finalize();
  Writer->write();
}

// Called for -I Binary
static void ExecuteElfObjcopyOnBlob(const CopyConfig &Config, MemoryBuffer &Blob,
                                    Buffer &Out) {
  BinaryReader Reader(&Blob);
  std::unique_ptr<Object> Obj = Reader.create();

  ElfType OutputElfType = <... get ELFType from -B flag ...>
  HandleArgs(Config, *Obj, Reader, ET, OutputElfType);
  // ... maybe some special logic only for non-ELF binaries ...

  std::unique_ptr<Writer> Writer =
      CreateWriter(Config, *Obj, Out, OutputElfType);
  Writer->finalize();
  Writer->write();
}
tools/llvm-objcopy/llvm-objcopy.cpp
291

yes, that's exactly what i was saying - but the point was different - wouldn't returning enum directly make the code a bit simpler ?

alexander-shaposhnikov requested changes to this revision.Aug 1 2018, 4:44 PM
This revision now requires changes to proceed.Aug 1 2018, 4:44 PM
jhenderson added inline comments.Aug 2 2018, 6:28 AM
tools/llvm-objcopy/llvm-objcopy.cpp
291

If it's unrecoverable, should we actually return an Expected?

304

I've read through the comments at D41687 (Jake pointed me to that one as a starter for this change), but let me know if there's some other discussion thread you had that I should read first.

It was a fairly length email chain we had between ourselves off-list. Assuming @jakehehrlich and @alexshap are happy, I see no reason not to forward it to you.

Speaking of that conflict I mentioned, it won't happen! So we're good to go. This change more or less LGTM. Please resolve Alex's comments however.

tools/llvm-objcopy/llvm-objcopy.cpp
304

Forwarding it is cool with me The discussion was about a more major refactor. It was a big sticking point of mine that reading and writing not be intertwined but this doesn't rise to that level. The conclusion was a whole new Reader and Writer system where we would make something like a small copy of ELFTypes and then we would copy our non-tempted generic versions of ELFTypes to the libObject ELFTypes but only in the reader and writer. The sections themselves would then take a Writer in their write method and...uh I can't actually remember how the Reader worked but it basically translated libObject's ELFTypes into an internal version of those things (that could later be written using Writer).

There was also agreement on using a kind of internal dependency manager to resolve initialization order issues.

576

Yeah that's what I had in mind but don't do that just yet. Also that might all get templated out in some manner as well. I was just pointing out a potential future issue.

rupprecht added inline comments.Aug 2 2018, 1:01 PM
tools/llvm-objcopy/llvm-objcopy.cpp
291

I think it would be OK to return an error/return Expected<ElfType>/etc. if this were called from a scope where we were guaranteed that the output is an elf binary. But, it's not. I think if the output format is "binary", the logical thing for an API to do when asked "What's your elf type?" in a generic context would be return <None>, because the output isn't an ELF binary at all, right?

i.e., the relevant code distilled is:

static void ExecuteElfObjcopyOnBinary() {
  ...
  Optional<ElfType> OutputElfType = GetOutputElfType();
  std::unique_ptr<Writer> Writer = CreateWriter(..., Config, OutputElfType);
  ...
}

static std::unique_ptr<Writer>CreateWriter(..., Config, Optional<OutputElfType>) {
  if (Config.OutputFormat == "binary") // Return binary writer, and OutputElfType is never even looked at
  else // ... Use OutputElfType to make an ELFWriter ...
}

Having GetOutputElfType() return an error if the output is binary will break that case. Having it return Expected<ElfType> would work, although it would be weird -- you'd have to only check it in CreateWriter only after you've checked that the output format isn't binary, so I think Optional<ElfType> is cleaner.

Given the state the code is in today, it's always possible to derive an ELF type, but only because --input-target is currently ignored, and the inputs are implicitly assumed to be ELF objects from which we can derive a type. I think the code is simpler when optimized for reading, and returning an Optional<ElfType> makes it clear that the output might not be ELF.

Sorry for the long reply -- I'm OK with making some changes here, but I'm not sure which direction you want this to go in.

rupprecht updated this revision to Diff 158823.Aug 2 2018, 1:02 PM
  • Revert [rename binary -> blob]
tools/llvm-objcopy/llvm-objcopy.cpp
291

though this is probably not a big deal, i think Optional is not the right choice here because of the reasons mentioned above. I think it would be better to either return enum directly or make use of Expected<...>.

@alexshap, are you happy for me to forward the email thread we had about refactoring llvm-objcopy to @rupprecht?

tools/llvm-objcopy/llvm-objcopy.cpp
291

If I see a function called GetOutputElfType I expect it to return a type of ELF. If the function returns anything else (apart from an error indicator of some kind), then it's no longer an "ElfType", so the function is badly named. If we only expect this to be used when we want an ELF, then the return should be Expected, because anything else would be unexpected, and therefore an error. On the other hand, if this function is intended to be more general, then it should be renamed to simply GetOutputType. I'd still say returning None is wrong, because that would imply that we don't want an output at all. Rather, I'd say an explicit enum is the right response (i.e. return something like ELF64LE/ELF32BE/Binary etc).

Related note: None is not an indicator of error, it is an indicator of No Result being a reasonable response. If this function remains as GetOutputElfType and it is impossible to determine what kind of ELF it was, then a None result might be acceptable.

631–632

I may have missed this somewhere in the discussion, but why are we moving the Reader creation out of the ExecuteElfObjcopyOnBinary function?

rupprecht planned changes to this revision.Aug 3 2018, 10:55 AM

I think the changes here will make more sense with the implementation of BinaryReader::create(), so I'll hold off on this change for now. I'll revive it if it still makes sense to submit independently, but more likely I'll just roll this up into the next change. Sorry for the noise!

tools/llvm-objcopy/llvm-objcopy.cpp
291

OK, I see some of the points here, I think it's just incompatible with the way the code is currently written.

Agreed that GetOutputElfType is bad name now -- a more correct (but gratuitously verbose) name for this implementation would be "GetOutputElfTypeIfOutputIsElf".

In order to be guaranteed that this is only called when we know the output is ELF, we'd either have to duplicate the "if output == binary" check that's in CreateWriter (which feels a little wrong, although maybe it's the least worst option), or we'd have to call this from within CreateWriter after that check (which goes against a previous code review comment that Reader, which this method looks at, should not be passed into CreateWriter).

I'll go back to the drawing board to see if there's a refactoring of these methods that makes this all less awkward.

631–632

The goal I'm trying to reach is that ExecuteElfObjcopyOnBinary should work regardless of what the type of input is, which means it can't take either Binary or ELFReader as input. Thus, callers construct it.

So I think I want to remove -split-dwo as an option all together so that HandleArgs can just not worry about all of this.

For the reasoning on removing -split-dwo (which I'll do in another change today) :

  1. Clang can now emit dwo output directly without calling objcopy which was the better way to do this all along
  2. -split-dwo was litterally never used by anyone. I just invented because it was a better way to do the same fundamental. I never changed clang to actually use this option however.
  3. Now that something even better exists, we have an obligation to kill -split-dwo with fire.
tools/llvm-objcopy/Object.h
631

So I realized (thanks to the other change) that it does not really make sense to put getKind here. I think this is what James was pointing out already but I was just not thinking clearly. I apologize. ELFReader *should* have this as a means of telling us what was read in but Reader should not. The ELFType should be determined using the machine info and be passed to handle args as it was before. If If it weren't for -split-dwo (which quite frankly is kind of a dumb option now that a better alternative exists) we would pass the Writer directly to HandleArgs. The only other planned option is a -split-debug option but that only makes sense for the ELF input format case so when that's implemented it can be moved elsewhere.

rupprecht abandoned this revision.Aug 8 2018, 9:16 AM

D50343 includes (some of) the changes here, so I'm officially abandoning this -- it makes more sense to discuss the design on something concrete.

tools/llvm-objcopy/Object.h
631

I ended up not needing to add llvm's RTTI here, so getKind has been reverted in D50343. I thought I knew where I was going when I sent this prefactoring, but it turns out I didn't.