This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
tools/llvm-objcopy/
-
llvm-objcopy/
2
Object.h
1/1
Object.cpp
20
llvm-objcopy.cpp

Differential D50117

[llvm-objcopy] NFC: Refactor main objcopy method that takes an ELFReader to a generic Reader.
AbandonedPublic

Authored by rupprecht on Jul 31 2018, 4:37 PM.

Download Raw Diff

Details

Reviewers

jakehehrlich
alexander-shaposhnikov
jhenderson

Summary

Refactor the main transformation method (ExecuteElfObjcopyOnBinary) to be agnostic of which Reader subclass (currently ELFReader) is passed in.

This adds RTTI to the Reader class so that we can infer the ELF output type from the input reader. As a TODO, we should explicitly require this to be set by the --binary-architechture flag if the Reader is a BinaryReader.

This is a small refactoring in preparation for implementing BinaryReader (see also D41687). It should have no functional change.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 21011
Build 21011: arc lint + arc unit

Event Timeline

rupprecht created this revision.Jul 31 2018, 4:37 PM

Herald added a subscriber: llvm-commits. · View Herald TranscriptJul 31 2018, 4:37 PM

Harbormaster completed remote builds in B20937: Diff 158418.Jul 31 2018, 4:37 PM

jhenderson added inline comments.Aug 1 2018, 2:26 AM

tools/llvm-objcopy/Object.cpp
929	The aim is to support this at some point in the future, right? If so, I'd say "currently unsupported" to make it clear that it's not a permanent state of affairs necessarily.
tools/llvm-objcopy/llvm-objcopy.cpp
304	I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago, but I don't really remember what we agreed etc. My current instinct is that a Writer (and the functions used to create it) shouldn't need to know about anything to do with the Reader. This change couples the two together, which I don't really like. I'd rather it be decided outside the CreateWriter function and passed in (as was the case before). If the problem is the assumption of needing an "ElfType", maybe we should generalise that enum to not be specific to do with ELFs?

rupprecht marked an inline comment as done.Aug 1 2018, 11:54 AM

rupprecht added inline comments.

tools/llvm-objcopy/llvm-objcopy.cpp
304	I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago, but I don't really remember what we agreed etc. I've read through the comments at D41687 (Jake pointed me to that one as a starter for this change), but let me know if there's some other discussion thread you had that I should read first. My current instinct is that a Writer (and the functions used to create it) shouldn't need to know about anything to do with the Reader. This change couples the two together, which I don't really like. I'd rather it be decided outside the CreateWriter function and passed in (as was the case before). That's true for CreateWriter itself, but currently ExecuteElfObjcopyOnBinary (which is constructs a writer via CreateWriter) has to look at the ELFReader it has created so it knows what elf type to pass. Also -- this problem could be pushed back up another layer, i.e. Writer could be passed into ExecuteElfObjcopyOnBinary instead of ExecuteElfObjcopyOnBinary calling CreateWriter itself... although there is currently another chain of ExecuteElfObjcopyOnBinary -> HandleArgs -> SplitDWOToFile -> CreateWriter that would also need to be plumbed. If the problem is the assumption of needing an "ElfType", maybe we should generalise that enum to not be specific to do with ELFs? Reading from a binary input should still have an elf type, it will just come from the -B command line flag, instead of being inferred from the object file magic header. However -- refactoring to your suggestion so that CreateWriter doesn't need to know about Reader, I've chosen to change it to Optional<ElfType>. If the input and output are both "binary" (I'm not sure if that's a valid use of objcopy, but it's at least allowed by command line flags), then there is no elf type.

Move Reader dependency out of CreateWriter

Harbormaster completed remote builds in B20968: Diff 158593.Aug 1 2018, 11:55 AM

Add llvm_unreachable to handle missing optional

rupprecht edited the summary of this revision. (Show Details)Aug 1 2018, 12:00 PM

I forsee a slight conflict between this and another change but it isn't clear how that will work out and I'm not sure what the best solution is. This change overall LGTM so maybe we'll just let things shake out and deal with that conflict as we have more concrete plans.

tools/llvm-objcopy/llvm-objcopy.cpp
304	My feeling is that the reader and writer should be wholly independent but that you might use information from the reader (or other sources) in order to select which writer you use by default. For instance it should be the case that the same output writer type is chosen for the same input reader by default. If `-I binary` is used however that goes out the window and we force the user to specify. Indeed if the user specifies anything we should listen to the user. For instance I thought of a use case for converting from ELF64 to ELF32 (which is something I previously said would never happen). The multi boot specification specifies how to load ELF32 but not ELF64 but all existing tools only support producing ELF64 binaries for 64-bit architectures. Generally this is solved by creating a small 32-bit binary that contains the 64-bit binary as a blob (a la `-O binary` in most cases) but I think there's no real reason you can't just convert a 64-bit binary (containing only data and program headers and some meaningless sections like .text) and directly place it in a 32-bit binary. This would work by having the user specify `-O elf32-x86_64` or something like that. I like passing `Optional<ElfType>` as a means of accomplishing this.
576	This change (which is not necessarily wrong) conflicts with one of the two proposals I made for how --dump-section should work in another change. After a bit of thought I think "ExecuteElfObjcopyOnBlob" might be needed or something. The --dump-section change hasn't really shaken out just yet so we'll see.

Rename binary -> blob

Harbormaster completed remote builds in B20980: Diff 158637.Aug 1 2018, 2:24 PM

rupprecht added inline comments.Aug 1 2018, 2:30 PM

tools/llvm-objcopy/llvm-objcopy.cpp
576	I renamed this to ExecuteElfObjcopyOnBlob since it's no longer always a binary -- is that what you meant? Do you have a link to your proposals on --dump-section?

alexander-shaposhnikov added inline comments.Aug 1 2018, 2:39 PM

tools/llvm-objcopy/llvm-objcopy.cpp
291	it looks like now or even in the future None here is not "recoverable" anyway (down the line we exit with an error), maybe it would be better to return the enum directly instead ? (and if the output format is "binary" it will error("Unsupported output format ...")).

jakehehrlich added inline comments.Aug 1 2018, 2:47 PM

tools/llvm-objcopy/llvm-objcopy.cpp
576	No I meant for ExecuteElfObjcopyOnBinary to remain unchanged and to not take a Reader at all but instead for a separate almost identical function ExecuteElfObjcopyOnBlob to exist so that current invocations to ExecuteElfObjcopyOnBinary would have to decide which one to call. The issue is that sometimes there are options (like --dump-section) which only really make sense on the original data. Depending on how that all shakes out we might want to handle that option differently depending on if -I binary or -I elf* was used. One of the options (and the option I'm currently in favor of) for that change would allow us to treat this all uniformly however.

rupprecht added inline comments.Aug 1 2018, 3:26 PM

tools/llvm-objcopy/llvm-objcopy.cpp

291

The unrecoverable error that's thrown when this optional doesn't have a value comes immediately after a check that Config.OutputFormat == "binary". So returning None is valid in that case.

576

Sorry, I'm still not following. Do you mean this method should be duplicated, once for ELF binary, once for non-ELF binaries?

// Original method, always handles ELF binary input
static void ExecuteElfObjcopyOnBinary(const CopyConfig &Config, FileBuffer &Binary,
                                      Buffer &Out) {
  ELFReader Reader(&Binary);
  std::unique_ptr<Object> Obj = Reader.create();

  HandleArgs(Config, *Obj, Reader, Reader.getElfType());
  // ... maybe some special logic only for ELF binaries ...

  std::unique_ptr<Writer> Writer =
      CreateWriter(Config, *Obj, Out, Reader.getElfType());
  Writer->finalize();
  Writer->write();
}

// Called for -I Binary
static void ExecuteElfObjcopyOnBlob(const CopyConfig &Config, MemoryBuffer &Blob,
                                    Buffer &Out) {
  BinaryReader Reader(&Blob);
  std::unique_ptr<Object> Obj = Reader.create();

  ElfType OutputElfType = <... get ELFType from -B flag ...>
  HandleArgs(Config, *Obj, Reader, ET, OutputElfType);
  // ... maybe some special logic only for non-ELF binaries ...

  std::unique_ptr<Writer> Writer =
      CreateWriter(Config, *Obj, Out, OutputElfType);
  Writer->finalize();
  Writer->write();
}

alexander-shaposhnikov added inline comments.Aug 1 2018, 4:16 PM

tools/llvm-objcopy/llvm-objcopy.cpp
291	yes, that's exactly what i was saying - but the point was different - wouldn't returning enum directly make the code a bit simpler ?

alexander-shaposhnikov requested changes to this revision.Aug 1 2018, 4:44 PM

This revision now requires changes to proceed.Aug 1 2018, 4:44 PM

jhenderson added inline comments.Aug 2 2018, 6:28 AM

tools/llvm-objcopy/llvm-objcopy.cpp
291	If it's unrecoverable, should we actually return an Expected?
304	I've read through the comments at D41687 (Jake pointed me to that one as a starter for this change), but let me know if there's some other discussion thread you had that I should read first. It was a fairly length email chain we had between ourselves off-list. Assuming @jakehehrlich and @alexshap are happy, I see no reason not to forward it to you.

Speaking of that conflict I mentioned, it won't happen! So we're good to go. This change more or less LGTM. Please resolve Alex's comments however.

tools/llvm-objcopy/llvm-objcopy.cpp
304	Forwarding it is cool with me The discussion was about a more major refactor. It was a big sticking point of mine that reading and writing not be intertwined but this doesn't rise to that level. The conclusion was a whole new Reader and Writer system where we would make something like a small copy of ELFTypes and then we would copy our non-tempted generic versions of ELFTypes to the libObject ELFTypes but only in the reader and writer. The sections themselves would then take a Writer in their write method and...uh I can't actually remember how the Reader worked but it basically translated libObject's ELFTypes into an internal version of those things (that could later be written using Writer). There was also agreement on using a kind of internal dependency manager to resolve initialization order issues.
576	Yeah that's what I had in mind but don't do that just yet. Also that might all get templated out in some manner as well. I was just pointing out a potential future issue.

rupprecht added inline comments.Aug 2 2018, 1:01 PM

tools/llvm-objcopy/llvm-objcopy.cpp
291	I think it would be OK to return an error/return Expected<ElfType>/etc. if this were called from a scope where we were guaranteed that the output is an elf binary. But, it's not. I think if the output format is "binary", the logical thing for an API to do when asked "What's your elf type?" in a generic context would be return <None>, because the output isn't an ELF binary at all, right? i.e., the relevant code distilled is: static void ExecuteElfObjcopyOnBinary() { ... Optional<ElfType> OutputElfType = GetOutputElfType(); std::unique_ptr<Writer> Writer = CreateWriter(..., Config, OutputElfType); ... } static std::unique_ptr<Writer>CreateWriter(..., Config, Optional<OutputElfType>) { if (Config.OutputFormat == "binary") // Return binary writer, and OutputElfType is never even looked at else // ... Use OutputElfType to make an ELFWriter ... } Having GetOutputElfType() return an error if the output is binary will break that case. Having it return Expected<ElfType> would work, although it would be weird -- you'd have to only check it in CreateWriter only after you've checked that the output format isn't binary, so I think Optional<ElfType> is cleaner. Given the state the code is in today, it's always possible to derive an ELF type, but only because --input-target is currently ignored, and the inputs are implicitly assumed to be ELF objects from which we can derive a type. I think the code is simpler when optimized for reading, and returning an Optional<ElfType> makes it clear that the output might not be ELF. Sorry for the long reply -- I'm OK with making some changes here, but I'm not sure which direction you want this to go in.

Revert [rename binary -> blob]

Harbormaster completed remote builds in B21011: Diff 158823.Aug 2 2018, 1:02 PM

alexander-shaposhnikov added inline comments.Aug 2 2018, 2:31 PM

tools/llvm-objcopy/llvm-objcopy.cpp
291	though this is probably not a big deal, i think Optional is not the right choice here because of the reasons mentioned above. I think it would be better to either return enum directly or make use of Expected<...>.

@alexshap, are you happy for me to forward the email thread we had about refactoring llvm-objcopy to @rupprecht?

tools/llvm-objcopy/llvm-objcopy.cpp
291	If I see a function called `GetOutputElfType` I expect it to return a type of ELF. If the function returns anything else (apart from an error indicator of some kind), then it's no longer an "ElfType", so the function is badly named. If we only expect this to be used when we want an ELF, then the return should be Expected, because anything else would be unexpected, and therefore an error. On the other hand, if this function is intended to be more general, then it should be renamed to simply `GetOutputType`. I'd still say returning None is wrong, because that would imply that we don't want an output at all. Rather, I'd say an explicit enum is the right response (i.e. return something like ELF64LE/ELF32BE/Binary etc). Related note: None is not an indicator of error, it is an indicator of No Result being a reasonable response. If this function remains as GetOutputElfType and it is impossible to determine what kind of ELF it was, then a None result might be acceptable.
631–632	I may have missed this somewhere in the discussion, but why are we moving the Reader creation out of the ExecuteElfObjcopyOnBinary function?

I think the changes here will make more sense with the implementation of BinaryReader::create(), so I'll hold off on this change for now. I'll revive it if it still makes sense to submit independently, but more likely I'll just roll this up into the next change. Sorry for the noise!

tools/llvm-objcopy/llvm-objcopy.cpp
291	OK, I see some of the points here, I think it's just incompatible with the way the code is currently written. Agreed that GetOutputElfType is bad name now -- a more correct (but gratuitously verbose) name for this implementation would be "GetOutputElfTypeIfOutputIsElf". In order to be guaranteed that this is only called when we know the output is ELF, we'd either have to duplicate the "if output == binary" check that's in CreateWriter (which feels a little wrong, although maybe it's the least worst option), or we'd have to call this from within CreateWriter after that check (which goes against a previous code review comment that Reader, which this method looks at, should not be passed into CreateWriter). I'll go back to the drawing board to see if there's a refactoring of these methods that makes this all less awkward.
631–632	The goal I'm trying to reach is that ExecuteElfObjcopyOnBinary should work regardless of what the type of input is, which means it can't take either Binary or ELFReader as input. Thus, callers construct it.

So I think I want to remove -split-dwo as an option all together so that HandleArgs can just not worry about all of this.

For the reasoning on removing -split-dwo (which I'll do in another change today) :

Clang can now emit dwo output directly without calling objcopy which was the better way to do this all along
-split-dwo was litterally never used by anyone. I just invented because it was a better way to do the same fundamental. I never changed clang to actually use this option however.
Now that something even better exists, we have an obligation to kill -split-dwo with fire.

tools/llvm-objcopy/Object.h
631	So I realized (thanks to the other change) that it does not really make sense to put getKind here. I think this is what James was pointing out already but I was just not thinking clearly. I apologize. ELFReader should have this as a means of telling us what was read in but Reader should not. The ELFType should be determined using the machine info and be passed to handle args as it was before. If If it weren't for -split-dwo (which quite frankly is kind of a dumb option now that a better alternative exists) we would pass the Writer directly to HandleArgs. The only other planned option is a -split-debug option but that only makes sense for the ELF input format case so when that's implemented it can be moved elsewhere.

D50343 includes (some of) the changes here, so I'm officially abandoning this -- it makes more sense to discuss the design on something concrete.

tools/llvm-objcopy/Object.h
631	I ended up not needing to add llvm's RTTI here, so getKind has been reverted in D50343. I thought I knew where I was going when I sent this prefactoring, but it turns out I didn't.

Revision Contents

Path

Size

tools/

llvm-objcopy/

Object.h

27 lines

Object.cpp

4 lines

llvm-objcopy.cpp

38 lines

Diff 158823

tools/llvm-objcopy/Object.h

	Show First 20 Lines • Show All 608 Lines • ▼ Show 20 Lines
	public:			public:
	// If we add this section from an external source we can use this ctor.			// If we add this section from an external source we can use this ctor.
	explicit GnuDebugLinkSection(StringRef File);			explicit GnuDebugLinkSection(StringRef File);
	void accept(SectionVisitor &Visitor) const override;			void accept(SectionVisitor &Visitor) const override;
	};			};

	class Reader {			class Reader {
	public:			public:
				enum class ReaderKind {
				BinaryKind,
				ELFKind,
				};

				private:
				const ReaderKind Kind;

				protected:
				Reader(ReaderKind K) : Kind(K) {}

				public:
	virtual ~Reader();			virtual ~Reader();
	virtual std::unique_ptr<Object> create() const = 0;			virtual std::unique_ptr<Object> create() const = 0;
				ReaderKind getKind() const { return Kind; }
				jakehehrlichUnsubmitted Not Done Reply Inline Actions So I realized (thanks to the other change) that it does not really make sense to put getKind here. I think this is what James was pointing out already but I was just not thinking clearly. I apologize. ELFReader should have this as a means of telling us what was read in but Reader should not. The ELFType should be determined using the machine info and be passed to handle args as it was before. If If it weren't for -split-dwo (which quite frankly is kind of a dumb option now that a better alternative exists) we would pass the Writer directly to HandleArgs. The only other planned option is a -split-debug option but that only makes sense for the ELF input format case so when that's implemented it can be moved elsewhere. jakehehrlich: So I realized (thanks to the other change) that it does not really make sense to put getKind…
				rupprechtAuthorUnsubmitted Not Done Reply Inline Actions I ended up not needing to add llvm's RTTI here, so getKind has been reverted in D50343. I thought I knew where I was going when I sent this prefactoring, but it turns out I didn't. rupprecht: I ended up not needing to add llvm's RTTI here, so getKind has been reverted in D50343. I…
	};			};

	using object::Binary;			using object::Binary;
	using object::ELFFile;			using object::ELFFile;
	using object::ELFObjectFile;			using object::ELFObjectFile;
	using object::OwningBinary;			using object::OwningBinary;

	template <class ELFT> class ELFBuilder {			template <class ELFT> class ELFBuilder {
	Show All 15 Lines

	public:			public:
	ELFBuilder(const ELFObjectFile<ELFT> &ElfObj, Object &Obj)			ELFBuilder(const ELFObjectFile<ELFT> &ElfObj, Object &Obj)
	: ElfFile(*ElfObj.getELFFile()), Obj(Obj) {}			: ElfFile(*ElfObj.getELFFile()), Obj(Obj) {}

	void build();			void build();
	};			};

				class BinaryReader : public Reader {
				public:
				BinaryReader() : Reader(ReaderKind::BinaryKind) {}
				std::unique_ptr<Object> create() const override;
				static bool classof(const Reader *R) {
				return R->getKind() == ReaderKind::BinaryKind;
				}
				};

	class ELFReader : public Reader {			class ELFReader : public Reader {
	Binary *Bin;			Binary *Bin;

	public:			public:
	ElfType getElfType() const;			ElfType getElfType() const;
	std::unique_ptr<Object> create() const override;			std::unique_ptr<Object> create() const override;
	explicit ELFReader(Binary *B) : Bin(B){};			explicit ELFReader(Binary *B) : Reader(ReaderKind::ELFKind), Bin(B) {}
				static bool classof(const Reader *R) {
				return R->getKind() == ReaderKind::ELFKind;
				}
	};			};

	class Object {			class Object {
	private:			private:
	using SecPtr = std::unique_ptr<SectionBase>;			using SecPtr = std::unique_ptr<SectionBase>;
	using SegPtr = std::unique_ptr<Segment>;			using SegPtr = std::unique_ptr<Segment>;

	std::vector<SecPtr> Sections;			std::vector<SecPtr> Sections;
	▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

tools/llvm-objcopy/Object.cpp

	Show First 20 Lines • Show All 919 Lines • ▼ Show 20 Lines
	template <class R> size_t size(R &&Range) {			template <class R> size_t size(R &&Range) {
	return static_cast<size_t>(std::end(Range) - std::begin(Range));			return static_cast<size_t>(std::end(Range) - std::begin(Range));
	}			}

	Writer::~Writer() {}			Writer::~Writer() {}

	Reader::~Reader() {}			Reader::~Reader() {}

				std::unique_ptr<Object> BinaryReader::create() const {
				error("--input-target Binary is currently unsupported");
				jhendersonUnsubmitted Done Reply Inline Actions The aim is to support this at some point in the future, right? If so, I'd say "currently unsupported" to make it clear that it's not a permanent state of affairs necessarily. jhenderson: The aim is to support this at some point in the future, right? If so, I'd say "currently…
				}

	ElfType ELFReader::getElfType() const {			ElfType ELFReader::getElfType() const {
	if (isa<ELFObjectFile<ELF32LE>>(Bin))			if (isa<ELFObjectFile<ELF32LE>>(Bin))
	return ELFT_ELF32LE;			return ELFT_ELF32LE;
	if (isa<ELFObjectFile<ELF64LE>>(Bin))			if (isa<ELFObjectFile<ELF64LE>>(Bin))
	return ELFT_ELF64LE;			return ELFT_ELF64LE;
	if (isa<ELFObjectFile<ELF32BE>>(Bin))			if (isa<ELFObjectFile<ELF32BE>>(Bin))
	return ELFT_ELF32BE;			return ELFT_ELF32BE;
	if (isa<ELFObjectFile<ELF64BE>>(Bin))			if (isa<ELFObjectFile<ELF64BE>>(Bin))
	▲ Show 20 Lines • Show All 467 Lines • Show Last 20 Lines

tools/llvm-objcopy/llvm-objcopy.cpp

Show First 20 Lines • Show All 282 Lines • ▼ Show 20 Lines	static bool OnlyKeepDWOPred(const Object &Obj, const SectionBase &Sec) {
// We can't remove the section header string table.		// We can't remove the section header string table.
if (&Sec == Obj.SectionNames)		if (&Sec == Obj.SectionNames)
return false;		return false;
// Short of keeping the string table we want to keep everything that is a DWO		// Short of keeping the string table we want to keep everything that is a DWO
// section and remove everything else.		// section and remove everything else.
return !IsDWOSection(Sec);		return !IsDWOSection(Sec);
}		}

		static Optional<ElfType> GetOutputElfType(const CopyConfig &Config,
		alexander-shaposhnikovUnsubmitted Not Done Reply Inline Actions it looks like now or even in the future None here is not "recoverable" anyway (down the line we exit with an error), maybe it would be better to return the enum directly instead ? (and if the output format is "binary" it will error("Unsupported output format ...")). alexander-shaposhnikov: it looks like now or even in the future None here is not "recoverable" anyway (down the line…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions The unrecoverable error that's thrown when this optional doesn't have a value comes immediately after a check that Config.OutputFormat == "binary". So returning None is valid in that case. rupprecht: The unrecoverable error that's thrown when this optional doesn't have a value comes immediately…
		alexander-shaposhnikovUnsubmitted Not Done Reply Inline Actions yes, that's exactly what i was saying - but the point was different - wouldn't returning enum directly make the code a bit simpler ? alexander-shaposhnikov: yes, that's exactly what i was saying - but the point was different - wouldn't returning enum…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions I think it would be OK to return an error/return Expected<ElfType>/etc. if this were called from a scope where we were guaranteed that the output is an elf binary. But, it's not. I think if the output format is "binary", the logical thing for an API to do when asked "What's your elf type?" in a generic context would be return <None>, because the output isn't an ELF binary at all, right? i.e., the relevant code distilled is: static void ExecuteElfObjcopyOnBinary() { ... Optional<ElfType> OutputElfType = GetOutputElfType(); std::unique_ptr<Writer> Writer = CreateWriter(..., Config, OutputElfType); ... } static std::unique_ptr<Writer>CreateWriter(..., Config, Optional<OutputElfType>) { if (Config.OutputFormat == "binary") // Return binary writer, and OutputElfType is never even looked at else // ... Use OutputElfType to make an ELFWriter ... } Having GetOutputElfType() return an error if the output is binary will break that case. Having it return Expected<ElfType> would work, although it would be weird -- you'd have to only check it in CreateWriter only after you've checked that the output format isn't binary, so I think Optional<ElfType> is cleaner. Given the state the code is in today, it's always possible to derive an ELF type, but only because --input-target is currently ignored, and the inputs are implicitly assumed to be ELF objects from which we can derive a type. I think the code is simpler when optimized for reading, and returning an Optional<ElfType> makes it clear that the output might not be ELF. Sorry for the long reply -- I'm OK with making some changes here, but I'm not sure which direction you want this to go in. rupprecht: I think it would be OK to return an error/return Expected<ElfType>/etc. if this were called…
		jhendersonUnsubmitted Not Done Reply Inline Actions If it's unrecoverable, should we actually return an Expected? jhenderson: If it's unrecoverable, should we actually return an Expected?
		alexander-shaposhnikovUnsubmitted Not Done Reply Inline Actions though this is probably not a big deal, i think Optional is not the right choice here because of the reasons mentioned above. I think it would be better to either return enum directly or make use of Expected<...>. alexander-shaposhnikov: though this is probably not a big deal, i think Optional is not the right choice here because…
		jhendersonUnsubmitted Not Done Reply Inline Actions If I see a function called `GetOutputElfType` I expect it to return a type of ELF. If the function returns anything else (apart from an error indicator of some kind), then it's no longer an "ElfType", so the function is badly named. If we only expect this to be used when we want an ELF, then the return should be Expected, because anything else would be unexpected, and therefore an error. On the other hand, if this function is intended to be more general, then it should be renamed to simply `GetOutputType`. I'd still say returning None is wrong, because that would imply that we don't want an output at all. Rather, I'd say an explicit enum is the right response (i.e. return something like ELF64LE/ELF32BE/Binary etc). Related note: None is not an indicator of error, it is an indicator of No Result being a reasonable response. If this function remains as GetOutputElfType and it is impossible to determine what kind of ELF it was, then a None result might be acceptable. jhenderson: If I see a function called `GetOutputElfType` I expect it to return a type of ELF. If the…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions OK, I see some of the points here, I think it's just incompatible with the way the code is currently written. Agreed that GetOutputElfType is bad name now -- a more correct (but gratuitously verbose) name for this implementation would be "GetOutputElfTypeIfOutputIsElf". In order to be guaranteed that this is only called when we know the output is ELF, we'd either have to duplicate the "if output == binary" check that's in CreateWriter (which feels a little wrong, although maybe it's the least worst option), or we'd have to call this from within CreateWriter after that check (which goes against a previous code review comment that Reader, which this method looks at, should not be passed into CreateWriter). I'll go back to the drawing board to see if there's a refactoring of these methods that makes this all less awkward. rupprecht: OK, I see some of the points here, I think it's just incompatible with the way the code is…
		const Reader &Reader) {
		if (Config.OutputFormat == "binary")
		return None;
		// We must be an ELFReader -- it's the only implementation currently used.
		// TODO: support BinaryReader. In that case, read from
		// Config.BinaryArchitecture config to determine output type.
		const auto *ER = dyn_cast<ELFReader>(&Reader);
		return ER->getElfType();
		}

static std::unique_ptr<Writer> CreateWriter(const CopyConfig &Config,		static std::unique_ptr<Writer> CreateWriter(const CopyConfig &Config,
Object &Obj, Buffer &Buf,		Object &Obj, Buffer &Buf,
ElfType OutputElfType) {		Optional<ElfType> OutputElfType) {
		jhendersonUnsubmitted Not Done Reply Inline Actions I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago, but I don't really remember what we agreed etc. My current instinct is that a Writer (and the functions used to create it) shouldn't need to know about anything to do with the Reader. This change couples the two together, which I don't really like. I'd rather it be decided outside the CreateWriter function and passed in (as was the case before). If the problem is the assumption of needing an "ElfType", maybe we should generalise that enum to not be specific to do with ELFs? jhenderson: I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago, but I don't really remember what we agreed etc. I've read through the comments at D41687 (Jake pointed me to that one as a starter for this change), but let me know if there's some other discussion thread you had that I should read first. My current instinct is that a Writer (and the functions used to create it) shouldn't need to know about anything to do with the Reader. This change couples the two together, which I don't really like. I'd rather it be decided outside the CreateWriter function and passed in (as was the case before). That's true for CreateWriter itself, but currently ExecuteElfObjcopyOnBinary (which is constructs a writer via CreateWriter) has to look at the ELFReader it has created so it knows what elf type to pass. Also -- this problem could be pushed back up another layer, i.e. Writer could be passed into ExecuteElfObjcopyOnBinary instead of ExecuteElfObjcopyOnBinary calling CreateWriter itself... although there is currently another chain of ExecuteElfObjcopyOnBinary -> HandleArgs -> SplitDWOToFile -> CreateWriter that would also need to be plumbed. If the problem is the assumption of needing an "ElfType", maybe we should generalise that enum to not be specific to do with ELFs? Reading from a binary input should still have an elf type, it will just come from the -B command line flag, instead of being inferred from the object file magic header. However -- refactoring to your suggestion so that CreateWriter doesn't need to know about Reader, I've chosen to change it to Optional<ElfType>. If the input and output are both "binary" (I'm not sure if that's a valid use of objcopy, but it's at least allowed by command line flags), then there is no elf type. rupprecht: > I feel like @jakehehrlich, @alexshap and I had a discussion about this a number of months ago…
		jhendersonUnsubmitted Not Done Reply Inline Actions I've read through the comments at D41687 (Jake pointed me to that one as a starter for this change), but let me know if there's some other discussion thread you had that I should read first. It was a fairly length email chain we had between ourselves off-list. Assuming @jakehehrlich and @alexshap are happy, I see no reason not to forward it to you. jhenderson: > I've read through the comments at D41687 (Jake pointed me to that one as a starter for this…
		jakehehrlichUnsubmitted Not Done Reply Inline Actions Forwarding it is cool with me The discussion was about a more major refactor. It was a big sticking point of mine that reading and writing not be intertwined but this doesn't rise to that level. The conclusion was a whole new Reader and Writer system where we would make something like a small copy of ELFTypes and then we would copy our non-tempted generic versions of ELFTypes to the libObject ELFTypes but only in the reader and writer. The sections themselves would then take a Writer in their write method and...uh I can't actually remember how the Reader worked but it basically translated libObject's ELFTypes into an internal version of those things (that could later be written using Writer). There was also agreement on using a kind of internal dependency manager to resolve initialization order issues. jakehehrlich: Forwarding it is cool with me The discussion was about a more major refactor. It was a big…
		jakehehrlichUnsubmitted Not Done Reply Inline Actions My feeling is that the reader and writer should be wholly independent but that you might use information from the reader (or other sources) in order to select which writer you use by default. For instance it should be the case that the same output writer type is chosen for the same input reader by default. If `-I binary` is used however that goes out the window and we force the user to specify. Indeed if the user specifies anything we should listen to the user. For instance I thought of a use case for converting from ELF64 to ELF32 (which is something I previously said would never happen). The multi boot specification specifies how to load ELF32 but not ELF64 but all existing tools only support producing ELF64 binaries for 64-bit architectures. Generally this is solved by creating a small 32-bit binary that contains the 64-bit binary as a blob (a la `-O binary` in most cases) but I think there's no real reason you can't just convert a 64-bit binary (containing only data and program headers and some meaningless sections like .text) and directly place it in a 32-bit binary. This would work by having the user specify `-O elf32-x86_64` or something like that. I like passing `Optional<ElfType>` as a means of accomplishing this. jakehehrlich: My feeling is that the reader and writer should be wholly independent but that you might use…
if (Config.OutputFormat == "binary") {		if (Config.OutputFormat == "binary") {
return llvm::make_unique<BinaryWriter>(Obj, Buf);		return llvm::make_unique<BinaryWriter>(Obj, Buf);
}		}
// Depending on the initial ELFT and OutputFormat we need a different Writer.		// Depending on the initial ELFT and OutputFormat we need a different Writer.
switch (OutputElfType) {		if (!OutputElfType)
		llvm_unreachable("Unknown output ELF type");
		switch (OutputElfType.getValue()) {
case ELFT_ELF32LE:		case ELFT_ELF32LE:
return llvm::make_unique<ELFWriter<ELF32LE>>(Obj, Buf,		return llvm::make_unique<ELFWriter<ELF32LE>>(Obj, Buf,
!Config.StripSections);		!Config.StripSections);
case ELFT_ELF64LE:		case ELFT_ELF64LE:
return llvm::make_unique<ELFWriter<ELF64LE>>(Obj, Buf,		return llvm::make_unique<ELFWriter<ELF64LE>>(Obj, Buf,
!Config.StripSections);		!Config.StripSections);
case ELFT_ELF32BE:		case ELFT_ELF32BE:
return llvm::make_unique<ELFWriter<ELF32BE>>(Obj, Buf,		return llvm::make_unique<ELFWriter<ELF32BE>>(Obj, Buf,
!Config.StripSections);		!Config.StripSections);
case ELFT_ELF64BE:		case ELFT_ELF64BE:
return llvm::make_unique<ELFWriter<ELF64BE>>(Obj, Buf,		return llvm::make_unique<ELFWriter<ELF64BE>>(Obj, Buf,
!Config.StripSections);		!Config.StripSections);
}		}
llvm_unreachable("Invalid output format");		llvm_unreachable("Invalid output format");
}		}

static void SplitDWOToFile(const CopyConfig &Config, const Reader &Reader,		static void SplitDWOToFile(const CopyConfig &Config, const Reader &Reader,
StringRef File, ElfType OutputElfType) {		StringRef File) {
auto DWOFile = Reader.create();		auto DWOFile = Reader.create();
DWOFile->removeSections(		DWOFile->removeSections(
[&](const SectionBase &Sec) { return OnlyKeepDWOPred(*DWOFile, Sec); });		[&](const SectionBase &Sec) { return OnlyKeepDWOPred(*DWOFile, Sec); });
FileBuffer FB(File);		FileBuffer FB(File);
		Optional<ElfType> OutputElfType = GetOutputElfType(Config, Reader);
auto Writer = CreateWriter(Config, *DWOFile, FB, OutputElfType);		auto Writer = CreateWriter(Config, *DWOFile, FB, OutputElfType);
Writer->finalize();		Writer->finalize();
Writer->write();		Writer->write();
}		}

// This function handles the high level operations of GNU objcopy including		// This function handles the high level operations of GNU objcopy including
// handling command line options. It's important to outline certain properties		// handling command line options. It's important to outline certain properties
// we expect to hold of the command line operations. Any operation that "keeps"		// we expect to hold of the command line operations. Any operation that "keeps"
// should keep regardless of a remove. Additionally any removal should respect		// should keep regardless of a remove. Additionally any removal should respect
// any previous removals. Lastly whether or not something is removed shouldn't		// any previous removals. Lastly whether or not something is removed shouldn't
// depend a) on the order the options occur in or b) on some opaque priority		// depend a) on the order the options occur in or b) on some opaque priority
// system. The only priority is that keeps/copies overrule removes.		// system. The only priority is that keeps/copies overrule removes.
static void HandleArgs(const CopyConfig &Config, Object &Obj,		static void HandleArgs(const CopyConfig &Config, Object &Obj,
const Reader &Reader, ElfType OutputElfType) {		const Reader &Reader) {

if (!Config.SplitDWO.empty()) {		if (!Config.SplitDWO.empty()) {
SplitDWOToFile(Config, Reader, Config.SplitDWO, OutputElfType);		SplitDWOToFile(Config, Reader, Config.SplitDWO);
}		}

// TODO: update or remove symbols only if there is an option that affects		// TODO: update or remove symbols only if there is an option that affects
// them.		// them.
if (Obj.SymbolTable) {		if (Obj.SymbolTable) {
Obj.SymbolTable->updateSymbols([&](Symbol &Sym) {		Obj.SymbolTable->updateSymbols([&](Symbol &Sym) {
if ((Config.LocalizeHidden &&		if ((Config.LocalizeHidden &&
(Sym.Visibility == STV_HIDDEN \|\| Sym.Visibility == STV_INTERNAL)) \|\|		(Sym.Visibility == STV_HIDDEN \|\| Sym.Visibility == STV_INTERNAL)) \|\|
▲ Show 20 Lines • Show All 208 Lines • ▼ Show 20 Lines	for (const auto &Flag : Config.AddSection) {
ArrayRef<uint8_t>(BufPtr, BufSize));		ArrayRef<uint8_t>(BufPtr, BufSize));
}		}
}		}

if (!Config.AddGnuDebugLink.empty())		if (!Config.AddGnuDebugLink.empty())
Obj.addSection<GnuDebugLinkSection>(Config.AddGnuDebugLink);		Obj.addSection<GnuDebugLinkSection>(Config.AddGnuDebugLink);
}		}

static void ExecuteElfObjcopyOnBinary(const CopyConfig &Config, Binary &Binary,		static void ExecuteElfObjcopyOnBinary(const CopyConfig &Config, Reader &Reader,
		jakehehrlichUnsubmitted Not Done Reply Inline Actions This change (which is not necessarily wrong) conflicts with one of the two proposals I made for how --dump-section should work in another change. After a bit of thought I think "ExecuteElfObjcopyOnBlob" might be needed or something. The --dump-section change hasn't really shaken out just yet so we'll see. jakehehrlich: This change (which is not necessarily wrong) conflicts with one of the two proposals I made for…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions I renamed this to ExecuteElfObjcopyOnBlob since it's no longer always a binary -- is that what you meant? Do you have a link to your proposals on --dump-section? rupprecht: I renamed this to ExecuteElfObjcopyOnBlob since it's no longer always a binary -- is that what…
		jakehehrlichUnsubmitted Not Done Reply Inline Actions No I meant for ExecuteElfObjcopyOnBinary to remain unchanged and to not take a Reader at all but instead for a separate almost identical function ExecuteElfObjcopyOnBlob to exist so that current invocations to ExecuteElfObjcopyOnBinary would have to decide which one to call. The issue is that sometimes there are options (like --dump-section) which only really make sense on the original data. Depending on how that all shakes out we might want to handle that option differently depending on if -I binary or -I elf* was used. One of the options (and the option I'm currently in favor of) for that change would allow us to treat this all uniformly however. jakehehrlich: No I meant for ExecuteElfObjcopyOnBinary to remain unchanged and to not take a Reader at all…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions Sorry, I'm still not following. Do you mean this method should be duplicated, once for ELF binary, once for non-ELF binaries? // Original method, always handles ELF binary input static void ExecuteElfObjcopyOnBinary(const CopyConfig &Config, FileBuffer &Binary, Buffer &Out) { ELFReader Reader(&Binary); std::unique_ptr<Object> Obj = Reader.create(); HandleArgs(Config, Obj, Reader, Reader.getElfType()); // ... maybe some special logic only for ELF binaries ... std::unique_ptr<Writer> Writer = CreateWriter(Config, Obj, Out, Reader.getElfType()); Writer->finalize(); Writer->write(); } // Called for -I Binary static void ExecuteElfObjcopyOnBlob(const CopyConfig &Config, MemoryBuffer &Blob, Buffer &Out) { BinaryReader Reader(&Blob); std::unique_ptr<Object> Obj = Reader.create(); ElfType OutputElfType = <... get ELFType from -B flag ...> HandleArgs(Config, Obj, Reader, ET, OutputElfType); // ... maybe some special logic only for non-ELF binaries ... std::unique_ptr<Writer> Writer = CreateWriter(Config, Obj, Out, OutputElfType); Writer->finalize(); Writer->write(); } rupprecht: Sorry, I'm still not following. Do you mean this method should be duplicated, once for ELF…
		jakehehrlichUnsubmitted Not Done Reply Inline Actions Yeah that's what I had in mind but don't do that just yet. Also that might all get templated out in some manner as well. I was just pointing out a potential future issue. jakehehrlich: Yeah that's what I had in mind but don't do that just yet. Also that might all get templated…
Buffer &Out) {		Buffer &Out) {
ELFReader Reader(&Binary);
std::unique_ptr<Object> Obj = Reader.create();		std::unique_ptr<Object> Obj = Reader.create();

HandleArgs(Config, *Obj, Reader, Reader.getElfType());		HandleArgs(Config, *Obj, Reader);

		Optional<ElfType> OutputElfType = GetOutputElfType(Config, Reader);
std::unique_ptr<Writer> Writer =		std::unique_ptr<Writer> Writer =
CreateWriter(Config, *Obj, Out, Reader.getElfType());		CreateWriter(Config, *Obj, Out, OutputElfType);
Writer->finalize();		Writer->finalize();
Writer->write();		Writer->write();
}		}

// For regular archives this function simply calls llvm::writeArchive,		// For regular archives this function simply calls llvm::writeArchive,
// For thin archives it writes the archive file itself as well as its members.		// For thin archives it writes the archive file itself as well as its members.
static Error deepWriteArchive(StringRef ArcName,		static Error deepWriteArchive(StringRef ArcName,
ArrayRef<NewArchiveMember> NewMembers,		ArrayRef<NewArchiveMember> NewMembers,
Show All 30 Lines	for (const Archive::Child &Child : Ar.children(Err)) {
Expected<std::unique_ptr<Binary>> ChildOrErr = Child.getAsBinary();		Expected<std::unique_ptr<Binary>> ChildOrErr = Child.getAsBinary();
if (!ChildOrErr)		if (!ChildOrErr)
reportError(Ar.getFileName(), ChildOrErr.takeError());		reportError(Ar.getFileName(), ChildOrErr.takeError());
Expected<StringRef> ChildNameOrErr = Child.getName();		Expected<StringRef> ChildNameOrErr = Child.getName();
if (!ChildNameOrErr)		if (!ChildNameOrErr)
reportError(Ar.getFileName(), ChildNameOrErr.takeError());		reportError(Ar.getFileName(), ChildNameOrErr.takeError());

MemBuffer MB(ChildNameOrErr.get());		MemBuffer MB(ChildNameOrErr.get());
ExecuteElfObjcopyOnBinary(Config, **ChildOrErr, MB);		ELFReader Reader(ChildOrErr->get());
		ExecuteElfObjcopyOnBinary(Config, Reader, MB);
		jhendersonUnsubmitted Not Done Reply Inline Actions I may have missed this somewhere in the discussion, but why are we moving the Reader creation out of the ExecuteElfObjcopyOnBinary function? jhenderson: I may have missed this somewhere in the discussion, but why are we moving the Reader creation…
		rupprechtAuthorUnsubmitted Not Done Reply Inline Actions The goal I'm trying to reach is that ExecuteElfObjcopyOnBinary should work regardless of what the type of input is, which means it can't take either Binary or ELFReader as input. Thus, callers construct it. rupprecht: The goal I'm trying to reach is that ExecuteElfObjcopyOnBinary should work regardless of what…

Expected<NewArchiveMember> Member =		Expected<NewArchiveMember> Member =
NewArchiveMember::getOldMember(Child, true);		NewArchiveMember::getOldMember(Child, true);
if (!Member)		if (!Member)
reportError(Ar.getFileName(), Member.takeError());		reportError(Ar.getFileName(), Member.takeError());
Member->Buf = MB.releaseMemoryBuffer();		Member->Buf = MB.releaseMemoryBuffer();
Member->MemberName = Member->Buf->getBufferIdentifier();		Member->MemberName = Member->Buf->getBufferIdentifier();
NewArchiveMembers.push_back(std::move(*Member));		NewArchiveMembers.push_back(std::move(*Member));
Show All 12 Lines	Expected<OwningBinary<llvm::object::Binary>> BinaryOrErr =
createBinary(Config.InputFilename);		createBinary(Config.InputFilename);
if (!BinaryOrErr)		if (!BinaryOrErr)
reportError(Config.InputFilename, BinaryOrErr.takeError());		reportError(Config.InputFilename, BinaryOrErr.takeError());

if (Archive *Ar = dyn_cast<Archive>(BinaryOrErr.get().getBinary()))		if (Archive *Ar = dyn_cast<Archive>(BinaryOrErr.get().getBinary()))
return ExecuteElfObjcopyOnArchive(Config, *Ar);		return ExecuteElfObjcopyOnArchive(Config, *Ar);

FileBuffer FB(Config.OutputFilename);		FileBuffer FB(Config.OutputFilename);
ExecuteElfObjcopyOnBinary(Config, *BinaryOrErr.get().getBinary(), FB);		ELFReader Reader(BinaryOrErr.get().getBinary());
		ExecuteElfObjcopyOnBinary(Config, Reader, FB);
}		}

// ParseObjcopyOptions returns the config and sets the input arguments. If a		// ParseObjcopyOptions returns the config and sets the input arguments. If a
// help flag is set then ParseObjcopyOptions will print the help messege and		// help flag is set then ParseObjcopyOptions will print the help messege and
// exit.		// exit.
static CopyConfig ParseObjcopyOptions(ArrayRef<const char *> ArgsArr) {		static CopyConfig ParseObjcopyOptions(ArrayRef<const char *> ArgsArr) {
ObjcopyOptTable T;		ObjcopyOptTable T;
unsigned MissingArgumentIndex, MissingArgumentCount;		unsigned MissingArgumentIndex, MissingArgumentCount;
▲ Show 20 Lines • Show All 150 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] NFC: Refactor main objcopy method that takes an ELFReader to a generic Reader.AbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 158823

tools/llvm-objcopy/Object.h

tools/llvm-objcopy/Object.cpp

tools/llvm-objcopy/llvm-objcopy.cpp

[llvm-objcopy] NFC: Refactor main objcopy method that takes an ELFReader to a generic Reader.
AbandonedPublic