This is an archive of the discontinued LLVM Phabricator instance.


Differential D57009

[llvm-objcopy] [COFF] Fix handling of aux symbols for big objects
Closed, Public

Authored by mstorsjo on Jan 21 2019, 2:18 AM.


Details

Reviewers

jhenderson
alexander-shaposhnikov
jakehehrlich
rupprecht
rnk
espindola
serge-sans-paille

Commits

rG1be91958b34c: [llvm-objcopy] [COFF] Fix handling of aux symbols for big objects
rL351947: [llvm-objcopy] [COFF] Fix handling of aux symbols for big objects

Summary

@rnk - I want your input on this one.

Other llvm-objcopy reviewers: I'd like to add a custom hidden option for testing, for triggering using the big object format. Without that, a test would have to create over 32k sections to trigger that.

Currently, the aux symbols are stored in an opaque std::vector<uint8_t>, with contents interpreted according to the rest of the symbol. This allows passing through all the aux symbols we don't need to touch or care about.

If the input was a bigobj but the output isn't, or vice versa, this makes the aux data desync the whole symbol table.

All aux symbol types that use a struct fit in 18 bytes (sizeof(coff_symbol16)), and if written to a bigobj, two extra padding bytes are written after each (as sizeof(coff_symbol32) is 20).

This patch implements the following fix: In the llvm-objcopy storage agnostic intermediate representation, store the aux symbols as a series of coff_symbol16 sized opaque blobs within the same std::vector<uint8_t>. (In practice, all such struct based aux symbols only consist of one aux symbol, so this is more flexible than what reality needs.)
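As an illustration only (this is not code from the patch), the normalization described above can be sketched in Python. The names `normalize_aux`, `COFF_SYMBOL16_SIZE`, and `COFF_SYMBOL32_SIZE` are hypothetical; the sizes 18 and 20 are the ones quoted in this summary.

```python
from typing import List

COFF_SYMBOL16_SIZE = 18  # sizeof(coff_symbol16), per the summary
COFF_SYMBOL32_SIZE = 20  # sizeof(coff_symbol32), per the summary


def normalize_aux(raw: bytes, is_bigobj: bool) -> List[bytes]:
    """Split raw aux data into 18-byte blobs, dropping bigobj padding.

    Hypothetical helper: the input stride is 18 or 20 bytes depending on
    the input format, but only 18 bytes of payload are kept per record.
    """
    stride = COFF_SYMBOL32_SIZE if is_bigobj else COFF_SYMBOL16_SIZE
    if len(raw) % stride != 0:
        raise ValueError("aux data is not a whole number of symbol slots")
    return [raw[off:off + COFF_SYMBOL16_SIZE]
            for off in range(0, len(raw), stride)]
```

Under these assumptions the stored blobs are identical whether the input was a regular object or a bigobj, which is what keeps the intermediate representation format agnostic.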

The special case is the file aux symbols, which are written in potentially more than one aux symbol slot, without any padding, as one single long string. This can't be stored in the same opaque vector of fixed sized aux symbol entries. The file aux symbols will occupy a different number of aux symbol slots depending on the type of output object file. As nothing in the intermediate process needs to have accurate raw symbol indices, updating them is moved into the writer class.
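A rough sketch of why the slot count depends on the output format, assuming the file name is packed across slots and the count is rounded up to whole slots. `file_aux_slots` is a hypothetical helper, not an LLVM function.

```python
def file_aux_slots(name: bytes, sym_size: int) -> int:
    """Number of aux symbol slots needed to hold `name` as one packed string.

    `sym_size` is the symbol table entry size of the output format:
    18 for a regular object, 20 for a bigobj (values from the summary).
    """
    return (len(name) + sym_size - 1) // sym_size


# The 19-character name used by the test in this review.
name = b"abcdefghijklmnopqrs"
slots_small = file_aux_slots(name, 18)
slots_big = file_aux_slots(name, 20)
```

With that 19-byte name the sketch yields two slots for a regular object and one for a bigobj, which is why the raw indices of all later symbols can only be fixed once the output format is known.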

Instead of updating the symbol raw indices at the end when the final format is known, one could alternatively choose to waste a bit more space and always allocate indices based on a normal object file. For a bigobj, we could potentially end up with a whole aux entry slot of padding for the filename. As this is rather uncommon (in practice max one per file), the total wasted space would be 20 bytes per file, unless really long file names are stored.

An alternative to the opaque AuxData vector would be to add a set of typed optional fields such as Optional<coff_aux_section_definition> and Optional<coff_aux_weak_external>. The upside is that this makes the intermediate format much clearer and neater, but the downside is that we need to explicitly know and care about all sorts of aux symbols (5 types, plus the file names) that we'd otherwise just pass through without touching or even knowing the specifics of.

Event Timeline

mstorsjo created this revision. Jan 21 2019, 2:18 AM

Other llvm-objcopy reviewers: I'd like to add a custom hidden option for testing, for triggering using the big object format. Without that, a test would have to create over 32k sections to trigger that.

I'm not a fan of adding hidden options purely for testing when there are alternatives. In the ELF tests, we use a pre-built zip file containing an object with many sections (see ELF/many-sections.test). I think you could probably do the same thing here.

Regarding the code, I honestly don't really understand it, so I don't feel like I'm qualified to review this COFF-ism.

tools/llvm-objcopy/COFF/Reader.cpp
113	You can get rid of the braces here.
tools/llvm-objcopy/COFF/Writer.cpp
322–333	Strictly speaking, you don't need the braces in this if/else.
328–330	Are coff_symbol16 and SymbolTy supposed to be the same size? Or are you deliberately writing less (or more) into the buffer than you iterate over?

In D57009#1366076, @jhenderson wrote:

Other llvm-objcopy reviewers: I'd like to add a custom hidden option for testing, for triggering using the big object format. Without that, a test would have to create over 32k sections to trigger that.

I'm not a fan of adding hidden options purely for testing when there are alternatives. In the ELF tests, we use a pre-built zip file containing an object with many sections (see ELF/many-sections.test). I think you could probably do the same thing here.

Hmm, ok. Now with --add-gnu-debuglink it's possible to add a section, so with that I guess it should be possible to achieve a test which removes a section to make the big object small, and then add another one to make it big again. Less elegant than a small and neat yaml test input IMO, but probably tolerable.
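The remove-then-add idea can be pictured with a small Python sketch. It assumes the writer picks the big object format purely from the section count, and that the limit is 65279 sections (believed to match `MaxNumberOfSections16` in LLVM's COFF headers, and consistent with the section numbers in this review's test). Both the helper name `needs_bigobj` and that threshold are assumptions here; the threshold differs from the rough "over 32k" figure quoted above.

```python
MAX_SECTIONS_16 = 65279  # assumed limit for a regular (non-big) COFF object


def needs_bigobj(num_sections: int) -> bool:
    """Return True if the section count no longer fits a regular object."""
    return num_sections > MAX_SECTIONS_16


# Start big, remove one section (small), add .gnu_debuglink (big again).
counts = [65280, 65279, 65280]
formats = ["big" if needs_bigobj(n) else "small" for n in counts]
```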

tools/llvm-objcopy/COFF/Writer.cpp
328–330	I'm deliberately writing more or less, yes. The COFF-ism is that the symbol table can consist of entries of either coff_symbol16 or coff_symbol32, of 18 or 20 bytes each. A symbol can be followed by a number of aux symbols, which can be one of a number of different structs, all 18 bytes each. If the table consists of coff_symbol32 entries, each one of the aux symbols (opaque aux structs) will have 2 bytes of padding at the end. So here I'm writing chunks of 18 bytes at a time out of the stored AuxData (where they are packed tightly), spaced 18 or 20 bytes apart in the output symbol table (depending on the entry size of that symbol table).
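The writer side of that explanation can be sketched as follows. This is a hedged Python illustration rather than the patch's C++; `emit_aux` is a hypothetical name, and the 18 and 20 byte sizes are taken from the comment above.

```python
from typing import List

AUX_PAYLOAD_SIZE = 18  # every struct-based aux record carries 18 bytes


def emit_aux(blobs: List[bytes], sym_size: int) -> bytes:
    """Write tightly packed 18-byte aux blobs at a stride of `sym_size`.

    `sym_size` is 18 for a regular symbol table and 20 for a bigobj one;
    the difference is emitted as zero padding after each record.
    """
    if sym_size < AUX_PAYLOAD_SIZE:
        raise ValueError("symbol size smaller than aux payload")
    out = bytearray()
    for blob in blobs:
        if len(blob) != AUX_PAYLOAD_SIZE:
            raise ValueError("aux blob must be exactly 18 bytes")
        out += blob
        out += bytes(sym_size - AUX_PAYLOAD_SIZE)  # 0 or 2 padding bytes
    return bytes(out)
```

The loop deliberately copies fewer bytes than the stride it advances by when the output is a bigobj, which is the asymmetry the inline question was about.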

rnk added inline comments. Jan 22 2019, 9:56 AM

tools/llvm-objcopy/COFF/Object.h
85–86	Well, they aren't always coff_symbol16 sized are they? For an input bigobj, it'll be coff_symbol32, or we should make this a vector of coff_symbol16 directly. I don't know much about objcopy, but I think it might be more in the spirit of it to widen into coff_symbol32 as is done for the main symbol field above, instead of keeping this as an opaque binary blob.
tools/llvm-objcopy/COFF/Reader.cpp
116	I think this can just be `.rtrim('\0')`, there is a single character overload of rtrim.
121	Should this second `sizeof(coff_symbol16)` be SymSize? Maybe an easier way to express it would be:
	ArrayRef<uint8_t> Chunk = AuxData.take_front(SymSize);
	Sym.AuxData.insert(Sym.AuxData.end(), Chunk.begin(), Chunk.end());
	AuxData = AuxData.drop_front(SymSize);
It mutates a local variable, but takes less math.

mstorsjo marked 2 inline comments as done. Jan 22 2019, 10:43 AM

mstorsjo added inline comments.

tools/llvm-objcopy/COFF/Object.h
85–86	Even for bigobj inputs, the aux symbols (except for .file) only have coff_symbol16 worth of payload. There's no wide version of any of the `coff_aux_..` structs, so they can't be widened into the intermediate storage. Making it a vector of coff_symbol16 would make things clearer, but as the data actually isn't that struct, maybe `struct { uint8_t Opaque[sizeof(coff_symbol16)]; }` would be more correct? Or alternatively `Optional<coff_aux_*>` for each of the known types - but I prefer being able to passthrough unknown data untouched.
tools/llvm-objcopy/COFF/Reader.cpp
121	No, this is intentionally (within the current patch design) copying 18 bytes from a source which has got either 18 or 20 bytes stride.

Switched to a vector of AuxSymbol, an opaque struct of coff_symbol16 size, which makes the code slightly clearer. I haven't yet changed the test to use a large object file that actually triggers generating a big object.

rnk added inline comments. Jan 22 2019, 4:36 PM

tools/llvm-objcopy/COFF/Object.h
85–86	I see.
tools/llvm-objcopy/COFF/Reader.cpp
121	Of course now I read you already clarified this. I think there should be a comment about how this is normalizing from coff_symbol32-sized entries to AuxSymbol sized entries, and discarding the padding bytes that are present in a bigobj.

lgtm. I like the new code, so feel free to commit after adding a comment about the thing both reviewers were confused by. :)

This revision is now accepted and ready to land. Jan 22 2019, 4:47 PM

In D57009#1366076, @jhenderson wrote:

Other llvm-objcopy reviewers: I'd like to add a custom hidden option for testing, for triggering using the big object format. Without that, a test would have to create over 32k sections to trigger that.

I'm not a fan of adding hidden options purely for testing when there are alternatives. In the ELF tests, we use a pre-built zip file containing an object with many sections (see ELF/many-sections.test). I think you could probably do the same thing here.

I'm experimenting with crafting such an input file now. The uncompressed object file weighs in at around 5 MB, and after gzip (as is used for that ELF test) it currently ends up at around 725 KB. Do you think that's acceptable or too large?

In D57009#1367437, @mstorsjo wrote:

I'm experimenting with crafting such an input file now. The uncompressed object file weighs in at around 5 MB, and after gzip (as is used for that ELF test) it currently ends up at around 725 KB. Do you think that's acceptable or too large?

It's not ideal, if I'm honest, but it might be a quirk of the file format, and therefore unavoidable. The equivalent file for ELF is only 152 KB. I don't know if it was somehow more aggressively compressed though. If you can't get it any smaller, I think it's probably acceptable.

By the way, there's a gunzip.py script in the ELF/Inputs directory, which you should probably move to a shared area and use for decompressing.

In D57009#1367445, @jhenderson wrote:

In D57009#1367437, @mstorsjo wrote:

I'm experimenting with crafting such an input file now. The uncompressed object file weighs in at around 5 MB, and after gzip (as is used for that ELF test) it currently ends up at around 725 KB. Do you think that's acceptable or too large?

It's not ideal, if I'm honest, but it might be a quirk of the file format, and therefore unavoidable. The equivalent file for ELF is only 152 KB. I don't know if it was somehow more aggressively compressed though. If you can't get it any smaller, I think it's probably acceptable.

Ok then. At least it makes for a better testcase.

By the way, there's a gunzip.py script in the ELF/Inputs directory, which you should probably move to a shared area and use for decompressing.

Hmm, what place would that be, where it's findable by python within the lit tests? I could just move it up into test/tools/llvm-objcopy/Inputs and refer to it with %p/../Inputs/ungzip.py in the ELF/COFF subdirs - not ideal or elegant or anything, but at least shared between these two directories.

In D57009#1367446, @mstorsjo wrote:

By the way, there's a gunzip.py script in the ELF/Inputs directory, which you should probably move to a shared area and use for decompressing.

Hmm, what place would that be, where it's findable by python within the lit tests? I could just move it up into test/tools/llvm-objcopy/Inputs and refer to it with %p/../Inputs/ungzip.py in the ELF/COFF subdirs - not ideal or elegant or anything, but at least shared between these two directories.

That's where I'd put it. No point in duplicating it after all.

Removed the option for forcing emission of a big object, made a test that operates on a bundled large object file instead.

Herald added a reviewer: espindola. Jan 23 2019, 2:16 AM

Herald added a reviewer: serge-sans-paille.

Herald added subscribers: arichardson, emaste.

jhenderson added inline comments. Jan 23 2019, 3:03 AM

test/tools/llvm-objcopy/COFF/bigobj.test
5–8	I think it probably is easier to associate comments with the corresponding test case without the blank line between them, but I'm not too bothered, so if you prefer it this way, that's fine.
tools/llvm-objcopy/COFF/Reader.cpp
113	You can still get rid of these braces ;) I think a few code comments around here explaining what you are doing and why would make it much more understandable. Perhaps a brief explanation of the difference in the BigObj format?
tools/llvm-objcopy/COFF/Writer.cpp
148–149	This is another place requiring a code comment, I think, just explaining the "why".
322–323	Again, a few comments around here would be good.

In D57009#1367445, @jhenderson wrote:

It's not ideal, if I'm honest, but it might be a quirk of the file format, and therefore unavoidable. The equivalent file for ELF is only 152 KB. I don't know if it was somehow more aggressively compressed though. If you can't get it any smaller, I think it's probably acceptable.

I realized I could make the testcase less interesting and remove some aspects that aren't strictly needed for this test. That reduced the uncompressed object from 5 MB to 2.5, and the compressed one from 725 KB to 7 KB. That's probably small enough :-)

tools/llvm-objcopy/COFF/Reader.cpp
113	Oh, right, I forgot about the other comments when focusing on the test data.

Removed the extra braces and added more comments, adjusted the testcase for the smaller test data.

LGTM.

Closed by commit rL351947: [llvm-objcopy] [COFF] Fix handling of aux symbols for big objects (authored by mstorsjo). Jan 23 2019, 3:55 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

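Path	Size
test/tools/llvm-objcopy/COFF/Inputs/bigobj.o.gz	(binary)
test/tools/llvm-objcopy/COFF/bigobj.test	34 lines
test/tools/llvm-objcopy/ELF/Inputs/ungzip.py	(deleted)
test/tools/llvm-objcopy/ELF/auto-remove-shndx.test	2 lines
test/tools/llvm-objcopy/ELF/many-sections.test	2 lines
test/tools/llvm-objcopy/ELF/remove-shndx.test	2 lines
test/tools/llvm-objcopy/ELF/strict-no-add.test	2 lines
test/tools/llvm-objcopy/Inputs/ungzip.py	13 lines
tools/llvm-objcopy/COFF/COFFObjcopy.cpp	4 lines
tools/llvm-objcopy/COFF/Object.h	18 lines
tools/llvm-objcopy/COFF/Object.cpp	6 lines
tools/llvm-objcopy/COFF/Reader.cpp	14 lines
tools/llvm-objcopy/COFF/Writer.h	2 lines
tools/llvm-objcopy/COFF/Writer.cpp	41 lines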

Diff 183065

test/tools/llvm-objcopy/COFF/Inputs/bigobj.o.gz

This binary file was added.

test/tools/llvm-objcopy/COFF/bigobj.test

This file was added.

	RUN: %python %p/../Inputs/ungzip.py %p/Inputs/bigobj.o.gz > %t.in.o

	RUN: llvm-objdump -t %t.in.o | FileCheck %s --check-prefixes=SYMBOLS,SYMBOLS-BIG,SYMBOLS-ORIG

	# Remove a section, making the section count fit into a small object.

	RUN: llvm-objcopy -R '.text$4' %t.in.o %t.small.o
	RUN: llvm-objdump -t %t.small.o | FileCheck %s --check-prefixes=SYMBOLS,SYMBOLS-SMALL,SYMBOLS-REMOVED-SMALL

	# Add a .gnu_debuglink section, forcing the object back to big format.

	RUN: llvm-objcopy --add-gnu-debuglink=%t.in.o %t.small.o %t.big.o
	RUN: llvm-objdump -t %t.big.o | FileCheck %s --check-prefixes=SYMBOLS,SYMBOLS-BIG,SYMBOLS-REMOVED-BIG

	SYMBOLS: SYMBOL TABLE:
	SYMBOLS-NEXT: [ 0]{{.*}} (nx 1) {{.*}} .text
	SYMBOLS-NEXT: AUX scnlen
	SYMBOLS-SMALL-NEXT: [ 2]{{.*}} (nx 2) {{.*}} .file
	SYMBOLS-BIG-NEXT: [ 2]{{.*}} (nx 1) {{.*}} .file
	SYMBOLS-NEXT: AUX abcdefghijklmnopqrs
	SYMBOLS-SMALL-NEXT: [ 5]{{.*}} (nx 0) {{.*}} foo
	SYMBOLS-BIG-NEXT: [ 4]{{.*}} (nx 0) {{.*}} foo

	# Check that the section numbers outside of signed 16 bit int range
	# are represented properly.

	# After removing one section, the section numbers decrease. Since the .file
	# symbol expanded to having two aux symbol entries, the raw symbol index
	# stays the same for REMOVED-SMALL, but decreases by one when going back to
	# big format.

	SYMBOLS-ORIG: [65283](sec 65280){{.*}} symbol65280
	SYMBOLS-REMOVED-SMALL: [65283](sec 65279){{.*}} symbol65280
	SYMBOLS-REMOVED-BIG: [65282](sec 65279){{.*}} symbol65280
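The index shifts checked by this test can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: `assign_raw_indices` is a hypothetical helper, each symbol takes one slot plus its aux slots, and a .file name is packed into as many slots of the output symbol size (18 or 20 bytes) as it needs.

```python
from typing import List, Optional, Tuple

# (symbol name, number of struct-based aux records, packed file name or None)
SymbolDesc = Tuple[str, int, Optional[bytes]]


def assign_raw_indices(symbols: List[SymbolDesc], sym_size: int) -> List[int]:
    """Assign raw symbol table indices for a given output entry size."""
    indices = []
    next_index = 0
    for _name, num_aux, aux_file in symbols:
        indices.append(next_index)
        if aux_file is not None:
            # File names span a format-dependent number of slots.
            num_aux = (len(aux_file) + sym_size - 1) // sym_size
        next_index += 1 + num_aux
    return indices


# The first three symbols from the test above.
head = [(".text", 1, None), (".file", 0, b"abcdefghijklmnopqrs"), ("foo", 0, None)]
small_indices = assign_raw_indices(head, 18)
big_indices = assign_raw_indices(head, 20)
```

In the small layout `foo` lands one index later than in the big layout, which matches the `[ 5]` versus `[ 4]` checks and explains why every later symbol is offset by one between the two formats.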

test/tools/llvm-objcopy/ELF/Inputs/ungzip.py

This file was deleted.

	import gzip
	import sys

	with gzip.open(sys.argv[1], 'rb') as f:
	  writer = getattr(sys.stdout, 'buffer', None)
	  if writer is None:
	    writer = sys.stdout
	    if sys.platform == "win32":
	      import os, msvcrt
	      msvcrt.setmode(sys.stdout.fileno(),os.O_BINARY)

	  writer.write(f.read())
	  sys.stdout.flush()

test/tools/llvm-objcopy/ELF/auto-remove-shndx.test

-# RUN: %python %p/Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t
+# RUN: %python %p/../Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t
 # RUN: llvm-objcopy -R .text -R s0 -R s1 -R s2 -R s3 -R s4 -R s5 -R s6 %t %t2
 # RUN: llvm-readobj --sections %t2 | FileCheck --check-prefix=SECS %s

 # SECS-NOT: Name: .symtab_shndx

test/tools/llvm-objcopy/ELF/many-sections.test

-RUN: %python %p/Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t
+RUN: %python %p/../Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t
 RUN: llvm-objcopy %t %t2
 RUN: llvm-readobj --file-headers %t2 | FileCheck --check-prefix=EHDR %s
 RUN: llvm-readobj --sections %t2 | FileCheck --check-prefix=SECS %s
 RUN: llvm-readobj --symbols %t2 | grep "Symbol {" | wc -l | FileCheck --check-prefix=SYMS %s

 EHDR: Format: ELF64-x86-64
 EHDR-NEXT: Arch: x86_64
 EHDR-NEXT: AddressSize: 64bit
[... 44 lines not shown ...]

test/tools/llvm-objcopy/ELF/remove-shndx.test

 # This test checks to see that a .symtab_shndx section is added to any binary
 # that needs it, even if the original was removed.
-RUN: %python %p/Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t
+RUN: %python %p/../Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t
 RUN: llvm-objcopy -R .symtab_shndx %t %t2
 RUN: llvm-readobj --sections %t2 | FileCheck %s

 CHECK: Name: .symtab_shndx (

test/tools/llvm-objcopy/ELF/strict-no-add.test

 # This test makes sure that sections added at the end that don't have symbols
 # defined in them don't trigger the creation of a large index table.

-RUN: %python %p/Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t.0
+RUN: %python %p/../Inputs/ungzip.py %p/Inputs/many-sections.o.gz > %t.0
 RUN: cat %p/Inputs/alloc-symtab.o > %t
 RUN: llvm-objcopy -R .text -R s0 -R s1 -R s2 -R s3 -R s4 -R s5 -R s6 %t.0 %t2
 RUN: llvm-objcopy --add-section=.s0=%t --add-section=.s1=%t --add-section=.s2=%t %t2 %t2
 RUN: llvm-readobj --sections %t2 | FileCheck --check-prefix=SECS %s

 SECS-NOT: Name: .symtab_shndx

test/tools/llvm-objcopy/Inputs/ungzip.py

This file was added.

	import gzip
	import sys

	with gzip.open(sys.argv[1], 'rb') as f:
	  writer = getattr(sys.stdout, 'buffer', None)
	  if writer is None:
	    writer = sys.stdout
	    if sys.platform == "win32":
	      import os, msvcrt
	      msvcrt.setmode(sys.stdout.fileno(),os.O_BINARY)

	  writer.write(f.read())
	  sys.stdout.flush()

tools/llvm-objcopy/COFF/COFFObjcopy.cpp

[... 31 lines not shown ...] static bool isDebugSection(const Section &Sec) {
   return Sec.Name.startswith(".debug");
 }

 static uint64_t getNextRVA(const Object &Obj) {
   if (Obj.getSections().empty())
     return 0;
   const Section &Last = Obj.getSections().back();
   return alignTo(Last.Header.VirtualAddress + Last.Header.VirtualSize,
-                 Obj.PeHeader.SectionAlignment);
+                 Obj.IsPE ? Obj.PeHeader.SectionAlignment : 1);
 }

 static uint32_t getCRC32(StringRef Data) {
   JamCRC CRC;
   CRC.update(ArrayRef<char>(Data.data(), Data.size()));
   // The CRC32 value needs to be complemented because the JamCRC dosn't
   // finalize the CRC32 value. It also dosn't negate the initial CRC32 value
   // but it starts by default at 0xFFFFFFFF which is the complement of zero.
[... 21 lines not shown ...] static void addGnuDebugLink(Object &Obj, StringRef DebugLinkFile) {

   std::vector<Section> Sections;
   Section Sec;
   Sec.setOwnedContents(createGnuDebugLinkSectionContents(DebugLinkFile));
   Sec.Name = ".gnu_debuglink";
   Sec.Header.VirtualSize = Sec.getContents().size();
   Sec.Header.VirtualAddress = StartRVA;
   Sec.Header.SizeOfRawData =
-      alignTo(Sec.Header.VirtualSize, Obj.PeHeader.FileAlignment);
+      alignTo(Sec.Header.VirtualSize, Obj.IsPE ? Obj.PeHeader.FileAlignment : 1);
   // Sec.Header.PointerToRawData is filled in by the writer.
   Sec.Header.PointerToRelocations = 0;
   Sec.Header.PointerToLinenumbers = 0;
   // Sec.Header.NumberOfRelocations is filled in by the writer.
   Sec.Header.NumberOfLinenumbers = 0;
   Sec.Header.Characteristics = IMAGE_SCN_CNT_INITIALIZED_DATA |
                                IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_DISCARDABLE;
   Sections.push_back(Sec);
[... 122 lines not shown ...]

tools/llvm-objcopy/COFF/Object.h

[... 60 lines not shown ...] void clearContents() {
     OwnedContents.clear();
   }

 private:
   ArrayRef<uint8_t> ContentsRef;
   std::vector<uint8_t> OwnedContents;
 };

+struct AuxSymbol {
+  AuxSymbol(ArrayRef<uint8_t> In) {
+    assert(In.size() == sizeof(Opaque));
+    std::copy(In.begin(), In.end(), Opaque);
+  }
+
+  ArrayRef<uint8_t> getRef() const {
+    return ArrayRef<uint8_t>(Opaque, sizeof(Opaque));
+  }
+
+  uint8_t Opaque[sizeof(object::coff_symbol16)];
+};
+
 struct Symbol {
   object::coff_symbol32 Sym;
   StringRef Name;
-  std::vector<uint8_t> AuxData;
+  std::vector<AuxSymbol> AuxData;
+  StringRef AuxFile;
   ssize_t TargetSectionId;
   ssize_t AssociativeComdatTargetSectionId = 0;
   Optional<size_t> WeakTargetSymbolId;
   size_t UniqueId;
   size_t RawIndex;
   bool Referenced;
 };

[... 46 lines not shown ...] private:

   size_t NextSymbolUniqueId = 0;

   std::vector<Section> Sections;
   DenseMap<ssize_t, Section *> SectionMap;

   ssize_t NextSectionUniqueId = 1; // Allow a UniqueId 0 to mean undefined.

-  // Update SymbolMap and RawIndex in each Symbol.
+  // Update SymbolMap.
   void updateSymbols();

   // Update SectionMap and Index in each Section.
   void updateSections();
 };

 // Copy between coff_symbol16 and coff_symbol32.
 // The source and destination files can use either coff_symbol16 or
[... 54 lines not shown ...]

tools/llvm-objcopy/COFF/Object.cpp

[... 20 lines not shown ...] for (Symbol S : NewSymbols) {
     S.UniqueId = NextSymbolUniqueId++;
     Symbols.emplace_back(S);
   }
   updateSymbols();
 }

 void Object::updateSymbols() {
   SymbolMap = DenseMap<size_t, Symbol *>(Symbols.size());
-  size_t RawSymIndex = 0;
-  for (Symbol &Sym : Symbols) {
+  for (Symbol &Sym : Symbols)
     SymbolMap[Sym.UniqueId] = &Sym;
-    Sym.RawIndex = RawSymIndex;
-    RawSymIndex += 1 + Sym.Sym.NumberOfAuxSymbols;
-  }
 }

 const Symbol *Object::findSymbol(size_t UniqueId) const {
   auto It = SymbolMap.find(UniqueId);
   if (It == SymbolMap.end())
     return nullptr;
   return It->second;
 }
[... 99 lines not shown ...]

tools/llvm-objcopy/COFF/Reader.cpp

Show First 20 Lines • Show All 101 Lines • ▼ Show 20 Lines	for (uint32_t I = 0, E = COFFObj.getRawNumberOfSymbols(); I < E;) {
if (IsBigObj)		if (IsBigObj)
copySymbol(Sym.Sym,		copySymbol(Sym.Sym,
reinterpret_cast<const coff_symbol32 >(SymRef.getRawPtr()));		reinterpret_cast<const coff_symbol32 >(SymRef.getRawPtr()));
else		else
copySymbol(Sym.Sym,		copySymbol(Sym.Sym,
reinterpret_cast<const coff_symbol16 >(SymRef.getRawPtr()));		reinterpret_cast<const coff_symbol16 >(SymRef.getRawPtr()));
if (auto EC = COFFObj.getSymbolName(SymRef, Sym.Name))		if (auto EC = COFFObj.getSymbolName(SymRef, Sym.Name))
return errorCodeToError(EC);		return errorCodeToError(EC);
Sym.AuxData = COFFObj.getSymbolAuxData(SymRef);		ArrayRef<uint8_t> AuxData = COFFObj.getSymbolAuxData(SymRef);
assert((Sym.AuxData.size() %		size_t SymSize = IsBigObj ? sizeof(coff_symbol32) : sizeof(coff_symbol16);
(IsBigObj ? sizeof(coff_symbol32) : sizeof(coff_symbol16))) == 0);		assert(AuxData.size() == SymSize * SymRef.getNumberOfAuxSymbols());
		if (SymRef.isFileRecord()) {
		jhendersonUnsubmitted Not Done Reply Inline Actions You can get rid of the braces here. jhenderson: You can get rid of the braces here.
		jhendersonUnsubmitted Done Reply Inline Actions You can still get rid of these braces ;) I think a few code comments around here explaining what you are doing and why would make it much more understandable. Perhaps a brief explanation of the difference in the BigObj format? jhenderson: You can still get rid of these braces ;) I think a few code comments around here explaining…
		mstorsjoAuthorUnsubmitted Done Reply Inline Actions Oh, right, I forgot about the other comments when focusing on the test data. mstorsjo: Oh, right, I forgot about the other comments when focusing on the test data.
		Sym.AuxFile = StringRef(reinterpret_cast<const char *>(AuxData.data()),
		AuxData.size())
		.rtrim('\0');
		rnkUnsubmitted Not Done Reply Inline Actions I think this can just be `.rtrim('\0')`, there is a single character overload of rtrim. rnk: I think this can just be `.rtrim('\0')`, there is a single character overload of rtrim.
		} else {
		for (size_t I = 0; I < SymRef.getNumberOfAuxSymbols(); I++)
		Sym.AuxData.push_back(AuxData.slice(I * SymSize, sizeof(AuxSymbol)));
		}
// Find the unique id of the section		// Find the unique id of the section
		rnkUnsubmitted Not Done Reply Inline Actions Should this second `sizeof(coff_symbol16)` be SymSize? Maybe an easier way to express it would be: ArrayRef<uint8_t> Chunk = AuxData.take_front(SymSize); Sym.AuxData.insert(Sym.AuxData.end(), Chunk.begin(), Chunk.end()); AuxData = AuxData.drop_front(SymSize); It mutates a local variable, but takes less math. rnk: Should this second `sizeof(coff_symbol16)` be SymSize? Maybe an easier way to express it would…
		mstorsjoAuthorUnsubmitted Done Reply Inline Actions No, this is intentionally (within the current patch design) copying 18 bytes from a source which has got either 18 or 20 bytes stride. mstorsjo: No, this is intentionally (within the current patch design) copying 18 bytes from a source…
		rnkUnsubmitted Not Done Reply Inline Actions Of course now I read you already clarified this. I think there should be a comment about how this is normalizing from coff_symbol32-sized entries to AuxSymbol sized entries, and discarding the padding bytes that are present in a bigobj. rnk: Of course now I read you already clarified this. I think there should be a comment about how…
if (SymRef.getSectionNumber() <=		if (SymRef.getSectionNumber() <=
0) // Special symbol (undefined/absolute/debug)		0) // Special symbol (undefined/absolute/debug)
Sym.TargetSectionId = SymRef.getSectionNumber();		Sym.TargetSectionId = SymRef.getSectionNumber();
else if (static_cast<uint32_t>(SymRef.getSectionNumber() - 1) <		else if (static_cast<uint32_t>(SymRef.getSectionNumber() - 1) <
Sections.size())		Sections.size())
Sym.TargetSectionId = Sections[SymRef.getSectionNumber() - 1].UniqueId;		Sym.TargetSectionId = Sections[SymRef.getSectionNumber() - 1].UniqueId;
else		else
return createStringError(object_error::parse_failed,		return createStringError(object_error::parse_failed,
▲ Show 20 Lines • Show All 96 Lines • Show Last 20 Lines
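
As an illustration of the normalization discussed in the Reader.cpp thread above (copying 18 bytes out of entries spaced 18 or 20 bytes apart), here is a minimal standalone sketch. It does not use the LLVM types: `AuxBlob`, `kAuxSize` and `normalizeAux` are hypothetical names standing in for the patch's `AuxSymbol` and the loop in the reader, and the 18-byte size is taken from the discussion (`sizeof(coff_symbol16)`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the patch's AuxSymbol: one opaque 18-byte blob.
constexpr std::size_t kAuxSize = 18;
struct AuxBlob {
  std::uint8_t Opaque[kAuxSize];
};

// Copy `count` aux records out of a raw symbol table whose entries are
// `stride` bytes apart (18 for a regular object, 20 for a bigobj). Only the
// first 18 bytes of each entry are kept, so bigobj padding is discarded.
std::vector<AuxBlob> normalizeAux(const std::vector<std::uint8_t> &raw,
                                  std::size_t count, std::size_t stride) {
  assert(stride >= kAuxSize && "entries are never smaller than 18 bytes");
  assert(raw.size() >= count * stride && "input too short");
  std::vector<AuxBlob> out(count);
  for (std::size_t i = 0; i < count; ++i)
    std::memcpy(out[i].Opaque, raw.data() + i * stride, kAuxSize);
  return out;
}
```

The source offset advances by the stride while the copy length stays fixed at 18, which is the asymmetry the review comments ask to have documented.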

tools/llvm-objcopy/COFF/Writer.h

Show All 24 Lines	class COFFWriter {
Object &Obj;		Object &Obj;
Buffer &Buf;		Buffer &Buf;

size_t FileSize;		size_t FileSize;
size_t FileAlignment;		size_t FileAlignment;
size_t SizeOfInitializedData;		size_t SizeOfInitializedData;
StringTableBuilder StrTabBuilder;		StringTableBuilder StrTabBuilder;

		template <class SymbolTy> std::pair<size_t, size_t> finalizeSymbolTable();
Error finalizeRelocTargets();		Error finalizeRelocTargets();
Error finalizeSymbolContents();		Error finalizeSymbolContents();
void layoutSections();		void layoutSections();
size_t finalizeStringTable();		size_t finalizeStringTable();
template <class SymbolTy> std::pair<size_t, size_t> finalizeSymbolTable();

Error finalize(bool IsBigObj);		Error finalize(bool IsBigObj);

void writeHeaders(bool IsBigObj);		void writeHeaders(bool IsBigObj);
void writeSections();		void writeSections();
template <class SymbolTy> void writeSymbolStringTables();		template <class SymbolTy> void writeSymbolStringTables();

Error write(bool IsBigObj);		Error write(bool IsBigObj);
Show All 16 Lines

tools/llvm-objcopy/COFF/Writer.cpp

Show First 20 Lines • Show All 49 Lines • ▼ Show 20 Lines	if (Sym.TargetSectionId <= 0) {
return createStringError(object_error::invalid_symbol_index,		return createStringError(object_error::invalid_symbol_index,
"Symbol '%s' points to a removed section",		"Symbol '%s' points to a removed section",
Sym.Name.str().c_str());		Sym.Name.str().c_str());
Sym.Sym.SectionNumber = Sec->Index;		Sym.Sym.SectionNumber = Sec->Index;

if (Sym.Sym.NumberOfAuxSymbols == 1 &&		if (Sym.Sym.NumberOfAuxSymbols == 1 &&
Sym.Sym.StorageClass == IMAGE_SYM_CLASS_STATIC) {		Sym.Sym.StorageClass == IMAGE_SYM_CLASS_STATIC) {
coff_aux_section_definition *SD =		coff_aux_section_definition *SD =
reinterpret_cast<coff_aux_section_definition *>(Sym.AuxData.data());		reinterpret_cast<coff_aux_section_definition *>(
		Sym.AuxData[0].Opaque);
uint32_t SDSectionNumber;		uint32_t SDSectionNumber;
if (Sym.AssociativeComdatTargetSectionId == 0) {		if (Sym.AssociativeComdatTargetSectionId == 0) {
// Not a comdat associative section; just set the Number field to		// Not a comdat associative section; just set the Number field to
// the number of the section itself.		// the number of the section itself.
SDSectionNumber = Sec->Index;		SDSectionNumber = Sec->Index;
} else {		} else {
Sec = Obj.findSection(Sym.AssociativeComdatTargetSectionId);		Sec = Obj.findSection(Sym.AssociativeComdatTargetSectionId);
if (Sec == nullptr)		if (Sec == nullptr)
return createStringError(		return createStringError(
object_error::invalid_symbol_index,		object_error::invalid_symbol_index,
"Symbol '%s' is associative to a removed section",		"Symbol '%s' is associative to a removed section",
Sym.Name.str().c_str());		Sym.Name.str().c_str());
SDSectionNumber = Sec->Index;		SDSectionNumber = Sec->Index;
}		}
// Update the section definition with the new section number.		// Update the section definition with the new section number.
SD->NumberLowPart = static_cast<uint16_t>(SDSectionNumber);		SD->NumberLowPart = static_cast<uint16_t>(SDSectionNumber);
SD->NumberHighPart = static_cast<uint16_t>(SDSectionNumber >> 16);		SD->NumberHighPart = static_cast<uint16_t>(SDSectionNumber >> 16);
}		}
}		}
// Check that we actually have got AuxData to match the weak symbol target		// Check that we actually have got AuxData to match the weak symbol target
// we want to set. Only >= 1 would be required, but only == 1 makes sense.		// we want to set. Only >= 1 would be required, but only == 1 makes sense.
if (Sym.WeakTargetSymbolId && Sym.Sym.NumberOfAuxSymbols == 1) {		if (Sym.WeakTargetSymbolId && Sym.Sym.NumberOfAuxSymbols == 1) {
coff_aux_weak_external *WE =		coff_aux_weak_external *WE =
reinterpret_cast<coff_aux_weak_external *>(Sym.AuxData.data());		reinterpret_cast<coff_aux_weak_external *>(Sym.AuxData[0].Opaque);
const Symbol *Target = Obj.findSymbol(Sym.WeakTargetSymbolId);		const Symbol *Target = Obj.findSymbol(Sym.WeakTargetSymbolId);
if (Target == nullptr)		if (Target == nullptr)
return createStringError(object_error::invalid_symbol_index,		return createStringError(object_error::invalid_symbol_index,
"Symbol '%s' is missing its weak target",		"Symbol '%s' is missing its weak target",
Sym.Name.str().c_str());		Sym.Name.str().c_str());
WE->TagIndex = Target->RawIndex;		WE->TagIndex = Target->RawIndex;
}		}
}		}
▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	if (S.Name.size() > COFF::NameSize) {
strncpy(S.Sym.Name.ShortName, S.Name.data(), COFF::NameSize);		strncpy(S.Sym.Name.ShortName, S.Name.data(), COFF::NameSize);
}		}
}		}
return StrTabBuilder.getSize();		return StrTabBuilder.getSize();
}		}

template <class SymbolTy>		template <class SymbolTy>
std::pair<size_t, size_t> COFFWriter::finalizeSymbolTable() {		std::pair<size_t, size_t> COFFWriter::finalizeSymbolTable() {
size_t SymTabSize = Obj.getSymbols().size() * sizeof(SymbolTy);		size_t RawSymIndex = 0;
for (const auto &S : Obj.getSymbols())		for (auto &S : Obj.getMutableSymbols()) {
SymTabSize += S.AuxData.size();		if (!S.AuxFile.empty())
return std::make_pair(SymTabSize, sizeof(SymbolTy));		S.Sym.NumberOfAuxSymbols =
		alignTo(S.AuxFile.size(), sizeof(SymbolTy)) / sizeof(SymbolTy);
		jhenderson (Done): This is another place requiring a code comment, I think, just explaining the "why".
		S.RawIndex = RawSymIndex;
		RawSymIndex += 1 + S.Sym.NumberOfAuxSymbols;
		}
		return std::make_pair(RawSymIndex * sizeof(SymbolTy), sizeof(SymbolTy));
}		}
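
A possible way to spell out the "why" requested for the new finalizeSymbolTable(): a file symbol's name is stored in the aux records that follow it, so the number of aux records depends on the output entry size, and every symbol's raw index has to be recomputed from the running total. The sketch below is standalone and only mirrors that arithmetic. `SymSketch`, `auxSlotsForFileName` and `assignRawIndices` are hypothetical names, not part of the patch, and the ceiling division stands in for `alignTo(...) / sizeof(SymbolTy)`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified symbol record for illustration only.
struct SymSketch {
  std::string AuxFile;      // file name for a file symbol, otherwise empty
  std::size_t NumAux = 0;   // number of aux records following the symbol
  std::size_t RawIndex = 0; // index in the output table, assigned below
};

// Number of entrySize-byte slots needed to hold a file name (rounding up).
std::size_t auxSlotsForFileName(std::size_t nameLen, std::size_t entrySize) {
  assert(entrySize > 0);
  return (nameLen + entrySize - 1) / entrySize;
}

// Assign raw indices and return the total symbol table size in bytes.
std::size_t assignRawIndices(std::vector<SymSketch> &syms,
                             std::size_t entrySize) {
  std::size_t raw = 0;
  for (SymSketch &s : syms) {
    // The slot count for a file name changes with the entry size.
    if (!s.AuxFile.empty())
      s.NumAux = auxSlotsForFileName(s.AuxFile.size(), entrySize);
    s.RawIndex = raw;
    raw += 1 + s.NumAux; // the symbol itself plus its aux records
  }
  return raw * entrySize;
}
```

With a 19-byte file name, this needs two aux slots at 18 bytes per entry but only one at 20, so the raw indices of all later symbols shift between the two formats.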

Error COFFWriter::finalize(bool IsBigObj) {		Error COFFWriter::finalize(bool IsBigObj) {
		size_t SymTabSize, SymbolSize;
		std::tie(SymTabSize, SymbolSize) = IsBigObj
		? finalizeSymbolTable<coff_symbol32>()
		: finalizeSymbolTable<coff_symbol16>();

if (Error E = finalizeRelocTargets())		if (Error E = finalizeRelocTargets())
return E;		return E;
if (Error E = finalizeSymbolContents())		if (Error E = finalizeSymbolContents())
return E;		return E;

size_t SizeOfHeaders = 0;		size_t SizeOfHeaders = 0;
FileAlignment = 1;		FileAlignment = 1;
size_t PeHeaderSize = 0;		size_t PeHeaderSize = 0;
Show All 35 Lines	if (Obj.IsPE) {
}		}

// If the PE header had a checksum, clear it, since it isn't valid		// If the PE header had a checksum, clear it, since it isn't valid
// any longer. (We don't calculate a new one.)		// any longer. (We don't calculate a new one.)
Obj.PeHeader.CheckSum = 0;		Obj.PeHeader.CheckSum = 0;
}		}

size_t StrTabSize = finalizeStringTable();		size_t StrTabSize = finalizeStringTable();
size_t SymTabSize, SymbolSize;
std::tie(SymTabSize, SymbolSize) = IsBigObj
? finalizeSymbolTable<coff_symbol32>()
: finalizeSymbolTable<coff_symbol16>();

size_t PointerToSymbolTable = FileSize;		size_t PointerToSymbolTable = FileSize;
// StrTabSize <= 4 is the size of an empty string table, only consisting		// StrTabSize <= 4 is the size of an empty string table, only consisting
// of the length field.		// of the length field.
if (SymTabSize == 0 && StrTabSize <= 4 && Obj.IsPE) {		if (SymTabSize == 0 && StrTabSize <= 4 && Obj.IsPE) {
// For executables, don't point to the symbol table and skip writing		// For executables, don't point to the symbol table and skip writing
// the length field, if both the symbol and string tables are empty.		// the length field, if both the symbol and string tables are empty.
PointerToSymbolTable = 0;		PointerToSymbolTable = 0;
▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines

template <class SymbolTy> void COFFWriter::writeSymbolStringTables() {		template <class SymbolTy> void COFFWriter::writeSymbolStringTables() {
uint8_t *Ptr = Buf.getBufferStart() + Obj.CoffFileHeader.PointerToSymbolTable;		uint8_t *Ptr = Buf.getBufferStart() + Obj.CoffFileHeader.PointerToSymbolTable;
for (const auto &S : Obj.getSymbols()) {		for (const auto &S : Obj.getSymbols()) {
// Convert symbols back to the right size, from coff_symbol32.		// Convert symbols back to the right size, from coff_symbol32.
copySymbol<SymbolTy, coff_symbol32>(reinterpret_cast<SymbolTy *>(Ptr),		copySymbol<SymbolTy, coff_symbol32>(reinterpret_cast<SymbolTy *>(Ptr),
S.Sym);		S.Sym);
Ptr += sizeof(SymbolTy);		Ptr += sizeof(SymbolTy);
std::copy(S.AuxData.begin(), S.AuxData.end(), Ptr);		if (!S.AuxFile.empty()) {
Ptr += S.AuxData.size();		std::copy(S.AuxFile.begin(), S.AuxFile.end(), Ptr);
		jhenderson (Done): Again, a few comments around here would be good.
		// This assumes that unwritten parts of the memory mapped file
		// are initialized to zero.
		Ptr += S.Sym.NumberOfAuxSymbols * sizeof(SymbolTy);
		} else {
		for (const AuxSymbol &AuxSym : S.AuxData) {
		ArrayRef<uint8_t> Ref = AuxSym.getRef();
		std::copy(Ref.begin(), Ref.end(), Ptr);
		jhenderson (Not Done): Are coff_symbol16 and SymbolTy supposed to be the same size? Or are you deliberately writing less (or more) into the buffer than you iterate over?
		mstorsjo (Author, Done): I'm deliberately writing more or less, yes. The COFF-ism is that the symbol table can consist of entries of either coff_symbol16 or coff_symbol32, of 18 or 20 bytes each. A symbol can be followed by a number of aux symbols, which can be one of a number of different structs, all 18 bytes each. If the table consists of coff_symbol32 entries, each one of the aux symbols (opaque aux structs) will have 2 bytes of padding at the end. So here I'm writing chunks of 18 bytes at a time out of the stored AuxData (where they are packed tightly), spaced 18 or 20 bytes apart in the output symbol table (depending on the entry size of that symbol table).
		Ptr += sizeof(SymbolTy);
		}
		}
		jhenderson (Not Done): Strictly speaking, you don't need the braces in this if/else.
}		}
if (StrTabBuilder.getSize() > 4 || !Obj.IsPE) {		if (StrTabBuilder.getSize() > 4 || !Obj.IsPE) {
// Always write a string table in object files, even an empty one.		// Always write a string table in object files, even an empty one.
StrTabBuilder.write(Ptr);		StrTabBuilder.write(Ptr);
Ptr += StrTabBuilder.getSize();		Ptr += StrTabBuilder.getSize();
}		}
}		}
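
The write side described in the thread above (18-byte chunks written 18 or 20 bytes apart) can be sketched in isolation as follows. This is a standalone illustration, not the patch's code: `AuxBlob`, `kAuxSize` and `writeAux` are hypothetical names, and unlike the patch, which assumes the unwritten parts of the output buffer are already zero, this sketch zeroes the padding explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the patch's AuxSymbol: one opaque 18-byte blob.
constexpr std::size_t kAuxSize = 18;
struct AuxBlob {
  std::uint8_t Opaque[kAuxSize];
};

// Write each 18-byte blob into a table whose entries are `entrySize` bytes
// (18 or 20). Returns the position just past the last entry written.
std::uint8_t *writeAux(std::uint8_t *dst, const std::vector<AuxBlob> &aux,
                       std::size_t entrySize) {
  assert(entrySize >= kAuxSize && "entries are never smaller than 18 bytes");
  for (const AuxBlob &a : aux) {
    std::memcpy(dst, a.Opaque, kAuxSize);
    // Assumption of this sketch: zero the padding explicitly instead of
    // relying on a zero-initialized output buffer.
    std::memset(dst + kAuxSize, 0, entrySize - kAuxSize);
    dst += entrySize; // advance by the table's entry size, not by 18
  }
  return dst;
}
```

Advancing by the entry size while copying a fixed 18 bytes is what keeps the packed intermediate representation independent of whether the output is a bigobj.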

▲ Show 20 Lines • Show All 68 Lines • Show Last 20 Lines

Diff 183065
