
Details

Reviewers

jhenderson
alexander-shaposhnikov
echristo
jakehehrlich

Commits

rG46201fb7bc31: [llvm-objcopy] Fix null symbol handling
rL333772: [llvm-objcopy] Fix null symbol handling

Summary

This fixes a bug where the strip-all option produced a malformed ELF output file.

Diff Detail

Repository: rL LLVM

Event Timeline

paulsemel created this revision. May 26 2018, 10:13 AM

thanks for addressing this (i noticed this issue too, but didn't have a minute to send a fix).
Regarding the code: to be honest i would prefer to avoid these somewhat "hacky" manipulations with the index (i.e. in getSymbolByIndex etc),
instead, i would keep this special symbol in the symbol table and in llvm-objcopy.cpp I would replace the check !Obj.SymbolTable->empty() with
Obj.SymbolTable->size() >= 2 or smth like this (+ add a comment that we need to take into account that special symbol).
What are your thoughts ?

Hi !

In D47414#1113435, @alexshap wrote:

thanks for addressing this (i noticed this issue too, but didn't have a minute to send a fix).
Regarding the code: to be honest i would prefer to avoid these somewhat "hacky" manipulations with the index (i.e. in getSymbolByIndex etc),
instead, i would keep this special symbol in the symbol table and in llvm-objcopy.cpp I would replace the check !Obj.SymbolTable->empty() with
Obj.SymbolTable->size() >= 2 or smth like this (+ add a comment that we need to take into account that special symbol).
What are your thoughts ?

Thanks for reviewing ! I actually agree with you that this hacky thing with the indexes is not perfect at all. However, as with the null section, I would have really liked not to expose this symbol to the user at all.
Indeed, we might really want not to think about this dummy symbol while implementing new options or smth.
I couldn't find a smarter way to do this, do you have smth in mind ? :)

I've been thinking about the structure a lot here. It seems to me that the right thing to do IS to have a null symbol that getSymbolByIndex() returns. If a user has explicitly asked for the symbol at index 0, who are we to say they can't have it? As far as I can see, there is nothing in the ELF standard that prevents us having one.

Indeed, this is a wider problem - our internal implementation of the SymbolTableSection needs to not expose that we don't have a real null symbol. This includes things like how the size is calculated, how to iterate over things, whether or not it is empty etc. If there is a need to explicitly distinguish the null symbol from other symbols on a per-symbol basis, we should have something to identify it as such (e.g. Index == 0).

I know this is different from how the section header table works, and I'm beginning to wonder whether we made the wrong choice there too.

This then brings up the question, how do we prevent the symbol from being stripped? Unlike @alexshap, I'd prefer to keep the empty() check in the symbol table. Perhaps the definition of empty() should be "no symbols other than the null symbol"? In other words Symbols.size() == 1. The removeSymbols predicate would explicitly not remove the null symbol, or better yet, SymbolTableSection::removeSymbols (and probably also updateSymbols) start at the second symbol, rather than the first. What do you guys think?
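
The proposal above could be sketched roughly as follows. This is a minimal illustration, not the llvm-objcopy code: `Symbol` and `SymbolTable` here are simplified, hypothetical stand-ins, and the only points it tries to show are that the null symbol always sits at index 0, that `empty()` means "nothing but the null symbol", and that removal starts at the second entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iterator>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the llvm-objcopy types.
struct Symbol {
  std::string Name;
  uint32_t Index = 0;
};

class SymbolTable {
  std::vector<std::unique_ptr<Symbol>> Symbols;

  void assignIndices() {
    uint32_t Index = 0;
    for (auto &Sym : Symbols)
      Sym->Index = Index++;
  }

public:
  // The null symbol is created up front and always lives at index 0.
  SymbolTable() { Symbols.push_back(std::make_unique<Symbol>()); }

  void addSymbol(std::string Name) {
    auto Sym = std::make_unique<Symbol>();
    Sym->Name = std::move(Name);
    Sym->Index = static_cast<uint32_t>(Symbols.size());
    Symbols.push_back(std::move(Sym));
  }

  // An 'empty' symbol table still contains the null symbol.
  bool empty() const { return Symbols.size() == 1; }
  std::size_t size() const { return Symbols.size(); }

  // Start at the second entry so the predicate never sees the null symbol.
  void removeSymbols(const std::function<bool(const Symbol &)> &ToRemove) {
    Symbols.erase(std::remove_if(std::begin(Symbols) + 1, std::end(Symbols),
                                 [&](const std::unique_ptr<Symbol> &Sym) {
                                   return ToRemove(*Sym);
                                 }),
                  std::end(Symbols));
    assignIndices();
  }
};
```

With this shape, a predicate that returns true for every symbol (as a strip-all style option would) still leaves the table with exactly one entry, so `empty()` and `size() == 1` agree.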

test/tools/llvm-objcopy/strip-all-and-keep-symbol.test
61 ↗	(On Diff #148724)	Could you add something here (or in a different test) to show that we haven't given the null symbol a real name, please?
tools/llvm-objcopy/Object.cpp
121 ↗	(On Diff #148724)	zero symbol -> null symbol It's worth pointing out that the standard requires the first byte to be a null character, so maybe this should be added as a string in the StringTableSection `initialize` method or constructor instead? That will also ensure that the size is correct, which could affect the behaviour of `assignOffsets` etc.
170 ↗	(On Diff #148724)	Comment here please, to explain why we start from 1.
1142 ↗	(On Diff #148724)	It feels to me like we should actually have a virtual .size() method on the section base class (which will be overridden to return `Symbols.size() + 1` in this case), which wraps this sort of thing, hiding the details from assignOffsets (or its caller). Alternatively, can't this be done in the symbol table constructor/`initialize` method?
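
The suggestion to store the leading null byte when the string table is constructed could look something like the sketch below. `StringTable` is a hypothetical, simplified class written for illustration only; it is not the llvm-objcopy `StringTableSection` and makes no claim about how that class stores its data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified string table used only for illustration.
class StringTable {
  std::string Data;

public:
  // The first byte of an ELF string table must be a null character, so store
  // it at construction time; size() then always accounts for it.
  StringTable() : Data(1, '\0') {}

  // Appends Name plus its terminator and returns the offset of Name.
  std::size_t add(const std::string &Name) {
    if (Name.empty())
      return 0; // The empty name is the null byte at offset 0.
    std::size_t Offset = Data.size();
    Data += Name;
    Data.push_back('\0');
    return Offset;
  }

  std::size_t size() const { return Data.size(); }

  // Returns the null-terminated name stored at Offset.
  const char *nameAt(std::size_t Offset) const { return Data.c_str() + Offset; }
};
```

Because the null byte is part of the stored data from the start, anything that computes offsets from `size()` sees the correct value without special-casing the null symbol's name.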

in general i agree with @jhenderson. Regarding changing the definition of empty() or checking the size() (what i suggested) - the latter seems to be more explicit to me and more intuitive / causing less surprise, but i don't insist.

In D47414#1114472, @alexshap wrote:

in general i agree with @jhenderson. Regarding changing the definition of empty() or checking the size() (what i suggested) - the latter seems to be more explicit to me and more intuitive / causing less surprise, but i don't insist.

My main concern is that a developer can easily forget the difference, and use the wrong one, but I'm not particularly fussed either way.

to me this is fine either way, i don't really have a strong opinion on this, we can change the definition of empty() - it's fine too.

Hi!

I understand your point but still, there is smth that I don't get. Why would a user need to access the null symbol ? I see two different things :

For reading. This is useless, this is just all zeros right, it makes no sense to read it.
For writing. This is forbidden, we should never change the null symbol.

So, if we let the user play with it, that means we might add complexity while writing the new binary, to make sure the user didn't touch this symbol.
Can you elaborate more on the benefit to let the user access this symbol ?

The fact is that I find the way of changing the different functions that are using the SymbolTable not to touch the null symbol way more hacky than the current way (which is also hacky I admit it).

Our tool is an auxiliary tool, to help with manipulating ELF files. It is not a verification tool, so we should not be trying to identify every single problem with the ELF, unless it makes it hard/impossible to do something (e.g. a link field refers to a non-existent section). The only two clients of the symbol index that we care about, as far as I am aware, are relocations (which can have symbol index 0, but in this case just refer to no symbol - we should already special case this, although we don't), and group sections (which shouldn't refer to index 0, but it doesn't really matter if they do, as we do nothing with that symbol except get its updated index). End users can't directly manipulate the null symbol.

tools/llvm-objcopy/Object.cpp
219 ↗	(On Diff #148724)	How about `std::begin(Symbols)+1`? That's really simple, and will mean the null symbol is not removed.

Hi @jhenderson,

I'm really sorry, but I still don't really get your point. I think that the goal is to make llvm-objcopy as close as possible to what GNU objcopy is doing in terms of functionality, right ? My point was more that, in the future, we might want to implement other options that deal with symbols. And maybe it would be nice if, at that time, we didn't have to deal with the null symbol anymore. My point was more : "There won't be any case (or option) where we will need to do smth with the null symbol, so we are better off not exposing it".

I mean, it totally makes sense to make that, when the symbol vector is empty, that means that there is no symbol in the SymbolTable anymore, as no one needs to know whether the null symbol is present, as it always is. I don't know if you get my point, maybe I am going completely wrong, I would just need some explanations on this one :)

Btw, thanks for reviewing !

Let's imagine the case where we add an option --print-symbol-value=<index> which prints the value of the symbol at the specified index. I think it's clear that we should not prevent a user from printing the symbol at index 0 (i.e. the null symbol) - yes the value is always 0, but that doesn't stop us from printing it. It is clearly not the SymbolTableSection's responsibility to get the value from the symbol - it should be the symbol's responsibility to provide such an interface, and the caller should use it directly. As such, we would need a way to retrieve the symbol at index 0 from the symbol table. If a developer is trying to implement this, it makes sense for them to call getSymbolByIndex, but then they see that there is an error check for index 0. How do they then retrieve the null symbol? They can't, so they'll have to put some special case handling in for index 0. In other words, I disagree with you that a user will never want to read the null symbol. Manipulating it is perhaps another matter, but I don't care too much about verifying that it hasn't been adjusted - perhaps there's a good platform-specific reason somebody wants to?

A couple of related questions I'd like to be clear on:

What do you think should be the return value of SymbolTable.size() when there is only the null symbol around?
What do you think should be the return value of SymbolTable.empty() when there is only the null symbol around?
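
To make the imagined `--print-symbol-value=<index>` case concrete, here is a small sketch under the same assumptions as the argument above. `Symbol`, `SymbolTable` and `symbolValue` are hypothetical, simplified names, and the option itself does not exist; the point is only that index 0 needs no special case if the table really stores a null symbol.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct Symbol {
  uint64_t Value = 0;
};

class SymbolTable {
  std::vector<Symbol> Symbols;

public:
  // Index 0 always holds the null symbol, whose value is 0.
  SymbolTable() : Symbols(1) {}

  void addSymbol(uint64_t Value) {
    Symbol Sym;
    Sym.Value = Value;
    Symbols.push_back(Sym);
  }

  // No special case for index 0: only out-of-range indexes are rejected.
  const Symbol &getSymbolByIndex(uint32_t Index) const {
    if (Index >= Symbols.size())
      throw std::out_of_range("invalid symbol index");
    return Symbols[Index];
  }
};

// What the imagined option would call: the symbol provides the value, and
// the table only provides lookup.
uint64_t symbolValue(const SymbolTable &Table, uint32_t Index) {
  return Table.getSymbolByIndex(Index).Value;
}
```

A caller asking for index 0 simply gets the null symbol back and reads a value of 0 from it.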

Well, the fact is that, in my opinion, the goal of llvm-objcopy or objcopy is not to print anything. Their only goal is to modify/alter the given binary. And the fact is that, as you can see with objcopy, for every option that plays with symbols, the null symbol is obviously omitted (I think that's also why you only search for symbol names, as you shouldn't need to do anything with the null symbol).
But anyway, I can keep this symbol if you want, and modify the helper functions so that we omit the null symbol everywhere else !

Ok, I'm getting back to you because I really think that we want to go forward on this one, as this fixes a really "big" bug.

Are you guys totally convinced that we should keep this symbol ?

If so, I will change the implementation, even though I truly think not exposing the symbol will simplify the future development of llvm-objcopy.

cc: @jhenderson @alexshap

Quick drop in. What I was thinking about before was to just never pass the
null symbol to remove or anything else. It can still be there it just
shouldn't be allowed to be updated or removed.

As for how the null section header should be treated we have to consider
large indexes before proceeding.

In D47414#1117184, @jakehehrlich wrote:

Quick drop in. What I was thinking about before was to just never pass the
null symbol to remove or anything else. It can still be there it just
shouldn't be allowed to be updated or removed.

As for how the null section header should be treated we have to consider
large indexes before proceeding.

Yes, I think this is essentially my suggestion with the change to removeSymbols etc.

paulsemel updated this revision to Diff 149273. May 31 2018, 5:57 AM

paulsemel edited the summary of this revision.

jhenderson added inline comments. May 31 2018, 6:13 AM

tools/llvm-objcopy/Object.cpp
203 ↗	(On Diff #149273)	Usual style in LLVM is to assign begin and end values in the initialization clause. See Don't evaluate end() every time through a loop. Similarly use ++Sym instead of Sym++.
tools/llvm-objcopy/Object.h
370–371 ↗	(On Diff #149273)	I'd probably rephrase this, and say something like "An 'empty' symbol table still contains a null symbol." The fact that we don't alter it is not really relevant.
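
The loop style requested for Object.cpp above (begin and end assigned once in the initialization clause, pre-increment on the iterator) can be illustrated with a small standalone sketch. The function name `sumSkippingFirst` is made up for this example and has nothing to do with the patch itself.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper: sums every element except the first one.
int sumSkippingFirst(const std::vector<int> &Values) {
  int Sum = 0;
  if (Values.empty())
    return Sum;
  // Both iterators are set up once in the initialization clause, so end() is
  // not re-evaluated on every iteration, and the iterator is pre-incremented.
  for (auto I = std::begin(Values) + 1, E = std::end(Values); I != E; ++I)
    Sum += *I;
  return Sum;
}
```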

Applied James' changes.
Added test for null symbol name.

LGTM with the inline comments, but please wait for @alexshap or @jakehehrlich to confirm they're happy.

test/tools/llvm-objcopy/null-symbol.test
21 ↗	(On Diff #149283)	I think this can be simplified to `Name: (0)`. Whitespace (except new lines) is not significant for FileCheck, except to divide other non-whitespace, i.e. `Name:        (0)` (many spaces) is identical to `Name: (0)` (one space) but not `Name:(0)` (no spaces).
tools/llvm-objcopy/Object.cpp
203–205 ↗	(On Diff #149283)	Hmm... now that I think about it, could we use std::for_each here?
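
A rough sketch of the `std::for_each` form being suggested here, using simplified, hypothetical types (`Symbol` with a `Visits` counter exists only to make the effect observable). It assumes the vector always holds at least the null symbol, since `begin + 1` is not valid on an empty vector.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for illustration only.
struct Symbol {
  int Visits = 0;
};
using SymPtr = std::unique_ptr<Symbol>;

// Applies Callable to every symbol except the first (the null symbol).
void updateSymbols(std::vector<SymPtr> &Symbols,
                   const std::function<void(Symbol &)> &Callable) {
  // Assumption: the null symbol is always present at index 0.
  assert(!Symbols.empty());
  std::for_each(std::begin(Symbols) + 1, std::end(Symbols),
                [&Callable](SymPtr &Sym) { Callable(*Sym); });
}
```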

This revision is now accepted and ready to land. May 31 2018, 7:09 AM

Don't wait on me until June 10th. I'm just skimming emails.


Applied James' suggestions.

test/tools/llvm-objcopy/null-symbol.test
21 ↗	(On Diff #149283)	Thanks for the tips :)
tools/llvm-objcopy/Object.cpp
203–205 ↗	(On Diff #149283)	Actually yes :)

Closed by commit rL333772: [llvm-objcopy] Fix null symbol handling (authored by paulsemel). Jun 1 2018, 9:24 AM

This revision was automatically updated to reflect the committed changes.

Diff 149501

llvm/trunk/test/tools/llvm-objcopy/keep-file-symbols.test

 - Name: foo
 Section: .text
 Global:
 - Name: bar
 Type: STT_FUNC
 Section: .text

 #STRIPALL: Symbols [
 #STRIPALL-NEXT: Symbol {
+#STRIPALL-NEXT: Name:
+#STRIPALL-NEXT: Value: 0x0
+#STRIPALL-NEXT: Size: 0
+#STRIPALL-NEXT: Binding: Local
+#STRIPALL-NEXT: Type: None
+#STRIPALL-NEXT: Other: 0
+#STRIPALL-NEXT: Section: Undefined
+#STRIPALL-NEXT: }
+#STRIPALL-NEXT: Symbol {
 #STRIPALL-NEXT: Name: foo
 #STRIPALL-NEXT: Value: 0x0
 #STRIPALL-NEXT: Size: 0
 #STRIPALL-NEXT: Binding: Local
 #STRIPALL-NEXT: Type: File
 #STRIPALL-NEXT: Other: 0
 #STRIPALL-NEXT: Section: .text
 #STRIPALL-NEXT: }

llvm/trunk/test/tools/llvm-objcopy/null-symbol.test

# RUN: yaml2obj %s > %t
# RUN: llvm-objcopy %t %t2
# RUN: llvm-readobj -symbols %t2 | FileCheck %s

!ELF
FileHeader:
  Class: ELFCLASS64
  Data: ELFDATA2LSB
  Type: ET_REL
  Machine: EM_X86_64
Sections:
  - Name: .text
    Type: SHT_PROGBITS
    Flags: [ SHF_ALLOC, SHF_EXECINSTR ]
    Address: 0x1000
    AddressAlign: 0x0000000000000010
    Size: 8

#CHECK: Symbols [
#CHECK-NEXT: Symbol {
#CHECK-NEXT: Name: (0)
#CHECK-NEXT: Value: 0x0
#CHECK-NEXT: Size: 0
#CHECK-NEXT: Binding: Local
#CHECK-NEXT: Type: None
#CHECK-NEXT: Other: 0
#CHECK-NEXT: Section: Undefined
#CHECK-NEXT: }

llvm/trunk/test/tools/llvm-objcopy/strip-all-and-keep-symbol.test

 # CHECK: Name: .gnu.warning.foo
 # CHECK: Name: .symtab
 # CHECK: Name: .strtab
 # CHECK: Name: .shstrtab
 # CHECK-NOT: Name: .debug_bar

 #CHECK: Symbols [
 #CHECK-NEXT: Symbol {
+#CHECK-NEXT: Name:
+#CHECK-NEXT: Value: 0x0
+#CHECK-NEXT: Size: 0
+#CHECK-NEXT: Binding: Local
+#CHECK-NEXT: Type: None
+#CHECK-NEXT: Other: 0
+#CHECK-NEXT: Section: Undefined
+#CHECK-NEXT: }
+#CHECK-NEXT: Symbol {
 #CHECK-NEXT: Name: foo
 #CHECK-NEXT: Value: 0x1000
 #CHECK-NEXT: Size: 8
 #CHECK-NEXT: Binding: Local
 #CHECK-NEXT: Type: Function
 #CHECK-NEXT: Other: 0
 #CHECK-NEXT: Section: .text
 #CHECK-NEXT: }
 #CHECK-NEXT:]

llvm/trunk/tools/llvm-objcopy/Object.h

 protected:
   using SymPtr = std::unique_ptr<Symbol>;

 public:
   void addSymbol(StringRef Name, uint8_t Bind, uint8_t Type,
                  SectionBase *DefinedIn, uint64_t Value, uint8_t Visibility,
                  uint16_t Shndx, uint64_t Sz);
   void addSymbolNames();
-  bool empty() const { return Symbols.empty(); }
+  // An 'empty' symbol table still contains a null symbol.
+  bool empty() const { return Symbols.size() == 1; }
   const SectionBase *getStrTab() const { return SymbolNames; }
   const Symbol *getSymbolByIndex(uint32_t Index) const;
   Symbol *getSymbolByIndex(uint32_t Index);
   void updateSymbols(function_ref<void(Symbol &)> Callable);

   void removeSectionReferences(const SectionBase *Sec) override;
   void initialize(SectionTableRef SecTable) override;
   void finalize() override;

llvm/trunk/tools/llvm-objcopy/Object.cpp

   if (SymbolNames == Sec) {
     error("String table " + SymbolNames->Name +
           " cannot be removed because it is referenced by the symbol table " +
           this->Name);
   }
   removeSymbols([Sec](const Symbol &Sym) { return Sym.DefinedIn == Sec; });
 }

 void SymbolTableSection::updateSymbols(function_ref<void(Symbol &)> Callable) {
-  for (auto &Sym : Symbols)
-    Callable(*Sym);
+  std::for_each(std::begin(Symbols) + 1, std::end(Symbols),
+                [Callable](SymPtr &Sym) { Callable(*Sym); });
   std::stable_partition(
       std::begin(Symbols), std::end(Symbols),
       [](const SymPtr &Sym) { return Sym->Binding == STB_LOCAL; });
   assignIndices();
 }

 void SymbolTableSection::removeSymbols(
     function_ref<bool(const Symbol &)> ToRemove) {
   Symbols.erase(
-      std::remove_if(std::begin(Symbols), std::end(Symbols),
+      std::remove_if(std::begin(Symbols) + 1, std::end(Symbols),
                      [ToRemove](const SymPtr &Sym) { return ToRemove(*Sym); }),
       std::end(Symbols));
   Size = Symbols.size() * EntrySize;
   assignIndices();
 }

 void SymbolTableSection::initialize(SectionTableRef SecTable) {
   Size = 0;
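
One detail of the Object.cpp change that the review does not spell out: `updateSymbols` still runs `std::stable_partition` over the whole range, including index 0. That appears to be safe because the null symbol has local binding (the test output shows `Binding: Local`) and a stable partition keeps the relative order of the elements that satisfy the predicate, so the first local symbol stays first. The sketch below illustrates this with simplified, hypothetical types; the constants `BindLocal` and `BindGlobal` mirror the ELF values of `STB_LOCAL` (0) and `STB_GLOBAL` (1).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins for the ELF binding values STB_LOCAL and STB_GLOBAL.
constexpr uint8_t BindLocal = 0;
constexpr uint8_t BindGlobal = 1;

// Hypothetical, simplified symbol for illustration only.
struct Symbol {
  std::string Name;
  uint8_t Binding;
};

// Moves local symbols ahead of non-local ones without reordering either group.
void localsFirst(std::vector<Symbol> &Symbols) {
  std::stable_partition(
      Symbols.begin(), Symbols.end(),
      [](const Symbol &Sym) { return Sym.Binding == BindLocal; });
}
```

If the null symbol starts at index 0 with local binding, it is still at index 0 after the partition, whatever the bindings of the symbols that follow it.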

llvm/trunk/tools/llvm-objcopy/llvm-objcopy.cpp

   Obj.removeSymbols([&](const Symbol &Sym) {
     if (Config.StripAll || Config.StripAllGNU)
       return true;

     if (!Config.SymbolsToRemove.empty() &&
         is_contained(Config.SymbolsToRemove, Sym.Name)) {
       return true;
     }

-    // TODO: We might handle the 'null symbol' in a different way
-    // by probably handling it the same way as we handle 'null section' ?
-    if (Config.StripUnneeded && !Sym.Referenced && Sym.Index != 0 &&
+    if (Config.StripUnneeded && !Sym.Referenced &&
         (Sym.Binding == STB_LOCAL || Sym.getShndx() == SHN_UNDEF) &&
         Sym.Type != STT_FILE && Sym.Type != STT_SECTION)
       return true;

     return false;
   });
 }

This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy] Fix null symbol handling
ClosedPublic
