This is an archive of the discontinued LLVM Phabricator instance.

Don't store names in the symbols
Needs ReviewPublic

Authored by espindola on Apr 5 2016, 9:23 AM.

Details

Reviewers
ruiu
Summary

Most of the linker doesn't have to handle symbol names. The parts that do are close enough to the symbol table that they can get the name from there.

This deletes the boilerplate for tracking whether we have a name or name offset. It also reduces SymbolBody from 56 to 40 bytes on x86_64.

The performance results are more mixed than what I was expecting, but I think the patch is still worth it:

the gold plugin

master 0.389837796
patch  0.390047270

clang

master 0.669623963
patch  0.666252931

llvm-as

master 0.035799643
patch  0.035869600

the gold plugin fsds

master 0.417012576
patch  0.415979779

clang fsds

master 0.722732074
patch  0.719288980

llvm-as fsds

master 0.032511485
patch  0.032528634

scylla

master 4.915247943
patch  4.978490188

Diff Detail

Event Timeline

rafael updated this revision to Diff 52703.Apr 5 2016, 9:23 AM
rafael retitled this revision from to Don't store names in the symbols.
rafael updated this object.
rafael added a reviewer: ruiu.
rafael added a subscriber: llvm-commits.
ruiu edited edge metadata.Apr 5 2016, 10:40 AM

If this is neutral in terms of performance, I wouldn't probably do this, as it feels pretty natural to me to have symbols have names, even though we do not use them often in the linker. It is also useful for debugging and easy to understand.

In addition to that, this change seems to make it hard to parallelize file parsing. Currently, most part of file parsing does not interact with other parts of the linker, so we can call them in parallel. However, if you pass Resolve to each parsing function, it is no longer possible.

ELF/InputFiles.h
83–84

I'd define three functions instead of dispatching three ways in a single function.

115

Is there any reason to use llvm::function_read instead of std::function?

If this is neutral in terms of performance, I wouldn't probably do this, as it feels pretty natural to me to have symbols have names, even though we do not use them often in the linker. It is also useful for debugging and easy to understand.

Fair enough.

In addition to that, this change seems to make it hard to parallelize file parsing. Currently, most part of file parsing does not interact with other parts of the linker, so we can call them in parallel. However, if you pass Resolve to each parsing function, it is no longer possible.

True. I did try keeping just the offset, but that requires most of the
changes in this patch anyway. Otherwise we compute the size of global
symbols multiple times.

Comment at: ELF/InputFiles.h:115
@@ -113,2 +114,3 @@

explicit ObjectFile(MemoryBufferRef M);
  • void parse(llvm::DenseSet<StringRef> &ComdatGroups);

+ void parse(llvm::function_ref<void(StringRef, SymbolBody *)> Resolve,

+ llvm::DenseSet<StringRef> &ComdatGroups);

Is there any reason to use llvm::function_read instead of std::function?

It is supposed to be faster for when we don't need to store it
somewhere. I confess I don't know the details.

Cheers,
Rafael

espindola commandeered this revision.Mar 15 2018, 8:58 AM
espindola added a reviewer: rafael.