This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Change the way of sorting local symbols.
ClosedPublic

Authored by grimar on Apr 11 2018, 7:47 AM.

Details

Summary

rLLD329787 added the stable sorting to SymbolTableBaseSection::postThunkContents.

I profiled the Mozilla (response-O0.txt) from lld-speed-test package and found
std::stable_sort is showing up in profile results and consuming the 3.1% of the total
CPU time in the RelWithDebug build. Total time of postThunkContents is 3.54%, 238ms.

This change reduces postTimeContents time to 50ms, making it to take 0.73% of Total CPU time.

Instead of sorting the local part I suggest to just rebuild it.

Diff Detail

Repository
rLLD LLVM Linker

Event Timeline

grimar created this revision.Apr 11 2018, 7:47 AM
espindola accepted this revision.Apr 11 2018, 5:09 PM

Add a comment on why we don't use stable_sort.

LGTM with that.

This revision is now accepted and ready to land.Apr 11 2018, 5:09 PM
ruiu added a comment.Apr 11 2018, 5:10 PM

Strictly speaking, this is a violation of the rule that we shouldn't use a hash table for symbols other than the Symbol Table. Is there any way to avoid using a new hash table?

ruiu added a comment.Apr 11 2018, 5:12 PM

I mean just sorting a symbol shouldn't take 0.73% of the total execution time.

ruiu added a comment.Apr 11 2018, 5:17 PM

George, can you try to not use hash table like this in the future? Both this patch and your previous patch use a hash table to store a lot of symbols, which is a violation of the rule I set to achieve a good performance. If it is unavoidable, please explain in the commit message so that we can discuss and examine if it is really unavoidable. Just using a hash table without an explanation isn't good.

George, can you try to not use hash table like this in the future? Both this patch and your previous patch use a hash table to store a lot of symbols, which is a violation of the rule I set to achieve a good performance. If it is unavoidable, please explain in the commit message so that we can discuss and examine if it is really unavoidable. Just using a hash table without an explanation isn't good.

Note that this hash in on InputFiles, not symbols.

ruiu added a comment.Apr 11 2018, 5:23 PM

Yeah, but the it looks up the hash table N times where N is the number of local symbols, no?

Yeah, but the it looks up the hash table N times where N is the number of local symbols, no?

It does.

Since the order of two files is arbitrary, how about just giving each InputFile a number when it is created ?

ruiu added a comment.Apr 11 2018, 5:43 PM

Since the order of two files is arbitrary, how about just giving each InputFile a number when it is created ?

That's exactly what I was thinking. :) I'll send you guys a patch.

grimar added a comment.EditedApr 12 2018, 1:51 AM

Since the order of two files is arbitrary, how about just giving each InputFile a number when it is created ?

That's exactly what I was thinking. :) I'll send you guys a patch.

That will not work. First what I tried was: I changed the predicate from return FileIDs[L.Sym->File] < FileIDs[R.Sym->File]; to return L.Sym->File < R.Sym->File; (just to look how the lookup in the hash table affects the perfomance).
It decreased CPU time by about 1%, but it still was about 2%. After that this patch born.

Can we land it stand alone? It still brings a noticeable speed-up even without D45548.

ruiu accepted this revision.Jun 26 2018, 1:44 AM

LGTM

Thanks, Rui!

This revision was automatically updated to reflect the committed changes.