The test results are interesting
Apr 27 2018
You have to update
I think this is fine. I will run the benchmarks locally to confirm.
I really like the idea of changing the default, but maybe instead of an argument we should just have two functions:
Delete call to reserve.
I tried this patch and found that, at least for Chrome, we wasted memory by calling reserve() on ".data.rel.ro" sections. Such sections in Chrome don't add any items to Sec.Relocations even though their Rels.size() is relatively large, so the reserved memory is entirely wasted. If you don't call reserve() on such sections, you can save almost as much memory as this patch does. Can you take a look?
I think it is probably OK. What about adding an assert checking that we do decompress only for non-alloc sections?
Correct patch.
Use end() when inserting. Doesn't make a difference in here, but is the canonical way of concatenating vectors.
I wonder if you can make a guess on how many relocations will be inserted to the vector. We know the number of relocations for each input section, so calling reserve() might work.
I think reserve() is exactly what causes high peaks now. We already call it. See:
Apr 26 2018
It is possible that xxhash is just too slow for use in a hash table. The experiment I did for pr37029 using hash_combine was still using strlen.
Does your code make this part of the code faster? It reduces memory allocation but does an extra memcpy, so I'm wondering.
The MC and Object parts of this patch LGTM.
Since this is mainly about debug info and I am not an expert in the area, please get one more LGTM before committing.
The new message is IMO slightly better, so I am OK with that patch.
But I am actually thinking about something like:
"Layout sections to place symbols in the order specified by symbol ordering file"
Yeah, that's also true. Slightly unrelated to this change, but --warn-symbol-ordering is silent right now in case you forget -ffunction-sections (but of course the ordering doesn't work). Would it be reasonable to extend the warnings to cover that case? I'm thinking, for each symbol specified in the symbol ordering file, warn if that symbol's section also contains other symbols, though I don't know if that would lead to false positives and/or be reasonable cost-wise.
Maybe we can warn if two or more symbols specified by a symbol ordering file belong to one section? I'm not sure if that would cause false positives, but it might be worth trying.
I agree that an additional warning here might be nice. We would probably only want to warn once (or possibly some other low limit) per section, not per symbol, as otherwise if you forget -ffunction-sections, you will get far too many warnings.
Implement the remaining suggestions.
Rebased and addressed some of the review comments.
Apr 25 2018
Maybe we should remove the StringRefZ class and use a char pointer instead? The point of having StringRefZ is that an instance of StringRefZ is automatically converted to StringRef when needed, but I think we no longer use that feature after this patch.
Apr 24 2018
Actually, I misread the location of the call to demoteSharedSymbols. I now think this is the proper fix as it just simplifies the symbols before we start processing the relocations.
I am finally trying to reduce what went wrong with this patch.
Since this is using information inside a single fragment when producing assembly I am OK with it.
Apr 23 2018
Apr 20 2018
Apr 19 2018
I believe a fast strlen() can be implemented using SSE instructions. And if you are using SSE instructions, your data is loaded to XMM registers. I believe there exists a fast vectorized hash function that works on data on a XMM register. I wonder if we can combine the two to create a single fast function that returns the length of a string as well as its hash value.
Does --start-lib/--end-lib really imply --start-group/--end-group?
Have the OutputOffsets store just a uint64_t. This uses a bit of the hash for the live bit.
Apr 18 2018
Apr 16 2018
This is a plt by another name, no?
Do you know why it is defined to have another name?
I'm not sure what you mean by 'plt' here. In this ABI, the .plt section is just the array of addresses of external functions that the dynamic linker fills out at runtime. The .glink section holds the lazy-resolution stubs, which set up the environment for the dynamic linker to do so. Then there are also stubs for calling the external functions by loading their addresses out of the .plt. Am I right to assume that by 'plt' you mean a combination of the lazy resolver and the call stub?
LGTM with a few last requests.
This is causing LLD to drop __cxa_finalize from the symbol tables of some Chromium binaries, and that causes them to crash on shutdown. See the test failures here:
I'm going to revert this for now. I found that re-linking just libipc_mojom_shared.so and libmojo_mojom_bindings_shared.so with LLD after this change causes ipc_tests to crash reliably on shutdown.
It should also be possible to template Hash.h over the returned type so that some clients can explicitly request a 32 or 64 bit hash. Not sure if that change would be accepted.
Do you know why it has to use size_t, btw? Given that hash_value falls back to hash_short, which returns uint64_t, I wonder whether that was an intentional design decision or whether it could be changed to always use uint64_t.
LGTM with the sort predicate fixed.
Apr 13 2018
Not much difference for me (Total CPU %, Total CPU ms):

After the change:
- lld.exe (PID: 15032): 100.00, 4166
  - lld::elf::MergeInputSection::splitIntoPieces: 22.40, 933

Default (xxHash64):
- lld.exe: 100.00%, 4254
  - lld::elf::MergeInputSection::splitStrings: 21.86%, 930
Apr 12 2018
LGTM with a small change.
The results I got:
My idea was actually to combine both patches.
I just noticed that hash_short will read at most 64 bytes of the string.
Interesting. The patch by itself seems fine. I will benchmark it locally.
Would never having a null InputFile be sufficient?
Like assigning a dummy ObjFile or something to synthetic symbols?