This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Speedup MergeInputSection::splitStrings
AbandonedPublic

Authored by grimar on Apr 12 2018, 8:12 AM.

Details

Summary

This is for https://bugs.llvm.org//show_bug.cgi?id=37029,
which proposed experimenting with hash_value for splitting strings.

For short strings, hash_value falls back to hash_short:
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/ADT/Hashing.h#L453

I think we can use it instead of xxHash64 for `splitStrings`, as it returns a
uint64_t and seems to show really good results.
It computes the hash of only a part of the string, but that seems to be OK here.
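
As a rough illustration (not the actual diff), the change boils down to swapping the hash
function used for each piece; hashPiece below is a hypothetical helper, while the real
patch computes this inline in splitStrings:

#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringRef.h"
#include <cstdint>

// Hash one string piece with llvm::hash_value instead of xxHash64.
// hash_value(StringRef) dispatches to hash_short for short inputs, and the
// resulting hash_code converts to an integer value.
static uint64_t hashPiece(llvm::StringRef Piece) {
  return static_cast<uint64_t>(llvm::hash_value(Piece));
}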

I used Scylla to profile the changes and benchmarked a few algorithms.
splitNonStrings did not show up in the profile,
so I experimented only with changing splitStrings. Results are below:

                                                 CPU(%)    CPU(ms)
* Default (xxHash64):
- lld.exe                                       100.00%    4254
 + lld::elf::MergeInputSection::splitStrings     21.86%     930

* With use of hash_value:
- lld.exe                                       100.00%    4001
 + lld::elf::MergeInputSection::splitStrings     18.25%     730

* With use of hashGnu:
- lld.exe                                       100.00%    4469
 + lld::elf::MergeInputSection::splitStrings     25.60%    1144

* With use of hashSysV:
- lld.exe (PID: 5716)                           100.00%    5080
 + lld::elf::MergeInputSection::splitStrings     33.40%    1711

* This patch:
- lld.exe (PID: 9192)                           100.00%    3866
 + lld::elf::MergeInputSection::splitStrings     13.24%     512

So this change improves total CPU time by about 10% (from 4254 ms to 3866 ms) for Scylla
and makes splitStrings about 80% faster (from 930 ms to 512 ms).
(Note that this is the time the profiler shows; I have not yet tried to benchmark it
in a regular way.)

The seed value was taken from:
https://github.com/llvm-mirror/llvm/blob/master/include/llvm/ADT/Hashing.h#L328
It is equal to the default seed used by hash_value.

Diff Detail

Event Timeline

grimar created this revision.Apr 12 2018, 8:12 AM
grimar edited the summary of this revision.Apr 12 2018, 8:12 AM

Interesting. The patch by itself seems fine. I will benchmark it locally.

espindola added inline comments.Apr 12 2018, 11:21 AM
ELF/InputSection.cpp
940

For consistency we should probably also replace this use of xxHash64.

grimar added inline comments.Apr 12 2018, 12:14 PM
ELF/InputSection.cpp
940

I just did not find a good test case for that place. I tried making that change, but for Scylla there was no difference, so I decided not to do it in this patch to keep the focus on the main place.

I just noticed that hash_short will read at most 64 bytes of the string.

This could cause a pathological case if many symbols have a common prefix, no?

One interesting thing about the current setup is that we first read the bytes in the string looking for the 0 that terminates the string. We then read them again to hash them. At least with a simple hash like djb (what is implemented in hashGnu) it should not be too hard to read each byte once. Would you mind giving that a try?

ELF/InputSection.cpp
940

I don't think there is ever a case where a non-SHF_STRINGS SHF_MERGE section shows up in the profile.

ruiu added inline comments.Apr 12 2018, 1:45 PM
ELF/InputSection.cpp
866–867

This is interesting, but if this is effective, we should do that in StringRef::hash so that it speeds up everyone's code, shouldn't we?

ruiu added a comment.Apr 12 2018, 1:46 PM

What is your operating system and CPU?

I just noticed that hash_short will read at most 64 bytes of the string.

This could cause a pathological case if many symbols have a common prefix, no?

Sure, we will have many hash collisions then, but how common is that in practice?

I see at least two solutions for that which look good to me (a sketch of the first option follows this list):

  1. Use hash_short for short strings only (< 64 bytes) and fall back to something else otherwise,
     though I would give the current approach a chance first.

  2. Alternatively, we could pass 64 bytes taken from the middle of the string, assuming it covers
     the prefix, the data itself and the suffix.
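
A rough sketch of the first option, under the assumption that hash_short stops being used
around 64 bytes (the cutoff and the helper name are illustrative, not part of the patch):

#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/xxhash.h"
#include <cstdint>

// Use hash_value (which is cheap for short strings) below a cutoff and fall
// back to xxHash64 for longer strings.
static uint64_t hashPiece(llvm::StringRef Piece) {
  if (Piece.size() < 64)
    return static_cast<uint64_t>(llvm::hash_value(Piece));
  return llvm::xxHash64(Piece);
}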

One interesting thing about the current setup is that we first read the bytes in the string looking for the 0 that terminates the string. We then read them again to hash them. At least with a simple hash like djb (what is implemented in hashGnu) it should not be too hard to read each byte once. Would you mind giving that a try?

Sure, I'll try.

What is your operating system and CPU?

Windows 8, i7-4970K @ 4.00 GHz, 32 GB RAM.

One interesting thing about the current setup is that we first read the bytes in the string looking for the 0 that terminates the string. We then read them again to hash them. At least with a simple hash like djb (what is implemented in hashGnu) it should not be too hard to read each byte once. Would you mind giving that a try?

Not much difference for me:

Function Name                                    Total CPU(%)  Total CPU(ms)

* After the change:
- lld.exe (PID: 15032)                              100.00%       4166
 + lld::elf::MergeInputSection::splitIntoPieces      22.40%        933

* Default (xxHash64):
- lld.exe                                           100.00%       4254
 + lld::elf::MergeInputSection::splitStrings         21.86%        930

My test code was:

static size_t findNull(StringRef S, size_t EntSize, uint32_t& H) {
  uint32_t Hash = 5381;

  // Optimize the common case.
  if (EntSize == 1) {
    const char *Data = S.data();
    while (uint8_t C = *Data++)
      Hash = (Hash << 5) + Hash + C;
    H = Hash;
    return Data - S.data();
  }

  llvm_unreachable("scylla?");

  for (unsigned I = 0, N = S.size(); I != N; I += EntSize) {
    const char *B = S.begin() + I;
    if (std::all_of(B, B + EntSize, [](char C) { return C == 0; }))
      return I;
  }
}


// Split SHF_STRINGS section. Such section is a sequence of
// null-terminated strings.
void MergeInputSection::splitStrings(ArrayRef<uint8_t> Data, size_t EntSize) {
  size_t Off = 0;
  bool IsAlloc = Flags & SHF_ALLOC;
  StringRef S = toStringRef(Data);

  while (!S.empty()) {
    uint32_t Hash;
    size_t Size = findNull(S, EntSize, Hash);

    Pieces.emplace_back(Off, Hash, !IsAlloc);
    S = S.substr(Size);
    Off += Size;
  }
}
grimar updated this revision to Diff 142377.Apr 13 2018, 4:25 AM
  • Removed excessive change.
ELF/InputSection.cpp
866–867

Hmm. I thought it had a minor but positive effect with this change.
But today I re-profiled exactly this place using more iterations, and it seems that was a measurement error.
So I removed this part.

OK,

Not much difference for me:

Function Name                                    Total CPU(%)  Total CPU(ms)

* After the change:
- lld.exe (PID: 15032)                              100.00%       4166
 + lld::elf::MergeInputSection::splitIntoPieces      22.40%        933

* Default (xxHash64):
- lld.exe                                           100.00%       4254
 + lld::elf::MergeInputSection::splitStrings         21.86%        930

In the description you report that just using hashGnu is 4469, so some reasonable hypotheses so far:

  • Reading the string only once is a good improvement.
  • Reading a byte at a time is a noticeable loss.

So it would probably be ideal to combine some of the memchr tricks for reading multiple bytes at a time with a simple-ish hash that can combine more than one char at a time.

It should also be possible to template Hash.h over the returned type so that some clients can explicitly request a 32 or 64 bit hash. Not sure if that change would be accepted.

It should also be possible to template Hash.h over the returned type so that some clients can explicitly request a 32 or 64 bit hash. Not sure if that change would be accepted.

Do you know why it has to use size_t, btw? Given that hash_value falls back to hash_short, which returns uint64_t, I wonder if that was an intentional design decision or whether it can be changed to always use uint64_t.

I will also be happy to experiment with your suggestions. I will only be able to start a bit later though (I am away during the next week because of llvm-euro).

It should also be possible to template Hash.h over the returned type so that some clients can explicitly request a 32 or 64 bit hash. Not sure if that change would be accepted.

Do you know why it has to use size_t, btw? Given that hash_value falls back to hash_short, which returns uint64_t, I wonder if that was an intentional design decision or whether it can be changed to always use uint64_t.

I think part of the reason is the desire to make the results unstable. The hashing API can return different results on different machines. If there is a hash algorithm that works better with AVX, that algorithm can be used if the host has AVX.

What we need is some form of signature/digest. We can change it as lld evolves, but it has to produce the same result on every host.

ruiu added a comment.Apr 18 2018, 1:18 PM

I believe a fast strlen() can be implemented using SSE instructions. And if you are using SSE instructions, your data is loaded into XMM registers. I believe there exists a fast vectorized hash function that works on data in an XMM register. I wonder if we can combine the two into a single fast function that returns the length of a string as well as its hash value.

I believe a fast strlen() can be implemented using SSE instructions. And if you are using SSE instructions, your data is loaded into XMM registers. I believe there exists a fast vectorized hash function that works on data in an XMM register. I wonder if we can combine the two into a single fast function that returns the length of a string as well as its hash value.

Probably, but I would suggest not going that far in the first patch. Also note that we can use memchr, which is a bit easier than strlen. The hash_value in LLVM uses 64 bits at a time. Given that a byte-at-a-time djb hash was already a small speedup, using 64 bits at a time for a combined memchr and hash should be very helpful.
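
A minimal sketch of that combined scan-and-hash direction, shown here with 4-byte words
(the same bit trick extends to 8-byte words); the helper name and details are illustrative,
not the actual patch:

#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/ErrorHandling.h"
#include <cstdint>

// Scan the string a word at a time while hashing it. The expression
// (Word - 0x01010101) & ~Word & 0x80808080 is nonzero exactly when some byte
// of Word is zero, so we can advance 4 bytes whenever no terminator is seen.
static size_t scanAndHash(llvm::StringRef S, uint32_t &HashOut) {
  const char *Data = S.data();
  const char *const End = Data + S.size();
  uint32_t H = 5381;

  while (End - Data >= 4) {
    uint32_t Word = llvm::support::endian::read32le(Data);
    if ((Word - 0x01010101) & ~Word & 0x80808080)
      break; // The word contains a null byte; find its exact position below.
    H = H * 33 + Word;
    Data += 4;
  }

  // Finish byte by byte to locate the null terminator precisely.
  while (Data != End) {
    H = H * 33 + (uint8_t)*Data;
    if (!*Data) {
      HashOut = H;
      return Data - S.data() + 1; // Size including the null byte.
    }
    ++Data;
  }

  llvm_unreachable("string is not null terminated");
}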

grimar planned changes to this revision.Apr 23 2018, 8:44 AM
grimar updated this revision to Diff 143758.EditedApr 24 2018, 9:07 AM
  • Reimplemented. With that implementation, timings for Scylla are:

This patch:
- lld.exe                                       100.00%    3944
 + lld::elf::MergeInputSection::splitStrings     10.75%     424

LLD head:
- lld.exe                                       100.00%    4234
 + lld::elf::MergeInputSection::splitStrings     21.82%     924
espindola added inline comments.Apr 24 2018, 11:07 AM
ELF/InputSection.cpp
886

This will produce different results on a big endian host, no?

894

I don't know enough about hashing to judge if this is a reasonable extension of the djb hash for using 4 bytes at a time, but we can probably start with it.

grimar updated this revision to Diff 143910.Apr 25 2018, 5:32 AM
  • Addressed comments.
ELF/InputSection.cpp
886

You are right.

To solve this, I tried to rewrite the hashing code right below to read byte by byte in a loop, but that hurts the performance too much.

Then I tried both read32be() and read32le() (my host is LE). The average time for them was the same, with no difference from *reinterpret_cast<const uint32_t *>, so it seems we can use them.
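
For reference, a small sketch of the three loads being compared (assuming the helpers in
llvm/Support/Endian.h; function names here are illustrative):

#include "llvm/Support/Endian.h"
#include <cstdint>

// Raw load: host-endian result, and formally an unaligned access.
static uint32_t loadRaw(const char *P) {
  return *reinterpret_cast<const uint32_t *>(P);
}
// Endian-aware loads: a well-defined result on both LE and BE hosts,
// and they handle unaligned pointers.
static uint32_t loadLE(const char *P) { return llvm::support::endian::read32le(P); }
static uint32_t loadBE(const char *P) { return llvm::support::endian::read32be(P); }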

894

I experimented here; any bad hashing increases hash collisions, which instantly shows up in the profile. My approach seems to work well, so I think it is OK. It is simpler than taking single bytes, and the byte-by-byte loop was also slower for me.

grimar planned changes to this revision.Apr 25 2018, 10:04 AM
grimar updated this revision to Diff 143953.Apr 25 2018, 10:12 AM
  • Fixed last minute issue (forgot to rename read32le to generic read32).
espindola added inline comments.Apr 26 2018, 7:19 PM
ELF/InputSection.cpp
877

I think I just realized a problem with this. If you have a string that is more than 4 bytes long, its hash value depends on its address, no? For example, if the string foobarzed starts at a 4-byte-aligned position, we will hash it as

foob arze d\0

but if it is offset by one byte, it will be hashed as

f ooba rzed \0

You can probably just delete the initial alignment loop. This will produce slow code on CPUs that don't support fast unaligned loads, but we are already very slow in those cases.

grimar added inline comments.Apr 27 2018, 6:25 AM
ELF/InputSection.cpp
877

Oh, nice catch!

I removed the loop and, interestingly, it sped up the timings for me even more!
Time went from ~3968 to ~3704 (total LLD CPU time), and from ~443 to ~415 for splitStrings.
It seems this change probably produces a better hash. I'll update the patch.

grimar updated this revision to Diff 144326.Apr 27 2018, 6:27 AM
  • Addressed review comment.

I think this is fine. I will run the benchmarks locally to confirm.

ELF/InputSection.cpp
892

It should be OK to return a std::pair, no?

You have to update

lld :: ELF/compressed-debug-input.s                 
lld :: ELF/relocatable-compressed-input.s

I guess you don't have zlib installed.

You have to update

lld :: ELF/compressed-debug-input.s                 
lld :: ELF/relocatable-compressed-input.s

I guess you don't have zlib installed.

I'll check, thanks. I believe zlib is still generally not enabled/supported in Windows LLVM configurations.

ELF/InputSection.cpp
892

I tried, but it affected the performance for me. I can probably add a Hash reference out-argument instead.

ruiu added a comment.Apr 27 2018, 2:03 PM

For some reason I missed this thread. I'll review this, but it looks pretty promising!

ruiu added inline comments.Apr 27 2018, 2:13 PM
ELF/InputSection.cpp
872

I'd name this H.

875

sizeof(uint32_t) is always 4, so please write just 4.

880

Omit != 0 because it is always implied.

883

I don't think you need to update DataSize. You know the end position of S, so you can compare Data with it to see if you've reached the end of the string.

884

* 33 is perhaps better for brevity.

889

Likewise, maintaining both DataSize and Data doesn't seem to make sense to me.

896

You should be able to handle this case gracefully.

901

Split this entire function into two: one for '\0'-terminated strings and the other for multi-word strings. For readability, that's better than doing it inside the while loop.

914–916

This is very odd, and we should avoid returning two values in one word. Since findSizeAndHash is inlined, I don't think you need this.

The test results are interesting.

The geometric mean of seconds-elapsed improves by 0.5%. Scylla is just 0.3% faster.

Part of the reason is a 12% regression in the number of instructions for Scylla. Maybe that is because xxHash64 hashes 8 bytes at a time? Have you experimented with reading 8 bytes at a time?

ruiu added a comment.Apr 27 2018, 3:03 PM

Rafael,

If you have time, can you plug in my change and compare it in your environment? I didn't see any noticeable difference on my machine, but it might make one in other environments.

The test results are interesting.

The geometric mean of seconds-elapsed improves by 0.5%. Scylla is just 0.3% faster.

Strange, because I am observing a stable, major boost. I retested today just in case.
Total CPU time, 3 runs, MSVS profiler, RelWithDebInfo configuration, linking \lld-speed-test\scylla.
Windows 8, i7-4970K @ 4.00 GHz, 32 GB RAM.

Numbers:

  1. LLD (r331097): 4057ms, 4076ms, 4028ms
  2. This patch: 3537ms, 3511ms, 3531ms.

Part of the reason is a 12% regression in the number of instructions for Scylla. Maybe that is because xxHash64 hashes 8 bytes at a time? Have you experimented with reading 8 bytes at a time?

It was slower for me, but I think it is worth retesting. I am going to address the rest of the comments
so that it is easy to switch between reading 4 and 8 bytes at a time, and then I will retest the difference between the two.

grimar marked 8 inline comments as done.Apr 29 2018, 6:25 PM

Update:
Time of D46163 for me was: 3873, 3916, 3937.

grimar updated this revision to Diff 144499.Apr 29 2018, 6:31 PM
  • Addressed review comments.
grimar added a comment.EditedApr 29 2018, 6:35 PM

I tried reading both 4 and 8 bytes at a time. The code for 8 bytes was:

static inline size_t findSizeAndHash(StringRef S, uint64_t &Hash) {
  const char *Data = S.data();
  const char *const End = Data + S.size();

  // We calculate a simple hash based on the DJB hash below. The hash is
  // calculated at the same time as we read the string bytes, for speed.
  uint64_t H = 5381;

  // Load a word at a time and test if any of its bytes is a null byte.
  while (End - Data > 8) {
    uint64_t Word = read64(Data);
    // This checks if at least one byte of the word is a null byte.
    // If so, we break out of the loop and continue testing single
    // bytes to find the exact position of the null byte.
    if ((Word - 0x0101010101010101) & ~Word & 0x8080808080808080)
      break;
    Data += 8;
    H = H * 33 + Word;
  }

  // Now find the exact position of the null byte. Do not forget to
  // update the hash value too.
  while (End - Data) {
    H = H * 33 + *Data;
    if (!*Data) {
      Hash = H;
      return Data - S.data() + 1;
    }
    ++Data;
  }

  llvm_unreachable("string is not null terminated");
}

For me it generally works about 50 ms slower than reading 4 bytes at a time (the posted diff), though
the difference is sometimes so minor that I am inclined to think it may just be measurement error.

Do we want it?

grimar abandoned this revision.Aug 9 2020, 1:22 AM

It's too old.

It's too old.

This is still interesting. We probably should use int64_t instead of int32_t.