This is an archive of the discontinued LLVM Phabricator instance.

Parallelize ICF to make LLD's ICF really fast.
ClosedPublic

Authored by ruiu on Nov 29 2016, 8:12 PM.

Details

Summary

ICF is short for Identical Code Folding. It is a size optimization that identifies two or more functions that happen to have the same contents and merges them. It usually reduces output size by a few percent.

ICF is slow because it is a computationally intensive process. I tried to parallelize it before but failed because I couldn't make a parallelized version produce consistent outputs. Although it didn't create broken executables, every invocation of the linker generated slightly different output, and I couldn't figure out why.

I think I now understand what was going on, and I also came up with a simple algorithm to fix it. Hence this patch.

The result is very exciting. Chromium, for example, has 780,662 input sections, of which 20,774 are reducible by ICF. LLD previously took 7.980 seconds for ICF. Now it finishes in 1.065 seconds.

As a result, LLD can now link a Chromium binary (output size 1.59 GB) in 10.28 seconds on my machine with ICF enabled. Compared to gold, which takes 40.94 seconds to do the same thing, this is an amazing number.

From here, I'll describe what we are doing for ICF, what the previous problem was, and what I did in this patch.

In ICF, two sections are considered identical if they have the same section flags, section data, and relocations. Relocations are tricky, because two relocations are considered the same if they have the same relocation type and values, and if they point to the same section _in terms of ICF_.

Here is an example. If foo and bar defined below are compiled to the same machine instructions, ICF can (and should) merge the two, although their relocations point to each other.

void foo() { bar(); }
void bar() { foo(); }

This is not an easy problem to solve.

What we are doing in LLD is some sort of coloring algorithm. We repeatedly color non-identical sections with different colors, and sections that still share a color when the algorithm terminates are considered identical. Here are the details (a rough code sketch follows the list):

  1. First, we color all sections using hash values computed from their section types, section contents, and numbers of relocations. At this point, relocation targets are not taken into account; we just give sections that apparently differ different colors.
  2. Next, for each color C, we visit the sections having color C to see if their relocations are the same. Relocations are considered equal if their targets have the same color. We then recolor sections that have different relocation targets with new colors.
  3. If we recolor some sections in step 2, relocations that were previously pointing to same-colored targets may now be pointing to differently-colored ones. Therefore, we repeat step 2 until convergence is reached.
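
To make the steps above concrete, here is a minimal sketch of the refinement loop (not LLD's actual code, which lives in ELF/ICF.cpp); Section, relocTargetsMatch, and refineColors are hypothetical names, and a real implementation compares actual section contents rather than trusting the step-1 hash alone.

#include <algorithm>
#include <cstdint>
#include <vector>

struct Section {
  uint32_t Color = 0;                  // equivalence class, seeded from the step-1 hash
  std::vector<Section *> RelocTargets; // sections this section's relocations point at
};

// Step 2: two same-colored sections stay together only if their relocation
// targets have pairwise-equal colors (type/addend checks omitted here).
static bool relocTargetsMatch(const Section *A, const Section *B) {
  return A->RelocTargets.size() == B->RelocTargets.size() &&
         std::equal(A->RelocTargets.begin(), A->RelocTargets.end(),
                    B->RelocTargets.begin(),
                    [](Section *X, Section *Y) { return X->Color == Y->Color; });
}

// Step 3: keep splitting color classes until no section changes color.
// NextColor is assumed to start above every color handed out in step 1.
static void refineColors(std::vector<Section *> &Sections, uint32_t NextColor) {
  for (bool Changed = true; Changed;) {
    Changed = false;
    // Keep same-colored sections adjacent so each class is a contiguous range.
    std::sort(Sections.begin(), Sections.end(),
              [](Section *A, Section *B) { return A->Color < B->Color; });
    for (size_t Begin = 0; Begin != Sections.size();) {
      size_t End = Begin + 1;
      while (End != Sections.size() &&
             Sections[End]->Color == Sections[Begin]->Color)
        ++End;
      // Move sections whose relocation targets differ from the class head into
      // a fresh color; differences among them are resolved in later passes.
      bool Split = false;
      for (size_t I = Begin + 1; I != End; ++I)
        if (!relocTargetsMatch(Sections[Begin], Sections[I])) {
          Sections[I]->Color = NextColor;
          Split = Changed = true;
        }
      NextColor += Split;
      Begin = End;
    }
  }
}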

Step 2 is a heavy operation. For Chromium, the first iteration of step 2 takes 2.882 seconds, the second takes 1.038 seconds, and in total it needs 23 iterations.

Parallelizing step 1 is easy because we can color each section independently. This patch does that.

Parallelizing step 2 is tricky. We could work on each color independently, but we cannot recolor sections in place, because that would break the invariant that two possibly-identical sections must have the same color at any moment.

Consider sections S1, S2, S3 and S4 in the same color C, where S1 and S2 are identical, S3 and S4 are identical, but S2 and S4 are not. Thread A is about to recolor S1 and S2 as C'. After thread A recolors S1 as C', but before it recolors S2, another thread B might observe S1 and S2. Thread B would then conclude that S1 and S2 are different, and it would wrongly split the sections it is working on into smaller groups. Over-splitting doesn't produce broken results, but it loses a chance to merge some identical sections. That was the cause of the nondeterminism.

To fix the problem, I made sections carry two colors, namely a current color and a next color. At the beginning of each iteration, both colors are the same. Each thread reads from the current color and writes to the next color. In this way, threads never read partial results. After each iteration, we flip current and next.
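
Here is a minimal sketch of this double-buffering idea, assuming a pass counter whose parity selects the current slot (the helper names below are illustrative, not the patch's actual code):

#include <cstdint>
#include <vector>

struct Section {
  uint32_t GroupId[2] = {0, 0};        // current and next color
  std::vector<Section *> RelocTargets;
};

// Cnt counts completed passes; its parity selects the "current" slot.
inline uint32_t currentColor(const Section *S, int Cnt) {
  return S->GroupId[Cnt % 2];
}
inline void setNextColor(Section *S, int Cnt, uint32_t Color) {
  S->GroupId[(Cnt + 1) % 2] = Color;
}

// Before a pass, copy "current" into "next" so sections that are not recolored
// keep their color after the flip.
inline void seedNextColors(std::vector<Section *> &Sections, int Cnt) {
  for (Section *S : Sections)
    S->GroupId[(Cnt + 1) % 2] = S->GroupId[Cnt % 2];
}

// Worker threads only call currentColor() to read and setNextColor() to write,
// so no thread can observe a half-updated color class. Once every thread has
// finished the pass, incrementing Cnt flips current and next.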

This is a very simple solution and is implemented in less than 50 lines of code.

I tested this patch with Chromium and confirmed that the parallelized ICF produces output identical to the non-parallelized one.

Event Timeline

ruiu updated this revision to Diff 79699.Nov 29 2016, 8:12 PM
ruiu retitled this revision from to Parallelize ICF to make LLD's ICF really fast..
ruiu updated this object.
ruiu added reviewers: silvas, rafael.
ruiu added a subscriber: llvm-commits.
emaste added a subscriber: emaste.Nov 29 2016, 8:19 PM
ruiu updated this revision to Diff 79705.Nov 29 2016, 10:34 PM
  • Fixed a correctness issue: when we flip GroupId[0] and GroupId[1], we need to copy the new colors from the old space to the new space.
  • Handle !Config->Thread case.
ruiu updated this object.Nov 29 2016, 10:35 PM
silvas edited edge metadata.Nov 29 2016, 11:36 PM

Nice! The idea of storing current and next colors solves the nondeterminism in a very simple way!

One concern: this increases sizeof(InputSection) :(
With -ffunction-sections -fdata-sections we might have a very large number of them, so reducing memory footprint is important. I'm afraid that this might slow down regular links.
After this patch, we will be paying one pointer's worth of size for DependentSection and two for GroupId. That is three pointers of memory overhead, which is quite nontrivial.
It doesn't have to be done in this patch, but I think we can adjust the memory allocation to optionally allocate this data "off the tail" of the InputSection, so that we don't pay the memory overhead if these advanced features aren't being used.
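
As a rough illustration of the idea (the names below are made up, and LLVM's TrailingObjects helper would be a more principled way to do this), something like:

#include <cstddef>
#include <cstdint>
#include <new>

struct IcfData { uint32_t GroupId[2]; };

struct SectionBase {
  uint64_t Flags = 0; // stand-in for the regular members

  // Reserve room for IcfData right after the object only when ICF is on.
  static SectionBase *create(bool NeedIcf) {
    size_t Size = sizeof(SectionBase) + (NeedIcf ? sizeof(IcfData) : 0);
    void *Mem = ::operator new(Size);
    auto *Sec = new (Mem) SectionBase();
    if (NeedIcf)
      new (Sec + 1) IcfData{}; // fine while alignof(SectionBase) >= alignof(IcfData)
    return Sec;
  }

  // Only valid if the object was created with NeedIcf = true.
  IcfData *icfData() { return reinterpret_cast<IcfData *>(this + 1); }
};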

Is the large number of iterations (23) due to the slowness of propagating identicalness across references (currently one level of references per iteration), or due to ICF<ELFT>::segregate only being able to split into two at each iteration? (Or a combination of both?) Here are two ideas for reducing the number of iterations:

  1. Do some sort of topological sorting (even approximate) and then do partial iterations which sort only part of the array (more generally, avoid revisiting sections that are unlikely to change this iteration). This can speed up convergence since we avoid wasting work on nodes that won't change.

One interesting observation is that if the array is topologically sorted (ignoring cycles), then I believe a serial visitation with relaxation at each step (which cannot be parallelized deterministically) would be guaranteed to resolve in a single iteration. The savings from reducing iterations might pay off.
Note that --gc-sections already has to compute some of this, so this topological ordering information might not be so expensive.

  2. Make the "equal" comparison actually be a "less". That will allow ICF<ELFT>::segregate to sort instead of partition, which allows it to generate multiple ranges at a time.
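
As a rough sketch of what a "less"-based segregate could look like (the fields and the lessBy comparator below are made up; a real ordering would have to cover everything the equality check covers, and NextColor is assumed to start above every color already in use):

#include <algorithm>
#include <cstdint>
#include <vector>

struct Section {
  uint32_t Color = 0;
  uint64_t ContentHash = 0;           // stand-in for flags/data/reloc count
  std::vector<uint32_t> TargetColors; // current colors of relocation targets
};

// Strict weak order: any two sections that ICF must keep apart compare unequal.
static bool lessBy(const Section *A, const Section *B) {
  if (A->ContentHash != B->ContentHash)
    return A->ContentHash < B->ContentHash;
  return A->TargetColors < B->TargetColors; // lexicographic
}

// Split one color class into one new class per run of equal sections, using a
// single sort instead of repeated two-way partitions.
static void segregateBySort(std::vector<Section *> &Class, uint32_t &NextColor) {
  std::sort(Class.begin(), Class.end(), lessBy);
  for (size_t I = 1; I < Class.size(); ++I)
    Class[I]->Color = lessBy(Class[I - 1], Class[I]) ? ++NextColor
                                                     : Class[I - 1]->Color;
}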

What we are doing in LLD is some sort of coloring algorithm

Believe it or not, once I started learning about GVN I learned that this algorithm is actually a textbook example of an "optimistic" GVN algorithm. So it is actually a well-studied kind of algorithm.

ELF/InputSection.h
292

Would

struct {
  uint64_t Current;
  uint64_t Next;
} GroupId;

be a bit better?

ruiu updated this revision to Diff 79777.Nov 30 2016, 10:09 AM
ruiu edited edge metadata.
  • Change the type of GroupId from uint64_t to uint32_t to keep the original object size.
ruiu added a comment.Nov 30 2016, 10:32 AM

Is the large number of iterations (23) due to the slowness of propagating identicalness across references (currently one level of references per iteration), or due to ICF<ELFT>::segregate only being able to split into two at each iteration? (Or a combination of both?) Here are two ideas for reducing the number of iterations:

Very good question. A single-threaded iteration with a single GroupId for each InputSection took 11 iterations, so this patch slows down single-threaded execution. The main loop gets faster as we have fewer input sections to mutate, but even an empty iteration still takes ~100ms. I made a change to the code to optimize for that case. (I could optimize it even more if I added more code for the single-threaded case, but I think in practice this should be enough.)
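
For reference, one plausible shape of such a single-thread special case (not necessarily what this patch does) is to collapse the two slots when threading is off:

#include <cstdint>

struct Section { uint32_t GroupId[2] = {0, 0}; };

// With threads, read slot Cnt % 2 and write slot (Cnt + 1) % 2; without
// threads it is safe to write back into the slot being read, so recolorings
// become visible within the same pass and convergence needs fewer iterations.
inline uint32_t &colorSlotToWrite(Section &S, int Cnt, bool Threads) {
  return S.GroupId[Threads ? (Cnt + 1) % 2 : Cnt % 2];
}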

Do some sort of topological sorting (even approximate) and then do partial iterations which sort only part of the array (more generally, avoid revisiting sections that are unlikely to change this iteration). This can speed up convergence since we avoid wasting work on nodes that won't change.

Interesting idea. I could imagine that we might be able to compute strongly connected components and toposort them to do some sort of optimization. I wouldn't do that in this patch, but that's probably worth trying.

make the "equal" comparison actually be a "less". That will allow ICF<ELFT>::segregate to sort instead of partition, which allows it to generate multiple ranges at a time.

This is a very interesting idea too. I'll try to do that after submitting this patch.

ELF/InputSection.h
292

I'm using GroupId[0] and GroupId[1] alternately, so I can't make that change.

ruiu updated this revision to Diff 79780.Nov 30 2016, 10:32 AM
  • Optimize for single-thread case.
silvas accepted this revision.Dec 1 2016, 12:16 AM
silvas edited edge metadata.

This LGTM. Thanks for looking so closely at this! It's a very nice speedup!

Do some sort of topological sorting (even approximate) and then do partial iterations which sort only part of the array (more generally, avoid revisiting sections that are unlikely to change this iteration). This can speed up convergence since we avoid wasting work on nodes that won't change.

Interesting idea. I could imagine that we might be able to compute strongly connected components and toposort them to do some sort of optimization. I wouldn't do that in this patch, but that's probably worth trying.

We may not even need SCCs, as cycles are relatively rare. Just ordering by a simple DFS/BFS may give most of the benefit. So we may be able to get this "for free" from the work we do for --gc-sections.

(As far as parallelizing graph traversals goes, parallel graph algorithms have actually been studied quite a bit, although often in a distributed setting.
Some that I remember seeing are:
https://github.com/MicrosoftResearch/Naiad/blob/release_0.5/Examples/DifferentialDataflow/ConnectedComponents.cs
https://github.com/MicrosoftResearch/Naiad/blob/release_0.5/Examples/DifferentialDataflow/StronglyConnectedComponents.cs
https://github.com/frankmcsherry/timely-dataflow/blob/master/examples/bfs.rs

Ideas from there might be useful inspiration for any parallel graph processing we have to do.
)

This revision is now accepted and ready to land.Dec 1 2016, 12:16 AM
This revision was automatically updated to reflect the committed changes.