This is an archive of the discontinued LLVM Phabricator instance.

[lld][ELF] Add profile guided section layout
ClosedPublic

Authored by Bigcheese on Aug 4 2017, 9:13 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ruiu added inline comments.Sep 7 2017, 2:13 PM
ELF/Writer.cpp
930–936
for (BaseCommand *Base : Script->Opt.Commands)
  if (auto *OS = dyn_cast<OutputSection>(Base))
    if (OS->Name == ".text")
      OS->sort([&](InputSectionBase *IS) { return OrderMap.lookup(IS); });
pcc edited edge metadata.Sep 7 2017, 2:21 PM
In D36351#863961, @ruiu wrote:

How can I test this? I wonder how to create profiling data to feed this feature.

I have a local patch to AutoFDO that I use to create profiles. I can upload it somewhere if you'd like.

ruiu added a comment.Sep 7 2017, 2:25 PM

I have a local patch to AutoFDO that I use to create profiles. I can upload it somewhere if you'd like.

Thanks. But the paper says that they used perf command to get profiling data. Why did you have to use AutoFDO?

pcc added a comment.Sep 7 2017, 2:37 PM
In D36351#863966, @ruiu wrote:

I have a local patch to AutoFDO that I use to create profiles. I can upload it somewhere if you'd like.

Thanks. But the paper says that they used perf command to get profiling data. Why did you have to use AutoFDO?

As far as I know the missing piece is a tool that converts LBR data collected by perf into a weighted call graph. AutoFDO already has support for creating weighted call graphs from perf LBR data, so I just needed to teach it to output the graph in a different format.

ruiu added a comment.Sep 7 2017, 2:45 PM

I'd think I want to make lld able to directly parse a perf command output so that users can try this feature easily.

pcc added a comment.Sep 7 2017, 2:51 PM
In D36351#864001, @pcc wrote:
In D36351#863966, @ruiu wrote:

I have a local patch to AutoFDO that I use to create profiles. I can upload it somewhere if you'd like.

Thanks. But the paper says that they used perf command to get profiling data. Why did you have to use AutoFDO?

As far as I know the missing piece is a tool that converts LBR data collected by perf into a weighted call graph. AutoFDO already has support for creating weighted call graphs from perf LBR data, so I just needed to teach it to output the graph in a different format.

Here is my patched copy of autofdo: https://github.com/pcc/autofdo/tree/lld-cgprofile

Once you've built it, you can use it like this:

perf record --pfm-events=br_inst_retired:near_taken -b ./your_binary
/path/to/create_lld_profile -binary ./your_binary perf.data

Then the profile will be in cgprofile.txt.

Bigcheese updated this revision to Diff 114448.Sep 8 2017, 3:55 PM

Address review comments.

ruiu added a comment.Oct 3 2017, 6:43 PM

Could you send me a patch to produce a call graph file so that I can try this patch on my machine?

ELF/CallGraphSort.cpp
100–102

You can just return To < Other.To.

128

This loop is a bit too dense. It cannot be understood without reading each line carefully as I don't understand the whole picture. Please insert a blank line between code blocks. Adding more comment would help.

129

Please define local variables for C.first.first, C.first.second and C.second so that they are accessed through meaningful names.

131–136

auto -> auto *

148

nit: add {}

154

Please add a function comment as to what this function is intended to do. I do not understand this function because I don't get a whole picture.

159–164

Add blank lines before comments.

166–167

Please remove debug code.

179–181

This looks odd. Why do you need to do this? I think you can just leave it alone.

208–209

This might be understood for those who read the paper, but I don't think that is enough. Please write more comment as to what are clusters, what is density, and what is the heuristic.

273–274

You can do this in one line without defining a local variable.

ELF/Writer.cpp
930–932

I'd factor this code out as OutputSection *findOutputSection(StringRef Name).

davide added inline comments.Oct 24 2017, 9:35 PM
ELF/CallGraphSort.cpp
113

Simbol Bodies.

118

why std::map and not densemap?
if there's a reason, add a comment.

120

is the explicit return needed here?

131

auto * presumably?

187

Can you guarantee determinism of the output if two sort implementations break ties in different ways?

190

std::unique?

242

node indices (I don't think indexes is a word).

davide added inline comments.Oct 24 2017, 9:37 PM
ELF/CallGraphSort.cpp
113

*symbol bodies

zturner added a subscriber: zturner.Dec 8 2017, 5:56 PM

Not sure what the status of this is, but it would be realllly cool if we could get this in before the 6.0 release branch, which I think is only a few weeks away.

Bigcheese updated this revision to Diff 126894.Dec 13 2017, 9:03 PM

Fix priority queue bugs.

ruiu added a comment.Dec 14 2017, 4:47 PM

This is perhaps a fundamental question, but why do you compute the layout in the linker? It looks like you could create an external tool that computes a desired layout and outputs it as a symbol ordering file.

lld/ELF/CallGraphSort.cpp
62–95 ↗(On Diff #126894)

Since CallGraphSort class is local to this file anyway, I don't see a reason to make these classes inside CallGraphSort. You could move this outside of CallGraphSort (but still inside the anonymous namespace) to remove an extra level of namespace.

100 ↗(On Diff #126894)

ArrayRef is preferred over std::vector<T>&

106–107 ↗(On Diff #126894)

priority -> Priority

priority_lookup -> PriorityLookup

115–119 ↗(On Diff #126894)

Usually we write public members before private members.

124–125 ↗(On Diff #126894)

Sections is always empty, no? Why don't you do

CallGraphSort::Node::Node(const InputSectionBase *IS) : Sections(IS), Size(IS->getSize()) {}

?

162 ↗(On Diff #126894)

Since Config->CallGraphProfile is available here, you don't need to pass it as an argument.

217 ↗(On Diff #126894)

Use ArrayRef.

lld/ELF/Driver.cpp
581 ↗(On Diff #126894)

Remove this trailing empty comment line.

1059 ↗(On Diff #126894)

Directly use getLastArg(OPT_call_graph_profile_file) and remove Config->CallGraphProfileFile.

lld/ELF/Options.td
54–58 ↗(On Diff #126894)

I think these two options should be merged into one option which takes a filename as an argument and enables the feature.

llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
126–142 ↗(On Diff #126894)

Debug code?

llvm/lib/Transforms/Instrumentation/CFGProfile.cpp
96–99 ↗(On Diff #126894)

Indentation

llvm/tools/llvm-readobj/ELFDumper.cpp
3422 ↗(On Diff #126894)

Please use clang-format.

4032 ↗(On Diff #126894)

80 cols

Bigcheese marked 10 inline comments as done.Dec 14 2017, 5:51 PM

This needs to be implemented in the linker because of the line:

if (From.Size + To.Size > Target->PageSize)
  continue;

This is responsible for doubling the performance improvement on a game. Implementing this outside of the linker requires implementing most of the linker, including LTO.

lld/ELF/CallGraphSort.cpp
124–125 ↗(On Diff #126894)

Because Sections(IS) isn't valid?

162 ↗(On Diff #126894)

This class currently uses zero global state. I'd prefer not to add any for no gain.

217 ↗(On Diff #126894)

FE is used as a std::vector here. Replacing with ArrayRef would not work.

lld/ELF/Options.td
54–58 ↗(On Diff #126894)

A text file is not the only way to get input for sorting. The input can also come from object files.

Bigcheese updated this revision to Diff 127059.Dec 14 2017, 5:52 PM

Address review comments.

ruiu added a comment.Dec 14 2017, 6:01 PM

This needs to be implemented in the linker because of the line:

if (From.Size + To.Size > Target->PageSize)
  continue;

This is responsible for doubling the performance improvement on a game. Implementing this outside of the linker requires implementing most of the linker, including LTO.

And function sizes cannot be retrieved from the instrumented build (because instrumentation changes function size), so it must be computed at link-time. Is this correct?

ruiu added inline comments.Dec 14 2017, 6:04 PM
lld/ELF/Options.td
54–58 ↗(On Diff #126894)

Yeah, that actually confused me. How a compiler can add information related to profiling data to an object file being generated? I don't think I understand how the new .note section works.

Bigcheese added a comment.EditedDec 14 2017, 7:29 PM
In D36351#956212, @ruiu wrote:

This needs to be implemented in the linker because of the line:

if (From.Size + To.Size > Target->PageSize)
  continue;

This is responsible for doubling the performance improvement on a game. Implementing this outside of the linker requires implementing most of the linker, including LTO.

And function sizes cannot be retrieved from the instrumented build (because instrumentation changes function size), so it must be computed at link-time. Is this correct?

Yes.

Bigcheese added inline comments.Dec 14 2017, 7:47 PM
lld/ELF/Options.td
54–58 ↗(On Diff #126894)

When you do -fprofile-instr-use=<path> the compiler adds profile data from that file to the IR as metadat. The cgprofile pass uses that metadata to generate metadata for the call graph profile. It then writes that to the object file.

Bigcheese added inline comments.Dec 15 2017, 1:46 AM
lld/ELF/Options.td
54–58 ↗(On Diff #126894)

To clarify. Where the compiler reads -fprofile-instr-use=<path> is the only place the profile data can be read. It depends on the exact state of the IR at that point in translation. You can't just pass the profile data to the linker or some other tool and have it make sense of it. The cgprofile pass runs after all other IR level optimization. These other optimization passes keep the branch frequency info (BFI) metadata up to date with CFG and CG changes. The cgprofile pass uses BFI to compute the actual call graph profile. This has to be communicated to the linker somehow, so it gets written to the object file.

ruiu added a comment.Dec 15 2017, 1:02 PM

Then I wonder why you need to pass a profile data to the linker as a text file in the first place. Does the data embedded to an object file lack something, like inter-compilation-unit call frequency?

ruiu added inline comments.Dec 15 2017, 1:16 PM
lld/ELF/CallGraphSort.cpp
58–59 ↗(On Diff #127059)

nit: move this out of the anonymous namespace.

194–196 ↗(On Diff #127059)

This means that an input file could contain more than one line for the same from-to symbol pair. But does that actually happen?

ruiu added inline comments.Dec 15 2017, 1:37 PM
lld/ELF/CallGraphSort.cpp
208 ↗(On Diff #127059)

How can the graph have a self edge?

213–215 ↗(On Diff #127059)

This assertion doesn't make make sense to me -- if WorkLookup.find(CEI) == WorkLookup.end(), then std::find(WorkQueue.begin(), WorkQueue.end(), CEI) == WorkQueue.end() should be trivially true which can't be broken.

241 ↗(On Diff #127059)

We don't care about micro managing memory consumption in lld. If you need to erase elements not for saving memory but for other reason, please use erase().

248–249 ↗(On Diff #127059)

Can't this just be std::sort(FE.begin(), FE.end())?

257 ↗(On Diff #127059)

It seems you are not actually removing Result.

263–264 ↗(On Diff #127059)

Remove.

267–268 ↗(On Diff #127059)

Remove

279 ↗(On Diff #127059)

nit: please add {}

292 ↗(On Diff #127059)

Please expand this comment. In particular, please describe what WorkLookup and WorkQueue are expected to have. Looks like I can't understand the code without it.

In D36351#957139, @ruiu wrote:

Then I wonder why you need to pass a profile data to the linker as a text file in the first place. Does the data embedded to an object file lack something, like inter-compilation-unit call frequency?

The text file is a separate way to get the call graph profile into the linker. @rafael wanted to do that as a first pass to simplify the patch, but you wanted to see the full thing.

ruiu added a comment.Dec 15 2017, 6:28 PM

I wonder what was a motivation to implement two different ways to do one thing.

Bigcheese added inline comments.Dec 15 2017, 6:29 PM
lld/ELF/CallGraphSort.cpp
194–196 ↗(On Diff #127059)

This actually means that multiple symbols could be in the same section.

208 ↗(On Diff #127059)

Two symbols which are in the same section.

213–215 ↗(On Diff #127059)

It's not trivially true. WorkLookup and WorkQueue are manually maintained and edges are constantly changed. This is to ensure that they are kept in sync.

241 ↗(On Diff #127059)

Pretty sure we care about N^2 memory usage.

248–249 ↗(On Diff #127059)

No.

In D36351#957474, @ruiu wrote:

I wonder what was a motivation to implement two different ways to do one thing.

There are other ways to generate a call graph profile. dtrace for example. I don't particularly care about the text format as you can always generate an object file that only contains profile data, but it is a lot easier to generate the text file.

ruiu added inline comments.Dec 15 2017, 6:32 PM
lld/ELF/CallGraphSort.cpp
208 ↗(On Diff #127059)

Can you elaborate? I'm trying to understand it, so giving me just an answer isn't that helpful to me.

248–249 ↗(On Diff #127059)

Why?

ruiu added a comment.Dec 15 2017, 6:35 PM

I did not request you to upload a patch which contains both LLVM and lld changes, but I didn't intend to put everything in one patch. I wouldn't add the text file and the object file profiling data at the same time in the initial patch. It doesn't sound like you are not sure if you need the text file support. I would drop the text file support then.

ruiu added a comment.Dec 15 2017, 6:39 PM

Apologies. I meant I did request you to upload a patch which contains ...

Bigcheese added inline comments.Dec 15 2017, 7:07 PM
lld/ELF/CallGraphSort.cpp
208 ↗(On Diff #127059)

Given an object file which contains a .text section and two symbols, A and B, which point to that section. And given the following profile:

A -> B 1000

The code in CallGraphSort::CallGraphSort will generate a graph with one node and one edge:

Nodes:
  - index: 0
    weight: 1000
Edges:
  - from: 0
    to: 0
    weight: 1000

This correctly gives node[0] a weight of 1000 which is needed for the final sort by density.

CallGraphSort::CallGraphSort could skip adding self edges which would remove the need for this. I'll do that.

248–249 ↗(On Diff #127059)

Because the following code merges consecutive equal edges to remove duplicate edges. FE is a list of edges indices, not edges.

Bigcheese edited the summary of this revision. (Show Details)

I've simplified the logic by getting rid of WorkLookup and just embedding the std::set iterator in Edge.

I've also split off the LLVM changes into two separate reviews:

Bigcheese updated this revision to Diff 132446.Feb 1 2018, 12:04 PM

Changed to testcase from paper.

Bigcheese updated this revision to Diff 132502.Feb 1 2018, 4:17 PM
  • Order all sections and add a test
  • Simplified
  • Remove unused include
Bigcheese updated this revision to Diff 132667.Feb 2 2018, 1:48 PM
  • Don't reorder non-reorderable sections
  • Skip edges across output sections
  • Add tests
Bigcheese updated this revision to Diff 133550.Feb 8 2018, 6:05 PM
  • Update to top of tree.
  • Add some comments.
pdeng6 added a subscriber: pdeng6.Feb 9 2018, 1:18 AM

@Bigcheese

I'm trying function layout methodologies within ChromeOS to improve chrome performance, this feature is very important since the work relies on it.
I've checked out the patches for both lld(https://reviews.llvm.org/D36351?id=133550) and llvm(https://reviews.llvm.org/D42161 and https://reviews.llvm.org/D42163), and built it locally, but I'm afraid the patches for llvm is not as fresh as that for lld.
Would it be possible for you to show the latest patches at both side, and the base llvm/lld revisions you are working on? then we can completely align the environment.

thanks
Pan

espindola edited reviewers, added: espindola; removed: rafael.Mar 2 2018, 5:52 PM
Bigcheese updated this revision to Diff 137040.Mar 5 2018, 10:42 AM

Rewrote the algorithm to match hfsort. Now gets the same performance as hfsort in Rafael's testcase.

Bigcheese updated this revision to Diff 137539.Mar 7 2018, 8:03 PM

Address review comments.

Rafael Avila de Espindola <rafael.espindola@gmail.com> writes:

+ InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
+ InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
+ if (FromSec && ToSec)
+ Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] = Count;

This should be "+=", no? If there are multiple calls from one section to
another I would expect us to use the total in the graph.

Looks like the missing + gets us the last missing performance bits.

In the attached log "old" is a rebased version of your previous patch, "new"
is your current patch and "new-new" is your current patch with the above
fix.

Ah, and please git-clang-format :-)

Cheers,
Rafael

Michael Spencer via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

Bigcheese updated this revision to Diff 137040.
Bigcheese added a comment.

Rewrote the algorithm to match hfsort. Now gets the same performance as hfsort in Rafael's testcase.

https://reviews.llvm.org/D36351

Files:

ELF/CMakeLists.txt
ELF/CallGraphSort.cpp
ELF/CallGraphSort.h
ELF/Config.h
ELF/Driver.cpp
ELF/Options.td
ELF/Writer.cpp
test/ELF/cgprofile-txt.s

Index: test/ELF/cgprofile-txt.s

  • /dev/null

+++ test/ELF/cgprofile-txt.s
@@ -0,0 +1,106 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
+# RUN: ld.lld -e A %t -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT
+
+# RUN: echo "A B 100" > %t.call_graph
+# RUN: echo "A C 40" >> %t.call_graph
+# RUN: echo "B C 30" >> %t.call_graph
+# RUN: echo "C D 90" >> %t.call_graph
+# RUN: echo "PP TS 100" >> %t.call_graph
+# RUN: echo "_init2 _init 24567837" >> %t.call_graph
+# RUN: echo "TS QC 9001" >> %t.call_graph
+# RUN: ld.lld -e A %t --call-graph-ordering-file %t.call_graph -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s
+
+ .section .text.D,"ax",@progbits
+D:
+ retq
+
+ .section .text.C,"ax",@progbits
+ .globl C
+C:
+ retq
+
+ .section .text.B,"ax",@progbits
+ .globl B
+B:
+ retq
+
+ .section .text.A,"ax",@progbits
+ .globl A
+A:
+ retq
+
+ .section .ponies,"ax",@progbits,unique,1
+ .globl TS
+TS:
+ retq
+
+ .section .ponies,"ax",@progbits,unique,2
+ .globl PP
+PP:
+ retq
+
+ .section .other,"ax",@progbits,unique,1
+ .globl QC
+QC:
+ retq
+
+ .section .other,"ax",@progbits,unique,2
+ .globl GB
+GB:
+ retq
+
+ .section .init,"ax",@progbits,unique,1
+ .globl _init
+_init:
+ retq
+
+ .section .init,"ax",@progbits,unique,2
+ .globl _init2
+_init2:
+ retq
+
+# CHECK: Name: D
+# CHECK-NEXT: Value: 0x201003
+# CHECK: Name: A
+# CHECK-NEXT: Value: 0x201000
+# CHECK: Name: B
+# CHECK-NEXT: Value: 0x201001
+# CHECK: Name: C
+# CHECK-NEXT: Value: 0x201002
+# CHECK: Name: GB
+# CHECK-NEXT: Value: 0x201007
+# CHECK: Name: PP
+# CHECK-NEXT: Value: 0x201004
+# CHECK: Name: QC
+# CHECK-NEXT: Value: 0x201006
+# CHECK: Name: TS
+# CHECK-NEXT: Value: 0x201005
+# CHECK: Name: _init
+# CHECK-NEXT: Value: 0x201008
+# CHECK: Name: _init2
+# CHECK-NEXT: Value: 0x201009
+
+# NOSORT: Name: D
+# NOSORT-NEXT: Value: 0x201000
+# NOSORT: Name: A
+# NOSORT-NEXT: Value: 0x201003
+# NOSORT: Name: B
+# NOSORT-NEXT: Value: 0x201002
+# NOSORT: Name: C
+# NOSORT-NEXT: Value: 0x201001
+# NOSORT: Name: GB
+# NOSORT-NEXT: Value: 0x201007
+# NOSORT: Name: PP
+# NOSORT-NEXT: Value: 0x201005
+# NOSORT: Name: QC
+# NOSORT-NEXT: Value: 0x201006
+# NOSORT: Name: TS
+# NOSORT-NEXT: Value: 0x201004
+# NOSORT: Name: _init
+# NOSORT-NEXT: Value: 0x201008
+# NOSORT: Name: _init2
+# NOSORT-NEXT: Value: 0x201009

Index: ELF/Writer.cpp

  • ELF/Writer.cpp

+++ ELF/Writer.cpp
@@ -9,6 +9,7 @@

#include "Writer.h"
#include "AArch64ErrataFix.h"
+#include "CallGraphSort.h"
#include "Config.h"
#include "Filesystem.h"
#include "LinkerScript.h"
@@ -1110,6 +1111,14 @@
If no layout was provided by linker script, we want to apply default
sorting for special input sections. This also handles --symbol-ordering-file.
template <class ELFT> void Writer<ELFT>::sortInputSections() {
+ // Use the rarely used option -call-graph-ordering-file to sort sections.
+ if (!Config->CallGraphProfile.empty()) {
+ DenseMap<const InputSectionBase *, int> Order =
+ computeCallGraphProfileOrder();
+ for (BaseCommand *Base : Script->SectionCommands)
+ if (auto *Sec = dyn_cast<OutputSection>(Base))
+ sortSection(Sec, Order);
+ }

// Build the order once since it is expensive.
DenseMap<const InputSectionBase *, int> Order = buildSectionOrder();
for (BaseCommand *Base : Script->SectionCommands)

Index: ELF/Options.td

  • ELF/Options.td

+++ ELF/Options.td
@@ -66,6 +66,9 @@

"Only set DT_NEEDED for shared libraries if used",
"Always set DT_NEEDED for shared libraries">;

+defm call_graph_ordering_file: Eq<"call-graph-ordering-file">,
+ HelpText<"Layout sections to optimize the given callgraph">;
+
// -chroot doesn't have a help text because it is an internal option.
defm chroot: Eq<"chroot">;

Index: ELF/Driver.cpp

  • ELF/Driver.cpp

+++ ELF/Driver.cpp
@@ -571,6 +571,31 @@

return {BuildIdKind::None, {}};

}

+static void readCallGraph(MemoryBufferRef MB) {
+ // Build a map from symbol name to section
+ DenseMap<StringRef, InputSectionBase *> SymbolSection;
+ for (InputFile *File : ObjectFiles)
+ for (Symbol *Sym : File->getSymbols())
+ if (auto *D = dyn_cast<Defined>(Sym))
+ if (auto *IS = dyn_cast_or_null<InputSectionBase>(D->Section))
+ SymbolSection[D->getName()] = IS;
+
+ std::vector<StringRef> Lines = args::getLines(MB);
+ for (StringRef L : Lines) {
+ SmallVector<StringRef, 3> Fields;
+ L.split(Fields, ' ');
+ if (Fields.size() != 3)
+ fatal("parse error");
+ uint64_t Count;
+ if (!to_integer(Fields[2], Count))
+ fatal("parse error");
+ InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
+ InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
+ if (FromSec && ToSec)
+ Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] = Count;
+ }
+}
+
static bool getCompressDebugSections(opt::InputArgList &Args) {

StringRef S = Args.getLastArgValue(OPT_compress_debug_sections, "none");
if (S == "none")

@@ -1118,6 +1143,10 @@

// Apply symbol renames for -wrap.
Symtab->applySymbolWrap();

+ if (auto *Arg = Args.getLastArg(OPT_call_graph_ordering_file))
+ if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))
+ readCallGraph(*Buffer);
+

// Now that we have a complete list of input files.
// Beyond this point, no new files are added.
// Aggregate all input sections into one place.

Index: ELF/Config.h

  • ELF/Config.h

+++ ELF/Config.h
@@ -24,6 +24,7 @@
namespace elf {

class InputFile;
+class InputSectionBase;

enum ELFKind {

ELFNoneKind,

@@ -103,6 +104,9 @@

std::vector<SymbolVersion> VersionScriptGlobals;
std::vector<SymbolVersion> VersionScriptLocals;
std::vector<uint8_t> BuildIdVector;

+ llvm::MapVector<std::pair<const InputSectionBase *, const InputSectionBase *>,
+ uint64_t>
+ CallGraphProfile;

bool AllowMultipleDefinition;
bool AndroidPackDynRelocs = false;
bool ARMHasBlx = false;

Index: ELF/CallGraphSort.h

  • /dev/null

+++ ELF/CallGraphSort.h
@@ -0,0 +1,23 @@
+===- CallGraphSort.h ------------------------------------------*- C++ -*-===
+
+
The LLVM Linker
+
+
This file is distributed under the University of Illinois Open Source
+ License. See LICENSE.TXT for details.
+

+===----------------------------------------------------------------------===
+
+#ifndef LLD_ELF_CALL_GRAPH_SORT_H
+#define LLD_ELF_CALL_GRAPH_SORT_H
+
+#include "llvm/ADT/DenseMap.h"
+
+namespace lld {
+namespace elf {
+class InputSectionBase;
+
+llvm::DenseMap<const InputSectionBase *, int> computeCallGraphProfileOrder();
+} namespace elf
+}
namespace lld
+
+#endif

Index: ELF/CallGraphSort.cpp

  • /dev/null

+++ ELF/CallGraphSort.cpp
@@ -0,0 +1,350 @@
+===- CallGraphSort.cpp --------------------------------------------------===
+
+
The LLVM Linker
+
+
This file is distributed under the University of Illinois Open Source
+ License. See LICENSE.TXT for details.
+

+===----------------------------------------------------------------------===
+/
+
/ Implementation of Call-Chain Clustering from: Optimizing Function Placement
+/ for Large-Scale Data-Center Applications
+
/ https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
+/
+
/ The goal of this algorithm is to improve runtime performance of the final
+/ executable by arranging code sections such that page table and i-cache
+
/ misses are minimized.
+/
+
/ Definitions:
+/ * Cluster
+
/ * An ordered list of input sections which are layed out as a unit. At the
+/ beginning of the algorithm each input section has its own cluster and
+
/ the weight of the cluster is the sum of the weight of all incomming
+/ edges.
+
/ * Call-Chain Clustering (C³) Heuristic
+/ * Defines when and how clusters are combined. Pick the highest weighted
+
/ input section then add it to its most likely predecessor if it wouldn't
+/ penalize it too much.
+
/ * Density
+/ * The weight of the cluster divided by the size of the cluster. This is a
+
/ proxy for the ammount of execution time spent per byte of the cluster.
+/
+
/ It does so given a call graph profile by the following:
+/ * Build a weighted call graph from the profile
+
/ * Sort input sections by weight
+/ * For each input section starting with the highest weight
+
/ * Find its most likely predecessor cluster
+/ * Check if the combined cluster would be too large, or would have too low
+
/ a density.
+/ * If not, then combine the clusters.
+
/ * Sort non-empty clusters by density
+/
+
===----------------------------------------------------------------------===
+
+#include "CallGraphSort.h"
+#include "OutputSections.h"
+#include "SymbolTable.h"
+#include "Symbols.h"
+
+#include "llvm/Support/MathExtras.h"
+
+using namespace llvm;
+using namespace lld;
+using namespace lld::elf;
+
+namespace {
+using ClusterIndex = std::ptrdiff_t;
+using SectionIndex = std::ptrdiff_t;
+using EdgeIndex = std::ptrdiff_t;
+
+
Used for calculating an comparing density. Use soft-float for determinism.
+struct Double : APFloat {
+ Double() : APFloat(APFloat::IEEEdouble(), 0) {}
+ Double(uint64_t Val) : APFloat(APFloat::IEEEdouble(), Val) {}
+ Double(APFloat A) : APFloat(A) {}
+ bool operator>(const Double Other) const {
+ return compare(Other) == cmpGreaterThan;
+ }
+ bool operator<(const Double Other) const {
+ return compare(Other) == cmpLessThan;
+ }
+};
+
+struct Cluster {
+ Cluster(SectionIndex Sec, const InputSectionBase *IS);
+
+ Double getDensity() const {
+ if (Size == 0)
+ return 0;
+ return Double(Weight) / Double(Size);
+ }
+
+ std::vector<const InputSectionBase *> ISBs;
+ std::vector<SectionIndex> Sections;
+ int64_t Size = 0;
+ uint64_t Weight = 0;
+};
+
+struct Section {
+ Section(const InputSectionBase *IS) : ISB(IS) { Size = ISB->getSize(); }
+
+ Double getDensity() const {
+ if (Size == 0)
+ return 0;
+ return Double(Weight) / Double(Size);
+ }
+
+ int64_t Size = 0;
+ uint64_t Weight = 0;
+ const InputSectionBase *ISB;
+ std::vector<SectionIndex> Preds;
+ std::vector<SectionIndex> Succs;
+};
+
+struct Edge {
+ SectionIndex From;
+ SectionIndex To;
+ uint64_t Weight;
+ Double NormalizedWeight = 0;
+
+ bool operator==(const Edge Other) const;
+};
+
+struct EdgeDenseMapInfo {
+ static Edge getEmptyKey() {
+ return {DenseMapInfo<SectionIndex>::getEmptyKey(),
+ DenseMapInfo<SectionIndex>::getEmptyKey(), 0, 0};
+ }
+ static Edge getTombstoneKey() {
+ return {DenseMapInfo<SectionIndex>::getTombstoneKey(),
+ DenseMapInfo<SectionIndex>::getTombstoneKey(), 0, 0};
+ }
+ static unsigned getHashValue(const Edge &Val) {
+ return hash_combine(DenseMapInfo<SectionIndex>::getHashValue(Val.From),
+ DenseMapInfo<SectionIndex>::getHashValue(Val.To));
+ }
+ static bool isEqual(const Edge &LHS, const Edge &RHS) { return LHS == RHS; }
+};
+
+class CallGraphSort {
+public:
+ CallGraphSort();
+
+ DenseMap<const InputSectionBase *, int> run();
+
+private:
+ DenseMap<Edge, EdgeIndex, EdgeDenseMapInfo> EdgeMap;
+ std::vector<Cluster> Clusters;
+ std::vector<Edge> Edges;
+ std::vector<Section> Sections;
+
+ void normalizeEdgeWeights();
+ void generateClusters();
+};
+
+ Maximum ammount the combined cluster density can be worse than the original
+
cluster to consider merging.
+constexpr int MAX_DENSITY_DEGRADATION = 8;
+
+ Maximum cluster size in bytes.
+constexpr uint64_t MAX_CLUSTER_SIZE = 1024 * 1024;
+}
end anonymous namespace
+
+Cluster::Cluster(SectionIndex Sec, const InputSectionBase *IS) {
+ ISBs.push_back(IS);
+ Sections.push_back(Sec);
+ Size = IS->getSize();
+}
+
+bool Edge::operator==(const Edge Other) const {
+ return From == Other.From && To == Other.To;
+}
+
+ Take the edge list in Config->CallGraphProfile, resolve symbol names to
+
Symbols, and generate a graph between InputSections with the provided
+ weights.
+CallGraphSort::CallGraphSort() {
+ MapVector<std::pair<const InputSectionBase *, const InputSectionBase *>,
+ uint64_t> &Profile = Config->CallGraphProfile;
+ DenseMap<const InputSectionBase *, SectionIndex> SecToSec;
+
+ auto GetOrCreateNode = [&](const InputSectionBase *IS) -> SectionIndex {
+ auto Res = SecToSec.insert(std::make_pair(IS, Sections.size()));
+ if (Res.second)
+ Sections.emplace_back(IS);
+ return Res.first->second;
+ };
+
+
Create the graph.
+ for (const auto &C : Profile) {
+ const InputSectionBase *FromSB = C.first.first;
+ const InputSectionBase *ToSB = C.first.second;
+ uint64_t Weight = C.second;
+
+ if (Weight == 0)
+ continue;
+
+ Ignore edges between input sections belonging to different output
+
sections. This is done because otherwise we would end up with clusters
+ containing input sections that can't actually be placed adjacently in the
+
output. This messes with the cluster size and density calculations. We
+ would also end up moving input sections in other output sections without
+
moving them closer to what calls them.
+ if (FromSB->getOutputSection() != ToSB->getOutputSection())
+ continue;
+
+ SectionIndex From = GetOrCreateNode(FromSB);
+ SectionIndex To = GetOrCreateNode(ToSB);
+
+ Sections[To].Weight = SaturatingAdd(Sections[To].Weight, Weight);
+
+ if (From == To)
+ continue;
+
+ Edge E{From, To, Weight};
+
+ Add or increment an edge
+ auto Res = EdgeMap.insert(std::make_pair(E, Edges.size()));
+ EdgeIndex EI = Res.first->second;
+ if (Res.second) {
+ Edges.push_back(E);
+ Sections[From].Succs.push_back(To);
+ Sections[To].Preds.push_back(From);
+ } else
+ Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, Weight);
+ }
+ normalizeEdgeWeights();
+}
+
+
Normalize the edge weights so that we can reject edges which have a low
+ probibility.
+void CallGraphSort::normalizeEdgeWeights() {
+ for (SectionIndex SI = 0, SE = Sections.size(); SI != SE; ++SI) {
+ Section &S = Sections[SI];
+ for (SectionIndex PI : S.Preds) {
+ Edge &E = Edges[EdgeMap[{PI, SI, 0, 0}]];
+ if (S.Weight == 0)
+ continue;
+ E.NormalizedWeight = Double(E.Weight) / Double(S.Weight);
+ }
+ }
+}
+
+
It's bad to merge clusters which would degrade the density too much.
+static bool isNewDensityBad(Cluster &A, Cluster &B) {
+ Double NewDensity = Double(A.Weight + B.Weight) / Double(A.Size + B.Size);
+ if (A.getDensity() >
+ NewDensity * Double(MAX_DENSITY_DEGRADATION))
+ return true;
+ return false;
+}
+
+static void mergeClusters(Cluster &Into, Cluster &From) {
+ Into.ISBs.insert(Into.ISBs.end(), From.ISBs.begin(), From.ISBs.end());
+ Into.Sections.insert(Into.Sections.end(), From.Sections.begin(),
+ From.Sections.end());
+ Into.Size += From.Size;
+ Into.Weight += From.Weight;
+ From.ISBs.clear();
+ From.Sections.clear();
+ From.Size = 0;
+ From.Weight = 0;
+}
+
+ Group InputSections into clusters using the Call-Chain Clustering heuristic
+
then sort the clusters by density.
+void CallGraphSort::generateClusters() {
+ Minimum edge probability to consider merging.
+ const Double MIN_EDGE_PROBABILITY = Double(1) / Double(10);
+
+ std::vector<SectionIndex> SortedSecs;
+ std::vector<Cluster *> SecToCluster(Sections.size());
+
+ Clusters.reserve(Sections.size());
+
+ for (SectionIndex SI = 0, SE = Sections.size(); SI != SE; ++SI) {
+ Cluster C(SI, Sections[SI].ISB);
+ C.Size = Sections[SI].Size;
+ C.Weight = Sections[SI].Weight;
+ Clusters.push_back(C);
+ SortedSecs.push_back(SI);
+ }
+
+ for (Cluster &C : Clusters) {
+ SecToCluster[C.Sections.front()] = &C;
+ }
+
+ std::stable_sort(SortedSecs.begin(), SortedSecs.end(),
+ [&](SectionIndex A, SectionIndex B) {
+ return Sections[A].getDensity() > Sections[B].getDensity();
+ });
+
+ for (SectionIndex SI : SortedSecs) {
+ Cluster &C = *SecToCluster[SI];
+
+ SectionIndex BestPred = -1;
+ Double BestWeight = 0;
+
+ for (SectionIndex PI : Sections[SI].Preds) {
+ Edge &E = Edges[EdgeMap[{PI, SI, 0, 0}]];
+ if (BestPred == -1 || E.NormalizedWeight > BestWeight) {
+ BestPred = PI;
+ BestWeight = E.NormalizedWeight;
+ }
+ }
+
+ if (BestWeight < MIN_EDGE_PROBABILITY)
+ continue;
+
+ Cluster *PredC = SecToCluster[BestPred];
+ if (PredC == nullptr || PredC == &C)
+ continue;
+
+ if (C.Size + PredC->Size > MAX_CLUSTER_SIZE)
+ continue;
+
+ if (isNewDensityBad(*PredC, C))
+ continue;
+
+ for (SectionIndex SI : C.Sections)
+ SecToCluster[SI] = PredC;
+
+ mergeClusters(*PredC, C);
+ }
+
+
Remove empty or dead nodes.
+ Clusters.erase(std::remove_if(Clusters.begin(), Clusters.end(),
+ [](const Cluster &C) {
+ return C.Size == 0 || C.Sections.empty();
+ }),
+ Clusters.end());
+
+ Sort by density. Invalidates all NodeIndexs.
+ std::sort(Clusters.begin(), Clusters.end(),
+ [](const Cluster &A, const Cluster &B) {
+ return A.getDensity() > B.getDensity();
+ });
+}
+
+DenseMap<const InputSectionBase *, int> CallGraphSort::run() {
+ generateClusters();
+
+
Generate order.
+ llvm::DenseMap<const InputSectionBase *, int> OrderMap;
+ ssize_t CurOrder = 1;
+
+ for (const Cluster &C : Clusters)
+ for (const InputSectionBase *IS : C.ISBs)
+ OrderMap[IS] = CurOrder++;
+
+ return OrderMap;
+}
+
+ Sort sections by the profile data provided by -callgraph-profile-file
+

+ This first builds a call graph based on the profile data then merges sections
+
according to the C³ huristic. All clusters are then sorted by a density
+// metric to further improve locality.
+DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
+ return CallGraphSort().run();
+}

Index: ELF/CMakeLists.txt

  • ELF/CMakeLists.txt

+++ ELF/CMakeLists.txt
@@ -19,6 +19,7 @@

Arch/SPARCV9.cpp
Arch/X86.cpp
Arch/X86_64.cpp

+ CallGraphSort.cpp

Driver.cpp
DriverUtils.cpp
EhFrame.cpp

llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

Michael Spencer via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

@@ -1110,6 +1111,14 @@
If no layout was provided by linker script, we want to apply default
sorting for special input sections. This also handles --symbol-ordering-file.
template <class ELFT> void Writer<ELFT>::sortInputSections() {
+ // Use the rarely used option -call-graph-ordering-file to sort sections.
+ if (!Config->CallGraphProfile.empty()) {
+ DenseMap<const InputSectionBase *, int> Order =
+ computeCallGraphProfileOrder();
+ for (BaseCommand *Base : Script->SectionCommands)
+ if (auto *Sec = dyn_cast<OutputSection>(Base))
+ sortSection(Sec, Order);
+ }

Instead of adding this if you can change buildSectionOrder:

static DenseMap<const InputSectionBase *, int> buildSectionOrder() {

if (!Config->CallGraphProfile.empty())
  return computeCallGraphProfileOrder();
...

+ InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
+ InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
+ if (FromSec && ToSec)
+ Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] = Count;

This should be "+=", no? If there are multiple calls from one section to
another I would expect us to use the total in the graph.

+struct EdgeDenseMapInfo {
+ static Edge getEmptyKey() {
+ return {DenseMapInfo<SectionIndex>::getEmptyKey(),
+ DenseMapInfo<SectionIndex>::getEmptyKey(), 0, 0};
+ }

This doesn't build with clang:

/home/espindola/llvm/llvm/tools/lld/ELF/CallGraphSort.cpp:115:12: error:
no matching constructor for initialization of '(anonymous
namespace)::Edge'

return {DenseMapInfo<SectionIndex>::getEmptyKey(),
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

/home/espindola/llvm/llvm/tools/lld/ELF/CallGraphSort.cpp:104:8: note:
candidate constructor (the implicit copy constructor) not viable:
requires 1 argument, but 4 were provided
struct Edge {

^

/home/espindola/llvm/llvm/tools/lld/ELF/CallGraphSort.cpp:104:8: note:
candidate constructor (the implicit move constructor) not viable:
requires 1 argument, but 4 were provided
/home/espindola/llvm/llvm/tools/lld/ELF/CallGraphSort.cpp:104:8: note:
candidate constructor (the implicit default constructor) not viable:
requires 0 arguments, but 4 were provided

Cheers,
Rafael

Michael Spencer via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

+struct Section {
+ Section(const InputSectionBase *IS) : ISB(IS) { Size = ISB->getSize(); }
+
+ Double getDensity() const {
+ if (Size == 0)
+ return 0;
+ return Double(Weight) / Double(Size);
+ }
+
+ int64_t Size = 0;

Why is Size signed? With the previous errors fixed I get a warning:

CallGraphSort.cpp:302:30: warning: comparison of integers of different
signs: 'long' and 'const uint64_t' (aka 'const unsigned long')

Cheers,
Rafael

I now get this warning:

/home/espindola/llvm/llvm/tools/lld/ELF/CallGraphSort.cpp:204:28:
warning: missing field 'NormalizedWeight' initializer
[-Wmissing-field-initializers]

Edge E{From, To, Weight};

And git-clang-format still produces this diff:

  • if (A.getDensity() >
  • NewDensity * Double(MAX_DENSITY_DEGRADATION))

+ if (A.getDensity() > NewDensity * Double(MAX_DENSITY_DEGRADATION))

Cheers,
Rafael

Michael Spencer via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

Bigcheese updated this revision to Diff 137539.
Bigcheese added a comment.

Address review comments.

https://reviews.llvm.org/D36351

Files:

ELF/CMakeLists.txt
ELF/CallGraphSort.cpp
ELF/CallGraphSort.h
ELF/Config.h
ELF/Driver.cpp
ELF/Options.td
ELF/Writer.cpp
test/ELF/cgprofile-txt.s

Index: test/ELF/cgprofile-txt.s

  • /dev/null

+++ test/ELF/cgprofile-txt.s
@@ -0,0 +1,106 @@
+# REQUIRES: x86
+
+# RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t
+# RUN: ld.lld -e A %t -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s --check-prefix=NOSORT
+
+# RUN: echo "A B 100" > %t.call_graph
+# RUN: echo "A C 40" >> %t.call_graph
+# RUN: echo "B C 30" >> %t.call_graph
+# RUN: echo "C D 90" >> %t.call_graph
+# RUN: echo "PP TS 100" >> %t.call_graph
+# RUN: echo "_init2 _init 24567837" >> %t.call_graph
+# RUN: echo "TS QC 9001" >> %t.call_graph
+# RUN: ld.lld -e A %t --call-graph-ordering-file %t.call_graph -o %t2
+# RUN: llvm-readobj -symbols %t2 | FileCheck %s
+
+ .section .text.D,"ax",@progbits
+D:
+ retq
+
+ .section .text.C,"ax",@progbits
+ .globl C
+C:
+ retq
+
+ .section .text.B,"ax",@progbits
+ .globl B
+B:
+ retq
+
+ .section .text.A,"ax",@progbits
+ .globl A
+A:
+ retq
+
+ .section .ponies,"ax",@progbits,unique,1
+ .globl TS
+TS:
+ retq
+
+ .section .ponies,"ax",@progbits,unique,2
+ .globl PP
+PP:
+ retq
+
+ .section .other,"ax",@progbits,unique,1
+ .globl QC
+QC:
+ retq
+
+ .section .other,"ax",@progbits,unique,2
+ .globl GB
+GB:
+ retq
+
+ .section .init,"ax",@progbits,unique,1
+ .globl _init
+_init:
+ retq
+
+ .section .init,"ax",@progbits,unique,2
+ .globl _init2
+_init2:
+ retq
+
+# CHECK: Name: D
+# CHECK-NEXT: Value: 0x201003
+# CHECK: Name: A
+# CHECK-NEXT: Value: 0x201000
+# CHECK: Name: B
+# CHECK-NEXT: Value: 0x201001
+# CHECK: Name: C
+# CHECK-NEXT: Value: 0x201002
+# CHECK: Name: GB
+# CHECK-NEXT: Value: 0x201007
+# CHECK: Name: PP
+# CHECK-NEXT: Value: 0x201004
+# CHECK: Name: QC
+# CHECK-NEXT: Value: 0x201006
+# CHECK: Name: TS
+# CHECK-NEXT: Value: 0x201005
+# CHECK: Name: _init
+# CHECK-NEXT: Value: 0x201008
+# CHECK: Name: _init2
+# CHECK-NEXT: Value: 0x201009
+
+# NOSORT: Name: D
+# NOSORT-NEXT: Value: 0x201000
+# NOSORT: Name: A
+# NOSORT-NEXT: Value: 0x201003
+# NOSORT: Name: B
+# NOSORT-NEXT: Value: 0x201002
+# NOSORT: Name: C
+# NOSORT-NEXT: Value: 0x201001
+# NOSORT: Name: GB
+# NOSORT-NEXT: Value: 0x201007
+# NOSORT: Name: PP
+# NOSORT-NEXT: Value: 0x201005
+# NOSORT: Name: QC
+# NOSORT-NEXT: Value: 0x201006
+# NOSORT: Name: TS
+# NOSORT-NEXT: Value: 0x201004
+# NOSORT: Name: _init
+# NOSORT-NEXT: Value: 0x201008
+# NOSORT: Name: _init2
+# NOSORT-NEXT: Value: 0x201009

Index: ELF/Writer.cpp

  • ELF/Writer.cpp

+++ ELF/Writer.cpp
@@ -9,6 +9,7 @@

#include "Writer.h"
#include "AArch64ErrataFix.h"
+#include "CallGraphSort.h"
#include "Config.h"
#include "Filesystem.h"
#include "LinkerScript.h"
@@ -1020,6 +1021,10 @@
// Builds section order for handling --symbol-ordering-file.
static DenseMap<const InputSectionBase *, int> buildSectionOrder() {

DenseMap<const InputSectionBase *, int> SectionOrder;

+ // Use the rarely used option -call-graph-ordering-file to sort sections.
+ if (!Config->CallGraphProfile.empty())
+ return computeCallGraphProfileOrder();
+

if (Config->SymbolOrderingFile.empty())
  return SectionOrder;

Index: ELF/Options.td

  • ELF/Options.td

+++ ELF/Options.td
@@ -66,6 +66,9 @@

"Only set DT_NEEDED for shared libraries if used",
"Always set DT_NEEDED for shared libraries">;

+defm call_graph_ordering_file: Eq<"call-graph-ordering-file">,
+ HelpText<"Layout sections to optimize the given callgraph">;
+
// -chroot doesn't have a help text because it is an internal option.
defm chroot: Eq<"chroot">;

Index: ELF/Driver.cpp

  • ELF/Driver.cpp

+++ ELF/Driver.cpp
@@ -571,6 +571,31 @@

return {BuildIdKind::None, {}};

}

+static void readCallGraph(MemoryBufferRef MB) {
+ // Build a map from symbol name to section
+ DenseMap<StringRef, InputSectionBase *> SymbolSection;
+ for (InputFile *File : ObjectFiles)
+ for (Symbol *Sym : File->getSymbols())
+ if (auto *D = dyn_cast<Defined>(Sym))
+ if (auto *IS = dyn_cast_or_null<InputSectionBase>(D->Section))
+ SymbolSection[D->getName()] = IS;
+
+ std::vector<StringRef> Lines = args::getLines(MB);
+ for (StringRef L : Lines) {
+ SmallVector<StringRef, 3> Fields;
+ L.split(Fields, ' ');
+ if (Fields.size() != 3)
+ fatal("parse error");
+ uint64_t Count;
+ if (!to_integer(Fields[2], Count))
+ fatal("parse error");
+ InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
+ InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
+ if (FromSec && ToSec)
+ Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] += Count;
+ }
+}
+
static bool getCompressDebugSections(opt::InputArgList &Args) {

StringRef S = Args.getLastArgValue(OPT_compress_debug_sections, "none");
if (S == "none")

@@ -1124,6 +1149,10 @@

// Apply symbol renames for -wrap.
Symtab->applySymbolWrap();

+ if (auto *Arg = Args.getLastArg(OPT_call_graph_ordering_file))
+ if (Optional<MemoryBufferRef> Buffer = readFile(Arg->getValue()))
+ readCallGraph(*Buffer);
+

// Now that we have a complete list of input files.
// Beyond this point, no new files are added.
// Aggregate all input sections into one place.

Index: ELF/Config.h

  • ELF/Config.h

+++ ELF/Config.h
@@ -24,6 +24,7 @@
namespace elf {

class InputFile;
+class InputSectionBase;

enum ELFKind {

ELFNoneKind,

@@ -103,6 +104,9 @@

std::vector<SymbolVersion> VersionScriptGlobals;
std::vector<SymbolVersion> VersionScriptLocals;
std::vector<uint8_t> BuildIdVector;

+ llvm::MapVector<std::pair<const InputSectionBase *, const InputSectionBase *>,
+ uint64_t>
+ CallGraphProfile;

bool AllowMultipleDefinition;
bool AndroidPackDynRelocs = false;
bool ARMHasBlx = false;

Index: ELF/CallGraphSort.h

  • /dev/null

+++ ELF/CallGraphSort.h
@@ -0,0 +1,23 @@
+===- CallGraphSort.h ------------------------------------------*- C++ -*-===
+
+
The LLVM Linker
+
+
This file is distributed under the University of Illinois Open Source
+ License. See LICENSE.TXT for details.
+

+===----------------------------------------------------------------------===
+
+#ifndef LLD_ELF_CALL_GRAPH_SORT_H
+#define LLD_ELF_CALL_GRAPH_SORT_H
+
+#include "llvm/ADT/DenseMap.h"
+
+namespace lld {
+namespace elf {
+class InputSectionBase;
+
+llvm::DenseMap<const InputSectionBase *, int> computeCallGraphProfileOrder();
+} namespace elf
+}
namespace lld
+
+#endif

Index: ELF/CallGraphSort.cpp

  • /dev/null

+++ ELF/CallGraphSort.cpp
@@ -0,0 +1,350 @@
+===- CallGraphSort.cpp --------------------------------------------------===
+
+
The LLVM Linker
+
+
This file is distributed under the University of Illinois Open Source
+ License. See LICENSE.TXT for details.
+

+===----------------------------------------------------------------------===
+/
+
/ Implementation of Call-Chain Clustering from: Optimizing Function Placement
+/ for Large-Scale Data-Center Applications
+
/ https://research.fb.com/wp-content/uploads/2017/01/cgo2017-hfsort-final1.pdf
+/
+
/ The goal of this algorithm is to improve runtime performance of the final
+/ executable by arranging code sections such that page table and i-cache
+
/ misses are minimized.
+/
+
/ Definitions:
+/ * Cluster
+
/ * An ordered list of input sections which are layed out as a unit. At the
+/ beginning of the algorithm each input section has its own cluster and
+
/ the weight of the cluster is the sum of the weight of all incomming
+/ edges.
+
/ * Call-Chain Clustering (C³) Heuristic
+/ * Defines when and how clusters are combined. Pick the highest weighted
+
/ input section then add it to its most likely predecessor if it wouldn't
+/ penalize it too much.
+
/ * Density
+/ * The weight of the cluster divided by the size of the cluster. This is a
+
/ proxy for the ammount of execution time spent per byte of the cluster.
+/
+
/ It does so given a call graph profile by the following:
+/ * Build a weighted call graph from the profile
+
/ * Sort input sections by weight
+/ * For each input section starting with the highest weight
+
/ * Find its most likely predecessor cluster
+/ * Check if the combined cluster would be too large, or would have too low
+
/ a density.
+/ * If not, then combine the clusters.
+
/ * Sort non-empty clusters by density
+/
+
===----------------------------------------------------------------------===
+
+#include "CallGraphSort.h"
+#include "OutputSections.h"
+#include "SymbolTable.h"
+#include "Symbols.h"
+
+#include "llvm/Support/MathExtras.h"
+
+using namespace llvm;
+using namespace lld;
+using namespace lld::elf;
+
+namespace {
+using ClusterIndex = std::ptrdiff_t;
+using SectionIndex = std::ptrdiff_t;
+using EdgeIndex = std::ptrdiff_t;
+
+
Used for calculating an comparing density. Use soft-float for determinism.
+struct Double : APFloat {
+ Double() : APFloat(APFloat::IEEEdouble(), 0) {}
+ Double(uint64_t Val) : APFloat(APFloat::IEEEdouble(), Val) {}
+ Double(APFloat A) : APFloat(A) {}
+ bool operator>(const Double Other) const {
+ return compare(Other) == cmpGreaterThan;
+ }
+ bool operator<(const Double Other) const {
+ return compare(Other) == cmpLessThan;
+ }
+};
+
+struct Cluster {
+ Cluster(SectionIndex Sec, const InputSectionBase *IS);
+
+ Double getDensity() const {
+ if (Size == 0)
+ return 0;
+ return Double(Weight) / Double(Size);
+ }
+
+ std::vector<const InputSectionBase *> ISBs;
+ std::vector<SectionIndex> Sections;
+ size_t Size = 0;
+ uint64_t Weight = 0;
+};
+
+struct Section {
+ Section(const InputSectionBase *IS) : ISB(IS) { Size = ISB->getSize(); }
+
+ Double getDensity() const {
+ if (Size == 0)
+ return 0;
+ return Double(Weight) / Double(Size);
+ }
+
+ size_t Size = 0;
+ uint64_t Weight = 0;
+ const InputSectionBase *ISB;
+ std::vector<SectionIndex> Preds;
+ std::vector<SectionIndex> Succs;
+};
+
+struct Edge {
+ SectionIndex From;
+ SectionIndex To;
+ uint64_t Weight;
+ Double NormalizedWeight;
+
+ bool operator==(const Edge Other) const;
+};
+
+struct EdgeDenseMapInfo {
+ static Edge getEmptyKey() {
+ return {DenseMapInfo<SectionIndex>::getEmptyKey(),
+ DenseMapInfo<SectionIndex>::getEmptyKey(), 0, 0};
+ }
+ static Edge getTombstoneKey() {
+ return {DenseMapInfo<SectionIndex>::getTombstoneKey(),
+ DenseMapInfo<SectionIndex>::getTombstoneKey(), 0, 0};
+ }
+ static unsigned getHashValue(const Edge &Val) {
+ return hash_combine(DenseMapInfo<SectionIndex>::getHashValue(Val.From),
+ DenseMapInfo<SectionIndex>::getHashValue(Val.To));
+ }
+ static bool isEqual(const Edge &LHS, const Edge &RHS) { return LHS == RHS; }
+};
+
+class CallGraphSort {
+public:
+ CallGraphSort();
+
+ DenseMap<const InputSectionBase *, int> run();
+
+private:
+ DenseMap<Edge, EdgeIndex, EdgeDenseMapInfo> EdgeMap;
+ std::vector<Cluster> Clusters;
+ std::vector<Edge> Edges;
+ std::vector<Section> Sections;
+
+ void normalizeEdgeWeights();
+ void generateClusters();
+};
+
+ Maximum ammount the combined cluster density can be worse than the original
+
cluster to consider merging.
+constexpr int MAX_DENSITY_DEGRADATION = 8;
+
+ Maximum cluster size in bytes.
+constexpr uint64_t MAX_CLUSTER_SIZE = 1024 * 1024;
+}
end anonymous namespace
+
+Cluster::Cluster(SectionIndex Sec, const InputSectionBase *IS) {
+ ISBs.push_back(IS);
+ Sections.push_back(Sec);
+ Size = IS->getSize();
+}
+
+bool Edge::operator==(const Edge Other) const {
+ return From == Other.From && To == Other.To;
+}
+
+ Take the edge list in Config->CallGraphProfile, resolve symbol names to
+
Symbols, and generate a graph between InputSections with the provided
+ weights.
+CallGraphSort::CallGraphSort() {
+ MapVector<std::pair<const InputSectionBase *, const InputSectionBase *>,
+ uint64_t> &Profile = Config->CallGraphProfile;
+ DenseMap<const InputSectionBase *, SectionIndex> SecToSec;
+
+ auto GetOrCreateNode = [&](const InputSectionBase *IS) -> SectionIndex {
+ auto Res = SecToSec.insert(std::make_pair(IS, Sections.size()));
+ if (Res.second)
+ Sections.emplace_back(IS);
+ return Res.first->second;
+ };
+
+
Create the graph.
+ for (const auto &C : Profile) {
+ const InputSectionBase *FromSB = C.first.first;
+ const InputSectionBase *ToSB = C.first.second;
+ uint64_t Weight = C.second;
+
+ if (Weight == 0)
+ continue;
+
+ Ignore edges between input sections belonging to different output
+
sections. This is done because otherwise we would end up with clusters
+ containing input sections that can't actually be placed adjacently in the
+
output. This messes with the cluster size and density calculations. We
+ would also end up moving input sections in other output sections without
+
moving them closer to what calls them.
+ if (FromSB->getOutputSection() != ToSB->getOutputSection())
+ continue;
+
+ SectionIndex From = GetOrCreateNode(FromSB);
+ SectionIndex To = GetOrCreateNode(ToSB);
+
+ Sections[To].Weight = SaturatingAdd(Sections[To].Weight, Weight);
+
+ if (From == To)
+ continue;
+
+ Edge E{From, To, Weight};
+
+ Add or increment an edge
+ auto Res = EdgeMap.insert(std::make_pair(E, Edges.size()));
+ EdgeIndex EI = Res.first->second;
+ if (Res.second) {
+ Edges.push_back(E);
+ Sections[From].Succs.push_back(To);
+ Sections[To].Preds.push_back(From);
+ } else
+ Edges[EI].Weight = SaturatingAdd(Edges[EI].Weight, Weight);
+ }
+ normalizeEdgeWeights();
+}
+
+
Normalize the edge weights so that we can reject edges which have a low
+ probibility.
+void CallGraphSort::normalizeEdgeWeights() {
+ for (SectionIndex SI = 0, SE = Sections.size(); SI != SE; ++SI) {
+ Section &S = Sections[SI];
+ for (SectionIndex PI : S.Preds) {
+ Edge &E = Edges[EdgeMap[{PI, SI, 0, 0}]];
+ if (S.Weight == 0)
+ continue;
+ E.NormalizedWeight = Double(E.Weight) / Double(S.Weight);
+ }
+ }
+}
+
+
It's bad to merge clusters which would degrade the density too much.
+static bool isNewDensityBad(Cluster &A, Cluster &B) {
+ Double NewDensity = Double(A.Weight + B.Weight) / Double(A.Size + B.Size);
+ if (A.getDensity() >
+ NewDensity * Double(MAX_DENSITY_DEGRADATION))
+ return true;
+ return false;
+}
+
+static void mergeClusters(Cluster &Into, Cluster &From) {
+ Into.ISBs.insert(Into.ISBs.end(), From.ISBs.begin(), From.ISBs.end());
+ Into.Sections.insert(Into.Sections.end(), From.Sections.begin(),
+ From.Sections.end());
+ Into.Size += From.Size;
+ Into.Weight += From.Weight;
+ From.ISBs.clear();
+ From.Sections.clear();
+ From.Size = 0;
+ From.Weight = 0;
+}
+
+ Group InputSections into clusters using the Call-Chain Clustering heuristic
+
then sort the clusters by density.
+void CallGraphSort::generateClusters() {
+ Minimum edge probability to consider merging.
+ const Double MIN_EDGE_PROBABILITY = Double(1) / Double(10);
+
+ std::vector<SectionIndex> SortedSecs;
+ std::vector<Cluster *> SecToCluster(Sections.size());
+
+ Clusters.reserve(Sections.size());
+
+ for (SectionIndex SI = 0, SE = Sections.size(); SI != SE; ++SI) {
+ Cluster C(SI, Sections[SI].ISB);
+ C.Size = Sections[SI].Size;
+ C.Weight = Sections[SI].Weight;
+ Clusters.push_back(C);
+ SortedSecs.push_back(SI);
+ }
+
+ for (Cluster &C : Clusters) {
+ SecToCluster[C.Sections.front()] = &C;
+ }
+
+ std::stable_sort(SortedSecs.begin(), SortedSecs.end(),
+ [&](SectionIndex A, SectionIndex B) {
+ return Sections[A].getDensity() > Sections[B].getDensity();
+ });
+
+ for (SectionIndex SI : SortedSecs) {
+ Cluster &C = *SecToCluster[SI];
+
+ SectionIndex BestPred = -1;
+ Double BestWeight = 0;
+
+ for (SectionIndex PI : Sections[SI].Preds) {
+ Edge &E = Edges[EdgeMap[{PI, SI, 0, 0}]];
+ if (BestPred == -1 || E.NormalizedWeight > BestWeight) {
+ BestPred = PI;
+ BestWeight = E.NormalizedWeight;
+ }
+ }
+
+ if (BestWeight < MIN_EDGE_PROBABILITY)
+ continue;
+
+ Cluster *PredC = SecToCluster[BestPred];
+ if (PredC == nullptr || PredC == &C)
+ continue;
+
+ if (C.Size + PredC->Size > MAX_CLUSTER_SIZE)
+ continue;
+
+ if (isNewDensityBad(*PredC, C))
+ continue;
+
+ for (SectionIndex SI : C.Sections)
+ SecToCluster[SI] = PredC;
+
+ mergeClusters(*PredC, C);
+ }
+
+
Remove empty or dead nodes.
+ Clusters.erase(std::remove_if(Clusters.begin(), Clusters.end(),
+ [](const Cluster &C) {
+ return C.Size == 0 || C.Sections.empty();
+ }),
+ Clusters.end());
+
+ Sort by density. Invalidates all NodeIndexs.
+ std::sort(Clusters.begin(), Clusters.end(),
+ [](const Cluster &A, const Cluster &B) {
+ return A.getDensity() > B.getDensity();
+ });
+}
+
+DenseMap<const InputSectionBase *, int> CallGraphSort::run() {
+ generateClusters();
+
+
Generate order.
+ llvm::DenseMap<const InputSectionBase *, int> OrderMap;
+ ssize_t CurOrder = 1;
+
+ for (const Cluster &C : Clusters)
+ for (const InputSectionBase *IS : C.ISBs)
+ OrderMap[IS] = CurOrder++;
+
+ return OrderMap;
+}
+
+ Sort sections by the profile data provided by -callgraph-profile-file
+

+ This first builds a call graph based on the profile data then merges sections
+
according to the C³ huristic. All clusters are then sorted by a density
+// metric to further improve locality.
+DenseMap<const InputSectionBase *, int> elf::computeCallGraphProfileOrder() {
+ return CallGraphSort().run();
+}

Index: ELF/CMakeLists.txt

  • ELF/CMakeLists.txt

+++ ELF/CMakeLists.txt
@@ -19,6 +19,7 @@

Arch/SPARCV9.cpp
Arch/X86.cpp
Arch/X86_64.cpp

+ CallGraphSort.cpp

Driver.cpp
DriverUtils.cpp
EhFrame.cpp

llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

+static void readCallGraph(MemoryBufferRef MB) {
+ // Build a map from symbol name to section
+ DenseMap<StringRef, InputSectionBase *> SymbolSection;
+ for (InputFile *File : ObjectFiles)
+ for (Symbol *Sym : File->getSymbols())
+ if (auto *D = dyn_cast<Defined>(Sym))
+ if (auto *IS = dyn_cast_or_null<InputSectionBase>(D->Section))
+ SymbolSection[D->getName()] = IS;
+
+ std::vector<StringRef> Lines = args::getLines(MB);
+ for (StringRef L : Lines) {
+ SmallVector<StringRef, 3> Fields;
+ L.split(Fields, ' ');
+ if (Fields.size() != 3)
+ fatal("parse error");
+ uint64_t Count;
+ if (!to_integer(Fields[2], Count))
+ fatal("parse error");
+ InputSectionBase *FromSec = SymbolSection.lookup(Fields[0]);
+ InputSectionBase *ToSec = SymbolSection.lookup(Fields[1]);
+ if (FromSec && ToSec)
+ Config->CallGraphProfile[std::make_pair(FromSec, ToSec)] += Count;

Should this be using FromSec->Repl, ToSec->Repl to handle ICF? If so
please update the code and add a new test (the one in the patch is
getting big).

+ }
+}

With --symbol-ordering-file we produce warnings if, for example, the
symbol is found to be absolute. The same issue applies here, no? Can you
refactor the code so that we print the same warnings for
--call-graph-ordering-file?

+using ClusterIndex = std::ptrdiff_t;
+using SectionIndex = std::ptrdiff_t;
+using EdgeIndex = std::ptrdiff_t;

Does this means that we use 64 bit integers for indexes? Why?

+// Used for calculating an comparing density. Use soft-float for determinism.
+struct Double : APFloat {

This used to be a problem with x87, but do we support a system today
where this is a problem?

Cheers,
Rafael

Bigcheese updated this revision to Diff 139535.Mar 22 2018, 5:09 PM

Apply review comments.

After discussing the floating point issue with people and checking compilers, I believe this specific use is safe as there is no opportunity to form FMAs and it only uses division and comparison.

espindola added inline comments.Mar 28 2018, 8:16 AM
ELF/CallGraphSort.cpp
247

please git-clang-format the patch.

espindola added inline comments.Mar 28 2018, 9:38 AM
ELF/CallGraphSort.cpp
159

This warning is duplicated with what we have in Writer.cpp for --symbol-ordering-file. Could it be merged?

278

Delete extra empty lines.

Bigcheese updated this revision to Diff 140132.Mar 28 2018, 1:56 PM

Address review comments.

hintonda removed a subscriber: hintonda.Mar 28 2018, 5:47 PM
espindola added inline comments.Mar 28 2018, 10:48 PM
ELF/CallGraphSort.cpp
57

dead code.

58

Why do you need a 64 bit index. Why does it need a typedef?

espindola added inline comments.Mar 28 2018, 11:05 PM
ELF/CallGraphSort.cpp
150

const Edge &Other

ELF/Config.h
28

dead declaration.

ELF/Driver.cpp
615

This should now be just =, no?

Bigcheese marked 3 inline comments as done.Mar 29 2018, 11:35 AM
Bigcheese added inline comments.
ELF/CallGraphSort.cpp
58

While ELF limits the number of sections/symbols a single ELF file can hold, there's no limit on the number of input sections that go into the final link. If we have multiple ELF files with 4bn sections, then this could overflow. The chances of this are rare, but there's almost no cost in supporting it. The correct default type for array indexing is std::ptrdiff_t, you need a good reason to use something else.

As for the typedef, it makes it clearer what the purpose of the index is. Otherwise they would all have the same type. If C++ had a good way to do strong typedefs I would be using that here to add compile time enforcement.

ELF/Driver.cpp
615

It should be a saturating add. An input file can legitimately have multiple edges, and they should be merged.

Address review comments.

espindola added inline comments.Mar 29 2018, 5:55 PM
ELF/CallGraphSort.cpp
58

The reason is that nothing in lld does this.

Just use int/unsigned as appropriate. We do not design for an abstract pie in the sky.

Change this.

Bigcheese updated this revision to Diff 140713.Apr 2 2018, 5:15 PM

Address review comments.

espindola added inline comments.Apr 2 2018, 6:05 PM
ELF/CallGraphSort.cpp
57

Delete these typedefs and just use the proper type directly.

149

Define this inline.

ELF/Driver.cpp
617

This means that we supports having duplicated symbol edges. Is there a reason for that? If not please change this to report an error on duplicated edges.

espindola added inline comments.Apr 2 2018, 6:19 PM
ELF/CallGraphSort.cpp
202

This is using SaturatingAdd on a uint64_t. That also seems like over engineering.

Each unit of edge weight comes from a sample during execution. Even with a compact format using a single byte per sample we would need 16 exbibytes for it to overflow. Where would such a sample have been stored?

espindola added inline comments.Apr 2 2018, 6:28 PM
ELF/CallGraphSort.cpp
149

Actually, it is only called once. Inline it.

espindola added inline comments.Apr 2 2018, 6:47 PM
ELF/Driver.cpp
611–612

This is marked done, but it is not.

Bigcheese added inline comments.Apr 5 2018, 3:52 PM
ELF/CallGraphSort.cpp
57

That would significantly reduce code clarity. It's not uncommon in LLVM to do this. LLD even has a case of this with Relocations.h RelType.

ELF/Driver.cpp
617

This makes it easier to create input. For instance you can cat together profiles from different runs.

Bigcheese updated this revision to Diff 141236.Apr 5 2018, 3:53 PM

Address review comments.

espindola added inline comments.Apr 5 2018, 8:46 PM
ELF/CallGraphSort.cpp
57

RelType is the one example in lld and its use spans most of the linker, so it makes sense there. This is completely local and is just noise. Delete it.

ELF/Symbols.cpp
251 ↗(On Diff #141236)

Rebase on top of r329371. You should be able to remove the File argument with that.

espindola added inline comments.Apr 6 2018, 10:53 AM
ELF/CallGraphSort.cpp
50

Dead include.

ELF/Driver.cpp
617

Fair enough. Please add a testcase showing that behavior.

Bigcheese updated this revision to Diff 141718.Apr 9 2018, 12:45 PM

Address review comments.

espindola added inline comments.Apr 10 2018, 4:08 PM
ELF/CallGraphSort.cpp
33

The part about profile should be in a followup patch. For now this is given the weighted call graph.

64

You can remove this vector.

The only real use is when computing the returned DenseMap and you can use the Sections vector there:

for (const Cluster &C : Clusters)
  for (int SecIndex : C.Sections)
    OrderMap[Sections[SecIndex].ISB] = CurOrder++;
138

Define this function inline.

264

You don't need this loop, just add

SecToCluster[SI] = &Clusters.back();

to the previous one.

298

This is a slow union find solution when there are many union operations. Maybe that doesn't impact this in practice, but at least leave a comment about a possible improvement if we ever hit it.

Bigcheese updated this revision to Diff 142093.Apr 11 2018, 4:18 PM

Address review comments.

ELF/CallGraphSort.cpp
33

This is talking about the for loop in CallGraphSort() where it builds the graph.

espindola added inline comments.Apr 12 2018, 8:38 AM
ELF/CallGraphSort.cpp
33

Building a graph from a set of edges doesn't sound much, but it is OK to keep the comment. Just don't use the word profile as it makes it sound like lld is reading the direct output of perf or similar.

204

This is not tested. All tests pass if I replace it with an assert.

espindola added inline comments.Apr 12 2018, 9:29 AM
test/ELF/cgprofile-txt.s
8 ↗(On Diff #142093)

This should be >>, no?

espindola added inline comments.Apr 12 2018, 10:16 AM
ELF/CallGraphSort.cpp
72

This structure is almost a perfect duplicate of Cluster. That is not too surprising since each cluster starts as a single section.

In the traditional description of union-find, there is only one type of element which gets combined as union operations are performed.

I was able to remove this struct and I think it makes the code quite a bite closer to a description of the union-find structure and easier to read. The patch on top of this is at https://reviews.llvm.org/D45577.

290

BTW, this is a very interesting problem.

The traditional disjoint-set/union-find uses unordered sets which are easy to represent with trees with only parent pointers.

Changing it to represent order in the sets and still be efficient is an interesting challenge (not for this patch).

Bigcheese updated this revision to Diff 142272.Apr 12 2018, 3:15 PM

Addressed review comments.

Bigcheese marked 47 inline comments as done.Apr 12 2018, 3:19 PM
Bigcheese added inline comments.
ELF/CallGraphSort.cpp
33

I've made it clear it's talking about the call graph profile, not whatever profile that was generated from.

espindola added inline comments.Apr 12 2018, 9:32 PM
ELF/CallGraphSort.cpp
199

Instead of storing a normalized edge value it is simpler to store the original cluster weight: https://reviews.llvm.org/D45607

258

This is not tested.

This is not in the paper, right? Do you know what is the motivation? It is interesting in that the algorithm prioritizes hot nodes, but has a limit on how cold an edge can be.

It does seem to have a small performance win, at lest on the LTOing FileCheck testcase. With this patch a run takes 2.519023951 seconds, with this check removed it takes 2.520792327 (no sorting is 2.575764375).

espindola added inline comments.Apr 13 2018, 9:15 AM
ELF/CallGraphSort.cpp
71

Succs is not used, please delete.

espindola added inline comments.Apr 13 2018, 10:22 AM
ELF/Config.h
111

Using Symbol in here causes some duplication. We support duplicated edges, and we have to support two symbols pointing to the same section. This means two maps, this one for symbols and one for edges from section to section.

If this map is changed to directly map from section to section, we can avoid the second map completely:

https://reviews.llvm.org/D45630

Bigcheese updated this revision to Diff 142708.Apr 16 2018, 3:23 PM
Bigcheese marked an inline comment as done.

Address review comments.

Bigcheese marked 4 inline comments as done.Apr 16 2018, 3:25 PM
espindola accepted this revision.Apr 16 2018, 5:45 PM

LGTM with a few last requests.

ELF/CallGraphSort.cpp
181

Since this is looking at a section we haven't merged to a predecessor yet this can be just

Cluster &C = Clusters[SI];

186

This can be just C.Preds.

193

You don't think you need this if. If there are no predecessors, BestWeight will be 0 and so the following if will continue since InitialWeight is unsigned.

207

All tests pass if I delete this if. Please add a test.

213

All tests pass if I delete this loop. Please add a test.

225

NodeIndex is not used elsewhere in the patch. What this (and the previous remove_if) invalidate are the cluster to cluster edges.

This revision was not accepted when it landed; it landed in state Needs Review.Apr 17 2018, 4:33 PM
This revision was automatically updated to reflect the committed changes.