This is an archive of the discontinued LLVM Phabricator instance.

ELF: Place ordered sections in the middle of the unordered section list on targets with limited-range branches.
Closed, Public

Authored by pcc on Mar 27 2018, 8:59 PM.


Details

Reviewers

ruiu
espindola
peter.smith

Commits

rG5ea6d50af113: ELF: Place ordered sections in the middle of the unordered section list on…
rLLD328905: ELF: Place ordered sections in the middle of the unordered section list on…
rL328905: ELF: Place ordered sections in the middle of the unordered section list on…

Summary

It generally does not matter much where we place sections ordered
by --symbol-ordering-file relative to other sections. But if the
ordered sections are hot (which is the case already for some users
of --symbol-ordering-file, and is increasingly more likely to be
the case once profile-guided section layout lands) and the target
has limited-range branches, it is beneficial to place the ordered
sections in the middle of the output section in order to decrease
the likelihood that a range extension thunk will be required to call
a hot function from a cold function or vice versa.

That is what this patch does. After D44966 it reduces the size of
Chromium for Android's .text section by 60KB.

Diff Detail

Build Status

Buildable 16491
Build 16491: arc lint + arc unit

Event Timeline

pcc created this revision. Mar 27 2018, 8:59 PM

Herald added subscribers: mgrang, arichardson, javed.absar, emaste. Mar 27 2018, 8:59 PM

Harbormaster completed remote builds in B16491: Diff 140036. Mar 27 2018, 8:59 PM

grimar added a subscriber: grimar. Mar 28 2018, 1:39 AM

That looks reasonable to me. Another heuristic, usable at least for C programs, is to find the sections that receive the most calls and place them in the centre. This tends to reduce the number of thunks, because many library functions are called from all over the program. However, given the performance cost of going through a thunk, I think moving the hot functions makes more sense.

I don't think I understand why this patch can reduce the size of generated executables. It may improve performance because we use short branches to jump to hot functions, but how can it reduce the size of thunks?

lld/ELF/Writer.cpp
1088	Can you add a comment?
1091	nit: I'd write it in two lines.

Imagine that you have 8MB of hot code and 32MB of cold code. If the layout is:

8MB hot
32MB cold

only the first 8-16MB of the cold code (depending on which hot function it is actually calling) can call the hot code without a range extension thunk. However, if we use this layout:

16MB cold
8MB hot
16MB cold

both the last 8-16MB of the first block of cold code and the first 8-16MB of the second block of cold code can call the hot code without a thunk. So we effectively double the amount of code that could potentially call into the hot code without a thunk, reducing the number of thunks that we need.
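The arithmetic above can be replayed in a few lines. This is only an illustrative sketch, not code from the patch: it assumes a branch range of 16MB in either direction (the figure implied by the numbers in the comment), measures sizes in MB, and computes the upper bound, i.e. the amount of cold code that can reach at least the nearest hot function without a thunk. The names `kBranchRangeMB`, `reachableColdMB`, `hotFirstLayout` and `hotInMiddleLayout` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption for illustration: a branch reaches targets up to 16MB away in
// either direction. All sizes are in MB to keep the arithmetic readable.
constexpr uint64_t kBranchRangeMB = 16;

// Upper bound on how much of a cold block that borders the hot region can
// reach the nearest hot code without a range extension thunk.
uint64_t reachableColdMB(uint64_t coldBlockMB) {
  return std::min(coldBlockMB, kBranchRangeMB);
}

// Layout "hot, then cold": a single cold block borders the hot region.
uint64_t hotFirstLayout(uint64_t coldMB) { return reachableColdMB(coldMB); }

// Layout "cold, hot, cold": the cold code is split evenly, so two cold
// blocks border the hot region, one on each side.
uint64_t hotInMiddleLayout(uint64_t coldMB) {
  assert(coldMB % 2 == 0 && "sketch assumes an even split");
  return 2 * reachableColdMB(coldMB / 2);
}
```

With 32MB of cold code, `hotFirstLayout(32)` gives 16 and `hotInMiddleLayout(32)` gives 32, which matches the "effectively double" claim under the assumed range.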

Address review comments

Harbormaster completed remote builds in B16597: Diff 140465. Mar 30 2018, 12:11 PM

pcc marked 2 inline comments as done. Mar 30 2018, 12:11 PM

Ah, so the assumption I was missing is that hot functions are called from many more places than cold functions. That is perhaps obvious, but it is not necessarily true. Could you add your explanation as a comment?

Add comment

Harbormaster completed remote builds in B16601: Diff 140480. Mar 30 2018, 1:35 PM

LGTM

Thanks!

This revision is now accepted and ready to land. Mar 30 2018, 2:31 PM

Closed by commit rL328905: ELF: Place ordered sections in the middle of the unordered section list on… (authored by pcc). Mar 30 2018, 2:39 PM

This revision was automatically updated to reflect the committed changes.

pcc mentioned this in D45841: Keep the output text sections with prefixes ".text.hot" , ".text.unlikely", ".text.startup", ".text.exit" separate. Apr 19 2018, 3:46 PM

MaskRay mentioned this in D128382: [LLD] Two tweaks to symbol ordering scheme. Jun 22 2022, 3:14 PM

Revision Contents

Path                                       Size
lld/ELF/Writer.cpp                         45 lines
lld/test/ELF/arm-symbol-ordering-file.s    31 lines

Diff 140036

lld/ELF/Writer.cpp

(1,079 lines not shown)
 static DenseMap<const InputSectionBase *, int> buildSectionOrder() {
(lines not shown)
   if (Config->WarnSymbolOrdering)
     for (auto OrderEntry : SymbolOrder)
       if (!OrderEntry.second.Present)
         warn("symbol ordering file: no such symbol: " + OrderEntry.first);

   return SectionOrder;
 }

+static void
[inline comment, ruiu, marked done] Can you add a comment?
+sortISDBySectionOrder(InputSectionDescription *ISD,
+                      const DenseMap<const InputSectionBase *, int> &Order) {
+  std::vector<InputSection *> UnorderedSections, OrderedSections;
[inline comment, ruiu, marked done] nit: I'd write it in two lines.
+  uint64_t UnorderedSize = 0;
+
+  for (InputSection *IS : ISD->Sections) {
+    if (!Order.count(IS)) {
+      UnorderedSections.push_back(IS);
+      UnorderedSize += IS->getSize();
+      continue;
+    }
+    OrderedSections.push_back(IS);
+  }
+  std::sort(OrderedSections.begin(), OrderedSections.end(),
+            [&](InputSection *A, InputSection *B) {
+              return Order.lookup(A) < Order.lookup(B);
+            });
+
+  // Find an insertion point for the ordered section list in the unordered
+  // section list. On targets with limited-range branches, this is the mid-point
+  // of the unordered section list. This decreases the likelihood that a range
+  // extension thunk will be needed to enter or exit the ordered region.
+  size_t UnorderedInsPt = 0;
+  if (Target->ThunkSectionSpacing && !OrderedSections.empty()) {
+    uint64_t UnorderedPos = 0;
+    for (; UnorderedInsPt != UnorderedSections.size(); ++UnorderedInsPt) {
+      UnorderedPos += UnorderedSections[UnorderedInsPt]->getSize();
+      if (UnorderedPos > UnorderedSize / 2)
+        break;
+    }
+  }
+
+  std::copy(UnorderedSections.begin(),
+            UnorderedSections.begin() + UnorderedInsPt, ISD->Sections.begin());
+  std::copy(OrderedSections.begin(), OrderedSections.end(),
+            ISD->Sections.begin() + UnorderedInsPt);
+  std::copy(UnorderedSections.begin() + UnorderedInsPt, UnorderedSections.end(),
+            ISD->Sections.begin() + UnorderedInsPt + OrderedSections.size());
+}
+
 static void sortSection(OutputSection *Sec,
                         const DenseMap<const InputSectionBase *, int> &Order) {
   StringRef Name = Sec->Name;

   // Sort input sections by section name suffixes for
   // __attribute__((init_priority(N))).
   if (Name == ".init_array" || Name == ".fini_array") {
     if (!Script->HasSectionsCommand)
(10 lines not shown)

   // Never sort these.
   if (Name == ".init" || Name == ".fini")
     return;

   // Sort input sections by priority using the list provided
   // by --symbol-ordering-file.
   if (!Order.empty())
-    Sec->sort([&](InputSectionBase *S) { return Order.lookup(S); });
+    for (BaseCommand *B : Sec->SectionCommands)
+      if (auto *ISD = dyn_cast<InputSectionDescription>(B))
+        sortISDBySectionOrder(ISD, Order);
 }

 // If no layout was provided by linker script, we want to apply default
 // sorting for special input sections. This also handles --symbol-ordering-file.
 template <class ELFT> void Writer<ELFT>::sortInputSections() {
   // Build the order once since it is expensive.
   DenseMap<const InputSectionBase *, int> Order = buildSectionOrder();
   for (BaseCommand *Base : Script->SectionCommands)
(1,144 lines not shown)

lld/test/ELF/arm-symbol-ordering-file.s

This file was added.

				# RUN: llvm-mc -filetype=obj -triple=armv7-unknown-linux %s -o %t.o

				# RUN: echo ordered > %t_order.txt
				# RUN: ld.lld --symbol-ordering-file %t_order.txt %t.o -o %t2.out
				# RUN: llvm-nm -n %t2.out | FileCheck %s

				# CHECK: unordered1
				# CHECK-NEXT: unordered2
				# CHECK-NEXT: unordered3
				# CHECK-NEXT: ordered
				# CHECK-NEXT: unordered4

				.section .foo,"ax",%progbits,unique,1
				unordered1:
				.zero 1

				.section .foo,"ax",%progbits,unique,2
				unordered2:
				.zero 1

				.section .foo,"ax",%progbits,unique,3
				unordered3:
				.zero 2

				.section .foo,"ax",%progbits,unique,4
				unordered4:
				.zero 4

				.section .foo,"ax",%progbits,unique,5
				ordered:
				.zero 1
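
To see why the test expects `ordered` between `unordered3` and `unordered4`, the mid-point search from the patch can be replayed on the test's section sizes (1, 1, 2 and 4 bytes unordered, 1 byte ordered). The following is a standalone sketch, not lld code: `Sec`, `findInsertionPoint` and `layoutNames` are hypothetical stand-ins, and it assumes the target has limited-range branches, so the mid-point path is always taken.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for lld's InputSection: only a name and a size.
struct Sec {
  std::string name;
  uint64_t size;
};

// Mirrors the loop in the patch: walk the unordered list, accumulating sizes,
// and stop at the first section whose end passes half of the total size.
std::size_t findInsertionPoint(const std::vector<Sec> &unordered) {
  uint64_t total = 0;
  for (const Sec &s : unordered)
    total += s.size;

  uint64_t pos = 0;
  std::size_t insPt = 0;
  for (; insPt != unordered.size(); ++insPt) {
    pos += unordered[insPt].size;
    if (pos > total / 2)
      break;
  }
  return insPt;
}

// Splices the ordered list into the unordered list at the insertion point and
// returns the resulting section names in layout order.
std::vector<std::string> layoutNames(const std::vector<Sec> &unordered,
                                     const std::vector<Sec> &ordered) {
  std::size_t insPt = ordered.empty() ? 0 : findInsertionPoint(unordered);
  std::vector<std::string> out;
  for (std::size_t i = 0; i != insPt; ++i)
    out.push_back(unordered[i].name);
  for (const Sec &s : ordered)
    out.push_back(s.name);
  for (std::size_t i = insPt; i != unordered.size(); ++i)
    out.push_back(unordered[i].name);
  return out;
}
```

For the test input the unordered total is 8 bytes, so the threshold is 4. The running position is 1, 2, 4 and then 8, and only the last value exceeds 4, so the insertion point is index 3 and the ordered section lands just before `unordered4`.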