Outlining isn't always a win just because the saved instruction count is >= 1.
The overhead of representing a new function in the binary depends on
exception metadata and alignment, so make the minimum benefit a threshold that can be tuned locally.
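A minimal sketch of what such a knob could look like in MachineOutliner.cpp, assuming a `cl::opt` spelled after the `-outliner-benefit-threshold` flag mentioned later in this review; the default, description, and exact wiring in the actual diff may differ:

```cpp
#include "llvm/Support/CommandLine.h"

using namespace llvm;

// Sketch only: the minimum estimated benefit an outlining candidate must
// reach before the outliner creates a new function for it. Whether the
// benefit is measured in bytes or instructions is discussed below.
static cl::opt<unsigned> OutlinerBenefitThreshold(
    "outliner-benefit-threshold", cl::init(1), cl::Hidden,
    cl::desc("Minimum estimated benefit required before an outlining "
             "candidate is taken"));
```

The gate itself could then be as simple as skipping any `OutlinedFunction` whose `getBenefit()` falls below this value, and a build could set it with `llc -outliner-benefit-threshold=N` or `clang -mllvm -outliner-benefit-threshold=N`.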
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
I've found internally that the best value here varies heavily. ~35 is best for uncompressed code on aarch64 Linux with .eh_frame, while ~5 seems best for armv7 Linux with .eh_frame. If you're using compression, many outlinings are wasteful compared to what the compressor would have achieved on the un-outlined code, and various project-specific results put the optimal value for this parameter well into the hundreds.
When we compress with our internal tool, SuperPack, just about every outlining pattern becomes a net loss in the compressed representation, so it even makes sense to go into the thousands here.
What are everyone's thoughts on adding such a flag?
Ping @paquette.
This seems like a pretty simple and non-intrusive way to tune the outliner. LGTM if you add a test case.
Overall it looks good to me. Just simplify the test case so that it is more resilient to future changes.
llvm/test/CodeGen/AArch64/machine-outliner-threshold.ll
- Line 6 (On Diff #506239): `; CHECK-NOT: OUTLINED_FUNCTION`
- Line 25 (On Diff #506239): `; CHECK-NOT: OUTLINED_FUNCTION`
- Line 92 (On Diff #506239): I would simplify all of the following without matching the alignment and the body of the outlined function, which can vary: `; CHECK: [[OUTLINED]]:`, `; ALL-DAG: [[OUTLINED]]:`, `; ALL-DAG: [[OUTLINED2]]:`
- Line 117 (On Diff #506239): Is this necessary?

llvm/lib/CodeGen/MachineOutliner.cpp
- Line 121: What is the size in? Instructions? Bytes?
My main concern with this patch is that I'd actually like to avoid adding new heuristics to the outliner in general.
The main benefit of having a late outliner is to leverage accurate information about what will actually be emitted by the compiler.
> The overhead of representing a new function in the binary depends on exception metadata and alignment
Is it impossible to model this in the compiler? Or do we absolutely need a knob for this?
In general, I think it'd be best to improve the outliner cost model accuracy wherever possible, just because that is an option at this level of representation. But if it's absolutely necessary to add a heuristic, I'm not totally against it.
llvm/test/CodeGen/AArch64/machine-outliner-threshold.ll
- Line 1 (On Diff #509888): This test can be removed.

llvm/test/CodeGen/AArch64/machine-outliner-threshold.mir
- Line 2: Shouldn't need all of that in the run line.
- Line 6: This global is not used.
- Line 16: Don't need `noinline`.
> Is it impossible to model this in the compiler? Or do we absolutely need a knob for this? In general, I think it'd be best to improve the outliner cost model accuracy wherever possible, just because that is an option at this level of representation. But if it's absolutely necessary to add a heuristic, I'm not totally against it.
This is what I aimed to do initially. The problem I ran into is that coming up with a model against compression is chaotic in the mathematical sense; for example, the results varied based on the contents of functions in *other libraries*, which we obviously don't have visibility into during outlining. We compress our Android native libraries with SuperPack (https://engineering.fb.com/2021/09/13/core-data/superpack/), which concatenates all our libraries and then compresses them asymmetrically, so a change in libraryA changes the criteria for outlining decisions in libraryB.
The situation is possibly the same for iOS apps as well with IPA compression and any usage of dylibs, though I'm not particularly familiar with this domain.
You could definitely come up with a better heuristic to improve the uncompressed-size results; for example, with -outliner-benefit-threshold=5 I saw more than twice the size savings that the hardcoded default of 1 yielded. This could be modeled more directly by including, say, the cost of unwinding info in the outliner cost modeling.
However, that would still be a very poor result if compressed size is what matters, given that it depends on:
- the compression tool used
- the contents of the other libraries/assets being compressed alongside it
- the value your use case actually places on compressed vs. uncompressed code
- etc.
Across our different aarch64 apps I still saw different values prevailing as the most effective: e.g. 200 for app1, 250 for app2, and 275 for app3, where the only meaningful variation between these apps would be the contents of the other libraries. This is definitively not modelable within the compiler, so even with a "correct" uncompressed model we'd still need an override for compressed needs.
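As a hedged illustration of the "cost of unwinding info" modeling mentioned above (a hypothetical sketch, not something in this patch), the per-function overhead estimate might be adjusted roughly like this; the constants are made-up assumptions, and real numbers depend on the target, the unwind encoding, and the linker:

```cpp
// Hypothetical sketch: fold an estimate of per-function unwind metadata and
// alignment padding into the size the outliner charges for creating a new
// function. ApproxFDEBytes and ApproxAlignPadBytes are illustrative
// assumptions, not measured values.
static unsigned estimateOutlinedFunctionOverheadBytes(unsigned CodeBytes) {
  constexpr unsigned ApproxFDEBytes = 24;     // assumed .eh_frame FDE size
  constexpr unsigned ApproxAlignPadBytes = 8; // assumed alignment padding
  return CodeBytes + ApproxFDEBytes + ApproxAlignPadBytes;
}
```

Even with something along these lines, the compressed-size concerns above would remain, since none of those inputs are visible to the compiler.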
Remember to remove the IR test case before committing :)
llvm/test/CodeGen/AArch64/machine-outliner-threshold.mir
- Line 17: Typo: "different".
Yeah. Even if we come up with a base cost model, we may still want a tuning parameter depending on the target or objectives, which wouldn't be much different from having a threshold like this one.
LGTM.