This is an archive of the discontinued LLVM Phabricator instance.

[ELF] A new code layout algorithm for function reordering [3a/3]
ClosedPublic

Authored by spupyrev on Jun 13 2023, 10:56 AM.

Details

Summary

We are brining a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.
Refer to https://reviews.llvm.org/D152834 for the actual implementation of the
reordering algorithm.

This diff adds a linker option to replace the existing C^3 heuristic with CDS.
The new behavior can be turned on by passing "--use-cache-directed-sort".
(the plan is to make it default in a next diff)

Perf-impact
clang-10 binary (built with LTO+AutoFDO/CSSPGO): wins on top of C^3 in [0.3%..0.8%]
rocksDB-8 binary (built with LTO+CSSPGO): wins on top of C^3 in [0.8%..1.5%]

Note that function layout affects the perf the most on older machines (with
smaller instruction/iTLB caches) and when huge pages are not enabled. The impact
on newer processors with huge pages enabled is likely neutral/minor.

Diff Detail

Event Timeline

spupyrev created this revision.Jun 13 2023, 10:56 AM
Herald added a project: Restricted Project. · View Herald Transcript
spupyrev retitled this revision from A new code layout algorithm for function reordering [3/3] to A new code layout algorithm for function reordering [3a/3].Jun 13 2023, 1:35 PM
spupyrev edited the summary of this revision. (Show Details)
spupyrev edited the summary of this revision. (Show Details)Jun 13 2023, 1:46 PM
spupyrev published this revision for review.Jun 14 2023, 12:15 PM
spupyrev added reviewers: wenlei, hoy, wlei.
Herald added a project: Restricted Project. · View Herald TranscriptJun 14 2023, 12:15 PM
rahmanl added inline comments.Aug 10 2023, 12:48 PM
lld/ELF/CallGraphSort.cpp
324

Add a function comment for this.

334

It doesn't look like this can take configurations by flags?

341–343
358
lld/ELF/CallGraphSort.h
17

Why are we abbreviating here? It's not clear to me what CDS stands for.

spupyrev planned changes to this revision.Aug 17 2023, 7:51 AM
Kobzol added a subscriber: Kobzol.Sep 4 2023, 2:23 PM
spupyrev marked 5 inline comments as done.Sep 11 2023, 6:51 AM
spupyrev updated this revision to Diff 556427.Sep 11 2023, 6:54 AM

addressing comments, adding a test for cdsort

Thank you for contributing a new algorithm and sorry for not commenting earlier.

A new code layout algorithm for function reordering [3a/3]

lld/ELF patches are usually tagged [ELF] . You should be able to find many examples with git log -- lld/ELF. [3a/3] seems unclear.
I think it's much better to mention https://reviews.llvm.org/D152834 in the summary as [3a/3] isn't clear whether 1/3 and 2/3 are :)

I'd like to know a plan whether we are going to remove the existing hfsort heuristic, and state this in the summary.
In D152834 you mentioned it, but many folks reading lld including me do not follow bolt.

--use-cdsort is the default and overrides the existing hfsort. This should be stated as well.

I will need some time to read D152834.

lld/ELF/CallGraphSort.cpp
277

lld/ code uses variableName. Please fix.

MaskRay added inline comments.Sep 12 2023, 9:12 PM
lld/ELF/CallGraphSort.cpp
8–9

This large comment is now stale.

277

static void

312

We don't add a blank line after variable declarations.

317

inline the used-once variable.

lld/ELF/Options.td
340

New options use BB.

spupyrev updated this revision to Diff 556720.Sep 13 2023, 2:04 PM
spupyrev marked 5 inline comments as done.

Thanks for looking into this. I made the requested changes

spupyrev retitled this revision from A new code layout algorithm for function reordering [3a/3] to [ELF] A new code layout algorithm for function reordering [3a/3].Sep 13 2023, 2:11 PM
spupyrev edited the summary of this revision. (Show Details)
spupyrev updated this revision to Diff 556722.Sep 13 2023, 2:14 PM

rebasing to latest

Thanks for the update.

https://maskray.me/blog/2021-08-08-toolchain-testing#the-test-checks-at-the-wrong-layer We need some unittests in llvm/. I created D159527

lld/ELF/CMakeLists.txt
75

move after TargetParser for an alphabetical order.

lld/ELF/CallGraphSort.cpp
9–12

using LLVM call graph profile data ...

12

TLB misses and i-cache misses

277

lld code prefers SmallVector<X, 0> for a smaller size and usually better performance.

D159526 will change these to use ArrayRef, enabling SmallVector<X, 0> at call sites.

279

the nested pair feels unnatural to me. I updated D159526 to change the element type to a std::tuple

317

This needs a comment: // Assume that the jump is at the middle of the input section. The profile data does not contain jump offsets.

332

buildCallGraph is called once and the conventional style just remove the extra function.

lld/ELF/CallGraphSort.h
17

This does not need to be a public API. Apply seems less appropriate in the context. I created D159526 to rename this to compute*

lld/ELF/Options.td
340

I feel that this abbreviated option name isn't intuitive.

A similar --call-graph-profile-sort intentionally picks a long but descriptive name. I think --use-cache-directed-sort will be better.

I suggest we make --no-use-cache-directed-sort the default and then change the default in another patch.

341

Reorder input sections with cache-directed sort.

spupyrev updated this revision to Diff 556945.Sep 18 2023, 6:56 AM
spupyrev marked 6 inline comments as done.

comments

spupyrev edited the summary of this revision. (Show Details)Sep 18 2023, 6:57 AM
spupyrev edited the summary of this revision. (Show Details)
spupyrev updated this revision to Diff 556947.Sep 18 2023, 7:02 AM
spupyrev marked an inline comment as done.

renaming the option

Thanks for the update. I am mainly waiting for the renamed function (applyCDSort does not convey the meaning well), changes of nested std::pair and std::vector => SmallVector<*, 0>

Thanks for the update. I am mainly waiting for the renamed function (applyCDSort does not convey the meaning well), changes of nested std::pair and std::vector => SmallVector<*, 0>

I have an impression the changes are made in the follow-up D159526 (which I'd prefer to land after this one). Do I miss something?

Thanks for the update. I am mainly waiting for the renamed function (applyCDSort does not convey the meaning well), changes of nested std::pair and std::vector => SmallVector<*, 0>

I have an impression the changes are made in the follow-up D159526 (which I'd prefer to land after this one). Do I miss something?

You are right. You may rebase this patch on top of D159526 :)

Thanks for the update. I am mainly waiting for the renamed function (applyCDSort does not convey the meaning well), changes of nested std::pair and std::vector => SmallVector<*, 0>

I have an impression the changes are made in the follow-up D159526 (which I'd prefer to land after this one). Do I miss something?

You are right. You may rebase this patch on top of D159526 :)

I suggest to rebase D159526 on top of this one

MaskRay added a comment.EditedSep 20 2023, 3:51 PM

Thanks for the update. I am mainly waiting for the renamed function (applyCDSort does not convey the meaning well), changes of nested std::pair and std::vector => SmallVector<*, 0>

I have an impression the changes are made in the follow-up D159526 (which I'd prefer to land after this one). Do I miss something?

You are right. You may rebase this patch on top of D159526 :)

I suggest to rebase D159526 on top of this one

D159526 performed the rename. If this patch is applied, applyCDSort in lld/ELF will need to be updated again. This is exactly the scenario I want to avoid.
D159527 added a unittest, which I consider necessary for making more application of the new layout algorithm.
(lld/test/ELF is not supposed to test functional change in llvm/ https://maskray.me/blog/2021-08-08-toolchain-testing#the-test-checks-at-the-wrong-layer)

D159526 performed the rename. If this patch is applied, applyCDSort in lld/ELF will need to be updated again. This is exactly the scenario I want to avoid.

Can you explain why renaming a function might be a problem and we may want to avoid this either in the linker or in the compiler?

I am not fully convinced that the newly proposed naming is preferred over the existing one. The new algorithm is called "cdsort" (which mimics the existing and commonly accepted name of "hfsort"), and it makes a sense to call the corresponding function "applyCDSort" as in "applyAlgorithmX". In contrast, "computeXXXlayout" might feel confusing, since the word "layout" often means a process.

uabelho removed a subscriber: uabelho.Sep 21 2023, 1:35 AM

D159526 performed the rename. If this patch is applied, applyCDSort in lld/ELF will need to be updated again. This is exactly the scenario I want to avoid.

Can you explain why renaming a function might be a problem and we may want to avoid this either in the linker or in the compiler?

I am not fully convinced that the newly proposed naming is preferred over the existing one. The new algorithm is called "cdsort" (which mimics the existing and commonly accepted name of "hfsort"), and it makes a sense to call the corresponding function "applyCDSort" as in "applyAlgorithmX". In contrast, "computeXXXlayout" might feel confusing, since the word "layout" often means a process.

I think apply* is often used when some structs are changed, e.g. applyDomTreeUpdates.
If we have a section order, then we use that to change the list of sections, we can say apply....
If we just compute the section order, apply* is not appropriate.
compute* is the conventional name here.

I have also explained that SmallVector<*, 0> is preferred in lld/ELF code, so I have a vector => ArrayRef refactoring.
We should also get rid of buildCallGraph, which uses a number of output parameters, while the function can just be inlined into apply* (to be renamed).

MaskRay added a comment.EditedSep 22 2023, 10:08 AM

Instead of having a --use-*, the best is probably to create a JJ<"call-graph-profile-sort=", ...> (values: hfsort,cdsort),
make the existing --call-graph-profile-sort an alias (FF<"call-graph-profile-sort", ...>, Alias<call_graph_profile_sort>, AliasArgs<"hfsort">),
then add a BB for --no-call-graph-profile-sort.

(I'll be out of town during 9-29 ~ 10-9)

Instead of having a --use-*, the best is probably to create a JJ<"call-graph-profile-sort=", ...> (values: hfsort,cdsort),
make the existing --call-graph-profile-sort an alias (FF<"call-graph-profile-sort", ...>, Alias<call_graph_profile_sort>, AliasArgs<"hfsort">),
then add a BB for --no-call-graph-profile-sort.

(I'll be out of town during 9-29 ~ 10-9)

See D159544

spupyrev updated this revision to Diff 557318.Sep 25 2023, 10:41 AM
spupyrev marked 3 inline comments as done.

nit

Thanks for the update.

lld/ELF/CallGraphSort.cpp
278

Nit: delete this comment.

"// Create the graph." below is clear enough.

283

sections can use SmallVector as well.

We can use one type for multiple variables

SmallVector<uint64_t, 0> funcSizes, funcCounts, callOffsets;`
lld/ELF/Config.h
61–62

update this comment

spupyrev updated this revision to Diff 557331.Sep 25 2023, 2:53 PM
spupyrev marked 3 inline comments as done.

nits

MaskRay accepted this revision.Sep 25 2023, 3:00 PM

lld/docs/ld.lld.1 needs an update as well. The format is difficult to write... you can use man lld/docs/ld.lld.1 to check whether it looks right.

This revision is now accepted and ready to land.Sep 25 2023, 3:00 PM
spupyrev updated this revision to Diff 557339.Sep 25 2023, 3:42 PM

edited ld.lld.1

MaskRay accepted this revision.Sep 25 2023, 3:53 PM

LGTM! Thanks for bearing with me about all the nits.

I'd like to have tests at the proper layer. Does D159527 look good to you?