This is an archive of the discontinued LLVM Phabricator instance.

A new code layout algorithm for function reordering [1/3]
ClosedPublic

Authored by spupyrev on Jun 13 2023, 10:13 AM.

Details

Summary

We are brining a new algorithm for function layout (reordering) based on the
call graph (extracted from a profile data). The algorithm is an improvement of
top of a known heuristic, C^3. It tries to co-locate hot and frequently executed
together functions in the resulting ordering. Unlike C^3, it explores a larger
search space and have an objective closely tied to the performance of
instruction and i-TLB caches. Hence, the name CDS = Cache-Directed Sort.
The algorithm can be used at the linking or post-linking (e.g., BOLT) stage.

This diff modifies the existing data structures to facilitate the implementation
(down the stack). This is a no-op change.

Diff Detail

Event Timeline

spupyrev created this revision.Jun 13 2023, 10:13 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 10:13 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
spupyrev published this revision for review.Jun 13 2023, 1:14 PM
spupyrev edited the summary of this revision. (Show Details)
spupyrev added reviewers: hoy, wenlei, wlei.
spupyrev added subscribers: Amir, maksfb.
Herald added a project: Restricted Project. · View Herald TranscriptJun 13 2023, 1:14 PM
hoy added inline comments.Jul 5 2023, 2:06 PM
llvm/lib/Transforms/Utils/CodeLayout.cpp
160

nit: by convention the T in the struct name is not needed. I'm seeing JumpT and a few other types getting T. Is there a particular reason?

990

I guess CDS is only for function layout, but do you think the name here should be adjusted in order to not be Exstsp-specific?

spupyrev added inline comments.Jul 5 2023, 3:22 PM
llvm/lib/Transforms/Utils/CodeLayout.cpp
160

I too don't like this T suffix but I found myself writing

for (Chain *Chain : SortedChains)

all over the place. It compiles but looks confusing. I strongly prefer ChainT *Chain over this instead.

Any other suggestions?

990

Can you clarify what "name" you refer to? (happy to rename or add a comment if there is a confusion)

hoy added inline comments.Jul 5 2023, 3:30 PM
llvm/lib/Transforms/Utils/CodeLayout.cpp
160

I usually do

for (auto *Chain : SortedChains)

or

for (Chain *C : SortedChains)
990

I mean applyExtTspLayout. Do you think it makes sense to rename it like applyCDSLayout?

spupyrev added inline comments.Jul 5 2023, 3:39 PM
llvm/lib/Transforms/Utils/CodeLayout.cpp
160

I've been using the former for a while but Some LLVM folks advised me against over-using auto for non-complex types. I've even seen constructs auto *Chain refactored to class Chain *Chain in some cases, which is worse than introducing ChainT, in my opinion.
I feel like providing full variable names, such as Chain in instead of C, yields more readable code, no?

990

These are different algorithms, one is for basic block layout, another one is for functions. Follow-up diff introduces applyCDSLayout in addition to applyExtTspLayout

hoy accepted this revision.Jul 5 2023, 3:49 PM
hoy added inline comments.
llvm/lib/Transforms/Utils/CodeLayout.cpp
160

Yeah, full name also sounds good to me. In that case, I would name Chain to BlockChain or something.

It may be not easy to get every case working like that. TBH, it's also annoying to me sometimes to come up with a tradeoff between type names and variable names :).

I'm not sure why type name suffix like T is not preferred in this code base. I don't have a strong opinion. It's up to you to check in as is or not.

This revision is now accepted and ready to land.Jul 5 2023, 3:49 PM
This revision was automatically updated to reflect the committed changes.