".text.split." holds symbols which are split out from functions in
other input sections. For example, with -fsplit-machine-functions,
placing the cold parts in .text.split instead of .text.unlikely mitigates
against poor profile inaccuracy. Techniques such as hugepage remapping can
make conservative decisions at the section granularity.
Additionally we find small improvement in icache and tlb metrics (3-5%)
due to improved locality.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Carrying over the discussion from D87813 since it's more appropriate here:
For your future lld patch: why can't the split sections be placed in .test.cold? This just affects how -z keep-text-section-prefix groups input sections into output sections.
The intent here is to create a new "known" output section prefix which holds the split parts of functions. The rationale for this is outlined below.
Is there a good reason to put this into .text.split? Does having a new output section give you more leverage on how to manage the mapping of such code?
Yes, having them placed in a separate section does provide additional leverage. In particular your suggestion of placing the split parts in .text would benefit from hugepages, however this may result in suboptimal icache performance since the split parts are interspersed across other code.
Interaction with hugepages
Placing the split parts in a separate section allows us to experiment whether keeping them on hugepages is beneficial similar. For example, we may choose to unlock for FDO targets but keep on hugepages for AFDO to mitigate against profile quality issues.
Loss of locality
From our experiments we found that keeping them in a separate section (as opposed to placing in .text.unlikely or .text) improved metrics such as icache and itlb misses. For itlb misses, it's due to the fact that the split parts are placed on hugepages vs regular pages. For icache misses, we posit that the split parts are distributed across the section reducing locality. The loss here is significant and can be up to a 5% difference (Search B, L2i miss). Note that in this experiment we apply an aggressive 99% threshold for splitting out cold blocks.
.text.unlikely | .text.split | |||
---|---|---|---|---|
Search A | Search B | Search A | Search B | |
l1i_miss | 3.83 | -1.70 | 0.65 | -5.17 |
l2_miss | 5.48 | 6.93 | 0.64 | 1.68 |
itlb_miss | -32.27 | -10.25 | -35.41 | -15.90 |
stlb_miss | -59.39 | -42.36 | -67.56 | -62.20 |
Ease of monitoring hotness
Another benefit of placing them in a separate section is ease of monitoring by collecting sampling data from production. We can keep an eye on the hotness of the split output section to ensure that our tuning thresholds for splitting are optimal for the fleet. While I believe it is still possible to monitor split parts (disambiguate symbols using ".cold" suffix) for function splitting, it is more tedious.
This looks good to me. Please add a test to test/ELF/text-section-prefix.s
lld/ELF/Writer.cpp | ||
---|---|---|
138 | Additionally we find small improvement in icache and tlb metrics due to improved locality. might be too specific. It can probably be removed. |
[lld] Add a new known text prefix - ".text.split."
Make -z keep-text-section-prefix recognize .text.split. as a prefix
Update description, add test.
- Update commit message and comment.
- Add a test for -z keep-text-section-prefix.
Additionally we find small improvement in icache and tlb metrics due to improved locality. might be too specific. It can probably be removed.