Page MenuHomePhabricator

[lld] Make -z keep-text-section-prefix recognize .text.split. as a prefix.
ClosedPublic

Authored by snehasish on Sep 17 2020, 10:07 AM.

Details

Summary

".text.split." holds symbols which are split out from functions in
other input sections. For example, with -fsplit-machine-functions,
placing the cold parts in .text.split instead of .text.unlikely mitigates
against poor profile inaccuracy. Techniques such as hugepage remapping can
make conservative decisions at the section granularity.
Additionally we find small improvement in icache and tlb metrics (3-5%)
due to improved locality.

Diff Detail

Event Timeline

snehasish created this revision.Sep 17 2020, 10:07 AM
snehasish requested review of this revision.Sep 17 2020, 10:07 AM

Carrying over the discussion from D87813 since it's more appropriate here:

@MaskRay

For your future lld patch: why can't the split sections be placed in .test.cold? This just affects how -z keep-text-section-prefix groups input sections into output sections.

The intent here is to create a new "known" output section prefix which holds the split parts of functions. The rationale for this is outlined below.

@tmsriram

Is there a good reason to put this into .text.split? Does having a new output section give you more leverage on how to manage the mapping of such code?

Yes, having them placed in a separate section does provide additional leverage. In particular your suggestion of placing the split parts in .text would benefit from hugepages, however this may result in suboptimal icache performance since the split parts are interspersed across other code.

Interaction with hugepages

Placing the split parts in a separate section allows us to experiment whether keeping them on hugepages is beneficial similar. For example, we may choose to unlock for FDO targets but keep on hugepages for AFDO to mitigate against profile quality issues.

Loss of locality

From our experiments we found that keeping them in a separate section (as opposed to placing in .text.unlikely or .text) improved metrics such as icache and itlb misses. For itlb misses, it's due to the fact that the split parts are placed on hugepages vs regular pages. For icache misses, we posit that the split parts are distributed across the section reducing locality. The loss here is significant and can be up to a 5% difference (Search B, L2i miss). Note that in this experiment we apply an aggressive 99% threshold for splitting out cold blocks.

.text.unlikely.text.split
Search ASearch BSearch ASearch B
l1i_miss3.83-1.700.65-5.17
l2_miss5.486.930.641.68
itlb_miss-32.27-10.25-35.41-15.90
stlb_miss-59.39-42.36-67.56-62.20
Ease of monitoring hotness

Another benefit of placing them in a separate section is ease of monitoring by collecting sampling data from production. We can keep an eye on the hotness of the split output section to ensure that our tuning thresholds for splitting are optimal for the fleet. While I believe it is still possible to monitor split parts (disambiguate symbols using ".cold" suffix) for function splitting, it is more tedious.

This looks good to me. Please add a test to test/ELF/text-section-prefix.s

lld/ELF/Writer.cpp
138

Additionally we find small improvement in icache and tlb metrics due to improved locality. might be too specific. It can probably be removed.

[lld] Add a new known text prefix - ".text.split."

Make -z keep-text-section-prefix recognize .text.split. as a prefix

snehasish updated this revision to Diff 294176.Sep 24 2020, 2:44 PM

Update description, add test.

  • Update commit message and comment.
  • Add a test for -z keep-text-section-prefix.
snehasish retitled this revision from [lld] Add a new known text prefix - ".text.split." to [lld] Make -z keep-text-section-prefix recognize .text.split. as a prefix..Sep 24 2020, 2:44 PM
snehasish marked an inline comment as done.

Updated description and tests, PTAL thanks!

MaskRay accepted this revision.Sep 24 2020, 2:53 PM

LGTM.

This revision is now accepted and ready to land.Sep 24 2020, 2:53 PM