Page MenuHomePhabricator

yozhu (YongKang Zhu)
User

Projects

User does not belong to any projects.

User Details

User Since
Mar 16 2022, 11:17 AM (15 w, 12 h)

Recent Activity

Sun, Jun 26

yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

One interesting note Shoaib and I just discovered is that ~16.5% of the libraries in one of our major apps have .text sections starting on the first page of the dso. Placing hot sections at the front of the .text section would give us a slice of a "free page" that, with the current algorithm, would just be cold code that's less likely to be used. I imagine this isn't a substantial difference but it's probably slightly in favor of placing hot code first here.

Thanks. This is fine, even if not that large. The patch should use osec->flags & SHF_EXECINSTR instead of executableInputSections for simplification. Then the complexity/benefit trade-off will look fine.

I didn't check output section's attribute but instead check on each input section because:

  • It could be that the output section contains .text (RX) contributions as well as .rdata (R) contributions, like read-only data is merged into text section.

This implies a linker script placing rodata into an executable output section.
This is a corner case and the operation here does not matter. Just be simple.

Sun, Jun 26, 10:55 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

One interesting note Shoaib and I just discovered is that ~16.5% of the libraries in one of our major apps have .text sections starting on the first page of the dso. Placing hot sections at the front of the .text section would give us a slice of a "free page" that, with the current algorithm, would just be cold code that's less likely to be used. I imagine this isn't a substantial difference but it's probably slightly in favor of placing hot code first here.

Thanks. This is fine, even if not that large. The patch should use osec->flags & SHF_EXECINSTR instead of executableInputSections for simplification. Then the complexity/benefit trade-off will look fine.

Sun, Jun 26, 7:05 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

How does it save one page?

Hot code will always be placed together, so where it starts impact how many pages it will occupy. Moving it towards the beginning of the output section increases the possibility that one less page will be taken.

I am not sure this is true.
For -z separate-code layout, PT_LOAD program header has an aligned start address. I agree that placing hot code at the start may potentially remove one hot page.
For -z noseparate-code layout, I think we can construct a case that placing hot code at the start may use one more page.

This is to follow what ordering implies. For CISC machine target the linker won't do anything but just strictly follow the specified order.

For the -z noseparate-code case, it would be the same probability for the impact on number of pages used by hot contributions. We don't know where in a page .text section will start, so it will be purely random or with equal probability that hot contribution may take one more page, one less page, or take same number of pages, between placing hot contributions at section start (like what the linker does for CISC machines) and shuffling some amount of cold contributions before hot ones.

`-z noseparate-code is the default case. Since it is purely random, this patch does not justify its usefulness.

The current logic is to avoid generating branch thunks only. So if we know branch thunk is no needed, why bother doing the extra shuffling working?

The patch introduced some non-trivial complexity and therefore it needs to justify it. For non-code sections I agree that not shuffling would look better but that only suggests that osec->flags & SHF_EXECINSTR (osec needs to passed from the caller of sortISDBySectionOrder) is fine, not executableInputSections > 1.

And for the -z separarte-code case, it is guaranteed to be better (or being same in rare cases).

Yes. With no measurement we can't accept an arbitrary change which claims to be better.

Sun, Jun 26, 6:58 PM · Restricted Project, Restricted Project

Thu, Jun 23

yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

How does it save one page?

Hot code will always be placed together, so where it starts impact how many pages it will occupy. Moving it towards the beginning of the output section increases the possibility that one less page will be taken.

I am not sure this is true.
For -z separate-code layout, PT_LOAD program header has an aligned start address. I agree that placing hot code at the start may potentially remove one hot page.
For -z noseparate-code layout, I think we can construct a case that placing hot code at the start may use one more page.

This is to follow what ordering implies. For CISC machine target the linker won't do anything but just strictly follow the specified order.

Thu, Jun 23, 3:55 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

This patch tries to adjust D44969 heuristics a bit.

This is not necessary if less than two input sections contain anything that is executable, for example, when user specifies an ordering of read-only data (instead of function) symbols. It is also not necessary if total size of output section is small such

that no branch thunk would ever be required, which is common for mobile apps.

I agree that the mentioned cases do not unnecessarily need reordering. It doesn't matter either way, then why change the behavior with more code?

You should also add the motivation for this change to the description (which I believe is to minimize the number of pages occupied by hot code).

Such a motivation is fine. Perhaps @srhines can test this for other applications.

This optimization requires profile data, which isn't generally in use for 1p NDK usage (to my knowledge). Maybe @trybka might know if there are PGO/AFDO users in our 1p apps. As for the Android platform, we do use AutoFDO, but it isn't possible for us to collect enough AFDO data to validate this a priori. We'd have to merge this patch into a toolchain that is currently being used for droidfood, and then experiment later to see what impact it had. Based on the way that Android schedules work, it would be months before we have enough droidfooders on next year's release. I think it might be best to see if this has an impact on any Google 1p apps for now.

Thu, Jun 23, 11:19 AM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

Another way to look at the logic in this code is that, if no branch thunk is required, then to symbol ordering there is no difference between targeting RISC machine and CISC machine. While for CISC machine we just simply follow what ordering says, we should do the same for RISC machine as well.

Thu, Jun 23, 11:00 AM · Restricted Project, Restricted Project

Wed, Jun 22

yozhu updated the diff for D128382: [LLD] Two tweaks to symbol ordering scheme.

Rewrite the test cases using the split-file feature

Wed, Jun 22, 8:57 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

Please answer why executableInputSections > 1 is needed.

If there is zero or only one input section having instructions, the scenario of "hot code calling into cold code" doesn't exist.

Then just let existing code handle it?

Wed, Jun 22, 5:43 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

4 size=0xC00000 input sections are too large (48MiB). If just for ARM::getThunkSectionSpacing (16MiB), it is too much.
Please check split-file. I think we need tests when totalSize >= target->getThunkSectionSpacing() triggers or not.

Wed, Jun 22, 5:21 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

Please answer why executableInputSections > 1 is needed.

Wed, Jun 22, 5:04 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

@MaskRay @smeenai I have added a test case to cover the prior behavior when branch thunk might be required.

Wed, Jun 22, 4:38 PM · Restricted Project, Restricted Project
yozhu updated the diff for D128382: [LLD] Two tweaks to symbol ordering scheme.

Add a test case to verify the behavior when branch thunk might be needed

Wed, Jun 22, 4:35 PM · Restricted Project, Restricted Project
yozhu updated the summary of D128382: [LLD] Two tweaks to symbol ordering scheme.
Wed, Jun 22, 3:51 PM · Restricted Project, Restricted Project
yozhu added a comment to D128382: [LLD] Two tweaks to symbol ordering scheme.

I agree that the mentioned cases do not unnecessarily need reordering. It doesn't matter either way, then why change the behavior with more code?

Wed, Jun 22, 3:34 PM · Restricted Project, Restricted Project
yozhu updated the diff for D128382: [LLD] Two tweaks to symbol ordering scheme.

Address review feedback

Wed, Jun 22, 3:31 PM · Restricted Project, Restricted Project
yozhu published D128382: [LLD] Two tweaks to symbol ordering scheme for review.
Wed, Jun 22, 2:44 PM · Restricted Project, Restricted Project