Aug 21 2019

victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Could this be due to merge conflict in the test case?

No. You can apply the reverse of rL369497. There is no merge conflict.

Aug 21 2019, 1:21 AM · Restricted Project
victoryang reopened D65242: [ELF] More dynamic relocation packing.
Aug 21 2019, 12:50 AM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.

Rebased and updated the test.

Aug 21 2019, 12:50 AM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Aug 21 2019, 12:34 AM · Restricted Project

Aug 20 2019

victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Thanks for reviewing! Updated diff per comments. I don't have commit access though.

Aug 20 2019, 9:40 AM · Restricted Project
victoryang added inline comments to D65242: [ELF] More dynamic relocation packing.
Aug 20 2019, 9:35 AM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.
Aug 20 2019, 9:35 AM · Restricted Project

Aug 13 2019

victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Rui, when you get a chance, could you re-review this? Thanks!

Aug 13 2019, 12:11 PM · Restricted Project

Aug 9 2019

victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Is there anything else any of you would like me to address? If not, can we get this reviewed?

Aug 9 2019, 10:04 AM · Restricted Project

Aug 7 2019

victoryang added inline comments to D65242: [ELF] More dynamic relocation packing.
Aug 7 2019, 2:07 PM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.
Aug 7 2019, 2:06 PM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.

Changed to grouping only when addend==0. I did have to move the grouped non-relative relocations ahead of the relative relocations so that we can simplify the addend encoding.

Aug 7 2019, 1:39 PM · Restricted Project

Aug 6 2019

victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc.) may have a non-zero r_addend, but such relocations are rare. Non-relative relocations with a non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue;
}

if (j - i < 3 || i->addend != 0)   (it could be if (j - i < 3 || (config->isRela && i->addend != 0)), but that's probably unnecessary)

Isn't that less simple, then? It is more code for (very) slightly less benefit.

It doesn't add any lines; it just adds 18 characters.
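A minimal sketch of the grouping rule being discussed, assuming a simplified relocation record: `Rela`, `Partition`, and `groupNonRelatives` are hypothetical names, not lld's actual types. Relocations are sorted by `(info, offset)` so runs can be found by comparing adjacent entries, and a run becomes a group only if it has at least three members and, for RELA, a zero addend, mirroring the `(config->isRela && i->addend != 0)` variant of the proposed check. The threshold of three is assumed here to reflect that a group header costs about three values (size, flags, r_info) while each grouped relocation saves one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for an ELF RELA entry.
struct Rela {
  uint64_t offset;
  uint64_t info;
  int64_t addend;
};

struct Partition {
  std::vector<std::vector<Rela>> groups;  // runs worth a group header
  std::vector<Rela> ungrouped;            // everything else
};

// Groups non-relative relocations that share r_info. Runs shorter than
// minRun stay ungrouped, as do RELA runs with a non-zero addend.
Partition groupNonRelatives(std::vector<Rela> rels, bool isRela,
                            std::ptrdiff_t minRun = 3) {
  std::sort(rels.begin(), rels.end(), [](const Rela &a, const Rela &b) {
    return std::tie(a.info, a.offset) < std::tie(b.info, b.offset);
  });
  Partition p;
  for (auto i = rels.begin(), e = rels.end(); i != e;) {
    // Extend the run while adjacent entries agree on info (and addend for RELA).
    auto j = i + 1;
    while (j != e && j->info == i->info && (!isRela || j->addend == i->addend))
      ++j;
    if (j - i < minRun || (isRela && i->addend != 0))
      p.ungrouped.insert(p.ungrouped.end(), i, j);
    else
      p.groups.emplace_back(i, j);
    i = j;
  }
  return p;
}
```

With `isRela` set to false the addend check is skipped, so every run of three or more relocations with the same `info` is grouped.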

What are we trying to achieve by restricting addend==0?

In return, you can delete these lines and improve the size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we keep these lines here or we need to special-case the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplifies this patch. I'd be very happy to be proven wrong.

Oh wait, I take that back. Relative relocations all have zero addend, so this indeed can be simplified. I'll put up a new diff.
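To make the running-delta encoding concrete (the `add(g[0].r_addend - addend); addend = g[0].r_addend;` pattern above), here is a sketch that writes each addend as the SLEB128-encoded difference from the previous one. SLEB128 is assumed because Android's packed relocation format uses it; `encodeSleb128` and `encodeAddendDeltas` are illustrative helpers, not lld functions. Even a zero delta occupies one byte, so every delta that no longer needs to be emitted saves at least one byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends `value` to `out` as signed LEB128: 7 bits per byte, with the high
// bit set on every byte except the last.
void encodeSleb128(int64_t value, std::vector<uint8_t> &out) {
  bool more = true;
  while (more) {
    uint8_t byte = value & 0x7f;
    value >>= 7;  // arithmetic shift on signed values (guaranteed since C++20)
    // Stop once the remaining bits are pure sign extension of this byte.
    if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
      more = false;
    else
      byte |= 0x80;
    out.push_back(byte);
  }
}

// Encodes each addend as a delta from the previous one, starting from 0,
// mirroring `add(r_addend - addend); addend = r_addend;`.
std::vector<uint8_t> encodeAddendDeltas(const std::vector<int64_t> &addends) {
  std::vector<uint8_t> out;
  int64_t addend = 0;
  for (int64_t a : addends) {
    encodeSleb128(a - addend, out);
    addend = a;
  }
  return out;
}
```

For example, the addends `{0, 0, 16, 16, 8}` become the deltas `0, 0, 16, 0, -8`, one byte each.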

Aug 6 2019, 8:51 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Aug 6 2019, 8:35 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Aug 6 2019, 8:33 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Aug 6 2019, 8:00 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Aug 6 2019, 7:25 PM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.

Expanded test/ELF/pack-dyn-relocs.s to exercise grouping of non-relative relocations as well as grouping by addend when using RELA.

Aug 6 2019, 1:57 PM · Restricted Project

Aug 5 2019

victoryang added a comment to D65242: [ELF] More dynamic relocation packing.

Ping. Rui, could you re-review this and see if everything looks okay to you? Thanks!

Aug 5 2019, 10:05 AM · Restricted Project

Jul 30 2019

victoryang added inline comments to D65242: [ELF] More dynamic relocation packing.
Jul 30 2019, 11:30 AM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.
Jul 30 2019, 11:30 AM · Restricted Project

Jul 29 2019

victoryang added inline comments to D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 8:49 PM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 8:22 PM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.

Updated to group non-relative relocations by addend, in addition to r_info.

Jul 29 2019, 8:22 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.
In D65242#1605268, @pcc wrote:
In D65242#1605239, @pcc wrote:
In D65242#1604903, @pcc wrote:
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as compared to 150532 bytes without this change.
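For scale, the .rel.dyn sizes quoted above correspond to roughly a 15% reduction (plain arithmetic on the reported figures, not a new measurement):

```latex
\frac{150532 - 127697}{150532} = \frac{22835}{150532} \approx 15.2\%
```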

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if, say, you were building it as a component build, that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform, which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

Okay, since that's a component build and not realistic as a size measurement for anything that's shipped I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

Fair enough. I removed the part about Chromium. Just for my own education, what's the right way of testing this for Chromium? Is it just that I should do a Release build rather than a Debug build?

For a basic release build (which is a rough approximation of what's shipped) this is what I use:

is_component_build = false
is_debug = false
target_cpu = "arm64"
target_os = "android"
use_goma = true

That lets you build an ARM32 binary with ninja android_clang_arm/libmonochrome.so and an ARM64 binary with ninja libmonochrome.so. You can also add is_official_build = true to get something significantly closer to what's shipped.

Jul 29 2019, 2:57 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 2:36 PM · Restricted Project
victoryang updated the summary of D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 2:36 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 1:41 PM · Restricted Project
victoryang added a comment to D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 11:13 AM · Restricted Project
victoryang updated the summary of D65242: [ELF] More dynamic relocation packing.
Jul 29 2019, 11:09 AM · Restricted Project
victoryang updated the diff for D65242: [ELF] More dynamic relocation packing.

Updated comments in code to better explain what's going on, as requested by ruiu.

Jul 29 2019, 10:36 AM · Restricted Project

Jul 24 2019

victoryang created D65242: [ELF] More dynamic relocation packing.
Jul 24 2019, 2:10 PM · Restricted Project

Feb 4 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.

George,

0.5% is not a micro-optimization if it works for many applications. If you find a heuristic that can generally reduce memory usage by 0.5%, that's not negligible. Imagine, for example, a 0.5% memory usage reduction in a huge cloud application. You might be able to save millions of dollars.

Vic,

That said, the numbers you've shown so far honestly don't seem very convincing. There are too few data points for me to tell whether it generally works. I believe I can test your patch myself and see how it works for Linux applications by parsing /proc/self/smaps (which contains information about the number of dirty pages). Let me try to do that for a few applications like clang.
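A rough sketch of that kind of measurement: it sums the `Private_Dirty:` fields (reported in kB) of a `/proc/<pid>/smaps` stream, optionally only for mappings whose pathname contains a substring such as a library name. The field layout follows the Linux proc(5) smaps format; `privateDirtyKb` and its filtering behavior are assumptions for illustration, not an existing tool.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Sums the Private_Dirty (kB) of every mapping in a /proc/<pid>/smaps stream
// whose pathname contains `filter` (an empty filter matches all mappings,
// including anonymous ones).
uint64_t privateDirtyKb(std::istream &smaps, const std::string &filter = "") {
  uint64_t total = 0;
  bool match = false;
  std::string line;
  while (std::getline(smaps, line)) {
    std::istringstream fields(line);
    std::string key;
    fields >> key;
    if (key.empty())
      continue;
    if (key.back() != ':') {
      // Mapping header: address perms offset dev inode [pathname]
      std::string perms, offset, dev, inode, path;
      fields >> perms >> offset >> dev >> inode;
      std::getline(fields >> std::ws, path);
      match = path.find(filter) != std::string::npos;
    } else if (key == "Private_Dirty:" && match) {
      uint64_t kb = 0;
      fields >> kb;  // the value is followed by the unit "kB"
      total += kb;
    }
  }
  return total;
}
```

For example, passing a `std::ifstream` opened on `/proc/self/smaps` with the filter `"libc.so"` would report the private dirty kilobytes attributed to libc in the current process.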

Rui,

Feb 4 2019, 7:52 PM

Jan 18 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.

Another data point: Sorting the data section for Chrome on Android, data section dirty pages went down from 6308KB to 6280KB (arm32).

Or 0.5%..
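The arithmetic behind the 0.5% figure, assuming KB here means KiB and a 4 KiB page size (both assumptions, not stated in the thread):

```latex
6308 - 6280 = 28\ \text{KiB} = 7\ \text{pages}, \qquad \frac{28}{6308} \approx 0.44\%
```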

Jan 18 2019, 11:34 AM

Jan 17 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.

Jan 17 2019, 12:09 PM

Jan 16 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.
In D56325#1360770, @pcc wrote:

I was able to build and test Chrome on Android. Without this, the dirty pages from all libraries included in Chrome APK sum up to 364KB. With the symbols in those libraries sorted, this comes down to 360KB.

How exactly did you get those figures? For measuring anything memory related in Chrome I'd recommend using the system_health.memory_mobile benchmarks (I'd be happy to help you run them).

Jan 16 2019, 4:27 PM
victoryang added a comment to D56325: Sort symbols in .bss by size.

4KB is just one page. Honestly I think it is hard to draw a conclusion from only that data as the difference is too small. Could you give us more data points so that we are convinced that the patch actually makes a difference?

Jan 16 2019, 4:22 PM
victoryang added a comment to D56325: Sort symbols in .bss by size.

Jan 16 2019, 3:48 PM

Jan 13 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.

I don't think we need a detailed benchmark for other targets, as how programs use .bss (and in general other parts of data sections) doesn't depend too much on targets.

Hard to say. I remember we had a patch that fixed an inconsistency in how .bss symbols are handled (an application wanted its symbols to keep the same order they would have if we created them in command-line order, and it hit a relocation overflow because we now create the .bss symbols first). The patch was declined (I can find it if you want). So if this patch only solves a local issue for low-end Android handsets, I really see no reason why we would want to change the behavior of LLD. It is almost the same case as that patch; the difference is that now it is needed for Android and not for an unknown application. Until we have benchmarks showing it benefits other platforms, I am not convinced we should change the linker. It is up to you anyway; I am just giving my opinion about the change.

For the record, the reason I'm in favor of this patch is not that I work for Google, and it is not correct to say that this patch fixes a local issue. I've been trying to treat all contributors equally. As to this patch, I think it improves not only Android but every target. The Android Go team is working hard to reduce memory usage, and that's how they found this heuristic; I mentioned it in the comment because a concrete example is more reader-friendly than saying the same thing abstractly. But there is nothing special about Android Go.

That said, I can ask Vic to build other regular Linux applications with/without this patch to see if the same improvement (or at least no regression) can be observed. Vic, do you mind if I ask you to do that?

Jan 13 2019, 9:14 PM

Jan 9 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.

I didn't test the descending order, but assuming there's no special alignment requirement, you'd likely end up with a similar result because you are essentially shifting all page boundaries and flipping individual symbols at the same time.
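A toy model of the size-sorting heuristic, under deliberately simplified assumptions: symbols are laid out back to back with no alignment padding, a written symbol dirties every page it spans, and pages are 4 KiB. `Sym`, `countDirtyPages`, and `sortedBySize` are hypothetical helpers, not lld code. With two large untouched buffers and two small written variables, either sort order packs the small variables onto one page, consistent with the expectation that descending order behaves similarly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical symbol: its size and whether the program writes to it.
struct Sym {
  uint64_t size;
  bool written;
};

// Lays symbols out back to back (no alignment padding) and counts the distinct
// pages spanned by written symbols, i.e. the pages that would become dirty.
std::size_t countDirtyPages(const std::vector<Sym> &syms,
                            uint64_t pageSize = 4096) {
  assert(pageSize > 0);
  std::set<uint64_t> dirty;
  uint64_t addr = 0;
  for (const Sym &s : syms) {
    if (s.written && s.size > 0)
      for (uint64_t p = addr / pageSize; p <= (addr + s.size - 1) / pageSize; ++p)
        dirty.insert(p);
    addr += s.size;
  }
  return dirty.size();
}

// Returns a copy sorted by size; stable_sort keeps the input order for ties.
std::vector<Sym> sortedBySize(std::vector<Sym> syms, bool descending = false) {
  std::stable_sort(syms.begin(), syms.end(),
                   [descending](const Sym &a, const Sym &b) {
                     return descending ? a.size > b.size : a.size < b.size;
                   });
  return syms;
}
```

For `{{8192, false}, {8, true}, {8192, false}, {8, true}}` the unsorted layout dirties two pages, while both sorted layouts dirty one. Real layouts include alignment padding, so actual savings depend on the program, as the measurements in this thread show.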

Jan 9 2019, 10:04 AM

Jan 4 2019

victoryang added a comment to D56325: Sort symbols in .bss by size.

What is the entire size of your program?

The numbers are measured across all processes running, so there are multiple programs. If you are looking for numbers from a single binary, I saw private dirty from .bss in libc went down from 36KB to 16KB. Unfortunately I didn't record the saving from sorting .bss+.data in libc. I can do the test again if that's necessary.
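For the libc data point, the same kind of arithmetic (again assuming KB means KiB and 4 KiB pages) gives:

```latex
36 - 16 = 20\ \text{KiB} = 5\ \text{pages}, \qquad \frac{20}{36} \approx 55.6\%
```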

I'm tempted to sort all SHF_WRITE sections by size by default. Unlike some other sections such as .init_array, there shouldn't really be any program that makes assumptions about how .data sections are laid out, but I'm not sure whether that would surprise users. But maybe we should do that?

Unless there's a certain logic in how things in SHF_WRITE sections are ordered right now, I don't think anyone can realistically depend on the ordering, so this should be safe.

Jan 4 2019, 1:20 PM
victoryang added a comment to D56325: Sort symbols in .bss by size.

For the .data section specifically, the logic does apply, and I saw more memory savings when I sorted .bss+.data vs. only .bss on Android. Applying this to all SHF_WRITE sections should be fine. It is possible that sorting results in no memory savings at all in the worst case, but that shouldn't be a problem.

Do you have a number on how much memory you could save on Android by sorting the .data section? I'm pretty interested in that number.

Jan 4 2019, 1:05 PM
victoryang added inline comments to D56325: Sort symbols in .bss by size.
Jan 4 2019, 12:56 PM