Page MenuHomePhabricator

[ELF] More dynamic relocation packing
ClosedPublic

Authored by victoryang on Jul 24 2019, 2:09 PM.

Details

Summary

Currently, with Android dynamic relocation packing, only relative
relocations are grouped together. This patch implements similar
packing for non-relative relocations.

The implementation groups non-relative relocations with the same
r_info and r_addend, if using RELA. By requiring a minimum group
size of 3, this achieves smaller relocation sections. Building Android
for an ARM32 device, I see the total size of /system/lib decrease by
392 KB.

Grouping by r_info also allows the runtime dynamic linker to implement
an 1-entry cache to reduce the number of symbol lookup required. With
such 1-entry cache implemented on Android, I'm seeing 10% to 20%
reduction in total time spent in runtime linker for several executables
that I tested.

As a simple correctness check, I've also built x86_64 Android and booted
successfully.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald Transcript

Updated comments in code to better explain what's going on, as per requested by ruiu.

pcc added a subscriber: pcc.Jul 29 2019, 10:55 AM

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

victoryang edited the summary of this revision. (Show Details)Jul 29 2019, 11:09 AM
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

pcc added a comment.Jul 29 2019, 11:30 AM
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves since right now they're being stored uncompressed.

pcc added inline comments.Jul 29 2019, 11:51 AM
lld/ELF/SyntheticSections.cpp
1784 ↗(On Diff #212195)

For non-relative RELA relocations we can generally expect the addend to always be the same (usually 0) so you may want to consider only adding relocations with an addend of 0 to nonRelativeGroups. That would let you omit the addend field from these relocations.

In D65242#1604903, @pcc wrote:
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves since right now they're being stored uncompressed.

Compressing across sections doesn't seem like something we can do right now, but I'll look into other compression you mentioned.

pcc added a comment.Jul 29 2019, 2:25 PM
In D65242#1604903, @pcc wrote:
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

Okay, since that's a component build and not realistic as a size measurement for anything that's shipped I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves since right now they're being stored uncompressed.

Compressing across sections doesn't seem like something we can do right now, but I'll look into other compression you mentioned.

We could just emit the relocations that would otherwise go to .rela.plt into .rela.dyn in the case where -z now is passed. Then they'll get compressed automatically. But that's largely orthogonal to what you're doing here so it can be done as a separate patch.

victoryang edited the summary of this revision. (Show Details)Jul 29 2019, 2:31 PM
In D65242#1605239, @pcc wrote:
In D65242#1604903, @pcc wrote:
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

Okay, since that's a component build and not realistic as a size measurement for anything that's shipped I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

Fair enough. I removed the part about Chromium. Just for my own education, what's the right way of testing this for Chromium? Is it just that I should do a Release build than a Debug build?

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves since right now they're being stored uncompressed.

Compressing across sections doesn't seem like something we can do right now, but I'll look into other compression you mentioned.

We could just emit the relocations that would otherwise go to .rela.plt into .rela.dyn in the case where -z now is passed. Then they'll get compressed automatically. But that's largely orthogonal to what you're doing here so it can be done as a separate patch.

pcc added a comment.Jul 29 2019, 2:44 PM
In D65242#1605239, @pcc wrote:
In D65242#1604903, @pcc wrote:
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

Okay, since that's a component build and not realistic as a size measurement for anything that's shipped I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

Fair enough. I removed the part about Chromium. Just for my own education, what's the right way of testing this for Chromium? Is it just that I should do a Release build than a Debug build?

For a basic release build (which is a rough approximation of what's shipped) this is what I use:

is_component_build = false
is_debug = false
target_cpu = "arm64"
target_os = "android"
use_goma = true

That lets you build an ARM32 binary with ninja android_clang_arm/libmonochrome.so and an ARM64 binary with ninja libmonochrome.so. You can also add is_official_build = true to get something significantly closer to what's shipped.

In D65242#1605268, @pcc wrote:
In D65242#1605239, @pcc wrote:
In D65242#1604903, @pcc wrote:
In D65242#1604825, @pcc wrote:

Linking libmonochrome in Chromium for
Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as
compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if say you were building it as a component build that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

Okay, since that's a component build and not realistic as a size measurement for anything that's shipped I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

Fair enough. I removed the part about Chromium. Just for my own education, what's the right way of testing this for Chromium? Is it just that I should do a Release build than a Debug build?

For a basic release build (which is a rough approximation of what's shipped) this is what I use:

is_component_build = false
is_debug = false
target_cpu = "arm64"
target_os = "android"
use_goma = true

That lets you build an ARM32 binary with ninja android_clang_arm/libmonochrome.so and an ARM64 binary with ninja libmonochrome.so. You can also add is_official_build = true to get something significantly closer to what's shipped.

Thanks!

I just realized we are not using Android packed relocation for .rel.plt. I'd purpose we keep this change simple and deal with PLT in follow-up patches (including grouping by delta and combining .rel.plt and .rel.dyn that you mentioned.)

pcc added a comment.Jul 29 2019, 3:10 PM

I just realized we are not using Android packed relocation for .rel.plt. I'd purpose we keep this change simple and deal with PLT in follow-up patches (including grouping by delta and combining .rel.plt and .rel.dyn that you mentioned.)

Yes, that's fine.

victoryang edited the summary of this revision. (Show Details)

Updated to group non-relative relocations by addend, in addition to r_info.

victoryang marked an inline comment as done.
pcc added inline comments.Jul 29 2019, 8:41 PM
lld/ELF/SyntheticSections.cpp
1784 ↗(On Diff #212195)

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

victoryang marked an inline comment as done.Jul 29 2019, 8:49 PM
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1784 ↗(On Diff #212195)

IMO the code would be actually slightly messier if we do that, because the logic in grouping gets more complicated (i.e. need to check the first item in a group doesn't have non-zero addend.)

MaskRay added a subscriber: grimar.EditedJul 29 2019, 9:23 PM

You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves since right now they're being stored uncompressed.

GOTPLT (JUMP_SLOT) and GOT (GLOB_DAT) slots can be combined to a single GOT slot. @grimar has a draft D37333

We should have some statistics how much this technique will save. Functions that are both called and address taken are not very common.

We could just emit the relocations that would otherwise go to .rela.plt into .rela.dyn in the case where -z now is passed. Then they'll get compressed automatically. But that's largely orthogonal to what you're doing here so it can be done as a separate patch.

Combined GOTPLT and GOT is not very useful on x86 if you always use -z relro -z now (-z relro is the default in lld) and don't use a non-standard PLT. In that case, PLT is just useless . GOTPLT should just be eliminated by turning on -fno-plt. The call/jmp instruction will be one byte longer, though. This should still be beneficial on many other architectures because -fno-plt (very few architectures have this option, probably only x86 and ppc; ppc is something slightly different: inline PLT calls.. I haven't looked into the gcc/binutils ppc details) may cause significant code size increase at every call site.

ruiu added a comment.Jul 29 2019, 10:21 PM

Thank you for updating the comments. That's much easier to understand now!

lld/ELF/SyntheticSections.cpp
1693 ↗(On Diff #212267)

I believe you need to use llvm::stable_sort for build reproducibility.

1696–1698 ↗(On Diff #212267)

nit: no else after return.

1718–1721 ↗(On Diff #212267)

Maybe it's little bit easier to read if you use two iterators like this?

auto j = i + 1;
while (i->r_info == j->r_info && (!config->isRela || i->r_addend == j->r_addend)
  j++;
if (j - i < 3)
  ...
victoryang marked 4 inline comments as done.
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1693 ↗(On Diff #212267)

Ah you are right. I copied llvm::sort from above but missed that this has non-unique keys.

victoryang marked an inline comment as done.Aug 5 2019, 10:04 AM

Ping. Rui, could you re-review this and see if everything looks okay to you? Thanks!

ruiu accepted this revision.Aug 6 2019, 5:43 AM

LGTM

This function accumulated code and become a bit too long. We probably have to split the function into small pieces as a follow-up patch.

This revision is now accepted and ready to land.Aug 6 2019, 5:43 AM
ruiu requested changes to this revision.Aug 6 2019, 5:44 AM

Sorry, I forgot one thing: can you add a test?

This revision now requires changes to proceed.Aug 6 2019, 5:44 AM
victoryang updated this revision to Diff 213711.Aug 6 2019, 1:56 PM

Expanded test/ELF/pack-dyn-relocs.s to exercise grouping of non-relative relocations as well as grouping by addend when using RELA.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit. What are we trying to achieve by restricting addend==0?

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit.

There is no more line. It just adds 18 characters.

What are we trying to achieve by restricting addend==0?

In return, you can delete these lines, and improve size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit.

There is no more line. It just adds 18 characters.

What are we trying to achieve by restricting addend==0?

In return, you can delete these lines, and improve size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we still need to keep these lines here or we need to special case handle the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplify this patch. I'd be very happy to be proven wrong.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit.

There is no more line. It just adds 18 characters.

What are we trying to achieve by restricting addend==0?

In return, you can delete these lines, and improve size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we still need to keep these lines here or we need to special case handle the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplify this patch. I'd be very happy to be proven wrong.

Oh wait, I take that back. Relative relocations all have zero addend, so this indeed can be simplified. I'll put up a new diff.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit.

There is no more line. It just adds 18 characters.

What are we trying to achieve by restricting addend==0?

In return, you can delete these lines, and improve size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we still need to keep these lines here or we need to special case handle the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplify this patch. I'd be very happy to be proven wrong.

Oh wait, I take that back. Relative relocations all have zero addend, so this indeed can be simplified. I'll put up a new diff.

Uh, taking a closer look at this, I see we emit addend for each relative relocation, so I assume I was wrong about relative relocations all having zero addend? If that's the case, I'm back to thinking restricting addend==0 isn't simpler.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block to before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

if (j - i < 3 || i->addend != 0) ( it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit.

There is no more line. It just adds 18 characters.

What are we trying to achieve by restricting addend==0?

In return, you can delete these lines, and improve size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we still need to keep these lines here or we need to special case handle the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplify this patch. I'd be very happy to be proven wrong.

Oh wait, I take that back. Relative relocations all have zero addend, so this indeed can be simplified. I'll put up a new diff.

Uh, taking a closer look at this, I see we emit addend for each relative relocation, so I assume I was wrong about relative relocations all having zero addend? If that's the case, I'm back to thinking restricting addend==0 isn't simpler.

Dynamic relocation types can be partitioned into two groups: R_*_RELATIVE and non-R_*_RELATIVE.

r_addend is generally non-zero for R_*_RELATIVE but zero for non-relative relocations.

// In   for (std::vector<Elf_Rela> &g : nonRelativeGroups) {
-      if (config->isRela) {
-        add(r.r_addend - addend);
-        addend = r.r_addend;
-      }
    }
victoryang updated this revision to Diff 213983.Aug 7 2019, 1:38 PM

Changed to grouping only when addend==0. I did have to move the grouped non-relative relocations to before the relative relocations, so that we can simplify addend encoding.

pcc added inline comments.Aug 7 2019, 1:47 PM
lld/ELF/SyntheticSections.cpp
1745 ↗(On Diff #213983)

I don't think you need to set hasAddendIfRela | groupedByAddendIfRela. If these flags are not set, the unpacker will use 0 as the addend value. See:
https://android.googlesource.com/platform/bionic/+/master/linker/linker_reloc_iterators.h#142

I think this also means that you don't need to reorder non-relatives before relatives.

1748–1750 ↗(On Diff #213983)

Can be removed with suggestion above.

victoryang updated this revision to Diff 213993.Aug 7 2019, 2:06 PM
victoryang marked 3 inline comments as done.
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1745 ↗(On Diff #213983)

Ah nice! I was assuming without grouping by addend, we'd always need to emit addend for each relocation entry. Thanks for pointing this out! This indeed is the simplest way to do it.

Any other things that any of you would like me to address? If not, can we get this reviewed?

Rui, when you get a chance, could you re-review this? Thanks!

ruiu accepted this revision.Aug 20 2019, 1:42 AM

LGTM

lld/ELF/SyntheticSections.cpp
1787 ↗(On Diff #213993)

We generally use ArrayRef<T> instead of std::vector<T> & in LLVM if it can be a read-only variable. Can you use ArrayRef<Elf_Rela> instead of std::vector<Elf_Rela> &?

This revision is now accepted and ready to land.Aug 20 2019, 1:42 AM
MaskRay accepted this revision.Aug 20 2019, 5:29 AM
MaskRay added inline comments.
lld/ELF/SyntheticSections.cpp
1711 ↗(On Diff #213993)

if we group by addend as well

I think you meant relocations with 0 addend are grouped.

victoryang marked 3 inline comments as done.
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1711 ↗(On Diff #213993)

Yes. Thanks!

Thanks for reviewing! Updated diff per comments. I don't have commit access though.

Thanks for reviewing! Updated diff per comments. I don't have commit access though.

This test change should be rebased after D64930 layout change and probably D65651. I'll regenerate the CHECK lines...

MaskRay closed this revision.Aug 20 2019, 8:02 PM

Committed as r369488 (made a copy-n-paste error by associating the wrong revision..)

Weird. http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/34407/steps/test_lld/logs/stdio (http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14511/steps/check-lld%20ubsan/logs/stdio has a similar error message)

/usr/home/buildslave/as-bldslv5/lld-x86_64-freebsd/llvm.src/tools/lld/test/ELF/pack-dyn-relocs.s:106:20: error: ANDROID32-NEXT: is not on the line after the previous match
// ANDROID32-NEXT: 0x2044 R_ARM_ABS32 bar2 0x0
                   ^
<stdin>:36:2: note: 'next' match was here
 0x2044 R_ARM_ABS32 bar2 0x0
 ^
<stdin>:34:29: note: previous match ended here
 0x2020 R_ARM_ABS32 bar2 0x0
                            ^
<stdin>:35:1: note: non-matching line after previous match is here
 0x2040 R_ARM_ABS32 zed2 0x0

Weird. http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/34407/steps/test_lld/logs/stdio (http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14511/steps/check-lld%20ubsan/logs/stdio has a similar error message)

/usr/home/buildslave/as-bldslv5/lld-x86_64-freebsd/llvm.src/tools/lld/test/ELF/pack-dyn-relocs.s:106:20: error: ANDROID32-NEXT: is not on the line after the previous match
// ANDROID32-NEXT: 0x2044 R_ARM_ABS32 bar2 0x0
                   ^
<stdin>:36:2: note: 'next' match was here
 0x2044 R_ARM_ABS32 bar2 0x0
 ^
<stdin>:34:29: note: previous match ended here
 0x2020 R_ARM_ABS32 bar2 0x0
                            ^
<stdin>:35:1: note: non-matching line after previous match is here
 0x2040 R_ARM_ABS32 zed2 0x0

I wonder if llvm::sort instead of stable sort can be the reason?

MaskRay added a comment.EditedAug 21 2019, 12:20 AM

Weird. http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/34407/steps/test_lld/logs/stdio (http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14511/steps/check-lld%20ubsan/logs/stdio has a similar error message)

/usr/home/buildslave/as-bldslv5/lld-x86_64-freebsd/llvm.src/tools/lld/test/ELF/pack-dyn-relocs.s:106:20: error: ANDROID32-NEXT: is not on the line after the previous match
// ANDROID32-NEXT: 0x2044 R_ARM_ABS32 bar2 0x0
                   ^
<stdin>:36:2: note: 'next' match was here
 0x2044 R_ARM_ABS32 bar2 0x0
 ^
<stdin>:34:29: note: previous match ended here
 0x2020 R_ARM_ABS32 bar2 0x0
                            ^
<stdin>:35:1: note: non-matching line after previous match is here
 0x2040 R_ARM_ABS32 zed2 0x0

I wonder if llvm::sort instead of stable sort can be the reason?

Probably not? I tried this locally before reverting... It passed

@@ -38,6 +38,7 @@
 #include "llvm/Support/MD5.h"
 #include <cstdlib>
 #include <thread>
+#include <random>
 
 using namespace llvm;
 using namespace llvm::dwarf;
@@ -1676,7 +1677,9 @@ bool AndroidPackedRelocationSection<ELFT>::updateAllocSize() {
     else
       nonRelatives.push_back(r);
   }
+  std::mt19937 Generator(std::random_device{}());
 
+  std::shuffle(relatives.begin(), relatives.end(), Generator);
   llvm::sort(relatives, [](const Elf_Rel &a, const Elf_Rel &b) {
     return a.r_offset < b.r_offset;
   });
@@ -1747,6 +1750,7 @@ bool AndroidPackedRelocationSection<ELFT>::updateAllocSize() {
     i = j;
   }
 
+  std::shuffle(ungroupedNonRelatives.begin(), ungroupedNonRelatives.end(), Generator);
   // Sort ungrouped relocations by offset to minimize the encoded length.
   llvm::sort(ungroupedNonRelatives, [](const Elf_Rela &a, const Elf_Rela &b) {
     return a.r_offset < b.r_offset;

Also, these r_offsets are unique (dynamic relocations usually have unique r_offset).

Could this be due to merge conflict in the test case?

victoryang reopened this revision.Aug 21 2019, 12:47 AM
This revision is now accepted and ready to land.Aug 21 2019, 12:47 AM

Rebased and updated the test.

Could this be due to merge conflict in the test case?

No. You can apply the reverse of rL369497. There is no merge conflict.

Could this be due to merge conflict in the test case?

No. You can apply the reverse of rL369497. There is no merge conflict.

Reading the error message again, it seems to me that grouping isn't happening for bar2 on ANDROID32. Given that we have

if (j - i < 3 || i->r_addend != 0)

I wonder if r_addend isn't always 0 or isn't initialized when RELA is not used?

MaskRay added a comment.EditedAug 21 2019, 2:05 AM

Could this be due to merge conflict in the test case?

No. You can apply the reverse of rL369497. There is no merge conflict.

Reading the error message again, it seems to me that grouping isn't happening for bar2 on ANDROID32. Given that we have

if (j - i < 3 || i->r_addend != 0)

I wonder if r_addend isn't always 0 or isn't initialized when RELA is not used?

Ah.. It is not initialized.. I know how to fix it then. See

template <class ELFT>
static void encodeDynamicReloc(SymbolTableBaseSection *symTab,
                               typename ELFT::Rela *p,
                               const DynamicReloc &rel) {
  if (config->isRela)
    p->r_addend = rel.computeAddend();