This is an archive of the discontinued LLVM Phabricator instance.

[ELF] More dynamic relocation packing
ClosedPublic

Authored by victoryang on Jul 24 2019, 2:09 PM.

Details

Summary

Currently, with Android dynamic relocation packing, only relative
relocations are grouped together. This patch implements similar
packing for non-relative relocations.

The implementation groups non-relative relocations with the same
r_info and r_addend, if using RELA. By requiring a minimum group
size of 3, this achieves smaller relocation sections. Building Android
for an ARM32 device, I see the total size of /system/lib decrease by
392 KB.
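The grouping rule described above can be sketched as follows. This is a hypothetical illustration of the idea rather than lld's actual code; `Reloc` and `groupNonRelatives` are made-up names, and real relocations carry more fields.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct Reloc {
  uint64_t offset;
  uint64_t info;   // packed symbol index + type, as in Elf_Rel::r_info
  int64_t addend;
};

// Sort non-relative relocations by r_info, then form a group only when at
// least 3 consecutive entries share the same r_info (and r_addend, if RELA);
// smaller runs stay ungrouped. Returns {grouped, ungrouped}.
static std::pair<std::vector<std::vector<Reloc>>, std::vector<Reloc>>
groupNonRelatives(std::vector<Reloc> relocs, bool isRela) {
  std::stable_sort(relocs.begin(), relocs.end(),
                   [](const Reloc &a, const Reloc &b) { return a.info < b.info; });
  std::vector<std::vector<Reloc>> groups;
  std::vector<Reloc> ungrouped;
  for (auto i = relocs.begin(); i != relocs.end();) {
    auto j = i + 1;
    while (j != relocs.end() && i->info == j->info &&
           (!isRela || i->addend == j->addend))
      ++j;
    if (j - i < 3)  // too small to pay for a group header
      ungrouped.insert(ungrouped.end(), i, j);
    else
      groups.emplace_back(i, j);
    i = j;
  }
  return {groups, ungrouped};
}
```

A group only needs to encode the shared r_info (and addend) once, which is where the size saving comes from.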

Grouping by r_info also allows the runtime dynamic linker to implement
a 1-entry cache to reduce the number of symbol lookups required. With
such a 1-entry cache implemented on Android, I'm seeing a 10% to 20%
reduction in total time spent in the runtime linker for several executables
that I tested.
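The 1-entry cache idea can be sketched like this. This is a hypothetical stand-in for the dynamic linker's relocation loop, not Android's actual code; `RelocResolver` and `lookupSymbol` are illustrative names. Because packed groups keep relocations with identical r_info adjacent, caching the last looked-up symbol skips most of the expensive hash-table searches.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

class RelocResolver {
public:
  // lookupSymbol models the real (expensive) ELF symbol search.
  explicit RelocResolver(std::function<uint64_t(uint32_t)> lookupSymbol)
      : lookup(std::move(lookupSymbol)) {}

  // Resolve the symbol address for one relocation, consulting the
  // 1-entry cache first.
  uint64_t resolve(uint64_t rInfo) {
    uint32_t symIndex = static_cast<uint32_t>(rInfo >> 32); // ELF64 convention
    if (haveCached && symIndex == cachedIndex)
      return cachedAddr;               // hit: skip the full symbol lookup
    cachedAddr = lookup(symIndex);     // miss: do the full lookup, then cache
    cachedIndex = symIndex;
    haveCached = true;
    return cachedAddr;
  }

private:
  std::function<uint64_t(uint32_t)> lookup;
  bool haveCached = false;
  uint32_t cachedIndex = 0;
  uint64_t cachedAddr = 0;
};
```

With relocations sorted by r_info, a run of N relocations against the same symbol costs one lookup instead of N.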

As a simple correctness check, I've also built x86_64 Android and booted
successfully.

Diff Detail

Event Timeline

victoryang created this revision.Jul 24 2019, 2:09 PM
Herald added a project: Restricted Project.

Updated comments in code to better explain what's going on, as requested by ruiu.

pcc added a subscriber: pcc.Jul 29 2019, 10:55 AM

> Linking libmonochrome in Chromium for Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as compared to 150532 bytes without this change.

IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if, say, you were building it as a component build, that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform, which I believe uses non-relative relocations more heavily.

It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

victoryang edited the summary of this revision. (Show Details)Jul 29 2019, 11:09 AM
In D65242#1604825, @pcc wrote:

>> Linking libmonochrome in Chromium for Android, targeting ARM32, I see a .rel.dyn of size 127697 bytes, as compared to 150532 bytes without this change.
>
> IIRC over 99% of relocations in libmonochrome were relative, so I didn't implement anything for the non-relative relocations, but maybe there are more non-relative relocations now. I can imagine that if, say, you were building it as a component build, that would result in more non-relative relocations than in shipping builds. It may also be more compelling to mention how much this helps for the Android platform, which I believe uses non-relative relocations more heavily.

Thanks for pointing this out! I've updated the description to include test results on Android.

> It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.

Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

pcc added a comment.Jul 29 2019, 11:30 AM
> Thanks for pointing this out! I've updated the description to include test results on Android.

Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations so I'm curious to see what your args.gn looks like for Chromium.

>> It may also be worth compressing .rela.plt now that we're compressing non-relative relocations.
>
> Do we expect duplicated symbol lookup in .rela.plt? Honestly I'm not sure, but I'm under the impression that we'd have one PLT entry for each external function, so we won't see duplication?

You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves since right now they're being stored uncompressed.

pcc added inline comments.Jul 29 2019, 11:51 AM
lld/ELF/SyntheticSections.cpp
1784

For non-relative RELA relocations we can generally expect the addend to always be the same (usually 0) so you may want to consider only adding relocations with an addend of 0 to nonRelativeGroups. That would let you omit the addend field from these relocations.
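For intuition on the size cost being discussed: the Android packed format stores offsets, counts, and addend deltas as SLEB128 varints, so a repeated addend delta still costs at least one byte per relocation unless the addend field can be omitted for the whole group. A minimal SLEB128 encoder (illustrative only, not lld's implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode a signed value as SLEB128, the variable-length encoding used by
// the Android packed relocation format: 7 payload bits per byte, high bit
// set on every byte except the last.
static std::vector<uint8_t> encodeSLEB128(int64_t value) {
  std::vector<uint8_t> out;
  bool more = true;
  while (more) {
    uint8_t byte = value & 0x7f;
    value >>= 7;  // arithmetic shift preserves the sign
    more = !((value == 0 && !(byte & 0x40)) ||
             (value == -1 && (byte & 0x40)));
    if (more)
      byte |= 0x80;
    out.push_back(byte);
  }
  return out;
}
```

A zero delta encodes as a single 0x00 byte, so a group of n same-addend relocations pays n bytes for the addend stream unless the field is dropped entirely.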

In D65242#1604903, @pcc wrote:

> Thanks for adding those figures. I'm still not sure about your Chromium figures though. I just did a quick build myself and I'm still seeing >99% relative relocations without significant duplication in the non-relative relocations, so I'm curious to see what your args.gn looks like for Chromium.

My args.gn is:

target_os = "android"
use_goma = true

Also FWIW I'm building a debug build.

> You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves, since right now they're being stored uncompressed.

Compressing across sections doesn't seem like something we can do right now, but I'll look into other compression you mentioned.

pcc added a comment.Jul 29 2019, 2:25 PM
> My args.gn is:
>
>   target_os = "android"
>   use_goma = true
>
> Also FWIW I'm building a debug build.

Okay, since that's a component build and not realistic as a size measurement for anything that's shipped I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

> Compressing across sections doesn't seem like something we can do right now, but I'll look into other compression you mentioned.

We could just emit the relocations that would otherwise go to .rela.plt into .rela.dyn in the case where -z now is passed. Then they'll get compressed automatically. But that's largely orthogonal to what you're doing here so it can be done as a separate patch.

victoryang edited the summary of this revision. (Show Details)Jul 29 2019, 2:31 PM
In D65242#1605239, @pcc wrote:

> Okay, since that's a component build and not realistic as a size measurement for anything that's shipped, I probably wouldn't mention the Chromium numbers in the commit message since they're somewhat misleading.

Fair enough. I removed the part about Chromium. Just for my own education, what's the right way of testing this for Chromium? Is it just that I should do a Release build rather than a Debug build?

pcc added a comment.Jul 29 2019, 2:44 PM
> Fair enough. I removed the part about Chromium. Just for my own education, what's the right way of testing this for Chromium? Is it just that I should do a Release build rather than a Debug build?

For a basic release build (which is a rough approximation of what's shipped) this is what I use:

is_component_build = false
is_debug = false
target_cpu = "arm64"
target_os = "android"
use_goma = true

That lets you build an ARM32 binary with ninja android_clang_arm/libmonochrome.so and an ARM64 binary with ninja libmonochrome.so. You can also add is_official_build = true to get something significantly closer to what's shipped.

In D65242#1605268, @pcc wrote:

> For a basic release build (which is a rough approximation of what's shipped) this is what I use:
>
>   is_component_build = false
>   is_debug = false
>   target_cpu = "arm64"
>   target_os = "android"
>   use_goma = true
>
> That lets you build an ARM32 binary with ninja android_clang_arm/libmonochrome.so and an ARM64 binary with ninja libmonochrome.so. You can also add is_official_build = true to get something significantly closer to what's shipped.

Thanks!

I just realized we are not using Android packed relocation for .rel.plt. I'd propose we keep this change simple and deal with PLT in follow-up patches (including grouping by delta and combining .rel.plt and .rel.dyn that you mentioned).

pcc added a comment.Jul 29 2019, 3:10 PM

> I just realized we are not using Android packed relocation for .rel.plt. I'd propose we keep this change simple and deal with PLT in follow-up patches (including grouping by delta and combining .rel.plt and .rel.dyn that you mentioned).

Yes, that's fine.

victoryang edited the summary of this revision. (Show Details)

Updated to group non-relative relocations by addend, in addition to r_info.

victoryang marked an inline comment as done.
pcc added inline comments.Jul 29 2019, 8:41 PM
lld/ELF/SyntheticSections.cpp
1784

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

victoryang marked an inline comment as done.Jul 29 2019, 8:49 PM
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1784

IMO the code would actually be slightly messier if we did that, because the grouping logic gets more complicated (i.e., it needs to check that the first item in a group doesn't have a non-zero addend).

MaskRay added a subscriber: grimar.EditedJul 29 2019, 9:23 PM

> You may have duplication between .rela.plt and .rela.dyn if a function is both called and address taken. But if there are many PLT entries then you should expect to see an improvement just from compressing the relocations themselves, since right now they're being stored uncompressed.

GOTPLT (JUMP_SLOT) and GOT (GLOB_DAT) slots can be combined to a single GOT slot. @grimar has a draft D37333

We should have some statistics on how much this technique would save. Functions that are both called and address taken are not very common.

> We could just emit the relocations that would otherwise go to .rela.plt into .rela.dyn in the case where -z now is passed. Then they'll get compressed automatically. But that's largely orthogonal to what you're doing here so it can be done as a separate patch.

Combining GOTPLT and GOT is not very useful on x86 if you always use -z relro -z now (-z relro is the default in lld) and don't use a non-standard PLT. In that case the PLT is just useless; the GOTPLT can be eliminated by turning on -fno-plt, at the cost of making each call/jmp instruction one byte longer. Combining should still be beneficial on many other architectures, because very few architectures have -fno-plt (probably only x86 and ppc; ppc is slightly different: inline PLT calls; I haven't looked into the gcc/binutils ppc details), and it may cause a significant code size increase at every call site.

ruiu added a comment.Jul 29 2019, 10:21 PM

Thank you for updating the comments. That's much easier to understand now!

lld/ELF/SyntheticSections.cpp
1693

I believe you need to use llvm::stable_sort for build reproducibility.

1696–1698

nit: no else after return.

1718–1721

Maybe it's a little bit easier to read if you use two iterators like this?

auto j = i + 1;
while (i->r_info == j->r_info && (!config->isRela || i->r_addend == j->r_addend))
  j++;
if (j - i < 3)
  ...
victoryang marked 4 inline comments as done.
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1693

Ah you are right. I copied llvm::sort from above but missed that this has non-unique keys.

victoryang marked an inline comment as done.Aug 5 2019, 10:04 AM

Ping. Rui, could you re-review this and see if everything looks okay to you? Thanks!

ruiu accepted this revision.Aug 6 2019, 5:43 AM

LGTM

This function has accumulated code and become a bit too long. We should probably split it into smaller pieces as a follow-up patch.

This revision is now accepted and ready to land.Aug 6 2019, 5:43 AM
ruiu requested changes to this revision.Aug 6 2019, 5:44 AM

Sorry, I forgot one thing: can you add a test?

This revision now requires changes to proceed.Aug 6 2019, 5:44 AM
victoryang updated this revision to Diff 213711.Aug 6 2019, 1:56 PM

Expanded test/ELF/pack-dyn-relocs.s to exercise grouping of non-relative relocations as well as grouping by addend when using RELA.

> Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc) may have non-zero r_addend but they are rare. Non-relative relocation with non-zero r_addend can just be placed in ungroupedNonRelatives.

> I'd like @pcc's comment to be addressed. The three relocation types R_*_JUMP_SLOT, R_*_GLOB_DAT, R_*_COPY always have 0 r_addend. A symbolic relocation (R_AARCH64_ABS64, R_X86_64_64, etc.) may have non-zero r_addend but they are rare. Non-relative relocations with non-zero r_addend can just be placed in ungroupedNonRelatives.

I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block before line 1716:

if (config->isRela && i->addend != 0) {
  ungroupedNonRelatives.push_back(*i++);
  continue; 
}

> I don't disagree that non-zero addends are rare, but I don't think it'd be any simpler if we only support addends of zero. As it is now, the grouping logic only needs to compare adjacent entries. If we want to only group entries with zero addends, it'd need to check the value of addends in addition to what's currently implemented. What you are asking essentially comes down to adding the following code block before line 1716:
>
>   if (config->isRela && i->addend != 0) {
>     ungroupedNonRelatives.push_back(*i++);
>     continue;
>   }

if (j - i < 3 || i->addend != 0) (it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

> if (j - i < 3 || i->addend != 0) (it could be if (j - i < 3 || (config->isRela && i->addend != 0)) but probably unnecessary)

Isn't that still not simpler? This is more code and (very) slightly less benefit. What are we trying to achieve by restricting addend==0?

> Isn't that still not simpler? This is more code and (very) slightly less benefit.

There is no more line. It just adds 18 characters.

> What are we trying to achieve by restricting addend==0?

In return, you can delete these lines, and improve size savings:

if (config->isRela) {
  add(g[0].r_addend - addend);
  addend = g[0].r_addend;
}

> There is no more line. It just adds 18 characters.
>
> In return, you can delete these lines, and improve size savings:
>
>   if (config->isRela) {
>     add(g[0].r_addend - addend);
>     addend = g[0].r_addend;
>   }

I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we still need to keep these lines here or we need to special-case the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplifies this patch. I'd be very happy to be proven wrong.

> I don't think these can be deleted. If I understand this correctly, we still need to encode the delta relative to the last relocation, so either we still need to keep these lines here or we need to special-case the first group. Did I miss anything? Just to be clear, I'm not opposed to making the change as requested. I just don't see how that simplifies this patch. I'd be very happy to be proven wrong.

Oh wait, I take that back. Relative relocations all have zero addend, so this indeed can be simplified. I'll put up a new diff.

> Oh wait, I take that back. Relative relocations all have zero addend, so this indeed can be simplified. I'll put up a new diff.

Uh, taking a closer look at this, I see we emit the addend for each relative relocation, so I assume I was wrong about relative relocations all having zero addend? If that's the case, I'm back to thinking that restricting addend==0 isn't simpler.

Would it be simpler to only support an addend of 0 here? I reckon that non-zero addends are rare enough that we can just let them be emitted in ungroupedNonRelatives.

Dynamic relocation types can be partitioned into two groups: R_*_RELATIVE and non-R_*_RELATIVE.

r_addend is generally non-zero for R_*_RELATIVE but zero for non-relative relocations.

Inside the for (std::vector<Elf_Rela> &g : nonRelativeGroups) loop, delete:

-      if (config->isRela) {
-        add(r.r_addend - addend);
-        addend = r.r_addend;
-      }
victoryang updated this revision to Diff 213983.Aug 7 2019, 1:38 PM

Changed to grouping only when addend==0. I did have to move the grouped non-relative relocations to before the relative relocations, so that we can simplify addend encoding.

pcc added inline comments.Aug 7 2019, 1:47 PM
lld/ELF/SyntheticSections.cpp
1733

I don't think you need to set hasAddendIfRela | groupedByAddendIfRela. If these flags are not set, the unpacker will use 0 as the addend value. See:
https://android.googlesource.com/platform/bionic/+/master/linker/linker_reloc_iterators.h#142

I think this also means that you don't need to reorder non-relatives before relatives.

1736–1738

Can be removed with suggestion above.

victoryang updated this revision to Diff 213993.Aug 7 2019, 2:06 PM
victoryang marked 3 inline comments as done.
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1733

Ah nice! I was assuming that, without grouping by addend, we'd always need to emit an addend for each relocation entry. Thanks for pointing this out! This is indeed the simplest way to do it.

Any other things that any of you would like me to address? If not, can we get this reviewed?

Rui, when you get a chance, could you re-review this? Thanks!

ruiu accepted this revision.Aug 20 2019, 1:42 AM

LGTM

lld/ELF/SyntheticSections.cpp
1777

We generally use ArrayRef<T> instead of std::vector<T> & in LLVM if it can be a read-only variable. Can you use ArrayRef<Elf_Rela> instead of std::vector<Elf_Rela> &?

This revision is now accepted and ready to land.Aug 20 2019, 1:42 AM
MaskRay accepted this revision.Aug 20 2019, 5:29 AM
MaskRay added inline comments.
lld/ELF/SyntheticSections.cpp
1711

if we group by addend as well

I think you meant relocations with 0 addend are grouped.

victoryang marked 3 inline comments as done.
victoryang added inline comments.
lld/ELF/SyntheticSections.cpp
1711

Yes. Thanks!

Thanks for reviewing! Updated diff per comments. I don't have commit access though.

This test change should be rebased after D64930 layout change and probably D65651. I'll regenerate the CHECK lines...

MaskRay closed this revision.Aug 20 2019, 8:02 PM

Committed as r369488 (made a copy-n-paste error by associating the wrong revision..)

Weird. http://lab.llvm.org:8011/builders/lld-x86_64-freebsd/builds/34407/steps/test_lld/logs/stdio (http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/14511/steps/check-lld%20ubsan/logs/stdio has a similar error message)

/usr/home/buildslave/as-bldslv5/lld-x86_64-freebsd/llvm.src/tools/lld/test/ELF/pack-dyn-relocs.s:106:20: error: ANDROID32-NEXT: is not on the line after the previous match
// ANDROID32-NEXT: 0x2044 R_ARM_ABS32 bar2 0x0
                   ^
<stdin>:36:2: note: 'next' match was here
 0x2044 R_ARM_ABS32 bar2 0x0
 ^
<stdin>:34:29: note: previous match ended here
 0x2020 R_ARM_ABS32 bar2 0x0
                            ^
<stdin>:35:1: note: non-matching line after previous match is here
 0x2040 R_ARM_ABS32 zed2 0x0

MaskRay added a comment.EditedAug 21 2019, 12:20 AM

I wonder if llvm::sort instead of stable sort can be the reason?

Probably not? I tried this locally before reverting, and it passed:

@@ -38,6 +38,7 @@
 #include "llvm/Support/MD5.h"
 #include <cstdlib>
 #include <thread>
+#include <random>
 
 using namespace llvm;
 using namespace llvm::dwarf;
@@ -1676,7 +1677,9 @@ bool AndroidPackedRelocationSection<ELFT>::updateAllocSize() {
     else
       nonRelatives.push_back(r);
   }
+  std::mt19937 Generator(std::random_device{}());
 
+  std::shuffle(relatives.begin(), relatives.end(), Generator);
   llvm::sort(relatives, [](const Elf_Rel &a, const Elf_Rel &b) {
     return a.r_offset < b.r_offset;
   });
@@ -1747,6 +1750,7 @@ bool AndroidPackedRelocationSection<ELFT>::updateAllocSize() {
     i = j;
   }
 
+  std::shuffle(ungroupedNonRelatives.begin(), ungroupedNonRelatives.end(), Generator);
   // Sort ungrouped relocations by offset to minimize the encoded length.
   llvm::sort(ungroupedNonRelatives, [](const Elf_Rela &a, const Elf_Rela &b) {
     return a.r_offset < b.r_offset;

Also, these r_offsets are unique (dynamic relocations usually have unique r_offset).

Could this be due to merge conflict in the test case?

victoryang reopened this revision.Aug 21 2019, 12:47 AM
This revision is now accepted and ready to land.Aug 21 2019, 12:47 AM

Rebased and updated the test.

Could this be due to merge conflict in the test case?

No. You can apply the reverse of rL369497. There is no merge conflict.

Reading the error message again, it seems to me that grouping isn't happening for bar2 on ANDROID32. Given that we have

if (j - i < 3 || i->r_addend != 0)

I wonder if r_addend isn't always 0 or isn't initialized when RELA is not used?

MaskRay added a comment.EditedAug 21 2019, 2:05 AM

Ah, it is not initialized. I know how to fix it then. See:

template <class ELFT>
static void encodeDynamicReloc(SymbolTableBaseSection *symTab,
                               typename ELFT::Rela *p,
                               const DynamicReloc &rel) {
  if (config->isRela)
    p->r_addend = rel.computeAddend();