This is an archive of the discontinued LLVM Phabricator instance.

[MC] - Add .stack_size sections into groups and link them with .text
ClosedPublic

Authored by grimar on May 15 2018, 5:49 AM.

Details

Summary

D39788 added a '.stack-size' section containing metadata on function stack sizes
to output ELF files behind the new -stack-size-section flag.

This change does following two things on top:

  1. Imagine the case when there are -ffunction-sections flag given and there are text sections in COMDATs. The patch adds a '.stack-size' section into corresponding COMDAT group, so that linker will be able to eliminate them fast during resolving the COMDATs.
  2. Patch sets a SHF_LINK_ORDER flag and links '.stack-size' with the corresponding .text. With that linker will be able to do -gc-sections on dead stack sizes sections.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar created this revision.May 15 2018, 5:49 AM

This is something that has needed doing, thanks! The code looks plausible to me, but I am not really an ELF expert so I am reluctant to formally approve it.

The test source needs to be IR, not C++; presumably you can just -emit-llvm -S on this test to get it.

grimar updated this revision to Diff 147013.May 16 2018, 2:04 AM

Thanks, Paul!

I switched to IR test case (and also fixed another one that turned out to be failing).

@grimar, what is the link-time performance of doing this on something with many functions?

Also, I think it would make more sense for the stack sizes section names to be derived from the "parent" section, a bit like relocation sections, so they'd be called something like .stack_sizes.text._Z3foov or possibly simply .stack_sizes._Z3foov. That way dumping tools can more easily dump the specific individual stack_sizes sections.

test/CodeGen/X86/stack-size-section.ll
16 ↗(On Diff #147013)

I might be misunderstanding something here, but I don't think these should be uniqued, i.e. in my opinion, in this case, there should only be one stack_sizes section containing both entries, since there is only one text section.

@grimar, what is the link-time performance of doing this on something with many functions?

I did not yet measure. I can do the benchmark and return with the results.

Also, I think it would make more sense for the stack sizes section names to be derived from the "parent" section, a bit like relocation sections, so they'd be called something like .stack_sizes.text._Z3foov or possibly simply .stack_sizes._Z3foov. That way dumping tools can more easily dump the specific individual stack_sizes sections.

Maybe, but renaming is a subject for a different patch.
Also, they linked with a sh_link field to their parents already. Dumping tools can use that.
And renaming to something like .stack_sizes.XXX would require linker side change to place them into the single output section I think,
as currently they are merged by name, just like other regular sections.

test/CodeGen/X86/stack-size-section.ll
16 ↗(On Diff #147013)

Probably that would be ideal. I am doing unification unconditionally. I think just everyone use -ffunction-sections. So I am not sure if it worth to
add code to avoid doing unification for that particular case right now. (It just a few lines probably though)

Without doing unification asm produced would contain several declarations of .section .stack_sizes,"",@progbits which
are combined into the single section in the object finally. (see original stack-size-section.ll)

@grimar, what is the link-time performance of doing this on something with many functions?

I did not yet measure. I can do the benchmark and return with the results.

Please do - we probably need to consider the impact on bfd and gold as well, although I personally am less concerned there.

Also, I think it would make more sense for the stack sizes section names to be derived from the "parent" section, a bit like relocation sections, so they'd be called something like .stack_sizes.text._Z3foov or possibly simply .stack_sizes._Z3foov. That way dumping tools can more easily dump the specific individual stack_sizes sections.

Maybe, but renaming is a subject for a different patch.

Okay, that's fine.

Also, they linked with a sh_link field to their parents already. Dumping tools can use that.

Only when they want to dump an associated group. It doesn't allow easy dumping of just the single stack sizes section (e.g. via -j in objdump).

And renaming to something like .stack_sizes.XXX would require linker side change to place them into the single output section I think,
as currently they are merged by name, just like other regular sections.

This surprises me. I thought that default grouping would match up to the first '.', like it does for e.g. .text or .data grouping (i.e. .text.foo and .text.bar end up in .text). Or are some sections special-cased for this?

test/CodeGen/X86/stack-size-section.ll
16 ↗(On Diff #147013)

Probably that would be ideal. I am doing unification unconditionally. I think just everyone use -ffunction-sections. So I am not sure if it worth to add code to avoid doing unification for that particular case right now. (It just a few lines probably though)

I'm not sure I'm willing to generalise that much. However, it may not matter if performance impact is minimal.

I don't think having multiple section declarations for the same section is a big deal. It's already the case for some other sections, at least to a small extent.

! In D46874#1102740, @jhenderson wrote:

! In D46874#1102734, @grimar wrote:
And renaming to something like .stack_sizes.XXX would require linker side change to place them into the single output section I think,
as currently they are merged by name, just like other regular sections.

This surprises me. I thought that default grouping would match up to the first '.', like it does for e.g. .text or .data grouping (i.e. .text.foo and .text.bar end up in .text). Or are some sections special-cased for this?

Yes, .text.*, .data.* and few others are a special case. See LLD code for that:

https://github.com/llvm-mirror/lld/blob/master/ELF/Writer.cpp#L124

for (StringRef V :
     {".text.", ".rodata.", ".data.rel.ro.", ".data.", ".bss.rel.ro.",
      ".bss.", ".init_array.", ".fini_array.", ".ctors.", ".dtors.", ".tbss.",
      ".gcc_except_table.", ".tdata.", ".ARM.exidx.", ".ARM.extab."}) {
  if (isSectionPrefix(V, S->Name))
    return V.drop_back();
}

The default behavior is to group by name.

! In D46874#1102740, @jhenderson wrote:

! In D46874#1102734, @grimar wrote:
And renaming to something like .stack_sizes.XXX would require linker side change to place them into the single output section I think,
as currently they are merged by name, just like other regular sections.

This surprises me. I thought that default grouping would match up to the first '.', like it does for e.g. .text or .data grouping (i.e. .text.foo and .text.bar end up in .text). Or are some sections special-cased for this?

Yes, .text.*, .data.* and few others are a special case. See LLD code for that:

https://github.com/llvm-mirror/lld/blob/master/ELF/Writer.cpp#L124

for (StringRef V :
     {".text.", ".rodata.", ".data.rel.ro.", ".data.", ".bss.rel.ro.",
      ".bss.", ".init_array.", ".fini_array.", ".ctors.", ".dtors.", ".tbss.",
      ".gcc_except_table.", ".tdata.", ".ARM.exidx.", ".ARM.extab."}) {
  if (isSectionPrefix(V, S->Name))
    return V.drop_back();
}

The default behavior is to group by name.

Ah okay, I haven't really worked with that area. I assume it's the same with other linkers? Because otherwise, I'd say it makes more sense to not have the special case.

The default behavior is to group by name.

Ah okay, I haven't really worked with that area. I assume it's the same with other linkers?

Yes. They also will place .stack_sizes.XXX and .stack_sizes.YYY to different output sections.

grimar planned changes to this revision.May 21 2018, 3:33 AM

I'll update the status of this later.

grimar added a comment.EditedMay 23 2018, 1:14 AM

The problem of this patch is benchmark results I had.

I tested linking of clang and chromium (used LLD as a linker, linked with -gc-sections).
(both built with -ffunction-sections -fstack-size-section)

Clang:
Link time changes from 0,428s to 0.481s (~ +12%).
Output size changes from 87,216,640 to 86,983,720 bytes (~ -1%).
Total input objects size changes from 191.3mb to 204.2mb (~ +6,7%).

Chromium:
Link time changes from 6.136s to 6.75s (~ +10%)
Output size changes from 756,347,912 to 753,375,064 bytes (~ -0.4%)
Total input objects size changes from 1,920,795,546 to 2,011,722,858 bytes (~ +4.7%)

So for clang, it allows producing 1% smaller output for a cost of +12% to link time.
For chromium, the benefit is just 0.4%, for about the same link time penalty.

I've been thinking more about this since asking you to do the numbers. Our problem is that without doing this, we will need some other approach to handle .stack_sizes entries referring to removed functions (e.g. discarded COMDATs or GC'ed sections). This is essentially the same problem as we have with things like debug data - the stack size entry will have an address (probably zero) that on at least some platforms is a valid address, which would cause problems for consumers of the section. Either they have to special case it (which may not be possible), or they get potentially misleading output.

Is .stack_sizes off by default? I assume so, and if it isn't, it probably should be, since it's not necessarily useful for everybody. If so, I'd suggest you go ahead with this change - it will only impact those who are using it, and they probably want it done properly if they are using -ffunction-sections. Maybe @ruiu has some suggestions from the linker's point of view too?

Is .stack_sizes off by default? I assume so, and if it isn't, it probably should be, since it's not necessarily useful for everybody. If so, I'd suggest you go ahead with this change - it will only impact those who are using it, and they probably want it done properly if they are using -ffunction-sections. Maybe @ruiu has some suggestions from the linker's point of view too?

Yes, it is off by default.

And observing the results, I think it is worth to stop doing unification when -ffunction-sections is off
(to reduce the number of sections and possible linker slowdown).

I'll update the patch.

And observing the results, I think it is worth to stop doing unification when -ffunction-sections is off
(to reduce the number of sections and possible linker slowdown).

I'll update the patch.

I agree, although with one caveat: COMDAT sections should always have a separate .stack_sizes section (maybe in their group), as otherwise we'll have the same invalid problem for discarded COMDATs as I mentioned earlier - i.e. their entries will be preserved, even though they are no longer present.

And observing the results, I think it is worth to stop doing unification when -ffunction-sections is off
(to reduce the number of sections and possible linker slowdown).

I'll update the patch.

I agree, although with one caveat: COMDAT sections should always have a separate .stack_sizes section (maybe in their group), as otherwise we'll have the same invalid problem for discarded COMDATs as I mentioned earlier - i.e. their entries will be preserved, even though they are no longer present.

Sure, this is an optimization we want to keep. That does not require setting unique ID attribute.
It is enough to place .stack_sizes into COMDAT. For example the following code will produce 3 different sections in a object:

.section .stack_sizes,"aG",@progbits,foo,comdat
nop

.section .stack_sizes,"aG",@progbits,bar,comdat
nop

.section .stack_sizes,"",@progbits
nop
grimar updated this revision to Diff 148182.May 23 2018, 4:04 AM
  • Do not do unification when no -ffunction-section is specified.

I think it would be good to have testing that shows that we fragment .stack_sizes for COMDATs even without -ffunction-sections. Otherwise, this change looks good to me, but you should probably get someone else to take a look too, as I'm still getting to grips with the MC code.

grimar updated this revision to Diff 148191.May 23 2018, 5:11 AM

Thanks, James!

  • Updated test case (added no -ffunction-sections case).

I'm wondering whether it would make more sense to do the COMDAT group bit without function-sections in the other stack-size-section test, and then rename the new test to stack-size-section-function-sectons or something similar? What do you think?

grimar updated this revision to Diff 148193.May 23 2018, 5:37 AM

I think that is fine. Updated the test cases as suggested.

jhenderson added inline comments.May 23 2018, 5:49 AM
test/CodeGen/X86/stack-size-section.ll
16 ↗(On Diff #148193)

Should we still have SHF_LINK_ORDER in these cases? If we GC (or otherwise discard) the text section here, we'll still keep the stack_sizes section, which I don't think we want to do.

I know -gc-sections without -function-sections is rare, but it is legal, and LLD does potentially strip some sections (imagine an object file with only one or two unused functions in, for example).

grimar added inline comments.May 23 2018, 6:52 AM
test/CodeGen/X86/stack-size-section.ll
16 ↗(On Diff #148193)

Ok. Problem is that currently ELF section key is based on section name, group and unique id:
https://github.com/llvm-mirror/llvm/blob/master/lib/MC/MCContext.cpp#L393

Because of that the following code currently produce single .stack_sizes section:

.section .text,"ax",@progbits
 nop

.section .stack_sizes,"o",@progbits,.text
.byte 1

.section .text,"ax",@progbits
 nop

.section .stack_sizes,"o",@progbits,.text
.byte 2

It would produce 2 if stack_sizes would have unique attribute set.

I can also imagine code when we have multiple .text sections without -ffunction-sections.
Example:

int foo() {
 return 0;
}

int main () __attribute__ ((section (".text.main")));
int main() {
  return 0;
}

In that case, we also would want to link .stack_sizes to its parent correctly. At the same time does not seem
we really want to complicate the logic of finding the ideal unique ID for them right now.
(As -fno-function-sections case is rare and no need to optimize it that early probably)

Given that I think the simple logic that should be OK for start would be to have unique property unconditionally set (for both -ffunction-sections and -fno-function-sections cases), like in one of the previous iterations. What do you think?

grimar added inline comments.May 23 2018, 7:02 AM
test/CodeGen/X86/stack-size-section.ll
16 ↗(On Diff #148193)

My first sample is a bit confusing sorry. What I wanted to say is that without unique attribute it generates single .stack_sizes which is linked to nowhere. And to set sh_link field correctly, I think we need unique attribute now.

grimar updated this revision to Diff 148213.May 23 2018, 7:35 AM
  • Updated patch to match the behavior I suggested in latest comments.

(unconditionally add unique attribute)

Maybe I've been confusing things, because from what I'm seeing in this, there is now a unique stack sizes section for every function, even without -ffunction-sections, which I don't think we want. What I kind of expect to see is this:

  1. Without function sections:
    1. Non-COMDAT functions all share a stack_sizes section which refers via SHF_LINK_ORDER to .text.
    2. COMDAT functions have their own stack_sizes section, which optionally refers via SHF_LINK_ORDER to .text.<symbol>, and is a member of the COMDAT group.
  2. With function sections:
    1. All functions have their own unique stack_sizes section, which refer to their corresponding section via SHF_LINK_ORDER.
    2. COMDAT functions' .stack_sizes sections are members of their group.

Does that make sense?

I think it would be easy to do by only using ID in the elf section creation/fetching if we are for a group section or in ffunction-sections.

Maybe I've been confusing things, because from what I'm seeing in this, there is now a unique stack sizes section for every function, even without -ffunction-sections, which I don't think we want.

That is not ideal, but safe, simple and correct I believe.

What I kind of expect to see is this:

  1. Without function sections:
    1. Non-COMDAT functions all share a stack_sizes section which refers via SHF_LINK_ORDER to .text.

But what about my sample?

int foo() {
 return 0;
}

int main () __attribute__ ((section (".text.main")));
int main() {
  return 0;
}

This code compiled without -ffunction-sections will have 2 .text sections: .text and .text.main.
The current version of the patch would correctly create 2 unique stack sizes and would link them to .text and .text.main
respectively.

There is no way currently to create and link 2 different .stack_sizes to different .text sections without setting different unique ID to them I think.

grimar planned changes to this revision.EditedMay 23 2018, 8:25 AM

What I can probably do is to compute ID based on the .text section name.

So that for no -ffunction-sections case it would emit several .stack_sizes with the same ID and so that final object would contain only a single section finally after merging them,
just like we would want.

It would work for my sample case too I think. Let me try to implement this.

What I can probably do is to compute ID based on the .text section name.

So that for no -ffunction-sections case it would emit several .stack_sizes with the same ID and so that final object would contain only a single section finally after merging them,
just like we would want.

It would work for my sample case too I think. Let me try to implement this.

Yes, I think this all makes sense. Here's a summary of what I think is best without -function-sections enabled:

// These two share a .stack_sizes section
void func1() {}
void func2() {}

// These two share a different .stack_sizes section.
void func3()  __attribute__ ((section (".text.other"))) {}
void func4()  __attribute__ ((section (".text.other"))) {}

// This has it's own .stack_sizes section in its group
template <int I> int func5() { return I; }

Of course, if .stack_sizes section names were based on their "parent" section, then this would probably become much simpler, but we can't do that with the way current linkers behave.

grimar updated this revision to Diff 148356.May 24 2018, 2:10 AM
  • Generate unique ID basing on begin symbol.
  • Updated test case.

What I can probably do is to compute ID based on the .text section name.

So that for no -ffunction-sections case it would emit several .stack_sizes with the same ID and so that final object would contain only a single section finally after merging them,
just like we would want.

It would work for my sample case too I think. Let me try to implement this.

Yes, I think this all makes sense. Here's a summary of what I think is best without -function-sections enabled:

// These two share a .stack_sizes section
void func1() {}
void func2() {}

// These two share a different .stack_sizes section.
void func3()  __attribute__ ((section (".text.other"))) {}
void func4()  __attribute__ ((section (".text.other"))) {}

// This has it's own .stack_sizes section in its group
template <int I> int func5() { return I; }

Yep. The latest diff implements exactly this behavior I believe.

jhenderson accepted this revision.May 24 2018, 3:14 AM

LGTM, but as mentioned earlier, you should probably get a second opinion.

test/CodeGen/X86/stack-size-section-function-sections.ll
11 ↗(On Diff #148356)

Nit: "to COMDAT" -> "to a COMDAT group" and "if such COMDAT exists" -> "if such a COMDAT exists".

test/CodeGen/X86/stack-size-section.ll
33 ↗(On Diff #148356)

Nit: "an unique" -> "a unique" x 2.

English is weird!

This revision is now accepted and ready to land.May 24 2018, 3:14 AM

Thanks, James!

grimar updated this revision to Diff 149056.May 30 2018, 1:22 AM
grimar marked 2 inline comments as done.
  • Addressed grammar nits.
jhenderson accepted this revision.Jun 19 2018, 3:02 AM

LGTM again :)

If there's no other opinion forthcoming, I'm okay with this going in, so that it doesn't sit around any longer.

LGTM again :)

If there's no other opinion forthcoming, I'm okay with this going in, so that it doesn't sit around any longer.

If there will be no more objections, I am inclined to commit this on Friday, 22/06 :) I am always happy to address post-commit comments,
and this functionality is enough specific and isolated under the particular flag (-stack-size-section), so I believe it is OK.

This revision was automatically updated to reflect the committed changes.