This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Replace MergeOutputSection with synthetic input section MergeSection.
AbandonedPublic

Authored by grimar on Dec 5 2016, 8:17 AM.

Details

Reviewers
ruiu
rafael
Summary

Change allows to get rid of MergeOutputSection. As a result now we can
place mergable sections with different attributes into single output section.
What is consistent to what gold/bfd do.

One of problem I faced was sh_entsize field.
Oracle documentations says:

Some sections hold a table of fixed-size entries, such as a symbol table. For such a section, this member gives the size in bytes of each entry. The member contains 0 if the section does not hold a table of fixed-size entries.

Logic implemented in this patch is to make this field to be zero.

Diff Detail

Event Timeline

grimar updated this revision to Diff 80270.Dec 5 2016, 8:17 AM
grimar retitled this revision from to [ELF] - Replace MergeOutputSection with synthetic input section MergeSection..
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar, evgeny777.
ruiu added inline comments.Dec 5 2016, 9:35 AM
ELF/InputSection.cpp
88

How does this new condition happen?

135–136

Is there a reason to return 0? If you shouldn't call this function until an output section is set, you want to add an assert instead of returning a fake value.

ELF/OutputSections.cpp
139

This is not a right place to add this code. You want to do outside of this function. Instead of hiding this logic inside addSection, make a new function that scans Symtab<ELFT>::X->Sections vector to merge mergeable input sections into synthetic mergeable sections.

ELF/OutputSections.h
75

I do not understand why we need to care about sh_entsize of our output files. Why can't you just leave it 0?

ELF/SyntheticSections.cpp
1691–1696

nit: add a blank line between different code blocks. At first sight this looked like if ~ else if ~ else.

if (Finalize)
  return;
Finalized = true;

if (should...
grimar added inline comments.Dec 6 2016, 7:02 AM
ELF/InputSection.cpp
88

I am creating MergeSection:

MergeSection<ELFT> *Syn = 
  make<MergeSection<ELFT>>(MS->Name, MS->Type, Flags, Alignment);

And should not drop flags for it, because otherwise next condition works wrong later:

template <class ELFT> bool MergeSection<ELFT>::shouldTailMerge() const {
  return (this->Flags & SHF_STRINGS) && Config->Optimize >= 2;
}
135–136

Reason is that this lines can be called when doing rellocateNonAlloc.
When a merge section was only referenced from
non-alloca sections. Like gc-sections-non-alloc-to-merge.s test shows.

In this case MS was not Live so could not have section MergeSec.
getOffset by the way contains inside the next code to handle this:

if (!this->Live)
  return 0;
ELF/OutputSections.cpp
139

This can work for non-script case, but how would you handle linkerscript then ?

This patch already relies on fact that script calls addSection() with merge input sections.
It still creates multiple output sections for different attributes because script has own createKey(),
but OutputSection::addSection is responsible for merging all what I give it.

I tried to do the suggested change and need to fix script logic too because
it still passes MergeInputSections.

Script code for creating the list of sections is next:

std::vector<InputSectionBase<ELFT> *> V = createInputSectionList(*Cmd);

// The output section name `/DISCARD/' is special.
// Any input section assigned to it is discarded.
if (Cmd->Name == "/DISCARD/") {
  discard(V);
  continue;
}

// This is for ONLY_IF_RO and ONLY_IF_RW. An output section directive
// ".foo : ONLY_IF_R[OW] { ... }" is handled only if all member input
// sections satisfy a given constraint. If not, a directive is handled
// as if it wasn't present from the beginning.
//
// Because we'll iterate over Commands many more times, the easiest
// way to "make it as if it wasn't present" is to just remove it.
if (!matchConstraints<ELFT>(V, Cmd->Constraint)) {
  for (InputSectionBase<ELFT> *S : V)
    S->Assigned = false;
  Opt.Commands.erase(Iter);
  --I;
  continue;
}

So looks after the last block of code I can take V, create synthetic mergeable sections for sections in V if any
and update V and Symtab<ELFT>::X->Sections.

There is no way to do that once outside like for non-scripted case. And that will complicate the logic.
Do you think it is correct direction ?

ELF/OutputSections.h
75

Well, that is actually the question I wish to find answer and mentioned in description of patch.
We set sh_entsize and checked it in testcases before, when each different mergeable section was a single output.
If we can ignore it and set to 0, that makes patch a bit easier. Looking on gold output which is inconsistent,
looks that value makes no sence (if not -r elocatable).

grimar added inline comments.Dec 6 2016, 7:27 AM
ELF/OutputSections.cpp
139

So generally I do not see benefits from doing that outside of OutputSection.
Logic of "scan all sections, create new synthetic sections, remove regular mergable sections from array and do something more on a linker script side" looks more complicated that do everything incapsulated in OutputSection in a short method.

grimar updated this revision to Diff 80743.Dec 8 2016, 4:09 AM

Rui, I applied your suggestion and now patch implements both script/non-script case. Could you look ?

  • Addressed comments.
grimar updated this object.Dec 8 2016, 4:17 AM
ruiu added inline comments.Dec 8 2016, 7:38 AM
ELF/InputSection.cpp
135–136

Do you mean that sections which is not marked as live are copied to the result? If so, that sounds like a bug. Anyways, adding this obscure code is not a good thing.

ELF/InputSection.h
204–206

This comment needs refinement. So, MergeInputSections are aggregated to a synthetic input section, MergeSections, and then added to an OutputSection. This pointer points to a MergeSection which this section belongs to.

ELF/LinkerScript.cpp
351 ↗(On Diff #80743)

Why do you have to do this in LinkerScript.cpp? It feels something is not right. Section aggregation should happen before the control is passed to the linker script.

366 ↗(On Diff #80743)

This is odd too.

ELF/OutputSections.cpp
139

I disagree. This patch adds small piece of code to many different places, and in total it is not clear how mergeable sections are handled. What we want to do to mergeable sections is, well, merging. Just merge them at one place and rest should be processed normally.

ELF/OutputSections.h
143

You don't need this, no?

grimar added inline comments.Dec 8 2016, 7:50 AM
ELF/LinkerScript.cpp
351 ↗(On Diff #80743)

How it can be possible ?

Imagine script has:

.mergeable : { *(.aaa) *(.bbb) EXCLUDE_FILE (*file1.o) *(.ccc) }

I am able to merge them only after I have some result from createInputSectionList(), no ? Until that I don't know which sections should I combine.

366 ↗(On Diff #80743)

The same. We do not know what sections are orphans until we proccess the script.

ruiu added inline comments.Dec 8 2016, 7:54 AM
ELF/LinkerScript.cpp
351 ↗(On Diff #80743)

I don't think you need to handle that case.

grimar added inline comments.Dec 8 2016, 8:18 AM
ELF/LinkerScript.cpp
351 ↗(On Diff #80743)

If so, do you think using elf::getOutputSectionName() for naming synthetic merge inputs for script case is ok ?

That way if .data.xxx and .data.yyy are mergeable and script has:
.data : { *(.data*) }
.only_data_xxx : { *(.data.xxx) }
.only_data_yyy : { *(.data.yyy) }
.data.xxx and .data.yyy will be combined into mergeable .data and placed into .data output section.

But according to above, we dont want to handle such cases ?
I am just asking, because see no other appropriate method except use of getOutputSectionName() on early stages then.
As leaving input sections names as is probably not an option as this kills the possible merging cases.

ruiu added inline comments.Dec 8 2016, 8:29 AM
ELF/LinkerScript.cpp
351 ↗(On Diff #80743)

Essentially all input section names are mapped to output section names using getOutputSectionName, so this shouldn't be an exception.

grimar added inline comments.Dec 8 2016, 8:34 AM
ELF/LinkerScript.cpp
351 ↗(On Diff #80743)

For case when we use linkerscript they dont. So doing that will break the example in my previous message.

ruiu added inline comments.Dec 8 2016, 8:35 AM
ELF/LinkerScript.cpp
366 ↗(On Diff #80743)

Is that a real usage? If not, we shouldn't overthink.

grimar updated this revision to Diff 80897.Dec 9 2016, 6:59 AM
  • Addressed review comments.
ELF/InputSection.cpp
135–136

No. I mean that non-allocatable sections can have relocations against sections that are not LIve.
That happens for .debug sections often I thing.
Just please look at the gc-sections-alloc.s. or gc-sections-non-alloc-to-merge.s.

And actually we have similar pieces of code around to handle that, examples:

static typename ELFT::uint getSymVA(const SymbolBody &Body,
                                    typename ELFT::uint &Addend) {
..
uintX_t VA = (SC->OutSec ? SC->OutSec->Addr : 0) + SC->getOffset(Offset);
typename ELFT::uint MergeInputSection<ELFT>::getOffset(uintX_t Offset) const {
...
  if (!this->Live)
    return 0;

I changed that place to be more obious.

ELF/InputSection.h
204–206

Done.

ELF/OutputSections.h
143

Yep, removed.

ruiu added inline comments.Dec 9 2016, 11:21 AM
ELF/InputSection.cpp
135–136

But that is a bug of LLD, no? The invariant is that all relocations from live sections points to live sections. I think handling non-live sections like this is wrong -- you should fix the bug that live sections are marked as non-live instead.

grimar added inline comments.Dec 12 2016, 1:18 AM
ELF/InputSection.cpp
135–136

I do not think it is a bug. This was implemented in r282444 by Rafael.
It is fine that non-alloc debug sections can point to non-live allocatable sections I think.

ruiu added inline comments.Dec 12 2016, 2:50 PM
ELF/InputSection.cpp
135–136

I think that my point is, if a section is written to output, it should be marked as "live" whatever its type is. If non-alloc sections are copied to output unconditionally, they should be marked as "live" by default from the beginning.

Doing something "non-live sections are sometimes actually live under some condition, so handle that special case here" is not a good idea.

ruiu added inline comments.Dec 14 2016, 5:28 PM
ELF/InputSection.cpp
135

Use auto.

ELF/OutputSections.cpp
139

Can you keep the scope as narrow as possible? SS has unnecessarily large scope here. If you do this, it'll be much narrower.

if (auto *Sec = dyn_cast<...>())
  if (Sec->Mergeable)
    ...;
141

You seems to be reinventing dynamic class dispatching. Please define dyn_cast<> for MergeSection<ELFT>.

ELF/Writer.cpp
1728 ↗(On Diff #80897)

Remove V.

grimar updated this revision to Diff 81574.Dec 15 2016, 6:40 AM
  • Addressed review comments.
ELF/InputSection.cpp
135

Done.

ELF/OutputSections.cpp
139

Ok, done.

141

Done.

ruiu added inline comments.Dec 15 2016, 9:41 AM
ELF/InputSection.h
43

The naming of native merge section and synthetic merge section is confusing. Here, you name synthetic merge section SyntheticMerge, but you name an instance of it MergeSection at other place. On the other hand, the section named Merge here has an instance of MergeInputSection. Naming needs to be consistent. Here is my suggestion.

Kinds:
enum Kind { Regular, EHFrame, Merge, Synthetic, MergeSynthetic };

Classes:
MergeInputSection (native)
MergeSyntheticSection (synthetic)
ELF/OutputSections.cpp
141

You don't need a static_cast.

ELF/Writer.cpp
212 ↗(On Diff #81574)

Isn't this too late? I don't see a reason why we can't do this just after createSyntheticSections.

test/ELF/gc-merge-local-sym.s
18 ↗(On Diff #81574)

What is this change?

grimar updated this revision to Diff 81727.Dec 16 2016, 2:26 AM
  • Addressed review comments.
ELF/InputSection.h
43

Ok. Naming things was always hard for me, thanks for suggestions, I applied it.

ELF/OutputSections.cpp
141

Done.

ELF/Writer.cpp
212 ↗(On Diff #81574)

Well. It not "late" but also nothing stops us to do that earlier in a single place too, did that.

test/ELF/gc-merge-local-sym.s
18 ↗(On Diff #81574)

Previously we had next code:

template <class ELFT>
void MergeOutputSection<ELFT>::addSection(InputSectionData *C) {
...
  this->Entsize = Sec->Entsize;
...
}

Now we dont as handle merge sections via regular.

Actually I wrote about that situation earlier :) Please look at patch description and my comment (and "fix") for one of first diff,
which was:
"Well, that is actually the question I wish to find answer and mentioned in description of patch.
We set sh_entsize and checked it in testcases before, when each different mergeable section was a single output.
If we can ignore it and set to 0, that makes patch a bit easier. Looking on gold output which is inconsistent,
looks that value makes no sence (if not -r elocatable)."

I leaved it "just 0" as was suggested by you, though I believe you supposed it was already "0" before, but it was not.
I think it is unclear what value to leave in the output. gold as far I remember just takes the first value by order when combines sections.
So it easy to face inconsistency for its output, what IMO proves that value makes no sense for output sections.
I believe we can put zero here, like this patch do.

ruiu added inline comments.Dec 16 2016, 8:58 AM
ELF/OutputSections.cpp
140

Do you really need this? This looks odd because you are handling MergeInputSection in a special way. Generally you want to avoid that as possible. If you really need this, this may be a sign that we need to fix something else.

ELF/Writer.cpp
200 ↗(On Diff #81727)

As you know we usually do not pass around Symtab<ELFT>::X->Sections.

I found next script in linux kernel:

.rodata : {
 *(.rodata)
 *(.rodata.*)
 . = ALIGN(16);
 video_cards = .;
 *(.videocards)
 video_cards_end = .;
}

We generate video_cards == video_cards_end here, what is wrong and happens because we generate multiple .rodata*.
That what this patch had to fix, seems it is very good now justification about that we need this logic.
And I worked on a new patch basing on this one today (to have a cleaner version and to refresh it in mind).

I had a problem though:
Latest diff of this patch creates all mergeable sections early:

template <class ELFT> void Writer<ELFT>::run() {
  // Create linker-synthesized sections such as .got or .plt.
  // Such sections are of type input section.
  createSyntheticSections();
  combineMergableSections(Symtab<ELFT>::X->Sections);

So on latter steps both linkerscripted case and regular case had mergeable synthetic sections instead of regular mergable input sections.
Diff 2 of that patch did that in a different way, it did the same much later, what was more flexible.

It seems that approach of latest diff will not work. It was simple, but it is too simple to work enough good I think now.
Problem is merge-sections.s for example, which has:

.section        .foo.1a,"aMS",@progbits,1
.asciz "foo"

.section        .foo.1b,"aMS",@progbits,1
.asciz "foo"

and script:

# RUN:   .foo : { begin = .; *(.foo.*) end = .;}
Latest diff creates syntetic sections .foo.1s and .foo.1b. Then script basing on rules combines them into single .foo later.

Problem is that first and second section contain exactly one regular mergeable input section section per each.
So they keep "foo" strings separatelly. What gives us 8 bytes .foo instead of 4.
Issue is that until we parse linkerscripts we unable to create syntetic section .foo with proper name early, because we do not know
that for ".foo.1a" we should create synthetic section ".foo".

And then when we already have 2 synthetic mergeable sections .foo.1a and .foo.1b created suring script run, all we can do is place them to output,

but we can not merge them together on that step.

Diff2 on opposite created synthetic mergeable sections on linkerscript side later. What was a bit more complicated, but solved
that problem. So it took a list of regular sections, grouped mergeable together and created synthetic inputs for them. Which were added to output finally.
I think we want to have this optimization, does not ?

So I think we should return to Diff 2 approach. I am going to use it as a base, rework and upload fresh patch probably.

What do you think, Rui ?

ruiu edited edge metadata.Jan 26 2017, 11:22 AM

I probably do not understand what you are trying to solve.

Currently, LLD merges two mergeable input sections if they have the same name, types and section flags. But, you are saying that that is not always correct, right?

Can you briefly describe the exact semantics this patch is trying to implement, and then why you think that is the better behavior?

In D27415#657878, @ruiu wrote:

I probably do not understand what you are trying to solve.

Currently, LLD merges two mergeable input sections if they have the same name, types and section flags. But, you are saying that that is not always correct, right?

Can you briefly describe the exact semantics this patch is trying to implement, and then why you think that is the better behavior?

I am trying to solve issue I am demonstrating at D29217 page.
It has:

.section .aaa.1,"a"
.byte 11
.section .aaa.2,"a"
.byte 22

.section .bbb.1,"aMS",@progbits,1
.asciz "foo"
.section .bbb.2,"aMS",@progbits,1
.asciz "foo"

.section .ccc.1,"a"
.byte 33
.section .ccc.2,"a"
.byte 44

It has also symbols assignments to A and B that should mark the start/end of .ccc :

.rodata : { *(.aaa.*) *(.bbb.*) A = .; *(.ccc.*) B = .; }

1. LLD currently (clean head revision) do:

It creates 2 output .rodata sections:
.rodata (.aaa.1 .aaa.2 .ccc.1 .ccc.2)
.rodata (.bbb.1 .bbb.2)

And because of that it assigns values of A and B wrong currently. I think the cleanest way to fix is implement synthetic merge section instead of MergeOutputSection.
(this is what this patch was about, right ?)
That way output should be single .rodata section which consist of [ .aaa.1 .aaa.2, synthetic section holding <.bbb.1 and .bbb.2>, .ccc.1, .ccc.2]. And symbols will be assigned correctly then.
Output will probably be equal to bfd/gold, btw.

2. Latest diff of this patch do:

It creates synthetic sections for mergeable sections early now. Before passing them to script. After that we have next input sections available to work with on linerscript side:

[.aaa.1 .aaa.2 SynteticMergeSection(.bbb.1) SynteticMergeSection(.bbb.2) .ccc.1 .ccc.2]

Since synthetic merge sections are already created, script just places all above in a single .rodata section. We loose string merging optimization here, because have 2 pre-created SynteticMergeSections.
There is no way to merge 2 synthetic mergable sections together on this step (I do not think we want to implement this too).
And we can not create single .bbb section for holding .bbb.1 and .bbb.2 early before proccessing linkerscript commands, because we do not know it will want to put them together.

3. Patch I want to do basing on what diff2 of this patch already did:

I want to create synthetic merge sections later. For linkerscript step it should be done after script prepares input sections list to create output sections. That was done in diff2.
That way we have inputs:
[.aaa.1 .aaa.2 MergeInputSection(.bbb.1) MergeInputSection(.bbb.2) .ccc.1 .ccc.2]

Script will create single SynteticMergeSection for mergable input sections ind convert inputs to:
[.aaa.1 .aaa.2 SynteticMergeSection(.bbb.1, .bbb.2) .ccc.1 .ccc.2]

And we will end up with single .rodata with all optimizations working.

To be more clear D29217 show 2 different issues.

  1. The main issue I am trying to solve - wrong symbols values generated in scripts like:
.rodata : {
 *(.rodata)
 *(.rodata.*)
 . = ALIGN(16);
 video_cards = .;
 *(.videocards)
 video_cards_end = .;

}

For fixing just only that case, probably latest diff of that patch should work.
Because section getOutputSectionName() returns ".rodata" name for all ".rodata*" sections. That way we will correctly apply merging
optimization because all ".rodata.aaa .rodata.bbb .rodata.ccc" will be just mapped to ".rodata". And we create single synthetic merge section.

  1. Another issue is that if sections names are not those from "getOutputSectionName()". That way for example if we have 2 SHF_MERGE sections

.rodata : { *(foo.1 foo.2) } we are not able to know we should create single mergeable output section if we create them early. And that way we lose possible optimization.

I'll rebase this and probably submit alternative patch based on diff2 soon that solves second problem too.

grimar updated this revision to Diff 86053.Jan 27 2017, 8:29 AM
  • Addressed review comments.
  • Rebased and performed cleanup.

This continues the review flow.

Patch that do different way (based on diff 2) is not ready yet, but available as demo here D29223.
It has very close source base, the major difference is when combineMergableSections is called.

I suggest the next thing for now: land this first as this it should help to fix the issue with script found in linux kernel.
Then we can discuss if we should fix the rest problem I mentioned earlier, which does not affect kernel for now,
but still probably worth to fix. I'll be able to update D29223 and minimize the amount of changes as well then.

ELF/OutputSections.cpp
140

When symbols are created they refer to MergeInputSection and not to synthetic merge section, created later.
I think that is fine. That piece of code simplified things a bit, though I was able to get rid of.

ELF/Writer.cpp
200 ↗(On Diff #81727)

Fixed.

grimar added inline comments.Jan 27 2017, 8:32 AM
ELF/SyntheticSections.cpp
103

I did the change here to reduce amount of testcases failed.
Since comment section is converted to MergeSyntheticSection later, it is changes it's order
and brokes many testcases. To avoid that - I am creating it as merge synthetic section initially.

I know that probably not the best idea, though it is relative to what this patch do, is it ok to do ?

This version also does not allow .abc : { .rodata.foo .rodata.bar} .def : { .rodata.zed} to work correctly,
because .rodata created early will contain all 3 sections.

And D29223 is free of that problem as well. That is argument for switching to something like D29223,
though I think we can still land this one first as a base.

grimar abandoned this revision.Jan 31 2017, 8:43 AM

D29330 posted instead.