This is an archive of the discontinued LLVM Phabricator instance.

COFF: Use (name, output characteristics) as a key when grouping input sections into output sections.
ClosedPublic

Authored by pcc on Apr 18 2018, 11:34 PM.

Details

Summary

This is what link.exe does and lets us avoid needing to worry about
merging output characteristics while adding input sections to output
sections.

With this change we can't process /merge in the same way as before
because sections with different output characteristics can still
be merged into one another. So this change moves the processing of
/merge to just before we assign addresses. In the case where there
are multiple output sections with the same name, link.exe only merges
the first section with the source name into the first section with
the target name, and we do the same.

This isn't quite enough though because link.exe has a special case for
.CRT in 32-bit mode: it processes sections whose output characteristics
are DATA | R | W as though the output characteristics were DATA | R
(so that they get merged into things like constructor lists in the
expected way). Chromium has a few such sections, and it turns out
that those sections were causing the problem that resulted in r318699
(merge .xdata into .rdata) being reverted: because of the previous
permission merging semantics, the .CRT sections were causing the entire
.rdata section to become writable, which caused the SEH runtime to
crash because it apparently requires .xdata to be read-only. This
change also implements the same special case.

This should unblock being able to merge .xdata into .rdata by default,
as well as .bss into .data, both of which will be done in followups.

Depends on D45799

Event Timeline

pcc created this revision.Apr 18 2018, 11:34 PM
rnk added a comment.Apr 19 2018, 9:38 AM

because of the previous
permission merging semantics, the .CRT sections were causing the entire
.rdata section to become writable, which caused the SEH runtime to
crash because it apparently requires .xdata to be read-only. This
change also implements the same special case.

Wow. So, 32-bit Chromium linked with LLD had a writable .rdata. We definitely don't want to ship that. Thanks for investigating this!

lld/COFF/Writer.cpp
720

Do we need to remove From somehow? What are the semantics of /merge:.foo=.bar /merge:.baz=.foo? Move all input sections in .foo to .bar, and then move all input sections from .baz to .foo? Or, everything in .bar?

ruiu added inline comments.Apr 19 2018, 9:55 AM
lld/COFF/Writer.cpp
1138–1141

Does this mean that we now have multiple sections that have the same name? Is it OK?

rnk added a subscriber: zturner.Apr 19 2018, 10:00 AM
rnk added a comment.Apr 19 2018, 10:03 AM

@zturner: We need to add logic to this section merging code to make S_SECTION / S_COFFGROUP records.

In D45801#1072314, @rnk wrote:

@zturner: We need to add logic to this section merging code to make S_SECTION / S_COFFGROUP records.

We already emit the S_SECTION symbols (see the function addLinkerModuleSectionSymbol), but you're right, we don't emit the S_COFFGROUP symbols yet.

pcc added a comment.Apr 19 2018, 11:02 AM
In D45801#1072314, @rnk wrote:

@zturner: We need to add logic to this section merging code to make S_SECTION / S_COFFGROUP records.

We already emit the S_SECTION symbols (see the function addLinkerModuleSectionSymbol), but you're right, we don't emit the S_COFFGROUP symbols yet.

What is an S_COFFGROUP? From looking at our test cases it appears to be a group of input sections with the same name (including the $ suffix), right? Can't we infer them in the PDB writer by looking at input section names?

lld/COFF/Writer.cpp
720

We don't need to remove From because it will be taken care of by removeEmptySections.

The semantics for your example are that everything gets moved to .bar, but before this change we were just taking one step (so .baz is renamed to .foo and the input .foo sections are merged into .bar). I didn't intend to change this as part of this change, but it turns out that we happen to do the right thing in your example because .baz is alphabetically less than .foo. Probably the best thing to do is to implement multi-step merging in this change as well.

1138–1141

It seems to be fine, it's what link.exe does and the Windows loader doesn't have a problem with it.

pcc updated this revision to Diff 143153.Apr 19 2018, 12:57 PM
  • Implement multi-step merging
smeenai added inline comments.
lld/COFF/Writer.cpp
458

If I'm reading this correctly link.exe only has this special case in 32-bit mode, but we're doing it for all architectures?

Also, when you say 32-bit, do you mean x86 specifically, or 32-bit ARM as well?

pcc added inline comments.Apr 19 2018, 2:39 PM
lld/COFF/Writer.cpp
458

I can only reproduce it with IMAGE_FILE_MACHINE_I386. Do you think it's worth limiting based on machine?

pcc updated this revision to Diff 143167.Apr 19 2018, 3:01 PM
pcc marked an inline comment as done.
  • Fix cycle detection and add a test
smeenai added inline comments.Apr 19 2018, 3:30 PM
lld/COFF/Writer.cpp
458

I don't think it's worth limiting based on machine, but it's worth mentioning in the comment that LLD is just doing the special-case for all architectures. I spent a bit of time hunting around for if this was somehow limited to 32-bit from an enclosing scope, for example (since the comment to me implied that we only wanted the special case on 32-bit as well).

pcc marked an inline comment as done.Apr 19 2018, 4:11 PM
pcc updated this revision to Diff 143184.Apr 19 2018, 4:11 PM
  • Clarify comment
pcc updated this revision to Diff 143187.Apr 19 2018, 4:13 PM
  • Clarify comment further
pcc added a comment.Apr 20 2018, 2:02 PM

Any more comments on this?

ruiu accepted this revision.Apr 20 2018, 2:03 PM

No I don't. LGTM

This revision is now accepted and ready to land.Apr 20 2018, 2:03 PM
This revision was automatically updated to reflect the committed changes.