This is an archive of the discontinued LLVM Phabricator instance.

[ELF] Better resemble GNU ld when placing orphan sections into memory regions
ClosedPublic

Authored by ikudrin on Nov 1 2021, 5:27 AM.

Details

Summary

An orphan section should be placed in the same memory region as its anchor section if the latter specifies the memory region explicitly. Otherwise, the memory region for an orphan section should be selected by matching section flags and memory region attributes.

Diff Detail

Event Timeline

ikudrin created this revision.Nov 1 2021, 5:27 AM
ikudrin requested review of this revision.Nov 1 2021, 5:27 AM

I'm out of office this week. Will take a look next week if no-one else has.

This makes sense to me, but worth a look from Peter.

lld/test/ELF/linkerscript/orphan-memory.test
11

Tabular output from llvm-readelf may be more readonable. WDYT

38

Add a non-SHF_ALLOC orphan section to show that it does not belong to DATA

ikudrin updated this revision to Diff 385465.Nov 8 2021, 6:27 AM
ikudrin marked 2 inline comments as done.
ikudrin edited the summary of this revision. (Show Details)
  • llvm-readobj -> llvm-readelf
  • Added section .nonalloc in the test
lld/test/ELF/linkerscript/orphan-memory.test
11

Not sure because it requires leading zeroes and some regexes, but it is also more compact.

peter.smith added inline comments.Nov 8 2021, 8:41 AM
lld/ELF/LinkerScript.cpp
903

My first reaction is that I'd expect this to be below the // See if a region can be found by matching section flags.

Looking at LD https://sourceware.org/binutils/docs/ld/MEMORY.html

The attr string is an optional list of attributes that specify whether to use a particular memory region for an input section which is not explicitly mapped in the linker script. As described in SECTIONS, if you do not specify an output section for some input section, the linker will create an output section with the same name as the input section. If you define region attributes, the linker will use them to select the memory region for the output section that it creates.

That implies that if there are attributes/flags for the memory region then we should try these first to make sure that we respect RO, RW etc. If the MEMORY region doesn't have flags then we could fall back on prev in that case. Apologies if I've missed something obvious.

lld/test/ELF/linkerscript/orphan-memory.test
39

Would be good to add a test with attributes (https://sourceware.org/binutils/docs/ld/MEMORY.html) for the memory region as well to make sure that these are respected.

ikudrin added inline comments.Nov 8 2021, 10:12 PM
lld/ELF/LinkerScript.cpp
903

It looks like their implementation does not fully follow that rule. If the script in the test is changed like this:

MEMORY
{
  TEXT (rwx) : ORIGIN = 0x8000, LENGTH = 0x1000
  DATA (rwx) : ORIGIN = 0x9000, LENGTH = 0x1000
}
...

GNU ld still puts .init_array into DATA, not TEXT.

ikudrin updated this revision to Diff 385702.Nov 8 2021, 10:48 PM
  • Add a test with attributes for memory regions

I had a play around with ld.bfd and found it really hard to get anything like what I'd expect to work. I had the best luck when using ! due to the loose way the sections match and no direct assignment to memory regions.

Could you try something like this:

.section .text.01, "ax", %progbits
 nop

 .section .rodata.01, "a", %progbits
 .word 10

 .section .init_array,"aw",@init_array
 .zero 0x10

With

MEMORY
{
  TEXT (RX) : ORIGIN = 0x8000, LENGTH = 0x1000
  DATA (W!X) : ORIGIN = 0x9000, LENGTH = 0x1000
}

SECTIONS {
  .text : { *(.text.01) }
}

In this case I get the map file:

Memory Configuration

Name             Origin             Length             Attributes
TEXT             0x0000000000008000 0x0000000000001000 xr
DATA             0x0000000000009000 0x0000000000001000 w !x
*default*        0x0000000000000000 0xffffffffffffffff

Linker script and memory map

LOAD ld.o

.text           0x0000000000008000        0x4
 *(.text.01)
 .text.01       0x0000000000008000        0x4 ld.o
 .text          0x0000000000008004        0x0 ld.o
OUTPUT(a.out elf64-littleaarch64)
LOAD linker stubs

.rodata.01      0x0000000000008004        0x4
 .rodata.01     0x0000000000008004        0x4 ld.o

.init_array     0x0000000000009000       0x10
 .init_array    0x0000000000009000       0x10 ld.o

With .init_array correctly in the W!X region.

I think that this test case on LLD with the proposed change would put .init_array in TEXT as it would follow .rodata.01 which would be assigned to TEXT? I've not had a chance to build LLD with the patch to check. If I'm right it may be worth checking that the previous region has compatible attributes.

lld/ELF/LinkerScript.cpp
903

I think there is a chance that if the prev has incompatible flags with sec we could assign sec incorrectly. I think that in the majority, perhaps all real programs we'll be saved by the sorting order so this won't happen. I think that there is a possible corner case (I've put the test in the main comment). If I'm right it would be useful to test that the sec->flags are compatible with prev.

ikudrin updated this revision to Diff 386128.Nov 10 2021, 6:01 AM
ikudrin retitled this revision from [ELF] Place an orphan section to the same memory region as its anchor section to [ELF] Better resemble GNU ld when placing orphan sections into memory regions.
ikudrin edited the summary of this revision. (Show Details)

Thanks for the samples, @peter.smith!

My current understanding is that GNU ld puts orphan sections into the same memory regions as their anchor sections if the anchor sections are explicitly assigned to memory regions in the linker script. In that case, it ignores attributes and does the said assignment even if properties of sections and regions conflict. On the other hand, if the anchor section does not have the explicit association and its memory region is selected based on attributes, GNU ld does the same for corresponding orphan sections. That is what your example has demonstrated.

I have updated the patch to support these newly discovered cases. Unfortunately, your sample cannot be used as is in lld right now because the linker ignores the r attribute and thus is unable to put .rodata.01 into the correct region. I am going to prepare another patch to fix that.

peter.smith accepted this revision.Nov 10 2021, 10:42 AM

OK, thanks for the update, I've no further comments. I've set accepted, but please wait to see if MaskRay has any further comments.

This revision is now accepted and ready to land.Nov 10 2021, 10:42 AM
MaskRay accepted this revision.Nov 10 2021, 1:53 PM

Worth mentioning that previous error: no memory region specified for section may be accepted like GNU ld now.

lld/ELF/LinkerScript.cpp
892

ditto below

904