This is an archive of the discontinued LLVM Phabricator instance.

[llvm-objcopy][ELF] Don't assume RELA sections are dynamic if they carry the SHF_ALLOC flag
AbandonedPublic

Authored by ardb on Oct 5 2021, 1:38 PM.

Details

Summary

Currently, llvm-objcopy assumes that any SHT_REL/SHT_RELA type section with the SHF_ALLOC attribute is a dynamic relocation section, and therefore treats any linked symbol and string tables as being of the dynamic variety.

However, the ELF spec permits the use of SHF_ALLOC for ordinary SHT_RELA type sections as well, and so this heuristic is not entirely accurate, and may produce incorrect results, resulting in cryptic error messages such as

llvm-strip: error: Link field value 48 in section .rela.text is not a symbol table

where the file in question does in fact have a SYMTAB section at #48.

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

Note that this requires that reading the section headers is deferred until after we figure out the file type, which also means that the code that finds the executable header offset needs to be reworked somewhat so it does not rely on this.

Diff Detail

Event Timeline

ardb created this revision.Oct 5 2021, 1:38 PM
ardb requested review of this revision.Oct 5 2021, 1:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2021, 1:38 PM

Thanks for the patch! Perhaps worth adding a new test to llvm/test/tools/llvm-objcopy/ELF/?

With Diff 377339 applied, I observe failures in the following tests:

Failed Tests (2):
  LLVM :: tools/llvm-objcopy/ELF/basic-relocations.test
  LLVM :: tools/llvm-objcopy/ELF/no-symbol-relocation.test

$ llvm-lit -vv llvm/test/tools/llvm-objcopy/ELF/basic-relocations.test llvm/test/tools/llvm-objcopy/ELF/no-symbol-relocation.test

Perhaps they need to be fixed up in conjunction with this change?

ardb updated this revision to Diff 377432.Oct 5 2021, 10:55 PM

Refine the check so that

  • ET_REL file types are assumed to only have static relocation sections
  • ET_DYN/ET_EXEC file types are permitted to have either, and the disambiguation is based on the SHF_ALLOC flag as before
jhenderson requested changes to this revision.Oct 6 2021, 12:35 AM

So let's replace this with a check against the ELF file type, which is more appropriate: partially linked binaries of type ET_REL cannot have dynamic relocation sections, and ET_EXEC/ET_DYN fully linked cannot have the static ones. Note that this requires that reading the section headers is deferred until after we figure out the file type, which also means that finding the executable header offset

Sorry, but this statement is incorrect. See for example LLD's --emit-relocs option: static relocations can be emitted in a fully linked output file. I guess you may have discovered this already though?

The only real way of being able to truly tell whether a relocation section is a dynamic relocation section is if it is referred to by the dynamic table via one of the appropriate tags. We previously didn't go down this route, as it requires doing something non-trivial that llvm-objcopy didn't do before.

Have you got an actual output that demonstrates this issue, or is it merely theoretical? If you do have an actual output, why are the relocation sections SHF_ALLOC, but not dynamic relocations? Such relocations would take up memory space in the output file, not to mention that I suspect some linkers would get very confused by them/throw them away etc.

Please ensure your patch has testing for the new behaviour, that would have failed before this change, and that you upload your diff with full context (use -U999999 if using tools like patch to generate the diff). Please also make sure you have used clang-format to reformat your changes.

This revision now requires changes to proceed.Oct 6 2021, 12:35 AM
ardb updated this revision to Diff 377480.Oct 6 2021, 2:55 AM
ardb edited the summary of this revision. (Show Details)

Added test case
Add context to diff
Fix whitespace error

ardb retitled this revision from llvm-objcopy][ELF] Don't assume RELA sections are dynamic if they carry the SHF_ALLOC flag to [llvm-objcopy][ELF] Don't assume RELA sections are dynamic if they carry the SHF_ALLOC flag.Oct 6 2021, 3:57 AM

I'm not sure what's happened, but something with the diff is causing an error (possibly you need a new line at the end of the diff file?)

ardb added a comment.Oct 6 2021, 4:08 AM

I'm not sure what's happened, but something with the diff is causing an error (possibly you need a new line at the end of the diff file?)

I don't see any errors. Could you be more specific?

I'm not sure what's happened, but something with the diff is causing an error (possibly you need a new line at the end of the diff file?)

I don't see any errors. Could you be more specific?

The error seems to have gone away now. I don't know why.

I'll take a look at this patch tomorrow hopefully.

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

If this is a Linux kernel requirement, can you link to the related discussion?
From my experience dealing with ELF files, I don't think the Linux kernel usage is correct.

SHT_REL[A] SHF_ALLOC sections are dynamic relocation sections.
Since ET_REL files should not contain dynamic linking information, it doesn't make sense to use SHT_REL[A] SHF_ALLOC sections.
At the very least SHF_ALLOC .rela.text linking to non-SHF_ALLOC .symtab seems wrong.

--strip-debug erroring on unrelated error: Link field value 3 in section .rela.text is not a symbol table may be undesired, but we may have other means to address the issue if you provide enough context.

MaskRay requested changes to this revision.Oct 6 2021, 9:49 AM
This revision now requires changes to proceed.Oct 6 2021, 9:49 AM
ardb added a comment.Oct 6 2021, 12:51 PM

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

If this is a Linux kernel requirement, can you link to the related discussion?
From my experience dealing with ELF files, I don't think the Linux kernel usage is correct.

SHT_REL[A] SHF_ALLOC sections are dynamic relocation sections.

Where is this specified?

Since ET_REL files should not contain dynamic linking information, it doesn't make sense to use SHT_REL[A] SHF_ALLOC sections.
At the very least SHF_ALLOC .rela.text linking to non-SHF_ALLOC .symtab seems wrong.

The use case in question does not require the symtab section or strtab section, but I agree that this is an inconsistency.

--strip-debug erroring on unrelated error: Link field value 3 in section .rela.text is not a symbol table may be undesired, but we may have other means to address the issue if you provide enough context.

We have a file like this

[ 1] .text             PROGBITS         0000000000000000  00001000
     0000000000024379  0000000000000000 AXo       0     0     4096
...
[ 4] .rodata           PROGBITS         0000000000000000  0002be80
     0000000000005f01  0000000000000000 AMS       0     0     64
...
[ 9] .rela.text        RELA             0000000000000000  00031e00
     000000000000dc50  0000000000000018  AI      48     1     8
...
[40] .rela.rodata[...] RELA             0000000000000000  00045a50
     0000000000000060  0000000000000018  AI      48     4     8
...
[48] .symtab           SYMTAB           0000000000000000  00092208
     000000000000c558  0000000000000018          50   1252     8
...
[50] .strtab           STRTAB           0000000000000000  0009ea62
     000000000000b44b  0000000000000000           0     0     1

Relocation section '.rela.init.rodata' at offset 0x48298 contains 131 entries:

Offset          Info           Type           Sym. Value    Sym. Name + Addend

000000000000 06f200000105 R_AARCH64_PREL32 0000000000000000 .rela.text + 0
000000000008 06f300000105 R_AARCH64_PREL32 0000000000000000 .rela.rodata + 0
...

Symbol table '.symtab' contains 2105 entries:

 Num:    Value          Size Type    Bind   Vis      Ndx Name
...
1778: 0000000000000000     0 SECTION GLOBAL DEFAULT    9 .rela.text
1779: 0000000000000000     0 SECTION GLOBAL DEFAULT   40 .rela.rodata

The reason we want this is to be able to track how/where the text and rodata were modified after they were loaded into memory and relocated in-place.

Are you claiming that this arrangement violates the ELF spec? If it does, it would be helpful if you could point out why this is the case. I agree that this is an unusual use of ELF sections, but that by itself should not be a justification for disregarding this change.

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

If this is a Linux kernel requirement, can you link to the related discussion?
From my experience dealing with ELF files, I don't think the Linux kernel usage is correct.

SHT_REL[A] SHF_ALLOC sections are dynamic relocation sections.

Where is this specified?

This area is not specified in the generic ABI. This is a convention used as an (unspecified) protocol between the linker and the loader.
I think the convention started when SunOS contributed dynamic linking to System V release 4 which was then inherited to all ELF operating systems.

Since ET_REL files should not contain dynamic linking information, it doesn't make sense to use SHT_REL[A] SHF_ALLOC sections.
At the very least SHF_ALLOC .rela.text linking to non-SHF_ALLOC .symtab seems wrong.

The use case in question does not require the symtab section or strtab section, but I agree that this is an inconsistency.

This is the exact point I question the Linux kernel's use case.
So I asked for a link about the usage.

--strip-debug erroring on unrelated error: Link field value 3 in section .rela.text is not a symbol table may be undesired, but we may have other means to address the issue if you provide enough context.

We have a file like this

[ 1] .text             PROGBITS         0000000000000000  00001000
     0000000000024379  0000000000000000 AXo       0     0     4096
...
[ 4] .rodata           PROGBITS         0000000000000000  0002be80
     0000000000005f01  0000000000000000 AMS       0     0     64
...
[ 9] .rela.text        RELA             0000000000000000  00031e00
     000000000000dc50  0000000000000018  AI      48     1     8
...
[40] .rela.rodata[...] RELA             0000000000000000  00045a50
     0000000000000060  0000000000000018  AI      48     4     8
...
[48] .symtab           SYMTAB           0000000000000000  00092208
     000000000000c558  0000000000000018          50   1252     8
...
[50] .strtab           STRTAB           0000000000000000  0009ea62
     000000000000b44b  0000000000000000           0     0     1

Relocation section '.rela.init.rodata' at offset 0x48298 contains 131 entries:

Offset          Info           Type           Sym. Value    Sym. Name + Addend

000000000000 06f200000105 R_AARCH64_PREL32 0000000000000000 .rela.text + 0
000000000008 06f300000105 R_AARCH64_PREL32 0000000000000000 .rela.rodata + 0
...

Symbol table '.symtab' contains 2105 entries:

 Num:    Value          Size Type    Bind   Vis      Ndx Name
...
1778: 0000000000000000     0 SECTION GLOBAL DEFAULT    9 .rela.text
1779: 0000000000000000     0 SECTION GLOBAL DEFAULT   40 .rela.rodata

The reason we want this is to be able to track how/where the text and rodata were modified after they were loaded into memory and relocated in-place.

Are you claiming that this arrangement violates the ELF spec? If it does, it would be helpful if you could point out why this is the case. I agree that this is an unusual use of ELF sections, but that by itself should not be a justification for disregarding this change.

The ELF specification is loosely specified in some areas. Not violating the ELF spec doesn't mean we should add some code (complexity).
We need to place house rules.
I requested a link so that we can discover whether the Linux kernel usage is legitimate, because I suspect this is an instance of the XY Problem.
The Linux kernel usage can probably be fixed instead of hacking llvm-objcopy.

ardb added a comment.Oct 6 2021, 2:22 PM

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

If this is a Linux kernel requirement, can you link to the related discussion?
From my experience dealing with ELF files, I don't think the Linux kernel usage is correct.

SHT_REL[A] SHF_ALLOC sections are dynamic relocation sections.

Where is this specified?

This area is not specified in the generic ABI. This is a convention used as an (unspecified) protocol between the linker and the loader.

Which linker and which loader? Does llvm-objcopy only target ELF files created with LLD for use in user programs? And which libc/libdl implementations is it limited to?

Please try to understand that ELF may be used in different ways in different contexts, and unspecified protocols and conventions that only cover user executables and shared libraries may not work for complex bare metal programs such as the Linux kernel.

I think the convention started when SunOS contributed dynamic linking to System V release 4 which was then inherited to all ELF operating systems.

ELF user space programs, right? What about other contexts where ELF is used?

Since ET_REL files should not contain dynamic linking information, it doesn't make sense to use SHT_REL[A] SHF_ALLOC sections.
At the very least SHF_ALLOC .rela.text linking to non-SHF_ALLOC .symtab seems wrong.

The use case in question does not require the symtab section or strtab section, but I agree that this is an inconsistency.

This is the exact point I question the Linux kernel's use case.
So I asked for a link about the usage.

There is some background here:
https://android-review.googlesource.com/c/kernel/common/+/1842462/3

--strip-debug erroring on unrelated error: Link field value 3 in section .rela.text is not a symbol table may be undesired, but we may have other means to address the issue if you provide enough context.

We have a file like this

[ 1] .text             PROGBITS         0000000000000000  00001000
     0000000000024379  0000000000000000 AXo       0     0     4096
...
[ 4] .rodata           PROGBITS         0000000000000000  0002be80
     0000000000005f01  0000000000000000 AMS       0     0     64
...
[ 9] .rela.text        RELA             0000000000000000  00031e00
     000000000000dc50  0000000000000018  AI      48     1     8
...
[40] .rela.rodata[...] RELA             0000000000000000  00045a50
     0000000000000060  0000000000000018  AI      48     4     8
...
[48] .symtab           SYMTAB           0000000000000000  00092208
     000000000000c558  0000000000000018          50   1252     8
...
[50] .strtab           STRTAB           0000000000000000  0009ea62
     000000000000b44b  0000000000000000           0     0     1

Relocation section '.rela.init.rodata' at offset 0x48298 contains 131 entries:

Offset          Info           Type           Sym. Value    Sym. Name + Addend

000000000000 06f200000105 R_AARCH64_PREL32 0000000000000000 .rela.text + 0
000000000008 06f300000105 R_AARCH64_PREL32 0000000000000000 .rela.rodata + 0
...

Symbol table '.symtab' contains 2105 entries:

 Num:    Value          Size Type    Bind   Vis      Ndx Name
...
1778: 0000000000000000     0 SECTION GLOBAL DEFAULT    9 .rela.text
1779: 0000000000000000     0 SECTION GLOBAL DEFAULT   40 .rela.rodata

The reason we want this is to be able to track how/where the text and rodata were modified after they were loaded into memory and relocated in-place.

Are you claiming that this arrangement violates the ELF spec? If it does, it would be helpful if you could point out why this is the case. I agree that this is an unusual use of ELF sections, but that by itself should not be a justification for disregarding this change.

The ELF specification is loosely specified in some areas. Not violating the ELF spec doesn't mean we should add some code (complexity).
We need to place house rules.

House rules are fine if you communicate them in some way. Dismissing something because it violates unspecified protocols or unspecified house rules seems arbitrary at the very least.

I understand that you may dislike the use of SHF_ALLOC SHT_RELA sections in this manner, but that does not make it wrong, even if it is in poor taste.

I requested a link so that we can discover whether the Linux kernel usage is legitimate, because I suspect this is an instance of the XY Problem.
The Linux kernel usage can probably be fixed instead of hacking llvm-objcopy.

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

If this is a Linux kernel requirement, can you link to the related discussion?
From my experience dealing with ELF files, I don't think the Linux kernel usage is correct.

SHT_REL[A] SHF_ALLOC sections are dynamic relocation sections.

Where is this specified?

This area is not specified in the generic ABI. This is a convention used as an (unspecified) protocol between the linker and the loader.

Which linker and which loader? Does llvm-objcopy only target ELF files created with LLD for use in user programs? And which libc/libdl implementations is it limited to?

I have considered many linkers (GNU ld, gold, ld.lld) and many loaders (glibc ld.so, musl ld.so, FreeBSD rtld, NetBSD libexec-elf).

Please try to understand that ELF may be used in different ways in different contexts, and unspecified protocols and conventions that only cover user executables and shared libraries may not work for complex bare metal programs such as the Linux kernel.

Sure. But SHF_ALLOC SHT_REL[A] in an ET_REL file is a quite unusual thing.
How did you get it in the first place?
I don't think any linker's -r mode can synthesize such relocation sections.

The justification for its usage is exactly the thing I requested.
I requested a kernel discussion in my initial reply, but you did not provide one until this comment.

I think the convention started when SunOS contributed dynamic linking to System V release 4 which was then inherited to all ELF operating systems.

ELF user space programs, right? What about other contexts where ELF is used?

Since ET_REL files should not contain dynamic linking information, it doesn't make sense to use SHT_REL[A] SHF_ALLOC sections.
At the very least SHF_ALLOC .rela.text linking to non-SHF_ALLOC .symtab seems wrong.

The use case in question does not require the symtab section or strtab section, but I agree that this is an inconsistency.

This is the exact point I question the Linux kernel's use case.
So I asked for a link about the usage.

There is some background here:
https://android-review.googlesource.com/c/kernel/common/+/1842462/3

--strip-debug erroring on unrelated error: Link field value 3 in section .rela.text is not a symbol table may be undesired, but we may have other means to address the issue if you provide enough context.

We have a file like this

[ 1] .text             PROGBITS         0000000000000000  00001000
     0000000000024379  0000000000000000 AXo       0     0     4096
...
[ 4] .rodata           PROGBITS         0000000000000000  0002be80
     0000000000005f01  0000000000000000 AMS       0     0     64
...
[ 9] .rela.text        RELA             0000000000000000  00031e00
     000000000000dc50  0000000000000018  AI      48     1     8
...
[40] .rela.rodata[...] RELA             0000000000000000  00045a50
     0000000000000060  0000000000000018  AI      48     4     8
...
[48] .symtab           SYMTAB           0000000000000000  00092208
     000000000000c558  0000000000000018          50   1252     8
...
[50] .strtab           STRTAB           0000000000000000  0009ea62
     000000000000b44b  0000000000000000           0     0     1

Relocation section '.rela.init.rodata' at offset 0x48298 contains 131 entries:

Offset          Info           Type           Sym. Value    Sym. Name + Addend

000000000000 06f200000105 R_AARCH64_PREL32 0000000000000000 .rela.text + 0
000000000008 06f300000105 R_AARCH64_PREL32 0000000000000000 .rela.rodata + 0
...

Symbol table '.symtab' contains 2105 entries:

 Num:    Value          Size Type    Bind   Vis      Ndx Name
...
1778: 0000000000000000     0 SECTION GLOBAL DEFAULT    9 .rela.text
1779: 0000000000000000     0 SECTION GLOBAL DEFAULT   40 .rela.rodata

The reason we want this is to be able to track how/where the text and rodata were modified after they were loaded into memory and relocated in-place.

Are you claiming that this arrangement violates the ELF spec? If it does, it would be helpful if you could point out why this is the case. I agree that this is an unusual use of ELF sections, but that by itself should not be a justification for disregarding this change.

The ELF specification is loosely specified in some areas. Not violating the ELF spec doesn't mean we should add some code (complexity).
We need to place house rules.

House rules are fine if you communicate them in some way. Dismissing something because it violates unspecified protocols or unspecified house rules seems arbitrary at the very least.

See my previous paragraph. I am not sure "dismissing something" is on my side. I requested a link in my first reply, but you only did just now.
OK, let's just calm down.

I understand that you may dislike the use of SHF_ALLOC SHT_RELA sections in this manner, but that does not make it wrong, even if it is in poor taste.

I follow Occam's razor for such unusual things.
If your patch decreased code complexity I would probably not object.
But your patch increases complexity, so I request justification, especially given that it quite unusual (SHF_ALLOC SHT_REL[A] in an ET_REL file; a SHF_ALLOC section's sh_link field links to a non-SHF_ALLOC section).

"does not make it wrong" is not a good justification. Why can't such unusual usage be avoided in the first place?

I requested a link so that we can discover whether the Linux kernel usage is legitimate, because I suspect this is an instance of the XY Problem.
The Linux kernel usage can probably be fixed instead of hacking llvm-objcopy.

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

This should be refined. https://android-review.googlesource.com/c/kernel/common/+/1842462/3 seems to propose to add such usage but it is not pre-existing. I would certainly object to such usage if I had a chance to review such kernel code.

I have considered many linkers (GNU ld, gold, ld.lld) and many loaders (glibc ld.so, musl ld.so, FreeBSD rtld, NetBSD libexec-elf).

I guess the kernel's module loader is the oddball here? It needs to process relocations from dynamically loaded objects.

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format

Seems to confirm.

OK, let's just calm down.

*cringe* Please don't tell people that. We need to be respectful to everyone. You and @ardb know more about ELF than anyone I know; you two would make great teammates, if you could learn to play nice.

Perhaps you could isolate the commands used to recreate an ELF object that demonstrates the issues in llvm-strip? That might help us identify perhaps a different way to achieve the desired outcome? At the very least it would demonstrate how such a binary could be created. Though it seems your added test case already demonstrates that. @MaskRay is that not enough? I'm guessing make ... V=1 in KBuild could help isolate the precise tool invocations?

When you refer to "use a host tool to tweak the ELF metadata of this module so that the RELA data is preserved and accessible to the module init code." by host tool are you referring to $(OBJTOOL)?

I'm guessing https://android-review.googlesource.com/c/kernel/common/+/1842462/4/arch/arm64/Makefile.postlink#b11 is part of it? IIUC, (which I probably don't), we're moving a few sections to two new ones with alloc+readonly? Do they not have alloc beforehand?

resulting in cryptic error messages such as
llvm-strip: error: Link field value 48 in section .rela.text is not a symbol table
where the file in question does in fact have a SYMTAB section at #48.

I think we can all agree that the error message here could be more helpful to the end user.

we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

I wonder what other loaders do in this case? Perhaps that is the major difference that all of this is predicated on?

ardb added a comment.Oct 6 2021, 11:05 PM

While this occurs rarely for normal user programs, the Linux kernel uses partially linked ET_REL objects as its loadable module format, and there, we sometimes rely on SHF_ALLOC to indicate to the module loader that the static relocations (which the Linux kernel fixes up at module load time) need to be preserved in memory after loading of the module completes.

So let's extend the existing heuristic with a check against the file type: for ET_REL type files, which are only partiallly linked and cannot carry dynamic relocations, we assume that the RELA sections are static. For ET_DYN/ET_EXEC type files, we stick to the old behavior, which is to use the SHF_ALLOC flag as a hint to decide whether relocation sections are static or dynamic.

If this is a Linux kernel requirement, can you link to the related discussion?
From my experience dealing with ELF files, I don't think the Linux kernel usage is correct.

SHT_REL[A] SHF_ALLOC sections are dynamic relocation sections.

Where is this specified?

This area is not specified in the generic ABI. This is a convention used as an (unspecified) protocol between the linker and the loader.

Which linker and which loader? Does llvm-objcopy only target ELF files created with LLD for use in user programs? And which libc/libdl implementations is it limited to?

I have considered many linkers (GNU ld, gold, ld.lld) and many loaders (glibc ld.so, musl ld.so, FreeBSD rtld, NetBSD libexec-elf).

Please try to understand that ELF may be used in different ways in different contexts, and unspecified protocols and conventions that only cover user executables and shared libraries may not work for complex bare metal programs such as the Linux kernel.

Sure. But SHF_ALLOC SHT_REL[A] in an ET_REL file is a quite unusual thing.
How did you get it in the first place?
I don't think any linker's -r mode can synthesize such relocation sections.

$ llvm-objcopy --set-section-flags=.rela.text=alloc,readonly fips140.ko

could be used, but in this case, I am using my own C program.

The justification for its usage is exactly the thing I requested.
I requested a kernel discussion in my initial reply, but you did not provide one until this comment.

True, my apologies.

I think the convention started when SunOS contributed dynamic linking to System V release 4 which was then inherited to all ELF operating systems.

ELF user space programs, right? What about other contexts where ELF is used?

Since ET_REL files should not contain dynamic linking information, it doesn't make sense to use SHT_REL[A] SHF_ALLOC sections.
At the very least SHF_ALLOC .rela.text linking to non-SHF_ALLOC .symtab seems wrong.

The use case in question does not require the symtab section or strtab section, but I agree that this is an inconsistency.

This is the exact point I question the Linux kernel's use case.
So I asked for a link about the usage.

There is some background here:
https://android-review.googlesource.com/c/kernel/common/+/1842462/3

...

The reason we want this is to be able to track how/where the text and rodata were modified after they were loaded into memory and relocated in-place.

Are you claiming that this arrangement violates the ELF spec? If it does, it would be helpful if you could point out why this is the case. I agree that this is an unusual use of ELF sections, but that by itself should not be a justification for disregarding this change.

The ELF specification is loosely specified in some areas. Not violating the ELF spec doesn't mean we should add some code (complexity).
We need to place house rules.

House rules are fine if you communicate them in some way. Dismissing something because it violates unspecified protocols or unspecified house rules seems arbitrary at the very least.

See my previous paragraph. I am not sure "dismissing something" is on my side. I requested a link in my first reply, but you only did just now.
OK, let's just calm down.

Apologies if the tone of my reply suggested otherwise, but I can assure you that I am perfectly calm.

I understand that you may dislike the use of SHF_ALLOC SHT_RELA sections in this manner, but that does not make it wrong, even if it is in poor taste.

I follow Occam's razor for such unusual things.
If your patch decreased code complexity I would probably not object.
But your patch increases complexity, so I request justification, especially given that it quite unusual (SHF_ALLOC SHT_REL[A] in an ET_REL file; a SHF_ALLOC section's sh_link field links to a non-SHF_ALLOC section).

"does not make it wrong" is not a good justification. Why can't such unusual usage be avoided in the first place?

I guess you are saying that llvm-objcopy only targets the subset of well-formed, compliant ELF files that are likely to be emitted by tools created for building user space programs.

Assuming that a SHT_RELA section with SHF_ALLOC set in an ET_REL file is a dynamic relocation section is an odd thing to do regardless of the context, given that such files cannot be used for dynamic relocation in the first place, so I would argue that this is a code improvement in itself.

In any case, thanks for the discussion, Please disregard this patch, we will find another solution.

If there's a real use-case, I'm not opposed to handling more unusual code, such as SHF_ALLOC relocations that are not intended for patching by the dynamic loader - if there's a still a need for it, I'd be happy to support working on a solution in llvm-objcopy/llvm-strip. The problem as I see it is that the only true way of determining whether a relocation section is really a static or dynamic relocation section is the dynamic table, which llvm-objcopy traditionally has not had a need of. We can't rely on the ELF type for two reasons:

  1. Custom ELF types could be either relocatable or non-relocatable (this is less important - producers of such custom ELF objects usually have to rely on downstream patches to keep things working for them anyway).
  2. Static relocations may exist in fully linked objects.

A simpler approach than inspecting the dynamic table might be to inspect the type of the sh_link-ed section: SHT_SYMTAB implies a static relocation section, and SHT_DYNSYM a dynamic one. There is a small corner case this approach doesn't cover, namely relocation sections with sh_link=0, but in that case, falling back to another heuristic like SHF_ALLOC or parsing the dynamic section may be sufficient.

I guess you are saying that llvm-objcopy only targets the subset of well-formed, compliant ELF files that are likely to be emitted by tools created for building user space programs.

As the person who reviewed 99% of the original ELF llvm-objcopy implementation, I can safely say that the intent of llvm-objcopy was to handle as many cases as made sense. At the time, Jake and I were unaware of any static relocation sections with SHF_ALLOC, hence it was implemented this way. Had we known of actual situations, we would have implemented it differently from the get-go: it certainly wasn't the intent for llvm-objcopy to be limited to handling a specific subset, but at the same time, we couldn't invest the time to handle every single corner case.

There's another side-issue with SHF_ALLOC static relocations, at least in llvm-objcopy: llvm-objcopy tries very hard to not change contents of memory-allocated data, because it can impact address references. Having said that, I think in an ET_REL case, we can probably ignore this specific issue. It's therefore probably safe to assume that if the type is ET_REL, all relocations are static, but not necessarily otherwise. (I'm assuming the partially linked ET_RELs in the Linux kernel still have relocations to patch all inter-section references?).

I think we can all agree that the error message here could be more helpful to the end user.

I'd be happy to review a simple patch that simply added the words "static" or "dynamic" as appropriate to the message.

ardb added a comment.Oct 7 2021, 2:00 AM

If there's a real use-case, I'm not opposed to handling more unusual code, such as SHF_ALLOC relocations that are not intended for patching by the dynamic loader - if there's a still a need for it, I'd be happy to support working on a solution in llvm-objcopy/llvm-strip. The problem as I see it is that the only true way of determining whether a relocation section is really a static or dynamic relocation section is the dynamic table, which llvm-objcopy traditionally has not had a need of. We can't rely on the ELF type for two reasons:

  1. Custom ELF types could be either relocatable or non-relocatable (this is less important - producers of such custom ELF objects usually have to rely on downstream patches to keep things working for them anyway).
  2. Static relocations may exist in fully linked objects.

This case is actually covered by my latest revision: SHF_ALLOC is taken into account as before for non-ET_REL type files.

A simpler approach than inspecting the dynamic table might be to inspect the type of the sh_link-ed section: SHT_SYMTAB implies a static relocation section, and SHT_DYNSYM a dynamic one. There is a small corner case this approach doesn't cover, namely relocation sections with sh_link=0, but in that case, falling back to another heuristic like SHF_ALLOC or parsing the dynamic section may be sufficient.

In that case, does it even matter? If there is no symtab to begin with, there is no risk of misidentifying a static one as a dynamic one, righr?

I guess you are saying that llvm-objcopy only targets the subset of well-formed, compliant ELF files that are likely to be emitted by tools created for building user space programs.

As the person who reviewed 99% of the original ELF llvm-objcopy implementation, I can safely say that the intent of llvm-objcopy was to handle as many cases as made sense. At the time, Jake and I were unaware of any static relocation sections with SHF_ALLOC, hence it was implemented this way. Had we known of actual situations, we would have implemented it differently from the get-go: it certainly wasn't the intent for llvm-objcopy to be limited to handling a specific subset, but at the same time, we couldn't invest the time to handle every single corner case.

There's another side-issue with SHF_ALLOC static relocations, at least in llvm-objcopy: llvm-objcopy tries very hard to not change contents of memory-allocated data, because it can impact address references. Having said that, I think in an ET_REL case, we can probably ignore this specific issue. It's therefore probably safe to assume that if the type is ET_REL, all relocations are static, but not necessarily otherwise. (I'm assuming the partially linked ET_RELs in the Linux kernel still have relocations to patch all inter-section references?).

Yes, the Linux kernel module loader relies on the relocation,. symtab and strtab sections to fix up all references (including undefined ones) once the ordinary allocatable sections have been copied to memory.

I think we can all agree that the error message here could be more helpful to the end user.

I'd be happy to review a simple patch that simply added the words "static" or "dynamic" as appropriate to the message.

Thanks.

But if we disregard this patch, there is still an issue where llvm-objcopy is able to generate ELF files that are malformed according to its own standards. I.e.,

$ llvm-objcopy --set-section-flags=.rela.text=alloc,readonly <object file>

will generate a file that llvm-objcopy itself is not able to read.

If there's a real use-case, I'm not opposed to handling more unusual code, such as SHF_ALLOC relocations that are not intended for patching by the dynamic loader - if there's a still a need for it, I'd be happy to support working on a solution in llvm-objcopy/llvm-strip. The problem as I see it is that the only true way of determining whether a relocation section is really a static or dynamic relocation section is the dynamic table, which llvm-objcopy traditionally has not had a need of. We can't rely on the ELF type for two reasons:

  1. Custom ELF types could be either relocatable or non-relocatable (this is less important - producers of such custom ELF objects usually have to rely on downstream patches to keep things working for them anyway).
  2. Static relocations may exist in fully linked objects.

This case is actually covered by my latest revision: SHF_ALLOC is taken into account as before for non-ET_REL type files.

Yes, I see. I haven't had the time today unfortunately to think about the code change itself in this patch. It's on my backlog.

A simpler approach than inspecting the dynamic table might be to inspect the type of the sh_link-ed section: SHT_SYMTAB implies a static relocation section, and SHT_DYNSYM a dynamic one. There is a small corner case this approach doesn't cover, namely relocation sections with sh_link=0, but in that case, falling back to another heuristic like SHF_ALLOC or parsing the dynamic section may be sufficient.

In that case, does it even matter? If there is no symtab to begin with, there is no risk of misidentifying a static one as a dynamic one, righr?

Not sure if you're referring to the symtab here or the relocation section. At the moment, I believe there are three cases that are handled (I might be wrong though - I'm going off memory, as I don't have time to dig into the code):

  1. Relocation sections with sh_link = 0. In this case, don't attempt to look for a symbol table.
  2. Static relocation sections should have a sh_link pointing to a static symbol table.
  3. Dynamic relocation sections should have a sh_link pointing to a dynamic symbol table.

In the latter two cases, we can reverse the logic and say that the symbol table kind (dynamic or static) identifies the relocation section kind. However, in the former case, we don't know by looking at sh_link whether the symbol table is static or dynamic. sh_link = 0 is fine for some relocation sections: they need to have all relocation have an r_sym of 0, i.e. be relocation types that aren't relative to a symbol. We do need to still distinguish betewen whether the relocation seciton is static or dynamic, as the llvm-objcopy code for the two is different.

But if we disregard this patch, there is still an issue where llvm-objcopy is able to generate ELF files that are malformed according to its own standards. I.e.,

$ llvm-objcopy --set-section-flags=.rela.text=alloc,readonly <object file>

will generate a file that llvm-objcopy itself is not able to read.

That's a fair point. I don't think it's unreasonable to allow llvm-objcopy to generate invalid objects (in the sense that it wouldn't be useful/the linker might reject it etc), by misusing some switches, but it should probably be able to consume such objects itself, so that it can do something sensible, in particular so that a user could reverse the option.

ardb added a comment.Oct 7 2021, 3:07 AM

A simpler approach than inspecting the dynamic table might be to inspect the type of the sh_link-ed section: SHT_SYMTAB implies a static relocation section, and SHT_DYNSYM a dynamic one. There is a small corner case this approach doesn't cover, namely relocation sections with sh_link=0, but in that case, falling back to another heuristic like SHF_ALLOC or parsing the dynamic section may be sufficient.

In that case, does it even matter? If there is no symtab to begin with, there is no risk of misidentifying a static one as a dynamic one, righr?

Not sure if you're referring to the symtab here or the relocation section. At the moment, I believe there are three cases that are handled (I might be wrong though - I'm going off memory, as I don't have time to dig into the code):

  1. Relocation sections with sh_link = 0. In this case, don't attempt to look for a symbol table.
  2. Static relocation sections should have a sh_link pointing to a static symbol table.
  3. Dynamic relocation sections should have a sh_link pointing to a dynamic symbol table.

In the latter two cases, we can reverse the logic and say that the symbol table kind (dynamic or static) identifies the relocation section kind. However, in the former case, we don't know by looking at sh_link whether the symbol table is static or dynamic. sh_link = 0 is fine for some relocation sections: they need to have all relocation have an r_sym of 0, i.e. be relocation types that aren't relative to a symbol. We do need to still distinguish betewen whether the relocation seciton is static or dynamic, as the llvm-objcopy code for the two is different.

OK that answers my question. I wasn't sure whether the RELA sections themselves need to be treated any differently but apparently they do.

ardb abandoned this revision.Oct 27 2021, 3:49 AM