This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Fix for "LLD can create incorrect debug PC ranges for functions in Comdat groups."
ClosedPublic

Authored by grimar on May 6 2018, 7:14 AM.

Details

Summary

This is an attempt to fix the https://bugs.llvm.org//show_bug.cgi?id=37212.

The issue is about code like:

.section .text.foo,"axG",@progbits,bar,comdat
.Lfoo:

.section .debug_info,"",@progbits
  .quad	.Lfoo
llvm-mc -filetype=obj -triple=x86_64-pc-linux t.s -o t.o
ld t.o t.o -o ld.out

LLD currently ignores the relocations to discarded COMDAT sections and hence out output is different from GNU linkers output.
What LLD do is valid from the spec point of view (https://docs.oracle.com/cd/E19120-01/open.solaris/819-0690/chapter7-11598/index.html),
but unfortunately, modern compilers produce technically incorrect ELF and this patch is an attempt
to work around it. It is a bit hacky but seems to be more or less isolated.

Diff Detail

Event Timeline

grimar created this revision.May 6 2018, 7:14 AM

If I've understood correctly (from test/ELF/comdat.s) this will affect relocations from all sections to discarded local symbols. If I'm right I think that this is too wide a use case. My understanding is that in gold and bfd this case is specifically for relocations from Dwarf debug sections to discarded local symbols. Relocations from non-dwarf debug sections to discarded sections will be either an error or ignored depending on the use case.

For example
file1.s

        .section .text.1, "axG", %progbits, foo, comdat, unique, 0
foo:    nop

        .section .text.2, "ax", %progbits
        .globl _start
_start:        
        .quad foo

        .section .debug_info
        .quad foo

file2.s

        .section .text.1, "axG", %progbits, foo, comdat, unique, 0
foo:    nop

        .section .text.3, "ax", %progbits
        .globl func
func:
        // This is a reference to a local definition inside group foo, ld.bfd will give an error message, ld.gold a warning, here when group is discarded, LLD will not.
        .quad foo
        nop

        .section .debug_info
        // This is a reference to a local definition inside group foo from a debug section, relocation is resolved to foo in kept group
        .quad foo
        // This is a reference to a local definition inside group foo, but does not exist in group foo in file1.s, resolved to 0.

llvm-mc -triple=x86_64-pc-linux t.s -filetype=obj -o t.o
llvm-mc -triple=x86_64-pc-linux t2.s -filetype=obj -o t2.o
ld.bfd t.o t2.o -o t.exe
.text.1' referenced in section .text.3' of t2.o: defined in discarded section `.text.1[foo]' of t2.o
ld.gold t.o t2.o -o t.exe
t2.o(.text.3+0x0): warning: relocation refers to discarded section
ld.lld t.o t2.o -o t.exe
// no errors

So I think that we are potentially doing two things wrong here:

  • I think we should be giving a warning or error when there are relocations from non-debug sections to discarded local symbols.
  • If we have a relocation from a debug section to a discarded local symbol, we should use the definition of the symbol from the section in the kept group or 0 if there is no such symbol.

Could this be implemented by something like:

If (relocation to local defined in discarded group section) {
  if (section containing relocation is debug) {
    if (find local definition in kept group)
        use that definition.
    else
        0   
  }
  else
     warn or error
ELF/InputFiles.cpp
439

Unfortunately I don't think that this will work on all targets. For example with --target=armv7a-linux-gnueabihf clang puts the exception table information (.ARM.exdix.name_of_function> and .rel.ARM.exidx.text.name_of_function) in the same group as name_of_function. I think it will be possible to select _a_ candidate section for these groups by choosing the section with SHF_EXECINSTR.

466

Same comment about number of entries as above.

ELF/InputSection.cpp
298

I'm guessing this is to match the createDiscardedProxy? I'm thinking that it might be better to extract that into a isDiscardedProxy() function to make it clearer. Might be worth checking that Repl is non null as well.

ELF/InputSection.h
53

What happens if the Repl section for a discardedProxy is merged by ICF? Can we have a Repl -> Repl -> Repl needing to follow to the end of the Repl chain to find the candidate section.

test/ELF/comdat-debug.s
1

We should add a test for ld -r as well as that could easily get broken in the future.

test/ELF/comdat.s
32

I think this should be an error or warning (see main comment)

grimar added a comment.EditedMay 10 2018, 6:38 AM

If I've understood correctly (from test/ELF/comdat.s) this will affect relocations from all sections to discarded local symbols. If I'm right I think that this is too wide a use case. My understanding is that in gold and bfd this case is specifically for relocations from Dwarf debug sections

Yes, but should we follow?

Such objects are broken by the specification. For broken objects, it is ok to produce broken output usually.

We first of all (I guess) want to fix the compilers for the case this patch tries to address and have some reasonable workaround in LLD for some time.
I think this patch introduces probably the minimum of required code to do that. And the code is simple and fast.

ELF/InputFiles.cpp
439

Ok, thanks for the info. I wonder if it will be enough. Seems gold also matches section name and size.
Should we do the same too for safety?

ELF/InputSection.h
53

Yes, we will just need Repl -> Repl -> Repl at one place I believe (In getSymVA). Since it should be an easy fix and can be separate change,
I did not add a test case and fix for ICF in this patch. Question is if we are OK with the whole approach.

If I've understood correctly (from test/ELF/comdat.s) this will affect relocations from all sections to discarded local symbols. If I'm right I think that this is too wide a use case. My understanding is that in gold and bfd this case is specifically for relocations from Dwarf debug sections

Yes, but should we follow?

Such objects are broken by the specification. For broken objects, it is ok to produce broken output usually.

Personally I think we should as output that can fail at runtime is much worse than output that doesn't have a correct debug illusion. My understanding is that compilers will not generated code that makes illegal references into comdat groups, if one exists it is a serious bug. If there is an illegal reference from assembly code I think users would want to know about it and fix it. My understanding is that debug sections are considered a special case as a bad reference won't result in runtime failure.

We first of all (I guess) want to fix the compilers for the case this patch tries to address and have some reasonable workaround in LLD for some time.
I think this patch introduces probably the minimum of required code to do that. And the code is simple and fast.

I don't think that this would make LLD any worse; references to local symbols from non-debug sections are often just as broken if they are resolved to 0 (current behaviour) as if they are resolved to a symbol. Given that it is likely to be some time before compilers change their behaviour and for the ecosystem to pick up the latest versions; I personally would prefer to restrict to just debug sections. This isn't a very strong opinion though so I'm happy to go with the consensus.

ELF/InputFiles.cpp
439

I think it is the right thing to do. If a file is compiled with the same compiler and the same flags the same source code must/should be the same size with the same section name. If files are compiled at different times, with different compilers it is probably safer to not to try and substitute debug information that could be wrong.

Rui, do you have any opinion on that?

ruiu added a comment.May 10 2018, 9:17 AM

I wonder if we can simply discard debug records that refer discarded sections. If comdat section is discarded because there's another copy of it, there should be debug records for that section, so we have two copies of debug records, thus we can discard debug records for a discarded comdat section, no?

I wonder if we can simply discard debug records that refer discarded sections. If comdat section is discarded because there's another copy of it, there should be debug records for that section, so we have two copies of debug records, thus we can discard debug records for a discarded comdat section, no?

My understanding is that it depends on what action the debugger is trying to do. If it is single-stepping through a function from the selected comdat group there will be at least one set of .debug sections that have the correct PC range so mappings from Address to Source will work. The difficulty is with the Source to Address mapping for the objects where the group was discarded. For example show me all the symbols defined in this source file will not find the comdat group as the PC range won't be correct.

ruiu added a comment.May 10 2018, 9:54 AM

My understanding is that it depends on what action the debugger is trying to do. If it is single-stepping through a function from the selected comdat group there will be at least one set of .debug sections that have the correct PC range so mappings from Address to Source will work. The difficulty is with the Source to Address mapping for the objects where the group was discarded. For example show me all the symbols defined in this source file will not find the comdat group as the PC range won't be correct.

I don't know if showing all symbols defined by some file is a common operation. Is it?

I wonder if we can simply discard debug records that refer discarded sections. If comdat section is discarded because there's another copy of it, there should be debug records for that section, so we have two copies of debug records, thus we can discard debug records for a discarded comdat section, no?

Does not that mean we would need to have an additional debug parsing in a linker? (to rebuild the DWARF data).

btw, I think general direction is to emit a single debug data COMDAT group of DWARF sections per -ffunction-section. I think we will have this sooner or later.
Linkers will be able to handle that very fast then (as handling of COMDATs is fast).

I *think* we just want to have some easy-removable hack for now probably. Like this patch do.

ruiu added a comment.May 10 2018, 10:12 AM

I don't think there's a hack that you can easily remove. Once something lands to the linker's source code, that becomes a part of the implicit ABI, and you can never remove it unless it becomes really obsolete. Basically, you should assume that landing some feature or a workaround is a commitment that we will very likely to keep it for a long period of time.

I don't think there's a hack that you can easily remove. Once something lands to the linker's source code, that becomes a part of the implicit ABI, and you can never remove it unless it becomes really obsolete. Basically, you should assume that landing some feature or a workaround is a commitment that we will very likely to keep it for a long period of time.

I like this position. Then my question is do we want it? I feel it is a dirty hack. I do not know how much it is important to have/do not have for us though.

My understanding is that it depends on what action the debugger is trying to do. If it is single-stepping through a function from the selected comdat group there will be at least one set of .debug sections that have the correct PC range so mappings from Address to Source will work. The difficulty is with the Source to Address mapping for the objects where the group was discarded. For example show me all the symbols defined in this source file will not find the comdat group as the PC range won't be correct.

I don't know if showing all symbols defined by some file is a common operation. Is it?

I've no idea to be honest, I suspect it would be more common in GUI based debuggers that have a speedbar containing all the files in a program than in something like gdb. It isn't just that operation though; it is anything that requires the debugger to look up something starting at a source location. For example setting breakpoints based on source file and line.

I don't think there's a hack that you can easily remove. Once something lands to the linker's source code, that becomes a part of the implicit ABI, and you can never remove it unless it becomes really obsolete. Basically, you should assume that landing some feature or a workaround is a commitment that we will very likely to keep it for a long period of time.

I like this position. Then my question is do we want it? I feel it is a dirty hack. I do not know how much it is important to have/do not have for us though.

I don't have a strong position, I happened to notice that gold and bfd had implemented some method to patch up relocations to local symbols in groups. I think that this actually started in BFD to work around bugs in gcc linkonce sections : https://sourceware.org/bugzilla/show_bug.cgi?id=233 however it looks to have been altered over time to only affect .debug sections and I haven't been able to find any source code commit messages that say why.

ruiu added a comment.May 10 2018, 10:36 AM

I don't actually have a strong opinion on which is better. I'm thinking which is easier to implement. In particular, I'm not very happy about the notion of "proxy input section" because it seems like it is too much abstraction. If possible, I want to make this much simpler than this.

I don't actually have a strong opinion on which is better. I'm thinking which is easier to implement. In particular, I'm not very happy about the notion of "proxy input section" because it seems like it is too much abstraction. If possible, I want to make this much simpler than this.

Note that we have no proxy section. We have a method that creates something that behaves like that. But no specific type for the section. We can trade/play with it. Can it be a section with no name and no data for example?

ruiu added a comment.May 10 2018, 10:55 AM

Your change might be one of the simplest implementations, but it feels like it is changing too many function signatures, add too much new code and functions. Let me try to see if this can be simplified.

ruiu added a comment.May 10 2018, 5:39 PM

I tried this patch locally, experimented some other ideas, and do some benchmark. My feeling is that this patch is too much as a workaround for a buggy input. This is too intrusive, and it creates too many objects at runtime. For instance, Firefox contains almost 1 million discarded comdat sections, and creating "proxy" objects for them isn't negligible. You probably should fix the compiler instead of implementing a workaround to the linekr.

I tried this patch locally, experimented some other ideas, and do some benchmark. My feeling is that this patch is too much as a workaround for a buggy input. This is too intrusive, and it creates too many objects at runtime. For instance, Firefox contains almost 1 million discarded comdat sections, and creating "proxy" objects for them isn't negligible. You probably should fix the compiler instead of implementing a workaround to the linekr.

Thanks for looking, Rui. I agree that it this functionality is too hacky probably.

Let's see for people other opinions. If there will be no strong arguments to do something like that in the linker, I'll abandon this patch then.

I tried this patch locally, experimented some other ideas, and do some benchmark. My feeling is that this patch is too much as a workaround for a buggy input. This is too intrusive, and it creates too many objects at runtime. For instance, Firefox contains almost 1 million discarded comdat sections, and creating "proxy" objects for them isn't negligible. You probably should fix the compiler instead of implementing a workaround to the linekr.

Thanks for looking, Rui. I agree that it this functionality is too hacky probably.

Let's see for people other opinions. If there will be no strong arguments to do something like that in the linker, I'll abandon this patch then.

I suggest asking on the PR as that had quite a few people involved in debug generation on the CC list. They may have a better view on how important it is for a linker to fix up the relocations or not.

A possible alternative, that I haven't thought through fully, that might not require the creation of a proxy section is for each selected group store the tuple <Group Signature Name (as we do today), InputFile*, Group Section Index in InputFile>. With this information could we look up the selected InputSection and store it in Sections[] rather than a proxy section. This would mean having an InputSection from Object A in the Sections array for Object B which could have some unforeseen side effects.

I suggest asking on the PR as that had quite a few people involved in debug generation on the CC list. They may have a better view on how important it is for a linker to fix up the relocations or not.

Yes. That what I wanted to do (had issues with VPN as for Russia bugs.llvm.org and svn just does not work without it atm. That's annoying.). Done.

A possible alternative, that I haven't thought through fully, that might not require the creation of a proxy section is for each selected group store the tuple <Group Signature Name (as we do today), InputFile*, Group Section Index in InputFile>. With this information could we look up the selected InputSection and store it in Sections[] rather than a proxy section. This would mean having an InputSection from Object A in the Sections array for Object B which could have some unforeseen side effects.

Side effects are a concern here. When I experimented with approaches for this patch, I tried different solutions, but side effects were annoying.
Also, such way is less isolated and also not clear if we want to change the design of the linker so much for this hack. Let's see the possible comments.

ruiu added a comment.May 11 2018, 10:26 AM

I'd agree that there might be some alternative way of doing this. I thought that, if there's some way to obtain a replacing comdat section from here

https://github.com/llvm-project/llvm-project-20170507/blob/master/lld/ELF/Symbols.cpp#L54

then we can get an address for a replaced comdat section which might solve the issue. But with the data structure it doesn't seem to be easy thing to do (and that's not lld's fault as it is for a compiler bug workaround).

I'd agree that there might be some alternative way of doing this. I thought that, if there's some way to obtain a replacing comdat section from here

https://github.com/llvm-project/llvm-project-20170507/blob/master/lld/ELF/Symbols.cpp#L54

then we can get an address for a replaced comdat section which might solve the issue. But with the data structure it doesn't seem to be easy thing to do (and that's not lld's fault as it is for a compiler bug workaround).

Yes. And this place is probably hot as it is called for each relocation. That is why I ended with proxy object finally. We could add one more light base section probably.
Without adding the one more explicit section type. Something like:

class SectionBase;

class SectionProxy {
public:
  // <new comment>
  SectionBase *Repl;

  unsigned SectionKind : 3;

  // The next two bit fields are only used by InputSectionBase, but we
  // put them here so the struct packs better.

  // The garbage collector sets sections' Live bits.
  // If GC is disabled, all sections are considered live by default.
  unsigned Live : 1;

  unsigned Bss : 1;

  unsigned Proxy : 1;
}


class SectionBase : public SectionProxy  { ...

So creating many proxy sections would be much cheaper. I am not sure if such approach for that is OK though.

I'd agree that there might be some alternative way of doing this. I thought that, if there's some way to obtain a replacing comdat section from here

https://github.com/llvm-project/llvm-project-20170507/blob/master/lld/ELF/Symbols.cpp#L54

then we can get an address for a replaced comdat section which might solve the issue. But with the data structure it doesn't seem to be easy thing to do (and that's not lld's fault as it is for a compiler bug workaround).

I can't think of a way that doesn't involve storing a mapping between discarded symbols/sections or just storing the input file of the selected group, but then having to do some potentially non-trivial processing to find the equivalent section for each discarded one. If each InputFile did store a mapping table then in theory copyLocalSymbols could populate the section field of a local symbol defined against a discarded section with that of the selected group.

As a drive by comment: While I think that consumers should be able to handle this case (obviously the bfd/gold workarounds here aren't always effective), I do think that we should probably implement this as well so that the consumers don't have to deal with it in a lot of cases.

MaskRay requested changes to this revision.Sun, Jan 21, 4:34 PM
MaskRay added a subscriber: MaskRay.

This has been implemented in a different way in InputSection::relocateNonAlloc to allow tombstone values.

This revision now requires changes to proceed.Sun, Jan 21, 4:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptSun, Jan 21, 4:34 PM
Herald added a subscriber: pengfei. · View Herald Transcript
MaskRay accepted this revision.Sun, Jan 21, 4:35 PM
This revision is now accepted and ready to land.Sun, Jan 21, 4:35 PM
MaskRay closed this revision.Sun, Jan 21, 4:35 PM