This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Introduce synthetic file for linker script symbols.
AbandonedPublic

Authored by grimar on Apr 6 2018, 8:43 AM.

Details

Summary

I suggest assigning a synthetic file for linker script and --defsym symbols, that would allow enabling
our toString logic to work with them properly.

That is useful for the following things:

  1. We can print --defsym and linker script files in -cref. (LLD currently prints <internal> instead of a valid location)
  2. Useful for error reporting.
  3. Useful for debugging. (Not so important, but nice to have for free).

The result of such change is shown in the test cases.

Diff Detail

Event Timeline

grimar created this revision.Apr 6 2018, 8:43 AM

I like the general direction.

ELF/LTO.cpp
167

Can we now say what the comment says: !isa<BitcodeFile>(Sym->File)?

ELF/ScriptParser.cpp
270

Why this change?

ruiu added a comment.Apr 6 2018, 1:03 PM

I'm not too excited about doing this. It looks like it opens a can of worms, even if the change itself is small. Adding a new file type is a huge thing that shouldn't be done lightly.

ruiu added a comment.Apr 6 2018, 1:06 PM

I mean unless you can show that this is super useful, I'd oppose to adding a new file type. If it is just mildly useful, we should do it in a different way (in case you really need it) or just leave it alone (otherwise).

I mean unless you can show that this is super useful, I'd oppose to adding a new file type. If it is just mildly useful, we should do it in a different way (in case you really need it) or just leave it alone (otherwise).

Most lld symbols have files in them. There is a special "kind" that is represented null pointer, which is somewhat annoying to handle.

That is the main reason I like this change: It makes more symbols have a non-null File. In fact, I think a simple followup could use SyntheticFile for symbols like GLOBAL_OFFSET_TABLE and we would never have to handle a null File again.

grimar added a comment.Apr 9 2018, 7:55 AM

I'm not too excited about doing this. It looks like it opens a can of worms, even if the change itself is small.

I think it is opened now. See what we have with null input file:

  1. We crash (see: D45440. That was a quick find, looking for more). It is hard to avoid such bugs in future

because having the null value for File is really not obvious.

  1. We are not always able to report locations in errors properly (see relocation-relative-absolute.s of this patch),

it would simply report `<internal> location for script symbol.

  1. Our -cref prints broken locations for linker script symbols. (cref.s test case of this patch).
  2. Our -trace-symbol is broken for linker script symbols. I'll update the diff with the test case.
  3. Our checks in code are more complicated (see in this patch).

We can avoid adding the new file class btw. Just a new type would be enough. I'll update the patch soon.

ELF/ScriptParser.cpp
270

Otherwise in cref.s we would output -defsym:1 for Location
(with that change it prints -defsym). Just a bit nicer.
(see ScriptLexer::getCurrentLocation() https://github.com/llvm-mirror/lld/blob/master/ELF/ScriptLexer.cpp#L69).

grimar updated this revision to Diff 141653.Apr 9 2018, 7:57 AM
  • Simplify, added one more test case.

Please rebase.

ELF/InputFiles.h
73

clang-format

ELF/LTO.cpp
167

You should keep the comment about the section symbol being null for bitcode symbols, that is why we need the isa<BitcodeFile>

grimar updated this revision to Diff 141972.Apr 11 2018, 4:06 AM
grimar marked 2 inline comments as done.
  • Addressed review comments.

I think I am OK with this.

Rui, what do you think? Would you be OK if it changed every case where File is nullptr to use a synthetic file?

ruiu added a comment.Apr 11 2018, 5:34 PM

I'm still in doubt if this is overall a win. At least this patch itself doesn't demonstrate the benefit of having an extra type of file, and I doubt if it will be. Can this simplify things significantly? I think I'm not convinced that we want this one.

I'm still in doubt if this is overall a win. At least this patch itself doesn't demonstrate the benefit of having an extra type of file, and I doubt if it will be. Can this simplify things significantly? I think I'm not convinced that we want this one.

I think it can be changed so that we have a reference to a InputFile (or at least know that the pointer is never null).

The reality is that we already have Synthetic files, we just have a very special representation for them: a null pointer.

ruiu added a comment.Apr 11 2018, 5:45 PM

The reality is that we already have Synthetic files, we just have a very special representation for them: a null pointer.

I see what you are saying. But it's not really that uncomfortable to me. It is matter of degree, but if we add a new type of a file, I'd like to see a larger benefit.

The reality is that we already have Synthetic files, we just have a very special representation for them: a null pointer.

I see what you are saying. But it's not really that uncomfortable to me. It is matter of degree, but if we add a new type of a file, I'd like to see a larger benefit.

Would never having a null InputFile be sufficient?

ruiu added a comment.Apr 11 2018, 5:56 PM

Would never having a null InputFile be sufficient?

Like assigning a dummy ObjFile or something to synthetic symbols?

Would never having a null InputFile be sufficient?

Like assigning a dummy ObjFile or something to synthetic symbols?

Yes. With this patch we can have a single InternalFile created with createSyntheticFile("<internal>").

grimar abandoned this revision.May 15 2018, 7:35 AM

I recently came across the <internal> issue, and was just reminded of the presence of the patch. I am in favor of this change.

Currently we had 7 states: nullptr, ObjKind, SharedKind, LazyObjKind, ArchiveKind, BitcodeKind, BinaryKind.
With this change, we still have 7: SyntheticKind, ObjKind, SharedKind, LazyObjKind, ArchiveKind, BitcodeKind, BinaryKind.

A nice property is that InputSectionBase::file and Symbol::file will never be null.

I agree, I think that a synthetic file has some benefits over using nullptr. I have worked on a linker that did that in the past and it was a useful property to have if no Symbol had a nullptr File. There were cases where one slipped through and it caused a crash, so some extra asserts may be needed.

Some comments on this particular implementation:

  • I suggest not creating separate files for each linkerscript defined symbol, with the location encoded in the filename. To me that isn't what I'd expect if I asked for the filename, I'd expect just the filename of the linkerscript that contained it. If we want to have extra location information such as line numbers for linker scripts, or Section + Offset etc. for symbols defined in objects we can add them on via some other means. Having a single file per linker script has some advantages in comparing symbols based on which InputFile defined them. Ideally we'd want two symbols defined in the same linkerscript to have the same file.
  • I'd also create an <internal> synthetic file for the linker defined symbols like _GLOBAL_OFFSET_ table where there isn't a real file. That would permit us to assume that the file wouldn't be nullptr.

Anyway certainly room for more discussion.

One of the latest comment from Rui about this approach was: "I'm still in doubt if this is overall a win. At least this patch itself doesn't demonstrate the benefit of having an extra type of file, and I doubt if it will be. Can this simplify things significantly? I think I'm not convinced that we want this one."
If something have changed, I can try to ressurect this patch.

One of the latest comment from Rui about this approach was: "I'm still in doubt if this is overall a win. At least this patch itself doesn't demonstrate the benefit of having an extra type of file, and I doubt if it will be. Can this simplify things significantly? I think I'm not convinced that we want this one."
If something have changed, I can try to ressurect this patch.

mold does use a synthetic file for these synthetic symbols :)

Herald added a project: Restricted Project. · View Herald TranscriptSun, Jan 21, 4:25 PM