This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Fix Bug, occurring if Symbol has no name
ClosedPublic

Authored by j0le on Sep 2 2022, 5:52 AM.

Details

Summary

Bug: An assertion fails:

Assertion failed: isa<To>(Val) && "cast<Ty>() argument of incompatible type!", 
file C:\Users\<user>\prog\llvm\llvm-git-lld-bug\llvm\include\llvm/Support/Casting.h, line 578

Bug is triggered, if

  • a map file is requested with /MAP, and
  • Architekture is ARMv7, Thumb, and
  • a relative jump (branch instruction) is greater than 16 MiB (2^24)

The reason for the Bug is:

  • a Thunk is created for the jump
  • a Symbol for the Thunk is created
    • of type DefinedSynthetic
    • in file Writer.cpp
    • in function getThunk
  • the Symbol has no name
  • when creating the map file, the name of the Symbol is queried
  • the function Symbol::computeName of the base class Symbol casts the this pointer to type DefinedCOFF (a derived type), but the acutal type is DefinedSynthetic
  • The in the llvm::cast an assertion fails

Changes:

  • Modify regression test to trigger this bug
  • Give the symbol pointing to the thunk a name, to fix the bug
  • Add assertion, that only DefinedCOFF symbols are allowed to have an empty name, when the constructor of the base class Symbol is executed

Diff Detail

Event Timeline

j0le created this revision.Sep 2 2022, 5:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2022, 5:52 AM
j0le requested review of this revision.Sep 2 2022, 5:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 2 2022, 5:52 AM
j0le added inline comments.Sep 2 2022, 6:00 AM
lld/COFF/Writer.cpp
399

some questions:

  • Should we give the symbol a name to fix the bug, instead of changing Symbol::computeName()?
  • Is it allowed, that multiple symbols have the same name?

some questions not really related to the bug, but answers would help me understand LLD better.

  • why don't we use SymbolTable::addSynthetic()?
  • Do we need to have this symbol in the symbol table?
  • Is it added later to the symbol table?
mstorsjo added a reviewer: rnk.Sep 5 2022, 1:24 AM
mstorsjo added inline comments.
lld/COFF/Symbols.cpp
60 ↗(On Diff #457568)

This fix kind of changes the assumptions here; the original assumption (see the comment in the assert above in line 57) is that this only ever gets called for DefinedCOFF symbols.

lld/COFF/Symbols.h
85

I'm pretty sure that we can be sure that nameSize is zero here; see line 112 below. (And if we strictly want to keep this, I think we should arrange it like this:

if (nameData == nullptr) {
  computeName();
  if (if nameData == nullptr)
    return StringRef{};
}

But I think we can leave this out entirely.

lld/COFF/Writer.cpp
399

I think just adding a symbol name to fix the bug is better - since apparently the assumption is that only DefinedCOFF symbols may have a non-empty symbol name.

Yes, it's generally allowed that multiple symbols have the same name (consider static symbols within object files).

We don't use SymbolTable::addSynthetic because we don't want to add these symbols into the symbol table (where the name clash with multiple symbols with the same name would be an issue). The exact mechanics of this is a bit of a hazy memory at the moment though, since it's been a few years since I wrote it, and I believe the code might have been restructured at least somewhat since.

lld/test/COFF/arm-thumb-thunks.s
3

I think it might be good to add the same to one of the aarch64 range extension thunk testcases too. From reading the code, it's fairly evident that the same fix automatically applies to both, but it'd be nice to have some test coverage for it too.

j0le added inline comments.Sep 5 2022, 1:41 AM
lld/COFF/Symbols.cpp
60 ↗(On Diff #457568)

Should we instead put an assertion in the constructor of DefinedSynthetic, that the name is not allowed to be empty?
Like this:

explicit DefinedSynthetic(StringRef name, Chunk *c)
    : Defined(DefinedSyntheticKind, name), c(c) {
  assert(!name.empty());
}

Or maybe even better in the constructor of Symbol?:

explicit Symbol(Kind k, StringRef n = "")
    : symbolKind(k), isExternal(true), isCOMDAT(false),
      writtenToSymtab(false), pendingArchiveLoad(false), isGCRoot(false),
      isRuntimePseudoReloc(false), deferUndefined(false), canInline(true),
      nameSize(n.size()), nameData(n.empty() ? nullptr : n.data()) {
  assert((!n.empty() || k <= LastDefinedCOFFKind) &&
         "If the name is empty, the Symbol must be a DefinedCOFF.");
}
mstorsjo added inline comments.Sep 5 2022, 1:48 AM
lld/COFF/Symbols.cpp
60 ↗(On Diff #457568)

Yes, I think that might make sense (the latter probably is better), if this is the design of the Symbol classes. But I'd like to hear @rnk's opinion on it too (feel free to update the patch, but I'll at least hold off of landing it until @rnk has commented on it).

j0le updated this revision to Diff 457928.Sep 5 2022, 4:01 AM
j0le retitled this revision from [LLD][COFF] Fix Bug, occouring if Symbol has no name to [LLD][COFF] Fix Bug, occurring if Symbol has no name.
j0le edited the summary of this revision. (Show Details)

Update patch:

  • Revert changes, which were made to Symbol::getName() and Symbol::computeName()
  • Add assertion to ctor of class Symbol
  • Add "-map" to command line of lld-link in test for ARM64

I can confirm, that the bug also occurs for ARM64

j0le marked 3 inline comments as done.Sep 5 2022, 4:08 AM

Setting phabricator inline comments to "Done".

j0le marked 2 inline comments as done.Sep 5 2022, 4:12 AM

Thanks, this form looks good to me. I'm still leaving the review open for now, to wait for @rnk's opinion on the direction though.

rnk accepted this revision.Sep 6 2022, 12:15 PM
This revision is now accepted and ready to land.Sep 6 2022, 12:15 PM
j0le added a comment.Sep 6 2022, 10:55 PM

Thanks everyone for review and feedback! Can someone commit this for me, because I don't have commit access?

Thanks everyone for review and feedback! Can someone commit this for me, because I don't have commit access?

Sure. What's your preferred git author line (Real Name <email@address>) to use for the commit? (I think I'll reword the commit message a little before pushing too.)

j0le added a comment.Sep 6 2022, 11:21 PM
This comment was removed by j0le.
j0le added a comment.Sep 6 2022, 11:28 PM

I think I'll reword the commit message a little before pushing too.

Yes, that would be nice. I'm not the best writer 😉.

(I think, I will delete the comment with my e-mail address after you have committed the revision, because of spam.)