This is an archive of the discontinued LLVM Phabricator instance.

[JITLink][COFF] Initial COFF support.
ClosedPublic

Authored by sunho on Jun 30 2022, 11:24 PM.

Details

Summary

Adds initial COFF support in JITLink. This is able to run a hello world c program in x86 windows successfully.

Implemented

  • COFF object loader
  • Static local symbols
  • Absolute symbols
  • External symbols
  • Weak external symbols
  • Common symbols
  • COFF jitlink-check support
  • All COMDAT selection type execpt largest
  • Implicit symobl size calculation
  • Rel32 relocation with PLT stub.
  • IMAGE_REL_AMD64_ADDR32NB relocation

Diff Detail

Event Timeline

sunho created this revision.Jun 30 2022, 11:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 11:24 PM
sunho requested review of this revision.Jun 30 2022, 11:24 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2022, 11:24 PM
sunho updated this revision to Diff 441606.Jun 30 2022, 11:46 PM
sunho updated this revision to Diff 441780.Jul 1 2022, 1:12 PM

Add tests and rel32 relocation.

sunho edited the summary of this revision. (Show Details)Jul 1 2022, 1:22 PM
sunho edited the summary of this revision. (Show Details)Jul 1 2022, 1:27 PM
sunho updated this revision to Diff 441880.Jul 2 2022, 6:41 AM

Finish up object file detection. Add more test cases.

sunho updated this revision to Diff 441888.Jul 2 2022, 9:04 AM

Refactor symobol handling, calculate implicit size, support jitlink-check.

sunho updated this revision to Diff 441889.Jul 2 2022, 9:43 AM

Support weak external symbols.

sunho retitled this revision from [DRAFT][JITLink][COFF] Initial COFF support. to [JITLink][COFF] Initial COFF support..Jul 2 2022, 9:53 AM
sunho edited the summary of this revision. (Show Details)
sunho updated this revision to Diff 441892.Jul 2 2022, 10:32 AM
housel added a subscriber: housel.Jul 2 2022, 11:03 AM
housel added inline comments.
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
137

Maybe this should be an error return rather than an assert

415

contains a selection type... are handled

sunho updated this revision to Diff 441902.Jul 2 2022, 1:12 PM

Address comments and fix builds.

sunho marked 2 inline comments as done.Jul 2 2022, 1:12 PM
sunho updated this revision to Diff 441924.Jul 3 2022, 12:37 AM
sunho updated this revision to Diff 442148.Jul 4 2022, 2:59 PM
sunho updated this revision to Diff 442150.Jul 4 2022, 3:49 PM
sunho updated this revision to Diff 442151.Jul 4 2022, 3:51 PM
lhames added a comment.Jul 8 2022, 3:05 PM

Really great work -- a fully separated COFFLinkGraphBuilder and x86-64 backend, test infrastructure hooked up and tests already written!

This is a big step forward for ORC Windows support.

Comments below.

llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
98–100

No braces needed here.

124

MemProt Prot; will default-initialize -- no need to be explicit here.

146

No need for the brace initializers -- ArrayRef<T> will default initialize.

185

Is SectionName only used for debug logging? If so, could this be refactored into a separate function (e.g. as below)? That way it will naturally compile out when we build with NDEBUG.

std::string getSectionName(COFFSectionIndex SectionIndex, const object::coff_section *Sec) {
StringRef SectionName;
    if (SectionIndex == COFF::IMAGE_SYM_UNDEFINED) {
      if (Sym->getValue())
        return "(common)";
      else
        return "(external)";
    } else if (SectionIndex == COFF::IMAGE_SYM_ABSOLUTE)
      return "(absolute)";
    else if (SectionIndex == COFF::IMAGE_SYM_DEBUG)
      return "(debug)"; // Used with .file symbol
    else {
      // Non reserved regular section number
      if (Sec) {
        if (Expected<StringRef> SecNameOrErr = Obj.getSectionName(*SecOrErr))
          return *SecNameOrErr;
        else
          return "(section name unavailable due to error: " + toString(SecNameOrErr.takeError()) + ")";
      }
      return "(section unspecified)"
    }
}

...
    COFFSectionIndex SectionIndex = Sym->getSectionNumber();
    const object::coff_section *Sec = nullptr;

    switch (SectionIndex) {
      case COFF::IMAGE_SYM_UNDEFINED:
      case COFF::IMAGE_SYM_ABSOLUTE:
      case COFF::IMAGE_SYM_DEBUG:
        break;
      default: {
        Expected<const object::coff_section *> SecOrErr =
          Obj.getSection(SectionIndex);
        if (auto SecOrErr = Obj.getSection(SectionIndex)))
          Sec = *SecOrErr;
        else
          return make_error<JITLinkError>("Invalid COFF section number:" +
                                        formatv("{0:d}: ", SectionIndex) +
                                        " (" + toString(SecOrErr.takeError()) + ")");
    }
...
      dbgs() << "    " << SymIndex << ": Skipping FileRecord symbol \""
             << SymbolName << "\" in " << getSectionName(SectionIndex, Sec)
             << " (index: " << SectionIndex << ") \n";
      });
...
200–202
if (SecOrErr.takeError())
  return make_error<JITLinkError>("Invalid COFF section number:" +
                                  formatv("{0:d}: ", SectionIndex));

This idiom is unsafe: The .takeError() method will return an llvm::Error, and in the success case the fact that you're implicitly converting to bool will count as checking the error, but in the failure case you're not returning it, but instead creating a new Error.

Whenever you check an error you want it to be a named value, because you'll need to do something with that named value in the failure case.

Two alternative approaches are:

(1) Return the error directly (easy, maintains error type, but doesn't give context to the error):

if (auto Err = SecOrErr.takeError())
  return std::move(Err);

(2) Build a new error (loses original type, but provides more context for error messages)

if (auto Err = SecOrErr.takeError())
  return make_error<JITLinkError>("Invalid COFF section number:" +
                                  formatv("{0:d}: ", SectionIndex) +
                                  " (" + toString(std::move(Err)) + ")");
231

I'm not very familiar with COFF, but just took a look at https://docs.microsoft.com/en-us/windows/win32/debug/pe-format#auxiliary-format-3-weak-externals.

It sounds like IMAGE_SYM_CLASS_WEAK_EXTERNAL behaves differently depending on the value of the Characteristics field:

  • "IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY indicates that no library search for sym1 should be performed". That sounds a bit vague, but additional conversation (e.g. here) helps to clarify:

The SVR4 ABI uses (and bfd currently supports)
the equivalent of IMAGE_WEAK_EXTERN_SEARCH_NOLIBRARY semantics (a
library member does not resolve a weak symbol unless a strong symbol
cause the member to be linked in)

Some thoughts off the top of my head:

ORC can't currently support this. ORC lookups have the related concept of a LookupKind (static linking context, or a dynamic lookup context) which we use to decide whether to pull in definitions from static archives: -- static links pull in archive defs, dynamic lookups don't.

This would be something new: "don't pull this symbol in at all, but bind it if it's already present", and it's a per-symbol behavior, rather than per-lookup.

We could support this by adding a new "ReallyWeakLinkage" type to the LinkGraph, and SymbolLookupFlags::ReallyWeaklyReferencedSymbol value.

It's worth filing a GitHub issue for this, but for now I think this patch is right to treat it as unsupported and return an error.

  • "IMAGE_WEAK_EXTERN_SEARCH_LIBRARY indicates that a library search for sym1 should be performed". We can probably implement this as a plain weak external symbol in the LinkGraph. This should be added to COFFLinkGraphBuilder in this patch or a follow-up.
  • "IMAGE_WEAK_EXTERN_SEARCH_ALIAS indicates that sym1 is an alias for sym2". We should add an explicit symbol alias representation to the LinkGraph -- MachO and ELF both need it too (though actual use is rare). For now the approach used in this patch is good.
352–358

If NDEBUG is set this condition will apply to Symbol->setSize(CandSize) below -- is that intended?

If the if (!CandSize) check is for debugging purposes only it can be moved inside the LLVM_DEBUG macro.

407

I guess we should be able to implement these semantics in the future with a new LinkGraph Section flags and a custom dead-stripping pass?

429–431

Does this mean that the second symbol in the sequence will always follow immediately after the first?
Do we need an array for ComdatLeaders, or just the most recent COMDAT leader symbol?

llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.h
68

Can we assert !COFF::isReservedSectionNumber(SecIndex) here? Or is that already checked elsewhere?

llvm/lib/ExecutionEngine/JITLink/COFF_x86_64.cpp
48–50

This shouldn't extend Edge::Kind -- they're both enums, but there's a translation step to get from COFFX86RelocationKind to a valid Edge::Kind. I'd leave this as a plain enum with no explicit type specified.

154

This should take a COFFX86RelocationKind, rather than an Edge::Kind.

sunho updated this revision to Diff 443426.Jul 9 2022, 3:49 AM
sunho marked 9 inline comments as done.
sunho added inline comments.
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
124

Looks like it doesn't default initialize, but I made it more explicit by using MemProt::None.

231

Let me do it in a follow up patch -- it will require an additional test case. Only IMAGE_WEAK_EXTERN_SEARCH_ALIAS is used in all the testcases added in this patch. Supporting alias natively in LinkGraph level is indeed interesting to look at.

407

Yes. We can dead strip these sections when parent section was dead stripped. So, we might add "parent" relationship between sections too.

429–431

In the spec, it says "The first symbol having the section value of the COMDAT section." So technically, the second symbol might be separted in the global symbol list. Practically, it should be fine to store the only most recent one though. I went with array way ultimately to potentially deal with largest symbol later. lld does some interesting "combines" of multiple COMDAT symbols to a single leader symbol when it deals with largest selection type.

lhames added inline comments.Jul 9 2022, 11:08 AM
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
124

Quite right -- my mistake!

231

Sounds good to me.

407

COFF sections are single-block, right? If so, in the short term we should use KeepAlive edges for this.

Longer term we can think about if/how to introduce COMDAT concepts into LinkGraph itself.

429–431

Do we need to worry about combines? I've been assuming that COMDAT sections are unique within an individual .o.

sunho updated this revision to Diff 443452.Jul 9 2022, 11:54 AM
sunho added inline comments.Jul 9 2022, 11:56 AM
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
429–431

I looked into this again and you're right -- it should be across object files. I've updated the code to only look at most recent one and made naming more clear. To detect failrue (e.g. MC somehow sneak in comdat symbol of another section), I've added a bunch of error checkings.

lhames accepted this revision.Jul 12 2022, 11:22 AM

LGTM. Thanks @sunho!

This revision is now accepted and ready to land.Jul 12 2022, 11:22 AM
This revision was landed with ongoing or failed builds.Jul 12 2022, 11:54 AM
This revision was automatically updated to reflect the committed changes.
chapuni added inline comments.
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
339

LastDifferentOffset is not used with -Asserts.

sunho added inline comments.Jul 14 2022, 4:11 AM
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
339

@chapuni Hi, I'll fix this shortly. Thanks for letting me know.

sunho added inline comments.Jul 14 2022, 4:15 AM
llvm/lib/ExecutionEngine/JITLink/COFFLinkGraphBuilder.cpp
339