This is an archive of the discontinued LLVM Phabricator instance.

Arm64EC entry/exit thunks, consolidated.
AcceptedPublic

Authored by efriedma on Aug 9 2023, 1:51 PM.

Details

Summary

Arm64EC entry/exit thunks, consolidated.

This combines the previously posted patches with some additional work I've done to more closely match MSVC output.

Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs.

The other changes are supporting changes, to handle the new constructs generated by that pass.

There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that.

There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently.

I'm planning to write more IR tests for test coverage.

Diff Detail

Event Timeline

efriedma created this revision.Aug 9 2023, 1:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2023, 1:51 PM
efriedma requested review of this revision.Aug 9 2023, 1:51 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 9 2023, 1:51 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
jacek added inline comments.Aug 16 2023, 3:52 PM
llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
54–57

I think that mangled weak symbol should link to another kind exit thunk, not to unmangled symbol directly.

When an extern code symbol is resolved by an unmangled name, it means that we have ARM64EC->X64 call and we need to use an exit thunk. Linker doesn't need a special logic for that: on MSVC it seems to be achieved by pointing the antidependency symbol to yet another exit thunk. That other exit thunk loads the target exit thunk and X64 (unmangled) symbol into x10, x11 and uses __os_arm64x_dispatch_icall to call emulator. I guess we may worry about that later, but in that case maybe it's better not to emit mangled antidependency symbol at all for now?

efriedma added inline comments.Aug 16 2023, 4:42 PM
llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
54–57

From what I recall, I wrote the code here primarily to deal with issues trying to take the address of a function; even if the function is defined in arm64ec code, the address is the unmangled symbol. So there's some weirdness there involving symbol lookup even before there's any actual x64 code involved.

If you have a better idea of how external symbol references are supposed to be emitted, I'd appreciate a brief writeup, since all the information I have comes from trying to read dumpbin dumps of MSVC output.

jacek added inline comments.Aug 17 2023, 3:50 AM
llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
54–57

Yes, when EC code takes an address of a function, it needs to use unmangled symbols, because mangled symbols may point to a thunk (in cases when the implementation is in X64 or __declspec(hybrid_patchable) is used) and we still want an address of the real implementation. Here is an example how MSVC sets it up:

$ cat test.c
extern int otherfunc(void);
int myfunc(void) { return otherfunc(); }
$ llvm-readobj --symbols test.obj
...

Symbol {
  Name: #otherfunc$exit_thunk
  Value: 0
  Section: .wowthk$aa (4)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: External (0x2)
  AuxSymbolCount: 0
}

...

Symbol {
  Name: #otherfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: #otherfunc$exit_thunk (14)
    Search: AntiDependency (0x4)
  }
}
Symbol {
  Name: #myfunc
  Value: 0
  Section: .text$mn (3)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: External (0x2)
  AuxSymbolCount: 0
}
Symbol {
  Name: otherfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: #otherfunc (19)
    Search: AntiDependency (0x4)
  }
}
Symbol {
  Name: myfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: #myfunc (21)
    Search: AntiDependency (0x4)
  }
}

In this example myfunc links to #myfunc (and similarly otherfunc->#otherfunc). That's enough because even if we pass a pointer of #myfunc to actual X64 code which will try to call this address, emulator will have a chance to figure it out and use entry thunk as needed.

However, #otherfunc points to #otherfunc$exit_thunk (which then references otherfunc and $iexit_thunk$cdecl$i8$v in its implementation). If otherfunc is implemented in ARM64EC, #otherfunc symbol will be replaced by the object file implementing it. If it will not be replaced, it means that symbol is resolved to X64 implementation and the linked thunk will be used to call it.

My information are based on experimentation with MSVC as well, so those are only my best guesses. I mostly experimented with it in a context of linker support and I have those aspects of linker working in my tree. I will work on more comprehensive description of those things.

dpaoliello added inline comments.Aug 17 2023, 3:59 PM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2735–2738

To be clear: you're seeing MSVC emit the same entry twice, but you opted to emit the entry only once?

llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
333–334

Having a look through the available parameter attributes:

  • Do we need byval and byref, or are they irrelevant to the thunk?
  • Should we copy the alignment attributes to keep them the same?
  • Might be interesting to copy some of the optimization related ones (unless we're too late in the compilation) like readnone, readonly, immarg, returned, etc.
609

Are you asking if we should be handling functions with weak linkage, or is this a TODO to implement support somewhere else?

642–646

Can you provide some more details about this? What extra stub are you seeing?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1644

Full list is:

#ceil
#floor
#memcmp
#memcpy
#memmove
#memset
llvm/lib/Target/AArch64/AArch64InstrInfo.td
4667–4668

Should these be added to AArch64InstrInfo::isSEHInstruction?

dpaoliello added inline comments.Aug 22 2023, 10:25 AM
llvm/lib/Target/AArch64/AArch64CallingConvention.td
555–557

I've been hitting asserts in computeCalleeSaveRegisterPairs due to this: LLVM doesn't support gaps when saving registers for Windows (~line 2744 of AArch64FrameLowering.cpp), so placing the smaller registers before the larger ones causes issues if those smaller ones aren't paired (the assert(OffsetPre % Scale == 0); fires since OffsetPre will be a multiple of 8 but Scale will be 16).

efriedma added inline comments.Aug 22 2023, 11:32 AM
llvm/lib/Target/AArch64/AArch64CallingConvention.td
555–557

That looks right; I thought I had that fix in here, but I guess I mixed up my patches and pulled in a different version.

(I'll reply to the other review comments in more detail soon.)

Another issue: looks like the names in AArch64Subtarget.h need to be fully decorated, including the leading # - so #__chkstk_arm64ec and #__security_check_cookie_arm64ec.

efriedma updated this revision to Diff 552917.Aug 23 2023, 4:16 PM

Update to address issues found so far.

efriedma marked 2 inline comments as done.Aug 23 2023, 4:17 PM
jacek added a comment.Aug 25 2023, 9:49 AM

I published an initial version of my attempt at documenting this stuff here: https://wiki.winehq.org/ARM64ECToolchain.

It's not yet complete, but I hope it's already useful. I plan to extend it in the future. If some clarification or more information would be useful, please let me know and I will try to improve it or fill the gaps.

bylaws added a subscriber: bylaws.Sep 13 2023, 7:31 AM
bylaws added inline comments.
llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
62

This mangling logic is also necessary for dllimported functions, as is visible if you compile:

__declspec(dllimport) long UnhandledExceptionFilter (void *ExceptionInfo);

void test() {

UnhandledExceptionFilter(0);

}

With cl.exe /c and run dumpbin, without this link.exe will fail

dpaoliello added inline comments.Sep 13 2023, 1:16 PM
llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
37–38

We now also need to call this for constants in case the function is only referenced from a variable (e.g., a VTable) and is never called elsewhere.

I split this function into a new function GetGlobalValueSymbol:

MCSymbol *
AArch64MCInstLower::GetGlobalAddressSymbol(const MachineOperand &MO) const {
  return GetGlobalValueSymbol(MO.getGlobal(), MO.getTargetFlags());
}

MCSymbol *
AArch64MCInstLower::GetGlobalValueSymbol(const GlobalValue *GV, unsigned TargetFlags) const { ... }

And then called it from AArch64AsmPrinter:

const MCExpr *AArch64AsmPrinter::lowerConstant(const Constant *CV) {
  if (const GlobalValue *GV = dyn_cast<GlobalValue>(CV))
    return MCSymbolRefExpr::create(MCInstLowering.GetGlobalValueSymbol(GV, 0), OutContext);

  return AsmPrinter::lowerConstant(CV);
}
bylaws added inline comments.Sep 14 2023, 7:56 AM
llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
62

Looking into it further, it probably makes sense to drop MO_DLLIMPORT from the hasDLLImportStorageClass case in AArch64Subtarget::classifyGlobalFunctionReference, then emit the __imp symbol for that special case here

efriedma updated this revision to Diff 557315.Sep 25 2023, 10:31 AM

Updated with my attempt at guest exit thunks.

Seems to be working for simple cases, but I'm not sure this is actually working properly (I'm still seeing LNK1000 crashes).

jacek added a comment.Sep 25 2023, 2:40 PM

Seems to be working for simple cases, but I'm not sure this is actually working properly (I'm still seeing LNK1000 crashes).

I don't know if it's related to crashes that you're seeing, but those anti-dependency symbols look different than what MSVC produces. With the patch, I get:

$ cat extfunc.c
extern void extfunc(void);
void func(void) { extfunc(); }
$ clang -target arm64ec-windows -c extcall.c -o extcall.o
$ llvm-readobj --symbols extcall.o
...

Symbol {
  Name: #extfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Null (0x0)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: .weak.#extfunc.default.#func (40)
    Search: AntiDependency (0x4)
  }
}

...

Symbol {
  Name: extfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Null (0x0)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: #extfunc$exit_thunk (43)
    Search: AntiDependency (0x4)
  }
}

While an equivalent object file produced by MSVC has:

Symbol {
  Name: #extfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: #extfunc$exit_thunk (14)
    Search: AntiDependency (0x4)
  }
}
Symbol {
  Name: extfunc
  Value: 0
  Section: IMAGE_SYM_UNDEFINED (0)
  BaseType: Null (0x0)
  ComplexType: Function (0x2)
  StorageClass: WeakExternal (0x69)
  AuxSymbolCount: 1
  AuxWeakExternal {
    Linked: #extfunc (19)
    Search: AntiDependency (0x4)
  }
}

It's the mangled symbol that's linked to guest exit thunk, not unmangled one. Also there is no .weak.*.default.* involved.

BTW, I hope to get my WIP lld-link branch into usable state very soon and plan to publish it then. I'm able to link real-world code now, but it still requires a few hacks that I need to fix to make it work out of the box. Hopefully it will make testing and debugging such problems easier.

efriedma updated this revision to Diff 557335.Sep 25 2023, 3:34 PM

Revised so guest exit thunks actually work. Thanks for the tip about the symbols; I forgot to look at the object file in addition to the generated assembly.

@efriedma the patch is failing to apply - can you please update your baseline, or move this to a GitHub PR?

dpaoliello added inline comments.Oct 2 2023, 4:47 PM
llvm/lib/Target/AArch64/AArch64CallingConvention.td
248–257

These seem wrong, shouldn't the shadows be [X0, X1, X2, X3] (instead of X2 twice)

efriedma updated this revision to Diff 557592.Oct 4 2023, 10:56 AM

Updated with feedback from @dpaoliello and additional internal testing.

dpaoliello accepted this revision.Oct 4 2023, 1:10 PM
This revision is now accepted and ready to land.Oct 4 2023, 1:10 PM
jacek added a comment.Oct 9 2023, 10:25 AM

It works nice with my testing so far, thanks!

I noticed one problem with code that defines symbols in assembly. It conflicts with anti-dependency symbols emitted by codegen and hits an asserts in assembly printer. This commit fixes it for me:
https://github.com/cjacek/llvm-project/commit/879b1b6bf99b1e506ae04b8db0b44eb1b2205f19

Here is a test case:

extern void asmfunc(void);
__asm__( ".globl \"#asmfunc\"\n"
         "\t.section .text,\"xr\",discard,\"#asmfunc\"\n"
         "\"#asmfunc\":\n\t"
         "nop; ret\n\t" );
void test(void) { asmfunc(); }

For assembly, I'm not really comfortable with the fix you're proposing; it relies on the order in which functions are defined in assembly. I think LLVM usually ends up emitting module-level inline assembly before generated code, but it seems fragile to depend on that ordering. And the way the check is written doesn't account for other differences in the way we normally generate code for declarations vs. definitions, which I'm afraid might make the linker unhappy.

How important is that particular pattern? I think most patterns involving assembly should be covered by some combination of naked functions, and functions defined in separate assembly files, both of which avoid weird questions about what the compiler should do when assembly tries to define a function symbol.

efriedma updated this revision to Diff 557651.Oct 9 2023, 11:21 AM

Fix the calling convention for functions returning C++ classes.

For assembly, I'm not really comfortable with the fix you're proposing; it relies on the order in which functions are defined in assembly. I think LLVM usually ends up emitting module-level inline assembly before generated code, but it seems fragile to depend on that ordering. And the way the check is written doesn't account for other differences in the way we normally generate code for declarations vs. definitions, which I'm afraid might make the linker unhappy.

Also, checking isUndefined() completely explodes if you use -fno-integrated-as, although I guess that's not common for Windows targets.

dpaoliello accepted this revision.Oct 9 2023, 11:49 AM
jacek added a comment.Oct 9 2023, 12:13 PM

How important is that particular pattern? I think most patterns involving assembly should be covered by some combination of naked functions, and functions defined in separate assembly files, both of which avoid weird questions about what the compiler should do when assembly tries to define a function symbol.

Thank for suggestions. We can indeed change that to naked functions (or separated assembly file when that's not suitable). We use full assembly declaration on other targets for compatibility reasons (which may be worth revisiting if that's still a problem and it's definitely not a concern in Arm64EC case).

It may be nice to output some form of error instead of assert crash in those cases, but I guess that's a secondary concern at this point.

dpaoliello added inline comments.Wed, Dec 6, 2:25 PM
llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2717–2719

Comment for what this table is?

llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
322

Can you please add comments for each of these buildXXX functions to explain what each of these thunks are used for (especially exit thunk vs guest exit thunk)

520

Redundant assert (already asserted on line 511)