This is an archive of the discontinued LLVM Phabricator instance.


Differential D125418

[Arm64EC 6/?] Implement C/C++ mangling for Arm64EC function definitions.
Needs Review · Public

Authored by efriedma on May 11 2022, 1:52 PM.

Details

Reviewers

rjmccall
mstorsjo
dpaoliello
rnk

Summary

Part of initial Arm64EC patchset.

For the Arm64EC ABI, ARM64 functions have an alternate name. For C code, this name is just the original name prefixed with "#". For C++ code, we stick a "$$h" modifier in the middle of the mangling.

For functions which are not hybrid_patchable, the normal name is then an alias for the alternate name. (For functions that are patchable, we have to do something more complicated to tell the linker to generate a stub; I haven't tried to implement that yet.)

This doesn't emit quite the same symbol table as MSVC for simple cases: MSVC generates an IMAGE_WEAK_EXTERN_ANTI_DEPENDENCY alias, whereas this patch just makes another symbol pointing at the function definition. This probably matters for the hybmp$x table, but I don't have the complete documentation at the moment.
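
The naming rule in the summary can be sketched as a plain string transformation. This is only an illustration, not the patch's implementation (the patch emits "$$h" from inside the Microsoft mangler). The helper name arm64ecDefinitionName is hypothetical, and finding the insertion point by searching for the first "@@" is a simplifying assumption that may not hold for every mangled name.

```cpp
#include <string>

// Hypothetical helper: derive the Arm64EC definition name from the
// ordinary symbol name. Not part of clang.
std::string arm64ecDefinitionName(const std::string &Name) {
  // MSVC C++ manglings start with '?'.
  if (!Name.empty() && Name[0] == '?') {
    // Assumption: the first "@@" terminates the (possibly nested) name.
    std::string::size_type Pos = Name.find("@@");
    if (Pos == std::string::npos)
      return Name; // Unrecognized shape: leave the name unchanged.
    std::string Result = Name;
    Result.insert(Pos + 2, "$$h");
    return Result;
  }
  // C names just get a '#' prefix.
  return "#" + Name;
}
```

Under these assumptions, a C function fA becomes #fA, and a C++ symbol of the form ?f@@YAHH@Z gets "$$h" inserted right after "?f@@".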

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

efriedma created this revision. May 11 2022, 1:52 PM

Herald added a project: Restricted Project. May 11 2022, 1:52 PM

Herald added subscribers: zzheng, kristof.beyls.

efriedma requested review of this revision. May 11 2022, 1:52 PM

Herald added a project: Restricted Project. May 11 2022, 1:52 PM

efriedma added a parent revision: D125417: [ARM64EC 5/?] Fix names of __chkstk and __security_check_cookie. May 11 2022, 1:52 PM

Harbormaster completed remote builds in B163978: Diff 428767. May 11 2022, 1:53 PM

efriedma added a child revision: D125419: [Arm64EC 7/?] clang side of Arm64EC varargs ABI. May 11 2022, 1:55 PM

dpaoliello added a subscriber: dpaoliello. May 31 2022, 10:55 AM

bcl5980 added a subscriber: bcl5980. Jul 18 2022, 7:06 PM

bcl5980 added inline comments.

clang/lib/CodeGen/CodeGenModule.cpp
5300	This is a headache. To define the entry thunk we need the function definition as generated for an x64 triple, but the function definition we have here is the aarch64 version. Take the example from the Microsoft doc "Understanding Arm64EC ABI and assembly code":

    struct SC {
        char a;
        char b;
        char c;
    };
    int fB(int a, double b, int i1, int i2, int i3);
    int fC(int a, struct SC c, int i1, int i2, int i3);
    int fA(int a, double b, struct SC c, int i1, int i2, int i3) {
        return fB(a, b, i1, i2, i3) + fC(a, c, i1, i2, i3);
    }

The x64 version of the IR for fA is:

    define dso_local i32 @fA(i32 noundef %a, double noundef %b, ptr nocapture noundef readonly %c, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) local_unnamed_addr #0 { ... }

The aarch64 version of the IR for fA is:

    define dso_local i32 @"#fA"(i32 noundef %a, double noundef %b, i64 %c.coerce, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) #0 {...}

Arm64 allows a structure of any size to be assigned to a register directly; x64 only allows sizes 1, 2, 4 and 8. The entry thunk follows the x64 function type, but we only have the aarch64 function type.

I think the best approach is to create an x64 CodeGenModule and use it to generate the function type for the entry thunk. But that is hard for me to do here: I tried it briefly and ran into a lot of issues.

Another way is to modify only `AArch64ABIInfo::classifyArgumentType`: copy the x64 code into the function and add a flag that selects which version the function uses. That is easier, but I'm not sure argument classification is the only difference between x64 and aarch64; return classification may need the same treatment. I also don't think it is a clean solution.

efriedma added inline comments. Jul 19 2022, 10:31 AM

clang/lib/CodeGen/CodeGenModule.cpp
5300	Oh, that's annoying... I hadn't considered the case of a struct of size 3/5/6/7.

Like I noted on D126811, attaching thunks to calls is tricky if we try to do it from clang. Computing the right IR type shouldn't be that hard by itself; we can call into call lowering code in TargetInfo without modifying much else. (We just need a bit to tell the TargetInfo to redirect the call, like D125419. Use an entry point like CodeGenTypes::arrangeCall.) You don't need to mess with the type system or anything like that.

The problem is correctly representing the lowered call in IR; we really don't want to do lowering early because it will block optimizations. I considered using an operand bundle; we can probably make that work, but it's complicated, and probably disables some optimizations.

I think the best thing we can do here is add an IR attribute to mark arguments which are passed directly on AArch64, but need to be passed indirectly for the x64 ABI. Then AArch64Arm64ECCallLowering can check for the attribute and modify its behavior. This isn't really clean in the sense that it's specific to the x64/aarch64 pair of calling conventions, but I think the alternative is worse.
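
The size mismatch discussed in these two comments can be sketched as a pair of predicates. This is a simplified model, not clang's ABI classification code: the function names are hypothetical, and it assumes a plain aggregate (no floating-point homogeneous aggregates, no non-trivial C++ types) for which AArch64 passes up to 16 bytes in registers while Windows x64 passes only sizes 1, 2, 4 and 8 by value.

```cpp
#include <cstdint>

// Windows x64: an aggregate is passed by value only when its size is
// exactly 1, 2, 4 or 8 bytes; everything else goes by reference.
bool passedDirectlyOnX64(std::uint64_t Size) {
  return Size == 1 || Size == 2 || Size == 4 || Size == 8;
}

// AArch64 (assumption: plain, non-HFA aggregate): up to 16 bytes are
// passed in general-purpose registers.
bool passedDirectlyOnAArch64(std::uint64_t Size) {
  return Size >= 1 && Size <= 16;
}

// Hypothetical predicate: true when the two conventions disagree, so a
// thunk has to convert between a direct and an indirect argument.
bool conventionsDisagree(std::uint64_t Size) {
  return passedDirectlyOnAArch64(Size) && !passedDirectlyOnX64(Size);
}
```

In this model the 3-byte struct SC from the example disagrees between the two conventions, while an 8-byte struct does not.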

bcl5980 added inline comments. Aug 9 2022, 9:52 PM

clang/lib/CodeGen/CodeGenModule.cpp
5300	It looks like it's not only sizes 3/5/6/7: every size strictly larger than 8 and less than 16 also differs between the x86 ABI and the AArch64 ABI. Maybe we can emit a function declaration here for the x86 ABI thunk, then define it in Arm64ECCallLowering.

efriedma added inline comments. Aug 10 2022, 1:07 PM

clang/lib/CodeGen/CodeGenModule.cpp
5300	I think the sizes between 8 and 16 work correctly already? All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.

But that's not really important for the general question. We need to preserve the required semantics for both the AArch64 and x86 calling conventions. There are basically the following possibilities:

- We compute the declaration of the thunk in the frontend, and attach it to the call with an operand bundle. Like I mentioned, I don't want to go down this path: the operand bundle blocks optimizations, and it becomes more complicated for other code to generate arm64ec compatible calls.
- We don't compute the definition of the thunk in the frontend. Given that, the only other way to attach the information we need to the call is to use attributes. The simplest thing is probably to attach the attribute directly to the argument; name it "arm64ec-thunk-pass-indirect", or something like that. (I mean, we could compute the whole signature and stuff it into a string attribute, but that doesn't really seem like an improvement...)

bcl5980 added inline comments. Aug 10 2022, 7:24 PM

clang/lib/CodeGen/CodeGenModule.cpp
5300	efriedma wrote:

I think the sizes between 8 and 16 work correctly already? All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.

Yeah, the current code for the exit thunk already accounts for that. I mean we need to mark the parameter because the entry thunk behavior is also different. Maybe we can compute a mangled name like `$iexit_thunk$cdecl$i8$m6` or `$ientry_thunk$cdecl$m16$f` for the thunk function, then set attributes like "arm64ec-exitthunk"="$iexit_thunk$cdecl$i8$m6" "arm64ec-entrythunk"="$ientry_thunk$cdecl$m16$f" on the function. Based on the mangled name, I think we can reconstruct the whole thunk. This should be a little easier.

efriedma added inline comments. Aug 11 2022, 12:41 PM

clang/lib/CodeGen/CodeGenModule.cpp
5300	Each function has an arm64 function signature, and a corresponding x64 signature. The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature. That part is the same whether we're generating an entry thunk, or an exit thunk. So I'm not sure why you're distinguishing between them in this context.

I'm not sure it makes sense to force the frontend to generate the mangled form, then make the backend demangle it. Seems more straightforward to just attach an attribute to an argument, and make the backend generate the mangled form?
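
As a rough illustration of what "generate the mangled form" could look like, the thunk names quoted in this review share a visible shape: a "$ientry_thunk" or "$iexit_thunk" prefix, "$cdecl", a return-type code, then the argument codes concatenated. The sketch below only reproduces that visible shape; the function name thunkName is hypothetical, the type codes (such as "v", "i8", "m16", "f") are taken verbatim from the examples in this thread, and nothing here is based on a documented specification of the scheme.

```cpp
#include <string>
#include <vector>

enum class ThunkKind { Entry, Exit };

// Hypothetical helper: assemble a thunk symbol name from already-encoded
// return and argument type codes. Encoding the types themselves is out
// of scope for this sketch.
std::string thunkName(ThunkKind Kind, const std::string &RetCode,
                      const std::vector<std::string> &ArgCodes) {
  std::string Name =
      Kind == ThunkKind::Entry ? "$ientry_thunk" : "$iexit_thunk";
  Name += "$cdecl$";
  Name += RetCode;
  Name += "$";
  for (const std::string &Code : ArgCodes)
    Name += Code; // Argument codes are concatenated without separators.
  return Name;
}
```

With a "v" return code and two "i8" argument codes this yields $iexit_thunk$cdecl$v$i8i8, the name of the clang-generated exit thunk shown later in this review.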

bcl5980 added inline comments. Aug 11 2022, 8:00 PM

clang/lib/CodeGen/CodeGenModule.cpp
5300	efriedma wrote:

Each function has an arm64 function signature, and a corresponding x64 signature. The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature. That part is the same whether we're generating an entry thunk, or an exit thunk. So I'm not sure why you're distinguishing between them in this context.

I mean that which arguments need to be marked is different for the entry thunk and the exit thunk. Both need arguments of size 3/5/6/7 to be marked. But when the size is larger than 8 and less than 16, the entry thunk still needs the mark while the exit thunk doesn't. So if we attach an attribute to an argument, we also need to consider the case of sizes larger than 8 and less than 16.

When a function has an argument of size 15 bytes, the frontend will coerce it to i64x2. If we don't attach an attribute to it, the backend can't generate the correct entry thunk, because we have already lost the real size of the argument. The exit thunk doesn't need that, because the code for 15 bytes and 16 bytes is the same: store i64x2 to memory, then pass the address.

This is part of the 15-byte entry thunk:

    mov fp,sp
    mov x10,x1
    ldr w1,[x10,#8]
    mov x19,x0
    ldur w8,[x10,#0xB]
    ldr x0,[x10]
    bfi x1,x8,#0x18,#0x20
    blr x9

This is part of the 16-byte entry thunk:

    mov fp,sp
    mov x8,x1
    mov x19,x0
    ldp x0,x1,[x8]
    blr x9

efriedma added inline comments. Aug 12 2022, 12:09 PM

clang/lib/CodeGen/CodeGenModule.cpp
5300	Oh, I see what you mean. The size of the memory is the same either way according to the ABI, but we're "cheating" a bit with exit thunks: nothing cares if we allocate extra memory for exit thunks, so we can emit slightly shorter code. But for entry thunks, if we read past the end, we could cause a fault.

(Realistically, we could probably get away with reading 16 bytes for entry thunks, and not the "right" amount. In practice, indirect arguments always point to memory on the stack, and there's always going to be something on the stack after that, so reading past the end will never fault. But if MSVC is conservative here, maybe we should be too.)

(On a related side-note, if we do want to generate the conservative sequence, the code MSVC is generating here is sort of inefficient; something like "ldur x1, [x1, #7]; lshr x1, x1, #8" is going to be faster than ldr+ldr+bfi.)

I don't think that means we need the frontend to generate different markings depending on whether we're dealing with entry or exit thunks, though. The thunk generation code can handle the difference transparently if we don't make the frontend mangle the thunk signature.
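
The overlapping-load idea from the side-note above can be modelled in portable C++. This is an illustrative sketch rather than thunk-generation code: the function name load15 is hypothetical, and it assumes a little-endian host, as on AArch64 and x64 Windows. It reads exactly 15 bytes into two 64-bit values without touching byte 15.

```cpp
#include <cstdint>
#include <cstring>
#include <utility>

// Hypothetical model of loading a 15-byte argument into two registers
// without reading past the end of the object (little-endian assumed).
std::pair<std::uint64_t, std::uint64_t> load15(const unsigned char *P) {
  std::uint64_t Lo, Hi;
  std::memcpy(&Lo, P, 8);     // Bytes 0..7, like "ldr x0, [x1]".
  std::memcpy(&Hi, P + 7, 8); // Bytes 7..14, like "ldur x1, [x1, #7]".
  Hi >>= 8;                   // Drop the overlapping byte 7.
  return {Lo, Hi};
}
```

The second load starts one byte early so that it ends exactly at the last valid byte; the shift then discards the duplicated byte and leaves the upper byte of the second value zero.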

Another thing we need to consider here is this case:

#pragma pack(push, 1)
    struct b64 {
        char a[64];
    };
#pragma pack(pop)

    typedef b64 (fptrtype)(int a);

    b64 f(void* p, int a) {
        return ((fptrtype*)p)(a);
    }

For now we generate exit_thunk with type void f(void* sret(b64) ret, int a)

$iexit_thunk$cdecl$v$i8i8:              // @"$iexit_thunk$cdecl$v$i8i8"
.seh_proc $iexit_thunk$cdecl$v$i8i8
// %bb.0:
	sub	sp, sp, #48
	.seh_stackalloc	48
	stp	x29, x30, [sp, #32]             // 16-byte Folded Spill
	.seh_save_fplr	32
	add	x29, sp, #32
	.seh_add_fp	32
	.seh_endprologue
	mov	w1, w0
	mov	x0, x8
	adrp	x8, __os_arm64x_dispatch_call_no_redirect
	ldr	x8, [x8, :lo12:__os_arm64x_dispatch_call_no_redirect]
	blr	x8
	.seh_startepilogue
	ldp	x29, x30, [sp, #32]             // 16-byte Folded Reload
	.seh_save_fplr	32
	add	sp, sp, #48
	.seh_stackalloc	48
	.seh_endepilogue
	ret
	.seh_endfunclet
	.seh_endproc
                                        // -- End function
	.globl	f
	.def	f;
	.scl	2;
	.type	32;
	.endef

But it looks like Microsoft generates the exit thunk with type void* f(int a):

|$iexit_thunk$cdecl$i8$i8| PROC
|$LN2|
	pacibsp
	stp         fp,lr,[sp,#-0x10]!
	mov         fp,sp
	sub         sp,sp,#0x20
	adrp        x8,__os_arm64x_dispatch_call_no_redirect
	ldr         xip0,[x8,__os_arm64x_dispatch_call_no_redirect]
	blr         xip0
	mov         x0,x8
	add         sp,sp,#0x20
	ldp         fp,lr,[sp],#0x10
	autibsp
	ret

	ENDP  ; |$iexit_thunk$cdecl$i8$i8|

But clang targeting x86 on Windows also generates the function type void f(void* sret(b64) ret, int a).
It looks like clang differs from MSVC even in the x86 ABI.
Do we need to follow MSVC and generate $iexit_thunk$cdecl$i8$i8, or should we just follow clang's ABI and ignore the difference?

There's no way the calling convention can change based on whether you're calling a function vs. a function pointer. I can't explain why MSVC is generating different code. I think we should just ignore it, at least for now.

In D125418#3756223, @efriedma wrote:

There's no way the calling convention can change based on whether you're calling a function vs. a function pointer. I can't explain why MSVC is generating different code. I think we should just ignore it, at least for now.

It's OK for me to ignore the difference, but I think the main issue is not function versus function pointer. It's how to generate the exit thunk when returning a structure whose size is greater than 16.
https://godbolt.org/z/MWv4YaKdK
There are three different ways to call an extern function, with three kinds of exit thunks. All of them keep the return value as is, rather than moving the return value's pointer to the first argument.

The reason struct returns require register shuffling is that AArch64 passes the sret pointer in x8 (i.e. RAX), but the x64 calling convention expects it in RCX (i.e. x0).

Have you tried to see if the Microsoft-generated thunk actually works? I found at least one bug in MSVC thunk generation and reported it to Microsoft. (Microsoft didn't acknowledge the report, but that's a different story...)

In D125418#3759174, @efriedma wrote:

The reason struct returns require register shuffling is that AArch64 passes the sret pointer in x8 (i.e. RAX), but the x64 calling convention expects it in RCX (i.e. x0).

So, for the function: s64 f(int a):
AArch64 CC: void f(x8, x0)
X64 CC: void f(rcx[x0], rdx[x1])
For AArch64 --> X64 we need to add these instructions before the blr:

mov x1, x0
mov x0, x8

This matches iexit_thunk$cdecl$m64$i8, which is what we get when we call an extern function rather than a function pointer.

Have you tried to see if the Microsoft-generated thunk actually works? I found at least one bug in MSVC thunk generation and reported it to Microsoft. (Microsoft didn't acknowledge the report, but that's a different story...)

You are right. So far I haven't tested many cases at runtime. But it looks like if a DLL import function is assigned to a function pointer and then called through it, the call causes an access violation.
Based on debugging, it appears to be an exit thunk issue: MSVC generates the wrong thunk type.
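
The register shuffle described in this exchange can be modelled with a tiny register file. This is a toy model for illustration only: the struct ArgRegs and the function shuffleSRetForX64 are hypothetical names, and the model tracks just the three registers mentioned in the discussion.

```cpp
#include <cstdint>

// Toy register file covering only the registers discussed above.
struct ArgRegs {
  std::uint64_t X0; // First integer argument register (RCX on the x64 side).
  std::uint64_t X1; // Second integer argument register (RDX on the x64 side).
  std::uint64_t X8; // AArch64 indirect-result (sret) pointer register.
};

// Hypothetical model of the two moves an exit thunk performs before the
// call for a function shaped like "s64 f(int a)".
ArgRegs shuffleSRetForX64(ArgRegs R) {
  R.X1 = R.X0; // mov x1, x0: the int argument becomes the second argument.
  R.X0 = R.X8; // mov x0, x8: the sret pointer becomes the first argument.
  return R;
}
```

The order of the two moves matters: x0 has to be copied into x1 before it is overwritten with the sret pointer.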

Rebased

Harbormaster completed remote builds in B184724: Diff 457457. Sep 1 2022, 5:38 PM

I think I'd like to continue moving forward with approximately this approach, at least for the moment. As far as I know, D132926 solves the remaining issues with translating the calling conventions. (I'll try to review D132926 soon.)

bcl5980 mentioned this in D133408: [WIP][LegalizeTypes][LegalizeDAG] Use misaligned load/store to optimize memory access with non-power2 integer types. Sep 7 2022, 2:33 AM

Ping

I think this looks reasonable to me, but I don't think I'm knowledgeable enough to give this a proper review, sorry.

A question about the mangling and alias part.
Should we also move the code that creates the alias to the backend? Sometimes we will emit the alias here but later the function will be inlined or eliminated by DCE.
And later we need to emit alias for direct call thunk also, like $originname$exitthunk. Put all of them into arm64eccalllowering pass should be better I think.

Sometimes we will emit the alias here but later the function will be inlined or eliminated by DCE.

If the alias is externally visible, it can't be eliminated; the compiler can't tell whether the symbol is referenced. If the alias isn't externally visible, it's dead from the outset. Not sure how this could become an issue.

And later we need to emit alias for direct call thunk also, like $originname$exitthunk.

Direct call thunks aren't directly relevant here; we only emit them for declarations, not definitions. I guess this does imply that we need to teach arm64eccalllowering how to modify mangled symbol names... and we could use that same code to insert the $$h.

Put all of them into arm64eccalllowering pass should be better I think.

I really don't want to do demangling in arm64eccalllowering. But looking at the generated patterns a bit more closely, maybe we don't have to fully parse the mangled symbol. If we can get away with just parsing the "?symbolname@@" at the beginning of the symbol, and ignore all the type-related stuff, I guess that would be okay.

Alternatively, I guess we could use attributes to communicate the different mangled forms to the backend, but probably better to avoid that if we can.

If we can solve the mangling issues, I guess generating the alias in arm64eccalllowering would be fine.

In D125418#3806720, @efriedma wrote:

Sometimes we will emit the alias here but later the function will be inlined or eliminated by DCE.

If the alias is externally visible, it can't be eliminated; the compiler can't tell whether the symbol is referenced. If the alias isn't externally visible, it's dead from the outset. Not sure how this could become an issue.

There is no functional issue here. I mean that we can avoid generating some redundant aliases if this is done in arm64eccalllowering.

And later we need to emit alias for direct call thunk also, like $originname$exitthunk.

Direct call thunks aren't directly relevant here; we only emit them for declarations, not definitions. I guess this does imply that we need to teach arm64eccalllowering how to modify mangled symbol names... and we could use that same code to insert the $$h.

Put all of them into arm64eccalllowering pass should be better I think.

I really don't want to do demangling in arm64eccalllowering. But looking at the generated patterns a bit more closely, maybe we don't have to fully parse the mangled symbol. If we can get away with just parsing the "?symbolname@@" at the beginning of the symbol, and ignore all the type-related stuff, I guess that would be okay.

Alternatively, I guess we could use attributes to communicate the different mangled forms to the backend, but probably better to avoid that if we can.

If we can solve the mangling issues, I guess generating the alias in arm64eccalllowering would be fine.

As far as I know, there are three kinds of alias we need to generate. For example:

extern "C" void function_name(void a)
    arm64 signature: #function_name(native)

If it is a function definition, we need to create an alias for the x86 signature, and mangle a new name for the arm64ec signature. That is what this change does.

x86 signature: function_name

If it is a direct function call, we need to create two aliases:

function thunk: #function_name$exit_thunk
x86 signature: function_name

I don't quite understand why we need to demangle the function name in arm64eccalllowering. It looks like all we need to do is generate the arm64 signature name from the default symbol name, which is the x86 signature.
I'm not familiar with the normal mangling rules. If # and $$h are unique, maybe we can just insert # at the beginning for a C symbol name, or insert $$h after the first `@@` for a C++ symbol name. Like:

// MangleName is a std::string; MSVC C++ mangled names start with '?'.
if (MangleName.rfind("?", 0) == 0) {
  size_t InsertIdx = MangleName.find("@@");
  if (InsertIdx != std::string::npos)
    MangleName.insert(InsertIdx + 2, "$$h");
} else {
  // C symbol: prefix with '#'.
  MangleName.insert(0, "#");
}
Revision Contents

Path                                    Size
clang/include/clang/AST/Mangle.h        4 lines
clang/lib/AST/MicrosoftMangle.cpp       32 lines
clang/lib/CodeGen/CodeGenModule.cpp     17 lines
clang/test/CodeGen/arm64ec.c            7 lines
clang/test/CodeGenCXX/arm64ec.cpp       7 lines

Diff 457457

clang/include/clang/AST/Mangle.h

[113 lines not shown]
 public:
   virtual bool isUniqueInternalLinkageDecl(const NamedDecl *ND) {
     return false;
   }

   virtual void needsUniqueInternalLinkageNames() { }

   // FIXME: consider replacing raw_ostream & with something like SmallString &.
   void mangleName(GlobalDecl GD, raw_ostream &);
+  // Mangling for function definitions in Arm64EC ABI.
+  virtual void mangleArm64ECFnDef(GlobalDecl GD, raw_ostream &) {
+    llvm_unreachable("Unexpected ABI");
+  }
   virtual void mangleCXXName(GlobalDecl GD, raw_ostream &) = 0;
   virtual void mangleThunk(const CXXMethodDecl *MD,
                            const ThunkInfo &Thunk,
                            raw_ostream &) = 0;
   virtual void mangleCXXDtorThunk(const CXXDestructorDecl *DD, CXXDtorType Type,
                                   const ThisAdjustment &ThisAdjustment,
                                   raw_ostream &) = 0;
   virtual void mangleReferenceTemporary(const VarDecl *D,
[179 lines not shown]

clang/lib/AST/MicrosoftMangle.cpp

[293 lines not shown]
 public:
   }

   /// Return a character sequence that is (somewhat) unique to the TU suitable
   /// for mangling anonymous namespaces.
   StringRef getAnonymousNamespaceHash() const {
     return AnonymousNamespaceHash;
   }

+  void mangleArm64ECFnDef(GlobalDecl GD, raw_ostream &) override;
 private:
   void mangleInitFiniStub(const VarDecl *D, char CharCode, raw_ostream &Out);
 };

 /// MicrosoftCXXNameMangler - Manage the mangling of a single name for the
 /// Microsoft Visual C++ ABI.
 class MicrosoftCXXNameMangler {
   MicrosoftMangleContextImpl &Context;
[46 lines not shown]
   MicrosoftCXXNameMangler(MicrosoftMangleContextImpl &C, raw_ostream &Out_,
       TemplateArgStringStorage(TemplateArgStringStorageAlloc),
       PointersAre64Bit(C.getASTContext().getTargetInfo().getPointerWidth(0) ==
                        64) {}

   raw_ostream &getStream() const { return Out; }

   void mangle(GlobalDecl GD, StringRef Prefix = "?");
   void mangleName(GlobalDecl GD);
-  void mangleFunctionEncoding(GlobalDecl GD, bool ShouldMangle);
+  void mangleFunctionEncoding(GlobalDecl GD, bool ShouldMangle, bool Arm64ECDef = false);
   void mangleVariableEncoding(const VarDecl *VD);
   void mangleMemberDataPointer(const CXXRecordDecl *RD, const ValueDecl *VD,
                                StringRef Prefix = "$");
   void mangleMemberFunctionPointer(const CXXRecordDecl *RD,
                                    const CXXMethodDecl *MD,
                                    StringRef Prefix = "$");
   void mangleVirtualMemPtrThunk(const CXXMethodDecl *MD,
                                 const MethodVFTableLocation &ML);
   void mangleNumber(int64_t Number);
   void mangleNumber(llvm::APSInt Number);
   void mangleFloat(llvm::APFloat Number);
   void mangleBits(llvm::APInt Number);
   void mangleTagTypeKind(TagTypeKind TK);
   void mangleArtificialTagType(TagTypeKind TK, StringRef UnqualifiedName,
                                ArrayRef<StringRef> NestedNames = None);
   void mangleAddressSpaceType(QualType T, Qualifiers Quals, SourceRange Range);
   void mangleType(QualType T, SourceRange Range,
                   QualifierMangleMode QMM = QMM_Mangle);
   void mangleFunctionType(const FunctionType *T,
                           const FunctionDecl *D = nullptr,
                           bool ForceThisQuals = false,
                           bool MangleExceptionSpec = true);
   void mangleNestedName(GlobalDecl GD);
+  void mangleArm64ECFnDef(GlobalDecl GD);

 private:
   bool isStructorDecl(const NamedDecl *ND) const {
     return ND == Structor || getStructor(ND) == Structor;
   }

   bool is64BitPointer(Qualifiers Quals) const {
     LangAS AddrSpace = Quals.getAddressSpace();
[174 lines not shown]
 void MicrosoftCXXNameMangler::mangle(GlobalDecl GD, StringRef Prefix) {
   else if (isa<TemplateParamObjectDecl>(D)) {
     // Template parameter objects don't get a <type-encoding>; their type is
     // specified as part of their value.
   } else
     llvm_unreachable("Tried to mangle unexpected NamedDecl!");
 }

 void MicrosoftCXXNameMangler::mangleFunctionEncoding(GlobalDecl GD,
-                                                     bool ShouldMangle) {
+                                                     bool ShouldMangle,
+                                                     bool Arm64ECDef) {
   const FunctionDecl *FD = cast<FunctionDecl>(GD.getDecl());
   // <type-encoding> ::= <function-class> <function-type>

   // Since MSVC operates on the type as written and not the canonical type, it
   // actually matters which decl we have here. MSVC appears to choose the
   // first, since it is most likely to be the declaration in a header file.
   FD = FD->getFirstDecl();

   // We should never ever see a FunctionNoProtoType at this point.
   // We don't even know how to mangle their types anyway :).
   const FunctionProtoType *FT = FD->getType()->castAs<FunctionProtoType>();

   // extern "C" functions can hold entities that must be mangled.
   // As it stands, these functions still need to get expressed in the full
   // external name. They have their class and type omitted, replaced with '9'.
   if (ShouldMangle) {
     // We would like to mangle all extern "C" functions using this additional
     // component but this would break compatibility with MSVC's behavior.
     // Instead, do this when we know that compatibility isn't important (in
     // other words, when it is an overloaded extern "C" function).
     if (FD->isExternC() && FD->hasAttr<OverloadableAttr>())
       Out << "$$J0";

+    if (Arm64ECDef)
+      Out << "$$h";
     mangleFunctionClass(FD);

     mangleFunctionType(FT, FD, false, false);
   } else {
     Out << '9';
   }
 }

[3,339 lines not shown]
       if (SL->isWide())
         MangleByte(GetBigEndianByte(I));
       else
         MangleByte(GetLittleEndianByte(I));
   }

   Mangler.getStream() << '@';
 }

+void MicrosoftMangleContextImpl::mangleArm64ECFnDef(GlobalDecl GD,
+                                                    raw_ostream &Out) {
+  const FunctionDecl *D = cast<FunctionDecl>(GD.getDecl());
+  PrettyStackTraceDecl CrashInfo(D, SourceLocation(),
+                                 getASTContext().getSourceManager(),
+                                 "Mangling Arm64EC function def");
+
+  if (!shouldMangleCXXName(D)) {
+    Out << '#' << D->getName();
+    return;
+  }
+
+  msvc_hashing_ostream MHO(Out);
+  MicrosoftCXXNameMangler Mangler(*this, MHO);
+  return Mangler.mangleArm64ECFnDef(GD);
+}
+
+void MicrosoftCXXNameMangler::mangleArm64ECFnDef(GlobalDecl GD) {
+  Out << '?';
+  mangleName(GD);
+  mangleFunctionEncoding(GD, /*ShouldMangle*/true, /*Arm64ECDef*/true);
+}
+
 MicrosoftMangleContext *MicrosoftMangleContext::create(ASTContext &Context,
                                                        DiagnosticsEngine &Diags,
                                                        bool IsAux) {
   return new MicrosoftMangleContextImpl(Context, Diags, IsAux);
 }

clang/lib/CodeGen/CodeGenModule.cpp

[5,283 lines not shown]
   if (!GV || (GV->getValueType() != Ty))
     GV = cast<llvm::GlobalValue>(GetAddrOfFunction(GD, Ty, /*ForVTable=*/false,
                                                    /*DontDefer=*/true,
                                                    ForDefinition));

   // Already emitted.
   if (!GV->isDeclaration())
     return;

+  if (getTriple().isWindowsArm64EC()) {
+    // For ARM64EC targets, a function definition's name is mangled differently
+    // from the normal symbol. We then emit an alias from the normal
+    // symbol to the remangled definition.
+    // FIXME: MSVC uses IMAGE_WEAK_EXTERN_ANTI_DEPENDENCY, we just emit
+    // multiple definition symbols. Why does this matter?
+    // FIXME: For hybrid_patchable functions, the alias doesn't point
+    // to the function itself; it points to a stub for the compiler.
+    // FIXME: We also need to emit an entry thunk.
		bcl5980Unsubmitted Not Done Reply Inline Actions A headache thing here. We need to get the function definition with triple x64 to define entry thunk. For now the function definition here is aarch64 version. For example the case in Microsoft doc "Understanding Arm64EC ABI and assembly code": struct SC { char a; char b; char c; }; int fB(int a, double b, int i1, int i2, int i3); int fC(int a, struct SC c, int i1, int i2, int i3); int fA(int a, double b, struct SC c, int i1, int i2, int i3) { return fB(a, b, i1, i2, i3) + fC(a, c, i1, i2, i3); } x64 version IR for fA is: define dso_local i32 @fA(i32 noundef %a, double noundef %b, ptr nocapture noundef readonly %c, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) local_unnamed_addr #0 { ... } aarch64 version IR for fA is: define dso_local i32 @"#fA"(i32 noundef %a, double noundef %b, i64 %c.coerce, i32 noundef %i1, i32 noundef %i2, i32 noundef %i3) #0 {...} Arm64 will allow any size structure to be assigned to a register directly. x64 only allows sizes 1, 2, 4 and 8. Entry thunk follow x64 version function type. But we only have aarch64 version function type. I think the best way to do is create a x64 version codeGenModule and use the x64 CGM to generate the function type for entry thunk. But it is hard for me to do here. I tried a little but a lot of issues happen. One other way is only modify `AArch64ABIInfo::classifyArgumentType`, copy the x64 code into the function and add a flag to determine which version will the function use. It is easier but I'm not sure it is the only difference between x64 and aarch64. Maybe the classify return also need to do this. And it is not a clean way I think. bcl5980: A headache thing here. We need to get the function definition with triple x64 to define entry…
efriedma (Author): Oh, that's annoying... I hadn't considered the case of a struct of size 3/5/6/7.

Like I noted on D126811, attaching thunks to calls is tricky if we try to do it from clang.

Computing the right IR type shouldn't be that hard by itself; we can call into call lowering code in TargetInfo without modifying much else. (We just need a bit to tell the TargetInfo to redirect the call, like D125419. Use an entry point like CodeGenTypes::arrangeCall.) You don't need to mess with the type system or anything like that.

The problem is correctly representing the lowered call in IR; we really don't want to do lowering early because it will block optimizations. I considered using an operand bundle; we can probably make that work, but it's complicated, and probably disables some optimizations.

I think the best thing we can do here is add an IR attribute to mark arguments which are passed directly on AArch64, but need to be passed indirectly for the x64 ABI. Then AArch64Arm64ECCallLowering can check for the attribute and modify its behavior. This isn't really clean in the sense that it's specific to the x64/aarch64 pair of calling conventions, but I think the alternative is worse.
bcl5980: It looks like it is not only sizes 3/5/6/7: all sizes strictly larger than 8 and less than 16 also differ between the x86 ABI and the AArch64 ABI. Maybe we can emit a function declaration here for the x86 ABI thunk, then define it in Arm64ECCallLowering.
efriedma (Author): I think the sizes between 8 and 16 work correctly already? All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.

But that's not really important for the general question. We need to preserve the required semantics for both the AArch64 and x86 calling conventions. There are basically the following possibilities:

- We compute the declaration of the thunk in the frontend, and attach it to the call with an operand bundle. Like I mentioned, I don't want to go down this path: the operand bundle blocks optimizations, and it becomes more complicated for other code to generate arm64ec compatible calls.
- We don't compute the definition of the thunk in the frontend. Given that, the only other way to attach the information we need to the call is to use attributes. The simplest thing is probably to attach the attribute directly to the argument; name it "arm64ec-thunk-pass-indirect", or something like that. (I mean, we could compute the whole signature and stuff it into a string attribute, but that doesn't really seem like an improvement...)
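As an illustration of the size ranges being discussed, here is a minimal sketch, not code from this patch. It assumes the simplified rules stated in the thread: Windows x64 passes an aggregate by value only when its size is 1, 2, 4 or 8 bytes, while AArch64 passes aggregates of up to 16 bytes in registers. The helper names are hypothetical, and floating-point aggregates and register exhaustion are ignored.

```cpp
#include <vector>

// Windows x64 rule as stated in the thread: only aggregates of size
// 1, 2, 4 or 8 bytes are passed by value; everything else goes by reference.
bool x64PassesDirectly(unsigned size) {
  return size == 1 || size == 2 || size == 4 || size == 8;
}

// Simplifying assumption for this sketch: AArch64 passes any aggregate of
// 1..16 bytes in registers (HFAs and register exhaustion are ignored).
bool aarch64PassesDirectly(unsigned size) {
  return size >= 1 && size <= 16;
}

// Hypothetical helper: sizes that are direct on AArch64 but indirect on x64.
// These are the sizes where the two signatures of one function disagree.
std::vector<unsigned> mismatchedSizes() {
  std::vector<unsigned> sizes;
  for (unsigned s = 1; s <= 16; ++s)
    if (aarch64PassesDirectly(s) && !x64PassesDirectly(s))
      sizes.push_back(s);
  return sizes;
}
```

Under these assumptions the mismatched sizes are 3, 5, 6, 7 and 9 through 16. Whether the 9 to 16 range needs explicit marking is exactly the point under discussion in this thread.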
bcl5980:
> I think the sizes between 8 and 16 work correctly already? All sizes greater than 8 are passed indirectly on x86, and the thunk generation code accounts for that.

Yeah, the current code for the exit thunk already accounts for that. What I mean is that we need to mark the parameter because the entry thunk behavior is also different.

Maybe we can compute a mangled name like `$iexit_thunk$cdecl$i8$m6` or `$ientry_thunk$cdecl$m16$f` for the thunk function, then set attributes like "arm64ec-exitthunk"="$iexit_thunk$cdecl$i8$m6" "arm64ec-entrythunk"="$ientry_thunk$cdecl$m16$f" on the function. I think we can reconstruct the whole thunk from the mangled name. This should be a little easier.
efriedma (Author): Each function has an arm64 function signature, and a corresponding x64 signature. The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature. That part is the same whether we're generating an entry thunk, or an exit thunk. So I'm not sure why you're distinguishing between them in this context.

I'm not sure it makes sense to force the frontend to generate the mangled form, then make the backend demangle it. Seems more straightforward to just attach an attribute to an argument, and make the backend generate the mangled form?
bcl5980:
> Each function has an arm64 function signature, and a corresponding x64 signature. The frontend always generates the function with the arm64 signature, and thunk generation translates that to the x64 signature. That part is the same whether we're generating an entry thunk, or an exit thunk. So I'm not sure why you're distinguishing between them in this context.

I mean that the set of arguments that need to be marked differs between the entry thunk and the exit thunk. Both need arguments of size 3/5/6/7 to be marked. But when the size is larger than 8 and less than 16, the entry thunk still needs the mark while the exit thunk doesn't. So if we attach an attribute to an argument, we also need to consider the case of sizes larger than 8 and less than 16.

When a function has a 15-byte argument, the frontend coerces it to i64x2. If we don't attach an attribute for it, the backend can't generate the correct entry thunk because we have already lost the real size of the argument. The exit thunk doesn't need that, because the code for 15 bytes and 16 bytes is the same: store the i64x2 to memory, then pass the address.

This is part of the 15-byte entry thunk:

    mov fp,sp
    mov x10,x1
    ldr w1,[x10,#8]
    mov x19,x0
    ldur w8,[x10,#0xB]
    ldr x0,[x10]
    bfi x1,x8,#0x18,#0x20
    blr x9

This is part of the 16-byte entry thunk:

    mov fp,sp
    mov x8,x1
    mov x19,x0
    ldp x0,x1,[x8]
    blr x9
efriedma (Author): Oh, I see what you mean. The size of the memory is the same either way according to the ABI, but we're "cheating" a bit with exit thunks: nothing cares if we allocate extra memory for exit thunks, so we can emit slightly shorter code. But for entry thunks, if we read past the end, we could cause a fault.

(Realistically, we could probably get away with reading 16 bytes for entry thunks, and not the "right" amount. In practice, indirect arguments always point to memory on the stack, and there's always going to be something on the stack after that, so reading past the end will never fault. But if MSVC is conservative here, maybe we should be too.)

(On a related side-note, if we do want to generate the conservative sequence, the code MSVC is generating here is sort of inefficient; something like "ldur x1, [x1, #7]; lsr x1, x1, #8" is going to be faster than ldr+ldr+bfi.)

I don't think that means we need the frontend to generate different markings depending on whether we're dealing with entry or exit thunks, though. The thunk generation code can handle the difference transparently if we don't make the frontend mangle the thunk signature.
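To make the comparison of the two load sequences concrete, here is a small C++ model of how each one assembles the upper register (bytes 8..14 of a 15-byte argument). This is only a sketch: it assumes little-endian byte order, the function names are hypothetical, and it models the register arithmetic rather than anything the thunk generator emits.

```cpp
#include <cstdint>

// Reads n bytes (n <= 8) starting at p as a little-endian, zero-extended value.
static uint64_t loadLE(const unsigned char *p, unsigned n) {
  uint64_t v = 0;
  for (unsigned i = 0; i < n; ++i)
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  return v;
}

// Models the MSVC sequence:
//   ldr w1,[x10,#8]; ldur w8,[x10,#0xB]; bfi x1,x8,#0x18,#0x20
uint64_t highPartLdrLdurBfi(const unsigned char *arg) {
  uint64_t x1 = loadLE(arg + 8, 4);   // bytes 8..11, upper 32 bits zero
  uint64_t x8 = loadLE(arg + 11, 4);  // bytes 11..14
  // bfi: insert the low 32 bits of x8 into bits [55:24] of x1.
  const uint64_t mask = 0xFFFFFFFFull << 24;
  return (x1 & ~mask) | ((x8 << 24) & mask);
}

// Models the shorter alternative: one unaligned 8-byte load at offset 7,
// followed by a logical shift right by 8 bits.
uint64_t highPartLdurShift(const unsigned char *arg) {
  return loadLE(arg + 7, 8) >> 8;     // bytes 7..14, then drop byte 7
}
```

Both functions touch no byte beyond offset 14, so neither reads past the end of the 15-byte object, and both leave bytes 8..14 in the low 56 bits with the top byte zero.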
		SmallString<256> MangledName;
		llvm::raw_svector_ostream Out(MangledName);
		getCXXABI().getMangleContext().mangleArm64ECFnDef(GD, Out);
		auto *Alias = llvm::GlobalAlias::create("", GV);
		Alias->takeName(GV);
		GV->setName(MangledName);
		}

		// We need to set linkage and visibility on the function before
		// generating code for it because various parts of IR generation
		// want to propagate this information down (e.g. to local static
		// declarations).
		auto *Fn = cast<llvm::Function>(GV);
		setFunctionLinkage(GD, Fn);

		// FIXME: this is redundant with part of setFunctionDefinitionAttributes

clang/test/CodeGen/arm64ec.c

This file was added.

				// RUN: %clang_cc1 -no-opaque-pointers -triple arm64ec-windows-msvc -emit-llvm -o - %s | FileCheck %s

				// CHECK: @g = alias void ([2 x float], [4 x float]), void ([2 x float], [4 x float])* @"#g"
				// CHECK: define dso_local void @"#g"
				typedef struct { float x[2]; } A;
				typedef struct { float x[4]; } B;
				void g(A a, B b) { }

clang/test/CodeGenCXX/arm64ec.cpp

This file was added.

				// RUN: %clang_cc1 -no-opaque-pointers -triple arm64ec-windows-msvc -emit-llvm -o - %s | FileCheck %s

				// CHECK: @"?g@@YAXUA@@UB@@@Z" = alias void ([2 x float], [4 x float]), void ([2 x float], [4 x float])* @"?g@@$$hYAXUA@@UB@@@Z"
				// CHECK: define dso_local void @"?g@@$$hYAXUA@@UB@@@Z"
				typedef struct { float x[2]; } A;
				typedef struct { float x[4]; } B;
				void g(A a, B b) { }
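
For readers comparing the two CHECK lines, the relationship between the normal name and the Arm64EC alternate name in these two tests can be sketched as plain string manipulation. This is not how `mangleArm64ECFnDef` is implemented; the helper names below are hypothetical, and the C++ helper assumes the simple form seen in the test (a free function in the global namespace, `?name@@<type encoding>`), where "$$h" ends up right after the first "@@".

```cpp
#include <string>

// C functions: the alternate name is the original name prefixed with '#'.
std::string arm64ecCName(const std::string &name) {
  return "#" + name;
}

// C++ functions (hypothetical helper, simple case only): insert "$$h" right
// after the first "@@" of an already MSVC-mangled global function name.
// Names that do not match this shape are returned unchanged.
std::string arm64ecCxxName(const std::string &mangled) {
  std::string::size_type pos = mangled.find("@@");
  if (pos == std::string::npos)
    return mangled;
  std::string out = mangled;
  out.insert(pos + 2, "$$h");
  return out;
}
```

Applied to the names in the tests above, `arm64ecCName("g")` gives `#g` and `arm64ecCxxName("?g@@YAXUA@@UB@@@Z")` gives `?g@@$$hYAXUA@@UB@@@Z`. Member functions, nested scopes and other manglings would need the real mangler.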