This is an archive of the discontinued LLVM Phabricator instance.

llvm/lib/Target/X86/X86ISelLowering.cpp
3077	Is there some risk here? Previously this was only used to truncate i1.
llvm/test/CodeGen/X86/musttail-varargs.ll
348	I have some doubts about the implementation. Does the caller always need to zero-extend the pointer even if it knows the upper bits are zero? From line 309, the caller seems to know the upper bits are zero.

pengfei added inline comments. Nov 12 2020, 10:08 PM

llvm/test/CodeGen/X86/x32-function_pointer-2.ll
16–17	This file should not be affected, right?
llvm/test/CodeGen/X86/x86-64-sret-return.ll
13	I saw that the other tests all use 64-bit instructions. In which cases may we use 32-bit instructions?

hvdijk added inline comments. Nov 13 2020, 1:20 AM

llvm/lib/Target/X86/X86ISelLowering.cpp
3077	If `LocVT` and `ValVT` are different, it is the responsibility of this function to take a `LocVT` DAG node and convert it to a `ValVT` DAG node, and LLVM will assert if it fails to do so. If `isExtInLoc()` returns true, `LocVT` and `ValVT` will be different, and there is no other code to handle that conversion, so any non-bit-vector extensions of return values would already be broken and would fail that assert. I think the fact that it did not cause problems before is simply because it never came up before for any type other than bit vectors.
llvm/test/CodeGen/X86/musttail-varargs.ll
348	This is a general missing optimization in LLVM that affects targets other than X86 as well. You are correct that this `movl %edi, %edi` is not needed if the high bits of `%rdi` are already guaranteed to be zero, but LLVM can lose that information. I have not looked at this case in detail, but I have previously seen this be a problem where a node is a copy of a function parameter: since it is not the function parameter, merely a copy of its value, the fact that it is a zero-extended i32 value is not available. Since the generated code is correct, just suboptimal, I did not think it necessary to fix that at the same time.
llvm/test/CodeGen/X86/x32-function_pointer-2.ll
16–17	This file is affected because `callq` can take either the copy of `%rsi` like it did before, or `%rsi` directly since it knows it is already zero-extended.
llvm/test/CodeGen/X86/x86-64-sret-return.ll
13	When we need to copy one 64-bit register to another 64-bit register, we can use the 64-bit move instructions to preserve the high bits, or the 32-bit move instructions to zero out the high bits. When the high bits are known to be zero, the 64-bit move instructions and the 32-bit move instructions have the same effect, since zeroing out the high bits when they are already zero does nothing.
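
As a rough illustration of the last point, here is a minimal standalone C++ sketch (not LLVM code) that models a 64-bit register as an integer. The function names `movq` and `movl` are only model helpers chosen for this example; they mimic the architectural rule that writing a 32-bit register in 64-bit mode clears bits 32 to 63 of the full register.

```cpp
#include <cstdint>

// Model of a 64-bit general-purpose register as a plain integer.
using Reg64 = std::uint64_t;

// Model of `movq %src, %dst`: all 64 bits are copied unchanged.
constexpr Reg64 movq(Reg64 src) { return src; }

// Model of `movl %src32, %dst32`: only the low 32 bits are copied and
// bits 32..63 of the destination register are cleared.
constexpr Reg64 movl(Reg64 src) { return src & 0xFFFFFFFFull; }

// High bits already zero: both moves leave the same value in the register.
static_assert(movq(0x12345678ull) == movl(0x12345678ull),
              "moves agree when the high bits are zero");

// High bits set: movq preserves them, movl zeroes them.
static_assert(movq(0xDEADBEEF12345678ull) == 0xDEADBEEF12345678ull,
              "movq preserves the high bits");
static_assert(movl(0xDEADBEEF12345678ull) == 0x12345678ull,
              "movl clears the high bits");
```

Under this model, the choice between the two moves only matters when the source may carry non-zero high bits, which is why either form is acceptable in the updated test.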

xbolva00 added a subscriber: xbolva00. Nov 13 2020, 1:56 AM

xbolva00 added inline comments.

llvm/test/CodeGen/X86/sibcall.ll
83–84	Adjust fixme?

Remove no longer necessary FIXME comment

hvdijk marked an inline comment as done. Nov 13 2020, 4:12 AM

hvdijk added inline comments.

llvm/test/CodeGen/X86/sibcall.ll
83–84	Good spot. This comment applied to the `movl %edi, %eax` that is no longer generated, so I have simply removed the comment.

Harbormaster completed remote builds in B78741: Diff 305095. Nov 13 2020, 4:53 AM

pengfei added inline comments. Nov 13 2020, 5:02 AM

llvm/test/CodeGen/X86/x32-function_pointer-2.ll
16–17	I see. But this test should also pass without the patch, since you have only loosened the check conditions. I think it is better to make the conditions stricter. Or at least the `[[REG:.*]]` in line 16 is not needed.

Remove no longer used [[REG:.*]] from test

llvm/test/CodeGen/X86/x32-function_pointer-2.ll
16–17	Since both `%rsi` and `%r[[REG]]` are equally valid, without any reason why LLVM should prefer one over the other, I did not want to have a check that only permitted one of them, since later changes to LLVM could arbitrarily change it. Ideally, the check would be for `%r{si|[[REG]]}`, but I think FileCheck does not support that. So out of those options, I would say remove the `[[REG:.*]]`; I have updated the patch to do that.

Hi Harald, thanks for thoroughly answering my questions. I still have doubts about the intention of this patch. Is there any bug related to it?
I had a look at the ABI and found it says "ILP32 binaries ... should conform to small code model or small position independent code model ...". IMHO, the ILP32 mode is designed for performance, which always assumes that address bits 63~32 are all zero and uses 32-bit registers to reduce code size. Although the extension guarantees correctness, I think it may go against the original intention of the design.
Nevertheless, I am not an expert on the ABI, so I am adding @hjl.tools for a review.

pengfei added a reviewer: hjl.tools. Nov 13 2020, 7:53 PM

In D91338#2395497, @pengfei wrote:

Hi Harald, thanks for thoroughly answering my questions. I still have doubts about the intention of this patch. Is there any bug related to it?

I run an x32 system. Things work well for things built with GCC, but break badly when using LLVM, because the ABI as implemented is incompatible between GCC and LLVM, as GCC-generated code will sometimes assume that the high bits of pointer parameters have been zeroed out as required by the ABI. This is especially visible when using Rust applications, as the Rust compiler is LLVM-based but most of the rest of my system is built with GCC. Pretty much any non-trivial Rust application crashes, for instance ripgrep, or the Rust compiler itself.

IMHO, the ILP32 mode is designed for performance, which always assumes that address bits 63~32 are all zero and uses 32-bit registers to reduce code size.

Using 32-bit registers is not always possible and, when it is possible, it increases rather than reduces code size. When instructions can take either a 32-bit register or a 64-bit register, the 64-bit register is what you get by default; the 32-bit register requires an extra byte. For instance, `mov (%edi),%eax` is encoded as 0x67 0x8B 0x07, but `mov (%rdi),%eax` is encoded as just 0x8B 0x07. The idea of the ABI, I believe, is that called functions should be able to use 64-bit registers for memory operations. That said, I have not done any work to ensure that 64-bit registers are used where possible; both GCC and LLVM with my patch do generate code that unnecessarily uses 32-bit registers.
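
To make the size argument concrete, here is a small standalone C++ sketch that reproduces the two encodings quoted above. It is not taken from LLVM; `Encoding`, `Reg`, `modRM`, and `encodeLoad32` are hypothetical names for this example, and the sketch only covers the simplest addressing form (register-indirect with no displacement, low registers only, so no REX prefix is needed).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: up to three encoded bytes plus the byte count.
struct Encoding {
  std::uint8_t bytes[3];
  std::size_t size;
};

// Hardware numbers of the registers this sketch supports. SP (4) and BP (5)
// are left out on purpose: with ModRM.mod == 00 those values select the SIB
// and RIP-relative forms instead of plain register-indirect addressing.
enum Reg : std::uint8_t { AX = 0, CX = 1, DX = 2, BX = 3, SI = 6, DI = 7 };

// ModRM byte for mod == 00: reg field in bits 3..5, base register in bits 0..2.
constexpr std::uint8_t modRM(Reg dst, Reg base) {
  return static_cast<std::uint8_t>((dst << 3) | base);
}

// Encode a 32-bit load `mov (base), dst` in 64-bit mode (opcode 0x8B).
// Addressing through a 32-bit base register needs the 0x67 address-size
// override prefix; addressing through the 64-bit register is the default.
constexpr Encoding encodeLoad32(Reg dst, Reg base, bool addr32) {
  return addr32 ? Encoding{{0x67, 0x8B, modRM(dst, base)}, 3}
                : Encoding{{0x8B, modRM(dst, base), 0x00}, 2};
}

// mov (%edi),%eax -> 0x67 0x8B 0x07
static_assert(encodeLoad32(AX, DI, true).size == 3, "prefix adds one byte");
static_assert(encodeLoad32(AX, DI, true).bytes[0] == 0x67, "address-size prefix");
static_assert(encodeLoad32(AX, DI, true).bytes[2] == 0x07, "ModRM for (%edi),%eax");

// mov (%rdi),%eax -> 0x8B 0x07
static_assert(encodeLoad32(AX, DI, false).size == 2, "no prefix needed");
static_assert(encodeLoad32(AX, DI, false).bytes[1] == 0x07, "same ModRM byte");
```

The only difference between the two forms is the leading 0x67 byte, which is the one-byte cost referred to above.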

Was there ever a follow-up on https://lkml.org/lkml/2018/12/10/1145 ?

In D91338#2395610, @lebedev.ri wrote:

Was there ever a follow-up on https://lkml.org/lkml/2018/12/10/1145 ?

None that I am aware of. The current Linux kernel (5.9) fully supports x32 and does not mark it as broken, deprecated, or anything like that in any way.

In D91338#2395608, @hvdijk wrote:

In D91338#2395497, @pengfei wrote:

Hi Harald, thanks for thoroughly answering my questions. I still have doubts about the intention of this patch. Is there any bug related to it?

I run an x32 system. Things work well for things built with GCC, but break badly when using LLVM, because the ABI as implemented is incompatible between GCC and LLVM, as GCC-generated code will sometimes assume that the high bits of pointer parameters have been zeroed out as required by the ABI. This is especially visible when using Rust applications, as the Rust compiler is LLVM-based but most of the rest of my system is built with GCC. Pretty much any non-trivial Rust application crashes, for instance ripgrep, or the Rust compiler itself.

IMHO, the ILP32 mode is designed for performance, which always assumes that address bits 63~32 are all zero and uses 32-bit registers to reduce code size.

ILP32 psABI has

10.1 Parameter Passing

When a value of pointer type is returned or passed in a register, bits 32 to 63 shall be zero.

It is required for ILP32 since many system calls are shared between ILP32 and LP64.
Here is my x32 port from 2016.
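
The consequence of that psABI rule can be sketched with a small standalone C++ model. This is an illustration only, not code from GCC, LLVM, or the kernel; all four function names are hypothetical. It shows why a callee that addresses memory through the full 64-bit register depends on the caller having cleared bits 32 to 63.

```cpp
#include <cstdint>

// Conforming caller: the 32-bit pointer is zero-extended into the 64-bit
// argument register, so bits 32..63 are zero as the psABI requires.
constexpr std::uint64_t passZeroExtended(std::uint32_t ptr) { return ptr; }

// Assumed non-conforming caller: only the low 32 bits are written and
// whatever was previously in bits 32..63 of the register is left behind.
constexpr std::uint64_t passWithStaleHighBits(std::uint32_t ptr,
                                              std::uint64_t previous) {
  return (previous & 0xFFFFFFFF00000000ull) | ptr;
}

// Callee that addresses through the full register, e.g. `mov (%rdi), %eax`.
constexpr std::uint64_t addressVia64BitReg(std::uint64_t reg) { return reg; }

// Callee that addresses through the 32-bit register, e.g. `mov (%edi), %eax`.
constexpr std::uint64_t addressVia32BitReg(std::uint64_t reg) {
  return reg & 0xFFFFFFFFull;
}

// With a conforming caller both addressing forms reach the same address.
static_assert(addressVia64BitReg(passZeroExtended(0x1000u)) == 0x1000ull,
              "64-bit addressing is safe after zero extension");

// With stale high bits, 64-bit addressing reaches the wrong address, while
// 32-bit addressing happens to mask the problem.
static_assert(addressVia64BitReg(passWithStaleHighBits(0x1000u, ~0ull)) !=
                  0x1000ull,
              "stale high bits corrupt the effective address");
static_assert(addressVia32BitReg(passWithStaleHighBits(0x1000u, ~0ull)) ==
                  0x1000ull,
              "32-bit addressing ignores the high bits");
```

In this model the failure only appears when the two sides disagree, which matches the report above of GCC-built and LLVM-built code breaking when mixed.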


Thanks H.J. It's amazing that you already did so much work on x32 support. But these patches do not seem to have been merged to mainline. What is the reason for not merging them? Does that mean the trunk code still has some flaws in its x32 ABI support?

In D91338#2395907, @pengfei wrote:

It is required for ILP32 since many system calls are shared between ILP32 and LP64.
Here is my x32 port from 2016.

Thanks H.J. It's amazing that you already did so much work on x32 support. But these patches do not seem to have been merged to mainline. What is the reason for not merging them? Does that mean the trunk code still has some flaws in its x32 ABI support?

I never finished x32 work. You can use my branch as a reference when working on x32.

In D91338#2395918, @hjl.tools wrote:

I never finished x32 work. You can use my branch as a reference when working on x32.

I wish I'd known about your branch before I started my work that this is part of (https://lists.llvm.org/pipermail/llvm-dev/2020-October/146049.html) :) Thanks, that may prove very useful for the problems that still remain.

@pengfei With the confirmation that we do need this, could you take another look? Does this look okay now?

In D91338#2421551, @hvdijk wrote:

@pengfei With the confirmation that we do need this, could you take another look? Does this look okay now?

I'm OK with this patch. Can you please check whether the expensive checks are happy with it?

This revision is now accepted and ready to land. Nov 29 2020, 5:24 PM

Closed by commit rGcdac34bd47a3: [X86] Zero-extend pointers to i64 for x86_64 (authored by hvdijk). Nov 30 2020, 10:51 AM

This revision was automatically updated to reflect the committed changes.

hvdijk added a commit: rGcdac34bd47a3: [X86] Zero-extend pointers to i64 for x86_64.

In D91338#2421852, @pengfei wrote:

I'm OK with this patch. Can you please check whether the expensive checks are happy with it?

Thanks, I saw tests pass when expensive checks are enabled so pushed it.

Revision Contents

Path	Size
llvm/lib/Target/X86/X86CallingConv.cpp	10 lines
llvm/lib/Target/X86/X86CallingConv.td	6 lines
llvm/lib/Target/X86/X86ISelLowering.cpp	3 lines
llvm/test/CodeGen/X86/ (five files whose names are not shown)	11 lines, 1 line, 1 line, 2 lines, 49 lines
llvm/test/CodeGen/X86/x32-function_pointer-2.ll	6 lines
llvm/test/CodeGen/X86/x86-64-sret-return.ll	10 lines

Diff 308424

llvm/lib/Target/X86/X86CallingConv.cpp

[...] static bool CC_X86_Intr(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
   // X86FrameLowering::getFrameIndexReference, not here.
   if (Is64Bit && ArgCount == 2)
     Offset += SlotSize;

   State.addLoc(CCValAssign::getMem(ValNo, ValVT, Offset, LocVT, LocInfo));
   return true;
 }

+static bool CC_X86_64_Pointer(unsigned &ValNo, MVT &ValVT, MVT &LocVT,
+                              CCValAssign::LocInfo &LocInfo,
+                              ISD::ArgFlagsTy &ArgFlags, CCState &State) {
+  if (LocVT != MVT::i64) {
+    LocVT = MVT::i64;
+    LocInfo = CCValAssign::ZExt;
+  }
+  return false;
+}
+
 // Provides entry points of CC_X86 and RetCC_X86.
 #include "X86GenCallingConv.inc"
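
The way `CC_X86_64_Pointer` interacts with the rules that follow it can be sketched outside LLVM. The model below is an assumption-laden simplification, not the real `CCState` machinery: `VT`, `LocInfo`, `Reg`, `Assignment`, `promotePointer`, and `assignFirstArg` are hypothetical names, and only the first integer argument register is modeled. The point it illustrates is that the custom handler rewrites the location type and returns `false`, meaning "not assigned yet", so that a later rule picks the register based on the promoted type.

```cpp
#include <cstdint>

enum class VT : std::uint8_t { i32, i64 };
enum class LocInfo : std::uint8_t { Full, ZExt };
enum class Reg : std::uint8_t { EDI, RDI };

// Hypothetical record of where one argument ends up.
struct Assignment {
  VT valVT;      // type of the value in the IR
  VT locVT;      // type of the location it is passed in
  LocInfo info;  // how the value is extended to fill the location
  Reg reg;       // register chosen by the later, type-based rule
};

// Modeled on the shape of CC_X86_64_Pointer in the patch: promote the
// location to i64 with zero extension, then report "not handled" so the
// remaining rules still run.
constexpr bool promotePointer(VT &locVT, LocInfo &info) {
  if (locVT != VT::i64) {
    locVT = VT::i64;
    info = LocInfo::ZExt;
  }
  return false;
}

// Simplified rule chain for the first integer argument only.
constexpr Assignment assignFirstArg(VT valVT, bool isPointer) {
  VT locVT = valVT;
  LocInfo info = LocInfo::Full;
  if (isPointer)
    promotePointer(locVT, info);  // returns false: keep going
  // Later rule: i32 locations use EDI, i64 locations use RDI.
  Reg reg = (locVT == VT::i64) ? Reg::RDI : Reg::EDI;
  return Assignment{valVT, locVT, info, reg};
}

// A 32-bit pointer is passed zero-extended in the full 64-bit register.
static_assert(assignFirstArg(VT::i32, true).reg == Reg::RDI, "pointer uses RDI");
static_assert(assignFirstArg(VT::i32, true).info == LocInfo::ZExt, "zero-extended");
// A plain i32 is unaffected and still uses the 32-bit register.
static_assert(assignFirstArg(VT::i32, false).reg == Reg::EDI, "i32 uses EDI");
```

Returning `false` rather than assigning a register directly keeps the register choice in one place, which is presumably why the patch uses a custom handler in front of the existing rules instead of duplicating them.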

llvm/lib/Target/X86/X86CallingConv.td

[...] def RetCC_X86_64_C : CallingConv<[
   // The X86-64 calling convention always returns FP values in XMM0.
   CCIfType<[f32], CCAssignToReg<[XMM0, XMM1]>>,
   CCIfType<[f64], CCAssignToReg<[XMM0, XMM1]>>,
   CCIfType<[f128], CCAssignToReg<[XMM0, XMM1]>>,

   // MMX vector types are always returned in XMM0.
   CCIfType<[x86mmx], CCAssignToReg<[XMM0, XMM1]>>,

+  // Pointers are always returned in full 64-bit registers.
+  CCIfPtr<CCCustom<"CC_X86_64_Pointer">>,
+
   CCIfSwiftError<CCIfType<[i64], CCAssignToReg<[R12]>>>,

   CCDelegateTo<RetCC_X86Common>
 ]>;

 // X86-Win64 C return-value convention.
 def RetCC_X86_Win64_C : CallingConv<[
   // The X86-Win64 calling convention always returns __m64 values in RAX.
[...] def CC_X86_64_C : CallingConv<[

   // A SwiftError is passed in R12.
   CCIfSwiftError<CCIfType<[i64], CCAssignToReg<[R12]>>>,

   // For Swift Calling Convention, pass sret in %rax.
   CCIfCC<"CallingConv::Swift",
     CCIfSRet<CCIfType<[i64], CCAssignToReg<[RAX]>>>>,

+  // Pointers are always passed in full 64-bit registers.
+  CCIfPtr<CCCustom<"CC_X86_64_Pointer">>,
+
   // The first 6 integer arguments are passed in integer registers.
   CCIfType<[i32], CCAssignToReg<[EDI, ESI, EDX, ECX, R8D, R9D]>>,
   CCIfType<[i64], CCAssignToReg<[RDI, RSI, RDX, RCX, R8 , R9 ]>>,

   // The first 8 MMX vector arguments are passed in XMM registers on Darwin.
   CCIfType<[x86mmx],
     CCIfSubtarget<"isTargetDarwin()",
     CCIfSubtarget<"hasSSE2()",
[...]

llvm/lib/Target/X86/X86ISelLowering.cpp

[...] if (VA.needsCustom()) {
       InFlag = Chain.getValue(2);
     }

     if (RoundAfterCopy)
       Val = DAG.getNode(ISD::FP_ROUND, dl, VA.getValVT(), Val,
                         // This truncation won't change the value.
                         DAG.getIntPtrConstant(1, dl));

-    if (VA.isExtInLoc() && (VA.getValVT().getScalarType() == MVT::i1)) {
+    if (VA.isExtInLoc()) {
       if (VA.getValVT().isVector() &&
+          VA.getValVT().getScalarType() == MVT::i1 &&
           ((VA.getLocVT() == MVT::i64) || (VA.getLocVT() == MVT::i32) ||
            (VA.getLocVT() == MVT::i16) || (VA.getLocVT() == MVT::i8))) {
         // promoting a mask type (v*i1) into a register of type i64/i32/i16/i8
         Val = lowerRegToMasks(Val, VA.getValVT(), VA.getLocVT(), dl, DAG);
       } else
         Val = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), Val);
     }

     if (VA.getLocInfo() == CCValAssign::BCvt)
       Val = DAG.getBitcast(VA.getValVT(), Val);

     InVals.push_back(Val);
   }
[...]
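
The effect of moving the `i1` test from the outer condition to the inner one can be checked with a small standalone model. This is a sketch only: `Action`, `ValueAssign`, `actionBefore`, and `actionAfter` are hypothetical names, and each boolean field stands in for the corresponding `VA` query in the hunk above rather than calling any real LLVM API.

```cpp
// What the lowering code does with a returned value in this model.
enum class Action { None, LowerRegToMasks, Truncate };

// Stand-ins for the queries made on the CCValAssign in the hunk above.
struct ValueAssign {
  bool extInLoc;        // VA.isExtInLoc()
  bool isVector;        // VA.getValVT().isVector()
  bool scalarIsI1;      // VA.getValVT().getScalarType() == MVT::i1
  bool locIsScalarInt;  // VA.getLocVT() is one of i64/i32/i16/i8
};

// Condition before the patch: nothing happens unless the scalar type is i1.
constexpr Action actionBefore(ValueAssign va) {
  return !(va.extInLoc && va.scalarIsI1)      ? Action::None
         : (va.isVector && va.locIsScalarInt) ? Action::LowerRegToMasks
                                              : Action::Truncate;
}

// Condition after the patch: every extended value is converted back, and
// the i1 test only selects the mask-lowering path.
constexpr Action actionAfter(ValueAssign va) {
  return !va.extInLoc ? Action::None
         : (va.isVector && va.scalarIsI1 && va.locIsScalarInt)
             ? Action::LowerRegToMasks
             : Action::Truncate;
}

// A 32-bit pointer returned zero-extended in a 64-bit location.
constexpr ValueAssign x32Pointer{true, false, false, true};
// A v*i1 mask promoted into an integer register.
constexpr ValueAssign maskVector{true, true, true, true};
// A scalar i1 extended into a wider location.
constexpr ValueAssign scalarI1{true, false, true, true};

// Only the pointer case changes: it used to be left unconverted.
static_assert(actionBefore(x32Pointer) == Action::None, "old: no conversion");
static_assert(actionAfter(x32Pointer) == Action::Truncate, "new: truncated");
static_assert(actionBefore(maskVector) == actionAfter(maskVector), "unchanged");
static_assert(actionBefore(scalarI1) == actionAfter(scalarI1), "unchanged");
```

In this model the previously handled cases behave identically before and after, which is consistent with the reply that the old condition simply never met an extended non-`i1` value.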

llvm/test/CodeGen/X86/musttail-varargs.ll

[...]
 ; LINUX-X32-NEXT: .cfi_offset %r14, -32
 ; LINUX-X32-NEXT: .cfi_offset %r15, -24
 ; LINUX-X32-NEXT: .cfi_offset %rbp, -16
 ; LINUX-X32-NEXT: movq %r9, %r15
 ; LINUX-X32-NEXT: movq %r8, %r12
 ; LINUX-X32-NEXT: movq %rcx, %r13
 ; LINUX-X32-NEXT: movq %rdx, %rbp
 ; LINUX-X32-NEXT: movq %rsi, %rbx
-; LINUX-X32-NEXT: movl %edi, %r14d
+; LINUX-X32-NEXT: movq %rdi, %r14
 ; LINUX-X32-NEXT: movb %al, {{[-0-9]+}}(%e{{[sb]}}p) # 1-byte Spill
 ; LINUX-X32-NEXT: testb %al, %al
 ; LINUX-X32-NEXT: je .LBB0_2
 ; LINUX-X32-NEXT: # %bb.1:
 ; LINUX-X32-NEXT: movaps %xmm0, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm1, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm2, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm3, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm4, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm5, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm6, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movaps %xmm7, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: .LBB0_2:
 ; LINUX-X32-NEXT: movq %rbx, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movq %rbp, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movq %r13, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movq %r12, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movq %r15, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: leal {{[0-9]+}}(%rsp), %eax
 ; LINUX-X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: leal {{[0-9]+}}(%rsp), %eax
 ; LINUX-X32-NEXT: movl %eax, {{[0-9]+}}(%esp)
 ; LINUX-X32-NEXT: movabsq $206158430216, %rax # imm = 0x3000000008
 ; LINUX-X32-NEXT: movq %rax, {{[0-9]+}}(%esp)
-; LINUX-X32-NEXT: movl %r14d, %edi
+; LINUX-X32-NEXT: movq %r14, %rdi
 ; LINUX-X32-NEXT: movaps %xmm7, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm6, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm5, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm4, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm3, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm2, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm1, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: movaps %xmm0, {{[-0-9]+}}(%e{{[sb]}}p) # 16-byte Spill
 ; LINUX-X32-NEXT: callq get_f
 ; LINUX-X32-NEXT: movl %eax, %r11d
-; LINUX-X32-NEXT: movl %r14d, %edi
+; LINUX-X32-NEXT: movq %r14, %rdi
 ; LINUX-X32-NEXT: movq %rbx, %rsi
 ; LINUX-X32-NEXT: movq %rbp, %rdx
 ; LINUX-X32-NEXT: movq %r13, %rcx
 ; LINUX-X32-NEXT: movq %r12, %r8
 ; LINUX-X32-NEXT: movq %r15, %r9
 ; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm0 # 16-byte Reload
 ; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm1 # 16-byte Reload
 ; LINUX-X32-NEXT: movaps {{[-0-9]+}}(%e{{[sb]}}p), %xmm2 # 16-byte Reload
[...]

 define void @g_thunk(i8* %fptr_i8, ...) {
 ; LINUX-LABEL: g_thunk:
 ; LINUX: # %bb.0:
 ; LINUX-NEXT: jmpq *%rdi # TAILCALL
 ;
 ; LINUX-X32-LABEL: g_thunk:
 ; LINUX-X32: # %bb.0:
-; LINUX-X32-NEXT: movl %edi, %r11d
-; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
+; LINUX-X32-NEXT: jmpq *%rdi # TAILCALL
 ;
 ; WINDOWS-LABEL: g_thunk:
 ; WINDOWS: # %bb.0:
 ; WINDOWS-NEXT: rex64 jmpq *%rcx # TAILCALL
 ;
 ; X86-LABEL: g_thunk:
 ; X86: # %bb.0:
 ; X86-NEXT: movl {{[0-9]+}}(%esp), %eax
[...]
 ; LINUX-NEXT: jmpq *%r11 # TAILCALL
 ;
 ; LINUX-X32-LABEL: h_thunk:
 ; LINUX-X32: # %bb.0:
 ; LINUX-X32-NEXT: cmpb $1, (%edi)
 ; LINUX-X32-NEXT: jne .LBB2_2
 ; LINUX-X32-NEXT: # %bb.1: # %then
 ; LINUX-X32-NEXT: movl 4(%edi), %r11d
+; LINUX-X32-NEXT: movl %edi, %edi
 ; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
 ; LINUX-X32-NEXT: .LBB2_2: # %else
 ; LINUX-X32-NEXT: movl 8(%edi), %r11d
 ; LINUX-X32-NEXT: movl $42, {{.*}}(%rip)
+; LINUX-X32-NEXT: movl %edi, %edi
 ; LINUX-X32-NEXT: jmpq *%r11 # TAILCALL
 ;
 ; WINDOWS-LABEL: h_thunk:
 ; WINDOWS: # %bb.0:
 ; WINDOWS-NEXT: cmpb $1, (%rcx)
 ; WINDOWS-NEXT: jne .LBB2_2
 ; WINDOWS-NEXT: # %bb.1: # %then
 ; WINDOWS-NEXT: movq 8(%rcx), %rax
[...]

llvm/test/CodeGen/X86/pr38865-2.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -O0 -mtriple=x86_64-unknown-linux-gnux32 | FileCheck %s

 target datalayout = "e-m:e-p:32:32-i64:64-f80:128-n8:16:32:64-S128"

 %struct.a = type { i8 }

 define void @_Z1bv(%struct.a* noalias sret(%struct.a) %agg.result) {
 ; CHECK-LABEL: _Z1bv:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: pushq %rax
 ; CHECK-NEXT: .cfi_def_cfa_offset 16
+; CHECK-NEXT: # kill: def $edi killed $edi killed $rdi
 ; CHECK-NEXT: movl %edi, %eax
 ; CHECK-NEXT: movl %eax, {{[-0-9]+}}(%e{{[sb]}}p) # 4-byte Spill
 ; CHECK-NEXT: callq _Z1bv
 ; CHECK-NEXT: movl {{[-0-9]+}}(%e{{[sb]}}p), %eax # 4-byte Reload
 ; CHECK-NEXT: popq %rcx
 ; CHECK-NEXT: .cfi_def_cfa_offset 8
 ; CHECK-NEXT: retq
 entry:
   call void @_Z1bv(%struct.a* sret(%struct.a) %agg.result)
   ret void
 }

llvm/test/CodeGen/X86/pr38865-3.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc -show-mc-encoding < %s | FileCheck %s

 target datalayout = "e-m:e-p:32:32-i64:64-f80:128-n8:16:32:64-S128"
 target triple = "x86_64-unknown-linux-gnux32"

 define void @foo(i8* %x) optsize {
 ; CHECK-LABEL: foo:
 ; CHECK: # %bb.0:
 ; CHECK-NEXT: movl $707406378, %eax # encoding: [0xb8,0x2a,0x2a,0x2a,0x2a]
 ; CHECK-NEXT: # imm = 0x2A2A2A2A
 ; CHECK-NEXT: movl $32, %ecx # encoding: [0xb9,0x20,0x00,0x00,0x00]
+; CHECK-NEXT: # kill: def $edi killed $edi killed $rdi
 ; CHECK-NEXT: rep;stosl %eax, %es:(%edi) # encoding: [0xf3,0x67,0xab]
 ; CHECK-NEXT: retq # encoding: [0xc3]
   call void @llvm.memset.p0i8.i32(i8* align 4 %x, i8 42, i32 128, i1 false)
   ret void
 }
 declare void @llvm.memset.p0i8.i32(i8*, i8, i32, i1)

llvm/test/CodeGen/X86/pr38865.ll

[...]

 define void @e() nounwind {
 ; CHECK-LABEL: e:
 ; CHECK: # %bb.0: # %entry
 ; CHECK-NEXT: pushq %rbx # encoding: [0x53]
 ; CHECK-NEXT: subl $528, %esp # encoding: [0x81,0xec,0x10,0x02,0x00,0x00]
 ; CHECK-NEXT: # imm = 0x210
 ; CHECK-NEXT: leal {{[0-9]+}}(%rsp), %ebx # encoding: [0x8d,0x9c,0x24,0x08,0x01,0x00,0x00]
-; CHECK-NEXT: movl %ebx, %edi # encoding: [0x89,0xdf]
+; CHECK-NEXT: movq %rbx, %rdi # encoding: [0x48,0x89,0xdf]
 ; CHECK-NEXT: movl $c, %esi # encoding: [0xbe,A,A,A,A]
 ; CHECK-NEXT: # fixup A - offset: 1, value: c, kind: FK_Data_4
 ; CHECK-NEXT: movl $260, %edx # encoding: [0xba,0x04,0x01,0x00,0x00]
 ; CHECK-NEXT: # imm = 0x104
 ; CHECK-NEXT: callq memcpy # encoding: [0xe8,A,A,A,A]
 ; CHECK-NEXT: # fixup A - offset: 1, value: memcpy-4, kind: FK_PCRel_4
 ; CHECK-NEXT: movl $32, %ecx # encoding: [0xb9,0x20,0x00,0x00,0x00]
 ; CHECK-NEXT: movl %esp, %edi # encoding: [0x89,0xe7]
[...]

llvm/test/CodeGen/X86/sibcall.ll

[...]
 ; X64-LABEL: t4:
 ; X64: # %bb.0:
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: xorl %edi, %edi
 ; X64-NEXT: jmpq *%rax # TAILCALL
 ;
 ; X32-LABEL: t4:
 ; X32: # %bb.0:
-; X32-NEXT: movl %edi, %eax
+; X32-NEXT: movq %rdi, %rax
 ; X32-NEXT: xorl %edi, %edi
 ; X32-NEXT: jmpq *%rax # TAILCALL
   tail call void %x(i32 0) nounwind
   ret void
 }

-; FIXME: This isn't needed since x32 psABI specifies that callers must
-; zero-extend pointers passed in registers.
-
 define void @t5(void ()* nocapture %x) nounwind ssp {
 ; X86-LABEL: t5:
 ; X86: # %bb.0:
 ; X86-NEXT: jmpl *{{[0-9]+}}(%esp) # TAILCALL
 ;
 ; X64-LABEL: t5:
 ; X64: # %bb.0:
 ; X64-NEXT: jmpq *%rdi # TAILCALL
 ;
 ; X32-LABEL: t5:
 ; X32: # %bb.0:
-; X32-NEXT: movl %edi, %eax
-; X32-NEXT: jmpq *%rax # TAILCALL
+; X32-NEXT: jmpq *%rdi # TAILCALL
   tail call void %x() nounwind
   ret void
 }

 ; Basically the same test as t5, except pass the function pointer on the stack
 ; for x86_64.

 define void @t5_x64(i32, i32, i32, i32, i32, i32, void ()* nocapture %x) nounwind ssp {
[...]
 ; X64-LABEL: t9:
 ; X64: # %bb.0: # %entry
 ; X64-NEXT: movq %rdi, %rax
 ; X64-NEXT: xorl %edi, %edi
 ; X64-NEXT: jmpq *%rax # TAILCALL
 ;
 ; X32-LABEL: t9:
 ; X32: # %bb.0: # %entry
-; X32-NEXT: movl %edi, %eax
+; X32-NEXT: movq %rdi, %rax
 ; X32-NEXT: xorl %edi, %edi
 ; X32-NEXT: jmpq *%rax # TAILCALL
 entry:
   %0 = bitcast i32 (i32)* %x to i16 (i32)*
   %1 = tail call signext i16 %0(i32 0) nounwind
   ret i16 %1
 }

[...]
 ; X32-NEXT: movq (%edi), %rcx
 ; X32-NEXT: movq 8(%edi), %rdx
 ; X32-NEXT: xorl %edi, %edi
 ; X32-NEXT: pushq %rax
 ; X32-NEXT: pushq %rdx
 ; X32-NEXT: pushq %rcx
 ; X32-NEXT: callq foo7
 ; X32-NEXT: addl $32, %esp
+; X32-NEXT: movl %eax, %eax
	; X32-NEXT: popq %rcx			; X32-NEXT: popq %rcx
	; X32-NEXT: retq			; X32-NEXT: retq
	entry:			entry:
	%0 = tail call fastcc %struct.ns* @foo7(%struct.cp* byval(%struct.cp) align 4 %yy, i8 signext 0) nounwind			%0 = tail call fastcc %struct.ns* @foo7(%struct.cp* byval(%struct.cp) align 4 %yy, i8 signext 0) nounwind
	ret %struct.ns* %0			ret %struct.ns* %0
	}			}

	; rdar://6195379			; rdar://6195379
	▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines
	; X64-NEXT: callq f			; X64-NEXT: callq f
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t15:			; X32-LABEL: t15:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: callq f			; X32-NEXT: callq f
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @f(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind			tail call fastcc void @f(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind
	ret void			ret void
	}			}

	▲ Show 20 Lines • Show All 157 Lines • ▼ Show 20 Lines
	; X64-NEXT: callq t21_f_sret			; X64-NEXT: callq t21_f_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret:			; X32-LABEL: t21_sret_to_sret:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: callq t21_f_sret			; X32-NEXT: callq t21_f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind			tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %agg.result) nounwind
	ret void			ret void
	}			}

	Show All 21 Lines
	; X64-NEXT: addq $16, %rsp			; X64-NEXT: addq $16, %rsp
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_alloca:			; X32-LABEL: t21_sret_to_sret_alloca:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: subl $16, %esp			; X32-NEXT: subl $16, %esp
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: movl %esp, %edi			; X32-NEXT: movl %esp, %edi
	; X32-NEXT: callq t21_f_sret			; X32-NEXT: callq t21_f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: addl $16, %esp			; X32-NEXT: addl $16, %esp
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	%a = alloca %struct.foo, align 8			%a = alloca %struct.foo, align 8
	tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind			tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
	Show All 21 Lines
	; X64-NEXT: callq f_sret			; X64-NEXT: callq f_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_more_args:			; X32-LABEL: t21_sret_to_sret_more_args:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: callq f_sret			; X32-NEXT: callq f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @f_sret(%struct.foo* noalias sret(%struct.foo) %agg.result, i32 %a, i32 %b) nounwind			tail call fastcc void @f_sret(%struct.foo* noalias sret(%struct.foo) %agg.result, i32 %a, i32 %b) nounwind
	ret void			ret void
	}			}

	Show All 18 Lines
	; X64-NEXT: callq t21_f_sret			; X64-NEXT: callq t21_f_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_second_arg_sret:			; X32-LABEL: t21_sret_to_sret_second_arg_sret:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %esi, %ebx			; X32-NEXT: movq %rsi, %rbx
	; X32-NEXT: movl %esi, %edi			; X32-NEXT: movq %rsi, %rdi
	; X32-NEXT: callq t21_f_sret			; X32-NEXT: callq t21_f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %ret) nounwind			tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %ret) nounwind
	ret void			ret void
	}			}

	Show All 23 Lines
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_more_args2:			; X32-LABEL: t21_sret_to_sret_more_args2:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %esi, %eax			; X32-NEXT: movl %esi, %eax
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: movl %edx, %esi			; X32-NEXT: movl %edx, %esi
	; X32-NEXT: movl %eax, %edx			; X32-NEXT: movl %eax, %edx
	; X32-NEXT: callq f_sret			; X32-NEXT: callq f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @f_sret(%struct.foo* noalias sret(%struct.foo) %agg.result, i32 %b, i32 %a) nounwind			tail call fastcc void @f_sret(%struct.foo* noalias sret(%struct.foo) %agg.result, i32 %b, i32 %a) nounwind
	ret void			ret void
	Show All 21 Lines
	; X64-NEXT: callq t21_f_sret			; X64-NEXT: callq t21_f_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_args_mismatch:			; X32-LABEL: t21_sret_to_sret_args_mismatch:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: movl %esi, %edi			; X32-NEXT: movq %rsi, %rdi
	; X32-NEXT: callq t21_f_sret			; X32-NEXT: callq t21_f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %ret) nounwind			tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %ret) nounwind
	ret void			ret void
	}			}

	Show All 18 Lines
	; X64-NEXT: callq t21_f_sret			; X64-NEXT: callq t21_f_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_args_mismatch2:			; X32-LABEL: t21_sret_to_sret_args_mismatch2:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: movl %esi, %edi			; X32-NEXT: movq %rsi, %rdi
	; X32-NEXT: callq t21_f_sret			; X32-NEXT: callq t21_f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %ret) nounwind			tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %ret) nounwind
	ret void			ret void
	}			}

	Show All 20 Lines
	; X64-NEXT: callq t21_f_sret			; X64-NEXT: callq t21_f_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_arg_mismatch:			; X32-LABEL: t21_sret_to_sret_arg_mismatch:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: callq ret_struct			; X32-NEXT: callq ret_struct
	; X32-NEXT: movl %eax, %edi			; X32-NEXT: movl %eax, %edi
	; X32-NEXT: callq t21_f_sret			; X32-NEXT: callq t21_f_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	%a = call fastcc %struct.foo* @ret_struct()			%a = call fastcc %struct.foo* @ret_struct()
	tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind			tail call fastcc void @t21_f_sret(%struct.foo* noalias sret(%struct.foo) %a) nounwind
	Show All 32 Lines
	; X64-NEXT: movq %r14, %rax			; X64-NEXT: movq %r14, %rax
	; X64-NEXT: addq $8, %rsp			; X64-NEXT: addq $8, %rsp
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: popq %r14			; X64-NEXT: popq %r14
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_sret_structs_mismatch:			; X32-LABEL: t21_sret_to_sret_structs_mismatch:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbp			; X32-NEXT: pushq %r14
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: pushq %rax			; X32-NEXT: pushq %rax
	; X32-NEXT: movl %esi, %ebx			; X32-NEXT: movq %rsi, %rbx
	; X32-NEXT: movl %edi, %ebp			; X32-NEXT: movq %rdi, %r14
	; X32-NEXT: callq ret_struct			; X32-NEXT: callq ret_struct
	; X32-NEXT: movl %ebx, %edi
	; X32-NEXT: movl %eax, %esi			; X32-NEXT: movl %eax, %esi
				; X32-NEXT: movq %rbx, %rdi
	; X32-NEXT: callq t21_f_sret2			; X32-NEXT: callq t21_f_sret2
	; X32-NEXT: movl %ebp, %eax			; X32-NEXT: movl %r14d, %eax
	; X32-NEXT: addl $8, %esp			; X32-NEXT: addl $8, %esp
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: popq %rbp			; X32-NEXT: popq %r14
	; X32-NEXT: retq			; X32-NEXT: retq
	%b = call fastcc %struct.foo* @ret_struct()			%b = call fastcc %struct.foo* @ret_struct()
	tail call fastcc void @t21_f_sret2(%struct.foo* noalias sret(%struct.foo) %a, %struct.foo* noalias %b) nounwind			tail call fastcc void @t21_f_sret2(%struct.foo* noalias sret(%struct.foo) %a, %struct.foo* noalias %b) nounwind
	ret void			ret void
	}			}

	declare ccc %struct.foo* @ret_struct() nounwind			declare ccc %struct.foo* @ret_struct() nounwind

	Show All 17 Lines
	; X64-NEXT: callq t21_f_non_sret			; X64-NEXT: callq t21_f_non_sret
	; X64-NEXT: movq %rbx, %rax			; X64-NEXT: movq %rbx, %rax
	; X64-NEXT: popq %rbx			; X64-NEXT: popq %rbx
	; X64-NEXT: retq			; X64-NEXT: retq
	;			;
	; X32-LABEL: t21_sret_to_non_sret:			; X32-LABEL: t21_sret_to_non_sret:
	; X32: # %bb.0:			; X32: # %bb.0:
	; X32-NEXT: pushq %rbx			; X32-NEXT: pushq %rbx
	; X32-NEXT: movl %edi, %ebx			; X32-NEXT: movq %rdi, %rbx
	; X32-NEXT: callq t21_f_non_sret			; X32-NEXT: callq t21_f_non_sret
	; X32-NEXT: movl %ebx, %eax			; X32-NEXT: movl %ebx, %eax
	; X32-NEXT: popq %rbx			; X32-NEXT: popq %rbx
	; X32-NEXT: retq			; X32-NEXT: retq
	tail call fastcc void @t21_f_non_sret(%struct.foo* %agg.result) nounwind			tail call fastcc void @t21_f_non_sret(%struct.foo* %agg.result) nounwind
	ret void			ret void
	}			}

	Show All 35 Lines

llvm/test/CodeGen/X86/x32-function_pointer-2.ll

	; RUN: llc < %s -mtriple=x86_64-linux-gnux32 | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-linux-gnux32 | FileCheck %s
	; RUN: llc < %s -mtriple=x86_64-linux-gnux32 -fast-isel | FileCheck %s			; RUN: llc < %s -mtriple=x86_64-linux-gnux32 -fast-isel | FileCheck %s

	; Test call function pointer with function argument			; Test call function pointer with function argument
	;			;
	; void bar (void * h, void (*foo) (void *))			; void bar (void * h, void (*foo) (void *))
	; {			; {
	; foo (h);			; foo (h);
	; foo (h);			; foo (h);
	; }			; }


	define void @bar(i8* %h, void (i8*)* nocapture %foo) nounwind {			define void @bar(i8* %h, void (i8*)* nocapture %foo) nounwind {
	entry:			entry:
	tail call void %foo(i8* %h) nounwind			tail call void %foo(i8* %h) nounwind
	; CHECK: mov{{l|q}} %{{e|r}}si, %{{e|r}}[[REG:.*]]{{d?}}			; CHECK: mov{{l|q}} %{{e|r}}si,
	; CHECK: callq *%r[[REG]]			; CHECK: callq *%r
				pengfei: This file should not be affected, right?
				hvdijk (author): This file is affected because `callq` can take either the copy of `%rsi`, as it did before, or `%rsi` directly, since it knows `%rsi` is already zero-extended.
				pengfei: I see. But this test should also pass without the patch, since you have only loosened the check conditions. I think it would be better to make the conditions stricter. Or at least the `[[REG:.*]]` on line 16 is not needed.
				hvdijk (author): Since both `%rsi` and `%r[[REG]]` are equally valid, with no reason for LLVM to prefer one over the other, I did not want a check that permitted only one of them, because later changes to LLVM could arbitrarily change which one is chosen. Ideally, the check would be for `%r{si|[[REG]]}`, but I think FileCheck does not support that. So out of those options, I would say remove the `[[REG:.*]]`; I have updated the patch to do that.
	tail call void %foo(i8* %h) nounwind			tail call void %foo(i8* %h) nounwind
	; CHECK: jmpq *%r{{[^,]*}}			; CHECK: jmpq *%r
	ret void			ret void
	}			}
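
The trade-off discussed in the inline comments above can be sketched in Python. This is only a rough stand-in for FileCheck, not its actual matching engine: the helper names `passes_loose` and `passes_strict` and the two sample assembly snippets are hypothetical, and the regular expressions are hand translations of the old and new CHECK lines for the first call.

```python
import re

# Hand-translated patterns (assumption: a simplified model of FileCheck).
# New, loosened checks: any register may follow "callq *%r".
LOOSE = [r"mov[lq] %[er]si,", r"callq \*%r"]

# Old, strict checks: the register captured from the mov must be the one
# that callq uses, which mirrors the [[REG:.*]] / [[REG]] pair.
STRICT = re.compile(
    r"mov[lq] %[er]si, %[er](?P<reg>[a-z0-9]+?)d?\b.*?callq \*%r(?P=reg)",
    re.DOTALL,
)

def passes_loose(output: str) -> bool:
    """Each pattern must match after the end of the previous match."""
    pos = 0
    for pattern in LOOSE:
        m = re.compile(pattern).search(output, pos)
        if m is None:
            return False
        pos = m.end()
    return True

def passes_strict(output: str) -> bool:
    return STRICT.search(output) is not None

# Hypothetical outputs: call through the copy, or through %rsi directly.
via_copy = "movq %rsi, %rbx\ncallq *%rbx\n"
direct = "movq %rsi, %rbx\ncallq *%rsi\n"

print(passes_loose(via_copy), passes_loose(direct))
print(passes_strict(via_copy), passes_strict(direct))
```

In this model the loosened checks accept both outputs, while the strict form accepts only the call through the copied register. That is the behaviour the author wanted to avoid depending on.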

llvm/test/CodeGen/X86/x86-64-sret-return.ll

	; RUN: llc -mtriple=x86_64-apple-darwin8 < %s | FileCheck %s			; RUN: llc -mtriple=x86_64-apple-darwin8 < %s | FileCheck %s
	; RUN: llc -mtriple=x86_64-pc-linux < %s | FileCheck %s			; RUN: llc -mtriple=x86_64-pc-linux < %s | FileCheck %s
	; RUN: llc -mtriple=x86_64-pc-linux-gnux32 < %s | FileCheck -check-prefix=X32ABI %s			; RUN: llc -mtriple=x86_64-pc-linux-gnux32 < %s | FileCheck -check-prefix=X32ABI %s

	%struct.foo = type { [4 x i64] }			%struct.foo = type { [4 x i64] }

	; CHECK-LABEL: bar:			; CHECK-LABEL: bar:
	; CHECK: movq %rdi, %rax			; CHECK: movq %rdi, %rax

	; For the x32 ABI, pointers are 32-bit so 32-bit instructions will be used			; For the x32 ABI, pointers are 32-bit but passed in zero-extended to 64-bit
				; so either 32-bit or 64-bit instructions may be used.
	; X32ABI-LABEL: bar:			; X32ABI-LABEL: bar:
	; X32ABI: movl %edi, %eax			; X32ABI: mov{{l|q}} %{{r|e}}di, %{{r|e}}ax
				pengfei: I saw that the other tests all use 64-bit instructions. In which cases might a 32-bit instruction be used?
				hvdijk (author): When we need to copy one 64-bit register to another 64-bit register, we can use the 64-bit move instructions to preserve the high bits, or the 32-bit move instructions to zero out the high bits. When the high bits are known to be zero, the 64-bit and 32-bit move instructions have the same effect, since zeroing out high bits that are already zero does nothing.

	define void @bar(%struct.foo* noalias sret(%struct.foo) %agg.result, %struct.foo* %d) nounwind {			define void @bar(%struct.foo* noalias sret(%struct.foo) %agg.result, %struct.foo* %d) nounwind {
	entry:			entry:
	%d_addr = alloca %struct.foo* ; <%struct.foo**> [#uses=2]			%d_addr = alloca %struct.foo* ; <%struct.foo**> [#uses=2]
	%memtmp = alloca %struct.foo, align 8 ; <%struct.foo*> [#uses=1]			%memtmp = alloca %struct.foo, align 8 ; <%struct.foo*> [#uses=1]
	%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]			%"alloca point" = bitcast i32 0 to i32 ; <i32> [#uses=0]
	store %struct.foo* %d, %struct.foo** %d_addr			store %struct.foo* %d, %struct.foo** %d_addr
	%tmp = load %struct.foo*, %struct.foo** %d_addr, align 8 ; <%struct.foo*> [#uses=1]			%tmp = load %struct.foo*, %struct.foo** %d_addr, align 8 ; <%struct.foo*> [#uses=1]
	Show All 37 Lines

	return: ; preds = %entry			return: ; preds = %entry
	ret void			ret void
	}			}

	; CHECK-LABEL: foo:			; CHECK-LABEL: foo:
	; CHECK: movq %rdi, %rax			; CHECK: movq %rdi, %rax

	; For the x32 ABI, pointers are 32-bit so 32-bit instructions will be used			; For the x32 ABI, pointers are 32-bit but passed in zero-extended to 64-bit
				; so either 32-bit or 64-bit instructions may be used.
	; X32ABI-LABEL: foo:			; X32ABI-LABEL: foo:
	; X32ABI: movl %edi, %eax			; X32ABI: mov{{l|q}} %{{r|e}}di, %{{r|e}}ax

	define void @foo({ i64 }* noalias nocapture sret({ i64 }) %agg.result) nounwind {			define void @foo({ i64 }* noalias nocapture sret({ i64 }) %agg.result) nounwind {
	store { i64 } { i64 0 }, { i64 }* %agg.result			store { i64 } { i64 0 }, { i64 }* %agg.result
	ret void			ret void
	}			}
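
The reply above about 32-bit versus 64-bit moves can be illustrated with a minimal Python model of the two register-to-register forms. The function names and sample values are hypothetical; the only architectural fact assumed is that, on x86-64, writing a 32-bit register zeroes bits 32 to 63 of the containing 64-bit register.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1

def movq(src: int) -> int:
    """Model of a 64-bit register move: all 64 bits are copied."""
    return src & MASK64

def movl(src: int) -> int:
    """Model of a 32-bit register move: the low 32 bits are copied and
    bits 32..63 of the destination become zero."""
    return src & MASK32

# A pointer that was zero-extended to 64 bits, as the x32 ABI requires.
zero_extended = 0x00000000F7FFE000
# A register whose upper half holds unrelated data (hypothetical value).
dirty = 0xDEADBEEFF7FFE000

print(movl(zero_extended) == movq(zero_extended))  # same result
print(movl(dirty) == movq(dirty))                  # results differ
```

When the upper half is already zero, both forms give the same destination value, which is why the updated X32ABI checks accept either `movl` or `movq`.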

[X86] Zero-extend pointers to i64 for x86_64 (Closed, Public)

When a value of pointer type is returned or passed in a register, bits 32 to 63 shall be zero.

Revision Contents

Diff 308424

llvm/lib/Target/X86/X86CallingConv.cpp

llvm/lib/Target/X86/X86CallingConv.td

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/test/CodeGen/X86/musttail-varargs.ll

llvm/test/CodeGen/X86/pr38865-2.ll

llvm/test/CodeGen/X86/pr38865-3.ll

llvm/test/CodeGen/X86/pr38865.ll

llvm/test/CodeGen/X86/sibcall.ll

llvm/test/CodeGen/X86/x32-function_pointer-2.ll

llvm/test/CodeGen/X86/x86-64-sret-return.ll
