This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Zero out only modified registers
Abandoned · Public

Authored by void on Aug 17 2022, 2:40 PM.

Details

Summary

Zeroing out used, but not modified, registers can destroy the contents
of non-volatile registers. This can lead to programs crashing in
unpredictable ways.

Because of this, I'm wary of the -fzero-call-used-regs feature with the
"all-*" options. It would have to be used with great care, which means
it probably has an extremely limited use.

Link: https://github.com/KSPP/linux/issues/192

Diff Detail

Event Timeline

void created this revision. Aug 17 2022, 2:40 PM
Herald added a project: Restricted Project. Aug 17 2022, 2:40 PM
void requested review of this revision. Aug 17 2022, 2:40 PM
void edited the summary of this revision. (Show Details) Aug 17 2022, 2:46 PM

Zeroing out used, but not modified, registers can destroy the contents of non-volatile registers

Do you have a testcase? All the changes to the regression tests involve argument registers, which are not preserved across calls. (At least, by default; -mllvm -enable-ipra exists, but it's disabled at all optimization levels.)

void added a comment. Aug 17 2022, 3:52 PM

Zeroing out used, but not modified, registers can destroy the contents of non-volatile registers

Do you have a testcase? All the changes to the regression tests involve argument registers, which are not preserved across calls. (At least, by default; -mllvm -enable-ipra exists, but it's disabled at all optimization levels.)

I do, but it's rather large. I'll see if I can trim it down a bit.

void updated this revision to Diff 456790. Aug 30 2022, 2:04 PM

Update testcase.

void added a comment. Aug 30 2022, 2:09 PM

Zeroing out used, but not modified, registers can destroy the contents of non-volatile registers

Do you have a testcase? All the changes to the regression tests involve argument registers, which are not preserved across calls. (At least, by default; -mllvm -enable-ipra exists, but it's disabled at all optimization levels.)

I think this should be done now. PTAL.

Can you give an example of a function that is miscompiled without your patch, and is not miscompiled with your patch? I don't see anything obviously wrong with the way we're compiling the given testcases without the patch.

llvm/test/CodeGen/X86/zero-call-used-regs.ll
26–38

consider adding the nounwind fn attr to your test functions to eliminate all these obnoxious/verbose CFI directives stemming from -fasynchronous-unwind-tables. That should help with readability of the tests significantly.

void added inline comments. Aug 31 2022, 10:58 AM
llvm/test/CodeGen/X86/zero-call-used-regs.ll
26–38

It didn't work. :-/

void updated this revision to Diff 457028. Aug 31 2022, 11:06 AM

Remove uwtable

void updated this revision to Diff 457336. Sep 1 2022, 11:27 AM

Update testcase.

void added a comment. Sep 1 2022, 11:29 AM

Can you give an example of a function that is miscompiled without your patch, and is not miscompiled with your patch? I don't see anything obviously wrong with the way we're compiling the given testcases without the patch.

I believe I have it now. I had to find the issue again... PTAL

llvm/test/CodeGen/X86/zero-call-used-regs.ll
26–38

It didn't work. :-/

It was uwtable. I removed it.

llvm/test/CodeGen/X86/zero-call-used-regs.ll
7

I like that the existing test cases are concise; it makes it easier to understand the behavior for the many different values of zero-call-used-regs.

Rather than update all of the existing tests, would you mind adding your new test case as a new test fn to the file? I suspect since this is affecting the kernel, we'd only need to test used-gpr?

That should significantly reduce the diffstat to this test file to help reviewers better understand the change.

26–38

ah, I'll keep an eye out for that fn attr in the future, too. Thanks for pinpointing it.

Can you give an example of a function that is miscompiled without your patch, and is not miscompiled with your patch? I don't see anything obviously wrong with the way we're compiling the given testcases without the patch.

I believe I have it now. I had to find the issue again... PTAL

What am I looking for? I still don't see anything obviously wrong.

void added a comment. Sep 1 2022, 12:17 PM

Can you give an example of a function that is miscompiled without your patch, and is not miscompiled with your patch? I don't see anything obviously wrong with the way we're compiling the given testcases without the patch.

I believe I have it now. I had to find the issue again... PTAL

What am I looking for? I still don't see anything obviously wrong.

We shouldn't be zeroing out non-volatile registers that aren't modified by the function. In this case, we shouldn't be zeroing %rdi. Note that %rsi is marked as "modified" by the asm statements.

we shouldn't be zeroing %rdi

I don't see how it's a problem if we zero rdi. Even if we don't modify rdi, the caller doesn't have any way of knowing it wasn't modified. The calling convention says it's caller-save.

What am I looking for? I still don't see anything obviously wrong.

We shouldn't be zeroing out non-volatile registers that aren't modified by the function. In this case, we shouldn't be zeroing %rdi. Note that %rsi is marked as "modified" by the asm statements.

rsi and rdi are not in any of the clobber lists...or is that what registers are being allocated for the inline asm inputs/outputs? Perhaps a MIR test would make that more obvious? Or an explicit clobber in these test cases?

void added a comment. Sep 1 2022, 3:00 PM

we shouldn't be zeroing %rdi

I don't see how it's a problem if we zero rdi. Even if we don't modify rdi, the caller doesn't have any way of knowing it wasn't modified. The calling convention says it's caller-save.

This is just a test case. It's not meant to be a definitive reproduction of the original bug, just to show the same symptoms.

void added a comment. Sep 1 2022, 3:02 PM

What am I looking for? I still don't see anything obviously wrong.

We shouldn't be zeroing out non-volatile registers that aren't modified by the function. In this case, we shouldn't be zeroing %rdi. Note that %rsi is marked as "modified" by the asm statements.

rsi and rdi are not in any of the clobber lists...or is that what registers are being allocated for the inline asm inputs/outputs? Perhaps a MIR test would make that more obvious? Or an explicit clobber in these test cases?

Right, they aren't. Perhaps I should just write a testcase that uses inline asm and a clobber list instead of trying to scrape together something from the Linux sources.

we shouldn't be zeroing %rdi

I don't see how it's a problem if we zero rdi. Even if we don't modify rdi, the caller doesn't have any way of knowing it wasn't modified. The calling convention says it's caller-save.

This is just a test case. It's not meant to be a definitive reproduction of the original bug, just to show the same symptoms.

The thing is, I don't see the connection between this "symptom" and any possible bug. rdi is callee-clobbered, and the caller has no way of knowing whether the implementation actually writes to rdi, therefore it can't be a bug to add a write to rdi.

void added a comment (edited). Sep 1 2022, 3:35 PM

we shouldn't be zeroing %rdi

I don't see how it's a problem if we zero rdi. Even if we don't modify rdi, the caller doesn't have any way of knowing it wasn't modified. The calling convention says it's caller-save.

This is just a test case. It's not meant to be a definitive reproduction of the original bug, just to show the same symptoms.

The thing is, I don't see the connection between this "symptom" and any possible bug. rdi is callee-clobbered, and the caller has no way of knowing whether the implementation actually writes to rdi, therefore it can't be a bug to add a write to rdi.

I understand. Here's an example of what's going on, at least with the Linux kernel. A function is defined like this:

u64 __attribute__((no_instrument_function)) _paravirt_ident_64(u64 x)
{
        return x;
}

/* ... */

struct paravirt_patch_template pv_ops = {
  /* ... */
  .mmu.pmd_val = ((struct paravirt_callee_save) { _paravirt_ident_64 }),
  /* ... */
};

The generated code is like this:

        .text
        .globl  _paravirt_ident_64              # -- Begin function _paravirt_ident_64
        .p2align        4, 0x90
        .type   _paravirt_ident_64,@function
_paravirt_ident_64:                     # @_paravirt_ident_64
.Lfunc_begin2:
        .loc    0 85 0                          # arch/x86/kernel/paravirt.c:85:0
        .cfi_startproc
# %bb.0:
        #DEBUG_VALUE: _paravirt_ident_64:x <- $rdi
        movq    %rdi, %rax
.Ltmp22:
        .loc    0 86 2 prologue_end             # arch/x86/kernel/paravirt.c:86:2
        xorl    %edi, %edi
.Ltmp23:
        #DEBUG_VALUE: _paravirt_ident_64:x <- $rax
        jmp     __x86_return_thunk              # TAILCALL
.Ltmp24:
.Lfunc_end2:
        .size   _paravirt_ident_64, .Lfunc_end2-_paravirt_ident_64
        .cfi_endproc

As you mentioned, this *should* be okay, as %rdi is caller-saved. However, when it's used in a call, like this:

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((no_instrument_function)) pmdval_t
pmd_val(pmd_t pmd)
{
        return ({
                unsigned long __edi = __edi, __esi = __esi, __edx = __edx,
                              __ecx = __ecx, __eax = __eax;
                ;
                ((void)pv_ops.mmu.pmd_val.func);
                asm volatile("# ALT: oldnstr\n"
                             "661:\n\t"
                             "771:\n\t"
                             "999:\n\t"
                             ".pushsection .discard.retpoline_safe\n\t"
                              ".quad  999b\n\t"
                             ".popsection\n\t"
                             "call *%[paravirt_opptr];\n772:\n"
                             ".pushsection .parainstructions,\"a\"\n"
                              ".balign 8 \n"
                              ".quad  771b\n"
                             "  .byte "
                             "%c[paravirt_typenum]\n"
                             "  .byte 772b-771b\n"
                             "  .short %c[paravirt_clobber]\n"
                             ".popsection\n"
                             "\n662:\n"
                             "# ALT: padding\n"
                             ".skip -(((6651f-6641f)-(662b-661b)) > 0) * ((6651f-6641f)-(662b-661b)),0x90\n"
                             "663:\n"
                             ".pushsection .altinstructions,\"a\"\n"
                             " .long 661b - .\n"
                             " .long 6641f - .\n"
                             " .word ((( 8*32+16)) | (1 << 15))\n"
                             " .byte 663b-661b\n"
                             " .byte 6651f-6641f\n"
                             ".popsection\n"
                             ".pushsection .altinstr_replacement, \"ax\"\n"
                             "# ALT: replacement 1\n"
                             "6641:\n\t"
                             "mov %%rdi, %%rax\n"
                             "6651:\n"
                             ".popsection\n"
                             : "=a"(__eax), "+r"(current_stack_pointer)
                             : [paravirt_typenum] "i"(
                                       (__builtin_offsetof(
                                                struct paravirt_patch_template,
                                                mmu.pmd_val.func) /
                                        sizeof(void *))),
                               [paravirt_opptr] "m"(pv_ops.mmu.pmd_val.func),
                               [paravirt_clobber] "i"(((1 << 0))),
                               "D"((unsigned long)(pmd.pmd))
                             : "memory", "cc");
                ({
                        unsigned long __mask = ~0UL;
                        do {
                                __attribute__((__noreturn__)) extern void
                                __compiletime_assert_29(void) __attribute__((__error__(
                                        "BUILD_BUG_ON failed: "
                                        "sizeof(pmdval_t) > sizeof(unsigned long)")));
                                if (!(!(sizeof(pmdval_t) >
                                        sizeof(unsigned long))))
                                        __compiletime_assert_29();
                        } while (0);
                        switch (sizeof(pmdval_t)) {
                        case 1:
                                __mask = 0xffUL;
                                break;
                        case 2:
                                __mask = 0xffffUL;
                                break;
                        case 4:
                                __mask = 0xffffffffUL;
                                break;
                        default:
                                break;
                        }
                        __mask &__eax;
                });
        });
}

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((no_instrument_function)) pmdval_t
native_pmd_val(pmd_t pmd)
{
        return pmd.pmd;
}       

static inline __attribute__((__gnu_inline__)) __attribute__((__unused__))
__attribute__((no_instrument_function)) int
pmd_none(pmd_t pmd)
{
        unsigned long val = native_pmd_val(pmd);
        return (val & ~((((pteval_t)(1)) << 6) | (((pteval_t)(1)) << 5))) == 0;
}


int pmd_huge(pmd_t pmd)
{
        return !pmd_none(pmd) && (pmd_val(pmd) & ((((pteval_t)(1)) << 0) |
                                                  (((pteval_t)(1)) << 7))) !=
                                         (((pteval_t)(1)) << 0);
}

The code generated is something like this:

        .globl  pmd_huge                        # -- Begin function pmd_huge
        .p2align        4, 0x90
        .type   pmd_huge,@function
pmd_huge:                               # @pmd_huge
.Lfunc_begin0:
        .loc    0 28 0                          # arch/x86/mm/hugetlbpage.c:28:0
        .cfi_sections .debug_frame
        .cfi_startproc
# %bb.0:
        callq   __fentry__
.Ltmp1:
        #DEBUG_VALUE: pmd_huge:pmd <- $rdi
        #DEBUG_VALUE: pmd_none:pmd <- $rdi
        #DEBUG_VALUE: pmd_none:val <- $rdi
        .file   161 "./arch/x86/include/asm" "pgtable.h" md5 0x203fee1b33aab4131e5eb70cbeeb9a3d
        .loc    161 793 41 prologue_end         # ./arch/x86/include/asm/pgtable.h:793:41
        testq   $-97, %rdi
.Ltmp2:
        .loc    0 29 24                         # arch/x86/mm/hugetlbpage.c:29:24
        je      .LBB0_1
.Ltmp3:
# %bb.2:
        #DEBUG_VALUE: pmd_huge:pmd <- $rdi
        #DEBUG_VALUE: pmd_val:pmd <- $rdi
        #DEBUG_VALUE: __edi <- undef
        #DEBUG_VALUE: __esi <- undef
        #DEBUG_VALUE: __edx <- undef
        #DEBUG_VALUE: __ecx <- undef
        #DEBUG_VALUE: __eax <- undef
        .file   162 "./arch/x86/include/asm" "paravirt.h" md5 0x1f771689a20be93009d7ef8574461232
        .loc    162 458 9                       # ./arch/x86/include/asm/paravirt.h:458:9
        #APP
        # ALT: oldnstr
.Ltmp4:
.Ltmp5:
.Ltmp6:
        .section        .discard.retpoline_safe,"",@progbits
        .quad   .Ltmp6
        .text

        callq   *pv_ops+536(%rip)

.Ltmp7:
        .section        .parainstructions,"a",@progbits
        .p2align        3, 0x0
        .quad   .Ltmp5
        .byte   67
        .byte   .Ltmp7-.Ltmp5
        .short  1
        .text


.Ltmp8:
        # ALT: padding
        .zero   (-(((.Ltmp9-.Ltmp10)-(.Ltmp8-.Ltmp4))>0))*((.Ltmp9-.Ltmp10)-(.Ltmp8-.Ltmp4)),144
.Ltmp11:
        .section        .altinstructions,"a",@progbits
.Ltmp12:
        .long   .Ltmp4-.Ltmp12
.Ltmp13:
        .long   .Ltmp10-.Ltmp13
        .short  33040
        .byte   .Ltmp11-.Ltmp4
        .byte   .Ltmp9-.Ltmp10
        .text

        .section        .altinstr_replacement,"ax",@progbits
        # ALT: replacement 1
.Ltmp10:
        movq    %rdi, %rax
.Ltmp9:
        .text


        #NO_APP
        movq    %rax, %rcx
.Ltmp14:
        #DEBUG_VALUE: __eax <- $rax
        .loc    0 30 17                         # arch/x86/mm/hugetlbpage.c:30:17
        andl    $129, %ecx
        .loc    0 30 46 is_stmt 0               # arch/x86/mm/hugetlbpage.c:30:46
        xorl    %eax, %eax
.Ltmp15:
        cmpl    $1, %ecx
        setne   %al
        jmp     .LBB0_3
.Ltmp16:
.LBB0_1:
        #DEBUG_VALUE: pmd_huge:pmd <- $rdi
        #DEBUG_VALUE: pmd_none:pmd <- $rdi
        #DEBUG_VALUE: pmd_none:val <- $rdi
        .loc    0 0 46                          # arch/x86/mm/hugetlbpage.c:0:46
        xorl    %eax, %eax
.Ltmp17:
.LBB0_3:
        #DEBUG_VALUE: pmd_huge:pmd <- $rdi
        .loc    0 29 2 is_stmt 1                # arch/x86/mm/hugetlbpage.c:29:2
        xorl    %ecx, %ecx
        xorl    %edi, %edi
.Ltmp18:
        #DEBUG_VALUE: pmd_huge:pmd <- [DW_OP_LLVM_entry_value 1] $rdi
        jmp     __x86_return_thunk              # TAILCALL
.Ltmp19:
.Lfunc_end0:
        .size   pmd_huge, .Lfunc_end0-pmd_huge
        .cfi_endproc
                                        # -- End function

The important bit is the movq %rdi, %rax right after the callq *pv_ops+536(%rip) instruction. The %rdi register isn't preserved across the call; it's zeroed out, so the return value, which pmd_val had already placed in %rax, is overwritten with zero. My thinking was that we shouldn't zero out registers that aren't modified in the called function, because the caller doesn't expect them to be stomped on. Perhaps there's something else going on?

The important bit is the movq %rdi, %rax right after the callq *pv_ops+536(%rip) instruction.

I read that clump of assembly as "if paravirtualization is on, call *pv_ops+536(%rip); if it's off, replace the call with movq %rdi, %rax". There isn't any expectation that RDI is preserved.

At least, that's my understanding of the intent. It looks like the asm doesn't mark up its outputs/clobbers correctly, though; it needs to mark all the registers the calling convention says are caller-saved (rcx, rdx, rdi, rsi, r8, r9, r10, r11) as outputs or clobbers. Not sure if that ends up mattering here.

void added a comment. Sep 1 2022, 4:24 PM

The important bit is the movq %rdi, %rax right after the callq *pv_ops+536(%rip) instruction.

I read that clump of assembly as "if paravirtualization is on, call *pv_ops+536(%rip); if it's off, replace the call with movq %rdi, %rax". There isn't any expectation that RDI is preserved.

At least, that's my understanding of the intent. It looks like the asm doesn't mark up its outputs/clobbers correctly, though; it needs to mark all the registers the calling convention says are caller-saved (rcx, rdx, rdi, rsi, r8, r9, r10, r11) as outputs or clobbers. Not sure if that ends up mattering here.

However, if I comment out xorl %edi, %edi in paravirt_ident_64 the kernel boots up. So *something* is expecting rdi's value to be retained. (This isn't the only calling place, of course.)

Looking at the original source file, is pmd_huge getting inlined? If it is, the missing output/clobber operands probably matter.

void added a comment. Sep 1 2022, 4:44 PM

Looking at the original source file, is pmd_huge getting inlined? If it is, the missing output/clobber operands probably matter.

pmd_huge isn't inlined, but other callers are. Let me see if adding proper output/clobbers helps.

void added a comment. Sep 1 2022, 5:20 PM

Looking at the original source file, is pmd_huge getting inlined? If it is, the missing output/clobber operands probably matter.

When I add a "=D" constraint the kernel boots up...*sigh* That may have been it...

void abandoned this revision. Sep 2 2022, 1:03 PM

Looking at the original source file, is pmd_huge getting inlined? If it is, the missing output/clobber operands probably matter.

When I add a "=D" constraint the kernel boots up...*sigh* That may have been it...

I'm going to abandon this change. It looks like the error is in the Linux kernel itself. Thanks for all of your help!