This is an archive of the discontinued LLVM Phabricator instance.

Differential D77849

[calcspillweights] mark LiveIntervals from INLINEASM_BR defs as not spillable
Needs Review · Public

Authored by nickdesaulniers on Apr 9 2020, 6:54 PM.

Details

Reviewers

void
efriedma
craig.topper
wmi
kparzysz
qcolombet
myatsina
arsenm

Summary

Typically, most of the existing callbr tests are in the form:

  callbr ...
    to label %fallthrough [label %indirect_target]
...
fallthrough:
...
indirect_target:
...

Optimizations such as loop-sink and mergeicmps have been observed to
produce callbr's in the form:

  callbr ...
    to label %fallthrough [label %indirect_target]
...
indirect_target:
...
fallthrough:
...

Note the order of the BasicBlocks is reversed. This generally isn't a
problem, until we encounter register pressure in this non-canonical
form.

Under register pressure, it has been observed that the greedy
register allocator will spill the registers defined in INLINEASM_BR
MachineInstrs whose LiveIntervals cross into the copy blocks
split during pre-RA ISel.

This is bad. Placing a register spill (i.e. a COPY) after a terminator
MachineInstr violates the MachineVerifier's invariant that every
MachineInstr following the first terminator in a MachineBasicBlock
must also be a terminator.
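
To make the invariant concrete, here is a minimal, self-contained sketch in plain C++. It does not use LLVM's own classes: `ToyInstr`, `terminatorsAreContiguousAtEnd`, and the two block builders are hypothetical names invented for illustration, and the real MachineVerifier checks far more than this.

```cpp
#include <string>
#include <vector>

// Toy stand-in for a MachineInstr (hypothetical, not LLVM's class):
// just a name and a terminator flag.
struct ToyInstr {
  std::string Name;
  bool IsTerminator;
};

// Returns true if no non-terminator appears after the first terminator,
// i.e. the block has the shape "non-terminators, then terminators".
bool terminatorsAreContiguousAtEnd(const std::vector<ToyInstr> &Block) {
  bool SeenTerminator = false;
  for (const ToyInstr &MI : Block) {
    if (MI.IsTerminator)
      SeenTerminator = true;
    else if (SeenTerminator)
      return false; // e.g. a spill COPY placed after an INLINEASM_BR
  }
  return true;
}

// A well-formed block: the INLINEASM_BR is the last instruction.
std::vector<ToyInstr> wellFormedBlock() {
  return {ToyInstr{"MOV32r0", false}, ToyInstr{"INLINEASM_BR", true}};
}

// The failure mode described above: a spill lands after the terminator.
std::vector<ToyInstr> blockWithSpillAfterTerminator() {
  return {ToyInstr{"MOV32r0", false}, ToyInstr{"INLINEASM_BR", true},
          ToyInstr{"COPY (spill)", false}};
}
```

In this model, `terminatorsAreContiguousAtEnd(wellFormedBlock())` holds, while the block with the trailing spill is rejected.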

Further, this leads to a cascading failure: much later, in
branch-folder, calls to TargetInstrInfo::analyzeBranch() fail to
properly classify a MachineBasicBlock. This triggers calls to
MachineBasicBlock::CorrectExtraCFGEdges(), which removes the indirect
successors of the MachineBasicBlock containing the INLINEASM_BR.
Finally, branch-folder removes the INLINEASM_BR indirect target
MachineBasicBlocks, as they appear to have no predecessors. This results
in references to labels that have been removed.

We should try harder to force these outputs to stay in registers and not
be spilled.

Some alternatives considered:

We could teach loop-sink and mergeicmps to not create "non-canonical" callbrs. I prefer to allow LLVM IR to be flexible; I don't think we should canonicalize an order to the BasicBlocks.
We could "canonicalize" the MachineBasicBlocks during pre-RA ISel, such that the MachineBasicBlock following the split block following a block terminated by an INLINEASM_BR was the fallthrough successor. So three MachineBasicBlocks that fallthrough in order. I prefer to allow arbitrary ordering of MachineBasicBlocks leading up to regalloc, which makes regalloc able to handle more generic cases.

It might seem bad to take away freedom from the register allocator, but
in this case, and especially in the canonical form, the
LiveIntervals are quite small, and we generally don't want a register
spill between the INLINEASM_BR and its first COPY to a GPR. Later passes
typically re-order the fallthrough case anyway.

A helpful command I used to observe regalloc in action for debugging
this:

llc -debug-only=calcspillweights -debug-only=spill-code-placement \
  -debug-only=greedy -debug-only=regalloc -regalloc=greedy \
  -stop-after=greedy -verify-regalloc -verify-machineinstrs \
  llvm/test/CodeGen/X86/callbr-asm-regalloc.ll

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nickdesaulniers created this revision. Apr 9 2020, 6:54 PM

Herald added a project: Restricted Project. Apr 9 2020, 6:54 PM

Herald added subscribers: llvm-commits, hiraditya, qcolombet, MatzeB.

Harbormaster failed remote builds in B52610: Diff 256475! Apr 9 2020, 7:16 PM

If I'm understanding correctly, the issue is that the INLINEASM_BR is producing a virtual register, we try to spill that register, and the spill is inserted in the same block as the INLINEASM_BR?

I can see two possible approaches here, at a high level:

1. Forbid directly spilling registers produced by an INLINEASM_BR; instead, force a live interval split, and then we can spill the split interval.
2. Allow spilling registers produced by an INLINEASM_BR, but teach the spill insertion code to insert the spill into the fallthrough successor, as opposed to the INLINEASM_BR block itself.

This patch is sort of along the lines of (1), but I'm not sure this is enough to make the interval split reliably (in which case, the allocator could potentially run out of registers). I haven't spent enough time with the spill weight code to say for sure.

nickdesaulniers added a reviewer: wmi. Apr 10 2020, 11:18 AM

efriedma added reviewers: kparzysz, arsenm, qcolombet. Apr 10 2020, 3:45 PM

Herald added a subscriber: wdng. Apr 10 2020, 3:45 PM

In D77849#1973733, @efriedma wrote:

> If I'm understanding correctly, the issue is that the INLINEASM_BR is producing a virtual register, we try to spill that register, and the spill is inserted in the same block as the INLINEASM_BR?
>
> I can see two possible approaches here, at a high level:
>
> 1. Forbid directly spilling registers produced by an INLINEASM_BR; instead, force a live interval split, and then we can spill the split interval.
> 2. Allow spilling registers produced by an INLINEASM_BR, but teach the spill insertion code to insert the spill into the fallthrough successor, as opposed to the INLINEASM_BR block itself.

Re: 2: it looks like InlineSpiller::insertSpill might be the most appropriate place to do that. It just picks the next MachineInstr to insert the spill before, even when doing so violates the invariant that only terminator MachineInstrs end a MachineBasicBlock (no non-terminator MachineInstrs after the first terminator). We could check that std::next(MI)->getOpcode() == TargetOpcode::INLINEASM_BR, or !std::next(MI)->isTerminator(), though I'm not sure what to do in the latter case.

That would do less work for the general case; the current approach checks the opcode of every MachineInstr that is the source of a LiveInterval, of which there are generally a lot. This would only check for the special case when we decided we needed to spill (which still occurs frequently for large MachineFunctions, but far less often than there are LiveIntervals). Let me see if I can update the patch to do that.

Re: 1: I totally do not understand. :( Are you referring to LiveIntervals::splitSeparateComponents?

> This patch is sort of along the lines of (1), but I'm not sure this is enough to make the interval split reliably (in which case, the allocator could potentially run out of registers). I haven't spent enough time with the spill weight code to say for sure.

Is there someone more active in this area you could cc for review? This bug is blocking the use of asm goto w/ outputs in the Linux kernel.

You cannot rely on marking ranges as unspillable. This isn't an option with fast regalloc.

llvm/test/CodeGen/X86/callbr-asm-regalloc.ll
10 ↗	(On Diff #256475)	This test is way too complicated. This really needs a MIR test

This revision now requires changes to proceed. Apr 10 2020, 4:25 PM

nickdesaulniers added inline comments. Apr 10 2020, 4:52 PM

llvm/test/CodeGen/X86/callbr-asm-regalloc.ll
10 ↗	(On Diff #256475)	That's fair. It would be better to more precisely represent and test the issue, which would also aid review. And it's time I learn to write some MIR tests. For dumping MIR, do I just do `llc -print-after=finalize-isel` and write that to a file? Is there an existing MIR test that you recommend I refer to in order to understand testing for cases where specific registers need to be or not to be spilled?

nickdesaulniers added inline comments. Apr 10 2020, 4:56 PM

llvm/test/CodeGen/X86/callbr-asm-regalloc.ll
10 ↗	(On Diff #256475)	Never mind, I found https://llvm.org/docs/MIRLangRef.html#mir-testing-guide. Looks like I have some reading to do.

convert .ll test to .mir test, though it fails during YAML parsing...?
will simplify test next week, and remove .ll test.

> We could check that std::next(MI)->getOpcode() == TargetOpcode::INLINEASM_BR, or !std::next(MI)->isTerminator(), though I'm not sure what to do in the latter case.

Probably you'd want to check whether the instruction that produces the value is a terminator (which at the moment, should imply it's an INLINEASM_BR).
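
A rough sketch of how approach (2) could combine with this check, written as a standalone toy model rather than against the real InlineSpiller: if the defining instruction is a terminator, the spill goes to the top of the fallthrough successor instead of directly after the def. All names here (`ToyBlock`, `InsertPoint`, `chooseSpillPoint`, `exampleCFG`) are hypothetical. The model also assumes the output is only needed on the fallthrough path and that a fallthrough successor exists.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; none of these are LLVM classes.
struct ToyInstr {
  std::string Name;
  bool IsTerminator;
};

struct ToyBlock {
  std::vector<ToyInstr> Instrs;
  int FallthroughSucc; // index of the fallthrough successor, or -1 if none
};

struct InsertPoint {
  int Block;         // block index (-1 means "no valid point in this model")
  std::size_t Index; // insert before the instruction at this index
};

// Pick where to spill the value defined by Blocks[B].Instrs[DefIdx].
InsertPoint chooseSpillPoint(const std::vector<ToyBlock> &Blocks, int B,
                             std::size_t DefIdx) {
  const ToyBlock &MBB = Blocks[B];
  // Ordinary def: spill immediately after the defining instruction.
  if (!MBB.Instrs[DefIdx].IsTerminator)
    return {B, DefIdx + 1};
  // Def is a terminator (assumed to be an INLINEASM_BR): spilling after it
  // would put a non-terminator behind a terminator, so use the top of the
  // fallthrough successor instead.
  return {MBB.FallthroughSucc, 0};
}

// Two blocks: block 0 ends in INLINEASM_BR and falls through to block 1.
std::vector<ToyBlock> exampleCFG() {
  ToyBlock Asm{{ToyInstr{"MOV32r0", false}, ToyInstr{"INLINEASM_BR", true}}, 1};
  ToyBlock Copy{{ToyInstr{"JMP_1", true}}, -1};
  return {Asm, Copy};
}
```

For the CFG above, spilling the INLINEASM_BR's def yields block 1, index 0, while spilling the MOV32r0's def stays in block 0 at index 1.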

> Re: 1: I totally do not understand. :( Are you referring to LiveIntervals::splitSeparateComponents?

I was going to try to explain a little more, but then I realized I'm not sure I understand how this would work in practice, so probably I just shouldn't try.

Harbormaster failed remote builds in B52764: Diff 256711! Apr 10 2020, 5:47 PM

Using LiveIntervals in any way is not an option to solve this. They are not available in all fast regalloc and other allocators. The constraints need to be expressed with copies that are terminators, and splitting blocks when necessary

nickdesaulniers added inline comments. Apr 10 2020, 6:32 PM

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir
6	huh, that's not yaml...

arsenm added inline comments. Apr 14 2020, 12:30 PM

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir
6	The debug printing format that -print-after gives you is not the same MIR output as -stop-after

update MIR test to something that runs, still need to clean it up

In D77849#1973733, @efriedma wrote:

> Forbid directly spilling registers produced by an INLINEASM_BR; instead, force a live interval split, and then we can spill the split interval.

In D77849#1975590, @arsenm wrote:

> Using LiveIntervals in any way is not an option to solve this. They are not available in all fast regalloc and other allocators. The constraints need to be expressed with copies that are terminators, and splitting blocks when necessary

Right, but I'll bet that when FastAlloc needs to spill, it also uses InlineSpiller to do so. I want to play with or prototype to see if we can make InlineSpiller not violate MachineVerifier invariants by spilling post terminal instructions.

Some other notes I had jotted down, reviewers please speak up if any of these seem immediately not worth pursuing:

branch-folder should probably assert when removing a MachineBasicBlock that has its address taken. That's twice now we've seen this produce completely garbage code. I don't think we should prevent it, as both times it's been a cascading failure, so I think an assert is more appropriate.
assert in inline-spiller if it places a spill after a terminator MachineInstr. This will wind up breaking the MachineVerifier constraint about non terminal instructions post terminators anyways. We might even be able to solve this problem there, maybe.
"canonicalize" the fallthrough case during our block splitting special case in instruction scheduling to put the fallthrough immediately after the INLINEASM_BR's parent MachineBasicBlock (as in this test case, and not any of the existing ones). I suspect this will help reduce overlapping LiveRanges and help reduce register pressure.

And of course https://reviews.llvm.org/D75098.

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir
6	Ok, I think I have it updated. Seems that regalloc is actually a grouping of passes that run, so there's no explicit "Greedy" in `-print-after-all` and you have to dump the MIR before any of the related regalloc passes (ie. after `twoaddressinstruction` for x86). MIR/llc is different from LLVM IR/opt in that passes generally are order dependent, so you must dump MIR in the precise form that a given pass expects. Someone please correct me if I'm wrong in my understanding of this point. I suppose I can first clean up this test by removing the LLVM IR embedded within. We need a test case for which there are more live registers in use than there are physical GPRs to allocate, and where the register allocator chooses to spill the liveouts from the `INLINEASM_BR`. @arsenm , I'm not sure how much more I can simplify the test, at least in an automated manner.

nickdesaulniers added inline comments. Apr 14 2020, 2:13 PM

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir
6	huh, seems I can't eliminate the LLVM IR from the MIR, as otherwise the MIR becomes invalid due to references of missing globals defined in the LLVM IR. That's uh, kind of a mess, and feels like LLVM IR is not discarded or even discardable when lowered to MIR.

drop LLVM IR test, superseded by MIR test.

nickdesaulniers marked 3 inline comments as done. Apr 14 2020, 2:16 PM

> branch-folder should probably assert when removing a MachineBasicBlock that has its address taken.

My concern here is that we aren't very aggressive about dropping blockaddresses at the IR level, so we could end up with an "address-taken" block that doesn't actually have any callbr/indirectbr predecessors.

nickdesaulniers added inline comments. Apr 14 2020, 2:29 PM

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir
6	sigh, and this test passes with or without my change to `CalcSpillWeights`, though `llc -O2 -verify-machineinstrs` on the .ll fails...I'm missing something that's preventing me from restarting the failure in llc. Will keep digging.

Harbormaster completed remote builds in B53231: Diff 257480. Apr 14 2020, 2:39 PM

nickdesaulniers marked 6 inline comments as done. Apr 14 2020, 2:40 PM

nickdesaulniers added inline comments.

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir
6	got it, so: regalloc is a group, you can see delineations of the groups via `-verify-machineinstrs`, which will run `MachineVerifier` between groups of passes, not each individual pass. This helped me spot that I needed to `-stop-after=machine-scheduler`, not `-stop-after=twoaddressinstruction`. passing `-O2` to `llc` will influence codegen, and make this particular issue go away. Posting updated MIR with those 2 corrected.

redump MIR test, use -O2 and -stop-after=instruction-scheduler

For simplifying the test, if you just want to force everything to spill, you can write asm volatile("":::"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15");
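
As a hedged illustration of that suggestion, the sketch below wraps the clobber list in a small function. The function name `sumAcrossClobber` is made up for this example, the asm statement relies on the GCC/Clang extended-asm syntax, and the clobber is only compiled on x86-64; whether it actually forces a spill depends on the optimization level and on which values are live across it.

```cpp
// Hypothetical helper: keeps a value live across an empty asm statement that
// claims to clobber every general-purpose register except rsp and rbp.
// With optimizations enabled, a value live across the asm cannot stay in any
// of the listed registers, so the allocator is pushed to spill it.
int sumAcrossClobber(int a, int b) {
  int s = a + b; // live across the clobber below
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  asm volatile("" ::: "rax", "rbx", "rcx", "rdx", "rsi", "rdi", "r8", "r9",
               "r10", "r11", "r12", "r13", "r14", "r15");
#endif
  return s;
}
```

The empty asm body emits no instructions; only the clobber list matters, and the observable result of the function is unchanged.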

Harbormaster completed remote builds in B53241: Diff 257498. Apr 14 2020, 3:45 PM

Harbormaster completed remote builds in B53244: Diff 257505. Apr 14 2020, 4:20 PM

In D77849#1975590, @arsenm wrote:

> Using LiveIntervals in any way is not an option to solve this. They are not available in all fast regalloc and other allocators. The constraints need to be expressed with copies that are terminators, and splitting blocks when necessary

As a sanity check, I patched in https://reviews.llvm.org/D75098 and I can still reproduce this failure. I think the problem that TCOPY solves, and the problem I'm trying to solve are orthogonal? So I don't think I'll just be able to solve the case of spilled live outs from INLINEASM_BR via TCOPY.

nickdesaulniers mentioned this in D78234: [BranchFolding] assert when removing INLINEASM_BR indirect targets. Apr 15 2020, 12:40 PM

nickdesaulniers mentioned this in D78520: [InlineSpiller] simplify insertReload() NFC. Apr 20 2020, 3:23 PM

nickdesaulniers mentioned this in rGd3fdafae0630: [InlineSpiller] simplify insertReload() NFC. Apr 21 2020, 8:37 AM

nickdesaulniers added a reviewer: myatsina. Apr 27 2020, 2:23 PM

Ah, I wish I watched https://www.youtube.com/watch?v=hf8kD-eAaxg before looking into this! (Todo list includes https://www.youtube.com/watch?v=ktXFOlyEOtY and https://www.youtube.com/watch?v=IK8TMJf3G6U and https://github.com/nael8r/How-To-Write-An-LLVM-Register-Allocator/blob/master/HowToWriteAnLLVMRegisterAllocator.rst).

I tried canonicalizing the callbr fallthrough, but it makes no difference. I suppose that makes sense, given that the live range doesn't change.

https://reviews.llvm.org/D78586 is also pointing out another issue with callbr return values.

Re: https://reviews.llvm.org/D78586

it looks like livevars marks the physical reg defs in the INLINEASM_BR as dead, because its analysis is intrablock (as stated in a comment at the top of llvm/lib/CodeGen/LiveVariables.cpp), so it doesn't see the uses in the newly split block. I wonder if for INLINEASM_BR, if we extend the analysis so that the split block was analyzed, if this would work? A hacked up patch that marks all registers live post INLINEASM_BR post livevars allows the test case in D78586 to pass, but I haven't looked at the codegen to see if anything terrible happened, or what other invariants were violated. (maybe D78166 or D78234). Actually, generated code looks good. Let me see if that's feasible when cleaned up, tomorrow.

The issue with physical register liveness is what D75098 was trying to solve. The testcase here involves a callbr that produces a virtual register, which is a slightly different problem, I think.

In D77849#1981966, @efriedma wrote:

> > branch-folder should probably assert when removing a MachineBasicBlock that has its address taken.
>
> My concern here is that we aren't very aggressive about dropping blockaddresses at the IR level, so we could end up with an "address-taken" block that doesn't actually have any callbr/indirectbr predecessors.

If a block has its address taken, whether it's used in a callbr or some other instruction, is it ever safe to remove the block if the blockaddress instruction referencing it still exists?

Also, if some degenerate were to write some heresy like this, would LLVM be able to handle it?

  %ba = blockaddress label %indirect
  store %ba, %some_stack_slot

...

  %ba_reload = load %some_stack_slot
  callbr ... (%ba_reload)

> If a block has its address taken, whether it's used in a callbr or some other instruction, is it ever safe to remove the block if the blockaddress instruction referencing it still exists?

The way that indirectbr works is that blockaddress passes out an opaque value, and that address is supposed to be passed to the corresponding indirectbr. It doesn't really matter where the blockaddress is computed; what matters is that there's an indirectbr that branches to the block mentioned in the blockaddress constant.

We don't promise that a blockaddress is usable for any other purpose. If there is no indirectbr to pass the blockaddress to, there isn't anything constraining the opaque value, so it can be transformed to an arbitrary integer.

arsenm resigned from this revision. Jun 29 2021, 3:04 PM

Herald added a subscriber: pengfei. Jun 29 2021, 3:04 PM

Revision Contents

Path                                            Size
llvm/lib/CodeGen/CalcSpillWeights.cpp           5 lines
llvm/test/CodeGen/X86/callbr-asm-regalloc.mir   312 lines

Diff 257505

llvm/lib/CodeGen/CalcSpillWeights.cpp

(203 lines above not shown)

 float VirtRegAuxInfo::weightCalcHelper(LiveInterval &li, SlotIndex *start,
 (lines not shown)
   std::set<CopyHint> CopyHints;

   for (MachineRegisterInfo::reg_instr_nodbg_iterator
            I = mri.reg_instr_nodbg_begin(li.reg),
            E = mri.reg_instr_nodbg_end();
        I != E;) {
     MachineInstr *mi = &*(I++);

+    if (mi->getOpcode() == TargetOpcode::INLINEASM_BR) {
+      li.markNotSpillable();
+      return -1.0;
+    }
+
     // For local split artifacts, we are interested only in instructions between
     // the expected start and end of the range.
     SlotIndex si = LIS.getInstructionIndex(*mi);
     if (localSplitArtifact && ((si < *start) || (si > *end)))
       continue;

     numInstr++;
     if (mi->isIdentityCopy() || mi->isImplicitDef())

(86 lines below not shown)

llvm/test/CodeGen/X86/callbr-asm-regalloc.mir

This file was added.

				# RUN: llc -run-pass=greedy -regalloc=greedy -verify-regalloc %s
				# Dumped via:
				# $ llc -stop-after=machine-scheduler -simplify-mir \
				# -o callbr-asm-regalloc.mir llvm/test/CodeGen/X86/callbr-asm-regalloc.ll
				--- |
				; ModuleID = 'callbr-asm-regalloc.ll'
				source_filename = "callbr-asm-regalloc.ll"
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

				%struct.d = type { i8* }
				%struct.b = type {}

				@j = dso_local local_unnamed_addr global i32 0, align 4
				@i = dso_local local_unnamed_addr global i32 0, align 4
				@h = dso_local local_unnamed_addr global i32 0, align 4
				@g = dso_local local_unnamed_addr global i32 0, align 4

				define dso_local i32 @cmsghdr_from_user_compat_to_kern(%struct.d* %0) {
				%const = bitcast i64 8589934590 to i64
				br label %2

				2: ; preds = %11, %1
				%3 = phi i32 [ %16, %11 ], [ undef, %1 ]
				%4 = load i32, i32* @j, align 4
				%5 = zext i32 %4 to i64
				%6 = load i32, i32* @i, align 4
				%7 = icmp eq i32 %6, 0
				br i1 %7, label %8, label %46

				8: ; preds = %2
				%9 = trunc i64 %5 to i32
				%10 = icmp eq i32 %9, 0
				br i1 %10, label %46, label %11

				11: ; preds = %8
				%12 = trunc i64 %5 to i32
				%13 = add nuw nsw i64 %5, 1
				%14 = and i64 %13, %const
				%15 = trunc i64 %14 to i32
				store i32 %15, i32* @h, align 4
				%16 = tail call i32 @f(%struct.d* %0, i32 %3, i32 %12)
				%17 = icmp eq i32 %16, 0
				br i1 %17, label %18, label %2

				18: ; preds = %11
				%19 = icmp eq i64 %14, 0
				br i1 %19, label %20, label %46

				20: ; preds = %18
				%21 = icmp eq %struct.d* %0, null
				br i1 %21, label %46, label %22

				22: ; preds = %20
				%23 = bitcast %struct.d* %0 to i64*
				%24 = load i64, i64* %23, align 8
				%25 = trunc i64 %24 to i32
				%26 = icmp eq i32 %25, 0
				br i1 %26, label %46, label %.preheader, !prof !0

				.preheader: ; preds = %22
				br label %27

				27: ; preds = %.preheader, %40
				%28 = phi i32 [ %44, %40 ], [ %25, %.preheader ]
				%29 = phi i64 [ %43, %40 ], [ 0, %.preheader ]
				%30 = callbr i32 asm "1:\09mov $1,$0\0A .pushsection \22__ex_table\22,\22a\22\0A .long 1b - .\0A .long ${2:l} - .\0A .long 0 - .\0A .popsection\0A", "=r,m,X,~{dirflag},~{fpsr},~{flags}"(%struct.b null, i8* blockaddress(@cmsghdr_from_user_compat_to_kern, %31))
				to label %33 [label %31]

				31: ; preds = %27
				%32 = tail call i32 @a()
				br label %46

				33: ; preds = %27
				%34 = zext i32 %30 to i64
				%35 = shl i64 %29, 32
				%36 = ashr exact i64 %35, 32
				%37 = inttoptr i64 %36 to i32*
				%38 = tail call i32 @c(i32* %37, i64 %34)
				%39 = icmp eq i32 %38, 0
				br i1 %39, label %40, label %46

				40: ; preds = %33
				%41 = add nuw nsw i64 %34, 8
				%42 = and i64 %41, %const
				%43 = add nsw i64 %42, %36
				%44 = tail call i32 @f(%struct.d* nonnull %0, i32 %28, i32 %30)
				%45 = icmp eq i32 %44, 0
				br i1 %45, label %46, label %27, !prof !0

				46: ; preds = %8, %2, %40, %33, %31, %22, %20, %18
				%47 = phi i32 [ -22, %18 ], [ undef, %31 ], [ undef, %22 ], [ undef, %20 ], [ undef, %33 ], [ undef, %40 ], [ 4, %2 ], [ -22, %8 ]
				ret i32 %47
				}

				declare dso_local i32 @f(%struct.d*, i32, i32) local_unnamed_addr

				declare dso_local i32 @a() local_unnamed_addr

				declare dso_local i32 @c(i32*, i64) local_unnamed_addr

				define dso_local i32 @put_cmsg_compat() local_unnamed_addr {
				ret i32 undef
				}

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8, i8*) #0

				attributes #0 = { nounwind }

				!0 = !{!"branch_weights", i32 2146410443, i32 1073205}

				...
				---
				name: cmsghdr_from_user_compat_to_kern
				alignment: 16
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr64 }
				- { id: 1, class: gr32 }
				- { id: 2, class: gr64_with_sub_8bit }
				- { id: 3, class: gr64_with_sub_8bit }
				- { id: 4, class: gr32 }
				- { id: 5, class: gr32 }
				- { id: 6, class: gr32 }
				- { id: 7, class: gr64 }
				- { id: 8, class: gr32 }
				- { id: 9, class: gr64_with_sub_8bit }
				- { id: 10, class: gr64 }
				- { id: 11, class: gr64 }
				- { id: 12, class: gr32 }
				- { id: 13, class: gr32 }
				- { id: 14, class: gr64 }
				- { id: 15, class: gr32 }
				- { id: 16, class: gr32 }
				- { id: 17, class: gr32 }
				- { id: 18, class: gr32 }
				- { id: 19, class: gr32 }
				- { id: 20, class: gr32 }
				- { id: 21, class: gr64 }
				- { id: 22, class: gr32 }
				- { id: 23, class: gr32 }
				- { id: 24, class: gr32 }
				- { id: 25, class: gr32 }
				- { id: 26, class: gr32 }
				- { id: 27, class: gr64_with_sub_8bit }
				- { id: 28, class: gr32 }
				- { id: 29, class: gr32 }
				- { id: 30, class: gr32 }
				- { id: 31, class: gr32 }
				- { id: 32, class: gr32 }
				- { id: 33, class: gr32 }
				- { id: 34, class: gr32 }
				- { id: 35, class: gr32 }
				- { id: 36, class: gr32 }
				- { id: 37, class: gr64_with_sub_8bit }
				- { id: 38, class: gr64_with_sub_8bit }
				- { id: 39, class: gr32 }
				- { id: 40, class: gr32 }
				- { id: 41, class: gr32 }
				- { id: 42, class: gr64_with_sub_8bit }
				- { id: 43, class: gr32 }
				liveins:
				- { reg: '$rdi', virtual-reg: '%14' }
				frameInfo:
				maxAlignment: 1
				hasCalls: true
				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.1):
				liveins: $rdi

				%14:gr64 = COPY $rdi
				%0:gr64 = MOV64ri 8589934590
				%1:gr32 = IMPLICIT_DEF

				bb.1 (%ir-block.2):
				successors: %bb.3(0x7c000000), %bb.2(0x04000000)

				CMP32mi8 $rip, 1, $noreg, @i, $noreg, 0, implicit-def $eflags :: (dereferenceable load 4 from @i)
				JCC_1 %bb.3, 4, implicit killed $eflags

				bb.2:
				%43:gr32 = MOV32ri 4
				JMP_1 %bb.15

				bb.3 (%ir-block.8):
				successors: %bb.16(0x04000000), %bb.4(0x7c000000)

				undef %2.sub_32bit:gr64_with_sub_8bit = MOV32rm $rip, 1, $noreg, @j, $noreg :: (dereferenceable load 4 from @j)
				%43:gr32 = MOV32ri -22
				TEST32rr %2.sub_32bit, %2.sub_32bit, implicit-def $eflags
				JCC_1 %bb.4, 5, implicit killed $eflags

				bb.16:
				JMP_1 %bb.15

				bb.4 (%ir-block.11):
				successors: %bb.5(0x04000000), %bb.1(0x7c000000)

				%3:gr64_with_sub_8bit = COPY %2
				%3:gr64_with_sub_8bit = nuw nsw INC64r %3, implicit-def dead $eflags
				%3:gr64_with_sub_8bit = AND64rr %3, %0, implicit-def dead $eflags
				MOV32mr $rip, 1, $noreg, @h, $noreg, %3.sub_32bit :: (store 4 into @h)
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				$rdi = COPY %14
				$esi = COPY %1
				$edx = COPY %2.sub_32bit
				CALL64pcrel32 @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit killed $esi, implicit killed $edx, implicit-def $rsp, implicit-def $ssp, implicit-def $eax
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				%1:gr32 = COPY killed $eax
				TEST32rr %1, %1, implicit-def $eflags
				JCC_1 %bb.1, 5, implicit killed $eflags
				JMP_1 %bb.5

				bb.5 (%ir-block.18):
				successors: %bb.6(0x30000000), %bb.15(0x50000000)

				TEST64rr %3, %3, implicit-def $eflags
				JCC_1 %bb.15, 5, implicit killed $eflags
				JMP_1 %bb.6

				bb.6 (%ir-block.20):
				successors: %bb.7(0x30000000), %bb.8(0x50000000)

				TEST64rr %14, %14, implicit-def $eflags
				JCC_1 %bb.8, 5, implicit killed $eflags

				bb.7:
				%43:gr32 = IMPLICIT_DEF
				JMP_1 %bb.15

				bb.8 (%ir-block.22):
				successors: %bb.15(0x7fef9fcb), %bb.9(0x00106035)

				%41:gr32 = MOV32rm %14, 1, $noreg, 0, $noreg :: (load 4 from %ir.23, align 8)
				TEST32rr %41, %41, implicit-def $eflags
				%43:gr32 = IMPLICIT_DEF
				JCC_1 %bb.15, 4, implicit killed $eflags
				JMP_1 %bb.9

				bb.9..preheader:
				undef %42.sub_32bit:gr64_with_sub_8bit = MOV32r0 implicit-def dead $eflags

				bb.10 (%ir-block.27):
				successors: %bb.12(0x00000000), %bb.11(0x80000000)

				INLINEASM_BR &"1:\09mov $1,$0\0A .pushsection \22__ex_table\22,\22a\22\0A .long 1b - .\0A .long ${2:l} - .\0A .long 0 - .\0A .popsection\0A", 8, 2228234, def %8, 196654, $noreg, 1, $noreg, 0, $noreg, 13, blockaddress(@cmsghdr_from_user_compat_to_kern, %ir-block.31), 12, implicit-def dead early-clobber $df, 12, implicit-def early-clobber $fpsw, 12, implicit-def dead early-clobber $eflags

				bb.11 (%ir-block.27):
				JMP_1 %bb.13

				bb.12 (%ir-block.31, address-taken):
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				CALL64pcrel32 @a, csr_64, implicit $rsp, implicit $ssp, implicit-def $rsp, implicit-def $ssp, implicit-def dead $eax
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				%43:gr32 = IMPLICIT_DEF
				JMP_1 %bb.15

				bb.13 (%ir-block.33):
				successors: %bb.14(0x7c000000), %bb.15(0x04000000)

				undef %38.sub_32bit:gr64_with_sub_8bit = MOV32rr %8
				%10:gr64 = MOVSX64rr32 %42.sub_32bit
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				$rdi = COPY %10
				$rsi = COPY %38
				CALL64pcrel32 @c, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit killed $rsi, implicit-def $rsp, implicit-def $ssp, implicit-def $eax
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				%35:gr32 = COPY killed $eax
				TEST32rr %35, %35, implicit-def $eflags
				%43:gr32 = IMPLICIT_DEF
				JCC_1 %bb.15, 5, implicit killed $eflags
				JMP_1 %bb.14

				bb.14 (%ir-block.40):
				successors: %bb.15(0x7fef9fcb), %bb.10(0x00106035)

				%38:gr64_with_sub_8bit = nuw nsw ADD64ri8 %38, 8, implicit-def dead $eflags
				%38:gr64_with_sub_8bit = AND64rr %38, %0, implicit-def dead $eflags
				%38:gr64_with_sub_8bit = nsw ADD64rr %38, %10, implicit-def dead $eflags
				ADJCALLSTACKDOWN64 0, 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				$rdi = COPY %14
				$esi = COPY %41
				$edx = COPY %8
				CALL64pcrel32 @f, csr_64, implicit $rsp, implicit $ssp, implicit $rdi, implicit killed $esi, implicit killed $edx, implicit-def $rsp, implicit-def $ssp, implicit-def $eax
				ADJCALLSTACKUP64 0, 0, implicit-def dead $rsp, implicit-def dead $eflags, implicit-def dead $ssp, implicit $rsp, implicit $ssp
				%41:gr32 = COPY killed $eax
				TEST32rr %41, %41, implicit-def $eflags
				%42:gr64_with_sub_8bit = COPY %38
				%43:gr32 = IMPLICIT_DEF
				JCC_1 %bb.10, 5, implicit killed $eflags
				JMP_1 %bb.15

				bb.15 (%ir-block.46):
				$eax = COPY %43
				RET 0, killed $eax

				...
				---
				name: put_cmsg_compat
				alignment: 16
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr32 }
				frameInfo:
				maxAlignment: 1
				machineFunctionInfo: {}
				body: |
				bb.0 (%ir-block.0):
				RET 0, undef $eax

				...