This is an archive of the discontinued LLVM Phabricator instance.


Differential D124308

[MachineScheduler] exclude INLINEASM from schedule when it would increase register pressure
Abandoned · Public

Authored by nickdesaulniers on Apr 22 2022, 3:22 PM.

Details

Reviewers

pengfei
craig.topper
arsenm
lebedev.ri
qcolombet
MatzeB

Summary

INLINEASM with a large number of operands requires delicate
pre-register-allocation scheduling, so as not to exhaust the number of
physical registers allocatable to the INLINEASM.

Sinking a COPY of a PhysReg to a VirtReg closer to its use is
problematic when it is sunk past an INLINEASM that requires PhysRegs of the
same register class. Doing so extends the live range of the PhysRegs so
that register allocation may fail to allocate enough registers for
the inline asm, resulting in compile-time failures for inline asm
statements that have many operands.

When we encounter an INLINEASM whose number of operands of any
particular TargetRegisterClass would be above the register pressure
limit of a given MachineFunction, split the scheduling boundary at the
INLINEASM.
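
The check described above can be sketched outside of LLVM as follows. This is a minimal model, not the patch itself: `AsmOperand`, `inlineAsmExceedsPressure`, and the string-keyed `Limits` map are hypothetical stand-ins for the MachineInstr operands and for `getRegPressureLimit`.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an INLINEASM register operand: only the name of
// the register class it is constrained to.
struct AsmOperand {
  std::string RegClass;
};

// Returns true when some register class has more operands than its limit.
// `Limits` plays the role of getRegPressureLimit. A class that is missing
// from the map is treated as having limit 0 (an assumption of this model,
// chosen to mirror the `default: return 0` case in the X86 diff).
bool inlineAsmExceedsPressure(const std::vector<AsmOperand> &Ops,
                              const std::map<std::string, unsigned> &Limits) {
  // Count operands per register class, keyed by exact class name.
  std::map<std::string, unsigned> Count;
  for (const AsmOperand &Op : Ops)
    ++Count[Op.RegClass];

  // Compare each per-class count against that class's limit.
  for (const auto &Entry : Count) {
    auto It = Limits.find(Entry.first);
    unsigned Limit = It == Limits.end() ? 0 : It->second;
    if (Entry.second > Limit)
      return true;
  }
  return false;
}
```

In this model, an asm with four `GR32` operands and a `GR32` limit of 3 would be treated as a scheduling boundary, while the same asm with a limit of 4 would not.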

Fixes: https://github.com/llvm/llvm-project/issues/41914

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nickdesaulniers created this revision.Apr 22 2022, 3:22 PM

Herald added a project: Restricted Project.Apr 22 2022, 3:22 PM

Herald added subscribers: StephenFan, ecnelises, javed.absar and 2 others.

nickdesaulniers requested review of this revision.Apr 22 2022, 3:22 PM

Herald added a project: Restricted Project.Apr 22 2022, 3:22 PM

Herald added subscribers: llvm-commits, wdng.

nickdesaulniers added child revisions: D124307: make getRegPressureLimit arg const NFC, D122348: [x86][scheduler] Add MIR test for 41914.Apr 22 2022, 3:23 PM

Harbormaster completed remote builds in B160973: Diff 424633.Apr 22 2022, 3:23 PM

Note to reviewers: I'd like to land both this and https://reviews.llvm.org/D122350 (either of which resolves the issue observed in this MIR test case).

nickdesaulniers added a reviewer: qcolombet.Apr 22 2022, 3:25 PM

nickdesaulniers added a subscriber: efriedma.

One alternative approach I can think of: rather than shrinking the schedule region, which is pretty pessimistic (the INLINEASM could have lots of operands of a particular TargetRegisterClass, while the only COPYs that would benefit from rescheduling are of a different TargetRegisterClass), we could check, while moving an instruction, that we aren't moving a COPY of a PhysReg past an INLINEASM with an operand of the same TargetRegisterClass. Whether such a move is permitted would depend on the number of live physregs at the INLINEASM minus the physregs consumed by the INLINEASM.

I'm not sure yet what the cleanest place to handle such an exceptional case is, or whether that's even the best approach.

The change LG, though I'm not sure if changing getRegPressureLimit affects existing code.

llvm/lib/Target/X86/X86RegisterInfo.cpp
266–271	Should we change them to `(Is64Bit ? 12 : 4) - FPDiff` ?
275	I wonder why this returns 4.

This revision is now accepted and ready to land.Apr 23 2022, 4:11 AM

craig.topper added inline comments.Apr 23 2022, 1:29 PM

llvm/lib/CodeGen/MachineScheduler.cpp
473	reschduling -> rescheduling

nickdesaulniers added inline comments.Apr 25 2022, 1:25 PM

llvm/lib/Target/X86/X86RegisterInfo.cpp
266–271	Yeah, I think so. Though I'm curious about the use of 4 in the first place. i386 has eax, ebx, ecx, edx, esi, edi, ebp, esp, and eip as GPRs. eip and esp aren't allocatable, and ebp is the condition `FPDiff` here. Why 4? I would have guessed 6 for the first 6 (for `-m32`)? Perhaps 4 corresponds to `GR32_ABCD`, for some reason? So should the cases used be `GR32_ABCDRegClassID` rather than `GR8RegClassID`? Or should it be `GR8RegClassID` then use `6 - FPDiff`? For 64b, I'd have guessed those six plus r8d to r15d, for a total of 14, not 12. Am I missing something (perhaps about esi+edi)?
275	Good question. Some comments could be added to this function to help retain more context as to WHY these magic constants have these values. I'm happy to do so in these commits, if we can document _why_. I'm running `git log -S X86TargetLowering::getRegPressureLimit` to see if I can find additional context.

nickdesaulniers added inline comments.Apr 25 2022, 1:31 PM

llvm/lib/Target/X86/X86RegisterInfo.cpp
275	This code was added by commit 37b740c4bfbb ("Add an ILP scheduler."), Author: Evan Cheng <evan.cheng@apple.com>, AuthorDate: Sat Jul 24 00:39:05 2010 +0000

nickdesaulniers added inline comments.Apr 25 2022, 1:40 PM

llvm/lib/Target/X86/X86RegisterInfo.cpp
266–271	Also, if we extend this from 4 to 6 on 32b, then the test case will no longer be fixed. The inline asm from the test case uses 1 GR8 and 4 GR32.
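
The arithmetic behind that last point can be checked with a small model of the patched switch. This is only a sketch: the `RegClass` enum, `regPressureLimit`, and `exceedsLimit` are hypothetical names, the limits are copied from the diff under review, and whether the test function has a frame pointer is not stated in this thread.

```cpp
// Simplified model of the limits in the patched X86 switch; the enum and
// function names are hypothetical and exist only for this illustration.
enum class RegClass { GR8, GR16, GR32, GR64, VR128, VR64, Other };

unsigned regPressureLimit(RegClass RC, bool Is64Bit, bool HasFP) {
  // A frame pointer takes one GPR away from the allocator.
  unsigned FPDiff = HasFP ? 1 : 0;
  switch (RC) {
  case RegClass::GR8:
  case RegClass::GR16:
  case RegClass::GR32:
    return 4 - FPDiff;
  case RegClass::GR64:
    return 12 - FPDiff;
  case RegClass::VR128:
    return Is64Bit ? 10 : 4;
  case RegClass::VR64:
    return 4;
  case RegClass::Other:
    return 0; // models the `default:` case
  }
  return 0;
}

// The patch treats the asm as a boundary only when the operand count is
// strictly greater than the limit.
bool exceedsLimit(unsigned NumOperands, unsigned Limit) {
  return NumOperands > Limit;
}
```

With four GR32 operands, the current value gives a limit of `4 - FPDiff`, so the comparison `4 > 3` only succeeds when a frame pointer is in use. With `6 - FPDiff`, the limit is 5 or 6 and the comparison never succeeds, which is consistent with the observation that the test would no longer be fixed.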

nickdesaulniers added a reviewer: MatzeB.Apr 25 2022, 3:47 PM

pengfei added inline comments.Apr 26 2022, 12:05 AM

llvm/lib/Target/X86/X86RegisterInfo.cpp
266–271	I'm guessing it's a heuristic number. It might account for the average usage of other special registers, e.g., eax for returning, ecx for counting, esi and edi for data moving, etc. If that assumption is true, then it makes sense for 64b to use 12 (4 + 8), since r8~r15 are purely general ones.
275	Oops, I made a mistake. I thought the `V` meant virtual... The `V` actually means vector, and the `VR64` registers are `MMX` registers, of which there are 8 on both 32b and 64b. Why do we use `4` for `VR64` and `VR128` on 32b but `10` on 64b? A wild guess is that it's related to the calling convention. Anyway, I think we should keep these values until we can establish their origin. OTOH, I think maybe we can add more register classes (maybe in a separate patch), e.g.
	case X86::VR128RegClassID:
	case X86::VR256RegClassID:
	case X86::VR512RegClassID:
	  return Is64Bit ? 10 : 4;
	case X86::VR128XRegClassID:
	case X86::VR256XRegClassID:
	case X86::VR512XRegClassID:
	  return Is64Bit ? 26 : 4;

Hi,

I see what you are trying to solve, but at the same time, in theory any instruction could have this kind of register pressure problems. (Though, we probably don't ever create too many not spillable live-ranges around regular instructions and regular instructions have a reasonable number of operands.)

Ultimately, I feel that telling the users that their inline asm is using too many registers is not necessarily a bad thing.

Making inline asm instructions scheduling boundaries when they use a "lot of registers" is a pretty big hammer. Also, technically, inline asm instructions with only a few operands could also become scheduling barriers if the surrounding pressure is already high.

I'm on the fence with that patch. I expect it fixes the cases where we may run out of registers but it also over-constrains the scheduling problem.

I guess this is acceptable because when users use inline asm, they essentially are telling the compiler "I know what I'm doing". Thus, I'm leaning toward "this patch is okay" but I could be convinced otherwise.

What do others think?

Cheers,
-Quentin

arsenm added inline comments.May 2 2022, 2:14 PM

llvm/lib/CodeGen/MachineScheduler.cpp
459	This is implicitly relying on register Class exact matches when register classes are really overlapping sets
473	Remove the new line and INLINEASM since that’s included with the instruction print (or Indent the rest)
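
To illustrate the comment on line 459, the sketch below contrasts exact-match counting with overlap-aware counting. It is an assumption-laden toy: register classes are modeled as bitmasks over eight 32-bit GPRs, the bit layout is invented for this example, and `countExact` and `countOverlapping` are hypothetical helpers rather than LLVM APIs.

```cpp
#include <cstdint>
#include <vector>

// Each class is a bitmask over eight 32-bit GPRs. The bit order (eax, ebx,
// ecx, edx, esi, edi, ebp, esp) is an assumption made for this illustration.
using RegMask = std::uint8_t;
constexpr RegMask GR32 = 0xFF;      // all eight registers
constexpr RegMask GR32_ABCD = 0x0F; // eax, ebx, ecx, edx only

// Exact-match counting: an operand counts toward RC only when it is
// constrained to exactly RC, which is what keying a map by class does.
unsigned countExact(const std::vector<RegMask> &Ops, RegMask RC) {
  unsigned N = 0;
  for (RegMask Op : Ops)
    if (Op == RC)
      ++N;
  return N;
}

// Overlap-aware counting: an operand counts toward RC whenever its class
// shares at least one register with RC, since it may compete for it.
unsigned countOverlapping(const std::vector<RegMask> &Ops, RegMask RC) {
  unsigned N = 0;
  for (RegMask Op : Ops)
    if ((Op & RC) != 0)
      ++N;
  return N;
}
```

For two `GR32` operands plus two `GR32_ABCD` operands, exact matching reports two operands per class, while the overlap-aware count reports four operands that may compete for `GR32` registers. Overlap counting is itself conservative, so this only shows the gap being pointed out, not a recommended fix.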

My understanding is that inline asm sits in the transition zone between a "regular instruction" and a "function call", because the content of inline asm varies widely. It can be as simple as a "nop", or as complicated as modifying memory or even calling a function.
I think setting a scheduling barrier based on register pressure is a compromise solution. It's also much like how we prepare/preserve registers for a function call when we use many registers in inline asm.
One more thought: we may also need to consider clobbered registers and memory?

I'm not comfortable pursuing this approach.

Say we had an example with two instructions that consumed physregs and produced virt regs, then an inlineasm with lots of operands, then consumers of the virt regs, and we could only reschedule one of the two virt reg producers below the asm.

This approach will pessimistically allow neither to be moved.

I will have to think more about this, but am super busy with other things ATM. So I plan to at least land https://reviews.llvm.org/D122350 for now.

Do you have a feeling for how well this heuristic works for non-X86? I wonder if it would be better to be conservative and only have it in X86InstrInfo::isSchedulingBoundary; that way you could also hardcode some values instead of fiddling with getRegPressureLimit.

I also have a feeling like you only need to consider GPR registers on x86 where constraints are plentiful while for XMM/YMM it probably doesn't matter...

Revision Contents

Path	Size
llvm/lib/CodeGen/MachineScheduler.cpp	57 lines
llvm/lib/Target/X86/X86RegisterInfo.cpp	2 lines
llvm/test/CodeGen/X86/scheduler-asm-moves.mir	8 lines

Diff 424633

llvm/lib/CodeGen/MachineScheduler.cpp

bool PostMachineScheduler::runOnMachineFunction(MachineFunction &mf) {
std::unique_ptr<ScheduleDAGInstrs> Scheduler(createPostMachineScheduler());		std::unique_ptr<ScheduleDAGInstrs> Scheduler(createPostMachineScheduler());
scheduleRegions(*Scheduler, true);		scheduleRegions(*Scheduler, true);

if (VerifyScheduling)		if (VerifyScheduling)
MF->verify(this, "After post machine scheduling.");		MF->verify(this, "After post machine scheduling.");
return true;		return true;
}		}

		static bool inlineAsmWouldIncreasePressure(const MachineInstr &MI,
		const MachineFunction &MF,
		const TargetInstrInfo *TII) {
		if (!MI.isInlineAsm())
		return false;

		const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
		DenseMap<const TargetRegisterClass *, unsigned> RCCount;

		// For each operand, keep a count of the encountered TargetRegisterClass.
		for (unsigned OpNo = 0, e = MI.getNumOperands(); OpNo != e; ++OpNo)
		if (MI.getOperand(OpNo).isReg())
		if (const auto *RC = MI.getRegClassConstraint(OpNo, TII, TRI))
		++RCCount[RC];

		// Is the number of TargetRegisterClass operands of the INLINEASM above the
		// pressure limit?
		for (auto it : RCCount) {
		unsigned PressureLimit = TRI->getRegPressureLimit(it.first, MF);
		if (it.second > PressureLimit) {
		LLVM_DEBUG(
		dbgs() << "Not reschduling around: " << MI << "\nINLINEASM has "
		<< it.second << " " << TRI->getRegClassName(it.first)
		<< " operands, which is above the register pressure limit of "
		<< PressureLimit << "\n";);
		return true;
		}
		}

		return false;
		}

/// Return true of the given instruction should not be included in a scheduling		/// Return true of the given instruction should not be included in a scheduling
/// region.		/// region.
///		///
/// MachineScheduler does not currently support scheduling across calls. To		/// MachineScheduler does not currently support scheduling across calls. To
/// handle calls, the DAG builder needs to be modified to create register		/// handle calls, the DAG builder needs to be modified to create register
/// anti/output dependencies on the registers clobbered by the call's regmask		/// anti/output dependencies on the registers clobbered by the call's regmask
/// operand. In PreRA scheduling, the stack pointer adjustment already prevents		/// operand. In PreRA scheduling, the stack pointer adjustment already prevents
/// scheduling across calls. In PostRA scheduling, we need the isCall to enforce		/// scheduling across calls. In PostRA scheduling, we need the isCall to enforce
/// the boundary, but there would be no benefit to postRA scheduling across		/// the boundary, but there would be no benefit to postRA scheduling across
/// calls this late anyway.		/// calls this late anyway.
static bool isSchedBoundary(MachineBasicBlock::iterator MI,		static bool isSchedBoundary(const MachineInstr &MI,
MachineBasicBlock *MBB,		const MachineBasicBlock &MBB,
MachineFunction *MF,		const MachineFunction &MF,
const TargetInstrInfo *TII) {		const TargetInstrInfo *TII) {
return MI->isCall() \|\| TII->isSchedulingBoundary(MI, MBB, MF);		return MI.isCall() \|\| TII->isSchedulingBoundary(MI, &MBB, MF) \|\|
		inlineAsmWouldIncreasePressure(MI, MF, TII);
}		}

/// A region of an MBB for scheduling.		/// A region of an MBB for scheduling.
namespace {		namespace {
struct SchedRegion {		struct SchedRegion {
/// RegionBegin is the first instruction in the scheduling region, and		/// RegionBegin is the first instruction in the scheduling region, and
/// RegionEnd is either MBB->end() or the scheduling boundary after the		/// RegionEnd is either MBB->end() or the scheduling boundary after the
/// last instruction in the scheduling region. These iterators cannot refer		/// last instruction in the scheduling region. These iterators cannot refer
/// to instructions outside of the identified scheduling region because		/// to instructions outside of the identified scheduling region because
/// those may be reordered before scheduling this region.		/// those may be reordered before scheduling this region.
MachineBasicBlock::iterator RegionBegin;		MachineBasicBlock::iterator RegionBegin;
MachineBasicBlock::iterator RegionEnd;		MachineBasicBlock::iterator RegionEnd;
unsigned NumRegionInstrs;		unsigned NumRegionInstrs;

SchedRegion(MachineBasicBlock::iterator B, MachineBasicBlock::iterator E,		SchedRegion(MachineBasicBlock::iterator B, MachineBasicBlock::iterator E,
unsigned N) :		unsigned N) :
RegionBegin(B), RegionEnd(E), NumRegionInstrs(N) {}		RegionBegin(B), RegionEnd(E), NumRegionInstrs(N) {}
};		};
} // end anonymous namespace		} // end anonymous namespace

using MBBRegionsVector = SmallVector<SchedRegion, 16>;		using MBBRegionsVector = SmallVector<SchedRegion, 16>;

static void		static void getSchedRegions(MachineBasicBlock *MBB, MBBRegionsVector &Regions,
getSchedRegions(MachineBasicBlock *MBB,
MBBRegionsVector &Regions,
bool RegionsTopDown) {		bool RegionsTopDown) {
MachineFunction *MF = MBB->getParent();		const MachineFunction &MF = *MBB->getParent();
const TargetInstrInfo *TII = MF->getSubtarget().getInstrInfo();		const TargetInstrInfo *TII = MF.getSubtarget().getInstrInfo();

MachineBasicBlock::iterator I = nullptr;		MachineBasicBlock::iterator I = nullptr;
for(MachineBasicBlock::iterator RegionEnd = MBB->end();		for(MachineBasicBlock::iterator RegionEnd = MBB->end();
RegionEnd != MBB->begin(); RegionEnd = I) {		RegionEnd != MBB->begin(); RegionEnd = I) {

// Avoid decrementing RegionEnd for blocks with no terminator.		// Avoid decrementing RegionEnd for blocks with no terminator.
if (RegionEnd != MBB->end() \|\|		if (RegionEnd != MBB->end() \|\|
isSchedBoundary(&*std::prev(RegionEnd), &*MBB, MF, TII)) {		isSchedBoundary(*std::prev(RegionEnd), *MBB, MF, TII)) {
--RegionEnd;		--RegionEnd;
}		}

// The next region starts above the previous region. Look backward in the		// The next region starts above the previous region. Look backward in the
// instruction stream until we find the nearest boundary.		// instruction stream until we find the nearest boundary.
unsigned NumRegionInstrs = 0;		unsigned NumRegionInstrs = 0;
I = RegionEnd;		I = RegionEnd;
for (;I != MBB->begin(); --I) {		for (;I != MBB->begin(); --I) {
MachineInstr &MI = *std::prev(I);		const MachineInstr &MI = *std::prev(I);
if (isSchedBoundary(&MI, &*MBB, MF, TII))		if (isSchedBoundary(MI, *MBB, MF, TII))
break;		break;
if (!MI.isDebugOrPseudoInstr()) {		if (!MI.isDebugOrPseudoInstr()) {
// MBB::size() uses instr_iterator to count. Here we need a bundle to		// MBB::size() uses instr_iterator to count. Here we need a bundle to
// count as a single instruction.		// count as a single instruction.
++NumRegionInstrs;		++NumRegionInstrs;
}		}
}		}


llvm/lib/Target/X86/X86RegisterInfo.cpp

	X86RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,			X86RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
	const MachineFunction &MF) const {			const MachineFunction &MF) const {
	const X86FrameLowering *TFI = getFrameLowering(MF);			const X86FrameLowering *TFI = getFrameLowering(MF);

	unsigned FPDiff = TFI->hasFP(MF) ? 1 : 0;			unsigned FPDiff = TFI->hasFP(MF) ? 1 : 0;
	switch (RC->getID()) {			switch (RC->getID()) {
	default:			default:
	return 0;			return 0;
				case X86::GR8RegClassID:
				case X86::GR16RegClassID:
	case X86::GR32RegClassID:			case X86::GR32RegClassID:
	return 4 - FPDiff;			return 4 - FPDiff;
	case X86::GR64RegClassID:			case X86::GR64RegClassID:
	return 12 - FPDiff;			return 12 - FPDiff;
	case X86::VR128RegClassID:			case X86::VR128RegClassID:
	return Is64Bit ? 10 : 4;			return Is64Bit ? 10 : 4;
	case X86::VR64RegClassID:			case X86::VR64RegClassID:
	return 4;			return 4;
	}			}
	}			}

	const MCPhysReg *			const MCPhysReg *
	X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {			X86RegisterInfo::getCalleeSavedRegs(const MachineFunction *MF) const {
	assert(MF && "MachineFunction required");			assert(MF && "MachineFunction required");

	const X86Subtarget &Subtarget = MF->getSubtarget<X86Subtarget>();			const X86Subtarget &Subtarget = MF->getSubtarget<X86Subtarget>();

llvm/test/CodeGen/X86/scheduler-asm-moves.mir

	machineFunctionInfo: {}			machineFunctionInfo: {}
	body: \|			body: \|
	bb.0.entry:			bb.0.entry:
	liveins: $eax, $edx			liveins: $eax, $edx

	; CHECK-LABEL: name: synproxy_send_tcp_ipv6			; CHECK-LABEL: name: synproxy_send_tcp_ipv6
	; CHECK: liveins: $eax, $edx			; CHECK: liveins: $eax, $edx
	; CHECK-NEXT: {{ $}}			; CHECK-NEXT: {{ $}}
				; CHECK-NEXT: [[COPY:%[0-9]+]]:gr32 = COPY $edx
				; CHECK-NEXT: [[COPY1:%[0-9]+]]:gr32_abcd = COPY $eax
	; CHECK-NEXT: [[MOV8rm:%[0-9]+]]:gr8 = MOV8rm $noreg, 1, $noreg, @csum_ipv6_magic_saddr, $noreg :: (dereferenceable load (s8) from `i8* getelementptr inbounds (%struct.in6_addr, %struct.in6_addr* @csum_ipv6_magic_saddr, i32 0, i32 0, i32 0)`)			; CHECK-NEXT: [[MOV8rm:%[0-9]+]]:gr8 = MOV8rm $noreg, 1, $noreg, @csum_ipv6_magic_saddr, $noreg :: (dereferenceable load (s8) from `i8* getelementptr inbounds (%struct.in6_addr, %struct.in6_addr* @csum_ipv6_magic_saddr, i32 0, i32 0, i32 0)`)
	; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_daddr, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_daddr, !tbaa !4)			; CHECK-NEXT: [[MOV32rm:%[0-9]+]]:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_daddr, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_daddr, !tbaa !4)
	; CHECK-NEXT: [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_proto, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_proto, !tbaa !4)			; CHECK-NEXT: [[MOV32rm1:%[0-9]+]]:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_proto, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_proto, !tbaa !4)
	; CHECK-NEXT: [[MOV32r0_:%[0-9]+]]:gr32 = MOV32r0 implicit-def dead $eflags			; CHECK-NEXT: [[MOV32r0_:%[0-9]+]]:gr32 = MOV32r0 implicit-def dead $eflags
	; CHECK-NEXT: INLINEASM &"", 0 /* attdialect */, 2293771 /* regdef-ec:GR32 */, def early-clobber %2, 65545 /* reguse:GR8 */, [[MOV8rm]], 2293769 /* reguse:GR32 */, [[MOV32rm]], 2293769 /* reguse:GR32 */, [[MOV32r0_]], 2293769 /* reguse:GR32 */, [[MOV32rm1]], 12 /* clobber */, implicit-def dead early-clobber $df, 12 /* clobber */, implicit-def early-clobber $fpsw, 12 /* clobber */, implicit-def dead early-clobber $eflags, !8			; CHECK-NEXT: INLINEASM &"", 0 /* attdialect */, 2293771 /* regdef-ec:GR32 */, def early-clobber %2, 65545 /* reguse:GR8 */, [[MOV8rm]], 2293769 /* reguse:GR32 */, [[MOV32rm]], 2293769 /* reguse:GR32 */, [[MOV32r0_]], 2293769 /* reguse:GR32 */, [[MOV32rm1]], 12 /* clobber */, implicit-def dead early-clobber $df, 12 /* clobber */, implicit-def early-clobber $fpsw, 12 /* clobber */, implicit-def dead early-clobber $eflags, !8
	; CHECK-NEXT: MOV32mr $noreg, 1, $noreg, @csum_ipv6_magic_sum, $noreg, %2 :: (store (s32) into @csum_ipv6_magic_sum, !tbaa !4)			; CHECK-NEXT: MOV32mr $noreg, 1, $noreg, @csum_ipv6_magic_sum, $noreg, %2 :: (store (s32) into @csum_ipv6_magic_sum, !tbaa !4)
	; CHECK-NEXT: [[MOV32rm2:%[0-9]+]]:gr32 = MOV32rm $noreg, 1, $noreg, @synproxy_send_tcp_ipv6_nskb, $noreg :: (dereferenceable load (s32) from `i8 bitcast (%struct.sk_buff @synproxy_send_tcp_ipv6_nskb to i8**)`, !tbaa !9)			; CHECK-NEXT: [[MOV32rm2:%[0-9]+]]:gr32 = MOV32rm $noreg, 1, $noreg, @synproxy_send_tcp_ipv6_nskb, $noreg :: (dereferenceable load (s32) from `i8 bitcast (%struct.sk_buff @synproxy_send_tcp_ipv6_nskb to i8**)`, !tbaa !9)
	; CHECK-NEXT: OR8mi [[MOV32rm2]], 1, $noreg, 0, $noreg, 3, implicit-def dead $eflags :: (store (s8) into %ir.4), (load (s8) from %ir.4)			; CHECK-NEXT: OR8mi [[MOV32rm2]], 1, $noreg, 0, $noreg, 3, implicit-def dead $eflags :: (store (s8) into %ir.4), (load (s8) from %ir.4)
	; CHECK-NEXT: [[COPY:%[0-9]+]]:gr32_abcd = COPY $eax			; CHECK-NEXT: [[MOV8rm1:%[0-9]+]]:gr8 = MOV8rm [[COPY]], 1, $noreg, 0, $noreg :: (load (s8) from %ir.5, !tbaa !11)
	; CHECK-NEXT: [[COPY1:%[0-9]+]]:gr32 = COPY $edx
	; CHECK-NEXT: [[MOV8rm1:%[0-9]+]]:gr8 = MOV8rm [[COPY1]], 1, $noreg, 0, $noreg :: (load (s8) from %ir.5, !tbaa !11)
	; CHECK-NEXT: MOV8mr $noreg, 1, $noreg, @synproxy_send_tcp_ipv6_fl6, $noreg, [[MOV8rm1]] :: (store (s8) into `i8* getelementptr inbounds (%struct.in6_addr, %struct.in6_addr* @synproxy_send_tcp_ipv6_fl6, i32 0, i32 0, i32 0)`, !tbaa !11)			; CHECK-NEXT: MOV8mr $noreg, 1, $noreg, @synproxy_send_tcp_ipv6_fl6, $noreg, [[MOV8rm1]] :: (store (s8) into `i8* getelementptr inbounds (%struct.in6_addr, %struct.in6_addr* @synproxy_send_tcp_ipv6_fl6, i32 0, i32 0, i32 0)`, !tbaa !11)
	; CHECK-NEXT: [[MOVZX32rr8_:%[0-9]+]]:gr32 = MOVZX32rr8 [[COPY]].sub_8bit			; CHECK-NEXT: [[MOVZX32rr8_:%[0-9]+]]:gr32 = MOVZX32rr8 [[COPY1]].sub_8bit
	; CHECK-NEXT: $eax = COPY [[MOVZX32rr8_]]			; CHECK-NEXT: $eax = COPY [[MOVZX32rr8_]]
	; CHECK-NEXT: TCRETURNdi @fl6nthsecurity_skb_classify_flow, 0, csr_32, implicit $esp, implicit $ssp, implicit $eax			; CHECK-NEXT: TCRETURNdi @fl6nthsecurity_skb_classify_flow, 0, csr_32, implicit $esp, implicit $ssp, implicit $eax
	%1:gr32 = COPY $edx			%1:gr32 = COPY $edx
	%0:gr32_abcd = COPY $eax			%0:gr32_abcd = COPY $eax
	%3:gr8 = MOV8rm $noreg, 1, $noreg, @csum_ipv6_magic_saddr, $noreg :: (dereferenceable load (s8) from `i8* getelementptr inbounds (%struct.in6_addr, %struct.in6_addr* @csum_ipv6_magic_saddr, i32 0, i32 0, i32 0)`)			%3:gr8 = MOV8rm $noreg, 1, $noreg, @csum_ipv6_magic_saddr, $noreg :: (dereferenceable load (s8) from `i8* getelementptr inbounds (%struct.in6_addr, %struct.in6_addr* @csum_ipv6_magic_saddr, i32 0, i32 0, i32 0)`)
	%4:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_daddr, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_daddr, !tbaa !5)			%4:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_daddr, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_daddr, !tbaa !5)
	%6:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_proto, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_proto, !tbaa !5)			%6:gr32 = MOV32rm $noreg, 1, $noreg, @csum_ipv6_magic_proto, $noreg :: (dereferenceable load (s32) from @csum_ipv6_magic_proto, !tbaa !5)
	%5:gr32 = MOV32r0 implicit-def dead $eflags			%5:gr32 = MOV32r0 implicit-def dead $eflags