This is an archive of the discontinued LLVM Phabricator instance.

Differential D38359

[X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565
ClosedPublic

Authored by aaboud on Sep 28 2017, 7:34 AM.

Details

Reviewers

craig.topper
chandlerc
ormris
rnk

Commits

rGc8d67979c038: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve…
rL315851: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve…

Summary

This patch resolves the issue reported in PR34565.

Diff Detail

Repository: rL LLVM

Event Timeline

aaboud created this revision. Sep 28 2017, 7:34 AM

RKSimon added a subscriber: RKSimon. Sep 28 2017, 12:13 PM

This fixes the code differences I reported.

In D38359#885162, @ormris wrote:

This fixes the code differences I reported.

Thanks Mathew.

@chandlerc, you know this pass better than anybody else, can you review this minor change please?

Is there any way that the debug instructions can become invalid due to this transform? I'm not seeing anything obvious, but wanted to make sure you've thought about this too.

lib/Target/X86/X86CmovConversion.cpp
621–630 ↗	(On Diff #116988)

I think the use of `MI` to refer to the front of the group makes this and other code in this function harder to read. I'd suggest renaming that variable to something less confusing (or completely removing it). But you also have both `It` and `DbgIt` here, which also seems confusing.

Really, all of this is driving me toward the conclusion that this is too late to do this fix. Consider that this is after the scan to swap the CC above, which seems like it would not naively expect a non-cmov machine instr. I could well imagine an assert being added to it later that would fire. I think you want to do this before you start to reason about the group of machine instructions as definitely consecutive `cmov` instructions.

I'd also suggest pulling it into its own function. Then you can write the code in a more clear way along the lines of:

  for (auto I = Group.front().getIterator(), E = Group.back().getIterator(); I != E;) {
    MachineInstr &MI = *I++;
    if (!MI.isDebugValue())
      continue;
    // Splice the debug instruction after the cmov group.
    MBB->insertAfter(E, MI.removeFromParent());
  }

Anyways, my goal would be to use `auto` and for loop syntax to make the code a bit more clear, but I don't think it'll work well until this gets hoisted out of this location and into another location. Maybe you can make it work over an MI range as well, unsure.

Thanks Chandler for reviewing the code.

Is there any way that the debug instructions can become invalid due to this transform? I'm not seeing anything obvious, but wanted to make sure you've thought about this too.

This is what the code I am trying to fix looks like:

%vreg2<def,tied1> = CMOVB32rr %vreg6<tied0>, %vreg1, %EFLAGS<imp-use>
DBG_VALUE %vreg2, %noreg, !"b", <!DIExpression()>; GR32:%vreg2 line no:7
DBG_VALUE %vreg2, %noreg, !"b", <!DIExpression()>; GR32:%vreg2 line no:7
%vreg3<def,tied1> = CMOVB32rr %vreg0<tied0>, %vreg6, %EFLAGS<imp-use>

I assume 3 facts at this point:

DBG_VALUE has only virtual registers and cannot have a physical register; that is true because we are still working with machine code in SSA form.
DBG_VALUE instruction generates a void value, i.e., no instruction can use it (or refer to it).
When we replace all uses of the CMOV instruction with the new PHINode instruction, that also changes the virtual register in the DBG_VALUE.

Given that, we can always move all DBG_VALUE instructions forward, up to the end of the machine basic block, right?

This is the transformation I am doing:

%vreg2<def,tied1> = CMOVB32rr %vreg6<tied0>, %vreg1, %EFLAGS<imp-use>
%vreg3<def,tied1> = CMOVB32rr %vreg0<tied0>, %vreg6, %EFLAGS<imp-use>
DBG_VALUE %vreg2, %noreg, !"b", <!DIExpression()>; GR32:%vreg2 line no:7
DBG_VALUE %vreg2, %noreg, !"b", <!DIExpression()>; GR32:%vreg2 line no:7

The rest, i.e., creating the branches and moving these DBG instructions to a different basic block, is not related to this change and should already work regardless of this case.
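The reordering described above can be modelled outside LLVM with a plain `std::list`. The following is only a sketch under stated assumptions: `ToyInstr`, `ToyBlock`, `packToyGroup`, and `toyTexts` are hypothetical names invented for illustration, and nothing here uses the real `MachineInstr` or `MachineBasicBlock` APIs. It collects the debug entries between the first and last instruction of a group, then moves them to just after the last one.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical; not llvm::MachineInstr).
struct ToyInstr {
  std::string Text;
  bool IsDebug;
};

using ToyBlock = std::list<ToyInstr>;

// Moves every debug entry in [First, Last) to just after Last.
// Entries are collected first, so the scan never walks a node that has moved.
void packToyGroup(ToyBlock &Block, ToyBlock::iterator First,
                  ToyBlock::iterator Last) {
  assert(!Last->IsDebug && "a group must end with a non-debug instruction");

  std::vector<ToyBlock::iterator> DebugEntries;
  for (auto I = First; I != Last; ++I)
    if (I->IsDebug)
      DebugEntries.push_back(I);

  // Insert before the element that follows Last. This keeps the debug
  // entries in their original relative order (a choice made by this sketch).
  auto InsertPos = std::next(Last);
  for (auto I : DebugEntries)
    Block.splice(InsertPos, Block, I);
}

// Returns the instruction texts in block order, for inspection.
std::vector<std::string> toyTexts(const ToyBlock &Block) {
  std::vector<std::string> Out;
  for (const ToyInstr &I : Block)
    Out.push_back(I.Text);
  return Out;
}
```

Calling `packToyGroup` on a block holding a cmov, two debug entries, and a second cmov leaves the two cmovs adjacent with both debug entries directly after them, which mirrors the before and after listings in this comment.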

lib/Target/X86/X86CmovConversion.cpp
621–630 ↗	(On Diff #116988)

Consider that this is after the scan to swap the CC above which seems like it would not naively expect a non-cmov machine instr. I could well imagine an assert being added to it later that would fire.

It is not really an issue, because the scan to swap the CC runs over the instructions in the "Group" list, and we add only CMOV instructions to the "Group" list. There is a difference between the CMOV "Group" list container and the instructions between Group->front() and Group->back().

Anyway, I agree with you that we had better move this into its own function, so I will add a "packCmovGroup" function that will ensure the CMOV group is consecutive.
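The distinction drawn here, between the group container and the instruction range it spans, can be illustrated with a simplified scan. This is a sketch only: `ScanKind`, `ScanInstr`, and `collectToyGroups` are hypothetical names, and the real pass applies further checks (condition codes, memory operands) that are left out. Debug entries are skipped, so a group holds only cmovs even when debug entries sit between its first and last member.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction kinds for this toy model.
enum class ScanKind { Cmov, Debug, Other };

struct ScanInstr {
  std::string Text;
  ScanKind Kind;
};

using ToyGroup = std::vector<const ScanInstr *>;

// Collects runs of cmov entries. Debug entries are ignored, so they neither
// join a group nor end one; any other instruction closes the current group.
std::vector<ToyGroup> collectToyGroups(const std::vector<ScanInstr> &Block) {
  std::vector<ToyGroup> Groups;
  ToyGroup Current;
  for (const ScanInstr &I : Block) {
    if (I.Kind == ScanKind::Debug)
      continue; // Skip debug instructions.
    if (I.Kind == ScanKind::Cmov) {
      Current.push_back(&I);
      continue;
    }
    assert(I.Kind == ScanKind::Other && "only three kinds exist in this model");
    if (!Current.empty()) {
      Groups.push_back(Current);
      Current.clear();
    }
  }
  if (!Current.empty())
    Groups.push_back(Current);
  return Groups;
}
```

In this model a block of cmov, debug, debug, cmov yields one group of two cmovs. Any later loop over the group sees only cmovs, while a walk from the group's first member to its last through the block would still meet the two debug entries.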

Addressed Chandler's comment.

In D38359#886781, @aaboud wrote:
Thanks Chandler for reviewing the code.

Is there any way that the debug instructions can become invalid due to this transform? I'm not seeing anything obvious, but wanted to make sure you've thought about this too.

This is what the code I am trying to fix looks like:
%vreg2<def,tied1> = CMOVB32rr %vreg6<tied0>, %vreg1, %EFLAGS<imp-use>
DBG_VALUE %vreg2, %noreg, !"b", <!DIExpression()>; GR32:%vreg2 line no:7
DBG_VALUE %vreg2, %noreg, !"b", <!DIExpression()>; GR32:%vreg2 line no:7
%vreg3<def,tied1> = CMOVB32rr %vreg0<tied0>, %vreg6, %EFLAGS<imp-use>
I assume 3 facts at this point:

DBG_VALUE has only virtual registers and cannot have a physical register; that is true because we are still working with machine code in SSA form.

DBG_VALUE instruction generates a void value, i.e., no instruction can use it (or refer to it).

When we replace all uses of the CMOV instruction with the new PHINode instruction, that also changes the virtual register in the DBG_VALUE.

Given that, we can always move all DBG_VALUE instructions forward, up to the end of the machine basic block, right?

Well, that's what I'm trying to make sure of...

I'm not an MI expert, but in the past I have found code motion at MI-time to be somewhat tricky.

Do we have kill flags set? If so, we could move a use of a virtual register across its kill... But maybe there is some reason to assure that this is not the case. Looping in Reid as he has been looking at DBG_VALUE a lot more closely than I have recently and may be able to quickly reassure both of us that this is safe. =]

lib/Target/X86/X86CmovConversion.cpp
580–584 ↗	(On Diff #117479)	Any particular reason you moved to a vector rather than doing the increment first, and then splicing?

Do we have kill flags set? If so, we could move a use of a virtual register across its kill... But maybe there is some reason to assure that this is not the case.

I would like to think that at this point of the optimizations, we do not preserve liveness of virtual registers. But I might be wrong!

Looping in Reid as he has been looking at DBG_VALUE a lot more closely than I have recently and may be able to quickly reassure both of us that this is safe. =]

Sure, let us wait for Reid's answer.

lib/Target/X86/X86CmovConversion.cpp
580–584 ↗	(On Diff #117479)	No, I thought it might be clearer and much safer. If you prefer the other way, I can change it back.
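The alternative raised in this exchange, incrementing first and then splicing, can be sketched on a `std::list` as follows. This is an assumption-laden model rather than the pass's code: `SpliceInstr`, `SpliceBlock`, `packBySplicing`, and `spliceTexts` are hypothetical names. The point it shows is that the iterator must advance before the current node is moved, because after the splice that node sits past the end of the scanned range.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical; not llvm::MachineInstr).
struct SpliceInstr {
  std::string Text;
  bool IsDebug;
};

using SpliceBlock = std::list<SpliceInstr>;

// Single-pass variant: no side vector, each debug entry is moved as it is met.
void packBySplicing(SpliceBlock &Block, SpliceBlock::iterator First,
                    SpliceBlock::iterator Last) {
  assert(!Last->IsDebug && "a group must end with a non-debug instruction");
  auto InsertPos = std::next(Last);
  for (auto I = First; I != Last;) {
    // Advance before moving: once Cur is spliced it sits after Last, so
    // stepping from it would skip the rest of the range.
    auto Cur = I++;
    if (Cur->IsDebug)
      Block.splice(InsertPos, Block, Cur);
  }
}

// Returns the instruction texts in block order, for inspection.
std::vector<std::string> spliceTexts(const SpliceBlock &Block) {
  std::vector<std::string> Out;
  for (const SpliceInstr &I : Block)
    Out.push_back(I.Text);
  return Out;
}
```

Both variants give the same final order in this model. The collect-then-move form keeps the scan and the mutation separate, which is arguably easier to reason about. The single-pass form avoids the temporary container.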

In D38359#886815, @aaboud wrote:

Looping in Reid as he has been looking at DBG_VALUE a lot more closely than I have recently and may be able to quickly reassure both of us that this is safe. =]

Sure, let us wait for Reid's answer.

Moving the DBG_VALUEs down and replacing the cmovs with phis should be fine.

This revision is now accepted and ready to land. Oct 3 2017, 9:25 AM

Closed by commit rL315851: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve… (authored by aaboud). Oct 15 2017, 4:01 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                               Size
llvm/trunk/lib/Target/X86/X86CmovConversion.cpp    31 lines
llvm/trunk/test/CodeGen/X86/PR34565.ll             60 lines

Diff 119071

llvm/trunk/lib/Target/X86/X86CmovConversion.cpp

[288 lines not shown; context: for (auto *MBB : Blocks) {]
     // opposite condition code.
     X86::CondCode FirstCC, FirstOppCC, MemOpCC;
     // Indicator of a non CMOVrr instruction in the current processed range.
     bool FoundNonCMOVInst = false;
     // Indicator for current processed CMOV-group if it should be skipped.
     bool SkipGroup = false;

     for (auto &I : *MBB) {
+      // Skip debug instructions.
+      if (I.isDebugValue())
+        continue;
       X86::CondCode CC = X86::getCondFromCMovOpc(I.getOpcode());
       // Check if we found a X86::CMOVrr instruction.
       if (CC != X86::COND_INVALID && (IncludeLoads || !I.mayLoad())) {
         if (Group.empty()) {
           // We found first CMOV in the range, reset flags.
           FirstCC = CC;
           FirstOppCC = X86::GetOppositeBranchCondition(CC);
           // Clear out the prior group's memory operand CC.
[121 lines not shown; context: bool X86CmovConverterPass::checkForProfitableCmovCandidates(]
   // Number of cycles saved in first 'i` iterations by optimizing the loop.
   //===--------------------------------------------------------------------===//
   for (unsigned I = 0; I < LoopIterations; ++I) {
     DepthInfo &MaxDepth = LoopDepth[I];
     for (auto *MBB : Blocks) {
       // Clear physical registers Def map.
       RegDefMaps[PhyRegType].clear();
       for (MachineInstr &MI : *MBB) {
+        // Skip debug instructions.
+        if (MI.isDebugValue())
+          continue;
         unsigned MIDepth = 0;
         unsigned MIDepthOpt = 0;
         bool IsCMOV = CmovInstructions.count(&MI);
         for (auto &MO : MI.uses()) {
           // Checks for "isUse()" as "uses()" returns also implicit definitions.
           if (!MO.isReg() || !MO.isUse())
             continue;
           unsigned Reg = MO.getReg();
[142 lines not shown; context: static bool checkEFLAGSLive(MachineInstr *MI) {]
   for (auto I = BB->succ_begin(), E = BB->succ_end(); I != E; ++I) {
     if ((*I)->isLiveIn(X86::EFLAGS))
       return true;
   }

   return false;
 }

+/// Given /p First CMOV instruction and /p Last CMOV instruction representing a
+/// group of CMOV instructions, which may contain debug instructions in between,
+/// move all debug instructions to after the last CMOV instruction, making the
+/// CMOV group consecutive.
+static void packCmovGroup(MachineInstr *First, MachineInstr *Last) {
+  assert(X86::getCondFromCMovOpc(Last->getOpcode()) != X86::COND_INVALID &&
+         "Last instruction in a CMOV group must be a CMOV instruction");
+
+  SmallVector<MachineInstr *, 2> DBGInstructions;
+  for (auto I = First->getIterator(), E = Last->getIterator(); I != E; I++) {
+    if (I->isDebugValue())
+      DBGInstructions.push_back(&*I);
+  }
+
+  // Splice the debug instruction after the cmov group.
+  MachineBasicBlock *MBB = First->getParent();
+  for (auto *MI : DBGInstructions)
+    MBB->insertAfter(Last, MI->removeFromParent());
+}
+
 void X86CmovConverterPass::convertCmovInstsToBranches(
     SmallVectorImpl<MachineInstr *> &Group) const {
   assert(!Group.empty() && "No CMOV instructions to convert");
   ++NumOfOptimizedCmovGroups;

+  // If the CMOV group is not packed, e.g., there are debug instructions between
+  // first CMOV and last CMOV, then pack the group and make the CMOV instruction
+  // consecutive by moving the debug instructions to after the last CMOV.
+  packCmovGroup(Group.front(), Group.back());
+
   // To convert a CMOVcc instruction, we actually have to insert the diamond
   // control-flow pattern. The incoming instruction knows the destination vreg
   // to set, the condition code register to branch on, the true/false values to
   // select between, and a branch opcode to use.

   // Before
   // -----
   // MBB:
[233 lines not shown]

llvm/trunk/test/CodeGen/X86/PR34565.ll

; RUN: llc -mtriple=x86_64-pc-linux -x86-cmov-converter=true -verify-machineinstrs < %s | FileCheck %s

; Test for PR34565, check that DBG instructions are ignored while optimizing
; X86 CMOV instructions.
; In this case, we check that there is no 'cmov' generated.

; CHECK-NOT: cmov

@main.buf = private unnamed_addr constant [10 x i64] [i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7, i64 8, i64 9], align 8

define i32 @main() #0 !dbg !5 {
entry:
  br label %while.body

while.body: ; preds = %while.body, %entry
  %a.010 = phi i32 [ 0, %entry ], [ %add.a.0, %while.body ]
  %b.09 = phi i32 [ 10, %entry ], [ %b.0.add, %while.body ]
  %add = add i32 %a.010, %b.09
  %call = tail call i32 @rand()
  %conv = sext i32 %call to i64
  %arrayidx = getelementptr inbounds [10 x i64], [10 x i64]* @main.buf, i32 0, i32 %add
  %0 = load i64, i64* %arrayidx, align 8
  %cmp1 = icmp ult i64 %0, %conv
  %b.0.add = select i1 %cmp1, i32 %b.09, i32 %add
  %add.a.0 = select i1 %cmp1, i32 %add, i32 %a.010
  tail call void @llvm.dbg.value(metadata i32 %add.a.0, metadata !10, metadata !DIExpression()), !dbg !13
  tail call void @llvm.dbg.value(metadata i32 %b.0.add, metadata !12, metadata !DIExpression()), !dbg !14
  tail call void @llvm.dbg.value(metadata i32 %add.a.0, metadata !10, metadata !DIExpression()), !dbg !13
  tail call void @llvm.dbg.value(metadata i32 %b.0.add, metadata !12, metadata !DIExpression()), !dbg !14
  %cmp = icmp ult i32 %add.a.0, %b.0.add
  br i1 %cmp, label %while.body, label %while.end

while.end: ; preds = %while.body
  ret i32 0
}

declare i32 @rand()

declare void @llvm.dbg.value(metadata, metadata, metadata)

attributes #0 = { "target-cpu"="x86-64" }

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!3, !4}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 6.0.0 (trunk)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
!1 = !DIFile(filename: "PR34565.c", directory: "\5C")
!2 = !{}
!3 = !{i32 2, !"Dwarf Version", i32 4}
!4 = !{i32 2, !"Debug Info Version", i32 3}
!5 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 3, type: !6, isLocal: false, isDefinition: true, scopeLine: 4, flags: DIFlagPrototyped, isOptimized: true, unit: !0, variables: !9)
!6 = !DISubroutineType(types: !7)
!7 = !{!8}
!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!9 = !{!10, !12}
!10 = !DILocalVariable(name: "a", scope: !5, file: !1, line: 6, type: !11)
!11 = !DIBasicType(name: "unsigned int", size: 32, encoding: DW_ATE_unsigned)
!12 = !DILocalVariable(name: "b", scope: !5, file: !1, line: 7, type: !11)
!13 = !DILocation(line: 6, column: 16, scope: !5)
!14 = !DILocation(line: 7, column: 16, scope: !5)