This is an archive of the discontinued LLVM Phabricator instance.

Differential D28915

[ExecutionDepsFix] Optimize instruction insertion
Needs Review, Public

Authored by loladiro on Jan 19 2017, 1:23 PM.

Details

Reviewers

myatsina
atrick
mkuper
MatzeB

Summary

In preparation for an upcoming commit (D28786) that will make
ExecutionDepsFix more aggressive about inserting dependency breaking instructions,
teach ExecutionDepsFix to be smarter about inserting these instructions.
There are two aspects to this:

Recognize dependency breaking instructions and if there already is such an instruction, simply re-use it, rather than inserting a new one.

For undef reads, which register is used does not matter. Thus, if there are many such reads in close succession, we only need to insert one dependency breaking instruction for all of them.

Note: This revision depends on D28759

Diff Detail

Build Status

Buildable 3074
Build 3074: arc lint + arc unit

Event Timeline

loladiro created this revision. Jan 19 2017, 1:23 PM

loladiro mentioned this in D28786: [ExecutionDepsFix] Kill clearance at function entry/calls. Jan 19 2017, 1:34 PM

loladiro added a child revision: D28786: [ExecutionDepsFix] Kill clearance at function entry/calls. Jan 19 2017, 1:35 PM

Bump.

First, I'm sorry for the late review :)

Second, I have a general comment:
I know that all the things you've put in this patch are aimed at reducing the number of xors and improving the registers we pick as undefs, but it is much easier to review, check the necessity of each change, and make sure each feature is properly tested when you break your commits down into small changes/patches.
Removing redundant xors when you have several instructions that use the same undef reg is one issue.
isDependencyBreak support is another issue.
Trying to add only true dependencies to the undef regs set is another issue.
Trying to choose a better register is another issue (and I would expect it to somehow make "pickBestReg" smarter and not rewrite everything at the end).

Most of these changes were not part of the original patch you uploaded in your previous patch version that combined the function calls, so I'm wondering how they got into this patch now. Do we really need them? Have you seen performance improvements from these changes, measuring each one separately (even if they look good "on paper" they might not bring the expected result; they might even make things worse)?

I'm sorry I'm being very strict about this again, but I want to make sure we do each such change in the best possible way and keep them separate (with dedicated tests, of course). So please break up your changes as much as possible, and upload each change in a separate review with its own test case.
I would also think about the general approach of trying to rewrite the register again: how does it work together with "pickBestRegister"? Does it undo the work of that function? Or did we miss some cases there in the first place? On this issue I would start from the beginning: what problem are we trying to solve, and what is the best way to solve it?

lib/CodeGen/ExecutionDepsFix.cpp
635	Maybe it's better to use iterator_range for this purpose? You can see the regIndices example in this file.
649	Please document this function and its purpose.
656	Formatting issue: please surround the if body with {} to be consistent with the style of the other conditions in this if / else if / else chain.
661	Why do you need this case?
666	Why do we choose the lowest register? Why not choose a "far away" register (this way we may even be able to save the xor)?
690	Shouldn't "pickBestRegister" catch these cases? And if not, maybe we've missed an optimization case there. I'm not sure we should have this in this form. What is the scenario you're trying to optimize that "pickBestRegister" doesn't catch?
711	Please try to avoid defining variables' initial values this way. It's not very readable. How about: size_t LastInit = ...; size_t i = LastInit;
722	Aren't you checking the liverange set in updateChooseableRegs? Why do it here too?
729	Please document the meaning of this code, and the next section as well. What is each of these sections doing? It's really hard for me to follow this. I read the documentation you added earlier about the purpose of your transformation, but I don't really understand how you iterate over the undefs to achieve it, why you've split it into several sections, and which cases each section handles.
lib/Target/X86/X86InstrInfo.cpp
7501	There are additional instructions like KXORBrr, KXORWrr, KXORDrr, KXORQrr (AVX512). Others might be added in the future, so perhaps we should set up the right infrastructure for that: switch (op): default: return false; case XORPS: case XORPS: ... <check xor instruction>
7504	You can use X86::NoRegister instead.
7505	You can iterate directly over the operands.
7508	I think this whole section of code would be a bit clearer if you wrote it in this spirit: for ()
7511	I see that you write a lot of functions so that they can return 2 values, and that you've used the same "trick" in other functions as well. I don't think it's a good approach in general. The need for a function to return 2 values generally indicates that you need to extract the common logic somehow, define a new type, or do some other form of refactoring. In this case I think "isDependencyBreak" should only return true/false (as the name indicates). If the module asking wants to know which register it is, then it can find the dest reg on its own, or we can provide the module with a "getDependencyBreakReg". In general it will behave similarly to the already existing "isReg()" and "getReg()".
test/CodeGen/X86/break-false-dep.ll
236	Each feature should have its own test. You are not supposed to eliminate the original test (unless the original test's logic is no longer true); you're supposed to add a small unit test for each feature. This test was originally designed to check "pickBestRegister".
237	Is this test solely for testing the xor combining you've added, or is it supposed to test something else? I already see you've added a dedicated test for the "xor", so I'm not sure why you've changed this one. If it's only for the xor combining, then you can do a very simple test like this:
    define double @clearence(i64 %arg1, i64 %arg2) {
      ; the inline asm calls that make xmm6 the best choice
      %tmp1 = sitofp i64 %arg1 to double
      %tmp2 = sitofp i64 %arg2 to double
      %tmp3 = fadd double %tmp1, %tmp2
      ret double %tmp3
    }
In this case xmm6 should be chosen as the undef for both convert instructions, and there should only be one xor.
242	Why did you add this part? What is the logic that it tests?
367	Where are the tests for all the other optimizations you've added? The "isBreakingDependency", "register re-picking", etc.?
test/CodeGen/X86/vec_int_to_fp.ll
1668	Why is using xmm1 ok? If we fall through from the previous BB, where xmm1 is xored, then it's ok, but we might jump to this block from another place, and if we use xmm1 without an xor then we will have a stall, no?

There are really only two major changes here:

Updating the registers at the end
isDependencyBreak support

Updating the registers at the end is not redundant with picking the best register,
because it only operates on instructions where we've determined earlier that a
dependency break is required (i.e. there are no suitable undef registers with
sufficient clearance). What updating at the end does is allow us to avoid the pathological
case in the comment I added above:

// All registers clobbered here, the code determines it needs a dependency break,
// so arbitrarily picks xmm0 as the undef read and then remembers for later to
// insert a dependency break here
vrandom %xmm0<undef>, %xmm0<def>
vrandom %xmm1<undef>, %xmm1<def>
vrandom %xmm2<undef>, %xmm2<def>
vrandom %xmm3<undef>, %xmm3<def>

The problem is that without this patch this turns into:

vxorps %xmm0, %xmm0, %xmm0
vrandom %xmm0<undef>, %xmm0<def>
vxorps %xmm1, %xmm1, %xmm1
vrandom %xmm1<undef>, %xmm1<def>
vxorps %xmm2, %xmm2, %xmm2
vrandom %xmm2<undef>, %xmm2<def>
vxorps %xmm3, %xmm3, %xmm3
vrandom %xmm3<undef>, %xmm3<def>

Clearly, we can do better, which is what this patch does. However, doing so involves
a backwards walk over all the undef reads we know need to be broken, so it's fundamentally
incompatible with the forwards algorithm in pickBestRegister.
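
The backward walk described above can be illustrated with a small, self-contained C++ sketch. It is a toy model rather than the patch's implementation: each undef read is reduced to a bitmask of registers assumed to be dead just before it, and the names `BreakGroup`, `collapseBackward` and `lowestSetBit` are hypothetical. Walking from the last read to the first, it intersects the masks and closes a group whenever no common register remains, picking the lowest register of the surviving set.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only; names and data layout are hypothetical, not the patch's.
// Each undef read is described by a bitmask of registers that are dead just
// before it (bit r set means xmm<r> may be clobbered at that point).
struct BreakGroup {
  unsigned First; // index of the first undef read covered by this break
  unsigned Count; // number of consecutive undef reads sharing the register
  unsigned Reg;   // register zeroed once, before read `First`
};

static unsigned lowestSetBit(uint32_t Mask) {
  assert(Mask != 0 && "need at least one register");
  unsigned R = 0;
  while ((Mask & 1u) == 0) {
    Mask >>= 1;
    ++R;
  }
  return R;
}

// Walk the reads backward, intersecting the registers usable so far. When the
// intersection would become empty, close the current group and start a new
// one. Groups are returned in reverse program order.
std::vector<BreakGroup>
collapseBackward(const std::vector<uint32_t> &FreeBefore) {
  std::vector<BreakGroup> Groups;
  if (FreeBefore.empty())
    return Groups;
  // One past the last read of the group that is currently open.
  unsigned End = static_cast<unsigned>(FreeBefore.size());
  uint32_t Choosable = ~0u;
  for (unsigned I = End; I-- > 0;) {
    assert(FreeBefore[I] != 0 && "toy model assumes some register is free");
    uint32_t Narrowed = Choosable & FreeBefore[I];
    if (Narrowed == 0) {
      // No single register works for read I and the later reads: one break
      // placed before read I + 1 serves reads [I + 1, End).
      Groups.push_back({I + 1, End - (I + 1), lowestSetBit(Choosable)});
      End = I + 1;
      Narrowed = FreeBefore[I];
    }
    Choosable = Narrowed;
  }
  Groups.push_back({0, End, lowestSetBit(Choosable)});
  return Groups;
}
```

For the four `vrandom` reads in the example, the masks would be 0xF, 0xE, 0xC and 0x8 (assuming each destination stays live afterwards), and the sketch returns a single group of four reads sharing xmm3, i.e. one vxorps instead of four.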

> Most of these changes were not part of the original patch you uploaded in your previous patch version that combined the function calls, so I'm wondering how they got into this patch now. Do we really need them? Have you seen performance improvements from these changes, measuring each one separately (even if they look good "on paper" they might not bring the expected result; they might even make things worse)?

I think I had an earlier version that avoided redundant xorps in a forward manner, which you may have looked at.
That turned out to be insufficient, and is subsumed by this algorithm. In general, this
revision is aimed at limiting the impact of D28786, which, without this enhancement,
caused a very large number of vxorps to be inserted.

The isDependencyBreak support is mostly included because it makes it easy to create registers that the algorithm
recognizes as having lots of clearance (and thus not requiring an extra dependency break), even when clearance is
killed at function entry, as D28786 does. After that change, it is otherwise very rare for there to be registers of sufficient clearance.
I'd be happy to split that out if you want.
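
The clearance argument can be made concrete with a small C++ sketch. This is a toy model under stated assumptions, not the pass itself: `ClearanceModel` and `undefReadNeedsBreak` are hypothetical names, clearance is taken to be the distance from the current instruction to the last write of the register, and clearance is assumed to be killed at entry (as D28786 proposes). A recognized dependency break resets the last-write position to a large negative value, mirroring the `-(1 << 20)` in this diff.

```cpp
#include <cassert>
#include <vector>

// Toy model only; the struct and function names are hypothetical.
struct ClearanceModel {
  std::vector<int> Def; // position of the last write to each register
  int CurInstr = 0;     // position of the instruction being visited

  // Assumption: clearance is killed at entry, so every register looks as if
  // it had just been written.
  explicit ClearanceModel(unsigned NumRegs) : Def(NumRegs, 0) {}

  // An ordinary instruction that writes Reg.
  void write(unsigned Reg) {
    Def[Reg] = CurInstr;
    ++CurInstr;
  }

  // A recognized dependency break (e.g. a zeroing xor) of Reg: pretend the
  // last write happened a very long time ago.
  void dependencyBreak(unsigned Reg) {
    Def[Reg] = -(1 << 20);
    ++CurInstr;
  }

  int clearance(unsigned Reg) const { return CurInstr - Def[Reg]; }

  // A new break is wanted only when the clearance is below the preference.
  bool needsBreak(unsigned Reg, int Pref) const {
    return clearance(Reg) < Pref;
  }
};

// Scenario: register 0 is either written normally or zeroed by a recognized
// break, and the very next instruction reads it as undef.
bool undefReadNeedsBreak(bool PrecededByBreak, int Pref) {
  ClearanceModel M(1);
  if (PrecededByBreak)
    M.dependencyBreak(0);
  else
    M.write(0);
  return M.needsBreak(0, Pref);
}
```

In this model an undef read directly after an ordinary write still wants a break, while one directly after a recognized zeroing instruction does not, which is the effect the paragraph above relies on.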

lib/CodeGen/ExecutionDepsFix.cpp
661	Only to make sure we actually return the correct LowestValid.
666	We want to pick an arbitrary register from the choosable set. Picking the lowest one ensures a consistent choice. Picking a far away one doesn't make sense, because to get to this point, we've already determined that no registers are sufficiently far away.
690	See the non-inline reply. This is fundamentally a reverse operation.
722	`updateChoosableRegs` only looks at the instructions before which we require dependency breaks. This kills choosability for any registers that have liveness entirely between two such instructions.
lib/Target/X86/X86InstrInfo.cpp
7511	Fair enough (maybe I've been writing too much C code). In general I dislike splitting such logic across two functions since it needs to be kept in sync (e.g. when adding additional dependency breaking instructions to be recognized). How about adding a `getDependencyBreakReg` that returns 0, or an empty llvm::Optional, if the instruction is not a dependency breaking instruction?
test/CodeGen/X86/break-false-dep.ll
236	It still checks pickBestRegister, but in a way that remains valid after D28786, by making use of the fact that the `fcmp ult double %x, 0.0` materializes a constant `0.0` as a dependency breaking instruction, so pickBestRegister can pick up on that.
237	See above
242	See above
367	This is the test for register re-picking.
test/CodeGen/X86/vec_int_to_fp.ll
1668	As far as I can tell, nothing else jumps in between the vxorps and this undefined use, so it's a good register to use.

I'm sorry, I'm getting too lost in this review with the interaction between all the changes and especially trying to check if all the corner cases were tested sufficiently.
I want to make sure we implement each of your changes the best possible way.

Let's separate this into 3 incremental reviews. I think the order should be:
1. Trying to add only true dependencies to the undef regs
2. isDependencyBreak support (which is the "xor fusion", if I understand it correctly)
3. Register re-picking
(and the effect of calls on clearance calculations can follow)

I'm emphasizing that the "xor fusion" and "register re-picking" should have several unit tests dedicated only to these features, testing both simple and complex cases.

Also, I think a cleaner implementation can be done for the register re-picking, but I need to stew on it a bit more until I have a good concrete suggestion.

lib/CodeGen/ExecutionDepsFix.cpp
499	Why not make it return true/false (true dependency) instead of void?
584	And then here you will have : bool TrueDependency = pickBest...()
628	This feature should be heavily tested. This is the "xor fusion" logic, right? We're adding xors for dependency breaks in a late stage of the program, after the clearance calculation for everything was already done, including loops (i.e. your previous, already committed patch that changes the calculation algorithm). Now you are changing the clearance for the xors we've added, if I understand this correctly. So somehow you need to update your old clearance calculation, no? Or is this code aimed only at catching xors that were in the original code? I would like to see a lot of tests for this feature (without the effect of the other features): How does it affect simple one-BB code? What is the effect on simple loops with one xor to this register and no dependency? What is the effect on complex loops that have no xors and assign to the same register: loop: cvt xor xmm0 cvt xmm0 = cvt cvt xor xmm0 cvt xmm0 =
656	There's something that doesn't make sense to me. Sometimes we're adding registers to ChoosableRegs and sometimes we're erasing registers from it. Why did we add something "bad" in the first place? It seems like this function is trying to mix 2 different (but close) functionalities, and I don't think it's right.
681	Do you mean vroundss? In this case it's a bad example. It uses 3 xmms and you should hide the false dependency under the true dependency of the source xmm. I think you should put here the full instruction with all the operands (vcvtsi2sd will probably be good enough for your point).
lib/Target/X86/X86InstrInfo.cpp
7508	Ignore the comment above :) I forgot to delete it.
7511	The llvm::Optional seems very cool :) I wasn't familiar with it :) I think it's the best for us. I like it better than reg == 0 (I don't think 0 means "no register" for all targets). Anyway, here are some additional options I thought of as well: (1) implement getDepRegBreak and isDepRegBreak so that one uses the other, e.g. isDepsRegBreak() { return getRegBreak() != X86::NON_REG }; this way you don't have the duplication of logic that you've talked about. (2) getDependencyBreak returns the operand number (of the dest reg), or -1 if the instruction is not dependency breaking.
test/CodeGen/X86/break-false-dep.ll
367	I think you need to add additional, more complex tests for the re-picking. What happens if we're in a loop? Or in a loop with a dependency carried over from the previous iteration? What happens if you have instructions between the converts that make some of your original potential registers "alive", so that you can no longer use them?
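
The single-query design discussed in the 7511 thread could look roughly like the following C++17 sketch. Everything here is a hypothetical stand-in: `Opcode`, `Instr` and the two functions are not LLVM's types or the patch's API, and `std::optional` is used in place of llvm::Optional. The point is only that `isDependencyBreak` can be derived from `getDependencyBreakReg`, so the list of recognized instructions lives in one place.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for illustration; these are not LLVM's types.
enum class Opcode { XORPSrr, VXORPSrr, ADDPSrr };

struct Instr {
  Opcode Op;
  unsigned Dst, Src1, Src2;
};

// Single source of truth: returns the register whose dependency is broken,
// or std::nullopt if the instruction is not a recognized dependency break.
std::optional<unsigned> getDependencyBreakReg(const Instr &MI) {
  switch (MI.Op) {
  default:
    return std::nullopt;
  case Opcode::XORPSrr:
  case Opcode::VXORPSrr:
    // An xor of a register with itself is a zeroing idiom; anything else
    // reads real inputs and breaks nothing.
    if (MI.Src1 != MI.Src2)
      return std::nullopt;
    return MI.Dst;
  }
}

// The boolean query is derived from the register query, so the two cannot
// drift apart when new opcodes are added to the switch.
bool isDependencyBreak(const Instr &MI) {
  return getDependencyBreakReg(MI).has_value();
}
```

An empty optional also sidesteps the question of whether register number 0 means "no register" on every target.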

I'll split up this review further and let's go from there.

lib/CodeGen/ExecutionDepsFix.cpp
499	That seems fine. I think I originally had it return a value, which is why I went this way, but since that's no longer the case, I'll update accordingly.
628	This is just for xors that were in the original code. xor fusion is no longer there, because it got subsumed by the register re-picking.

Ok, I've split everything but the register re-picking into D30173 and D30177. Let's get those reviewed and once that's done, I'll rebase this with just the register re-picking code. I hope I've also addressed the review comments for the relevant parts in those revisions.

MatzeB resigned from this revision. Aug 15 2017, 11:10 AM

Revision Contents

Path                                      Size
include/llvm/Target/TargetInstrInfo.h     11 lines
lib/CodeGen/ExecutionDepsFix.cpp          139 lines
lib/Target/X86/X86InstrInfo.h             1 line
lib/Target/X86/X86InstrInfo.cpp           17 lines
test/CodeGen/X86/break-false-dep.ll       49 lines
test/CodeGen/X86/known-bits-vector.ll     2 lines
test/CodeGen/X86/vec_int_to_fp.ll         5 lines

Diff 85018

include/llvm/Target/TargetInstrInfo.h

@@ ... @@ public:
   /// cvtsi2ss %rbx, %xmm0
   ///
   /// An <imp-kill> operand should be added to MI if an instruction was
   /// inserted. This ties the instructions together in the post-ra scheduler.
   ///
   virtual void breakPartialRegDependency(MachineInstr &MI, unsigned OpNum,
                                          const TargetRegisterInfo *TRI) const {}

+  /// May return true if the instruction in question is a dependency breaking
+  /// instruction. If so, the register number for which it is dependency
+  /// breaking should be returned in `OutReg`. It is preferable to return
+  /// false if the result cannot be determined. This would at worst result
+  /// in the insertion of an unnecessary instruction, while the other
+  /// alternative could result in significant false-dependency penalties.
+  virtual bool isDependencyBreak(MachineInstr &MI,
+                                 unsigned *OutReg = nullptr) const {
+    return false;
+  }
+
   /// Create machine specific model for scheduling.
   virtual DFAPacketizer *
   CreateTargetScheduleState(const TargetSubtargetInfo &) const {
     return nullptr;
   }

   // Sometimes, it is possible for the target
   // to tell, even without aliasing information, that two MIs access different

lib/CodeGen/ExecutionDepsFix.cpp

@@ ... @@ private:
   void leaveBasicBlock(MachineBasicBlock*);
   bool isBlockDone(MachineBasicBlock *);
   void processBasicBlock(MachineBasicBlock *MBB, bool PrimaryPass);
   void updateSuccessors(MachineBasicBlock *MBB, bool PrimaryPass);
   bool visitInstr(MachineInstr *);
   void processDefs(MachineInstr *, bool breakDependency, bool Kill);
   void visitSoftInstr(MachineInstr*, unsigned mask);
   void visitHardInstr(MachineInstr*, unsigned domain);
-  void pickBestRegisterForUndef(MachineInstr *MI, unsigned OpIdx,
-                                unsigned Pref);
+  void pickBestRegisterForUndef(MachineInstr *MI, unsigned OpIdx, unsigned Pref,
+                                bool &TrueDependency);
   bool shouldBreakDependence(MachineInstr*, unsigned OpIdx, unsigned Pref);
+
+  // Undef Reads
+  void collapseUndefReads(unsigned from, unsigned to, unsigned Reg);
+  unsigned updateChooseableRegs(SparseSet<unsigned> &,
+                                const TargetRegisterClass *, bool);
   void processUndefReads(MachineBasicBlock*);
 };
 }

 char ExeDepsFix::ID = 0;

 /// Translate TRI register number to a list of indices into our smaller tables
 /// of interesting registers.
@@ ... @@ bool ExeDepsFix::visitInstr(MachineInstr *MI) {
   // Update instructions with explicit execution domains.
   std::pair<uint16_t, uint16_t> DomP = TII->getExecutionDomain(*MI);
   if (DomP.first) {
     if (DomP.second)
       visitSoftInstr(MI, DomP.second);
     else
       visitHardInstr(MI, DomP.first);
   }

   return !DomP.first;
 }

 /// \brief Helps avoid false dependencies on undef registers by updating the
 /// machine instructions' undef operand to use a register that the instruction
 /// is truly dependent on, or use a register with clearance higher than Pref.
 void ExeDepsFix::pickBestRegisterForUndef(MachineInstr *MI, unsigned OpIdx,
-                                          unsigned Pref) {
+                                          unsigned Pref, bool &TrueDependency) {
   MachineOperand &MO = MI->getOperand(OpIdx);
   assert(MO.isUndef() && "Expected undef machine operand");

   unsigned OriginalReg = MO.getReg();

   // Update only undef operands that are mapped to one register.
   if (AliasMap[OriginalReg].size() != 1)
     return;

   // Get the undef operand's register class
   const TargetRegisterClass *OpRC =
       TII->getRegClass(MI->getDesc(), OpIdx, TRI, *MF);

   // If the instruction has a true dependency, we can hide the false dependency
   // behind it.
   for (MachineOperand &CurrMO : MI->operands()) {
     if (!CurrMO.isReg() || CurrMO.isDef() || CurrMO.isUndef() ||
         !OpRC->contains(CurrMO.getReg()))
       continue;
     // We found a true dependency - replace the undef register with the true
     // dependency.
     MO.setReg(CurrMO.getReg());
+    TrueDependency = true;
     return;
   }

   // Go over all registers in the register class and find the register with
   // max clearance or clearance higher than Pref.
   unsigned MaxClearance = 0;
   unsigned MaxClearanceReg = OriginalReg;
   ArrayRef<MCPhysReg> Order = RegClassInfo.getOrder(OpRC);
@@ ... @@ void ExeDepsFix::processDefs(MachineInstr *MI, bool breakDependency,
                              bool Kill) {
   assert(!MI->isDebugValue() && "Won't process debug values");

   // Break dependence on undef uses. Do this before updating LiveRegs below.
   unsigned OpNum;
   if (breakDependency) {
     unsigned Pref = TII->getUndefRegClearance(*MI, OpNum, TRI);
     if (Pref) {
-      pickBestRegisterForUndef(MI, OpNum, Pref);
-      if (shouldBreakDependence(MI, OpNum, Pref))
+      bool TrueDependency = false;
+      pickBestRegisterForUndef(MI, OpNum, Pref, TrueDependency);
+      // Don't bother adding true dependencies to UndefReads. All we'd find out
+      // is that the register is live (since this very instruction depends on
+      // it), so we can't do anything.
+      if (!TrueDependency && shouldBreakDependence(MI, OpNum, Pref)) {
         UndefReads.push_back(std::make_pair(MI, OpNum));
+      }
     }
   }
   const MCInstrDesc &MCID = MI->getDesc();
   for (unsigned i = 0,
        e = MI->isVariadic() ? MI->getNumOperands() : MCID.getNumDefs();
        i != e; ++i) {
     MachineOperand &MO = MI->getOperand(i);
     if (!MO.isReg())
       continue;
     if (MO.isUse())
@@ ... @@ for (int rx : regIndices(MO.getReg())) {
       // How many instructions since rx was last written?
       LiveRegs[rx].Def = CurInstr;

       // Kill off domains redefined by generic instructions.
       if (Kill)
         kill(rx);
     }
   }
+  unsigned DepReg = 0;
+  if (TII->isDependencyBreak(*MI, &DepReg)) {
+    for (int rx : regIndices(DepReg)) {
+      // This instruction is a dependency break, so there are no clearance
+      // issues, reset the counter.
+      LiveRegs[rx].Def = -(1 << 20);
+    }
+  }
   ++CurInstr;
 }

+// Set the undef read register to `Reg` for all UndefReads in the range
+// [from,to).
+void ExeDepsFix::collapseUndefReads(unsigned from, unsigned to, unsigned Reg) {
+  if (from >= to)
+    return;
+  for (unsigned i = from; i < to; ++i) {
+    MachineInstr *MI = std::get<0>(UndefReads[i]);
+    unsigned OpIdx = std::get<1>(UndefReads[i]);
+    MachineOperand &MO = MI->getOperand(OpIdx);
+    MO.setReg(Reg);
+  }
+  TII->breakPartialRegDependency(*std::get<0>(UndefReads[from]),
+                                 std::get<1>(UndefReads[from]), TRI);
+}
+
+unsigned ExeDepsFix::updateChooseableRegs(SparseSet<unsigned> &ChoosableRegs,
+                                          const TargetRegisterClass *OpRC,
+                                          bool add) {
+  unsigned LowestValid = (unsigned)-1;
+  ArrayRef<MCPhysReg> Order = RegClassInfo.getOrder(OpRC);
+  for (auto Reg : Order) {
+    if (LiveRegSet.contains(Reg))
+      ChoosableRegs.erase(Reg);
+    else if (add) {
+      ChoosableRegs.insert(Reg);
+      if (LowestValid == (unsigned)-1)
+        LowestValid = Reg;
+    } else if (ChoosableRegs.count(Reg) == 1) {
+      if (LowestValid == (unsigned)-1)
+        LowestValid = Reg;
+    }
+  }
+  return LowestValid;
+}

/// \break Break false dependencies on undefined register reads.		/// \break Break false dependencies on undefined register reads.
///		///
/// Walk the block backward computing precise liveness. This is expensive, so we		/// Walk the block backward computing precise liveness. This is expensive, so we
/// only do it on demand. Note that the occurrence of undefined register reads		/// only do it on demand. Note that the occurrence of undefined register reads
/// that should be broken is very rare, but when they occur we may have many in		/// that should be broken is very rare, but when they occur we may have many in
/// a single block.		/// a single block.
void ExeDepsFix::processUndefReads(MachineBasicBlock *MBB) {		void ExeDepsFix::processUndefReads(MachineBasicBlock *MBB) {
if (UndefReads.empty())		if (UndefReads.empty())
return;		return;

		// We want to be slightly clever here, to avoid the following common pattern:
		// Suppose we have some instruction `vrandom %in, %out` and the following code
		// vrandom %xmm0<undef>, %xmm0<def>
		myatsinaUnsubmitted Not Done Reply Inline Actions Do you mean vroundss? In this case it's a bad example. It uses 3 xmms and you should hide the false dependency under the true dependency of the source xmm. I think you should put here the full instruction with all the operands (vcvtsi2sd will probably be good enough for your point). myatsina: Do you mean vroundss? In this case it's a bad example. It uses 3 xmms and you should hide the…
		// vrandom %xmm1<undef>, %xmm1<def>
		// vrandom %xmm2<undef>, %xmm2<def>
		// vrandom %xmm3<undef>, %xmm3<def>
		// The earlier logic likes to produce these, because it picks the first
		// register
		// to break ties in clearance. However, most register allocators pick the dest
		// register the same way. Naively, we'd have to insert a dependency break,
		// before every instruction above. However, what we really want is
		// vxorps %xmm3, %xmm3, %xmm3
		myatsina: Shouldn't "pickBestRegister" catch these cases? And if not, maybe we've missed an optimization case there. I'm not sure we should have this in this form. What is the scenario you're trying to optimize that "pickBestRegister" doesn't catch?
		loladiro (Author): See the non-inline reply. This is fundamentally a reverse operation.
		// vrandom %xmm3<undef>, %xmm0<def>
		// vrandom %xmm3<undef>, %xmm1<def>
		// vrandom %xmm3<undef>, %xmm2<def>
		// vrandom %xmm3<undef>, %xmm3<def>
		// To do so, we walk backwards and cumulatively keep track of which registers
		// we can use to break the dependency. Then, once the set has collapsed, we
		// reset the undef read register for all following instructions.

// Collect this block's live out register units.		// Collect this block's live out register units.
LiveRegSet.init(*TRI);		LiveRegSet.init(*TRI);
// We do not need to care about pristine registers as they are just preserved		// We do not need to care about pristine registers as they are just preserved
// but not actually used in the function.		// but not actually used in the function.
LiveRegSet.addLiveOutsNoPristines(*MBB);		LiveRegSet.addLiveOutsNoPristines(*MBB);

MachineInstr *UndefMI = UndefReads.back().first;		SparseSet<unsigned> ChoosableRegs;
unsigned OpIdx = UndefReads.back().second;		ChoosableRegs.setUniverse(TRI->getNumRegs());

		unsigned LastValid = (unsigned)-1;
		const TargetRegisterClass *LastOpRC = nullptr;
		size_t i, LastInit;
		i = LastInit = UndefReads.size() - 1;
		myatsina: Please try to avoid defining variables' initial values this way. It is not very readable. How about: size_t LastInit = ...; size_t i = LastInit;
		MachineInstr *UndefMI = std::get<0>(UndefReads[i]);

for (MachineInstr &I : make_range(MBB->rbegin(), MBB->rend())) {		for (MachineInstr &I : make_range(MBB->rbegin(), MBB->rend())) {
// Update liveness, including the current instruction's defs.		// Update liveness, including the current instruction's defs.
LiveRegSet.stepBackward(I);		LiveRegSet.stepBackward(I);

		// This ensures that we don't accidentally pick a register whose live region
		// lies entirely between two undef reads (since that would defeat the
		// purpose of breaking the dependency).
		for (auto LiveReg : LiveRegSet)
		ChoosableRegs.erase(LiveReg);
		myatsina: Aren't you checking the live-range set in updateChooseableRegs? Why do it here too?
		loladiro (Author): `updateChooseableRegs` only looks at the instructions before which we require dependency breaks. This kills choosability for any registers that have liveness entirely between two such instructions.

if (UndefMI == &I) {		if (UndefMI == &I) {
if (!LiveRegSet.contains(UndefMI->getOperand(OpIdx).getReg()))		unsigned OpIdx = std::get<1>(UndefReads[i]);
TII->breakPartialRegDependency(*UndefMI, OpIdx, TRI);		// Get the undef operand's register class
		const TargetRegisterClass *OpRC =
		TII->getRegClass(UndefMI->getDesc(), OpIdx, TRI, *MF);
		if (OpRC != LastOpRC \|\| ChoosableRegs.size() == 0) {
		myatsina: Please document the meaning of this code, and the next section as well. What is each of these sections doing? It's really hard for me to follow this. I read the documentation you added earlier on the purpose of your transformation, but I don't really understand how you iterate over the undefs to achieve it, why you've split it into several sections, and which cases each section handles.
		if (LastInit != i) {
		if (LastValid != (unsigned)-1)
		collapseUndefReads(i + 1, LastInit + 1, LastValid);
		ChoosableRegs.clear();
		LastInit = i;
		}
		}

UndefReads.pop_back();		unsigned LowestValid =
if (UndefReads.empty())		updateChooseableRegs(ChoosableRegs, OpRC, LastInit == i);
return;
		if (ChoosableRegs.size() == 0) {
		if (LastInit != i) {
		if (LastValid != (unsigned)-1)
		collapseUndefReads(i + 1, LastInit + 1, LastValid);
		LowestValid = updateChooseableRegs(ChoosableRegs, OpRC, true);
		LastInit = i;
		}
		}
		LastValid = LowestValid;
		LastOpRC = OpRC;

		if (i == 0)
		break;

UndefMI = UndefReads.back().first;		UndefMI = std::get<0>(UndefReads[--i]);
OpIdx = UndefReads.back().second;
}		}
}		}
		if (LastValid != (unsigned)-1)
		collapseUndefReads(0, LastInit + 1, LastValid);
}		}
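
The backward walk described in the comments of processUndefReads above can be illustrated with a simplified, self-contained model. This is only a sketch under stated assumptions, not the patch's implementation: `RegSet`, `NoReg` and `assignSharedUndefRegs` are hypothetical names, each undef read is reduced to a precomputed set of acceptable registers, and register classes and liveness between reads are ignored. It shows the core idea of intersecting candidate sets while walking backward and, once the set would become empty, assigning the lowest surviving register to the whole group of reads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

using RegSet = std::set<unsigned>;  // hypothetical stand-in for the choosable set
constexpr unsigned NoReg = ~0u;     // hypothetical "no register chosen" marker

// Candidates[k] holds the registers acceptable as the undef operand of the
// k-th undef read, in program order. Returns the register chosen per read.
std::vector<unsigned>
assignSharedUndefRegs(const std::vector<RegSet> &Candidates) {
  std::vector<unsigned> Chosen(Candidates.size(), NoReg);
  RegSet Choosable;                          // running intersection of the group
  std::size_t GroupEnd = Candidates.size();  // one past the group's last read

  // Give reads [Begin, End) the lowest register of the running intersection.
  auto Collapse = [&](std::size_t Begin, std::size_t End) {
    if (Choosable.empty())
      return;
    unsigned Reg = *Choosable.begin();
    for (std::size_t K = Begin; K < End; ++K)
      Chosen[K] = Reg;
  };

  // Walk the reads from last to first, as the pass walks the block backward.
  for (std::size_t K = Candidates.size(); K-- > 0;) {
    RegSet Next;
    if (K + 1 == GroupEnd) {
      Next = Candidates[K];  // first read of a fresh group
    } else {
      std::set_intersection(Choosable.begin(), Choosable.end(),
                            Candidates[K].begin(), Candidates[K].end(),
                            std::inserter(Next, Next.begin()));
      if (Next.empty()) {
        // No common register is left: finish the group and start a new one.
        Collapse(K + 1, GroupEnd);
        GroupEnd = K + 1;
        Next = Candidates[K];
      }
    }
    Choosable = std::move(Next);
  }
  Collapse(0, GroupEnd);
  return Chosen;
}
```

With candidate sets {0,1,2,3}, {1,2,3}, {2,3}, {3} (mirroring the four-instruction example in the comment, where only %xmm3 is free before the last instruction), every read ends up with register 3, so a single dependency-breaking instruction covers all four reads.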

// A hard instruction only works in one domain. All input registers will be		// A hard instruction only works in one domain. All input registers will be
// forced into that domain.		// forced into that domain.
void ExeDepsFix::visitHardInstr(MachineInstr *mi, unsigned domain) {		void ExeDepsFix::visitHardInstr(MachineInstr *mi, unsigned domain) {
// Collapse all uses.		// Collapse all uses.
for (unsigned i = mi->getDesc().getNumDefs(),		for (unsigned i = mi->getDesc().getNumDefs(),
e = mi->getDesc().getNumOperands(); i != e; ++i) {		e = mi->getDesc().getNumOperands(); i != e; ++i) {
[... 303 lines not shown ...]

lib/Target/X86/X86InstrInfo.h

[... 478 lines not shown ...]	public:

unsigned		unsigned
getPartialRegUpdateClearance(const MachineInstr &MI, unsigned OpNum,		getPartialRegUpdateClearance(const MachineInstr &MI, unsigned OpNum,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;
unsigned getUndefRegClearance(const MachineInstr &MI, unsigned &OpNum,		unsigned getUndefRegClearance(const MachineInstr &MI, unsigned &OpNum,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;
void breakPartialRegDependency(MachineInstr &MI, unsigned OpNum,		void breakPartialRegDependency(MachineInstr &MI, unsigned OpNum,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;
		bool isDependencyBreak(MachineInstr &MI, unsigned *OutReg) const override;

MachineInstr *foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		MachineInstr *foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
unsigned OpNum,		unsigned OpNum,
ArrayRef<MachineOperand> MOs,		ArrayRef<MachineOperand> MOs,
MachineBasicBlock::iterator InsertPt,		MachineBasicBlock::iterator InsertPt,
unsigned Size, unsigned Alignment,		unsigned Size, unsigned Alignment,
bool AllowCommute) const;		bool AllowCommute) const;

[... 114 lines not shown ...]

lib/Target/X86/X86InstrInfo.cpp


[... 7,490 lines not shown ...]	if (X86::VR128RegClass.contains(Reg)) {
BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), get(X86::VXORPSrr), XReg)		BuildMI(*MI.getParent(), MI, MI.getDebugLoc(), get(X86::VXORPSrr), XReg)
.addReg(XReg, RegState::Undef)		.addReg(XReg, RegState::Undef)
.addReg(XReg, RegState::Undef)		.addReg(XReg, RegState::Undef)
.addReg(Reg, RegState::ImplicitDefine);		.addReg(Reg, RegState::ImplicitDefine);
MI.addRegisterKilled(Reg, TRI, true);		MI.addRegisterKilled(Reg, TRI, true);
}		}
}		}

		bool X86InstrInfo::isDependencyBreak(MachineInstr &MI, unsigned *OutReg) const {
		unsigned Opc = MI.getOpcode();
		if (!(Opc == X86::VXORPSrr \|\| Opc == X86::VXORPDrr \|\| Opc == X86::XORPSrr \|\|
		myatsina: There are additional instructions like KXORBrr, KXORWrr, KXORDrr, KXORQrr (AVX512). Others might be added in the future, so perhaps we should set up the right infrastructure for that: switch (op): default: return false; case XORPS: case XORPS: ... <check xor instruction>
		Opc == X86::XORPDrr))
		return false;
		unsigned Reg = 0;
		myatsina: You can use X86::NoRegister instead.
		for (unsigned i = 0; i < MI.getNumOperands(); ++i) {
		myatsina: You can iterate directly over the operands.
		const MachineOperand &MO = MI.getOperand(i);
		if (!MO.isReg() \|\| (Reg != 0 && MO.getReg() != Reg))
		return false;
		myatsina: I think this whole section of code would be a bit clearer if you wrote it in this spirit: for ()
		myatsina: Ignore the comment above :) I forgot to delete it.
		Reg = MO.getReg();
		}
		if (OutReg)
		myatsina: I see that you write a lot of functions so that they can return 2 values, and that you've used the same "trick" in other functions as well. I don't think it's a good approach in general. A function needing to return 2 values generally indicates that you need to extract the common logic somehow, define a new type, or do some other form of refactoring. In this case I think "isDependencyBreak" should only return true/false (as the name indicates). If the module asking wants to know which register it is, then it can find the dest reg on its own, or we can provide the module a "getDependencyBreakReg". In general it will behave similarly to the already existing "isReg()" and "getReg()".
		loladiro (Author): Fair enough (maybe I've been writing too much C code). In general I dislike splitting such logic across two functions since it needs to be kept in sync (e.g. when adding additional dependency breaking instructions to be recognized). How about adding a `getDependencyBreakReg` that returns either 0 or an empty llvm::Optional if it's not a dependency breaking instruction?
		myatsina: The llvm::Optional seems very cool :) I wasn't familiar with it :) I think it's the best for us. I like it better than reg == 0 (I don't think 0 is a "non reg" for all targets). Anyway, here are some additional options I thought of as well: (1) implement getDepRegBreak and isDepRegBreak so that one uses the other, e.g. isDepsRegBreak() { return getRegBreak() != X86::NON_REG }; this way you don't have the duplication of logic that you talked about. (2) getDependencyBreak returns the operand number (of the dest reg), or -1 if it's not breaking.
		*OutReg = Reg;
		return true;
		}
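
The alternative discussed in the inline comments on isDependencyBreak (a switch over opcodes plus a single query that returns an optional register, with the boolean predicate derived from it) might look roughly like the following. This is a hedged, self-contained sketch, not the patch's code: `Opcode`, `Operand` and `Instr` are hypothetical stand-ins for the LLVM types, `std::optional` stands in for llvm::Optional, and only the four XOR opcodes from the patch are listed. Unlike the patch's version, an instruction without operands is treated as not dependency breaking.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for the LLVM types; none of these are real LLVM APIs.
enum class Opcode { VXORPSrr, VXORPDrr, XORPSrr, XORPDrr, ADDPSrr };

struct Operand {
  bool IsReg;
  unsigned Reg;  // meaningful only when IsReg is true
};

struct Instr {
  Opcode Opc;
  std::vector<Operand> Operands;
};

// Returns the register whose dependency is broken, or std::nullopt if the
// instruction is not a recognized zeroing idiom.
std::optional<unsigned> getDependencyBreakReg(const Instr &MI) {
  switch (MI.Opc) {
  default:
    return std::nullopt;
  case Opcode::VXORPSrr:
  case Opcode::VXORPDrr:
  case Opcode::XORPSrr:
  case Opcode::XORPDrr:
    break;  // further zeroing idioms would be added as extra cases
  }
  // Every operand must be a register, and all of them the same register.
  std::optional<unsigned> Reg;
  for (const Operand &MO : MI.Operands) {
    if (!MO.IsReg || (Reg && MO.Reg != *Reg))
      return std::nullopt;
    Reg = MO.Reg;
  }
  return Reg;
}

// The predicate is derived from the query, so the two cannot get out of sync.
bool isDependencyBreak(const Instr &MI) {
  return getDependencyBreakReg(MI).has_value();
}

// Usage example on the hypothetical types above.
void dependencyBreakExample() {
  Instr Xor{Opcode::VXORPSrr, {{true, 3u}, {true, 3u}, {true, 3u}}};
  Instr Add{Opcode::ADDPSrr, {{true, 3u}, {true, 3u}, {true, 3u}}};
  assert(getDependencyBreakReg(Xor) == 3u);
  assert(!isDependencyBreak(Add));
}
```

Deriving the predicate from the query addresses the author's concern about keeping two functions in sync, and the optional return value avoids assuming that register number 0 means "no register" on every target.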

MachineInstr *		MachineInstr *
X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,		X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF, MachineInstr &MI,
ArrayRef<unsigned> Ops,		ArrayRef<unsigned> Ops,
MachineBasicBlock::iterator InsertPt,		MachineBasicBlock::iterator InsertPt,
int FrameIndex, LiveIntervals *LIS) const {		int FrameIndex, LiveIntervals *LIS) const {
// Check switch flag		// Check switch flag
if (NoFusing)		if (NoFusing)
return nullptr;		return nullptr;
[... 2,225 lines not shown ...]

test/CodeGen/X86/break-false-dep.ll

[... 225 lines not shown ...]	top:
tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{dirflag},~{fpsr},~{flags}"()		tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{dirflag},~{fpsr},~{flags}"()
%tmp1 = fpext float %arg to double		%tmp1 = fpext float %arg to double
ret double %tmp1		ret double %tmp1
;AVX-LABEL:@truedeps		;AVX-LABEL:@truedeps
;AVX-NOT: vxorps		;AVX-NOT: vxorps
;AVX: vcvtss2sd [[XMM0:%xmm[0-9]+]], [[XMM0]], {{%xmm[0-9]+}}		;AVX: vcvtss2sd [[XMM0:%xmm[0-9]+]], [[XMM0]], {{%xmm[0-9]+}}
}		}

; Make sure we are making a smart choice regarding undef registers and		define double @clearence(double %x, i64 %arg) {
; choosing the register with the highest clearence
define double @clearence(i64 %arg) {
top:		top:
tail call void asm sideeffect "", "~{xmm6},~{dirflag},~{fpsr},~{flags}"()		;AVX-LABEL:@clearence
		myatsina: Each feature should have its own test. You are not supposed to eliminate the original test (unless the original test's logic is no longer true); you're supposed to add a small unit test for each feature. This test was originally designed to check "pickBestRegister".
		loladiro (Author): It still checks pickBestRegister, but in a way that remains valid after D28786, by making use of the fact that the `fcmp ult double %x, 0.0` materializes a constant `0.0` as a dependency breaking instruction, so pickBestRegister can pick up on that.
tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm3},~{dirflag},~{fpsr},~{flags}"()		; This is carefully constructed to force LLVM to materialize a vxorps, which
		myatsina: Is this test solely for testing the xor combining you've added? Or is it supposed to test something else? I already see you've added a dedicated test for the "xor", so I'm not sure why you've changed this one. If it's only for xor combining then you can do a very simple test like this: define double @clearence(i64 %arg1, i64 %arg2) { // the inline asm calls that make xmm6 the best choice %tmp1 = sitofp i64 %arg1 to double %tmp2 = sitofp i64 %arg2 to double %tmp3 = fadd double %tmp1, %tmp2 ret double %tmp3 } In this case xmm6 should be chosen as the undef for both convert instructions, and there should only be one xor.
		loladiro (Author): See above.
tail call void asm sideeffect "", "~{xmm4},~{xmm5},~{xmm7},~{dirflag},~{fpsr},~{flags}"()		; also implicitly breaks the dependency, making it a good candidate for the
		; undef read below
		;AVX: vxorps [[XMM1:%xmm1]], [[XMM1]], [[XMM1]]
		;AVX: vucomisd [[XMM1]], %xmm0
		%0 = fcmp ult double %x, 0.0
		myatsina: Why did you add this part? What logic does it test?
		loladiro (Author): See above.
		br i1 %0, label %main, label %fake

		main:
		tail call void asm sideeffect "", "~{xmm0},~{xmm2},~{xmm3},~{dirflag},~{fpsr},~{flags}"()
		tail call void asm sideeffect "", "~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{dirflag},~{fpsr},~{flags}"()
tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{dirflag},~{fpsr},~{flags}"()		tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{dirflag},~{fpsr},~{flags}"()
tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{dirflag},~{fpsr},~{flags}"()		tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{dirflag},~{fpsr},~{flags}"()
%tmp1 = sitofp i64 %arg to double		%tmp1 = sitofp i64 %arg to double
ret double %tmp1		ret double %tmp1
;AVX-LABEL:@clearence		; Check that we re-use the dependency break from above
;AVX: vxorps [[XMM6:%xmm6]], [[XMM6]], [[XMM6]]		;AVX-NOT: vxorps
;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM6]], {{%xmm[0-9]+}}		;AVX: vcvtsi2sdq {{.*}}, [[XMM1]], {{%xmm[0-9]+}}
		fake:
		ret double 0.0
}		}

; Make sure we are making a smart choice regarding undef registers in order to		; Make sure we are making a smart choice regarding undef registers in order to
; avoid a cyclic dependence on a write to the same register in a previous		; avoid a cyclic dependence on a write to the same register in a previous
; iteration, especially when we cannot zero out the undef register because it		; iteration, especially when we cannot zero out the undef register because it
; is alive.		; is alive.
define i64 @loopclearence(i64* nocapture %x, double* nocapture %y) nounwind {		define i64 @loopclearence(i64* nocapture %x, double* nocapture %y) nounwind {
entry:		entry:
[... 73 lines not shown ...]	;AVX-NOT: %xmm6
%outptr = getelementptr double, double* %y, i64 %prev_j		%outptr = getelementptr double, double* %y, i64 %prev_j
store double %div, double* %outptr, align 8		store double %div, double* %outptr, align 8
%done = icmp slt i64 %size, %nexti		%done = icmp slt i64 %size, %nexti
br i1 %done, label %loopdone, label %loop		br i1 %done, label %loopdone, label %loop

loopdone:		loopdone:
ret void		ret void
}		}

		define double @breakoptimization(i64 %a, i64 %b, i64 %c, i64 %d) {
		;AVX-LABEL:@breakoptimization
		top:
		tail call void asm sideeffect "", "~{xmm0},~{xmm1},~{xmm2},~{xmm3},~{dirflag},~{fpsr},~{flags}"()
		tail call void asm sideeffect "", "~{xmm4},~{xmm5},~{xmm6},~{xmm7},~{dirflag},~{fpsr},~{flags}"()
		tail call void asm sideeffect "", "~{xmm8},~{xmm9},~{xmm10},~{xmm11},~{dirflag},~{fpsr},~{flags}"()
		tail call void asm sideeffect "", "~{xmm12},~{xmm13},~{xmm14},~{xmm15},~{dirflag},~{fpsr},~{flags}"()
		;AVX: vxorps [[XMM:%xmm[0-9]+]], [[XMM]], [[XMM]]
		;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM]], {{%xmm[0-9]+}}
		;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM]], {{%xmm[0-9]+}}
		;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM]], {{%xmm[0-9]+}}
		;AVX-NEXT: vcvtsi2sdq {{.*}}, [[XMM]], {{%xmm[0-9]+}}
		%af = sitofp i64 %a to double
		%bf = sitofp i64 %b to double
		%cf = sitofp i64 %c to double
		%df = sitofp i64 %d to double
		%fadd1 = fadd double %af, %bf
		%fadd2 = fadd double %cf, %df
		%fadd3 = fadd double %fadd1, %fadd2
		ret double %fadd3
		}
		myatsina: Where are the tests for all the other optimizations you've added? The "isBreakingDependency", "register re-picking", etc.?
		loladiro (Author): This is the test for register re-picking.
		myatsina: I think you need to add more complex tests for the re-picking. What happens if we're in a loop? Or in a loop with a carried dependency from the previous iteration? What happens if you have instructions between the converts that make some of your original potential registers "alive", so that you can no longer use them?

test/CodeGen/X86/known-bits-vector.ll

	[... 36 lines not shown ...]
	; X32-NEXT: popl %ebp			; X32-NEXT: popl %ebp
	; X32-NEXT: retl			; X32-NEXT: retl
	;			;
	; X64-LABEL: knownbits_mask_extract_uitofp:			; X64-LABEL: knownbits_mask_extract_uitofp:
	; X64: # BB#0:			; X64: # BB#0:
	; X64-NEXT: vpxor %xmm1, %xmm1, %xmm1			; X64-NEXT: vpxor %xmm1, %xmm1, %xmm1
	; X64-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3],xmm0[4,5,6,7]			; X64-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3],xmm0[4,5,6,7]
	; X64-NEXT: vmovq %xmm0, %rax			; X64-NEXT: vmovq %xmm0, %rax
	; X64-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0			; X64-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm0
	; X64-NEXT: retq			; X64-NEXT: retq
	%1 = and <2 x i64> %a0, <i64 65535, i64 -1>			%1 = and <2 x i64> %a0, <i64 65535, i64 -1>
	%2 = extractelement <2 x i64> %1, i32 0			%2 = extractelement <2 x i64> %1, i32 0
	%3 = uitofp i64 %2 to float			%3 = uitofp i64 %2 to float
	ret float %3			ret float %3
	}			}

	define <4 x float> @knownbits_insert_uitofp(<4 x i32> %a0, i16 %a1, i16 %a2) nounwind {			define <4 x float> @knownbits_insert_uitofp(<4 x i32> %a0, i16 %a1, i16 %a2) nounwind {
	[... 480 lines not shown ...]

test/CodeGen/X86/vec_int_to_fp.ll

	[... 1,659 lines not shown ...]
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0			; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0
	; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0			; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; VEX-NEXT: .LBB39_6:			; VEX-NEXT: .LBB39_6:
	; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]			; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
	; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1			; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; VEX-NEXT: testq %rax, %rax			; VEX-NEXT: testq %rax, %rax
	; VEX-NEXT: js .LBB39_8			; VEX-NEXT: js .LBB39_8
	; VEX-NEXT: # BB#7:			; VEX-NEXT: # BB#7:
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1			; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1
				myatsina: Why is using xmm1 OK? If we fall through from the previous BB, where xmm1 is xored, then it's OK, but we might jump to this block from another place, and if we use xmm1 without a xor, then we will have a stall, no?
				loladiro (Author): As far as I can tell, nothing else jumps in between the vxorps and this undefined use, so it's a good register to use.
	; VEX-NEXT: .LBB39_8:			; VEX-NEXT: .LBB39_8:
	; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]			; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]
	; VEX-NEXT: retq			; VEX-NEXT: retq
	;			;
	; AVX512F-LABEL: uitofp_2i64_to_4f32:			; AVX512F-LABEL: uitofp_2i64_to_4f32:
	; AVX512F: # BB#0:			; AVX512F: # BB#0:
	; AVX512F-NEXT: vpextrq $1, %xmm0, %rax			; AVX512F-NEXT: vpextrq $1, %xmm0, %rax
	; AVX512F-NEXT: vcvtusi2ssq %rax, %xmm1, %xmm1			; AVX512F-NEXT: vcvtusi2ssq %rax, %xmm1, %xmm1
	[... 142 lines not shown ...]
	define <4 x float> @uitofp_4i64_to_4f32_undef(<2 x i64> %a) {			define <4 x float> @uitofp_4i64_to_4f32_undef(<2 x i64> %a) {
	; SSE-LABEL: uitofp_4i64_to_4f32_undef:			; SSE-LABEL: uitofp_4i64_to_4f32_undef:
	; SSE: # BB#0:			; SSE: # BB#0:
	; SSE-NEXT: movdqa %xmm0, %xmm1			; SSE-NEXT: movdqa %xmm0, %xmm1
	; SSE-NEXT: testq %rax, %rax			; SSE-NEXT: testq %rax, %rax
	; SSE-NEXT: xorps %xmm2, %xmm2			; SSE-NEXT: xorps %xmm2, %xmm2
	; SSE-NEXT: js .LBB41_2			; SSE-NEXT: js .LBB41_2
	; SSE-NEXT: # BB#1:			; SSE-NEXT: # BB#1:
	; SSE-NEXT: xorps %xmm2, %xmm2
	; SSE-NEXT: cvtsi2ssq %rax, %xmm2			; SSE-NEXT: cvtsi2ssq %rax, %xmm2
	; SSE-NEXT: .LBB41_2:			; SSE-NEXT: .LBB41_2:
	; SSE-NEXT: movd %xmm1, %rax			; SSE-NEXT: movd %xmm1, %rax
	; SSE-NEXT: testq %rax, %rax			; SSE-NEXT: testq %rax, %rax
	; SSE-NEXT: js .LBB41_3			; SSE-NEXT: js .LBB41_3
	; SSE-NEXT: # BB#4:			; SSE-NEXT: # BB#4:
	; SSE-NEXT: xorps %xmm0, %xmm0			; SSE-NEXT: xorps %xmm0, %xmm0
	; SSE-NEXT: cvtsi2ssq %rax, %xmm0			; SSE-NEXT: cvtsi2ssq %rax, %xmm0
	[... 59 lines not shown ...]
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0			; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm0
	; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0			; VEX-NEXT: vaddss %xmm0, %xmm0, %xmm0
	; VEX-NEXT: .LBB41_6:			; VEX-NEXT: .LBB41_6:
	; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]			; VEX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]
	; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1			; VEX-NEXT: vxorps %xmm1, %xmm1, %xmm1
	; VEX-NEXT: testq %rax, %rax			; VEX-NEXT: testq %rax, %rax
	; VEX-NEXT: js .LBB41_8			; VEX-NEXT: js .LBB41_8
	; VEX-NEXT: # BB#7:			; VEX-NEXT: # BB#7:
	; VEX-NEXT: vcvtsi2ssq %rax, %xmm2, %xmm1			; VEX-NEXT: vcvtsi2ssq %rax, %xmm1, %xmm1
	; VEX-NEXT: .LBB41_8:			; VEX-NEXT: .LBB41_8:
	; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]			; VEX-NEXT: vshufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]
	; VEX-NEXT: retq			; VEX-NEXT: retq
	;			;
	; AVX512F-LABEL: uitofp_4i64_to_4f32_undef:			; AVX512F-LABEL: uitofp_4i64_to_4f32_undef:
	; AVX512F: # BB#0:			; AVX512F: # BB#0:
	; AVX512F-NEXT: vpextrq $1, %xmm0, %rax			; AVX512F-NEXT: vpextrq $1, %xmm0, %rax
	; AVX512F-NEXT: vcvtusi2ssq %rax, %xmm1, %xmm1			; AVX512F-NEXT: vcvtusi2ssq %rax, %xmm1, %xmm1
	[... 2,909 lines not shown ...]