This is an archive of the discontinued LLVM Phabricator instance.

[X86][FastISel] Avoid introducing legacy SSE instructions if the subtarget has AVX.
Closed, Public

Authored by andreadb on Feb 5 2015, 8:06 AM.

Details

Reviewers

mkuper
craig.topper
ributzka

Commits

rG62622d23969b: [X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX.
rL228682: [X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX.

Summary

This patch teaches X86FastISel how to select scalar float/double convert operations using AVX instructions.

Before this patch, X86FastISel always selected legacy SSE instructions for FPExt (from float to double) and FPTrunc (from double to float).

For example:
\code

define double @foo(float %f) {
  %conv = fpext float %f to double
  ret double %conv
}

\end code

Before (with -mattr=+avx -fast-isel), X86FastISel selected CVTSS2SDrr, which is a legacy SSE instruction:

cvtss2sd %xmm0, %xmm0

With this patch (with -mattr=+avx -fast-isel), X86FastISel selects VCVTSS2SDrr instead:

vcvtss2sd %xmm0, %xmm0, %xmm0

Added test fast-isel-fptrunc-fpext.ll to check both the register-register and the register-memory float/double conversion variants.
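As a rough illustration of the selection rule described in this summary, the standalone C++ sketch below models the opcode choice. The `Opcode` and `Conversion` enums and the `selectConvertOpcode` function are hypothetical stand-ins introduced only for this example. They are not LLVM APIs; the actual patch operates on the `X86::` opcode enumerators inside `X86FastISel`.

```cpp
// Hypothetical stand-ins for the X86 opcode enumerators named in this review.
enum class Opcode { CVTSS2SDrr, VCVTSS2SDrr, CVTSD2SSrr, VCVTSD2SSrr };

// The two scalar conversions covered by the patch.
enum class Conversion { FPExt /* float -> double */, FPTrunc /* double -> float */ };

// Pick the VEX-encoded opcode when the subtarget has AVX,
// and fall back to the legacy SSE opcode otherwise.
constexpr Opcode selectConvertOpcode(Conversion conv, bool hasAVX) {
  if (conv == Conversion::FPExt)
    return hasAVX ? Opcode::VCVTSS2SDrr : Opcode::CVTSS2SDrr;
  return hasAVX ? Opcode::VCVTSD2SSrr : Opcode::CVTSD2SSrr;
}
```

Under this model, `selectConvertOpcode(Conversion::FPExt, true)` yields `Opcode::VCVTSS2SDrr`, which corresponds to the `vcvtss2sd` output shown in the summary.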

Please let me know if ok to submit.

Thanks!
Andrea

Event Timeline

andreadb updated this revision to Diff 19407. Feb 5 2015, 8:06 AM

andreadb retitled this revision to "[X86][FastISel] Avoid introducing legacy SSE instructions if the subtarget has AVX."

andreadb updated this object.

andreadb edited the test plan for this revision.

andreadb added reviewers: craig.topper, mkuper, ributzka.

andreadb added a subscriber: Unknown Object (MLST).

Looks good in general, with a couple of comments.

lib/Target/X86/X86FastISel.cpp
2019	I have the feeling just reusing OpReg here would be better. It's true that having an IMPLICIT_DEF, in theory, gives the regalloc more freedom, but I'm afraid it may just cause false-dependence trouble further on if it decides to choose a different register. So it may be better to just force OpReg. This is what the pattern for the rr version does, btw: def : Pat<(f64 (fextend FR32:$src)), (VCVTSS2SDrr FR32:$src, FR32:$src)>, Requires<[UseAVX]>; The pattern for the rm version has an IMPLICIT_DEF, but in that case, there is no choice.
2046	Perhaps this is now large enough to move the common code from the two functions into a helper?
test/CodeGen/X86/fast-isel-fptrunc-fpext.ll
25	If I remember correctly, this will also match "vcvtss2sd %xmm0, %xmm0, %xmm0" since there's no requirement for the match to start at the beginning of a line, and this is a partial match. Perhaps have an SSE-NOT for the v version? (Same applies to all testcases)
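The partial-match concern in the last comment can be sketched with a small standalone C++17 model. This is a deliberate simplification: it treats a plain check directive as a substring search, ignoring FileCheck's whitespace canonicalization and regex support. The helper names (`matchesAnywhere`, `passesStrictSSECheck`) and the sample output strings are assumptions made for this example, and the strict variant is only one possible way to reject the VEX form.

```cpp
#include <string_view>

// Simplified model of a plain check: it succeeds if the pattern occurs
// anywhere in the line, not only at the start of the line.
constexpr bool matchesAnywhere(std::string_view line, std::string_view pattern) {
  return line.find(pattern) != std::string_view::npos;
}

// Pattern from the SSE check line, plus illustrative llc output lines.
constexpr std::string_view kSSEPattern = "cvtss2sd %xmm0, %xmm0";
constexpr std::string_view kSSEOutput = "\tcvtss2sd %xmm0, %xmm0";
constexpr std::string_view kAVXOutput = "\tvcvtss2sd %xmm0, %xmm0, %xmm0";

// Stricter model: the legacy mnemonic must match and the VEX-prefixed
// mnemonic must be absent (the effect an additional NOT check would have).
constexpr bool passesStrictSSECheck(std::string_view line) {
  return matchesAnywhere(line, kSSEPattern) && !matchesAnywhere(line, "vcvtss2sd");
}
```

In this model `matchesAnywhere(kAVXOutput, kSSEPattern)` is true, which is the false positive the comment warns about, while `passesStrictSSECheck(kAVXOutput)` is false.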

Hi Michael,

thanks for the review. I will send a new version of the patch that addresses all your comments.

-Andrea

lib/Target/X86/X86FastISel.cpp
2019	Right, I will remove the IMPLICIT_DEF and just use OpReg for both operands in the AVX case. The reason why I ended up using IMPLICIT_DEF was to give a chance to FastISel to also match the 'register-memory' variants of scalar float/double convert.
2046	Good point. I'll try to move the common code into a helper function.
test/CodeGen/X86/fast-isel-fptrunc-fpext.ll
25	Sure, I will change the tests.

Hi Michael,

Here is a new version of the patch which hopefully addresses all your comments.
These are the main differences with respect to the previous patch:

I removed all the uses of IMPLICIT_DEF registers. If the target has AVX, the same register is passed to both input operands.
I changed the tests so that on SSE, we check that the instruction generated doesn't have a vex prefix.
I tried to factor out common code into a separate helper named 'X86SelectFPExtOrFPTrunc'.
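The operand handling described in the first point might look roughly like the standalone C++ sketch below. It is not the code of `X86SelectFPExtOrFPTrunc`: `ConvertInstr` and `buildConvert` are hypothetical names, and registers are modeled as plain integers rather than LLVM virtual registers.

```cpp
// Hypothetical model of the instruction a shared helper would emit.
struct ConvertInstr {
  bool usesVEX;        // true for the AVX (VEX-encoded) form
  unsigned numInputs;  // number of register inputs actually used
  unsigned inputs[2];  // input registers; unused slots are 0
};

// With AVX, the three-operand form takes two register inputs and the same
// register (opReg) is passed to both, so no IMPLICIT_DEF is needed.
// Without AVX, the legacy SSE form takes a single register input.
constexpr ConvertInstr buildConvert(bool hasAVX, unsigned opReg) {
  if (hasAVX)
    return {true, 2u, {opReg, opReg}};
  return {false, 1u, {opReg, 0u}};
}
```

Passing the same register twice mirrors the rr pattern quoted earlier in the review, `(VCVTSS2SDrr FR32:$src, FR32:$src)`.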

Please let me know if ok to submit.

Thanks again for your time,
Andrea

Thanks, Andrea!
LGTM.

This revision is now accepted and ready to land. Feb 9 2015, 10:36 PM

Closed by commit rL228682: [X86][FastIsel] Avoid introducing legacy SSE instructions if the target has AVX. (authored by adibiagio). Feb 10 2015, 4:06 AM

This revision was automatically updated to reflect the committed changes.

Thanks Michael!
committed revision 228682.

Revision Contents

Path                                           Size
lib/Target/X86/X86FastISel.cpp                 58 lines
test/CodeGen/X86/fast-isel-fptrunc-fpext.ll    63 lines

Diff 19407

lib/Target/X86/X86FastISel.cpp

	bool X86FastISel::X86SelectFPExt(const Instruction *I) {			bool X86FastISel::X86SelectFPExt(const Instruction *I) {
	// fpext from float to double.			// fpext from float to double.
	if (X86ScalarSSEf64 &&			if (X86ScalarSSEf64 &&
	I->getType()->isDoubleTy()) {			I->getType()->isDoubleTy()) {
	const Value *V = I->getOperand(0);			const Value *V = I->getOperand(0);
	if (V->getType()->isFloatTy()) {			if (V->getType()->isFloatTy()) {
	unsigned OpReg = getRegForValue(V);			unsigned OpReg = getRegForValue(V);
	if (OpReg == 0) return false;			if (OpReg == 0) return false;
	unsigned ResultReg = createResultReg(&X86::FR64RegClass);			// Avoid introducing a legacy SSE instruction if the target has AVX.
				bool HasAVX = Subtarget->hasAVX();
				unsigned Opc = HasAVX ? X86::VCVTSS2SDrr : X86::CVTSS2SDrr;

				unsigned ImplicitDefReg = 0;
				const TargetRegisterClass *RC = &X86::FR64RegClass;
				if (HasAVX) {
				ImplicitDefReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(X86::CVTSS2SDrr), ResultReg)			TII.get(TargetOpcode::IMPLICIT_DEF), ImplicitDefReg);
	.addReg(OpReg);			}

				MachineInstrBuilder MIB;
				unsigned ResultReg = createResultReg(RC);
				MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc),
				ResultReg);
				if (ImplicitDefReg)
				MIB.addReg(ImplicitDefReg);
				MIB.addReg(OpReg);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
	}			}

	return false;			return false;
	}			}

	bool X86FastISel::X86SelectFPTrunc(const Instruction *I) {			bool X86FastISel::X86SelectFPTrunc(const Instruction *I) {
	if (X86ScalarSSEf64) {			if (X86ScalarSSEf64 && I->getType()->isFloatTy()) {
	if (I->getType()->isFloatTy()) {
	const Value *V = I->getOperand(0);			const Value *V = I->getOperand(0);
	if (V->getType()->isDoubleTy()) {			if (V->getType()->isDoubleTy()) {
	unsigned OpReg = getRegForValue(V);			unsigned OpReg = getRegForValue(V);
	if (OpReg == 0) return false;			if (OpReg == 0) return false;
	unsigned ResultReg = createResultReg(&X86::FR32RegClass);			// Avoid introducing a legacy SSE instruction if the target has AVX.
				bool HasAVX = Subtarget->hasAVX();
				unsigned Opc = HasAVX ? X86::VCVTSD2SSrr : X86::CVTSD2SSrr;

				unsigned ImplicitDefReg = 0;
				const TargetRegisterClass *RC = &X86::FR32RegClass;
				if (HasAVX) {
				ImplicitDefReg = createResultReg(RC);
	BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,			BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
	TII.get(X86::CVTSD2SSrr), ResultReg)			TII.get(TargetOpcode::IMPLICIT_DEF), ImplicitDefReg);
	.addReg(OpReg);			}

				MachineInstrBuilder MIB;
				unsigned ResultReg = createResultReg(RC);
				MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc),
				ResultReg);
				if (ImplicitDefReg)
				MIB.addReg(ImplicitDefReg);
				MIB.addReg(OpReg);
	updateValueMap(I, ResultReg);			updateValueMap(I, ResultReg);
	return true;			return true;
	}			}
	}			}
	}

	return false;			return false;
	}			}

	bool X86FastISel::X86SelectTrunc(const Instruction *I) {			bool X86FastISel::X86SelectTrunc(const Instruction *I) {
	EVT SrcVT = TLI.getValueType(I->getOperand(0)->getType());			EVT SrcVT = TLI.getValueType(I->getOperand(0)->getType());
	EVT DstVT = TLI.getValueType(I->getType());			EVT DstVT = TLI.getValueType(I->getType());

test/CodeGen/X86/fast-isel-fptrunc-fpext.ll

				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+sse2 -fast-isel | FileCheck %s --check-prefix=ALL --check-prefix=SSE
				; RUN: llc < %s -mtriple=x86_64-unknown-unknown -mattr=+avx -fast-isel | FileCheck %s --check-prefix=ALL --check-prefix=AVX
				;
				; Verify that fast-isel doesn't select legacy SSE instructions on targets that
				; feature AVX.
				;
				; Test cases are obtained from the following code snippet:
				; ///
				; double single_to_double_rr(float x) {
				; return (double)x;
				; }
				; float double_to_single_rr(double x) {
				; return (float)x;
				; }
				; double single_to_double_rm(float *x) {
				; return (double)*x;
				; }
				; float double_to_single_rm(double *x) {
				; return (float)*x;
				; }
				; ///

				define double @single_to_double_rr(float %x) {
				; ALL-LABEL: single_to_double_rr:
				; SSE: cvtss2sd %xmm0, %xmm0
				; AVX: vcvtss2sd %xmm0, %xmm0, %xmm0
				; ALL-NEXT: ret
				entry:
				%conv = fpext float %x to double
				ret double %conv
				}

				define float @double_to_single_rr(double %x) {
				; ALL-LABEL: double_to_single_rr:
				; SSE: cvtsd2ss %xmm0, %xmm0
				; AVX: vcvtsd2ss %xmm0, %xmm0, %xmm0
				; ALL-NEXT: ret
				entry:
				%conv = fptrunc double %x to float
				ret float %conv
				}

				define double @single_to_double_rm(float* %x) {
				; ALL-LABEL: single_to_double_rm:
				; SSE: cvtss2sd (%rdi), %xmm0
				; AVX: vcvtss2sd (%rdi), %xmm0, %xmm0
				; ALL-NEXT: ret
				entry:
				%0 = load float* %x, align 4
				%conv = fpext float %0 to double
				ret double %conv
				}

				define float @double_to_single_rm(double* %x) {
				; ALL-LABEL: double_to_single_rm:
				; SSE: cvtsd2ss (%rdi), %xmm0
				; AVX: vcvtsd2ss (%rdi), %xmm0, %xmm0
				; ALL-NEXT: ret
				entry:
				%0 = load double* %x, align 8
				%conv = fptrunc double %0 to float
				ret float %conv
				}