This is an archive of the discontinued LLVM Phabricator instance.


Differential D70874

[X86] Add initialization of MXCSR in llvm-exegesis
Closed, Public

Authored by pengfei on Nov 30 2019, 9:50 PM.


Details

Reviewers

craig.topper
RKSimon
courbet
gchatelet

Commits

rG44b9942898c7: [X86] Add initialization of MXCSR in llvm-exegesis

Summary

This patch makes llvm-exegesis initialize the newly added MXCSR register.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

pengfei created this revision. Nov 30 2019, 9:50 PM

Herald added a project: Restricted Project. Nov 30 2019, 9:50 PM

Herald added subscribers: llvm-commits, courbet, tschuett.

Harbormaster completed remote builds in B41668: Diff 231610. Nov 30 2019, 9:56 PM

pengfei added a child revision: D70875: [X86] Model MXCSR for AVX instructions other than AVX512. Nov 30 2019, 9:59 PM

I'm not sure I understand why this needs to be initialized. Why don't we need to do it for FPCW?

llvm/tools/llvm-exegesis/lib/X86/Target.cpp
Line 506: Isn't the default value of MXCSR 0x1f80, not 0x1f8?

craig.topper added reviewers: courbet, gchatelet. Nov 30 2019, 10:15 PM

Fix typo.

llvm/tools/llvm-exegesis/lib/X86/Target.cpp
Line 506: Yes! I missed a 0. Thanks!
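To make the difference between the two constants concrete, here is a minimal sketch that decodes them against the MXCSR bit layout (sticky exception flags in bits 0-5, exception masks in bits 7-12). The constant and function names are illustrative only and do not come from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative names; the bit positions follow the documented MXCSR layout.
constexpr uint32_t ExceptionFlagBits = 0x003f; // bits 0-5: sticky exception flags
constexpr uint32_t ExceptionMaskBits = 0x1f80; // bits 7-12: IM, DM, ZM, OM, UM, PM

// True when all six floating-point exceptions are masked.
constexpr bool allExceptionsMasked(uint32_t Value) {
  return (Value & ExceptionMaskBits) == ExceptionMaskBits;
}

// True when no sticky exception flag is already set in the value.
constexpr bool noStickyFlagsSet(uint32_t Value) {
  return (Value & ExceptionFlagBits) == 0;
}

// 0x1f80 masks every exception and starts with clean flags.
static_assert(allExceptionsMasked(0x1f80), "0x1f80 masks all FP exceptions");
static_assert(noStickyFlagsSet(0x1f80), "0x1f80 has no flags set");

// 0x1f8 is shifted right by one hex digit: it only reaches the IM and DM
// mask bits and also lands on three of the sticky flag bits.
static_assert(!allExceptionsMasked(0x1f8), "0x1f8 leaves exceptions unmasked");
static_assert(!noStickyFlagsSet(0x1f8), "0x1f8 sets sticky flags");
```

With the typo, loading the value would have left four exceptions unmasked, so a snippet raising one of them could fault instead of being measured.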

Harbormaster completed remote builds in B41670: Diff 231613. Dec 1 2019, 1:59 AM

I'm not sure I understand why this needs to be initialized. Why don't we need to do it for FPCW?

I'm sorry for not leaving comments. Yes, we would need to do the same for FPCW if this is a reasonable approach. But I'm also not sure whether we need to explicitly initialize registers like MXCSR and FPCW.
Could we just return an MCInstBuilder(X86::NOOP) for them to make the check pass? (At Assembler.cpp L44, IsSnippetSetupComplete is determined by checking whether the returned value is empty.)
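The check described in this comment can be sketched roughly as follows. This is a simplified stand-in, not the actual Assembler.cpp code: `Inst`, `setRegTo`, and `isSnippetSetupComplete` are hypothetical names, and instructions are modeled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an instruction; not the llvm-exegesis type.
using Inst = std::string;

// Returns the instructions that initialize Reg, or an empty vector when the
// target does not know how to initialize that register.
std::vector<Inst> setRegTo(const std::string &Reg) {
  if (Reg == "EFLAGS")
    return {"PUSH imm", "POPF64"};
  if (Reg == "MXCSR")
    return {"NOOP"}; // The placeholder idea floated in the comment above.
  return {};         // Not yet implemented.
}

// The setup counts as complete only if every register yields a non-empty
// initialization sequence.
bool isSnippetSetupComplete(const std::vector<std::string> &Regs) {
  bool Complete = true;
  for (const std::string &Reg : Regs)
    if (setRegTo(Reg).empty())
      Complete = false;
  return Complete;
}
```

Under this model a single no-op is enough to satisfy the check, which is why the placeholder would make the test pass without actually controlling the register's value.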

In D70874#1764446, @craig.topper wrote:

I'm not sure I understand why this needs to be initialized. Why don't we need to do it for FPCW?

We make sure that every register used by an instruction in the snippet is initialized. This avoids fluctuations in measurements caused by performance depending on the values in registers. I think it's great if SSE/AVX instructions start explicitly stating their dependencies on MXCSR, because the behaviour does indeed depend on the value of these flags.

Regarding whether masking instruction is the right thing to do: Yes for now, because this will ensure that these instructions will at least pass, but on the other hand that might not be perfectly representative of the real performance characteristics. AFAICT @gchatelet still wants to work on value constraints & exploration.

So this looks good to me if Craig has no other remarks.

This revision is now accepted and ready to land. Dec 2 2019, 12:07 AM

LGTM

Closed by commit rG44b9942898c7: [X86] Add initialization of MXCSR in llvm-exegesis (authored by Wang, Pengfei <pengfei.wang@intel.com>). Dec 2 2019, 2:32 AM

This revision was automatically updated to reflect the committed changes.

We make sure that every register used by an instruction in the snippet is initialized. This avoids fluctuations in measurements caused by performance depending on the values in registers. I think it's great if SSE/AVX instructions start explicitly stating their dependencies on MXCSR, because the behaviour does indeed depend on the value of these flags.

Thanks for the explanation. Please note that currently we only model the rounding modes and the IEEE mask bits of MXCSR. This means instructions like VRSQRT14PS, which depend only on FTZ and DAZ, wouldn't be modeled.
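For reference, FTZ and DAZ are separate MXCSR bits (bit 15 and bit 6) from the rounding control and exception masks, so the 0x1f80 value loaded by this patch leaves both flush modes off. A small sketch, with illustrative names that are not part of the patch:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative names; the bit positions follow the documented MXCSR layout.
constexpr uint32_t DenormalsAreZeroBit = 0x0040; // bit 6 (DAZ)
constexpr uint32_t RoundingControlBits = 0x6000; // bits 13-14
constexpr uint32_t FlushToZeroBit = 0x8000;      // bit 15 (FTZ)

// True when either flush mode is enabled in the given MXCSR value.
constexpr bool flushesDenormals(uint32_t Value) {
  return (Value & (FlushToZeroBit | DenormalsAreZeroBit)) != 0;
}

// Returns the value with both FTZ and DAZ switched on.
constexpr uint32_t withFlushModes(uint32_t Value) {
  return Value | FlushToZeroBit | DenormalsAreZeroBit;
}

// The value used by the patch: round-to-nearest, no flushing of denormals.
static_assert((0x1f80 & RoundingControlBits) == 0, "round to nearest");
static_assert(!flushesDenormals(0x1f80), "FTZ and DAZ are clear in 0x1f80");
static_assert(withFlushModes(0x1f80) == 0x9fc0, "FTZ and DAZ added");
```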

pengfei mentioned this in D70891: [X86] Add initialization of FPCW in llvm-exegesis. Dec 2 2019, 3:42 AM

pengfei mentioned this in rG76b70f6f75e9: [X86] Add initialization of FPCW in llvm-exegesis. Dec 2 2019, 4:21 AM

Please note that currently we only model the rounding modes and the IEEE mask bits of MXCSR. This means instructions like VRSQRT14PS, which depend only on FTZ and DAZ, wouldn't be modeled.

FTZ and DAZ are modeled now by commit rGc8995de06994. Thanks.

In D70874#1764713, @courbet wrote:

Regarding whether masking instruction is the right thing to do: Yes for now, because this will ensure that these instructions will at least pass, but on the other hand that might not be perfectly representative of the real performance characteristics. AFAICT @gchatelet still wants to work on value constraints & exploration.

Yes definitely. LGTM as well.

Revision Contents

Path                                                    Size
llvm/test/tools/llvm-exegesis/X86/uops-VFMADDSS4rm.s    3 lines
llvm/tools/llvm-exegesis/lib/X86/Target.cpp             17 lines

Diff 231653

llvm/test/tools/llvm-exegesis/X86/uops-VFMADDSS4rm.s

 # RUN: llvm-exegesis -mode=uops -opcode-name=VFMADDSS4rm -repetition-mode=duplicate | FileCheck %s
 # RUN: llvm-exegesis -mode=uops -opcode-name=VFMADDSS4rm -repetition-mode=loop | FileCheck %s

 CHECK: mode: uops
 CHECK-NEXT: key:
 CHECK-NEXT: instructions:
 CHECK-NEXT: VFMADDSS4rm
+CHECK: register_initial_values:
+# FIXME: This will be changed to CHECK by the following patch that modeling MXCSR to VFMADDSS.
+CHECK-NOT: MXCSR

llvm/tools/llvm-exegesis/lib/X86/Target.cpp

[... 433 lines not shown ...]
   std::vector<MCInst> loadAndFinalize(unsigned Reg, unsigned RegBitWidth,
                                       unsigned Opcode);

   std::vector<MCInst> loadX87STAndFinalize(unsigned Reg);

   std::vector<MCInst> loadX87FPAndFinalize(unsigned Reg);

   std::vector<MCInst> popFlagAndFinalize();

+  std::vector<MCInst> loadMXCSRAndFinalize(bool HasAVX);
+
 private:
   ConstantInliner &add(const MCInst &Inst) {
     Instructions.push_back(Inst);
     return *this;
   }

   void initStack(unsigned Bytes);

[... 44 lines not shown ...]
 }

 std::vector<MCInst> ConstantInliner::popFlagAndFinalize() {
   initStack(8);
   add(MCInstBuilder(X86::POPF64));
   return std::move(Instructions);
 }

+std::vector<MCInst> ConstantInliner::loadMXCSRAndFinalize(bool HasAVX) {
+  add(allocateStackSpace(4));
+  add(fillStackSpace(X86::MOV32mi, 0, 0x1f80)); // Mask all FP exceptions
+  add(MCInstBuilder(HasAVX ? X86::VLDMXCSR : X86::LDMXCSR)
+          // Address = ESP
+          .addReg(X86::RSP) // BaseReg
+          .addImm(1)        // ScaleAmt
+          .addReg(0)        // IndexReg
+          .addImm(0)        // Disp
+          .addReg(0));      // Segment
+  return std::move(Instructions);
+}
+
 void ConstantInliner::initStack(unsigned Bytes) {
   assert(Constant_.getBitWidth() <= Bytes * 8 &&
          "Value does not have the correct size");
   const APInt WideConstant = Constant_.getBitWidth() < Bytes * 8
                                  ? Constant_.sext(Bytes * 8)
                                  : Constant_;
   add(allocateStackSpace(Bytes));
   size_t ByteOffset = 0;
[... 184 lines not shown ...]
   if (X86::RSTRegClass.contains(Reg)) {
     return CI.loadX87STAndFinalize(Reg);
   }
   if (X86::RFP32RegClass.contains(Reg) || X86::RFP64RegClass.contains(Reg) ||
       X86::RFP80RegClass.contains(Reg)) {
     return CI.loadX87FPAndFinalize(Reg);
   }
   if (Reg == X86::EFLAGS)
     return CI.popFlagAndFinalize();
+  if (Reg == X86::MXCSR)
+    return CI.loadMXCSRAndFinalize(STI.getFeatureBits()[X86::FeatureAVX]);
   return {}; // Not yet implemented.
 }

 static ExegesisTarget *getTheExegesisX86Target() {
   static ExegesisX86Target Target;
   return &Target;
 }

 void InitializeX86ExegesisTarget() {
   ExegesisTarget::registerTarget(getTheExegesisX86Target());
 }

 } // namespace exegesis
 } // namespace llvm