This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add a flag to guard the wide load
Closed · Public

Authored by Carrot on Jun 1 2020, 11:51 AM.


Details

Reviewers

efriedma
craig.topper

Commits

rG587af86f1d8a: [X86] Add a flag to guard the wide load

Summary

As shown in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141854.html, a widened load can also cause a stall. Add a flag to guard the widening code so that users can disable it and evaluate its performance impact.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

Carrot created this revision. Jun 1 2020, 11:51 AM

Herald added a project: Restricted Project. Jun 1 2020, 11:51 AM

Herald added subscribers: llvm-commits, hiraditya.

craig.topper added a subscriber: craig.topper. Jun 1 2020, 12:08 PM

craig.topper added inline comments.

llvm/lib/Target/X86/X86InstrInfo.td
1135	Why not do this in loadi16 as well?
llvm/lib/Target/X86/X86Subtarget.cpp
48 ↗	(On Diff #267681)	Can we put this in X86ISelDAGToDAG.cpp and access it directly? The .td should generate an include that gets included there.
48 ↗	(On Diff #267681)	Typo: enalbe->enable. x86-promote-anyext-load might be a better name? "wide" might get mixed up with vectors since we have WidenVector as a type legalization action.

Harbormaster failed remote builds in B58651: Diff 267681! Jun 1 2020, 12:26 PM

See https://reviews.llvm.org/rL51019 for the original commit.

The benefits of matching loadi32 like this are:

mov is one byte shorter than movzx
We can fold the load into a subsequent 32-bit instruction in some cases.

And the downside, of course, is that if we're unlucky and create a hazard, there's a performance hit.

In D80943#2067284, @efriedma wrote:

See https://reviews.llvm.org/rL51019 for the original commit.

The benefits of matching loadi32 like this are:

mov is one byte shorter than movzx

We can fold the load into a subsequent 32-bit instruction in some cases.

And the downside, of course, is that if we're unlucky and create a hazard, there's a performance hit.

Thanks for the background information. The new flag is on by default, so it doesn't change anything if it is not explicitly disabled.

And if the wide load hits a narrow store, the penalty is much larger than that of a separate ALU instruction, because the load must wait until every instruction up to and including the narrow store has retired.
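The access pattern described above, a narrow store followed by a wider load that covers the same bytes, can be sketched in plain C++. This is only an illustration of the memory accesses, not a benchmark: the `Record` type and the function name are hypothetical, the layout asserts assume a typical ABI, and an optimizing compiler may not emit these exact instructions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical record mirroring a { i32, i16, i16 } layout.
struct Record {
  std::uint32_t a;
  std::uint16_t f2;
  std::uint16_t f3;
};

// Layout assumptions for this sketch (true on common ABIs).
static_assert(offsetof(Record, f2) == 4, "f2 expected at byte offset 4");
static_assert(sizeof(Record) == 8, "no tail padding expected");

// Performs a 2-byte store to f2, then a 4-byte load starting at the same
// address. The load spans f2 and f3, so it is wider than the store that
// precedes it, which is the situation the comment above warns about.
std::uint32_t store_narrow_then_load_wide(Record *p, std::uint16_t v) {
  assert(p != nullptr);
  p->f2 = v;  // narrow store: 2 bytes at offset 4

  std::uint32_t wide;
  const unsigned char *bytes = reinterpret_cast<const unsigned char *>(p);
  std::memcpy(&wide, bytes + offsetof(Record, f2), sizeof(wide));  // wide load
  return wide;
}
```

The returned value contains both the freshly stored `f2` and the untouched `f3`; which half lands in the low bits depends on the target's byte order.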

llvm/lib/Target/X86/X86InstrInfo.td
1135	LLVM IR doesn't have anyext, so I couldn't write a test case to trigger it. :(

Carrot updated this revision to Diff 267755. Jun 1 2020, 3:47 PM

Carrot marked an inline comment as done.

Carrot added a reviewer: craig.topper. Jun 2 2020, 11:21 AM

LGTM

llvm/lib/Target/X86/X86InstrInfo.td
1135	yeah it is probably hard to trigger with so much promotion of i16 ops.

This revision is now accepted and ready to land. Jun 2 2020, 1:29 PM

Closed by commit rG587af86f1d8a: [X86] Add a flag to guard the wide load (authored by Carrot). Jun 2 2020, 4:29 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp	4 lines
llvm/lib/Target/X86/X86InstrInfo.td	4 lines
llvm/test/CodeGen/X86/no-wide-load.ll	22 lines

Diff 268020

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

 #define DEBUG_TYPE "x86-isel"

 STATISTIC(NumLoadMoved, "Number of loads moved below TokenFactor");

 static cl::opt<bool> AndImmShrink("x86-and-imm-shrink", cl::init(true),
     cl::desc("Enable setting constant bits to reduce size of mask immediates"),
     cl::Hidden);

+static cl::opt<bool> EnablePromoteAnyextLoad(
+    "x86-promote-anyext-load", cl::init(true),
+    cl::desc("Enable promoting aligned anyext load to wider load"), cl::Hidden);

 //===----------------------------------------------------------------------===//
 // Pattern Matcher Implementation
 //===----------------------------------------------------------------------===//

 namespace {
 /// This corresponds to X86AddressMode, but uses SDValue's instead of register
 /// numbers for the leaves of the matched tree.
 struct X86ISelAddressMode {

llvm/lib/Target/X86/X86InstrInfo.td


 // It's always safe to treat a anyext i16 load as a i32 load if the i16 is
 // known to be 32-bit aligned or better. Ditto for i8 to i16.
 def loadi16 : PatFrag<(ops node:$ptr), (i16 (unindexedload node:$ptr)), [{
   LoadSDNode *LD = cast<LoadSDNode>(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
   if (ExtType == ISD::NON_EXTLOAD)
     return true;
-  if (ExtType == ISD::EXTLOAD)
+  if (ExtType == ISD::EXTLOAD && EnablePromoteAnyextLoad)
     return LD->getAlignment() >= 2 && LD->isSimple();
   return false;
 }]>;

 def loadi32 : PatFrag<(ops node:$ptr), (i32 (unindexedload node:$ptr)), [{
   LoadSDNode *LD = cast<LoadSDNode>(N);
   ISD::LoadExtType ExtType = LD->getExtensionType();
   if (ExtType == ISD::NON_EXTLOAD)
     return true;
-  if (ExtType == ISD::EXTLOAD)
+  if (ExtType == ISD::EXTLOAD && EnablePromoteAnyextLoad)
	craig.topper: Why not do this in loadi16 as well?
	Carrot: LLVM IR doesn't have anyext, I failed to write a test case to trigger it. :(
	craig.topper: yeah it is probably hard to trigger with so much promotion of i16 ops.
     return LD->getAlignment() >= 4 && LD->isSimple();
   return false;
 }]>;

 def loadi64 : PatFrag<(ops node:$ptr), (i64 (load node:$ptr))>;
 def loadf32 : PatFrag<(ops node:$ptr), (f32 (load node:$ptr))>;
 def loadf64 : PatFrag<(ops node:$ptr), (f64 (load node:$ptr))>;
 def loadf80 : PatFrag<(ops node:$ptr), (f80 (load node:$ptr))>;
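The decision made by the patched loadi32 predicate can be modeled outside LLVM as a small standalone function. This is a sketch only: `LoadExtType`, `LoadInfo`, and `matchesLoadi32` are hypothetical stand-ins for `ISD::LoadExtType`, `LoadSDNode`, and the PatFrag body, and the flag is passed as a plain parameter rather than read from a `cl::opt`.

```cpp
#include <cassert>

// Hypothetical stand-in for ISD::LoadExtType. AnyExt plays the role of
// ISD::EXTLOAD, NonExt the role of ISD::NON_EXTLOAD.
enum class LoadExtType { NonExt, AnyExt, SignExt, ZeroExt };

// Hypothetical summary of the LoadSDNode properties the predicate inspects.
struct LoadInfo {
  LoadExtType ext;
  unsigned alignment;  // in bytes, must be nonzero
  bool simple;         // neither volatile nor atomic
};

// Mirrors the control flow of the loadi32 predicate in the diff above:
// plain loads always match, and anyext loads match only when the flag is
// on, the load is at least 4-byte aligned, and it is simple.
bool matchesLoadi32(const LoadInfo &ld, bool enablePromoteAnyextLoad) {
  assert(ld.alignment != 0);
  if (ld.ext == LoadExtType::NonExt)
    return true;
  if (ld.ext == LoadExtType::AnyExt && enablePromoteAnyextLoad)
    return ld.alignment >= 4 && ld.simple;
  return false;
}
```

With the flag set to false, an aligned anyext load no longer matches, so instruction selection has to fall back to a pattern for the narrow load; with the default of true, behavior is the same as before the patch.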

llvm/test/CodeGen/X86/no-wide-load.ll

This file was added.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-- -x86-promote-anyext-load=false | FileCheck %s

%struct.S = type { i32, i16, i16 }

define void @foo(%struct.S* %p, i16 signext %s) {
; CHECK-LABEL: foo:
; CHECK: # %bb.0: # %entry
; CHECK-NEXT: movzwl 4(%rdi), %eax
; CHECK-NEXT: andl $-1121, %eax # imm = 0xFB9F
; CHECK-NEXT: orl $1024, %eax # imm = 0x400
; CHECK-NEXT: movw %ax, 4(%rdi)
; CHECK-NEXT: retq
entry:
  %f2 = getelementptr inbounds %struct.S, %struct.S* %p, i64 0, i32 1
  %0 = load i16, i16* %f2, align 4
  %1 = and i16 %0, -1121
  %2 = or i16 %1, 1024
  store i16 %2, i16* %f2, align 4
  ret void
}
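For readers less familiar with LLVM IR, the read-modify-write in this test corresponds roughly to the C++ below. This is a hand-written approximation, not the source the test was reduced from: the struct and field names are assumptions, and the unused `%s` parameter is omitted. The mask 0xFB9F is the 16-bit representation of -1121.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ counterpart of %struct.S = type { i32, i16, i16 }.
struct S {
  std::int32_t f1;
  std::uint16_t f2;
  std::uint16_t f3;
};

// Clears the bits outside mask 0xFB9F in f2, then sets bit 10 (0x400).
// Only the 16-bit field f2 is read and written; f3 is left untouched.
void foo(S *p) {
  assert(p != nullptr);
  p->f2 = static_cast<std::uint16_t>((p->f2 & 0xFB9F) | 0x0400);
}
```

With the flag disabled, the CHECK lines expect the field to be read with a 16-bit zero-extending load (movzwl) rather than a 32-bit load that would also cover f3.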