ds_read_b128 and ds_write_b128 were recently enabled, but only behind the amdgpu-ds128 option, because the performance benefit is still unclear. However, using 128-bit loads/stores for the local address space appears to introduce regressions in tessellation shaders. It is not clear yet what is broken, but since ds_read_b128/ds_write_b128 are not enabled by default, this change simply introduces a global option and enables 128-bit accesses only when explicitly requested (until the issue is fixed or the instructions are used correctly).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
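To make the behaviour concrete, here is a minimal standalone sketch of this kind of gate, assuming a subtarget-style flag that defaults to off; the names EnableDS128, useDS128 and maxLocalLoadStoreBits are placeholders for illustration, not necessarily what the patch uses.

```cpp
// Standalone sketch (not the actual patch) of the gate described above:
// a per-target flag that defaults to off, consulted by the load/store
// lowering for the local (LDS) address space.
class SubtargetSketch {
  bool EnableDS128 = false; // set only when the new global option is requested

public:
  void setEnableDS128(bool V) { EnableDS128 = V; }

  // True when ds_read_b128 / ds_write_b128 may be selected.
  bool useDS128() const { return EnableDS128; }
};

// Widest DS access the lowering is allowed to form for local memory:
// 128-bit only behind the option, otherwise the existing 64-bit forms.
unsigned maxLocalLoadStoreBits(const SubtargetSketch &ST) {
  return ST.useDS128() ? 128 : 64;
}
```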
Details
- Reviewers
arsenm
Diff Detail
Event Timeline
test/CodeGen/AMDGPU/load-local-f32.ll:6–8
Where? Do you want new tests where the option is disabled?
test/CodeGen/AMDGPU/load-local-f32.ll:6–8
You can either just add another RUN line and checks, or another small test where it gets used or not. The separate test is probably less effort.
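A rough sketch of what that extra RUN line / separate small test could look like; the -mattr spelling (+enable-ds128), the target used, and the exact check lines are assumptions for illustration, not necessarily what the patch ends up adding.

```llvm
; RUN: llc -march=amdgcn -mcpu=tonga -mattr=+enable-ds128 -verify-machineinstrs < %s | FileCheck -check-prefix=DS128 %s
; RUN: llc -march=amdgcn -mcpu=tonga -verify-machineinstrs < %s | FileCheck -check-prefix=NODS128 %s

; With 16-byte alignment, the 128-bit form is expected only when the
; feature is requested; the check lines below are illustrative.
; DS128: ds_read_b128
; NODS128-NOT: ds_read_b128

define amdgpu_kernel void @local_v4f32_load(<4 x float> addrspace(1)* %out, <4 x float> addrspace(3)* %in) {
  %ld = load <4 x float>, <4 x float> addrspace(3)* %in, align 16
  store <4 x float> %ld, <4 x float> addrspace(1)* %out, align 16
  ret void
}
```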
v2:
- use a ternary operator in the return statement
- fix feature format
- add small test load-local-f32-no-ds128.ll
lib/Target/AMDGPU/AMDGPUSubtarget.h:136
Shouldn't this be initialized in the constructor?
lib/Target/AMDGPU/AMDGPUSubtarget.h:136
You are right.
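For reference, the suggested fix amounts to the usual pattern below; EnableDS128 is an assumed member name, not necessarily the declaration at line 136.

```cpp
// Minimal sketch of the suggestion: give the new flag a well-defined value
// in the constructor's initializer list so it is never read uninitialized
// before the feature string has been parsed.
class SubtargetInitSketch {
  bool EnableDS128;

public:
  SubtargetInitSketch() : EnableDS128(false) {}

  bool useDS128() const { return EnableDS128; }
};
```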
Weird formatting/capitalization. "ds_{read|write}_b128" maybe?