This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer
Closed · Public

Authored by nhaehnle on Nov 22 2017, 4:24 AM.

Details

Reviewers

arsenm
rampitec

Commits

rGb4f28deda0b3: AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer
rL319156: AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer

Summary

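The entire algorithm operates per basic block, so for cache locality
it should be better to re-optimize a basic block immediately rather than
in a separate loop. As a side effect, CreatedX2 is now reset for each
block, so the second (x2 to x4) pass runs only on blocks that actually
created an x2 instruction, instead of on every block whenever any block did.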

I don't have performance measurements.

Change-Id: I85106570bd623c4ff277faaa50ee43258e1ddcc5
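The effect of the reorganized loop can be illustrated with a toy model. This is a minimal sketch under loud assumptions: Load, Block, optimizeBlock, runOnFunctionOld, runOnFunctionNew and the Sweeps counter are all hypothetical and are not the LLVM MachineBasicBlock/MachineInstr API, and a "load" is reduced to a (byte offset, dword width) pair from a shared base. Only CreatedX2 mirrors a name from the actual diff. One sweep of the toy optimizeBlock never revisits a load it has just merged, so a second sweep is what turns x2 into x4, which is the role of the "Run again" step in the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model, NOT the LLVM API: a load reads Width dwords (1, 2 or 4)
// starting at byte Offset from a base shared by the whole block.
struct Load {
  unsigned Offset;
  unsigned Width;
};
using Block = std::vector<Load>;

// Mirrors the role of the pass's CreatedX2 member: number of x2 loads
// created since the counter was last reset.
static unsigned CreatedX2 = 0;
// Hypothetical instrumentation: counts calls to optimizeBlock so the two
// loop structures can be compared.
static unsigned Sweeps = 0;

// One left-to-right sweep that merges a load with its successor when both
// have the same width (1 or 2 dwords) and their byte ranges are contiguous.
// A freshly merged load is not revisited within the same sweep.
bool optimizeBlock(Block &B) {
  ++Sweeps;
  bool Changed = false;
  Block Out;
  for (std::size_t I = 0; I < B.size(); ++I) {
    assert(B[I].Width == 1 || B[I].Width == 2 || B[I].Width == 4);
    bool CanMerge = I + 1 < B.size() && B[I].Width != 4 &&
                    B[I + 1].Width == B[I].Width &&
                    B[I + 1].Offset == B[I].Offset + 4 * B[I].Width;
    if (CanMerge) {
      Out.push_back({B[I].Offset, 2 * B[I].Width});
      if (B[I].Width == 1)
        ++CreatedX2;
      Changed = true;
      ++I; // the successor has been consumed
    } else {
      Out.push_back(B[I]);
    }
  }
  B = Out;
  return Changed;
}

// Before this revision: one sweep over every block, then a second sweep
// over every block if any block created an x2.
bool runOnFunctionOld(std::vector<Block> &F) {
  bool Modified = false;
  CreatedX2 = 0;
  for (Block &B : F)
    Modified |= optimizeBlock(B);
  if (CreatedX2 >= 1) {
    for (Block &B : F)
      Modified |= optimizeBlock(B);
  }
  return Modified;
}

// After this revision: each block gets its second sweep immediately, and
// only if that block itself created an x2.
bool runOnFunctionNew(std::vector<Block> &F) {
  bool Modified = false;
  for (Block &B : F) {
    CreatedX2 = 0;
    Modified |= optimizeBlock(B);
    if (CreatedX2 >= 1)
      Modified |= optimizeBlock(B);
  }
  return Modified;
}
```

For a function with two blocks, one holding four contiguous dword loads at byte offsets 0, 4, 8 and 12 and one holding a single load, both versions end with a single x4 load in the first block. runOnFunctionOld calls optimizeBlock four times (every block is swept twice because some block created an x2), while runOnFunctionNew calls it three times, since the second block created no x2 and is not swept again.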

Diff Detail

Build Status

Buildable 12396
Build 12396: arc lint + arc unit

Event Timeline

nhaehnle created this revision. Nov 22 2017, 4:24 AM

Herald added subscribers: t-tye, tpr, dstuttard and 3 others. Nov 22 2017, 4:24 AM

LGTM

This revision is now accepted and ready to land. Nov 22 2017, 11:17 AM

Closed by commit rL319156: AMDGPU: Re-organize the outer loop of SILoadStoreOptimizer (authored by nha). Nov 28 2017, 12:43 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                          Size
lib/Target/AMDGPU/SILoadStoreOptimizer.cpp    11 lines

Diff 123907

lib/Target/AMDGPU/SILoadStoreOptimizer.cpp

 //===- SILoadStoreOptimizer.cpp -------------------------------------------===//
 //
 // The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
 //
 //===----------------------------------------------------------------------===//
 //
 // This pass tries to fuse DS instructions with close by immediate offsets.
 // This will fuse operations such as
 // ds_read_b32 v0, v2 offset:16
 // ds_read_b32 v1, v2 offset:32
 // ==>
 // ds_read2_b32 v[0:1], v2, offset0:4 offset1:8
 //
-// The same is done for certain SMEM opcodes, e.g.:
+// The same is done for certain SMEM and VMEM opcodes, e.g.:
 // s_buffer_load_dword s4, s[0:3], 4
 // s_buffer_load_dword s5, s[0:3], 8
 // ==>
 // s_buffer_load_dwordx2 s[4:5], s[0:3], 4
 //
 //
 // Future improvements:
 //
[860 unchanged lines not shown; the hunk below is inside bool SILoadStoreOptimizer::runOnMachineFunction(MachineFunction &MF)]
   MRI = &MF.getRegInfo();
   AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
 
   assert(MRI->isSSA() && "Must be run on SSA");
 
   DEBUG(dbgs() << "Running SILoadStoreOptimizer\n");
 
   bool Modified = false;
-  CreatedX2 = 0;
 
-  for (MachineBasicBlock &MBB : MF)
+  for (MachineBasicBlock &MBB : MF) {
+    CreatedX2 = 0;
     Modified |= optimizeBlock(MBB);
 
-  // Run again to convert x2 to x4.
-  if (CreatedX2 >= 1) {
-    for (MachineBasicBlock &MBB : MF)
+    // Run again to convert x2 to x4.
+    if (CreatedX2 >= 1)
       Modified |= optimizeBlock(MBB);
   }
 
   return Modified;
 }
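The header comment's DS example turns byte offsets 16 and 32 into offset0:4 offset1:8. A minimal sketch of that arithmetic follows, assuming (as the example implies) that each ds_read2_b32 offset field is 8 bits wide and counts in 4-byte elements, with 8-byte elements for ds_read2_b64. encodeRead2Offsets is a hypothetical helper, not a function from the pass; it ignores the ds_read2st64 forms and any other checks the real pass performs.

```cpp
#include <cassert>

// Hypothetical helper (not from SILoadStoreOptimizer): checks whether two
// byte offsets from the same address register fit one ds_read2_b32 /
// ds_read2_b64, and computes the encoded offset0/offset1 fields.
// Assumption: each field is 8 bits wide and counts in units of the element
// size (4 bytes for b32, 8 bytes for b64). The st64 forms are ignored.
bool encodeRead2Offsets(unsigned ByteOff0, unsigned ByteOff1, unsigned EltSize,
                        unsigned &Offset0, unsigned &Offset1) {
  assert((EltSize == 4 || EltSize == 8) && "expected a b32 or b64 access");
  // Both byte offsets must be multiples of the element size.
  if (ByteOff0 % EltSize != 0 || ByteOff1 % EltSize != 0)
    return false;
  unsigned Scaled0 = ByteOff0 / EltSize;
  unsigned Scaled1 = ByteOff1 / EltSize;
  // Each scaled offset must fit in an 8-bit field.
  if (Scaled0 > 0xff || Scaled1 > 0xff)
    return false;
  Offset0 = Scaled0;
  Offset1 = Scaled1;
  return true;
}
```

Applied to the header comment's example, encodeRead2Offsets(16, 32, 4, O0, O1) returns true with O0 == 4 and O1 == 8, while a pair such as 0 and 1024 is rejected because 1024 / 4 = 256 does not fit in 8 bits.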