This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] AMDGPUAAResult::pointsToConstantMemory should not use the default MaxLookup (i.e., 6) to limit getUnderlyingObject
Needs Review · Public

Authored by jizhuoran on Aug 28 2020, 10:22 PM.

Details

Reviewers

alex-t
rampitec
vpykhtin
arsenm

Summary

The default value of MaxLookup is 6, which limits the number of instructions that are stripped off when getting the underlying object. The 'Loop Unroll' and 'Loop Strength Reduction' passes tend to replace the memory index 'base + i * offset' with 'base, base += offset, base += offset'. This increases the depth of the underlying object, so pointsToConstantMemory may fail to identify that a pointer's underlying object is a NoAlias and ReadOnly argument. That leads to a false memory-load scheduling dependency and prevents the instruction scheduler from pipelining the memory load operations.
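
To make the address-chain effect concrete, here is a source-level sketch in C++ of the two addressing styles. The function names are hypothetical and the code is hand-written for illustration; it is not output from either pass. In the indexed form every address is one step away from `base`, while in the chained form the k-th address is k pointer increments away from `base`.

```cpp
#include <cstddef>

// Indexed form: each address is computed directly from `base`,
// so every load is a single step away from the underlying object.
int sum4_indexed(const int *base, std::size_t offset) {
    return base[0] + base[offset] + base[2 * offset] + base[3 * offset];
}

// Chained form (illustrative of 'base, base += offset, ...'):
// each pointer is derived from the previous one, so p3 is three
// increments away from `base`. Longer unrolled bodies give longer chains.
int sum4_chained(const int *base, std::size_t offset) {
    const int *p0 = base;
    const int *p1 = p0 + offset;
    const int *p2 = p1 + offset;
    const int *p3 = p2 + offset;
    return *p0 + *p1 + *p2 + *p3;
}
```

Both functions read the same four elements; only the shape of the address computation differs, and that shape is what a depth-limited walk back to the base pointer has to traverse.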

The default MaxLookup is too small for this case. In general, even the first memory load has a GEP and a bitcast. The false memory dependency begins at the fifth memory load for a global memory pointer (an argument). The result is less efficient assembly code.

Specifying a larger MaxLookup value or even an unlimited MaxLookup (i.e., 0) solves this problem.
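
The effect of the limit can be sketched with a toy model in C++. This is not LLVM's getUnderlyingObject; `Node`, `underlying_object`, and `reaches_base` are hypothetical names, and the model assumes every step in the chain is a single strippable operation. It only illustrates how a bounded walk can stop on an intermediate value, and how a limit of 0 is treated as "no limit".

```cpp
// Toy stand-in for an IR value (an assumption for illustration):
// `derived_from` is null for a base object such as a kernel argument,
// and non-null for a value produced by one strippable operation.
struct Node {
    const Node *derived_from;
};

// Walk toward the base, stripping at most `max_lookup` operations.
// A limit of 0 means "no limit", matching the convention in the summary.
const Node *underlying_object(const Node *v, unsigned max_lookup) {
    for (unsigned count = 0; max_lookup == 0 || count < max_lookup; ++count) {
        if (v->derived_from == nullptr)
            return v;          // reached a base object
        v = v->derived_from;   // strip one operation
    }
    return v;                  // limit hit: may still be an intermediate value
}

// True only when the walk ends on a base object.
bool reaches_base(const Node *v, unsigned max_lookup) {
    return underlying_object(v, max_lookup)->derived_from == nullptr;
}
```

In this model a pointer six steps from its base is resolved with a limit of 6, while one seven steps away is not; a caller that then inspects the returned value sees an intermediate pointer rather than the argument and cannot use the argument's attributes.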

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jizhuoran created this revision. · Aug 28 2020, 10:22 PM

jizhuoran requested review of this revision. · Aug 28 2020, 10:22 PM

Harbormaster completed remote builds in B70012: Diff 288760. · Aug 28 2020, 11:03 PM

Unlimited is really too aggressive. It can slow down compilation dramatically in some cases.
Also it would be nice to see a relevant test.

I would agree with Stas here.
If you can identify the patterns that require a lookup deeper than 6 levels, you can probably formulate the exact threshold.
And adding tests for such a pattern would make it clear.
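
One way to approach the threshold question is to measure how deep the address chains in a given pattern actually are. The following C++ sketch uses a toy chain rather than LLVM IR; `Link`, `chain_depth`, and `required_lookup` are hypothetical names, and it assumes each link is one strippable operation.

```cpp
#include <cstddef>

// Toy chain element (an assumption for illustration): `prev` is null
// for the base object and points one step closer to it otherwise.
struct Link {
    const Link *prev;
};

// Number of strippable operations between `v` and its base object.
unsigned chain_depth(const Link *v) {
    unsigned depth = 0;
    while (v->prev != nullptr) {
        v = v->prev;
        ++depth;
    }
    return depth;
}

// Deepest chain among `n` pointers: a candidate lower bound for a limit
// that lets a bounded walk reach the base of every one of them.
unsigned required_lookup(const Link *const *ptrs, std::size_t n) {
    unsigned worst = 0;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned d = chain_depth(ptrs[i]);
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

A measurement like this over the loads in a reduced test case would give a concrete number to discuss instead of removing the bound. Note that a result of 0 here means "no stripping needed", which is not the same as the "unlimited" meaning of a MaxLookup of 0.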

Unbounding this completely is too aggressive, and the change needs a test case.

Revision Contents

Path: llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp

Size: 2 lines

Diff 288760

llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp

  //===- AMDGPUAliasAnalysis ------------------------------------------------===//
  //
  // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
  // See https://llvm.org/LICENSE.txt for license information.
  // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
  //
  //===----------------------------------------------------------------------===//
  /// \file
  [... 82 lines hidden ...]

  bool AMDGPUAAResult::pointsToConstantMemory(const MemoryLocation &Loc,
                                              AAQueryInfo &AAQI, bool OrLocal) {
    unsigned AS = Loc.Ptr->getType()->getPointerAddressSpace();
    if (AS == AMDGPUAS::CONSTANT_ADDRESS ||
        AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT)
      return true;

-   const Value *Base = getUnderlyingObject(Loc.Ptr);
+   const Value *Base = getUnderlyingObject(Loc.Ptr, 0);
    AS = Base->getType()->getPointerAddressSpace();
    if (AS == AMDGPUAS::CONSTANT_ADDRESS ||
        AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT)
      return true;

    if (const GlobalVariable *GV = dyn_cast<GlobalVariable>(Base)) {
      if (GV->isConstant())
        return true;
  [... 35 lines hidden ...]