This is an archive of the discontinued LLVM Phabricator instance.

Differential D41652

[InstCombine] Add an option to disable addrspacecast folding into GEP
Abandoned · Public

Authored by mareko on Dec 31 2017, 8:21 PM.

Details

Reviewers

rnk
aprantl
majnemer
chandlerc
spatel
craig.topper
RKSimon
hfinkel
sanjoy
davide
arsenm
nhaehnle

Summary

AMDGPU users use addrspacecast to zero-extend a 32-bit pointer to 64 bits.
No other instruction is ever used on 32-bit pointers.

AMDGPU doesn't have full support for 32-bit pointers everywhere, so make
sure addrspacecast isn't folded, so that we don't get GEP on 32-bit
pointers.
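To make the intended behaviour concrete, here is a minimal sketch of the difference between stripping all pointer casts and stripping everything except addrspacecast. It is a toy model, not LLVM code: `Kind`, `Node` and `strip` are hypothetical names invented for illustration, and the real change lives in `stripPointerCastsAndOffsets` in lib/IR/Value.cpp.

```cpp
#include <cassert>

// Toy stand-in for an IR value. None of these names exist in LLVM.
enum class Kind { Base, BitCast, AddrSpaceCast, ZeroGEP };

struct Node {
  Kind kind;
  const Node *operand; // nullptr for Kind::Base
};

// Walk up through casts and all-zero GEPs. When keepAddrSpaceCast is true,
// stop at the first addrspacecast instead of looking through it.
const Node *strip(const Node *v, bool keepAddrSpaceCast) {
  while (v->operand != nullptr) {
    if (keepAddrSpaceCast && v->kind == Kind::AddrSpaceCast)
      break;
    v = v->operand;
  }
  return v;
}
```

With a chain bitcast -> addrspacecast -> base, the default mode ends at the base pointer, while the keep mode ends at the addrspacecast, so a GEP built on the result stays in the destination address space.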

Diff Detail

Build Status

Buildable 13457
Build 13457: arc lint + arc unit

Event Timeline

mareko created this revision. Dec 31 2017, 8:21 PM

Herald added a subscriber: tpr. Dec 31 2017, 8:21 PM

FYI, I'd like to get this into LLVM 6.0 if it's OK.

Herald added a subscriber: wdng. Jan 1 2018, 12:28 PM

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?
Also, for my own curiosity, is this a temporary workaround until AMDGPU grows proper support for 32-bit GEPs or is this an inherent limitation?
If the former, maybe this hack should not go in (or at least we should consider what's the amount of work needed to finish implementing the support)
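The suggestion above, asking the target rather than reading a global option, could look roughly like the sketch below. This is an assumption-laden illustration, not the TargetTransformInfo API: `TargetHooks`, `NoFoldTarget`, `shouldFoldAddrSpaceCastIntoGEP` and `chooseStripMode` are hypothetical names, and no such hook is claimed to exist in LLVM.

```cpp
#include <cassert>

// Hypothetical per-target interface standing in for a target query.
// The hook name is made up for this example.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default: folding addrspacecast into GEP is allowed.
  virtual bool shouldFoldAddrSpaceCastIntoGEP() const { return true; }
};

// A target that wants addrspacecast left in place overrides the hook.
struct NoFoldTarget : TargetHooks {
  bool shouldFoldAddrSpaceCastIntoGEP() const override { return false; }
};

enum class StripMode { LookThroughAddrSpaceCast, KeepAddrSpaceCast };

// The transform asks the target instead of reading a process-wide flag.
StripMode chooseStripMode(const TargetHooks &target) {
  return target.shouldFoldAddrSpaceCastIntoGEP()
             ? StripMode::LookThroughAddrSpaceCast
             : StripMode::KeepAddrSpaceCast;
}
```

The point of this shape is that the decision travels with the target description, so other targets keep the default behaviour without anyone passing a command-line flag.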

This revision now requires changes to proceed. Jan 2 2018, 5:50 AM

In D41652#965765, @davide wrote:

I'm not sure having a cl::opt is the best option here.
If you really want to make such a change, can you at least thread this information through TargetTransformInfo rather than using a global option?

Yes.

Also, for my own curiosity, is this a temporary workaround until AMDGPU grows proper support for 32-bit GEPs or is this an inherent limitation?
If the former, maybe this hack should not go in (or at least we should consider what's the amount of work needed to finish implementing the support)

It's possible that this will be a permanent solution, because we don't plan to have full 32-bit support (it would be too much work in the backend for little benefit).

In D41652#965797, @mareko wrote:

It's possible that this will be a permanent solution, because we don't plan to have full 32-bit support (it would be too much work in the backend for little benefit).

This is quite unfortunate. I'd like to point out I don't feel particularly comfortable to have this as a long term solution (but, maybe, OK for 6.0 & reverted in trunk).
The contract between the mid-level optimizer and the backend is that the latter should possibly accept everything produced by the former (or, FWIW, error out in some circumstances).
Maybe you can have something in your backend logic that recovers from the fact that AMDGPU doesn't know (and won't be taught) about 32-bit GEPs?
Have you considered something during legalization? [Apologies if I'm off, but I'm not really familiar with the way AMDGPU works, at least not in depth].

In D41652#965806, @davide wrote:

This is quite unfortunate. I'd like to point out I don't feel particularly comfortable to have this as a long term solution (but, maybe, OK for 6.0 & reverted in trunk).
The contract between the mid-level optimizer and the backend is that the latter should possibly accept everything produced by the former (or, FWIW, error out in some circumstances).
Maybe you can have something in your backend logic that recovers from the fact that AMDGPU doesn't know (and won't be taught) about 32-bit GEPs?
Have you considered something during legalization? [Apologies if I'm off, but I'm not really familiar with the way AMDGPU works, at least not in depth].

GEPs will actually work, but the generated assembly for loads will be worse, thus I'd like to avoid them. A better solution for LLVM 7.0 might be possible.

In D41652#965836, @mareko wrote:

GEPs will actually work, but the generated assembly for loads will be worse, thus I'd like to avoid them. A better solution for LLVM 7.0 might be possible.

Cool. After you update the patch (and we reach consensus), I guess it's not unreachable to commit this to trunk, back port to 6, then revert and start working on a better solution for the 7.0 timeframe.
I do understand we all have deadlines at times :)

aprantl added inline comments. Jan 2 2018, 9:46 AM

include/llvm/IR/Value.h
Line 512: The `\brief` is redundant, we use autobrief now.

Abandoning. The new plan is that we'll do it the right way in AMDGPU now.

Revision Contents

Path                                                            Size
include/llvm/IR/Value.h                                         11 lines
lib/IR/Value.cpp                                                10 lines
lib/Transforms/InstCombine/InstructionCombining.cpp             10 lines
test/Transforms/InstCombine/cast-no-addrspacecast-folding.ll    16 lines

Diff 128380

include/llvm/IR/Value.h

[... 503 lines not shown ...] #include "llvm/IR/Value.def"
   /// Returns the original uncasted value. If this is called on a non-pointer
   /// value, it returns 'this'.
   const Value *stripPointerCasts() const;
   Value *stripPointerCasts() {
     return const_cast<Value *>(
         static_cast<const Value *>(this)->stripPointerCasts());
   }

+  /// \brief Strip off pointer casts (except addrspacecast), all-zero GEPs,
Inline comment from aprantl: The `\brief` is redundant, we use autobrief now.
+  /// and aliases.
+  ///
+  /// Returns the original uncasted value or addrspacecast (if present).
+  /// If this is called on a non-pointer value, it returns 'this'.
+  const Value *stripPointerCastsKeepAddrSpaceCast() const;
+  Value *stripPointerCastsKeepAddrSpaceCast() {
+    return const_cast<Value *>(
+        static_cast<const Value *>(this)->stripPointerCastsKeepAddrSpaceCast());
+  }
+
   /// \brief Strip off pointer casts, all-zero GEPs, aliases and barriers.
   ///
   /// Returns the original uncasted value. If this is called on a non-pointer
   /// value, it returns 'this'. This function should be used only in
   /// Alias analysis.
   const Value *stripPointerCastsAndBarriers() const;
   Value *stripPointerCastsAndBarriers() {
     return const_cast<Value *>(
[... 353 lines not shown ...]

lib/IR/Value.cpp

[... 484 lines not shown ...] for (auto *C : Constants)
     C->handleOperandChange(this, New);
 }

 namespace {
 // Various metrics for how much to strip off of pointers.
 enum PointerStripKind {
   PSK_ZeroIndices,
   PSK_ZeroIndicesAndAliases,
+  PSK_ZeroIndicesAndAliasesKeepAddrSpaceCast,
   PSK_ZeroIndicesAndAliasesAndBarriers,
   PSK_InBoundsConstantIndices,
   PSK_InBounds
 };

 template <PointerStripKind StripKind>
 static const Value *stripPointerCastsAndOffsets(const Value *V) {
   if (!V->getType()->isPointerTy())
     return V;

   // Even though we don't look through PHI nodes, we could be called on an
   // instruction in an unreachable block, which may be on a cycle.
   SmallPtrSet<const Value *, 4> Visited;

   Visited.insert(V);
   do {
     if (auto *GEP = dyn_cast<GEPOperator>(V)) {
       switch (StripKind) {
       case PSK_ZeroIndicesAndAliases:
+      case PSK_ZeroIndicesAndAliasesKeepAddrSpaceCast:
       case PSK_ZeroIndicesAndAliasesAndBarriers:
       case PSK_ZeroIndices:
         if (!GEP->hasAllZeroIndices())
           return V;
         break;
       case PSK_InBoundsConstantIndices:
         if (!GEP->hasAllConstantIndices())
           return V;
         LLVM_FALLTHROUGH;
       case PSK_InBounds:
         if (!GEP->isInBounds())
           return V;
         break;
       }
       V = GEP->getPointerOperand();
     } else if (Operator::getOpcode(V) == Instruction::BitCast ||
-               Operator::getOpcode(V) == Instruction::AddrSpaceCast) {
+               (StripKind != PSK_ZeroIndicesAndAliasesKeepAddrSpaceCast &&
+                Operator::getOpcode(V) == Instruction::AddrSpaceCast)) {
       V = cast<Operator>(V)->getOperand(0);
     } else if (auto *GA = dyn_cast<GlobalAlias>(V)) {
       if (StripKind == PSK_ZeroIndices || GA->isInterposable())
         return V;
       V = GA->getAliasee();
     } else {
       if (auto CS = ImmutableCallSite(V)) {
         if (const Value *RV = CS.getReturnedArgOperand()) {
[... 17 lines not shown ...] static const Value *stripPointerCastsAndOffsets(const Value *V) {
   return V;
 }
 } // end anonymous namespace

 const Value *Value::stripPointerCasts() const {
   return stripPointerCastsAndOffsets<PSK_ZeroIndicesAndAliases>(this);
 }

+const Value *Value::stripPointerCastsKeepAddrSpaceCast() const {
+  return stripPointerCastsAndOffsets<
+      PSK_ZeroIndicesAndAliasesKeepAddrSpaceCast>(this);
+}
+
 const Value *Value::stripPointerCastsNoFollowAliases() const {
   return stripPointerCastsAndOffsets<PSK_ZeroIndices>(this);
 }

 const Value *Value::stripInBoundsConstantOffsets() const {
   return stripPointerCastsAndOffsets<PSK_InBoundsConstantIndices>(this);
 }

[... 397 lines not shown ...]

lib/Transforms/InstCombine/InstructionCombining.cpp

[... 129 lines not shown ...]
 // increases variable availability at the cost of accuracy. Variables that
 // cannot be promoted by mem2reg or SROA will be described as living in memory
 // for their entire lifetime. However, passes like DSE and instcombine can
 // delete stores to the alloca, leading to misleading and inaccurate debug
 // information. This flag can be removed when those passes are fixed.
 static cl::opt<unsigned> ShouldLowerDbgDeclare("instcombine-lower-dbg-declare",
                                                cl::Hidden, cl::init(true));

+static cl::opt<bool>
+NoAddrSpaceCastFolding("instcombine-no-addrspacecast-folding",
+                       cl::desc("Do not fold addrspacecast into GEP."));
+
 Value *InstCombiner::EmitGEPOffset(User *GEP) {
   return llvm::EmitGEPOffset(&Builder, DL, GEP);
 }

 /// Return true if it is desirable to convert an integer computation from a
 /// given bit width to a new bit width.
 /// We don't want to convert from a legal to an illegal type or from a smaller
 /// to a larger illegal type. A width of '1' is always treated as a legal type
[... 1,620 lines not shown ...] if (GEP.getNumIndices() == 1) {
     }
   }

   // We do not handle pointer-vector geps here.
   if (GEP.getType()->isVectorTy())
     return nullptr;

   // Handle gep(bitcast x) and gep(gep x, 0, 0, 0).
-  Value *StrippedPtr = PtrOp->stripPointerCasts();
+  Value *StrippedPtr;
+  if (NoAddrSpaceCastFolding)
+    StrippedPtr = PtrOp->stripPointerCastsKeepAddrSpaceCast();
+  else
+    StrippedPtr = PtrOp->stripPointerCasts();
   PointerType *StrippedPtrTy = cast<PointerType>(StrippedPtr->getType());

   if (StrippedPtr != PtrOp) {
     bool HasZeroPointerIndex = false;
     if (ConstantInt *C = dyn_cast<ConstantInt>(GEP.getOperand(1)))
       HasZeroPointerIndex = C->isZero();

     // Transform: GEP (bitcast [10 x i8]* X to [0 x i8]*), i32 0, ...
[... 1,568 lines not shown ...]

test/Transforms/InstCombine/cast-no-addrspacecast-folding.ll

This file was added.

+; RUN: opt < %s -instcombine -instcombine-no-addrspacecast-folding=true -S | FileCheck %s
+target datalayout = "E-p:64:64:64-p1:32:32:32-p2:64:64:64-p3:64:64:64-a0:0:8-f32:32:32-f64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-v64:64:64-v128:128:128-n8:16:32:64"
+
+define double @test80_addrspacecast(double addrspace(1)* %p, i32 %i) {
+; CHECK-LABEL: @test80_addrspacecast(
+; CHECK-NEXT:    [[Q:%.*]] = addrspacecast double addrspace(1)* %p to double addrspace(2)*
+; CHECK-NEXT:    [[S:%.*]] = sext i32 %i to i64
+; CHECK-NEXT:    [[PP:%.*]] = getelementptr double, double addrspace(2)* [[Q]], i64 [[S]]
+; CHECK-NEXT:    [[L:%.*]] = load double, double addrspace(2)* [[PP]], align 8
+; CHECK-NEXT:    ret double [[L]]
+;
+  %q = addrspacecast double addrspace(1)* %p to double addrspace(2)*
+  %pp = getelementptr double, double addrspace(2)* %q, i32 %i
+  %l = load double, double addrspace(2)* %pp
+  ret double %l
+}