This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Check for existing inttotpr before creating new one
Closed · Public

Authored by rtereshin on Jan 17 2019, 2:57 AM.

Details

Reviewers

qcolombet
marcello.maggioni
bogner
hfinkel

Commits

rG85a0467a11bc: [CGP] Check for existing inttotpr before creating new one
rL351582: [CGP] Check for existing inttotpr before creating new one

Summary

Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions of
the same integer value while sinking address computations, but rather
CSEs them on the fly: excessive inttoptr's confuse SCEV into thinking
that related pointers have nothing to do with each other.

This problem blocks LoadStoreVectorizer from vectorizing some of the
loads / stores in a downstream target.
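
To make the summary's claim concrete, here is a deliberately simplified toy model; it is not SCEV's or LoadStoreVectorizer's actual algorithm. It assumes a pointer can be described as "base identity plus constant byte offset", where the base identity stands in for the particular inttoptr instruction that produced the pointer. All names (ToyPtr, toyDistance, toyAdjacentFloats) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only: a pointer is "base + constant byte offset". BaseId stands in
// for the identity of the IR value producing the base pointer, for example one
// particular inttoptr instruction. Hypothetical type, not an LLVM class.
struct ToyPtr {
  int BaseId;
  std::int64_t Offset;
};

// The distance between two pointers is known only when both share the same
// base identity. Returns false when the model cannot relate the pointers.
bool toyDistance(const ToyPtr &A, const ToyPtr &B, std::int64_t &Out) {
  if (A.BaseId != B.BaseId)
    return false; // different bases: unrelated as far as this model can tell
  Out = B.Offset - A.Offset;
  return true;
}

// Two 4-byte (float) accesses can be merged in this model only if the second
// is provably 4 bytes after the first.
bool toyAdjacentFloats(const ToyPtr &A, const ToyPtr &B) {
  std::int64_t D = 0;
  return toyDistance(A, B, D) && D == 4;
}
```

In this model, two pointers built on the same cast (same BaseId, offsets 0 and 4) are adjacent, while the same two offsets built on two separate casts of the same integer are not provably related, which is the situation the patch avoids by reusing an existing inttoptr.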

Diff Detail

Repository: rL LLVM

Event Timeline

rtereshin created this revision. Jan 17 2019, 2:57 AM

Herald added subscribers: llvm-commits, javed.absar. Jan 17 2019, 2:57 AM

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

In D56838#1361579, @hfinkel wrote:

Can we run EarlyCSE after CGP (instead of trying to do these kinds of point fixes)?

Hi Hal,

Thank you for looking into this!

That's a great suggestion! Originally I just eyeballed it as too expensive compile-time-wise for the small effect achieved here.

This time around, with you pointing it out, I performed actual measurements on a very large suite of shaders (the downstream target in question is a GPU). I see about a 1.0 (+/-0.3)% increase in overall compile time (the part of it that happens at application run time) on average. The improvement in generated code quality that comes out of this may be important, but it doesn't trigger often enough to justify a 1% compile-time increase across the board.

Thanks,
Roman

rtereshin added reviewers: bogner, hfinkel. Jan 17 2019, 4:26 PM

Fair enough. Thanks for checking.

I have a comment about the test case, but otherwise LGTM.

test/Transforms/CodeGenPrepare/X86/sink-addrmode-cse-inttoptrs.ll, line 19 (on Diff #182236):
Please check for the expected result, not just the absence of the scalar load.

This revision is now accepted and ready to land. Jan 17 2019, 7:27 PM

Updated the test as requested

Fixed the test, thank you!

Roman

Closed by commit rL351582: [CGP] Check for existing inttotpr before creating new one (authored by rtereshin). Jan 18 2019, 12:19 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                            Size
llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp                                       17 lines
llvm/trunk/test/Transforms/CodeGenPrepare/X86/sink-addrmode-cse-inttoptrs.ll    40 lines

Diff 182578

llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp

[4,658 preceding lines not shown; enclosing context: if (AddrMode.BaseGV) {]

         ResultPtr = AddrMode.BaseGV;
       }
 
       // If the real base value actually came from an inttoptr, then the matcher
       // will look through it and provide only the integer value. In that case,
       // use it here.
       if (!DL->isNonIntegralPointerType(Addr->getType())) {
+        const auto getResultPtr = [MemoryInst, Addr,
+                                   &Builder](Value *Reg) -> Value * {
+          for (User *U : Reg->users())
+            if (auto *I2P = dyn_cast<IntToPtrInst>(U))
+              if (I2P->getType() == Addr->getType() &&
+                  I2P->getParent() == MemoryInst->getParent()) {
+                I2P->moveBefore(MemoryInst->getParent()->getFirstNonPHI());
+                return I2P;
+              }
+          return Builder.CreateIntToPtr(Reg, Addr->getType(), "sunkaddr");
+        };
         if (!ResultPtr && AddrMode.BaseReg) {
-          ResultPtr = Builder.CreateIntToPtr(AddrMode.BaseReg, Addr->getType(),
-                                             "sunkaddr");
+          ResultPtr = getResultPtr(AddrMode.BaseReg);
           AddrMode.BaseReg = nullptr;
         } else if (!ResultPtr && AddrMode.Scale == 1) {
-          ResultPtr = Builder.CreateIntToPtr(AddrMode.ScaledReg, Addr->getType(),
-                                             "sunkaddr");
+          ResultPtr = getResultPtr(AddrMode.ScaledReg);
           AddrMode.Scale = 0;
         }
       }
 
       if (!ResultPtr &&
           !AddrMode.BaseReg && !AddrMode.Scale && !AddrMode.BaseOffs) {
         SunkAddr = Constant::getNullValue(Addr->getType());
       } else if (!ResultPtr) {

[2,398 following lines not shown]
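
The lambda added in this hunk follows a lookup-before-create pattern: scan the users of the integer value for a cast of the right type in the right block, and only build a new cast if none is found. The sketch below reproduces that pattern on its own, using hypothetical toy types (ToyInt, ToyCast, ToyBuilder) in place of LLVM's Value, IntToPtrInst and IRBuilder; it is an illustration under those assumptions, not the code from the patch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for LLVM's Value / IntToPtrInst; not the real API.
struct ToyCast;

struct ToyInt {
  std::vector<ToyCast *> Users; // casts that take this integer as their operand
};

struct ToyCast {
  ToyInt *Operand;
  std::string Type; // destination pointer type, e.g. "float addrspace(1)*"
  int Block;        // id of the basic block that holds the cast
};

struct ToyBuilder {
  std::vector<std::unique_ptr<ToyCast>> Owned; // keeps created casts alive

  // Unconditionally create a new cast and register it as a user of Reg.
  ToyCast *createCast(ToyInt &Reg, const std::string &Type, int Block) {
    Owned.push_back(std::unique_ptr<ToyCast>(new ToyCast{&Reg, Type, Block}));
    ToyCast *C = Owned.back().get();
    Reg.Users.push_back(C);
    return C;
  }

  // Reuse an existing cast of Reg with the same type in the same block, if
  // there is one; otherwise fall back to creating a new cast.
  ToyCast *getOrCreateCast(ToyInt &Reg, const std::string &Type, int Block) {
    for (ToyCast *C : Reg.Users)
      if (C->Type == Type && C->Block == Block)
        return C;
    return createCast(Reg, Type, Block);
  }
};
```

Calling getOrCreateCast twice with the same integer, type and block returns the same cast, while a different block yields a new one. The real lambda additionally moves the reused inttoptr to the first non-PHI position of the block; the toy model has no notion of instruction order, so that step is omitted here.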

llvm/trunk/test/Transforms/CodeGenPrepare/X86/sink-addrmode-cse-inttoptrs.ll

; RUN: opt -mtriple=x86_64-- -codegenprepare %s -S -o - | FileCheck %s --check-prefix=CGP
; RUN: opt -mtriple=x86_64-- -codegenprepare -load-store-vectorizer %s -S -o - | FileCheck %s --check-prefix=LSV

; Make sure CodeGenPrepare doesn't emit multiple inttoptr instructions
; of the same integer value while sinking address computations, but
; rather CSEs them on the fly: excessive inttoptr's confuse SCEV
; into thinking that related pointers have nothing to do with each other.
;
; Triggering this problem involves having just right addressing modes,
; and verifying that the motivating pass (LoadStoreVectorizer) is able
; to benefit from it - just right LSV-policies. Hence the atypical combination
; of the target and datalayout / address spaces in this test.

target datalayout = "p1:32:32:32"

define void @main(i32 %tmp, i32 %off) {
; CGP: = inttoptr
; CGP-NOT: = inttoptr
; LSV: = load <2 x float>
; LSV: = load <2 x float>
entry:
  %tmp1 = inttoptr i32 %tmp to float addrspace(1)*
  %arrayidx.i.7 = getelementptr inbounds float, float addrspace(1)* %tmp1, i32 %off
  %add20.i.7 = add i32 %off, 1
  %arrayidx22.i.7 = getelementptr inbounds float, float addrspace(1)* %tmp1, i32 %add20.i.7
  br label %for.body

for.body:
  %tmp8 = phi float [ undef, %entry ], [ %tmp62, %for.body ]
  %tmp28 = load float, float addrspace(1)* %arrayidx.i.7
  %tmp29 = load float, float addrspace(1)* %arrayidx22.i.7
  %arrayidx.i321.7 = getelementptr inbounds float, float addrspace(1)* %tmp1, i32 0
  %tmp43 = load float, float addrspace(1)* %arrayidx.i321.7
  %arrayidx22.i327.7 = getelementptr inbounds float, float addrspace(1)* %tmp1, i32 1
  %tmp44 = load float, float addrspace(1)* %arrayidx22.i327.7
  %tmp62 = tail call fast float @foo(float %tmp8, float %tmp44, float %tmp43, float %tmp29, float %tmp28)
  br label %for.body
}

declare float @foo(float, float, float, float, float)