This is an archive of the discontinued LLVM Phabricator instance.


Differential D136285

Bad optimization with alloca and intrinsic function stackrestore
Abandoned · Public

Authored by jamieschmeiser on Oct 19 2022, 12:44 PM.


Details

Reviewers

rnk

Summary

Fix the alias analysis handling of stackrestore.

See the IR in function test in the new lit test for an example of code that is
mis-optimized. Alias analysis does not properly detect that an alloca is
clobbered by a call to the intrinsic function llvm.stackrestore.

Fix the handling of stackrestore by moving its check earlier in the function,
ahead of the handling of tail calls, since the stackrestore call is marked as
a tail call. Also, remove the requirement that the alloca being considered not
be a static alloca, since the alloca can come after the stacksave in the entry
block of a function.
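
To make the ordering issue concrete, here is a minimal, self-contained C++ sketch of the two check orders. ToyCall, ToyAlloca, queryBefore and queryAfter are hypothetical stand-ins invented for illustration, not LLVM APIs. The sketch assumes the queried object is already known to be an alloca and ignores everything else getModRefInfo does.

```cpp
#include <cassert>

// Toy result type; the real code returns ModRefInfo.
enum class ModRef { NoModRef, Mod, Unknown };

// Hypothetical description of a call site.
struct ToyCall {
  bool isTail;         // call carries the 'tail' marker
  bool isStackRestore; // callee is llvm.stackrestore
  bool hasByVal;       // some argument is passed byval
};

// Hypothetical description of the queried alloca.
struct ToyAlloca {
  bool isStatic;
};

// Order before the patch: the tail-call rule fires first, so a
// 'tail call @llvm.stackrestore' never reaches the stackrestore rule.
ModRef queryBefore(const ToyCall &C, const ToyAlloca &A) {
  if (C.isTail && !C.hasByVal)
    return ModRef::NoModRef;
  if (!A.isStatic && C.isStackRestore)
    return ModRef::Mod;
  return ModRef::Unknown; // the remaining analysis is not modelled
}

// Order proposed by the patch: stackrestore is checked first and the
// static-alloca requirement is dropped.
ModRef queryAfter(const ToyCall &C, const ToyAlloca &A) {
  (void)A; // the patch no longer looks at static vs dynamic
  if (C.isStackRestore)
    return ModRef::Mod;
  if (C.isTail && !C.hasByVal)
    return ModRef::NoModRef;
  return ModRef::Unknown;
}

// Small self-check for the case described in the summary.
void checkOrdering() {
  const ToyCall tailRestore{true, true, false};
  const ToyAlloca staticAlloca{true};
  assert(queryBefore(tailRestore, staticAlloca) == ModRef::NoModRef);
  assert(queryAfter(tailRestore, staticAlloca) == ModRef::Mod);
}
```

In this model a tail-marked stackrestore is answered by the tail-call rule in the old order and by the stackrestore rule in the proposed order, which is the behavioural difference the patch aims for.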

Diff Detail

Event Timeline

jamieschmeiser created this revision. Oct 19 2022, 12:44 PM

Herald added a project: Restricted Project. Oct 19 2022, 12:44 PM

Herald added subscribers: jeroen.dobbelaere, hiraditya.

jamieschmeiser requested review of this revision. Oct 19 2022, 12:44 PM

See https://github.com/llvm/llvm-project/issues/47943 for some discussion on the topic, in particular also the last comment. I doubt that dropping the isStaticAlloca() check *just* here is the right thing to do -- we need to have some kind of self-consistent notion of what a static alloca is. The current notion appears to be that any (fixed-size, non-inalloca) alloca in the entry block is a static alloca, regardless of where exactly it appears, and as such really is not clobbered by stackrestore. Of course, codegen needs to be consistent with that notion. If we want to change that, I believe this needs to actually happen inside isStaticAlloca(), so everything has a consistent picture about this.
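
The notion of a static alloca described above can be written down as a small predicate. This is only a sketch in plain C++: ToyAllocaInfo and isStaticAllocaModel are hypothetical names, and the three fields are assumptions standing in for the properties named in the comment (entry block, fixed size, not inalloca).

```cpp
#include <cassert>

// Hypothetical summary of the properties the comment above lists.
struct ToyAllocaInfo {
  bool inEntryBlock;     // the alloca sits in the function's entry block
  bool fixedSize;        // the allocation size is a compile-time constant
  bool usedWithInAlloca; // the alloca carries the inalloca marker
};

// Model of the current notion: position within the entry block, for
// example before or after a stacksave, plays no role.
bool isStaticAllocaModel(const ToyAllocaInfo &A) {
  return A.inEntryBlock && A.fixedSize && !A.usedWithInAlloca;
}

// Small self-check covering the cases discussed in this review.
void checkStaticAllocaModel() {
  assert(isStaticAllocaModel({true, true, false}));   // entry block: static
  assert(!isStaticAllocaModel({false, true, false})); // other block: dynamic
  assert(!isStaticAllocaModel({true, true, true}));   // inalloca: dynamic
}
```

Under this model, an alloca that follows a stacksave in the entry block is still static, so changing that outcome would mean changing the predicate itself rather than one of its callers.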

Regarding the tail marker, I believe that should be fixed inside markTails() -- not accessing stack memory is a prerequisite for the tail marker, so inferring it for stackrestore seems incorrect to me, independently of the other question.

llvm/test/Transforms/MemCpyOpt/stackrestore.ll
3	The second run line is unnecessary. The check lines should be generated using update_test_checks.py.

seems like the stackrestore shouldn't be marked as a tail call?

llvm/test/Transforms/MemCpyOpt/stackrestore.ll
3	can you use update_test_checks.py instead of adding a new RUN line?
88	test could use some reduction, e.g. no `dso_local`, some of instructions seem unnecessary for a repro

(whoops should have refreshed before commenting)

I think it will fail even if stackrestore is not tail call because it will still not pass the !isStaticAlloca query

Harbormaster completed remote builds in B193071: Diff 469010. Oct 19 2022, 1:31 PM

What does inalloca mean? It is only mentioned in the syntax line of https://llvm.org/docs/LangRef.html#alloca-instruction.

I found the following in clang/lib/CodeGen/CGCall.cpp:

// Insert a stack save if we're going to need any inalloca args.
if (hasInAllocaArgs(CGM, ExplicitCC, ArgTypes)) {
  assert(getTarget().getTriple().getArch() == llvm::Triple::x86 &&
         "inalloca only supported on x86");
  Args.allocateArgumentMemory(*this);
}

How universal is inalloca? Or is this something different?

See https://llvm.org/docs/LangRef.html#attr-inalloca ... but in this context, the important thing is just that inalloca allocas must always be treated as dynamic allocation. isStaticAlloca() handles this for you.

rnk added inline comments. Oct 27 2022, 12:26 PM

llvm/test/Transforms/MemCpyOpt/stackrestore.ll
97	My expectation is that all the allocas here are static: They are part of the entry block, they will be part of the initial stack allocation, they will not be affected by stacksave/restore. This does mean that simplifycfg can extend the lifetime of a stack allocation, but to my knowledge, that's valid, the program should have the same observable behavior.

I have verified that adding inalloca and removing tail call on stackrestore in my sample IR will fix the problem.

I am still unclear about inalloca, where it actually gets specified and how it is used. I am continuing to study this. I am also unclear why some of the comments on the link (which does seem to be related) seem to indicate that there may not be a bug here (other than the tail call on stackrestore). There is an observable change in behaviour caused by memcpyopt and by simplifycfg when it extends object lifetime by collapsing blocks (see reply to comment in testcase).

llvm/test/Transforms/MemCpyOpt/stackrestore.ll

simplifycfg extending the lifetime of a stack allocation does have observable behaviour changes. As pointed out in previous comments, it seems that stackrestore should not have tail on it, so consider the test IR without tail call on stackrestore and the code not in the entry block.

define dso_local signext i32 @test() {
associate006_entry:
  br label %b1

b1:
  %A1 = alloca [56 x i8], align 8
  %SS = tail call ptr @llvm.stacksave()
  %A2 = alloca [56 x i8], align 4
  store i8 1, ptr %A2, align 4
  %GEP1 = getelementptr inbounds i8, ptr %A2, i32 8
  store i8 1, ptr %GEP1, align 4
  %GEP2 = getelementptr inbounds i8, ptr %A2, i32 12
  store i8 1, ptr %GEP2, align 4
  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 8 dereferenceable(56) %A1, ptr noundef nonnull align 4 dereferenceable(56) %A2, i32 56, i1 false)
  call void @llvm.stackrestore(ptr %SS)
  %A3 = alloca [56 x i8], align 4
  %uglygep123 = getelementptr i8, ptr %A3, i32 0
  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 1 dereferenceable(56) %uglygep123, ptr noundef nonnull align 8 dereferenceable(56) %A1, i32 56, i1 false)
  ret i32 0
}

This will not pass the isStaticAlloca query and, because the tail marker has been removed from stackrestore, memcpyopt will perform no optimization. If simplifycfg runs first, the two blocks are collapsed, this becomes the entry block, and memcpyopt will do the optimization, changing the observable behaviour.

In D136285#3889698, @jamieschmeiser wrote:

I have verified that adding inalloca and removing tail call on stackrestore in my sample IR will fix the problem.

inalloca is intended to be used in conjunction with the inalloca argument attribute. It will make your allocas dynamic, as you say, but it is not designed for this purpose. Maybe that's OK; perhaps it should evolve into a "dynamic alloca" marker so we can be more intentional about static and dynamic allocas.

llvm/test/Transforms/MemCpyOpt/stackrestore.ll
97	As long as the lifetimes of A1, A2, and A3 all last for the entire stack, I don't see any problems with the transform. From what I can tell using llc, they are all static allocas and the result of the program is the same. I think you are right that stackrestore should not have a tail call marker on it, but that doesn't have any bearing on whether these are static or dynamic allocas. Maybe this test is overreduced. Can you say more about why this transform is causing problems?

jamieschmeiser added inline comments. Oct 28 2022, 2:42 PM

llvm/test/Transforms/MemCpyOpt/stackrestore.ll

It isn't clear from the test exactly what is going wrong, so I've added a bit more to illustrate, as well as comments tracing the values.
You can see the optimization performed using opt 1028.ll -passes=memcpyopt -S -print-changed=diff-quiet > /dev/
The contents of A3 will be different before and after the optimization.

---- 1028.ll ----
; Function Attrs: nobuiltin norecurse
define dso_local signext i32 @test() {
associate006_entry:
  %A1 = alloca [2 x i8], align 8
  store i8 0, ptr %A1, align 4
  ; A1[0] == 0
  %G1 = getelementptr inbounds i8, ptr %A1, i32 1
  store i8 0, ptr %G1, align 4
  ; A1[1] == 0
  %SS = tail call ptr @llvm.stacksave()
  %A2 = alloca [2 x i8], align 4
  store i8 1, ptr %A2, align 4
  ; A2[0] == 1
  %GEP1 = getelementptr inbounds i8, ptr %A2, i32 1
  store i8 1, ptr %GEP1, align 4
  ; A2[1] == 1
  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 8 dereferenceable(2) %A1, ptr noundef nonnull align 4 dereferenceable(2) %A2, i32 2, i1 false)
  ; A1[0] == 1, A1[1] == 1
  tail call void @llvm.stackrestore(ptr %SS)
  ; A1[0] == 0, A1[1] == 0
  %A3 = alloca [2 x i8], align 4
  %uglygep123 = getelementptr i8, ptr %A3, i32 0
  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 1 dereferenceable(2) %uglygep123, ptr noundef nonnull align 8 dereferenceable(2) %A1, i32 2, i1 false)
  ; A3[0] == A1[0] == 0, A3[1] == A1[1] == 0
  ; memcpyopt changes IR to
  ;   call void @llvm.memcpy.p0.p0.i32(ptr align 1 %uglygep123, ptr align 4 %A2, i32 2, i1 false)
  ; A3[0] == A2[0] == 1, A3[1] == A2[1] == 1

  ret i32 0
}

; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
declare ptr @llvm.stacksave()

; Function Attrs: mustprogress nocallback nofree nosync nounwind willreturn
declare void @llvm.stackrestore(ptr)

; Function Attrs: argmemonly mustprogress nocallback nofree nounwind willreturn
declare void @llvm.memcpy.p0.p0.i32(ptr noalias nocapture writeonly, ptr noalias nocapture readonly, i32, i1 immarg)

efriedma added inline comments. Oct 31 2022, 11:14 AM

llvm/test/Transforms/MemCpyOpt/stackrestore.ll
97	> ; A1[0] == 1, A1[1] == 1
	> tail call void @llvm.stackrestore(ptr %SS)
	> ; A1[0] == 0, A1[1] == 0

If you compile the given example with "llc -O0", you'll see that `@llvm.stacksave` corresponds to `movq %rsp, %rax`, and `@llvm.stackrestore` corresponds to `movq %rax, %rsp`. Neither RAX nor RSP are modified between these two instructions, so `@llvm.stackrestore` has no effect in this context. I'm not sure why you think the value of A1 changes after the stackrestore.

In general, the best evidence of a miscompile is to present a program that actually produces different results based on optimizations. Your example is reduced too far to show anything useful; the array A3 is dead after the memcpy, so its contents don't actually affect the output of the program.
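
The point about stacksave and stackrestore can be illustrated with a toy model of the stack pointer. This is a sketch, not generated code: ToyStack and its members are hypothetical names, and the model assumes static allocas belong to the initial frame and therefore never move the stack pointer, while a dynamic alloca lowers it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a stack pointer that grows downwards.
struct ToyStack {
  std::uint64_t sp;

  // Models llvm.stacksave: read the current stack pointer.
  std::uint64_t save() const { return sp; }

  // Models llvm.stackrestore: write a saved value back.
  void restore(std::uint64_t saved) { sp = saved; }

  // Models a dynamic alloca: move the stack pointer down.
  std::uint64_t dynamicAlloca(std::uint64_t bytes) {
    sp -= bytes;
    return sp;
  }
};

// With no dynamic allocation between save and restore, the restore
// writes back the value the stack pointer already holds.
bool restoreIsNoOp(ToyStack s) {
  const std::uint64_t before = s.sp;
  const std::uint64_t saved = s.save();
  s.restore(saved);
  return s.sp == before;
}

// With a dynamic allocation in between, the restore releases it.
bool restoreReleasesDynamicAlloca(ToyStack s) {
  const std::uint64_t saved = s.save();
  s.dynamicAlloca(56);
  const bool moved = (s.sp != saved);
  s.restore(saved);
  return moved && s.sp == saved;
}

// Small self-check of both scenarios.
void checkToyStack() {
  assert(restoreIsNoOp(ToyStack{0x1000}));
  assert(restoreReleasesDynamicAlloca(ToyStack{0x1000}));
}
```

In this model the restore only matters when something changed the stack pointer after the save, which matches the observation that it has no effect in the reduced example.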

jamieschmeiser abandoned this revision. Nov 4 2022, 7:21 AM

jamieschmeiser added inline comments.

llvm/test/Transforms/MemCpyOpt/stackrestore.ll
97	I went back to the original source (which I am not at liberty to share) to get a complete example and discovered, on further testing, that the code coming out of memcpyopt is correct and it is actually the next opt pass to make a change (dsepass) that causes the mis-optimization. I tested this using llc on the IR before and after dse. Thanks everyone for the help and I will continue looking into the problem. I am abandoning this revision.

Revision Contents

Path	Size
llvm/lib/Analysis/BasicAliasAnalysis.cpp	11 lines
llvm/test/Transforms/MemCpyOpt/stackrestore.ll	56 lines

Diff 469010

llvm/lib/Analysis/BasicAliasAnalysis.cpp

 ModRefInfo BasicAAResult::getModRefInfo(const CallBase *Call,
                                         const MemoryLocation &Loc,
                                         AAQueryInfo &AAQI) {
   assert(notDifferentParent(Call, Loc.Ptr) &&
          "AliasAnalysis query involving multiple functions!");

   const Value *Object = getUnderlyingObject(Loc.Ptr);

+  // Stack restore is able to modify unescaped dynamic allocas. Assume it may
+  // modify them even though the alloca is not escaped.
+  if (isa<AllocaInst>(Object) && isIntrinsicCall(Call, Intrinsic::stackrestore))
+    return ModRefInfo::Mod;
+
   // Calls marked 'tail' cannot read or write allocas from the current frame
   // because the current frame might be destroyed by the time they run. However,
   // a tail call may use an alloca with byval. Calling with byval copies the
   // contents of the alloca into argument registers or stack slots, so there is
   // no lifetime issue.
   if (isa<AllocaInst>(Object))
     if (const CallInst *CI = dyn_cast<CallInst>(Call))
       if (CI->isTailCall() &&
           !CI->getAttributes().hasAttrSomewhere(Attribute::ByVal))
         return ModRefInfo::NoModRef;

-  // Stack restore is able to modify unescaped dynamic allocas. Assume it may
-  // modify them even though the alloca is not escaped.
-  if (auto *AI = dyn_cast<AllocaInst>(Object))
-    if (!AI->isStaticAlloca() && isIntrinsicCall(Call, Intrinsic::stackrestore))
-      return ModRefInfo::Mod;
-
   // If the pointer is to a locally allocated object that does not escape,
   // then the call can not mod/ref the pointer unless the call takes the pointer
   // as an argument, and itself doesn't capture it.
   if (!isa<Constant>(Object) && Call != Object &&
       AAQI.CI->isNotCapturedBeforeOrAt(Object, Call)) {

     // Optimistically assume that call doesn't touch Object and check this
     // assumption in the following loop.

llvm/test/Transforms/MemCpyOpt/stackrestore.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -memcpyopt < %s -verify-memoryssa | FileCheck %s
+; RUN: opt -S -passes=memcpyopt -verify-memoryssa < %s | FileCheck %s --check-prefix=CHECK-TEST2

 ; PR40118: BasicAA didn't realize that stackrestore ends the lifetime of
 ; unescaped dynamic allocas, such as those that might come from inalloca.

 source_filename = "t.cpp"
 target datalayout = "e-m:x-p:32:32-i64:64-f80:32-n8:16:32-a:0:32-S32"
 target triple = "i686-unknown-windows-msvc19.14.26433"

 [...]
   call void @llvm.memcpy.p0.p0.i32(ptr %tmpmem, ptr %argmem, i32 10, i1 false)
   call void @llvm.stackrestore(ptr %inalloca.save)
   %heap = call ptr @malloc(i32 9)
   call void @llvm.memcpy.p0.p0.i32(ptr %heap, ptr %tmpmem, i32 9, i1 false)
   call void @useit(ptr %heap)
   ret i32 0
 }
+
+; Test that memcpyopt does not change the final memcpy because the source
+; is an alloca that is clobbered by a call to stackrestore
+
+; Function Attrs: nobuiltin norecurse
+define dso_local void @test() {
+; CHECK-TEST2-LABEL: @test
+; CHECK-TEST2-NOT: ret void
+; CHECK-TEST2: tail call void @llvm.stackrestore(ptr %SS)
+; CHECK-TEST2: call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 1 dereferenceable(56) %uglygep123, ptr noundef nonnull align 8 dereferenceable(56) %A1, i32 56, i1 false)
+; CHECK-TEST2: ret void
+entry:
+  %A1 = alloca [56 x i8], align 8
+  %SS = tail call ptr @llvm.stacksave()
+  %A2 = alloca [56 x i8], align 4
+  store i8 1, ptr %A2, align 4
+  %GEP1 = getelementptr inbounds i8, ptr %A2, i32 8
+  store i8 1, ptr %GEP1, align 4
+  %GEP2 = getelementptr inbounds i8, ptr %A2, i32 12
+  store i8 1, ptr %GEP2, align 4
+  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 8 dereferenceable(56) %A1, ptr noundef nonnull align 4 dereferenceable(56) %A2, i32 56, i1 false)
+  tail call void @llvm.stackrestore(ptr %SS)
+  %A3 = alloca [56 x i8], align 4
+  %uglygep123 = getelementptr i8, ptr %A3, i32 0
+  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 1 dereferenceable(56) %uglygep123, ptr noundef nonnull align 8 dereferenceable(56) %A1, i32 56, i1 false)
+  ret void
+}
+
+; Control test: mimic the previous test but substitute different functions
+; for the intrinsics stacksave and stackrestore. Test that memcpyopt does
+; optimize the final memcpy.
+
+; Function Attrs: nobuiltin norecurse
+define dso_local void @control() {
+; CHECK-TEST2-LABEL: @control
+; CHECK-TEST2-NOT: ret void
+; CHECK-TEST2: tail call void @useit(ptr %CSS)
+; CHECK-TEST2: call void @llvm.memcpy.p0.p0.i32(ptr align 1 %Cuglygep123, ptr align 4 %CA2, i32 56, i1 false)
+; CHECK-TEST2: ret void
+entry:
+  %CA1 = alloca [56 x i8], align 8
+  %CSS = tail call ptr @external()
+  %CA2 = alloca [56 x i8], align 4
+  store i8 1, ptr %CA2, align 4
+  %CGEP1 = getelementptr inbounds i8, ptr %CA2, i32 8
+  store i8 1, ptr %CGEP1, align 4
+  %CGEP2 = getelementptr inbounds i8, ptr %CA2, i32 12
+  store i8 1, ptr %CGEP2, align 4
+  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 8 dereferenceable(56) %CA1, ptr noundef nonnull align 4 dereferenceable(56) %CA2, i32 56, i1 false)
+  tail call void @useit(ptr %CSS)
+  %CA3 = alloca [56 x i8], align 4
+  %Cuglygep123 = getelementptr i8, ptr %CA3, i32 0
+  call void @llvm.memcpy.p0.p0.i32(ptr noundef nonnull align 1 dereferenceable(56) %Cuglygep123, ptr noundef nonnull align 8 dereferenceable(56) %CA1, i32 56, i1 false)
+  ret void
+}

 declare void @llvm.memcpy.p0.p0.i32(ptr nocapture writeonly, ptr nocapture readonly, i32, i1)
 declare ptr @llvm.stacksave()
 declare void @llvm.stackrestore(ptr)
 declare ptr @malloc(i32)
 declare void @useit(ptr)
 declare void @external()
