This is an archive of the discontinued LLVM Phabricator instance.

SROA should freeze undefs for loads with no prior stores
Abandoned · Public

Authored by jamieschmeiser on Oct 19 2022, 12:21 PM.

Details

Reviewers

nikic
tstellar
nlopes

Summary

SROA will replace loads of an alloca with undef when there are no
prior stores. However, multiple loads of the same memory must be
equal. Insert freeze instructions so that loads of the same alloca
with no prior store will compare correctly.

See new lit test /Transforms/SROA/same-promoted-undefs.ll for sample
IR that is fixed by this change.

Also fix up existing lit tests. @tstellar, please examine the AMD test changes.

Event Timeline

jamieschmeiser created this revision. Oct 19 2022, 12:21 PM

Herald added a project: Restricted Project. Oct 19 2022, 12:21 PM

Herald added subscribers: nlopes, kosarev, kerbowa and 3 others.

jamieschmeiser requested review of this revision. Oct 19 2022, 12:21 PM

Herald added projects: Restricted Project, Restricted Project. Oct 19 2022, 12:21 PM

Herald added a subscriber: cfe-commits.

The current behavior here is intentional -- in fact, LLVM will move towards returning poison for loads from uninitialized memory in the future (though precisely how this will happen is still uncertain, see https://discourse.llvm.org/t/rfc-making-bit-field-codegen-poison-compatible/63250 for some recent discussion on the topic).

This revision now requires changes to proceed. Oct 19 2022, 12:42 PM

Harbormaster completed remote builds in B193067: Diff 469003. Oct 19 2022, 1:09 PM

However, multiple loads of the same memory must be equal

This premise is incorrect w.r.t. the current semantics, which say that loads return undef for uninit memory.

As Nikita mentioned, we are working towards changing these semantics. The work is ongoing, but we pivoted to make a couple of improvements in other areas to avoid perf regressions. For example, see the clang !noundef patches.

In D136284#3869306, @nlopes wrote:

However, multiple loads of the same memory must be equal

This premise is incorrect w.r.t. the current semantics, which say that loads return undef for uninit memory.

As Nikita mentioned, we are working towards changing these semantics. The work is ongoing, but we pivoted to make a couple of improvements in other areas to avoid perf regressions. For example, see the clang !noundef patches.

This is a C/C++ language semantics statement. Yes, I realize that LLVM is not language specific, but this is what is generated for this, and also, what freeze appears specifically designed to handle.

In D136284#3869216, @nikic wrote:

The current behavior here is intentional -- in fact, LLVM will move towards returning poison for loads from uninitialized memory in the future (though precisely how this will happen is still uncertain, see https://discourse.llvm.org/t/rfc-making-bit-field-codegen-poison-compatible/63250 for some recent discussion on the topic).

The current behaviour implies that the content of uninitialized memory is volatile, which is not correct. One would still need the freeze to ensure that 2 loads of the same uninitialized memory are the same, whether this is undef or poison. So this would not affect that future change and would still be required. Besides, are you sure that poison is appropriate here? Loading uninitialized memory is not erroneous; it is undefined. Comparing the same uninitialized memory, however, is defined, hence the freeze.

At least in C++, working with uninitialized memory is pretty much always immediate undefined behavior, see https://eel.is/c++draft/basic.indet for the relevant wording. The only exceptions are "copy-like" operations on unsigned character types, which comparisons do not fall under.

I believe the C specification is less clear-cut about this, but Clang and LLVM assume basically the same to also hold for C code.

In D136284#3874492, @nikic wrote:

At least in C++, working with uninitialized memory is pretty much always immediate undefined behavior, see https://eel.is/c++draft/basic.indet for the relevant wording. The only exceptions are "copy-like" operations on unsigned character types, which comparisons do not fall under.

I believe the C specification is less clear-cut about this, but Clang and LLVM assume basically the same to also hold for C code.

What version of the C++ standard is this? Every version that I have seen has basics as section 3 and I cannot find this section, nor anything similar. Section 6 is Statements.

In D136284#3874596, @jamieschmeiser wrote:

What version of the C++ standard is this? Every version that I have seen has basics as section 3 and I cannot find this section, nor anything similar. Section 6 is Statements.

That discussion is orthogonal to this patch.
This patch is not desired because it's not needed per the current LLVM IR semantics. If you want to change something, you need to start by proposing a change to the LLVM IR semantics. You'll need to justify why it's needed, why it's correct, the perf impact, how to make it backwards compatible, and why it's better than the proposals on the table right now.

Anyway, a patch like this solves no problem. LLVM allows loads to be duplicated. Your patch does nothing to prevent that or to ensure all loads see the same value. The issue is way more complicated than what this patch implies.

I checked with a member of the C++ standards committee and he verified that comparing an uninitialized value against itself is, indeed, undefined behaviour, in the general case. I am abandoning this revision.

In D136284#3874614, @nlopes wrote:

That discussion is orthogonal to this patch.
This patch is not desired because it's not needed per the current LLVM IR semantics. If you want to change something, you need to start by proposing a change to the LLVM IR semantics. You'll need to justify why it's needed, why it's correct, the perf impact, how to make it backwards compatible, and why it's better than the proposals on the table right now.

Anyway, a patch like this solves no problem. LLVM allows loads to be duplicated. Your patch does nothing to prevent that or to ensure all loads see the same value. The issue is way more complicated than what this patch implies.

I'm not trying to flog a dead horse (I've already abandoned this) but I am trying to understand this statement. I do not dispute that there may be other situations similar to this but, assuming that we did want to ensure that the loads had the same value, why is freezing them at this point not the correct thing to do? Whether they are poison or undef, freezing them would ensure that they compare equal. Yes, I understand it may have performance impacts, there may be better ways, etc. But, ignoring all that, isn't this exactly what freeze is designed for?
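
As an illustration of the distinction being asked about, here is a toy C++ model. It is not LLVM code: `ToyUndef` and `ToyFrozen` are hypothetical names, and the incrementing counter is only an assumption standing in for an optimizer that is free to pick a different value at every use of undef. Under that assumption, comparing undef with itself need not yield true, while a single frozen value always compares equal to itself.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only, not LLVM's API. ToyUndef stands in for an IR `undef`
// value, where every use may observe a different bit pattern.
class ToyUndef {
public:
  // Each call models one *use* of undef. The counter plays the role of an
  // optimizer that happens to pick a fresh value every time.
  std::uint32_t use() { return next_++; }

private:
  std::uint32_t next_ = 0;
};

// Models `%f = freeze i32 undef`: one arbitrary value is chosen when the
// freeze executes, and every later use of %f sees that same value.
class ToyFrozen {
public:
  explicit ToyFrozen(ToyUndef &u) : value_(u.use()) {}
  std::uint32_t use() const { return value_; }

private:
  std::uint32_t value_;
};

// Models `icmp eq i32 undef, undef`: the two operands are independent uses.
bool compareUndefWithItself() {
  ToyUndef u;
  const std::uint32_t lhs = u.use();
  const std::uint32_t rhs = u.use();
  return lhs == rhs;
}

// Models `%f = freeze i32 undef` followed by `icmp eq i32 %f, %f`.
bool compareFrozenWithItself() {
  ToyUndef u;
  const ToyFrozen f(u);
  const std::uint32_t lhs = f.use();
  const std::uint32_t rhs = f.use();
  return lhs == rhs;
}
```

In this model `compareUndefWithItself()` returns false and `compareFrozenWithItself()` returns true. Real LLVM is allowed to fold the undef comparison either way; the model just picks the unfavorable outcome to show that equality is not guaranteed.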

In D136284#3874755, @jamieschmeiser wrote:

I'm not trying to flog a dead horse (I've already abandoned this) but I am trying to understand this statement. I do not dispute that there may be other situations similar to this but, assuming that we did want to ensure that the loads had the same value, why is freezing them at this point not the correct thing to do? Whether they are poison or undef, freezing them would ensure that they compare equal. Yes, I understand it may have performance impacts, there may be better ways, etc. But, ignoring all that, isn't this exactly what freeze is designed for?

It's not enough to ensure the semantics you want. What about optimizations that happen before and after SROA? This patch only deals with a subset of the cases (the ones that are detected by SROA's algorithm). Again, load duplication happens, and this patch doesn't deal with it. So it's inconsistent.
To implement the proposed semantics, you would need to change quite a few optimizations, not just SROA.
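
The coverage objection can be sketched with a toy C++ model (hypothetical names, not LLVM's API). It assumes that a load of uninitialized memory folded on its own yields an independent undef, modeled here by a counter, while the loads rewritten together by one freezing pass share a single frozen value.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-in for an alloca that is never stored to. Not LLVM's API.
class ToyUninitSlot {
public:
  // Models a load that some transform folds to `undef` by itself: the
  // counter plays an optimizer that picks a fresh value for each such load.
  std::uint32_t loadFoldedToUndef() { return next_++; }

private:
  std::uint32_t next_ = 0;
};

// Both loads are rewritten by the same freezing pass, so they share the
// single frozen value and compare equal.
bool equalWhenOnePassRewritesBothLoads() {
  ToyUninitSlot slot;
  const std::uint32_t frozen = slot.loadFoldedToUndef(); // value picked once
  const std::uint32_t firstLoad = frozen;
  const std::uint32_t secondLoad = frozen;
  return firstLoad == secondLoad;
}

// One load is rewritten by the freezing pass, but a second load (for
// example a duplicate handled by another transform) never sees that
// frozen value and gets its own undef.
bool equalWhenAnotherTransformHandlesOneLoad() {
  ToyUninitSlot slot;
  const std::uint32_t frozen = slot.loadFoldedToUndef();
  const std::uint32_t firstLoad = frozen;
  const std::uint32_t otherLoad = slot.loadFoldedToUndef();
  return firstLoad == otherLoad;
}
```

In this model the first function returns true and the second returns false, which mirrors the objection that freezing inside SROA alone gives inconsistent guarantees.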

Yes, I agree it is incomplete (aside from being incorrect here :-) I've just been asking to ensure that my understanding of freeze is correct. Thanks.

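Revision Contents

Path	Size
clang/test/CodeGen/LoongArch/inline-asm-gcc-regs.c	28 lines
clang/test/CodeGenCXX/return.cpp	4 lines
clang/test/CodeGenOpenCL/overload.cl	12 lines
llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp	33 lines
llvm/test/CodeGen/AMDGPU/promote-alloca-vector-to-vector.ll	20 lines
llvm/test/CodeGen/AMDGPU/vector-alloca-limits.ll	10 lines
llvm/test/Transforms/Mem2Reg/pr24179.ll	3 lines
llvm/test/Transforms/Mem2Reg/preserve-nonnull-load-metadata.ll	15 lines
llvm/test/Transforms/PhaseOrdering/X86/nancvt.ll	2 lines
llvm/test/Transforms/SROA/address-spaces.ll	3 lines
llvm/test/Transforms/SROA/addrspacecast.ll	10 lines
llvm/test/Transforms/SROA/alloca-address-space.ll	3 lines
llvm/test/Transforms/SROA/basictest.ll	21 lines
llvm/test/Transforms/SROA/phi-and-select.ll	6 lines
llvm/test/Transforms/SROA/phi-gep.ll	6 lines
llvm/test/Transforms/SROA/phi-with-duplicate-pred.ll	3 lines
llvm/test/Transforms/SROA/pr37267.ll	8 lines
llvm/test/Transforms/SROA/same-promoted-undefs.ll	164 lines
llvm/test/Transforms/SROA/scalable-vectors.ll	3 lines
llvm/test/Transforms/SROA/select-load.ll	6 lines
llvm/test/Transforms/SROA/slice-width.ll	3 lines
llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll	6 lines
llvm/test/Transforms/SROA/vector-conversion.ll	9 lines
llvm/test/Transforms/SROA/vector-promotion.ll	18 lines
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll.expected	11 lines
llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected	59 lines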

Diff 469003

clang/test/CodeGen/LoongArch/inline-asm-gcc-regs.c

	// RUN: %clang_cc1 -triple loongarch32 -emit-llvm -O2 %s -o - | FileCheck %s			// RUN: %clang_cc1 -triple loongarch32 -emit-llvm -O2 %s -o - | FileCheck %s
	// RUN: %clang_cc1 -triple loongarch64 -emit-llvm -O2 %s -o - | FileCheck %s			// RUN: %clang_cc1 -triple loongarch64 -emit-llvm -O2 %s -o - | FileCheck %s

	/// Check GCC register names and alias can be used in register variable definition.			/// Check GCC register names and alias can be used in register variable definition.

	// CHECK-LABEL: @test_r0			// CHECK-LABEL: @test_r0
	// CHECK: call void asm sideeffect "", "{$r0}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r0}"(i32 0)
	void test_r0() {			void test_r0() {
	register int a asm ("$r0");			register int a asm ("$r0");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_r12			// CHECK-LABEL: @test_r12
	// CHECK: call void asm sideeffect "", "{$r12}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r12}"(i32 0)
	void test_r12() {			void test_r12() {
	register int a asm ("$r12");			register int a asm ("$r12");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_r31			// CHECK-LABEL: @test_r31
	// CHECK: call void asm sideeffect "", "{$r31}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r31}"(i32 0)
	void test_r31() {			void test_r31() {
	register int a asm ("$r31");			register int a asm ("$r31");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_zero			// CHECK-LABEL: @test_zero
	// CHECK: call void asm sideeffect "", "{$r0}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r0}"(i32 0)
	void test_zero() {			void test_zero() {
	register int a asm ("$zero");			register int a asm ("$zero");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_a0			// CHECK-LABEL: @test_a0
	// CHECK: call void asm sideeffect "", "{$r4}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r4}"(i32 0)
	void test_a0() {			void test_a0() {
	register int a asm ("$a0");			register int a asm ("$a0");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_t1			// CHECK-LABEL: @test_t1
	// CHECK: call void asm sideeffect "", "{$r13}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r13}"(i32 0)
	void test_t1() {			void test_t1() {
	register int a asm ("$t1");			register int a asm ("$t1");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_fp			// CHECK-LABEL: @test_fp
	// CHECK: call void asm sideeffect "", "{$r22}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r22}"(i32 0)
	void test_fp() {			void test_fp() {
	register int a asm ("$fp");			register int a asm ("$fp");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_s2			// CHECK-LABEL: @test_s2
	// CHECK: call void asm sideeffect "", "{$r25}"(i32 undef)			// CHECK: call void asm sideeffect "", "{$r25}"(i32 0)
	void test_s2() {			void test_s2() {
	register int a asm ("$s2");			register int a asm ("$s2");
	asm ("" :: "r" (a));			asm ("" :: "r" (a));
	}			}

	// CHECK-LABEL: @test_f0			// CHECK-LABEL: @test_f0
	// CHECK: call void asm sideeffect "", "{$f0}"(float undef)			// CHECK: call void asm sideeffect "", "{$f0}"(float 0.000000e+00)
	void test_f0() {			void test_f0() {
	register float a asm ("$f0");			register float a asm ("$f0");
	asm ("" :: "f" (a));			asm ("" :: "f" (a));
	}			}

	// CHECK-LABEL: @test_f14			// CHECK-LABEL: @test_f14
	// CHECK: call void asm sideeffect "", "{$f14}"(float undef)			// CHECK: call void asm sideeffect "", "{$f14}"(float 0.000000e+00)
	void test_f14() {			void test_f14() {
	register float a asm ("$f14");			register float a asm ("$f14");
	asm ("" :: "f" (a));			asm ("" :: "f" (a));
	}			}

	// CHECK-LABEL: @test_f31			// CHECK-LABEL: @test_f31
	// CHECK: call void asm sideeffect "", "{$f31}"(float undef)			// CHECK: call void asm sideeffect "", "{$f31}"(float 0.000000e+00)
	void test_f31() {			void test_f31() {
	register float a asm ("$f31");			register float a asm ("$f31");
	asm ("" :: "f" (a));			asm ("" :: "f" (a));
	}			}

	// CHECK-LABEL: @test_fa0			// CHECK-LABEL: @test_fa0
	// CHECK: call void asm sideeffect "", "{$f0}"(float undef)			// CHECK: call void asm sideeffect "", "{$f0}"(float 0.000000e+00)
	void test_fa0() {			void test_fa0() {
	register float a asm ("$fa0");			register float a asm ("$fa0");
	asm ("" :: "f" (a));			asm ("" :: "f" (a));
	}			}

	// CHECK-LABEL: @test_ft1			// CHECK-LABEL: @test_ft1
	// CHECK: call void asm sideeffect "", "{$f9}"(float undef)			// CHECK: call void asm sideeffect "", "{$f9}"(float 0.000000e+00)
	void test_ft1() {			void test_ft1() {
	register float a asm ("$ft1");			register float a asm ("$ft1");
	asm ("" :: "f" (a));			asm ("" :: "f" (a));
	}			}

	// CHECK-LABEL: @test_fs2			// CHECK-LABEL: @test_fs2
	// CHECK: call void asm sideeffect "", "{$f26}"(float undef)			// CHECK: call void asm sideeffect "", "{$f26}"(float 0.000000e+00)
	void test_fs2() {			void test_fs2() {
	register float a asm ("$fs2");			register float a asm ("$fs2");
	asm ("" :: "f" (a));			asm ("" :: "f" (a));
	}			}

clang/test/CodeGenCXX/return.cpp

	// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -o - %s | FileCheck --check-prefixes=CHECK,CHECK-COMMON %s			// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -o - %s | FileCheck --check-prefixes=CHECK,CHECK-COMMON %s
	// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -O -o - %s | FileCheck %s --check-prefixes=CHECK-OPT,CHECK-COMMON			// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -O -o - %s | FileCheck %s --check-prefixes=CHECK-OPT,CHECK-COMMON
	// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -fno-strict-return -o - %s | FileCheck %s --check-prefixes=CHECK-NOSTRICT,CHECK-COMMON			// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -fno-strict-return -o - %s | FileCheck %s --check-prefixes=CHECK-NOSTRICT,CHECK-COMMON
	// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -fno-strict-return -Wno-return-type -o - %s | FileCheck %s --check-prefixes=CHECK-NOSTRICT,CHECK-COMMON			// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -fno-strict-return -Wno-return-type -o - %s | FileCheck %s --check-prefixes=CHECK-NOSTRICT,CHECK-COMMON
	// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -fno-strict-return -O -o - %s | FileCheck %s --check-prefixes=CHECK-NOSTRICT-OPT,CHECK-COMMON			// RUN: %clang_cc1 -emit-llvm -triple %itanium_abi_triple -std=c++11 -fno-strict-return -O -o - %s | FileCheck %s --check-prefixes=CHECK-NOSTRICT-OPT,CHECK-COMMON

	// CHECK-COMMON-LABEL: @_Z9no_return			// CHECK-COMMON-LABEL: @_Z9no_return
	int no_return() {			int no_return() {
	// CHECK: call void @llvm.trap			// CHECK: call void @llvm.trap
	// CHECK-NEXT: unreachable			// CHECK-NEXT: unreachable

	// CHECK-OPT-NOT: call void @llvm.trap			// CHECK-OPT-NOT: call void @llvm.trap
	// CHECK-OPT: unreachable			// CHECK-OPT: unreachable

	// -fno-strict-return should not emit trap + unreachable but it should return			// -fno-strict-return should not emit trap + unreachable but it should return
	// an undefined value instead.			// an undefined value instead. At opt, this is optimized to 0.

	// CHECK-NOSTRICT: alloca			// CHECK-NOSTRICT: alloca
	// CHECK-NOSTRICT-NEXT: load			// CHECK-NOSTRICT-NEXT: load
	// CHECK-NOSTRICT-NEXT: ret i32			// CHECK-NOSTRICT-NEXT: ret i32
	// CHECK-NOSTRICT-NEXT: }			// CHECK-NOSTRICT-NEXT: }

	// CHECK-NOSTRICT-OPT: ret i32 undef			// CHECK-NOSTRICT-OPT: ret i32 0
	}			}

	enum Enum {			enum Enum {
	A, B			A, B
	};			};

	// CHECK-COMMON-LABEL: @_Z27returnNotViableDontOptimize4Enum			// CHECK-COMMON-LABEL: @_Z27returnNotViableDontOptimize4Enum
	int returnNotViableDontOptimize(Enum e) {			int returnNotViableDontOptimize(Enum e) {
	[72 lines not shown]

clang/test/CodeGenOpenCL/overload.cl

	[15 lines not shown]
	void kernel test1() {			void kernel test1() {
	global int *a;			global int *a;
	global int *b;			global int *b;
	generic int *c;			generic int *c;
	local int *d;			local int *d;
	generic int generic gengen;			generic int generic gengen;
	generic int local genloc;			generic int local genloc;
	generic int global genglob;			generic int global genglob;
	// CHECK-DAG: call spir_func void @_Z3fooPU3AS1iS0_(i32 addrspace(1)* noundef undef, i32 addrspace(1)* noundef undef)			// CHECK-DAG: call spir_func void @_Z3fooPU3AS1iS0_(i32 addrspace(1)* noundef null, i32 addrspace(1)* noundef null)
	foo(a, b);			foo(a, b);
	// CHECK-DAG: call spir_func void @_Z3fooPU3AS4iS0_(i32 addrspace(4)* noundef undef, i32 addrspace(4)* noundef undef)			// CHECK-DAG: tail call spir_func void @_Z3fooPU3AS4iS0_(i32 addrspace(4)* noundef addrspacecast (i32 addrspace(1)* null to i32 addrspace(4)*), i32 addrspace(4)* noundef null)
	foo(b, c);			foo(b, c);
	// CHECK-DAG: call spir_func void @_Z3fooPU3AS4iS0_(i32 addrspace(4)* noundef undef, i32 addrspace(4)* noundef undef)			// CHECK-DAG: tail call spir_func void @_Z3fooPU3AS4iS0_(i32 addrspace(4)* noundef addrspacecast (i32 addrspace(1)* null to i32 addrspace(4)*), i32 addrspace(4)* noundef addrspacecast (i32 addrspace(3)* null to i32 addrspace(4)*))
	foo(a, d);			foo(a, d);

	// CHECK-DAG: call spir_func void @_Z3barPU3AS4PU3AS4iS2_(i32 addrspace(4)* addrspace(4)* noundef undef, i32 addrspace(4)* addrspace(4)* noundef undef)			// CHECK-DAG: tail call spir_func void @_Z3barPU3AS4PU3AS4iS2_(i32 addrspace(4)* addrspace(4)* noundef null, i32 addrspace(4)* addrspace(4)* noundef addrspacecast (i32 addrspace(4)* addrspace(3)* null to i32 addrspace(4)* addrspace(4)*))
	bar(gengen, genloc);			bar(gengen, genloc);
	// CHECK-DAG: call spir_func void @_Z3barPU3AS4PU3AS4iS2_(i32 addrspace(4)* addrspace(4)* noundef undef, i32 addrspace(4)* addrspace(4)* noundef undef)			// CHECK-DAG: tail call spir_func void @_Z3barPU3AS4PU3AS4iS2_(i32 addrspace(4)* addrspace(4)* noundef null, i32 addrspace(4)* addrspace(4)* noundef addrspacecast (i32 addrspace(4)* addrspace(1)* null to i32 addrspace(4)* addrspace(4)*))
	bar(gengen, genglob);			bar(gengen, genglob);
	// CHECK-DAG: call spir_func void @_Z3barPU3AS1PU3AS4iS2_(i32 addrspace(4)* addrspace(1)* noundef undef, i32 addrspace(4)* addrspace(1)* noundef undef)			// CHECK-DAG: tail call spir_func void @_Z3barPU3AS1PU3AS4iS2_(i32 addrspace(4)* addrspace(1)* noundef null, i32 addrspace(4)* addrspace(1)* noundef null)
	bar(genglob, genglob);			bar(genglob, genglob);
	}			}

	// Checking vector vs scalar resolution			// Checking vector vs scalar resolution
	void kernel test2() {			void kernel test2() {
	short4 e0=0;			short4 e0=0;

	// CHECK-DAG: call spir_func <4 x i16> @_Z5clampDv4_sss(<4 x i16> noundef zeroinitializer, i16 noundef signext 0, i16 noundef signext 255)			// CHECK-DAG: call spir_func <4 x i16> @_Z5clampDv4_sss(<4 x i16> noundef zeroinitializer, i16 noundef signext 0, i16 noundef signext 255)
	clamp(e0, 0, 255);			clamp(e0, 0, 255);
	// CHECK-DAG: call spir_func <4 x i16> @_Z5clampDv4_sS_S_(<4 x i16> noundef zeroinitializer, <4 x i16> noundef zeroinitializer, <4 x i16> noundef zeroinitializer)			// CHECK-DAG: call spir_func <4 x i16> @_Z5clampDv4_sS_S_(<4 x i16> noundef zeroinitializer, <4 x i16> noundef zeroinitializer, <4 x i16> noundef zeroinitializer)
	clamp(e0, e0, e0);			clamp(e0, e0, e0);
	}			}

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

[27 lines not shown]
#include "llvm/IR/BasicBlock.h"		#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/Constant.h"		#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DebugInfo.h"		#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
[360 lines not shown]	for (User *U : make_early_inc_range(AI->users())) {
// If the load was marked as nonnull we don't want to lose		// If the load was marked as nonnull we don't want to lose
// that information when we erase this Load. So we preserve		// that information when we erase this Load. So we preserve
// it with an assume.		// it with an assume.
if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&		if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&
!isKnownNonZero(ReplVal, DL, 0, AC, LI, &DT))		!isKnownNonZero(ReplVal, DL, 0, AC, LI, &DT))
addAssumeNonNull(AC, LI);		addAssumeNonNull(AC, LI);

LI->replaceAllUsesWith(ReplVal);		LI->replaceAllUsesWith(ReplVal);
LI->eraseFromParent();
LBI.deleteValue(LI);		LBI.deleteValue(LI);
		LI->eraseFromParent();
}		}

// Finally, after the scan, check to see if the store is all that is left.		// Finally, after the scan, check to see if the store is all that is left.
if (!Info.UsingBlocks.empty())		if (!Info.UsingBlocks.empty())
return false; // If not, we'll have to fall back for the remainder.		return false; // If not, we'll have to fall back for the remainder.

// Record debuginfo for the store and remove the declaration's		// Record debuginfo for the store and remove the declaration's
// debuginfo.		// debuginfo.
for (DbgVariableIntrinsic *DII : Info.DbgUsers) {		for (DbgVariableIntrinsic *DII : Info.DbgUsers) {
if (DII->isAddressOfVariable()) {		if (DII->isAddressOfVariable()) {
DIBuilder DIB(AI->getModule(), /*AllowUnresolved*/ false);		DIBuilder DIB(AI->getModule(), /*AllowUnresolved*/ false);
ConvertDebugDeclareToDebugValue(DII, Info.OnlyStore, DIB);		ConvertDebugDeclareToDebugValue(DII, Info.OnlyStore, DIB);
DII->eraseFromParent();		DII->eraseFromParent();
} else if (DII->getExpression()->startsWithDeref()) {		} else if (DII->getExpression()->startsWithDeref()) {
DII->eraseFromParent();		DII->eraseFromParent();
}		}
}		}
// Remove the (now dead) store and alloca.		// Remove the (now dead) store and alloca.
Info.OnlyStore->eraseFromParent();
LBI.deleteValue(Info.OnlyStore);		LBI.deleteValue(Info.OnlyStore);
		Info.OnlyStore->eraseFromParent();

AI->eraseFromParent();		AI->eraseFromParent();
return true;		return true;
}		}

/// Many allocas are only used within a single basic block. If this is the		/// Many allocas are only used within a single basic block. If this is the
/// case, avoid traversing the CFG and inserting a lot of potentially useless		/// case, avoid traversing the CFG and inserting a lot of potentially useless
/// PHI nodes by just performing a single linear pass over the basic block		/// PHI nodes by just performing a single linear pass over the basic block
[27 lines not shown]	static bool promoteSingleBlockAlloca(AllocaInst *AI, const AllocaInfo &Info,
for (User *U : AI->users())		for (User *U : AI->users())
if (StoreInst *SI = dyn_cast<StoreInst>(U))		if (StoreInst *SI = dyn_cast<StoreInst>(U))
StoresByIndex.push_back(std::make_pair(LBI.getInstructionIndex(SI), SI));		StoresByIndex.push_back(std::make_pair(LBI.getInstructionIndex(SI), SI));

// Sort the stores by their index, making it efficient to do a lookup with a		// Sort the stores by their index, making it efficient to do a lookup with a
// binary search.		// binary search.
llvm::sort(StoresByIndex, less_first());		llvm::sort(StoresByIndex, less_first());

		Value *FI = nullptr;

// Walk all of the loads from this alloca, replacing them with the nearest		// Walk all of the loads from this alloca, replacing them with the nearest
// store above them, if any.		// store above them, if any.
for (User *U : make_early_inc_range(AI->users())) {		for (User *U : make_early_inc_range(AI->users())) {
LoadInst *LI = dyn_cast<LoadInst>(U);		LoadInst *LI = dyn_cast<LoadInst>(U);
if (!LI)		if (!LI)
continue;		continue;

unsigned LoadIdx = LBI.getInstructionIndex(LI);		unsigned LoadIdx = LBI.getInstructionIndex(LI);

// Find the nearest store that has a lower index than this load.		// Find the nearest store that has a lower index than this load.
StoresByIndexTy::iterator I = llvm::lower_bound(		StoresByIndexTy::iterator I = llvm::lower_bound(
StoresByIndex,		StoresByIndex,
std::make_pair(LoadIdx, static_cast<StoreInst *>(nullptr)),		std::make_pair(LoadIdx, static_cast<StoreInst *>(nullptr)),
less_first());		less_first());
Value *ReplVal;		Value *ReplVal;
if (I == StoresByIndex.begin()) {		if (I == StoresByIndex.begin()) {
if (StoresByIndex.empty())		if (StoresByIndex.empty()) {
// If there are no stores, the load takes the undef value.		// If there are no stores, the load takes the undef value.
		if (LI->use_empty())
ReplVal = UndefValue::get(LI->getType());		ReplVal = UndefValue::get(LI->getType());
else		else {
		// use a frozen undef value so that multiple loads of this alloca
		// will compare properly.
		if (!FI)
		FI = IRBuilder<>(AI).CreateFreeze(
		UndefValue::get(AI->getAllocatedType()), AI->getName() + ".fr");
		ReplVal = FI;
		}
		} else
// There is no store before this load, bail out (load may be affected		// There is no store before this load, bail out (load may be affected
// by the following stores - see main comment).		// by the following stores - see main comment).
return false;		return false;
} else {		} else {
// Otherwise, there was a store before this load, the load takes its		// Otherwise, there was a store before this load, the load takes its
// value.		// value.
ReplVal = std::prev(I)->second->getOperand(0);		ReplVal = std::prev(I)->second->getOperand(0);
}		}

// Note, if the load was marked as nonnull we don't want to lose that		// Note, if the load was marked as nonnull we don't want to lose that
// information when we erase it. So we preserve it with an assume.		// information when we erase it. So we preserve it with an assume.
if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&		if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&
!isKnownNonZero(ReplVal, DL, 0, AC, LI, &DT))		!isKnownNonZero(ReplVal, DL, 0, AC, LI, &DT))
addAssumeNonNull(AC, LI);		addAssumeNonNull(AC, LI);

// If the replacement value is the load, this must occur in unreachable		// If the replacement value is the load, this must occur in unreachable
// code.		// code.
if (ReplVal == LI)		if (ReplVal == LI)
ReplVal = PoisonValue::get(LI->getType());		ReplVal = PoisonValue::get(LI->getType());

LI->replaceAllUsesWith(ReplVal);		LI->replaceAllUsesWith(ReplVal);
LI->eraseFromParent();
LBI.deleteValue(LI);		LBI.deleteValue(LI);
		LI->eraseFromParent();
}		}

// Remove the (now dead) stores and alloca.		// Remove the (now dead) stores and alloca.
while (!AI->use_empty()) {		while (!AI->use_empty()) {
StoreInst *SI = cast<StoreInst>(AI->user_back());		StoreInst *SI = cast<StoreInst>(AI->user_back());
// Record debuginfo for the store before removing it.		// Record debuginfo for the store before removing it.
for (DbgVariableIntrinsic *DII : Info.DbgUsers) {		for (DbgVariableIntrinsic *DII : Info.DbgUsers) {
if (DII->isAddressOfVariable()) {		if (DII->isAddressOfVariable()) {
DIBuilder DIB(AI->getModule(), /*AllowUnresolved*/ false);		DIBuilder DIB(AI->getModule(), /*AllowUnresolved*/ false);
ConvertDebugDeclareToDebugValue(DII, SI, DIB);		ConvertDebugDeclareToDebugValue(DII, SI, DIB);
}		}
}		}
SI->eraseFromParent();
LBI.deleteValue(SI);		LBI.deleteValue(SI);
		SI->eraseFromParent();
}		}

AI->eraseFromParent();		AI->eraseFromParent();

// The alloca's debuginfo can be removed as well.		// The alloca's debuginfo can be removed as well.
for (DbgVariableIntrinsic *DII : Info.DbgUsers)		for (DbgVariableIntrinsic *DII : Info.DbgUsers)
if (DII->isAddressOfVariable() || DII->getExpression()->startsWithDeref())		if (DII->isAddressOfVariable() || DII->getExpression()->startsWithDeref())
DII->eraseFromParent();		DII->eraseFromParent();
▲ Show 20 Lines • Show All 408 Lines • ▼ Show 20 Lines	if (LoadInst *LI = dyn_cast<LoadInst>(I)) {
AllocaInst *Src = dyn_cast<AllocaInst>(LI->getPointerOperand());		AllocaInst *Src = dyn_cast<AllocaInst>(LI->getPointerOperand());
if (!Src)		if (!Src)
continue;		continue;

DenseMap<AllocaInst *, unsigned>::iterator AI = AllocaLookup.find(Src);		DenseMap<AllocaInst *, unsigned>::iterator AI = AllocaLookup.find(Src);
if (AI == AllocaLookup.end())		if (AI == AllocaLookup.end())
continue;		continue;

		if (!LI->use_empty() && isa<UndefValue>(IncomingVals[AI->second]))
		// Freeze the undef value so that if there are multiple loads of this
		// alloca, they will still compare properly.
		IncomingVals[AI->second] = IRBuilder<>(Src).CreateFreeze(
		UndefValue::get(Src->getAllocatedType()),
		Src->getName() + ".fr");

Value *V = IncomingVals[AI->second];		Value *V = IncomingVals[AI->second];

// If the load was marked as nonnull we don't want to lose		// If the load was marked as nonnull we don't want to lose
// that information when we erase this Load. So we preserve		// that information when we erase this Load. So we preserve
// it with an assume.		// it with an assume.
if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&		if (AC && LI->getMetadata(LLVMContext::MD_nonnull) &&
!isKnownNonZero(V, SQ.DL, 0, AC, LI, &DT))		!isKnownNonZero(V, SQ.DL, 0, AC, LI, &DT))
addAssumeNonNull(AC, LI);		addAssumeNonNull(AC, LI);
▲ Show 20 Lines • Show All 57 Lines • Show Last 20 Lines

llvm/test/CodeGen/AMDGPU/promote-alloca-vector-to-vector.ll

Show All 32 Lines	entry:
ret void		ret void
}		}

; GCN-LABEL: {{^}}float4_alloca_load4:		; GCN-LABEL: {{^}}float4_alloca_load4:
; OPT-LABEL: define amdgpu_kernel void @float4_alloca_load4		; OPT-LABEL: define amdgpu_kernel void @float4_alloca_load4

; GCN-NOT: v_movrel		; GCN-NOT: v_movrel
; GCN-NOT: buffer_		; GCN-NOT: buffer_
; GCN-NOT: v_cmp_
; GCN-NOT: v_cndmask_		; GCN-NOT: v_cndmask_
; GCN: v_mov_b32_e32 [[ONE:v[0-9]+]], 1.0		; GCN: v_cndmask_b32_e64 [[ONE:v[0-9]+]], 2, 1, vcc
; GCN: v_mov_b32_e32 v{{[0-9]+}}, [[ONE]]		; GCN: v_cmp_gt_u32_e32 vcc, 3, v1
; GCN: v_mov_b32_e32 v{{[0-9]+}}, [[ONE]]		; GCN: v_cndmask_b32_e32 v0, 0, [[ONE]], vcc
; GCN: v_mov_b32_e32 v{{[0-9]+}}, [[ONE]]		; GCN: v_cmp_ne_u32_e32 vcc, 3, [[ONE]]
		; GCN: v_cndmask_b32_e32 v3, 1.0, [[ONE]], vcc
		; GCN: v_cmp_ne_u32_e32 vcc, 2, [[ONE]]
		; GCN: v_cndmask_b32_e32 v2, 1.0, [[ONE]], vcc
		; GCN: v_cmp_ne_u32_e32 vcc, 1, [[ONE]]
		; GCN: v_cndmask_b32_e32 v1, 1.0, [[ONE]], vcc
		; GCN: v_cmp_ne_u32_e32 vcc, 0, [[ONE]]
; GCN: store_dwordx4 v{{.+}},		; GCN: store_dwordx4 v{{.+}},

; OPT: %gep = getelementptr inbounds <4 x float>, <4 x float> addrspace(5)* %alloca, i32 0, i32 %sel2		; OPT: %gep = getelementptr inbounds <4 x float>, <4 x float> addrspace(5)* %alloca, i32 0, i32 %sel2
; OPT: %0 = load <4 x float>, <4 x float> addrspace(5)* %alloca		; OPT: %0 = load <4 x float>, <4 x float> addrspace(5)* %alloca
; OPT: %1 = insertelement <4 x float> %0, float 1.000000e+00, i32 %sel2		; OPT: %1 = insertelement <4 x float> %0, float 1.000000e+00, i32 %sel2
; OPT: store <4 x float> %1, <4 x float> addrspace(5)* %alloca		; OPT: store <4 x float> %1, <4 x float> addrspace(5)* %alloca
; OPT: %load = load <4 x float>, <4 x float> addrspace(5)* %alloca, align 4		; OPT: %load = load <4 x float>, <4 x float> addrspace(5)* %alloca, align 4
; OPT: store <4 x float> %load, <4 x float> addrspace(1)* %out, align 4		; OPT: store <4 x float> %load, <4 x float> addrspace(1)* %out, align 4
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	entry:
store half %load, half addrspace(1)* %out, align 2		store half %load, half addrspace(1)* %out, align 2
ret void		ret void
}		}

; GCN-LABEL: {{^}}half4_alloca_load4:		; GCN-LABEL: {{^}}half4_alloca_load4:
; OPT-LABEL: define amdgpu_kernel void @half4_alloca_load4		; OPT-LABEL: define amdgpu_kernel void @half4_alloca_load4

; GCN-NOT: buffer_		; GCN-NOT: buffer_
; GCN: s_mov_b64 s[{{[0-9:]+}}], 0xffff		; GCN: s_mov_b64 s[{{[0-9:]+}}], 0xffff

; OPT: %gep = getelementptr inbounds <4 x half>, <4 x half> addrspace(5)* %alloca, i32 0, i32 %sel2		; OPT: %gep = getelementptr inbounds <4 x half>, <4 x half> addrspace(5)* %alloca, i32 0, i32 %sel2
; OPT: %0 = load <4 x half>, <4 x half> addrspace(5)* %alloca		; OPT: %0 = load <4 x half>, <4 x half> addrspace(5)* %alloca
; OPT: %1 = insertelement <4 x half> %0, half 0xH3C00, i32 %sel2		; OPT: %1 = insertelement <4 x half> %0, half 0xH3C00, i32 %sel2
; OPT: store <4 x half> %1, <4 x half> addrspace(5)* %alloca		; OPT: store <4 x half> %1, <4 x half> addrspace(5)* %alloca
; OPT: %load = load <4 x half>, <4 x half> addrspace(5)* %alloca, align 2		; OPT: %load = load <4 x half>, <4 x half> addrspace(5)* %alloca, align 2
; OPT: store <4 x half> %load, <4 x half> addrspace(1)* %out, align 2		; OPT: store <4 x half> %load, <4 x half> addrspace(1)* %out, align 2

▲ Show 20 Lines • Show All 42 Lines • ▼ Show 20 Lines	entry:
store i16 %load, i16 addrspace(1)* %out, align 2		store i16 %load, i16 addrspace(1)* %out, align 2
ret void		ret void
}		}

; GCN-LABEL: {{^}}short4_alloca_load4:		; GCN-LABEL: {{^}}short4_alloca_load4:
; OPT-LABEL: define amdgpu_kernel void @short4_alloca_load4		; OPT-LABEL: define amdgpu_kernel void @short4_alloca_load4

; GCN-NOT: buffer_		; GCN-NOT: buffer_
; GCN: s_mov_b64 s[{{[0-9:]+}}], 0xffff		; GCN: s_mov_b64 s[{{[0-9:]+}}], 0xffff

; OPT: %gep = getelementptr inbounds <4 x i16>, <4 x i16> addrspace(5)* %alloca, i32 0, i32 %sel2		; OPT: %gep = getelementptr inbounds <4 x i16>, <4 x i16> addrspace(5)* %alloca, i32 0, i32 %sel2
; OPT: %0 = load <4 x i16>, <4 x i16> addrspace(5)* %alloca		; OPT: %0 = load <4 x i16>, <4 x i16> addrspace(5)* %alloca
; OPT: %1 = insertelement <4 x i16> %0, i16 1, i32 %sel2		; OPT: %1 = insertelement <4 x i16> %0, i16 1, i32 %sel2
; OPT: store <4 x i16> %1, <4 x i16> addrspace(5)* %alloca		; OPT: store <4 x i16> %1, <4 x i16> addrspace(5)* %alloca
; OPT: %load = load <4 x i16>, <4 x i16> addrspace(5)* %alloca, align 2		; OPT: %load = load <4 x i16>, <4 x i16> addrspace(5)* %alloca, align 2
; OPT: store <4 x i16> %load, <4 x i16> addrspace(1)* %out, align 2		; OPT: store <4 x i16> %load, <4 x i16> addrspace(1)* %out, align 2

Show All 12 Lines	entry:
store <4 x i16> %load, <4 x i16> addrspace(1)* %out, align 2		store <4 x i16> %load, <4 x i16> addrspace(1)* %out, align 2
ret void		ret void
}		}

; GCN-LABEL: {{^}}ptr_alloca_bitcast:		; GCN-LABEL: {{^}}ptr_alloca_bitcast:
; OPT-LABEL: define i64 @ptr_alloca_bitcast		; OPT-LABEL: define i64 @ptr_alloca_bitcast

; GCN-NOT: buffer_		; GCN-NOT: buffer_
; GCN: v_mov_b32_e32 v1, 0

; OPT: %private_iptr = alloca <2 x i32>, align 8, addrspace(5)		; OPT: %private_iptr = alloca <2 x i32>, align 8, addrspace(5)
; OPT: %cast = bitcast <2 x i32> addrspace(5)* %private_iptr to i64 addrspace(5)*		; OPT: %cast = bitcast <2 x i32> addrspace(5)* %private_iptr to i64 addrspace(5)*
; OPT: %tmp1 = load i64, i64 addrspace(5)* %cast, align 8		; OPT: %tmp1 = load i64, i64 addrspace(5)* %cast, align 8

define i64 @ptr_alloca_bitcast() {		define i64 @ptr_alloca_bitcast() {
entry:		entry:
%private_iptr = alloca <2 x i32>, align 8, addrspace(5)		%private_iptr = alloca <2 x i32>, align 8, addrspace(5)
%cast = bitcast <2 x i32> addrspace(5)* %private_iptr to i64 addrspace(5)*		%cast = bitcast <2 x i32> addrspace(5)* %private_iptr to i64 addrspace(5)*
%tmp1 = load i64, i64 addrspace(5)* %cast, align 8		%tmp1 = load i64, i64 addrspace(5)* %cast, align 8
ret i64 %tmp1		ret i64 %tmp1
}		}

declare i32 @llvm.amdgcn.workitem.id.x()		declare i32 @llvm.amdgcn.workitem.id.x()
declare i32 @llvm.amdgcn.workitem.id.y()		declare i32 @llvm.amdgcn.workitem.id.y()

llvm/test/CodeGen/AMDGPU/vector-alloca-limits.ll

; RUN: opt -S -mtriple=amdgcn-- -amdgpu-promote-alloca -sroa -instcombine < %s | FileCheck -check-prefix=OPT %s		; RUN: opt -S -mtriple=amdgcn-- -amdgpu-promote-alloca -sroa -instcombine < %s | FileCheck -check-prefix=OPT %s
; RUN: opt -S -mtriple=amdgcn-- -amdgpu-promote-alloca -sroa -instcombine -amdgpu-promote-alloca-to-vector-limit=32 < %s | FileCheck -check-prefix=LIMIT32 %s		; RUN: opt -S -mtriple=amdgcn-- -amdgpu-promote-alloca -sroa -instcombine -amdgpu-promote-alloca-to-vector-limit=32 < %s | FileCheck -check-prefix=LIMIT32 %s

target datalayout = "A5"		target datalayout = "A5"

; OPT-LABEL: @alloca_8xi64_max1024(		; OPT-LABEL: @alloca_8xi64_max1024(
; OPT-NOT: alloca		; OPT-NOT: alloca
; OPT: <8 x i64>		; OPT-NOT: <8 x i64>
; LIMIT32: alloca		; LIMIT32: alloca
; LIMIT32-NOT: <8 x i64>		; LIMIT32-NOT: <8 x i64>
define amdgpu_kernel void @alloca_8xi64_max1024(i64 addrspace(1)* %out, i32 %index) #0 {		define amdgpu_kernel void @alloca_8xi64_max1024(i64 addrspace(1)* %out, i32 %index) #0 {
entry:		entry:
%tmp = alloca [8 x i64], addrspace(5)		%tmp = alloca [8 x i64], addrspace(5)
%x = getelementptr [8 x i64], [8 x i64] addrspace(5)* %tmp, i32 0, i32 0		%x = getelementptr [8 x i64], [8 x i64] addrspace(5)* %tmp, i32 0, i32 0
store i64 0, i64 addrspace(5)* %x		store i64 0, i64 addrspace(5)* %x
%tmp1 = getelementptr [8 x i64], [8 x i64] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [8 x i64], [8 x i64] addrspace(5)* %tmp, i32 0, i32 %index
Show All 15 Lines	entry:
%tmp1 = getelementptr [9 x i64], [9 x i64] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [9 x i64], [9 x i64] addrspace(5)* %tmp, i32 0, i32 %index
%tmp2 = load i64, i64 addrspace(5)* %tmp1		%tmp2 = load i64, i64 addrspace(5)* %tmp1
store i64 %tmp2, i64 addrspace(1)* %out		store i64 %tmp2, i64 addrspace(1)* %out
ret void		ret void
}		}

; OPT-LABEL: @alloca_16xi64_max512(		; OPT-LABEL: @alloca_16xi64_max512(
; OPT-NOT: alloca		; OPT-NOT: alloca
; OPT: <16 x i64>		; OPT-NOT: <16 x i64>
; LIMIT32: alloca		; LIMIT32: alloca
; LIMIT32-NOT: <16 x i64>		; LIMIT32-NOT: <16 x i64>
define amdgpu_kernel void @alloca_16xi64_max512(i64 addrspace(1)* %out, i32 %index) #1 {		define amdgpu_kernel void @alloca_16xi64_max512(i64 addrspace(1)* %out, i32 %index) #1 {
entry:		entry:
%tmp = alloca [16 x i64], addrspace(5)		%tmp = alloca [16 x i64], addrspace(5)
%x = getelementptr [16 x i64], [16 x i64] addrspace(5)* %tmp, i32 0, i32 0		%x = getelementptr [16 x i64], [16 x i64] addrspace(5)* %tmp, i32 0, i32 0
store i64 0, i64 addrspace(5)* %x		store i64 0, i64 addrspace(5)* %x
%tmp1 = getelementptr [16 x i64], [16 x i64] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [16 x i64], [16 x i64] addrspace(5)* %tmp, i32 0, i32 %index
Show All 31 Lines	entry:
%tmp1 = getelementptr [9 x i128], [9 x i128] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [9 x i128], [9 x i128] addrspace(5)* %tmp, i32 0, i32 %index
%tmp2 = load i128, i128 addrspace(5)* %tmp1		%tmp2 = load i128, i128 addrspace(5)* %tmp1
store i128 %tmp2, i128 addrspace(1)* %out		store i128 %tmp2, i128 addrspace(1)* %out
ret void		ret void
}		}

; OPT-LABEL: @alloca_9xi128_max256(		; OPT-LABEL: @alloca_9xi128_max256(
; OPT-NOT: alloca		; OPT-NOT: alloca
; OPT: <9 x i128>		; OPT-NOT: <9 x i128>
; LIMIT32: alloca		; LIMIT32: alloca
; LIMIT32-NOT: <9 x i128>		; LIMIT32-NOT: <9 x i128>
define amdgpu_kernel void @alloca_9xi128_max256(i128 addrspace(1)* %out, i32 %index) #2 {		define amdgpu_kernel void @alloca_9xi128_max256(i128 addrspace(1)* %out, i32 %index) #2 {
entry:		entry:
%tmp = alloca [9 x i128], addrspace(5)		%tmp = alloca [9 x i128], addrspace(5)
%x = getelementptr [9 x i128], [9 x i128] addrspace(5)* %tmp, i32 0, i32 0		%x = getelementptr [9 x i128], [9 x i128] addrspace(5)* %tmp, i32 0, i32 0
store i128 0, i128 addrspace(5)* %x		store i128 0, i128 addrspace(5)* %x
%tmp1 = getelementptr [9 x i128], [9 x i128] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [9 x i128], [9 x i128] addrspace(5)* %tmp, i32 0, i32 %index
%tmp2 = load i128, i128 addrspace(5)* %tmp1		%tmp2 = load i128, i128 addrspace(5)* %tmp1
store i128 %tmp2, i128 addrspace(1)* %out		store i128 %tmp2, i128 addrspace(1)* %out
ret void		ret void
}		}

; OPT-LABEL: @alloca_16xi128_max256(		; OPT-LABEL: @alloca_16xi128_max256(
; OPT-NOT: alloca		; OPT-NOT: alloca
; OPT: <16 x i128>		; OPT-NOT: <16 x i128>
; LIMIT32: alloca		; LIMIT32: alloca
; LIMIT32-NOT: <16 x i128>		; LIMIT32-NOT: <16 x i128>
define amdgpu_kernel void @alloca_16xi128_max256(i128 addrspace(1)* %out, i32 %index) #2 {		define amdgpu_kernel void @alloca_16xi128_max256(i128 addrspace(1)* %out, i32 %index) #2 {
entry:		entry:
%tmp = alloca [16 x i128], addrspace(5)		%tmp = alloca [16 x i128], addrspace(5)
%x = getelementptr [16 x i128], [16 x i128] addrspace(5)* %tmp, i32 0, i32 0		%x = getelementptr [16 x i128], [16 x i128] addrspace(5)* %tmp, i32 0, i32 0
store i128 0, i128 addrspace(5)* %x		store i128 0, i128 addrspace(5)* %x
%tmp1 = getelementptr [16 x i128], [16 x i128] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [16 x i128], [16 x i128] addrspace(5)* %tmp, i32 0, i32 %index
Show All 15 Lines	entry:
%tmp1 = getelementptr [9 x i256], [9 x i256] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [9 x i256], [9 x i256] addrspace(5)* %tmp, i32 0, i32 %index
%tmp2 = load i256, i256 addrspace(5)* %tmp1		%tmp2 = load i256, i256 addrspace(5)* %tmp1
store i256 %tmp2, i256 addrspace(1)* %out		store i256 %tmp2, i256 addrspace(1)* %out
ret void		ret void
}		}

; OPT-LABEL: @alloca_9xi64_max256(		; OPT-LABEL: @alloca_9xi64_max256(
; OPT-NOT: alloca		; OPT-NOT: alloca
; OPT: <9 x i64>		; OPT-NOT: <9 x i64>
; LIMIT32: alloca		; LIMIT32: alloca
; LIMIT32-NOT: <9 x i64>		; LIMIT32-NOT: <9 x i64>
define amdgpu_kernel void @alloca_9xi64_max256(i64 addrspace(1)* %out, i32 %index) #2 {		define amdgpu_kernel void @alloca_9xi64_max256(i64 addrspace(1)* %out, i32 %index) #2 {
entry:		entry:
%tmp = alloca [9 x i64], addrspace(5)		%tmp = alloca [9 x i64], addrspace(5)
%x = getelementptr [9 x i64], [9 x i64] addrspace(5)* %tmp, i32 0, i32 0		%x = getelementptr [9 x i64], [9 x i64] addrspace(5)* %tmp, i32 0, i32 0
store i64 0, i64 addrspace(5)* %x		store i64 0, i64 addrspace(5)* %x
%tmp1 = getelementptr [9 x i64], [9 x i64] addrspace(5)* %tmp, i32 0, i32 %index		%tmp1 = getelementptr [9 x i64], [9 x i64] addrspace(5)* %tmp, i32 0, i32 %index
Show All 24 Lines

llvm/test/Transforms/Mem2Reg/pr24179.ll

Show All 33 Lines	;
ret void		ret void
}		}

; Same as above, except there is no following store. The alloca should just be		; Same as above, except there is no following store. The alloca should just be
; replaced with an undef		; replaced with an undef
define void @test2() {		define void @test2() {
; CHECK-LABEL: @test2(		; CHECK-LABEL: @test2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i32 undef
; CHECK-NEXT: br label [[LOOP:%.*]]		; CHECK-NEXT: br label [[LOOP:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[C:%.*]] = call i1 @use(i32 undef)		; CHECK-NEXT: [[C:%.*]] = call i1 @use(i32 [[FR1]])
; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT:%.*]]		; CHECK-NEXT: br i1 [[C]], label [[LOOP]], label [[EXIT:%.*]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%t = alloca i32		%t = alloca i32
br label %loop		br label %loop

loop:		loop:
%v = load i32, i32* %t		%v = load i32, i32* %t
%c = call i1 @use(i32 %v)		%c = call i1 @use(i32 %v)
br i1 %c, label %loop, label %exit		br i1 %c, label %loop, label %exit

exit:		exit:
ret void		ret void
}		}

llvm/test/Transforms/Mem2Reg/preserve-nonnull-load-metadata.ll

Show First 20 Lines • Show All 89 Lines • ▼ Show 20 Lines	; we need not add the assume.
ret float* %buf.load		ret float* %buf.load
fin:		fin:
ret float* null		ret float* null
}		}

define float* @no_store_single_load() {		define float* @no_store_single_load() {
; CHECK-LABEL: @no_store_single_load(		; CHECK-LABEL: @no_store_single_load(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = icmp ne float* undef, null		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze float*
		; CHECK-NEXT: [[TMP0:%.*]] = icmp ne float* [[FR1]], null
; CHECK-NEXT: call void @llvm.assume(i1 [[TMP0]])		; CHECK-NEXT: call void @llvm.assume(i1 [[TMP0]])
; CHECK-NEXT: ret float* undef		; CHECK-NEXT: ret float* [[FR1]]
;		;
entry:		entry:
%buf = alloca float*		%buf = alloca float*
%buf.load = load float*, float** %buf, !nonnull !0		%buf.load = load float*, float** %buf, !nonnull !0
ret float* %buf.load		ret float* %buf.load
}		}

define float* @no_store_multiple_loads(i1 %c) {		define float* @no_store_multiple_loads(i1 %c) {
; CHECK-LABEL: @no_store_multiple_loads(		; CHECK-LABEL: @no_store_multiple_loads(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze float*
		; CHECK-NEXT: [[FR3:%.*\.fr.*]] = freeze float*
; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[IF:%.*]], label [[ELSE:%.*]]
; CHECK: if:		; CHECK: if:
; CHECK-NEXT: [[TMP0:%.*]] = icmp ne float* undef, null		; CHECK-NEXT: [[TMP0:%.*]] = icmp ne float* [[FR2]], null
; CHECK-NEXT: call void @llvm.assume(i1 [[TMP0]])		; CHECK-NEXT: call void @llvm.assume(i1 [[TMP0]])
; CHECK-NEXT: ret float* undef		; CHECK-NEXT: ret float* [[FR2]]
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: [[TMP1:%.*]] = icmp ne float* undef, null		; CHECK-NEXT: [[TMP1:%.*]] = icmp ne float* [[FR3]], null
; CHECK-NEXT: call void @llvm.assume(i1 [[TMP1]])		; CHECK-NEXT: call void @llvm.assume(i1 [[TMP1]])
; CHECK-NEXT: ret float* undef		; CHECK-NEXT: ret float* [[FR3]]
;		;
entry:		entry:
%buf = alloca float*		%buf = alloca float*
br i1 %c, label %if, label %else		br i1 %c, label %if, label %else

if:		if:
%buf.load = load float*, float** %buf, !nonnull !0		%buf.load = load float*, float** %buf, !nonnull !0
ret float* %buf.load		ret float* %buf.load

else:		else:
%buf.load2 = load float*, float** %buf, !nonnull !0		%buf.load2 = load float*, float** %buf, !nonnull !0
ret float* %buf.load2		ret float* %buf.load2
}		}

!0 = !{}		!0 = !{}

llvm/test/Transforms/PhaseOrdering/X86/nancvt.ll

	Show All 36 Lines
	; CHECK-NEXT: store volatile i32 -1610612736, i32* @var, align 4			; CHECK-NEXT: store volatile i32 -1610612736, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 2147027116, i32* @var, align 4			; CHECK-NEXT: store volatile i32 2147027116, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 -2147483648, i32* @var, align 4			; CHECK-NEXT: store volatile i32 -2147483648, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 2147027116, i32* @var, align 4			; CHECK-NEXT: store volatile i32 2147027116, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 -1073741824, i32* @var, align 4			; CHECK-NEXT: store volatile i32 -1073741824, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 2147228864, i32* @var, align 4			; CHECK-NEXT: store volatile i32 2147228864, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 2147228864, i32* @var, align 4			; CHECK-NEXT: store volatile i32 2147228864, i32* @var, align 4
	; CHECK-NEXT: store volatile i32 2147228864, i32* @var, align 4			; CHECK-NEXT: store volatile i32 2147228864, i32* @var, align 4
	; CHECK-NEXT: ret i32 undef			; CHECK-NEXT: ret i32 0
	;			;
	entry:			entry:
	%retval = alloca i32, align 4			%retval = alloca i32, align 4
	%i = alloca i32, align 4			%i = alloca i32, align 4
	%uf = alloca %struct..0anon, align 4			%uf = alloca %struct..0anon, align 4
	%ud = alloca %struct..1anon, align 8			%ud = alloca %struct..1anon, align 8
	%"alloca point" = bitcast i32 0 to i32			%"alloca point" = bitcast i32 0 to i32
	store i32 0, i32* %i, align 4			store i32 0, i32* %i, align 4
	▲ Show 20 Lines • Show All 155 Lines • Show Last 20 Lines

llvm/test/Transforms/SROA/address-spaces.ll

Show First 20 Lines • Show All 44 Lines • ▼ Show 20 Lines	;
ret void		ret void
}		}

%struct.struct_test_27.0.13 = type { i32, float, i64, i8, [4 x i32] }		%struct.struct_test_27.0.13 = type { i32, float, i64, i8, [4 x i32] }

define void @copy_struct([5 x i64] %in.coerce, ptr addrspace(1) align 4 %ptr) {		define void @copy_struct([5 x i64] %in.coerce, ptr addrspace(1) align 4 %ptr) {
; CHECK-LABEL: @copy_struct(		; CHECK-LABEL: @copy_struct(
; CHECK-NEXT: for.end:		; CHECK-NEXT: for.end:
		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i32 undef
; CHECK-NEXT: [[IN_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE:%.*]], 0		; CHECK-NEXT: [[IN_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE:%.*]], 0
; CHECK-NEXT: [[IN_COERCE_FCA_1_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 1		; CHECK-NEXT: [[IN_COERCE_FCA_1_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 1
; CHECK-NEXT: [[IN_COERCE_FCA_2_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 2		; CHECK-NEXT: [[IN_COERCE_FCA_2_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 2
; CHECK-NEXT: [[IN_COERCE_FCA_3_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 3		; CHECK-NEXT: [[IN_COERCE_FCA_3_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 3
; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_SHIFT:%.*]] = lshr i64 [[IN_COERCE_FCA_2_EXTRACT]], 32		; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_SHIFT:%.*]] = lshr i64 [[IN_COERCE_FCA_2_EXTRACT]], 32
; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_TRUNC:%.*]] = trunc i64 [[IN_SROA_2_4_EXTRACT_SHIFT]] to i32		; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_TRUNC:%.*]] = trunc i64 [[IN_SROA_2_4_EXTRACT_SHIFT]] to i32
; CHECK-NEXT: store i32 [[IN_SROA_2_4_EXTRACT_TRUNC]], ptr addrspace(1) [[PTR:%.*]], align 4		; CHECK-NEXT: store i32 [[IN_SROA_2_4_EXTRACT_TRUNC]], ptr addrspace(1) [[PTR:%.*]], align 4
; CHECK-NEXT: [[IN_SROA_4_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 4		; CHECK-NEXT: [[IN_SROA_4_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 4
; CHECK-NEXT: store i64 [[IN_COERCE_FCA_3_EXTRACT]], ptr addrspace(1) [[IN_SROA_4_20_PTR_SROA_IDX]], align 4		; CHECK-NEXT: store i64 [[IN_COERCE_FCA_3_EXTRACT]], ptr addrspace(1) [[IN_SROA_4_20_PTR_SROA_IDX]], align 4
; CHECK-NEXT: [[IN_SROA_5_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 12		; CHECK-NEXT: [[IN_SROA_5_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 12
; CHECK-NEXT: store i32 undef, ptr addrspace(1) [[IN_SROA_5_20_PTR_SROA_IDX]], align 4		; CHECK-NEXT: store i32 [[FR1]], ptr addrspace(1) [[IN_SROA_5_20_PTR_SROA_IDX]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
for.end:		for.end:
%in = alloca %struct.struct_test_27.0.13, align 8		%in = alloca %struct.struct_test_27.0.13, align 8
store [5 x i64] %in.coerce, ptr %in, align 8		store [5 x i64] %in.coerce, ptr %in, align 8
%scevgep9 = getelementptr %struct.struct_test_27.0.13, ptr %in, i32 0, i32 4, i32 0		%scevgep9 = getelementptr %struct.struct_test_27.0.13, ptr %in, i32 0, i32 4, i32 0
call void @llvm.memcpy.p1.p0.i32(ptr addrspace(1) align 4 %ptr, ptr align 4 %scevgep9, i32 16, i1 false)		call void @llvm.memcpy.p1.p0.i32(ptr addrspace(1) align 4 %ptr, ptr align 4 %scevgep9, i32 16, i1 false)
ret void		ret void
▲ Show 20 Lines • Show All 88 Lines • Show Last 20 Lines

llvm/test/Transforms/SROA/addrspacecast.ll

	Show First 20 Lines • Show All 249 Lines • ▼ Show 20 Lines
	}			}

	;; If this was external, we wouldn't be able to prove dereferenceability			;; If this was external, we wouldn't be able to prove dereferenceability
	;; of the location.			;; of the location.
	@gv = addrspace(1) global i64 zeroinitializer			@gv = addrspace(1) global i64 zeroinitializer

	define void @select_addrspacecast_gv(i1 %a, i1 %b) {			define void @select_addrspacecast_gv(i1 %a, i1 %b) {
	; CHECK-LABEL: @select_addrspacecast_gv(			; CHECK-LABEL: @select_addrspacecast_gv(
				; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i64 undef
	; CHECK-NEXT: [[COND_SROA_SPECULATE_LOAD_FALSE:%.*]] = load i64, ptr addrspace(1) @gv, align 8			; CHECK-NEXT: [[COND_SROA_SPECULATE_LOAD_FALSE:%.*]] = load i64, ptr addrspace(1) @gv, align 8
	; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[B:%.*]], i64 undef, i64 [[COND_SROA_SPECULATE_LOAD_FALSE]]			; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[B:%.*]], i64 [[FR1]], i64 [[COND_SROA_SPECULATE_LOAD_FALSE]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%c = alloca i64, align 8			%c = alloca i64, align 8
	%p.0.c = select i1 %a, ptr %c, ptr %c			%p.0.c = select i1 %a, ptr %c, ptr %c
	%asc = addrspacecast ptr %p.0.c to ptr addrspace(1)			%asc = addrspacecast ptr %p.0.c to ptr addrspace(1)

	%cond.in = select i1 %b, ptr addrspace(1) %asc, ptr addrspace(1) @gv			%cond.in = select i1 %b, ptr addrspace(1) %asc, ptr addrspace(1) @gv
	%cond = load i64, ptr addrspace(1) %cond.in, align 8			%cond = load i64, ptr addrspace(1) %cond.in, align 8
	ret void			ret void
	}			}

	define void @select_addrspacecast_gv_constexpr(i1 %a, i1 %b) {			define void @select_addrspacecast_gv_constexpr(i1 %a, i1 %b) {
	; CHECK-LABEL: @select_addrspacecast_gv_constexpr(			; CHECK-LABEL: @select_addrspacecast_gv_constexpr(
				; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze i64 undef
	; CHECK-NEXT: [[COND_SROA_SPECULATE_LOAD_FALSE:%.*]] = load i64, ptr addrspace(2) addrspacecast (ptr addrspace(1) @gv to ptr addrspace(2)), align 8			; CHECK-NEXT: [[COND_SROA_SPECULATE_LOAD_FALSE:%.*]] = load i64, ptr addrspace(2) addrspacecast (ptr addrspace(1) @gv to ptr addrspace(2)), align 8
	; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[B:%.*]], i64 undef, i64 [[COND_SROA_SPECULATE_LOAD_FALSE]]			; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[B:%.*]], i64 [[FR2]], i64 [[COND_SROA_SPECULATE_LOAD_FALSE]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%c = alloca i64, align 8			%c = alloca i64, align 8
	%p.0.c = select i1 %a, ptr %c, ptr %c			%p.0.c = select i1 %a, ptr %c, ptr %c
	%asc = addrspacecast ptr %p.0.c to ptr addrspace(2)			%asc = addrspacecast ptr %p.0.c to ptr addrspace(2)

	%cond.in = select i1 %b, ptr addrspace(2) %asc, ptr addrspace(2) addrspacecast (ptr addrspace(1) @gv to ptr addrspace(2))			%cond.in = select i1 %b, ptr addrspace(2) %asc, ptr addrspace(2) addrspacecast (ptr addrspace(1) @gv to ptr addrspace(2))
	%cond = load i64, ptr addrspace(2) %cond.in, align 8			%cond = load i64, ptr addrspace(2) %cond.in, align 8
	ret void			ret void
	}			}

	define i8 @select_addrspacecast_i8(i1 %c) {			define i8 @select_addrspacecast_i8(i1 %c) {
	; CHECK-LABEL: @select_addrspacecast_i8(			; CHECK-LABEL: @select_addrspacecast_i8(
	; CHECK-NEXT: [[RET_SROA_SPECULATED:%.*]] = select i1 [[C:%.*]], i8 undef, i8 undef			; CHECK-NEXT: [[FR3:%.*\.fr.*]] = freeze i8 undef
				; CHECK-NEXT: [[FR4:%.*\.fr.*]] = freeze i8 undef
				; CHECK-NEXT: [[RET_SROA_SPECULATED:%.*]] = select i1 [[C:%.*]], i8 [[FR3]], i8 [[FR4]]
	; CHECK-NEXT: ret i8 [[RET_SROA_SPECULATED]]			; CHECK-NEXT: ret i8 [[RET_SROA_SPECULATED]]
	;			;
	%a = alloca i8			%a = alloca i8
	%b = alloca i8			%b = alloca i8

	%a.ptr = addrspacecast ptr %a to ptr addrspace(1)			%a.ptr = addrspacecast ptr %a to ptr addrspace(1)
	%b.ptr = addrspacecast ptr %b to ptr addrspace(1)			%b.ptr = addrspacecast ptr %b to ptr addrspace(1)

	Show All 10 Lines

llvm/test/Transforms/SROA/alloca-address-space.ll

Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	;
ret void		ret void
}		}

%struct.struct_test_27.0.13 = type { i32, float, i64, i8, [4 x i32] }		%struct.struct_test_27.0.13 = type { i32, float, i64, i8, [4 x i32] }

define void @copy_struct([5 x i64] %in.coerce, ptr addrspace(1) align 4 %ptr) {		define void @copy_struct([5 x i64] %in.coerce, ptr addrspace(1) align 4 %ptr) {
; CHECK-LABEL: @copy_struct(		; CHECK-LABEL: @copy_struct(
; CHECK-NEXT: for.end:		; CHECK-NEXT: for.end:
		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i32 undef
; CHECK-NEXT: [[IN_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE:%.*]], 0		; CHECK-NEXT: [[IN_COERCE_FCA_0_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE:%.*]], 0
; CHECK-NEXT: [[IN_COERCE_FCA_1_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 1		; CHECK-NEXT: [[IN_COERCE_FCA_1_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 1
; CHECK-NEXT: [[IN_COERCE_FCA_2_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 2		; CHECK-NEXT: [[IN_COERCE_FCA_2_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 2
; CHECK-NEXT: [[IN_COERCE_FCA_3_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 3		; CHECK-NEXT: [[IN_COERCE_FCA_3_EXTRACT:%.*]] = extractvalue [5 x i64] [[IN_COERCE]], 3
; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_SHIFT:%.*]] = lshr i64 [[IN_COERCE_FCA_2_EXTRACT]], 32		; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_SHIFT:%.*]] = lshr i64 [[IN_COERCE_FCA_2_EXTRACT]], 32
; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_TRUNC:%.*]] = trunc i64 [[IN_SROA_2_4_EXTRACT_SHIFT]] to i32		; CHECK-NEXT: [[IN_SROA_2_4_EXTRACT_TRUNC:%.*]] = trunc i64 [[IN_SROA_2_4_EXTRACT_SHIFT]] to i32
; CHECK-NEXT: store i32 [[IN_SROA_2_4_EXTRACT_TRUNC]], ptr addrspace(1) [[PTR:%.*]], align 4		; CHECK-NEXT: store i32 [[IN_SROA_2_4_EXTRACT_TRUNC]], ptr addrspace(1) [[PTR:%.*]], align 4
; CHECK-NEXT: [[IN_SROA_4_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 4		; CHECK-NEXT: [[IN_SROA_4_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 4
; CHECK-NEXT: store i64 [[IN_COERCE_FCA_3_EXTRACT]], ptr addrspace(1) [[IN_SROA_4_20_PTR_SROA_IDX]], align 4		; CHECK-NEXT: store i64 [[IN_COERCE_FCA_3_EXTRACT]], ptr addrspace(1) [[IN_SROA_4_20_PTR_SROA_IDX]], align 4
; CHECK-NEXT: [[IN_SROA_5_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 12		; CHECK-NEXT: [[IN_SROA_5_20_PTR_SROA_IDX:%.*]] = getelementptr inbounds i8, ptr addrspace(1) [[PTR]], i16 12
; CHECK-NEXT: store i32 undef, ptr addrspace(1) [[IN_SROA_5_20_PTR_SROA_IDX]], align 4		; CHECK-NEXT: store i32 [[FR1]], ptr addrspace(1) [[IN_SROA_5_20_PTR_SROA_IDX]], align 4
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
for.end:		for.end:
%in = alloca %struct.struct_test_27.0.13, align 8, addrspace(2)		%in = alloca %struct.struct_test_27.0.13, align 8, addrspace(2)
store [5 x i64] %in.coerce, ptr addrspace(2) %in, align 8		store [5 x i64] %in.coerce, ptr addrspace(2) %in, align 8
%scevgep9 = getelementptr %struct.struct_test_27.0.13, ptr addrspace(2) %in, i32 0, i32 4, i32 0		%scevgep9 = getelementptr %struct.struct_test_27.0.13, ptr addrspace(2) %in, i32 0, i32 4, i32 0
call void @llvm.memcpy.p1.p2.i32(ptr addrspace(1) align 4 %ptr, ptr addrspace(2) align 4 %scevgep9, i32 16, i1 false)		call void @llvm.memcpy.p1.p2.i32(ptr addrspace(1) align 4 %ptr, ptr addrspace(2) align 4 %scevgep9, i32 16, i1 false)
ret void		ret void
▲ Show 20 Lines • Show All 77 Lines • Show Last 20 Lines

llvm/test/Transforms/SROA/basictest.ll

Show First 20 Lines • Show All 1,081 Lines • ▼ Show 20 Lines
; registers. This in turn was missed as an optimization by SROA due to the		; registers. This in turn was missed as an optimization by SROA due to the
; partial loads and stores of integers to the double alloca we were trying to		; partial loads and stores of integers to the double alloca we were trying to
; form and promote. The solution is to widen the integer operations to be		; form and promote. The solution is to widen the integer operations to be
; whole-alloca operations, and perform the appropriate bitcasting on the		; whole-alloca operations, and perform the appropriate bitcasting on the
; values rather than the pointers. When this works, partial reads and writes		; values rather than the pointers. When this works, partial reads and writes
; via integers can be promoted away.		; via integers can be promoted away.
; CHECK-LABEL: @PR14059.1(		; CHECK-LABEL: @PR14059.1(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[TMP0:%.*]] = bitcast double undef to i64		; CHECK-NEXT: [[FR3:%.*\.fr.*]] = freeze double undef
		; CHECK-NEXT: [[TMP0:%.*]] = bitcast double [[FR3]] to i64
; CHECK-NEXT: [[X_SROA_0_I_0_INSERT_MASK:%.*]] = and i64 [[TMP0]], -4294967296		; CHECK-NEXT: [[X_SROA_0_I_0_INSERT_MASK:%.*]] = and i64 [[TMP0]], -4294967296
; CHECK-NEXT: [[X_SROA_0_I_0_INSERT_INSERT:%.*]] = or i64 [[X_SROA_0_I_0_INSERT_MASK]], 0		; CHECK-NEXT: [[X_SROA_0_I_0_INSERT_INSERT:%.*]] = or i64 [[X_SROA_0_I_0_INSERT_MASK]], 0
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[X_SROA_0_I_0_INSERT_INSERT]] to double		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[X_SROA_0_I_0_INSERT_INSERT]] to double
; CHECK-NEXT: [[TMP2:%.*]] = bitcast double [[TMP1]] to i64		; CHECK-NEXT: [[TMP2:%.*]] = bitcast double [[TMP1]] to i64
; CHECK-NEXT: [[X_SROA_0_I_2_INSERT_MASK:%.*]] = and i64 [[TMP2]], -281474976645121		; CHECK-NEXT: [[X_SROA_0_I_2_INSERT_MASK:%.*]] = and i64 [[TMP2]], -281474976645121
; CHECK-NEXT: [[X_SROA_0_I_2_INSERT_INSERT:%.*]] = or i64 [[X_SROA_0_I_2_INSERT_MASK]], 0		; CHECK-NEXT: [[X_SROA_0_I_2_INSERT_INSERT:%.*]] = or i64 [[X_SROA_0_I_2_INSERT_MASK]], 0
; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64 [[X_SROA_0_I_2_INSERT_INSERT]] to double		; CHECK-NEXT: [[TMP3:%.*]] = bitcast i64 [[X_SROA_0_I_2_INSERT_INSERT]] to double
; CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64		; CHECK-NEXT: [[TMP4:%.*]] = bitcast double [[TMP3]] to i64
[... 192 lines hidden ...]
ret <3 x i8> %y		ret <3 x i8> %y
}		}

define i32 @PR14572.2(<3 x i8> %x) {		define i32 @PR14572.2(<3 x i8> %x) {
; Ensure that a split integer load which is wider than the type size of the		; Ensure that a split integer load which is wider than the type size of the
; alloca (relying on the alloc size padding) doesn't trigger an assert.		; alloca (relying on the alloc size padding) doesn't trigger an assert.
; CHECK-LABEL: @PR14572.2(		; CHECK-LABEL: @PR14572.2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR5:%.*\.fr.*]] = freeze i8 undef
; CHECK-NEXT: [[TMP0:%.*]] = bitcast <3 x i8> [[X:%.*]] to i24		; CHECK-NEXT: [[TMP0:%.*]] = bitcast <3 x i8> [[X:%.*]] to i24
; CHECK-NEXT: [[A_SROA_2_0_INSERT_EXT:%.*]] = zext i8 undef to i32		; CHECK-NEXT: [[A_SROA_2_0_INSERT_EXT:%.*]] = zext i8 [[FR5]] to i32
; CHECK-NEXT: [[A_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_2_0_INSERT_EXT]], 24		; CHECK-NEXT: [[A_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_2_0_INSERT_EXT]], 24
; CHECK-NEXT: [[A_SROA_2_0_INSERT_MASK:%.*]] = and i32 undef, 16777215		; CHECK-NEXT: [[A_SROA_2_0_INSERT_MASK:%.*]] = and i32 undef, 16777215
; CHECK-NEXT: [[A_SROA_2_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_2_0_INSERT_MASK]], [[A_SROA_2_0_INSERT_SHIFT]]		; CHECK-NEXT: [[A_SROA_2_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_2_0_INSERT_MASK]], [[A_SROA_2_0_INSERT_SHIFT]]
; CHECK-NEXT: [[A_0_INSERT_EXT:%.*]] = zext i24 [[TMP0]] to i32		; CHECK-NEXT: [[A_0_INSERT_EXT:%.*]] = zext i24 [[TMP0]] to i32
; CHECK-NEXT: [[A_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_2_0_INSERT_INSERT]], -16777216		; CHECK-NEXT: [[A_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_2_0_INSERT_INSERT]], -16777216
; CHECK-NEXT: [[A_0_INSERT_INSERT:%.*]] = or i32 [[A_0_INSERT_MASK]], [[A_0_INSERT_EXT]]		; CHECK-NEXT: [[A_0_INSERT_INSERT:%.*]] = or i32 [[A_0_INSERT_MASK]], [[A_0_INSERT_EXT]]
; CHECK-NEXT: ret i32 [[A_0_INSERT_INSERT]]		; CHECK-NEXT: ret i32 [[A_0_INSERT_INSERT]]
;		;
[... 109 lines hidden ...]

end:		end:
call void @llvm.memcpy.p0.p0.i32(ptr %data, ptr %tmp, i32 %size, i1 false)		call void @llvm.memcpy.p0.p0.i32(ptr %data, ptr %tmp, i32 %size, i1 false)
ret void		ret void
}		}

define void @PR15805(i1 %a, i1 %b) {		define void @PR15805(i1 %a, i1 %b) {
; CHECK-LABEL: @PR15805(		; CHECK-LABEL: @PR15805(
; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[B:%.*]], i64 undef, i64 undef		; CHECK-NEXT: [[FR6:%.*\.fr.*]] = freeze i64 undef
		; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[B:%.*]], i64 [[FR6]], i64 [[FR6]]
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;

%c = alloca i64, align 8		%c = alloca i64, align 8
%p.0.c = select i1 %a, ptr %c, ptr %c		%p.0.c = select i1 %a, ptr %c, ptr %c
%cond.in = select i1 %b, ptr %p.0.c, ptr %c		%cond.in = select i1 %b, ptr %p.0.c, ptr %c
%cond = load i64, ptr %cond.in, align 8		%cond = load i64, ptr %cond.in, align 8
ret void		ret void
}		}

define void @PR15805.1(i1 %a, i1 %b, i1 %c2) {		define void @PR15805.1(i1 %a, i1 %b, i1 %c2) {
; Same as the normal PR15805, but rigged to place the use before the def inside		; Same as the normal PR15805, but rigged to place the use before the def inside
; of looping unreachable code. This helps ensure that we aren't sensitive to the		; of looping unreachable code. This helps ensure that we aren't sensitive to the
; order in which the uses of the alloca are visited.		; order in which the uses of the alloca are visited.
;		;
; CHECK-LABEL: @PR15805.1(		; CHECK-LABEL: @PR15805.1(
		; CHECK-NEXT: [[FR7:%.*\.fr.*]] = freeze i64 undef
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: loop:		; CHECK: loop:
; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[A:%.*]], i64 undef, i64 undef		; CHECK-NEXT: [[COND_SROA_SPECULATED:%.*]] = select i1 [[A:%.*]], i64 [[FR7]], i64 [[FR7]]
; CHECK-NEXT: br i1 [[C2:%.*]], label [[LOOP:%.*]], label [[EXIT]]		; CHECK-NEXT: br i1 [[C2:%.*]], label [[LOOP:%.*]], label [[EXIT]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;

%c = alloca i64, align 8		%c = alloca i64, align 8
br label %exit		br label %exit

[... 204 lines hidden ...]
; load is unsplittable but unrelated to this alloca by just generating extra		; load is unsplittable but unrelated to this alloca by just generating extra
; loads without touching the original, but when the original load was out of		; loads without touching the original, but when the original load was out of
; this alloca we need to handle it specially to ensure the splits line up		; this alloca we need to handle it specially to ensure the splits line up
; properly for rewriting.		; properly for rewriting.
;		;
; CHECK-LABEL: @PR22093(		; CHECK-LABEL: @PR22093(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_SROA_0:%.*]] = alloca i16, align 4		; CHECK-NEXT: [[A_SROA_0:%.*]] = alloca i16, align 4
		; CHECK-NEXT: [[FR8:%.*\.fr.*]] = freeze i16 undef
; CHECK-NEXT: store volatile i16 42, ptr [[A_SROA_0]], align 4		; CHECK-NEXT: store volatile i16 42, ptr [[A_SROA_0]], align 4
; CHECK-NEXT: [[A_SROA_0_0_A_SROA_0_0_LOAD:%.*]] = load i16, ptr [[A_SROA_0]], align 4		; CHECK-NEXT: [[A_SROA_0_0_A_SROA_0_0_LOAD:%.*]] = load i16, ptr [[A_SROA_0]], align 4
; CHECK-NEXT: [[A_SROA_3_0_INSERT_EXT:%.*]] = zext i16 undef to i32		; CHECK-NEXT: [[A_SROA_3_0_INSERT_EXT:%.*]] = zext i16 [[FR8]] to i32
; CHECK-NEXT: [[A_SROA_3_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_3_0_INSERT_EXT]], 16		; CHECK-NEXT: [[A_SROA_3_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_3_0_INSERT_EXT]], 16
; CHECK-NEXT: [[A_SROA_3_0_INSERT_MASK:%.*]] = and i32 undef, 65535		; CHECK-NEXT: [[A_SROA_3_0_INSERT_MASK:%.*]] = and i32 undef, 65535
; CHECK-NEXT: [[A_SROA_3_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_3_0_INSERT_MASK]], [[A_SROA_3_0_INSERT_SHIFT]]		; CHECK-NEXT: [[A_SROA_3_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_3_0_INSERT_MASK]], [[A_SROA_3_0_INSERT_SHIFT]]
; CHECK-NEXT: [[A_SROA_0_0_INSERT_EXT:%.*]] = zext i16 [[A_SROA_0_0_A_SROA_0_0_LOAD]] to i32		; CHECK-NEXT: [[A_SROA_0_0_INSERT_EXT:%.*]] = zext i16 [[A_SROA_0_0_A_SROA_0_0_LOAD]] to i32
; CHECK-NEXT: [[A_SROA_0_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_3_0_INSERT_INSERT]], -65536		; CHECK-NEXT: [[A_SROA_0_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_3_0_INSERT_INSERT]], -65536
; CHECK-NEXT: [[A_SROA_0_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_0_0_INSERT_MASK]], [[A_SROA_0_0_INSERT_EXT]]		; CHECK-NEXT: [[A_SROA_0_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_0_0_INSERT_MASK]], [[A_SROA_0_0_INSERT_EXT]]
; CHECK-NEXT: [[A_SROA_0_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[A_SROA_0_0_INSERT_INSERT]] to i16		; CHECK-NEXT: [[A_SROA_0_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[A_SROA_0_0_INSERT_INSERT]] to i16
; CHECK-NEXT: store i16 [[A_SROA_0_0_EXTRACT_TRUNC]], ptr [[A_SROA_0]], align 4		; CHECK-NEXT: store i16 [[A_SROA_0_0_EXTRACT_TRUNC]], ptr [[A_SROA_0]], align 4
[... 17 lines hidden ...]
; second store of the load makes the load unsplittable because of a mismatch of		; second store of the load makes the load unsplittable because of a mismatch of
; splits. Because this makes the load unsplittable, we also have to go back and		; splits. Because this makes the load unsplittable, we also have to go back and
; remove the first store from the presplit candidates as its load won't be		; remove the first store from the presplit candidates as its load won't be
; presplit.		; presplit.
;		;
; CHECK-LABEL: @PR22093.2(		; CHECK-LABEL: @PR22093.2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_SROA_0:%.*]] = alloca i16, align 8		; CHECK-NEXT: [[A_SROA_0:%.*]] = alloca i16, align 8
		; CHECK-NEXT: [[FR9:%.*\.fr.*]] = freeze i16 undef
; CHECK-NEXT: [[A_SROA_31:%.*]] = alloca i8, align 4		; CHECK-NEXT: [[A_SROA_31:%.*]] = alloca i8, align 4
; CHECK-NEXT: store volatile i16 42, ptr [[A_SROA_0]], align 8		; CHECK-NEXT: store volatile i16 42, ptr [[A_SROA_0]], align 8
; CHECK-NEXT: [[A_SROA_0_0_A_SROA_0_0_LOAD:%.*]] = load i16, ptr [[A_SROA_0]], align 8		; CHECK-NEXT: [[A_SROA_0_0_A_SROA_0_0_LOAD:%.*]] = load i16, ptr [[A_SROA_0]], align 8
; CHECK-NEXT: [[A_SROA_3_0_INSERT_EXT:%.*]] = zext i16 undef to i32		; CHECK-NEXT: [[A_SROA_3_0_INSERT_EXT:%.*]] = zext i16 [[FR9]] to i32
; CHECK-NEXT: [[A_SROA_3_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_3_0_INSERT_EXT]], 16		; CHECK-NEXT: [[A_SROA_3_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_3_0_INSERT_EXT]], 16
; CHECK-NEXT: [[A_SROA_3_0_INSERT_MASK:%.*]] = and i32 undef, 65535		; CHECK-NEXT: [[A_SROA_3_0_INSERT_MASK:%.*]] = and i32 undef, 65535
; CHECK-NEXT: [[A_SROA_3_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_3_0_INSERT_MASK]], [[A_SROA_3_0_INSERT_SHIFT]]		; CHECK-NEXT: [[A_SROA_3_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_3_0_INSERT_MASK]], [[A_SROA_3_0_INSERT_SHIFT]]
; CHECK-NEXT: [[A_SROA_0_0_INSERT_EXT:%.*]] = zext i16 [[A_SROA_0_0_A_SROA_0_0_LOAD]] to i32		; CHECK-NEXT: [[A_SROA_0_0_INSERT_EXT:%.*]] = zext i16 [[A_SROA_0_0_A_SROA_0_0_LOAD]] to i32
; CHECK-NEXT: [[A_SROA_0_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_3_0_INSERT_INSERT]], -65536		; CHECK-NEXT: [[A_SROA_0_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_3_0_INSERT_INSERT]], -65536
; CHECK-NEXT: [[A_SROA_0_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_0_0_INSERT_MASK]], [[A_SROA_0_0_INSERT_EXT]]		; CHECK-NEXT: [[A_SROA_0_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_0_0_INSERT_MASK]], [[A_SROA_0_0_INSERT_EXT]]
; CHECK-NEXT: [[A_SROA_0_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[A_SROA_0_0_INSERT_INSERT]] to i16		; CHECK-NEXT: [[A_SROA_0_0_EXTRACT_TRUNC:%.*]] = trunc i32 [[A_SROA_0_0_INSERT_INSERT]] to i16
; CHECK-NEXT: store i16 [[A_SROA_0_0_EXTRACT_TRUNC]], ptr [[A_SROA_0]], align 8		; CHECK-NEXT: store i16 [[A_SROA_0_0_EXTRACT_TRUNC]], ptr [[A_SROA_0]], align 8
[... 164 lines hidden ...]

declare void @llvm.lifetime.start.isVoid.i64.p0(i64, ptr nocapture)		declare void @llvm.lifetime.start.isVoid.i64.p0(i64, ptr nocapture)
declare void @llvm.lifetime.end.isVoid.i64.p0(i64, ptr nocapture)		declare void @llvm.lifetime.end.isVoid.i64.p0(i64, ptr nocapture)
@array = dso_local global [10 x float] zeroinitializer, align 4		@array = dso_local global [10 x float] zeroinitializer, align 4

define void @test29(i32 %num, i32 %tid) {		define void @test29(i32 %num, i32 %tid) {
; CHECK-LABEL: @test29(		; CHECK-LABEL: @test29(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR11:%.*\.fr.*]] = freeze i32 undef
; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[NUM:%.*]], 0		; CHECK-NEXT: [[CMP1:%.*]] = icmp sgt i32 [[NUM:%.*]], 0
; CHECK-NEXT: br i1 [[CMP1]], label [[BB1:%.*]], label [[BB7:%.*]]		; CHECK-NEXT: br i1 [[CMP1]], label [[BB1:%.*]], label [[BB7:%.*]]
; CHECK: bb1:		; CHECK: bb1:
; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[TID:%.*]], 0		; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[TID:%.*]], 0
; CHECK-NEXT: [[CONV_I:%.*]] = zext i32 [[TID]] to i64		; CHECK-NEXT: [[CONV_I:%.*]] = zext i32 [[TID]] to i64
; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [10 x float], ptr @array, i64 0, i64 [[CONV_I]]		; CHECK-NEXT: [[ARRAYIDX5:%.*]] = getelementptr inbounds [10 x float], ptr @array, i64 0, i64 [[CONV_I]]
; CHECK-NEXT: br label [[BB2:%.*]]		; CHECK-NEXT: br label [[BB2:%.*]]
; CHECK: bb2:		; CHECK: bb2:
; CHECK-NEXT: [[I_02:%.*]] = phi i32 [ [[NUM]], [[BB1]] ], [ [[SUB:%.*]], [[BB5:%.*]] ]		; CHECK-NEXT: [[I_02:%.*]] = phi i32 [ [[NUM]], [[BB1]] ], [ [[SUB:%.*]], [[BB5:%.*]] ]
; CHECK-NEXT: br i1 [[TOBOOL]], label [[BB3:%.*]], label [[BB4:%.*]]		; CHECK-NEXT: br i1 [[TOBOOL]], label [[BB3:%.*]], label [[BB4:%.*]]
; CHECK: bb3:		; CHECK: bb3:
; CHECK-NEXT: br label [[BB5]]		; CHECK-NEXT: br label [[BB5]]
; CHECK: bb4:		; CHECK: bb4:
; CHECK-NEXT: store i32 undef, ptr [[ARRAYIDX5]], align 4		; CHECK-NEXT: store i32 [[FR11]], ptr [[ARRAYIDX5]], align 4
; CHECK-NEXT: br label [[BB5]]		; CHECK-NEXT: br label [[BB5]]
; CHECK: bb5:		; CHECK: bb5:
; CHECK-NEXT: [[SUB]] = add i32 [[I_02]], -1		; CHECK-NEXT: [[SUB]] = add i32 [[I_02]], -1
; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i32 [[SUB]], 0		; CHECK-NEXT: [[CMP:%.*]] = icmp sgt i32 [[SUB]], 0
; CHECK-NEXT: br i1 [[CMP]], label [[BB2]], label [[BB6:%.*]]		; CHECK-NEXT: br i1 [[CMP]], label [[BB2]], label [[BB6:%.*]]
; CHECK: bb6:		; CHECK: bb6:
; CHECK-NEXT: br label [[BB7]]		; CHECK-NEXT: br label [[BB7]]
; CHECK: bb7:		; CHECK: bb7:
[... 112 lines hidden ...]

llvm/test/Transforms/SROA/phi-and-select.ll

[... 289 lines hidden ...]
ret i32 %Z2		ret i32 %Z2
}		}

define i32 @test8(i32 %b, ptr %ptr) {		define i32 @test8(i32 %b, ptr %ptr) {
; Ensure that we rewrite allocas to the used type when that use is hidden by		; Ensure that we rewrite allocas to the used type when that use is hidden by
; a PHI that can be speculated.		; a PHI that can be speculated.
; CHECK-LABEL: @test8(		; CHECK-LABEL: @test8(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i32 undef
; CHECK-NEXT: [[TEST:%.*]] = icmp ne i32 [[B:%.*]], 0		; CHECK-NEXT: [[TEST:%.*]] = icmp ne i32 [[B:%.*]], 0
; CHECK-NEXT: br i1 [[TEST]], label [[THEN:%.*]], label [[ELSE:%.*]]		; CHECK-NEXT: br i1 [[TEST]], label [[THEN:%.*]], label [[ELSE:%.*]]
; CHECK: then:		; CHECK: then:
; CHECK-NEXT: [[PHI_SROA_SPECULATE_LOAD_THEN:%.*]] = load i32, ptr [[PTR:%.*]], align 4		; CHECK-NEXT: [[PHI_SROA_SPECULATE_LOAD_THEN:%.*]] = load i32, ptr [[PTR:%.*]], align 4
; CHECK-NEXT: br label [[EXIT:%.*]]		; CHECK-NEXT: br label [[EXIT:%.*]]
; CHECK: else:		; CHECK: else:
; CHECK-NEXT: br label [[EXIT]]		; CHECK-NEXT: br label [[EXIT]]
; CHECK: exit:		; CHECK: exit:
; CHECK-NEXT: [[PHI_SROA_SPECULATED:%.*]] = phi i32 [ undef, [[ELSE]] ], [ [[PHI_SROA_SPECULATE_LOAD_THEN]], [[THEN]] ]		; CHECK-NEXT: [[PHI_SROA_SPECULATED:%.*]] = phi i32 [ [[FR1]], [[ELSE]] ], [ [[PHI_SROA_SPECULATE_LOAD_THEN]], [[THEN]] ]
; CHECK-NEXT: ret i32 [[PHI_SROA_SPECULATED]]		; CHECK-NEXT: ret i32 [[PHI_SROA_SPECULATED]]
;		;

entry:		entry:
%f = alloca float		%f = alloca float
%test = icmp ne i32 %b, 0		%test = icmp ne i32 %b, 0
br i1 %test, label %then, label %else		br i1 %test, label %then, label %else

then:		then:
br label %exit		br label %exit

else:		else:
br label %exit		br label %exit

exit:		exit:
%phi = phi ptr [ %f, %else ], [ %ptr, %then ]		%phi = phi ptr [ %f, %else ], [ %ptr, %then ]
%loaded = load i32, ptr %phi, align 4		%loaded = load i32, ptr %phi, align 4
ret i32 %loaded		ret i32 %loaded
}		}

define i32 @test9(i32 %b, ptr %ptr) {		define i32 @test9(i32 %b, ptr %ptr) {
; Same as @test8 but for a select rather than a PHI node.		; Same as @test8 but for a select rather than a PHI node.
; CHECK-LABEL: @test9(		; CHECK-LABEL: @test9(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze i32 undef
; CHECK-NEXT: store i32 0, ptr [[PTR:%.*]], align 4		; CHECK-NEXT: store i32 0, ptr [[PTR:%.*]], align 4
; CHECK-NEXT: [[TEST:%.*]] = icmp ne i32 [[B:%.*]], 0		; CHECK-NEXT: [[TEST:%.*]] = icmp ne i32 [[B:%.*]], 0
; CHECK-NEXT: [[LOADED_SROA_SPECULATE_LOAD_FALSE:%.*]] = load i32, ptr [[PTR]], align 4		; CHECK-NEXT: [[LOADED_SROA_SPECULATE_LOAD_FALSE:%.*]] = load i32, ptr [[PTR]], align 4
; CHECK-NEXT: [[LOADED_SROA_SPECULATED:%.*]] = select i1 [[TEST]], i32 undef, i32 [[LOADED_SROA_SPECULATE_LOAD_FALSE]]		; CHECK-NEXT: [[LOADED_SROA_SPECULATED:%.*]] = select i1 [[TEST]], i32 [[FR2]], i32 [[LOADED_SROA_SPECULATE_LOAD_FALSE]]
; CHECK-NEXT: ret i32 [[LOADED_SROA_SPECULATED]]		; CHECK-NEXT: ret i32 [[LOADED_SROA_SPECULATED]]
;		;

entry:		entry:
%f = alloca float		%f = alloca float
store i32 0, ptr %ptr		store i32 0, ptr %ptr
%test = icmp ne i32 %b, 0		%test = icmp ne i32 %b, 0
%select = select i1 %test, ptr %f, ptr %ptr		%select = select i1 %test, ptr %f, ptr %ptr
[... 487 lines hidden ...]

llvm/test/Transforms/SROA/phi-gep.ll

[... 436 lines hidden ...]

	bb2:			bb2:
	ret void			ret void
	}			}

	define void @constant_value_phi(i1 %c1) {			define void @constant_value_phi(i1 %c1) {
	; CHECK-LABEL: @constant_value_phi(			; CHECK-LABEL: @constant_value_phi(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i16 undef
	; CHECK-NEXT: br label [[LAND_LHS_TRUE_I:%.*]]			; CHECK-NEXT: br label [[LAND_LHS_TRUE_I:%.*]]
	; CHECK: land.lhs.true.i:			; CHECK: land.lhs.true.i:
	; CHECK-NEXT: br i1 [[C1:%.*]], label [[COND_END_I:%.*]], label [[COND_END_I]]			; CHECK-NEXT: br i1 [[C1:%.*]], label [[COND_END_I:%.*]], label [[COND_END_I]]
	; CHECK: cond.end.i:			; CHECK: cond.end.i:
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	;			;
	entry:			entry:
	%s1 = alloca [3 x i16]			%s1 = alloca [3 x i16]
	%s = alloca [3 x i16]			%s = alloca [3 x i16]
	br label %land.lhs.true.i			br label %land.lhs.true.i

	land.lhs.true.i: ; preds = %entry			land.lhs.true.i: ; preds = %entry
	br i1 %c1, label %cond.end.i, label %cond.end.i			br i1 %c1, label %cond.end.i, label %cond.end.i

	cond.end.i: ; preds = %land.lhs.true.i, %land.lhs.true.i			cond.end.i: ; preds = %land.lhs.true.i, %land.lhs.true.i
	%.pre-phi1 = phi ptr [ %s1, %land.lhs.true.i ], [ %s1, %land.lhs.true.i ]			%.pre-phi1 = phi ptr [ %s1, %land.lhs.true.i ], [ %s1, %land.lhs.true.i ]
	call void @llvm.memcpy.p0.p0.i64(ptr %.pre-phi1, ptr %s, i64 3, i1 false)			call void @llvm.memcpy.p0.p0.i64(ptr %.pre-phi1, ptr %s, i64 3, i1 false)
	%load = load i16, ptr %s			%load = load i16, ptr %s
	unreachable			unreachable
	}			}

	define i32 @test_sroa_phi_gep_multiple_values_from_same_block(i32 %arg) {			define i32 @test_sroa_phi_gep_multiple_values_from_same_block(i32 %arg) {
	; CHECK-LABEL: @test_sroa_phi_gep_multiple_values_from_same_block(			; CHECK-LABEL: @test_sroa_phi_gep_multiple_values_from_same_block(
	; CHECK-NEXT: bb.1:			; CHECK-NEXT: bb.1:
				; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze i32 undef
				; CHECK-NEXT: [[FR3:%.*\.fr.*]] = freeze i32 undef
				; CHECK-NEXT: [[FR4:%.*\.fr.*]] = freeze i32 undef
	; CHECK-NEXT: switch i32 [[ARG:%.*]], label [[BB_3:%.*]] [			; CHECK-NEXT: switch i32 [[ARG:%.*]], label [[BB_3:%.*]] [
	; CHECK-NEXT: i32 1, label [[BB_2:%.*]]			; CHECK-NEXT: i32 1, label [[BB_2:%.*]]
	; CHECK-NEXT: i32 2, label [[BB_2]]			; CHECK-NEXT: i32 2, label [[BB_2]]
	; CHECK-NEXT: i32 3, label [[BB_4:%.*]]			; CHECK-NEXT: i32 3, label [[BB_4:%.*]]
	; CHECK-NEXT: i32 4, label [[BB_4]]			; CHECK-NEXT: i32 4, label [[BB_4]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: bb.2:			; CHECK: bb.2:
	; CHECK-NEXT: br label [[BB_4]]			; CHECK-NEXT: br label [[BB_4]]
	; CHECK: bb.3:			; CHECK: bb.3:
	; CHECK-NEXT: br label [[BB_4]]			; CHECK-NEXT: br label [[BB_4]]
	; CHECK: bb.4:			; CHECK: bb.4:
	; CHECK-NEXT: [[PHI_SROA_PHI_SROA_SPECULATED:%.*]] = phi i32 [ undef, [[BB_3]] ], [ undef, [[BB_2]] ], [ undef, [[BB_1:%.*]] ], [ undef, [[BB_1]] ]			; CHECK-NEXT: [[PHI_SROA_PHI_SROA_SPECULATED:%.*]] = phi i32 [ [[FR2]], [[BB_3]] ], [ [[FR3]], [[BB_2]] ], [ [[FR4]], [[BB_1:%.*]] ], [ [[FR4]], [[BB_1]] ]
	; CHECK-NEXT: ret i32 [[PHI_SROA_PHI_SROA_SPECULATED]]			; CHECK-NEXT: ret i32 [[PHI_SROA_PHI_SROA_SPECULATED]]
	;			;
	bb.1:			bb.1:
	%a = alloca %pair, align 4			%a = alloca %pair, align 4
	%b = alloca %pair, align 4			%b = alloca %pair, align 4
	switch i32 %arg, label %bb.3 [			switch i32 %arg, label %bb.3 [
	i32 1, label %bb.2			i32 1, label %bb.2
	i32 2, label %bb.2			i32 2, label %bb.2
[... 22 lines hidden ...]

llvm/test/Transforms/SROA/phi-with-duplicate-pred.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=sroa -S | FileCheck %s			; RUN: opt < %s -passes=sroa -S | FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:32:64"

	@a = external global i16, align 1			@a = external global i16, align 1

	declare void @maybe_writes()			declare void @maybe_writes()

	define void @f2(i1 %c1) {			define void @f2(i1 %c1) {
	; CHECK-LABEL: @f2(			; CHECK-LABEL: @f2(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i16 undef
	; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]			; CHECK-NEXT: br i1 [[C1:%.*]], label [[IF_THEN:%.*]], label [[IF_ELSE:%.*]]
	; CHECK: if.then:			; CHECK: if.then:
	; CHECK-NEXT: br label [[CLEANUP:%.*]]			; CHECK-NEXT: br label [[CLEANUP:%.*]]
	; CHECK: cleanup:			; CHECK: cleanup:
	; CHECK-NEXT: [[G_0_SROA_SPECULATE_LOAD_CLEANUP:%.*]] = load i16, ptr @a, align 1			; CHECK-NEXT: [[G_0_SROA_SPECULATE_LOAD_CLEANUP:%.*]] = load i16, ptr @a, align 1
	; CHECK-NEXT: switch i32 2, label [[CLEANUP7:%.*]] [			; CHECK-NEXT: switch i32 2, label [[CLEANUP7:%.*]] [
	; CHECK-NEXT: i32 0, label [[LBL1:%.*]]			; CHECK-NEXT: i32 0, label [[LBL1:%.*]]
	; CHECK-NEXT: i32 2, label [[LBL1]]			; CHECK-NEXT: i32 2, label [[LBL1]]
	; CHECK-NEXT: ]			; CHECK-NEXT: ]
	; CHECK: if.else:			; CHECK: if.else:
	; CHECK-NEXT: br label [[LBL1]]			; CHECK-NEXT: br label [[LBL1]]
	; CHECK: lbl1:			; CHECK: lbl1:
	; CHECK-NEXT: [[G_0_SROA_SPECULATED:%.*]] = phi i16 [ [[G_0_SROA_SPECULATE_LOAD_CLEANUP]], [[CLEANUP]] ], [ [[G_0_SROA_SPECULATE_LOAD_CLEANUP]], [[CLEANUP]] ], [ undef, [[IF_ELSE]] ]			; CHECK-NEXT: [[G_0_SROA_SPECULATED:%.*]] = phi i16 [ [[G_0_SROA_SPECULATE_LOAD_CLEANUP]], [[CLEANUP]] ], [ [[G_0_SROA_SPECULATE_LOAD_CLEANUP]], [[CLEANUP]] ], [ [[FR1]], [[IF_ELSE]] ]
	; CHECK-NEXT: unreachable			; CHECK-NEXT: unreachable
	; CHECK: cleanup7:			; CHECK: cleanup7:
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%e = alloca i16, align 1			%e = alloca i16, align 1
	br i1 %c1, label %if.then, label %if.else			br i1 %c1, label %if.then, label %if.else

[... 233 lines hidden ...]

llvm/test/Transforms/SROA/pr37267.ll

[... 37 lines hidden ...]

%rc = add i16 %_tmp13, %_tmp16		%rc = add i16 %_tmp13, %_tmp16
ret i16 %rc		ret i16 %rc
}		}

define i16 @f2() {		define i16 @f2() {
; CHECK-LABEL: @f2(		; CHECK-LABEL: @f2(
; CHECK-NEXT: bb1:		; CHECK-NEXT: bb1:
; CHECK-NEXT: [[A_3_SROA_2_2_INSERT_EXT:%.*]] = zext i16 undef to i32		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze i16 undef
		; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze i16 undef
		; CHECK-NEXT: [[A_3_SROA_2_2_INSERT_EXT:%.*]] = zext i16 [[FR2]] to i32
; CHECK-NEXT: [[A_3_SROA_2_2_INSERT_MASK:%.*]] = and i32 undef, -65536		; CHECK-NEXT: [[A_3_SROA_2_2_INSERT_MASK:%.*]] = and i32 undef, -65536
; CHECK-NEXT: [[A_3_SROA_2_2_INSERT_INSERT:%.*]] = or i32 [[A_3_SROA_2_2_INSERT_MASK]], [[A_3_SROA_2_2_INSERT_EXT]]		; CHECK-NEXT: [[A_3_SROA_2_2_INSERT_INSERT:%.*]] = or i32 [[A_3_SROA_2_2_INSERT_MASK]], [[A_3_SROA_2_2_INSERT_EXT]]
; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_EXT:%.*]] = zext i16 undef to i32		; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_EXT:%.*]] = zext i16 [[FR1]] to i32
; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_SHIFT:%.*]] = shl i32 [[A_3_SROA_0_2_INSERT_EXT]], 16		; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_SHIFT:%.*]] = shl i32 [[A_3_SROA_0_2_INSERT_EXT]], 16
; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_MASK:%.*]] = and i32 [[A_3_SROA_2_2_INSERT_INSERT]], 65535		; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_MASK:%.*]] = and i32 [[A_3_SROA_2_2_INSERT_INSERT]], 65535
; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_INSERT:%.*]] = or i32 [[A_3_SROA_0_2_INSERT_MASK]], [[A_3_SROA_0_2_INSERT_SHIFT]]		; CHECK-NEXT: [[A_3_SROA_0_2_INSERT_INSERT:%.*]] = or i32 [[A_3_SROA_0_2_INSERT_MASK]], [[A_3_SROA_0_2_INSERT_SHIFT]]
; CHECK-NEXT: [[RC:%.*]] = add i16 2, undef		; CHECK-NEXT: [[RC:%.*]] = add i16 2, [[FR1]]
; CHECK-NEXT: ret i16 [[RC]]		; CHECK-NEXT: ret i16 [[RC]]
;		;

bb1:		bb1:
; This 12-byte alloca is split into partitions as [0,2), [2,4), [4,8), [8,10), [10, 12).		; This 12-byte alloca is split into partitions as [0,2), [2,4), [4,8), [8,10), [10, 12).
; The reported error happened when visitLoadInst rewrites a split tail of slice 1 for [4, 8) partition.		; The reported error happened when visitLoadInst rewrites a split tail of slice 1 for [4, 8) partition.
; alloca 012345678901		; alloca 012345678901
; slice 1: RRRR		; slice 1: RRRR
[... 21 lines hidden ...]

llvm/test/Transforms/SROA/same-promoted-undefs.ll

This file was added.

				; RUN: opt < %s -passes=sroa -S | FileCheck %s
				;
				; When sroa replaces loads with undefined values, the undefined values
				; have an implicit value that must be preserved for comparison purposes
				;
				; Need to test 2 paths, one where alloca/loads/stores are in one basic block
				; and when they are not in the same basic block

				%struct.array1 = type { [1 x i32] }
				%struct.array2 = type { [1 x i32] }

				; Test all loads and references in same basic block
				define void @SingleBlock() {
				entry:
				%a = alloca %struct.array1, align 4
				%b = load i32, ptr %a, align 4
				%c = load i32, ptr %a, align 4
				%0 = icmp sge i32 %b, %c
				br i1 %0, label %lab, label %lab

				lab:
				ret void
				}

				; CHECK-LABEL: define void @SingleBlock
				; CHECK-NOT: br
				; CHECK: [[FR1:%.*\.fr.*]] = freeze i32 undef
				; CHECK: {{.*}} = icmp sge i32 [[FR1]], [[FR1]]

				; Test all loads and references in same basic block but used elsewhere
				define void @SingleBlockLoadsUsedOutside() {
				entry:
				%a = alloca %struct.array1, align 4
				%b = load i32, ptr %a, align 4
				%c = load i32, ptr %a, align 4
				br label %lab1

				lab1:
				%0 = icmp slt i32 %b, %c
				br i1 %0, label %lab, label %lab

				lab:
				ret void
				}

				; CHECK-LABEL: define void @SingleBlockLoadsUsedOutside
				; CHECK-NOT: br
				; CHECK: [[FR2:%.*\.fr.*]] = freeze i32 undef
				; CHECK-LABEL: lab1:{{.*}}
				; CHECK-NOT: br
				; CHECK: {{.*}} = icmp slt i32 [[FR2]], [[FR2]]

				; Test all loads and references in same basic block with different allocas
				define void @SingleBlock2Allocas() {
				entry:
				%a1 = alloca %struct.array1, align 4
				%a2 = alloca %struct.array1, align 4
				%b = load i32, ptr %a1, align 4
				%c = load i32, ptr %a2, align 4
				%0 = icmp sge i32 %b, %c
				br i1 %0, label %lab, label %lab

				lab:
				ret void
				}

				; CHECK-LABEL: define void @SingleBlock2Allocas
				; CHECK-NOT: br
				; CHECK: [[FR3:%.*\.fr.*]] = freeze i32 undef
				; CHECK: [[FR4:%.*\.fr.*]] = freeze i32 undef
				; CHECK: {{.*}} = icmp sge i32 [[FR3]], [[FR4]]

				; Test all loads and references in same basic block but used elsewhere
				; with 2 allocas
				define void @SingleBlockLoadsUsedOutside2Allocas() {
				entry:
				%a1 = alloca %struct.array1, align 4
				%a2 = alloca %struct.array2, align 4
				%b = load i32, ptr %a1, align 4
				%c = load i32, ptr %a2, align 4
				br label %lab1

				lab1:
				%0 = icmp slt i32 %b, %c
				br i1 %0, label %lab, label %lab

				lab:
				ret void
				}

				; CHECK-LABEL: define void @SingleBlockLoadsUsedOutside2Allocas
				; CHECK-NOT: br
				; CHECK: [[FR5:%.*\.fr.*]] = freeze i32 undef
				; CHECK: [[FR6:%.*\.fr.*]] = freeze i32 undef
				; CHECK-LABEL: lab1:{{.*}}
				; CHECK-NOT: br
				; CHECK: {{.*}} = icmp slt i32 [[FR5]], [[FR6]]

				; Test multiblock scenario
				define void @MultiBlock() {
				entry:
				%a = alloca %struct.array1, align 4
				br label %lab1

				lab1:
				%b = load i32, ptr %a, align 4
				br label %lab2

				lab2:
				%c = load i32, ptr %a, align 4
				br label %lab3

				lab3:
				%0 = icmp sle i32 %b, %c
				br label %lab4

				lab4:
				br i1 %0, label %lab, label %lab

				lab:
				ret void
				}

				; CHECK-LABEL: define void @MultiBlock
				; CHECK-NOT: br
				; CHECK: [[FR7:%.*\.fr.*]] = freeze i32 undef
				; CHECK-LABEL: lab3:{{.*}}
				; CHECK-NOT: br
				; CHECK: {{.*}} = icmp sle i32 [[FR7]], [[FR7]]

				; Test loads and references in different basic blocks with different allocas
				define void @MultiBlock2Allocas() {
				entry:
				%a1 = alloca %struct.array1, align 4
				%a2 = alloca %struct.array2, align 4
				br label %lab1

				lab1:
				%b = load i32, ptr %a1, align 4
				br label %lab2

				lab2:
				%c = load i32, ptr %a2, align 4
				br label %lab3

				lab3:
				%0 = icmp sle i32 %b, %c
				br label %lab4

				lab4:
				br i1 %0, label %lab, label %lab

				lab:
				ret void
				}

				; CHECK-LABEL: define void @MultiBlock2Allocas
				; CHECK-NOT: br
				; CHECK: [[FR8:%.*\.fr.*]] = freeze i32 undef
				; CHECK: [[FR9:%.*\.fr.*]] = freeze i32 undef
				; CHECK-LABEL: lab3:{{.*}}
				; CHECK-NOT: br
				; CHECK: {{.*}} = icmp sle i32 [[FR8]], [[FR9]]

llvm/test/Transforms/SROA/scalable-vectors.ll

[... 60 lines hidden ...]
%2 = load <vscale x 4 x i32>, ptr %type.addr		%2 = load <vscale x 4 x i32>, ptr %type.addr
ret <vscale x 4 x i32> %2		ret <vscale x 4 x i32> %2
}		}

; When casting from VLA to VLS via memory check we bail out when producing a		; When casting from VLA to VLS via memory check we bail out when producing a
; GEP where the element type is a scalable vector.		; GEP where the element type is a scalable vector.
define <vscale x 4 x i32> @cast_alloca_from_svint32_t() {		define <vscale x 4 x i32> @cast_alloca_from_svint32_t() {
; CHECK-LABEL: @cast_alloca_from_svint32_t(		; CHECK-LABEL: @cast_alloca_from_svint32_t(
		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze <16 x i32> undef
; CHECK-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 4 x i32>, align 16		; CHECK-NEXT: [[RETVAL_COERCE:%.*]] = alloca <vscale x 4 x i32>, align 16
; CHECK-NEXT: store <16 x i32> undef, ptr [[RETVAL_COERCE]], align 16		; CHECK-NEXT: store <16 x i32> [[FR1]], ptr [[RETVAL_COERCE]], align 16
; CHECK-NEXT: [[TMP2:%.*]] = load <vscale x 4 x i32>, ptr [[RETVAL_COERCE]], align 16		; CHECK-NEXT: [[TMP2:%.*]] = load <vscale x 4 x i32>, ptr [[RETVAL_COERCE]], align 16
; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]		; CHECK-NEXT: ret <vscale x 4 x i32> [[TMP2]]
;		;
%retval = alloca <16 x i32>		%retval = alloca <16 x i32>
%retval.coerce = alloca <vscale x 4 x i32>		%retval.coerce = alloca <vscale x 4 x i32>
call void @llvm.memcpy.p0.p0.i64(ptr align 16 %retval.coerce, ptr align 16 %retval, i64 64, i1 false)		call void @llvm.memcpy.p0.p0.i64(ptr align 16 %retval.coerce, ptr align 16 %retval, i64 64, i1 false)
%1 = load <vscale x 4 x i32>, ptr %retval.coerce		%1 = load <vscale x 4 x i32>, ptr %retval.coerce
ret <vscale x 4 x i32> %1		ret <vscale x 4 x i32> %1
}		}

declare void @llvm.memcpy.p0.p0.i64(ptr nocapture, ptr nocapture, i64, i1) nounwind		declare void @llvm.memcpy.p0.p0.i64(ptr nocapture, ptr nocapture, i64, i1) nounwind

llvm/test/Transforms/SROA/select-load.ll

[... 32 lines hidden ...]
	}			}

	%st.args = type { i32, ptr }			%st.args = type { i32, ptr }

	; A bitcasted load and a direct load of select.			; A bitcasted load and a direct load of select.
	define void @test_multiple_loads_select(i1 %cmp){			define void @test_multiple_loads_select(i1 %cmp){
	; CHECK-LABEL: @test_multiple_loads_select(			; CHECK-LABEL: @test_multiple_loads_select(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
	; CHECK-NEXT: [[ADDR_I8_SROA_SPECULATED:%.*]] = select i1 [[CMP:%.*]], ptr undef, ptr undef			; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze ptr undef
				; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze ptr undef
				; CHECK-NEXT: [[ADDR_I8_SROA_SPECULATED:%.*]] = select i1 [[CMP:%.*]], ptr [[FR2]], ptr [[FR1]]
	; CHECK-NEXT: call void @foo_i8(ptr [[ADDR_I8_SROA_SPECULATED]])			; CHECK-NEXT: call void @foo_i8(ptr [[ADDR_I8_SROA_SPECULATED]])
	; CHECK-NEXT: [[ADDR_I32_SROA_SPECULATED:%.*]] = select i1 [[CMP]], ptr undef, ptr undef			; CHECK-NEXT: [[ADDR_I32_SROA_SPECULATED:%.*]] = select i1 [[CMP]], ptr [[FR2]], ptr [[FR1]]
	; CHECK-NEXT: call void @foo_i32(ptr [[ADDR_I32_SROA_SPECULATED]])			; CHECK-NEXT: call void @foo_i32(ptr [[ADDR_I32_SROA_SPECULATED]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	entry:			entry:
	%args = alloca [2 x %st.args], align 16			%args = alloca [2 x %st.args], align 16
	%arr1 = getelementptr inbounds [2 x %st.args], ptr %args, i64 0, i64 1			%arr1 = getelementptr inbounds [2 x %st.args], ptr %args, i64 0, i64 1
	%sel = select i1 %cmp, ptr %arr1, ptr %args			%sel = select i1 %cmp, ptr %arr1, ptr %args
	%addr = getelementptr inbounds %st.args, ptr %sel, i64 0, i32 1			%addr = getelementptr inbounds %st.args, ptr %sel, i64 0, i32 1
	[9 lines not shown]

llvm/test/Transforms/SROA/slice-width.ll

	[88 lines not shown]

	declare i32 @memcpy_vec3float_helper(ptr)			declare i32 @memcpy_vec3float_helper(ptr)

	; PR18726: Check that SROA does not rewrite a 12-byte memcpy into a 16-byte			; PR18726: Check that SROA does not rewrite a 12-byte memcpy into a 16-byte
	; vector store, hence accidentally putting gibberish onto the stack.			; vector store, hence accidentally putting gibberish onto the stack.
	define i32 @memcpy_vec3float_widening(ptr %x) {			define i32 @memcpy_vec3float_widening(ptr %x) {
	; CHECK-LABEL: @memcpy_vec3float_widening(			; CHECK-LABEL: @memcpy_vec3float_widening(
	; CHECK-NEXT: entry:			; CHECK-NEXT: entry:
				; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze <4 x float> undef
	; CHECK-NEXT: [[TMP1_SROA_0_0_COPYLOAD:%.*]] = load <3 x float>, ptr [[X:%.*]], align 4			; CHECK-NEXT: [[TMP1_SROA_0_0_COPYLOAD:%.*]] = load <3 x float>, ptr [[X:%.*]], align 4
	; CHECK-NEXT: [[TMP1_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <3 x float> [[TMP1_SROA_0_0_COPYLOAD]], <3 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>			; CHECK-NEXT: [[TMP1_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <3 x float> [[TMP1_SROA_0_0_COPYLOAD]], <3 x float> poison, <4 x i32> <i32 0, i32 1, i32 2, i32 undef>
	; CHECK-NEXT: [[TMP1_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> [[TMP1_SROA_0_0_VEC_EXPAND]], <4 x float> undef			; CHECK-NEXT: [[TMP1_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x float> [[TMP1_SROA_0_0_VEC_EXPAND]], <4 x float> [[FR1]]
	; CHECK-NEXT: [[TMP2:%.*]] = alloca [[S_VEC3FLOAT:%.*]], align 4			; CHECK-NEXT: [[TMP2:%.*]] = alloca [[S_VEC3FLOAT:%.*]], align 4
	; CHECK-NEXT: [[TMP1_SROA_0_0_VEC_EXTRACT:%.*]] = shufflevector <4 x float> [[TMP1_SROA_0_0_VECBLEND]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>			; CHECK-NEXT: [[TMP1_SROA_0_0_VEC_EXTRACT:%.*]] = shufflevector <4 x float> [[TMP1_SROA_0_0_VECBLEND]], <4 x float> poison, <3 x i32> <i32 0, i32 1, i32 2>
	; CHECK-NEXT: store <3 x float> [[TMP1_SROA_0_0_VEC_EXTRACT]], ptr [[TMP2]], align 4			; CHECK-NEXT: store <3 x float> [[TMP1_SROA_0_0_VEC_EXTRACT]], ptr [[TMP2]], align 4
	; CHECK-NEXT: [[RESULT:%.*]] = call i32 @memcpy_vec3float_helper(ptr [[TMP2]])			; CHECK-NEXT: [[RESULT:%.*]] = call i32 @memcpy_vec3float_helper(ptr [[TMP2]])
	; CHECK-NEXT: ret i32 [[RESULT]]			; CHECK-NEXT: ret i32 [[RESULT]]
	;			;
	entry:			entry:
	; Create a temporary variable %tmp1 and copy %x[0] into it			; Create a temporary variable %tmp1 and copy %x[0] into it
	[52 lines not shown]

llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll

[231 lines not shown]
ret void		ret void
}		}

define amdgpu_kernel void @test_half_array() #0 {		define amdgpu_kernel void @test_half_array() #0 {
; CHECK-LABEL: @test_half_array(		; CHECK-LABEL: @test_half_array(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[B_BLOCKWISE_COPY_SROA_0:%.*]] = alloca float, align 16		; CHECK-NEXT: [[B_BLOCKWISE_COPY_SROA_0:%.*]] = alloca float, align 16
; CHECK-NEXT: [[B_BLOCKWISE_COPY_SROA_4:%.*]] = alloca float, align 4		; CHECK-NEXT: [[B_BLOCKWISE_COPY_SROA_4:%.*]] = alloca float, align 4
		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze float undef
		; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze float undef
; CHECK-NEXT: call void @llvm.memset.p0.i32(ptr align 16 [[B_BLOCKWISE_COPY_SROA_0]], i8 0, i32 4, i1 false)		; CHECK-NEXT: call void @llvm.memset.p0.i32(ptr align 16 [[B_BLOCKWISE_COPY_SROA_0]], i8 0, i32 4, i1 false)
; CHECK-NEXT: call void @llvm.memset.p0.i32(ptr align 4 [[B_BLOCKWISE_COPY_SROA_4]], i8 0, i32 4, i1 false)		; CHECK-NEXT: call void @llvm.memset.p0.i32(ptr align 4 [[B_BLOCKWISE_COPY_SROA_4]], i8 0, i32 4, i1 false)
; CHECK-NEXT: [[TMP0:%.*]] = bitcast float undef to i32		; CHECK-NEXT: [[TMP0:%.*]] = bitcast float [[FR1]] to i32
; CHECK-NEXT: [[TMP1:%.*]] = bitcast float undef to i32		; CHECK-NEXT: [[TMP1:%.*]] = bitcast float [[FR2]] to i32
; CHECK-NEXT: [[DATA:%.*]] = load [4 x float], ptr undef, align 4		; CHECK-NEXT: [[DATA:%.*]] = load [4 x float], ptr undef, align 4
; CHECK-NEXT: [[DATA_FCA_0_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 0		; CHECK-NEXT: [[DATA_FCA_0_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 0
; CHECK-NEXT: store float [[DATA_FCA_0_EXTRACT]], ptr [[B_BLOCKWISE_COPY_SROA_0]], align 16		; CHECK-NEXT: store float [[DATA_FCA_0_EXTRACT]], ptr [[B_BLOCKWISE_COPY_SROA_0]], align 16
; CHECK-NEXT: [[DATA_FCA_1_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 1		; CHECK-NEXT: [[DATA_FCA_1_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 1
; CHECK-NEXT: store float [[DATA_FCA_1_EXTRACT]], ptr [[B_BLOCKWISE_COPY_SROA_4]], align 4		; CHECK-NEXT: store float [[DATA_FCA_1_EXTRACT]], ptr [[B_BLOCKWISE_COPY_SROA_4]], align 4
; CHECK-NEXT: [[DATA_FCA_2_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 2		; CHECK-NEXT: [[DATA_FCA_2_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 2
; CHECK-NEXT: [[DATA_FCA_3_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 3		; CHECK-NEXT: [[DATA_FCA_3_EXTRACT:%.*]] = extractvalue [4 x float] [[DATA]], 3
; CHECK-NEXT: br label [[BB:%.*]]		; CHECK-NEXT: br label [[BB:%.*]]
[160 lines not shown]

llvm/test/Transforms/SROA/vector-conversion.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -passes=sroa -S | FileCheck %s			; RUN: opt < %s -passes=sroa -S | FileCheck %s
	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:32:64"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n8:16:32:64"

	define <4 x i64> @vector_ptrtoint({<2 x ptr>, <2 x ptr>} %x) {			define <4 x i64> @vector_ptrtoint({<2 x ptr>, <2 x ptr>} %x) {
	; CHECK-LABEL: @vector_ptrtoint(			; CHECK-LABEL: @vector_ptrtoint(
				; CHECK-NEXT: [[FR1:%.*\.fr]] = freeze <4 x i64> undef
	; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <2 x ptr>, <2 x ptr> } [[X:%.*]], 0			; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <2 x ptr>, <2 x ptr> } [[X:%.*]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint <2 x ptr> [[X_FCA_0_EXTRACT]] to <2 x i64>			; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint <2 x ptr> [[X_FCA_0_EXTRACT]] to <2 x i64>
	; CHECK-NEXT: [[A_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[A_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <2 x i64> [[TMP1]], <2 x i64> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i64> [[A_SROA_0_0_VEC_EXPAND]], <4 x i64> undef			; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i64> [[A_SROA_0_0_VEC_EXPAND]], <4 x i64> [[FR1]]
	; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <2 x ptr>, <2 x ptr> } [[X]], 1			; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <2 x ptr>, <2 x ptr> } [[X]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint <2 x ptr> [[X_FCA_1_EXTRACT]] to <2 x i64>			; CHECK-NEXT: [[TMP2:%.*]] = ptrtoint <2 x ptr> [[X_FCA_1_EXTRACT]] to <2 x i64>
	; CHECK-NEXT: [[A_SROA_0_16_VEC_EXPAND:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			; CHECK-NEXT: [[A_SROA_0_16_VEC_EXPAND:%.*]] = shufflevector <2 x i64> [[TMP2]], <2 x i64> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i64> [[A_SROA_0_16_VEC_EXPAND]], <4 x i64> [[A_SROA_0_0_VECBLEND]]			; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i64> [[A_SROA_0_16_VEC_EXPAND]], <4 x i64> [[A_SROA_0_0_VECBLEND]]
	; CHECK-NEXT: ret <4 x i64> [[A_SROA_0_16_VECBLEND]]			; CHECK-NEXT: ret <4 x i64> [[A_SROA_0_16_VECBLEND]]
	;			;
	%a = alloca {<2 x ptr>, <2 x ptr>}			%a = alloca {<2 x ptr>, <2 x ptr>}

	store {<2 x ptr>, <2 x ptr>} %x, ptr %a			store {<2 x ptr>, <2 x ptr>} %x, ptr %a

	%vec = load <4 x i64>, ptr %a			%vec = load <4 x i64>, ptr %a

	ret <4 x i64> %vec			ret <4 x i64> %vec
	}			}

	define <4 x ptr> @vector_inttoptr({<2 x i64>, <2 x i64>} %x) {			define <4 x ptr> @vector_inttoptr({<2 x i64>, <2 x i64>} %x) {
	; CHECK-LABEL: @vector_inttoptr(			; CHECK-LABEL: @vector_inttoptr(
				; CHECK-NEXT: [[FR2:%.*\.fr]] = freeze <4 x ptr> undef
	; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[X:%.*]], 0			; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[X:%.*]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = inttoptr <2 x i64> [[X_FCA_0_EXTRACT]] to <2 x ptr>			; CHECK-NEXT: [[TMP1:%.*]] = inttoptr <2 x i64> [[X_FCA_0_EXTRACT]] to <2 x ptr>
	; CHECK-NEXT: [[A_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <2 x ptr> [[TMP1]], <2 x ptr> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>			; CHECK-NEXT: [[A_SROA_0_0_VEC_EXPAND:%.*]] = shufflevector <2 x ptr> [[TMP1]], <2 x ptr> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
	; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x ptr> [[A_SROA_0_0_VEC_EXPAND]], <4 x ptr> undef			; CHECK-NEXT: [[A_SROA_0_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x ptr> [[A_SROA_0_0_VEC_EXPAND]], <4 x ptr> [[FR2]]
	; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[X]], 1			; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <2 x i64>, <2 x i64> } [[X]], 1
	; CHECK-NEXT: [[TMP2:%.*]] = inttoptr <2 x i64> [[X_FCA_1_EXTRACT]] to <2 x ptr>			; CHECK-NEXT: [[TMP2:%.*]] = inttoptr <2 x i64> [[X_FCA_1_EXTRACT]] to <2 x ptr>
	; CHECK-NEXT: [[A_SROA_0_16_VEC_EXPAND:%.*]] = shufflevector <2 x ptr> [[TMP2]], <2 x ptr> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>			; CHECK-NEXT: [[A_SROA_0_16_VEC_EXPAND:%.*]] = shufflevector <2 x ptr> [[TMP2]], <2 x ptr> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
	; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x ptr> [[A_SROA_0_16_VEC_EXPAND]], <4 x ptr> [[A_SROA_0_0_VECBLEND]]			; CHECK-NEXT: [[A_SROA_0_16_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x ptr> [[A_SROA_0_16_VEC_EXPAND]], <4 x ptr> [[A_SROA_0_0_VECBLEND]]
	; CHECK-NEXT: ret <4 x ptr> [[A_SROA_0_16_VECBLEND]]			; CHECK-NEXT: ret <4 x ptr> [[A_SROA_0_16_VECBLEND]]
	;			;
	%a = alloca {<2 x i64>, <2 x i64>}			%a = alloca {<2 x i64>, <2 x i64>}

	store {<2 x i64>, <2 x i64>} %x, ptr %a			store {<2 x i64>, <2 x i64>} %x, ptr %a

	%vec = load <4 x ptr>, ptr %a			%vec = load <4 x ptr>, ptr %a

	ret <4 x ptr> %vec			ret <4 x ptr> %vec
	}			}

	define <2 x i64> @vector_ptrtointbitcast({<1 x ptr>, <1 x ptr>} %x) {			define <2 x i64> @vector_ptrtointbitcast({<1 x ptr>, <1 x ptr>} %x) {
	; CHECK-LABEL: @vector_ptrtointbitcast(			; CHECK-LABEL: @vector_ptrtointbitcast(
				; CHECK-NEXT: [[FR3:%.*\.fr]] = freeze <2 x i64> undef
	; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <1 x ptr>, <1 x ptr> } [[X:%.*]], 0			; CHECK-NEXT: [[X_FCA_0_EXTRACT:%.*]] = extractvalue { <1 x ptr>, <1 x ptr> } [[X:%.*]], 0
	; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint <1 x ptr> [[X_FCA_0_EXTRACT]] to <1 x i64>			; CHECK-NEXT: [[TMP1:%.*]] = ptrtoint <1 x ptr> [[X_FCA_0_EXTRACT]] to <1 x i64>
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast <1 x i64> [[TMP1]] to i64			; CHECK-NEXT: [[TMP2:%.*]] = bitcast <1 x i64> [[TMP1]] to i64
	; CHECK-NEXT: [[A_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i64> undef, i64 [[TMP2]], i32 0			; CHECK-NEXT: [[A_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i64> [[FR3]], i64 [[TMP2]], i32 0
	; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <1 x ptr>, <1 x ptr> } [[X]], 1			; CHECK-NEXT: [[X_FCA_1_EXTRACT:%.*]] = extractvalue { <1 x ptr>, <1 x ptr> } [[X]], 1
	; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint <1 x ptr> [[X_FCA_1_EXTRACT]] to <1 x i64>			; CHECK-NEXT: [[TMP3:%.*]] = ptrtoint <1 x ptr> [[X_FCA_1_EXTRACT]] to <1 x i64>
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to i64			; CHECK-NEXT: [[TMP4:%.*]] = bitcast <1 x i64> [[TMP3]] to i64
	; CHECK-NEXT: [[A_SROA_0_8_VEC_INSERT:%.*]] = insertelement <2 x i64> [[A_SROA_0_0_VEC_INSERT]], i64 [[TMP4]], i32 1			; CHECK-NEXT: [[A_SROA_0_8_VEC_INSERT:%.*]] = insertelement <2 x i64> [[A_SROA_0_0_VEC_INSERT]], i64 [[TMP4]], i32 1
	; CHECK-NEXT: ret <2 x i64> [[A_SROA_0_8_VEC_INSERT]]			; CHECK-NEXT: ret <2 x i64> [[A_SROA_0_8_VEC_INSERT]]
	;			;
	%a = alloca {<1 x ptr>, <1 x ptr>}			%a = alloca {<1 x ptr>, <1 x ptr>}

	[40 lines not shown]

llvm/test/Transforms/SROA/vector-promotion.ll

[235 lines not shown]
%addr = getelementptr inbounds { <4 x i64>, <4 x i64> }, ptr %tmp, i32 0, i32 0, i64 %n		%addr = getelementptr inbounds { <4 x i64>, <4 x i64> }, ptr %tmp, i32 0, i32 0, i64 %n
%res = load i64, ptr %addr, align 4		%res = load i64, ptr %addr, align 4
ret i64 %res		ret i64 %res
}		}

define <4 x i32> @test_subvec_store() {		define <4 x i32> @test_subvec_store() {
; CHECK-LABEL: @test_subvec_store(		; CHECK-LABEL: @test_subvec_store(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>, <4 x i32> undef		; CHECK-NEXT: [[FR1:%.*\.fr.*]] = freeze <4 x i32> undef
		; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x i32> <i32 0, i32 0, i32 undef, i32 undef>, <4 x i32> [[FR1]]
; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x i32> <i32 undef, i32 1, i32 1, i32 undef>, <4 x i32> [[A_0_VECBLEND]]		; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x i32> <i32 undef, i32 1, i32 1, i32 undef>, <4 x i32> [[A_0_VECBLEND]]
; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> <i32 undef, i32 undef, i32 2, i32 2>, <4 x i32> [[A_4_VECBLEND]]		; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x i32> <i32 undef, i32 undef, i32 2, i32 2>, <4 x i32> [[A_4_VECBLEND]]
; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x i32> [[A_8_VECBLEND]], i32 3, i32 3		; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x i32> [[A_8_VECBLEND]], i32 3, i32 3
; CHECK-NEXT: ret <4 x i32> [[A_12_VEC_INSERT]]		; CHECK-NEXT: ret <4 x i32> [[A_12_VEC_INSERT]]
;		;
entry:		entry:
%a = alloca <4 x i32>		%a = alloca <4 x i32>

[40 lines not shown]

ret <4 x i32> %ret		ret <4 x i32> %ret
}		}


define <4 x float> @test_subvec_memset() {		define <4 x float> @test_subvec_memset() {
; CHECK-LABEL: @test_subvec_memset(		; CHECK-LABEL: @test_subvec_memset(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> <float 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float> undef		; CHECK-NEXT: [[FR2:%.*\.fr.*]] = freeze <4 x float> undef
		; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> <float 0.000000e+00, float 0.000000e+00, float undef, float undef>, <4 x float> [[FR2]]
; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> <float undef, float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x float> [[A_0_VECBLEND]]		; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> <float undef, float 0x3820202020000000, float 0x3820202020000000, float undef>, <4 x float> [[A_0_VECBLEND]]
; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> <float undef, float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x float> [[A_4_VECBLEND]]		; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> <float undef, float undef, float 0x3860606060000000, float 0x3860606060000000>, <4 x float> [[A_4_VECBLEND]]
; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float 0x38E0E0E0E0000000, i32 3		; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float 0x38E0E0E0E0000000, i32 3
; CHECK-NEXT: ret <4 x float> [[A_12_VEC_INSERT]]		; CHECK-NEXT: ret <4 x float> [[A_12_VEC_INSERT]]
;		;
entry:		entry:
%a = alloca <4 x float>		%a = alloca <4 x float>

[11 lines not shown]
%ret = load <4 x float>, ptr %a		%ret = load <4 x float>, ptr %a

ret <4 x float> %ret		ret <4 x float> %ret
}		}

define <4 x float> @test_subvec_memcpy(ptr %x, ptr %y, ptr %z, ptr %f, ptr %out) {		define <4 x float> @test_subvec_memcpy(ptr %x, ptr %y, ptr %z, ptr %f, ptr %out) {
; CHECK-LABEL: @test_subvec_memcpy(		; CHECK-LABEL: @test_subvec_memcpy(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR3:%.*\.fr.*]] = freeze <4 x float> undef
; CHECK-NEXT: [[A_0_COPYLOAD:%.*]] = load <2 x float>, ptr [[X:%.*]], align 1		; CHECK-NEXT: [[A_0_COPYLOAD:%.*]] = load <2 x float>, ptr [[X:%.*]], align 1
; CHECK-NEXT: [[A_0_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_0_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>		; CHECK-NEXT: [[A_0_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_0_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> [[A_0_VEC_EXPAND]], <4 x float> undef		; CHECK-NEXT: [[A_0_VECBLEND:%.*]] = select <4 x i1> <i1 true, i1 true, i1 false, i1 false>, <4 x float> [[A_0_VEC_EXPAND]], <4 x float> [[FR3]]
; CHECK-NEXT: [[A_4_COPYLOAD:%.*]] = load <2 x float>, ptr [[Y:%.*]], align 1		; CHECK-NEXT: [[A_4_COPYLOAD:%.*]] = load <2 x float>, ptr [[Y:%.*]], align 1
; CHECK-NEXT: [[A_4_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_4_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>		; CHECK-NEXT: [[A_4_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_4_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 undef, i32 0, i32 1, i32 undef>
; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> [[A_4_VEC_EXPAND]], <4 x float> [[A_0_VECBLEND]]		; CHECK-NEXT: [[A_4_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 true, i1 true, i1 false>, <4 x float> [[A_4_VEC_EXPAND]], <4 x float> [[A_0_VECBLEND]]
; CHECK-NEXT: [[A_8_COPYLOAD:%.*]] = load <2 x float>, ptr [[Z:%.*]], align 1		; CHECK-NEXT: [[A_8_COPYLOAD:%.*]] = load <2 x float>, ptr [[Z:%.*]], align 1
; CHECK-NEXT: [[A_8_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_8_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>		; CHECK-NEXT: [[A_8_VEC_EXPAND:%.*]] = shufflevector <2 x float> [[A_8_COPYLOAD]], <2 x float> poison, <4 x i32> <i32 undef, i32 undef, i32 0, i32 1>
; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> [[A_8_VEC_EXPAND]], <4 x float> [[A_4_VECBLEND]]		; CHECK-NEXT: [[A_8_VECBLEND:%.*]] = select <4 x i1> <i1 false, i1 false, i1 true, i1 true>, <4 x float> [[A_8_VEC_EXPAND]], <4 x float> [[A_4_VECBLEND]]
; CHECK-NEXT: [[A_12_COPYLOAD:%.*]] = load float, ptr [[F:%.*]], align 1		; CHECK-NEXT: [[A_12_COPYLOAD:%.*]] = load float, ptr [[F:%.*]], align 1
; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float [[A_12_COPYLOAD]], i32 3		; CHECK-NEXT: [[A_12_VEC_INSERT:%.*]] = insertelement <4 x float> [[A_8_VECBLEND]], float [[A_12_COPYLOAD]], i32 3
[20 lines not shown]
%ret = load <4 x float>, ptr %a		%ret = load <4 x float>, ptr %a

ret <4 x float> %ret		ret <4 x float> %ret
}		}

define i32 @PR14212(<3 x i8> %val) {		define i32 @PR14212(<3 x i8> %val) {
; CHECK-LABEL: @PR14212(		; CHECK-LABEL: @PR14212(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR4:%.*\.fr.*]] = freeze i8 undef
; CHECK-NEXT: [[TMP0:%.*]] = bitcast <3 x i8> [[VAL:%.*]] to i24		; CHECK-NEXT: [[TMP0:%.*]] = bitcast <3 x i8> [[VAL:%.*]] to i24
; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_EXT:%.*]] = zext i8 undef to i32		; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_EXT:%.*]] = zext i8 [[FR4]] to i32
; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[RETVAL_SROA_2_0_INSERT_EXT]], 24		; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[RETVAL_SROA_2_0_INSERT_EXT]], 24
; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_MASK:%.*]] = and i32 undef, 16777215		; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_MASK:%.*]] = and i32 undef, 16777215
; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_INSERT:%.*]] = or i32 [[RETVAL_SROA_2_0_INSERT_MASK]], [[RETVAL_SROA_2_0_INSERT_SHIFT]]		; CHECK-NEXT: [[RETVAL_SROA_2_0_INSERT_INSERT:%.*]] = or i32 [[RETVAL_SROA_2_0_INSERT_MASK]], [[RETVAL_SROA_2_0_INSERT_SHIFT]]
; CHECK-NEXT: [[RETVAL_0_INSERT_EXT:%.*]] = zext i24 [[TMP0]] to i32		; CHECK-NEXT: [[RETVAL_0_INSERT_EXT:%.*]] = zext i24 [[TMP0]] to i32
; CHECK-NEXT: [[RETVAL_0_INSERT_MASK:%.*]] = and i32 [[RETVAL_SROA_2_0_INSERT_INSERT]], -16777216		; CHECK-NEXT: [[RETVAL_0_INSERT_MASK:%.*]] = and i32 [[RETVAL_SROA_2_0_INSERT_INSERT]], -16777216
; CHECK-NEXT: [[RETVAL_0_INSERT_INSERT:%.*]] = or i32 [[RETVAL_0_INSERT_MASK]], [[RETVAL_0_INSERT_EXT]]		; CHECK-NEXT: [[RETVAL_0_INSERT_INSERT:%.*]] = or i32 [[RETVAL_0_INSERT_MASK]], [[RETVAL_0_INSERT_EXT]]
; CHECK-NEXT: ret i32 [[RETVAL_0_INSERT_INSERT]]		; CHECK-NEXT: ret i32 [[RETVAL_0_INSERT_INSERT]]
;		;
[27 lines not shown]
%vec = load <2 x i8>, ptr %a		%vec = load <2 x i8>, ptr %a

ret <2 x i8> %vec		ret <2 x i8> %vec
}		}

define i32 @PR14349.2(<2 x i8> %x) {		define i32 @PR14349.2(<2 x i8> %x) {
; CHECK-LABEL: @PR14349.2(		; CHECK-LABEL: @PR14349.2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
		; CHECK-NEXT: [[FR5:%.*\.fr.*]] = freeze i16 undef
; CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i8> [[X:%.*]] to i16		; CHECK-NEXT: [[TMP0:%.*]] = bitcast <2 x i8> [[X:%.*]] to i16
; CHECK-NEXT: [[A_SROA_2_0_INSERT_EXT:%.*]] = zext i16 undef to i32		; CHECK-NEXT: [[A_SROA_2_0_INSERT_EXT:%.*]] = zext i16 [[FR5]] to i32
; CHECK-NEXT: [[A_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_2_0_INSERT_EXT]], 16		; CHECK-NEXT: [[A_SROA_2_0_INSERT_SHIFT:%.*]] = shl i32 [[A_SROA_2_0_INSERT_EXT]], 16
; CHECK-NEXT: [[A_SROA_2_0_INSERT_MASK:%.*]] = and i32 undef, 65535		; CHECK-NEXT: [[A_SROA_2_0_INSERT_MASK:%.*]] = and i32 undef, 65535
; CHECK-NEXT: [[A_SROA_2_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_2_0_INSERT_MASK]], [[A_SROA_2_0_INSERT_SHIFT]]		; CHECK-NEXT: [[A_SROA_2_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_2_0_INSERT_MASK]], [[A_SROA_2_0_INSERT_SHIFT]]
; CHECK-NEXT: [[A_SROA_0_0_INSERT_EXT:%.*]] = zext i16 [[TMP0]] to i32		; CHECK-NEXT: [[A_SROA_0_0_INSERT_EXT:%.*]] = zext i16 [[TMP0]] to i32
; CHECK-NEXT: [[A_SROA_0_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_2_0_INSERT_INSERT]], -65536		; CHECK-NEXT: [[A_SROA_0_0_INSERT_MASK:%.*]] = and i32 [[A_SROA_2_0_INSERT_INSERT]], -65536
; CHECK-NEXT: [[A_SROA_0_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_0_0_INSERT_MASK]], [[A_SROA_0_0_INSERT_EXT]]		; CHECK-NEXT: [[A_SROA_0_0_INSERT_INSERT:%.*]] = or i32 [[A_SROA_0_0_INSERT_MASK]], [[A_SROA_0_0_INSERT_EXT]]
; CHECK-NEXT: ret i32 [[A_SROA_0_0_INSERT_INSERT]]		; CHECK-NEXT: ret i32 [[A_SROA_0_0_INSERT_INSERT]]
;		;
[63 lines not shown]
ret i32 %tmp4		ret i32 %tmp4
}		}

define <2 x i32> @test9(i32 %x, i32 %y) {		define <2 x i32> @test9(i32 %x, i32 %y) {
; Ensure that we can promote an alloca that doesn't mention a vector type based		; Ensure that we can promote an alloca that doesn't mention a vector type based
; on a single load with a vector type.		; on a single load with a vector type.
; CHECK-LABEL: @test9(		; CHECK-LABEL: @test9(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[A_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i32> undef, i32 [[X:%.*]], i32 0		; CHECK-NEXT: [[FR6:%.*\.fr.*]] = freeze <2 x i32> undef
		; CHECK-NEXT: [[A_SROA_0_0_VEC_INSERT:%.*]] = insertelement <2 x i32> [[FR6]], i32 [[X:%.*]], i32 0
; CHECK-NEXT: [[A_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x i32> [[A_SROA_0_0_VEC_INSERT]], i32 [[Y:%.*]], i32 1		; CHECK-NEXT: [[A_SROA_0_4_VEC_INSERT:%.*]] = insertelement <2 x i32> [[A_SROA_0_0_VEC_INSERT]], i32 [[Y:%.*]], i32 1
; CHECK-NEXT: ret <2 x i32> [[A_SROA_0_4_VEC_INSERT]]		; CHECK-NEXT: ret <2 x i32> [[A_SROA_0_4_VEC_INSERT]]
;		;
entry:		entry:
%a = alloca i64		%a = alloca i64

store i32 %x, ptr %a		store i32 %x, ptr %a
%a.tmp2 = getelementptr inbounds i32, ptr %a, i64 1		%a.tmp2 = getelementptr inbounds i32, ptr %a, i64 1
[65 lines not shown]

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll.expected

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa < %s | FileCheck %s

	define i64 @i64_test(i64 %i) nounwind readnone {			define i64 @i64_test(i64 %i) nounwind readnone {
	; CHECK-LABEL: i64_test:			; CHECK-LABEL: i64_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_add_i32_e32 v0, vcc, s4, v0
				; CHECK-NEXT: v_addc_u32_e32 v1, vcc, v1, v0, vcc
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i64			%loc = alloca i64
	%j = load i64, i64 * %loc			%j = load i64, i64 * %loc
	%r = add i64 %i, %j			%r = add i64 %i, %j
	ret i64 %r			ret i64 %r
	}			}

	define i64 @i32_test(i32 %i) nounwind readnone {			define i64 @i32_test(i32 %i) nounwind readnone {
	; CHECK-LABEL: i32_test:			; CHECK-LABEL: i32_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_add_i32_e32 v0, vcc, s4, v0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i32			%loc = alloca i32
	%j = load i32, i32 * %loc			%j = load i32, i32 * %loc
	%r = add i32 %i, %j			%r = add i32 %i, %j
	%ext = zext i32 %r to i64			%ext = zext i32 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i16_test(i16 %i) nounwind readnone {			define i64 @i16_test(i16 %i) nounwind readnone {
	; CHECK-LABEL: i16_test:			; CHECK-LABEL: i16_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_add_i32_e32 v0, vcc, s4, v0
				; CHECK-NEXT: v_and_b32_e32 v0, 0xffff, v0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i16			%loc = alloca i16
	%j = load i16, i16 * %loc			%j = load i16, i16 * %loc
	%r = add i16 %i, %j			%r = add i16 %i, %j
	%ext = zext i16 %r to i64			%ext = zext i16 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i8_test(i8 %i) nounwind readnone {			define i64 @i8_test(i8 %i) nounwind readnone {
	; CHECK-LABEL: i8_test:			; CHECK-LABEL: i8_test:
	; CHECK: ; %bb.0:			; CHECK: ; %bb.0:
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_mov_b32_e32 v0, 0			; CHECK-NEXT: v_add_i32_e32 v0, vcc, s4, v0
				; CHECK-NEXT: v_and_b32_e32 v0, 0xff, v0
	; CHECK-NEXT: v_mov_b32_e32 v1, 0			; CHECK-NEXT: v_mov_b32_e32 v1, 0
	; CHECK-NEXT: s_setpc_b64 s[30:31]			; CHECK-NEXT: s_setpc_b64 s[30:31]
	%loc = alloca i8			%loc = alloca i8
	%j = load i8, i8 * %loc			%j = load i8, i8 * %loc
	%r = add i8 %i, %j			%r = add i8 %i, %j
	%ext = zext i8 %r to i64			%ext = zext i8 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -stop-after=finalize-isel -debug-only=isel -o /dev/null %s 2>&1 | FileCheck %s

	define i64 @i64_test(i64 %i) nounwind readnone {			define i64 @i64_test(i64 %i) nounwind readnone {
	; CHECK-LABEL: i64_test:			; CHECK-LABEL: i64_test:
	; CHECK: SelectionDAG has 9 nodes:			; CHECK: SelectionDAG has 19 nodes:
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t11: ch,glue = CopyToReg t0, Register:i32 $vgpr0, IMPLICIT_DEF:i32			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t17: i32 = V_MOV_B32_e32 TargetConstant:i32<0>			; CHECK-NEXT: t4: i32,ch = CopyFromReg # D:1 t0, Register:i32 %1
	; CHECK-NEXT: t13: ch,glue = CopyToReg t11, Register:i32 $vgpr1, t17, t11:1			; CHECK-NEXT: t31: i64 = REG_SEQUENCE # D:1 TargetConstant:i32<55>, t2, TargetConstant:i32<3>, t4, TargetConstant:i32<11>
	; CHECK-NEXT: t14: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t13, t13:1			; CHECK-NEXT: t7: i64 = COPY IMPLICIT_DEF:i64
				; CHECK-NEXT: t8: i64 = V_ADD_U64_PSEUDO # D:1 t31, t7
				; CHECK-NEXT: t21: i32 = EXTRACT_SUBREG # D:1 t8, TargetConstant:i32<3>
				; CHECK-NEXT: t14: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t21
				; CHECK-NEXT: t25: i32 = EXTRACT_SUBREG # D:1 t8, TargetConstant:i32<11>
				; CHECK-NEXT: t16: ch,glue = CopyToReg # D:1 t14, Register:i32 $vgpr1, t25, t14:1
				; CHECK-NEXT: t17: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t16, t16:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i64			%loc = alloca i64
	%j = load i64, i64 * %loc			%j = load i64, i64 * %loc
	%r = add i64 %i, %j			%r = add i64 %i, %j
	ret i64 %r			ret i64 %r
	}			}

	define i64 @i32_test(i32 %i) nounwind readnone {			define i64 @i32_test(i32 %i) nounwind readnone {
	; CHECK-LABEL: i32_test:			; CHECK-LABEL: i32_test:
	; CHECK: SelectionDAG has 8 nodes:			; CHECK: SelectionDAG has 14 nodes:
	; CHECK-NEXT: t5: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t7: ch,glue = CopyToReg t0, Register:i32 $vgpr0, t5			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t9: ch,glue = CopyToReg t7, Register:i32 $vgpr1, t5, t7:1			; CHECK-NEXT: t4: i32 = COPY IMPLICIT_DEF:i32
	; CHECK-NEXT: t10: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t9, t9:1			; CHECK-NEXT: t5: i32,i1 = V_ADD_CO_U32_e64 # D:1 t2, t4, TargetConstant:i1<0>
				; CHECK-NEXT: t12: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t5
				; CHECK-NEXT: t20: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
				; CHECK-NEXT: t14: ch,glue = CopyToReg # D:1 t12, Register:i32 $vgpr1, t20, t12:1
				; CHECK-NEXT: t15: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t14, t14:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i32			%loc = alloca i32
	%j = load i32, i32 * %loc			%j = load i32, i32 * %loc
	%r = add i32 %i, %j			%r = add i32 %i, %j
	%ext = zext i32 %r to i64			%ext = zext i32 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i16_test(i16 %i) nounwind readnone {			define i64 @i16_test(i16 %i) nounwind readnone {
	; CHECK-LABEL: i16_test:			; CHECK-LABEL: i16_test:
	; CHECK: SelectionDAG has 8 nodes:			; CHECK: SelectionDAG has 17 nodes:
	; CHECK-NEXT: t5: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t7: ch,glue = CopyToReg t0, Register:i32 $vgpr0, t5			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t9: ch,glue = CopyToReg t7, Register:i32 $vgpr1, t5, t7:1			; CHECK-NEXT: t18: i32 = COPY IMPLICIT_DEF:i32
	; CHECK-NEXT: t10: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t9, t9:1			; CHECK-NEXT: t19: i32,i1 = V_ADD_CO_U32_e64 # D:1 t2, t18, TargetConstant:i1<0>
				; CHECK-NEXT: t28: i32 = S_MOV_B32 TargetConstant:i32<65535>
				; CHECK-NEXT: t29: i32 = V_AND_B32_e64 # D:1 t19, t28
				; CHECK-NEXT: t13: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t29
				; CHECK-NEXT: t38: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
				; CHECK-NEXT: t15: ch,glue = CopyToReg # D:1 t13, Register:i32 $vgpr1, t38, t13:1
				; CHECK-NEXT: t16: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t15, t15:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i16			%loc = alloca i16
	%j = load i16, i16 * %loc			%j = load i16, i16 * %loc
	%r = add i16 %i, %j			%r = add i16 %i, %j
	%ext = zext i16 %r to i64			%ext = zext i16 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

	define i64 @i8_test(i8 %i) nounwind readnone {			define i64 @i8_test(i8 %i) nounwind readnone {
	; CHECK-LABEL: i8_test:			; CHECK-LABEL: i8_test:
	; CHECK: SelectionDAG has 8 nodes:			; CHECK: SelectionDAG has 17 nodes:
	; CHECK-NEXT: t5: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
	; CHECK-NEXT: t0: ch,glue = EntryToken			; CHECK-NEXT: t0: ch,glue = EntryToken
	; CHECK-NEXT: t7: ch,glue = CopyToReg t0, Register:i32 $vgpr0, t5			; CHECK-NEXT: t2: i32,ch = CopyFromReg # D:1 t0, Register:i32 %0
	; CHECK-NEXT: t9: ch,glue = CopyToReg t7, Register:i32 $vgpr1, t5, t7:1			; CHECK-NEXT: t18: i32 = COPY IMPLICIT_DEF:i32
	; CHECK-NEXT: t10: ch = SI_RETURN Register:i32 $vgpr0, Register:i32 $vgpr1, t9, t9:1			; CHECK-NEXT: t19: i32,i1 = V_ADD_CO_U32_e64 # D:1 t2, t18, TargetConstant:i1<0>
				; CHECK-NEXT: t28: i32 = S_MOV_B32 TargetConstant:i32<255>
				; CHECK-NEXT: t29: i32 = V_AND_B32_e64 # D:1 t19, t28
				; CHECK-NEXT: t13: ch,glue = CopyToReg # D:1 t0, Register:i32 $vgpr0, t29
				; CHECK-NEXT: t38: i32 = V_MOV_B32_e32 TargetConstant:i32<0>
				; CHECK-NEXT: t15: ch,glue = CopyToReg # D:1 t13, Register:i32 $vgpr1, t38, t13:1
				; CHECK-NEXT: t16: ch = SI_RETURN # D:1 Register:i32 $vgpr0, Register:i32 $vgpr1, t15, t15:1
	; CHECK-EMPTY:			; CHECK-EMPTY:
	%loc = alloca i8			%loc = alloca i8
	%j = load i8, i8 * %loc			%j = load i8, i8 * %loc
	%r = add i8 %i, %j			%r = add i8 %i, %j
	%ext = zext i8 %r to i64			%ext = zext i8 %r to i64
	ret i64 %ext			ret i64 %ext
	}			}

Revision Contents

Diff 469003

clang/test/CodeGen/LoongArch/inline-asm-gcc-regs.c

clang/test/CodeGenCXX/return.cpp

clang/test/CodeGenOpenCL/overload.cl

llvm/lib/Transforms/Utils/PromoteMemoryToRegister.cpp

llvm/test/CodeGen/AMDGPU/promote-alloca-vector-to-vector.ll

llvm/test/CodeGen/AMDGPU/vector-alloca-limits.ll

llvm/test/Transforms/Mem2Reg/pr24179.ll

llvm/test/Transforms/Mem2Reg/preserve-nonnull-load-metadata.ll

llvm/test/Transforms/PhaseOrdering/X86/nancvt.ll

llvm/test/Transforms/SROA/address-spaces.ll

llvm/test/Transforms/SROA/addrspacecast.ll

llvm/test/Transforms/SROA/alloca-address-space.ll

llvm/test/Transforms/SROA/basictest.ll

llvm/test/Transforms/SROA/phi-and-select.ll

llvm/test/Transforms/SROA/phi-gep.ll

llvm/test/Transforms/SROA/phi-with-duplicate-pred.ll

llvm/test/Transforms/SROA/pr37267.ll

llvm/test/Transforms/SROA/same-promoted-undefs.ll

llvm/test/Transforms/SROA/scalable-vectors.ll

llvm/test/Transforms/SROA/select-load.ll

llvm/test/Transforms/SROA/slice-width.ll

llvm/test/Transforms/SROA/sroa-common-type-fail-promotion.ll

llvm/test/Transforms/SROA/vector-conversion.ll

llvm/test/Transforms/SROA/vector-promotion.ll

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_asm.ll.expected

llvm/test/tools/UpdateTestChecks/update_llc_test_checks/Inputs/amdgpu_isel.ll.expected
