
[GlobalOpt] Drop SRA split limit for struct types.
Needs ReviewPublic

Authored by fhahn on Jul 11 2022, 5:26 PM.

Details

Reviewers
reames
nikic
Summary

4796b4ae7bccc7 limited SRA to 16 types for all globals, while the code
previously, AFAICT, had no similar limit for globals with struct-typed
initializers.

This is causing notable size regressions for some large workloads. This
patch skips the size check for struct types, similar to the code before
4796b4ae7bccc7, to recover the regression.

Diff Detail

Event Timeline

fhahn created this revision.Jul 11 2022, 5:26 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2022, 5:26 PM
fhahn requested review of this revision.Jul 11 2022, 5:26 PM
nikic requested changes to this revision.Jul 12 2022, 12:29 AM

This will also bypass the check if you have an array that is nested inside a struct. Besides, I don't think we should be making decisions based on the global value type here, which is another vacuous concept that shouldn't exist (e.g. in Rust the global type will often be something like [N x i8], a "bag of bytes" representation).

Does raising the limit to the next power of two cover your test case?

This revision now requires changes to proceed.Jul 12 2022, 12:29 AM
fhahn updated this revision to Diff 444366.Jul 13 2022, 11:52 AM

> This will also bypass the check if you have an array that is nested inside a struct. Besides, I don't think we should be making decisions based on the global value type here, which is another vacuous concept that shouldn't exist (e.g. in Rust the global type will often be something like [N x i8], a "bag of bytes" representation).

Right, I think this behaves similarly to the previous cut-off for structs with arrays inside, e.g.: https://llvm.godbolt.org/z/crje3aYzW . I added additional tests with a struct containing an array and with a plain array.

I agree that the type shouldn't really matter. But I think changing the way the cut-off is handled should be done independently of the change to use an offset-based approach.

> Does raising the limit to the next power of two cover your test case?

Unfortunately, the struct has stores to 100+ fields in the workload.

@fhahn I think your last update is missing the code changes :)

I assume that you don't actually care about SRA as such, but rather about some follow-on transforms it enables. I've been thinking that we might want to make GlobalOpt in general work on the SRAd representation internally, which might remove the need to actually materialize it in some cases. So if some parts of the global are never written, it will certainly always be profitable to replace them with (part of) the initializer, regardless of how many parts there are. The limit is only relevant for cases where we actually do leave behind the globals. Do you know which optimization in particular is important for your workload?

> @fhahn I think your last update is missing the code changes :)

I didn't make any code changes to the original version and it looks like the original code change is still there. Am I missing something? :)

> I assume that you don't actually care about SRA as such, but rather about some follow-on transforms it enables. I've been thinking that we might want to make GlobalOpt in general work on the SRAd representation internally, which might remove the need to actually materialize it in some cases. So if some parts of the global are never written, it will certainly always be profitable to replace them with (part of) the initializer, regardless of how many parts there are. The limit is only relevant for cases where we actually do leave behind the globals. Do you know which optimization in particular is important for your workload?

Yeah, ideally SRA would not materialize each transform it applies straight away; e.g., for the test cases, the split-up globals get removed again after being materialized.

I think the main regression comes from SRA not removing stores of function pointers that are never read, which in turn prevents LLVM from removing those functions.

nikic added a comment.Jul 13 2022, 1:05 PM

>> @fhahn I think your last update is missing the code changes :)

> I didn't make any code changes to the original version and it looks like the original code change is still there. Am I missing something? :)

Oh sorry, I misread your comment. I thought you wanted to add a check for nested arrays, but you were only referring to new tests.

> I think the main regression comes from SRA not removing stores of function pointers that are never read, which in turn prevents LLVM from removing those functions.

Okay, that's a particularly simple case. It's probably worth trying to skip parts that are only stored, as we expect them to be dropped (it's not strictly guaranteed due to leak-checker roots, but close enough). I can take a look.

fhahn updated this revision to Diff 457563.Sep 2 2022, 5:02 AM

Just a rebase on top of a test added in 91e67c074922cc667fa1c43fc1f01acb96faa0f9 that can't be handled by D129857.

fhahn updated this revision to Diff 467054.Oct 12 2022, 1:20 AM

Ping, any additional thoughts on how best to resolve the regression soon?

nikic added a comment.Oct 12 2022, 2:44 AM

I've updated D129857 to handle the new case -- I'd still generally prefer not to special-case struct types if we can avoid it.