This is an archive of the discontinued LLVM Phabricator instance.

[GlobalOpt] recompute alignments for loads and stores of updated globals
ClosedPublic

Authored by spatel on May 15 2021, 5:44 AM.


Details

Reviewers

MaskRay
aeubanks
lebedev.ri
fhahn
efriedma

Commits

rGf34311c4024d: [GlobalOpt] recompute alignments for loads and stores of updated globals

Summary

GlobalOpt can slice structs/arrays and change GEPs in the process, but it was not updating alignments for load/store users. This eventually causes the crashes seen in:
https://llvm.org/PR49661
https://llvm.org/PR50253

On x86, this required SLP+codegen to create an aligned vector store on an invalid address. The bugs would be easier to demonstrate on a target with stricter alignment requirements.

This is my first time looking at this pass, so I'm not sure if this is a complete solution. The alignment updating code is adapted from InstCombine, so I assume that part is tested and good.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

spatel created this revision. May 15 2021, 5:44 AM

Herald added subscribers: pengfei, hiraditya, mcrosier. May 15 2021, 5:44 AM

spatel requested review of this revision. May 15 2021, 5:44 AM

Herald added a project: Restricted Project. May 15 2021, 5:44 AM

I think the bug is the other way around. For each scalar we are going to split off of the aggregate,
we should first determine the alignment for the scalar by determining the largest legal alignment
it could have had as part of the aggregate (based on the alignment of the outer type and the offset).
Then we shouldn't need to update uses, because we didn't change the alignment,
and if any use overestimated it, then it is, and was, UB.

Otherwise, aren't we going to lose overalignment of the aggregate scalars?

In D102552#2761376, @lebedev.ri wrote:

I think the bug is the other way around. For each scalar we are going to split off of the aggregate,
we should first determine the alignment for the scalar by determining the largest legal alignment
it could have had as part of the aggregate (based on the alignment of the outer type and the offset).
Then we shouldn't need to update uses, because we didn't change the alignment,
and if any use overestimated it, then it is, and was, UB.

I think we are already correctly finding the alignment of the updated global itself (see inline code comment). But I'm not sure how that can propagate to potentially stale alignment specifiers on the users other than what I proposed here.

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

Let me know if I'm not seeing it correctly (not too familiar with this!).

llvm/lib/Transforms/IPO/GlobalOpt.cpp
555–557	Alignment of the new global is updated here.
llvm/test/Transforms/GlobalOpt/globalsra-align.ll
40–41	This is accessing element 8 (7 + 1) of the 16-byte aligned global with 4-byte elements, so "align 16" is correct and the best that it can be for the original code, but that's wrong after we strip away the outer array and only have [7 x i32*]. This is the test that most closely models what is happening in the bug reports.
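
The arithmetic behind this inline comment can be sketched in standalone C++. The helper names knownAlignment, nestedOffset, and annotationIsValid are hypothetical (they are not LLVM APIs), and the sketch assumes 4-byte array elements, as the comment does. It is only meant to illustrate why "align 16" is valid for element [1][1] of the original global but not for element [1] of the split-off array, even when that array keeps "align 16".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the largest power of two that is
// guaranteed to divide the address `base + offset`, given only that `base`
// is a multiple of the power-of-two `baseAlign`.
inline uint64_t knownAlignment(uint64_t baseAlign, uint64_t offset) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0 &&
         "base alignment must be a power of two");
  uint64_t bits = baseAlign | offset;
  return bits & (~bits + 1); // isolate the lowest set bit
}

// Byte offset of element [outer][inner] in an array of [7 x T] rows,
// assuming each T occupies 4 bytes (an assumption taken from the comment).
inline uint64_t nestedOffset(uint64_t outer, uint64_t inner) {
  return (outer * 7 + inner) * 4;
}

// An alignment annotation on a load/store is only valid if it does not
// exceed what the base alignment and the offset guarantee.
inline bool annotationIsValid(uint64_t claimed, uint64_t baseAlign,
                              uint64_t offset) {
  return claimed <= knownAlignment(baseAlign, offset);
}
```

With these definitions, annotationIsValid(16, 16, nestedOffset(1, 1)) holds because the offset is 32 bytes, while annotationIsValid(16, 16, 4) does not. The most that can be claimed for the rewritten access is knownAlignment(16, 4), which is 4.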

In D102552#2761382, @spatel wrote:

In D102552#2761376, @lebedev.ri wrote:

I think the bug is the other way around. For each scalar we are going to split off of the aggregate,
we should first determine the alignment for the scalar by determining the largest legal alignment
it could have had as part of the aggregate (based on the alignment of the outer type and the offset).
Then we shouldn't need to update uses, because we didn't change the alignment,
and if any use overestimated it, then it is, and was, UB.

I think we are already correctly finding the alignment of the updated global itself (see inline code comment). But I'm not sure how that can propagate to potentially stale alignment specifiers on the users other than what I proposed here.

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

That's precisely my point. Isn't it a bug that we reduce the alignment? Shouldn't we instead overalign the split-off scalar to the maximal non-UB alignment requested by the uses?

Let me know if I'm not seeing it correctly (not too familiar with this!).

Harbormaster completed remote builds in B104650: Diff 345629. May 15 2021, 6:52 AM

In D102552#2761398, @lebedev.ri wrote:

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

That's precisely my point. Isn't it a bug that we reduce the alignment? Shouldn't we instead overalign the split-off scalar to the maximal non-UB alignment requested by the uses?

I don't think it is possible to overalign the global / base pointer to make this work. For example, take the motivating case:

@a = global [2 x [7 x i32*]] align 16

Assume 32-bit alignment for each pointer element in that array. We have (7 + 1) * 4 = 32-bytes ahead of the 8th element, so the original access to element [1][1] is guaranteed to be "align 16" bytes.
Now we strip off the outer array specifier:

@a0 = global [7 x i32*] align ??

What over-alignment of the base pointer can we use to make the access of element [1] of this new array continue to have "align 16"?
Do we want to use padding to make this work? Not sure how to specify that.
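
The question of whether any base over-alignment could work can be explored with a brute-force sketch. The function names alwaysAligned and someBaseAlignmentWorks are hypothetical, the 4096-byte upper bound is arbitrary, and the check samples a finite set of base addresses, so it is an illustration rather than a proof. It assumes 4-byte elements, so element [1] of the new array sits at byte offset 4.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if `base + offset` is a multiple of `wanted` for every
// sampled base address that is a multiple of `baseAlign`.
bool alwaysAligned(uint64_t baseAlign, uint64_t offset, uint64_t wanted) {
  assert(wanted != 0 && "wanted alignment must be non-zero");
  for (uint64_t k = 0; k < 64; ++k) {
    uint64_t address = k * baseAlign + offset;
    if (address % wanted != 0)
      return false;
  }
  return true;
}

// Tries power-of-two base alignments from `wanted` up to 4096 bytes and
// reports whether any of them makes the access at `offset` always
// `wanted`-aligned. Smaller base alignments are skipped because they
// cannot guarantee `wanted` even at offset 0.
bool someBaseAlignmentWorks(uint64_t offset, uint64_t wanted) {
  for (uint64_t baseAlign = wanted; baseAlign <= 4096; baseAlign *= 2)
    if (alwaysAligned(baseAlign, offset, wanted))
      return true;
  return false;
}
```

Under these assumptions, someBaseAlignmentWorks(32, 16) succeeds (the original [1][1] access), but someBaseAlignmentWorks(4, 16) fails for every base alignment tried, which is consistent with the point that only padding could preserve "align 16" for the offset element.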

In D102552#2763262, @spatel wrote:
In D102552#2761398, @lebedev.ri wrote:

The problem is that if we have an alignment that's known better than the minimum (and that alignment is correct, not UB, for the original code), it may not hold for the new inner type (see inline test comment).

That's precisely my point. Isn't it a bug that we reduce the alignment? Shouldn't we instead overalign the split-off scalar to the maximal non-UB alignment requested by the uses?

I don't think it is possible to overalign the global / base pointer to make this work. For example, take the motivating case:
@a = global [2 x [7 x i32*]] align 16
Assume 32-bit alignment for each pointer element in that array. We have (7 + 1) * 4 = 32-bytes ahead of the 8th element, so the original access to element [1][1] is guaranteed to be "align 16" bytes.
Now we strip off the outer array specifier:
@a0 = global [7 x i32*] align ??
What over-alignment of the base pointer can we use to make the access of element [1] of this new array continue to have "align 16"?
Do we want to use padding to make this work? Not sure how to specify that.

Thank you for spelling this out.
Indeed, if we want to align an offset element, then we need padding.
I'm conflicted here.

Thinking about it more, I guess not having an obvious miscompile would be a good starting point.
We can deal with padding, if need be, later.
Thanks.

This revision is now accepted and ready to land. May 19 2021, 7:06 AM

In D102552#2768607, @lebedev.ri wrote:

Thinking about it more, I guess not having an obvious miscompile would be a good starting point.
We can deal with padding, if need be, later.

Thanks! I have no sense of whether this is a common/impactful transform, but yes, the fuzzers found this hole in the logic using a simple C program, so we should get it fixed.

Looks great!

llvm/test/Transforms/GlobalOpt/globalsra-align.ll

Consider adding a load test.

define i32* @reduce_align_2(i32* %y) {
; CHECK-LABEL: @reduce_align_2(
; CHECK-NEXT:    store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 2), align 8
; CHECK-NEXT:    ret i32* null
;
  %x = load i32*, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 0, i64 0), align 1
  store i32* %y, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 1, i64 2), align 4
  ret i32* %x
}

define i32* @reduce_align_3(i32* %y) {
; CHECK-LABEL: @reduce_align_3(
; CHECK-NEXT:    store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 3), align 4
; CHECK-NEXT:    ret i32* null
;
  %x = load i32*, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 0, i64 0), align 1
  store i32* %y, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 1, i64 3), align 8
  ret i32* %x
}

define i32* @reduce_align_4() {
  %x = load i32*, i32** getelementptr inbounds ([2 x [7 x i32*]], [2 x [7 x i32*]]* @a, i64 0, i64 0, i64 2), align 1
  ret i32* %x
}

spatel added inline comments. May 20 2021, 8:22 AM

llvm/test/Transforms/GlobalOpt/globalsra-align.ll
65	I'll add "externally_initialized" to the global declaration and adjust the tests. The existing tests have loads, but they are all getting folded to null.

Closed by commit rGf34311c4024d: [GlobalOpt] recompute alignments for loads and stores of updated globals (authored by spatel). May 20 2021, 9:12 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rGf34311c4024d: [GlobalOpt] recompute alignments for loads and stores of updated globals.

spatel mentioned this in rGee4055cf23e7: [GlobalOpt] adjust test to show load problems; NFC.

Revision Contents

Path	Size
llvm/lib/Transforms/IPO/GlobalOpt.cpp	17 lines
llvm/test/Transforms/GlobalOpt/externally-initialized-global-ctr.ll	8 lines
llvm/test/Transforms/GlobalOpt/globalsra-align.ll	25 lines

Diff 346761

llvm/lib/Transforms/IPO/GlobalOpt.cpp

@@ 546 lines not shown @@ GlobalVariable *NGV = new GlobalVariable(
         GV->getType()->getAddressSpace());
     NGV->setExternallyInitialized(GV->isExternallyInitialized());
     NGV->copyAttributesFrom(GV);
     NewGlobals.insert(std::make_pair(ElementIdx, NGV));

     if (StructType *STy = dyn_cast<StructType>(Ty)) {
       const StructLayout &Layout = *DL.getStructLayout(STy);

       // Calculate the known alignment of the field. If the original aggregate
       // had 256 byte alignment for example, something might depend on that:
       // propagate info to each field.
       uint64_t FieldOffset = Layout.getElementOffset(ElementIdx);
       Align NewAlign = commonAlignment(StartAlignment, FieldOffset);
       if (NewAlign > DL.getABITypeAlign(STy->getElementType(ElementIdx)))
         NGV->setAlignment(NewAlign);

       // Copy over the debug info for the variable.
       uint64_t Size = DL.getTypeAllocSizeInBits(NGV->getValueType());
       uint64_t FragmentOffsetInBits = Layout.getElementOffsetInBits(ElementIdx);
@@ 59 lines not shown @@ if (GEP->getNumOperands() > 3) {
           Idxs.push_back(GEPI->getOperand(i));
         NewPtr = GetElementPtrInst::Create(
             NewTy, NewPtr, Idxs, GEPI->getName() + "." + Twine(ElementIdx),
             GEPI);
       }
     }
     GEP->replaceAllUsesWith(NewPtr);

+    // We changed the pointer of any memory access user. Recalculate alignments.
+    for (User *U : NewPtr->users()) {
+      if (auto *Load = dyn_cast<LoadInst>(U)) {
+        Align PrefAlign = DL.getPrefTypeAlign(Load->getType());
+        Align NewAlign = getOrEnforceKnownAlignment(Load->getPointerOperand(),
+                                                    PrefAlign, DL, Load);
+        Load->setAlignment(NewAlign);
+      }
+      if (auto *Store = dyn_cast<StoreInst>(U)) {
+        Align PrefAlign =
+            DL.getPrefTypeAlign(Store->getValueOperand()->getType());
+        Align NewAlign = getOrEnforceKnownAlignment(Store->getPointerOperand(),
+                                                    PrefAlign, DL, Store);
+        Store->setAlignment(NewAlign);
+      }
+    }
+
     if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(GEP))
       GEPI->eraseFromParent();
     else
       cast<ConstantExpr>(GEP)->destroyConstant();
   }

   // Delete the old global, now that it is dead.
   Globals.erase(GV);
@@ 2,049 lines not shown @@

llvm/test/Transforms/GlobalOpt/externally-initialized-global-ctr.ll

@@ 11 lines not shown @@
 @_ZL14buttonInitData = internal global [1 x %struct.ButtonInitData] zeroinitializer, align 4

 @"\01L_OBJC_METH_VAR_NAME_40" = internal global [7 x i8] c"print:\00", section "__TEXT,__objc_methname,cstring_literals", align 1
 @"\01L_OBJC_SELECTOR_REFERENCES_41" = internal externally_initialized global i8* getelementptr inbounds ([7 x i8], [7 x i8]* @"\01L_OBJC_METH_VAR_NAME_40", i32 0, i32 0), section "__DATA, __objc_selrefs, literal_pointers, no_dead_strip"

 @llvm.global_ctors = appending global [1 x { i32, void ()*, i8* }] [{ i32, void ()*, i8* } { i32 65535, void ()* @_GLOBAL__I_a, i8* null }]
 @llvm.used = appending global [2 x i8*] [i8* getelementptr inbounds ([7 x i8], [7 x i8]* @"\01L_OBJC_METH_VAR_NAME_40", i32 0, i32 0), i8* bitcast (i8** @"\01L_OBJC_SELECTOR_REFERENCES_41" to i8*)]

-; CHECK: @[[_ZL14BUTTONINITDATA_0_0:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr global i8* null, align 4
+; Choose the preferred alignment.
+
+; CHECK: @[[_ZL14BUTTONINITDATA_0_0:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr global i8* null, align 16
 ;.
 define internal void @__cxx_global_var_init() section "__TEXT,__StaticInit,regular,pure_instructions" {
   %1 = load i8*, i8** @"\01L_OBJC_SELECTOR_REFERENCES_41", !invariant.load !2009
   store i8* %1, i8** getelementptr inbounds ([1 x %struct.ButtonInitData], [1 x %struct.ButtonInitData]* @_ZL14buttonInitData, i32 0, i32 0, i32 0), align 4
   ret void
 }

 define internal void @_GLOBAL__I_a() section "__TEXT,__StaticInit,regular,pure_instructions" {
   call void @__cxx_global_var_init()
   ret void
 }

 declare void @test(i8*)

+; The preferred alignment is available.
+
 define void @print() {
 ; CHECK-LABEL: @print(
-; CHECK-NEXT: [[TMP1:%.*]] = load i8*, i8** @_ZL14buttonInitData.0.0, align 4
+; CHECK-NEXT: [[TMP1:%.*]] = load i8*, i8** @_ZL14buttonInitData.0.0, align 16
 ; CHECK-NEXT: call void @test(i8* [[TMP1]])
 ; CHECK-NEXT: ret void
 ;
   %1 = load i8*, i8** getelementptr inbounds ([1 x %struct.ButtonInitData], [1 x %struct.ButtonInitData]* @_ZL14buttonInitData, i32 0, i32 0, i32 0), align 4
   call void @test(i8* %1)
   ret void
 }

 !2009 = !{}

llvm/test/Transforms/GlobalOpt/globalsra-align.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --check-globals
 ; RUN: opt < %s -globalopt -S | FileCheck %s

 target datalayout = "p:16:32:64" ; 16-bit pointers with 32-bit ABI alignment and 64-bit preferred alignmentt

 @a = internal externally_initialized global [3 x [7 x i32*]] zeroinitializer, align 16

-; FIXME:
 ; PR50253
-; The store alignments are correct initially, but they should be updated
-; after transforming the global. The global pointer retains its original
-; "align 16", so access to element N into the new array should be offset
-; by the ABI alignment of N pointers.
+; The alignments are correct initially, but they should be updated
+; after transforming the global. The stored global pointer array retains
+; its original "align 16", so access to element N into the new array
+; should be offset by the ABI alignment of N pointers.
+; Loaded globals are split into individual pointers and use the
+; preferred alignment from the datalayout.

 ;.
 ; CHECK: @[[A_1:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr externally_initialized global [7 x i32*] zeroinitializer, align 16
 ; CHECK: @[[A_2_0:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr externally_initialized global i32* null, align 8
 ; CHECK: @[[A_2_1:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr externally_initialized global i32* null, align 8
 ; CHECK: @[[A_2_2:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr externally_initialized global i32* null, align 8
 ; CHECK: @[[A_2_3:[a-zA-Z0-9_$"\\.-]+]] = internal unnamed_addr externally_initialized global i32* null, align 8
 ;.
 define i32* @reduce_align_0() {
 ; CHECK-LABEL: @reduce_align_0(
 ; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.0, align 8
-; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 0), align 4
+; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 0), align 16
 ; CHECK-NEXT: ret i32* [[X]]
 ;
   %x = load i32*, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 2, i64 0), align 8
   store i32* null, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 1, i64 0), align 4
   ret i32* %x
 }

 define i32* @reduce_align_1() {
 ; CHECK-LABEL: @reduce_align_1(
-; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.1, align 4
-; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 1), align 16
+; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.1, align 8
+; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 1), align 4
 ; CHECK-NEXT: ret i32* [[X]]
 ;
   %x = load i32*, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 2, i64 1), align 4
   store i32* null, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 1, i64 1), align 16
   ret i32* %x
 }

 define i32* @reduce_align_2() {
 ; CHECK-LABEL: @reduce_align_2(
-; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.2, align 16
-; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 2), align 4
+; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.2, align 8
+; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 2), align 8
 ; CHECK-NEXT: ret i32* [[X]]
 ;
   %x = load i32*, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 2, i64 2), align 16
   store i32* null, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 1, i64 2), align 4
   ret i32* %x
 }

 define i32* @reduce_align_3() {
 ; CHECK-LABEL: @reduce_align_3(
-; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.3, align 4
-; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 3), align 8
+; CHECK-NEXT: [[X:%.*]] = load i32*, i32** @a.2.3, align 8
+; CHECK-NEXT: store i32* null, i32** getelementptr inbounds ([7 x i32*], [7 x i32*]* @a.1, i32 0, i64 3), align 4
 ; CHECK-NEXT: ret i32* [[X]]
 ;
   %x = load i32*, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 2, i64 3), align 4
   store i32* null, i32** getelementptr inbounds ([3 x [7 x i32*]], [3 x [7 x i32*]]* @a, i64 0, i64 1, i64 3), align 8
   ret i32* %x
 }
