This is an archive of the discontinued LLVM Phabricator instance.


Differential D125009

[ArgPromotion] Use common alignment for loads in caller
Abandoned Public

Authored by psamolysov on May 5 2022, 6:01 AM.


Details

Reviewers

nikic
aeubanks
jdoerfert

Summary

In byval promotion, the 'load' instructions generated in the callers
use the common alignment of the promoted byval argument's alignment
and the corresponding offset:

define internal void @callee_load_first_element(%struct.ss* byval(%struct.ss) align 16 %b) {
  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  ...
}

will be optimized into:

define internal void @callee_load_first_element(i32 %b.0.val, i64 %b.1.val) {
  ...
}


define i32 @caller_load_first_element() {
  %S = alloca %struct.ss, align 16
...
  %S.0 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
  %S.0.val = load i32, i32* %S.0, align 16
  %S.1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
  %S.1.val = load i64, i64* %S.1, align 4
  call void @callee_load_first_element(i32 %S.0.val, i64 %S.1.val)
...
}

(So '%S.0.val' has 'align 16', the same alignment as the original
'%struct.ss' argument.)

But the "usual" promotion doesn't follow this rule: the corresponding
'load' instruction generated in the caller gets just the maximum
alignment of all the loads of that part of the argument in the callee,
and any alignment specified on the argument itself is ignored:

define internal void @callee_load_from_aligned(%struct.ss* align 16 %b) {
  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
  %temp1 = load i32, i32* %temp, align 4
  %temp2 = add i32 %temp1, 1
  %temp3 = load i32, i32* %temp, align 8
  ret void
}

will be optimized into:

define internal void @callee_load_from_aligned(i32 %b.0.val) {
  %temp2 = add i32 %b.0.val, 1
  ret void
}

define i32 @caller_load_from_aligned() {
  %S = alloca %struct.ss, align 16
...
  %1 = getelementptr %struct.ss, %struct.ss* %S, i64 0, i32 0
  %S.val = load i32, i32* %1, align 8
  call void @callee_load_from_aligned(i32 %S.val)
...
}

(So '%S.val' has 'align 8', while the pointer, argument '%b' in the
callee, is aligned to 16.)

The intent of the patch is to align the behavior of both promotion
schemes: byval and non-byval. One difference remains, however: if a
load has a larger 'align' value than the argument has, the non-byval
promotion will use that alignment while the byval one won't.
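
As a rough illustration of the two rules described in this summary, the following standalone C++ sketch computes the alignment each scheme would put on the caller's load. It does not use LLVM. `commonAlign`, `byvalLoadAlign`, and `nonByvalLoadAlign` are hypothetical helper names, and the byte offset of 4 for the i64 field is an assumption inferred from the 'align 4' on '%S.1.val'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: largest power of two that divides both `align`
// (itself a power of two) and `offset`. An offset of 0 keeps `align`.
uint64_t commonAlign(uint64_t align, uint64_t offset) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "align must be a power of two");
  if (offset == 0)
    return align;
  uint64_t lowestSetBit = offset & (~offset + 1); // lowest set bit of offset
  return std::min(align, lowestSetBit);
}

// byval rule from the summary: combine the argument's alignment with the
// byte offset of the promoted part.
uint64_t byvalLoadAlign(uint64_t argAlign, uint64_t partOffset) {
  return commonAlign(argAlign, partOffset);
}

// non-byval rule from the summary: take the maximum alignment over the
// callee's loads of that part; the argument's own alignment is ignored.
uint64_t nonByvalLoadAlign(std::initializer_list<uint64_t> calleeLoadAligns) {
  uint64_t result = 1;
  for (uint64_t a : calleeLoadAligns)
    result = std::max(result, a);
  return result;
}
```

With these definitions, `byvalLoadAlign(16, 0)` yields 16 and `byvalLoadAlign(16, 4)` yields 4, matching '%S.0.val' and '%S.1.val', while `nonByvalLoadAlign({4, 8})` yields 8, matching '%S.val'.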

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

psamolysov created this revision. May 5 2022, 6:01 AM

Herald added a project: Restricted Project. May 5 2022, 6:01 AM

Herald added subscribers: ormris, hiraditya.

psamolysov requested review of this revision. May 5 2022, 6:01 AM

Herald added a subscriber: llvm-commits. May 5 2022, 6:01 AM

I've opened this review to make the byval and non-byval promotion schemes behave more consistently before replacing the byval scheme with a tweaked non-byval one (with stores allowed). I don't think adding more ifs to the source code, to check whether a promotion comes from byval and to emulate each scheme's old alignment behavior, is a good idea; it would make the code harder to follow. I understand that non-byval pointer arguments carrying the align attribute are uncommon; I've actually seen only a few cases, all in the align.ll LIT test (@caller_guaranteed_aligned_1(i1 %c, i32* align 16 dereferenceable(4) %p), for example).

Harbormaster completed remote builds in B162898: Diff 427291. May 5 2022, 6:55 AM

nikic added inline comments. May 6 2022, 2:40 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521	Isn't this incorrect for non-zero offsets? Overall, I'm not sure I really understand what you're trying to achieve here. What we're interested in here is whether loads are speculatable, which is the case when they are dereferenceable and aligned. This can be either because the load is guaranteed to execute, or because we have known dereferenceability/alignment knowledge. Byval deref/align will be taken into account for the latter check (allCallersPassValidPointerForArgument).
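
To make the non-zero offset concern concrete: the patch computes `PartAlign` as `std::max(LI->getAlign(), ArgAlign)` without looking at `Off`. The standalone sketch below (plain C++, no LLVM) checks what alignment an address at offset 4 from a 16-aligned base actually has. `isAlignedTo` and `partAddress` are hypothetical names, and the base address 0x1000 is an arbitrary 16-aligned value chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// True if `addr` is a multiple of `align` (`align` must be a power of two).
bool isAlignedTo(uint64_t addr, uint64_t align) {
  assert(align != 0 && (align & (align - 1)) == 0 &&
         "align must be a power of two");
  return (addr & (align - 1)) == 0;
}

// Address of an argument part located `off` bytes past the base pointer.
uint64_t partAddress(uint64_t base, uint64_t off) {
  return base + off;
}
```

Here `isAlignedTo(partAddress(0x1000, 4), 16)` is false while `isAlignedTo(partAddress(0x1000, 4), 4)` is true, so a load at offset 4 cannot be assumed to have the argument's `align 16`.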

psamolysov added inline comments. May 6 2022, 3:01 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521	@nikic Yes, you are absolutely right, this is incorrect for non-zero offsets. I'm going to fix it ASAP. Initially, my intent was to copy the byval promotion behavior, where the alignment of the byval argument is taken into account: if the argument is aligned to 32, for example, we should not use a lower alignment when loading the part at offset zero. And because the byval argument is just a copy of the original value from the caller, it's very easy to get its alignment: just use the value from the corresponding argument definition in the callee. But after rethinking, I see that for non-byval pointers we should take into account not the argument alignment from the callee definition (as I did here) but the actual alignment in the caller, where the new load instructions are generated. It could be considered a frontend error, of course, but if the callee's loads of the part at offset zero have a lower alignment than the one the whole structure was allocated with in the caller, and we use this lower alignment in the newly generated load instructions, it might be a source of errors in the compiled application. What do you think?

nikic added inline comments. May 6 2022, 4:01 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521	I think you're mixing up two things here: Legality and optimization. I believe the current code fully handles the legality aspect for both byval and non-byval arguments. If I understand right, your proposed change here is related to optimization: If you have an `align 32` argument that is used in an `align 4` load, you want to make the promoted load use `align 32` instead. Doing that would be fine, but I also think that this is not really the place to do it: We already have transforms that will increase load/store alignment based on known alignment (e.g. from parameter align attributes). For example, InstCombine does this (see getOrEnforceKnownAlignment). I think the alignment handling code here is already tricky enough without also trying to optimize the alignments at the same time.
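
The distinction between legality and optimization can be sketched with plain integer arithmetic. This is not the InstCombine implementation, only a simplified model under the assumption that the base pointer's alignment and the part's byte offset are both known. `alignAtOffset` and `improvedLoadAlign` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Alignment that is provably known for `base + offset` when `base` is
// aligned to `baseAlign` (a power of two).
uint64_t alignAtOffset(uint64_t baseAlign, uint64_t offset) {
  assert(baseAlign != 0 && (baseAlign & (baseAlign - 1)) == 0 &&
         "baseAlign must be a power of two");
  if (offset == 0)
    return baseAlign;
  uint64_t lowestSetBit = offset & (~offset + 1); // lowest set bit of offset
  return std::min(baseAlign, lowestSetBit);
}

// Optimization only: the alignment already stated on the load stays valid,
// so the result is never lower than `statedAlign`; it is raised only when
// a larger alignment is provably known for the accessed address.
uint64_t improvedLoadAlign(uint64_t statedAlign, uint64_t baseAlign,
                           uint64_t offset) {
  return std::max(statedAlign, alignAtOffset(baseAlign, offset));
}
```

For an `align 32` argument and an `align 4` load at offset 0, `improvedLoadAlign(4, 32, 0)` gives 32; at offset 4 it stays at 4. In both cases the load's original `align 4` remains a valid choice, which is the legality side of the argument.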

psamolysov added inline comments. May 6 2022, 4:27 AM

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
521	@nikic Thank you very much for the good explanation. My concern was about legality: I thought it was illegal to use `align 4` loads for `align 32` arguments. If we replace the `byval` promotion approach with the currently used `non-byval` one as is, the `non-byval` rules will also apply to the alignments of the generated `load` instructions: only the maximum `align` of the `load` instructions present in the callee will be used, and the `align` of the `byval` argument will simply be ignored. For example, in the `byval.ll` test, function `@g`:

define internal void @g(%struct.ss* byval(%struct.ss) align 32 %b) nounwind {
  ...
}

after promotion by the `non-byval` scheme, the caller becomes:

define i32 @main() nounwind {
  %S = alloca %struct.ss, align 32
  ...
  %S01 = getelementptr %struct.ss, %struct.ss* , i64 0, i32 0
  %S01_VAL = load i32, i32* %S01, align 4
  call void @g(i32 %S01_VAL)
  ...
}

Will this `%S01_VAL = load i32, i32* %S01, align 4`, rather than the current `align 32`, be correct?

jdoerfert edited the summary of this revision. May 6 2022, 9:06 AM

I thought this is illegal to use align 4 loads for align 32 arguments.

It is not. Any lower alignment (for the load) is perfectly legal.

In D125009#3496960, @jdoerfert wrote:

I thought this is illegal to use align 4 loads for align 32 arguments.

It is not. Any lower alignment (for the load) is perfectly legal.

@jdoerfert thank you for the answer.
If so, these changes serve no purpose and the revision can be abandoned. From an optimization perspective, there is still an opportunity to use the alignment of the argument's allocation for the promoted 'load' instructions, provided this doesn't make the pass's code significantly harder to follow; but as @nikic has already mentioned, this pass does not look like the right place to do it.

psamolysov abandoned this revision. May 11 2022, 6:50 AM

Revision Contents

Path                                               Size
llvm/lib/Transforms/IPO/ArgumentPromotion.cpp      12 lines
llvm/test/Transforms/ArgumentPromotion/align.ll    116 lines

Diff 427291

llvm/lib/Transforms/IPO/ArgumentPromotion.cpp

[... 511 lines not shown ...]
     if (Size.isScalable())
       return false;

     // If this is a recursive function and one of the types is a pointer,
     // then promoting it might lead to recursive promotion.
     if (IsRecursive && Ty->isPointerTy())
       return false;

     int64_t Off = Offset.getSExtValue();
+    Align ArgAlign = Arg->getParamAlign().valueOrOne();
+    Align PartAlign = std::max(LI->getAlign(), ArgAlign);
     auto Pair = ArgParts.try_emplace(
-        Off, ArgPart{Ty, LI->getAlign(), GuaranteedToExecute ? LI : nullptr});
+        Off, ArgPart{Ty, PartAlign, GuaranteedToExecute ? LI : nullptr});
     ArgPart &Part = Pair.first->second;
     bool OffsetNotSeenBefore = Pair.second;

     // We limit promotion to only promoting up to a fixed number of elements of
     // the aggregate.
     if (MaxElements > 0 && ArgParts.size() > MaxElements) {
       LLVM_DEBUG(dbgs() << "ArgPromotion of " << *Arg << " failed: "
                         << "more than " << MaxElements << " parts\n");
[... 10 lines not shown; enclosing context: auto HandleLoad = [&](LoadInst *LI, ]

     // If this load is not guaranteed to execute, and we haven't seen a load at
     // this offset before (or it had lower alignment), then we need to remember
     // that requirement.
     // Note that skipping loads of previously seen offsets is only correct
     // because we only allow a single type for a given offset, which also means
     // that the number of accessed bytes will be the same.
     if (!GuaranteedToExecute &&
-        (OffsetNotSeenBefore || Part.Alignment < LI->getAlign())) {
+        (OffsetNotSeenBefore || Part.Alignment < PartAlign)) {
       // We won't be able to prove dereferenceability for negative offsets.
       if (Off < 0)
         return false;

       // If the offset is not aligned, an aligned base pointer won't help.
-      if (!isAligned(LI->getAlign(), Off))
+      if (!isAligned(PartAlign, Off))
         return false;

       NeededDerefBytes = std::max(NeededDerefBytes, Off + Size.getFixedValue());
-      NeededAlign = std::max(NeededAlign, LI->getAlign());
+      NeededAlign = std::max(NeededAlign, PartAlign);
     }

-    Part.Alignment = std::max(Part.Alignment, LI->getAlign());
+    Part.Alignment = std::max(Part.Alignment, PartAlign);
     return true;
   };

   // Look for loads that are guaranteed to execute on entry.
   for (Instruction &I : Arg->getParent()->getEntryBlock()) {
     if (LoadInst *LI = dyn_cast<LoadInst>(&I))
       if (Optional<bool> Res = HandleLoad(LI, /* GuaranteedToExecute */ true))
         if (!*Res)
[... 510 lines not shown ...]

llvm/test/Transforms/ArgumentPromotion/align.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature --scrub-attributes
 ; RUN: opt -S -argpromotion < %s | FileCheck %s

+%struct.ss = type { i32, i64 }
+
 define internal i32 @callee_must_exec(i32* %p) {
 ; CHECK-LABEL: define {{[^@]+}}@callee_must_exec
 ; CHECK-SAME: (i32 [[P_0_VAL:%.*]]) {
 ; CHECK-NEXT: ret i32 [[P_0_VAL]]
 ;
   %x = load i32, i32* %p, align 16
   ret i32 %x
 }
[... 95 lines not shown ...]
 ; CHECK-LABEL: define {{[^@]+}}@caller_not_guaranteed_aligned
 ; CHECK-SAME: (i1 [[C:%.*]], i32* dereferenceable(4) [[P:%.*]]) {
 ; CHECK-NEXT: [[TMP1:%.*]] = call i32 @callee_not_guaranteed_aligned(i1 [[C]], i32* [[P]])
 ; CHECK-NEXT: ret void
 ;
   call i32 @callee_not_guaranteed_aligned(i1 %c, i32* %p)
   ret void
 }
+
+define internal void @callee_load_from_aligned_1(%struct.ss* align 16 %b) {
+; CHECK-LABEL: define {{[^@]+}}@callee_load_from_aligned_1
+; CHECK-SAME: (i32 [[B_0:%.*]]) {
+; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[B_0]], 1
+; CHECK-NEXT: ret void
+;
+  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
+  %temp1 = load i32, i32* %temp, align 4
+  %temp2 = add i32 %temp1, 1
+  %temp3 = load i32, i32* %temp, align 8
+  ret void
+}
+
+define internal void @callee_load_from_aligned_2(%struct.ss* align 16 %b) {
+; CHECK-LABEL: define {{[^@]+}}@callee_load_from_aligned_2
+; CHECK-SAME: (i32 [[B_0:%.*]]) {
+; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[B_0]], 1
+; CHECK-NEXT: ret void
+;
+  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
+  %temp1 = load i32, i32* %temp, align 4
+  %temp2 = add i32 %temp1, 1
+  %temp3 = load i32, i32* %temp, align 32
+  ret void
+}
+
+define internal void @callee_load_from_aligned_3(%struct.ss* %b) {
+; CHECK-LABEL: define {{[^@]+}}@callee_load_from_aligned_3
+; CHECK-SAME: (i32 [[B_0:%.*]]) {
+; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[B_0]], 1
+; CHECK-NEXT: ret void
+;
+  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
+  %temp1 = load i32, i32* %temp, align 4
+  %temp2 = add i32 %temp1, 1
+  %temp3 = load i32, i32* %temp, align 8
+  ret void
+}
+
+define i32 @caller_load_from_aligned() {
+; CHECK-LABEL: define {{[^@]+}}@caller_load_from_aligned
+; CHECK-SAME: () {
+; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 16
+; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0
+; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 16
+; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
+; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4
+; CHECK-NEXT: [[S_1_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
+; CHECK-NEXT: [[S_1_0_VAL:%.*]] = load i32, i32* [[S_1_0]], align 16
+; CHECK-NEXT: call void @callee_load_from_aligned_1(i32 [[S_1_0_VAL]])
+; CHECK-NEXT: [[S_2_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
+; CHECK-NEXT: [[S_2_0_VAL:%.*]] = load i32, i32* [[S_2_0]], align 32
+; CHECK-NEXT: call void @callee_load_from_aligned_2(i32 [[S_2_0_VAL]])
+; CHECK-NEXT: [[S_3_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i64 0, i32 0
+; CHECK-NEXT: [[S_3_0_VAL:%.*]] = load i32, i32* [[S_3_0]], align 8
+; CHECK-NEXT: call void @callee_load_from_aligned_3(i32 [[S_3_0_VAL]])
+; CHECK-NEXT: ret i32 0
+;
+  %S = alloca %struct.ss, align 16
+  %temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
+  store i32 1, i32* %temp1, align 16
+  %temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
+  store i64 2, i64* %temp4, align 4
+  call void @callee_load_from_aligned_1(%struct.ss* %S)
+  call void @callee_load_from_aligned_2(%struct.ss* %S)
+  call void @callee_load_from_aligned_3(%struct.ss* %S)
+  ret i32 0
+}
+
+define internal void @callee_load_first_element(%struct.ss* byval(%struct.ss) align 16 %b) {
+; CHECK-LABEL: define {{[^@]+}}@callee_load_first_element
+; CHECK-SAME: (i32 [[B_0:%.*]], i64 [[B_1:%.*]]) {
+; CHECK-NEXT: [[B:%.*]] = alloca [[STRUCT_SS:%.*]], align 16
+; CHECK-NEXT: [[DOT0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
+; CHECK-NEXT: store i32 [[B_0]], i32* [[DOT0]], align 16
+; CHECK-NEXT: [[DOT1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 1
+; CHECK-NEXT: store i64 [[B_1]], i64* [[DOT1]], align 4
+; CHECK-NEXT: [[TEMP:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[B]], i32 0, i32 0
+; CHECK-NEXT: [[TEMP1:%.*]] = load i32, i32* [[TEMP]], align 8
+; CHECK-NEXT: [[TEMP2:%.*]] = add i32 [[TEMP1]], 1
+; CHECK-NEXT: [[TEMP3:%.*]] = load i32, i32* [[TEMP]], align 32
+; CHECK-NEXT: ret void
+;
+  %temp = getelementptr %struct.ss, %struct.ss* %b, i32 0, i32 0
+  %temp1 = load i32, i32* %temp, align 8
+  %temp2 = add i32 %temp1, 1
+  %temp3 = load i32, i32* %temp, align 32
+  ret void
+}
+
+define i32 @caller_load_first_element() {
+; CHECK-LABEL: define {{[^@]+}}@caller_load_first_element
+; CHECK-SAME: () {
+; CHECK-NEXT: [[S:%.*]] = alloca [[STRUCT_SS:%.*]], align 16
+; CHECK-NEXT: [[TEMP1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0
+; CHECK-NEXT: store i32 1, i32* [[TEMP1]], align 16
+; CHECK-NEXT: [[TEMP4:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
+; CHECK-NEXT: store i64 2, i64* [[TEMP4]], align 4
+; CHECK-NEXT: [[S_0:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 0
+; CHECK-NEXT: [[S_0_VAL:%.*]] = load i32, i32* [[S_0]], align 16
+; CHECK-NEXT: [[S_1:%.*]] = getelementptr [[STRUCT_SS]], %struct.ss* [[S]], i32 0, i32 1
+; CHECK-NEXT: [[S_1_VAL:%.*]] = load i64, i64* [[S_1]], align 4
+; CHECK-NEXT: call void @callee_load_first_element(i32 [[S_0_VAL]], i64 [[S_1_VAL]])
+; CHECK-NEXT: ret i32 0
+;
+  %S = alloca %struct.ss, align 16
+  %temp1 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 0
+  store i32 1, i32* %temp1, align 16
+  %temp4 = getelementptr %struct.ss, %struct.ss* %S, i32 0, i32 1
+  store i64 2, i64* %temp4, align 4
+  call void @callee_load_first_element(%struct.ss* byval(%struct.ss) align 16 %S)
+  ret i32 0
+}