This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Prevent memcpy generation for small data size
Accepted · Public

Authored by hiraditya on Jul 5 2017, 2:22 PM.

Details

Summary

The InstCombine pass converts @llvm.memcpy either to a memcpy call or to a load/store pair, depending on whether the data size is greater than 8 bytes.
However, this fixed 8-byte threshold is not appropriate for every target.
Hence, we query the target datalayout to find the threshold.

Worked in collaboration with Aditya Kumar
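
For illustration, here is a minimal sketch of the check described in the Summary (a hypothetical helper, not the actual InstCombineCalls.cpp code; the exact behavior is debated at length in the review below):

#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/Support/MathExtras.h"
using namespace llvm;

// Sketch only: decide whether a constant-size memcpy intrinsic should be
// expanded to a single load/store pair instead of being left as a call.
static bool shouldExpandToLoadStore(MemTransferInst &MI, const DataLayout &DL) {
  auto *LenC = dyn_cast<ConstantInt>(MI.getLength());
  if (!LenC)
    return false;                        // Only constant-size copies qualify.
  uint64_t Size = LenC->getZExtValue();  // Copy length, in bytes.
  // Largest legal integer type in bits; 0 if the datalayout has no "n" spec.
  unsigned LargestIntBits = DL.getLargestLegalIntTypeSizeInBits();
  if (LargestIntBits == 0)
    return false;                        // No legal integer types: bail out.
  // Expand only if the copy is a power of two and fits in one legal integer
  // register (note the bytes-to-bits conversion).
  return isPowerOf2_64(Size) && Size * 8 <= LargestIntBits;
}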

Diff Detail

Event Timeline

DIVYA created this revision.Jul 5 2017, 2:22 PM
DIVYA updated this revision to Diff 105339.Jul 5 2017, 2:29 PM

Updated array2.ll testcase file

spatel added inline comments.Jul 6 2017, 11:29 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
195–204

This code comment isn't correct any more.

199–200

Why 32? I'm not sure what it means if there are no legal types in the DL. Please add a code comment to explain.

test/Transforms/InstCombine/builtin_memcpy_pattern.ll
1 ↗(On Diff #105339)
  1. The test should be limited to -instcombine, not -O2.
  2. The tests here have way too much extraneous stuff. Please minimize to just the IR necessary to show the transform.
  3. You can check the tests in ahead of this patch to show the current IR, then this patch will just show the diffs from the existing code.
  4. Please use the script at utils/update_test_checks.py to auto-generate the complete CHECK lines.
hiraditya added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
199–200

Why 32?

32 because we want to default to 8, same as the previous behavior before this patch.

I'm not sure what it means if there are no legal types in the DL. Please add a code comment to explain.

For example, in test cases when the target datalayout is not specified, DL.getLargestLegalIntTypeSizeInBits() returns 0.
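
A tiny standalone illustration of that behavior (the datalayout strings below are made up for this example, not taken from the tests in this review):

#include "llvm/IR/DataLayout.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

int main() {
  // The "n8:16:32:64" component declares the native (legal) integer widths.
  DataLayout WithNativeInts("e-m:e-i64:64-n8:16:32:64");
  // Without an "n" spec there are no legal integer types at all.
  DataLayout NoNativeInts("e-m:e-i64:64");
  outs() << WithNativeInts.getLargestLegalIntTypeSizeInBits() << "\n"; // 64
  outs() << NoNativeInts.getLargestLegalIntTypeSizeInBits() << "\n";   // 0
  return 0;
}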

DIVYA updated this revision to Diff 105643.EditedJul 7 2017, 7:36 AM
  • removed some comments
  • added testcase with different datalayout
DIVYA marked 2 inline comments as done.Jul 25 2017, 11:31 AM
spatel added inline comments.Jul 31 2017, 2:16 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
184–185

This comment is wrong now? I'm still confused about the current behavior and what this patch is changing.

Did you see my earlier comment to check in minimal tests to trunk before this patch so we have a baseline view of the current behavior? Do the tests really need loops, attributes, globals, etc?

If there are no legal types in the DL, why wouldn't we just bail out rather than assuming that 32-bit and smaller is safe/desirable to transform?

DIVYA updated this revision to Diff 109978.Aug 7 2017, 6:48 AM
  • Added a diff for the output before and after applying the patch
  • Removed some comments
DIVYA updated this revision to Diff 109979.Aug 7 2017, 6:51 AM
DIVYA marked an inline comment as done.Aug 7 2017, 6:56 AM
DIVYA added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
184–185

The test contains loops to show that a builtin memcpy within a loop will also be converted to either a memcpy call or load/store operations, depending on the maximum allowed stores.

DIVYA marked an inline comment as done.Aug 7 2017, 6:57 AM
DIVYA updated this revision to Diff 109989.Aug 7 2017, 7:27 AM
DIVYA added a comment.EditedAug 9 2017, 1:29 PM

Diff for test/Transforms/InstCombine/builtin_memcpy_patterns.ll

TESTCASE1
Before applying the patch

define void @foo(i8* %a, i8* %b) local_unnamed_addr #0 {
entry:
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %a, i8* %b, i64 16, i32 1, i1 false)
  ret void
}

After Applying the patch

define void @foo(i8* %a, i8* %b) local_unnamed_addr #0 {
entry:
  %0 = bitcast i8* %b to i128*
  %1 = bitcast i8* %a to i128*
  %2 = load i128, i128* %0, align 1
  store i128 %2, i128* %1, align 1
  ret void
}

TESTCASE2

Before Applying the patch

%0 = bitcast i32* %add.ptr to i8*
%add.ptr3 = getelementptr inbounds [16 x i32], [16 x i32]* @b, i64 0, i64 %idx.ext
%1 = bitcast i32* %add.ptr3 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %0, i8* %1, i64 16, i32 4, i1 false)
%inc = add nsw i32 %i.0, 1

After Applying the patch

%0 = bitcast i32* %add.ptr3 to i128*
%1 = bitcast i32* %add.ptr to i128*
%2 = load i128, i128* %0, align 16
store i128 %2, i128* %1, align 4
DIVYA updated this revision to Diff 110467.Aug 9 2017, 1:33 PM
spatel edited edge metadata.Aug 10 2017, 8:51 AM

This patch as uploaded makes even less sense to me than before:

  1. There are no tests to show the diff.
  2. You've changed an assertion in an unrelated test rather than adding tests that show a functional difference?

I've checked in some extra tests where I think your patch will fire here:
https://reviews.llvm.org/rL310611

Please update this patch to include that and then explain the intent of each diff.

As I mentioned before, you can use a script to update the check lines (please don't hand-edit them):
$ path/to/update_test_checks.py --opt=path/to/build/bin/opt memcpy-to-load.ll

DIVYA updated this revision to Diff 110787.Aug 11 2017, 12:32 PM
  • Updated the patch to include rL310611 and added the diff
  • added a new testcase
spatel added inline comments.Aug 11 2017, 1:36 PM
test/Transforms/InstCombine/builtin_memcpy_patterns.ll
3–6 ↗(On Diff #110787)

I still don't understand what value these tests provide over the existing ones. These are the same llvm.memcpy calls except they are in loops? The transform doesn't depend on loops from what I can tell, so why do we need to test this?

test/Transforms/InstCombine/memcpy-to-load.ll
81 ↗(On Diff #110787)

That is the existing behavior, but why is that valid now that we're using the datalayout?

93 ↗(On Diff #110787)

You're mixing bytes and bits here. It makes sense to me that we would use the datalayout to drive this transform, but why are we then ignoring the datalayout and producing load + store of an illegal type (i128)?

joerg added a subscriber: joerg.Aug 12 2017, 2:18 AM

What is the advantage of expanding the memcpy intrinsic in InstCombine vs doing it later in the target-specific code?

What is the advantage of expanding the memcpy intrinsic in InstCombine vs doing it later in the target-specific code?

I can't answer for 'memcpy' directly, but probably related - I'm looking at cases where late 'memcmp' expansion causes us to miss optimizations in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13

instcombine is definitely not the IR pass for 'memcmp' expansion. (It's complicated enough to be its own pass, but currently it's a pass-within-a-pass of CGP.)

We could make the argument that the post-IR backend should handle those cases, but I think it would require the collective powers of gvn, cse, instcombine, and simplifycfg.

joerg added a comment.Aug 12 2017, 7:18 AM

@spatel: I don't see a reason why we can't (or shouldn't) try to do common-prefix elimination for the memcmp intrinsic. It certainly seems to be better to me to preserve the intrinsics in your case as they should be easier to reason about. That's kind of my question for here too -- why does the expansion allow better code?

@spatel: I don't see a reason why we can't (or shouldn't) try to do common-prefix elimination for the memcmp intrinsic. It certainly seems to be better to me to preserve the intrinsics in your case as they should be easier to reason about. That's kind of my question for here too -- why does the expansion allow better code?

Thanks, @joerg . I can't see any reason not to do prefix elimination either. I'm not sure if that will solve everything for that case, but it should make it better. I'll give that a try.

DIVYA updated this revision to Diff 110982.Aug 14 2017, 8:42 AM
DIVYA marked an inline comment as done.
DIVYA added inline comments.Aug 14 2017, 8:44 AM
test/Transforms/InstCombine/memcpy-to-load.ll
81 ↗(On Diff #110787)

In test cases where the target datalayout is not specified, DL.getLargestLegalIntTypeSizeInBits() returns 0. For those cases, the behavior from before the patch is preserved.

DIVYA updated this revision to Diff 111029.Aug 14 2017, 10:27 AM
craig.topper added inline comments.Sep 16 2018, 6:00 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
195–204

Is there an extra blank line here?

199–200

But the previous behavior allowed 8 bytes so shouldn't it be 64?

hiraditya added inline comments.Sep 16 2018, 11:54 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
195–204

We'll fix this.

199–200

It's been a while since we submitted the patch, but I think you're right unless @DIVYA has some comments. We'll make the changes.

I would quickly like to check the status of this patch: do you have plans to continue this work? If not, I would like to pick it up.

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.

I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.

I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?

Agreed, I'll remove that part then. That was only to make other testcases happy by preserving existing behavior. Thanks

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.

I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?

Agreed, I'll remove that part then. That was only to make other testcases happy by preserving existing behavior. Thanks

Presumably those testcases should simply be updated to contain the expected datalayout?

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.

I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?

Agreed, I'll remove that part then. That was only to make other testcases happy by preserving existing behavior. Thanks

Presumably those testcases should simply be updated to contain the expected datalayout?

I think so...

hiraditya commandeered this revision.Sep 30 2018, 10:15 AM
hiraditya added a reviewer: DIVYA.

To fix tests and comments from reviewers.

Fix some unit-tests, a few more remaining.

lebedev.ri added inline comments.Sep 30 2018, 10:19 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
133

You can check that the Size is a power of two here.

139–140

I think we decided that the tests should be updated instead?

142

You overrode LargestInt already, it can't be 0.
This should only be Size > LargestInt.

Fixed all testcases, and removed memcpy inlining when target datalayout is not present.

Moved the check of Size before checking the LargestInt.

hiraditya marked 3 inline comments as done.Sep 30 2018, 12:28 PM
hiraditya added reviewers: sebpop, SirishP.

Code looks good.

test/Transforms/InstCombine/element-atomic-memintrins.ll
101 ↗(On Diff #167660)

Please just use utils/update_test_checks.py, as stated in the first line of the test.

test/Transforms/InstCombine/memcpy-to-load.ll
76 ↗(On Diff #167660)

Will running utils/update_test_checks.py preserve these 'inline' comments?

lebedev.ri added inline comments.Sep 30 2018, 1:45 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
133

Actually, please use !isPowerOf2_64(Size).

hiraditya updated this revision to Diff 167673.Oct 1 2018, 12:00 AM

Updated testcases with utils/update_test_checks.py
use isPowerOf2_64

hiraditya marked 2 inline comments as done.Oct 1 2018, 12:01 AM
dmgreen added inline comments.Oct 1 2018, 6:26 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
140

Are the units of these the same? Or is one bits and the other bytes?

jfb added a comment.Oct 1 2018, 10:44 AM

I don't understand why it makes sense to tie this to the largest legal integer type. I understand that targets might want what you're doing, but tying it to the largest integer type instead of having a separate target-specific value seems odd.
How does this interact with memcpyopt?
Which targets are affected by this change?
Can you please provide size and performance numbers for relevant targets?

lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Why does this make sense? I don't understand why we'd want to tie this to the largest legal integer type, and not have it be its own target parameter.

test/DebugInfo/X86/array2.ll
21

Why change this?

test/Transforms/InstCombine/2007-10-10-EliminateMemCpy.ll
3 ↗(On Diff #167673)

Why change this?

test/Transforms/InstCombine/alloca.ll
147 ↗(On Diff #167673)

This test checks that the inalloca remains. Change the test to still test the same thing, don't just delete the CHECK.
https://bugs.llvm.org/show_bug.cgi?id=19569

jfb added a subscriber: MatzeB.Oct 1 2018, 10:44 AM
hiraditya added inline comments.Oct 2 2018, 7:17 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

We used the 'largest legal integer size' because that is guaranteed to fit in a register. Adding a target-specific parameter seems reasonable, or maybe using TargetLowering.h:getMaxStoresPerMemcpy(), which is already available.

140

Thank you!

spatel added inline comments.Oct 2 2018, 8:44 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

I'm not understanding this discussion...
This patch is trying to apply some target-based constraint to a questionable (reverse) IR canonicalization that currently just pulls a number out of the air.
The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.
But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.
Expanding memcpy for performance using target parameters belongs in a late, target-aware pass (and as mentioned, it already exists), not early in generic instcombine.

jfb added inline comments.Oct 2 2018, 9:24 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Let me try to expand a bit, and let me know if that makes sense:

IMO LargestLegalIntTypeSizeInBits doesn't make sense to use here. On ARM I might want to use paired integer load / store for memcpy, or paired Q registers. These have nothing to do with the largest legal integer type. It doesn't matter what the patch is trying to do or what is currently being done: the patch is adding something weird. It shouldn't.

lebedev.ri added inline comments.Oct 2 2018, 9:45 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

It's replacing a magic hardcoded number with a number pulled from the target datalayout.
Should this be a new field in the target datalayout?
I'm not sure we can get TargetLowering info from the backend in the middle-end.

spatel added inline comments.Oct 2 2018, 9:50 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

I agree that this is a strange transform for instcombine. We're trying to safely replace an obviously bogus magic number constraint (8) with something based on target reality.

So this is an intermediate step as I see it. The goal of this patch (and this may be counter to the original goal...) is to not expand memcpy(x, y, 8) on a 32-bit system because that could take >1 load/store pair. The final goal is to not expand memcpy at all.

IIUC, you'd like to make this transform more aggressive rather than more conservative though? Ie, a pair of registers or Q regs are always bigger than LargestLegalIntType (let me know if I'm wrong).

But that's moving us further away from the target-independent canonicalization goal of instcombine. The kind of expansion where we ask if pairs or vector regs are available should be happening in memcpyopt (if it's not, then that should be enhanced in that pass/lowering).

jfb added inline comments.Oct 2 2018, 9:55 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

All I'm saying is that replacing MagicNumber == 8 with ErroneousUsage == LargestLegalIntTypeSizeInBits isn't what we should be doing.

hiraditya added inline comments.Oct 2 2018, 10:14 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Totally agreed with @spatel. Ideally we shouldn't be inlining memcpy this early on; we should try to stay independent of target-specific behavior in instcombine. It would be ideal to not inline memcpy here at all and do it in memcpyopt with a better cost model.

The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.

Looks like we all agree on this now.

But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.

Looks like we all agree on this now.

But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.

Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.
And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

If that doesn't work though, using LargestLegalIntTypeSizeInBits still seems like a good compromise to me. We want to conservatively limit the expansion to a size/type that the target tells us is ok (can be performed with a single load/store), and that's the value that most closely matches what we have today, so we avoid regressions as we work to the goal. It's not the ideal change, but there's precedent for this sort of datalayout use in instcombine (see InstCombiner::shouldChangeType()). Adding a new specifier to the datalayout to account for things like pair ops or vectors doesn't make sense to me - that moves us away from the goal of improving the other passes and removing the expansion in instcombine.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.

Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.
And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

Just to drop a nit, here is a somewhat idiomatic pattern that is recommended to avoid breaking strict aliasing: https://godbolt.org/z/_JNwOp
If we don't do this memcpy expansion here, will we keep the memcpy until the backend?
Surely this is not good.
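
(The godbolt link isn't reproduced in this archive; the idiom being referred to is presumably the usual memcpy-based type punning, along the lines of the illustrative snippet below, which the compiler is expected to lower to a single move rather than a library call.)

#include <cstdint>
#include <cstring>

// Reinterpret the bits of a float as a uint32_t without violating strict
// aliasing rules: a fixed-size memcpy is the sanctioned way to do this.
uint32_t bit_cast_to_u32(float F) {
  static_assert(sizeof(uint32_t) == sizeof(float), "sizes must match");
  uint32_t U;
  std::memcpy(&U, &F, sizeof(U));
  return U;
}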

spatel added a comment.Oct 2 2018, 1:38 PM

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.

Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.
And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

Just to drop a nit, here is a somewhat idiomatic pattern that is recommended to avoid breaking strict aliasing: https://godbolt.org/z/_JNwOp
If we don't do this memcpy expansion here, will we keep the memcpy until the backend?
Surely this is not good.

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

jfb added a comment.Oct 2 2018, 1:50 PM

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.

Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.
And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

Just to drop a nit, here is a somewhat idiomatic pattern that is recommended to avoid breaking strict aliasing: https://godbolt.org/z/_JNwOp
If we don't do this memcpy expansion here, will we keep the memcpy until the backend?
Surely this is not good.

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

clang also performs some of this work, even at O0.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.

If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.

Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.

Makes sense.

And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

If that doesn't work though, using LargestLegalIntTypeSizeInBits still seems like a good compromise to me. We want to conservatively limit the expansion to a size/type that the target tells us is ok (can be performed with a single load/store), and that's the value that most closely matches what we have today, so we avoid regressions as we work to the goal. It's not the ideal change, but there's precedent for this sort of datalayout use in instcombine (see InstCombiner::shouldChangeType()). Adding a new specifier to the datalayout to account for things like pair ops or vectors doesn't make sense to me - that moves us away from the goal of improving the other passes and removing the expansion in instcombine.

Do you think getting this patch in is good for now? After some performance analysis, if we find that we don't need to inline memcpy here, we can remove it entirely at a later stage.

spatel added a comment.Oct 3 2018, 2:32 PM

Do you think getting this patch in is good for now? After some performance analysis, if we find that we don't need to inline memcpy here, we can remove it entirely at a later stage.

We should wait for the perf results from @SjoerdMeijer (and anyone else that is in a position to collect benchmark results?).
If the full solution (don't expand at all) is too ambitious, then I support continuing with this patch, but we need to make sure that we've answered @jfb's concerns. Ie, if the datalayout query is too distasteful, what is the alternative?

I am doing some experiments with a hack that simply comments out the call to InstCombiner::SimplifyAnyMemTransfer. I've run 3 smaller benchmarks. 2 didn't show any difference. The 3rd shows 1 tiny regression and 1 tiny improvement in a test case, and will probably not even show a difference in the geomean. I am doing this as a background task; tomorrow I will run bigger benchmarks on different platforms. But I guess we need some numbers for non-Arm platforms too.

Oops, sorry, forgot to reply, and also got distracted by a few other things. I ran one bigger benchmark and didn't see anything worth mentioning. A first preliminary conclusion: it looks like we don't miss much by not doing this lowering here. Disclaimer: I've tested only on Arm targets, definitely not on all interesting architecture combinations, and only on a handful of benchmarks.

lebedev.ri requested changes to this revision.Oct 16 2018, 8:27 AM
lebedev.ri added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
140

I think this is still broken? (And thus, any and all benchmarks so far are incorrect.)
As was previously pointed out by @dmgreen, Size is in bytes, while LargestInt is clearly in bits.
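
For concreteness, a small self-contained illustration of the unit mismatch, using the 16-byte copy from the earlier test diff (values hard-coded for the example):

#include <cstdint>

// The memcpy length is a byte count; getLargestLegalIntTypeSizeInBits()
// returns a bit count.
constexpr uint64_t SizeInBytes = 16;    // the 16-byte memcpy from the tests
constexpr unsigned LargestIntBits = 64; // e.g. a 64-bit target

// Mixing units makes the 16-byte (128-bit) copy look like it fits in a
// 64-bit register (16 <= 64), which is presumably how the i128 load/store
// in the earlier diff slipped through.
constexpr bool MixedUnits = SizeInBytes <= LargestIntBits;    // true
// Comparing in a single unit rejects it: 128 <= 64 is false.
constexpr bool SameUnits = SizeInBytes * 8 <= LargestIntBits; // false

static_assert(MixedUnits && !SameUnits, "bytes vs. bits matters");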

This revision now requires changes to proceed.Oct 16 2018, 8:27 AM

Fix testcases and address comments

hiraditya marked 2 inline comments as done.Oct 21 2018, 12:37 PM
lebedev.ri resigned from this revision.Oct 30 2018, 3:23 PM

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

lib/Transforms/InstCombine/InstCombineCalls.cpp
199

Nit: spaces around /.

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?
  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?
jfb added a comment.Nov 1 2018, 10:44 AM

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

  1. If not, there are unanswered comments about the test diffs.
  2. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  3. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

In D35035#1284164, @jfb wrote:

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

I think the intent of this patch is to do a best effort without relying heavily on target-specific constants. Would it help to have a per-target constant only for mem* functions? Could that constant be used to drive other optimizations?

  1. If not, there are unanswered comments about the test diffs.
  2. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  3. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

jfb added a comment.Nov 13 2018, 9:57 AM
In D35035#1284164, @jfb wrote:

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

I think the intent of this patch is to do a best effort without relying heavily on target-specific constants. Would it help to have a per-target constant only for mem* functions? Could that constant be used to drive other optimizations?

Yes, please add a constant which informs this optimization. Don't reuse an unrelated constant for this purpose. Don't use this new constant to drive other unrelated optimizations.

  1. If not, there are unanswered comments about the test diffs.
  2. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  3. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

FYI: @chandlerc mentioned to me (in the context of https://bugs.llvm.org/show_bug.cgi?id=39780 ) that SROA expects instcombine to handle memcpys and promote them to registers.

lib/Transforms/InstCombine/InstCombineCalls.cpp
139

All I'm saying is that replacing MagicNumber == 8 with ErroneousUsage == LargestLegalIntTypeSizeInBits isn't what we should be doing.

@jfb can you elaborate why we shouldn't do that?

I'm not sure I understand why you are qualifying this as ErroneousUsage.
I can see that it would be suboptimal in some cases (and ideally it would come from TTI), but that is not enough to make it erroneous. I see a rationale for considering that if I have a legal integer, I can likely load/store it with a single pair of instructions. So it seems to me like a strict improvement over the existing "magic number".

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

FYI: @chandlerc mentioned to me (in the context of https://bugs.llvm.org/show_bug.cgi?id=39780 ) that SROA expects instcombine to handle memcpys and promote them to registers.

Is it possible to improve SROA to understand llvm.memcpy instead? IIUC, the memcpy intrinsic is considered canonical form, and the consensus in this review was that we would not expand memcpy in instcombine if that did not cause regressions.

D54887 proposes a change similar to this one, but it's more liberal. It would allow expansion to load/store of weird (non-power-of-2) types as long as they fit in a single integer register. That seems dangerous given that the backend isn't very good at handling those illegal types. Example:

target datalayout = "e-p:64:64:64-p1:32:32:32-p2:16:16:16-n8:16:32:64"
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i1) nounwind

define void @copy_3_bytes(i8* %d, i8* %s) {
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %d, i8* %s, i32 3, i1 false)
  ret void
}
$ opt -instcombine -S memcpy.ll
...
define void @copy_3_bytes(i8* %d, i8* %s) {
  %1 = bitcast i8* %s to i24*
  %2 = bitcast i8* %d to i24*
  %3 = load i24, i24* %1, align 1
  store i24 %3, i24* %2, align 1
  ret void
}
jfb added inline comments.Nov 26 2018, 10:49 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Because we should be able to tune this magic number over time, per target, and using LargestLegalIntTypeSizeInBits doesn't allow us to do so. It's trivial to add this new magic number, and default it to LargestLegalIntTypeSizeInBits in the generic target setup. Let's just do that, and then the code is even self-documenting.

alnemr55 accepted this revision.Jan 4 2019, 2:38 AM
This revision is now accepted and ready to land.Jan 4 2019, 2:38 AM
lebedev.ri requested changes to this revision.Jan 4 2019, 3:05 AM

The data layout question wasn't resolved. (Just signalling; I don't have a preference one way or another.)

This revision now requires changes to proceed.Jan 4 2019, 3:05 AM

Thanks Roman.

@alnemr55 - Why did you mark this as accepted? Maybe it was an accidental click? If not, please don't mark things as accepted unless you've been actively participating in the review and the review has genuinely concluded.

ychen added a subscriber: ychen.Jun 20 2019, 4:05 PM
Herald added a project: Restricted Project.Jun 20 2019, 4:05 PM
lebedev.ri resigned from this revision.Jan 12 2023, 4:41 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

This revision is now accepted and ready to land.Jan 12 2023, 4:41 PM
Herald added a project: Restricted Project.Jan 12 2023, 4:41 PM
Herald added a subscriber: StephenFan.