
[InstCombine] Prevent memcpy generation for small data size
Needs RevisionPublic

Authored by hiraditya on Jul 5 2017, 2:22 PM.

Details

Summary

The InstCombine pass converts @llvm.memcpy either to a call to memcpy or to a load/store pair, depending on whether the data size is greater than 8 bytes. However, the appropriate threshold differs between targets; 8 bytes may be too large for some of them. Hence, we query the target datalayout to find the threshold.

Worked in collaboration with Aditya Kumar

Diff Detail

Repository
rL LLVM

Event Timeline


What is the advantage of expanding the memcpy intrinsic in InstCombine vs doing it later in the target-specific code?

I can't answer for 'memcpy' directly, but probably related - I'm looking at cases where late 'memcmp' expansion causes us to miss optimizations in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13

instcombine is definitely not the IR pass for 'memcmp' expansion. (It's complicated enough to be its own pass, but currently it's a pass-within-a-pass of CGP.)

We could make the argument that the post-IR backend should handle those cases, but I think it would require the collective powers of gvn, cse, instcombine, and simplifycfg.

joerg added a comment.Aug 12 2017, 7:18 AM

@spatel: I don't see a reason why we can't (or shouldn't) try to do common-prefix elimination for the memcmp intrinsic. It certainly seems to be better to me to preserve the intrinsics in your case as they should be easier to reason about. That's kind of my question for here too -- why does the expansion allow better code?


Thanks, @joerg . I can't see any reason not to do prefix elimination either. I'm not sure if that will solve everything for that case, but it should make it better. I'll give that a try.

DIVYA updated this revision to Diff 110982.Aug 14 2017, 8:42 AM
DIVYA marked an inline comment as done.
DIVYA added inline comments.
test/Transforms/InstCombine/memcpy-to-load.ll
92

In test cases where the target datalayout is not specified, DL.getLargestLegalIntTypeSizeInBits() returns 0. For those cases it retains the previous behaviour from before the patch.

DIVYA updated this revision to Diff 111029.Aug 14 2017, 10:27 AM
craig.topper added inline comments.Sep 16 2018, 6:00 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
136–141

Is there an extra blank line here?

140–141

But the previous behavior allowed 8 bytes so shouldn't it be 64?

hiraditya added inline comments.Sep 16 2018, 11:54 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
136–141

We'll fix this.

140–141

It's been a while since we submitted the patch, but I think you're right, unless @DIVYA has some comments. We'll make the changes.

I would quickly like to check the status of this patch: do you have plans to continue this work? If not, I would like to pick it up.

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.


I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?


Agreed, I'll remove that part then. That was only to make other testcases happy by preserving existing behavior. Thanks


Presumably those testcases should simply be updated to contain the expected datalayout?


I think so...

hiraditya commandeered this revision.Sep 30 2018, 10:15 AM
hiraditya added a reviewer: DIVYA.

To fix tests and address comments from reviewers.

Fix some unit-tests, a few more remaining.

lebedev.ri added inline comments.Sep 30 2018, 10:19 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
133

You can check that the Size is power of two here.

139–140

I think we decided that the tests should be updated instead?

142

You overrode LargestInt already, it can't be 0.
This should only be Size > LargestInt.

Fixed all testcases, and removed memcpy inlining when target datalayout is not present.

Moved the check of Size before checking the LargestInt.

hiraditya marked 3 inline comments as done.Sep 30 2018, 12:28 PM
hiraditya added reviewers: sebpop, SirishP.

Code looks good.

test/Transforms/InstCombine/element-atomic-memintrins.ll
102

Please just use utils/update_test_checks.py, as stated in the first line of the test.

test/Transforms/InstCombine/memcpy-to-load.ll
76

Will running utils/update_test_checks.py preserve these 'inline' comments?

lebedev.ri added inline comments.Sep 30 2018, 1:45 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
133

Actually, please use !isPowerOf2_64(Size).

hiraditya updated this revision to Diff 167673.Oct 1 2018, 12:00 AM

Updated testcases with utils/update_test_checks.py
use isPowerOf2_64

hiraditya marked 2 inline comments as done.Oct 1 2018, 12:01 AM
dmgreen added inline comments.Oct 1 2018, 6:26 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
140

Are the units of these the same? Or is one bits and the other bytes?

jfb added a comment.Oct 1 2018, 10:44 AM

I don't understand why it makes sense to tie this to the larger integer type. I understand that targets might want what you're doing, but tying it to the largest integer type instead of having a separate target-specific value seems odd.
How does this interact with memcpyopt?
Which targets are affected by this change?
Can you please provide size and performance numbers for relevant targets?

lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Why does this make sense? I don't understand why we'd want to tie this to the largest legal integer type, and not have it be its own target parameter.

test/DebugInfo/X86/array2.ll
21

Why change this?

test/Transforms/InstCombine/2007-10-10-EliminateMemCpy.ll
3

Why change this?

test/Transforms/InstCombine/alloca.ll
147

This test checks that the inalloca remains. Change the test to still test the same thing, don't just delete the CHECK.
https://bugs.llvm.org/show_bug.cgi?id=19569

jfb added a subscriber: MatzeB.Oct 1 2018, 10:44 AM
hiraditya added inline comments.Oct 2 2018, 7:17 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

We used the 'largest legal integer size' because that will fit in a register for sure. I think making it a target-specific parameter seems reasonable, or maybe using getMaxStoresPerMemcpy() from TargetLowering.h, which is already available.

140

Thank you!

spatel added inline comments.Oct 2 2018, 8:44 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

I'm not understanding this discussion...
This patch is trying to apply some target-based constraint to a questionable (reverse) IR canonicalization that currently just pulls a number out of the air.
The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.
But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.
Expanding memcpy for performance using target parameters belongs in a late, target-aware pass (and as mentioned, it already exists), not early in generic instcombine.

jfb added inline comments.Oct 2 2018, 9:24 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Let me try to expand a bit, and let me know if that makes sense:

IMO LargestLegalIntTypeSizeInBits doesn't make sense to use here. On ARM I might want to use paired integer load / store for memcpy, or paired Q registers. These have nothing to do with the largest legal integer type. It doesn't matter what the patch is trying to do or what is currently being done: the patch is adding something weird. It shouldn't.

lebedev.ri added inline comments.Oct 2 2018, 9:45 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

It's replacing a magic hardcoded number with a number pulled out of the target datalayout.
Should it be a new field in the target datalayout?
I'm not sure whether we can get TargetLowering info from the backend in the middle-end.

spatel added inline comments.Oct 2 2018, 9:50 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

I agree that this is a strange transform for instcombine. We're trying to safely replace an obviously bogus magic number constraint (8) with something based on target reality.

So this is an intermediate step as I see it. The goal of this patch (and this may be counter to the original goal...) is to not expand memcpy(x, y, 8) on a 32-bit system because that could take >1 load/store pair. The final goal is to not expand memcpy at all.

IIUC, you'd like to make this transform more aggressive rather than more conservative though? Ie, a pair of registers or Q regs are always bigger than LargestLegalIntType (let me know if I'm wrong).

But that's moving us further away from the target-independent canonicalization goal of instcombine. The kind of expansion where we ask if pairs or vector regs are available should be happening in memcpyopt (if it's not, then that should be enhanced in that pass/lowering).

jfb added inline comments.Oct 2 2018, 9:55 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

All I'm saying is that replacing MagicNumber == 8 with ErroneousUsage == LargestLegalIntTypeSizeInBits isn't what we should be doing.

hiraditya added inline comments.Oct 2 2018, 10:14 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Totally agree with @spatel. Ideally we shouldn't be inlining memcpy this early; we should try to stay independent of target-specific behavior in instcombine. It would be ideal to not inline memcpy here at all and do it all in memcpyopt with a better cost model.

The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.

Looks like we all agree on this now.

But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.


If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.


Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.
And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

If that doesn't work though, using LargestLegalIntTypeSizeInBits still seems like a good compromise to me. We want to conservatively limit the expansion to a size/type that the target tells us is ok (can be performed with a single load/store), and that's the value that most closely matches what we have today, so we avoid regressions as we work to the goal. It's not the ideal change, but there's precedence for this sort of datalayout use in instcombine (see InstCombiner::shouldChangeType()). Adding a new specifier to the datalayout to account for things like pair ops or vectors doesn't make sense to me - that moves us away from the goal of improving the other passes and removing the expansion in instcombine.


Just to drop a nit: here is a somewhat idiomatic pattern that is recommended to avoid breaking strict aliasing: https://godbolt.org/z/_JNwOp
If we don't do this memcpy expansion here, will the memcpy survive until the backend?
Surely that is not good.

spatel added a comment.Oct 2 2018, 1:38 PM


That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

jfb added a comment.Oct 2 2018, 1:50 PM


clang also performs some of this work, even at O0.


Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.

Makes sense.

And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

If that doesn't work though, using LargestLegalIntTypeSizeInBits still seems like a good compromise to me. We want to conservatively limit the expansion to a size/type that the target tells us is ok (can be performed with a single load/store), and that's the value that most closely matches what we have today, so we avoid regressions as we work to the goal. It's not the ideal change, but there's precedence for this sort of datalayout use in instcombine (see InstCombiner::shouldChangeType()). Adding a new specifier to the datalayout to account for things like pair ops or vectors doesn't make sense to me - that moves us away from the goal of improving the other passes and removing the expansion in instcombine.

Do you think getting this patch in is good for now? After some performance analysis, if we find that we don't need to inline memcpy here, we can remove it entirely at a later stage.

spatel added a comment.Oct 3 2018, 2:32 PM


We should wait for the perf results from @SjoerdMeijer (and anyone else that is in a position to collect benchmark results?).
If the full solution (don't expand at all) is too ambitious, then I support continuing with this patch, but we need to make sure that we've answered @jfb's concerns. Ie, if the datalayout query is too distasteful, what is the alternative?

I am doing some experiments with a hack that simply comments out the call to InstCombiner::SimplifyAnyMemTransfer. I've run 3 smaller benchmarks; 2 didn't show any difference. The 3rd shows 1 tiny regression and 1 tiny improvement in a test case, and will probably not even show a difference in the geomean. I am doing this as a background task; tomorrow I will run bigger benchmarks on different platforms. But I guess we need some numbers for non-Arm platforms too.

Oops, sorry, forgot to reply, and also got distracted by a few other things. I ran one bigger benchmark, and didn't see anything worth mentioning. A first preliminary conclusion: looks like we don't miss much by not doing this lowering here. Disclaimer: I've tested only on Arm targets, definitely not on all interesting architecture combinations, and a handful of benchmarks.

lebedev.ri requested changes to this revision.Oct 16 2018, 8:27 AM
lebedev.ri added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
140

I think this is still broken? (and thus, any and all benchmarks thus far are incorrect.)
As it was previously pointed out by @dmgreen, Size is bytes, while LargestInt is clearly in bits.

This revision now requires changes to proceed.Oct 16 2018, 8:27 AM

Fix testcases and address comments

hiraditya marked 2 inline comments as done.Oct 21 2018, 12:37 PM
lebedev.ri resigned from this revision.Oct 30 2018, 3:23 PM

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

lib/Transforms/InstCombine/InstCombineCalls.cpp
140

Nit: spaces around /.

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?
  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?
jfb added a comment.Nov 1 2018, 10:44 AM

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

In D35035#1284164, @jfb wrote:

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

I think the intent of this patch is to make a best-effort improvement without relying heavily on target-specific constants. Would it help to have a per-target constant only for the mem* functions? Could that constant be used to drive other optimizations?

  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

jfb added a comment.Nov 13 2018, 9:57 AM
In D35035#1284164, @jfb wrote:

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

I think the intent of this patch is to do best-effort without relying a lot on target specific constants. Will it help to have per-target constant only for mem* functions? Could that constant be used to drive other optimizations?

Yes, please add a constant which informs this optimization. Don't reuse an unrelated constant for this purpose. Don't use this new constant to drive other unrelated optimizations.

  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

FYI: @chandlerc mentioned to me (in the context of https://bugs.llvm.org/show_bug.cgi?id=39780 ) that SROA expects instcombine to handle memcpy and promote them to registers.

lib/Transforms/InstCombine/InstCombineCalls.cpp
139

All I'm saying is that replacing MagicNumber == 8 with ErroneousUsage == LargestLegalIntTypeSizeInBits isn't what we should be doing.

@jfb can you elaborate why we shouldn't do that?

I'm not sure I understand why you are qualifying this as ErroneousUsage.
I can see that it would be suboptimal in some cases (and ideally it would be a TTI hook), but that is not enough to make it erroneous. I see a rationale for considering that if I have a legal integer, I can likely load/store it with a single pair of instructions. So it seems to me like a strict improvement over the existing "magic number".

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

FYI: @chandlerc mentioned to me (in the context of https://bugs.llvm.org/show_bug.cgi?id=39780 ) that SROA expects instcombine to handle memcpy and promote them to registers.

Is it possible to improve SROA to understand llvm.memcpy instead? IIUC, the memcpy intrinsic is considered canonical form, and the consensus in this review was that we would not expand memcpy in instcombine if that did not cause regressions.

D54887 is proposing a similar change to this one, but it's more liberal: it would allow expansion to loads/stores of weird (non-power-of-2) types as long as they fit in a single integer register. That seems dangerous, given that the backend isn't very good at handling those illegal types. Example:

target datalayout = "e-p:64:64:64-p1:32:32:32-p2:16:16:16-n8:16:32:64"
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i1) nounwind

define void @copy_3_bytes(i8* %d, i8* %s) {
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %d, i8* %s, i32 3, i1 false)
  ret void
}
$ opt -instcombine -S memcpy.ll
...
define void @copy_3_bytes(i8* %d, i8* %s) {
  %1 = bitcast i8* %s to i24*
  %2 = bitcast i8* %d to i24*
  %3 = load i24, i24* %1, align 1
  store i24 %3, i24* %2, align 1
  ret void
}
jfb added inline comments.Nov 26 2018, 10:49 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Because we should be able to tune this magic number over time, per target, and using LargestLegalIntTypeSizeInBits doesn't allow us to do so. It's trivial to add this new magic number, and default it to LargestLegalIntTypeSizeInBits in the generic target setup. Let's just do that, and then the code is even self-documenting.

alnemr55 accepted this revision.Jan 4 2019, 2:38 AM
This revision is now accepted and ready to land.Jan 4 2019, 2:38 AM
lebedev.ri requested changes to this revision.Jan 4 2019, 3:05 AM

The data layout question wasn't resolved. (Just signalling; I don't have a preference one way or another.)

This revision now requires changes to proceed.Jan 4 2019, 3:05 AM

Thanks Roman.

@alnemr55 - Why did you mark this as accepted? Maybe it was an accidental click? If not, please don't mark things as accepted unless you've been actively participating in the review and the review has genuinely concluded.

ychen added a subscriber: ychen.Jun 20 2019, 4:05 PM
Herald added a project: Restricted Project.Jun 20 2019, 4:05 PM