
[InstCombine] Prevent memcpy generation for small data size
Needs RevisionPublic

Authored by hiraditya on Jul 5 2017, 2:22 PM.

Details

Summary

The InstCombine pass converts @llvm.memcpy either to a call to memcpy or to a load/store pair, depending on whether the data size is greater than 8 bytes. However, the appropriate threshold differs between targets; 8 bytes may be too large for some of them. Hence, we query the target datalayout to find the threshold.

Worked in collaboration with Aditya Kumar

Diff Detail

Repository
rL LLVM

Event Timeline


What is the advantage of expanding the memcpy intrinsic in InstCombine vs doing it later in the target-specific code?

I can't answer for 'memcpy' directly, but probably related - I'm looking at cases where late 'memcmp' expansion causes us to miss optimizations in:
https://bugs.llvm.org/show_bug.cgi?id=34032#c13

instcombine is definitely not the IR pass for 'memcmp' expansion. (It's complicated enough to be its own pass, but currently it's a pass-within-a-pass of CGP.)

We could make the argument that the post-IR backend should handle those cases, but I think it would require the collective powers of gvn, cse, instcombine, and simplifycfg.

joerg added a comment.Aug 12 2017, 7:18 AM

@spatel: I don't see a reason why we can't (or shouldn't) try to do common-prefix elimination for the memcmp intrinsic. It certainly seems to be better to me to preserve the intrinsics in your case as they should be easier to reason about. That's kind of my question for here too -- why does the expansion allow better code?


Thanks, @joerg . I can't see any reason not to do prefix elimination either. I'm not sure if that will solve everything for that case, but it should make it better. I'll give that a try.

DIVYA updated this revision to Diff 110982.Aug 14 2017, 8:42 AM
DIVYA marked an inline comment as done.
DIVYA added inline comments.
test/Transforms/InstCombine/memcpy-to-load.ll
92

In test cases where the target datalayout is not specified, DL.getLargestLegalIntTypeSizeInBits() returns 0. For those cases it retains the previous behaviour from before the patch.

DIVYA updated this revision to Diff 111029.Aug 14 2017, 10:27 AM
craig.topper added inline comments.Sep 16 2018, 6:00 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
136–141

Is there an extra blank line here?

140–141

But the previous behavior allowed 8 bytes so shouldn't it be 64?

hiraditya added inline comments.Sep 16 2018, 11:54 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
136–141

We'll fix this.

140–141

It's been a while since we submitted the patch, but I think you're right, unless @DIVYA has some comments. We'll make the changes.

I would quickly like to check the status of this patch: do you have plans to continue this work? If not, I would like to pick it up.

I think this patch is good to go, I can push this if someone accepts. I'll fix the comments.


I still don't understand the (LargestInt == 0) hack. I think everyone agrees that the existing code is wrong, so why preserve the existing behavior if we don't have a valid datalayout? Ie, there's no risk for real targets because they always have a non-zero getLargestLegalIntTypeSizeInBits()?


Agreed, I'll remove that part then. That was only to make other testcases happy by preserving existing behavior. Thanks


Presumably those testcases should simply be updated to contain the expected datalayout?


I think so...

hiraditya commandeered this revision.Sep 30 2018, 10:15 AM
hiraditya added a reviewer: DIVYA.

To fix tests and address comments from reviewers.

Fix some unit-tests, a few more remaining.

lebedev.ri added inline comments.Sep 30 2018, 10:19 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
133

You can check that the Size is power of two here.

139–140

I think we decided that the tests should be updated instead?

142

You overrode LargestInt already, it can't be 0.
This should only be Size > LargestInt.

Fixed all testcases, and removed memcpy inlining when target datalayout is not present.

Moved the check of Size before checking the LargestInt.

hiraditya marked 3 inline comments as done.Sep 30 2018, 12:28 PM
hiraditya added reviewers: sebpop, SirishP.

Code looks good.

test/Transforms/InstCombine/element-atomic-memintrins.ll
102

Please just use utils/update_test_checks.py, as stated in the first line of the test.

test/Transforms/InstCombine/memcpy-to-load.ll
76

Will running utils/update_test_checks.py preserve these 'inline' comments?

lebedev.ri added inline comments.Sep 30 2018, 1:45 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
133

Actually, please use !isPowerOf2_64(Size).

hiraditya updated this revision to Diff 167673.Oct 1 2018, 12:00 AM

Updated testcases with utils/update_test_checks.py
use isPowerOf2_64

hiraditya marked 2 inline comments as done.Oct 1 2018, 12:01 AM
dmgreen added inline comments.Oct 1 2018, 6:26 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
140

Are the units of these the same? Or is one bits and the other bytes?

jfb added a comment.Oct 1 2018, 10:44 AM

I don't understand why it makes sense to tie this to the larger integer type. I understand that targets might want what you're doing, but tying it to the largest integer type instead of having a separate target-specific value seems odd.
How does this interact with memcpyopt?
Which targets are affected by this change?
Can you please provide size and performance numbers for relevant targets?

lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Why does this make sense? I don't understand why we'd want to tie this to the largest legal integer type, and not have it be its own target parameter.

test/DebugInfo/X86/array2.ll
21

Why change this?

test/Transforms/InstCombine/2007-10-10-EliminateMemCpy.ll
3

Why change this?

test/Transforms/InstCombine/alloca.ll
147

This test checks that the inalloca remains. Change the test to still test the same thing, don't just delete the CHECK.
https://bugs.llvm.org/show_bug.cgi?id=19569

jfb added a subscriber: MatzeB.Oct 1 2018, 10:44 AM
hiraditya added inline comments.Oct 2 2018, 7:17 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

We used the 'largest legal integer size' because that will fit in a register for sure. I think making it a target-specific parameter seems reasonable, or maybe using getMaxStoresPerMemcpy() from TargetLowering.h, which is already available.

140

Thank you!

spatel added inline comments.Oct 2 2018, 8:44 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

I'm not understanding this discussion...
This patch is trying to apply some target-based constraint to a questionable (reverse) IR canonicalization that currently just pulls a number out of the air.
The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.
But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.
Expanding memcpy for performance using target parameters belongs in a late, target-aware pass (and as mentioned, it already exists), not early in generic instcombine.

jfb added inline comments.Oct 2 2018, 9:24 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Let me try to expand a bit, and let me know if that makes sense:

IMO LargestLegalIntTypeSizeInBits doesn't make sense to use here. On ARM I might want to use paired integer load / store for memcpy, or paired Q registers. These have nothing to do with the largest legal integer type. It doesn't matter what the patch is trying to do or what is currently being done: the patch is adding something weird. It shouldn't.

lebedev.ri added inline comments.Oct 2 2018, 9:45 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

It's replacing a magic hardcoded number with a number pulled out of the target datalayout.
Should it be a new field in the target datalayout?
I'm not sure whether we can get TargetLowering info from the backend in the middle-end.

spatel added inline comments.Oct 2 2018, 9:50 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

I agree that this is a strange transform for instcombine. We're trying to safely replace an obviously bogus magic number constraint (8) with something based on target reality.

So this is an intermediate step as I see it. The goal of this patch (and this may be counter to the original goal...) is to not expand memcpy(x, y, 8) on a 32-bit system because that could take >1 load/store pair. The final goal is to not expand memcpy at all.

IIUC, you'd like to make this transform more aggressive rather than more conservative though? Ie, a pair of registers or Q regs are always bigger than LargestLegalIntType (let me know if I'm wrong).

But that's moving us further away from the target-independent canonicalization goal of instcombine. The kind of expansion where we ask if pairs or vector regs are available should be happening in memcpyopt (if it's not, then that should be enhanced in that pass/lowering).

jfb added inline comments.Oct 2 2018, 9:55 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

All I'm saying is that replacing MagicNumber == 8 with ErroneousUsage == LargestLegalIntTypeSizeInBits isn't what we should be doing.

hiraditya added inline comments.Oct 2 2018, 10:14 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Totally agree with @spatel. Ideally we shouldn't be inlining memcpy this early; we should try to stay independent of target-specific behavior in instcombine. It would be ideal to not inline memcpy here at all and do it all in memcpyopt with a better cost model.

The ultimate goal would be to simply always canonicalize to memcpy and not expand it ever in instcombine as mentioned in D52081.

Looks like we all agree on this now.

But we don't do that (yet) because we're afraid of missed optimizations that can't be replicated in the backend.

Perhaps the impact is negligible, non-existent, and we worry about this for nothing. As also suggested earlier, I will try to get some numbers on the table for ARM and AArch64 if we strip out the lowering here, if that is helpful for this discussion, but probably need a day or two to get them.


If you could provide some numbers, I can go ahead and remove the inlining of memcpy altogether provided the reviewers agree with it, or we can merge this patch which is trying to improve on previously hardcoded numbers.


Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.
And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

If that doesn't work though, using LargestLegalIntTypeSizeInBits still seems like a good compromise to me. We want to conservatively limit the expansion to a size/type that the target tells us is ok (can be performed with a single load/store), and that's the value that most closely matches what we have today, so we avoid regressions as we work to the goal. It's not the ideal change, but there's precedence for this sort of datalayout use in instcombine (see InstCombiner::shouldChangeType()). Adding a new specifier to the datalayout to account for things like pair ops or vectors doesn't make sense to me - that moves us away from the goal of improving the other passes and removing the expansion in instcombine.


Just to drop a nit: here is a somewhat idiomatic pattern that is recommended to avoid breaking strict aliasing: https://godbolt.org/z/_JNwOp
If we don't do this memcpy expansion here, will the memcpy survive until the backend?
Surely that is not good.

spatel added a comment.Oct 2 2018, 1:38 PM


That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

jfb added a comment.Oct 2 2018, 1:50 PM


clang also performs some of this work, even at O0.


Yes, I support removing the expansion entirely, but I don't think we can commit that change without doing some advance perf testing.

Makes sense.

And yes, in the best case, we'll discover that there are no regressions because all of the other analyses and lowering will do the transform as intended when it's profitable.

If that doesn't work though, using LargestLegalIntTypeSizeInBits still seems like a good compromise to me. We want to conservatively limit the expansion to a size/type that the target tells us is ok (can be performed with a single load/store), and that's the value that most closely matches what we have today, so we avoid regressions as we work to the goal. It's not the ideal change, but there's precedence for this sort of datalayout use in instcombine (see InstCombiner::shouldChangeType()). Adding a new specifier to the datalayout to account for things like pair ops or vectors doesn't make sense to me - that moves us away from the goal of improving the other passes and removing the expansion in instcombine.

Do you think getting this patch in is good for now? After some performance analysis, if we find that we don't need to inline memcpy here, we can remove it entirely at a later stage.

spatel added a comment.Oct 3 2018, 2:32 PM


We should wait for the perf results from @SjoerdMeijer (and anyone else that is in a position to collect benchmark results?).
If the full solution (don't expand at all) is too ambitious, then I support continuing with this patch, but we need to make sure that we've answered @jfb's concerns. Ie, if the datalayout query is too distasteful, what is the alternative?

I am doing some experiments with a hack that simply comments out the call to InstCombiner::SimplifyAnyMemTransfer. I've run 3 smaller benchmarks; 2 didn't show any difference. The 3rd shows 1 tiny regression and 1 tiny improvement in a test case, and will probably not even show a difference in the geomean. I am doing this as a background task; tomorrow I will run bigger benchmarks on different platforms. But I guess we need some numbers for non-Arm platforms too.

Oops, sorry, forgot to reply, and also got distracted by a few other things. I ran one bigger benchmark, and didn't see anything worth mentioning. A first preliminary conclusion: looks like we don't miss much by not doing this lowering here. Disclaimer: I've tested only on Arm targets, definitely not on all interesting architecture combinations, and a handful of benchmarks.

lebedev.ri requested changes to this revision.Oct 16 2018, 8:27 AM
lebedev.ri added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
140

I think this is still broken? (and thus, any and all benchmarks thus far are incorrect.)
As it was previously pointed out by @dmgreen, Size is bytes, while LargestInt is clearly in bits.

This revision now requires changes to proceed.Oct 16 2018, 8:27 AM

Fix testcases and address comments

hiraditya marked 2 inline comments as done.Oct 21 2018, 12:37 PM
lebedev.ri resigned from this revision.Oct 30 2018, 3:23 PM

Not sure what the general consensus is wrt this patch, but I guess it now consistently uses bytes.

lib/Transforms/InstCombine/InstCombineCalls.cpp
140

Nit: spaces around /.

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?
  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?
jfb added a comment.Nov 1 2018, 10:44 AM

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

In D35035#1284164, @jfb wrote:

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

I think the intent of this patch is to make a best-effort improvement without relying heavily on target-specific constants. Would it help to have a per-target constant only for the mem* functions? Could that constant be used to drive other optimizations?

  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

jfb added a comment.Nov 13 2018, 9:57 AM
In D35035#1284164, @jfb wrote:

Not sure what is the general consensus wrt this patch, but i guess it now consistently uses bytes.

Agree - the code change looks like what I expected now.

So:

  1. Is @jfb objecting to this as an intermediate improvement?

My concern is here: https://reviews.llvm.org/D35035#inline-464768
This uses an unrelated constant to drive optimization decisions. Create a new per-target constant.

I think the intent of this patch is to do best-effort without relying a lot on target specific constants. Will it help to have per-target constant only for mem* functions? Could that constant be used to drive other optimizations?

Yes, please add a constant which informs this optimization. Don't reuse an unrelated constant for this purpose. Don't use this new constant to drive other unrelated optimizations.

  2. If not, there are unanswered comments about the test diffs.
  3. There was also an unanswered request for the targets and size/perf improvements for this change (presumably some 32-bit target shows wins).
  4. Are there any updates on the perf for the ideal change (removing this expansion completely)?

I do want to see size and perf results.

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

FYI: @chandlerc mentioned to me (in the context of https://bugs.llvm.org/show_bug.cgi?id=39780 ) that SROA expects instcombine to handle memcpy and promote them to registers.

lib/Transforms/InstCombine/InstCombineCalls.cpp
139

All I'm saying is that replacing MagicNumber == 8 with ErroneousUsage == LargestLegalIntTypeSizeInBits isn't what we should be doing.

@jfb can you elaborate why we shouldn't do that?

I'm not sure I understand why you are qualifying this as ErroneousUsage.
I can see that it would be suboptimal in some cases (and ideally it would be a TTI hook), but that is not enough to make it erroneous. I see a rationale for considering that if I have a legal integer, I can likely load/store it with a single pair of instructions. So it seems to me like a strict improvement over the existing "magic number".

That case (and a couple of similar tests that I tried) are handled by -sroa, so they probably never made it to instcombine in the 1st place. I don't know anything about SROA, but hopefully, it's making that transform using some principled logic. :)

FYI: @chandlerc mentioned to me (in the context of https://bugs.llvm.org/show_bug.cgi?id=39780 ) that SROA expects instcombine to handle memcpy and promote them to registers.

Is it possible to improve SROA to understand llvm.memcpy instead? IIUC, the memcpy intrinsic is considered canonical form, and the consensus in this review was that we would not expand memcpy in instcombine if that did not cause regressions.

D54887 is proposing a similar change to this one, but it's more liberal: it would allow expansion to loads/stores of weird (non-power-of-2) types as long as they fit in a single integer register. That seems dangerous, given that the backend isn't very good at handling those illegal types. Example:

target datalayout = "e-p:64:64:64-p1:32:32:32-p2:16:16:16-n8:16:32:64"
declare void @llvm.memcpy.p0i8.p0i8.i32(i8* nocapture, i8* nocapture, i32, i1) nounwind

define void @copy_3_bytes(i8* %d, i8* %s) {
  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %d, i8* %s, i32 3, i1 false)
  ret void
}
$ opt -instcombine -S memcpy.ll
...
define void @copy_3_bytes(i8* %d, i8* %s) {
  %1 = bitcast i8* %s to i24*
  %2 = bitcast i8* %d to i24*
  %3 = load i24, i24* %1, align 1
  store i24 %3, i24* %2, align 1
  ret void
}
jfb added inline comments.Nov 26 2018, 10:49 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
139

Because we should be able to tune this magic number over time, per target, and using LargestLegalIntTypeSizeInBits doesn't allow us to do so. It's trivial to add this new magic number, and default it to LargestLegalIntTypeSizeInBits in the generic target setup. Let's just do that, and then the code is even self-documenting.

alnemr55 accepted this revision.Jan 4 2019, 2:38 AM
This revision is now accepted and ready to land.Jan 4 2019, 2:38 AM
lebedev.ri requested changes to this revision.Jan 4 2019, 3:05 AM

The data layout question wasn't resolved. (Just signalling; I don't have a preference one way or another.)

This revision now requires changes to proceed.Jan 4 2019, 3:05 AM

Thanks Roman.

@alnemr55 - Why did you mark this as accepted? Maybe it was an accidental click? If not, please don't mark things as accepted unless you've been actively participating in the review and the review has genuinely concluded.

ychen added a subscriber: ychen.Jun 20 2019, 4:05 PM
Herald added a project: Restricted Project.Jun 20 2019, 4:05 PM