[InlineCost] Find identical loads in the callee
Closed · Public

Authored by haicheng on Jun 6 2017, 9:40 AM.

Details

Summary

The LLVM inliner encourages inlining single-block callees by giving them a higher threshold. However, some large single-block callees still cannot be inlined even though they contain many redundant instructions that could be removed if they were inlined.

The motivating example is a fully unrolled 3x3 matrix multiplication. It loads every element of matrices a and b three times because of the Stores between the loads. The SROA analysis can figure out that these Stores can be simplified, and then the redundant Loads should also be free. Thus, running sroa and gvn after inlining the callee can remove 54% of the instructions.

define void @outer(%struct.matrix* %a, %struct.matrix* %b) {
  %c = alloca %struct.matrix
  call void @matrix_multiply(%struct.matrix* %a, %struct.matrix* %b, %struct.matrix* %c)
  ret void
}
define void @matrix_multiply(%struct.matrix* %a, %struct.matrix* %b, %struct.matrix* %c) {

This simple patch tries to find repeated Loads in the callee. It stops looking for Loads if there are Stores that cannot be simplified or if there are function calls in the callee. These restrictions could be relaxed to find more CSE opportunities with more expensive analysis.
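The bookkeeping this describes can be pictured with a small standalone sketch (the type and member names here are illustrative assumptions, not the actual InlineCost.cpp code): remember the addresses already loaded, credit a repeated load from a known address as eliminable, and bail out once a clobbering store or call is seen.

```cpp
#include <set>
#include <string>

// Toy model of the repeated-load analysis; addresses are modeled as
// strings instead of LLVM Value pointers.
struct LoadEliminationSketch {
  std::set<std::string> LoadAddrSet; // addresses loaded so far
  bool EnableLoadElimination = true; // cleared on the first clobber
  int LoadEliminationCost = 0;       // cost credited for repeated loads

  // A load from an address we have already seen is counted as free.
  void visitLoad(const std::string &Addr, int InstrCost) {
    if (EnableLoadElimination && !LoadAddrSet.insert(Addr).second)
      LoadEliminationCost += InstrCost;
  }

  // A store that cannot be simplified (or a clobbering call)
  // invalidates everything tracked so far; the analysis just stops.
  void disableLoadElimination() {
    EnableLoadElimination = false;
    LoadAddrSet.clear();
    LoadEliminationCost = 0;
  }
};
```

In the matrix-multiplication example, the second and third loads of an address such as %arrayidx8 would be credited, because the intervening stores target the caller's alloca and are known to be removable.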

I tested the patch with SPEC20xx and the llvm-test-suite using O3+LTO/O3/O2/Os on x86 and AArch64. Only spec2006/milc is impacted, and only when using O3+LTO. On x86, performance improves by 3.6% and code size increases by 0.07%. On AArch64, performance improves by 4.8% with no change in code size.

Diff Detail

Repository
rL LLVM

Lift the requirement of SROA. Update the test cases. Rewrite the comments. Add support for multi-block functions to the FIXME.

haicheng marked 6 inline comments as done. Jun 19 2017, 3:19 PM
haicheng updated this revision to Diff 103940. Jun 26 2017, 7:00 AM
haicheng marked an inline comment as done.
haicheng retitled this revision from [InlineCost] Find identical loads in single block function to [InlineCost] Find identical loads in the callee.
haicheng edited the summary of this revision. (Show Details)

I enabled support for multi-block functions. The benchmark that previously regressed was spec2017/xalancbmk. I looked into it, and no hot function (> 0.5% of execution time) changed.

Please take a look. Thank you.

Haicheng

The description says "The motivation example is a fully unrolled 3x3 matrix multiplication. It loads every data in matrix a and b three times because of the Stores between them", but from reading the code I don't see how this case is handled since it seems that you bail as soon as you see a store (or a call)? What did I miss? Is the description still up-to-date?

lib/Analysis/InlineCost.cpp
265 ↗(On Diff #101570)

Why do we have both LoadEliminationCostSavings and LoadEliminationCost ?

haicheng added a comment (edited). Jun 28 2017, 7:08 AM

The description says "The motivation example is a fully unrolled 3x3 matrix multiplication. It loads every data in matrix a and b three times because of the Stores between them", but from reading the code I don't see how this case is handled since it seems that you bail as soon as you see a store (or a call)? What did I miss? Is the description still up-to-date?

Hi Mehdi,

Sorry for the confusion. The matrix multiplication was the test case, but I replaced it with some simple ones when I updated the patch.

Here is the IR of the matrix multiplication example. Every input value is loaded three times (e.g. %arrayidx8). The addresses of the Stores in the callee are based on an alloca in the caller. The inline cost model can figure out that these Stores can be eliminated if the callee is inlined (see visitStore()); the repeated Loads clobbered by those Stores are then safe to eliminate too. Does that make sense?

If the callee has Stores that cannot be eliminated, I just bail out for now.

%struct.matrix = type { [3 x [3 x double]] }

define void @outer(%struct.matrix* %a, %struct.matrix* %b) {
  %c = alloca %struct.matrix
  call void @matrix_multiply(%struct.matrix* %a, %struct.matrix* %b, %struct.matrix* %c)
  ret void
}

; Every data in matrix a and b are loaded three times.
; typedef struct { double m[3][3]; } matrix;
; void matrix_multiply( matrix *a, matrix *b, matrix *c ){
;   int i,j,k;
;     for(i=0; i<3; i++)
;       for(j=0; j<3; j++)
;          for(k=0; k<3; k++)
;            c->m[i][j] += a->m[i][k] * b->m[k][j];
; }

define void @matrix_multiply(%struct.matrix* %a, %struct.matrix* %b, %struct.matrix* %c) {
entry:
  %arrayidx8 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 0, i64 0
  %arrayidx18 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 0, i64 0
  %0 = load double, double* %arrayidx8
  %arrayidx13 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 0, i64 0
  %1 = load double, double* %arrayidx13
  %mul = fmul double %0, %1
  %2 = load double, double* %arrayidx18
  %add = fadd double %2, %mul
  store double %add, double* %arrayidx18
  %arrayidx8.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 0, i64 1
  %3 = load double, double* %arrayidx8.1
  %arrayidx13.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 1, i64 0
  %4 = load double, double* %arrayidx13.1
  %mul.1 = fmul double %3, %4
  %add.1 = fadd double %add, %mul.1
  store double %add.1, double* %arrayidx18
  %arrayidx8.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 0, i64 2
  %5 = load double, double* %arrayidx8.2
  %arrayidx13.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 2, i64 0
  %6 = load double, double* %arrayidx13.2
  %mul.2 = fmul double %5, %6
  %add.2 = fadd double %add.1, %mul.2
  store double %add.2, double* %arrayidx18
  %arrayidx18.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 0, i64 1
  %7 = load double, double* %arrayidx8
  %arrayidx13.140 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 0, i64 1
  %8 = load double, double* %arrayidx13.140
  %mul.141 = fmul double %7, %8
  %9 = load double, double* %arrayidx18.1
  %add.142 = fadd double %9, %mul.141
  store double %add.142, double* %arrayidx18.1
  %10 = load double, double* %arrayidx8.1
  %arrayidx13.1.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 1, i64 1
  %11 = load double, double* %arrayidx13.1.1
  %mul.1.1 = fmul double %10, %11
  %add.1.1 = fadd double %add.142, %mul.1.1
  store double %add.1.1, double* %arrayidx18.1
  %12 = load double, double* %arrayidx8.2
  %arrayidx13.2.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 2, i64 1
  %13 = load double, double* %arrayidx13.2.1
  %mul.2.1 = fmul double %12, %13
  %add.2.1 = fadd double %add.1.1, %mul.2.1
  store double %add.2.1, double* %arrayidx18.1
  %arrayidx18.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 0, i64 2
  %14 = load double, double* %arrayidx8
  %arrayidx13.243 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 0, i64 2
  %15 = load double, double* %arrayidx13.243
  %mul.244 = fmul double %14, %15
  %16 = load double, double* %arrayidx18.2
  %add.245 = fadd double %16, %mul.244
  store double %add.245, double* %arrayidx18.2
  %17 = load double, double* %arrayidx8.1
  %arrayidx13.1.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 1, i64 2
  %18 = load double, double* %arrayidx13.1.2
  %mul.1.2 = fmul double %17, %18
  %add.1.2 = fadd double %add.245, %mul.1.2
  store double %add.1.2, double* %arrayidx18.2
  %19 = load double, double* %arrayidx8.2
  %arrayidx13.2.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %b, i64 0, i32 0, i64 2, i64 2
  %20 = load double, double* %arrayidx13.2.2
  %mul.2.2 = fmul double %19, %20
  %add.2.2 = fadd double %add.1.2, %mul.2.2
  store double %add.2.2, double* %arrayidx18.2
  %arrayidx8.146 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 1, i64 0
  %arrayidx18.147 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 1, i64 0
  %21 = load double, double* %arrayidx8.146
  %22 = load double, double* %arrayidx13
  %mul.149 = fmul double %21, %22
  %23 = load double, double* %arrayidx18.147
  %add.150 = fadd double %23, %mul.149
  store double %add.150, double* %arrayidx18.147
  %arrayidx8.1.151 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 1, i64 1
  %24 = load double, double* %arrayidx8.1.151
  %25 = load double, double* %arrayidx13.1
  %mul.1.153 = fmul double %24, %25
  %add.1.154 = fadd double %add.150, %mul.1.153
  store double %add.1.154, double* %arrayidx18.147
  %arrayidx8.2.155 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 1, i64 2
  %26 = load double, double* %arrayidx8.2.155
  %27 = load double, double* %arrayidx13.2
  %mul.2.157 = fmul double %26, %27
  %add.2.158 = fadd double %add.1.154, %mul.2.157
  store double %add.2.158, double* %arrayidx18.147
  %arrayidx18.1.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 1, i64 1
  %28 = load double, double* %arrayidx8.146
  %29 = load double, double* %arrayidx13.140
  %mul.141.1 = fmul double %28, %29
  %30 = load double, double* %arrayidx18.1.1
  %add.142.1 = fadd double %30, %mul.141.1
  store double %add.142.1, double* %arrayidx18.1.1
  %31 = load double, double* %arrayidx8.1.151
  %32 = load double, double* %arrayidx13.1.1
  %mul.1.1.1 = fmul double %31, %32
  %add.1.1.1 = fadd double %add.142.1, %mul.1.1.1
  store double %add.1.1.1, double* %arrayidx18.1.1
  %33 = load double, double* %arrayidx8.2.155
  %34 = load double, double* %arrayidx13.2.1
  %mul.2.1.1 = fmul double %33, %34
  %add.2.1.1 = fadd double %add.1.1.1, %mul.2.1.1
  store double %add.2.1.1, double* %arrayidx18.1.1
  %arrayidx18.2.1 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 1, i64 2
  %35 = load double, double* %arrayidx8.146
  %36 = load double, double* %arrayidx13.243
  %mul.244.1 = fmul double %35, %36
  %37 = load double, double* %arrayidx18.2.1
  %add.245.1 = fadd double %37, %mul.244.1
  store double %add.245.1, double* %arrayidx18.2.1
  %38 = load double, double* %arrayidx8.1.151
  %39 = load double, double* %arrayidx13.1.2
  %mul.1.2.1 = fmul double %38, %39
  %add.1.2.1 = fadd double %add.245.1, %mul.1.2.1
  store double %add.1.2.1, double* %arrayidx18.2.1
  %40 = load double, double* %arrayidx8.2.155
  %41 = load double, double* %arrayidx13.2.2
  %mul.2.2.1 = fmul double %40, %41
  %add.2.2.1 = fadd double %add.1.2.1, %mul.2.2.1
  store double %add.2.2.1, double* %arrayidx18.2.1
  %arrayidx8.260 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 2, i64 0
  %arrayidx18.261 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 2, i64 0
  %42 = load double, double* %arrayidx8.260
  %43 = load double, double* %arrayidx13
  %mul.263 = fmul double %42, %43
  %44 = load double, double* %arrayidx18.261
  %add.264 = fadd double %44, %mul.263
  store double %add.264, double* %arrayidx18.261
  %arrayidx8.1.265 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 2, i64 1
  %45 = load double, double* %arrayidx8.1.265
  %46 = load double, double* %arrayidx13.1
  %mul.1.267 = fmul double %45, %46
  %add.1.268 = fadd double %add.264, %mul.1.267
  store double %add.1.268, double* %arrayidx18.261
  %arrayidx8.2.269 = getelementptr inbounds %struct.matrix, %struct.matrix* %a, i64 0, i32 0, i64 2, i64 2
  %47 = load double, double* %arrayidx8.2.269
  %48 = load double, double* %arrayidx13.2
  %mul.2.271 = fmul double %47, %48
  %add.2.272 = fadd double %add.1.268, %mul.2.271
  store double %add.2.272, double* %arrayidx18.261
  %arrayidx18.1.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 2, i64 1
  %49 = load double, double* %arrayidx8.260
  %50 = load double, double* %arrayidx13.140
  %mul.141.2 = fmul double %49, %50
  %51 = load double, double* %arrayidx18.1.2
  %add.142.2 = fadd double %51, %mul.141.2
  store double %add.142.2, double* %arrayidx18.1.2
  %52 = load double, double* %arrayidx8.1.265
  %53 = load double, double* %arrayidx13.1.1
  %mul.1.1.2 = fmul double %52, %53
  %add.1.1.2 = fadd double %add.142.2, %mul.1.1.2
  store double %add.1.1.2, double* %arrayidx18.1.2
  %54 = load double, double* %arrayidx8.2.269
  %55 = load double, double* %arrayidx13.2.1
  %mul.2.1.2 = fmul double %54, %55
  %add.2.1.2 = fadd double %add.1.1.2, %mul.2.1.2
  store double %add.2.1.2, double* %arrayidx18.1.2
  %arrayidx18.2.2 = getelementptr inbounds %struct.matrix, %struct.matrix* %c, i64 0, i32 0, i64 2, i64 2
  %56 = load double, double* %arrayidx8.260
  %57 = load double, double* %arrayidx13.243
  %mul.244.2 = fmul double %56, %57
  %58 = load double, double* %arrayidx18.2.2
  %add.245.2 = fadd double %58, %mul.244.2
  store double %add.245.2, double* %arrayidx18.2.2
  %59 = load double, double* %arrayidx8.1.265
  %60 = load double, double* %arrayidx13.1.2
  %mul.1.2.2 = fmul double %59, %60
  %add.1.2.2 = fadd double %add.245.2, %mul.1.2.2
  store double %add.1.2.2, double* %arrayidx18.2.2
  %61 = load double, double* %arrayidx8.2.269
  %62 = load double, double* %arrayidx13.2.2
  %mul.2.2.2 = fmul double %61, %62
  %add.2.2.2 = fadd double %add.1.2.2, %mul.2.2.2
  store double %add.2.2.2, double* %arrayidx18.2.2
  ret void
}

Haicheng

lib/Analysis/InlineCost.cpp
265 ↗(On Diff #101570)

LoadEliminationCost is used in the cost calculation, and LoadEliminationCostSavings is used for stats printing. This is the same as SROACostSavings and SROAArgCosts.

Kindly ping.

Kindly ping.

Will try to get to this patch this week, sorry for delay and thanks for the update.

eraman added inline comments. Jul 7 2017, 4:22 PM
lib/Analysis/InlineCost.cpp
836 ↗(On Diff #103940)

Nit: As written, IMO this is not very clear. I would be more specific - "... there is no reachable stores/calls or all reachable stores can be SROA'd..."

935 ↗(On Diff #103940)

If simplifyCallSite returns true below, disabling this is not needed. Even this is a bit conservative - I think you could call canConstantFoldCallTo directly and if it returns true do not disable load elimination.

haicheng updated this revision to Diff 105850. Jul 10 2017, 7:40 AM
haicheng marked 2 inline comments as done.

Check canConstantFoldCallTo() in visitCallSite(). Thank you, Easwaran.

mehdi_amini added inline comments. Jul 12 2017, 8:14 PM
lib/Analysis/InlineCost.cpp
327 ↗(On Diff #105850)

It seems to me that it is LoadEliminationCost that needs to be set to 0 instead of LoadEliminationCostSavings?

265 ↗(On Diff #101570)

I'm still not making sense of the difference between the two: as far as I can tell, they are zero-initialized in the same ctor and get incremented the same way?

Hi Mehdi,

I can remove LoadEliminationCostSavings if you think it is better.

Haicheng

lib/Analysis/InlineCost.cpp
327 ↗(On Diff #105850)

When load elimination is disabled, LoadEliminationCostSavings needs to be set to 0 so that we can print the correct value. LoadEliminationCost is no longer used after load elimination is disabled.

265 ↗(On Diff #101570)

Your understanding is correct. If load elimination is not disabled, they have the same value. LoadEliminationCostSavings is only used in stats printing.

Eventually, every load address needs its own LoadEliminationCost, as described in FIXME #1. At that point, LoadEliminationCostSavings will be the sum of all the LoadEliminationCosts.

mehdi_amini added inline comments. Jul 13 2017, 9:07 AM
lib/Analysis/InlineCost.cpp
327 ↗(On Diff #105850)

But calling disableLoadElimination multiple times will increase the Cost, which does not seem right?

haicheng updated this revision to Diff 106645. Jul 14 2017, 8:33 AM

Add a check of EnableLoadElimination before calling disableLoadElimination() in disableSROA().

haicheng added inline comments. Jul 14 2017, 8:35 AM
lib/Analysis/InlineCost.cpp
327 ↗(On Diff #105850)

disableLoadElimination() should be called only once; it is guarded by the EnableLoadElimination flag. I missed one check in disableSROA() before calling disableLoadElimination(), and I have now fixed it. Thank you.

mehdi_amini added inline comments. Jul 14 2017, 9:25 AM
lib/Analysis/InlineCost.cpp
327 ↗(On Diff #105850)

It seems fragile: I'd expect the API itself to be idempotent. I'm fine with guarding it on EnableLoadElimination but the guard should be in the method itself.
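The fix that followed can be sketched as moving the guard inside the method itself, so repeated calls are harmless no-ops and callers need not check the flag (standalone illustration; the member names are assumptions, not the exact InlineCost.cpp source):

```cpp
// Minimal sketch of an idempotent disable hook: the guard lives in the
// method, so the previously-credited savings are charged back to Cost
// exactly once no matter how many call sites invoke it.
struct CostModelSketch {
  bool EnableLoadElimination = true;
  int LoadEliminationCost = 7; // some accumulated credit
  int Cost = 100;

  void disableLoadElimination() {
    if (!EnableLoadElimination)
      return; // already disabled: safe to call again
    EnableLoadElimination = false;
    Cost += LoadEliminationCost; // undo the credit once
    LoadEliminationCost = 0;
  }
};
```

This is the usual pattern for making a state-transition API idempotent: check-and-flip the flag first, then apply the one-time side effect.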

haicheng updated this revision to Diff 106693. Jul 14 2017, 12:39 PM

Move the check of EnableLoadElimination inside disableLoadElimination().

haicheng marked 2 inline comments as done. Jul 14 2017, 12:43 PM

Kindly Ping.

Kindly Ping (#2).

Kindly Ping (#3).

Kindly Ping (#4)

Would anyone please take a look? I believe I made all the changes that are requested.

Kindly Ping (#5)

This patch can reduce the inline cost of the questionable function in PR34173.

LGTM with the comments addressed, but please check with Chandler if he has more to add.

lib/Analysis/InlineCost.cpp
150 ↗(On Diff #106693)

Nit: SmallPtrSet rounds up the size to a power of 2 and so this will get rounded up to 16. I prefer using a power of 2 explicitly rather than just letting the implementation round it up.

265 ↗(On Diff #101570)

I understand that once you improve this you might mimic what SROA does and need a separate variable for each. But you don't need that now, so I agree with Mehdi's comments.

test/Transforms/Inline/redundant-loads.ll
67 ↗(On Diff #106693)

Even if this load is eliminated, this callee will not be inlined into the caller (cost and threshold are both 0) and so you're really not testing that the call is preventing the second load from being eliminated. One fix is to increase the threshold to 5 and adjust the other cases slightly if needed.

haicheng updated this revision to Diff 112414. Aug 23 2017, 12:00 PM
haicheng marked 6 inline comments as done.

Address Easwaran's comments. Thank you very much.

@chandlerc , would you please take another look?

Haicheng

mcrosier added inline comments. Sep 1 2017, 11:06 AM
lib/Analysis/InlineCost.cpp
1008 ↗(On Diff #112414)

"there is there is"

haicheng updated this revision to Diff 113577. Sep 1 2017, 12:35 PM

Fix a typo found by Chad. Thank you.

(I suspect most folks are waiting for chandler to say whether he feels this is good enough at this point or not)

Kindly Ping

(I suspect most folks are waiting for chandler to say whether he feels this is good enough at this point or not)

I agree.

chandlerc requested changes to this revision. Sep 19 2017, 6:01 PM

Sorry for the awful delay when I got pulled into stuff, should be able to stay with the (small) stuff left in the review.

lib/Analysis/InlineCost.cpp
1469–1470 ↗(On Diff #101570)

Yeah, this makes sense. Especially the fact that the multiple-block check handles the trivial cases for us makes it easy. Thanks for the explanation!

1008 ↗(On Diff #113577)

'data are' -> 'data is'.

1009 ↗(On Diff #113577)

I would say something more like:

'If the data is already loaded from this address and hasn't been clobbered by any stores or calls, this load ...'

1011–1012 ↗(On Diff #113577)

No need for two ifs, short circuit is sufficient.

1033 ↗(On Diff #113577)

'clobber the repeated loads and prevent them from elimination' -> 'clobber loads and prevent repeated loads from being eliminated'

1106–1107 ↗(On Diff #113577)

I don't think we should use canConstantFoldCallTo here, because simplifyCallSite will already do that below along with a bunch of other things.

I think you really want to sink this down and handle it in a more detailed way. As a specific example: you should handle debug info intrinsics correctly (ignore them) and you should probably make some effort to handle lifetime markers because it seems like those can be handled trivially.

test/Transforms/Inline/redundant-loads.ll
8 ↗(On Diff #113577)

I really dislike basing tests on debug prints. It makes them quite brittle. Typically for the inliner we craft a function that is too expensive to inline w/o the specific threshold applying, and then check that it does or doesn't get inlined in each case.

This revision now requires changes to proceed. Sep 19 2017, 6:01 PM
haicheng updated this revision to Diff 116206. Sep 21 2017, 9:25 AM
haicheng marked 7 inline comments as done.

Thank you, Chandler.

I rewrote the test cases, fixed the comments, and made the patch less conservative when checking a callsite.

haicheng updated this revision to Diff 116384. Sep 22 2017, 12:05 PM

Fix a typo in the test case. Please take a look, thank you.

haicheng updated this revision to Diff 117194. Sep 29 2017, 12:03 PM

Whitelist lifetime and debug info intrinsics as not requiring load elimination to be disabled.

Please take a look. Thank you.

mcrosier added inline comments. Sep 29 2017, 1:01 PM
lib/Analysis/InlineCost.cpp
1129 ↗(On Diff #117194)

Does the assume intrinsic fall into this category as well (or is it properly handled by the default case)?

hfinkel added inline comments.
lib/Analysis/InlineCost.cpp
1133 ↗(On Diff #117194)

llvm.ptr.annotation seems like it might belong in this list too (maybe this is the same list as isAssumeLikeIntrinsic in ValueTracking.cpp?).

haicheng added inline comments. Sep 29 2017, 1:33 PM
lib/Analysis/InlineCost.cpp
1133 ↗(On Diff #117194)

I think all free intrinsics can be put here, including (see getIntrinsicCost() in TargetTransformInfoImpl.h):

Intrinsic::annotation:
Intrinsic::assume:
Intrinsic::dbg_declare:
Intrinsic::dbg_value:
Intrinsic::invariant_start:
Intrinsic::invariant_end:
Intrinsic::lifetime_start:
Intrinsic::lifetime_end:
Intrinsic::objectsize:
Intrinsic::ptr_annotation:
Intrinsic::var_annotation:
Intrinsic::experimental_gc_result:
Intrinsic::experimental_gc_relocate:
Intrinsic::coro_alloc:
Intrinsic::coro_begin:
Intrinsic::coro_free:
Intrinsic::coro_end:
Intrinsic::coro_frame:
Intrinsic::coro_size:
Intrinsic::coro_suspend:
Intrinsic::coro_param:
Intrinsic::coro_subfn_addr:

What do you think?

hfinkel added inline comments. Sep 29 2017, 2:08 PM
lib/Analysis/InlineCost.cpp
1133 ↗(On Diff #117194)

That seems dangerous. I don't think it's a good idea to derive semantic information from the cost model.

haicheng updated this revision to Diff 117297. Oct 1 2017, 6:52 PM

Use ValueTracking::isAssumeLikeIntrinsic() to find intrinsics that do not clobber loads.
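The whitelist idea can be sketched standalone (a toy name-based predicate; the real code calls llvm::isAssumeLikeIntrinsic from ValueTracking on the intrinsic ID rather than matching strings, and the list below is an illustrative subset):

```cpp
#include <set>
#include <string>

// Assume-like intrinsics do not write observable memory, so a call to
// one of them should not force load elimination to be disabled.
bool isAssumeLikeByName(const std::string &Callee) {
  static const std::set<std::string> AssumeLike = {
      "llvm.assume",          "llvm.dbg.declare",  "llvm.dbg.value",
      "llvm.lifetime.start",  "llvm.lifetime.end",
      "llvm.invariant.start", "llvm.invariant.end"};
  return AssumeLike.count(Callee) != 0;
}

// visitCallSite-style decision: only calls that are not assume-like
// may clobber the tracked loads.
bool callDisablesLoadElimination(const std::string &Callee) {
  return !isAssumeLikeByName(Callee);
}
```

Keying the decision off a semantic predicate rather than the cost model's "free intrinsics" list addresses hfinkel's concern above: cheapness says nothing about whether an intrinsic clobbers memory.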

Kindly ping

Kindly Ping (#2)

Kindly Ping (#3)

Kindly Ping (#4)

Kindly Ping (#5)

Kindly Ping (#6)

Kindly Ping (#7)

Kindly Ping (#8)

I ran the spec2006/17 performance tests again on top of ToT. Here are the numbers:

Benchmark            Performance (+ is better)   Code Size (+ is larger)
spec2006/milc        +6.14%                      -0.04%
spec2017/deepsjeng   +1.06%                      -0.32%
spec2017/xalancbmk   +0.74%                      0%
spec2006/povray      +0.14%                      0%
spec2006/h264ref     +0.02%                      +0.13%
spec2017/imagick     0%                          +0.01%
spec2017/gcc         -0.02%                      0%
spec2017/povray      -0.12%                      0%
spec2017/blender     -0.52%                      0.02%
spec2006/xalancbmk   -0.71%                      0%

I took a closer look at this and LGTM.
I tried to think of other possible negative test cases, but I cannot think of anything other than outer2 and outer3.

haicheng updated this revision to Diff 126283. Dec 9 2017, 1:55 PM

Thank you, Jun.

I added a test case about indirect call.

mcrosier accepted this revision. Dec 11 2017, 6:44 AM

All of the review feedback has been addressed and I have no additional comments. LGTM. Thanks, Haicheng.

Thank you, Chad. I will wait a couple of days before committing this patch in case there are new comments.

This revision was not accepted when it landed; it landed in state Needs Review. Dec 15 2017, 6:35 AM
Closed by commit rL320814: [InlineCost] Find repeated loads in the callee (authored by haicheng).
This revision was automatically updated to reflect the committed changes.