Page MenuHomePhabricator

[SCEVExpander] Make findExistingExpansion smarter
ClosedPublic

Authored by flyingforyou on Dec 16 2015, 12:59 AM.

Details

Summary

Extending findExistingExpansion can use existing value in ExprValueMap.

This patch gives 0.3~0.5% performance improvements on benchmarks(test-suite, spec2000, spec2006, commercial benchmark)

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
flyingforyou added inline comments.Dec 23 2015, 6:43 PM
test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
27–28 ↗(On Diff #43517)

It will not be unroll factor. I just want to make example that trip count can be divided by original codes. Not belong to IV's step is unknown likes first example. The main point of first example(test) is that if IV's step is unknown, we should make division for getting tripcount for unrolled loop.

we don't need to generate a new expression for the new tripcount. Is this correct?

This is correct.

If you have better explanation about this example, please let me know.

53 ↗(On Diff #43517)

This is original code. I'm not sure it's ok to remove.
Sanjoy, Can I remove this?

Current algorithm doesn't care about where does trip count's division come from.
That means, if TripCount is divided by some value in loop's preheader, compiler will give up doing unrolling. (Even if IV's step is one or minus one.)

If IV's step is constant likes one or minus one or multiple of 2, we don't need to generate division for computing trip count.

I ran some benchmarks against http://reviews.llvm.org/differential/diff/43408/

spec regressions on a Cortex-A57 (A64):

spec.cpu2000.ref.253_perlbmk 	4.71%
spec.cpu2000.ref.256_bzip2 	2.07%
spec.cpu2006.ref.433_milc 	1.24%

There's a small improvement in 254_gap

spec.cpu2000.ref.254_gap 	-1.76%

(I'm focussing on SPEC because it's less noisy than the benchmarks in test-suite, although I do note a 23% improvement in lnt.MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan and a 7.76% regression in lnt.SingleSource/Benchmarks/Shootout-C++/ackermann)

spec regressions on a Cortex-A57 (T32):

spec.cpu2006.ref.482_sphinx3 	2.31%

and an improvement in:

spec.cpu2000.ref.181_mcf 	-2.13%

I tested this patch on r256132 with test-suite, spec2000,2006, commercial benchmarks. There is no regression on Cortex-A57.

Did you notice these regressions on your hardware? I find it strange to think this could all be accounted to different revisions of this core. I'm testing on this platform: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dto0038a/index.html)

I see some small improvments in a commerical benchmark (1-2%), but it is worth mentioning the SPEC changes, since I wouldn't write those off as no regressions; it's definite regression in SPEC, but whether the other improvements in test-suite balance them in the view of the community is debatable.

flyingforyou marked an inline comment as done.Dec 24 2015, 8:24 PM

I ran test-suite, spec on Galaxy S6 Target(CPU 2.1Ghz, MIF 1.5Ghz, Of course it's developtment board. Not exactly same with Galaxy S6).
I also can test on Juno. But it's very slow and I want to test on real world environment.

In my test, I got different data.

Charlie's test result
spec on a Cortex-A57 (A64):
spec.cpu2000.ref.253_perlbmk 4.71%
spec.cpu2000.ref.256_bzip2 2.07%
spec.cpu2006.ref.433_milc 1.24%
spec.cpu2000.ref.254_gap -1.76%

(I'm focussing on SPEC because it's less noisy than the benchmarks in test-suite, although I do note a 23% improvement in lnt.MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan and a 7.76% regression in lnt.SingleSource/Benchmarks/Shootout-C++/ackermann)

My test result about Spec (Opt/Ori)
spec.cpu2000.ref.253_perlbmk 	-> I can't get result. I think without modifying source code, we can't compile it.
spec.cpu2000.ref.256_bzip2 	99.93% 
spec.cpu2006.ref.433_milc 	99.52%
spec.cpu2000.ref.254_gap 	99.18%
My test result about test-suite (Opt/Ori)
I got totally different result on this.
SingleSource/Benchmarks/Shootout-C++/ackermann	81.35%
MultiSource/Benchmarks/MiBench/automotive-susan/automotive-susan -> execution time is 0. It's same.

I think both of data also can be worth. And This data is not meaningful any more. As you know, the patch was completely changed. I will share the result data base on recent patch soon.

PS. Could you share how to build "spec.cpu2000.ref.253_perlbmk", please?

I ran some benchmarks against http://reviews.llvm.org/differential/diff/43578/. (Base Revision: r256315)
Env: Cortex-A57 2.1Ghz, MIF 1.5Ghz, AArch64
Test: Execute each benchmark 4 times and get average.

Test NameOpt_Exe_Time/Ori_Exe_Time
External/SPEC/CFP2000/177.mesa/177.mesa100.33%
External/SPEC/CFP2000/179.art/179.art102.20%
External/SPEC/CFP2000/183.equake/183.equake99.79%
External/SPEC/CFP2000/188.ammp/188.ammp99.97%
External/SPEC/CFP2006/433.milc/433.milc100.05%
External/SPEC/CFP2006/444.namd/444.namd100.06%
External/SPEC/CFP2006/447.dealII/447.dealII100.08%
External/SPEC/CFP2006/450.soplex/450.soplex100.00%
External/SPEC/CFP2006/470.lbm/470.lbm99.97%
External/SPEC/CINT2000/164.gzip/164.gzip100.13%
External/SPEC/CINT2000/175.vpr/175.vpr99.67%
External/SPEC/CINT2000/176.gcc/176.gcc99.60%
External/SPEC/CINT2000/181.mcf/181.mcf99.93%
External/SPEC/CINT2000/186.crafty/186.crafty99.76%
External/SPEC/CINT2000/197.parser/197.parser100.34%
External/SPEC/CINT2000/254.gap/254.gap100.34%
External/SPEC/CINT2000/255.vortex/255.vortex99.61%
External/SPEC/CINT2000/256.bzip2/256.bzip299.71%
External/SPEC/CINT2006/400.perlbench/400.perlbench100.21%
External/SPEC/CINT2006/401.bzip2/401.bzip299.87%
External/SPEC/CINT2006/403.gcc/403.gcc99.74%
External/SPEC/CINT2006/429.mcf/429.mcf100.25%
External/SPEC/CINT2006/445.gobmk/445.gobmk100.89%
External/SPEC/CINT2006/456.hmmer/456.hmmer101.26%
External/SPEC/CINT2006/458.sjeng/458.sjeng99.88%
External/SPEC/CINT2006/462.libquantum/462.libquantum99.75%
External/SPEC/CINT2006/471.omnetpp/471.omnetpp98.04%
External/SPEC/CINT2006/473.astar/473.astar99.80%
External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk98.48%
Geomean99.99%

Other benchmarks in test-suite's Geomean value : 99.73%

On commertial benchmark, we can see the 0.1 ~0.2% improvement of overall.

Junmo.

sanjoy requested changes to this revision.Jan 17 2016, 1:12 PM
sanjoy edited edge metadata.
sanjoy added inline comments.
lib/Analysis/ScalarEvolutionExpander.cpp
1825 ↗(On Diff #43578)

Nit: do you mean I's operands?

1828 ↗(On Diff #43578)

Nit: LLVM convention is Depth.

1833 ↗(On Diff #43578)

Might want to check SE.isSCEVAble(I->getType()) before calling getSCEV to avoid creating too many SCEVUnknown objects unnecessarily.

1836 ↗(On Diff #43578)

Use a range for here.

1838 ↗(On Diff #43578)

I think it will be cleaner to handle the non-instruction case in the caller (and remove the I == nullptr check above), since there is no reason to have the initial caller (first level of recursion) pass in nullptr for I.

I don't think depth++ is correct -- you'll blow the stack if I is a PHI node with itself as its first operand. At the very least, it needs to be ++depth. Also, since you're calling the counter depth, it'd make more sense to pass in depth + 1 here, and not ++depth. If you do want to increment depth in place, it might be better to call it Iter.

1871 ↗(On Diff #43578)

Why do you care about RHS being a constant?

I think it is better to have the logic be:

  1. Check LHS as usual
  2. If RHSV is an Instruction then check RHS as well
  3. Use FindSameSCEVInst to do a deeper search on LHS

Is there any reason why you don't need to call FindSameSCEVInst on RHS as well (when it is an instruction)?

test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
53 ↗(On Diff #43578)

Reading the code for @test, I think the metadata is required to tell LLVM that it is okay to generate a divide by %step if it thinks that is profitable to do (and so the test case actually checks LLVM's heuristics, not correctness).

This revision now requires changes to proceed.Jan 17 2016, 1:12 PM
flyingforyou marked 5 inline comments as done and an inline comment as not done.Jan 17 2016, 10:40 PM
flyingforyou added inline comments.
lib/Analysis/ScalarEvolutionExpander.cpp
1825 ↗(On Diff #43578)

Yep.

1838 ↗(On Diff #43578)

Sure, I will remove` I == nullptr`, add nullptr check when calling this function.

Thanks. I will change ++depth to Depth + 1.

1871 ↗(On Diff #43578)

There is no critical reason that I care about RHS beging a constant.

I will modify the logic what you said. It seems more proper way for findExistingExpansion.

Thanks.

test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
53 ↗(On Diff #43578)

Thanks. I'll remain the code.

flyingforyou edited edge metadata.
flyingforyou marked 2 inline comments as done.

Addressed Sanjoy's comments.

mzolotukhin requested changes to this revision.Jan 20 2016, 4:17 PM
mzolotukhin edited edge metadata.

Hi Junmo,

Please find some comments inline.

Thanks,
Michael

lib/Analysis/ScalarEvolutionExpander.cpp
1863–1864 ↗(On Diff #45155)

I don't understand why we need to break symmetry in this code. Anyway, if we do this, we should either add an assert that the opposite situation never happens, or properly handle both cases. The fact that we've only seen one case is not a justification for not handling the other one.

IOW, why can't LHS be Value, and RHS be Instruction?

lib/Transforms/Utils/LoopUnrollRuntime.cpp
346–347 ↗(On Diff #45155)

This should be in the same patch, otherwise the change will only paper over the issue, and lead to actually worse code in the end. If expandCodeFor isn't updated, it'll fail to reuse the existing value, so it'll generate yet another division. The idea of the patch is to teach SCEV to reuse existing expressions, not just hack around the high cost of division. The division should still be considered expensive, so we definitely shouldn't generate redundant ones.

test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
27–28 ↗(On Diff #45155)

Maybe something like this?

Though SCEV for loop tripcount contains division, it shouldn't be considered expensive, since the division already exists in the code and we don't need to expand it once more. Thus, it shouldn't prevent us from unrolling the loop.

Also, I think I don't completely understand why we care so much about step being -1. Why is it important?

53 ↗(On Diff #45155)

This metadata is used as !range !0 in @test and as !tbaa !0 in @test2. Are you sure this is correct?

This revision now requires changes to proceed.Jan 20 2016, 4:17 PM
flyingforyou added inline comments.Jan 20 2016, 9:19 PM
lib/Analysis/ScalarEvolutionExpander.cpp
1863–1864 ↗(On Diff #45155)

I think LHS can be Value. But what I know is above pattern can cover most of cases.

if (match(BB->getTerminator(),
           m_Br(m_ICmp(Pred, m_Value(LHSV), m_Value(RHSV)), TrueBB,
                FalseBB))) {
  dbgs() << "m_Value(LHSV), m_Value(RHSV) pattern\n";
  if (match(BB->getTerminator(),
             m_Br(m_ICmp(Pred, m_Instruction(LHS), m_Instruction(RHS)), TrueBB,
                  FalseBB)))
    dbgs() << "m_Instruction(LHS), m_Instruction(RHS) pattern\n";
  else if (match(BB->getTerminator(),
             m_Br(m_ICmp(Pred, m_Instruction(LHS), m_Value(RHSV)), TrueBB,
                  FalseBB)))
    dbgs() << "m_Instruction(LHS), m_Value(RHSV) pattern\n";
  else
    dbgs() << "unknown(LHS), unknown(RHS) pattern\n";

}

When I test above code, I can't find "unknown(LHS), unknown(RHS) pattern\n" on building big benchmarks programs including LLVM itself.

m_Value(LHSV), m_Value(RHSV) pattern50650
m_Instruction(LHS), m_Instruction(RHS) pattern30178
m_Instruction(LHS), m_Value(RHSV) pattern20472

So I think it's no necessary to consider LHS is Value now.

Actually, we don't make compare instruction likes cmp constant, reg...

lib/Transforms/Utils/LoopUnrollRuntime.cpp
346–347 ↗(On Diff #45155)

Michael, I am not an expert of SCEV. I think you might want to get review from Sanjoy.

Sanjoy, How about applying below patch on this?

-    if (SE.getSCEV(LHS) == S && SE.DT.dominates(LHS, At))
+    const SCEV *LHSSE = SE.getSCEV(LHS);
+    if (LHSSE->getType() == S->getType() &&
+        isa<SCEVConstant>(SE.getMinusSCEV(LHSSE, S)) &&
+        SE.DT.dominates(LHS, At))
       return LHS;
 
     if (Instruction *RHS = dyn_cast<Instruction>(RHSV)) {
-      if (SE.getSCEV(RHS) == S && SE.DT.dominates(RHS, At))
+      const SCEV *RHSSE = SE.getSCEV(RHS);
+      if (RHSSE->getType() == S->getType() &&
+          isa<SCEVConstant>(SE.getMinusSCEV(RHSSE, S)) &&
+          SE.DT.dominates(RHS, At))
         return RHS;
     }
test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
27–28 ↗(On Diff #45155)

Cool. Michael.
I will modify comments as you said.

-1 indicates that tripcount doesn't need to be divided for calculation when loop unrolling.
Anyway thanks for good sugesstion.

53 ↗(On Diff #45155)

I don't think that makes errors. But I got your point. Thanks.
I will modify !tbaa !0 to !tbaa !1
and add !1 = !{i64 0}

Is this ok?

Michael, I replied your inline comments. Could you check them, please?

Sanjoy, Michael want to insert some patch. Could you give him a opinion, please?

Junmo.

mzolotukhin added inline comments.Jan 25 2016, 12:09 PM
lib/Analysis/ScalarEvolutionExpander.cpp
1863–1864 ↗(On Diff #45155)

My concern isn't about us missing some real cases right now, it's more about future maintenance of this code. If later on someone finds such a case, it'll be hard to figure out, why isn't it covered in the first place, when the code was committed.

test/Transforms/LoopUnroll/high-cost-trip-count-computation.ll
53 ↗(On Diff #45155)

This is ok (or you can just remove !tbaa !1 and !1 = !{i64 0} at all). Thanks!

flyingforyou updated this object.
flyingforyou edited edge metadata.
flyingforyou marked 5 inline comments as done.

Addressed Michael's comments.
Thanks for good description example.

Waiting Sanjoy's comment.

sanjoy requested changes to this revision.Jan 27 2016, 12:40 PM
sanjoy edited edge metadata.

This is looking close to ready to land now. Some nitpicky comments inline.

lib/Analysis/ScalarEvolutionExpander.cpp
1827 ↗(On Diff #46105)

How about renaming this to FindCongruentInst, or FindInstWithSameSCEV. FindSameSCEVInst does not sound okay grammatically.

1834 ↗(On Diff #46105)

I'm worried that we're possibly doing too much work here (getSCEV can be expensive). Can we further filter on the type of I? For instance, if the width of I->getType() and S->getType() are different, then we don't need to bother calling getSCEV on I.

Edit: I'm much less worried about doing too much work here, after seeing the guard on SE.hasOperand in the caller; but I still think filtering on the type is a good idea.

1865 ↗(On Diff #46105)

These two identical blocks should definitely be combined into a lambda.

1866 ↗(On Diff #46105)

I'd "CSE" the two calls to getSCEV(LHS) to avoid the extra dictionary lookup.

1870 ↗(On Diff #46105)

Please don't reuse the LHS variable here. Perhaps auto *ExistingInst = ..?

1833 ↗(On Diff #45155)

Minor: I'd add a small comment that you're doing the pre-check on using isSCEVable to avoid creating too many SCEVUnknown expressions.

This revision now requires changes to proceed.Jan 27 2016, 12:40 PM
mzolotukhin requested changes to this revision.Jan 27 2016, 4:40 PM
mzolotukhin edited edge metadata.
mzolotukhin added inline comments.
lib/Transforms/Utils/LoopUnrollRuntime.cpp
346–347 ↗(On Diff #46105)

We can wait for Sanjoy's opinion too, but the patch can't be landed without this issue fixed first anyway.

I slightly modified your testcase to expose the issue:

define i32 @test2(i64* %loc, i64 %conv7) {
entry:
  %rem0 = load i64, i64* %loc, align 8
  %ExpensiveComputation = udiv i64 %rem0, 42 ; <<< Extra computations are added to the trip-count expression
  br label %bb1
bb1:
  %div11 = udiv i64 %ExpensiveComputation, %conv7
  %cmp.i38 = icmp ugt i64 %div11, 1
  %div12 = select i1 %cmp.i38, i64 %div11, i64 1
  br label %for.body
for.body:
  %rem1 = phi i64 [ %rem0, %bb1 ], [ %rem2, %for.body ]
  %k1 = phi i64 [ %div12, %bb1 ], [ %dec, %for.body ]
  %mul1 = mul i64 %rem1, 48271
  %rem2 = urem i64 %mul1, 2147483647
  %dec = add i64 %k1, -1
  %cmp = icmp eq i64 %dec, 0
  br i1 %cmp, label %exit, label %for.body
exit:
  %rem3 = phi i64 [ %rem2, %for.body ]
  store i64 %rem3, i64* %loc, align 8
  ret i32 0
}

If we apply the patch and run this test thru opt, the loop will be successfully unrolled, *but* we'll generate the code for "expensive" computations. This is IR after unrolling:

...
entry:
  %rem0 = load i64, i64* %loc, align 8
  %ExpensiveComputation = udiv i64 %rem0, 42
  br label %bb1

bb1:                                              ; preds = %entry
  %div11 = udiv i64 %ExpensiveComputation, %conv7
  %cmp.i38 = icmp ugt i64 %div11, 1
  %div12 = select i1 %cmp.i38, i64 %div11, i64 1
  %0 = udiv i64 %rem0, 42      ; <<<< This is our %ExpensiveComputation, and we just duplicated it!
  %1 = udiv i64 %0, %conv7     ; <<<
  %2 = icmp ugt i64 %1, 1
  %umax = select i1 %2, i64 %1, i64 1
  %3 = add i64 %umax, -1
  %xtraiter = and i64 %umax, 7
  %lcmp.mod = icmp ne i64 %xtraiter, 0
  br i1 %lcmp.mod, label %for.body.prol, label %bb1.split
...

Please note inserted instruction %0 and %1 - they're completely redundant (as we compute the same value, that we have in %div11).

So, as I said, it's not sufficient to just fix findExistingExpransion, we also need to fix SCEVExpander::expand.

sanjoy added inline comments.Jan 27 2016, 5:43 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
346–347 ↗(On Diff #46105)

I think the original idea was that if we know that a certain value is already being computed, then we can emit code to compute it again, and rely on later passes to get rid of the redundancy (so in your example above we'd expect %0 and %1 to be CSE'ed with %ExpensiveComputation and %div11).

But I understand your concern that this is a pessimization unless we run some cleanup passes afterwards, and even then we rely on the cleanup passes being smart enough to undo our mess. Perhaps SCEVExpander::expand can be fixed in a separate change (since the scope of that fix is bigger than what is addressed here) that goes in before this one?

mzolotukhin added inline comments.Jan 27 2016, 6:00 PM
lib/Transforms/Utils/LoopUnrollRuntime.cpp
346–347 ↗(On Diff #46105)

My example was pretty simple, just to demonstrate the problem, and in this case indeed a simple clean-up would be sufficient to remove the introduced redundancy. But it's easy to construct a more complicated test, which existing clean-up passes might have problems to deal with, and since SCEV expressions might be very complicated, expanding all of them might be not a good idea, especially taking into account that some of them are actually expensive.

As for separating the patches - I'm entirely for it, but I just want to first make sure that there are no fundamental issues that can prevent us from fixing expand. My biggest concern is that this change leads to somewhat inconsistent state, where findExistingExpansion and expand have different assumptions. I'll feel much easier about it once I see the expand part of the patch.

flyingforyou marked 6 inline comments as done.Jan 27 2016, 9:47 PM
flyingforyou added inline comments.
lib/Analysis/ScalarEvolutionExpander.cpp
1834 ↗(On Diff #46105)

Thanks for good sugesstion.

1870 ↗(On Diff #46105)

Thanks for guiding this. As you said, using lambda could be more clear than before. Great!.

flyingforyou edited edge metadata.
flyingforyou marked 2 inline comments as done.

Addressed Sanjoy's comments.

Thanks,
Junmo.

Sanjoy, I am waiting your review. :)

Michael, could you make an example which is more complex that current clean up passes can't deal with?
I checked your previous example's expensive expression is cleaned up.

mzolotukhin requested changes to this revision.Feb 3 2016, 3:53 PM
mzolotukhin edited edge metadata.

Michael, could you make an example which is more complex that current clean up passes can't deal with? I checked your previous example's expensive expression is cleaned up.

Providing a concrete example here isn't even that important due to arguments I raised earlier, but here you go:

@a = global i64 1

declare void @foo()

define i32 @test3(i64* %loc, i64 %conv7) {
entry:
  %rem0 = load i64, i64* %loc, align 8
  %ExpensiveComputation = udiv i64 %rem0, 42
  br label %outer.header

outer.header:
  %i = phi i64 [ 1, %entry ], [ %i.next, %outer.latch ]
  br label %bb1

bb1:
  %addr = getelementptr i64, i64* @a, i64 %i
  %x = load volatile i64, i64* %addr
  %div11 = udiv i64 %x, %i
  %cmp.i38 = icmp ugt i64 %div11, 1
  %div12 = select i1 %cmp.i38, i64 %div11, i64 1
  br label %for.body

for.body:
  %rem1 = phi i64 [ %rem0, %bb1 ], [ %rem2, %for.body ]
  %k1 = phi i64 [ %div12, %bb1 ], [ %dec, %for.body ]
  %mul1 = mul i64 %rem1, 48271
  %rem2 = urem i64 %mul1, 2147483647
  %dec = add i64 %k1, -1
  %cmp = icmp eq i64 %dec, 0
  br i1 %cmp, label %outer.latch, label %for.body

outer.latch:
  %i.next = add i64 %i, 1
  %outer.cmp = icmp eq i64 %i.next, 1000
  br i1 %outer.cmp, label %exit, label %outer.header

exit:
  %rem3 = phi i64 [ %rem2, %outer.latch ]
  store i64 %rem3, i64* %loc, align 8
  ret i32 0
}

I ran it with

opt < test.ll -S -unroll-runtime -loop-unroll -early-cse -instcombine -simplifycfg -licm

and always saw a code like this in the output:

%x = load volatile i64, i64* %addr, align 8
%div11 = udiv i64 %x, %i
%cmp.i38 = icmp ugt i64 %div11, 1
%div12 = select i1 %cmp.i38, i64 %div11, i64 1
%1 = udiv i64 %x, %0                           ; <<< Redundant expensive computation
%2 = icmp ugt i64 %1, 1
%umax = select i1 %2, i64 %1, i64 1

Probably, you can come up with another set of passes that will be able to clean this up, but then the example might be made even more complicated, so the new set will fail. The problem here is that with this patch compiler thinks that it generates no redundant code to clean-up, while in fact it's not even limited in what it can generate - we expand *any* existing SCEV expression.

So, I'd suggest you to address the issue with expand before going further with this one.

Michael

This revision now requires changes to proceed.Feb 3 2016, 3:53 PM

Looks like the change I've been asking for function has been committed (see D12090). So, probably we just need to update this patch, and it'll be ok to go in.

flyingforyou edited edge metadata.

Addressed Michael's comment.

It's very good timing.!

Thanks.

mzolotukhin added inline comments.Feb 5 2016, 11:42 AM
lib/Analysis/ScalarEvolutionExpander.cpp
1905 ↗(On Diff #46993)

I think FindCongruentInst isn't needed right now. The logic in findExistingExpansion should basically mirror the logic from expand: expand just looks at the hash map, so we should do the same here. (And that should do the trick in our example here)

flyingforyou edited edge metadata.
flyingforyou marked an inline comment as done.

Addressed Michael's comment.

Hi Junmo,

It looks better now, couple of comments/questions inline.

Michael

lib/Analysis/ScalarEvolutionExpander.cpp
1865–1870 ↗(On Diff #47946)

Is there any point from changing LHS and RHS from Instruction to Value? Looks like later on we cast it to Instruction anyway..

1901–1902 ↗(On Diff #47946)

Why do we need these conditions? Isn't dominance sufficient?

Also, would it make sense to factor out this code block to a separate function (and use it both here and in expand)?

flyingforyou marked an inline comment as done.Feb 15 2016, 4:19 PM
flyingforyou added a subscriber: sunfish.
flyingforyou added inline comments.
lib/Analysis/ScalarEvolutionExpander.cpp
1901–1902 ↗(On Diff #47946)

Attached Wei's comment in r259815

Fix a regression for r259736.

When SCEV expansion tries to reuse an existing value, it is needed to ensure
that using the Value at the InsertPt will not break LCSSA. The fix adds a
check that InsertPt is either inside the candidate Value's parent loop, or
the candidate Value's parent loop is nullptr.

would it make sense to factor out this code block to a separate function (and use it both here and in expand)?

Sure.

flyingforyou updated this object.

Addressed Michael's comments.
Thanks.

mzolotukhin accepted this revision.Feb 15 2016, 4:41 PM
mzolotukhin edited edge metadata.

Attached Wei's comment in r259815

Fix a regression for r259736.

When SCEV expansion tries to reuse an existing value, it is needed to ensure
that using the Value at the InsertPt will not break LCSSA. The fix adds a
check that InsertPt is either inside the candidate Value's parent loop, or
the candidate Value's parent loop is nullptr.

Oh, I see, makes sense. The patch LGTM, thanks for your patience:)

Michael

include/llvm/Analysis/ScalarEvolutionExpander.h
266 ↗(On Diff #48035)

Please add a comment for the new function.

This revision was automatically updated to reflect the committed changes.
flyingforyou marked an inline comment as done.

Michael, Sanjoy,

Thank you for your patience, so many great suggestions!

It was great time. !

Merged in r260938.

Thanks,
Junmo.