This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/trunk/
-
trunk/
-
include/llvm/
-
llvm/
-
Analysis/
-
LazyValueInfo.h
-
IR/
-
Dominators.h
-
lib/
-
Analysis/
-
LazyValueInfo.cpp
-
IR/
-
Dominators.cpp
-
Transforms/Scalar/
-
Scalar/
-
JumpThreading.cpp
-
test/Transforms/JumpThreading/
-
Transforms/
-
JumpThreading/
-
pr36133.ll

Differential D42717

[JumpThreading] sync DT for LVI analysis (PR 36133)
ClosedPublic

Authored by brzycki on Jan 30 2018, 3:59 PM.

Download Raw Diff

Details

Reviewers

sebpop
• dberlin
kuhar

Commits

rGf1a7df5ef296: [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis
rL325356: [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis

Summary

LVI will use DT if it is available and in the case of the JumpThreading pass it is. When using LVI for analysis it is necessary to flush all pending DominatorTree updates inside JumpThreading to be certain the analysis is correct.

See https://bugs.llvm.org/show_bug.cgi?id=36133 for detailed debug information.

Diff Detail

Repository: rL LLVM

Event Timeline

brzycki created this revision.Jan 30 2018, 3:59 PM

Herald added subscribers: hiraditya, aprantl. · View Herald TranscriptJan 30 2018, 3:59 PM

brzycki mentioned this in D34135: [JumpThreading] Use DT to avoid processing dead blocks.Jan 30 2018, 4:06 PM

Looks OK. Out of curiosity, what is the compilation time impact?

kuhar accepted this revision.Feb 2 2018, 9:37 AM

This revision is now accepted and ready to land.Feb 2 2018, 9:37 AM

Thank you @kuhar for the review. I ran CTmark (x86_64) on tip today and with this patch applied. The results are:

"compile_time": 1264.1519999999994, # tip
"compile_time": 1267.0480000000002, # patched

Which is around 0.3% degredation.

That seems to be within noise and not an issue

In D42717#996670, @kuhar wrote:

That seems to be within noise and not an issue

I tend to agree. Just for my own curiosity I also ran a check to see if my proposed patch for D34135 changed compile time:

"compile_time": 1268.6039999999998, # tip patched with proposed fix for both this issue and D34135

It's also still within noise levels I think.

uabelho added a subscriber: uabelho.Feb 4 2018, 10:52 PM

@kuhar I had a bug in my test-suite runner script and just fixed it. The new results are require more analysis, particularly sqlite.

## COMPARING TIP OF LLVM-PROJECT VS THE SAME SHA WITH THE PROPOSED PATCH FOR D42717
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream/results.json build.D42717/results.json  compile_time
#  left column: build.upstream/results.json
# right column: build.D42717/results.json
#  metric name: compile_time
    336.3600 <- 377.3240     [12.18%]    CTMark/sqlite3/sqlite3.test
    493.8600 -> 486.7000     [1.47%]    CTMark/tramp3d-v4/tramp3d-v4.test
    497.1560 <- 500.4880     [0.67%]    CTMark/ClamAV/clamscan.test
   1265.6800 <- 1272.0480    [0.50%]    CTMark/7zip/7zip-benchmark.test
    589.8000 <- 591.0520     [0.21%]    CTMark/lencod/lencod.test
    387.2960 -> 386.5120     [0.20%]    CTMark/kimwitu++/kc.test
    459.5960 <- 460.3640     [0.17%]    CTMark/SPASS/SPASS.test
    256.1360 <- 256.4000     [0.10%]    CTMark/mafft/pairlocalalign.test
    373.0960 -> 372.7520     [0.09%]    CTMark/consumer-typeset/consumer-typeset.test
    924.1160 -> 923.4120     [0.08%]    CTMark/Bullet/bullet.test

## COMPARING TIP OF LLVM-PROJECT VS THE SAME SHA WITH THE PROPOSED PATCH FOR D34135
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream/results.json build.D34135/results.json  compile_time
#  left column: build.upstream/results.json
# right column: build.D34135/results.json
#  metric name: compile_time
    336.3600 <- 387.9720     [15.34%]    CTMark/sqlite3/sqlite3.test
    493.8600 -> 485.5160     [1.72%]    CTMark/tramp3d-v4/tramp3d-v4.test
    256.1360 -> 252.7320     [1.35%]    CTMark/mafft/pairlocalalign.test
    387.2960 <- 392.0080     [1.22%]    CTMark/kimwitu++/kc.test
    497.1560 <- 499.8040     [0.53%]    CTMark/ClamAV/clamscan.test
    373.0960 -> 371.5320     [0.42%]    CTMark/consumer-typeset/consumer-typeset.test
    459.5960 <- 461.2160     [0.35%]    CTMark/SPASS/SPASS.test
    589.8000 -> 588.1520     [0.28%]    CTMark/lencod/lencod.test
   1265.6800 <- 1268.2080    [0.20%]    CTMark/7zip/7zip-benchmark.test
    924.1160 -> 923.6120     [0.05%]    CTMark/Bullet/bullet.test

## COMPARING THE PROPOSED PATCH FOR D34135 VS THE SAME SHA WITH THE PROPOSED PATCH FOR D42717
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.D34135/results.json build.D42717/results.json  compile_time
#  left column: build.D34135/results.json
# right column: build.D42717/results.json
#  metric name: compile_time
    387.9720 -> 377.3240     [2.82%]    CTMark/sqlite3/sqlite3.test
    252.7320 <- 256.4000     [1.45%]    CTMark/mafft/pairlocalalign.test
    392.0080 -> 386.5120     [1.42%]    CTMark/kimwitu++/kc.test
    588.1520 <- 591.0520     [0.49%]    CTMark/lencod/lencod.test
    371.5320 <- 372.7520     [0.33%]    CTMark/consumer-typeset/consumer-typeset.test
   1268.2080 <- 1272.0480    [0.30%]    CTMark/7zip/7zip-benchmark.test
    485.5160 <- 486.7000     [0.24%]    CTMark/tramp3d-v4/tramp3d-v4.test
    461.2160 -> 460.3640     [0.19%]    CTMark/SPASS/SPASS.test
    499.8040 <- 500.4880     [0.14%]    CTMark/ClamAV/clamscan.test
    923.6120 -> 923.4120     [0.02%]    CTMark/Bullet/bullet.test

All the other benchmarks are ~1% deltas and I'm not very concerned with them. The sqlite benchmark loses 12% with this proposed patch and an additional 3% with the patch proposed for D34135. I'm going to do another run of the upstream compiler to "check the cable" but I'm fairly certain these results are accurate.

My conjecture is the DT edge calls on a large tree are very expensive and I'm not sure how to mitigate this...

I'm seeing about 3% run variation on sqlite with the upstream compiler:

brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream/results.json build.upstream2/results.json compile_time
#  left column: build.upstream/results.json
# right column: build.upstream2/results.json
#  metric name: compile_time
    336.3600 <- 347.4120     [  3.29%]  CTMark/sqlite3/sqlite3.test
    493.8600 -> 487.0560     [  1.40%]  CTMark/tramp3d-v4/tramp3d-v4.test
    373.0960 -> 368.5280     [  1.24%]  CTMark/consumer-typeset/consumer-typeset.test
    387.2960 <- 389.8320     [  0.65%]  CTMark/kimwitu++/kc.test
    459.5960 <- 462.5440     [  0.64%]  CTMark/SPASS/SPASS.test
    924.1160 -> 921.2680     [  0.31%]  CTMark/Bullet/bullet.test
    497.1560 <- 498.5240     [  0.28%]  CTMark/ClamAV/clamscan.test
    589.8000 -> 588.3840     [  0.24%]  CTMark/lencod/lencod.test
   1265.6800 <- 1267.2280    [  0.12%]  CTMark/7zip/7zip-benchmark.test
    256.1360 -> 256.1000     [  0.01%]  CTMark/mafft/pairlocalalign.test

That still puts the patch around 10% slower.

Here is a comparison of a run of LLVM before the DDT patch was applied to JT vs upstream.

brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.pre_dt/results.json build.upstream/results.json compile_time
#  left column: build.pre_dt/results.json
# right column: build.upstream/results.json
#  metric name: compile_time
    527.9440 -> 493.8600     [  6.90%]  CTMark/tramp3d-v4/tramp3d-v4.test
   1203.8440 <- 1265.6800    [  5.14%]  CTMark/7zip/7zip-benchmark.test
    881.3400 <- 924.1160     [  4.85%]  CTMark/Bullet/bullet.test
    576.0160 <- 589.8000     [  2.39%]  CTMark/lencod/lencod.test
    365.8560 <- 373.0960     [  1.98%]  CTMark/consumer-typeset/consumer-typeset.test
    452.0200 <- 459.5960     [  1.68%]  CTMark/SPASS/SPASS.test
    252.3400 <- 256.1360     [  1.50%]  CTMark/mafft/pairlocalalign.test
    491.1680 <- 497.1560     [  1.22%]  CTMark/ClamAV/clamscan.test
    382.8120 <- 387.2960     [  1.17%]  CTMark/kimwitu++/kc.test
    336.8720 -> 336.3600     [  0.15%]  CTMark/sqlite3/sqlite3.test


brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.pre_dt/results.json build.upstream2/results.json compile_time
#  left column: build.pre_dt/results.json
# right column: build.upstream2/results.json
#  metric name: compile_time
    527.9440 -> 487.0560     [  8.39%]  CTMark/tramp3d-v4/tramp3d-v4.test
   1203.8440 <- 1267.2280    [  5.27%]  CTMark/7zip/7zip-benchmark.test
    881.3400 <- 921.2680     [  4.53%]  CTMark/Bullet/bullet.test
    336.8720 <- 347.4120     [  3.13%]  CTMark/sqlite3/sqlite3.test
    452.0200 <- 462.5440     [  2.33%]  CTMark/SPASS/SPASS.test
    576.0160 <- 588.3840     [  2.15%]  CTMark/lencod/lencod.test
    382.8120 <- 389.8320     [  1.83%]  CTMark/kimwitu++/kc.test
    491.1680 <- 498.5240     [  1.50%]  CTMark/ClamAV/clamscan.test
    252.3400 <- 256.1000     [  1.49%]  CTMark/mafft/pairlocalalign.test
    365.8560 <- 368.5280     [  0.73%]  CTMark/consumer-typeset/consumer-typeset.test

@kuhar and @dberlin I would appreciate any guidance you two may have. I spoke with @sebpop about this problem and we don't see a way around the performance impact to preserve DT for LVI analysis. As it stands now there are cases where the deferred updates change the LVI analysis sufficiently to cause a miscompile. We do think there may be ways to mitigate this loss in the future: one way is to prevent pointless threads where a BB is threaded only to be discarded in a subsequent loop iteration within JumpThreading.

LVI functions fine without DT, and it's going to be too expensive to ever use a lazy analysis that is constantly recomputing info like LVI because it may require hundreds or thousands of Dominator tree rebuilds. I would hand LVI a fake dt pointer class where when the Dominator tree is up to date, it works and when it's not up to date, it is null. That would work with how LVI currently checking I think. Regardless I think the right answer here is to make it not use it when it's not up to date. That will give better answers then it used to get and you get gradually better answers as you can preserve more or become less lazy

Added a feature to LVI to enable or disable the use of the DominatorTree for analysis. Updated JumpThreading to disable the use of DT for LVI analysis when there are pending updates in the DDT.

@dberlin thank you for the suggestion and I have implemented the change. The good news is sqlite is back to our original performance. The not-so-great news is this causes a regression for tramp3d-v4:

brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream/results.json build.D42717_newpatch/results.json compile_time
#  left column: build.upstream/results.json
# right column: build.D42717_newpatch/results.json
#  metric name: compile_time
    493.8600 <- 536.3320     [  8.60%]  CTMark/tramp3d-v4/tramp3d-v4.test
    336.3600 <- 343.2440     [  2.05%]  CTMark/sqlite3/sqlite3.test
    373.0960 -> 369.3400     [  1.02%]  CTMark/consumer-typeset/consumer-typeset.test
    256.1360 -> 253.9560     [  0.86%]  CTMark/mafft/pairlocalalign.test
    459.5960 <- 462.5360     [  0.64%]  CTMark/SPASS/SPASS.test
   1265.6800 <- 1270.5560    [  0.39%]  CTMark/7zip/7zip-benchmark.test
    589.8000 <- 591.7280     [  0.33%]  CTMark/lencod/lencod.test
    387.2960 <- 388.3840     [  0.28%]  CTMark/kimwitu++/kc.test
    924.1160 -> 921.7360     [  0.26%]  CTMark/Bullet/bullet.test
    497.1560 -> 496.2280     [  0.19%]  CTMark/ClamAV/clamscan.test
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream2/results.json build.D42717_newpatch/results.json compile_time
#  left column: build.upstream2/results.json
# right column: build.D42717_newpatch/results.json
#  metric name: compile_time
    487.0560 <- 536.3320     [ 10.12%]  CTMark/tramp3d-v4/tramp3d-v4.test
    347.4120 -> 343.2440     [  1.21%]  CTMark/sqlite3/sqlite3.test
    256.1000 -> 253.9560     [  0.84%]  CTMark/mafft/pairlocalalign.test
    588.3840 <- 591.7280     [  0.57%]  CTMark/lencod/lencod.test
    498.5240 -> 496.2280     [  0.46%]  CTMark/ClamAV/clamscan.test
    389.8320 -> 388.3840     [  0.37%]  CTMark/kimwitu++/kc.test
   1267.2280 <- 1270.5560    [  0.26%]  CTMark/7zip/7zip-benchmark.test
    368.5280 <- 369.3400     [  0.22%]  CTMark/consumer-typeset/consumer-typeset.test
    921.2680 <- 921.7360     [  0.05%]  CTMark/Bullet/bullet.test
    462.5440 -> 462.5360     [  0.00%]  CTMark/SPASS/SPASS.test

The upstream and upstream2 builds both use DT for LVI analysis. In the case of tramp3d-v4 we lucked out using DT analysis without causing a miscompile. I don't see a way to get that performance back.

This patch also invalidates my proposed fix for D34135 since we no longer have a valid DT before calling LVI in all instances.

I'm seeing considerable variance on the tramp3d-v4 compile time test before and after the preservation of DT on JumpThreading patch:

brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.pre_dt_1/results.json build.pre_dt_2/results.json compile_time
#  left column: build.pre_dt_1/results.json
# right column: build.pre_dt_2/results.json
#  metric name: compile_time
    524.8640 -> 474.9400     [ 10.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    343.1280 <- 369.4960     [  7.68%]  CTMark/sqlite3/sqlite3.test
    379.4560 <- 386.0880     [  1.75%]  CTMark/kimwitu++/kc.test
    250.3400 -> 248.0680     [  0.92%]  CTMark/mafft/pairlocalalign.test
    583.2080 <- 587.5400     [  0.74%]  CTMark/lencod/lencod.test
    918.8360 -> 912.4920     [  0.70%]  CTMark/Bullet/bullet.test
    456.7720 -> 454.7480     [  0.45%]  CTMark/SPASS/SPASS.test
    367.2440 -> 366.3920     [  0.23%]  CTMark/consumer-typeset/consumer-typeset.test
   1254.5240 -> 1252.3720    [  0.17%]  CTMark/7zip/7zip-benchmark.test
    490.1600 -> 489.4360     [  0.15%]  CTMark/ClamAV/clamscan.test
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.pre_dt_1/results.json build.pre_dt_2/results.json compile_time
#  left column: build.pre_dt_1/results.json
# right column: build.pre_dt_2/results.json
#  metric name: compile_time
    524.8640 -> 474.9400     [ 10.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    343.1280 <- 369.4960     [  7.68%]  CTMark/sqlite3/sqlite3.test
    379.4560 <- 386.0880     [  1.75%]  CTMark/kimwitu++/kc.test
    250.3400 -> 248.0680     [  0.92%]  CTMark/mafft/pairlocalalign.test
    583.2080 <- 587.5400     [  0.74%]  CTMark/lencod/lencod.test
    918.8360 -> 912.4920     [  0.70%]  CTMark/Bullet/bullet.test
    456.7720 -> 454.7480     [  0.45%]  CTMark/SPASS/SPASS.test
    367.2440 -> 366.3920     [  0.23%]  CTMark/consumer-typeset/consumer-typeset.test
   1254.5240 -> 1252.3720    [  0.17%]  CTMark/7zip/7zip-benchmark.test
    490.1600 -> 489.4360     [  0.15%]  CTMark/ClamAV/clamscan.test
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.0cb78d41fca7bac_1/results.json build.0cb78d41fca7bac_2/results.json compile_time
#  left column: build.0cb78d41fca7bac_1/results.json
# right column: build.0cb78d41fca7bac_2/results.json
#  metric name: compile_time
    535.8520 -> 483.3720     [ 10.86%]  CTMark/tramp3d-v4/tramp3d-v4.test
    372.9000 <- 378.4200     [  1.48%]  CTMark/sqlite3/sqlite3.test
    256.3040 -> 253.7600     [  1.00%]  CTMark/mafft/pairlocalalign.test
    387.8480 -> 385.0480     [  0.73%]  CTMark/kimwitu++/kc.test
    369.8760 <- 372.3440     [  0.67%]  CTMark/consumer-typeset/consumer-typeset.test
    458.8480 <- 461.5720     [  0.59%]  CTMark/SPASS/SPASS.test
    590.1440 -> 588.8520     [  0.22%]  CTMark/lencod/lencod.test
    499.0920 <- 499.9840     [  0.18%]  CTMark/ClamAV/clamscan.test
   1270.1200 -> 1267.9480    [  0.17%]  CTMark/7zip/7zip-benchmark.test
    928.7720 <- 928.8000     [  0.00%]  CTMark/Bullet/bullet.test
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.D42717_1/results.json build.D42717_2/results.json compile_time
#  left column: build.D42717_1/results.json
# right column: build.D42717_2/results.json
#  metric name: compile_time
    494.1120 -> 487.4120     [  1.37%]  CTMark/tramp3d-v4/tramp3d-v4.test
    369.8320 <- 373.5960     [  1.02%]  CTMark/consumer-typeset/consumer-typeset.test
    386.5280 <- 390.0800     [  0.92%]  CTMark/kimwitu++/kc.test
    501.9440 -> 497.7200     [  0.85%]  CTMark/ClamAV/clamscan.test
    255.9640 -> 254.5120     [  0.57%]  CTMark/mafft/pairlocalalign.test
    347.9000 -> 346.5880     [  0.38%]  CTMark/sqlite3/sqlite3.test
   1267.7040 <- 1269.8640    [  0.17%]  CTMark/7zip/7zip-benchmark.test
    462.5080 <- 462.8000     [  0.06%]  CTMark/SPASS/SPASS.test
    923.5880 -> 923.0320     [  0.06%]  CTMark/Bullet/bullet.test
    589.6800 <- 589.8840     [  0.03%]  CTMark/lencod/lencod.test

All of these are the same compiler compared against itself on as close to a quiescent x86_64 machine as I could find. The D42717 version has been rebased on top of llvm-project Git sha 0cb78d41fca7bac. I'm going to run more iterations of the pre_dt compiler (llvm-project sha 788d5233ad1f2 from January 12) to see if the variance persists.

If it does then this current patch does not meaningfully alter the compile time of tramp3d-v4.

The variance of tramp3d-v4 is considerable, as much as 18.5%. I ran CTMark on the pre-DDT compiler and here are the results:

$ cat bar.sh
#!/bin/bash

for i in 1 2 3 4 5 6; do
    for j in 1 2 3 4 5 6; do
        ~/git/me/llvm/litdiff \
            build.pre_dt_$i/results.json \
            build.pre_dt_$j/results.json \
            compile_time  | grep tramp3d
    done
done
brzycki@cc01 /work/brzycki/test-suite $ ./bar.sh
    524.8640 -> 474.9400     [ 10.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    524.8640 -> 486.9560     [  7.78%]  CTMark/tramp3d-v4/tramp3d-v4.test
    524.8640 -> 482.9680     [  8.67%]  CTMark/tramp3d-v4/tramp3d-v4.test
    524.8640 -> 484.6760     [  8.29%]  CTMark/tramp3d-v4/tramp3d-v4.test
    524.8640 <- 562.8000     [  7.23%]  CTMark/tramp3d-v4/tramp3d-v4.test
    474.9400 <- 524.8640     [ 10.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    474.9400 <- 486.9560     [  2.53%]  CTMark/tramp3d-v4/tramp3d-v4.test
    474.9400 <- 482.9680     [  1.69%]  CTMark/tramp3d-v4/tramp3d-v4.test
    474.9400 <- 484.6760     [  2.05%]  CTMark/tramp3d-v4/tramp3d-v4.test
    474.9400 <- 562.8000     [ 18.50%]  CTMark/tramp3d-v4/tramp3d-v4.test
    486.9560 <- 524.8640     [  7.78%]  CTMark/tramp3d-v4/tramp3d-v4.test
    486.9560 -> 474.9400     [  2.53%]  CTMark/tramp3d-v4/tramp3d-v4.test
    486.9560 -> 482.9680     [  0.83%]  CTMark/tramp3d-v4/tramp3d-v4.test
    486.9560 -> 484.6760     [  0.47%]  CTMark/tramp3d-v4/tramp3d-v4.test
    486.9560 <- 562.8000     [ 15.58%]  CTMark/tramp3d-v4/tramp3d-v4.test
    482.9680 <- 524.8640     [  8.67%]  CTMark/tramp3d-v4/tramp3d-v4.test
    482.9680 -> 474.9400     [  1.69%]  CTMark/tramp3d-v4/tramp3d-v4.test
    482.9680 <- 486.9560     [  0.83%]  CTMark/tramp3d-v4/tramp3d-v4.test
    482.9680 <- 484.6760     [  0.35%]  CTMark/tramp3d-v4/tramp3d-v4.test
    482.9680 <- 562.8000     [ 16.53%]  CTMark/tramp3d-v4/tramp3d-v4.test
    484.6760 <- 524.8640     [  8.29%]  CTMark/tramp3d-v4/tramp3d-v4.test
    484.6760 -> 474.9400     [  2.05%]  CTMark/tramp3d-v4/tramp3d-v4.test
    484.6760 <- 486.9560     [  0.47%]  CTMark/tramp3d-v4/tramp3d-v4.test
    484.6760 -> 482.9680     [  0.35%]  CTMark/tramp3d-v4/tramp3d-v4.test
    484.6760 <- 562.8000     [ 16.12%]  CTMark/tramp3d-v4/tramp3d-v4.test
    562.8000 -> 524.8640     [  7.23%]  CTMark/tramp3d-v4/tramp3d-v4.test
    562.8000 -> 474.9400     [ 18.50%]  CTMark/tramp3d-v4/tramp3d-v4.test
    562.8000 -> 486.9560     [ 15.58%]  CTMark/tramp3d-v4/tramp3d-v4.test
    562.8000 -> 482.9680     [ 16.53%]  CTMark/tramp3d-v4/tramp3d-v4.test
    562.8000 -> 484.6760     [ 16.12%]  CTMark/tramp3d-v4/tramp3d-v4.test

Best time was 474.94, worst was 562.80. I'll perform the same experiment with proposed patch in this differential.

Six runs of tramp3d-v4 with this patch applied show comperable variance and similar low/high bounds:

$ cat baz.sh
#!/bin/bash

for i in 1 2 3 4 5 6; do
    for j in 1 2 3 4 5 6; do
        ~/git/me/llvm/litdiff \
            build.D42717_$i/results.json \
            build.D42717_$j/results.json \
            compile_time  | grep tramp3d
    done
done
brzycki@cc01 /work/brzycki/test-suite $ ./baz.sh
    494.1120 -> 487.4120     [  1.37%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.1120 -> 485.5440     [  1.76%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.1120 <- 494.9720     [  0.17%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.1120 <- 502.4760     [  1.69%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.1120 <- 536.5600     [  8.59%]  CTMark/tramp3d-v4/tramp3d-v4.test
    487.4120 <- 494.1120     [  1.37%]  CTMark/tramp3d-v4/tramp3d-v4.test
    487.4120 -> 485.5440     [  0.38%]  CTMark/tramp3d-v4/tramp3d-v4.test
    487.4120 <- 494.9720     [  1.55%]  CTMark/tramp3d-v4/tramp3d-v4.test
    487.4120 <- 502.4760     [  3.09%]  CTMark/tramp3d-v4/tramp3d-v4.test
    487.4120 <- 536.5600     [ 10.08%]  CTMark/tramp3d-v4/tramp3d-v4.test
    485.5440 <- 494.1120     [  1.76%]  CTMark/tramp3d-v4/tramp3d-v4.test
    485.5440 <- 487.4120     [  0.38%]  CTMark/tramp3d-v4/tramp3d-v4.test
    485.5440 <- 494.9720     [  1.94%]  CTMark/tramp3d-v4/tramp3d-v4.test
    485.5440 <- 502.4760     [  3.49%]  CTMark/tramp3d-v4/tramp3d-v4.test
    485.5440 <- 536.5600     [ 10.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.9720 -> 494.1120     [  0.17%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.9720 -> 487.4120     [  1.55%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.9720 -> 485.5440     [  1.94%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.9720 <- 502.4760     [  1.52%]  CTMark/tramp3d-v4/tramp3d-v4.test
    494.9720 <- 536.5600     [  8.40%]  CTMark/tramp3d-v4/tramp3d-v4.test
    502.4760 -> 494.1120     [  1.69%]  CTMark/tramp3d-v4/tramp3d-v4.test
    502.4760 -> 487.4120     [  3.09%]  CTMark/tramp3d-v4/tramp3d-v4.test
    502.4760 -> 485.5440     [  3.49%]  CTMark/tramp3d-v4/tramp3d-v4.test
    502.4760 -> 494.9720     [  1.52%]  CTMark/tramp3d-v4/tramp3d-v4.test
    502.4760 <- 536.5600     [  6.78%]  CTMark/tramp3d-v4/tramp3d-v4.test
    536.5600 -> 494.1120     [  8.59%]  CTMark/tramp3d-v4/tramp3d-v4.test
    536.5600 -> 487.4120     [ 10.08%]  CTMark/tramp3d-v4/tramp3d-v4.test
    536.5600 -> 485.5440     [ 10.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    536.5600 -> 494.9720     [  8.40%]  CTMark/tramp3d-v4/tramp3d-v4.test
    536.5600 -> 502.4760     [  6.78%]  CTMark/tramp3d-v4/tramp3d-v4.test

The best time was 485.544 and the worst time was 536.56.

@dberlin I think this patch can be applied as-is to fix pr36133.

I used clang -O3 -fno-exceptions -ftime-report -c tramp3d-v4.cpp -o /tmp/foo.o with an upstream (non-patched) compiler. Here are the hottest areas of pass execution according to the reports. These lines are displayed in the same order:

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
37.7480 (  9.8%)   0.1080 (  3.9%)  37.8560 (  9.7%)  37.8099 (  9.7%)  Global Value Numbering
32.5400 (  8.4%)   0.3320 ( 11.9%)  32.8720 (  8.4%)  32.8960 (  8.5%)  Function Integration/Inlining
26.7520 (  6.9%)   0.2280 (  8.2%)  26.9800 (  6.9%)  26.9368 (  6.9%)  X86 DAG->DAG Instruction Selection
25.8080 (  6.7%)   0.0960 (  3.4%)  25.9040 (  6.7%)  25.8395 (  6.6%)  Combine redundant instructions
16.7080 (  4.3%)   0.1160 (  4.2%)  16.8240 (  4.3%)  17.0581 (  4.4%)  Combine redundant instructions
14.7120 (  3.8%)   0.1000 (  3.6%)  14.8120 (  3.8%)  14.8012 (  3.8%)  Combine redundant instructions
13.6320 (  3.5%)   0.1120 (  4.0%)  13.7440 (  3.5%)  13.6522 (  3.5%)  Combine redundant instructions
11.3840 (  2.9%)   0.0120 (  0.4%)  11.3960 (  2.9%)  11.2862 (  2.9%)  Loop Strength Reduction
10.6120 (  2.7%)   0.0080 (  0.3%)  10.6200 (  2.7%)  10.6267 (  2.7%)  Combine redundant instructions
10.0360 (  2.6%)   0.0320 (  1.1%)  10.0680 (  2.6%)  10.1269 (  2.6%)  Combine redundant instructions
 9.7280 (  2.5%)   0.0160 (  0.6%)   9.7440 (  2.5%)   9.7972 (  2.5%)  Combine redundant instructions
 8.1040 (  2.1%)   0.0240 (  0.9%)   8.1280 (  2.1%)   8.0646 (  2.1%)  Dead Store Elimination
 7.6040 (  2.0%)   0.0480 (  1.7%)   7.6520 (  2.0%)   7.6756 (  2.0%)  Value Propagation
 7.5520 (  2.0%)   0.0200 (  0.7%)   7.5720 (  1.9%)   7.5224 (  1.9%)  SLP Vectorizer
 6.6240 (  1.7%)   0.0800 (  2.9%)   6.7040 (  1.7%)   6.7257 (  1.7%)  Induction Variable Simplification
 6.6320 (  1.7%)   0.0320 (  1.1%)   6.6640 (  1.7%)   6.7028 (  1.7%)  Loop Invariant Code Motion
 6.5040 (  1.7%)   0.1000 (  3.6%)   6.6040 (  1.7%)   6.5648 (  1.7%)  Combine redundant instructions
 6.0480 (  1.6%)   0.0440 (  1.6%)   6.0920 (  1.6%)   6.2509 (  1.6%)  Memory SSA
 5.9080 (  1.5%)   0.0560 (  2.0%)   5.9640 (  1.5%)   6.0655 (  1.6%)  Value Propagation
 5.8120 (  1.5%)   0.0160 (  0.6%)   5.8280 (  1.5%)   5.9670 (  1.5%)  Early CSE w/ MemorySSA
 5.0480 (  1.3%)   0.0040 (  0.1%)   5.0520 (  1.3%)   5.0647 (  1.3%)  Loop Invariant Code Motion
 4.3280 (  1.1%)   0.0120 (  0.4%)   4.3400 (  1.1%)   4.3600 (  1.1%)  Loop Invariant Code Motion
 4.0080 (  1.0%)   0.0280 (  1.0%)   4.0360 (  1.0%)   4.0010 (  1.0%)  Greedy Register Allocator
 3.7480 (  1.0%)   0.0400 (  1.4%)   3.7880 (  1.0%)   3.7836 (  1.0%)  CodeGen Prepare
 3.2240 (  0.8%)   0.0080 (  0.3%)   3.2320 (  0.8%)   3.3606 (  0.9%)  Machine Instruction Scheduler
 3.2640 (  0.8%)   0.0280 (  1.0%)   3.2920 (  0.8%)   3.3383 (  0.9%)  MemCpy Optimization
 3.1040 (  0.8%)   0.0160 (  0.6%)   3.1200 (  0.8%)   3.1925 (  0.8%)  Induction Variable Users
 2.5760 (  0.7%)   0.0160 (  0.6%)   2.5920 (  0.7%)   2.6184 (  0.7%)  Unroll loops
 2.5720 (  0.7%)   0.0320 (  1.1%)   2.6040 (  0.7%)   2.5598 (  0.7%)  SROA
 2.5320 (  0.7%)   0.0200 (  0.7%)   2.5520 (  0.7%)   2.3682 (  0.6%)  Jump Threading
 2.3400 (  0.6%)   0.0040 (  0.1%)   2.3440 (  0.6%)   2.3639 (  0.6%)  Jump Threading
 2.2880 (  0.6%)   0.0280 (  1.0%)   2.3160 (  0.6%)   2.2819 (  0.6%)  Reassociate expressions
 2.1440 (  0.6%)   0.0120 (  0.4%)   2.1560 (  0.6%)   2.1243 (  0.5%)  Loop Load Elimination
 1.9400 (  0.5%)   0.0080 (  0.3%)   1.9480 (  0.5%)   2.0576 (  0.5%)  SROA
 1.9640 (  0.5%)   0.0160 (  0.6%)   1.9800 (  0.5%)   1.9998 (  0.5%)  Loop Vectorization
 1.5960 (  0.4%)   0.0280 (  1.0%)   1.6240 (  0.4%)   1.6763 (  0.4%)  Simplify the CFG
 1.5000 (  0.4%)   0.0160 (  0.6%)   1.5160 (  0.4%)   1.6653 (  0.4%)  Simplify the CFG
 1.6280 (  0.4%)   0.0160 (  0.6%)   1.6440 (  0.4%)   1.5298 (  0.4%)  Remove redundant instructions
 1.5160 (  0.4%)   0.0160 (  0.6%)   1.5320 (  0.4%)   1.4801 (  0.4%)  Module Verifier
 1.4280 (  0.4%)   0.0240 (  0.9%)   1.4520 (  0.4%)   1.4439 (  0.4%)  Module Verifier
 1.3960 (  0.4%)   0.0000 (  0.0%)   1.3960 (  0.4%)   1.4080 (  0.4%)  Module Verifier
 1.4440 (  0.4%)   0.0080 (  0.3%)   1.4520 (  0.4%)   1.3567 (  0.3%)  Simplify the CFG
 1.2880 (  0.3%)   0.0160 (  0.6%)   1.3040 (  0.3%)   1.3025 (  0.3%)  Simplify the CFG
 1.2040 (  0.3%)   0.0200 (  0.7%)   1.2240 (  0.3%)   1.2963 (  0.3%)  Early CSE
 1.0320 (  0.3%)   0.0120 (  0.4%)   1.0440 (  0.3%)   1.1418 (  0.3%)  Sparse Conditional Constant Propagation

and

 ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
38.5520 (  9.9%)   0.1040 (  3.9%)  38.6560 (  9.9%)  38.7477 (  9.9%)  Global Value Numbering
32.6440 (  8.4%)   0.2720 ( 10.2%)  32.9160 (  8.4%)  32.8803 (  8.4%)  Function Integration/Inlining
26.3040 (  6.8%)   0.2440 (  9.1%)  26.5480 (  6.8%)  26.6897 (  6.8%)  X86 DAG->DAG Instruction Selection
26.3120 (  6.8%)   0.0760 (  2.8%)  26.3880 (  6.7%)  26.2355 (  6.7%)  Combine redundant instructions
17.1440 (  4.4%)   0.0800 (  3.0%)  17.2240 (  4.4%)  17.3240 (  4.4%)  Combine redundant instructions
14.9080 (  3.8%)   0.0480 (  1.8%)  14.9560 (  3.8%)  15.0418 (  3.8%)  Combine redundant instructions
13.6120 (  3.5%)   0.0560 (  2.1%)  13.6680 (  3.5%)  13.8665 (  3.5%)  Combine redundant instructions
11.0240 (  2.8%)   0.0400 (  1.5%)  11.0640 (  2.8%)  11.0079 (  2.8%)  Loop Strength Reduction
10.7280 (  2.8%)   0.0320 (  1.2%)  10.7600 (  2.7%)  10.7927 (  2.7%)  Combine redundant instructions
10.2160 (  2.6%)   0.0400 (  1.5%)  10.2560 (  2.6%)  10.2907 (  2.6%)  Combine redundant instructions
 9.9440 (  2.6%)   0.0200 (  0.7%)   9.9640 (  2.5%)   9.9617 (  2.5%)  Combine redundant instructions
 8.3960 (  2.2%)   0.0320 (  1.2%)   8.4280 (  2.1%)   8.2761 (  2.1%)  Dead Store Elimination
 7.7520 (  2.0%)   0.0440 (  1.6%)   7.7960 (  2.0%)   7.8317 (  2.0%)  Value Propagation
 7.5040 (  1.9%)   0.0360 (  1.3%)   7.5400 (  1.9%)   7.4545 (  1.9%)  SLP Vectorizer
 6.8080 (  1.7%)   0.0240 (  0.9%)   6.8320 (  1.7%)   6.8237 (  1.7%)  Loop Invariant Code Motion
 6.7880 (  1.7%)   0.1160 (  4.3%)   6.9040 (  1.8%)   6.7762 (  1.7%)  Combine redundant instructions
 6.5240 (  1.7%)   0.0880 (  3.3%)   6.6120 (  1.7%)   6.6196 (  1.7%)  Induction Variable Simplification
 6.3320 (  1.6%)   0.0360 (  1.3%)   6.3680 (  1.6%)   6.4497 (  1.6%)  Memory SSA
 6.2720 (  1.6%)   0.0080 (  0.3%)   6.2800 (  1.6%)   6.2596 (  1.6%)  Value Propagation
 5.9040 (  1.5%)   0.0480 (  1.8%)   5.9520 (  1.5%)   6.0199 (  1.5%)  Early CSE w/ MemorySSA
 5.1880 (  1.3%)   0.0080 (  0.3%)   5.1960 (  1.3%)   5.1699 (  1.3%)  Loop Invariant Code Motion
 4.4320 (  1.1%)   0.0040 (  0.1%)   4.4360 (  1.1%)   4.4453 (  1.1%)  Loop Invariant Code Motion
 4.0040 (  1.0%)   0.0200 (  0.7%)   4.0240 (  1.0%)   4.0277 (  1.0%)  Greedy Register Allocator
 3.7680 (  1.0%)   0.0040 (  0.1%)   3.7720 (  1.0%)   3.7186 (  0.9%)  CodeGen Prepare
 3.3680 (  0.9%)   0.0200 (  0.7%)   3.3880 (  0.9%)   3.4240 (  0.9%)  MemCpy Optimization
 3.3920 (  0.9%)   0.0120 (  0.4%)   3.4040 (  0.9%)   3.3959 (  0.9%)  Machine Instruction Scheduler
 3.1280 (  0.8%)   0.0120 (  0.4%)   3.1400 (  0.8%)   3.1442 (  0.8%)  Induction Variable Users
 2.5080 (  0.6%)   0.0360 (  1.3%)   2.5440 (  0.6%)   2.5870 (  0.7%)  SROA
 2.4840 (  0.6%)   0.0320 (  1.2%)   2.5160 (  0.6%)   2.5860 (  0.7%)  Unroll loops
 2.3200 (  0.6%)   0.0160 (  0.6%)   2.3360 (  0.6%)   2.3900 (  0.6%)  Jump Threading
 2.3520 (  0.6%)   0.0200 (  0.7%)   2.3720 (  0.6%)   2.3781 (  0.6%)  Jump Threading
 2.1680 (  0.6%)   0.0240 (  0.9%)   2.1920 (  0.6%)   2.2602 (  0.6%)  Reassociate expressions
 2.1120 (  0.5%)   0.0160 (  0.6%)   2.1280 (  0.5%)   2.0903 (  0.5%)  Loop Load Elimination
 2.0600 (  0.5%)   0.0240 (  0.9%)   2.0840 (  0.5%)   2.0447 (  0.5%)  SROA
 2.0080 (  0.5%)   0.0000 (  0.0%)   2.0080 (  0.5%)   1.9710 (  0.5%)  Loop Vectorization
 1.5600 (  0.4%)   0.0160 (  0.6%)   1.5760 (  0.4%)   1.7095 (  0.4%)  Simplify the CFG
 1.6880 (  0.4%)   0.0200 (  0.7%)   1.7080 (  0.4%)   1.6968 (  0.4%)  Simplify the CFG
 1.4960 (  0.4%)   0.0080 (  0.3%)   1.5040 (  0.4%)   1.5448 (  0.4%)  Remove redundant instructions
 1.4880 (  0.4%)   0.0200 (  0.7%)   1.5080 (  0.4%)   1.4811 (  0.4%)  Module Verifier
 1.4680 (  0.4%)   0.0160 (  0.6%)   1.4840 (  0.4%)   1.4391 (  0.4%)  Module Verifier
 1.4120 (  0.4%)   0.0120 (  0.4%)   1.4240 (  0.4%)   1.4021 (  0.4%)  Module Verifier
 1.3120 (  0.3%)   0.0040 (  0.1%)   1.3160 (  0.3%)   1.3868 (  0.4%)  Simplify the CFG
 1.3240 (  0.3%)   0.0160 (  0.6%)   1.3400 (  0.3%)   1.3271 (  0.3%)  Simplify the CFG
 1.2680 (  0.3%)   0.0200 (  0.7%)   1.2880 (  0.3%)   1.3067 (  0.3%)  Early CSE
 1.1160 (  0.3%)   0.0160 (  0.6%)   1.1320 (  0.3%)   1.1449 (  0.3%)  Sparse Conditional Constant Propagation

There are small differences between their execution times, but none of them to me stand out. The total difference in wall-clock time of these runs is 3.16 seconds.

a.elovikov added inline comments.Feb 13 2018, 1:43 AM

llvm/lib/Transforms/Scalar/JumpThreading.cpp
648 ↗	(On Diff #133630)	Isn't it strange that we enable LVI without flushing DDT first? Even if it's ok now I'm afraid it might become a source of nasty bugs in the future. And related question - would it be better to integrate the LVI->disableDT/enableDT calls into the DDT itself (or a wrapper around it)? So that any deferred update to the DT would automatically make DT disabled in LVI and every flush would make it enabled?

brzycki added inline comments.Feb 13 2018, 8:21 AM

llvm/lib/Transforms/Scalar/JumpThreading.cpp

648 ↗

(On Diff #133630)

Hello @a.elovikov,

I chose to enable/disable around each LVI analysis call in the case that future patches call DDT->flush() within JumpThreading. I could also make this call and feel comfortable with its implications:

if (DDT->pending())
  LVI->disableDT();
else
  LVI->enableDT();
// LVI analysis begins...

As for adding LVI into the DDT class itself, that approach ended up being full of problems. The LazyValue class doesn't actually have the code for enableDT()/disableDT() in the header like traditional LLVM classes. Because the class is broken into two for caching on-demand I get compile-time errors with LLVM unable to find the actual implementation of the disableDT()/enableDT() functions at link time:

lib/IR/CMakeFiles/LLVMCore.dir/Dominators.cpp.o: In function `llvm::DeferredDominance::applyUpdate(llvm::DomTreeBuilder::UpdateKind, llvm::BasicBlock*, llvm::BasicBlock*)':
/work/brzycki/D42717/llvm-project/llvm/lib/IR/Dominators.cpp:569: undefined reference to `llvm::LazyValueInfo::disableDT()'
lib/IR/CMakeFiles/LLVMCore.dir/Dominators.cpp.o: In function `llvm::DeferredDominance::flush()':
/work/brzycki/D42717/llvm-project/llvm/lib/IR/Dominators.cpp:464: undefined reference to `llvm::LazyValueInfo::enableDT()'
lib/IR/CMakeFiles/LLVMCore.dir/Dominators.cpp.o: In function `llvm::DeferredDominance::recalculate(llvm::Function&)':
/work/brzycki/D42717/llvm-project/llvm/lib/IR/Dominators.cpp:481: undefined reference to `llvm::LazyValueInfo::enableDT()'
clang: error: linker command failed with exit code 1 (use -v to see invocation)

If you know of a way to fix these errors I'll give it another try.

• dberlin added inline comments.Feb 13 2018, 8:51 AM

llvm/lib/Transforms/Scalar/JumpThreading.cpp
648 ↗	(On Diff #133630)	Hey @a.elovikov , You really don't want DDT to know things about passes. IMHO, if you were going to go that route, LVI checks whether DT is null, and so you could wrap this all in a "UpdatedDTOrNull" object that, when cast to a DominatorTree , returned the tree if it was up to date, or a null pointer if it was not. That looks* nicer code wise. But one advantage to the way Brian is doing it is that it is easily verifiable and doesn't hide where and when LVI has a disabled DT. When we hide stuff like that, it tends to regress and nobody notices for a while.

Rebased to tip, switched to always enabling or disabling before calling LVI analysis. Prevents LVI from being accidentally enabled.

Use std::swap() in enableDT/DisableDT methods.

IMHO, if you were going to go that route, LVI checks whether DT is null, and so you could wrap this all in a "UpdatedDTOrNull" object that, when cast to a DominatorTree *, returned the tree if it was up to date, or a null pointer if it was not.

It's a bit more subtle than that @dberlin due to the way LazyValueInfo is defined. Because the LazyValueInfoImpl object is cached and re-used it would store a pointer to DT (or nullptr) at the instance of creating the first cachced LVI instance. I ran into this problem with my first attempt to disable DT within LVI and is another reason I decided on the new external methods.

I think this patch is ready to commit, barring any objections.

I might be missing something, but why don't we enable DT back in LVI after the JT pass finishes and flushes the DDT? Is it possible that we end up with disabled DT *after* the JT?

llvm/lib/Analysis/LazyValueInfo.cpp
469 ↗	(On Diff #134132)	I'd add assert((!DT \|\| !DisabledDT) && "Both DT and DisabledDT are not nullptr!")
476 ↗	(On Diff #134132)	Same assert for DT and DisabledDT not being null simultaneously.

In D42717#1007270, @a.elovikov wrote:

I might be missing something, but why don't we enable DT back in LVI after the JT pass finishes and flushes the DDT? Is it possible that we end up with disabled DT *after* the JT?

I missed that, thank you @a.elovikov . I'm adding it to the next patch revision.

llvm/lib/Analysis/LazyValueInfo.cpp
469 ↗	(On Diff #134132)	Good point, will add asserts.

Added asserts to enableDT / disableDT to verify both DT and DisabledDT are never both valid pointers. Added a final LVI->enabledDT() at the very end of runImpl to allow subsequent passes to use DT with the cached LVI object.

@a.elovikov and @dberlin does this look good to commit?

In D42717#1007875, @brzycki wrote:

@a.elovikov and @dberlin does this look good to commit?

It's looks ok for me but I have not contributed enough to give lgtms.

In D42717#1008959, @a.elovikov wrote:

It's looks ok for me but I have not contributed enough to give lgtms.

Thanks @a.elovikov .
@kuhar , @sebpop , or @dberlin does the patch look ok to commit?

LGTM.

Closed by commit rL325356: [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis (authored by brzycki). · Explain WhyFeb 16 2018, 8:38 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

include/

llvm/

Analysis/

LazyValueInfo.h

7 lines

IR/

Dominators.h

3 lines

lib/

Analysis/

LazyValueInfo.cpp

30 lines

IR/

Dominators.cpp

3 lines

Transforms/

Scalar/

JumpThreading.cpp

37 lines

test/

Transforms/

JumpThreading/

pr36133.ll

44 lines

Diff 134636

llvm/trunk/include/llvm/Analysis/LazyValueInfo.h

Show First 20 Lines • Show All 107 Lines • ▼ Show 20 Lines	public:
void eraseBlock(BasicBlock *BB);		void eraseBlock(BasicBlock *BB);

/// Print the \LazyValueInfo Analysis.		/// Print the \LazyValueInfo Analysis.
/// We pass in the DTree that is required for identifying which basic blocks		/// We pass in the DTree that is required for identifying which basic blocks
/// we can solve/print for, in the LVIPrinter. The DT is optional		/// we can solve/print for, in the LVIPrinter. The DT is optional
/// in LVI, so we need to pass it here as an argument.		/// in LVI, so we need to pass it here as an argument.
void printLVI(Function &F, DominatorTree &DTree, raw_ostream &OS);		void printLVI(Function &F, DominatorTree &DTree, raw_ostream &OS);

		/// Disables use of the DominatorTree within LVI.
		void disableDT();

		/// Enables use of the DominatorTree within LVI. Does nothing if the class
		/// instance was initialized without a DT pointer.
		void enableDT();

// For old PM pass. Delete once LazyValueInfoWrapperPass is gone.		// For old PM pass. Delete once LazyValueInfoWrapperPass is gone.
void releaseMemory();		void releaseMemory();

/// Handle invalidation events in the new pass manager.		/// Handle invalidation events in the new pass manager.
bool invalidate(Function &F, const PreservedAnalyses &PA,		bool invalidate(Function &F, const PreservedAnalyses &PA,
FunctionAnalysisManager::Invalidator &Inv);		FunctionAnalysisManager::Invalidator &Inv);
};		};

Show All 37 Lines

llvm/trunk/include/llvm/IR/Dominators.h

Show First 20 Lines • Show All 329 Lines • ▼ Show 20 Lines	public:
void deleteEdge(BasicBlock From, BasicBlock To);		void deleteEdge(BasicBlock From, BasicBlock To);

/// \brief Delays the deletion of a basic block until a flush() event.		/// \brief Delays the deletion of a basic block until a flush() event.
void deleteBB(BasicBlock *DelBB);		void deleteBB(BasicBlock *DelBB);

/// \brief Returns true if DelBB is awaiting deletion at a flush() event.		/// \brief Returns true if DelBB is awaiting deletion at a flush() event.
bool pendingDeletedBB(BasicBlock *DelBB);		bool pendingDeletedBB(BasicBlock *DelBB);

		/// \brief Returns true if pending DT updates are queued for a flush() event.
		bool pending();

/// \brief Flushes all pending updates and block deletions. Returns a		/// \brief Flushes all pending updates and block deletions. Returns a
/// correct DominatorTree reference to be used by the caller for analysis.		/// correct DominatorTree reference to be used by the caller for analysis.
DominatorTree &flush();		DominatorTree &flush();

/// \brief Drops all internal state and forces a (slow) recalculation of the		/// \brief Drops all internal state and forces a (slow) recalculation of the
/// DominatorTree based on the current state of the LLVM IR in F. This should		/// DominatorTree based on the current state of the LLVM IR in F. This should
/// only be used in corner cases such as the Entry block of F being deleted.		/// only be used in corner cases such as the Entry block of F being deleted.
void recalculate(Function &F);		void recalculate(Function &F);
Show All 27 Lines

llvm/trunk/lib/Analysis/LazyValueInfo.cpp

Show First 20 Lines • Show All 395 Lines • ▼ Show 20 Lines	bool pushBlockValue(const std::pair<BasicBlock , Value > &BV) {
<< "\n");		<< "\n");
BlockValueStack.push_back(BV);		BlockValueStack.push_back(BV);
return true;		return true;
}		}

AssumptionCache *AC; ///< A pointer to the cache of @llvm.assume calls.		AssumptionCache *AC; ///< A pointer to the cache of @llvm.assume calls.
const DataLayout &DL; ///< A mandatory DataLayout		const DataLayout &DL; ///< A mandatory DataLayout
DominatorTree *DT; ///< An optional DT pointer.		DominatorTree *DT; ///< An optional DT pointer.
		DominatorTree *DisabledDT; ///< Stores DT if it's disabled.

ValueLatticeElement getBlockValue(Value Val, BasicBlock BB);		ValueLatticeElement getBlockValue(Value Val, BasicBlock BB);
bool getEdgeValue(Value V, BasicBlock F, BasicBlock *T,		bool getEdgeValue(Value V, BasicBlock F, BasicBlock *T,
ValueLatticeElement &Result, Instruction *CxtI = nullptr);		ValueLatticeElement &Result, Instruction *CxtI = nullptr);
bool hasBlockValue(Value Val, BasicBlock BB);		bool hasBlockValue(Value Val, BasicBlock BB);

// These methods process one work item and may add more. A false value		// These methods process one work item and may add more. A false value
// returned means that the work item was not completely processed and must		// returned means that the work item was not completely processed and must
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	public:
}		}

/// This is part of the update interface to inform the cache		/// This is part of the update interface to inform the cache
/// that a block has been deleted.		/// that a block has been deleted.
void eraseBlock(BasicBlock *BB) {		void eraseBlock(BasicBlock *BB) {
TheCache.eraseBlock(BB);		TheCache.eraseBlock(BB);
}		}

		/// Disables use of the DominatorTree within LVI.
		void disableDT() {
		if (DT) {
		assert(!DisabledDT && "Both DT and DisabledDT are not nullptr!");
		std::swap(DT, DisabledDT);
		}
		}

		/// Enables use of the DominatorTree within LVI. Does nothing if the class
		/// instance was initialized without a DT pointer.
		void enableDT() {
		if (DisabledDT) {
		assert(!DT && "Both DT and DisabledDT are not nullptr!");
		std::swap(DT, DisabledDT);
		}
		}

/// This is the update interface to inform the cache that an edge from		/// This is the update interface to inform the cache that an edge from
/// PredBB to OldSucc has been threaded to be from PredBB to NewSucc.		/// PredBB to OldSucc has been threaded to be from PredBB to NewSucc.
void threadEdge(BasicBlock PredBB,BasicBlock OldSucc,BasicBlock *NewSucc);		void threadEdge(BasicBlock PredBB,BasicBlock OldSucc,BasicBlock *NewSucc);

LazyValueInfoImpl(AssumptionCache *AC, const DataLayout &DL,		LazyValueInfoImpl(AssumptionCache *AC, const DataLayout &DL,
DominatorTree *DT = nullptr)		DominatorTree *DT = nullptr)
: AC(AC), DL(DL), DT(DT) {}		: AC(AC), DL(DL), DT(DT), DisabledDT(nullptr) {}
};		};
} // end anonymous namespace		} // end anonymous namespace


void LazyValueInfoImpl::solve() {		void LazyValueInfoImpl::solve() {
SmallVector<std::pair<BasicBlock , Value >, 8> StartingStack(		SmallVector<std::pair<BasicBlock , Value >, 8> StartingStack(
BlockValueStack.begin(), BlockValueStack.end());		BlockValueStack.begin(), BlockValueStack.end());

▲ Show 20 Lines • Show All 1,305 Lines • ▼ Show 20 Lines


void LazyValueInfo::printLVI(Function &F, DominatorTree &DTree, raw_ostream &OS) {		void LazyValueInfo::printLVI(Function &F, DominatorTree &DTree, raw_ostream &OS) {
if (PImpl) {		if (PImpl) {
getImpl(PImpl, AC, DL, DT).printLVI(F, DTree, OS);		getImpl(PImpl, AC, DL, DT).printLVI(F, DTree, OS);
}		}
}		}

		void LazyValueInfo::disableDT() {
		if (PImpl)
		getImpl(PImpl, AC, DL, DT).disableDT();
		}

		void LazyValueInfo::enableDT() {
		if (PImpl)
		getImpl(PImpl, AC, DL, DT).enableDT();
		}

// Print the LVI for the function arguments at the start of each basic block.		// Print the LVI for the function arguments at the start of each basic block.
void LazyValueInfoAnnotatedWriter::emitBasicBlockStartAnnot(		void LazyValueInfoAnnotatedWriter::emitBasicBlockStartAnnot(
const BasicBlock *BB, formatted_raw_ostream &OS) {		const BasicBlock *BB, formatted_raw_ostream &OS) {
// Find if there are latticevalues defined for arguments of the function.		// Find if there are latticevalues defined for arguments of the function.
auto *F = BB->getParent();		auto *F = BB->getParent();
for (auto &Arg : F->args()) {		for (auto &Arg : F->args()) {
ValueLatticeElement Result = LVIImpl->getValueInBlock(		ValueLatticeElement Result = LVIImpl->getValueInBlock(
const_cast<Argument >(&Arg), const_cast<BasicBlock >(BB));		const_cast<Argument >(&Arg), const_cast<BasicBlock >(BB));
▲ Show 20 Lines • Show All 78 Lines • Show Last 20 Lines

llvm/trunk/lib/IR/Dominators.cpp

	Show First 20 Lines • Show All 442 Lines • ▼ Show 20 Lines

	/// \brief Returns true if DelBB is awaiting deletion at a flush() event.			/// \brief Returns true if DelBB is awaiting deletion at a flush() event.
	bool DeferredDominance::pendingDeletedBB(BasicBlock *DelBB) {			bool DeferredDominance::pendingDeletedBB(BasicBlock *DelBB) {
	if (DeletedBBs.empty())			if (DeletedBBs.empty())
	return false;			return false;
	return DeletedBBs.count(DelBB) != 0;			return DeletedBBs.count(DelBB) != 0;
	}			}

				/// \brief Returns true if pending DT updates are queued for a flush() event.
				bool DeferredDominance::pending() { return !PendUpdates.empty(); }

	/// \brief Flushes all pending updates and block deletions. Returns a			/// \brief Flushes all pending updates and block deletions. Returns a
	/// correct DominatorTree reference to be used by the caller for analysis.			/// correct DominatorTree reference to be used by the caller for analysis.
	DominatorTree &DeferredDominance::flush() {			DominatorTree &DeferredDominance::flush() {
	// Updates to DT must happen before blocks are deleted below. Otherwise the			// Updates to DT must happen before blocks are deleted below. Otherwise the
	// DT traversal will encounter badref blocks and assert.			// DT traversal will encounter badref blocks and assert.
	if (!PendUpdates.empty()) {			if (!PendUpdates.empty()) {
	DT.applyUpdates(PendUpdates);			DT.applyUpdates(PendUpdates);
	PendUpdates.clear();			PendUpdates.clear();
	▲ Show 20 Lines • Show All 116 Lines • Show Last 20 Lines

llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp

Show First 20 Lines • Show All 419 Lines • ▼ Show 20 Lines	for (Function::iterator I = F.begin(), E = F.end(); I != E;) {
Changed = true;		Changed = true;
}		}
}		}
EverChanged \|= Changed;		EverChanged \|= Changed;
} while (Changed);		} while (Changed);

LoopHeaders.clear();		LoopHeaders.clear();
DDT->flush();		DDT->flush();
		LVI->enableDT();
return EverChanged;		return EverChanged;
}		}

// Replace uses of Cond with ToVal when safe to do so. If all uses are		// Replace uses of Cond with ToVal when safe to do so. If all uses are
// replaced, we can remove Cond. We cannot blindly replace all uses of Cond		// replaced, we can remove Cond. We cannot blindly replace all uses of Cond
// because we may incorrectly replace uses when guards/assumes are uses of		// because we may incorrectly replace uses when guards/assumes are uses of
// of `Cond` and we used the guards/assume to reason about the `Cond` value		// of `Cond` and we used the guards/assume to reason about the `Cond` value
// at the end of block. RAUW unconditionally replaces all uses		// at the end of block. RAUW unconditionally replaces all uses
▲ Show 20 Lines • Show All 176 Lines • ▼ Show 20 Lines	if (!I \|\| I->getParent() != BB) {
/// predecessor based on its terminator.		/// predecessor based on its terminator.
//		//
// FIXME: change this to use the more-rich 'getPredicateOnEdge' method if		// FIXME: change this to use the more-rich 'getPredicateOnEdge' method if
// "I" is a non-local compare-with-a-constant instruction. This would be		// "I" is a non-local compare-with-a-constant instruction. This would be
// able to handle value inequalities better, for example if the compare is		// able to handle value inequalities better, for example if the compare is
// "X < 4" and "X < 3" is known true but "X < 4" itself is not available.		// "X < 4" and "X < 3" is known true but "X < 4" itself is not available.
// Perhaps getConstantOnEdge should be smart enough to do this?		// Perhaps getConstantOnEdge should be smart enough to do this?

		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
for (BasicBlock *P : predecessors(BB)) {		for (BasicBlock *P : predecessors(BB)) {
// If the value is known by LazyValueInfo to be a constant in a		// If the value is known by LazyValueInfo to be a constant in a
// predecessor, use that information to try to thread this block.		// predecessor, use that information to try to thread this block.
Constant *PredCst = LVI->getConstantOnEdge(V, P, BB, CxtI);		Constant *PredCst = LVI->getConstantOnEdge(V, P, BB, CxtI);
if (Constant *KC = getKnownConstant(PredCst, Preference))		if (Constant *KC = getKnownConstant(PredCst, Preference))
Result.push_back(std::make_pair(KC, P));		Result.push_back(std::make_pair(KC, P));
}		}

return !Result.empty();		return !Result.empty();
}		}

/// If I is a PHI node, then we know the incoming values for any constants.		/// If I is a PHI node, then we know the incoming values for any constants.
if (PHINode *PN = dyn_cast<PHINode>(I)) {		if (PHINode *PN = dyn_cast<PHINode>(I)) {
		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
Value *InVal = PN->getIncomingValue(i);		Value *InVal = PN->getIncomingValue(i);
if (Constant *KC = getKnownConstant(InVal, Preference)) {		if (Constant *KC = getKnownConstant(InVal, Preference)) {
Result.push_back(std::make_pair(KC, PN->getIncomingBlock(i)));		Result.push_back(std::make_pair(KC, PN->getIncomingBlock(i)));
} else {		} else {
Constant *CI = LVI->getConstantOnEdge(InVal,		Constant *CI = LVI->getConstantOnEdge(InVal,
PN->getIncomingBlock(i),		PN->getIncomingBlock(i),
BB, CxtI);		BB, CxtI);
▲ Show 20 Lines • Show All 113 Lines • ▼ Show 20 Lines	if (CmpInst *Cmp = dyn_cast<CmpInst>(I)) {
Value *CmpRHS = Cmp->getOperand(1);		Value *CmpRHS = Cmp->getOperand(1);
CmpInst::Predicate Pred = Cmp->getPredicate();		CmpInst::Predicate Pred = Cmp->getPredicate();

PHINode *PN = dyn_cast<PHINode>(CmpLHS);		PHINode *PN = dyn_cast<PHINode>(CmpLHS);
if (PN && PN->getParent() == BB) {		if (PN && PN->getParent() == BB) {
const DataLayout &DL = PN->getModule()->getDataLayout();		const DataLayout &DL = PN->getModule()->getDataLayout();
// We can do this simplification if any comparisons fold to true or false.		// We can do this simplification if any comparisons fold to true or false.
// See if any do.		// See if any do.
		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {		for (unsigned i = 0, e = PN->getNumIncomingValues(); i != e; ++i) {
BasicBlock *PredBB = PN->getIncomingBlock(i);		BasicBlock *PredBB = PN->getIncomingBlock(i);
Value *LHS = PN->getIncomingValue(i);		Value *LHS = PN->getIncomingValue(i);
Value *RHS = CmpRHS->DoPHITranslation(BB, PredBB);		Value *RHS = CmpRHS->DoPHITranslation(BB, PredBB);

Value *Res = SimplifyCmpInst(Pred, LHS, RHS, {DL});		Value *Res = SimplifyCmpInst(Pred, LHS, RHS, {DL});
if (!Res) {		if (!Res) {
if (!isa<Constant>(RHS))		if (!isa<Constant>(RHS))
Show All 17 Lines	if (CmpInst *Cmp = dyn_cast<CmpInst>(I)) {

// If comparing a live-in value against a constant, see if we know the		// If comparing a live-in value against a constant, see if we know the
// live-in value on any predecessors.		// live-in value on any predecessors.
if (isa<Constant>(CmpRHS) && !CmpType->isVectorTy()) {		if (isa<Constant>(CmpRHS) && !CmpType->isVectorTy()) {
Constant *CmpConst = cast<Constant>(CmpRHS);		Constant *CmpConst = cast<Constant>(CmpRHS);

if (!isa<Instruction>(CmpLHS) \|\|		if (!isa<Instruction>(CmpLHS) \|\|
cast<Instruction>(CmpLHS)->getParent() != BB) {		cast<Instruction>(CmpLHS)->getParent() != BB) {
		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
for (BasicBlock *P : predecessors(BB)) {		for (BasicBlock *P : predecessors(BB)) {
// If the value is known by LazyValueInfo to be a constant in a		// If the value is known by LazyValueInfo to be a constant in a
// predecessor, use that information to try to thread this block.		// predecessor, use that information to try to thread this block.
LazyValueInfo::Tristate Res =		LazyValueInfo::Tristate Res =
LVI->getPredicateOnEdge(Pred, CmpLHS,		LVI->getPredicateOnEdge(Pred, CmpLHS,
CmpConst, P, BB, CxtI ? CxtI : Cmp);		CmpConst, P, BB, CxtI ? CxtI : Cmp);
if (Res == LazyValueInfo::Unknown)		if (Res == LazyValueInfo::Unknown)
continue;		continue;
Show All 12 Lines	if (isa<Constant>(CmpRHS) && !CmpType->isVectorTy()) {
using namespace PatternMatch;		using namespace PatternMatch;

Value *AddLHS;		Value *AddLHS;
ConstantInt *AddConst;		ConstantInt *AddConst;
if (isa<ConstantInt>(CmpConst) &&		if (isa<ConstantInt>(CmpConst) &&
match(CmpLHS, m_Add(m_Value(AddLHS), m_ConstantInt(AddConst)))) {		match(CmpLHS, m_Add(m_Value(AddLHS), m_ConstantInt(AddConst)))) {
if (!isa<Instruction>(AddLHS) \|\|		if (!isa<Instruction>(AddLHS) \|\|
cast<Instruction>(AddLHS)->getParent() != BB) {		cast<Instruction>(AddLHS)->getParent() != BB) {
		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
for (BasicBlock *P : predecessors(BB)) {		for (BasicBlock *P : predecessors(BB)) {
// If the value is known by LazyValueInfo to be a ConstantRange in		// If the value is known by LazyValueInfo to be a ConstantRange in
// a predecessor, use that information to try to thread this		// a predecessor, use that information to try to thread this
// block.		// block.
ConstantRange CR = LVI->getConstantRangeOnEdge(		ConstantRange CR = LVI->getConstantRangeOnEdge(
AddLHS, P, BB, CxtI ? CxtI : cast<Instruction>(CmpLHS));		AddLHS, P, BB, CxtI ? CxtI : cast<Instruction>(CmpLHS));
// Propagate the range through the addition.		// Propagate the range through the addition.
CR = CR.add(AddConst->getValue());		CR = CR.add(AddConst->getValue());
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	if ((TrueVal \|\| FalseVal) &&
Result.push_back(std::make_pair(Val, C.second));		Result.push_back(std::make_pair(Val, C.second));
}		}

return !Result.empty();		return !Result.empty();
}		}
}		}

// If all else fails, see if LVI can figure out a constant value for us.		// If all else fails, see if LVI can figure out a constant value for us.
		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
Constant *CI = LVI->getConstant(V, BB, CxtI);		Constant *CI = LVI->getConstant(V, BB, CxtI);
if (Constant *KC = getKnownConstant(CI, Preference)) {		if (Constant *KC = getKnownConstant(CI, Preference)) {
for (BasicBlock *Pred : predecessors(BB))		for (BasicBlock *Pred : predecessors(BB))
Result.push_back(std::make_pair(KC, Pred));		Result.push_back(std::make_pair(KC, Pred));
}		}

return !Result.empty();		return !Result.empty();
}		}
▲ Show 20 Lines • Show All 185 Lines • ▼ Show 20 Lines	if (CmpInst *CondCmp = dyn_cast<CmpInst>(CondInst)) {
BranchInst *CondBr = dyn_cast<BranchInst>(BB->getTerminator());		BranchInst *CondBr = dyn_cast<BranchInst>(BB->getTerminator());
Constant *CondConst = dyn_cast<Constant>(CondCmp->getOperand(1));		Constant *CondConst = dyn_cast<Constant>(CondCmp->getOperand(1));
if (CondBr && CondConst) {		if (CondBr && CondConst) {
// We should have returned as soon as we turn a conditional branch to		// We should have returned as soon as we turn a conditional branch to
// unconditional. Because its no longer interesting as far as jump		// unconditional. Because its no longer interesting as far as jump
// threading is concerned.		// threading is concerned.
assert(CondBr->isConditional() && "Threading on unconditional terminator");		assert(CondBr->isConditional() && "Threading on unconditional terminator");

		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
LazyValueInfo::Tristate Ret =		LazyValueInfo::Tristate Ret =
LVI->getPredicateAt(CondCmp->getPredicate(), CondCmp->getOperand(0),		LVI->getPredicateAt(CondCmp->getPredicate(), CondCmp->getOperand(0),
CondConst, CondBr);		CondConst, CondBr);
if (Ret != LazyValueInfo::Unknown) {		if (Ret != LazyValueInfo::Unknown) {
unsigned ToRemove = Ret == LazyValueInfo::True ? 1 : 0;		unsigned ToRemove = Ret == LazyValueInfo::True ? 1 : 0;
unsigned ToKeep = Ret == LazyValueInfo::True ? 0 : 1;		unsigned ToKeep = Ret == LazyValueInfo::True ? 0 : 1;
BasicBlock *ToRemoveSucc = CondBr->getSuccessor(ToRemove);		BasicBlock *ToRemoveSucc = CondBr->getSuccessor(ToRemove);
ToRemoveSucc->removePredecessor(BB, true);		ToRemoveSucc->removePredecessor(BB, true);
▲ Show 20 Lines • Show All 784 Lines • ▼ Show 20 Lines	bool JumpThreadingPass::ThreadEdge(BasicBlock *BB,
}		}

// And finally, do it!		// And finally, do it!
DEBUG(dbgs() << " Threading edge from '" << PredBB->getName() << "' to '"		DEBUG(dbgs() << " Threading edge from '" << PredBB->getName() << "' to '"
<< SuccBB->getName() << "' with cost: " << JumpThreadCost		<< SuccBB->getName() << "' with cost: " << JumpThreadCost
<< ", across block:\n "		<< ", across block:\n "
<< *BB << "\n");		<< *BB << "\n");

		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
LVI->threadEdge(PredBB, BB, SuccBB);		LVI->threadEdge(PredBB, BB, SuccBB);

// We are going to have to map operands from the original BB block to the new		// We are going to have to map operands from the original BB block to the new
// copy of the block 'NewBB'. If there are PHI nodes in BB, evaluate them to		// copy of the block 'NewBB'. If there are PHI nodes in BB, evaluate them to
// account for entry from PredBB.		// account for entry from PredBB.
DenseMap<Instruction, Value> ValueMapping;		DenseMap<Instruction, Value> ValueMapping;

BasicBlock *NewBB = BasicBlock::Create(BB->getContext(),		BasicBlock *NewBB = BasicBlock::Create(BB->getContext(),
▲ Show 20 Lines • Show All 453 Lines • ▼ Show 20 Lines	for (unsigned I = 0, E = CondLHS->getNumIncomingValues(); I != E; ++I) {

BranchInst *PredTerm = dyn_cast<BranchInst>(Pred->getTerminator());		BranchInst *PredTerm = dyn_cast<BranchInst>(Pred->getTerminator());
if (!PredTerm \|\| !PredTerm->isUnconditional())		if (!PredTerm \|\| !PredTerm->isUnconditional())
continue;		continue;

// Now check if one of the select values would allow us to constant fold the		// Now check if one of the select values would allow us to constant fold the
// terminator in BB. We don't do the transform if both sides fold, those		// terminator in BB. We don't do the transform if both sides fold, those
// cases will be threaded in any case.		// cases will be threaded in any case.
		if (DDT->pending())
		LVI->disableDT();
		else
		LVI->enableDT();
LazyValueInfo::Tristate LHSFolds =		LazyValueInfo::Tristate LHSFolds =
LVI->getPredicateOnEdge(CondCmp->getPredicate(), SI->getOperand(1),		LVI->getPredicateOnEdge(CondCmp->getPredicate(), SI->getOperand(1),
CondRHS, Pred, BB, CondCmp);		CondRHS, Pred, BB, CondCmp);
LazyValueInfo::Tristate RHSFolds =		LazyValueInfo::Tristate RHSFolds =
LVI->getPredicateOnEdge(CondCmp->getPredicate(), SI->getOperand(2),		LVI->getPredicateOnEdge(CondCmp->getPredicate(), SI->getOperand(2),
CondRHS, Pred, BB, CondCmp);		CondRHS, Pred, BB, CondCmp);
if ((LHSFolds != LazyValueInfo::Unknown \|\|		if ((LHSFolds != LazyValueInfo::Unknown \|\|
RHSFolds != LazyValueInfo::Unknown) &&		RHSFolds != LazyValueInfo::Unknown) &&
▲ Show 20 Lines • Show All 263 Lines • Show Last 20 Lines

llvm/trunk/test/Transforms/JumpThreading/pr36133.ll

				; RUN: opt -jump-threading -S < %s \| FileCheck %s
				@global = external global i8*, align 8

				define i32 @foo(i32 %arg) {
				; CHECK-LABEL: @foo
				; CHECK-LABEL: bb:
				; CHECK: icmp eq
				; CHECK-NEXT: br i1 %tmp1, label %bb7, label %bb7
				bb:
				%tmp = load i8, i8* @global, align 8
				%tmp1 = icmp eq i8* %tmp, null
				br i1 %tmp1, label %bb3, label %bb2

				; CHECK-NOT: bb2:
				bb2:
				br label %bb3

				; CHECK-NOT: bb3:
				bb3:
				%tmp4 = phi i8 [ 1, %bb2 ], [ 0, %bb ]
				%tmp5 = icmp eq i8 %tmp4, 0
				br i1 %tmp5, label %bb7, label %bb6

				; CHECK-NOT: bb6:
				bb6:
				br label %bb7

				; CHECK-LABEL: bb7:
				bb7:
				%tmp8 = icmp eq i32 %arg, -1
				br i1 %tmp8, label %bb9, label %bb10

				; CHECK-LABEL: bb9:
				bb9:
				ret i32 0

				; CHECK-LABEL: bb10:
				bb10:
				%tmp11 = icmp sgt i32 %arg, -1
				call void @llvm.assume(i1 %tmp11)
				ret i32 1
				}

				declare void @llvm.assume(i1)