This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] use UnreachableBlocks to avoid unreachable regions
ClosedPublic

Authored by brzycki on Mar 6 2018, 3:55 PM.

Details

Summary

It is possible to cause hangs in JumpThreading due to how the pass decides there is no more work to be done. An older attempt at fixing this problem was D3991 where removeUnreachableBlocks(F) is now called on the function before any JumpThreading is attempted. This has a few drawbacks though:

  • It's an expensive transformation.
  • It is a coarse tool: it does not comprehend or care about preserving Loops.
  • It actually has nothing to do with the JumpThreading pass.
  • It does more than just remove unrechable blocks, it changes reachable blocks too, especially Terminator Instructions.

This patch foregoes calling removeUnreachableBlocks and uses a SmallPtrSet using the initial DominatorTree to track unreachable blocks.

Diff Detail

Repository
rL LLVM

Event Timeline

brzycki created this revision.Mar 6 2018, 3:55 PM

Here are 4 ctmark runs on x86_64 of compile_time results. The tip compiler is llvm-project SHA 7a365bd12dd84be1e7c9ca6d9a2268b80f604ae9. The upstream compiler is the same as tip with this patch applied.

# tip (left), upstream (right)
#  left column: build.tip_ctmark_1/results.json
# right column: build.upstream_ctmark_1/results.json
#  metric name: compile_time
    378.3760 -> 346.6120     [  9.16%]  CTMark/sqlite3/sqlite3.test
    499.5760 -> 492.7440     [  1.39%]  CTMark/tramp3d-v4/tramp3d-v4.test
    390.3720 -> 386.4720     [  1.01%]  CTMark/kimwitu++/kc.test
    503.2800 -> 499.5320     [  0.75%]  CTMark/ClamAV/clamscan.test
    253.5440 <- 255.4160     [  0.74%]  CTMark/mafft/pairlocalalign.test
   1272.1440 <- 1279.2400    [  0.56%]  CTMark/7zip/7zip-benchmark.test
    463.2200 -> 461.1600     [  0.45%]  CTMark/SPASS/SPASS.test
    595.7480 <- 597.0600     [  0.22%]  CTMark/lencod/lencod.test
    928.3480 -> 927.5960     [  0.08%]  CTMark/Bullet/bullet.test
    370.5800 -> 370.5160     [  0.02%]  CTMark/consumer-typeset/consumer-typeset.test
#  left column: build.tip_ctmark_1/results.json
# right column: build.upstream_ctmark_2/results.json
#  metric name: compile_time
    499.5760 -> 487.3400     [  2.51%]  CTMark/tramp3d-v4/tramp3d-v4.test
    378.3760 -> 372.8840     [  1.47%]  CTMark/sqlite3/sqlite3.test
    253.5440 -> 250.2840     [  1.30%]  CTMark/mafft/pairlocalalign.test
    390.3720 -> 387.8960     [  0.64%]  CTMark/kimwitu++/kc.test
    595.7480 -> 592.4760     [  0.55%]  CTMark/lencod/lencod.test
    928.3480 <- 931.6960     [  0.36%]  CTMark/Bullet/bullet.test
    370.5800 <- 371.3080     [  0.20%]  CTMark/consumer-typeset/consumer-typeset.test
    503.2800 -> 502.3880     [  0.18%]  CTMark/ClamAV/clamscan.test
   1272.1440 <- 1273.3800    [  0.10%]  CTMark/7zip/7zip-benchmark.test
    463.2200 -> 462.7920     [  0.09%]  CTMark/SPASS/SPASS.test
#  left column: build.tip_ctmark_1/results.json
# right column: build.upstream_ctmark_3/results.json
#  metric name: compile_time
    253.5440 <- 256.0280     [  0.98%]  CTMark/mafft/pairlocalalign.test
    595.7480 -> 591.4720     [  0.72%]  CTMark/lencod/lencod.test
    390.3720 <- 392.2520     [  0.48%]  CTMark/kimwitu++/kc.test
    463.2200 -> 461.2000     [  0.44%]  CTMark/SPASS/SPASS.test
   1272.1440 <- 1275.2320    [  0.24%]  CTMark/7zip/7zip-benchmark.test
    503.2800 -> 502.0800     [  0.24%]  CTMark/ClamAV/clamscan.test
    928.3480 -> 927.0640     [  0.14%]  CTMark/Bullet/bullet.test
    370.5800 <- 371.0080     [  0.12%]  CTMark/consumer-typeset/consumer-typeset.test
    499.5760 -> 499.0160     [  0.11%]  CTMark/tramp3d-v4/tramp3d-v4.test
    378.3760 -> 378.0360     [  0.09%]  CTMark/sqlite3/sqlite3.test
#  left column: build.tip_ctmark_1/results.json
# right column: build.upstream_ctmark_4/results.json
#  metric name: compile_time
    499.5760 <- 541.9040     [  8.47%]  CTMark/tramp3d-v4/tramp3d-v4.test
    378.3760 -> 350.2080     [  8.04%]  CTMark/sqlite3/sqlite3.test
    253.5440 <- 257.6560     [  1.62%]  CTMark/mafft/pairlocalalign.test
    370.5800 -> 365.7360     [  1.32%]  CTMark/consumer-typeset/consumer-typeset.test
    595.7480 -> 590.7960     [  0.84%]  CTMark/lencod/lencod.test
    390.3720 <- 392.9920     [  0.67%]  CTMark/kimwitu++/kc.test
    463.2200 -> 461.2760     [  0.42%]  CTMark/SPASS/SPASS.test
    503.2800 -> 502.0880     [  0.24%]  CTMark/ClamAV/clamscan.test
    928.3480 <- 929.7520     [  0.15%]  CTMark/Bullet/bullet.test
   1272.1440 <- 1273.3720    [  0.10%]  CTMark/7zip/7zip-benchmark.test
#  left column: build.tip_ctmark_2/results.json
# right column: build.upstream_ctmark_1/results.json
#  metric name: compile_time
    587.0520 <- 597.0600     [  1.70%]  CTMark/lencod/lencod.test
    466.3600 -> 461.1600     [  1.13%]  CTMark/SPASS/SPASS.test
    257.6840 -> 255.4160     [  0.89%]  CTMark/mafft/pairlocalalign.test
    496.7760 -> 492.7440     [  0.82%]  CTMark/tramp3d-v4/tramp3d-v4.test
    343.8520 <- 346.6120     [  0.80%]  CTMark/sqlite3/sqlite3.test
    502.8960 -> 499.5320     [  0.67%]  CTMark/ClamAV/clamscan.test
   1275.4080 <- 1279.2400    [  0.30%]  CTMark/7zip/7zip-benchmark.test
    930.1400 -> 927.5960     [  0.27%]  CTMark/Bullet/bullet.test
    369.5240 <- 370.5160     [  0.27%]  CTMark/consumer-typeset/consumer-typeset.test
    387.4960 -> 386.4720     [  0.26%]  CTMark/kimwitu++/kc.test
#  left column: build.tip_ctmark_2/results.json
# right column: build.upstream_ctmark_2/results.json
#  metric name: compile_time
    343.8520 <- 372.8840     [  8.44%]  CTMark/sqlite3/sqlite3.test
    257.6840 -> 250.2840     [  2.96%]  CTMark/mafft/pairlocalalign.test
    496.7760 -> 487.3400     [  1.94%]  CTMark/tramp3d-v4/tramp3d-v4.test
    587.0520 <- 592.4760     [  0.92%]  CTMark/lencod/lencod.test
    466.3600 -> 462.7920     [  0.77%]  CTMark/SPASS/SPASS.test
    369.5240 <- 371.3080     [  0.48%]  CTMark/consumer-typeset/consumer-typeset.test
    930.1400 <- 931.6960     [  0.17%]  CTMark/Bullet/bullet.test
   1275.4080 -> 1273.3800    [  0.16%]  CTMark/7zip/7zip-benchmark.test
    387.4960 <- 387.8960     [  0.10%]  CTMark/kimwitu++/kc.test
    502.8960 -> 502.3880     [  0.10%]  CTMark/ClamAV/clamscan.test
#  left column: build.tip_ctmark_2/results.json
# right column: build.upstream_ctmark_3/results.json
#  metric name: compile_time
    343.8520 <- 378.0360     [  9.94%]  CTMark/sqlite3/sqlite3.test
    387.4960 <- 392.2520     [  1.23%]  CTMark/kimwitu++/kc.test
    466.3600 -> 461.2000     [  1.12%]  CTMark/SPASS/SPASS.test
    587.0520 <- 591.4720     [  0.75%]  CTMark/lencod/lencod.test
    257.6840 -> 256.0280     [  0.65%]  CTMark/mafft/pairlocalalign.test
    496.7760 <- 499.0160     [  0.45%]  CTMark/tramp3d-v4/tramp3d-v4.test
    369.5240 <- 371.0080     [  0.40%]  CTMark/consumer-typeset/consumer-typeset.test
    930.1400 -> 927.0640     [  0.33%]  CTMark/Bullet/bullet.test
    502.8960 -> 502.0800     [  0.16%]  CTMark/ClamAV/clamscan.test
   1275.4080 -> 1275.2320    [  0.01%]  CTMark/7zip/7zip-benchmark.test
#  left column: build.tip_ctmark_2/results.json
# right column: build.upstream_ctmark_4/results.json
#  metric name: compile_time
    496.7760 <- 541.9040     [  9.08%]  CTMark/tramp3d-v4/tramp3d-v4.test
    343.8520 <- 350.2080     [  1.85%]  CTMark/sqlite3/sqlite3.test
    387.4960 <- 392.9920     [  1.42%]  CTMark/kimwitu++/kc.test
    466.3600 -> 461.2760     [  1.10%]  CTMark/SPASS/SPASS.test
    369.5240 -> 365.7360     [  1.04%]  CTMark/consumer-typeset/consumer-typeset.test
    587.0520 <- 590.7960     [  0.64%]  CTMark/lencod/lencod.test
    502.8960 -> 502.0880     [  0.16%]  CTMark/ClamAV/clamscan.test
   1275.4080 -> 1273.3720    [  0.16%]  CTMark/7zip/7zip-benchmark.test
    930.1400 -> 929.7520     [  0.04%]  CTMark/Bullet/bullet.test
    257.6840 -> 257.6560     [  0.01%]  CTMark/mafft/pairlocalalign.test
#  left column: build.tip_ctmark_3/results.json
# right column: build.upstream_ctmark_1/results.json
#  metric name: compile_time
    540.9680 -> 492.7440     [  9.79%]  CTMark/tramp3d-v4/tramp3d-v4.test
    357.0480 -> 346.6120     [  3.01%]  CTMark/sqlite3/sqlite3.test
    589.7160 <- 597.0600     [  1.25%]  CTMark/lencod/lencod.test
    366.6280 <- 370.5160     [  1.06%]  CTMark/consumer-typeset/consumer-typeset.test
    257.4080 -> 255.4160     [  0.78%]  CTMark/mafft/pairlocalalign.test
    502.8920 -> 499.5320     [  0.67%]  CTMark/ClamAV/clamscan.test
    388.6480 -> 386.4720     [  0.56%]  CTMark/kimwitu++/kc.test
    930.4400 -> 927.5960     [  0.31%]  CTMark/Bullet/bullet.test
   1277.1600 <- 1279.2400    [  0.16%]  CTMark/7zip/7zip-benchmark.test
    460.5200 <- 461.1600     [  0.14%]  CTMark/SPASS/SPASS.test
#  left column: build.tip_ctmark_3/results.json
# right column: build.upstream_ctmark_2/results.json
#  metric name: compile_time
    540.9680 -> 487.3400     [ 11.00%]  CTMark/tramp3d-v4/tramp3d-v4.test
    357.0480 <- 372.8840     [  4.44%]  CTMark/sqlite3/sqlite3.test
    257.4080 -> 250.2840     [  2.85%]  CTMark/mafft/pairlocalalign.test
    366.6280 <- 371.3080     [  1.28%]  CTMark/consumer-typeset/consumer-typeset.test
    460.5200 <- 462.7920     [  0.49%]  CTMark/SPASS/SPASS.test
    589.7160 <- 592.4760     [  0.47%]  CTMark/lencod/lencod.test
   1277.1600 -> 1273.3800    [  0.30%]  CTMark/7zip/7zip-benchmark.test
    388.6480 -> 387.8960     [  0.19%]  CTMark/kimwitu++/kc.test
    930.4400 <- 931.6960     [  0.13%]  CTMark/Bullet/bullet.test
    502.8920 -> 502.3880     [  0.10%]  CTMark/ClamAV/clamscan.test
#  left column: build.tip_ctmark_3/results.json
# right column: build.upstream_ctmark_3/results.json
#  metric name: compile_time
    540.9680 -> 499.0160     [  8.41%]  CTMark/tramp3d-v4/tramp3d-v4.test
    357.0480 <- 378.0360     [  5.88%]  CTMark/sqlite3/sqlite3.test
    366.6280 <- 371.0080     [  1.19%]  CTMark/consumer-typeset/consumer-typeset.test
    388.6480 <- 392.2520     [  0.93%]  CTMark/kimwitu++/kc.test
    257.4080 -> 256.0280     [  0.54%]  CTMark/mafft/pairlocalalign.test
    930.4400 -> 927.0640     [  0.36%]  CTMark/Bullet/bullet.test
    589.7160 <- 591.4720     [  0.30%]  CTMark/lencod/lencod.test
    502.8920 -> 502.0800     [  0.16%]  CTMark/ClamAV/clamscan.test
   1277.1600 -> 1275.2320    [  0.15%]  CTMark/7zip/7zip-benchmark.test
    460.5200 <- 461.2000     [  0.15%]  CTMark/SPASS/SPASS.test
#  left column: build.tip_ctmark_3/results.json
# right column: build.upstream_ctmark_4/results.json
#  metric name: compile_time
    357.0480 -> 350.2080     [  1.95%]  CTMark/sqlite3/sqlite3.test
    388.6480 <- 392.9920     [  1.12%]  CTMark/kimwitu++/kc.test
   1277.1600 -> 1273.3720    [  0.30%]  CTMark/7zip/7zip-benchmark.test
    366.6280 -> 365.7360     [  0.24%]  CTMark/consumer-typeset/consumer-typeset.test
    589.7160 <- 590.7960     [  0.18%]  CTMark/lencod/lencod.test
    540.9680 <- 541.9040     [  0.17%]  CTMark/tramp3d-v4/tramp3d-v4.test
    460.5200 <- 461.2760     [  0.16%]  CTMark/SPASS/SPASS.test
    502.8920 -> 502.0880     [  0.16%]  CTMark/ClamAV/clamscan.test
    257.4080 <- 257.6560     [  0.10%]  CTMark/mafft/pairlocalalign.test
    930.4400 -> 929.7520     [  0.07%]  CTMark/Bullet/bullet.test
#  left column: build.tip_ctmark_4/results.json
# right column: build.upstream_ctmark_1/results.json
#  metric name: compile_time
    351.1320 -> 346.6120     [  1.30%]  CTMark/sqlite3/sqlite3.test
    486.5040 <- 492.7440     [  1.28%]  CTMark/tramp3d-v4/tramp3d-v4.test
    253.6800 <- 255.4160     [  0.68%]  CTMark/mafft/pairlocalalign.test
    464.2840 -> 461.1600     [  0.68%]  CTMark/SPASS/SPASS.test
   1275.1200 <- 1279.2400    [  0.32%]  CTMark/7zip/7zip-benchmark.test
    595.3080 <- 597.0600     [  0.29%]  CTMark/lencod/lencod.test
    387.2880 -> 386.4720     [  0.21%]  CTMark/kimwitu++/kc.test
    500.5120 -> 499.5320     [  0.20%]  CTMark/ClamAV/clamscan.test
    369.8800 <- 370.5160     [  0.17%]  CTMark/consumer-typeset/consumer-typeset.test
    926.4040 <- 927.5960     [  0.13%]  CTMark/Bullet/bullet.test
#  left column: build.tip_ctmark_4/results.json
# right column: build.upstream_ctmark_2/results.json
#  metric name: compile_time
    351.1320 <- 372.8840     [  6.19%]  CTMark/sqlite3/sqlite3.test
    253.6800 -> 250.2840     [  1.36%]  CTMark/mafft/pairlocalalign.test
    926.4040 <- 931.6960     [  0.57%]  CTMark/Bullet/bullet.test
    595.3080 -> 592.4760     [  0.48%]  CTMark/lencod/lencod.test
    369.8800 <- 371.3080     [  0.39%]  CTMark/consumer-typeset/consumer-typeset.test
    500.5120 <- 502.3880     [  0.37%]  CTMark/ClamAV/clamscan.test
    464.2840 -> 462.7920     [  0.32%]  CTMark/SPASS/SPASS.test
    486.5040 <- 487.3400     [  0.17%]  CTMark/tramp3d-v4/tramp3d-v4.test
    387.2880 <- 387.8960     [  0.16%]  CTMark/kimwitu++/kc.test
   1275.1200 -> 1273.3800    [  0.14%]  CTMark/7zip/7zip-benchmark.test
#  left column: build.tip_ctmark_4/results.json
# right column: build.upstream_ctmark_3/results.json
#  metric name: compile_time
    351.1320 <- 378.0360     [  7.66%]  CTMark/sqlite3/sqlite3.test
    486.5040 <- 499.0160     [  2.57%]  CTMark/tramp3d-v4/tramp3d-v4.test
    387.2880 <- 392.2520     [  1.28%]  CTMark/kimwitu++/kc.test
    253.6800 <- 256.0280     [  0.93%]  CTMark/mafft/pairlocalalign.test
    464.2840 -> 461.2000     [  0.67%]  CTMark/SPASS/SPASS.test
    595.3080 -> 591.4720     [  0.65%]  CTMark/lencod/lencod.test
    500.5120 <- 502.0800     [  0.31%]  CTMark/ClamAV/clamscan.test
    369.8800 <- 371.0080     [  0.30%]  CTMark/consumer-typeset/consumer-typeset.test
    926.4040 <- 927.0640     [  0.07%]  CTMark/Bullet/bullet.test
   1275.1200 <- 1275.2320    [  0.01%]  CTMark/7zip/7zip-benchmark.test
#  left column: build.tip_ctmark_4/results.json
# right column: build.upstream_ctmark_4/results.json
#  metric name: compile_time
    486.5040 <- 541.9040     [ 11.39%]  CTMark/tramp3d-v4/tramp3d-v4.test
    253.6800 <- 257.6560     [  1.57%]  CTMark/mafft/pairlocalalign.test
    387.2880 <- 392.9920     [  1.47%]  CTMark/kimwitu++/kc.test
    369.8800 -> 365.7360     [  1.13%]  CTMark/consumer-typeset/consumer-typeset.test
    595.3080 -> 590.7960     [  0.76%]  CTMark/lencod/lencod.test
    464.2840 -> 461.2760     [  0.65%]  CTMark/SPASS/SPASS.test
    926.4040 <- 929.7520     [  0.36%]  CTMark/Bullet/bullet.test
    500.5120 <- 502.0880     [  0.31%]  CTMark/ClamAV/clamscan.test
    351.1320 -> 350.2080     [  0.26%]  CTMark/sqlite3/sqlite3.test
   1275.1200 -> 1273.3720    [  0.14%]  CTMark/7zip/7zip-benchmark.test
sebpop accepted this revision.Mar 7 2018, 6:48 AM

LGTM.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
366 ↗(On Diff #137283)

I would have called this bitmap UnreachableBlocks.

This revision is now accepted and ready to land.Mar 7 2018, 6:48 AM
brzycki edited the summary of this revision. (Show Details)Mar 7 2018, 8:11 AM
brzycki marked an inline comment as done.Mar 7 2018, 8:28 AM
brzycki added inline comments.
llvm/lib/Transforms/Scalar/JumpThreading.cpp
366 ↗(On Diff #137283)

Good point, renaming.

brzycki updated this revision to Diff 137401.Mar 7 2018, 8:29 AM
brzycki marked an inline comment as done.
brzycki retitled this revision from [JumpThreading] use InvalidBlocks to avoid unreachable regions to [JumpThreading] use UnreachableBlocks to avoid unreachable regions.
brzycki edited the summary of this revision. (Show Details)

Renamed InvalidBlocks to UnreachableBlocks, minor tweaks to comments.

kuhar added inline comments.Mar 7 2018, 1:10 PM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
362 ↗(On Diff #137401)

I can see that UnreachableBlocks are being cleared every time at the beginning and at the end. Can it be a local function variable instead, or am I missing something?

378 ↗(On Diff #137401)

Can you add a comment on why the entry block is skipped? I guess this is because the latter code tries to delete block with no predecessor as it figures they are unreachable, but that's not the case for the function entry?

395 ↗(On Diff #137401)

I think the type is obvious here: auto *BI = ...

369 ↗(On Diff #137283)

I'd put it on a separate line, one uninitialized value looks a bit suspicious.

372 ↗(On Diff #137283)

Can this be rewritten to a range-based for loop?

Hi @kuhar , thanks for the review. As a general comment I tried to minimize changes to JumpThreading to remove removeUnreachableBlocks.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
362 ↗(On Diff #137401)

I added UnreachableBlocks to JumpThreading.h because I used the LoopHeaders code as my model. Originally I was tracking ValidBlocks but that became memory and performance prohibitive. I can try UnreachableBlocks as a local variable although I'm not quite sure what the LLVM style guide is for defining a local variable differently based on NDEBUG.

Either way this UnreachableBlocks.clear() should be removed.

378 ↗(On Diff #137401)

It's the same code as before. If you look at the original JumpThreading loop both of the subsequent cleanups (DeleteDeadBlock and TryToSimplifyUncondBranchFromEmptyBlock) guarded against touching the entry. This is because both of these delete BB block from`F`. Whether or not this is pessimizing potential JumpThreads is uncertain. I can certainly add a comment.

395 ↗(On Diff #137401)

Will do.

369 ↗(On Diff #137283)

Will do. Changed is set 2 lines later but I don't mind either way.

372 ↗(On Diff #137283)

Possibly. I left it as it was to minimize disruptions. I wasn't certain if this form handles deleted elements better than for (auto &BB : F) does. I can give it a try and see if all tests pass.

kuhar added a comment.Mar 8 2018, 7:53 AM

Thanks for clarifying. It would be definitely nice to see some cleanup, but I'll be fine with this happening in a follow-up patch if that makes testing easier.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
362 ↗(On Diff #137401)

I think that having a different field types depending on build type is is much more problematic than having local variables of different types. This is because in the former case you may expose different ABI based on build type, while having local variables of different types is not something others can depend on.

@kuhar I attempted to use a ranged iterator for (auto &BB : F) loop and it causes crashes in test-suite.

FAILED: Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o
/work/brzycki/test-suite/build/tools/timeit --summary Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.time /work/brzycki/upstream/install/bin/clang++  -DNDEBUG -I../Bitcode/Benchmarks/Halide/bilateral_grid -IBitcode/Benchmarks/Halide/bilateral_grid -O3   -O3 -DNDEBUG   -w -Werror=date-time -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -MF Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.d -o Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -c ../Bitcode/Benchmarks/Halide/bilateral_grid/driver.cpp
While deleting: label %lpad142

I'm going to revert to the original loop structure.

brzycki updated this revision to Diff 137665.Mar 8 2018, 3:45 PM

Addressed @kuhar 's comments.

@kuhar , I was incorrect about the for (auto &BB : F) causing crashes. It was actually the ifdef NDEBUG when I moved it into JumpThreading.cpp. I opted to not add an NDEBUG stanza and just use SmallPtrSet. There are several other instances of local pointer sets within JumpThreading that do the same.

Ping. @sebpop, @kuhar, or @dberlin any objections to pushing this patch as-is?

kuhar accepted this revision.Mar 15 2018, 7:42 AM

LGTM. Thanks for the fixes

sebpop accepted this revision.Mar 15 2018, 7:45 AM

LGTM. Thanks!

Thanks guys. I found a few spelling mistakes in the comments that I'm fixing and testing now. I'll upload the diff and push the change shortly if all goes well.

brzycki updated this revision to Diff 138705.Mar 16 2018, 7:40 AM

Comment cleanup and rebase. NFC.

This revision was automatically updated to reflect the committed changes.

It looks like in general, the union of the the Unreachable set and DeletedBBs does not contain all unreachable blocks; the Unreachable set never gets updated, and DeletedBBs only contains blocks which trivially have no predecessors. Could this lead to problems when jump threading erases an edge?

Could this lead to problems when jump threading erases an edge?

Hi @efriedma , it is possible there are still problems inside JumpThreading related to unreachable block chains. However, this patch wouldn't make that problem any better (or worse) than how JumpThreading was before. Here's the high-level algorithm of JumpThreading before this change:

removeUnreachableBlocks(F);
while Changed:
  for BB in F:
    ProcessBlock(BB);  // Can modify Changed
    // Try to delete BB (can modify Changed)
    // Try to merge BB with its successor (can modify Changed)

Previously removeUnreachableBlocks(F) removed unreachable blocks (and altered the reachable blocks) before starting its main while Changed loop. There are several places in JumpThreading where blocks were deleted or altered that didn't necessarily lead to threadable edges: DeleteDeadBlock, TryToSimplifyUncondBranchFromEmptyBlock, MergeBasicBlockIntoOnlyPred, and ConstantFoldTerminator. Any one of these calls could lead to unreachable regions being generated and they weren't being tracked before.

This patch works almost identical to the old algorithm:

// Generate Unreachable Blocks Set
while Changed:
  if BB in Unreachable:
    continue;
  for BB in F:
    ProcessBlock(BB);  // Can modify Changed
    // Try to delete BB (can modify Changed)
    // Try to merge BB with its successor (can modify Changed)

The big difference here is instead of actually removing the blocks we simply ignore them. But in both cases they are identified once before the main while Changed loop starts.

The only way I know of preventing any wasted work done on unreachable regions is to check if BB is to maintain a set of blocks that are reachable from entry. We'd need to verify this state with calls to DT on almost every iteration inside ProcessBlock. It's not practical as it greatly lengthens compile time. For large functions like those found in sqlite3.c I was seeing a slowdown of 20% or more to track the 3,000+ blocks in the main REPL function when I initially tried this approach.

If you have ideas to better test IR coverage for JumpThreading I'd be very interested in hearing about them.