- User Since
- Jun 30 2016, 1:50 PM (89 w, 4 d)
I'm not a reviewer, but it LGTM. You might want to ask @kuhar to review too.
Fri, Mar 16
Comment cleanup and rebase. NFC.
Thu, Mar 15
Thanks guys. I found a few spelling mistakes in the comments that I'm fixing and testing now. I'll upload the diff and push the change shortly if all goes well.
Tue, Mar 13
Mon, Mar 12
Gentle ping. Are there any objections to this patch? Should I open another Differential for this patch in preparation to commiting it?
Thu, Mar 8
Addressed @kuhar 's comments.
@kuhar I attempted to use a ranged iterator for (auto &BB : F) loop and it causes crashes in test-suite.
FAILED: Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o /work/brzycki/test-suite/build/tools/timeit --summary Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.time /work/brzycki/upstream/install/bin/clang++ -DNDEBUG -I../Bitcode/Benchmarks/Halide/bilateral_grid -IBitcode/Benchmarks/Halide/bilateral_grid -O3 -O3 -DNDEBUG -w -Werror=date-time -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -MF Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.d -o Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -c ../Bitcode/Benchmarks/Halide/bilateral_grid/driver.cpp While deleting: label %lpad142
Wed, Mar 7
Hi @kuhar , thanks for the review. As a general comment I tried to minimize changes to JumpThreading to remove removeUnreachableBlocks.
Renamed InvalidBlocks to UnreachableBlocks, minor tweaks to comments.
Tue, Mar 6
Here are 4 ctmark runs on x86_64 of compile_time results. The tip compiler is llvm-project SHA 7a365bd12dd84be1e7c9ca6d9a2268b80f604ae9. The upstream compiler is the same as tip with this patch applied.
I have analyzed this failure and have what I believe (in @dberlin 's words) to be the smallest hammer fix:
$ git diff lib/Analysis/LazyValueInfo.cpp diff --git a/llvm/lib/Analysis/LazyValueInfo.cpp b/llvm/lib/Analysis/LazyValueInfo.cpp index 65fd007dc0b..36e66f7f6ef 100644 --- a/llvm/lib/Analysis/LazyValueInfo.cpp +++ b/llvm/lib/Analysis/LazyValueInfo.cpp @@ -1145,9 +1145,17 @@ getValueFromConditionImpl(Value *Val, Value *Cond, bool isTrueDest, (!isTrueDest && BO->getOpcode() != BinaryOperator::Or)) return ValueLatticeElement::getOverdefined();
Mon, Feb 26
+1 LGTM as well.
Wed, Feb 21
This method is too expensive as it is designed today.
Mon, Feb 19
+1 from me. I'd love to see a single interface for all Dominator-related analysis or updates. This idea reminds me (a bit) of a comment @dberlin made in D43173 about an interface similar to getBestSimplifyQuery to reduce the number of functions that need explicit pointers to a DT, DDT, PDT or other analysis object. I don't quite understand how this would work but I wanted to bring it up in case the work could be done in a way to anticipate such an interface to reduce re-work.
Feb 16 2018
Feb 15 2018
Feb 14 2018
Added asserts to enableDT / disableDT to verify both DT and DisabledDT are never both valid pointers. Added a final LVI->enabledDT() at the very end of runImpl to allow subsequent passes to use DT with the cached LVI object.
Feb 13 2018
IMHO, if you were going to go that route, LVI checks whether DT is null, and so you could wrap this all in a "UpdatedDTOrNull" object that, when cast to a DominatorTree *, returned the tree if it was up to date, or a null pointer if it was not.
It's a bit more subtle than that @dberlin due to the way LazyValueInfo is defined. Because the LazyValueInfoImpl object is cached and re-used it would store a pointer to DT (or nullptr) at the instance of creating the first cachced LVI instance. I ran into this problem with my first attempt to disable DT within LVI and is another reason I decided on the new external methods.
Use std::swap() in enableDT/DisableDT methods.
Rebased to tip, switched to always enabling or disabling before calling LVI analysis. Prevents LVI from being accidentally enabled.
Feb 12 2018
Hi @kuhar , is there a testcase that causes this to fail?
I used clang -O3 -fno-exceptions -ftime-report -c tramp3d-v4.cpp -o /tmp/foo.o with an upstream (non-patched) compiler. Here are the hottest areas of pass execution according to the reports. These lines are displayed in the same order:
Six runs of tramp3d-v4 with this patch applied show comperable variance and similar low/high bounds:
The variance of tramp3d-v4 is considerable, as much as 18.5%. I ran CTMark on the pre-DDT compiler and here are the results:
I'm seeing considerable variance on the tramp3d-v4 compile time test before and after the preservation of DT on JumpThreading patch:
I haven't checked, but from my previous experiments I doubt it does -- forming or to infinite loops are very rare, and this piece of code was only executed a few times when compiling clang or sqlite. I don't have numbers for it, but could get them later this week if someone's really interested.
LGTM. Out of curiosity does this change impact compile time?
Feb 9 2018
@dberlin thank you for the suggestion and I have implemented the change. The good news is sqlite is back to our original performance. The not-so-great news is this causes a regression for tramp3d-v4:
Added a feature to LVI to enable or disable the use of the DominatorTree for analysis. Updated JumpThreading to disable the use of DT for LVI analysis when there are pending updates in the DDT.
Feb 7 2018
@kuhar and @dberlin I would appreciate any guidance you two may have. I spoke with @sebpop about this problem and we don't see a way around the performance impact to preserve DT for LVI analysis. As it stands now there are cases where the deferred updates change the LVI analysis sufficiently to cause a miscompile. We do think there may be ways to mitigate this loss in the future: one way is to prevent pointless threads where a BB is threaded only to be discarded in a subsequent loop iteration within JumpThreading.
Here is a comparison of a run of LLVM before the DDT patch was applied to JT vs upstream.
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.pre_dt/results.json build.upstream/results.json compile_time # left column: build.pre_dt/results.json # right column: build.upstream/results.json # metric name: compile_time 527.9440 -> 493.8600 [ 6.90%] CTMark/tramp3d-v4/tramp3d-v4.test 1203.8440 <- 1265.6800 [ 5.14%] CTMark/7zip/7zip-benchmark.test 881.3400 <- 924.1160 [ 4.85%] CTMark/Bullet/bullet.test 576.0160 <- 589.8000 [ 2.39%] CTMark/lencod/lencod.test 365.8560 <- 373.0960 [ 1.98%] CTMark/consumer-typeset/consumer-typeset.test 452.0200 <- 459.5960 [ 1.68%] CTMark/SPASS/SPASS.test 252.3400 <- 256.1360 [ 1.50%] CTMark/mafft/pairlocalalign.test 491.1680 <- 497.1560 [ 1.22%] CTMark/ClamAV/clamscan.test 382.8120 <- 387.2960 [ 1.17%] CTMark/kimwitu++/kc.test 336.8720 -> 336.3600 [ 0.15%] CTMark/sqlite3/sqlite3.test
Feb 6 2018
I'm seeing about 3% run variation on sqlite with the upstream compiler:
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream/results.json build.upstream2/results.json compile_time # left column: build.upstream/results.json # right column: build.upstream2/results.json # metric name: compile_time 336.3600 <- 347.4120 [ 3.29%] CTMark/sqlite3/sqlite3.test 493.8600 -> 487.0560 [ 1.40%] CTMark/tramp3d-v4/tramp3d-v4.test 373.0960 -> 368.5280 [ 1.24%] CTMark/consumer-typeset/consumer-typeset.test 387.2960 <- 389.8320 [ 0.65%] CTMark/kimwitu++/kc.test 459.5960 <- 462.5440 [ 0.64%] CTMark/SPASS/SPASS.test 924.1160 -> 921.2680 [ 0.31%] CTMark/Bullet/bullet.test 497.1560 <- 498.5240 [ 0.28%] CTMark/ClamAV/clamscan.test 589.8000 -> 588.3840 [ 0.24%] CTMark/lencod/lencod.test 1265.6800 <- 1267.2280 [ 0.12%] CTMark/7zip/7zip-benchmark.test 256.1360 -> 256.1000 [ 0.01%] CTMark/mafft/pairlocalalign.test
@kuhar I had a bug in my test-suite runner script and just fixed it. The new results are require more analysis, particularly sqlite.
Feb 2 2018
Thank you @kuhar for the review. I ran CTmark (x86_64) on tip today and with this patch applied. The results are:
Jan 31 2018
I have a patch (that relies on my fix from D42717) to also fix this issue with little additional overhead. Because we must flush all pending DT transactions before using LVI analysis we can also check if the block is unreachable from entry. If that's the case we can immediately pessimize threads across the block. Here's the patch along with a modified version of D42717:
Jan 30 2018
Jan 29 2018
Jan 26 2018
Jan 23 2018
I too am curious how this managed to not be found for several years. @arsenm how did you discover this?
Jan 22 2018
LGTM. I just have one question about the patch, see inline.
Jan 19 2018
LGTM. I've thrown everything I can think of to attempt to trigger a DT regression and found no asserts().
Some files like sqlite3.c felt like they would never finish...
This code is superceded by D40146.
Jan 18 2018
The changes to GenericDomTreeConstruction.h look good to me and I verified the new unit test fails without the patch. Other than my issue with ddt-crash3.ll the patch looks good.
Your patch to GenericDomTreeConstruction.h passes ninja check-all on x86_64 as well as test-suite on x86_64. I used my version of ddt-crash3.ll posted above for the testing.
Hi @kuba, I was unable to recreate the crash with the version of ddt-crash3.ll included in this patch. Here's my version of ddt-crash3.ll that fails before the patch. I've also verified this version passes after your patch. I'll take my time and review the real patch later today.
Jan 17 2018
Added inline comment.
Or should we addif (!DDT->flush().isReachableFromEntry(BB)) continue;
That alone seems to solve the problem, even if I remove the fix in ProcessBlock.
Jan 16 2018
Jan 12 2018
Preparing to commit to SVN.
Fixed punctuation in unit test comment. NFC.
Update nits in the unit test for clarity.
@kuba Thank you again for the careful review. I'm uploading a new diff shortly. If you could give it a quick LGTM I'll try to push the patch to tip again today.
Jan 11 2018
Fixed comments in unit test.
Jan 10 2018
Fixed Chrome build patch with suggestions from @kuhar . Added a first pass at DDT unit tests.
Jan 9 2018
I'm OK with reapplying this, but when it causes a couple more crashes I'd strongly consider some more exhaustive testing approach.
Thanks @kuhar for the review. I agree with you that DDT should have a unittest module and I've started work on it. I have the skeleton in place and am working on the actual tests now. I'm still trying to determine the set of unique CFG shapes/mutations to test for.
Jan 8 2018
@kuhar if you have the time I would appreciate your feedback on the fix and if you're still ok with the patch.
Fxed assert() caused when compiling Chrome.
Jan 5 2018
Thank you @rnk for reducing the test case. I don't think you need to go further, I'll convert to llvm bytecode inside gdb and reduce with bugpoint once I verify the crash locally. Once there I'll post the .ll.
The patch was reverted in r321832 by @rnk for the following reason:
I accidentally re-opened the wrong diff.
Jan 4 2018
Jan 3 2018
Getting patch ready to commit again. Rebased to llvm tip, rancheck, check-all, and test-suite on x86_64. Also passed building with LLVM_ENABLE_MODULES=1.
Jan 2 2018
Dec 28 2017
It looks ok to me. I’d prefer the comments reflect the overloaded method parameters but it’s not necessary.
Dec 18 2017
Update patch with a minimal class definition of DeferredDominance in Dominators.h. Also removed an unnecessary #include of Dominators.h in JumpThreading.h
I wonder what caused the original cycle. I'm not entirely sure how modules are built, but have you tried getting rid of the include in Local.h? If that's not enough to break the cycle, I don't have any idea better than yours.
It's a bit of a mystery to me how -fmodules actually works. I am getting cyclic errors when compiling a file that does not include either of Dominators.h or DeferredDominance.h. I have emailed @rsmith requesting his expertise if he can help us understand this.
Dec 15 2017
Dec 14 2017
Here is another failure linking it directly to DeferredDominance.h:
Refactored patch to prevent -fmodules circular dependency errors.
This patch caused a build break when attempting to use -fmodules due to a cyclic header dependency chain. I've refactored the patch to no longer use DeferredDominance.h (the code now lives as a class in Dominators.h). I also moved one of the methods into Dominators.cpp that needed Constants.h and Instructions.h. @sebpop and @kuhar could you please review again?