Aug 2 2018
Delayed LGTM. It's a good assert to know if someone tries to alter DelBB after it's marked for deletion.
LGTM. DTU should be the only way to do Lazy or Eager updates to DT.
LGTM if you have done a full test-suite correctness test on this patch.
OK by me. Probably won't save much compile time but it is a bit cleaner to read.
LGTM. I like how this revision of the patch is minimally invasive and most of the hard work is hidden in DTU. I strongly recommend running ninja check, ninja check-all, and a standard test-suite run on tip before committing.
Jul 12 2018
Gentle ping for @trentxintong .
Jul 11 2018
@NutshellySima I think you said that BasicBlocks that are deleted with deleteBB() aren't removed immediately and are held until a flush() event, correct? I don't see any comments informing users that this is the case. I consider this unexpected behavior, and it should be documented.
Jul 10 2018
Hello @NutshellySima , comments are again inline.
Jul 9 2018
Hello @NutshellySima , comments are inline.
(calling delBB() still defers BasicBlock deletion until a flush event, according to the doc).
This is an interesting idea as it gives callers a deferred-deletion tool for the Lazy strategy and for the case where both trees are nullptr. It'd be worthwhile to add documentation somewhere to let others know about this. I know of at least one case, when I was working on JumpThreading, where the original code flushed LVI information about a block because it called an external routine that might delete the block (it was at the end of runImpl()). This feature gives others a simple way of making potentially destructive calls, checking the return code, and cleaning up on success or failure.
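To illustrate the deferred-deletion behavior discussed above, here is a minimal standalone model. The names (DeferredDeleter, Block) are illustrative stand-ins, not the actual DomTreeUpdater API: the point is only that deleteBB() queues the block and nothing observable happens until flush().

```cpp
#include <string>
#include <vector>

// Illustrative stand-in for a basic block; not LLVM's BasicBlock.
struct Block {
  std::string Name;
  bool Deleted = false;
};

// Toy model of deferred deletion: deleteBB() only queues the block,
// and the actual state change happens at flush(), mirroring the
// Lazy-strategy behavior described in the review comments.
class DeferredDeleter {
  std::vector<Block *> Pending;

public:
  void deleteBB(Block *B) { Pending.push_back(B); } // no immediate effect
  void flush() {
    for (Block *B : Pending)
      B->Deleted = true; // the "real" deletion happens here
    Pending.clear();
  }
  bool hasPending() const { return !Pending.empty(); }
};
```

A caller that checks a block's state between deleteBB() and flush() will still see it alive, which is exactly the surprise the comments above ask to have documented.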
Hello @NutshellySima , comments are inline.
Jul 6 2018
That is definitely not sufficient. The .ll test cases are only a small subset of real-world code and people only add to them to verify a new feature or to prove a bug is fixed. I recommend running test-suite to get a better representation of code out in the world. If you are uncertain how to do so please e-mail me and I'll provide detailed instructions. I'm hoping we can avoid stressful bug report debugging and LLVM code reverts/reapplies.
Hello @NutshellySima , my comments are inline below. These changes are a good real-world usage of the new DTU class and should help us find any bugs or deficiencies. I'm curious to know what kind of testing you've done on this code: ninja check, test-suite, and/or test-suite for compile time?
Jul 5 2018
Jul 2 2018
+1 LGTM. Thank you @NutshellySima for your dedication to getting this patch ready for adoption!
Jun 29 2018
Jun 28 2018
Jun 27 2018
If we agree that it would be better to have well-behaved updates in most of the places in the future, then having distinct names for these two should make it much easier to grep for. Additionally, it seems like it's not always possible to perform them with the eager strategy, is it?
Good point, I didn't think of the calls being easier to grep. Sounds good to me. And yes, I would prefer all updates to be strict and done after CFG updates, but reality isn't always so forgiving. Having explicit *Relaxed() calls can make them easier to identify and potentially fix as we go.
Thank you for continuing this work @NutshellySima .
Jun 26 2018
Jun 25 2018
+1. I'd like to see most, if not all, of the updates to PDT/DT to be done through the DTU interface applyUpdate() call. It simplifies method signatures and provides a consistent interface to analysis and updates.
Jun 22 2018
Added small fix to suggested code.
My comments are inline. I'm also curious to know what kind of testing you've done on this (ninja check, ninja check-all, any test-suite runs, etc). I'd like more testing if it's just ninja check: the included tests are a small sampling of real-world code and test-suite is a bit better for this kind of change.
Hi @trentxintong , I am just starting to review this diff and will get back to you with more meaningful comments soon. While I'm doing that, could you tell me whether this work stems from a bug you discovered or is a new feature you'd like to see implemented? Looking at your .ll changes below makes me believe this is a new feature, but I'd like to be sure.
Jun 21 2018
Thank you for working on this @NutshellySima , a unified dominator interface will be most welcome in LLVM. Please see my inline comments.
This problem seems to be aggravated by the new SSAUpdaterBulk() patch pushed on May 12 (D44282), since the PR37745.ll test case does not fail before that date. I don't understand the SSA updater change well enough to see why this problem only happens with the new bulk updates. @mzolotukhin do you happen to know why?
Jun 20 2018
I always prefer deterministic execution (all other things being equal). Have you done any testing of the performance impact of this change? Adding and erasing Dominator elements may cost more given how the SmallVector type performs these operations.
Mar 20 2018
I'm not a reviewer, but it LGTM. You might want to ask @kuhar to review too.
Mar 19 2018
Mar 16 2018
Comment cleanup and rebase. NFC.
Mar 15 2018
Thanks guys. I found a few spelling mistakes in the comments that I'm fixing and testing now. I'll upload the diff and push the change shortly if all goes well.
Mar 13 2018
Mar 12 2018
Gentle ping. Are there any objections to this patch? Should I open another Differential for this patch in preparation for committing it?
Mar 8 2018
Addressed @kuhar 's comments.
@kuhar I attempted to use a ranged for (auto &BB : F) loop and it causes crashes in test-suite.
FAILED: Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o
/work/brzycki/test-suite/build/tools/timeit --summary Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.time /work/brzycki/upstream/install/bin/clang++ -DNDEBUG -I../Bitcode/Benchmarks/Halide/bilateral_grid -IBitcode/Benchmarks/Halide/bilateral_grid -O3 -O3 -DNDEBUG -w -Werror=date-time -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -MF Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.d -o Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -c ../Bitcode/Benchmarks/Halide/bilateral_grid/driver.cpp
While deleting: label %lpad142
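The crash pattern above is consistent with mutating a function's block list while range-iterating it. A standalone sketch of the safe collect-then-erase pattern, using std::list<int> as a stand-in for a function's basic-block list (the helper eraseEvens is hypothetical, purely for illustration):

```cpp
#include <list>
#include <vector>

// Erasing from a container while range-iterating it invalidates the
// live iterator. Collecting candidates first and erasing afterwards
// avoids that; std::list<int> stands in for a block list here.
std::list<int> eraseEvens(std::list<int> Blocks) {
  std::vector<std::list<int>::iterator> ToErase;
  for (auto It = Blocks.begin(); It != Blocks.end(); ++It)
    if (*It % 2 == 0)
      ToErase.push_back(It); // defer the erase
  for (auto It : ToErase)
    Blocks.erase(It); // safe: the traversal is finished
  return Blocks;
}
```

The same discipline applies to deleting BasicBlocks mid-iteration: either collect them and delete after the loop, or advance the iterator before the erase.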
Mar 7 2018
Hi @kuhar , thanks for the review. As a general comment, I tried to minimize the changes to JumpThreading needed to remove removeUnreachableBlocks.
Renamed InvalidBlocks to UnreachableBlocks, minor tweaks to comments.
Mar 6 2018
Here are 4 ctmark runs on x86_64 of compile_time results. The tip compiler is llvm-project SHA 7a365bd12dd84be1e7c9ca6d9a2268b80f604ae9. The upstream compiler is the same as tip with this patch applied.
I have analyzed this failure and have what I believe (in @dberlin 's words) to be the smallest hammer fix:
$ git diff lib/Analysis/LazyValueInfo.cpp
diff --git a/llvm/lib/Analysis/LazyValueInfo.cpp b/llvm/lib/Analysis/LazyValueInfo.cpp
index 65fd007dc0b..36e66f7f6ef 100644
--- a/llvm/lib/Analysis/LazyValueInfo.cpp
+++ b/llvm/lib/Analysis/LazyValueInfo.cpp
@@ -1145,9 +1145,17 @@ getValueFromConditionImpl(Value *Val, Value *Cond, bool isTrueDest,
       (!isTrueDest && BO->getOpcode() != BinaryOperator::Or))
     return ValueLatticeElement::getOverdefined();
Feb 26 2018
+1 LGTM as well.
Feb 21 2018
This method is too expensive as it is designed today.
Feb 19 2018
+1 from me. I'd love to see a single interface for all Dominator-related analysis or updates. This idea reminds me (a bit) of a comment @dberlin made in D43173 about an interface similar to getBestSimplifyQuery to reduce the number of functions that need explicit pointers to a DT, DDT, PDT or other analysis object. I don't quite understand how this would work but I wanted to bring it up in case the work could be done in a way to anticipate such an interface to reduce re-work.
Feb 16 2018
Feb 15 2018
Feb 14 2018
Added asserts to enableDT / disableDT to verify DT and DisabledDT are never both valid pointers. Added a final LVI->enableDT() at the very end of runImpl to allow subsequent passes to use DT with the cached LVI object.
Feb 13 2018
IMHO, if you were going to go that route, LVI checks whether DT is null, and so you could wrap this all in a "UpdatedDTOrNull" object that, when cast to a DominatorTree *, returned the tree if it was up to date, or a null pointer if it was not.
It's a bit more subtle than that @dberlin , due to the way LazyValueInfo is defined. Because the LazyValueInfoImpl object is cached and re-used, it stores a pointer to DT (or nullptr) at the moment the first cached LVI instance is created. I ran into this problem with my first attempt to disable DT within LVI, and it is another reason I decided on the new external methods.
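A minimal standalone model of the caching problem described above (LVIModel and CachedImpl are illustrative names, not the real LazyValueInfoImpl): the pointer is captured once, when the cached impl is first created, so later changes in DT availability are never observed.

```cpp
// Toy stand-in for a dominator tree.
struct DomTree {};

// The impl captures DT (or nullptr) exactly once, at construction.
struct CachedImpl {
  DomTree *DT;
  explicit CachedImpl(DomTree *D) : DT(D) {}
};

// Toy model of a cached analysis: only the first call's argument
// matters, mirroring the staleness problem described above.
struct LVIModel {
  CachedImpl *Impl = nullptr;
  CachedImpl &getImpl(DomTree *D) {
    if (!Impl)
      Impl = new CachedImpl(D); // pointer captured here, once
    return *Impl;               // later calls ignore the argument
  }
};
```

This is why external enable/disable methods that mutate the cached object are needed, rather than relying on the pointer passed at creation time.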
Use std::swap() in the enableDT/disableDT methods.
Rebased to tip, switched to always enabling or disabling before calling LVI analysis. Prevents LVI from being accidentally enabled.
Feb 12 2018
Hi @kuhar , is there a testcase that causes this to fail?
I used clang -O3 -fno-exceptions -ftime-report -c tramp3d-v4.cpp -o /tmp/foo.o with an upstream (non-patched) compiler. Here are the hottest areas of pass execution according to the reports. These lines are displayed in the same order:
Six runs of tramp3d-v4 with this patch applied show comparable variance and similar low/high bounds:
The variance of tramp3d-v4 is considerable, as much as 18.5%. I ran CTMark on the pre-DDT compiler and here are the results:
I'm seeing considerable variance on the tramp3d-v4 compile time test before and after the preservation of DT on JumpThreading patch:
I haven't checked, but from my previous experiments I doubt it does -- forming or to infinite loops are very rare, and this piece of code was only executed a few times when compiling clang or sqlite. I don't have numbers for it, but could get them later this week if someone's really interested.
LGTM. Out of curiosity does this change impact compile time?
Feb 9 2018
@dberlin thank you for the suggestion and I have implemented the change. The good news is sqlite is back to our original performance. The not-so-great news is this causes a regression for tramp3d-v4:
Added a feature to LVI to enable or disable the use of the DominatorTree for analysis. Updated JumpThreading to disable the use of DT for LVI analysis when there are pending updates in the DDT.
Feb 7 2018
@kuhar and @dberlin I would appreciate any guidance you two may have. I spoke with @sebpop about this problem and we don't see a way around the performance impact to preserve DT for LVI analysis. As it stands now there are cases where the deferred updates change the LVI analysis sufficiently to cause a miscompile. We do think there may be ways to mitigate this loss in the future: one way is to prevent pointless threads where a BB is threaded only to be discarded in a subsequent loop iteration within JumpThreading.
Here is a comparison of a run of LLVM before the DDT patch was applied to JT vs upstream.
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.pre_dt/results.json build.upstream/results.json compile_time
# left column: build.pre_dt/results.json
# right column: build.upstream/results.json
# metric name: compile_time
 527.9440 ->  493.8600 [ 6.90%] CTMark/tramp3d-v4/tramp3d-v4.test
1203.8440 <- 1265.6800 [ 5.14%] CTMark/7zip/7zip-benchmark.test
 881.3400 <-  924.1160 [ 4.85%] CTMark/Bullet/bullet.test
 576.0160 <-  589.8000 [ 2.39%] CTMark/lencod/lencod.test
 365.8560 <-  373.0960 [ 1.98%] CTMark/consumer-typeset/consumer-typeset.test
 452.0200 <-  459.5960 [ 1.68%] CTMark/SPASS/SPASS.test
 252.3400 <-  256.1360 [ 1.50%] CTMark/mafft/pairlocalalign.test
 491.1680 <-  497.1560 [ 1.22%] CTMark/ClamAV/clamscan.test
 382.8120 <-  387.2960 [ 1.17%] CTMark/kimwitu++/kc.test
 336.8720 ->  336.3600 [ 0.15%] CTMark/sqlite3/sqlite3.test
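For readers decoding the litdiff output: judging from the printed numbers, the [%] column appears to be the delta relative to the smaller of the two samples (this is inferred from the data shown here, not taken from litdiff's source). A small sketch of that computation:

```cpp
#include <algorithm>
#include <cmath>

// Percent difference between two compile-time samples, relative to
// the smaller value. This formula is an assumption inferred from the
// [ 6.90%] style figures in the litdiff output above.
double pctDelta(double A, double B) {
  double Lo = std::min(A, B);
  double Hi = std::max(A, B);
  return (Hi - Lo) / Lo * 100.0;
}
```

For example, pctDelta(527.9440, 493.8600) reproduces the 6.90% shown for tramp3d-v4 above.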
Feb 6 2018
I'm seeing about 3% run variation on sqlite with the upstream compiler:
brzycki@cc01 /work/brzycki/test-suite $ ~/git/me/llvm/litdiff build.upstream/results.json build.upstream2/results.json compile_time
# left column: build.upstream/results.json
# right column: build.upstream2/results.json
# metric name: compile_time
 336.3600 <-  347.4120 [ 3.29%] CTMark/sqlite3/sqlite3.test
 493.8600 ->  487.0560 [ 1.40%] CTMark/tramp3d-v4/tramp3d-v4.test
 373.0960 ->  368.5280 [ 1.24%] CTMark/consumer-typeset/consumer-typeset.test
 387.2960 <-  389.8320 [ 0.65%] CTMark/kimwitu++/kc.test
 459.5960 <-  462.5440 [ 0.64%] CTMark/SPASS/SPASS.test
 924.1160 ->  921.2680 [ 0.31%] CTMark/Bullet/bullet.test
 497.1560 <-  498.5240 [ 0.28%] CTMark/ClamAV/clamscan.test
 589.8000 ->  588.3840 [ 0.24%] CTMark/lencod/lencod.test
1265.6800 <- 1267.2280 [ 0.12%] CTMark/7zip/7zip-benchmark.test
 256.1360 ->  256.1000 [ 0.01%] CTMark/mafft/pairlocalalign.test
@kuhar I had a bug in my test-suite runner script and just fixed it. The new results require more analysis, particularly sqlite.
Feb 2 2018
Thank you @kuhar for the review. I ran CTmark (x86_64) on tip today and with this patch applied. The results are:
Jan 31 2018
I have a patch (that relies on my fix from D42717) to also fix this issue with little additional overhead. Because we must flush all pending DT transactions before using LVI analysis we can also check if the block is unreachable from entry. If that's the case we can immediately pessimize threads across the block. Here's the patch along with a modified version of D42717:
Jan 30 2018
Jan 29 2018
Jan 26 2018
Jan 23 2018
I too am curious how this managed to not be found for several years. @arsenm how did you discover this?
Jan 22 2018
LGTM. I just have one question about the patch, see inline.
Jan 19 2018
LGTM. I've thrown everything I can think of to attempt to trigger a DT regression and found no asserts().
Some files like sqlite3.c felt like they would never finish...
This code is superseded by D40146.
Jan 18 2018
The changes to GenericDomTreeConstruction.h look good to me and I verified the new unit test fails without the patch. Other than my issue with ddt-crash3.ll the patch looks good.
Your patch to GenericDomTreeConstruction.h passes ninja check-all on x86_64 as well as test-suite on x86_64. I used my version of ddt-crash3.ll posted above for the testing.
Hi @kuba, I was unable to recreate the crash with the version of ddt-crash3.ll included in this patch. Here's my version of ddt-crash3.ll that fails before the patch. I've also verified this version passes after your patch. I'll take my time and review the real patch later today.
Jan 17 2018
Added inline comment.
Or should we add if (!DDT->flush().isReachableFromEntry(BB)) continue;
That alone seems to solve the problem, even if I remove the fix in ProcessBlock.