- User Since
- Jun 30 2016, 1:50 PM (127 w, 6 d)
Mon, Dec 10
Awesome, thanks for doing this @amehsan .
Thu, Dec 6
Hi @amehsan , I think the patch looks mostly good but for a few small comments or variable names to enhance readability.
Thu, Nov 29
@kachkov98 thank you for helping to solve this bug. I agree with @vsk 's review comments and I'm most concerned about skipping PHIs with only one predecessor. I think it would be good to add code coverage testing in Utils/CodeExtractorTest.cpp explicitly testing your assumptions of the new code in severSplitPHINodesOfExits()
Mon, Nov 26
I just re-ran both to make sure it wasn't caused by other factors on the host. There's some movement in the <= 2% range but tramp3d-v4 is the same regression.
# left column: D54730/results.json # right column: tip/results.json # metric name: compile_time 43.7640 -> 39.6280 [ 10.44%] CTMark/tramp3d-v4/tramp3d-v4.test 26.4680 <- 27.0880 [ 2.34%] CTMark/sqlite3/sqlite3.test 130.5600 -> 128.1000 [ 1.92%] CTMark/7zip/7zip-benchmark.test 35.4600 <- 36.1000 [ 1.80%] CTMark/kimwitu++/kc.test 32.2400 -> 31.7520 [ 1.54%] CTMark/consumer-typeset/consumer-typeset.test 23.3600 <- 23.6480 [ 1.23%] CTMark/mafft/pairlocalalign.test 52.7280 -> 52.5520 [ 0.33%] CTMark/lencod/lencod.test 50.8120 <- 50.9560 [ 0.28%] CTMark/ClamAV/clamscan.test 44.1040 <- 44.2160 [ 0.25%] CTMark/SPASS/SPASS.test 86.8800 -> 86.8200 [ 0.07%] CTMark/Bullet/bullet.test
@efriedma I ran CTMark on the patch with the following patch as tip (both compilers are Release):
commit 9b6b09070e777106e6e91c15406e197c7d18e1fb (origin/master, origin/HEAD) Author: Ilya Biryukov <email@example.com> Date: Mon Nov 26 15:38:01 2018 +0000
Tue, Nov 20
Has anyone run CTMark test-suite numbers on this patch before and after? @efriedma is it possible to post the extreme C++ case to analyze exactly what's happening in the Dominator update routines and why?
Nov 9 2018
LGTM. Thanks again for doing this. :)
Nov 8 2018
LGTM and I'm glad to see other passes working to preserve DTU.
+1 for me as well. JumpThreading, as it stands today, is quite haphazard in how it attacks a function and anything that helps reduce time spent recomputing known values is desirable.
Oct 26 2018
Oct 25 2018
LGTM. I have no doubt someone will speak up if they detect a regression. :)
The issue I find with cl::opt here is that this is a generic class that may not necessarily have a corresponding cpp file -- there can be multiple clients to GenericDomTreeConstruction, each with different requirements. Because of that, it's not clear to me in which translation unit one would put cl::opt for the constant in.
Is that a solved problem somewhere else in LLVM? I'm not familiar with using cl::opt in header files.
Fair enough. I'm more used to working on function passes that have several cl::opt() lines at the top of the .cpp file. If we don't have a corresponding cpp it's non-trivial to add it as a command-line argument. I too have never tried to use cl::opt() in a .h.
Oct 24 2018
Gentle ping and a few comments. There are a few proprietary benchmarks that perform better with this pass and I'd like to see this (hopefully) moving towards a commit.
Hi @trentxintong , did you decide if this patch was useful?
I'd still like to see the constant as a hidden command line argument cl::opt(). Otherwise I think this is an excellent patch with considerable compile-time speedups. Nice work @NutshellySima .
Oct 16 2018
Nice work @sebpop. Here's a delayed "LGTM". :)
Oct 15 2018
Hi @hiraditya , comments inline.
Hi @hiraditya , comments inline.
Hi @hiraditya , pardon my ignorance in this part of llvm. Most of my comments are questions about your reasoning.
Ii might also be worth investigating to make these constant hidden command line parameters. For one it makes testing faster and it also gives users an emergency knob (the cl::opt() mechanism).
Oct 9 2018
Hi @sebpop, I have a few comments inline regarding the function determining side effect status.
Oct 2 2018
LGTM as well, provided @tejohnson 's test complete successfully.
Aug 2 2018
Delayed LGTM. It's a good assert to know if someone tries to alter DelBB after it's marked for deletion.
LGTM. DTU should be the only way to do Lazy or Eager updates to DT.
LGTM if you have done a full test-suite correctness test on this patch.
OK by me. Probably won't save much compile time but it is a bit cleaner to read.
LGTM. I like how this revision of the patch is minimally invasive and most of the hard work is hidden in DTU. I strongly recommend running ninja check, ninja check-all, and a standard test-suite run on tip before committing.
Jul 12 2018
Gentle ping for @trentxintong .
Jul 11 2018
@NutshellySima I think you said that BasicBlocks that are deleted with deleteBB() aren't removed immediately and are held until a a flush() event, correct? I don't see any comments informing the users that this is the case. I consider it to be unexpected behavior and should be noted.
Jul 10 2018
Hello @NutshellySima , comment are again inline.
Jul 9 2018
Hello @NutshellySima , comments are inline.
(call delBB() still pends BasicBlock deletion until a flush event according to the doc).
This is an interesting idea as it gives callers a deferred deletion tool for Lazy and both trees are nullptr. It'd be worthwhile to add documentation somewhere to let others know about this. I know of at least one case when I was working on JumpThreading that the original code flushed LVI information about a block because it called an external routine that might delete the block (it was at the end of runImpl()). This feature gives others a simple way of calling potentially destructive calls, checking the return code, and cleaning up on success or failure.
Hello @NutshellySima , comments are inline.
Jul 6 2018
That is definitely not sufficient. The .ll test cases are only a small subset of real-world code and people only add to them to verify a new feature or to prove a bug is fixed. I recommend running test-suite to get a better representation of code out in the world. If you are uncertain how to do so please e-mail me and I'll provide detailed instructions. I'm hoping we can avoid stressful bug report debugging and LLVM code reverts/reapplies.
Hello @NutshellySima , my comments are inline below. These changes are a good real-world usage of the new DTU class and should help us find any bugs or deficiencies. I'm curious to know what kind of testing you've done on this code: ninja check, test-suite, and/or test-suite for compile time?
Jul 5 2018
Jul 2 2018
+1 LGTM. Thank you @NutshellySima for your dedication to getting this patch ready for adoption!
Jun 29 2018
Jun 28 2018
Jun 27 2018
If we agree that it would be better to have well-behaved updates in most of the places in the future, then having distinct names for these two should make it much easier to grep for. Additionally, it seems like it's not always possible to perform them with the eager strategy, is it?
Good point, I didn't think of the calls being easier to grep. Sounds good to me. And yes, I would prefer all updates to be strict and done after CFG updates, but reality isn't always so forgiving. Having explicit *Relaxed() calls can make them easier to identify and potentially fix as we go.
Thank you for continuing this work @NutshellySima .
Jun 26 2018
Jun 25 2018
+1. I'd like to see most, if not all, of the updates to PDT/DT to be done through the DTU interface applyUpdate() call. It simplifies method signatures and provides a consistent interface to analysis and updates.
Jun 22 2018
Added small fix to suggested code.
My comments are inline. I'm also curious to know what kind of testing you've done on this (ninja check, ninja check-all, any test-suite runs, etc). I'd like more testing if it's just ninja check: the included tests are a small sampling of real-world code and test-suite is a bit better for this kind of change.
Hi @trentxintong , I am just starting to review this diff and will get back to you with more meaningful comments soon. While I'm doing that could you tell me if this work is from a bug you discovered or for a new feature you'd like to see implemented. Looking at your .ll changes below makes me believe this is a new feature, but I'd like to be sure to know.
Jun 21 2018
Thank you for working on this @NutshellySima , a unified dominator interface will be most welcome in LLVM. Please see my inline comments.
This problem seems to be aggravated by the new SSAUpdaterBulk() patch pushed on May 12 ( D44282 ) since the PR37745.ll test case does not fail before that date. I don't understand the SSA updater change enough to understand why this problem only happens with the new bulk updates, @mzolotukhin do you happen to know why?
Jun 20 2018
I always prefer deterministic execution (all other things being equal). Have you done any testing of performance impact to this change? The adding and erasing of Dominator elements may cost more with how the SmallVector type performs the operations.
Mar 20 2018
I'm not a reviewer, but it LGTM. You might want to ask @kuhar to review too.
Mar 19 2018
Mar 16 2018
Comment cleanup and rebase. NFC.
Mar 15 2018
Thanks guys. I found a few spelling mistakes in the comments that I'm fixing and testing now. I'll upload the diff and push the change shortly if all goes well.
Mar 13 2018
Mar 12 2018
Gentle ping. Are there any objections to this patch? Should I open another Differential for this patch in preparation to commiting it?
Mar 8 2018
Addressed @kuhar 's comments.
@kuhar I attempted to use a ranged iterator for (auto &BB : F) loop and it causes crashes in test-suite.
FAILED: Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o /work/brzycki/test-suite/build/tools/timeit --summary Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.time /work/brzycki/upstream/install/bin/clang++ -DNDEBUG -I../Bitcode/Benchmarks/Halide/bilateral_grid -IBitcode/Benchmarks/Halide/bilateral_grid -O3 -O3 -DNDEBUG -w -Werror=date-time -std=c++11 -MD -MT Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -MF Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o.d -o Bitcode/Benchmarks/Halide/bilateral_grid/CMakeFiles/halide_bilateral_grid.dir/driver.cpp.o -c ../Bitcode/Benchmarks/Halide/bilateral_grid/driver.cpp While deleting: label %lpad142
Mar 7 2018
Hi @kuhar , thanks for the review. As a general comment I tried to minimize changes to JumpThreading to remove removeUnreachableBlocks.
Renamed InvalidBlocks to UnreachableBlocks, minor tweaks to comments.
Mar 6 2018
Here are 4 ctmark runs on x86_64 of compile_time results. The tip compiler is llvm-project SHA 7a365bd12dd84be1e7c9ca6d9a2268b80f604ae9. The upstream compiler is the same as tip with this patch applied.
I have analyzed this failure and have what I believe (in @dberlin 's words) to be the smallest hammer fix:
$ git diff lib/Analysis/LazyValueInfo.cpp diff --git a/llvm/lib/Analysis/LazyValueInfo.cpp b/llvm/lib/Analysis/LazyValueInfo.cpp index 65fd007dc0b..36e66f7f6ef 100644 --- a/llvm/lib/Analysis/LazyValueInfo.cpp +++ b/llvm/lib/Analysis/LazyValueInfo.cpp @@ -1145,9 +1145,17 @@ getValueFromConditionImpl(Value *Val, Value *Cond, bool isTrueDest, (!isTrueDest && BO->getOpcode() != BinaryOperator::Or)) return ValueLatticeElement::getOverdefined();
Feb 26 2018
+1 LGTM as well.
Feb 21 2018
This method is too expensive as it is designed today.
Feb 19 2018
+1 from me. I'd love to see a single interface for all Dominator-related analysis or updates. This idea reminds me (a bit) of a comment @dberlin made in D43173 about an interface similar to getBestSimplifyQuery to reduce the number of functions that need explicit pointers to a DT, DDT, PDT or other analysis object. I don't quite understand how this would work but I wanted to bring it up in case the work could be done in a way to anticipate such an interface to reduce re-work.
Feb 16 2018
Feb 15 2018
Feb 14 2018
Added asserts to enableDT / disableDT to verify both DT and DisabledDT are never both valid pointers. Added a final LVI->enabledDT() at the very end of runImpl to allow subsequent passes to use DT with the cached LVI object.
Feb 13 2018
IMHO, if you were going to go that route, LVI checks whether DT is null, and so you could wrap this all in a "UpdatedDTOrNull" object that, when cast to a DominatorTree *, returned the tree if it was up to date, or a null pointer if it was not.
It's a bit more subtle than that @dberlin due to the way LazyValueInfo is defined. Because the LazyValueInfoImpl object is cached and re-used it would store a pointer to DT (or nullptr) at the instance of creating the first cachced LVI instance. I ran into this problem with my first attempt to disable DT within LVI and is another reason I decided on the new external methods.
Use std::swap() in enableDT/DisableDT methods.
Rebased to tip, switched to always enabling or disabling before calling LVI analysis. Prevents LVI from being accidentally enabled.
Feb 12 2018
Hi @kuhar , is there a testcase that causes this to fail?
I used clang -O3 -fno-exceptions -ftime-report -c tramp3d-v4.cpp -o /tmp/foo.o with an upstream (non-patched) compiler. Here are the hottest areas of pass execution according to the reports. These lines are displayed in the same order:
Six runs of tramp3d-v4 with this patch applied show comperable variance and similar low/high bounds:
The variance of tramp3d-v4 is considerable, as much as 18.5%. I ran CTMark on the pre-DDT compiler and here are the results:
I'm seeing considerable variance on the tramp3d-v4 compile time test before and after the preservation of DT on JumpThreading patch:
I haven't checked, but from my previous experiments I doubt it does -- forming or to infinite loops are very rare, and this piece of code was only executed a few times when compiling clang or sqlite. I don't have numbers for it, but could get them later this week if someone's really interested.
LGTM. Out of curiosity does this change impact compile time?
Feb 9 2018
@dberlin thank you for the suggestion and I have implemented the change. The good news is sqlite is back to our original performance. The not-so-great news is this causes a regression for tramp3d-v4:
Added a feature to LVI to enable or disable the use of the DominatorTree for analysis. Updated JumpThreading to disable the use of DT for LVI analysis when there are pending updates in the DDT.
Feb 7 2018
@kuhar and @dberlin I would appreciate any guidance you two may have. I spoke with @sebpop about this problem and we don't see a way around the performance impact to preserve DT for LVI analysis. As it stands now there are cases where the deferred updates change the LVI analysis sufficiently to cause a miscompile. We do think there may be ways to mitigate this loss in the future: one way is to prevent pointless threads where a BB is threaded only to be discarded in a subsequent loop iteration within JumpThreading.