- User Since
- Jul 15 2015, 7:49 AM (192 w, 3 d)
Wed, Feb 27
If you think you there's enough evidence that is_permutation can cause a problems here, go ahead with the current patch.
It seems it would be best to create a small helper function that does permutation check using std::is_permuatation for small arrays and uses SmallPtrSet for the rest.
Feb 20 2019
I think the default should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.
The best method I came up with is to instrument the updater primitives (applyUpdates, insertEdge, deleteEdge, recalculate, etc.), and measure time within the running optimizer. It should be best to get some big bitcode file and run opt with O3 on it. clang is reasonably sized in this configuration, you should also have luck with webassembly and rippled. If you are with google, you can try some bigger internal targets (there is a script to compile a given target to bitcode).
Great, thanks for putting effort into this and the related patches
Feb 19 2019
Awesome, so it seems like we have ~4.5% improvement. Thanks for running the experiments.
As a sanity check, to account for possible bugs somewhere else, have you tried adding an assertion ensuring that levels don't change in the function?
Roots.size() is only > 1 for the PostDominatorTree, and I'd expect it to be small (< 100) on virtually all functions. is_permulation was used in the original code because it doesn't allocate and should be cheaper for small trees.
Feb 18 2019
Thanks for the fixes and explanations. Seems good to me now.
@MaskRay, awesome, thanks for looking into this.
Feb 17 2019
Another idea for benchmarking is to count the number of iterations of loops that you expect to be affected by your patches and making sure it decreases, as an indirect measure of running time improvement.
Thanks for the comments, it looks amazing now.
This looks like a fantastic improvement! The only issue is that the code, both original and after your changes, lacks comments.
Feb 14 2019
Feb 13 2019
I believe that a real fix would be to always create a virtual root in DomTree and support fast rerooting, but that requires much more implementation and testing effort.
Seems like a nice improvement!
Looks better now, found just a couple of nits.
Feb 7 2019
Nice, thanks for working on this!
Feb 6 2019
Feb 5 2019
Jan 30 2019
@NutshellySima What was the reason we introduced DTU.insertEdge and DTU.deleteEdge? While they are essential primitives in the incremental updater, couldn't we get rid of it entirely? My memory is blurry.
@mkazantsev Can you replace insert/deleteEdge with a bundle of updates using DTU.applyUpdates? Just eyeballing the code, it's very likely you are misusing the API by not respecting the contract of insertEdge/deleteEdge -- like Alina explained in the other thread.
Jan 7 2019
Exactly. Given a set of CFG updates, if figures out how to change the DomTree and PostDomTree such that they match the CFG.
Jan 3 2019
@anna Can this be rewritten to use the DomTreeUpdater utility? We have large number of examples in the codebase already, so the question is whether this affects the performance in this case. Historically, we had tons of bugs in manual DomTree updates and I'd rather we replaces as much as possible using the incremental updater.
Nov 20 2018
Nov 19 2018
We experimented with different ordering of updates before, including scheduling deletions before insertions, but didn't discover any promising strategies. Because of that, updates are performed in the exact order they were scheduled (except optimizing out trivially redundant ones).
Nov 13 2018
Nov 9 2018
Nov 8 2018
Looks good to me now, thanks for the fixes! You may want for somebody else to take a look at it, though.
Oct 25 2018
LGTM. Thanks for looking into this!
Oct 24 2018
Oct 16 2018
Awesome, thanks for the numbers. It's much easier to reason about it this way. But I'm a bit confused by why you measured it like this. Can we have statistics with respect to nodes visited during DFS? That should be highly correlated and give us more confidence in the measured times.
I think maybe "40" is a better choice?
30 seems best to me :)
Oct 15 2018
I think that for small n it shouldn't recalculate when k < n.
I'd be very interested to see the numbers for these constants.
OK, makes sense if that the case in the bug report.
I'd be much more comfortable if this change didn't always recalculate for graphs smaller than 75. I think there is a lot of value in making sure the incremental updater is behaving correctly on small examples (e.g., existing unit tests, most regression tests, etc.). If for some reason it stops working, it will be much easier to detect and fix it on these small examples.
Oct 13 2018
Thanks for the detailed explanation. This is what I expected. I don't it's enough to use 650 in the code without any explanation or justification -- even if it works in practise, I think it deserves a comment.
- It is definitely better to get it related to the tree size but I'm lack of inputs with different CFG sizes to experiment on.
You can take intermediate bitcode files when doing full-LTO built using lld. I think the command that does the trick is --save-temps=DIR or similar.
- I acknowledged this as an issue of the incremental updater not handling a huge update vector correctly; It definitely should fallback to recalculating if it is much faster. I believe this constant is related to the size of updates after legalization so I put it in the incremental updater. [Currently, there is a similar functionality in DTU which is Ω(n^2) and I believe it is slow and (redundant if JT submits updates normally) so I am considering removing it from DTU, then it is impossible to know the size of the legalized update vector]. Besides, it can also benefit other places not using the DomTreeUpdater.
I see. If we expect to see many redundant updates, then it makes sense to check that after legalization.
Thanks for looking into this. A couple of high-level questions:
- how was the constant "650" picked? Is it just playing with different values and choosing something that seemed reasonable?
- why is the threshold independent of the tree size? Intuitively the policy could be to recalculate if the number of updates is greater than some multiple of size. Maybe we can refer to the average-case complexity from Lucuas' paper?
- why do we want to put it in the incremental updater instead of the DomTreeUpdater class?
Sep 24 2018
Sep 21 2018
Sep 4 2018
Thank you for the comments, Richard!
Aug 24 2018
Rename missed CallInlineKind parameters to CIK
Fix naming and add more tests.
One extra question: do you think this deserves a separate RFC?
Aug 23 2018
Disclaimer: I'm a Clang newbie and I'm not sure if that's a good way to implement these intrinsics. I'm not sure about the following things:
- The new enum CallInlineKind may not be in the right place
- Not sure if adding the extra parameter to EmitSomething methods is the cleanest thing possible -- some functions have it now, some don't, and it makes argument lists longer
- Not sure if just adding an extra attribute at each callsite is enough -- I'm not sure what is supposed to happen when there are conflicting inline attributes at call sites and function declarations (e.g. __builtin_always_inline and __attribute__ ((always_inline)))
- I don't know how to handle constexpr functions
- I don't know how to implement it for constructor calls
Aug 20 2018
Aug 17 2018
Aug 16 2018
Aug 14 2018
Aug 13 2018
Thanks for the changes. LGTM.
Aug 10 2018
Aug 9 2018
Sure. The children are being reversed to preserve the successor order in DFS, which uses stack and reverses them (again) internally. But this reverse can be moved somewhere else and you don't have to worry about it.
Aug 8 2018
One thing that is not clear to me is how you decide whether to apply the updates on top of the current CFG or to reverse-apply them. DomTreeConstruction needs the latter, but I don't know what makes sense in your use-case.
Aug 2 2018
Thanks for the changes!
Looks good, thanks for the patch!
Nit: It's removal and not deprecation.
Looks good otherwise.
Looks fine except for the test IR.