Page MenuHomePhabricator

kuhar (Jakub Kuderski)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 15 2015, 7:49 AM (213 w, 6 d)

Recent Activity

Jul 2 2019

kuhar added a comment to D62619: [analyzer][IDF] Add a control dependency calculator + a new debug checker.

The non-static-analyzer bits look good to me, I added a few nits.

Jul 2 2019, 7:46 AM · Restricted Project, Restricted Project

Jun 26 2019

kuhar accepted D63389: [IDF] Generalize IDFCalculator to be used with Clang's CFG.

Looks good now. I added a few nits inline.

Jun 26 2019, 11:13 AM · Restricted Project

Jun 19 2019

kuhar added a comment to D63389: [IDF] Generalize IDFCalculator to be used with Clang's CFG.

@Szelethus

That wasn't going to happen, since the GraphDiff object has to be used in the getChildren function. I made llvm::IDFCalculator::getChildren virtual, is this okay?

I also have a local branch on which I used std::function (had to be used in order to pass a lambda with a non-empty capture list) and kept everything non-virtual, but I read somewhere that std::function is heavy-weight. Which solution is better, if any?

I'd expect the getChildren to be potentially performance-sensitive and am worried about switching it into a virtual call or another indirection through std::function. Do you have some evidence that this does not affect performance?

Jun 19 2019, 8:26 AM · Restricted Project

Jun 17 2019

kuhar added a comment to D63389: [IDF] Generalize IDFCalculator to be used with Clang's CFG.

Thanks!

Looks great, I'm only not sure about the GraphDiff removal.

Practically revert D50675. This was done because GraphDiff is only specialized for BasicBlock, and wouldn't be used for clang::CFGBlcok. Interestingly, it wasn't even used at all: In D45299, it was used at a certain point (https://reviews.llvm.org/D45299?id=160698) but was ultimately removed. Is this okay?

@asbirlea?

If having a virtual function isn't too painful, it'd be easy to reimplement it.

Jun 17 2019, 8:25 AM · Restricted Project
kuhar added a comment to D63389: [IDF] Generalize IDFCalculator to be used with Clang's CFG.

Looks great, I'm only not sure about the GraphDiff removal.

Jun 17 2019, 8:17 AM · Restricted Project
kuhar added inline comments to D62619: [analyzer][IDF] Add a control dependency calculator + a new debug checker.
Jun 17 2019, 8:10 AM · Restricted Project, Restricted Project

May 29 2019

kuhar added a comment to D62619: [analyzer][IDF] Add a control dependency calculator + a new debug checker.

You can easily get CD by calculating the PostDominanceFrontier. LLVM implements a templated IDF (Iterated Dominance Frontier) construction.
A native implementation for llvm ir for reference, if you need:

May 29 2019, 12:27 PM · Restricted Project, Restricted Project
kuhar added inline comments to D62611: [analyzer][Dominators] Add unittests.
May 29 2019, 10:56 AM · Restricted Project, Restricted Project
kuhar added inline comments to D62611: [analyzer][Dominators] Add unittests.
May 29 2019, 10:56 AM · Restricted Project, Restricted Project

May 28 2019

kuhar added inline comments to D62551: [analyzer][Dominator] Add post dominator tree builder for the CFG + a debug checker.
May 28 2019, 3:02 PM · Restricted Project, Restricted Project
kuhar added inline comments to D62551: [analyzer][Dominator] Add post dominator tree builder for the CFG + a debug checker.
May 28 2019, 2:59 PM · Restricted Project, Restricted Project
kuhar added inline comments to D62551: [analyzer][Dominator] Add post dominator tree builder for the CFG + a debug checker.
May 28 2019, 2:17 PM · Restricted Project, Restricted Project
kuhar accepted D62507: [Dominators] PR42041: Skip nullpointer successors.

Thanks, I think this is a much better solution than directly hacking on DomTreeBuilder.

May 28 2019, 8:39 AM · Restricted Project, Restricted Project
kuhar added inline comments to D62507: [Dominators] PR42041: Skip nullpointer successors.
May 28 2019, 8:27 AM · Restricted Project, Restricted Project
kuhar requested changes to D62507: [Dominators] PR42041: Skip nullpointer successors.

Like I mentioned in another thread, I really dislike this approach. The single source of truth about graph traversal in dominators is the ChildGetter structure that internally uses children and inverse_children, and expects them to only return pointers to actual successors/predecessors. It's very weird that children and inverse_children return nullptrs -- they are not valid graph nodes, and I don't see why this would be desired, despite how unreachable regions are represented in the CFG.

May 28 2019, 8:09 AM · Restricted Project, Restricted Project

Apr 29 2019

kuhar accepted D61055: [LoopSimplifyCFG] Suppress expensive DomTree verification in LoopSimplifyCFG.
Apr 29 2019, 4:24 AM · Restricted Project
kuhar added a comment to D61055: [LoopSimplifyCFG] Suppress expensive DomTree verification in LoopSimplifyCFG.

I claim that it's better to make it obvious that these checks are slow, and make developers either guard them with EXPENSIVE_CHECKS or opt-in with Fast.

Do you mean removing the default entirely? To makes it explicit.

Whatever we do in the long run needn't be done here, and making this Fast and behind expensive checks is a definite improvement. (I think Full is still more expensive than necessary, even for an expensive checks build).

Apr 29 2019, 4:22 AM · Restricted Project

Apr 28 2019

kuhar added a comment to D61055: [LoopSimplifyCFG] Suppress expensive DomTree verification in LoopSimplifyCFG.

Yes, I think expensive checks are sufficient. If you think there's a strong need to assert in debug builds, maybe try the Fast level for now and remove it after a few weeks, once you gain confidence in the code or someone complains the check is too slow?

Apr 28 2019, 9:18 PM · Restricted Project
kuhar added a comment to D61055: [LoopSimplifyCFG] Suppress expensive DomTree verification in LoopSimplifyCFG.

But I was not aware that Fast used to be the default, and was still quite slow. I was under the impression that the problems in the past were to do with it using Full under expensive checks, and that Fast was fairly quick (quick enough for an asserts build).

It of course depends on exactly location, but we found that Fast is generally not quick enough to be run during a typical loop pass (on large files, or loopy code from non-clang frontends). I claim that it's better to make it obvious that these checks are slow, and make developers either guard them with EXEPNSIVE_CHECKS or opt-in with Fast.

Apr 28 2019, 1:40 PM · Restricted Project
kuhar added a comment to D61055: [LoopSimplifyCFG] Suppress expensive DomTree verification in LoopSimplifyCFG.

Hello. I think it's fine to always use DominatorTree::VerificationLevel::Fast, the others are more about checking that the DomTree construction code was correct. Fast will compare against a new tree, so for updates will check the the tree is correct.

may be we have to set Fast by default instead of Full?

bool DominatorTreeBase<>::verify(VerificationLevel VL = VerificationLevel::Full) const {
Apr 28 2019, 11:44 AM · Restricted Project

Apr 17 2019

kuhar added a comment to D60755: [NFCI] Improve efficiency of isPotentiallyReachableFromManyDomTree..

IMO you could introduce df_iterator_tree_storage by a slight modification to the default one:

// df_iterator_storage - A private class which is used to figure out where to
// store the visited set.
template<class SetType, bool External>   // Non-external set
class df_iterator_storage {
public:
  SetType Visited;
};
Apr 17 2019, 8:55 AM · Restricted Project
kuhar added a comment to D60754: Add forEachDescendant to DominatorTree..

OK, I can see that this is exactly what you want to avoid in the related revision. In this case, isn't it possible to provide a dummy storage to DepthFirstIterator and use it to efficiently traverse trees?

Apr 17 2019, 8:54 AM · Restricted Project
kuhar added a comment to D60754: Add forEachDescendant to DominatorTree..

Would it make sense to use one of the llvm graph iterators and have descendants_begin() and descendants_end() functions instead?

Apr 17 2019, 8:47 AM · Restricted Project

Feb 27 2019

kuhar accepted D58373: [Dominators] Avoid potentially quadratic std::is_permutation.
Feb 27 2019, 8:28 PM · Restricted Project
kuhar added a comment to D58373: [Dominators] Avoid potentially quadratic std::is_permutation.

If you think you there's enough evidence that is_permutation can cause a problems here, go ahead with the current patch.

Feb 27 2019, 7:39 PM · Restricted Project
kuhar added a comment to D58373: [Dominators] Avoid potentially quadratic std::is_permutation.

Roots.size() is only > 1 for the PostDominatorTree, and I'd expect it to be small (< 100) on virtually all functions. is_permulation was used in the original code because it doesn't allocate and should be cheaper for small trees.

Do you have some statistics on the size of Roots? At what size does SmallPtrSet start to outperform is_permuatation?

Thanks for the empirical data (<100). I find SmallPtrSet<void *, 4> starts to outperform std::is_permutation when there are about 16 elements on my machine. It may be 2x slower (should be larger, but I put it in a loop and the loop itself has some costs) when there is only one root. So the benefit to use a SmallPtrSet here is just to filter out some unlikely cases (>1000 roots) and to clear up my doubts when I see that verifyRoots claims to be O(N)...

It seems it would be best to create a small helper function that does permutation check using std::is_permuatation for small arrays and uses SmallPtrSet for the rest.

Feb 27 2019, 7:33 AM · Restricted Project

Feb 20 2019

kuhar added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

I think the default for applyUpdates should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.

I'm fine with making applyUpdates default to strict mode to make it consistent to other parts of the codebase and encourage people to use the strict one which does not have extra overhead on scanning the CFG.

But if the user follows the rules on using applyUpdatesRelaxed/Checked, DTU is also guaranteed to have precise information on updates.

Feb 20 2019, 11:03 AM · Restricted Project
kuhar added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

I think the default should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.

Feb 20 2019, 10:25 AM · Restricted Project
kuhar added a comment to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

And how can I benchmark dynamic dominator tree? CalculateFromScratch is easy to benchmark as I can just repeatedly run the function analysis domtree and discard the result but I don't know how for the dynamic case...

The best method I came up with is to instrument the updater primitives (applyUpdates, insertEdge, deleteEdge, recalculate, etc.), and measure time within the running optimizer. It should be best to get some big bitcode file and run opt with O3 on it. clang is reasonably sized in this configuration, you should also have luck with webassembly and rippled. If you are with google, you can try some bigger internal targets (there is a script to compile a given target to bitcode).

Feb 20 2019, 10:04 AM · Restricted Project
kuhar accepted D58443: [DTU] Deprecate insertEdge*/deleteEdge*.

Great, thanks for putting effort into this and the related patches

Feb 20 2019, 5:52 AM · Restricted Project

Feb 19 2019

kuhar removed a reviewer for D58425: Fix the build with gcc 8.x when `-Wredundant-decls` is passed: kuhar.
Feb 19 2019, 8:32 PM · Restricted Project, Restricted Project
kuhar accepted D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

Awesome, so it seems like we have ~4.5% improvement. Thanks for running the experiments.

Feb 19 2019, 7:35 PM · Restricted Project
kuhar added a comment to D58369: [Dominators] Delete UpdateLevelsAfterInsertion in edge insertion of depth-based search for release builds.

As a sanity check, to account for possible bugs somewhere else, have you tried adding an assertion ensuring that levels don't change in the function?

Feb 19 2019, 6:32 AM · Restricted Project
kuhar added a comment to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

Comparing the results of opt -passes='default<O3>,function(print<domtree>)' -disable-output (llvm-link.bc,llvm-as.bc,sqlite3.bc,...) give some more confidence.

I don't expect this patch to have perceivable performance boost.. Maybe there is a bit. Ran time opt -domtree -disable-output < ~/llvm/GLLVM/bin/clang-9.bc several times with and without the patch:

Before: 1:05.27 1:05.45 1:05.46 1:06.96 1:05.25 1:04.58
After: 1:05.71 1:04.49 1:03.50 1:05.14 1:05.04 1:03.77

Feb 19 2019, 6:28 AM · Restricted Project
kuhar added a comment to D58373: [Dominators] Avoid potentially quadratic std::is_permutation.

Roots.size() is only > 1 for the PostDominatorTree, and I'd expect it to be small (< 100) on virtually all functions. is_permulation was used in the original code because it doesn't allocate and should be cheaper for small trees.

Feb 19 2019, 6:22 AM · Restricted Project
kuhar accepted D58369: [Dominators] Delete UpdateLevelsAfterInsertion in edge insertion of depth-based search for release builds.
Feb 19 2019, 6:14 AM · Restricted Project

Feb 18 2019

kuhar accepted D58349: [Dominators] Fix and optimize edge insertion of depth-based search.

Thanks for the fixes and explanations. Seems good to me now.

Feb 18 2019, 8:56 PM · Restricted Project
kuhar added a comment to D58349: [Dominators] Fix and optimize edge insertion of depth-based search.

@MaskRay, awesome, thanks for looking into this.

Feb 18 2019, 8:17 AM · Restricted Project

Feb 17 2019

kuhar added a comment to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

Another idea for benchmarking is to count the number of iterations of loops that you expect to be affected by your patches and making sure it decreases, as an indirect measure of running time improvement.

Feb 17 2019, 9:48 PM · Restricted Project
kuhar added a comment to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

Thanks for the comments, it looks amazing now.

Feb 17 2019, 9:46 PM · Restricted Project
kuhar updated subscribers of D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..
Feb 17 2019, 1:54 PM · Restricted Project
kuhar updated subscribers of D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..
Feb 17 2019, 1:54 PM · Restricted Project
kuhar requested changes to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

This looks like a fantastic improvement! The only issue is that the code, both original and after your changes, lacks comments.

Feb 17 2019, 11:03 AM · Restricted Project

Feb 14 2019

kuhar added inline comments to D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed.
Feb 14 2019, 10:19 PM · Restricted Project

Feb 13 2019

kuhar added a comment to D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed.

I believe that a real fix would be to always create a virtual root in DomTree and support fast rerooting, but that requires much more implementation and testing effort.

Feb 13 2019, 9:09 AM · Restricted Project
kuhar added a comment to D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed.

Seems like a nice improvement!

Feb 13 2019, 9:08 AM · Restricted Project
kuhar accepted D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).

Looks better now, found just a couple of nits.

Feb 13 2019, 8:45 AM · Restricted Project

Feb 7 2019

kuhar added a comment to D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).

Nice, thanks for working on this!

Feb 7 2019, 7:57 AM · Restricted Project
kuhar accepted D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

LGTM

Feb 7 2019, 7:45 AM · Restricted Project

Feb 6 2019

kuhar added a comment to D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

I've given up using DTU, now the updates are applied directly.

Max, why don't you use DTU with .applyUpdates? Are you facing some difficulties, or found documentation confusing somewhere else?
I don't have much time recently, but I can definitely find enough to fix some issues people have with updating dominators. If you think something is not obvious or too difficult, there are probably multiple people who faced the same problems but never bother to share their experience.

Feb 6 2019, 11:45 AM · Restricted Project
kuhar added a comment to D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

I've given up using DTU, now the updates are applied directly.

Feb 6 2019, 11:45 AM · Restricted Project

Feb 5 2019

kuhar requested changes to D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

So, is it OK to go with this fix here?

Feb 5 2019, 6:57 AM · Restricted Project

Jan 30 2019

kuhar added a comment to D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

@NutshellySima What was the reason we introduced DTU.insertEdge and DTU.deleteEdge? While they are essential primitives in the incremental updater, couldn't we get rid of it entirely? My memory is blurry.

Jan 30 2019, 7:58 AM · Restricted Project
kuhar added a comment to D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

It is indeed a bug in DTU, I was able to construct a unit test to reproduce it. https://bugs.llvm.org/show_bug.cgi?id=40528

Jan 30 2019, 7:57 AM · Restricted Project
kuhar added a comment to D57448: [DO NOT MERGE] DTU bug demonstration.

@mkazantsev Can you replace insert/deleteEdge with a bundle of updates using DTU.applyUpdates? Just eyeballing the code, it's very likely you are misusing the API by not respecting the contract of insertEdge/deleteEdge -- like Alina explained in the other thread.

I don't have any recent builds of LLVM ready and it would help me of you checked it, as a quick sanity check.

Jan 30 2019, 6:31 AM
kuhar added a comment to D57448: [DO NOT MERGE] DTU bug demonstration.

@mkazantsev Can you replace insert/deleteEdge with a bundle of updates using DTU.applyUpdates? Just eyeballing the code, it's very likely you are misusing the API by not respecting the contract of insertEdge/deleteEdge -- like Alina explained in the other thread.

Jan 30 2019, 6:28 AM

Jan 7 2019

kuhar added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

I've got a naive question about the DTU: Does the automatic updater handle the ImmediateDominator "automatically" for the remaining nodes in the dom tree once identify that one of the nodes in the DT has a change in IDom?

Exactly. Given a set of CFG updates, if figures out how to change the DomTree and PostDomTree such that they match the CFG.

Jan 7 2019, 1:25 PM
kuhar added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

the same bug can happen with:

  1. latch exit
  2. non-immediate successors of latchexit/otherexit.

    I have simplified test cases for each of these and we need a more general fix. Working on this.
Jan 7 2019, 12:16 PM

Jan 3 2019

kuhar added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

@anna Can this be rewritten to use the DomTreeUpdater utility?

@kuhar, I have not looked at the DTU uttility yet, but I'd think it can be rewritten (unless there are some limitations for the DomTreeUpdater utility). However, that would be a large enough change and I think it's better to do it as a separate change at a later point rather than as part of fixing this bug. Also, there are enough DT updates through out this code that it will take a while to get through the whole process of porting over for runtime unrolling.

Jan 3 2019, 3:19 PM
kuhar added a comment to D56284: [UnrollRuntime] Fix domTree failure in multiexit unrolling.

@anna Can this be rewritten to use the DomTreeUpdater utility? We have large number of examples in the codebase already, so the question is whether this affects the performance in this case. Historically, we had tons of bugs in manual DomTree updates and I'd rather we replaces as much as possible using the incremental updater.

Jan 3 2019, 3:00 PM

Nov 20 2018

kuhar added a comment to D54730: [DomTree] Fix order of domtree updates in MergeBlockIntoPredecessor..

Fundamentally, does this have to be an expensive operation? It seems like there should be some way to keep a "local" operation like this from propagating across the entire tree.

Nov 20 2018, 2:06 PM · Restricted Project
kuhar added a comment to D54730: [DomTree] Fix order of domtree updates in MergeBlockIntoPredecessor..

Though I don't have bitcode samples to reproduce your findings, I guess three times faster is caused by the DomTree is consuming a lot of time in DeleteUnreachable().

Maybe we can put some high level APIs that have these best practices inside DTU such as something like updateAfterSpliceBlocks and updateAfterChangePredecessorTo or just add a sort after running updates legalization. :)

Nov 20 2018, 11:45 AM · Restricted Project
kuhar added inline comments to D54730: [DomTree] Fix order of domtree updates in MergeBlockIntoPredecessor..
Nov 20 2018, 11:34 AM · Restricted Project
kuhar accepted D54732: [LoopUnroll] Don't verify domtree by default with +Asserts..
Nov 20 2018, 11:11 AM

Nov 19 2018

kuhar added a comment to D54730: [DomTree] Fix order of domtree updates in MergeBlockIntoPredecessor..

We experimented with different ordering of updates before, including scheduling deletions before insertions, but didn't discover any promising strategies. Because of that, updates are performed in the exact order they were scheduled (except optimizing out trivially redundant ones).

Nov 19 2018, 3:12 PM · Restricted Project
kuhar added a reviewer for D54730: [DomTree] Fix order of domtree updates in MergeBlockIntoPredecessor.: NutshellySima.
Nov 19 2018, 3:08 PM · Restricted Project

Nov 13 2018

kuhar added inline comments to D54312: [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater..
Nov 13 2018, 8:33 AM

Nov 9 2018

kuhar added inline comments to D54312: [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater..
Nov 9 2018, 7:15 AM

Nov 8 2018

kuhar accepted D47259: [IPSCCP,PM] Preserve DT in the new pass manager..

Looks good to me now, thanks for the fixes! You may want for somebody else to take a look at it, though.

Nov 8 2018, 10:04 AM
kuhar added a reviewer for D47259: [IPSCCP,PM] Preserve DT in the new pass manager.: NutshellySima.
Nov 8 2018, 7:21 AM

Oct 25 2018

kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Since bug 37929 is a regression in version 7.0, should we backport this patch into branch 7.0.1 later (maybe November) if there isn't anyone reporting issues on this patch?

Oct 25 2018, 11:26 PM
kuhar accepted D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

LGTM. Thanks for looking into this!

Oct 25 2018, 10:23 AM

Oct 24 2018

kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

I'd still like to see the constant as a hidden command line argument cl::opt(). Otherwise I think this is an excellent patch with considerable compile-time speedups. Nice work @NutshellySima .

The issue I find with cl::opt here is that this is a generic class that may not necessarily have a corresponding cpp file -- there can be multiple clients to GenericDomTreeConstruction, each with different requirements. Because of that, it's not clear to me in which translation unit one would put cl::opt for the constant in.
Is that a solved problem somewhere else in LLVM? I'm not familiar with using cl::opt in header files.

Oct 24 2018, 8:18 AM
kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

I'd still like to see the constant as a hidden command line argument cl::opt(). Otherwise I think this is an excellent patch with considerable compile-time speedups. Nice work @NutshellySima .

Oct 24 2018, 8:07 AM

Oct 16 2018

kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

@kuhar , here is the statistics for k*α>n:

Awesome, thanks for the numbers. It's much easier to reason about it this way. But I'm a bit confused by why you measured it like this. Can we have statistics with respect to nodes visited during DFS? That should be highly correlated and give us more confidence in the measured times.

I think maybe "40" is a better choice?

30 seems best to me :)

Oct 16 2018, 8:34 AM

Oct 15 2018

kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Do you think for the unittests of the incremental updater, the updater should directly apply updates on a small k (don't do recalculation with few updates) or n (don't do recalculation on a small graph)?

I think that for small n it shouldn't recalculate when k < n.

Oct 15 2018, 8:25 AM
kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

I have tested heuristic k>n/α with α=10/40/65/70/75/80/100/300/1000 on clang.bcs. (But I didn't save them to a file :\) 10/1000 produce very bad result. ("10" is better than the current implementation). 300 produces better result than 1000. 40/100 produce better ones than 10/300/1000. And 65-80 don't make much difference and produce the best results among all αs I selected.

I'd be very interested to see the numbers for these constants.

Oct 15 2018, 8:13 AM
kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

For k>n you mentioned, I think you mean k>n/α (α is a constant) because for the reproducer in 37929, the minimum value of α to make the time consumed by clang acceptable is about 3 (α should definitely be larger than 3). But selecting a higher value like "75" can also boost other inputs more (about 2x~4.5x on clang*.bcs).

OK, makes sense if that the case in the bug report.

Oct 15 2018, 7:58 AM
kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Interesting. Nice improvements. What about small trees? It would seem that any tree less that 75 nodes would always be recalculated. Do the timings you ran include things to show that this is better? Or was that just looking at larger trees at the time?

I haven't conducted dedicated tests on graphs with the number of vertices less than 75, but among all the domtrees recalculated in the two clang-*.bc I mentioned above, there are approximately 63% of them has an n smaller than 75.

I'd be much more comfortable if this change didn't always recalculate for graphs smaller than 75. I think there is a lot of value in making sure the incremental updater is behaving correctly on small examples (e.g., existing unit tests, most regression tests, etc.). If for some reason it stops working, it will be much easier to detect and fix it on these small examples.

Oct 15 2018, 7:07 AM

Oct 13 2018

kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).
  • For the constant "650", I used the 2nd reproducer in bug PR37929 and collected some statistics on the size of the updates after legalization and the time used by applying updates. The statistics shows that the time complexity is about Θ(k^2.4) (let k be the size of the update vector after legalization) [I have just checked the original paper, it is O(m min{n, k} + kn), so about Ω(k*(k+n)) in the case of the reproducer] and it seems that at least for a CFG size no larger than the one in PR37929, "650" is a reasonable value that the time used by recalculating is sure to be less than applying updates directly. (I think there sure should be some better heuristics deciding this.)

Thanks for the detailed explanation. This is what I expected. I don't it's enough to use 650 in the code without any explanation or justification -- even if it works in practise, I think it deserves a comment.

  • It is definitely better to get it related to the tree size but I'm lack of inputs with different CFG sizes to experiment on.

You can take intermediate bitcode files when doing full-LTO built using lld. I think the command that does the trick is --save-temps=DIR or similar.

  • I acknowledged this as an issue of the incremental updater not handling a huge update vector correctly; It definitely should fallback to recalculating if it is much faster. I believe this constant is related to the size of updates after legalization so I put it in the incremental updater. [Currently, there is a similar functionality in DTU which is Ω(n^2) and I believe it is slow and (redundant if JT submits updates normally) so I am considering removing it from DTU, then it is impossible to know the size of the legalized update vector]. Besides, it can also benefit other places not using the DomTreeUpdater.

I see. If we expect to see many redundant updates, then it makes sense to check that after legalization.

Oct 13 2018, 2:39 PM
kuhar added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Thanks for looking into this. A couple of high-level questions:

  • how was the constant "650" picked? Is it just playing with different values and choosing something that seemed reasonable?
  • why is the threshold independent of the tree size? Intuitively the policy could be to recalculate if the number of updates is greater than some multiple of size. Maybe we can refer to the average-case complexity from Lucuas' paper?
  • why do we want to put it in the incremental updater instead of the DomTreeUpdater class?
Oct 13 2018, 8:35 AM

Sep 24 2018

kuhar added inline comments to D51664: [IR] Lazily number instructions for local dominance queries.
Sep 24 2018, 10:24 AM · Restricted Project
kuhar added inline comments to D51664: [IR] Lazily number instructions for local dominance queries.
Sep 24 2018, 10:24 AM · Restricted Project

Sep 21 2018

kuhar added inline comments to D51664: [IR] Lazily number instructions for local dominance queries.
Sep 21 2018, 1:08 PM · Restricted Project

Sep 4 2018

kuhar added a comment to D51200: Introduce per-callsite inline intrinsics.

Thank you for the comments, Richard!

Sep 4 2018, 1:33 PM · Restricted Project
kuhar added a comment to D51200: Introduce per-callsite inline intrinsics.

+rjmccall for CodeGen changes

Sep 4 2018, 1:28 PM · Restricted Project

Aug 24 2018

kuhar updated the diff for D51200: Introduce per-callsite inline intrinsics.
Aug 24 2018, 5:54 PM · Restricted Project
kuhar updated the diff for D51200: Introduce per-callsite inline intrinsics.

Rename missed CallInlineKind parameters to CIK

Aug 24 2018, 5:50 PM · Restricted Project
kuhar edited reviewers for D51200: Introduce per-callsite inline intrinsics, added: Quuxplusone, amharc; removed: grosser.
Aug 24 2018, 4:08 PM · Restricted Project
kuhar added inline comments to D51200: Introduce per-callsite inline intrinsics.
Aug 24 2018, 4:02 PM · Restricted Project
kuhar updated the diff for D51200: Introduce per-callsite inline intrinsics.

Fix naming and add more tests.

Aug 24 2018, 4:02 PM · Restricted Project
kuhar added a comment to D51200: Introduce per-callsite inline intrinsics.

One extra question: do you think this deserves a separate RFC?

Aug 24 2018, 8:58 AM · Restricted Project

Aug 23 2018

kuhar added a comment to D51200: Introduce per-callsite inline intrinsics.

Disclaimer: I'm a Clang newbie and I'm not sure if that's a good way to implement these intrinsics. I'm not sure about the following things:

  • The new enum CallInlineKind may not be in the right place
  • Not sure if adding the extra parameter to EmitSomething methods is the cleanest thing possible -- some functions have it now, some don't, and it makes argument lists longer
  • Not sure if just adding an extra attribute at each callsite is enough -- I'm not sure what is supposed to happen when there are conflicting inline attributes at call sites and function declarations (e.g. __builtin_always_inline and __attribute__ ((always_inline)))
  • I don't know how to handle constexpr functions
  • I don't know how to implement it for constructor calls
Aug 23 2018, 5:34 PM · Restricted Project
kuhar created D51200: Introduce per-callsite inline intrinsics.
Aug 23 2018, 5:29 PM · Restricted Project

Aug 20 2018

kuhar added inline comments to D45299: API to update MemorySSA for cloned blocks..
Aug 20 2018, 1:54 PM

Aug 17 2018

kuhar accepted D50675: [IDF] Teach Iterated Dominance Frontier to use a snapshot CFG based on a GraphDiff..

Seems fine

Aug 17 2018, 10:11 AM

Aug 16 2018

kuhar accepted D50671: [DomTree] Add constructor to create a new DT based on current DT/CFG and a set of Updates..
Aug 16 2018, 2:19 PM

Aug 14 2018

kuhar accepted D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.
Aug 14 2018, 10:21 AM
kuhar added a comment to D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.

I'm no expert, but I thought that SmallPtrSet->erase(BB) would treat BB as an opaque value that it removes from it's set, not accessing it's content in any way. You may be right about the eraseFromParent invalidating things though.

It's certainly much better this new way :)

Aug 14 2018, 10:18 AM
kuhar added a comment to D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.

I guess because it's only erasing it from a SmallPtrSet, it's not actually _using_ it, but this looks like a very sensible change.

Aug 14 2018, 9:55 AM