Page MenuHomePhabricator

NutshellySima (Chijun Sima)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 21 2018, 10:09 PM (60 w, 5 d)

Recent Activity

Feb 28 2019

NutshellySima added a comment to D58444: Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU.applyUpdates.

I finished performing functional testing using MergeBlockIntoPredecessor in JumpThreading.cpp. With this change I was able to pass make-check and test-suite. I saw no crashes and believe this to be the correct updates for DTU.

Thank you @NutshellySima for proactively fixing this. :)

Feb 28 2019, 8:49 AM · Restricted Project
NutshellySima committed rG586187639af2: Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU. (authored by NutshellySima).
Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU.
Feb 28 2019, 8:48 AM

Feb 22 2019

NutshellySima abandoned D50305: [WIP!][DomTreeUpdater/Auto[1]] Preserve the DomTree in the SimplifyCFG pass.

Close along with D50302.

Feb 22 2019, 12:19 PM · Restricted Project
NutshellySima abandoned D50303: [WIP!][DomTreeUpdater-Auto[1]] Collect DomTree construction/update statistics and timing data.

Close along with D50302.

Feb 22 2019, 12:19 PM · Restricted Project
NutshellySima abandoned D50302: [WIP!][DomTreeUpdater/Auto[1]] A new Auto UpdateStrategy to test the impact of passes preserving DomTree.

Close this revision because D58170 is landed. D58170 solves the problem this patch trying to resolve.
It also improves the implementation of deduplication, which only takes 0.31s when it deduplicates and validates updates when optimizing clang-5.0.0.bc, while this patch takes 10.8 seconds to snapshot the CFG.

Feb 22 2019, 12:19 PM · Restricted Project
NutshellySima abandoned D50309: [WIP!][DomTreeUpdater-Auto[2]] Collect DomTree construction/update statistics and timing data.

Close along with D50308.

Feb 22 2019, 12:15 PM · Restricted Project
NutshellySima abandoned D50308: [WIP!][DomTreeUpdater/Auto[2]] A new Auto UpdateStrategy candidate.

Close this revision because D58170 is landed. D58170 solves the problem this patch trying to resolve.
It also improves the implementation of deduplication, which only takes 0.31s when it deduplicates and validates updates when optimizing clang-5.0.0.bc, while this patch takes 14 seconds to snapshot the CFG.

Feb 22 2019, 12:15 PM · Restricted Project
NutshellySima abandoned D50312: [WIP!][DomTreeUpdater/Auto[3]] Convert existing passes and utils to use the Auto UpdateStrategy.

Close this revision along with D50311.

Feb 22 2019, 11:02 AM · Restricted Project
NutshellySima abandoned D50313: [WIP!][DomTreeUpdater-Auto[3]] Collect DomTree construction/update statistics and timing data.

Close this revision along with D50311.

Feb 22 2019, 11:02 AM · Restricted Project
NutshellySima abandoned D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate.

Close this revision because D58170 is landed.
D58170 provides a cleaner interface than the Auto[3] strategy here as it does not require users to add extra functions calls before CFG changes. Moreover, it doesn't suffer from potential performance issues presented in Auto* implementations.

Feb 22 2019, 11:02 AM · Restricted Project
NutshellySima added a comment to D58444: Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU.applyUpdates.

Thank you for looking into this. I was working on applying MergeBLockIntoPredecessor replacement in JumpThreading.cpp (via D48202) and ran into check failures that I haven't had time to properly debug. I suspect at least one of these is related to the change you made above.

Thanks for your comment! I am worried about the difficulty in discovering and debugging bugs involving incremental DT updating. I think it is indeed easy to violet the preconditions of applyUpdates() and I believe if it is a concern, a strict validation should be added to check if preconditions are fulfilled in DEBUG mode later.

@NutshellySima I've spent some time this week working to use the replacement MergeBLockIntoPredecessor in JumpThreading.cpp along with this patch. I'm still dealing with correctness issues due to slight differences in the API interface assumptions as well as the change of BB being hoisted (new) versus SinglePred being sunk (old). There are impacts to other analysis like LVI and LoopHeaders along with changes in the IR due to blocks getting different names.

I'll keep you informed if I pass all correctness tests with my changes.

Feb 22 2019, 10:46 AM · Restricted Project
NutshellySima committed rG70e97163e080: [DTU] Refine the interface and logic of applyUpdates (authored by NutshellySima).
[DTU] Refine the interface and logic of applyUpdates
Feb 22 2019, 5:50 AM
NutshellySima updated the diff for D58170: [DTU] Refine the interface and logic of applyUpdates.

Rebase.

Feb 22 2019, 5:26 AM · Restricted Project

Feb 21 2019

NutshellySima committed rGf131d6110eb1: [DTU] Deprecate insertEdge*/deleteEdge* (authored by NutshellySima).
[DTU] Deprecate insertEdge*/deleteEdge*
Feb 21 2019, 9:42 PM

Feb 20 2019

NutshellySima updated the diff for D58170: [DTU] Refine the interface and logic of applyUpdates.
  1. Rename applyUpdatesHinted to applyUpdatesPermissive; Remove applyUpdatesAccurate.
  2. Default to strict mode.
Feb 20 2019, 12:00 PM · Restricted Project
NutshellySima added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

I think the default for applyUpdates should be the strict mode; any code that relies on imprecise information about updates should make it clear by using some longer name, like applyUpdatesRelaxed.

Feb 20 2019, 10:49 AM · Restricted Project
NutshellySima added a comment to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

...
And how can I benchmark dynamic dominator tree? CalculateFromScratch is easy to benchmark as I can just repeatedly run the function analysis domtree and discard the result but I don't know how for the dynamic case...

Feb 20 2019, 10:33 AM · Restricted Project
NutshellySima added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

@NutshellySima I like the idea of the new naming scheme that makes usage more explicit.

I noticed there are no changes to JumpThreading.cpp, do they need to be altered in a separate patch or are you fine with all of applyUpdates implicitly becoming applyUpdatesHinted in Lazy mode?

I also find the name applyUpdatesHinted a bit confusing and had to read your comments above to understand Hinted meant "we check the CFG for validity". Maybe use applyUpdatesChecked or applyUpdatesValidated as an alternative name that makes it a bit more clear that extra work must be done for this set of updates.

Feb 20 2019, 10:20 AM · Restricted Project
NutshellySima added a comment to D58444: Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU.applyUpdates.

Thank you for looking into this. I was working on applying MergeBLockIntoPredecessor replacement in JumpThreading.cpp (via D48202) and ran into check failures that I haven't had time to properly debug. I suspect at least one of these is related to the change you made above.

Feb 20 2019, 10:07 AM · Restricted Project
NutshellySima added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

I have also checked all code involving DTU.applyUpdates and is confident that the Updates array provided conforms to "Users need to apply updates in the same order they made to CFG for the same pair of <From, To>".

Feb 20 2019, 7:19 AM · Restricted Project
NutshellySima updated the diff for D58170: [DTU] Refine the interface and logic of applyUpdates.
  1. Add explanations and examples on why the invalid updates removing code works.
  2. Add the precondition that the order of an edge in Updates needs to be ordered as CFG changes were made.
  3. Separate applyUpdates into applyUpdatesAccurate and applyUpdatesHinted to decouple different semantics of applyUpdates from Eager/Lazy mode and enable DTU(Lazy) to accept unordered accurate updates.
Feb 20 2019, 7:08 AM · Restricted Project
NutshellySima created D58444: Make MergeBlockIntoPredecessor conformant to the precondition of calling DTU.applyUpdates.
Feb 20 2019, 5:55 AM · Restricted Project
NutshellySima created D58443: [DTU] Deprecate insertEdge*/deleteEdge*.
Feb 20 2019, 5:37 AM · Restricted Project

Feb 19 2019

NutshellySima committed rGf3d4166132aa: [DTU] Refine the document of mutation APIs [NFC] (PR40528) (authored by NutshellySima).
[DTU] Refine the document of mutation APIs [NFC] (PR40528)
Feb 19 2019, 9:49 PM
NutshellySima updated the diff for D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).

Address comments.

Feb 19 2019, 9:14 PM · Restricted Project

Feb 18 2019

NutshellySima added a comment to D58327: [Dominators] Simplify and optimize path compression used in link-eval forest..

The improvement is quite fantastic!

Feb 18 2019, 9:11 AM · Restricted Project

Feb 13 2019

NutshellySima added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

I find I have assumed some extra limitations:
Users need to apply updates in the same order they made to CFG (at least for a same pair of <From, To>).
Because the only functionality the Update array applied by users is inferring the existence of a specific edge in the initial CFG.

Feb 13 2019, 8:17 PM · Restricted Project
NutshellySima added a comment to D58170: [DTU] Refine the interface and logic of applyUpdates.

Further explanation:
The motivation of this patch is that currently there can be bugs like that mentioned in summary.
But after inspecting that JT and other utils never submit updates that have been applied, we can get the information of whether a specific edge <From, To> exists in the previous CFG.
For example, if we observe a sequence of updates of an edge<From, To> to be

1. Insert --> 2. Delete --> 3. Delete --> 4. Insert --> 5. Delete

Because it is supposed that "never submit updates that have been applied", so we can infer from 1. Insert that <From, To> does not exist in the previous CFG. We can then inspect the current CFG to see if there is an edge<From, To>. If there is one, it is sure that an insertion happens. Otherwise, the 5 operations result in a no-op.

Feb 13 2019, 9:22 AM · Restricted Project
NutshellySima added a comment to D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed.

This is much cleaner, I like this very much. Originally adding the method recalculate() was a compromise. With this change do we even need the public recalculate()method? Does it ever make sense to make a new entry BB that has zero successors?

Feb 13 2019, 9:15 AM · Restricted Project
NutshellySima added inline comments to D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 13 2019, 9:05 AM · Restricted Project
NutshellySima added a parent revision for D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed: D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 13 2019, 8:57 AM · Restricted Project
NutshellySima added a child revision for D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528): D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed.
Feb 13 2019, 8:57 AM · Restricted Project
NutshellySima created D58187: Teach DTU to recalculate DT/PDT automatically when EntryBB is changed.
Feb 13 2019, 8:56 AM · Restricted Project
NutshellySima added inline comments to D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 13 2019, 5:47 AM · Restricted Project
NutshellySima updated the diff for D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
  1. Add the documents on flush APIs and clarify the precondition when flush APIs are called.
  2. Add information on recalculate().
  3. Add information on the invalid updates removal feature. (applyUpates())
  4. Remove documents on internal details which API users don't need to know about.
  5. Mark insertEdge*()/deleteEdge*() as deprecated.
  6. Address other comments.
Feb 13 2019, 5:33 AM · Restricted Project
NutshellySima created D58170: [DTU] Refine the interface and logic of applyUpdates.
Feb 13 2019, 3:43 AM · Restricted Project

Feb 12 2019

NutshellySima added inline comments to D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 12 2019, 6:30 AM · Restricted Project

Feb 11 2019

NutshellySima added inline comments to D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 11 2019, 9:14 PM · Restricted Project

Feb 7 2019

NutshellySima updated the diff for D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).

Fix a typo.

Feb 7 2019, 3:49 AM · Restricted Project
NutshellySima set the repository for D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528) to rG LLVM Github Monorepo.
Feb 7 2019, 3:01 AM · Restricted Project
NutshellySima updated the diff for D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 7 2019, 3:00 AM · Restricted Project
NutshellySima created D57881: [DTU] Refine the document of mutation APIs [NFC] (PR40528).
Feb 7 2019, 2:42 AM · Restricted Project

Jan 30 2019

NutshellySima added a comment to D57316: [LoopSimplifyCFG] Use DTU.applyUpdates instead of insert/deleteEdge.

@NutshellySima What was the reason we introduced DTU.insertEdge and DTU.deleteEdge? While they are essential primitives in the incremental updater, couldn't we get rid of it entirely? My memory is blurry.

Jan 30 2019, 10:35 PM · Restricted Project

Nov 20 2018

NutshellySima accepted D54732: [LoopUnroll] Don't verify domtree by default with +Asserts..

LGTM.

Nov 20 2018, 11:28 AM
NutshellySima accepted D54730: [DomTree] Fix order of domtree updates in MergeBlockIntoPredecessor..

We experimented with different ordering of updates before, including scheduling deletions before insertions, but didn't discover any promising strategies. Because of that, updates are performed in the exact order they were scheduled (except optimizing out trivially redundant ones).

The preferred API to use is currently DomTreeUpeater, but I don't think it would affect performance here.

Any thoughts @NutshellySima?

Nov 20 2018, 10:38 AM

Nov 13 2018

NutshellySima accepted D54312: [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater..

LGTM. Thanks!

Nov 13 2018, 8:31 AM

Nov 9 2018

NutshellySima added inline comments to D54312: [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater..
Nov 9 2018, 11:01 AM
NutshellySima added inline comments to D54312: [CSP, Cloning] Update DuplicateInstructionsInSplitBetween to use DomTreeUpdater..
Nov 9 2018, 10:37 AM
NutshellySima accepted D54317: [IPSCCP,PM] Preserve PDT in the new pass manager..

LGTM, thanks for the patch :)

Nov 9 2018, 9:06 AM

Nov 8 2018

NutshellySima accepted D47259: [IPSCCP,PM] Preserve DT in the new pass manager..

LGTM.

Nov 8 2018, 11:53 AM
NutshellySima added inline comments to D47259: [IPSCCP,PM] Preserve DT in the new pass manager..
Nov 8 2018, 10:24 AM

Oct 30 2018

NutshellySima updated the diff for D50300: [WIP!][Dominators] Collect DomTree construction/update statistics and timing data.
  1. Add the timer on stateful deduplication (which aims to be deprecated soon).
  2. rebase.
Oct 30 2018, 10:36 PM

Oct 27 2018

NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Since bug 37929 is a regression in version 7.0, should we backport this patch into branch 7.0.1 later (maybe November) if there isn't anyone reporting issues on this patch?

Sounds like a good idea

I recommend waiting about 2 weeks for the patch to get through all the various automated regression test frameworks out there. If there's a performance regression you'll likely hear about it within that time frame.

Oct 27 2018, 1:51 AM

Oct 25 2018

NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Since bug 37929 is a regression in version 7.0, should we backport this patch into branch 7.0.1 later (maybe November) if there isn't anyone reporting issues on this patch?

Oct 25 2018, 10:24 PM
NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

The constant "40" selected is based on the previous table and additional tests on "25" and "50" (see below),

αy/clangy/sqlitey/oggenc
0/baseline33.20%25.60%28.40%
2020.70%21.70%22.50%
2520.40%------
3013.10%23.10%23.70%
4013.60%24.10%24.90%
50---25.60%---
5514.20%26.00%28.00%
40 (n>=100)13.30%21.20%24.50%

I'll test n>=200/300/400 to see whether it is better.

Oct 25 2018, 10:08 AM
NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

The issue I find with cl::opt here is that this is a generic class that may not necessarily have a corresponding cpp file -- there can be multiple clients to GenericDomTreeConstruction, each with different requirements. Because of that, it's not clear to me in which translation unit one would put cl::opt for the constant in.
Is that a solved problem somewhere else in LLVM? I'm not familiar with using cl::opt in header files.

Fair enough. I'm more used to working on function passes that have several cl::opt() lines at the top of the .cpp file. If we don't have a corresponding cpp it's non-trivial to add it as a command-line argument. I too have never tried to use cl::opt() in a .h.

Oct 25 2018, 9:45 AM
NutshellySima updated the diff for D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).
  1. Adjust the constant to be 40.
  2. Disable recalculation when n<=100 && k<=n making the unittest of the incremental algorithm works as before.
Oct 25 2018, 8:53 AM

Oct 16 2018

NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

@kuhar , here is the statistics for k*α>n:
Let y=time used by DT.applyUpdates/Times used by all DomTree Updating operations.
raw statistics:
clang: https://paste.ubuntu.com/p/yC8sSQqbsx/
SQLite: https://paste.ubuntu.com/p/cTpqDthsNC/
oggenc: https://paste.ubuntu.com/p/4RJkCxjhkY/

αy/clangy/sqlitey/oggenc
0/always update directly33.20%24.90%28.40%
522.10%19.60%20.80%
1020.90%20.00%20.30%
2020.70%21.70%22.50%
3013.10%23.10%23.70%
4013.60%24.10%24.90%
5514.20%26.00%28.00%
6513.80%27%28.70%
7014.10%27.30%29.30%
7514.00%28.80%30.10%
8014.20%28.10%30.20%
8514.20%29.50%30.80%
10014.70%30.60%31.00%
30017.60%42.90%34.60%
100056.40%43.00%34.50%
Oct 16 2018, 6:55 AM

Oct 15 2018

NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

I have tested heuristic k>n/α with α=10/40/65/70/75/80/100/300/1000 on clang.bcs. (But I didn't save them to a file :\) 10/1000 produce very bad result. ("10" is better than the current implementation). 300 produces better result than 1000. 40/100 produce better ones than 10/300/1000. And 65-80 don't make much difference and produce the best results among all αs I selected.

I'd be very interested to see the numbers for these constants.

Oct 15 2018, 8:21 AM
NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

For k>n you mentioned, I think you mean k>n/α (α is a constant) because for the reproducer in 37929, the minimum value of α to make the time consumed by clang acceptable is about 3 (α should definitely be larger than 3). But selecting a higher value like "75" can also boost other inputs more (about 2x~4.5x on clang*.bcs).

OK, makes sense if that the case in the bug report.

Would you be able to produce some numbers for clang for a heuristic like this:
if (n > 100) k * a >n; else if (k > n)
with different as?

Oct 15 2018, 8:06 AM
NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Interesting. Nice improvements. What about small trees? It would seem that any tree less that 75 nodes would always be recalculated. Do the timings you ran include things to show that this is better? Or was that just looking at larger trees at the time?

I haven't conducted dedicated tests on graphs with the number of vertices less than 75, but among all the domtrees recalculated in the two clang-*.bc I mentioned above, there are approximately 63% of them has an n smaller than 75.

I'd be much more comfortable if this change didn't always recalculate for graphs smaller than 75. I think there is a lot of value in making sure the incremental updater is behaving correctly on small examples (e.g., existing unit tests, most regression tests, etc.). If for some reason it stops working, it will be much easier to detect and fix it on these small examples.

Thanks for the feedback. It makes sense to stop recalculating on small graphs but I need to have a check on whether it performs well.

In fact, the time used by directly applying updates to the domtree also depends on whether the graph is sparse or dense. It is hard to generate accurate heuristic to estimate when to recalculate the domtree or not. Because edge insertions have a complexity of O(m*min{n,k}+nk), edge deletions have a complexity of O(m*k) and recalculation has a complexity of O(n^2). (Let n denote number of vertices of the graph/m be the number of edges after updates/k be the number of updates). I think if we want to fine-tune this heuristic, we need to collect a lot of statistics and mathematically fit the function (n,m,k)->bool (whether to recalculate or not).

Since we cannot measure running time directly in any realistic scenario, I think that the best way to approach this would be to count the number of graph nodes visited during DFS walks and see how many of them we get with different heuristics.
I think that the bugs already established that we need to have some fallback heuristic. I'd suggest being careful here and trying at least a couple of different policies on different projects (maybe clang, sqlite, the lnt test suite). There are a couple of properties I propose for the heuristic:

  1. Don't always recalculate small trees (say, n < 100)
  2. Recalculate when k > n
  3. Recalculate in the case of the performance bugs we saw
  4. Visit fewer nodes than the current updater for Clang + SQLite + Oggenc + ...

For k>n you mentioned, I think you mean k>n/α (α is a constant) because for the reproducer in 37929, the minimum value of α to make the time consumed by clang acceptable is about 3 (α should definitely be larger than 3). But selecting a higher value like "75" can also boost other inputs more (about 2x~4.5x on clang*.bcs).

@kuhar , do you think we need to use another heuristic involving m? Though I think maybe using only n is sufficient? :)

Intuitively, I think that the heuristic can be some linear function: a * n + b * k > threshold, where a and b are coefficients and the threshold is a constant. I don't think it's possible to calculate m efficiently, is it?

m is the number of edges after updates. I think m can only be calculated by traversing two levels of linked lists to count the number of edges. It seems that it is O(m). I just think whether it is worthy to find some heuristic like n^2/m (I'm not sure whether it works better than the current one) and can avoid some cases when the graph is really sparse but also triggers the recalculation.

Oct 15 2018, 7:34 AM
NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Interesting. Nice improvements. What about small trees? It would seem that any tree less that 75 nodes would always be recalculated. Do the timings you ran include things to show that this is better? Or was that just looking at larger trees at the time?

Oct 15 2018, 12:58 AM

Oct 14 2018

NutshellySima updated the summary of D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).
Oct 14 2018, 7:50 AM
NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).
  • For the constant "650", I used the 2nd reproducer in bug PR37929 and collected some statistics on the size of the updates after legalization and the time used by applying updates. The statistics shows that the time complexity is about Θ(k^2.4) (let k be the size of the update vector after legalization) [I have just checked the original paper, it is O(m min{n, k} + kn), so about Ω(k*(k+n)) in the case of the reproducer] and it seems that at least for a CFG size no larger than the one in PR37929, "650" is a reasonable value that the time used by recalculating is sure to be less than applying updates directly. (I think there sure should be some better heuristics deciding this.)

Thanks for the detailed explanation. This is what I expected. I don't it's enough to use 650 in the code without any explanation or justification -- even if it works in practise, I think it deserves a comment.

  • It is definitely better to get it related to the tree size but I'm lack of inputs with different CFG sizes to experiment on.

You can take intermediate bitcode files when doing full-LTO built using lld. I think the command that does the trick is --save-temps=DIR or similar.

  • I acknowledged this as an issue of the incremental updater not handling a huge update vector correctly; It definitely should fallback to recalculating if it is much faster. I believe this constant is related to the size of updates after legalization so I put it in the incremental updater. [Currently, there is a similar functionality in DTU which is Ω(n^2) and I believe it is slow and (redundant if JT submits updates normally) so I am considering removing it from DTU, then it is impossible to know the size of the legalized update vector]. Besides, it can also benefit other places not using the DomTreeUpdater.

I see. If we expect to see many redundant updates, then it makes sense to check that after legalization.

How about this heuristic instead: since we expect the complexity of a batch update to be O(k^2 + nk) and full recomputation is O(n^2) worst case, we can fallback to recomputation when k > n. Size of the tree should be accessible with just DT.size(), and it seems like a conservative choice we could improve later. What do you think?

Oct 14 2018, 7:31 AM
NutshellySima updated the diff for D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Make the threshold selected proportional to the domtree size.

Oct 14 2018, 7:25 AM

Oct 13 2018

NutshellySima added a comment to D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).

Thanks for looking into this. A couple of high-level questions:

  • how was the constant "650" picked? Is it just playing with different values and choosing something that seemed reasonable?
  • why is the threshold independent of the tree size? Intuitively the policy could be to recalculate if the number of updates is greater than some multiple of size. Maybe we can refer to the average-case complexity from Lucuas' paper?
  • why do we want to put it in the incremental updater instead of the DomTreeUpdater class?
Oct 13 2018, 9:24 AM
NutshellySima created D53245: Teach the DominatorTree fallback to recalculation when applying updates to speedup JT (PR37929).
Oct 13 2018, 7:04 AM

Aug 16 2018

NutshellySima accepted D48790: Update MemorySSA in Local utils removing blocks..

Thanks for the change.
Please have a look at my new inline comments. (I have a quick glance at the source code of MemorySSA and I think it's fine; but it's better for you to check that).
LGTM.

Aug 16 2018, 12:05 PM
NutshellySima added a comment to D48790: Update MemorySSA in Local utils removing blocks..

Thanks for the improvement! Comments are inline.

Aug 16 2018, 4:32 AM

Aug 15 2018

NutshellySima added a comment to D48790: Update MemorySSA in Local utils removing blocks..

Looks like a nice improvement.

Aug 15 2018, 8:21 AM

Aug 14 2018

NutshellySima retitled D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion from [NFC][SimplifyCFG] Remove pointer from SmallPtrSet before deletion to [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.
Aug 14 2018, 11:44 PM
NutshellySima retitled D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion from [SimplifyCFG] Fix BasicBlock use-after-free to [NFC][SimplifyCFG] Remove pointer from SmallPtrSet before deletion.
Aug 14 2018, 9:54 AM
NutshellySima added a comment to D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.

LGTM

I guess because it's only erasing it from a SmallPtrSet, it's not actually _using_ it, but this looks like a very sensible change.

Aug 14 2018, 9:50 AM
NutshellySima created D50717: [SimplifyCFG] Remove pointer from SmallPtrSet before deletion.
Aug 14 2018, 9:23 AM

Aug 11 2018

NutshellySima edited P8098 [WIP!][DomTreeUpdater] Benchmarks on Lazy UpdateStrategy.
Aug 11 2018, 5:11 AM

Aug 7 2018

NutshellySima edited P8097 [WIP!][DomTreeUpdater] Benchmarks on preserving DT in SimplifyCFG pass.
Aug 7 2018, 9:58 AM
NutshellySima added a comment to D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate.

Benchmark results on Auto UpdateStrategy 3:
https://reviews.llvm.org/P8101

Aug 7 2018, 9:51 AM · Restricted Project
NutshellySima created P8101 [WIP!][DomTreeUpdater] Benchmarks on Auto UpdateStrategy 3.
Aug 7 2018, 9:50 AM
NutshellySima added a comment to D50308: [WIP!][DomTreeUpdater/Auto[2]] A new Auto UpdateStrategy candidate.

Some benchmarks on Auto UpdateStrategy 2:
https://reviews.llvm.org/P8100

Aug 7 2018, 9:49 AM · Restricted Project
NutshellySima created P8100 [WIP!][DomTreeUpdater] Benchmarks on Auto UpdateStrategy 2.
Aug 7 2018, 9:49 AM
NutshellySima added a comment to D50302: [WIP!][DomTreeUpdater/Auto[1]] A new Auto UpdateStrategy to test the impact of passes preserving DomTree.

Some benchmarks on Auto UpdateStrategy 1:
https://reviews.llvm.org/P8099

Aug 7 2018, 9:48 AM · Restricted Project
NutshellySima created P8099 [WIP!][DomTreeUpdater] Benchmarks on Auto UpdateStrategy 1.
Aug 7 2018, 9:47 AM
NutshellySima added a comment to D50300: [WIP!][Dominators] Collect DomTree construction/update statistics and timing data.

Some timing benchmarks on Lazy UpdateStrategy:
https://reviews.llvm.org/P8098

Aug 7 2018, 9:45 AM
NutshellySima created P8098 [WIP!][DomTreeUpdater] Benchmarks on Lazy UpdateStrategy.
Aug 7 2018, 9:45 AM
NutshellySima added a comment to D50305: [WIP!][DomTreeUpdater/Auto[1]] Preserve the DomTree in the SimplifyCFG pass.

Benchmark results: https://reviews.llvm.org/P8097

Aug 7 2018, 9:41 AM · Restricted Project
NutshellySima created P8097 [WIP!][DomTreeUpdater] Benchmarks on preserving DT in SimplifyCFG pass.
Aug 7 2018, 9:40 AM

Aug 5 2018

NutshellySima retitled D50302: [WIP!][DomTreeUpdater/Auto[1]] A new Auto UpdateStrategy to test the impact of passes preserving DomTree from [WIP!][Dominators] A new Auto UpdateStrategy to test the impact of passes preserving DomTree to [WIP!][DomTreeUpdater/Auto[1]] A new Auto UpdateStrategy to test the impact of passes preserving DomTree.
Aug 5 2018, 8:31 AM · Restricted Project
NutshellySima retitled D50305: [WIP!][DomTreeUpdater/Auto[1]] Preserve the DomTree in the SimplifyCFG pass from [WIP!][SimplifyCFG] Preserve the DomTree in the SimplifyCFG pass to [WIP!][DomTreeUpdater/Auto[1]] Preserve the DomTree in the SimplifyCFG pass.
Aug 5 2018, 8:31 AM · Restricted Project
NutshellySima added a child revision for D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate: D50313: [WIP!][DomTreeUpdater-Auto[3]] Collect DomTree construction/update statistics and timing data.
Aug 5 2018, 8:28 AM · Restricted Project
NutshellySima added a parent revision for D50313: [WIP!][DomTreeUpdater-Auto[3]] Collect DomTree construction/update statistics and timing data: D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate.
Aug 5 2018, 8:28 AM · Restricted Project
NutshellySima created D50313: [WIP!][DomTreeUpdater-Auto[3]] Collect DomTree construction/update statistics and timing data.
Aug 5 2018, 8:27 AM · Restricted Project
NutshellySima added a parent revision for D50312: [WIP!][DomTreeUpdater/Auto[3]] Convert existing passes and utils to use the Auto UpdateStrategy: D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate.
Aug 5 2018, 8:21 AM · Restricted Project
NutshellySima added a child revision for D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate: D50312: [WIP!][DomTreeUpdater/Auto[3]] Convert existing passes and utils to use the Auto UpdateStrategy.
Aug 5 2018, 8:21 AM · Restricted Project
NutshellySima created D50312: [WIP!][DomTreeUpdater/Auto[3]] Convert existing passes and utils to use the Auto UpdateStrategy.
Aug 5 2018, 8:21 AM · Restricted Project
NutshellySima created D50311: [WIP!][DomTreeUpdater/Auto[3]] A new Auto UpdateStrategy candidate.
Aug 5 2018, 8:16 AM · Restricted Project
NutshellySima added a child revision for D50308: [WIP!][DomTreeUpdater/Auto[2]] A new Auto UpdateStrategy candidate: D50309: [WIP!][DomTreeUpdater-Auto[2]] Collect DomTree construction/update statistics and timing data.
Aug 5 2018, 8:11 AM · Restricted Project
NutshellySima added a parent revision for D50309: [WIP!][DomTreeUpdater-Auto[2]] Collect DomTree construction/update statistics and timing data: D50308: [WIP!][DomTreeUpdater/Auto[2]] A new Auto UpdateStrategy candidate.
Aug 5 2018, 8:11 AM · Restricted Project
NutshellySima created D50309: [WIP!][DomTreeUpdater-Auto[2]] Collect DomTree construction/update statistics and timing data.
Aug 5 2018, 7:55 AM · Restricted Project
NutshellySima created D50308: [WIP!][DomTreeUpdater/Auto[2]] A new Auto UpdateStrategy candidate.
Aug 5 2018, 7:51 AM · Restricted Project
NutshellySima added a child revision for D50302: [WIP!][DomTreeUpdater/Auto[1]] A new Auto UpdateStrategy to test the impact of passes preserving DomTree: D50305: [WIP!][DomTreeUpdater/Auto[1]] Preserve the DomTree in the SimplifyCFG pass.
Aug 5 2018, 4:22 AM · Restricted Project