- User Since
- Feb 10 2016, 9:51 AM (279 w, 5 d)
It may be too much to overload the existing APIs, since this applies only to calls (e.g. in GVNHoist, metadata is dropped for geps, where this would be unnecessary). So either do the two calls independently, as in this patch, or add another API that does both.
I don't have strong opinions either way.
Tue, Jun 15
Got it! Could you add some tests where some of these differences appear?
It would be a good motivator to see a concrete example; here are a couple of scenarios I can think of, but perhaps your use case is different.
- LNICM can hoist in cases where LICM doesn't. If that can happen, we should showcase it with a test.
- Another loop nest pass runs in the same LPM as LNICM, and the optimizations would differ if LICM were separated.
Mon, Jun 14
See clang-format suggestion, otherwise LGTM.
I will second @fhahn's comments. Please include follow-up patches with the improvements you mention, and tests that motivate them.
Fri, Jun 4
The issue in PR31613 is a legitimate bug, since two instructions that are not congruent end up in the same congruence class based on the PredicateInfo data. Increasing the limit will not help there since the exhibited behavior in much larger apps is "out of memory", the testcase I provided is only a reduced version triggering a debuggable assertion.
Wed, May 26
This looks good to me, but please wait for the reviewers' comments from the previous patch version.
May 20 2021
May 18 2021
I believe the process is to additionally test a release after it's been cut, and potential issues that arise can be fixed and cherry-picked into that release. Flipping back the flag would probably not be a useful fix after 6+ months; resolving the bug would be.
As an additional data point, we've had several internal releases since the flag was flipped and haven't encountered any issues. I'm not saying bugs are sure not to exist, just that there has been some fairly thorough testing done and I'd opt to fix forward rather than switch back to the MemDepAnalysis implementation.
If you have strong reasons to hold back another couple of months on this cleanup, that's an option too; I'm just outlining my reasoning for moving forward.
May 17 2021
Thank you for cleaning this up!
May 4 2021
Apr 27 2021
Thank you for writing this; it is a well-written and useful doc.
No more big comments from my side, please wait to see if the others have additional comments.
Apr 24 2021
Apr 23 2021
Thanks, looks good.
Could you add the const to resolve the clang-tidy warning?
Minor typos in description.
s/no update the/no update of the
s/Hence I won't/Hence it won't
Apr 22 2021
Thank you! Looks good.
I think it makes sense to add the users at the upper call level where the real instruction is known. Just one comment inline.
Apr 20 2021
Sounds good, thank you!
Apr 14 2021
From my understanding so far, this is the correct approach.
I am going to need a few more days to go over the entire NewGVN pass functionality for the other patches, and I will only be back at work on Monday. Apologies for the delay on this.
Thank you for the typo fixes.
Apr 13 2021
Test added in 1e0b813fc082.
Apr 12 2021
I agree with @aeubanks on removing the NFC from the title. This is changing the functionality, even if it's under a disabled flag.
Apr 9 2021
Apr 8 2021
Performance-wise this looks good.
Please wait to see if the other reviewers have comments on this.
This is good to go from my side.
Restating summary of offline discussion:
Up until now, the repetition of simplification passes has hidden phase ordering issues or the need to rerun certain passes. Turning this flag on will improve compile time and resolve potential exponential compile-time increases, but it will also reveal all the cases where the pass pipeline needs to be adjusted so it does not miss optimizations that were previously performed in the follow-up optimization runs.
Apr 6 2021
I'm seeing a crash with this. Repro: opt -passes=dse on:
I'm running some additional tests for this patch.
Could you rebase the GVN dependent patch so I get some testing done in parallel?
Apr 5 2021
Mar 31 2021
Most performance results look OK. There are a few mixed ones (+ & -), one regression that is still being investigated and so far seems architecture-specific and unrelated to this patch, a couple of noticeable improvements, and a few marginal ones.
Please note I have only tested the new pass manager.
So, green light from my side assuming the other reviewers are also on board with this.
Mar 24 2021
Depending on what the run-time impact is in practice, the additional compile-time may or may not be worth it.
I'm running some additional testing offline on this, please give me a couple of days to test and analyze.
Mar 18 2021
Mar 17 2021
Seems reasonable to verify MSSA unconditionally at this point but if the debug build becomes too overloaded, it can be put under the VerifyMemorySSA flag, like in other loop passes.
@nikic, could you check this patch through the compiler-tracker, please?
Mar 12 2021
This change looks fine. Could you update the follow-up as well as a dependent patch?
Your point is valid and the patch looks good.
Updates look good.
Mar 11 2021
I'd have liked the thoughts of folks familiar with coroutines here, but overall the change looks like an improvement.
Mar 10 2021
Mar 9 2021
Changes look good.
+1 on the additional testcase.
Mar 8 2021
Mar 2 2021
Feb 26 2021
Thank you for adding this, it was long overdue and it is well written!
Feb 18 2021
Runtime impact looks reasonable now. What's the compile-time impact for the patch now?
I tested this with a 512 threshold and I'm only seeing a notable runtime regression in the test-suite MultiSource MallocBench. I would not block on that; the stack overflow issue is a larger concern.
Feb 17 2021
Fix looks good. As a detail, perhaps move the NDEBUG inside the body of setParentLoop to keep the code cleaner?
Ack, testing in progress.
This looks good on my end, including run time testing.
Feb 16 2021
This is absolutely not the right way to resolve this.
First, the restriction to not allow getting a stale analysis is very much intentional and part of the design of the new pass manager. The API being added here must not exist.
Second, it is not safe to use the stale DA. LoopUnswitch gets the LegacyDivergenceAnalysis only when making the final unswitching decision, and it does not reuse a stale instance. The same needs to happen in SimpleLoopUnswitch.
A proper solution is to change the divergence analysis pass so it can be created as an object. An example of a pass used as an object is the OptimizationRemarkEmitter.
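To illustrate the "pass usable as an object" pattern, here is a minimal standalone sketch (not actual LLVM code; all names and the toy analysis are illustrative). Like OptimizationRemarkEmitter, the analysis is constructed directly on the IR unit at the point where its result is needed, so it is always computed fresh instead of being fetched, possibly stale, from the pass manager:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-in for an IR function; not the LLVM class.
struct Function {
  std::vector<std::string> Insts;
};

// Hypothetical analysis usable as a plain object: constructing it runs
// the (toy) analysis on the current state of the IR.
class DivergenceInfo {
  const Function *F;
  size_t NumDivergent = 0;

public:
  explicit DivergenceInfo(const Function &Fn) : F(&Fn) {
    for (const std::string &I : F->Insts)
      if (I.find("divergent") != std::string::npos)
        ++NumDivergent;
  }

  bool hasDivergence() const { return NumDivergent != 0; }
};

// At the point of the final unswitching decision, the analysis is built
// fresh on the current IR, so no stale result can be reused.
bool shouldUnswitch(const Function &F) {
  DivergenceInfo DI(F);
  return !DI.hasDivergence();
}
```

The key property is that there is no cached result to invalidate: every call site that needs the information pays to recompute it on the IR as it exists at that moment.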
Feb 9 2021
Feb 8 2021
Feb 5 2021
Feb 3 2021
This looks good!
Feb 2 2021
Here's what the test reduced to. To reproduce run opt -passes='loop-vectorize' test.ll with the assert in place.
target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
target triple = "wasm32-unknown--wasm"
Feb 1 2021
I'm not familiar with the original reasoning, so here are some thoughts.
Jan 23 2021
I'm seeing crashes with this at stage3 in llvm::MemCpyOptPass::processMemCpyMemCpyDependence when invoking MSSA's Walker. I should have a repro next week.
Jan 22 2021
Please wait to see if @fhahn has additional feedback.
Thanks! Diff looks reasonable to me, but I'd like to see the use case changes as well. Could you (re-)upload a concrete use case as dependent patch?
One reason I have in mind is that we may want to enable this caching solely for a particular BatchAA instance, i.e. pass true to BatchAA and change the meaning in AAQI from BatchMode to CacheOffsets (and, same as now, only allow BatchAA to set this bool). Then only pass true at the BatchAA call sites where it will be used.
Jan 21 2021
My concern with this patch is the iteration over all instructions in the blocks in cloneAndAdaptNoAliasScopes and identifyNoAliasScopesToClone.
Perhaps for LoopUnroll this is not an issue, if the unroll cost model is taken into account beforehand (i.e. unrolling doesn't happen for large BBs). But when doing loop rotation, this may explode for IRs with large BBs, so I'd say some measurements would be helpful here.
For loop rotation, which instructions are cloned from the header to the preheader is known at the call site, so the entire original preheader need not be iterated over, but I'm not sure that is sufficient.
FWIW, this also lgtm.
Thanks for the fix.
Forgot to send these nits; feel free to address them in a follow-up.
Adding this only seems to make sense when using BatchAA.
Could we restrict populating this cache? E.g. extend the AAQueryInfo constructor to take bool BatchMode = false, and set it to true for BatchAA. Then update the get/set methods to first check this (check in the setter and assert in the getter).
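A minimal standalone sketch of what I mean (not actual LLVM code; the struct, field, and method names are illustrative, loosely mirroring AAQueryInfo/BatchAAResults):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Illustrative query-info object: the offset cache is populated only
// when the query runs in batch mode.
struct AAQueryInfo {
  explicit AAQueryInfo(bool BatchMode = false) : BatchMode(BatchMode) {}

  // Setter: silently skip caching outside batch mode.
  void setClobberOffset(const void *Key, int64_t Offset) {
    if (!BatchMode)
      return;
    OffsetCache[Key] = Offset;
  }

  // Getter: querying the cache outside batch mode is a programming error.
  std::optional<int64_t> getClobberOffset(const void *Key) const {
    assert(BatchMode && "offset cache is only valid under BatchAA");
    auto It = OffsetCache.find(Key);
    if (It == OffsetCache.end())
      return std::nullopt;
    return It->second;
  }

  bool BatchMode;
  std::map<const void *, int64_t> OffsetCache;
};

// Only the BatchAA-like wrapper constructs its query info in batch mode,
// so it is the sole owner of the cached offsets.
struct BatchAAResults {
  BatchAAResults() : AAQI(/*BatchMode=*/true) {}
  AAQueryInfo AAQI;
};
```

The asymmetry (check in the setter, assert in the getter) is deliberate: a non-batch setter call is harmless and can be ignored, while a non-batch getter call indicates the caller expects cached state that was never populated.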
Could we have at least one test showing how to pass an empty pipeline, solely for testing purposes?
AFAICT, currently all tests use the 'default', 'basic-aa' or 'tbaa, basic-aa'. Potentially add the test to test/Other/pass-pipeline-parsing.ll, which is testing 'bad' AA pipeline, or in one of the AA test folders.
Jan 14 2021
Do you have an idea whether there is overhead from adding the AAR object to each query (each query-info object), versus setAAResults being done once for all the available AAs?
Jan 13 2021
Runtime impact results look good.
Thank you for confirming @SjoerdMeijer.
Jan 12 2021
Could you test clang as well, and update the test?
I'm curious whether the compile-time impact is complemented by an impact on run time. Looking into that for O3.
Jan 11 2021
Jan 7 2021
The changes LGTM, but I'd rather defer final review to ahatanak@.
It would be good to have some testing but it could be addressed in a separate patch, potentially by folks more familiar with the ObjCARC passes.
Jan 6 2021
Looks reasonable to me as well, that the metadata info should hold at the same loop level.
Jan 5 2021
Apologies for the delay, I spent some more time understanding how the assumptions changed in the new implementation.